Pavel Tomancak lived through at least two major revolutions. He studied Molecular Biology and Genetics in Brno: admitted into university in a moribund Czechoslovakia, he graduated in 1995 in the nascent Czech Republic. After a PhD with Anne Ephrussi at the European Molecular Biology Laboratory in Heidelberg, Pavel joined the new frontier in the life sciences, moving from the study of single genes to the analysis of global gene expression patterns as a postdoc in Gerry Rubin's group at UC Berkeley. The simultaneous analysis of thousands of genes and gene products pushed the limits of both hardware and software, and Tomancak has thrived ever since, running his own laboratory at the Max Planck Institute of Molecular Cell Biology and Genetics, pursuing key questions in developmental and evolutionary biology while at the same time driving new technology. A staunch advocate of shared science, Pavel Tomancak develops his software in an open-source format and has made instructions for microscope assembly publicly available. We interviewed Pavel during a recent visit to the IGC for a seminar; it was a long conversation, part I of which we publish today.
Pavel Tomancak and fly (image by Madalena Parreira).
OT: How did you get from Brno to the EMBL?
PT: I studied in Brno as an undergrad and then I applied for the EMBL PhD Program. I was actually one of the first Eastern European students who entered EMBL. I remember that there was one PI at EMBL, Marek Mlodzik, very famous in the Drosophila community, who is of Czech origin. They didn't believe I was going to come, so he called my parents to make sure that I would actually come for the interview, and stuff like that. It was at the time (19)95, so they didn't have much experience with Eastern European students. I think I was really the first.
OT: Was the transition difficult?
PT: Yeah (sighs). I barely spoke any English. When you get into the PhD program at the EMBL it's a very complicated interview procedure: you are talking to 15 people in 2 days. Before I came for the interview I had never used spoken English that much. I read books, but I really didn't speak English, so that was tricky. But I had to pass that interview somehow, right? You kind of have to let the other people speak more than you yourself. And then of course there is a great deal of culture shock, but EMBL is very international; there is a good community of PhD students that you get kind of incorporated into. I didn't understand British people until about 2 years into my PhD. I was just nodding when they said something, but I had no idea what they were saying.
OT: Something seems to have happened in between your PhD and your postdoc in Gerry Rubin's lab. As a PhD student you were working on single genes and gene-pair interactions. And then you moved on to analyzing the whole transcriptome. Why did you switch focus?
PT: That was very clear. First of all, at the time when I finished my PhD, the first genomes appeared, started to be sequenced. Even when I was working on one gene, which is called par-1, I was trying to clone it and sequence the genomic region and everything, at EMBL still. It was very difficult, and then suddenly I realized that I could just download the sequence, because Gerry Rubin at the time was sequencing the Drosophila genome by a very targeted approach and he happened to sequence the piece of DNA that contained the par-1 gene. So for me it was kind of revealing that the availability of the genome was going to change everything. And then, also at that time, microarray technology first came up, and so I set up my initial naïve postdoctoral project, basically to use microarrays to study development. It was obvious to go to Gerry Rubin's lab with something like that. It's very hard to get in, but he'd just visited EMBL and I talked to him and said I wanted to use microarrays to study development, embryonic development. He said "come".
OT: Those early arrays were very unreliable in many ways…
PT: I even did them myself. I was actually the person who amplified all these cDNAs with my own hands. I did 6000 PCRs and isolated the amplified fragments; the only thing I didn't do was build the robot that spots them. That was done by Paul Spellman, who was at that time with Gerry. But I used the robot to spot the arrays on the slides. I went through everything. That was a fun time.
I came to Gerry to work on microarrays, and I set it up and was doing some experiments. But at the same time it was the beginning of a project to map the transcriptome by high-throughput in situ hybridization. They first did a pilot project and it seemed to work. But then the person who was running the project basically left, and I took over. I took over also because when I was in Gerry's lab I started getting interested in computer science. I'm what one would call a geek, so I was already close to computers, and then I saw that they were doing the in situ project but didn't have a good idea about how to organize the database. So I used the freedom in Gerry's lab to learn basic skills in bioinformatics and I helped them set up the database, and when the project didn't have a leader anymore I was the obvious person to take over because I had been involved in setting up the database. That's how it developed.
OT: Handling large datasets is a big problem. Most university faculty in Biology are probably still from the "pre-big-data age", and certainly most curricula are. You taught yourself how to develop software?
PT: Absolutely. You will hear that from many people. Casey Bergman is the best person to talk about that, because he thinks very clearly about how this transition is necessary for Biology these days. You need to have some skills in computer science- you don't have to call it computer science- but it's programming, right? Because data are getting very complex and you can't look at them anymore without some computer skills. I learned it myself: basic things like Perl scripting, how to build a database, a little bit of Java. That was at a time when there were no courses to learn that. Nowadays you have many opportunities to get these skills, and I would say one should do it. It is very, very hard to survive in modern biology without them. You completely rely on collaborations with bioinformaticians. If you have no idea about programming, at least simple programming, you have no idea about what is possible- in computer science everything is possible- or how easy it is to do. So you end up asking your bioinformatics collaborators for things that are either ridiculously difficult or ridiculously trivial, and then they might not be interested in doing that. They will think the trivial things you can do yourself, and the stuff you're asking them, well, that's impossible. So you have to get a bit of an idea about what it means to work as a bioinformatician, to be able to formulate a question in bioinformatics.
OT: What is the skill level (in bioinformatics) of people who apply to do a PhD with you? Are they eager to learn, or are they already coming from a computer science or mathematics background?
PT: There are all the categories you describe. I have had most success with hiring people who have a computer science background, to whom I give a problem to solve in Biology. It could be technical, or it could be a biological question, but for them it is relatively easy to absorb the Biology. They are professionals in computer science. These are great people, and I think they are coming into Biology more and more. Then there are also people who are a bit like me. They are primarily biologists by education, but they have quite serious computer skills. I had one student like that, and he was great- he was a machine in cloning and doing lots of things, but he could also program a database or run a bioinformatics pipeline, play with hardware and that type of thing. So those are very good people too. And then the last category is people who have no idea about computer science or programming but come to a lab where they have to use those tools, so they learn. I recently had a biologist who had no idea about computers, but now he is running the processing of his SPIM data on a cluster. So he learned it. He's not an expert in computer science, but what he needed to learn, he learned. The only people who would not work well in my lab are biologists who are unwilling to touch computer science. Maybe I should say, unwilling to touch computers.
OT: You’re also on the record as a defender of open-source code in biology, saying that researchers should not abide "black boxes" in their analysis software. But a lot of the commercial equipment you get comes with closed, proprietary software packages. Should scientists demand access to that code?
PT: I think that definitely we should. One great example of that is structured illumination. You can achieve super-resolution with structured illumination, and that has been discovered independently- or at least the imaging and the associated image analysis have been done independently- by John Sedat, Rainer Heintzmann and Mats Gustafsson, who passed away in 2011. They developed the hardware and the software. With the hardware alone you don't get the resolution gain; you need the software. The microscopes are made by commercial companies, Zeiss and I forgot the other one, I think Applied Precision. But they also bought the patents for the software, re-implemented it themselves and provided it as a closed-source solution. So basically it's a black box. The problem is that the transformation of the raw images into the final images is very artefact-prone. You can actually reach a completely wrong conclusion if the software doesn't work properly. If your imaging conditions are not proper- if the sample moves or something like that- there is absolutely no way nowadays to analyse something like that, because there is no open-source implementation of this algorithm; nobody can actually examine what is going on. I've been at the meeting of the structured illumination people, and they realise that this is a huge problem. Because they know they have artefacts in the images, and they are trying to reverse engineer the software the companies are giving them in order to figure out whether it is an artefact of the imaging or an artefact of the software. But they have no access to the code, and that's the problem. The people who developed the software are also partly to blame, because they never implemented it in such a way that it could be used by the open-source community- it requires complex mathematics, it's not easy.
They invited me because they felt that the SPIM community is much better off having somebody like me and others who are making the software in an open-source environment, so that one can really understand what it is all about. I think that even in the structured illumination community they realised that they have to do that, otherwise they cannot make sound conclusions about their data.