I'm Richard Gibbs, and I'm the director of the Human Genome Sequencing Center at the Baylor College of Medicine in the Texas Medical Center in Houston, Texas. When and where were you born? I was born in Australia in a small town called Warrnambool that is on the southern coast, on the great Southern Ocean there. And I was schooled in Australia. I went to secondary school there. I went to Melbourne University, pursued a degree in genetics and biochemistry, and then I pursued a doctoral study in radiation biology until 1986, when I moved to the United States. What brought you to the United States? I did come to Houston, and that's where I've been and done the bulk of my professional career. I came as a post-doctoral fellow to pursue general opportunity in science. The particular link was that I had used the gene, the hypoxanthine phosphoribosyltransferase gene, HPRT, which was a very valuable cellular marker in radiation biology studies. That gene was among the first to be cloned in humans. I think it was the third human gene cloned. It was cloned in the laboratory of Dr. Thomas Caskey, who was in Houston at that time and who provided an environment where one could come and do a post-doctoral fellowship and explore molecular genetic issues related to that gene. When did you first become aware of the Human Genome Project? Well, I arrived in the U.S. in February of 1986, and there was a Cold Spring Harbor Symposium on Quantitative Biology that year where the project was discussed and PCR was discussed. And a few other things. And the fruits of that symposium were kind of the general discussion. Now I have to say that PCR was a bigger deal than the genome project at that time, because it was really the next increment in where our insights were guiding us. The things we wanted to know were enabled so well by PCR. And so that was the thing that really grabbed us.
But the general notion of sequencing the genome was there then and came, I guess, more strongly in the years to come. Were you aware of the early opposition to the HGP? What were your thoughts on its feasibility? So it was a smorgasbord of issues that were sort of laid down with the suggestion of doing the genome. On the technical ones, it was clearly not possible with the technology at the time. I mean, not possible in any kind of sensible way. We needed technical increments to get there. As a social scientific issue, it was pretty clear there was opposition to it, and the opposition was kind of guided by individuals who had a fiscal argument and, I think, individuals who had a philosophical argument: we don't do science that way. And I have to say that none of those arguments seemed to be particularly strong to me. Really, I'd say, even at that time. And I was pretty green too, remember. I was a kid from the other side of the world, in my late 20s, and it was all new to me. But even then, I think those arguments didn't resonate as "we can't do it." We went to the moon, so we've conquered that kind of insurmountable technical challenge before. So the other arguments, I heard them, but they weren't overwhelming. How did the early HGP catalyze and go beyond advances in molecular biology in the late 1980s? Because of my background and because of the work I was immediately doing in Houston, we had a thirst for DNA sequence and for technology. Now, the problem in front of us was that we had human genes that we knew caused disease, but we had very little knowledge of what kinds of changes in the genes were happening and whether different kinds of changes maybe correlated with different kinds of phenotype. So this is kind of a golden era of human molecular genetics. We're beginning to discover more genes. This was before the CF gene was known, and the muscular dystrophy gene was not known at that time, but we had linkage information pointing to it. That was 1989, right?
And this was '86, '87. So there were a few techniques that had come along to interrogate gene structure other than Southern blotting, which was essentially the standard before that. And we were at the forefront of developing those technologies. So PCR came along and we melded everything we were thinking into PCR-guided applications. So techniques that you probably take for granted, or people who listen to this will take for granted now, were really developed then on the heels of the first PCR demonstrations. So multiplex PCR for deletion detection. We did that. The use of Sanger sequencing to look at heterozygous positions in automated fluorescent sequences. That's kind of part of the menu that everybody uses today. We did that for the first time. Reverse transcription PCR, the first paper on that came out of us. So we had this tremendous burst of technical activity around figuring out what was wrong with individual genes to explain gene variation and phenotype variation. And so we needed sequence to do that. What were the limits of PCR and how did early sequencing change disease gene identification? Well, so think of this particular gene, the one I mentioned that brought me to the U.S., as kind of a microcosm of all things over those years, right? But of course we were doing this at multiple loci. But the problem there was we couldn't do diagnostics or molecular characterization on individuals prior to 1986. We couldn't do it without inference. So you could take a child with the disease. This is an X-linked disease, so the boys only have one copy. If you ever found the mutation you could go and track it in the family and in the mothers. But you couldn't effectively, or easily at all, go and look at it in women to ask, might they be carriers of the disease? So finding the mutation was hard. You could only really do it in hemizygosity. You certainly couldn't tackle random individuals. This is the old days, right? So suddenly with PCR we had some tools that we built to get there.
But the thing we lacked actually was full locus sequence information. We had the cDNA. We'd cloned the gene, others had cloned the gene, we had the gene. We didn't have the sequences flanking the exons. So a milestone project was undertaken, of which I was not a central part, but I was in the peanut gallery. So Dr. Caskey will tell you when you talk with him, I'm sure, about his work with Wilhelm Ansorge at EMBL on the first fluorescent automated sequencer, which was built by the Pharmacia corporation, the ALF sequencer. And so Wilhelm wanted to demonstrate its efficacy. So in Houston we had the large genomic clones with the 44 kb HPRT locus in them. So that was an ideal candidate for a big piece of DNA, 40 kilobases, huge, for us to sequence with an automated method. So the production of shotgun libraries was carried out in my lab with my colleagues, and my technician took them to Germany and stayed there and worked with Wilhelm's group generating the raw sequence. Now this automated sequencer actually needed individuals to write down the bases as they came past the detector. So a lot of the automation wasn't really there. You know, we didn't have computers really. I mean, we just had fancy word processors at that time. So, you know, that was the computer era. So an interesting development in that project was sequencing both ends of a subclone, which is a standard thing today. There's a good story about that, which is that it came about in that project; that was the first time it was used, and used to anchor reads in assemblies. And the reason it came about is that the shotgun libraries were made in M13, which produces single-stranded virions from which you can get a very pure DNA prep that's very good for priming and extending for sequencing. Now, after using a bunch of these clones from a library, the library was exhausted.
There were no new clones to go to, but more raw sequence coverage was needed to get a shotgun assembly of this 44 kilobases. So the group decided they'd have to go back to the bacteria that secrete the single-stranded virus, extract the double-stranded form, and use the other end just to get more sequence coverage. And then when they did that and they built the assemblies, they said, well, you know, these things ought to go opposite each other. And that was the sequence-mapped gap methods, which you can find in some of the early books. So anyway, that project was a fabulous success. It gave us the sequence that flanked the exons in that gene, and we were able to construct PCR primers for each exon, build a multiplex reaction, and then do the automated sequencing across them to look for any positions that in females would be heterozygous and that could confer risk to a child, a boy, if he inherited the one copy with the mutation. So all that evolved over the period from the beginning of 1986 to probably 1990 or 1989. So that was sequencing-based development, right? And as you know, most of the discussion there was not really about applied sequencing. There was a little bit of sequencing technology stuff, like which sequencing things should be used, but a lot of it was about YAC cloning and, you know, other kinds of whole-genomy things. And then there was this disease discovery angle that was going on, that was finding new genes around CF and muscular dystrophy and neurofibromatosis and then all of those other loci. Were there two different distinct approaches to the genome in the early years? At the time there was this perception that a few groups were just sort of doing genomics as gene discovery and not thinking enough about the whole genome problem. And there were certainly individuals who were thinking a bit about the whole genome problem. They weren't really human molecular geneticists. They weren't really interested in individual genes for disease.
They certainly weren't clinical geneticists, who were the guys who were driven to do that. Which scientists were thinking about the genome more holistically? Yeah, I mean it was the master of being holistic and not too many others. I mean, there's plenty of voice but not a lot of words, if you know what I mean. And so there are not too many I would point to. I mean, Ron Davis, I think, was certainly the other person who in those days was really properly thinking about the composition of these kinds of problems. And you know, Tom Hudson was in there with Eric Lander. They were kind of off on this mapping thing, I think for its own sake as far as I could tell, rather than pulling the problem all the way to its end point. So we dug in our heels at that time and said there's going to be a sequencing future and that's what's going to pull this home. And we didn't think a whole lot about issues like whether BAC clones would be better than YAC clones or how we get rid of cosmids and if we get Lander. We kind of felt that was going on. Someone else, Mel Simon, of course, saved the day, right? And came from left field as far as many of us were concerned, with Shizuya, his postdoc, who forgive me for not pronouncing his name correctly. Mel came in and saved the day with BACs. The EST stuff, and I'm mixing up the years a bit here. A lot of that stuff sort of came and went and didn't impact this fundamental trajectory, in my view, of getting the right reagent, building a sequencing method that could be scaled enough, and building our computational infrastructure, interpretive infrastructure, around it. Those were kind of the core things that happened. When did you start to think about the issues of cost, quality and scalability of sequencing the genome? Well, I believe that there were several of us thinking about that but in a fairly unstructured way. You know, there's always this trinity, right?
There's cost, there's quality and there's scalability, and we were tackling all of them, but we weren't sort of driven to one or the other at the earliest time because we were still feeling our way. But there was a pivotal meeting in one of these hotels. I can't remember if it was this hotel or one up the road or one across the road, but it was in a basement. Actually I'm pretty sure it wasn't this one. It's the one with the escalator down to the basement. There was an impassioned discussion about cost at that meeting. I think that gelled a lot of other discussion going along. And I am confused in my mind about the timing of that meeting versus the Bermuda meetings versus some of the other program meetings we had, but there was certainly a meeting where we had a long discussion about cost. So this was when we were down to like the 25 cents a base kind of place, but I remember Phil Green at the end of a long impassioned discussion saying it was something, you probably have the exact quote, but it was something like, we were all exhausted, right? And we'd all talked this through. And then Phil said, you know, we better double it just to be real. And so that put things in perspective. And one of the senior scientists wept during his talk. I won't name him, but that was another piece of impassioned history you might pick up on. How did you go about establishing the Human Genome Sequencing Center at Baylor? I was this post-doctoral fellow with Dr. Caskey until about, I think it was, mid-1990. And at that time he was forming a larger institutional structure, ultimately a department. And he was bringing on new faculty. And that was the time, as a post-doctoral fellow, that I went looking at other jobs. Ironically I had two hot leads. I went and interviewed and was in a serious discussion with Francis in Michigan and also in Atlanta, where the Emory group was very strong. And Dr. Caskey made me an offer and I stayed and started my own lab.
And there was a small departmental core that did sequencing. And I inherited that and became a departmental core for about two years. And the senior individual there was Donna Muzny, who was a master's graduate from College Station. Enormously practical and enormously thorough. And now we've worked together for 30-odd years. And so she's really the mainstay of the quality that we've been able to sustain in the center. So this was then a lab. So I had a lab opposite the core, and I had kind of grad students and post-docs in my lab. And I interacted with the core and they did this more process-oriented stuff. But it became my sort of little cauldron for doing larger-scale sequencing projects. So between '90 and '95 I had a grant, I forget, in about '92, that was for large-scale sequencing. And it was enormous. It was like $600,000, which was a very big R01-type grant at that time. And it was for applying methods to large-scale sequencing problems. And we undertook to sequence the human CD4 locus. That was a target. So a general interest at that time, and my research interest at that time, was to look at genetic variation for traits other than acute disease. And the susceptibility to infectious disease was a very hot topic. This was before CCR5 was discovered, which turns out to be the most important modulator of HIV susceptibility. At the time we thought CD4 itself, which is the major receptor, might be polymorphic and that might explain some of the heterogeneity. So we thought, let's look at CD4 and look for polymorphisms in CD4. So that's a large sequencing project. We were gifted the clones by Dan Littman in New York, who had cloned CD4, and we worked on those clones and did large-scale sequencing on that locus. Which is actually not unlike HPRT. It's about 40 kilobases or so and has a bunch of exons. So it was a similar project. So we seeded our work with these kinds of sequencing efforts. We picked on a few other regions.
We were interested in gene density in certain regions. We were interested in part of the X chromosome and issues related to disease that were solved by sequencing. So we had a period where students' and postdocs' projects were all about sequencing, and they would come in and make the clone and then do the sequence reactions and get help to load them on these early sequencing instruments and all that stuff which we used to call infrastructure. So that got us through that period up until, I think, when you were asking about. How did your lab interact with other sequencing projects? So we were humming then. You know, I remember we met with, of course the worm was coming along and we were all good friends. We'd all meet and go to conferences and swap technologies and all of this stuff. I think Rick Wilson was coming along at Wash U there with Elaine. The Boston group didn't do much sequencing. They were just mapping. But the Sanger group, of course Wash U and the Sanger were united by the worm project. What were your initial impressions of Celera and their approach to sequencing the genome? You know, I think we're all kind of wired in a certain way, that we gravitate to certain kinds of ways of doing things, and Craig's splattering of EST sequences, we were not enamored of that. I mean, we didn't say it was a bad thing, but we said that's not for us. We said if we want to study a gene, we want to find the gene, explain it fully and give a robust description of that gene. Now I would say that at that time I was underappreciative of the power of some of these methods. But we certainly weren't wanting to follow in those footsteps. In fact, Tom Caskey, to whom I seem to be referring quite a bit, was an advisor to Merck at that time, perhaps just a little after that, and came back to say, guys, if you want to get on this train, Merck and others are going to sponsor this EST program that we're sure you know about. And I was kind of, okay, you know, maybe.
And then when I heard that Wash U had got that contract, I wasn't particularly fazed. But I think, you know, looking back, that might have been an opportunity to leverage resources and so forth. But anyway, that's a long answer to your question of what did we think of Craig's. Were there concerns about the quality of the data being released during the Bermuda meetings? No, I don't think that; no one thought that was a great argument. I think that to me the only real pushback wasn't actually an active pushback. It was more like, yes, it's a great idea, but I don't know that people are going to be able to use it that much, or that they will want to use it, because of those issues. But the notion, I think, that Craig, and you remind me now that Craig did press this idea of quality being an obstacle to people. And I think some others might have echoed that too, but nobody really said, you're right. Can you talk a little bit about the role of Phil Green in the HGP? At that early time, the project said we need mathematicians. And the mathematicians who came were first asked to do data management, and they fled in droves. And then their tasks were backfilled with people who really were not mathematicians. They were much more interested in more blue-collar issues of data management. So we starved the project a little bit of the mathematicians. I believe we actually drove them off between 1991 and 1996. And the few who stayed, there were some great people, don't get me wrong, but the flood of mathematicians we could have used was not there. So for someone like Phil Green to come along and to develop these quantitative measures on trained data sets was actually a pretty special, lucky thing in my view. How did the Baylor College of Medicine Human Genome Sequencing Center come about? The pilot program came along to grow sequencing. And that was my thing, and I applied for that and received that. And on the heels of that I said I ought to be a genome center.
So with a stroke of a pen and a small sign I put on my door, I said I'm the Human Genome Sequencing Center at Baylor College of Medicine, with my staff of 20 that grew to 60 or something in the program. These days I can tell you, if you want to start the smallest center of anything under any circumstances, it needs an act of Congress. It's a much different day. But back then it was the Wild West, and I declared myself and no one ever questioned it. And so I just became the Human Genome Sequencing Center. And we were externally funded, so there was nobody to come and say you can't do that. So that's how we got seeded. Can you talk about the origins of Celera and its influence on the HGP? I mean, there were some interesting moments. Of course at that time he was joined at the hip to Tony White from Perkin Elmer. And Mike Hunkapiller, who was really one of the unsung heroes of all of this period, well, maybe not unsung, but he's certainly one of the heroes of this whole period, he read the paper, the manuscript that I was referring to earlier, the one from Jim Weber and Gene Myers. And he had just overseen the development of the 3700, this box that did capillary sequencing and did away with the need to do gels, or did away with the need to do gel pouring. And he said this maybe would work. And knowing that he could raise the capital, knowing that it would stimulate business, he invited Craig to be the person who would run Celera. And they cooked up that plan. I think Mark Adams flew out to California with Craig and met with Mike and Tony and shook hands on the deal. Which was pretty bold in those days, to do a hundred million or maybe two hundred million dollar deal with a handshake. I mean, I'm sure there was some follow-up paperwork, but it was bold and it was pretty exciting. So it certainly electrified everybody's view. What were your thoughts on the idea of the draft genome sequence? So we liked the idea of a draft. We presented it. So it was not foreign to us.
To us it was never a science question, because if you're doing a shotgun project you can have intermediate products. That was scientifically logical. Really the issue was sociological. Are you committed to drive it to the finish, which you need to? So that was not the dichotomy. We didn't engage the dichotomy in quite the way that others might have. Can you tell us the story of sequencing the Drosophila genome? Shortly after that they decided to use shotgun methods and they came to us. Gerry came to me and said, would we join with him in a grant to do a shotgun method, because we were more experienced at doing shotgunning, and shotgunning of individual clones. It was not a whole genome shotgun. So we participated in that grant. The grant was funded right before Celera was announced. And of course we were at Cold Spring Harbor again right when Celera was announced, discussing the ramifications. And we all went to Plimpton, the room there, and we sat and listened to Craig say go do the mouse and he was going to do this. And then he said, of course we're going to do a warm-up genome. I'm not going to tell you what it is. And then as he walked out he said, Gerry, come and talk to me in the corridor. Of course Gerry was the Drosophila guy. It was pretty obvious he wanted to talk to Gerry about the Drosophila genome. So that began the shotgun phase of that project. Now, we hadn't actually engaged in any work with Berkeley at that time. We were just beginning our engagement with them. But we reached an agreement that Gerry would pursue the shotgun phase with the Celera group. And then we would pursue the finishing phase with the Berkeley group. It became extremely acrimonious. And that's a whole other story that probably doesn't need a lot of tape time. But I think, with the benefit of hindsight, both sides could have managed that part of the project better and had a happier outcome for the way the Drosophila project unfolded.
But the project got done and redone and redone. And indeed the execution of that shotgun genome was a real milestone on its own. What was your opinion of Francis Collins' leadership during the G5? I think he led brilliantly. I mean, with the benefit of having been part of that at the time, plus a number since, I'm now a veteran of more consortia than I could possibly remember to name. And that one set the bar. Because Francis is an extremely talented manager, apart from his other skills. I can't say enough good things. He both set goals and allowed people to do their thing. And then he had forums for the bits in between to be well talked through. And he formed relationships with people. I just think he was a model to follow. It was just really great. And we had those Friday calls, the G5 calls on Friday mornings, which became part of our lives. And they were managed well and they were reliable and there were minutes. What does a comprehensive catalogue of human variation mean in the context of the 1000 Genomes project? This is guided a bit by our current understanding of how rare rare variation is. And of course until we sequence everybody we won't have every variant. There's so much new variation. The site frequency spectrum is so steep. Let me say something about that, though, just before addressing the consequences of the curve. You know, when we did the Watson genome with 454. I don't know how well you followed the period of personalised genomics. But of the first few individual diploid genomes, actually the first one was Watson. Craig's was done a little after, though Craig's paper came out a little earlier. But the first one was Watson, and the data was generated at 454, and I was on the scientific advisory board. That's how we hooked up with them; we did the analysis. Now the first thing we did was to compare the reference and Craig and Watson, which gave us an idea of how much unique variation there was between the individuals. And I went around and I asked some of my learned colleagues.
I said, how much rare variation, do you know, how extensive is it going to be? Because by this time I was a little more seasoned than when I'd gotten off the boat from Australia. I knew that once the story was known, then the answers would be the story. I asked several people what they thought that number would be. Nobody came close to the estimate of how many rare variants you would find. So the simple question is, if we sequence you and you go in to the doctor with your sequence, how many times do you think there'll be variation that no one's ever seen before? And the answers were always lower than what we got from that project. So it was a surprise how much rare variation there was. So now, with the benefit of knowing that, the answers to your question are straightforward: we need to get this reference genome, we need to find the regions that are more commonly variable, and we need to have tools to reliably discern the very rare ones in everybody when we want to, because that matters, because people will be very different. Before, you might have stopped short of this need for these tools, because with the common variation, if we got everything down to 0.5%, we were going to get most things people would have. So it wasn't until we knew that wasn't true that we had to change our answer to your question.
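The point about the steepness of the site frequency spectrum can be illustrated with a small calculation. The sketch below is not from the interview. It assumes the textbook standard neutral model with constant population size, under which the expected number of variant sites seen at derived-allele count i in a sample is proportional to 1/i. The sample size of 2,000 chromosomes and the function names are hypothetical choices for illustration. Real human data are skewed even further toward rare variants because of recent population growth, so this is, if anything, a conservative picture.

```python
import math


def neutral_sfs(n_chromosomes):
    """Expected share of variant sites at each derived-allele count.

    Assumption (not from the interview): standard neutral model, where the
    expected number of sites with derived-allele count i is proportional
    to 1/i for i = 1 .. n-1.
    """
    weights = [1.0 / i for i in range(1, n_chromosomes)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_sites_at_or_below(n_chromosomes, freq_cutoff):
    """Share of variant sites whose sample frequency is <= freq_cutoff."""
    sfs = neutral_sfs(n_chromosomes)
    # Largest allele count whose frequency does not exceed the cutoff.
    max_count = math.floor(freq_cutoff * n_chromosomes + 1e-9)
    return sum(sfs[:max_count])


if __name__ == "__main__":
    n = 2000  # hypothetical sample: 1,000 diploid people
    singletons = neutral_sfs(n)[0]
    rare = share_of_sites_at_or_below(n, 0.005)
    print(f"singleton share of variant sites: {singletons:.3f}")
    print(f"share of variant sites at or below 0.5%: {rare:.3f}")
```

Under these assumptions, roughly a third of the variant sites in the sample sit at or below the 0.5% threshold, and that share grows as more people are sequenced. Note that this counts sites in the catalogue, not the variants carried by any one person.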