For you, what were some of the most significant scientific insights from the draft sequence paper published in February 2001?

Yeah, I think the remarkable thing about the original papers describing the draft sequence of the human genome was not so much that we were convinced that any of the things said in those papers were 100% correct and were going to hold up over time as being completely the right answer to lots of questions we had, but rather that it was really the first time we could basically account for lots of things that we were interested in. And so it gave us these first glimpses, if you will, of what was in the human genome sequence. And, you know, different people were interested in different things. Some people were interested to know how many genes we thought there were, now that we'd laid out all three billion letters. Other people wanted to know how much of the genome actually encodes proteins. Other people wanted to know how much repetitive DNA there was. Other people wanted to know things like, well, what's the spatial relationship between different types of repeats, or are all the genes close together? Are they all spread apart? And other people wanted to know things about, you know, recombination and so on and so forth. So there was sort of a little bit of everything for everybody. In other words, whatever your interest was about the components of the human genome, you could go in and for the first time there was a quantitative accounting of it, even though you knew that the number was just sort of an early estimate and that as the sequence was finished and got to be more accurate, you'd get a much better accounting of that. It was just so exciting to see all this. And what was particularly fun was there was so much speculation before the sequence was first available about certain features of the human genome. Some of the biggest speculation, of course, revolved around how many genes there were.
And, you know, very smart people just had wildly different estimates. Some were, you know, as few as maybe 18,000 or 19,000 genes. Others thought there were well over 100,000 genes. And, you know, when they finally had in front of them the human genome sequence, even in draft form, and did the initial analyses, the estimates came in at somewhere between 30,000 and 40,000, which turned out to actually be an overestimate because subsequently we've learned it's much closer to 20,000. But still, it was like, wow, we got clarity. It's not over 100,000. It's, you know, more like 30-ish thousand. And even though the number was wrong, it was heading in the right direction. So there was always a little something for everybody. And it really gave great excitement about the value of comprehensiveness, of having that whole sequence there to analyze and to measure things with, whatever it is that you wanted to measure.

And do you have any recollections from, say, 1999, when there was an incredible amount of sequencing and analysis being done in order to get the data ready for the paper? What was it like for the project at the time? What was it like for the sequencing center?

It was high anxiety. You know, it was high anxiety because of, obviously, lots of concern about whether the public effort of the Human Genome Project was going to be as effective and successful as the private effort of Celera Genomics. So there was that anxiety of just having some competition, if you will. I think there was also high anxiety because it became pretty clear that the challenge was not only actually generating the data, and we always knew that was going to be a big challenge, but it suddenly became apparent that it was not so trivial to stitch together the bits of sequence as they were generated, as it was being done clone by clone, and then stitch them together over the long range to be able to go from one end of a chromosome to the other.
And that became a massive computational challenge, and it actually brought in people who weren't even involved in the genome project to help solve that. But all of a sudden there was the realization that success at the technical level of generating the data was creating the next big audacious challenge, which had to do with some very sophisticated needs with respect to long-range stitching together of the sequence, which turned out to not be so trivial to solve. But there's nothing like high anxiety to really motivate people to get creative and actually solve that problem.

Can you talk about some of the companion papers that were published along with the draft sequence paper? What were some really significant insights from that batch or group?

Yeah, I actually don't remember all the companions. When the human genome sequence paper came out, of course, that gave lots of opportunity for people to really jump in and analyze that available sequence and build on the fundamental analyses that were done in the main paper. And once again, everybody brought to the table some interesting area of biology or genome science or some aspect of molecular biology that could be answered all of a sudden with the draft sequence. And so lots of papers were published, both at the same time and then in the months that followed, that really gave geneticists, molecular biologists, and genomicists the opportunity to dig much deeper into an area where that previously was only possible in very targeted regions of the genome. It gave them the ability to do it at a quasi-comprehensive level because of the comprehensive nature of the draft sequence.

So with the publication of the draft sequence paper, what work really remained to be done, and can you outline some steps or milestones that led to the completion of the HGP in 2003?

You know, I can very well remember that when the draft sequence of the human genome was published in 2001, on the one hand, it was a time of great celebration and great relief.
There was sort of a melting away almost immediately of a lot of the anxiety associated with the Celera versus Human Genome Project race. And it was very clear that we were going to enter a different phase of the Human Genome Project. And I think the main reason for it was a pretty clear recognition that, having published these two sequences, both in draft form, the next step for each of these groups would be very different. Celera was going to go do things related to them being a company, and the Human Genome Project was going to be the group that was going to go through the painstaking steps of converting a draft sequence to a very high quality finished sequence. Now, that was associated with maybe less anxiety because it wasn't competition, but there was still quite a bit of angst about, oh, my goodness, how long is it going to take us to go from this draft sequence to a very high quality finished sequence, which was always promised as part of the Human Genome Project. And so it really shifted a lot of attention, because it's a very different scientific effort when you're just trying to generate a rough draft versus when you actually have to get a high quality finished sequence. I always made and still do make the analogy: any of us and all of us who have written papers, whether for school or for scientific publication, we all remember that getting all your ideas down on paper and getting that draft of a term paper or a manuscript or something requires one set of activities and one set of skills. And then when you actually have to go and hard-polish it, that late stage editing, that sort of line by line fine tuning of every word, fixing every typo, getting the prose just right, is a very different skill set.
And in many ways, the same was true for the second phase of getting the human genome sequence completed. That finishing phase really required craftsman-like fine editing, very tedious, especially because they were almost developing how to do it as they were actually having to do it in a real production sense. But it was just very different. But, you know, in some ways, there wasn't quite the pressure because they knew that it was going to be the Human Genome Project at the end of the day that was going to generate that finished sequence. And it was just a matter of whether it was going to take, you know, another couple of years or another few years. And ultimately, it was completed by April 2003.

So can you think of one or two conclusions in the draft sequence paper that have been heavily revised or overturned, and one or two conclusions that have really stood up?

Well, let's start with what has pretty much really stood up over time. You know, maybe one quantitative, one qualitative. I think the quantitative one is the recognition that, you know, only about 1 or 2% of the human genome sequence directly codes for protein. I think that figure has basically held up over time, the total percentage of human genome sequence bases that directly code for protein. The qualitative thing that's held up over time is that having comprehensive information about a genome is absolutely game changing for everything you want to do in studying that organism. And it was just the comprehensiveness that we started to get a real appreciation for with that first draft sequence of the human genome. It was just game changing. It really just changed so many aspects of how we approach the study of genomes. In terms of a couple of things that have sort of been updated, or that maybe we didn't quite get right initially, as more rigorous studies have been performed.
Some of these conclusions have melted away. I think one relates to the total number of human genes. I think the original estimates with that draft sequence were sort of in the 30,000 to 40,000 range for the total number of human protein coding genes. That's been revised downward over time. Now we believe it's something around 20,000. So still, it was very clear that more refined analyses were going to reduce that number. And in fact, that's indeed what's happened. The other thing, which I'm actually not as familiar with in terms of the expertise that I have, but which I know has always been a source of debate and continues to be, is this concept of horizontal gene transfer. There was some evidence in the first draft sequence of the human genome that maybe there was horizontal transfer of genes from bacteria into the human genome, and that now we carry with us a whole set of many, many, many genes that once upon a time were in bacteria and didn't come to us through the evolutionary process by way of our direct ancestors. I think over time, that's been very controversial. I think the numbers of these horizontally transferred genes are now thought to have been way overestimated in the original estimates, if they are present at all. And I still think scientists are debating some of these aspects of that concept, even in the present day.