I'm George Church, and I'm a professor of genetics at Harvard Medical School. I'm also a member of the Broad Institute and the Wyss Institute. I was born on MacDill Air Force Base on the 28th of August, 1954, in Tampa, Florida. Yeah, there's one teacher that I've acknowledged several times publicly, which is Creighton Bedford in ninth grade and eleventh grade, who was my math teacher, and he essentially let me off halfway through the year, both years, because of some combination of my narcolepsy and the fact that I seemed to know the material, so he was a big influence on me. Also, my photography teacher from tenth grade, John Snyder, was an amazing influence. And then, well, plenty of them in college and in graduate school, but notably Sung-Hou Kim, who helped me transfer from my sophomore year in college to graduate school. Well, I graduated in my sophomore year after working with him for a year, and then tried graduate school at Duke with him, but then I flunked out, because I wasn't paying attention to courses I had already taken, kind of like high school, and I worked as a technician for a year with Sung-Hou, so I was with him as an undergraduate, graduate student, and technician, and then he said, you probably want a PhD, so I went to Harvard to work with Wally Gilbert. Yeah, so I worked on crystallography of the first folded nucleic acid, transfer RNA, which was the key to the genetic code, and wrote some software during that time that was in use for 30 years thereafter, but I also did some experiments, mostly software.
Well, so some of the software I wrote as a crystallographer, I rewrote as a rotation student at Harvard, again in a crystallography lab, because those were the only labs that had scanners that could scan films, and all data was collected on film at the time, both X-ray and sequencing data. And then that software, well, I was a consultant for Bio-Rad for a brief while, and it was completely rewritten for Genome Therapeutics in the late 80s, early 90s, and others used it for part of the earliest stages of the genome project. I think it's a mystery why Harvard would accept me after flunking out of Duke, usually it's the other way around, or nothing. I think it's because I had been accepted at Harvard when I applied in a previous year, and also I had published five papers during the time I was flunking out. They had been on transfer RNA and also on a model for DNA-protein interactions. And then Wally, he had come to Duke for a day at the invitation of graduate students. I spent almost the whole day with him because I thought of an excuse why I should be at every one of his meetings. I don't think it made much impression on him. He wasn't on the admissions committee or anything like that, but it made a big impression on me. I was pretty sure even before he showed up that I wanted to work in his lab and do DNA sequencing, or possibly crystallography. But I decided on DNA sequencing. Well, yes, so while I was doing the crystallography of tRNA, there was one point we wanted to ask how general our crystal structure was, and did it apply to other tRNAs. So I typed in all the sequences available at the time, which is not something you would do today, and it took about an afternoon to type in; almost all the sequences were tRNAs, and I typed them all in, including the modified bases. And I folded them up in the computer and I said, wow, this is really cool. I could get hundreds of 3D structures from one 3D structure and a lot of sequences.
So sequencing must be a lot easier, and it's just as good. And so I knew somebody who was doing RNA sequencing at the time at Duke, a senior graduate student, and I decided that I wanted to do something much better, much higher throughput than that. Maybe sequence every person's genome. And so I didn't really do the math right away, but it seemed like it was plausible. And so anyway, I went to Wally's lab, which at the time was just beginning to do DNA sequencing. And at that time, sort of 50 base pairs was a big deal. So we had a ways to go. The main method that was in use, the one that I typed in all the sequences from, was RNA sequencing, where you would label the RNA, digest it, and run it out on paper chromatography or paper electrophoresis. And there was a lot of radioactivity, typically, and a lot of high voltage for the paper electrophoresis, and a kerosene-like substance. And so this high voltage and paper and kerosene were in a room which had an automatically closing door and a bunch of CO2 jets to put out the fire, should that ever happen. And so then we started converting over to gel electrophoresis, due to work by Tom Maniatis and then later Wally's lab. By the mid-70s, '77, we were well into the era of gel electrophoresis. Well, so I think the average student would flounder around for a few months, even in a lab where you had protocols working. And at first, we didn't have protocols working, so it would take maybe even longer. But then once you were up to speed: Greg Sutcliffe, the person who taught me sequencing, he was a six-year student when I was a first-year student, and he knocked off the 4,000-base-pair pBR322, with a little bit of help from me, in about a year. So 4,000 base pairs a year. And he was extraordinary. Fred Sanger was also extraordinary, but there weren't that many at the time. We almost never talked about cost, other than when we ran out of acrylamide. And we used so much acrylamide. It was actually a big deal.
It was like a $13,000 purchase order because we got it in quantity and high purity. And we used a ridiculously large amount of it until Fred Sanger published his paper on how to use thin gels with thin lanes and a low percentage of acrylamide. And he clearly was being cost-conscious, but you wouldn't call it a technology improvement. It was like three obvious cost-saving things that, by the way, also helped increase the read length. And when I said that 4,000 was a typical length, I mean, most people were satisfied with like one run on an X-ray film, and so they might get 60 base pairs, and that would be their thesis at the time that I entered the lab. But in terms of communities, there really weren't that many. There were little communities centered around Wally Gilbert's lab and Fred Sanger's, and their trainees would slowly percolate out, and Wally's lab at least would send out these photocopied protocols on very colorful paper, green and pink, I guess so people could easily find them on their bench among the clutter. Well, in Wally's lab, since it was one of the two centers, it was easy to find out what other people were doing. There was a little bit of traffic in between, very little. In fact, I think I might have been the first person in Wally's lab to do the Sanger method, and a number of people said, you're going to be thrown out of the lab because that's the enemy, and I said, well, we'll see. I don't think so. And he was fine with it, but that was kind of the limit of the flow of information. Well, I mean, they were pretty independent, as far as I could tell.
They both noticed the polyacrylamide gel electrophoresis on the denaturing gels that Tom had developed, and they tried it out with different ways of doing end labeling, meaning you'd get a ladder where each fragment was longer by one base, and the ladder could either be terminated by a chemical cleavage or by the polymerase falling off. They resulted in the same sort of thing, but they were wildly different methods. And Gilbert's chemical sequencing took off early. It was a little easier to implement on double-stranded DNA, which was what most people had, SV40, pBR322. And Fred Sanger's was kind of restricted to single-stranded DNA, of which phiX174 and G4 and M13 were a few examples. Anyway, so almost everybody had a double-stranded piece of DNA that they could label at one end, and so the chemistry took off for a while, but ultimately the dideoxy sequencing, once Joe Messing introduced single-stranded vectors that people could use, took off because of a slightly higher quality and no toxic chemicals. It still had 32P or 35S, though. As I was leaving Duke, I was thinking about ways to change sequencing radically. And of course, this is not healthy for an incoming student because you've just learned the protocols, but the ideas I had, kind of in a vacuum because I was in transition from one lab to another, had to do with multiplexing, how you can mix lots of things together so the same volume would do multiple reactions. And I tried it a little bit during my rotation with Greg, and he said, no, no, just finish your project on this plasmid, and I did. I was fine, I was happy with that.
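The ladder idea described above can be sketched in a few lines of Python. This is a toy illustration, not any actual gel-reading software, and the function names are invented: each fragment terminates one base further along, so sorting the fragments by length and reading off the terminal base of each one recovers the sequence, which is the principle behind both the chemical and dideoxy readouts.

```python
# Toy sketch of reading a sequencing ladder (illustrative only).
# Each fragment ends one base further along the template; the terminal
# base of each fragment is what the gel lane (or chemistry) reveals.

def make_ladder(template):
    """Return (length, terminal base) pairs, one fragment per position."""
    return [(i + 1, base) for i, base in enumerate(template)]

def read_ladder(fragments):
    """Read the gel bottom-to-top: shortest fragment first."""
    return "".join(base for _, base in sorted(fragments))

template = "GATTACA"
ladder = make_ladder(template)
assert read_ladder(ladder) == template
```

The point of the toy is that the ladder is an encoding: the sequence is fully recoverable from fragment lengths plus terminal-base identity, regardless of whether termination came from chemical cleavage or from the polymerase stopping.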
And then I did one other project on RNA splicing, and finally I had a moment where I had to decide whether I was going to continue to do RNA splicing or really follow my dream of this technology development, and I decided that the RNA splicing I was doing would maybe impact three or four people worldwide, while sequencing could affect more. And so I figured the ultimate multiplexing would be to do the whole genome in one tube. In other words, all the reactions could be done together. Running the gel, you could have the whole genome in one gel lane; then the problem was de-multiplexing it after you'd multiplexed it. And so the idea was to transfer to a flat surface. We tried a number; many different flat surfaces would work. And then probe it and image it, and probe it and image it. And that whole cycle of probing and imaging was the first inkling of next-gen sequencing. I mean, this was in '83 or something like that, published in '84. So people were excited about it. I mean, we had no idea what was coming with next-gen sequencing, but they were excited because you could do it without cloning or PCR, you could do it without amplification, so you could get things like methylation and footprinting. So simultaneously, we gave a way to do sequencing, methylation, and protein footprinting, all essentially from nuclei of cells. The multiplexing part of it I had had as a backup plan: if I couldn't sequence the whole genome in one tube and one lane, then I would reduce the complexity of the mixture by mixing a bunch of plasmids that had your favorite inserts in them, and then you could do, let's say, 20 inserts, which is what I settled on later. But as it turned out, the whole-genome sequencing did work, and it had many applications. In fact, somebody wrote a book about it, or a how-to manual. And then I said, well, the multiplexing might have certain advantages. Even though it's not the whole genome, it'll be more sensitive.
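The multiplex/de-multiplex idea described above can be illustrated with a toy sketch: several tagged inserts are pooled into one "tube" and processed together, and each one is later recovered by probing for its unique tag. The tag sequences, insert sequences, and function names here are invented for illustration; real multiplex sequencing used hybridization probes against vector tag sequences, not string matching.

```python
# Toy sketch of multiplexing (illustrative, invented tags and inserts).
# Many tagged molecules share one reaction; probing for each tag
# recovers the individual results afterwards.

def pool(tagged_inserts):
    """Mix tagged molecules into one 'tube' (order no longer matters)."""
    return set(tagged_inserts)

def demultiplex(pooled, tags):
    """Recover each insert by 'probing' for its unique tag prefix."""
    out = {}
    for tag in tags:
        for molecule in pooled:
            if molecule.startswith(tag):
                out[tag] = molecule[len(tag):]
    return out

tags = ["AAGG", "CCTT"]
pooled = pool([t + ins for t, ins in zip(tags, ["GATC", "TTGA"])])
assert demultiplex(pooled, tags) == {"AAGG": "GATC", "CCTT": "TTGA"}
```

The design point is the one in the interview: the cost of the shared steps (one reaction, one lane) is amortized over all the inserts, and the per-insert cost moves to the cheap de-multiplexing step.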
I could now use non-radioactive methods, which were not very sensitive, but by reducing the genome size, I could increase the fraction of each target. And so that was multiplex sequencing, which was the first paper I published upon becoming a professor. In between, I had a short time as a postdoc at Biogen and at UCSF, where I worked on stem cells, mammalian stem cells mostly. And then, along the way to that 1988 multiplex paper, I developed colorimetric sequencing, which was later used in, I think, all of the high schools in Seattle, thanks to Lee Hood, ironically using my colorimetric method rather than his fluorescent method, both of which were non-radioactive, but I guess one was more expensive; it was early days. And then I developed chemiluminescent sequencing, which was the main one that was used at Genome Therapeutics, and Peter Richterich coming into the lab was key in that. And that changed us from being the biggest radioactivity user at Harvard to the lowest radioactivity user, because we had finally come up with a major molecular biology tool that was non-radioactive. And then fluorescence was the third non-radioactive method that we used. So the multiplexing was a step towards next-gen sequencing, and it was a step away from radioactivity, and it was combined with automation immediately. Even the genomic sequencing, we really didn't automate; even though I had developed the automation software back in '77, '78, it wasn't super popular. In fact, Greg, when he first heard that I was doing this, said, what do you want to do that for? The only fun thing about DNA sequencing is sitting down with your coffee and reading the gels. So I kind of put that on the side in '78, but it came back out in '88 when we did the multiplexing. Right, well, I heard that from myself. I was one of the first people to talk about it, but the first meeting where it was discussed that I know of was indeed the 1984 meeting in Alta, Utah.
I wouldn't call it an organizational meeting. It was aimed at a different topic. It was kind of a shanghaied meeting where a very small number, like 10 scientists, were invited. I was the youngest. It was about estimating mutation rates that might in some way be the consequence of atomic energy or atomic bombs or so forth, or even non-atomic and other energy sources, anything that could cause mutations. And so there was a congressional mandate to estimate this. We concluded in the first five minutes that it was not feasible at the moment, certainly, and the best we could do was maybe, at a dollar a base, we could sequence human genomes, or actually 'a human genome,' as the way it was phrased, and that would eventually lead to another estimate of mutations induced by energy. But the point was we could do this, and we didn't know whether anyone was going to listen or not, because we knew we weren't answering the mandate of the meeting. It was sponsored not just by DOE, but also by the Office of Science and Technology Policy, OSTP, I think, at that time. And it went back to DOE and went back to the Office of Science and Technology Policy, and at DOE, they got excited about it, and they started writing checks, mostly internally. DOE had three advantages. They were already up to speed on fluorescence-activated chromosome sorting, so they could make chromosome-specific libraries. They were good at bioinformatics, because George Bell had done more or less what I had done, except typed in a whole lot more sequences. And then they were also good at robotics, because they had all kinds of robotics for handling radioactive substances and so forth. And then NIH had strengths as well, mostly in mapping, human genome mapping, and in model organisms. But at that point, nobody was talking about model organisms.
It seemed like I was the only person at the first three meetings, one of which was not organizational, and you could argue the next two were. So it was the DOE, then Santa Cruz, which was sort of an independent, university-run thing, and then DOE again at Santa Fe. And all of them were talking about the human genome. It was a human genome. It wasn't even a diploid human genome. It was just three billion base pairs. Pretty much all of them came to the conclusion of a dollar a base, pulled out of the air, because I know the average student could not pull it off for a dollar a base at that time. And certainly it was completely neglecting the issues of repeats and scale and so forth. But they figured it was a dollar a base. There was essentially no automation. There were little murmurings of automation in Japan. And Watson was being very nationalistic. Jim took me aside at a wedding at Cold Spring Harbor: the Japanese are going to just eat our lunch, they're going to take all of our economic wherewithal in genomics, if we don't do something about it while we just sequence E. coli. And I said, well, you know, great. I'd love to have them join. But anyway, Fuji was involved. And they were building sort of an automated process for making films, which were gels. So gels were automated, made more or less the way that photographic film was made. And so it seemed like a natural thing for them. And then a lot of other robotics manufacturers were interested. And I went over to Japan around that time. And it was quite impressive. It was like the shock and awe of the day. But they dropped out. Fuji decided it was too flaky; it would hurt their reputation if they had bad quality of any sort, so they dropped out. The robotics manufacturers, I think, realized that having a robot do exactly what a human does is actually not cheaper than a human, at that time.
In fact, still to this day, if you want to do something with a robot, it's best to do it in a way that's quite different from the way a human does it. So when we got to next-gen sequencing, when we developed the first next-gen sequencing device, it didn't look like a robot at all. There was no robotics involved, because everything was essentially again in one tube and spread out over a slide. And it was essentially microscopy. You had a microscope with a lot of moving parts. So anyway, that was sort of '84 to '86, when the first three meetings occurred. Right. So it seemed like it was going to be a real thing, because in '87 they started giving out grants. I think mine might have been the first genome project grant, in '87. It was very modest. Everybody was joking about how I didn't know how to ask for a DOE grant, because it was like $100,000 a year or something like that. And most DOE grants were in the many millions of dollars, intramurally. But it seemed like a real thing then. And then NIH started getting excited about it. And then the whole biological community kind of said, no, we can't let this happen. We can't let NIH get involved. Don't worry about DOE, because DOE was already this kind of Byzantine, mostly intramural thing. And there wasn't really any way for an NIH researcher to get into the DOE. I mean, I came in straight out of the blue. I didn't even have a lab when I first started talking to them. And so when NIH started getting interested, then people freaked out, and there was almost a letter every week to Science or Nature saying, this is a bad idea. If it had been put to a vote, it probably would have been 99 to 1. But fortunately, Jim Watson was involved, and he was pretty charismatic, maybe not the word that everybody would use to describe Jim, but anyway, he went and talked to a lot of congressmen. And it became a separate line item, as I recall.
There was also around that time, I think it was in '87, an NAS-NRC committee to look into this. And Maynard and I were outside consultants. We weren't part of the committee; Maynard became part of the committee later. But that was, I think, part of the process of figuring out whether you could do it or not. And eventually, NIH felt that they couldn't let the Department of Energy run what could be the biggest and best biology project in decades. So they started working together, and it seemed like DOE was, I think, more technology oriented, while NIH's strengths were mostly model organisms and mapping. It became a model organism mapping project, for better or for worse. And I think a lot of the mapping stuff was distracting. But they eventually got back to sequencing. Sequencing was in the first round of proposals. I was involved in three of the NIH ones and two of the DOE ones. So in addition to mine, there was another one from Ray Gesteland's lab in Utah for the DOE. So we were the two DOEs that I can recall. And then NIH, oh, maybe four. There was Jenny Mao at Collaborative Research, which later becomes Genome Therapeutics. They were doing Mycobacterium leprae and tuberculosis. And then Wally Gilbert was doing Mycoplasma, which is different from Mycobacterium, at Harvard. And then at Stanford, there were David Botstein and Ron Davis. And I've had many interactions with both of them, all of these people, since then. And then the fourth one was Eric Lander's. It was his first grant, my second grant. And these were all in the same stack. And they all got funded. And he did mouse. And it was mostly mapping. It was like four out of five specific aims were mapping, while the other three from NIH and two from DOE were all about sequencing. And then he had one sequencing section, which was written by Lenny Guarente and me. And even that was only partially sequencing. It was mostly cDNAs in yeast or something like that. Just a little bit of sequencing in it.
But as soon as we got the grant, that started to take over; at least all the mapping they would do by sequencing if they could. And then it grew from mouse to human. So I would work with them because of proximity, but I would work with the other teams because I was interested in sequencing. And it ended up that most of the sequencing was done with my methodology. Well, all five of the six published using the multiplex sequencing at some point. But the one that did the most with it was Collaborative Research, later Genome Therapeutics. They actually sequenced several whole genomes with it. So I knew Eric well before that. He was in the Harvard Biolabs. When I was a graduate student, he was kind of a visiting professor from the business school, I think, and he was hanging out with Peter Churbis and Bill Gelbart. Bill later went on to have a major role in databases, even though at the time he was a fly geneticist with very little interest in computers. But anyway, Eric would hang out with Bill and learn genetics. And then I went off and did my postdoc and didn't have much contact with him again until I came back as an assistant professor. And then we started talking again. Around that time, he was doing a lot of work. He had done a series of rotations, even though he was a lecturer at the Harvard Business School. He had worked not only with Bill and Peter, but also with Bob Horvitz and with David Botstein. I think the one with David was the most successful of his rotations, where he did a lot of the math behind ideas that David Botstein had had for, I think, years. And together they were able to implement these things in either math or software, or both. And so he was well-known already, even though he had not done any experiments. He may still not have done any experiments, as far as I know. But anyway, when we developed this center together, I think his was one of the biggest, and most of the other ones were picking a human chromosome.
And I think it was smart to pick a whole genome, because as it turned out, a whole-genome approach was better than the chromosome approach. And in the end, the human genome was done by whole genome. In fact, in a way, we weren't aggressive enough. It should have been whole-genome shotgun from the beginning, which was what I was advocating all the way through. But at that time, around the late 80s and early 90s, at almost every genome meeting you would go to, all the technologists, I mean, there were very few technologists, but all the people you would call technologists, were aiming for 1X coverage. That was the Holy Grail, 1X coverage. And I would just shake my head and say, are you kidding me? The Holy Grail should be bringing the cost down so you can do any X coverage you want to. But that was just a very unpopular sentiment at the time. Even though Fred Sanger had already developed shotgun sequencing, to some extent Fred was not a major component of these conversations. He was on his way to retirement, and I think most people considered him like a freak of nature who could do things that nobody else could do, and who knows how. And he was much more of a technician than a leader. I mean, he certainly had leadership skills in the sense of setting an example, but he didn't have a big lab. He had typically one technician in the lab and an occasional visitor his whole career, even though he got two prizes. But anyway, when he would come to visit our lab, he would come sit in the room that I worked in, and he would spend all his time talking to the technician in the room. He'd be talking about how much TEMED they used and what percentage gels and all this sort of stuff. He was very focused. So, anyway, Fred's shotgun sequencing did not have a big impact at the beginning. And one by one, these labs had to learn the value of 7X coverage or more. Now, the model organisms were great. I was super supportive.
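The point about 1X versus 7X coverage can be made concrete with the idealized Lander-Waterman model: if reads land uniformly at random, the expected fraction of the genome covered at average depth c is 1 - exp(-c). So 1X leaves roughly a third of the genome unseen, while around 7X covers more than 99.9%. A minimal sketch, assuming that idealized model:

```python
# Back-of-envelope coverage math under the idealized Lander-Waterman
# model: reads land uniformly at random, so the chance a given base is
# missed at average depth c is exp(-c).

import math

def fraction_covered(c):
    """Expected fraction of the genome seen at average depth c."""
    return 1.0 - math.exp(-c)

assert round(fraction_covered(1), 2) == 0.63  # 1X leaves ~37% unseen
assert fraction_covered(7) > 0.999            # ~7X covers >99.9%
```

Real genomes violate the uniformity assumption (repeats, cloning bias), which is why labs settled on 7X or more rather than the theoretical minimum.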
In fact, I think I was the first one, at all the first three meetings, to say, no, we need genome sequence comparisons. Comparisons mean you have to have something to compare to. You'd have two genomes; some will be closer to you, some won't. And I said, we might as well start with small ones so that we get some payoff early on, rather than spending 15 years and then hopefully having a payoff at the end. Let's get a bunch. That was not popular either. Most of the things I was proposing were not super popular. But what did happen is, in order to deal with the critics between '87 and '90, between the DOE and NIH starting, it was very politically adept to bring in model organisms. That's what helped change David Botstein from a critic into a supporter of the sequencing. He was always high on genomics and genetics, but to specifically support the genome project, it helped to have yeast as part of the game. And then worms were obvious, because Sulston and Waterston had already done a lot of mapping. I mean, they basically had all the clones in hand, ready to go. And that's also what influenced the enthusiasm for mapping, because it worked so well in C. elegans. It didn't work so well in humans. So those were the obvious small organisms. Flies came in surprisingly late in the game. And then bacteria got in as an organism. It was very funny that we called it 'the bacterium,' and it meant a bunch of bacteria: mycobacteria, E. coli, Haemophilus, Mycoplasma from both Wally's lab and Craig's, and so forth. So they were all lumped in together as if it were one organism. Yeah, he was very significant, I think. So when those 1990 grants went in, where I worked with Jenny Mao, and Ron Davis and David Botstein were together, and then Wally and Eric, the Botstein-Davis grant was a beautiful grant. I think it was the best of all the grants that I saw. It got the worst score. I mean, it was essentially rejected, but eventually that was overridden.
But it projected pretty accurately, in a detailed manner, all of functional genomics. It was all in that grant. You know, just all this beautiful biology and technology, and with yeast as the perfect way to test it out. It was at least a eukaryote; it was better than all these bacterial genomes. It was just a beautiful grant, and it just got trashed. But Maynard and I were there for the site visit, and we could tell from the very beginning of the site visit that the reviewers had come in with loaded shotguns, ready to take them out. And Ron, so at the time there were two ways of doing microarrays. One was Pat Brown's, and one was Affymetrix's, and Ron had bet on the Affymetrix one, which I think in the end was the correct bet. In fact, a totally correct bet: arrays are not used at all now for sequencing or for RNA analysis, though they're still useful for SNP analysis. But anyway, Ron just went on to do all kinds of innovative technology, and the legacy of that center was Ron's technology center, which is still in existence today. And just so many interesting technologies have come out of there. Well, like I said, I mean, he was big on nationalism. He was persuasive in getting Congress to vote for it, in getting NIH up to speed. At that time it was not an institute, it was a center. And I think he led it to being a serious enough effort that it had to become an institute. And I think he recognized the value of the genome from very early on. I don't remember him being one of the doubters that had to be convinced. Yeah, he seemed pretty compelled from the beginning. Other than that, I don't really remember much about his role. I mean, Cold Spring Harbor was one of the places where annual meetings would occur. And Cold Spring Harbor was also one of the grantees. But I don't think that had anything to do with why he did it. I think he felt it was the next big thing to do. Cold Spring Harbor had a good meeting structure already for courses and meetings, mainly during the summertime.
Including meetings that he had gone to when he was a postdoctoral fellow. And he announced some of his work on the DNA structure there in the early days of molecular biology. So he had a warm feeling for Cold Spring Harbor and became the head many years later. And it had an infrastructure for having exciting meetings, relaxed meetings. So I think that and Santa Fe became the two main meetings. There were also meetings held at Hilton Head, called GSAC, the Genome Sequencing and Analysis Conference. They tended to be a little more focused on sequencing a little bit earlier, while the other meetings were mapping and sequencing. But I went to all those meetings for a while, until they got into heavy, heavy production sequencing. I lost interest and went to other things. Yeah, very, very rarely, almost not at all. I mean, I was not an NIH grantee, right? I was a DOE grantee. I still am, since 1987. I never got an NIH grant of my own until 2004. Right, the CEGS. The CEGS grant, yeah. I think that might have been part of it. I mean, I, in a certain sense, had one through my collaborations with Collaborative Research and Stanford and MIT and Wally's at Harvard. So in a certain sense, I had four. But I got zero dollars out of any of those four, to my lab, interestingly. So Eric got 19 million; I got zero from that first grant. So it shows how good a businessman I was. But I was very dedicated, and I worked hard for any one of them who wanted me to work on it. But it didn't land me as an obvious advisor to NIH. So I was an advisor to the NRC, the NAS-NRC, in '87. But I didn't really get that heavily involved in the NIH efforts until we got to the CEGS. And also, when the $1,000 genome launched, roughly around there, during Zerhouni's era, I think Francis got the bug of the, what were they called, Grand Challenges? Roadmaps. Right, yeah. Anyway, there were like 11 different possible roadmaps. And I showed up; I did advise on that one. Re-sequencing the biome.
No, no, that was one of them. But there was another one on technology. And that was one of the few times I was pretty emphatic. Actually, I was pretty emphatic all the way through the project, in my meek version of emphatic, that they should invest more in technology. Because it was my gut feeling that the return on investment in technology would be bigger than on the investment in the sequence itself. Both of which were significant. But I think that once they got serious about technology, the price plummeted by three-million-fold. And almost none of that price plummeting was traceable to anything that happened during the genome project. In other words, there was a lot of technology development during the genome project, kind of minor incremental things. But it all just went out the window as soon as next-gen came in. Because the only thing that we really used from the old days was shotgun sequencing, which we used big time in next-gen. But shotgun sequencing predated the genome project. It was, in effect, mostly ignored in the first four years of the genome project. So anyway, I think that there should have been more technology development from day one. And finally that was listened to around 2002 or 2003, when they started thinking about the $1,000 genome project. And even the thousand dollars was too radical. They had to couch it in terms of a $100,000 project and a $1,000 project. So I probably liked Craig more than most people did throughout the years. We would keep tackling the same problems, I think mostly independently. So we both were attracted to small genomes early on. And arguably Collaborative Research did the first small genome, which was a Helicobacter. But Craig and Ham did the first really peer-reviewed one, which was Haemophilus. That was in '94 and '95. So that was one example.
We also both got the photosynthesis bug about the same time, as part of the DOE. We both also did this with Ham, Ham Smith, as part of it, because he was essentially second in command to Craig and did a lot of the real technology side of things. Ham was the one that made all the shotgun libraries. And he was the one that proposed synthesizing a genome. And he got DOE funding for that, even though they hadn't put out an RFA for it. And so then I did the same thing about the same time. And the Personal Genome Project was very similar to Craig doing himself. The difference was I got the approval and he didn't. But there were the same kind of ideas, that you could have an identified individual. He started out not identifying himself, but then later did. We started out identifying ourselves. Anyway, there were many times where we would do similar things. The biggest difference, which he's acknowledged publicly, is that I was much more into technology development. He would say in meetings, "George will develop technology and we'll use it," which I thought was very gracious of him. And I tried to return the compliment as much as I could. And, you know, he was an early adopter of everything. And I think that really distinguished him early. I think the first thing was, in '87 I think, he had an NIH lab full of ABI equipment. He was a protein chemist, and so most of the ABI equipment was protein equipment. And he had a big budget, but very little room for people. I mean, typical NIH budget: you could get equipment, but not people. And so he had to have really automated equipment. So when they came out with the DNA sequencer, he didn't really feel that he absolutely needed one. But he felt, why not? It's another ABI machine. Let's get it. And he got one of the first ones. He even got one before Lee Hood did, even though Lee's lab had developed it. They didn't have the budget to buy one, as I understand it. But Craig did. He got one.
He had pretty good biochemistry know-how, and so they optimized the protocols a little bit. And he did something very clever. So, he's still at NIH. He ordered the cDNA clones from Clontech. cDNAs were a good choice, first of all, because they got rid of all the junk-DNA problems, so it was a 100-times-smaller problem in principle. He then sent the cDNA clones from Clontech to Collaborative Research, which is where I had my center. And they would take in laundry; they would do contract work for other people. So they made plasmid preps for him. They would send the plasmid preps to NIH, where an ABI technician, together with one of his technicians, would run them on the ABI machine. So essentially he was using a third company to do the sequencing. And then he would run it without annotation, without proofreading, straight into NCBI. And so the whole thing was almost a paper project: Clontech, Collaborative Research, ABI, NCBI. And to some extent he just had to make sure, just do little quality checks, that everything was going okay. And I thought that was brilliant. But a lot of people got angry; that was one of the first things they got angry about. Before they even got angry about the patents, which weren't his fault, they got angry about the quality. Because a lot of the reads were going in and they weren't even human. They were unreadable. Or the good ones had a 3% error rate, right? And so a lot of people said, ah, it shouldn't even be in the database. But other people, I think maybe it was Bert Vogelstein or several of the cancer people, went in there and poked around and found a bunch of cool oncogenes and cancer-related things, like mismatch repair and so forth. And so the people who had the prepared mind knew how to use it. So anyway, then the NIH did what a lot of public and private institutions did: they started patenting this, because it seemed like it was a good thing.
And I'm not sure whether Craig, I don't think Craig initiated that. I think he went along with it. And then that got press: oh, this is horrible, we're patenting it, even though it was these low-quality cDNA reads, which didn't have an application. For a patent it had to be not only non-obvious, but useful, and a cDNA read was not, I mean, other than what people were doing, which was searching databases with it. And the interesting aspect of people getting upset about things is that it usually results in the opposite of what they want, right? The more you raise the alarm in public, the more likely it's going to get more money. So the three examples that I was very close to: one was recombinant DNA. That attracted so much attention that it basically sparked investors to invest in Genentech and Biogen and Cetus and Amgen and so forth. And just, boom, out of nowhere, they got excited about biotech. And then the cDNA debacle, and then the stem cells. For eight years NIH wasn't funding stem cells, so California raised $3 billion. I mean, I'm not sure there would have been $3 billion spent in the whole United States on stem cells if there hadn't been this ruckus. So anyway, finishing off where I feel Craig's role in all this was: that was the start. I mean, he was really not well known prior to that. But then he formed these related non-profit and for-profit entities, which were TIGR and HGS. And that was a brilliant business deal right off the bat, because HGS funded TIGR. All TIGR had to do was give them cDNA sequences. Eventually Craig got tired of that relationship and he bought back TIGR so it could be an independent institute, by that time getting lots of grants. And then, I think, then he had this race for the genome, which I didn't think was, you know, I mean, it was just using available technology. He had an inside track with ABI, which was a good thing, obviously.
ABI's got a better chance of winning a fight with its customers because it has an unfair advantage. So that was brilliant business strategy, too. And then after the genome project was over, he got disinterested in human genetics or sequencing, pretty much, for a while. He went off into synthetic biology, using sequencing as a lever to get people interested in his synthetic interests. He would sequence metagenomes and things like that. But anyway, in terms of the human genome, I think it probably was a good thing to kind of force shotgun down everybody's throat. I think he did that. I tried to do that, but I was completely ineffective; he was very effective. And there was a period of time where people were doubtful that you could do whole-genome shotgun on human or mouse. In fact, I was part of a paper on how to do that, a computational paper. Gene Myers and James Weber from Marshfield did a paper; I was a co-author on that, but they took my name off at the last minute. I don't know why. I regret it. But I believed you could do shotgun on any size genome. And anyway, so he forced people to agree. And even after he did the human and mouse genomes, there were still people arguing that it was not feasible, or, you know, there was a PNAS paper written by Eric and, I think it was, Bob and John, where they kind of critiqued the whole process. And I said, why doesn't your critique address the mouse genome? Because there really wasn't much mouse mapping or scaffolding at the time they did that, and that kind of showed that you could do it. And they just didn't want to have any part of that. And at the time, the reason they asked, or the reason Eric asked me, was that at the time I was the only person who had both the Venter version and, I guess it was the Santa Cruz Golden Path, or Golden Gate, I can't remember, version in my lab. I mean, the two sides were not sharing sequences, but they were both sharing with me. And so Nature asked me to compare the two.
And so they wanted to know whether I felt that the one was a derivative of the other. And in my opinion it wasn't, because when I went and got the Santa Cruz genome, it was actually in shambles. And this is, I think, documented in our paper, but not very aggressively. So what we did was we went to NCBI, where they had quietly put up an FTP site, with no fanfare whatsoever, with an alternative assembly of the human genome. This is in 2001, when there was a Science paper and a bunch of Nature papers. And we looked at the two and we said, oh man, the NCBI one is so much higher quality. We didn't want the public one to look bad; we wanted to put the best foot forward, so we just kind of called it the public genome. And I don't know if you saw the movie Contact, but it's a Carl Sagan thing, and they build this machine that allows interdimensional travel, and some activists destroy it. But there's a second machine that's been built on some rainy island in Japan. Anyway, that's what this was like. This was like the second genome. And the reason Santa Cruz was so messed up, well, first of all, they rushed it with a small staff, basically one guy, who was a hero and was painted as one. But also there were some bits that were flipped that made the assembly hard, that were NCBI bits, but they had misinterpreted what they meant. Anyway, we compared the two and they looked pretty good. They were very similar, but clearly independent, in my opinion. So anyway, the critics didn't want to hear that, and I didn't really care. And I think the answer is, now everything's done by shotgun, de novo sequencing and resequencing. It's hard to say, because I think the main innovation there was the 3700 instrument. And it's possible that would have galvanized everybody to go forward anyway. It probably would have taken a little bit longer, but when you consider the goal was a polished genome, I think we might have gotten a polished genome faster, even if we'd gotten a draft slower.
So in other words, rather than a draft in 2001 and a polished one in 2004, we might have gotten a polished one in 2003, actually faster rather than slower. And the polished one got almost no attention whatsoever. And the other thing that might have happened is there wouldn't have been such a panic. I think, we're just kind of warming up the idea of technology development, if there hadn't been a panic, then there might have been fewer 3700s bought and more alternatives sought. And we might have tried to finish the genome rather than produce a draft. I think all of those were possibly unintended consequences, negative unintended consequences, of a race. And part of it, I think, was stimulated by a round of review where there were five centers: one in Cambridge, Massachusetts, one in St. Louis, one at Baylor in Texas, one at Collaborative Research, and one at TIGR. And they dropped the two that were not classical academic laboratories. TIGR was non-profit, but it clearly was part of a for-profit institution, and CRI was definitely for-profit. They dropped those two, and I thought keeping them could have resulted in more innovation, or at least a different way of thinking, and also that stimulated Craig to go off and do his own thing. So I think that decision of dropping it from five centers to three was a poor management decision, maybe with hindsight. But I think if there hadn't been a race, the one option would be: who's got the better genome, rather than who's got the first genome. And, you know, it's 16 years later, and we still haven't finished a single human genome, anywhere, ever. I hope to fix that soon, but nobody's particularly encouraging of that, right? So you have to kind of do it on your own. Yeah, I think we should remind ourselves that it's not done yet. And it's some of the most interesting parts of the genome, I think.
The centromeres could easily be involved in aneuploidies, in spontaneous abortions, low birth weight, cancer; all of that has to do with segregation of chromosomes, and that's what the centromeres are all about. So I think it can be done. In fact, I think the method that will get the centromeres and the other gaps will be the method that will displace all the other methods, just like next-gen did. I knew Francis from his work on jumping, chromosome jumping, and also from 65 Roses, and a number of things he had done before he came. I didn't expect that he was necessarily going to be a fantastic manager. You know, it just seemed like he was a regular postdoc, like I was. But I know Eric was quite in favor of it, and I think convinced Francis to try for the position. And it turned out, I think, he was really the leader that we needed. He was sufficiently into the science that he knew what was right and wrong and could help steer things, but wasn't so micromanaging that he would interfere with things. I think overall the NIH staff played a bigger role in those grants than almost any NIH staff in the history of NIH, would be my guess. I could be wrong, but in terms of extramural grants, they were often co-authors on papers. I mean, that's a pretty deep involvement. And Francis kept a very active research program of his own, which I think was great. Now, I don't know what it would have been like with somebody else. Clearly, I think Jim might have been the perfect person to get Congress to vote for it, because he just had more credentials than Francis. I mean, he was a Nobel Prize winner at a very young age, and head of Cold Spring Harbor and so forth; he had some credentials. If Francis had come in at the time Jim had come in, that probably wouldn't have been so good, even though there wouldn't have been the rocky exit. But I think Francis was the right one from that point onward.
So I've already mentioned, I think, some of the false starts: the mapping. Mapping turned out to be less interesting than it seemed at the time and consumed almost all of our resources for the first five years. So even though we ended up ahead of schedule, we could have been maybe a little bit more ahead. And then the race I consider a false start, because it discouraged us from going for quality and it discouraged us from technology development. As for next-gen sequencing, who was involved? Sydney Brenner developed the MPSS method, which used beads, and I think the ultimate answer was in flat surfaces, not beads, but he was clearly a pioneer, and around 1994 he published or patented MPSS and formed a company called Lynx. Lynx then licensed some of my technology, some of the multiplex tagging and strategy, and I consulted for them briefly, and then they merged with Solexa and Illumina. And what was interesting at the time is that when those three companies merged, they didn't use any of their technologies. So Illumina was basically a bead SNP company. Lynx was a bead sequencing-by-ligation company, or cleavage and ligation. And then Solexa was a single-molecule sequencing company in its original instantiation. They basically threw out all of those, maybe kept a little bit of the microscopy idea behind Lynx, and then in-licensed some chemistry and amplification methods and just started running with it. And there's still some dispute as to who invented the stuff; they had licensed it from somebody maybe other than the inventor. So Jingyue Ju was clearly a pioneer in all of the next-gen, in developing good reversible terminators and getting peer-reviewed articles, and I think we have to remember the peer review, not get so entangled with the patents, on reversible terminators for fluorescent sequencing, and later reversible terminators for nanopore sequencing, which I think is still in the works, but it is promising.
So the whole idea of sequencing by synthesis has been greatly helped by having those terminators. What else? For quality, most of the quality came in with haplotyping, oddly enough. Because in a way quality could have gone down with short reads, since most errors have to do with misplacements: if you try to put something that's mildly repetitive onto even a scaffold, it doesn't know where to go. But then haplotyping, and I think the first really good haplotyping was Complete Genomics in 2012, where they figured out how to break five to ten cells' worth of genomes up into fractional genomes in 384-well plates. And then each of those wells would be read, which was kind of a mixture of 100-kb fragments. And it was as if you had done BACs, which, by the way, were also an innovative thing. Early on in the genome project there was BAC sequencing, BAC cloning. YACs were a bit of a distraction, but BACs were really the real thing. We had an excellent BAC library from Pieter de Jong very early on, and for some reason it was sidelined, not used. I think it was because it was Pieter de Jong's genome, and so he was a known individual, but they could have gotten an IRB approval for that. Instead they tried to make a diverse set of BAC libraries, which more or less failed, and they ended up with a non-diverse single person anyway, just a de-identified single person, which I don't think is necessarily a plus. But anyway, the BACs were good. Not having to clone the BACs was even better, because with cloning there were some artifacts you would get, and some sequences that weren't easy to get into BACs. If you just fragment up the genome and put it into in vitro amplification, you end up retaining more of the genome. That was 2012. Complete Genomics had published a previous paper in 2009, which was really the first, I think, truly inexpensive genome.
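The haplotyping idea described above can be sketched as a toy: each long fragment comes from a single parental chromosome, so fragments that overlap at heterozygous sites let you chain alleles into a phased haplotype. This is a minimal invented illustration (40 sites, tiled 8-site fragments, heterozygous everywhere), not Complete Genomics' actual dilution pipeline; in the real protocol, spreading a few cells' DNA across 384 wells is what keeps overlapping fragments from different parents distinguishable:

```python
import random

random.seed(1)
n_sites = 40

# Two parental haplotypes, heterozygous at every site for simplicity
hapA = [random.randint(0, 1) for _ in range(n_sites)]
hapB = [1 - a for a in hapA]

# Long fragments: each comes from ONE parental chromosome and covers a
# window of adjacent het sites; windows tile the genome with overlap
window, step = 8, 4
fragments = [(s, random.choice([hapA, hapB])[s:s + window])
             for s in range(0, n_sites - window + 1, step)]

# Phase by chaining: anchor on the first fragment, then use any fragment
# that overlaps already-phased sites to decide its parent and extend
phased = [None] * n_sites
s0, a0 = fragments[0]
for i, a in enumerate(a0):
    phased[s0 + i] = a
changed = True
while changed:
    changed = False
    for start, alleles in fragments:
        sites = list(range(start, start + len(alleles)))
        known = [i for i in sites if phased[i] is not None]
        if not known or len(known) == len(sites):
            continue  # nothing to anchor on, or already fully phased
        # one overlapping phased site reveals this fragment's parent
        flip = phased[known[0]] != alleles[known[0] - start]
        for i in sites:
            if phased[i] is None:
                phased[i] = 1 - alleles[i - start] if flip else alleles[i - start]
        changed = True
```

The chained result reconstructs one parental haplotype end to end (which parent depends on the arbitrary anchor), which is how long fragments "get you high quality" even when each read is short.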
They had consumables, meaning reagents and supplies and equipment amortization, on the order of $1,500 per person back in 2009. It ranged up to $4,000, but it was in that range. It wasn't quite the $1,000 genome. Who else was... I think that's it for innovations and false starts, off the top of my head. I think there's relatively little logic to it, because there are relatively few labs that do it. In academic labs it's much more tempting to be an early adopter; you get labeled as a technologist if you're an early adopter. In industrial labs there's a great emphasis on incremental growth and on in-licensing things that are invented elsewhere. Typically, elsewhere will be some kind of hole-in-the-wall technology group that developed one particularly cool thing. It might be like Stan Tabor and Charles Richardson developing Sequenase; that made its way into the genome project for a while. Mathies and others developed capillary electrophoresis, and that got brought into ABI. There weren't labs that were really full-time doing technology development. They might be doing mostly analytical chemistry or mostly biochemistry, and they'd have an insight and then it would get in-licensed. There were a few labs that were full-time technology. Obviously, Ron Davis's lab, Lee Hood's, and mine were arguably three of those labs. It was hard to be academic or industry; you had to be at the interface a lot, which is hard. In terms of patterns within that, the general pattern is that most technologies get displaced, sometimes without a trace other than the history. A technology might last about a decade; a decade is a good length of time for a technology to last. Our multiplex sequencing lasted about 13 years. Sanger's radioactive sequencing lasted a similar period of time, though depending on how you count the fresh start with the ABI fluorescent sequencing, it lasted a bit longer, but it all depends on where you...
I would consider it almost a totally new method, because you're not taking the plates apart and slapping an X-ray film on them. There's just a whole lot of differences. You're running everything in one lane, rather than four lanes. The only thing they had in common, well, even the dideoxys weren't the same; they were dye terminators, and there were other innovations. It takes about five years to go from a concept, and maybe a preliminary paper, to an instrument. That's a rule of thumb. Then that whole technology will last about a decade, and another one will come in. Sometimes it'll build on top, like the ABI did; other times it'll completely displace it, like next-gen sequencing did. There's almost not a trace left of the electrophoretic era. Probably the next one might be nanopores. That doesn't quite fit the pattern perfectly, in the sense that it has taken a whole lot more than five years to get from a concept, sort of an '88 concept, to barely working in 2016. The first human genome was essentially sequenced with nanopores in 2016, the first bacterial one maybe a year before that. That's kind of the gold standard now: can you sequence a human genome at fairly high accuracy? That took from '88 to 2016 to arrive. I don't know how long it'll last, but I think it has the highest probability of displacing the current big iron, because on one chip you can already fit 8 million nanopore sequencers, and that chip is cheap and reusable. So you could imagine that could scale up to billions, the way Ion Torrent did. So anyway, those are the patterns that I can see. Well, part of it was single-molecule is harder than multi-molecule, so it was pretty noisy, and it really required a complete redo.
I mean, a redo of the way people thought about electrophysiology. So when I did the first patent on nanopores, I was thinking patch clamp, and patch clamp doesn't scale at all. Just like next-gen sequencing required a rethink from electrophoresis to flat on a glass slide, we needed to move from the patch clamp, this sort of artisanally crafted pore, to something where you could have millions of them on a flat surface. In all cases we were trying to ape, to mimic, Silicon Valley's microfabrication in electronics, where you have these super-flat layers and you use some combination of photolithography and self-assembly to put down the nanopores. Anyway, even with hindsight, that probably was going to take a long time. The other question is... It probably isn't that big a loss, because right now, even with all the computer technology we have, the great limiting point for the Genia nanopores is dealing with the data, because you're getting terabytes of data from a little chip in short periods of time. And if we had had those terabytes of data back in the '80s, it wouldn't even have been conceivable to deal with it, right? Now, at least, we can kind of trim it down. We can start building field-programmable gate arrays and GPUs right onto the chip and do a lot of the data processing on the chip, but a lot of it's... if you do it really cleverly, it's incompressible data. I mean, the short answer is I think the $1,000 genome program was distinctly better than everything else for technology development. And I was not a big fan of chips, even though I was one of the first adopters of chips for a few demonstration articles. I was never... to me, I was just playing, to see if there was something there. And the main thing that I got out of chips was stripping the DNA off the chip.
So I used them in a very perverse way: rather than having the oligos nicely ordered on the chip, I stripped them off and started building big pieces of DNA out of them, or using them for making libraries. And to this day, that's the only thing I use chips for: synthetic biology, not analytics, which was the point of most of the NHGRI efforts. I think NHGRI didn't get onto functional genomics early enough. I think the ENCODE project, once it got started, was terrific. In a way, it encouraged technology development, though again, not as aggressively as the $1,000 genome program. I think the CEGS grant program was the best of the best. It encouraged out-of-the-box thinking. It encouraged interdisciplinary, multi-team work, not just multiple collaborators or multiple people, but actually multiple teams putting together innovative things. So it encouraged innovation, which is something you don't see often enough. And as it happens, the CEGS grant happened to fund me for next-gen sequencing before the $1,000 genome grant program started. And so when it started, I didn't apply for the $1,000 genome grants, because I already had the grant and I didn't want to double-dip. But I thought the two of those were complementary. And if I had to choose, I would, at that point, pick the CEGS. I mean, I'd already picked the CEGS before I knew the $1,000 genome grant program was coming. But I still would have picked it, because I think it was more intellectually exciting. I mean, I went to almost all of the $1,000 genome sequencing-technology-development meetings. In fact, I was on the grant review committee; I was chairing the grant review committee for a few years. And I loved them, because they were full of physicists and engineers. But the CEGS was full of innovation and biology. So they were very complementary. Those were two real jewels in the crown, not just of NHGRI, but of all of NIH. The only things that come close to those two programs, in my opinion, are the Transformative Awards and the Pioneer Awards.
Because they allow you to do things that would not normally get funded in a particular institute. Most of the really cool things that people celebrate, or at least that I celebrate, are these interdisciplinary things that cut across all the NIH institutes. Well, like I said, they had it sitting on their plate with the Botstein-Davis grant. And that grant got rejected by peer review fair and square; it was just the wrong peer review. And then it got re-instantiated, but greatly reduced in budget. I think that was the first sign. And then, when I was on the grant reviews for some of the early genome project grants, we were given strict instructions: no biology. And I think that was partly to develop a distinct portfolio for NHGRI relative to the other NIH institutes. Later they started having joint grants; I thought that was brilliant. It allowed more biology to come in, and it also effectively created a bigger fund for NHGRI-related projects. But I think it was partly the zone of inhibition from the other institutes that prevented NHGRI from doing functional genomics very early. And there were always clever grantees who figured out how to sneak it in. It is certainly quite healthy now, and has been for many years, but it took a while. So, nanopores in terms of technology; in terms of biology and technology, I would say in situ sequencing and synthetic biology. I remember we were at a CEGS meeting at Stanford, one of the first CEGS meetings, and Francis was there, and there was a Q&A period where people in the audience asked Francis questions. I think Roger Brent asked the question of whether NHGRI would consider funding synthetic biology, and Francis gave a pretty quick answer, which was no. Then I raised my hand and I said, what if we had a program where you would make targeted mutations to see, for variants of unknown significance, what their physiological effects would be?
So that would be sort of synthetic biology, but it would be testing the hypotheses that are flowing out of the genome project. And he gave a quick answer, which was yes. So it's really about the way it was framed, and that was the basis, essentially, for our second CEGS. So the first one developed next-gen sequencing, starting in 2004, and the second one was on testing hypotheses using genome engineering, for which we were proposing zinc fingers; by the time we got the grant we were already doing TALENs, and then CRISPR. So in both cases we kind of exceeded what we had proposed. So I think synthetic biology is going to have more and more impact on testing hypotheses and moving genomics forward; as Feynman is often quoted, to really understand something, you have to be able to build it. And then in situ sequencing, I think, is another one, more on the analytic side, but working together with the synthetic side: as we build more complex synthetic systems, like organoids and organs, possibly even for transplant, for testing hypotheses or testing drugs, you'd like to check that they're the real thing, and you'd like to understand how the genome and epigenome play out in every tissue of the body. And there are still many things that are limited by the fact that we don't know what all the cells are making. We don't have a cell atlas; that's a rallying cry that's happening now. I think it has to be a really good cell atlas, ideally in situ: rather than approximating every cell as an isolated sphere not having any neighbors, in situ you get the non-random distribution of proteins and nucleic acids throughout the cell body, and what other cells surround it; the morphology is important. So I think those are going to be the two big things: synthetic biology, testing variants of unknown significance that will be making it into the clinic, and you can consider gene therapy as a branch of synthetic biology, or a sister of it; and then in situ sequencing, which could help those two along.
Well, I think the main barrier to PMI success is sharing of data, not the quality. But I think we do need to address the quality; I think democratization doesn't necessarily result in lower quality. For example, the quality of cell phones is much higher now than when only rich people could afford them. Now there are 7 billion cell phones and 7 billion people, roughly, not quite one-to-one. I think the raw data can be quite poor and the consensus can be quite good. For example, PacBio, even though it has the worst-quality raw data, has the best contigs. And so when we sequence a genome de novo, we use PacBio, because Illumina results in 300 contigs and PacBio will put it into one contig per chromosome right away, and the consensus is pretty good. Same thing for the nanopores: I think one of their advantages is going to be the long reads, and long reads get you high quality, just like haplotyping gets you high quality. Anyway, I think it's not quite democratized yet. When everybody's cell phone has a sequencer on it, then I'll consider it democratized, because then you'll be reading out your environment as you walk through life, and that will be reported out to the cloud. Again, sharing data is the key thing for precision medicine. I need to know, when I walk into this room, everything that's in the air, everything that's in my food: allergens, pathogens, non-pathogens, etc. And I think we're heading there. We're sort of at the thousand-dollar genome now, complete with interpretation and genetic counseling. It'll probably be a hundred dollars soon, because of companies like BGI and Illumina moving in that direction. They'll only move as quickly as competition forces them to move, because they are a monopoly. And then the nanopores will be pushing both of those out of the way, with potentially ten dollars or less.
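The point above, that poor raw data can still yield a good consensus, can be sketched numerically: with independent errors, a simple per-position majority vote across many noisy reads recovers the true sequence. This toy uses substitution-only errors so no alignment is needed (real long-read errors are dominated by indels, which is why actual consensus callers are more involved); the 15% error rate and read count are illustrative:

```python
import random
from collections import Counter

random.seed(2)
truth = "".join(random.choice("ACGT") for _ in range(200))

def noisy_read(seq, err=0.15):
    """Copy a sequence with ~15% random substitution errors
    (substitutions only, to keep the toy alignment-free)."""
    out = []
    for base in seq:
        if random.random() < err:
            out.append(random.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)

reads = [noisy_read(truth) for _ in range(30)]

# Per-position majority vote across the aligned reads: at each position
# the true base is expected ~85% of the time, each wrong base ~5%
consensus = "".join(
    Counter(r[i] for r in reads).most_common(1)[0][0]
    for i in range(len(truth))
)
```

Even though every individual read is far worse than a short-read platform's output, the 30-deep vote reconstructs the original sequence, which is the sense in which long, noisy reads still give the best contigs.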
Once it gets to a certain low level, it becomes democratized and it's free. Google Maps is free; a lot of Google services are free to the consumer, and some of them don't even require you to look at ads. I think we're going there very, very quickly. Even at a thousand dollars, you can imagine a lot of companies could make money by making it freely available.