Well, good afternoon everyone. Thank you for coming. I'm Eric Green, Director of the National Human Genome Research Institute, and I welcome all of you to the last talk in this special series of lectures commemorating the 25th anniversary of the launch of the Human Genome Project. As I've pointed out at each of these talks, we lined up six presentations, all geared towards reminding us of, and reflecting on, what the genome project was all about, by hearing from people who participated in various ways in that very important endeavor. It's a great pleasure for me to introduce our final speaker in this series, who has been a great friend to me and to the Institute for many years. I can think of no one better than David Bentley to wrap up this six-part series, and I think he will conclude it in a very effective way, as you'll hear from somebody who truly was a participant in the Human Genome Project from its beginning. By way of background, David graduated with an MA degree in Natural Sciences from Cambridge and a Doctor of Philosophy from Oxford University. He was then a postdoctoral Beit Memorial Research Fellow at Oxford, studying the genetics of the class III HLA genes with Rod Porter. Then in 1988 he became a lecturer at London University, developing molecular tests for genetic diseases and initiating studies to map the human genome at Guy's Hospital. It was actually at that time that I got to know David. I was then a postdoctoral fellow. Just about the time the genome project was launching, or about to launch, we developed a transatlantic collaboration, and we have been lifelong friends ever since and have interacted in many ways as each of our careers has taken various twists and turns along the way.
Well, in 1993, after the Human Genome Project started, David and I were both quite involved in the genomics community, but David made the move at that point back to the Cambridge area to take up the position of head of human genetics, principal investigator, and founding member of the Board of Management of a newly formed entity called the Sanger Centre in England, which was directed by John Sulston. Shown here is a picture of David and John, who is now Sir John Sulston. It was taken at a very special venue, at a very special meeting. I think this was the first of a series of meetings in Bermuda associated with crafting the Human Genome Project's policy of very liberal data release, well in advance of publication. The famous Bermuda Principles for data release, which I think are one of the great outcomes of the Human Genome Project, came about at this exact meeting. Obviously it was a very nice setting. It was partway between the US and the UK, which is why Bermuda was picked, and David took advantage of the surroundings. He was having the ride of his life, and you can see him here enjoying what you could do in Bermuda when you weren't in a dimly lit room talking about hard issues like data release. I remember taking this picture and joking with David at the time that maybe I'd be able to use it someday, when he got famous, to embarrass him. So this is one of those times: I'm taking advantage of it to try to embarrass him now that he is famous.
Well, needless to say, David took a leading position in the Human Genome Project and related international consortia over the years, particularly as the genome project was winding up. Beyond the genome project, he took a special interest in human genomic variation: he was a heavy contributor to the SNP Consortium, to the International HapMap Project, and also to the 1000 Genomes Project, and was involved intellectually in every one of these at some level. Then in 2005, David left the Sanger to become Chief Scientist of what was then Solexa, which was later acquired by Illumina, and since 2007 he has been Vice President and Chief Scientist at Illumina. His long-term interest in human sequence variation, particularly human genomic variation, and its implications for human health and disease has created a path he has followed both on the academic side, while at the Sanger, and in more recent years on the commercial side at Illumina. His work at Illumina has focused on developing fast and accurate sequencing of human genomes for adoption in, and to the benefit of, healthcare, with a focus on rare genetic diseases, cancer, and increasingly all types of relevant healthcare applications. These efforts are exemplified by projects he has been involved in, such as the 100,000 Genomes Project, a partnership that includes Illumina, Genomics England, and the National Health Service in England. At a more local level, David has been a long-time friend of NHGRI and has served in various capacities. I was just reminiscing that David was brought onto the Board of Scientific Counselors of our Intramural Research Program in about the third or fourth year of that program's existence, and served on it.
In fact, I can see several of our intramural investigators here whom David saw at a very junior level and who have benefited from his wise counsel. We've also called on him in other ways. He has been involved in numerous planning meetings of the Institute, especially for planning beyond the Human Genome Project, including various major strategic meetings, one of which is shown here. I would bet that this is probably an early Airlie House meeting; yes, I'm getting a nod that this was indeed one of our important planning events that led to publication of one of our strategic plans. David's advice has always been generously given, and his expertise and input, thinking very broadly about what really needs to happen in genomics and, importantly, bringing the international perspective that we value greatly, have been of great benefit to NHGRI and its planning on behalf of the whole genomics community. So please welcome David Bentley to the podium to wrap up the series with a talk entitled The Genome Is For Life. David.
Thank you very much. Eric, you're too kind. Well, I'm sure you had even more embarrassing photographs. OK. So thank you all for the opportunity and for the really wonderful hospitality I've received here. It's been a while since I was last here, but this feels, if not like home, then something like a home from home. I hope to go back in time a little and give you some reflections on the perspectives I have drawn on throughout my career, much of which Eric has already outlined. So what I'd like to do first of all is summarize the course I've followed, from which I have learned so much, in four parts. The first goes back to the very earliest times, long before the Human Genome Project was truly articulated, when I became involved in finding mutations that cause disease, with the aim of having an impact on understanding and treating illness.
And then, of course, the second part of this talk will focus on the move into the human genome and sequencing it, and on some of the immediate follow-on projects, when we were all beginning to wonder what it was all about and what impact it would have. Thirdly, developing technology. This was the big transition: there was much to be gained from new technology development to bring genomes to the individual and individuals to the genome. And finally, more recently, the move into healthcare. I'll have a few slides, depending on time and interest, covering some of the recent work whose foundations were laid some 20-odd years ago, bringing us full circle to what we're trying to do: understand and treat illness. So those are the four sections. Let me start off with the early concepts, and with this iconic slide, which is not just an obvious one to bring as an iconic slide but one which has motivated me for some time. I spent a little time collecting these images as part of an educational program, and they strike a chord. On the top left is the human genome, and that is about the only time you ever see the human genome in all its glory, in all its individual chromosomes. The concept here, of course, is of gradually unraveling those chromosomes to look at the beads on the wire, which illustrate the nucleosomes, through to the threads. You can really see these wonderful individual threads, which are of course encapsulated by the double-helical structure. The first point to make about this is really about the properties of DNA. It's the thread, the linearity, the continuity of DNA in these structures that confer so much of the genetic properties and everything we've been studying.
And I think our course, certainly my course over the last 30 years or so, has been rooted in the properties of this material. In another very important sense, of course, it is the complete genetic code, which is the recipe for life, and so, in the first of several references, this really is the genome, which is for life. Let me take you back to my first year as a PhD student for a moment. I actually had the pleasure of working in Fred Sanger's division, at the time when they were just learning how to do lambda a second time. They'd got 90% of the way there with restriction fragments, isolating them, cloning them, and sequencing them. But the Fred Sanger group observed in their final publication that various methods were studied for sequencing specific regions, but all were much slower than the random approach. This really was the beginning of the concept: if you randomly sample and keep on doing the same thing, it is simple and straightforward, you can maintain quality and productivity, and you end up with the complete solution because of the continuity of the DNA. At the bottom right is a pilot study from Steve Anderson, who was in the lab at the time, who demonstrated that you don't just asymptotically approach completion; in his case you actually reached completion as you started to oversample. And of course, oversampling also provides duplication and replication of results. That again was a very impactful time. Moving on to Oxford, this was my first contig, or mine and my colleagues' I should say, as Eric alluded to. The top half of the slide illustrates the unknown, and Rod Porter insisted we put that figure in because it illustrated how much less we knew when we started the project than we knew at the end. We were able to really organize these genes and even to discover the evidence for two C4 genes and not one.
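Steve Anderson's observation, that randomly sampled reads do not merely creep towards completion but actually reach it once the target is oversampled, can be illustrated with a small simulation. This is a hypothetical sketch, not his procedure: it assumes error-free reads of a fixed length placed uniformly at random on a linear target, and compares the simulated covered fraction with the standard Poisson (Lander-Waterman) approximation 1 - e^(-c), where c is the mean depth of coverage. The function names are illustrative only.

```python
# Hypothetical illustration of random (shotgun) sampling; not Anderson's method.
import math
import random


def covered_fraction(genome_len: int, read_len: int, n_reads: int, seed: int = 0) -> float:
    """Fraction of bases covered by at least one randomly placed read.

    Read starts are drawn uniformly; a read hanging off either end is
    clipped, so every base has the same chance of being hit.
    """
    rng = random.Random(seed)
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_len + 1)
    for _ in range(n_reads):
        start = rng.randrange(-read_len + 1, genome_len)
        diff[max(0, start)] += 1
        diff[min(genome_len, start + read_len)] -= 1
    depth = covered = 0
    for i in range(genome_len):
        depth += diff[i]  # running sum gives the read depth at base i
        if depth > 0:
            covered += 1
    return covered / genome_len


def expected_covered_fraction(coverage: float) -> float:
    """Poisson approximation: expected covered fraction at mean depth c."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    genome_len, read_len = 10_000, 500
    for n_reads in (10, 20, 60, 100, 200):
        c = n_reads * read_len / genome_len
        sim = covered_fraction(genome_len, read_len, n_reads)
        print(f"{c:4.1f}x  simulated={sim:.4f}  expected={expected_covered_fraction(c):.4f}")
```

The expected fraction 1 - e^(-c) only approaches 1 asymptotically, but a finite target sampled at around 10x typically ends up with no gaps or very nearly none, which is the point of oversampling.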
And really, the way the field opened up in that transition from figure one to figure two was what it was all about. It was the first contig. It was physical mapping. It was the beginning of looking at the linear relationship of genes on DNA to actually decode the information. Switching gears for a moment, I have to acknowledge my wife here: this is her work, not mine. I met her during this work, and she was involved in bringing all the studies of genomics to bear on individual patients. So you can see we're going to come full circle with this at the end. Here was a patient suffering from hemophilia B, in theory a defect of the factor IX gene. She did an extensive six- to nine-month program of genomic cloning, screening, and subcloning. There was no PCR at that point, and eventually, by subcloning and sequencing the exons of the gene, she found one mutation. This really was needle-in-a-haystack par excellence. That mutation induced a change in the protein structure that was not obvious, and what was needed was a great deal of protein work to investigate what the mutation actually did. This, of course, is an interpretation challenge that is still with us today: how do you prove that the mutation you have found really has the effect you think it has? This showed a high-molecular-weight protein in this particular patient compared to the others, and I've highlighted over there the N-terminal protein sequence that Anne determined in the course of demonstrating that what she inferred from the DNA sequence was actually true at the protein level, which raised a whole set of functional implications of these changes. So these are very early examples of problems we still grapple with today. Staying with the burden of proof, there's another important example which I wanted to flag because it also illustrates technical challenges and the way we worked with them.
Among a series of Hunter syndrome patients in whom we identified mutations was one with a silent change. The interesting thing here is that the silent change created a cryptic donor splice site, which was predicted to remove part of the coding sequence. But again, it required proof, and as it happens we were working with small amounts of cDNA from blood. So the evidence was there before us: the altered size of the molecule, which had incorporated an abnormal piece of sequence and created something unusual. So the burden of proof was really there, and it is still with us today in terms of proving what we infer and establishing the functional importance of particular mutations. It has got a little easier, but this has been going on for a long time. Tackling the question of completeness, and its conundrums, for a moment: this was interesting because it began to move us from single mutations and single cases to trying to understand a complete spectrum of disease. This was Duchenne and Becker muscular dystrophy, its milder phenotype. One of the observations was that two-thirds of patients have deletions. What about the other third? Also, the severity does not correlate with the size of the deletion. What we found, by extensive searching first with non-sequencing techniques, which I'll come on to in a second, was that the point mutations were every bit as severe as big deletions because they resulted in premature termination. They also explained the complete spectrum and shed light on the functional importance of the C-terminus. Lose the C-terminus and you have Duchenne. Keep it, regardless of a gross deletion in the middle, and you have Becker muscular dystrophy. So that was a fascinating look at the spectrum of mutations that cause disease.
This last one steps back a couple of years, to factor IX again, to look at a non-sequencing method, because it simply was impossible to sequence for mutations, period. We were looking at seven patients with hemophilia B in this case, and we had a method based on the chemistry of Maxam and Gilbert to screen for mutations; in one experiment we identified mutations in four of the seven patients, between these two exons. The other important thing about this slide is that it was my calling card. This was the first time I presented at Cold Spring Harbor, and I was very surprised to be presenting this data. That's when I met Eric, and I also met Maynard Olson, who was fascinated; I realized he'd been interested in the same topic for some time. So I came out to St. Louis to meet Maynard and Eric and many of the group, and this was the beginning of a long and successful relationship, all because of this early study. So this really was the very seed, prior to the human genome. At the same time as I was working on this, we were also spending some time on pilot studies for chromosome mapping, combining the two interests, if you will. And so I'm going to transition rather rapidly now to the process that took us into the human genome, towards the end of my time at Guy's Hospital and the beginning of my time at Sanger. To put us back on track with some crisp facts, if you will, these are, to my mind at least, four key decisions that were taken in the planning and strategy for the Human Genome Project. The first was to sequence the whole genome and not just the cDNAs. There was a tremendous argument about whether to sequence the non-coding regions or just work on cDNAs, and clearly, for the very reason that you want to understand the unknown, you should sequence it.
This was an argument in which John held a view that others did not necessarily agree with. It was quite a debate, and it was rather important that the debate was won, as we'll see a little later. Second, not to wait for a new technology, and I'll flag why in a second. Third, to establish an international consortium in the public genome program. And fourth, as Eric has already mentioned, to create a formal public data release policy, and that's the Bermuda statement right there, which I think is very well known, so I won't go into detail. Its real essence is that the data are freely available in the public domain to encourage further benefit. This was a statement made by John Sulston at the time of writing the proposal to start the Sanger Centre. It was a very bold proposal. It won two million pounds per page, which I think is about the top density of grant writing we've ever had in the UK. It was bold; the proposal was bold, let alone the actual document. This illustrates some of the little features that capture the spirit of starting Sanger. The Sanger was formed in '92 to sequence the worm and the yeast, and to work out what to do about the human. This was not a done deal. The Wellcome Trust were funding the human genome at this point. They were funding the sequencing of the nematode and yeast, and giving us a chance to work out how to tackle the human while some of these questions were still being answered. Clearly my role was very much to bring human genome mapping, from my pilot experiments at Guy's, funded partly by the MRC and partly privately, to form the seed for the human genetics part of Sanger. I brought two chromosomes with me, X and 22, as part of that exercise.
We weren't necessarily going to stay at Hinxton, but interestingly, two old friends got together, and this was the beginning of the EBI at Hinxton and of cementing the Hinxton campus itself as the Wellcome Trust's genome campus. Michael Ashburner and John Sulston really brought the EBI, and its funding as the European centre, to the Hinxton campus, and that cemented it. Fred Sanger was actually asked whether he minded putting his name to the Institute. He thought about it and said yes, but with one proviso: it had better be good. I gather John Sulston said he wished he hadn't asked. Clearly, in the background of all this was the worm project, and I really want to acknowledge the strong UK-US collaboration, which in so many ways exemplified what we were to do with the human genome. John was a tremendous protagonist of this, and of course so was Bob Waterston at WashU, so the chance to meet Bob at the same time as Eric and Maynard was a great opportunity. Sometimes it got rather lively, and sometimes it became the subject of some satire, when the worm was at the heart of the genome project. I don't know if that was the English view or the international view, but certainly this was given to me: this picture was actually my leaving present from Sanger. Then a very important character, Michael Morgan. I've jumped forward now to 1998 to pick out perhaps the most important statement. In 1998 there was a big debate about the human genome, Celera was on the map, and the Wellcome Trust decided, in agreement with us, to fund one third of the human genome. That was really putting a stake in the ground on behalf of the international community to say this should be done, and interestingly there was a real teaser in it.
It's interesting because at the time, they were at the Wellcome Trust and I was actually over here, at Cold Spring Harbor, at a strategy meeting to which Craig Venter and Mike Hunkapiller had been invited, and there was a very interesting debate about what to do given this new challenge, this competitor if you will to the public-domain project. The twist in the tail was that the Wellcome Trust stated that, in order to ensure the whole genome would be in the public domain, it would if necessary fund a half, or even more. That was a big stake, because it suggested that the Wellcome Trust was prepared to have Sanger do the majority of the genome, and I think that was tremendous support for Francis and colleagues here at the NIH: yes, there is a real commitment to do this. Clearly similar discussions were taking place over here, but I think that was the moment at which the project really got cemented. It came in a very exciting fax that was received at the strategy meeting. Everybody was paying attention to the speaker, and Jim Watson came in; this thing had come to his private fax machine. It landed in the corner of the table, the first person's eyes went down to read the statement, then it got passed along, and more and more heads went down until suddenly nobody was looking at the speaker and everybody was absorbing the implications of this fax and what it meant for the future of the human genome. I forget who the speaker was. This is the Board of Management that was assembled to put all this into practice. We were mostly in post, though Mike Stratton, top right, joined us at about that time, and you can see the rest of the cast of characters there. Here's another way of looking at Hinxton; you can see something of its idyllic setting, and I just wanted to pause for a moment to entertain you with something else.
There was much entertainment and joyous celebration throughout our time at Hinxton, but this was a particularly poignant moment that I think summarizes the spirit of Hinxton, not only the geography and the architecture. This sketch was something I drew as a leaving present for our head of corporate services; it was carved onto a park bench. You can see the elements range from the serious aspects, the nematode worm and the mouse, which was being studied at that point, right through to the local hostelry, the Red Lion, where many of the most important decisions were taken, and various other aspects. Murray was particularly annoyed at the moles that kept digging holes in his grounds, so I had to capture that on his departure, to say the moles got the last word. On to progress following the cementing of the program. Chromosome 22 was a really important decision that had been taken some four or five years before. I'd been working on the X chromosome, other groups had been working on other chromosomes, and it occurred to me that moving to a less populated chromosome made sense for a lot of reasons, not least that it was smaller than all the ones everybody else was working on. That made attaining the all-important continuity of the DNA, to describe a complete chromosome or a complete genetic unit, much more realistic: it was only one third the size of the nematode worm genome, so it seemed a much more tractable problem. Ian Dunham, who in fact was working in the same lab as Eric at the time over in St. Louis, came back and started this program, and we made very rapid progress at Guy's and took the entire map up to Sanger. That's how we came to produce essentially the first human chromosome sequence, which certainly followed many of the concepts established for finishing the C. elegans sequence.
For this chromosome we documented the accuracy, the continuity, and also, very importantly, the annotation, using the sequence to describe the gene structures. We also found a rather large number of pseudogenes, which, to be honest, were going to complicate the picture of annotation forever. This was really important, painstaking work that Ian Dunham led, working closely with his own mapping and, of course, with Jane Rogers and John Sulston leading the sequencing and the overall charge. It was a collaboration with three other groups, in Japan and America, who had been contributing to the program as well, so the international consortium was already well established, even within the same chromosome. Very importantly, it was completed in December, and this was the catchphrase adopted by the Wellcome Trust: chromosome 22 is for Christmas. It came in time for Christmas, but we had to reflect on the fact that the human genome is for life; it's going to take a little longer, but it's going to be with us. So once again, "the genome is for life" was more or less coined from this Wellcome Trust piece of doggerel. This slide summarizes the program as a whole.
It is a hierarchical, or clone-by-clone, approach, and what you see here is layer after layer of maps. I was discussing with Chris Donahue this morning the idea that these were important steps, because they provided some level of reassurance about how we were doing and where we were going. Some of this was really very novel. For the genetic map, in this case I'm citing the Jean Weissenbach microsatellite map, which provided a consistent genome-wide genetic map; there were other genetic maps as well, particularly over here from Marshfield and others. Then came the ability to integrate markers, both polymorphic and non-polymorphic, using radiation hybrid mapping, and, by selecting gene markers, to build a map of genes covering all that was known at the time, not only the panoply but now the order of genes. So once again, my recurring theme: we're looking at order along an essentially continuous piece of string. Even though the DNA has been broken up in radiation hybrid panels, the continuity can be rebuilt by reassembling the patterns, so that you can see which gene or probe is next to which, on the basis that the further apart two markers are in the genome, the less similar their patterns, their fingerprints if you like, are. Hence pattern recognition: from patterns of inheritance in families for the genetic map, and from patterns in radiation hybrid panels, where similarity and difference are a surrogate for distance between markers. That framework map was then transferred to the clones, which themselves were being overlapped independently by fingerprinting, again a method from John Sulston and also from Bob Waterston and Maynard. I was very much involved in a different method of fingerprinting, but with the same concept in mind: rapid, repeated experiments that lead to continuity, a little touch of Steve Anderson's sequencing anecdote from the beginning.
So the clone map was assembled belt and braces: fingerprinting, which is pattern recognition again, to detect overlaps, anchored by all the probes from the earlier layers, through to the sequencing, which of course is much more familiar to all of us. Just to list the features for a moment, I'd particularly comment on the fact that, being clone-based, it relied on the accuracy of bacterial cloning and replication to provide unmutated template. It also provided a source of material not just for the initial sequencing: you could go back to any point in the genome and do localized problem solving and high-quality finishing, and that's how we really got to the quality we have today. That's why I emphasize that waiting for a new technology might have resulted in a very different product. It was because we adhered to these well-established methods that we knew what we were dealing with, and the hierarchical approach led to a framework that allowed for a very highly accurate and highly continuous genome.
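The radiation hybrid logic described above, that the further apart two markers are, the less similar their retention patterns across the panel, can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration: real RH mapping estimates breakage probabilities and uses likelihood-based ordering, whereas here the plain fraction of discordant hybrids stands in for distance, the two most discordant markers are taken as the ends of the map, and the five-marker, eight-hybrid panel is made-up data.

```python
# Hypothetical, simplified RH ordering; not a real mapping program.
from itertools import combinations


def discordance(a, b):
    """Fraction of hybrids in which exactly one of the two markers is retained."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def order_markers(panel):
    """Order markers from retention patterns alone.

    Takes the most discordant pair as the two ends of the map, then sorts
    every marker by its discordance from one of those ends.
    """
    names = sorted(panel)
    end, _other = max(
        combinations(names, 2),
        key=lambda pair: discordance(panel[pair[0]], panel[pair[1]]),
    )
    return sorted(names, key=lambda m: discordance(panel[end], panel[m]))


# Made-up panel: 1 = marker retained in that hybrid, 0 = lost.
# Markers are deliberately listed out of order.
panel = {
    "C": [0, 0, 1, 1, 0, 0, 1, 0],
    "E": [0, 0, 0, 0, 1, 1, 1, 0],
    "A": [1, 1, 1, 1, 0, 0, 1, 0],
    "D": [0, 0, 0, 1, 0, 1, 1, 0],
    "B": [0, 1, 1, 1, 0, 0, 1, 0],
}

print(order_markers(panel))  # ['A', 'B', 'C', 'D', 'E']
```

No positional information goes in; the order comes back purely from how similar the patterns are, which is the "similarity and difference as a surrogate for distance" idea in miniature.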
What about the initial impact? It is best summarized, perhaps, in the front pages of the journals. I'm not going to go into any of the content; I'll just summarize a few thoughts about the impact of the initial genome. This, of course, was the draft genome. There was a complete transition from finishing being the all-important thing to getting the sequence out there, and in many ways that was a very good thing. The urgency with which it became a race was in many ways uncomfortable at the time, and in theory it may have compromised quality by producing a draft, but the point was, first of all, that it accelerated the entire process: everything came together and coalesced within three or four years. Secondly, there was the commitment to continue afterwards and get the finishing done to the very high standard I mentioned earlier. So once again, the genome is for life. Suddenly this genome exists, it's in the public databases, and there's a question of reconciling versions of it, which got resolved later on I think, but nevertheless it is here, it is going to change things, and it's an important impact that of course we didn't fully understand. If you go back, as I did the other day, and read the draft genome paper, there are some very humble statements in there. It's acknowledged that we haven't finished, we don't yet understand much about what the genome means, and we can't yet describe all the results and all the genes. While things like chromosome 22 were pointing the way, there was a great deal we didn't know. We knew we were putting down a foundation; we didn't know quite what was going to be built on it. That is a very important and, I think, very appropriate tone for the paper to conclude on. What did others think?
Let's deal with the international genome for a moment. This summarizes three years of contributions, and the variety of those contributions. Clearly this is a slightly UK perspective, because it divides up all the US labs and makes Sanger look rather big; I make no apology for that, as it is my home country. But interestingly, the point was that there were large and small contributors, and while some felt squeezed by the process, there were small contributors and late entrants to the human genome who, because there was a transparent and coordinated process, were able to participate, and that was a really important element of this. Equally important, this provided jolly good artwork for a T-shirt, and of course the nationalism rather took an extra leap when these T-shirts were distributed to the entire staff of Sanger on the finishing of the human genome. What about far-reaching impact? Back to the overall summary, to which I've attached some thoughts about each of the key decisions. If we had sequenced the cDNAs and not the genome, we would not have any of this in blue: understanding non-coding RNA genes, DNA-protein interactions, regulation, disease-causing mutations outside the cDNAs, and higher-order structure. That's a huge amount of research since then, which really arose from the determination to do the entire genome and to invest in it. I've mentioned already the advantages of not waiting for a new technology: we captured the advantage of clone-based continuity and accuracy, and, as I'll show a bit later, this was the very foundation for new technologies to be successful. Third, the international consortium: one cannot underestimate the success of the model for sharing, consensus, standards, and discipline. It wasn't easy.
It was pretty tough to be part of these consortia, but it created a model for the HapMap and the SNP Consortium that followed, provided a level of social engagement and economy, and created the public data release policy. Seen through other pairs of eyes, there is this remarkable piece from back in '98 which I have to share. Matt Ridley, a journalist, really is something of a visionary: "Medicine need no longer treat the population; it must start to treat the individual." The seeds of personalized medicine were already here in this document. The far-reaching impact was taken up by the other national dailies, or indeed the same one: "What it is to be human" was, quite simply, asking what on earth we are looking at. Among the challenges of clinical interpretation is, perhaps, training the medical profession to understand genomes: "Sorry, you're not a genius: I was holding your genetic map upside down." The question of interpretation remains unresolved, as does the predictive power of genetic information. I really thought this one was going to be right: "Now we'll be able to live to 150 and still never see a Brit win Wimbledon." Thankfully, Andy Murray proved us wrong in 2013. So we have some milestones, and I'm going to give just one slide on each of the next three to provide continuity before I move on to new technology and the more recent implications of the journey from the human genome. These were the immediate projects that started from it: moving from a sequence of bases, to identifying points that vary across populations through the SNP map, to collapsing that information into blocks, haplotypes that made sense, and so condensing the genome again into more manageable proportions for genetic studies. And then, of course, associating a subset of those blocks with disease through GWAS.
All I'll say about the SNP Consortium is that it's been quite well documented. It was an international public-private partnership which took some time to set up, and then became essentially an international academic network for data generation and, very importantly, data coordination, with Lincoln Stein leading the coordination center at Cold Spring Harbor. One of the key things about this project was that the goal was set at 300,000 SNPs and the result was 1.42 million. That's one of those under-budget, over-achieving outcomes that Francis so often quotes and aspires to. How was it done? It started before the genome with the reduced representation strategy: we isolated populations of restriction fragments of the same size to sample the genome at discrete points, and then sequenced them to focus the depth of sequence in those areas. What you see is what we got: there are a few places where the reads pile up enough to identify SNPs, we recorded them, and that was it. That was the strategy for the goal of 300,000 SNPs. But then came a strategy which I'd like to credit Jim Mullikin for really driving. I'm sorry Jim's not here today, but I saw him last night. As the draft genome was emerging, we moved from a pre-genome to a post-genome strategy: we took all the data and aligned it to the genome, and of course that immediately pulled out all the other SNPs, including those in single reads that until then had been orphaned. In a bigger sense, this was the concept that John Sulston really advanced: that the human genome, by its continuity, will draw together all other biological information that has sequence attached to it, and will therefore give a much more complete picture. But to have it happen so rapidly and so quantitatively, a four-fold improvement for probably a few nights of analysis from Jim, realizing the value of the genome to make more of the result, was a phenomenal outcome for the SNP Consortium.
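The reduced representation strategy relies on a simple property: if every sample is cut with the same restriction enzyme and the same narrow size range is kept, every individual contributes reads from the same subset of loci, so coverage piles up where SNPs can be seen. A minimal sketch, assuming an EcoRI-style site (GAATTC, cut after the G), an arbitrary 10-20 bp window and a toy sequence; the SNP Consortium's actual enzymes and size ranges are not given here.

```python
import re

def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut `sequence` wherever `site` occurs, `cut_offset` bases into the site."""
    # A zero-width lookahead finds every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def reduced_representation(sequence: str, site: str = "GAATTC", cut_offset: int = 1,
                           min_len: int = 10, max_len: int = 20) -> list[str]:
    """Keep only fragments inside a narrow size window (a stand-in for gel size selection)."""
    return [f for f in digest(sequence, site, cut_offset) if min_len <= len(f) <= max_len]

# Toy sequence with three EcoRI-like sites; only one fragment falls in the 10-20 bp window.
toy = "ACGTGAATTC" + "T" * 10 + "GAATTC" + "A" * 30 + "GAATTCCC"
print(reduced_representation(toy))
```

Applying the same digest and window to every individual is what makes their reads land on the same fragments, which is why coverage was deep enough at those few places to call SNPs.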
Again, I'm not going to say much about the HapMap project except to comment on a very complex organization: some 50 groups were brought in just for phase one, to take the SNPs, or a subset of those from the SNP Consortium, and use them in multiple populations. Not just Europeans, but African, Chinese and Japanese groups as well, even in phase one. The project then grew into a much bigger study beyond the phase one that I was involved in, and it is published in a series of articles. It all led to this continued condensation of the genome into a simpler pattern of haplotype blocks. This is best illustrated, I think, in a commentary at the time by Svante Pääbo, which showed the population pool and individuals drawn from it carrying particular combinations of those haplotype blocks. We should consider the genome as a mosaic of these discrete segments, which really illustrate our entire history and current relationships. One part of HapMap was community engagement, and I saw Charles Rotimi. I'm delighted to see he's here, and I hope he remembers this wonderful time when I got involved, just peripherally, in the community engagement process with the Maasai. Another very interesting feature was that it was a delicate process to engage the community and obtain their permission to go out to the village. It was brokered by a local doctor, a clinician, Duncan Nagari. And then the children, of course, always steal the show. I don't think they had ever seen photographs, and they had certainly never seen photographs of themselves coming out of the back of a camera; this was the digital age arriving at a village deep in Kenya. And finally, one point about moving to the Wellcome Trust Case Control Consortium. This was a move from the HapMap project to applying it in case-control studies, and in particular it was one of the big transitions that I was very conscious of.
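Haplotype blocks are stretches where SNPs tend to be inherited together, which is measured by linkage disequilibrium. As a minimal sketch (not HapMap's actual block-definition method, which used more elaborate criteria), the standard pairwise r² can be computed from phased haplotypes and used to group adjacent SNPs greedily; the haplotypes and the 0.8 threshold are toy assumptions.

```python
def r_squared(haplotypes: list[tuple[int, ...]], i: int, j: int) -> float:
    """Pairwise LD r^2 between biallelic SNPs i and j from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n          # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haplotypes) / n          # frequency of allele 1 at SNP j
    p_ij = sum(h[i] * h[j] for h in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ij - p_i * p_j                             # disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom else 0.0

def greedy_blocks(haplotypes: list[tuple[int, ...]], threshold: float = 0.8) -> list[list[int]]:
    """Toy block finder: extend the current block while adjacent SNPs have r^2 >= threshold."""
    blocks = [[0]]
    for k in range(1, len(haplotypes[0])):
        if r_squared(haplotypes, k - 1, k) >= threshold:
            blocks[-1].append(k)
        else:
            blocks.append([k])
    return blocks

# SNPs 0 and 1 always travel together; SNP 2 is independent of both.
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(greedy_blocks(haps))  # [[0, 1], [2]]
```

Within a block, one "tag" SNP can stand in for the others, which is the sense in which the genome becomes a mosaic of a manageable number of segments.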
This was a transition from an international to a national consortium, and a remarkably interesting combination of clinical, genomic and statistical people. The other important thing was the concept of common controls. The traditional case-control study takes 2,000 cases and 2,000 matched controls, where the matching is important, and looks for associations that are particular to the cases and absent from the controls, or for allele frequencies that are higher in one group than in the other. The concept here was to use common controls, which go much further: in this case, 3,000 blood donors were included in the collection, and they served as controls for all the other studies. It proved to be a remarkably successful strategy. It also helped, I think, to fuel the concept of biobanking, of having collections of samples with long-term value, rather than simply collect, investigate, publish and finish. For me, this is still the most compelling figure today: the Manhattan plot. I don't know where it started, but this one was remarkable, showing seven disease studies in one figure, each summarizing a complete genome. Each is a whole-genome association study with the significant associations within it, and it really illustrated for me the outcome, if you will, of much of the work that had gone before. We were now layering common-disease association loci, which had been so difficult to find, onto the genome. I'm going to switch gears now, because one of the last very important things happened back in 2004. I was pressed by Chris Gunter, who was then the biology editor at Nature, to think about how genomes would be used further down the line, and this came out of it: how personal genome information will be used.
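The common-controls design can be sketched with the standard allelic association test: for each disease, compare allele counts in the cases against one shared control set, then plot -log10(p) along the genome to get a Manhattan plot. This is a minimal sketch under stated assumptions: the counts are invented, the test is a plain 1-df chi-square on allele counts, and it is not the WTCCC's actual analysis pipeline.

```python
import math

def allelic_test(case_alt: int, case_alleles: int, ctrl_alt: int, ctrl_alleles: int):
    """1-df chi-square test on a 2x2 table of allele counts (no continuity correction)."""
    table = [[case_alt, case_alleles - case_alt],
             [ctrl_alt, ctrl_alleles - ctrl_alt]]
    row_totals = [case_alleles, ctrl_alleles]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = case_alleles + ctrl_alleles
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical counts: one shared control set (3,000 people = 6,000 alleles) reused for each disease.
shared_controls = (1200, 6000)
cases = {"disease_A": (1500, 4000), "disease_B": (810, 4000)}  # 2,000 cases = 4,000 alleles each
for disease, (alt, total) in cases.items():
    chi2, p = allelic_test(alt, total, *shared_controls)
    # -log10(p) is the height this SNP would reach on a Manhattan plot.
    print(f"{disease}: chi2={chi2:.1f}, -log10(p)={-math.log10(p):.1f}")
```

Because the control counts are computed once and reused, each additional disease only needs its own cases, which is the economy the shared-controls design buys.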
At top left is the entire program we've talked about so far, and at bottom left is a distillation of the data to illustrate what is clinically relevant: the causal variants, what's known about them, and selected risk information, which could be brought together and layered onto a personal sequence. This was one of the great stimuli for the thinking going on in 2004 and 2005, now prompted by the question of a new technology, or some way of taking this very public, very detailed, very sophisticated body of research knowledge and actually applying it to people, to individuals. Getting right back to that first hemophilia B example I showed you: how could this become relevant to people? From then on, that really began to fuel a lot of my thinking. So I'd like to move on now to new technology and very quickly give you something of its flavor. Back in 1998, the draft of the human genome was not complete; it was at a rather early stage, probably a third of the genome or so. Two of us, Richard Durbin and I, were shown this picture of what were purportedly single molecules attached to a surface, which, we were told, might be useful for some form of polymerase-based assay. We had no idea whether they were single molecules or not, but the evidence is on the right: if you look at photobleaching, you see a stepwise process. When a molecule has one fluorescent dye and it bleaches, it's gone, and you actually see these spots blinking out. You can then see examples of monodispersed single molecules, and a few where two or three molecules were superimposed. This was the concept that started Solexa. We brainstormed a little about what could be done with this and weren't really sure: was there a need? And sure enough, of course there was.
Fast-forward about five years: this astonishing autoradiograph appeared on a desk the very day I was visiting Solexa as a consultant. It demonstrated that they had put together a chemistry with reversible terminators and an enzyme that would incorporate these terminators step by step, 20 bases, in an entirely base-specific way with very little residue: a very efficient chemistry on day one. This kind of experiment took about 14 hours to do. I don't know how many of you go this far back, but I had a PhD student who did PCR manually, before the thermostable polymerase, and had to do it from 10 at night to about seven in the morning. This was a similar kind of experiment. The person who ran it dumped the autoradiograph on the table and said, "I'm not doing that one again." She didn't have to. This was the essence of SBS, sequencing by synthesis. So here's a summary of the technology. At top right are those arrays, where the single molecules are now amplified in situ; they are still single-molecule arrays in the sense that each cluster is clonal. So here they are genuinely replacing cloning while still allowing amplification of the template. On the right is the reversible terminator chemistry, very innovative synthetic chemistry. The bottom left illustrates the concept of paired reads, which I'll show again in a second. The bottom right is the imaging and data collection: the optics inside the box, and the ability to take many thousands of photographs, nowadays millions, to capture the same cluster, about a micron across, as it changes color in accordance with the base sequence. Once again we are putting together the continuity of bases in the DNA, but now in very short fragments, in a stepwise manner, as illustrated in the earlier autoradiograph. This was the first box; actually, I think it was maybe the second or third.
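Reading the color change of one cluster, cycle after cycle, is conceptually a per-cycle decision among four dye channels. The sketch below assumes one dye per base in a hypothetical channel order and clean intensities, and simply picks the brightest channel; real instrument software also corrects for channel crosstalk and for molecules in a cluster falling out of step (phasing), which this deliberately omits.

```python
BASES = "ACGT"  # hypothetical channel order: one dye per base

def call_bases(cycles: list[tuple[float, float, float, float]]) -> tuple[str, list[float]]:
    """Call one base per cycle as the brightest channel, plus a crude purity score."""
    calls, purity = [], []
    for intensities in cycles:
        brightest = max(range(4), key=lambda ch: intensities[ch])
        total = sum(intensities)
        calls.append(BASES[brightest])
        # Share of the signal in the winning channel: near 1.0 for a clean, clonal cluster.
        purity.append(intensities[brightest] / total if total else 0.0)
    return "".join(calls), purity

# Hypothetical intensities for one cluster over three cycles.
seq, scores = call_bases([(900, 50, 30, 20), (40, 30, 800, 60), (10, 20, 30, 700)])
print(seq, [round(s, 2) for s in scores])
```

The clonal amplification matters here: because every molecule in a cluster carries the same sequence, one channel dominates each cycle and the purity score stays high.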
The only reason to show this is that it was developed in 2006, at a time when the NHGRI genome centers were going through their renewal. Clearly there was a lot of thought about new technology, and that was an important feather to have in your cap when you were going for renewal. This is one notable center, not mentioning names; it was the most northeastern of them in the US. We got on a phone call with the PI in question, who was prominent in the Human Genome Program, and we said, "Well, you know, it's not really ready. It's showing signs of signals working, but we're not sure it works reliably enough for you." And the response came down the phone: "Send it anyway. I don't care if it works, just send it." And remarkably, it did work; the timing was just perfect. The concept of short reads remained a mystery to many people. But this is where, again, the human genome had an enormous impact: once you have a known sequence, you can align very short reads. You can remove the ones that have ambiguous locations, and you end up with a high fraction of the genome built to a consensus. Add paired reads on top of that and you now have two chances to align a read: you don't have to remove the ones in repeats, and that increases the coverage. Something that looked good on paper proved itself over the next few months to be pretty much right. This was the basis on which we went for a human genome. I show this because there were a number of human genomes that were well known from the HapMap, and Francis, Eric and Jim Mullikin were all involved in choosing the optimal genome which, partly thanks to Charles Rotimi, I felt also had to be an African genome, because African genomes weren't really being focused on in the same way.
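The argument that a known reference lets you place very short reads, discard ambiguous ones, and use pairing to rescue reads that fall in repeats can be illustrated with a toy exact-match aligner. This is a minimal sketch, not the aligner used for the genome described here: it assumes error-free reads on the forward strand only, a k-mer index built at the read length, and a hypothetical fixed distance between mate start positions (`mate_offset`).

```python
from collections import defaultdict

def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to all positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)

def place_single(read: str, index: dict[str, list[int]]):
    """Exact-match placement: keep a unique hit, discard a repeated or missing one."""
    hits = index.get(read, [])
    return hits[0] if len(hits) == 1 else None

def place_pair(read1: str, read2: str, index: dict[str, list[int]],
               mate_offset: int, tolerance: int = 2):
    """Place both mates; the expected distance between them picks the right repeat copy."""
    hits1, hits2 = index.get(read1, []), index.get(read2, [])
    consistent = [(a, b) for a in hits1 for b in hits2
                  if abs((b - a) - mate_offset) <= tolerance]
    return consistent[0] if len(consistent) == 1 else None

# Toy reference in which "GGGGCCCC" occurs twice (positions 8 and 24).
reference = "ACGTTGCA" + "GGGGCCCC" + "TATATCGC" + "GGGGCCCC" + "AAATTTCG"
idx = build_index(reference, k=8)
print(place_single("GGGGCCCC", idx))                           # None: ambiguous on its own
print(place_pair("TATATCGC", "GGGGCCCC", idx, mate_offset=8))  # (16, 24)
```

The repeat read is unplaceable alone, but its uniquely placed mate pins it to one copy: the "two chances to align a read" described above.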
So we wanted to complement what else was going on, and we sequenced the genome of a Yoruba individual from Ibadan, Nigeria. This was the anatomy of the genome: deep consensus coverage (I won't go into the numbers), structural variants within this genome and, very importantly, novel pieces of genome assembled that were not in the reference. The blue are the aligned reads and the red are reads assembled across new regions that were not in the reference. This really illustrated the concept of getting into the genome. It was published in Nature alongside a tumor-normal pair from Washington University, from Tim Ley et al., with Elaine Mardis and Rick Wilson, and alongside the genome of an Asian individual, all in the same issue. If you recall, a few slides ago I talked about how Chris Gunter was pushing my thinking on personal genomes. To my astonishment, Nature picked it up for the front cover and provided this very provocative image, which I didn't fully get for a few minutes. The original draft sequence cover was safely contained: break glass when ready. But are we ready? Suddenly our life is in our hands once we let this genie out of the bottle and genomes become available to everyone. It was the dawn of the personal genome age, brought in front of us, quite literally in front of our noses, by Nature itself. Another paradigm I want to mention briefly came shortly after this, probably a year later. Marco Marra, who I think spoke to you recently, though I gather by video link, being a busy man, provided this.
He told us a story about cancer sequencing-informed therapy. This was a case of secondary metastases from a cancer. He sequenced the genome and transcriptome and showed exactly why the existing drug didn't work: the mutations were in the target. He identified a pathway that was disrupted in 17 different ways by what he had observed in the sequence data, and furthermore identified another drug which was already FDA approved and could be used immediately. This was tailored medicine, personalized medicine, exactly the illustration of the concept that Matt Ridley had written about, I think in the Daily Telegraph, some years before. It was a real eye-opener for me, and it was published somewhat later by Steve Jones et al. from Marco's group. So now we add two more milestones to the summary: personal genome sequencing has been demonstrated, and genome sequencing can improve therapy. This is where I want to move to the final piece of my presentation, to illustrate where things are going now: our own contributions, if you will, to what the future of healthcare might look like. To revisit this slide from 2004, I've added two important arrows that I probably should have added earlier. How do others benefit after this? The answer is that the outcomes from every clinical decision and every individual sequence should be fed back in some way to the community. So the main thing was to start the process of getting individuals supported by the genome. This is a fast genetic diagnosis for an individual with an undiagnosed condition: we can do a whole genome in four days and get the answer back by very quickly filtering an apparently inordinate number of potentially relevant variants down to six, or to one.
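Filtering an apparently inordinate number of variants down to a handful can be pictured as a cascade of simple filters. This is a minimal sketch with hypothetical thresholds, consequence labels and gene names; it is not the pipeline used for the cases described here, and real pipelines also weigh inheritance, call quality and prior clinical evidence.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    gene: str
    consequence: str      # predicted effect, e.g. "missense", "stop_gained", "synonymous"
    population_af: float  # allele frequency in a reference population

# Hypothetical set of consequences treated as potentially damaging.
DAMAGING = {"stop_gained", "frameshift", "splice_site", "missense"}

def filter_candidates(variants, max_af=0.001, candidate_genes=None):
    """Apply successive filters: rarity, predicted impact, then (optionally) phenotype genes."""
    kept = []
    for v in variants:
        if v.population_af > max_af:
            continue  # too common in the population to explain a rare disease
        if v.consequence not in DAMAGING:
            continue  # predicted to leave the protein unchanged
        if candidate_genes is not None and v.gene not in candidate_genes:
            continue  # gene not linked to the patient's phenotype
        kept.append(v)
    return kept

variants = [
    Variant("GENE_A", "synonymous", 0.0),
    Variant("GENE_B", "missense", 0.15),
    Variant("GENE_C", "stop_gained", 0.0002),
    Variant("GENE_D", "missense", 0.0001),
    Variant("GENE_E", "intron", 0.0),
]
print([v.gene for v in filter_candidates(variants)])                             # ['GENE_C', 'GENE_D']
print([v.gene for v in filter_candidates(variants, candidate_genes={"GENE_C"})])  # ['GENE_C']
```

Each filter is cheap and independent, so the order can be rearranged, and adding the phenotype-driven gene list is what typically takes a short list down to one.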
And indeed, once the genome had spoken and revealed the mutation in question, it was possible to confirm the diagnosis by working in tandem with the clinical observations. The genome and the clinical observations come together to give a convincing picture that this is a relevant mutation. One reason it is relevant is that you can look at all the previous examples: this gene was previously known to disrupt copper metabolism, which is extremely relevant to the diagnosis that was eventually made, Menkes disease. The same was true of cancer, and this is quite simply a rerun of Marco's case. This is in the UK, and the reason I'm showing it will become evident in the next slide, I think. This was a glioblastoma that was not responding. We sequenced the three biopsies and gave the answer back in 10 days. Just as before, a few CNVs, in this case, hit the same pathway and disrupted it in multiple ways, indicating a highly disrupted signaling pathway. Not only that, but once again there were no fewer than three drugs that could hit this pathway and were potentially relevant to the non-responder, who was still in the clinic recovering from surgery. So this is a homegrown example, and we're getting back to the UK for a moment, thanks to Marco's paradigm, to illustrate what was being talked about there. One other collaboration is important, and this is where we get to the momentum of setting something up. Is this really going to happen? Many of the thought leaders were involved in collaborations. This is a collaboration involving 500 genomes which, on Mendelian cases, achieved a 34% success rate in identifying mutations.
And if you include the parents, which increases the power to detect de novos, potentially dominant mutations, and compound hets in recessives, then over half the cases were successfully interpreted, again essentially with a few-day turnaround. Some of these took longer, because it was a research program which started early. Nevertheless, the concept of moving from the anecdote to the small cohort, and then asking whether this was viable for a larger-scale study, was clearly part of this collaboration. And those thought leaders, John Bell among them, were then present at a remarkable event. I thought the Olympics was about sport and athletics and things like that, but it turns out the Olympics is just as much about business summits and looking to the future. I gave my slides to the organizers of this business summit and they came back covered in Union Jacks, which once again reflects the continuing national spirit within the UK, and I was left with only two-thirds of the slide area to put data on. I said I would never show these slides again, but I do take some pleasure in sharing the joke. At least I hope it's a joke, because I certainly wouldn't have put those Union Jacks there myself. You can see some of the slides I already showed you turning up. Once again, the thought leaders were there in the audience, chattering and saying, "You know, we could really do this." And so it proved, to cut the story short. The inception of the 100,000 Genomes Project came in 2012 at the Olympics, with David Cameron, the prime minister: "If we get this right, we could transform how we diagnose and treat our most complex diseases, not only here, but across the world." So here again was a national organization, but one reaching out, I hope in a different way, to the concept of international relationships, with diseases being the same the world over.
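Including the parents helps because transmission can be checked directly: a de novo variant is absent from both parents, and a compound heterozygote carries one variant inherited from each parent in the same gene. A minimal sketch on hypothetical genotype calls (alt-allele counts 0/1/2), ignoring genotyping error, mosaicism and the X chromosome.

```python
from collections import defaultdict

# Each call: (variant_id, gene, child, mother, father), genotypes as alt-allele counts 0/1/2.
def de_novo_candidates(trio_calls):
    """Variants present in the child but absent from both parents."""
    return [vid for vid, _gene, child, mother, father in trio_calls
            if child >= 1 and mother == 0 and father == 0]

def compound_het_genes(trio_calls):
    """Genes where the child carries one het variant from each parent (one hit per haplotype)."""
    maternal, paternal = defaultdict(set), defaultdict(set)
    for vid, gene, child, mother, father in trio_calls:
        if child != 1:
            continue
        if mother >= 1 and father == 0:
            maternal[gene].add(vid)
        elif father >= 1 and mother == 0:
            paternal[gene].add(vid)
    return sorted(g for g in maternal if g in paternal)

calls = [
    ("v1", "GENE_A", 1, 0, 0),  # new in the child: de novo candidate
    ("v2", "GENE_B", 1, 1, 0),  # inherited from mother
    ("v3", "GENE_B", 1, 0, 1),  # inherited from father -> compound het in GENE_B
    ("v4", "GENE_C", 1, 1, 0),  # both GENE_C hits come from mother (same haplotype),
    ("v5", "GENE_C", 1, 1, 0),  # so they do not form a compound het
]
print(de_novo_candidates(calls), compound_het_genes(calls))  # ['v1'] ['GENE_B']
```

Without the parents, GENE_B and GENE_C would look identical (two hets each); the trio is what separates the two-hit recessive candidate from the single-haplotype one.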
My apologies: infectious diseases are, of course, a much bigger health burden in developing countries, but the cause of a particular disease can be the same across the world. Also, very importantly, David Cameron is not a geneticist; he's not a doctor. But it's amazing what individual chances happen: David Cameron's elder son actually died of a rare and undiagnosed genetic disease, so he knew from a parent's point of view what this was all about, and I think that helped hugely in short-cutting much of the communication. Two years later, we signed the deal, and of course the newspapers once again blew it up out of all proportion and said English patients were set for a 300 million pound genetic revolution. To this day I'm struggling to work out how they arrived at 300 million. I've left out a lot of the build-up to the whole process. But what I would like to say is that by this time we had launched a system that delivered roughly a thousand-dollar genome. You can argue about the indirect costs, and about the fact that it doesn't work that efficiently in many organizations because they can't get the sample supply. Nevertheless, that whole last phase of technology development only happened because of conversations like this that identified the need. It really is another partnership, a public-private partnership, not just to do the project but, going back two years, to decide whether to do the project and what it would need. The goals are summarized on the right and you can read them: to benefit patients, to create a transparent program, and to kickstart the collection of data on cancer and rare genetic disease in a secure manner. Just to illustrate the progress, some 13,000 samples have been delivered to date to a purpose-built facility. Coming back home, this laboratory is an Illumina lab, but it's in Hinxton, and so it benefits from the Genome Campus.
Indeed, this infrastructure was funded by the Wellcome Trust, and the various contributors to the program are acknowledged by their logos at the bottom. My last data slide. I wasn't supposed to say "data", was I? Anyway, a data slide: just some recently solved cases, coming back over here to the US. This illustrates perhaps an important element of whole-genome, and for that matter whole-exome, sequencing: the hypothesis-free search finds answers where you don't expect to find them. It also now finds causes of disease that involve complex mutations; CNVs are very much part of the story. And these are clues to how we can move beyond the 34% or 57% diagnostic success rates. What is it that stops us getting to 100% successful diagnosis? Among other things, such as heterogeneity of clinical observations, it is rooted in the genome once again: there are complex variants that are relevant. This is illustrated perhaps even more importantly in this last case, where investigators in my San Diego lab looked very hard and found a tempting mutation in the top haplotype, but at the time they couldn't find a second hit, because CNV detection wasn't really there yet. On closer examination, finding a CNV in the other haplotype revealed this to be a compound heterozygote, and a diagnosis that had been awaited for most of the 16 years of this child's life was resolved as a result of this test. So, summarizing where this is leading us at the moment, and this is something which I'm sure is on many slides: I think it applies to embracing the concept of precision medicine. It starts with the patient, and here is a timeline over the life of a patient, illustrating the clinical observations along the top. Unfortunately, this is a cancer patient who suffers a prolonged struggle with cancer before dying.
On the bottom is a series of measurements that can be taken throughout the life of the patient, many of which are already taken. These arrows in particular refer to opportunities to intervene and look at sequence information that may correlate with the clinical information. So it's not a static, one-off "here's a patient, get a genome". It's how we much more fully integrate sequence information, genomics, and ultimately the information from the genome into managing and better understanding this patient's condition, both reactively and proactively. There's some pre-symptomatic screening there in the blue arrows. Now we collect sequence information, and so once again, in perhaps a third or fourth sense, the genome really is for life. This genome is this person's genome. There are somatic mutations in this person's genome, and they arise throughout this person's life. The germline and somatic mutation profiles are telling us something; they really are providing beacons about this individual: their response to drugs, the possibility that they might acquire a particular condition, risk susceptibility at birth, and so on. And once again, if all this information is not left to die in a clinical archive, it becomes part of a knowledge base that supports both research and decisions, either about that patient or about patients in the future. That really is, I think, the twofold vision. At the end, I've come back to summarize the four goals. I hope I've covered something of this remarkable journey with you and shared some of the perspectives from it. They weren't all drawn up on day one, of course; they emerged over the course of the journey, but clearly there's been a remarkable thread and an ability to gradually develop a better view of what needs to be done.
I can't really do acknowledgments, of course, because there are so many people, including collaborators, many of them here and all over the world. What I would like to do is give you a little insight into my current research group, who are very active in some of the most recent research, at a recent scientific conference, and there they all are. Thank you very much.

Thank you, David. People, please come up to the microphones to ask questions. Let me start off by asking the first one. I found it rather ironic that you talked early on about the raging debate at the start of the genome project about the value of whole genome versus cDNA. In some ways that's now reverberating, with lots of current discussion around whole-genome versus whole-exome sequencing. Do you want to comment on that historic irony? Did we not learn our lesson, or why are people digging their heels in and pushing for whole-exome approaches only?

It's certainly a historic irony. I'm not sure we didn't learn the lesson; I think it's a different lesson, or a more complex one. It is one thing to characterize, once and for all, a reference that will support everything possible in the future, with a strong emphasis on discovery. It is another to decide what information you need in order to make progress in the medical care of an individual. So if you think the answer is in the exome, if you've previously done a hundred and you're pretty sure where you're going to look, then it already transforms diagnosis, from a few percent before exomes to 25% or 30% with exomes. That's a massive win for the patient, and indeed for translational research as well.
What I've always tended to do is look beyond that: well, that's all very well, but what about the other 70%? That's where we and others do a lot of comparisons of exome versus genome, and in fact a particular feature now is to take exome patients for whom no result was found and reflex them to a whole genome to see if that can teach us more. And two things happen. Or rather three things happen; I remember there are three. The first is that you can find variants you can't find with the exome, or not easily: you get better coverage of the exome with the genome, and more of the structural variants are easier to find. The second (I'm only going to get to two; I've forgotten the third) is quite simply that it is technically much easier to do a genome. There is no amplification in a PCR-free system, and think about it: the GC-rich regions over the promoters and elsewhere tend to get lost in amplification. Every other protocol, such as exome capture, relies on amplification, whereas PCR-free whole-genome sequencing does not. So these are simple points, in a sense. Of course you want to do the best for every patient, but it's costly, it takes time, and it's a lot of data, and that's where we're working to address those questions and learning, because ultimately, if I were sick, I'd like to have a genome done. At least if I were sick with something that wasn't understood, it would provide more information. And if that appetite takes off, going back to the public-private partnership I mentioned, the technology itself will evolve to suit what people want to do. So it's a very real and very live tension between exome and genome.

Yes, Les.

Yeah, two questions. Thank you for the wonderful lecture.
With respect to the unfulfilled promise, where the yield is 34% or whatever it is, how much do you think we have to gain from a potential future transition away from genomes aligned to a reference, to actual individual de novo assemblies?

I don't know is the answer. I think going to de novo assembly is going to introduce new technical challenges; there is no question about that. If you're looking at the utopian medical test of tomorrow, or let's say next year, we'll be de novo assembling genomes. But even if we are, we'll be operating with a whole new complexity of unknown weaknesses and faults in the system, and the important thing about systems is that you have to bed them down and learn. So in pragmatic or practical terms, it's going to be a while, and I don't think the technology exists yet. The other side of this coin is that it is premature to suggest we've got as far as we're going to with the 34%, or whatever it is, and that the next thing has to be de novo assembly, because we haven't. I tried to illustrate that with the CNV examples. We weren't detecting CNVs a few months ago. We're not detecting them well now, but they're already making a big impact on diagnosis, so that alone is one of the simple gains. We haven't really incorporated local de novo assembly, which I think is a much stronger and more immediate contribution. So there are other ways, and the goal is not how you get to a perfect genome, whatever that means, but how you maximize the utility for diagnosis and for patients. In that sense, I think we have a long way to go on our current trajectory. I'll give one more example, though I'm taking up time here, because it's very relevant to your question. A lot of people think: we take the short reads, we align them to the genome, we've used the pairings, we've got longer reads, and everything's great for most of the genome.
But bear in mind that the complex variants are the ones that have scrambled the genome most and disrupted the individual reads most, and they are the least likely to align to the reference genome because those sequences just don't exist in it. So we've spent the last year, and one person in particular, one of the ones in the kilt, spent almost his entire year, mining the unaligned reads, which are available in every BAM file for a whole genome. What you find there are signatures of complex structures and signatures of junction fragments. You also find, and I didn't show this, repeat expansions, which are completely diagnostic: for ALS, 97 to 98% sensitive and specific, and we can diagnose fragile X. But the information is in the unaligned reads. So to some extent this addresses the question of whether we can move away from the aligned genome. Well, we are, but by looking at data we haven't looked at yet, and perhaps people haven't thought about it very much. It's hard to analyze, but it's quite possible to pull in all sorts of information from perhaps the richest, most rearranged reads and actually make sense of them with existing technologies.

That was only the first question.

I would think so.

The second one's the harder one.

I don't like multi-part questions. I couldn't even remember my third point, so with two questions I don't have a hope.

So you graciously glided over some of the controversy in the genome project, when more than a few people said it wasn't even science at all and it was a gross waste of money, et cetera, et cetera. To an extent, it is a bit of a disruptive science, in that it somewhat upsets the apple cart of the very formal hypothesis-testing mode of science that we've been used to. Where do you think we are, in the big picture, with science and also medicine coming to accept the not quite hypothesis-free, but not totally hypothesis-driven, approach to science and diagnosis?
I wouldn't regard that as a big tension because, for a start, the genome doesn't dictate that you take a hypothesis-free approach. You sequence a person's genome, and we've reduced that to a really economical, quick process, and you can still test any hypothesis you like. You sequence a genome and ask, does it have Delta F508? That's a very hypothesis-driven question; it's just different data you're working with. The great thing is that an hour later, or even 10 minutes later, you can test another hypothesis. You can create 10, 20, 50 different hypotheses, just press a button, and answer each one. You don't have to order another single-gene test. You don't have to order a panel. You don't have to wish the next gene you want to ask about was on the panel. It's all in the data, and that is a really important element of having all the data. That was one of the underlying precepts of the Human Genome Project as well. So that's an important balance: we're not saying that there's no longer a hypothesis, that we have to adopt a hypothesis-free approach. The other thing you're touching on, and I'm not sure if you're alluding to this or not, to be honest, is the actual debate, going back to the Human Genome Project, about whether it's really science, since it's not rigorous hypothesis testing. This question could go many ways. But consider for a moment the argument that all they're doing is sequencing the genome, so it's somehow not rigorous information. One of the key things from that very first slide, of course, is that DNA sequence information is digital; it's locked in the structure of the helix. And when it is actually written down, it is unequivocally right, or you can assess the errors and really look at them. It's very hard to do that with many biological measurements. In that sense, proving the correctness of the sequence itself was actually rather easy from the point of view of sequence and consensus.
A lot of work went into accuracy, QC, and so on, and it's important to build that in again for personal sequencing; I appreciate that hugely. But it's not that it's unrigorous. The sequence does actually, in a sense, prove the identity of the base calls in the construction or assembly of the sequence. I don't quite know if you were going there or not, but I...

It was an opportunity for you.

Indeed it was. Thank you, Les.

Well, I think we'll stop there. Thank you, David, for doing a terrific job wrapping up this series. Thank you all for participating.

Thank you, Eric.