Thank you very much, Mark, for the introduction, and thanks, Eric. It's really a huge pleasure to be here. Surviving is thriving in the genome business, and so I congratulate you on surviving for 10 years. We've had many collaborations with you too, Eric, and I want to thank all of your staff particularly for the warm and heartfelt and intellectual and productive interactions that we've had on these projects and others. I'm sorry if I misspelled Jim's name, but no, it's been great. You've just got a first-rate group to work with, and for that, I thank you deeply.

You are part of the genome community, and like us, I think you've been staring at these kinds of pictures for these 10 years and asking what secrets are locked within these data. Like us, you've responded to the mystery and the challenge of wanting to understand what's there by generating more data to drive such endeavors as comparative genomics and understanding how other species work in their own right, and that's been a shared endeavor with you and something we've all done ourselves. I'm not going to talk about that today. You have others speaking on the same issues, and I'm sure they'll offer some great insights on what's going on in comparative genomics.

I want to focus a little closer to home, on human genetics and what is happening now with our understanding of human disease mutations, and to dwell on those issues because I really believe we are on the cusp of another opportunity such as we haven't seen for many years. If we look back over the last two decades and ask what have been some of the most exciting technical developments, and if you believe that technology drives the knowledge we all feed from, so that the technical developments are important, then you remember, of course, the polymerase chain reaction coming along and transforming the way we do molecular biology. If you're a little older, like some of us are, you remember molecular cloning coming along and changing biology, and of course fluorescent DNA sequencing, and some of the new generation of sequencers have come along and given us a boost too. But there's another development now that we're all talking about, and I'll show you some recent data on it: how to enrich for particular DNA sequences in order to study them. That is really exciting. I think the application of those methods is going to be transformative in the very near future and bring us somewhat closer to that lofty goal we all think about, which is that one day you will walk into the doctor's office (maybe you won't even need such an invasive test as a blood draw) and that will lead, through sequencing, to things that really do make a difference to your health care.

Now, if we're going to get there, we've got to understand a lot more about our genetic variation and about how the genes that underlie disease work, and of course there are massive endeavors doing that right now. This is the most important curve we should think about: the curve that reflects the frequency of variation in the population, with individual allele frequencies along the bottom axis and the fraction of SNPs on the vertical axis, that is, what SNPs are present at what frequency in the population.
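To make the shape of that curve concrete, here is a minimal sketch, assuming the textbook neutral-theory spectrum in which the number of variants at derived-allele count i in a sample of n chromosomes is proportional to 1/i. The real human spectrum differs in detail, but the message, that most variants are rare, is the same.

```python
# Minimal sketch of the allele frequency spectrum: under the classical
# neutral expectation, the relative number of SNPs at derived-allele
# count i (out of n sampled chromosomes) is proportional to 1/i.
import numpy as np

n = 2000                            # sampled chromosomes (e.g. 1,000 diploid people)
i = np.arange(1, n)                 # possible derived-allele counts, 1 .. n-1
sfs = 1.0 / i                       # expected relative number of SNPs at each count
frac = np.cumsum(sfs) / sfs.sum()   # cumulative fraction of SNPs

for cutoff in (0.0005, 0.005, 0.05, 0.5):       # 0.05% up to 50% allele frequency
    k = max(int(cutoff * n), 1)
    print(f"allele frequency <= {cutoff:.2%}: {frac[k - 1]:.0%} of SNPs")
```

Under this assumption the majority of segregating variants sit well below 5% frequency, which is exactly the part of the curve the rest of this talk is about.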
We're all familiar with the HapMap project led by Francis, an international collaboration that has now genotyped millions of markers across reference populations in search of the common-disease, common-variant alleles, through SNP tagging and large genome-wide association studies, and of course we're celebrating those studies. Recently the genome-wide association studies from the Wellcome Trust gave us those beautiful papers showing multiple diseases with hits, and there are other studies, very closely on their heels, showing us regions of the genome that clearly contain alleles somewhere that are causative of, or at least confer risk for, genetic conditions. So we're all excited about those, but at this time there is a real paucity of actual causative alleles. If you want to look at the most fun story in the genome right now, go look at chromosome 9, where there is one region that seems to be associated with everything, I mean really associated with everything, not spuriously. Among other things, it's only got two genes that are known so far (unless someone here in this room knows any others, in which case we'd love to hear about them) and no causative alleles, but involvement in diabetes, heart disease, cancer. So we'd really like to find those alleles, be able to study them, and add them back into the equation that we're going to use for real medical health care from genetics.

Now if we drift to the other side of the curve, to the rare mutations, we can think about the Mendelian diseases, which classically we regard as private-mutation diseases, things that affect small families and small groups of people. And I'm sure you're all familiar with Victor McKusick's wonderful contribution to all of science and medicine in building the Mendelian Inheritance in Man catalog. There are probably about 2,000 diseases now for which the alleles are known, actually linking genotype to phenotype, out of the probably 8,000 that are ultimately tractable. Each one of these is a very valuable asset in this whole endeavor, because of course we then have a link between a change in the genome and a phenotype, engagement with that whole pathway, and the ability to study more and figure out what's going on.

Now most people react to this part of the story by saying, well, that's good, but those are only rare things, and if you ever want to study them you've got to go and find a family with them because they hardly ever occur at all. In fact, if you compile the frequency of occurrence of many of these disorders, you find they are indeed rare. But in this modern technical era, now that we know how to detect all these things, we'd better rethink what rare really means, because if you look in enough random people you can often find mutations in these disorders. There has historically been a real ascertainment bias, because we only study those families where the disease presents clinically, so our knowledge of the real frequency of these things is actually quite limited. That slide I just flashed was a plot showing that many of these things, although rare, do have multiple instances of the same mutation in different individuals. Of course, I should also mention that along with the rareness comes mutational heterogeneity, that is, those collections of mutations are all different, which is what makes them private.
Now my colleague and friend John Belmont has done a nice experiment where he cataloged many of these mutations from known causes of disease, built molecular probes using the MIP technology, now from Affymetrix, made custom probes to over a thousand individual mutations in about a hundred of those diseases, and tracked those in non-phenotypically ascertained populations, the HapMap samples. What was quite striking is that most of these have mutational examples that can be found out there in these unphenotyped populations, which tells you that if you'd studied those populations first and then come back to the disease, you would have at least a resource you could use in that endeavor.

Now, why am I telling you all this? It's because there is a great focus now, appropriately, on these rare mutations, perhaps not quite so rare as the Mendelian ones, but the ones in between those very, very private single-family mutations and what we know to be detectable by the HapMap-type technology. These mutations fall in a frequency range of about 0.05% all the way up to about 5%: mutations that contribute to disease but aren't detectable either by the Mendelian methods or by HapMap. We know these are physiologically important. We know this from many studies. Most recently, and in the most focused and elegant way, Helen Hobbs and her colleagues, Jonathan Cohen and others, have shown in some of the lipidemias that if you type your patient population according to biochemical criteria, categorize them according to the kind of distribution you see here, and look at individuals on either end of that distribution, then among candidate genes you can find individual loci with clusters of mutations that occur more frequently at one end of the distribution than at the other.

We've seen a similar kind of thing in idiopathic epilepsy. It's been a focus of our study at Baylor, where we've been looking in ion channels by DNA sequencing to find mutations, and similarly we find clusters of mutations in certain genes, though we haven't yet got the same ability, with our current phenotyping, to categorize the population and derive causes of epilepsy. But what we have found, and are able to say, is that there are many, many of these non-synonymous mutations, and they occur in interesting patterns, but we don't have a complete data set. We wish we knew more about the complete spectrum of mutation in the ion channels. One thing I can tell you is that if you look at the frequency of the mutations in the samples we have analyzed, and you extrapolate from the population size and the degree of completeness of the study, that guesstimate tells us there are probably more than 3,000 non-synonymous mutations in an individual.

Now, that logic and that history have led us and others to say, well, wouldn't it be nice if we didn't have to guess about these non-synonymous mutations; if, outside of phenotyped populations, we had a really robust catalog of all the potential coding mutations in a defined population. In recent discussions, refining the idea, it seems like a good idea to take, say, 1,000 to 2,000 people from just the right populations, find all those mutations, and use them in these studies and in the many other uses you can imagine for that kind of data set.
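To make the extreme-tails design I described a moment ago concrete, here is a minimal sketch in the spirit of those lipid studies, not their actual data or code: compare the number of carriers of rare non-synonymous mutations in a candidate gene between the two ends of the phenotype distribution. All the counts below are hypothetical.

```python
# Minimal sketch of an extreme-tails rare-variant comparison: sequence a
# candidate gene in individuals from the two tails of a biochemical
# distribution and test whether mutation carriers cluster in one tail.
# The carrier counts are hypothetical placeholders.
from scipy.stats import fisher_exact

low_tail_carriers, low_tail_n = 21, 256     # hypothetical: carriers in the low tail
high_tail_carriers, high_tail_n = 4, 256    # hypothetical: carriers in the high tail

table = [[low_tail_carriers, low_tail_n - low_tail_carriers],
         [high_tail_carriers, high_tail_n - high_tail_carriers]]
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio {odds_ratio:.1f}, p = {p_value:.2g}")
```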
And, of course, bearing in mind all the time that if we have mutations in genes, we have at least some chance of knowing what they do functionally, as opposed to mutations that happen out in the introns, which are such a struggle. Well, if we wanted to build such a catalog, how would we do it? Clearly the good old-fashioned ABI Sanger technology, which is very robust and effective, is too expensive to use. Right now it probably costs about $400,000 for each sample we tackle; multiply that by 2,000 and it's a big number. So we're looking to the new technologies to help us here. Of course, 454 and Solexa are the two that are most out there. Applied Biosystems has machines out there, but not too much data on the shop floor yet. I'm not going to talk about the Solexa, I know others will this morning, except to say that those methods have great potential and are really filling some niches very actively. We're finding that the Solexa data is very complementary to the 454 data, but we think it is not the appropriate technology to tackle this problem. We think the 454 technology is entirely appropriate for this problem, perhaps better suited to it than to many other things it could be used for. I want to take some pains to point out that although I've had a past association with 454 on their scientific advisory board, which I very much enjoyed, I'm no longer (oh, this is an old slide), I'm no longer a member of their board; it's been some months. So you can absolutely believe everything I say.

OK. Now, at the detail level, what is so good about it? Of course, the reads are longer. That's good. And there are many complaints about how you get accurate base calls, particularly at positions of homopolymeric runs, where you have to get a good estimate of the height of that peak to know how many bases in a row there are. But these are extremely robust chromatograms: for the same sequence, repetitively, you get the same pattern, and that's a real asset in this endeavor of resequencing. There are some other subtleties about the method too. Of course, you separate molecules, which is common to all the new technologies, so you don't have to guess positions of heterozygosity. There are a few other subtleties as well; we can come back to that.

Now, we spent about a year trying to use large-scale PCR, putting hundreds of individual PCR primer sets together, as a front-end feed into the 454 technology. At first we were pleased by some of the success, but we were frustrated by how practically difficult it was to get really good, balanced representation of the individual elements put into the mix. So we were extremely pleased to be able to work more recently with the NimbleGen group who, as I'm sure you know, make custom DNA chips through light-directed synthesis without hard masks, custom chips they can make overnight, very quickly. We worked with them to produce a series of arrays that could be used to capture part of the genome and enrich for it, away from the contaminating material. We were really, really impressed with the very first experiment. It's been shrunk down a bit here, but this is the capture region, and these are where the individual reads are mapping from a series of exons in the very first series of experiments. In this experiment we used 6,500 exons, about 660 genes, so it's about five megabases.
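As an aside, the arithmetic behind capture enrichment is simple, and here is a minimal sketch using the round numbers just quoted, about 5 megabases of target out of a roughly 3.1-gigabase genome. The read counts in the example call are hypothetical placeholders; plug in real mapping results to get the actual figure.

```python
# Back-of-envelope arithmetic for capture enrichment: the observed
# on-target read fraction divided by the fraction expected by chance
# from the target-to-genome size ratio.
GENOME_SIZE = 3.1e9   # haploid human genome, bases
TARGET_SIZE = 5.0e6   # captured region, bases

def fold_enrichment(on_target_reads: int, total_reads: int) -> float:
    """Observed on-target fraction over the chance expectation."""
    observed = on_target_reads / total_reads
    expected = TARGET_SIZE / GENOME_SIZE     # ~0.16% of reads by chance
    return observed / expected

max_fold = GENOME_SIZE / TARGET_SIZE         # perfect capture: every read on target
print(f"theoretical maximum: {max_fold:.0f}-fold")
print(f"e.g. 75% of reads on target: {fold_enrichment(750_000, 1_000_000):.0f}-fold")
```

The "theoretical maximum enrichment" mentioned next is just this ceiling: the genome-to-target size ratio, reached when every read maps to the captured region.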
We went on then to show, and actually the paper's just come out today in Nature Methods if you want to see it, the use of this across the region. The method works fabulously: we get near theoretical maximum enrichment. Now we've started on phase two of this study, to produce a whole-exome chip set. It's not yet possible to squeeze all the exons onto one chip, but colleagues at NimbleGen have built a series of seven microarrays containing one set of the human exome. We can talk about what the exome is, of course; there are multiple definitions, but it's got most of the genes on it, with slightly different design rules from the initial set. Those have now been put through one 454 run each, and we're building a series more to get more coverage. Right now we have about 900 million bases of data from these arrays, representing about 200,000 exons. There we are, 200,000 exons. The data seem to come off these machines with a high degree of efficiency, I think because you get rid of all the DNA repeats along the way, and the mapping is just as good as it was in the first experiment. So the bottom line is that we have representation of a version of the full exome.

Here are some of the regions, showing that the overall coverage is pretty good, although of course some regions are unrepresented. And here's a close-up of one region showing that, although the design dictates that most of the action should be right next to the exons, the targeted region, there is some spillover, I guess because some of the molecules are a bit longer or have little networks around them. So there are many issues to optimize here. Here's a nice close-up of a mutation found in a biologically important gene: an in-frame change, nicely shown as a heterozygous position right there. So it's all working really well, and this is what I mean by our being on the cusp of this new technology.

These are some more details. I'll rush through this slide because I haven't got a lot of time, but basically there are some coverage issues. We'd like to see this curve showing more even coverage, maybe about 10-fold coverage of every exon, and that's where the battle is. But for such an early stage of the development of this technology, it's looking really pretty good, and there are many, many avenues for improvement. I don't know how many of you remember the first few years of the polymerase chain reaction, spending those nights in the laboratory with non-thermostable enzymes and no picture on a gel because you couldn't see the enrichment. We're kind of at that stage.

Now, of course, even with that step, we have just a stop-gap measure until we can get complete genomes on a routine basis. And again with our colleagues at 454, with them paying the bill for all the data, we have a single whole-genome sequence. I'm not going to talk much about it; of course, the secret's out who it is. And I should point out that, given the debate over the last few years about de-identification, and the thorny question of what you would do if your data were put out on the web and what others would do to you if all your sequences were put out on the web, we took some pains to create this diagram. What you'll see over on this side is the Baylor Human Genome Center, and there's the individual, and there's the public data release.
So there is some distance between us and, of course, full knowledge on the part of the individual about what he's doing when he's putting his data out there. That data, in our latest analysis, is at about 7.4x coverage, and I won't give you many of the details. Here is one of the two things I'll show you: a read map showing how nicely the copy number variants come out just from read mapping. Some of these events are due to the nature of the underlying reference sequence, but others are genuine copy number variants, and we've actually recovered sequence junctions, the actual sequence of the copy number mutation itself.

Back to the genes and the exon issue, though: what has astounded us is what we've found so far. We think we've got nearly 9,000 non-synonymous SNPs, and some of this, of course, depends on your gene definition and which genes you include and don't include in the mix. There's another genome paper out there describing another single individual's sequence, with 6,500 non-synonymous SNPs. This is a lot of potentially functional variation to exist within each of us, and the point is, if you go back to that curve and ask how steep the curve on the left is, we really don't know, and this is the kind of data that will bring us to that answer.

And here's a chart of what you see. If you take those mutations that are in disease genes (disease genes because they're known to be mutated in the Mendelian catalog), go to your genetic counselor and say, I want you to help us with a genetic counseling problem, then give them a spreadsheet with 300 genes and say we've actually got a couple of thousand more, we really don't know what they are, but can you tell the patient, in this case Jim Watson, what these mutations mean to him? You immediately reveal how far short we are of the knowledge base we really need in order to offer meaningful counseling and information relevant to patients from this kind of deep and rich data.

So that's all I'm going to say on the science. To conclude, I really do think it's true that we're entering another era with this new technology, and we'll be able to produce quite copious amounts of these focused data in a way that we've all been dreaming of for quite some time. I want to thank colleagues at 454, particularly Michael Egholm, who's a very dynamic and very bright individual, and similarly Tom Albert at NimbleGen. I hope you all have some chance to interact with them. And all my colleagues, George Weinstock and Donna Muzny and others at Baylor College of Medicine, and John Belmont, of course, for the Mendelian experiment. And I believe I have a picture of the group here. Thank you, Eric.

Once again, I'll encourage questions. This is, here we go. Do I have to turn this on? No, I think it's on.

Thank you, Richard, that was a great talk. As you know, I worked with one of the individuals who had their genome sequenced, and it was a very touchy subject, what to tell the individual about what we were finding in the genome, and the individual wasn't Jim Watson. I was at a conference last week where George Church gave a talk on the personal genome project that he's leading, and he did a survey of the audience, asking how many people would be willing to have their genome read.
And I was shocked by how few people raised their hands, and I was wondering if you've gotten a similar response, or done a similar survey; maybe we could do one here today.

So, I didn't catch it: was the question for the audience at the other conference, would you like to have your genome sequenced and put on the web publicly?

No, George didn't phrase it that way. It was just: how many of you would like to have your individual genome sequenced?

Yeah, so I guess, okay, hands up. Who would like to have their genome sequenced? And who's willing to put it on the internet? Okay, we had about a one-third retention rate there. I think this is a much more enlightened audience than last week, probably. And I know that we have several studies underway right now really tackling the question of what patients' attitudes are to participation in these studies, given these new awarenesses.

Right. Richard, for the 2,000-person sequencing, are you going to collect phenotypes of the 2,000 people that you want to sequence?

No, the proposal I would favor is to build a robust catalog that tells us about the population frequency of these events. Now, what about the phenotyped samples? The catalog will help in two ways. One, it will provide this kind of universal control set, and in fact it will provide the knowledge base to build a set of cSNP probes that should be informative in at least a subset of those kinds of problems, where you can do low-cost genotyping across those phenotyped populations. And then I believe the technology will advance through this period, which is our tradition, right? And that will drive the application of these methods straight into the phenotyped sets. There's no question that ultimately you've got to look at phenotyped samples, but I think we've been a little blinded by that fact and extrapolated back to say there's no point in building these catalogs.

Okay, I think we'll move on.
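As a footnote to that last exchange, the sample-size logic behind a 1,000-to-2,000-person catalog can be made concrete with a minimal sketch: the chance of seeing at least one copy of a variant of population allele frequency f among N sequenced diploid individuals is 1 - (1 - f)^(2N).

```python
# Back-of-envelope sketch for the proposed 1,000-2,000 person catalog:
# probability of observing at least one copy of a variant with population
# allele frequency f when N diploid individuals (2N chromosomes) are
# sequenced. Frequencies below span the 0.05%-5% range discussed earlier.
def detection_probability(f: float, n_people: int) -> float:
    return 1.0 - (1.0 - f) ** (2 * n_people)

for n in (1000, 2000):
    for f in (0.0005, 0.005, 0.05):
        print(f"N = {n}, allele frequency {f:.2%}: "
              f"P(seen at least once) = {detection_probability(f, n):.3f}")
```

Even at the rare end of that range, 0.05%, a 2,000-person catalog would be expected to capture most such variants at least once, which is the sense in which it could serve as the universal control set described above.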