Thanks very much, Bob. It's a pleasure to be here, so I'd like to thank Eric and his colleagues for including me among this group. And it's a pleasure to be introduced by Bob. We've had a number of interactions over the years, mainly focused on finishing genomes. In fact, Bob, I think it was you that actually compiled the list of the most difficult genomes to finish. And so we're very excited to now have in review, I guess, a manuscript describing the platypus genome, which really was, as you told us, the most difficult sequence. So as I said, it's a pleasure to be here. I want to talk a little bit about some of the next generation sequencing technologies, and you heard a bit of this from Richard, so I'll try to be complementary rather than redundant. I do want to say, excuse me, a little frog in the throat here this morning, I do want to say first congratulations to NISC. You know, genome centers are all about accomplishments and milestones and firsts, so I have to give you guys credit. I'm pretty sure you're the first genome center to have a tenth anniversary party at least, so nice going. But if you think about this, for those of you that really know Eric, this is not all that surprising because Eric loves to celebrate. So I've known Eric for a long time, since 1990, when I moved from Caltech to Washington University. And he ended up just down the hallway from me, and back then we had next generation sequencing, at least the first iteration of next generation sequencing, where we actually started getting rid of radioactivity and the autorads that we used to read manually from the gel. And so we had a lab where we had a couple of these boxes that are shown over there on the left. This is the old ABI 373, and Eric actually got one for his lab just down the hall before he moved up to Bethesda. So this was exciting back in those days, 36 lanes, woohoo. Of course, we now have additional next generation sequencing.
This is the next round, and shown on this slide are the three current platforms. We're all trying very hard to understand how they work, what the impact might be, and what sorts of experiments we can do that even just a few years ago we really could have only dreamed about doing. So this is the Solexa instrument, now manufactured by Illumina, on the lower left, the 454 instrument up in the center, and the new ABI SOLiD instrument down on the lower right. So I want to talk a little bit about what we're trying to do with some of these platforms. I want to go back a little bit because, you know, the promise of the human genome sequence was always that it was going to revolutionize biological research, going to revolutionize medical research and the way that we look at ourselves. And even just having conversations now about how many people in the auditorium would like to actually have their genome sequenced, it's pretty amazing to think back on this. But the reference sequence, version 1.0 of the genome, now allows us to ask amazing questions. What's come along with version 1.0 of the genome sequence is an amazing amount of technology, software tools and infrastructure, such as we have at all of the genome centers that are represented here today, to allow you to actually use this reference sequence to ask interesting questions. It's buttressed by ancillary genome sequences, the mouse, the chimp and others. And all of this really allows us to, as I said, do experiments that we previously only dreamed about doing. And now what we want to do is to start applying all of this infrastructure and resource to cancer and other diseases. And really now with these next generation sequencing tools coming along, it allows us to do this in new and exciting ways. So if you think back on how we approached cancer initially with sequencing, you heard Richard talk just a little bit about this.
We used PCR, and we've been using PCR-based sequencing for many years, but one of the things that we tried to do maybe five, six years ago was to understand how to do very large scale, high-throughput PCR-based sequencing. So we came up with all kinds of computer tools to pick primers and to keep track of things. But the most difficult thing was then going through and looking at all of the data and trying to find where we actually believed that we were seeing synonymous changes or non-synonymous changes and so forth. But this was the paradigm a few years ago. We want to approach a cancer, we have a list of candidate genes that we think might be involved in a particular cancer that we're interested in, and then we get a large collection of patient samples. Large a few years ago was about 50, maybe 100. And it actually worked in some cases. This is one of the poster children for this type of approach. A number of studies were done. We actually did one from our genome center in collaboration with Harold Varmus and colleagues at Memorial Sloan Kettering Cancer Center, where we looked at a number of kinase genes in non-small cell lung cancer patients. And in the epidermal growth factor receptor gene, voila, we found quite a few mutations all focused in the tyrosine kinase domain of this protein. And coupling this with some very nice phenotype information, one of the things that we found was that quite a few patients with non-small cell lung cancer, typically non-smokers, who were treated with tyrosine kinase inhibitor drugs such as Iressa and Tarceva had mutations in their EGFR genes more often than not. So this was very exciting and sort of seemed to be a promise of things to come from additional sequencing.
The work that we did there in lung cancer, coupled with the work that was going on at both the Baylor and Broad genome centers, led all three of us to sort of put our heads together on a little project that we call the tumor sequencing project, or TSP. And in this project, we expanded our candidate gene list to about a thousand genes. It actually turned out to be closer to 600 by the time we were finished, with about 200 very high quality lung adenocarcinoma samples. And just briefly, the TSP was organized late 2005 with the focus, as I said, was on lung adeno. We had a target list of about 600 genes, three sequencing centers working together. We enlisted several cancer centers to mainly help us collect and characterize lung adenocarcinoma tissue samples out of an initial set of about 800. We focused down on about 200 that we thought were quality enough for sequencing. Divided up the labor, each of the three centers did about 3,000 amplicons, roughly 300 genes, and we had a common set of roughly 100 genes that we could use as sort of a cross-center comparison. We're currently now, we have all the sequencing done and we're currently in the process of going through this data analysis. Some interesting things have come out of this already. This just shows you some mutations that we've seen. If you then stratify all of these samples and you look at smokers versus never smokers, you can find some differences. For example, you're more apt to find mutations in EGFR, GRB2, and GRB7 in never smokers, whereas KRAS and MET mutations are more common in smokers. Likewise, you can use the mutations that you find in some genes, notably TP53, IRB3, and AKT3, to sort of get some idea of tumor grade. So as you can see over here in grade one tumors, we didn't find any mutations in these three genes. In grade two, we start to see TP53 mutations popping up, and then in grade three, we start to see more TP53. The IRB3 are coming up as well as some mutations in AKT3. 
This is early data, but interesting trends nonetheless. The other thing that we found that was really exciting, and there's now a paper in press describing this work, is that early on we used Affy SNP arrays to sort of qualify the DNAs that we wanted to sequence, get some idea of what we actually had in hand, and make sure that these were of sufficient quantity and quality for sequencing. And we actually found several interesting amplifications that represented potentially new targets that we could add to our sequencing list. So we did see, through these amplifications, some that had oncogenes in the intervals, and this wasn't necessarily surprising, but there were quite a few. TITF1, for example, was exciting, and this is a lineage gene for the development of lung tissue. So the idea here was to start to plug together the two technologies, sort of a whole genome array-based approach with the sequencing focused on PCR. So this has sort of led to our second paradigm. Rather than simply focusing on hypothesis-driven gene lists, which are somewhat biased in terms of the expectations that you have of what's going on in the genome and in a particular type of cancer, we now are moving more to including data-driven targets that come from using such orthogonal technologies: array-based RNA profiling, CGH, and SNP genotyping. And this has been quite exciting. The next sort of large-scale collaborative phase of cancer genomics is underway now, known as the TCGA, and this is a project that involves the same three sequencing centers, but now quite a few CGCCs, or cancer genome characterization centers, and the idea is the same. The sequencing centers have gotten started off on hypothesis-driven gene target lists, and then, using data that comes from the various types of analyses that these centers are doing, we add additional targets that the sequencing power can be focused on.
Well, I want to go back and tell a little story because I want to take you into some of this next-generation sequencing technology. We started another cancer project in 2002 as a collaboration between the Genome Sequencing Center and our colleague at Washington University, Tim Ley, who was interested in acute myelogenous leukemia, which is a very nasty adult leukemia. And when we started, it was the same sort of paradigm. It was an attempt to use high-throughput PCR-based sequencing and focus on a hypothesis-driven list. A couple of features of this project. We used primary tumors rather than cell lines. This is our substrate for sequencing. In all cases, we would use matched normal tissue along with the tumor tissue so that we could get a quick idea as to what we were seeing in terms of germline versus somatic variations. We had a discovery set, a small set of samples of 96 matched tumor-normal pairs. And then any time we would find what looked to be a mutation in that discovery set, we'd then go in and re-sequence an additional validation set of 94 tumors. We had a gene list of about 450 genes. And then we also used sort of this orthogonal approach with CGH arrays, expression profiling, etc. to contribute additional targets to the list. And this worked quite well, and guess what? We found mutations in genes, and none of the genes in which we found mutations were all that surprising, considering that we were looking at leukemia. So we asked ourselves a lot of questions, and one of the things that certainly kept us up at night was, what are we missing? We're focused only on exons. We're mainly focused on exons for genes that we expect might have mutations. What about all the other genes? So in terms of starting to move away from hypothesis-driven gene sequencing, there are a lot of ways that can go. The Hopkins study that came out, Velculescu was the senior author.
This group looked at 13,000 genes using, again, a PCR-based approach in 11 colon and 11 breast cancer cell lines and found interesting mutations. But this is just a start. This is a relatively small number of samples. How do we scale this up? The problem with PCR-based resequencing is that it's relatively expensive. It's diploid at best, and in some tumors you're going to have many, many more copies of particular alleles. And it's low coverage. So how can we improve? And also, again, what are we missing outside of the exons? Well, we decided to try to take the next step with our AML project, and we have started on sequencing a whole genome using the Solexa technology. This is our case, referred to as 933124. This was a 57-year-old Caucasian female who presented with a de novo M1 AML. At her initial diagnosis, she presented with 100 percent myeloblasts in her bone marrow sample, and this is what we've used for our studies. The patient relapsed and died 11 months later. In doing quite a few different types of analyses on her genome, we found that she has completely normal cytogenetics, as best as one can tell. Using NimbleGen arrays, the 2.1 million array, we found one tiny amplification on chromosome 7, about 7 kb. There was no loss of heterozygosity detected on the Affy 500K SNP array. We did find through our PCR-based sequencing two mutations: an internal tandem duplication in the FLT3 gene and a point mutation in her NPM1 gene. We had informed consent for whole genome sequencing and eventually data release, and off we went. This just shows you the histology sample. This was 100 percent blasts on the slide. We had a really clean tumor here for sequencing. There really weren't any worries. This is a liquid tumor, so there weren't concerns about stromal contamination. As I said, she presented at 100 percent blasts. She also has, as it appears, a completely diploid genome.
One of the things that we always ask ourselves when we start sequencing a genome is what kind of coverage do we need? We had some idea of this for the old sort of ABI-based sequencing. You could say that when we got to 8x or 10x coverage, we had a pretty good idea of what the utility of that sequence might be. Well, using this new sequencing platform, where we're getting much shorter reads, we had some questions as to what coverage we would really need, and we spent a lot of time busying our statisticians and trying to come up with coverage models, theoretical coverage models, that maybe were right and maybe weren't. One of the measures that we thought we could come up with, perhaps, is this: we've collected all of these polymorphisms using various types of arrays. Could we simply use those as a way to measure coverage? So as we generate sequence, can we go and look for all of these SNPs that we found with arrays? And as we find 90, 95, closer to 100%, can that then give us some sort of metric of how close to finished we are with the sequencing? So here's just a quick update in terms of the numbers. As of last week, we had done 55 runs of the Solexa/Illumina instrument, collecting 32-base reads on each of these runs, about 44 billion base pairs, which calculates out to about 14.5x haploid coverage. We've detected 210,000 SNPs in this genome. 83% of these are present in dbSNP. And then here's our coverage metric. Out of the 481,000 SNPs that we saw on arrays, we've now identified about 183,000, or roughly 38%. So just in terms of going through this "how close are we to being finished" exercise: using this metric, 14x haploid coverage represents about 40% diploid coverage. Our theoretical calculations said that we were going to need 25 to 30x coverage with these short reads to reach a goal of about 99% of the sequence covered and variants detected.
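The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, not the coverage model the statisticians built: it assumes a round 3.0 Gb haploid human genome (which gives about 14.7x rather than the quoted 14.5x for "about 44 billion" bases), and the function names are hypothetical.

```python
import math

# Assumption: a round figure for the haploid human genome size.
HUMAN_HAPLOID_GENOME_BP = 3.0e9


def haploid_coverage(total_bases: float,
                     genome_bp: float = HUMAN_HAPLOID_GENOME_BP) -> float:
    """Average fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_bp


def array_snp_concordance(found_in_sequence: int, seen_on_arrays: int) -> float:
    """Fraction of array-detected SNPs that the sequence data has recovered.

    This is the "how close to finished" metric described in the talk.
    """
    if seen_on_arrays <= 0:
        raise ValueError("seen_on_arrays must be positive")
    return found_in_sequence / seen_on_arrays


def reads_for_coverage(target_coverage: float, read_length: int,
                       genome_bp: float = HUMAN_HAPLOID_GENOME_BP) -> int:
    """Number of reads of a given length needed to reach a target fold coverage."""
    return math.ceil(target_coverage * genome_bp / read_length)


if __name__ == "__main__":
    # Figures quoted in the talk: ~44 billion bases, 183,000 of 481,000 array SNPs.
    print(f"haploid coverage: {haploid_coverage(44e9):.1f}x")
    print(f"array SNP concordance: {array_snp_concordance(183_000, 481_000):.0%}")
    # Reads needed for the 25x to 30x target with 32-base reads.
    for target in (25, 30):
        print(f"{target}x needs {reads_for_coverage(target, 32):,} reads")
```

The concordance function is deliberately just a ratio: its value as a metric comes from the array SNPs being an independent measurement of the same genome, so the ratio rises toward 1 as both alleles at each site get sampled.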
So this looks like we're on the right slope here, and it also seems to converge nicely with what some of the other centers that are starting to use this technology are seeing with regard to coverage. We also used the new technology to do some cDNA sequencing. So cDNA sequencing is not a technique that's dead and needs to be put away. This has actually been quite useful. We've used a number of different cDNA library construction procedures and normalization schemes that all fit very well with the whole idea of putting little bits of DNA on solid support, as one needs to do for these new platforms. And these were sequenced on both 454 and Solexa. So I have our pipeline up here, and I think the key point is that as these reads from both genomic and cDNA libraries come through and are checked here, looking for SNPs and small indels and so forth, one of the key things that we do at some point, especially for non-synonymous and splice-site putative variants, is go back to the old PCR-based sequencing pipeline to try to validate them, as well as look at the same sequence variants in other AML patient samples. So what have we done so far? We've really focused as a top priority on sequence variants that appear to be non-synonymous, that are not in dbSNP, that are detected multiple times in the cDNA sequencing effort, and that are detected at least once in the tumor DNA. So again, this is ongoing. We don't have all the coverage that we'd like yet from this particular patient's genome, but we found 59 non-synonymous variants in 43 genes. Most of these are likely rare SNPs. The two somatic mutations that we had found using the PCR-based approach with this same genome were found again with the Solexa sequencing method: the internal tandem duplication in FLT3 and the NPM1 mutation. One additional somatic mutation was discovered in FLT3, a non-synonymous change here at amino acid position 194.
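The four prioritization criteria just listed can be written as a simple filter. This is only a sketch: the record fields, the function names, and the reading of "multiple times" as at least two cDNA reads are assumptions for illustration, and the example records are invented placeholders rather than data from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-variant record; field names are illustrative only."""
    gene: str
    non_synonymous: bool
    in_dbsnp: bool
    cdna_read_hits: int       # reads supporting the variant in cDNA sequence
    tumor_dna_read_hits: int  # reads supporting the variant in tumor genomic DNA


def is_top_priority(v: Variant, min_cdna_hits: int = 2,
                    min_tumor_hits: int = 1) -> bool:
    """Apply the four criteria from the talk.

    Assumption: "detected multiple times" is taken to mean >= 2 cDNA reads.
    """
    return (v.non_synonymous
            and not v.in_dbsnp
            and v.cdna_read_hits >= min_cdna_hits
            and v.tumor_dna_read_hits >= min_tumor_hits)


def prioritize(variants: Iterable[Variant]) -> List[Variant]:
    """Keep only variants worth sending on to PCR-based validation."""
    return [v for v in variants if is_top_priority(v)]


if __name__ == "__main__":
    # Placeholder records, not real calls from the patient genome.
    candidates = [
        Variant("GENE_A", True, False, 5, 2),   # passes all four criteria
        Variant("GENE_B", True, True, 9, 4),    # already in dbSNP
        Variant("GENE_C", False, False, 7, 3),  # synonymous
        Variant("GENE_D", True, False, 1, 1),   # seen only once in cDNA
        Variant("GENE_E", True, False, 3, 0),   # never seen in tumor DNA
    ]
    for v in prioritize(candidates):
        print(v.gene)
```

Anything that survives a filter like this is still only a candidate; as described above, confirmation comes from re-sequencing through the PCR-based pipeline and checking the matched normal.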
All of the other variants, and these are all coding, were localized to genes that had not been previously implicated in AML pathogenesis and hence were not on our original target list. And we have identified at least three other putative somatic mutations, and these are currently going through our PCR-based sequencing pipeline for confirmation. This may be pretty hard to see from the back, but what I'm showing you here is just what we can get out of the cDNA sequencing approach. So I'm showing you the one somatic mutation that was found in FLT3, and what you have here is an alignment of Solexa reads from the tumor genome, as well as from the cDNA sequencing effort. So we get a nice match of reads with the T's and reads with the C's, so no more comparison or figuring out how high a little peak is underneath another peak. You actually get a nice, almost digital readout of the frequency of these two alleles. And we see the same thing over here. This would suggest about 50-50 expression levels between the two alleles. And we can then extrapolate that to quite a few other genes. So what you're looking at here, for several genes listed here, are the frequencies of the expressed copies of the variant and the germline sequences. So for example, in this one here, PTPN11, we see about 200 to 1, or perhaps 0, of the variant allele as compared to the germline. Up here you get a little bit more of a mix, about 4 to 1. In some cases it's a little bit more of the 50-50. We've also discovered several novel splice variants using this. Genes are listed over here. For example, in RPA1, five new splice variants have been detected and characterized using this combination cDNA and genome sequencing approach, and quite a few others as well. So just to summarize a few points, next generation sequencing is here, at least this iteration of it.
And we can clearly see already that it will have a substantial impact on the study of the cancer genome, as well as for other human diseases. The coverage models for next generation whole genome sequencing are converging. We're starting to better understand exactly what we can do and how much work it takes. These ancillary or orthogonal genome-based technologies are really crucial for understanding the target genome before you actually start all this large genome sequencing. So the SNP arrays, I think, still have some value. And then this transcriptome-based approach using cDNAs, either as a standalone approach or in concert with a whole genome sequencing effort, represents a pretty powerful adjunct for cancer genome analysis. More is clearly needed. We're in the very early days of all of these technologies, and one of the things that I think you'll get the message of here today is that all of us are trying to figure out how to bring these to bear, what they can be used for, and what sorts of upfront strategies and tools and technologies we need to develop to make them even more powerful. So if my last slide will come up, which sometimes it does and sometimes it doesn't, I can see a list of some of my colleagues. The acknowledgments slide doesn't want to happen. If anybody can explain that bit of Macintoshology to me, that would be appreciated. So thanks. Jeff? So I'm trying to decide if your argument, to some extent, might be for or against the Velculescu model, meaning that ultimately, a year from now, three months from now, the whole genome association and the whole genome sequencing will both be of sufficient depth and sufficient quality. But if today you wanted to define mutations at a level of quality that, at least based on standard Sanger-based sequencing, you and others in this room have defined, the value proposition would still seem to be weighted in that direction.
And yet, so I almost, again, I'm trying to understand the dichotomy between the short-term value of implementing ABI-based sequencing and supportive cancer sequencing today across defined exons versus waiting for whole genome sequence. So I think it's, like I said, Jeff, I think it's still really early days on a lot of the sort of approaches that I described that we're trying to use on AML. I think there still is value in the PCR-based approach. I mean, clearly you find things that are of use in studying cancer. So I wouldn't shut down that pipeline quite yet. I think we can improve on it still, and then we can at some point transition nicely, I think, to some of these next-generation technologies. There's still a place for a hypothesis-driven sequencing, and I think Richard's example of the capture array and the 454 sequencing is a nice sort of next place to go, if you will, for that sort of targeted sequencing. And it's cheaper and allows us to do a lot more work. Karen. Hi, Greg. Nice talk. I was wondering how long you think that you and the rest of the community will be tinkering with this iteration of next-gen sequencing before the next iteration of next-gen sequencing comes along? So that's a great question, Karen. I think we have plenty to keep us busy, at least for, what do you guys figure, another 10 years. But we already see things that are right on the horizon, I think, in the next year or two, that will give all of these current platforms a bit of a run for their money. So that's exciting. And it's nice to have sort of a competitive situation now when we all went through the time where there was only one player and technology moved along as that player allowed. One more over here. This is actually something that occurred to me during... Uh-oh. You're... Yeah. It's supposed to be on. Not just shout. This is something that actually occurred to me during Claire's talk. 
I'm not sure if you're the best person to answer it, but it seems to me that with the variety of microbial genomes that exist in humans, and the collection of variants that each individual would have, and then again, the host genome of human... Excuse me. And the difficulty we've had making associations between variants in the human genome with disease, is there anybody who's looking at associations between the microbial, the particular microbial variants that people have, say, in their colon or in their lungs, with the variants in the human genome, and the possible impact that has on cancer? Yes. And Claire talked a little bit about this microbiome initiative, which is just underway. And I think it's initially a cataloging exercise, but the association, especially between health and disease, is right around the corner, I think. I don't know if Claire wants to add to that. I agree. She agreed. We have concurrence. Okay. Thank you, Rick.