We'll start up again. Are the three of you going to stand the whole time? You look like you're ready to be shot over there. So we're doing these one at a time. The two people who aren't speaking don't have to stand. But we are now going to shift gears a bit. Panel three is on disruptive technology and the data deluge. And the first speaker, from our extramural division, is Jeff Schloss, talking about challenges of the changing genomic landscape and advances in sequencing technology. Thank you, Eric. So many of you have walked around in laboratories, and you know that laboratories have benches and lots of bottles and a few instruments scattered around. In contrast, the laboratories that sequenced the human genome for the Human Genome Project tend to look more like factories like this, where you'll see rows and rows of sequencing machines. And this is what the sequencing centers looked like when the Human Genome Project was being completed. During the time the genome project was done, the cost of sequencing came down quite considerably. This was a result of several things: technology improvements, automation, and economies of scale like that lab that you saw. We noticed that it was taking about 10 years to reduce cost by 100-fold. And so at the time of the Human Genome Project, as Eric or Francis said, it cost about $400 million or so to sequence the first human genome. If you had done it again at that last moment, it would have cost about $50 million; that's the estimate to sequence a human genome in about 2003. As you know, part of what we're talking about today is the subsequent technology developments that have brought those costs down considerably, which now seems like second nature. But during the planning process, and actually I guess no one's mentioned the planning process yet, so Eric will talk about the planning process that we're embarked on right now, but the last time we did that was in 2002, people asked whether there might be technological leaps that seemed so far off, this was 2002, as to be almost fictional, but which, if they could be achieved, would revolutionize biomedical research and clinical practice. And one of the several ideas that were listed there was: do you think it might be possible to develop technologies with which to sequence a human genome for $1,000 or less? So fast forward to today and we're not there yet, but we're actually getting reasonably close, and this slide just represents one of the reasons that we got there, and that's that the whole workflow for doing sequencing has changed dramatically. The process by which we sequenced human genomes involved not only the sequencing machines, but lots of robots, large machines, upstream of the sequencing machines, and handling every single sample in a separate test tube. In contrast, today we can make a library of a whole genome in one test tube and then apply that to the sequencing device. And second, in what we call a run, that is, running the machine once on a bunch of samples, you could only analyze 100 samples at a time when we were doing the human genome sequence the first time; now the machines analyze something on the order of 100 million samples per run. Now the runs are longer, they take about a week or two, whereas they used to take an hour or two, but even so the multiplier there is very dramatic. So the process is simplified. And so what does it look like?
This is what today's sequencing machines look like: there's a microscope in there essentially, and they see these little dots, and each dot is a feature that originated from a small fragment of the human genome, and what the machine is watching, as time progresses over minutes or hours, is the change in color of each of those features. And just to give you an idea of the scale here, 100 microns in the lower left is about the size, whoops, about the size of a human hair. So that's how many features you're looking at, and of course the slide extends vertically a considerable distance; that's the number of different sequencing reactions that you're looking at in the size of a human hair, right there. So the NHGRI launched a program to advance this technology development. I won't really talk about that program in this talk, but you have summaries of it in the materials that Larry sent you links to. And so some of the outcome of that is this series of commercial machines that have been released starting in 2005, one a year, and then many of these machines have now been upgraded and updated, and new machines keep coming out every year to produce these torrents of sequence data that you've been seeing. I'm not gonna go through this slide in detail; the point here is to say that there are quite a number of machines that are out there now doing sequencing, and they have considerably different features. That's good for any number of reasons: the qualities of data that you get from different machines are different, different machines are good for different experiments, and, perhaps most importantly, there's competition, which is really valuable for accelerating the pace of technology improvement and accelerating the pace of cost decrease. And another advantage of these machines over the machines that were used for sequencing the human genome is they can do a number of different kinds of experiments, and they do the experiments in different ways. Again, I don't have time to go through that in detail today, but a number of the kinds of experiments that used to be done on microarrays are today done on sequencing machines, and they give us more quantitative data than we could get on the microarrays. So these are very versatile machines. To give you an idea of the kind of technology advances, I've prepared these slides. If in 2003 you calculated what it took to sequence the human genome, you can run through the details here, but the bottom line is that with the technology we had that sequenced the human genome, it would take about three months with 100 machines to sequence a human genome. Do that calculation in 2007, that is, shortly after the first few next-generation machines came out, and it would take about the same time, three to four months, but with one machine instead of 100 machines. Late last year, the same calculation came down to less than one month with one machine, and projected by the end of this year, one week with one machine. But you saw all those little features; you can now pack so many of those individual genomic fragments on one machine that in one week on one machine you'll sequence two genomes. So the numbers are just stunning. So I had to comply with Francis' prediction that somebody was gonna show this slide again, so there it is. This is updated for another year, and the cost has come down roughly another order of magnitude. So where are we going? We're not done.
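To put those machine-time figures in one place, here is a small back-of-the-envelope sketch that restates each snapshot as machine-months per genome. The inputs are only the numbers just quoted from the slides; nothing else is assumed.

```python
# A back-of-the-envelope restatement of the machine-time figures quoted above,
# expressed as "machine-months per genome" so the trajectory is easy to compare.
# The inputs are simply the numbers from the talk; nothing else is assumed.

snapshots = [
    # (label, machines used, months per calculation, genomes produced)
    ("2003 (capillary sequencing)", 100, 3.0, 1),
    ("2007 (early next-gen)",         1, 3.5, 1),   # "three to four months"
    ("2010 (late last year)",         1, 1.0, 1),   # "less than one month"
    ("2011 (projected)",              1, 0.25, 2),  # "one week ... two genomes"
]

for label, machines, months, genomes in snapshots:
    machine_months_per_genome = machines * months / genomes
    print(f"{label:32s} ~{machine_months_per_genome:7.3f} machine-months per genome")

# The ratio from the first to the last snapshot:
first = 100 * 3.0 / 1
last = 1 * 0.25 / 2
print(f"Overall improvement: roughly {first / last:,.0f}-fold in about eight years")
```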
The technologies that we have in place now are producing vast quantities of reasonably high quality data, but there are a number of challenges to using them, several of which you'll hear about in the next couple of talks. But we're not done with technology development; there are new machines and new approaches on the horizon. Some of them eliminate all the microscopy and eliminate many of the custom reagents. So this is a company that announced the availability of its machines earlier this year, and instead of using fluorescence, which most of the other machines are using, this simply detects a chemical change, a change in pH, the ion concentration. So the early chips have one and a half million pH meters on the little chip; the chip is this big, it's a little computer chip. And it will be a purely electronic readout. Other approaches: there's an approach that uses what some people call free-running polymerase that's being developed by two different companies. Both of them use nanotechnology as a key feature in enabling the visualization of the sequencing process. And again, because of time, I can't go through how the technology works. The bottom line is that this technology can not only read off the sequence data, as the trace on the bottom shows, but very importantly, considering what you just heard in this last session about methylation, it can directly read from genomic DNA molecules without doing all the chemical conversions that Andy talked about, which they have to do today and which are quite expensive. It can directly read methylated bases from genomic sequence. So this is a big advantage if this technology ends up working extremely cost-effectively and at very high accuracy. The other thing that these free-running polymerase technologies will probably offer is very long read lengths. The technology that was used for sequencing the human genome the first time gave approximately 800-base read lengths. Most of the technologies in use today give somewhere between 100 and 300 or 400 base read lengths. And this is a big deal when you're looking for some of these other kinds of variations in the genome, such as structural variation and copy number variation. They're very hard to get with short reads, but if you can get long reads, you can put the genomes together much more easily. It should reduce some aspects of the bioinformatics burden that you're about to hear about. And then finally, quite a number of people are working on nanopore sequencing technology. The idea here is that you'll put a DNA molecule through a small hole that's the diameter of the DNA molecule, and if it works, if you flow ions through the channel while the DNA is going through the channel, the different bases in the DNA will disrupt the ion flow differentially. So you'll be able to read A versus G versus C versus T with a purely electronic signal. And again, advantages here; let me just show you this and I'll talk about advantages. And once again, people have shown that you can not only read off individual A, C, G, and T, but also methylated bases using this technology. So once again, no conversions. You could do these measurements directly on genomic DNA isolated from cells, and potentially on very long molecules. So there's a long list here of advantages if this technology works: long reads, working directly from genomic DNA. It's non-destructive, so you could actually go back and read the sample over and over again.
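As an aside on the read-length point above, the following toy sketch (purely illustrative, not how any real assembler works, with made-up sequences) shows why reads shorter than a repeated element cannot distinguish two structural arrangements of a genome, while reads longer than the repeat can.

```python
# A toy illustration (not any production assembler) of why read length matters.
# Two hypothetical genome arrangements differ only in the order of two unique
# segments that sit between copies of the same repeat. Reads shorter than the
# repeat cannot tell the two arrangements apart; reads that span a full repeat
# copy plus flanking sequence can.

repeat = "ACGTTGCAACGGATCCTAGGCATGCAAGCT"              # 30 bp repeated element
a, b, c, d = "A" * 20, "C" * 20, "G" * 20, "T" * 20    # unique 20 bp flanking blocks

genome_1 = a + repeat + b + repeat + c + repeat + d
genome_2 = a + repeat + c + repeat + b + repeat + d    # b and c swapped

def reads(genome, read_length):
    """All substrings of the given length -- an idealized, error-free read set."""
    return {genome[i:i + read_length] for i in range(len(genome) - read_length + 1)}

for read_length in (25, 31, 40):
    identical = reads(genome_1, read_length) == reads(genome_2, read_length)
    print(f"read length {read_length:2d}: read sets identical? {identical}")

# Expected output: True, True, False -- short reads cannot distinguish the two
# structural arrangements, while 40 bp reads (longer than the repeat) can.
```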
Nanopore sequencing would also simplify a number of kinds of experiments, for example, microbiome sequencing. If this works for genomic DNA, it should also work for RNA, so we could directly read RNA molecules and their modifications, methylations and so forth. It would be a fully electronic method, which means that you should be able to develop small handheld devices, so they should be deployable in other than large sequencing centers or large clinical labs. This is of tremendous interest not only to the genome sequencing community, but to several other communities like Homeland Security, for example, for monitoring emerging infectious diseases or potentially bioengineered disease agents. And so I'll wrap up with this last slide to say, if this nanopore approach works, how long would it take to sequence genomes? With even a small array, that is, an array of just 1,000 nanopores, and if you can do this you should be able to make arrays of many more than 1,000 nanopores, it would take much less than a day, and with larger arrays perhaps just a couple of hours, to sequence a genome. So that's what I want to say. Okay. Our second speaker, is this next one down? Yeah. It is Vivian Bonazzi, who's going to talk about the bioinformatics challenge, data deluge and analysis. So my talk is a nice segue from Jeff's, because I want to talk to you a little bit about the data volumes. I think Jeff gave you some numbers, but I want to give you some context here. Firstly, the old sequencing methods, the ABI 3730s, which are really not that old, generate about 30 megabytes per run, and I'm just using averages here. But when you look at the new data volumes, and Jeff also had a slide that covered these numbers, depending on how you do these experiments, you can see just from these numbers that the volume of data is 10-fold, 100-fold, or even greater. The other issue is that as these newer technologies, the third-generation technologies, are coming out, I'm not entirely sure what their data volumes are going to be, but we know they're in the terabyte ranges. That creates a huge problem on the other end for informatics. So let me give you an example here. Most of what we've talked about, and I assume we talked about it this morning, describes base pairs, but we in the informatics piece deal with the bytes, and it's not a one-to-one ratio here. The ratio is one to 20, and these numbers change depending on what you actually store, and the numbers are also changing based on the technology we have. But I'd like to point out that a base is not a byte. So you have to think about what it is you want to store, and if it is a one-to-20 or greater ratio, the problem is that when you store it, you've got a lot more to store. So what you want to think about here is what is the cost to store this data. This has all sorts of implications for data centers and how you store it and what you store. So for the computational challenges, remember informatics is a combination of biology and computing, so you've got to mesh those two fields, and clearly with all these data volumes, you're going to need an infrastructure which is able to handle this type of data. In the past, as I showed you before, when you've just got stuff coming out of a tap, most of us biologists who had computing training, like myself, can usually write programs, but as you start generating a lot more data, you really do need to bring in folks with IT backgrounds and software engineers.
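To illustrate the base-versus-byte point above, here is a rough sketch using the roughly 20-bytes-per-base ratio quoted in the talk; the genome size and 30x coverage are illustrative assumptions, not figures from the talk.

```python
# A rough storage estimate built on the ~20-bytes-per-base figure quoted above.
# The genome size and 30x coverage below are illustrative assumptions; the
# point is just that bases multiply into bytes very quickly.

GENOME_SIZE_BP = 3.2e9   # approximate haploid human genome (assumed)
COVERAGE       = 30      # assumed sequencing depth for a resequencing project
BYTES_PER_BASE = 20      # ~20 bytes per base as quoted; depends on what is stored

bases_generated = GENOME_SIZE_BP * COVERAGE
bytes_stored    = bases_generated * BYTES_PER_BASE

print(f"Bases generated : {bases_generated:.2e}")
print(f"Bytes to store  : {bytes_stored:.2e}  (~{bytes_stored / 1e12:.1f} TB)")
# ~96 billion bases -> roughly 1.9e12 bytes, i.e. about 2 TB per genome,
# before any derived analysis files are added on top.
```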
So just to summarize some of the computational challenges on the infrastructure side to support this kind of work: the first is data storage. As I've just mentioned, with that volume of data, it's going to impact the storage. Secondly, when you've got that much data and you want to do something with it, it's going to impact the type of analysis you want, so CPUs are going to have an impact here as well. You're not going to be able to run this on Excel on your desktop. You can try, but it'll die. So the next thing here is that we need to develop new hardware for this kind of work, and also software, and I'll explain a little bit about those, but we need different kinds of architecture to support this kind of data analysis and storage. Of course, when you talk about volumes of data, and again, I think Jeff was talking about volumes of data, and I know you've spoken about this in a previous talk, a lot of these next-generation sequencing machines are going to be placed in labs that generally didn't do this in the past. So you've got the large sequencing centers from the first picture of Jeff's slides, but you've got to realize that smaller labs are going to be doing this as well. So you've now got the large, ginormous sequencing centers, and then you've got these smaller groups who are now going to have one or two of the Illumina or whatever machines in place. And I think one of the numbers that Jeff said, and he can correct me later, is that about 70% of the Illumina machines, I think you were saying, go into the smaller labs, is that right? Think about that. That's now going to impact the amount of data. Now, when you want to do something with that data and you want to, say, move it to NCBI, EBI, any of the large genome repositories, you've got to move that data around, which means it's going to impact the way that you can transfer data. And if you have a lot of data, most of you who've used the internet have realized at times it can be very slow, especially when you try to upload stuff through your ISP. But at the same time, I'm sure you've had, as I've had very recently, the example where Verizon comes in and actually digs up my yard to put in fiber. Why? Because they're actually increasing the pipes, but it takes time to do that. So we have to think about what the impact of this is on this kind of work. Last, but not least, is data security. Obviously, as we move data around, particularly when we're dealing with human subjects data, how do we protect that? And we have to be really strong about making sure that, as we generate this kind of data, we're very careful about not releasing it without certain policies in place. So, here's our challenge. We've got all this data, and we're assuming we've looked at some of the infrastructure pieces. In the past, as I mentioned in the top slide here, you can drink from a water tap, but if you try doing it from a fire hose, as this gentleman has tried here in the bottom of this picture, you're not gonna have a lot of success. So, if we look at the analysis challenges, assuming that we're working on our infrastructure challenges, we have to look at developing new tools. Because remember, we're gonna have not just more data, but the ability to compare complete genomes against each other is now something we're gonna be able to do. There are new types of analyses that we can do, and we need tools to be able to do that.
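As a rough illustration of the data-movement problem described above, the sketch below estimates how long it would take to transfer one deeply sequenced genome at a few assumed network speeds; all the bandwidth figures are hypothetical.

```python
# A back-of-the-envelope look at the data-movement point above: how long it
# takes to push a run's worth of data to a repository at different (assumed)
# sustained network speeds. All bandwidth figures are illustrative.

DATA_TO_MOVE_BYTES = 2e12   # ~2 TB, roughly one deeply sequenced genome (see the sketch above)

for label, megabits_per_second in [("home broadband upload", 5),
                                   ("typical campus link", 100),
                                   ("dedicated 1 Gb/s research link", 1000)]:
    bytes_per_second = megabits_per_second * 1e6 / 8
    hours = DATA_TO_MOVE_BYTES / bytes_per_second / 3600
    print(f"{label:32s} ~{hours:8.1f} hours  ({hours / 24:5.1f} days)")

# At a few megabits per second the transfer takes weeks, which is why
# shipping disks, or keeping the compute next to the data, is often
# the more practical option.
```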
We also need to be able to refactor old tools that were important for doing analysis, for example, common tools in informatics like assemblers for genomes, and also gene-finding tools. Those kinds of things need to be refactored, because we still need to run them on data from the next-gen sequencing technologies, but they don't necessarily run on the systems, in terms of the infrastructure, or they don't work because the sequences are very short; as we've heard a lot, those reads are very short. We also need to optimize a lot of these tools to work on those new computing architecture platforms. The other thing is, I think we need to have improved visualization methods, because as we generate all this data, if we see a lot of numbers on a screen, computers are good at that, but we visually want to see things in graphs and some sort of visual display. And we also have to think, and I'll bring this point home again too, that as we generate all this data, there are all those folks in the smaller labs that don't have high-end informaticians. They need to be able to crunch this data and be able to see it efficiently. So we've got to realize that we're no longer catering to those geeks, among whom I include myself, in terms of who can actually use this. So, I just made the same point again: robust tools, making those tools robust. Often when we write these tools, they're scripts or some sort of program that we write, and that doesn't transport well to different systems. So we need to make them robust, particularly so that non-informatics specialists can actually use them in the laboratory. Data integration is also really important. We have all this volume of data, we want to analyze it and find interesting biological themes in this kind of information, but we need to think about integrating it with other data types. For example, we've talked a lot about genomic data, but clearly functional information from the genome is extremely important, so integrating it with proteomics data will be really key, to look at function alongside. Included in that, obviously, is the fact that a lot of image data is coming out; gene expression data is a good example of this. And as we add this, first of all, image data is quite large, so you have to think about the compression issue. So there's the compute aspect, but you also need to think about the fact that that kind of data is very valuable to you in terms of visualizing what you can see from genomes. So how do you integrate those two pieces? Another piece here, and I won't go into a lot of detail, is the metadata. When you're generating sequence information and you generate base pairs, often it's related to some sort of experiment that you're actually doing. So you want to be able to capture the information about that particular experiment: for example, the run, the run length, anything that you did for your PCR, any of the experimental information, and you also want to link it to potentially clinical information. For example, there's the microbiome project that I'm working on, but TCGA also falls into this. You take sequence base pairs, but it's all related to a human, and you want to link that information back. So you've got your sequence information and you want to take it back to the actual patient, and you need to be able to think about what that data is. That brings up issues of data volume, how you store it, the fact that you've got clinical information that you need to protect for security, and that you link these two things: data integration.
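As a sketch of the metadata-linkage idea described above, the following minimal data structure ties reads to the run that produced them and to a de-identified subject identifier; the field names are hypothetical and do not correspond to any existing repository schema.

```python
# A minimal sketch of the metadata-linkage idea above: every batch of reads is
# tied to the run that produced it and, where relevant, to a de-identified
# subject, so results can be traced back without exposing clinical identifiers.
# All field names here are hypothetical, not an existing repository schema.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class SequencingRun:
    run_id: str                  # instrument run identifier
    instrument: str              # which sequencing platform produced it
    read_length_bp: int
    library_prep_notes: str      # PCR conditions, kit lot, and so on

@dataclass
class SampleRecord:
    sample_id: str               # lab accession, never the patient's name
    subject_id: Optional[str]    # de-identified link to clinical data, if any
    runs: List[SequencingRun] = field(default_factory=list)
    read_files: List[str] = field(default_factory=list)  # paths to the raw data

# Usage: the mapping from subject_id to the actual patient lives in a separate,
# access-controlled system, so the sequence store alone never holds protected
# identifiers.
sample = SampleRecord(sample_id="HMP-0042", subject_id="SUBJ-9137")
sample.runs.append(SequencingRun("RUN-2010-07-15-A", "short-read platform", 100,
                                 "standard library prep, 10 PCR cycles"))
print(sample)
```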
So you need to think about these pretty carefully. Last but not least, as we generate this data, when you have a lot of data, you need to think about standards. If you don't, we'll end up having the Tower of Babel, and we've certainly had examples of that through history, and this is just going to be multiplied enormously with the volume of data that we're actually going to have. So what do we look at in terms of solutions? There are a number. I'm not claiming that we've solved it, but we're thinking about it. First is data reduction. These raw base pairs that we generate are just bits of data, right? So there are discussions abounding about whether we keep the derived data. For example, do we keep the assemblies and not keep the raw sequences that were there? Do we keep enough information about those raw sequences that tells us what we need, so we don't have to keep all those terabytes of data? As an example of this, do we keep just the SNPs? Do we keep just the gene list? Do we believe that the assembly is correct, so that we don't need to go back to the base pairs? The other reason this is important is that, as these new technologies come about and they become cheaper, you have to remember that you can then resequence something rather than storing it. So here it may be good to generate all your base pairs and then just keep the actual assembly itself and throw away the rest. If you make a mistake, you can resequence, and it's cheaper. Another point that I forgot to mention before: everyone has had these lovely slides which talk about the reduction in the cost of base pairs. I'd like to remind you that the cost of storing something is actually now more expensive than it is to actually sequence it. That's part of the problem. So the two-fold issue here is, one, it costs more to store it, and two, you have a lot more of it. So informatics is becoming much more of a bottleneck than it was in the past, and for the reasons I just described, we need to really address those. A couple of additional things: we really need to actively involve the community in our discussions, and we have done that, and I mean in this case both the biological community as well as the computing community. We've had two workshops recently, and there are some folks in here that I talked to recently about these workshops, and you can talk to me later if you want more information. One was the Informatics Analysis and Planning Workshop, which was to really talk to the community about what the analysis needs were, what the kinds of things were that we had to solve, and I just touched on those. That particular workshop fits into the overall NHGRI planning process, which I believe Eric Green was talking about and Jeff mentioned as well, to help us figure out what those strategies are in terms of the Institute. The other was the Cloud Computing Workshop, which I'll explain a little bit in a minute. And the last piece here is education. That is, we need to be able to train people in the use of these tools, train more people to actually develop these tools, and to be able to integrate those pieces. And the last slide. So in terms of a solution, informatics is both biology and computing, and so the top of the slide is really that integration piece. I didn't speak much about it today, but one of the possible solutions we could look at for the computing is potentially using cloud computing as a way of storing and managing and doing the analysis of this data.
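Returning to the store-versus-resequence tradeoff raised earlier in this talk, here is a minimal sketch of the comparison; every cost figure is a made-up placeholder, since the real numbers change from year to year.

```python
# A sketch of the keep-or-resequence tradeoff raised earlier in this talk.
# Every cost figure below is a made-up placeholder -- the point is the shape
# of the comparison, not the specific numbers, which change year to year.

RAW_DATA_TB            = 2.0     # raw reads for one genome (see earlier estimate)
DERIVED_DATA_TB        = 0.1     # assembly plus variant calls only (assumed)
STORAGE_COST_PER_TB_YR = 300.0   # hypothetical $/TB/year for replicated storage
YEARS_RETAINED         = 5
RESEQUENCING_COST      = 3000.0  # hypothetical cost to regenerate the raw data later

cost_keep_raw     = RAW_DATA_TB * STORAGE_COST_PER_TB_YR * YEARS_RETAINED
cost_keep_derived = DERIVED_DATA_TB * STORAGE_COST_PER_TB_YR * YEARS_RETAINED

print(f"Keep raw reads for {YEARS_RETAINED} years      : ${cost_keep_raw:,.0f}")
print(f"Keep only derived data          : ${cost_keep_derived:,.0f}")
print(f"Resequence later if ever needed : ${RESEQUENCING_COST:,.0f}")
# If sequencing keeps getting cheaper faster than storage does, discarding raw
# data and resequencing on demand can win, provided the sample still exists.
```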
I haven't gone into it today because we just don't have that kind of time, but those of you who've used the cloud, for example Twitter, Gmail, most of you have heard of it, or the Amazon cloud, a lot of people are using it because of the ability to store large amounts of data, and you can actually do computes within the cloud, which speeds a lot of things up. So the key here is we really need to integrate and think about how we wanna do it. And I'm gonna finish with this last slide. I was debating whether I'd do it, but this is how I feel a lot of the time, and it really highlights the problems that I see in terms of biology and computing. Not only do we have data volume issues, but we have the issue that we can't talk to each other very easily, right? So quite frequently I'm asked about the database, and yes, I tell them, what color do you want? We need to think about how we communicate with each other. So even if we solve these problems in terms of analysis and infrastructure, we also need to solve the problem of communication, and I guess that's part of the reason for being here today. Thank you. And our last speaker in this panel is Jim Mullikin, who will give us a view from a sequencing center director. No slides. Yeah, very good. I have no slides. I was asked to give this talk at an accelerated pace, just like the sequencing world is running right now, since we are just before lunch, and so I'll try and get through what I was going to talk about today in a little less time than I was given. Previous speakers have already introduced both what is coming in the technology world and maybe what the solutions are. But I am running a sequencing center now that Eric ran up until just December of last year, and which he initiated back in 1997. When he started up the sequencing center back then, through cooperative agreements with all of the institutes, because he really felt that having a sequencing center at the NIH was important, he established the NIH Intramural Sequencing Center, or NISC, and was able to hire six people and purchase six of the latest sequencing machines of that time, which in total, all six machines, could generate about half a megabase of sequence per day. Things have come a long way since then. We now have a budget of $7 million a year. We have 42 people on our staff with a wide range of expertise, and a lot of robotic automation. The laboratory that we have is about 5,000 square feet in size for all of the sequencing and laboratory operations, and then the rest of the space, another 5,000 square feet or so, is set up for doing all the analysis and storage of data. Prior to the introduction of these new sequencing technologies, we were operating at a level of about 7 million sequencing reads per year, which is quite a nice pace for a medium-scale sequencing center, which is what we have. And we were working on projects like you've heard about today: the comparative vertebrate sequencing program, the ENCODE project, the Mammalian Gene Collection, and medical sequencing. So how has NISC dealt with the latest round of disruptive sequencing technologies that have been introduced and the associated data deluge that we're talking about in this session? Well, first of all, we do keep our eyes open. We know what is coming in the technology field and try to be prepared for it.
We also work closely with other sequencing centers, larger centers like the Broad Institute and the Washington University Genome Sequencing Center, and they have enough capacity at their centers to really try out the latest technologies at a very early stage. So we do take a somewhat cautious approach: we learn from them, and then once we have an idea of what will be the best type of technology to bring in, we'll bring it in in an R&D type environment. Fortunately, Dr. Elliott Margulies did that with one of the machines, what was then called the Solexa sequencing machine, which is now owned by Illumina and called the Genome Analyzer IIx. About a year ago, after it had been in R&D for about a year, we realized it was time to move it into production, so it was transitioned out of the R&D environment and into production very quickly. We now have two kinds of next-gen sequencing machines: one is the 454 and the other is the Genome Analyzer IIx. With eight of those Genome Analyzer IIx machines in production, we can now generate 50 gigabases of sequence. That's five orders of magnitude more than in 1997. So we've come a long way, and it's caused some stress, I must say. It's caused a lot of strain, but we're also really quite excited about having this new technology. Just to give you an idea, the lab staff had to learn new protocols, technicians learned how to operate new machines, and our sample tracking system had to be reprogrammed to associate new types of data and information with each sample. The number and diversity of kinds of projects expanded. Data transfer rates changed dramatically, putting new strains on the network, data storage, and computational needs. The software development team was stretched to its limits at times, trying to keep up not only with processing the data that was relentlessly pouring through because these machines were online, but also with adapting every month or every two months to a new software update from these sequencing companies that release new software for those machines. So it's caused us great stress, but I must say it has great potential as well. The NIH, through NHGRI's intramural program, has invested a lot in the expansion of the next-gen sequencing machines and related infrastructure, and now we can support various types of data in our processing. So we're working with investigator-driven projects like ChIP-seq experiments, which are chromatin immunoprecipitation sequencing, which investigates the interaction between proteins and DNA in a cell; RNA sequencing, which looks at genes within the nucleus of a cell that are being expressed; and medical sequencing, which looks at changes in the DNA that are associated with an individual's disease, if that's what we're looking for in that type of experiment. So we have a mix of projects at the sequencing center. We have the large-scale projects like the Human Microbiome Project, the Undiagnosed Diseases Program, and ClinSeq, which you'll hear more about later today. And it's great to have these larger projects in a production environment, because that gives us kind of a buffer of a whole bunch of samples to keep processing as we move forward, and it also lets us accept into the sequencing center dozens of smaller projects that range in size from one or two samples up to a few tens of samples. And over the last two years, we've been bringing in these kinds of projects, the small-scale projects, at about one or two per week.
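As a quick check of the five-orders-of-magnitude comparison above, using only the numbers as quoted (the talk does not state the time window for the 50-gigabase figure, so this is simply a ratio of the quoted outputs):

```python
# A quick check of the throughput comparison above, using the numbers as quoted:
# six machines producing about half a megabase per day in 1997, versus eight
# machines producing about 50 gigabases today. The time window for the 50 Gb
# figure is not stated in the talk, so this is only a ratio of quoted numbers.

MB = 1e6
GB = 1e9

output_1997 = 0.5 * MB    # total across six machines, per day
output_now  = 50 * GB     # total across eight Genome Analyzer machines

fold_increase = output_now / output_1997
print(f"Fold increase: {fold_increase:,.0f}x  (~10^{len(str(int(fold_increase))) - 1})")
# 50e9 / 0.5e6 = 100,000x, i.e. the "five orders of magnitude" quoted in the talk.
```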
So the intake of new projects is coming at an incredible rate. We talk to each of the investigators, figure out what they want to do, and implement an optimal sequencing approach for their research. To give you an idea, we've already had 20 of these smaller-scale projects completed, we have another 33 of them active in our pipeline, and another 14 are waiting on delivery of samples. One of the issues with receiving samples, especially human samples, is that they have to have been consented properly before we can apply some of the sequencing technologies that we're using, for example, whole exome or whole genome sequencing. So the investigators need to go back and review their IRB protocols to see if it's okay to have their samples sequenced; if not, they may need to go back and get those samples re-consented. But once we do have them, we run them through the sequencing pipeline, and we can give them variation information very quickly, within just a few months of delivery of the samples. And now, having had the next-gen sequencing in production and applying it to various projects, some of the first publications are starting to come through. One is titled "Massively parallel sequencing of exons on the X chromosome identifies RBM10 as the gene that causes a syndromic form of cleft palate," and Dr. Les Biesecker, here today, is senior author on that paper. And that's just the tip of the iceberg. There are many more projects coming through that are transitioning us now into the clinical focus; well, part of our operation has already had a clinical focus for about four years, but now with the next-gen sequencing we can really push that forward even faster, and at a scale where an individual investigator can come to us with a few samples, and with their own budget can have them sequenced in our lab and get data back fairly quickly. So it's opened up a whole new world. And even though it's been a pretty tumultuous transition, it's been well worth the effort, and I think a lot of good science is underway and will continue to be. Well, I just see people haven't stopped coming to us with more samples and more projects. So it's an exciting time. We love what we're doing. It's been stressful at times. And Jeff, if you're right about getting these nanopore sequencers running, it'll just be even more exciting, I guess. Thank you very much. Okay, this panel is now open for questions. Everybody getting hungry? One of the issues I would raise, and I'm not sure it's actually a question, but maybe anybody can comment on it, and I think it might be of interest to the audience, is that major technology advances are a blessing and a curse. You sort of heard about that; you heard it from Jim, saying that it's very stressful to change. And it's not just true for big centers. You can imagine that what's going on now with the sequencing technologies is that it's like every six months another instrument becomes available. And so for individual investigators, it's a huge commitment for them to purchase one of these instruments. The big centers are constantly wondering, should we make a major investment in buying 10 of these machines? But there are also just hundreds of individual investigators who are trying to decide, should I buy this instrument or should I wait six months and buy the next one? And it's not just the price of buying the instrument; it's the personnel investment in getting that instrument to work well in the laboratory.
It's, of course, the computational challenges that Vivian spoke of. Every instrument's output is a little different, the software needs to be a little bit different, it has different nuances. So it's a blessing and a curse. These technologies are revolutionary in their nature, but they come with some hard decisions, especially when money is tight. I mean, these instruments often cost half a million dollars. For individual investigators, that's a lot of money to raise, and if it's gonna be an outdated machine in a year, they'll be kicking themselves. So there are a lot of nuances associated with that that shouldn't be underestimated. I don't know if anybody wants to comment; I mean, I was just making that observation. I guess it's a question of which problem you wanna have. Right, right. And the other thing about it is it's so different from what the situation was for so long. I mean, throughout the genome project, and really for a few years beyond, we all couldn't believe we were still doing sort of the classic Sanger-based dideoxy chain-termination sequencing. We couldn't believe we were basically dealing with one company, maybe two. But at least everything was very standard and very stable, and yeah, we got the price down incrementally. But then you gotta be careful what you wish for, because now we have the other extreme, where there are so many different things that you sort of can't figure out where to put your money down. Question, yeah. On a sort of related question, I'm wondering if the NIH, and in fact the federal government, is making plans for resources to be able to analyze all of this data. I mean, we're getting into exabytes of data here pretty soon. And, you know, having myself been tested and having thousands of genomic traits annotated, I have no idea what to do with that, because analytical tools don't really exist. You know, is there another phase here that's being anticipated and planned for? Absolutely. I was gonna make some comments about that; I mean, Vivian set it up a little and she could extend anything now, and I was gonna make some comments about that at the very end. But without a question, data production is not limiting now in biomedical research, at least certainly in genomics. Computational biology, and just computational challenges generally, are becoming the bottleneck. It's being discussed extensively at NIH and it's getting a considerable amount of attention. I'll have some things to say about that at the end. I don't know if you wanna add anything, or Jeff. It's being discussed not only at NIH, but there's a whole industry growing up around this. I was just at the Consumer Genomics Conference in Boston this past week, and the venture capitalists are interested in a lot of different kinds of companies with different models for how to do this. Well, actually, it's gonna go hand in hand. The collection of the data, of human genome sequences with the phenotype data that are in medical records, will drive the research to add meaning to the interpretation of the individual's information, but also how to represent that to the consumer, independent of the medical profession, and how to represent it to the medical profession, so that they don't have to be absolute experts in every condition and in all the ins and outs of human genetics, but can treat these data much the same way they treat many of the other biochemical tests that you have done when you go into a clinic, and be able to sort out: what do I do with this information for this patient?
It's a big industry growing up around this. So your question, if I heard you right, was what is NIH doing? Is that right? Well, yeah, I mean, are there resources? In other words, are they planning for when this deluge comes and we do get to the thousand-dollar genome? So let me talk to you a little bit about that, and lead on from what Eric was saying. Firstly, a lot of institutes, ours included, tend to think about this institutionally, but we've seen this issue across so many different institutes. In fact, when I did my cloud computing meeting, we had over 15 people from different institutes coming in, because they knew they all had the same problem. So one of the things that I'm taking hold of is looking at what the cloud provides for us as NIH to do this kind of analysis. And in conjunction with that, there's the Center for Information Technology at NIH, which is basically the nuts and bolts of the computing, the wiring, all that kind of stuff. I've been in discussions with them as well, because they don't do the scientific research, but they provide the nuts and bolts for us to do that. So, one: working with different institutes, coming together in a cohesive fashion about how we can marshal our forces, is one of the discussions. And the second one is working with, for example, CIT to figure out how we do it. In their case, they recognize that there may be a need for having a cloud within the context of NIH. The reason for that is the volume of data, and the security issue as well. And the third piece to that puzzle is that we've just started exploring some work with third parties, both commercial and academic, in terms of how we would do that, because they're very interested to work with us. And so those are the three areas that cover what NIH is doing. And just to extend what Jeff said, and Vivian, if I can set you up: when you had your cloud computing meeting, and Jeff was saying that private industry is getting interested in this problem, who were some of the participants in your cloud computing meeting? So Google was very interested, and Amazon and Microsoft. And they're all... They all showed up, right? Yeah. I mean, 10 years ago, we would invite groups like that to come to a genomics meeting, and they'd yawn and say, not interesting. Now they're coming. The other thing is we had the Argonne National Laboratory; they do a lot of high-performance computing, and they're very interested in this as well. And I think we've heard this before: we're now becoming interesting to these communities, because before, our data wasn't sufficiently large or complicated, and now we're very complicated and very interesting. So I guess that, by human nature, makes us interesting. This is a short comment, too. I was happy to see on your slide that you were mentioning environmental data, scanning other areas that need to be integrated with the genomics. Absolutely, I think we need to not be myopic here. Okay, one more question before lunch. I'm Louise Rosenbaum, I'm at NIAMS. And so just to step back to your comment about the individual sequencing centers facing these challenges: what sort of efforts are there so that perhaps there is more consolidation? I know that the Broad offers sequencing services for people from other institutions, but it's a challenge of scheduling. So how might that be arranged so that it can be more productive and less effort for individual sites? Does anybody wanna take that? Is Adam here? CTSAs, I think, are organizing that.
But again, is that limiting, and what options are there in the future so this could be done more collectively? Is Adam still here? He would be the best; I guess he's not here. Jeff, do you wanna? Well, so there's a lot of debate about whether consolidation into a few large centers is the best way to go versus trying to disseminate the capacity. We're supporting both approaches. A lot of people don't want to have to wait for large centers to be able to do the production. I should say that actually many of the sequencing companies, but also many other companies, run sequencing services. So if all you wanna do is get a few samples sequenced, you don't have to set up a sequencer; you can just send your sample and get the data back in a matter of a few weeks. So all the models are being supported, and we don't want to converge on a single model because we think that would be too constraining. Part of the value of these instruments is that at some level you don't have to have as large an infrastructure to do significant sequencing projects as you had to even three or four years ago, but there are some costs. So another thing to add there, if I heard your question about how we work with these other institutes: many projects, the Human Microbiome Project, TCGA, a lot of these, involve all the sequencing centers, the usual suspects, you know, JCVI, WashU, Baylor, and the Broad, and they're all working on these projects. And our experience of that is that they have to work together more closely, because when you're generating base pairs for a particular project, it's not tied to a chromosome location like it was many years ago, when you did the sequencing of a chromosome. They're just generating base pairs, which means all the base pairs are going into one pot, and we have to analyze it, so they all have to figure out standards for how they work and how they operate. That's one thing. The other thing is I've got an example with WashU, which is Washington University in St. Louis. They're getting a lot of people coming to them saying, I want to work on smaller projects, and because those groups don't have the compute to do that and WashU is generating the sequence data, they're working with these guys. So they're sharing their knowledge and telling us a little bit about it, and I think, as Jeff is pointing out, each one of the different institutes is doing this as well. So it happens on a couple of different levels. Okay, I want to thank this panel for their talks. Thank you. So, it is lunchtime, but we need to keep moving on the schedule. Right outside of these doors are some box lunches. You are all welcome to go grab a lunch and bring it back to your place. We're going to try to start off in about 10 minutes. And while you are eating, our luncheon speaker will be Sharon Terry. So if we could please.