So, we were probably the smallest group of the breakout sessions, but notwithstanding we had a pretty lively, sometimes confusing discussion, which I think we finally cemented near the end. We started out by revisiting some of the things that we think are so important and foundational about this program, which has existed in NHGRI since almost its inception. The fundamental point, on which I think we were all in agreement, is essentially that evolution is the unifying principle on which everything we're doing rests. Studies of variation and studies of mutation are fundamental, and understanding how those processes occur really requires comparative genomics. Genotype-phenotype correlation, which is what most of the people in this room are interested in, also benefits from, and I would say probably makes the most sense in, the context of evolution. It provides us an unbiased framework for discovery and prioritization of regions, and I would argue that as we move into interrogating non-coding sequences and regulatory sequences and trying to understand them, the comparative genomics and evolutionary aspects will become more and more important. And I guess the last important point is that NHGRI really has blazed the trail in this research. In mammalian and invertebrate genomics specifically, we have the expertise, we have the computation, we have the resources in terms of libraries and other materials, and we have the consortia. So the track record and the ability to do this type of work really surpass any other institute at NIH. We began by quickly reviewing the accomplishments, and apologies to those whose work we don't list, because there have been many accomplishments in this area over the last 15 years. Sixty vertebrate genomes have been sequenced in some form or fashion and aligned with the human data, revealing about 3 million evolutionarily conserved segments, about four and a half percent of our genome. The important point to keep in mind is that this is work in progress. In many cases the genomes are not assembled, or they're used essentially just to align to the human reference, so in many cases we don't have standalone high-quality, or even reasonable working-draft, assemblies. I just looked at the average N50 contig length for primate genomes, and it's on the order of about 25 kilobases. Point number two: one of the missions we've had for many years is to reconstruct the evolutionary history of every base in the human genome. We are not there yet. We've made some good strides in this area, but we're lacking critical species at high quality. We're lacking prosimians; we don't have a high-quality tarsier reference genome, for example. There are one and a half million gaps in the tarsier assembly right now, and less than 50 percent of that sequence can be aligned to the human reference. So if you think this effort is finished and complete, you're wrong. We're not done with this project, at least as we originally set it out. On variation we've done moderately well; I changed this from "deep catalogs" to "beginning catalogs." There have been efforts, through the Great Ape Genome Project and in rhesus macaque and African green monkey, to begin to survey some of the genetic variation that exists within these species.
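For reference, the N50 statistic quoted above is simple to compute: it's the largest contig length L such that contigs of at least L together cover half the assembly. A minimal sketch in Python, assuming only a list of contig lengths (the numbers below are invented for illustration):

```python
def n50(contig_lengths):
    """N50: the largest L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(contig_lengths)
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    return 0

# Toy fragmented assembly: 220 kb total, half covered by the top two contigs.
print(n50([100_000, 50_000, 25_000, 25_000, 10_000, 5_000, 5_000]))  # 50000
```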
There have been studies of inbred Drosophila strains to really provide a framework for quantitative genetic trait studies. These have been good, and more could be done in this particular area. And the last point, which we had some debate about whether it was part of our purview, but which we think is fundamentally comparative in nature because it involves comparing genomes both past and present, is the fact that we've had, and I think this is an achievement, the Genome Reference Consortium, whose mission has been to continually improve the quality of the human reference genome as we go forward. Many of you might think of this as a housekeeping exercise to finish off the gaps. It's much more than that. The regions being tackled right now, and it's a targeted approach for specific regions, are regions that are incredibly diverse. Think of the MHC, think of the T-cell receptor regions, think of regions around segmental duplications: highly dynamic, gene-rich, important in terms of human health, and highly variable. Within a three-megabase stretch there can be tens to hundreds of kilobases of sequence variation between different haplotypes, which has not been cataloged because of the complexity of that type of variation. So as a result, one of the holes, and here we really echo the first goal of the last group, is that we have not yet comprehensively assessed all genetic variation in any single genome. It's not just a question of the allele frequency spectrum; it's a question of getting all the variation: all the indels, all the structural variants, all the copy number changes. And there are more of these than you might think. The estimates are four to five times more base pairs affected by structural variation than by single base pair and indel events. All right, so those are the achievements. This slide, taken from two papers, is just to remind you of the phylogenies that have been tapped so far. I highlight a few here. The gorilla genome, for example, one of the many primate genomes we've worked on, was recently sequenced and assembled. Its average contig length is about 11 kilobases, at my last recollection, and there are about half a million gaps in the gorilla assembly. What that means is that when we did the four-way alignment of the apes, which includes humans, 30 percent of the genome could not be aligned; only 65 to 70 percent of the genome could be aligned. That's euchromatic sequence; that's genes. So we have a great deal of heterogeneity in the quality of the genomes that have been generated. Many of them have simply been used to align to the human reference. So many of the mammalian genomes, roughly the 34 that have been done, 29 of which are depicted here, are not particularly high-quality draft assemblies. The only high-quality assemblies on the slide, to be honest, are human and mouse; many of the others are in various stages of working draft. When we set out the goals from our group, we broke each one down into four things: essentially, what's the big question, and why is NHGRI relevant to it; second, the tactic or approach; third, the details; and fourth, the justification, though not in that order. So I think we agreed in our session that this was the single most important goal.
People who were in that breakout group can disagree with me, but it was to move from aligning genomes to doing de novo sequencing and assembly without guidance: to be able to take a genome of any species, human or otherwise, and generate a high-quality de novo sequence and assembly of that genome. And so, to be specific, we would suggest that NHGRI should invest in this, and not be dependent solely on the commercial sector: advance sequencing technologies to the point where we can assemble a genome for $10,000. This does not mean generating 40x sequence coverage with Illumina; this means actually assembling. The cost of assembling genomes has been, and still is, prohibitive. We have some statistics now, based on assembling one human genome, a hydatidiform mole, using long-read PacBio data, to suggest it would cost about $60,000 to assemble a genome with an N50 contig length of 4.4 megabases. That's a 150-fold improvement in N50 contig length over standard Illumina sequencing. I don't think it's unreasonable to think that we could have an order-of-magnitude drop in that cost to get us to a $10,000 genome assembly. We suggest one useful and specific approach, and I think Jeff Schloss asked for this specifically, in which NHGRI could invest: apply this to a finite number of human reference genomes. That is, generate human reference genomes, at the quality of the existing human reference or better, for 50 different humans sampled broadly across human diversity. We're thinking of these as what we call gold genomes: very high quality genomes in which most of the bases and most of the structural and copy number variation have been resolved. We think this would be incredibly powerful, because it would give us a comprehensive view of the types of genetic variation in the sweet spot that we can't access very well right now. As a member of the structural variation working group for the last however many years, as part of the 1000 Genomes Project and other projects before that, I can tell you we are not particularly good at detecting inversions. We are not particularly good at detecting or genotyping insertions. We are terrible at complex structural variation events associated with segmental duplications. So think about having 50 reference genomes, call them continental genomes if you will: Africans, Asians, Amerindians, Europeans, for which we would have very high quality references at those positions. This would give us, I think, the first truly comprehensive view. We ran some statistics recently, and we think, based on what we've been able to do comparing Illumina and PacBio technology on one genome, that between 50 base pairs and 5,000 base pairs we are missing 62 percent of deletion variants and 90 percent of insertion variants. So if we think we completely understand variation in the human genome, we're really mistaken. I pushed for this, but there was pushback in the group, so I'll just mention it: I think the goal should be even more ambitious. I think we should push to sequence every human chromosome from telomere to telomere, including the dark matter, the centromeric regions, the acrocentric regions. I think it can be achieved. It won't be achieved today, maybe not in the next couple of years, but no other institute will achieve this.
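The arithmetic behind those assembly numbers is worth making explicit. A back-of-the-envelope sketch; note the short-read baseline N50 is implied by the 150-fold figure rather than stated directly, so treat it as an estimate:

```python
# Figures quoted above: one human genome (a hydatidiform mole)
# assembled from PacBio long-read data.
pacbio_cost_usd = 60_000
pacbio_n50_bp = 4_400_000
fold_improvement = 150

# Implied short-read-only baseline N50 (~29 kb), roughly consistent with
# the ~25 kb average quoted earlier for primate draft assemblies.
print(f"implied Illumina N50: {pacbio_n50_bp / fold_improvement / 1000:.0f} kb")

# Cost reduction needed to hit the proposed $10,000 assembly target:
target_cost_usd = 10_000
print(f"cost reduction needed: {pacbio_cost_usd / target_cost_usd:.0f}x")  # 6x
```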
And we know that variation within centromeres and variation within telomeres is important in terms of human health. This next one is the big, lofty goal: what makes us human? This requires an emphasis on primate genomes. We still have not achieved what we set out to do over 10 years ago, which is to assign every human lineage-specific genomic change to a specific branch on the evolutionary tree of primates. Many in the group were most interested in those human-specific changes with functional consequences, including gene innovations. We are still discovering new genes in 2012 and 2013 that aren't in the human reference genome. These are often duplicated genes, but they're also important in terms of human health and human adaptation. So in terms of specifics, we would argue that it's possible, with all the resources that have now been generated, to focus on generating high-quality de novo assemblies of non-human primate genomes. We suggest, as a straw man, 16 primate genomes, including many that are already at the working-draft stage, assembled at the quality of the human reference genome. Sixteen is a number based on looking at available resources, including BAC resources, while having at least two representatives from every major phylogenetic branch off the human lineage. This would provide fundamental information on mutational processes, speciation, differences in lineage-specific sorting, gene flow, et cetera. There was some discussion in our group of what we think is an interesting observation: many of the recurrent microdeletions that mediate genomic disorders in the human population are caused by human-specific duplications in complex regions that have evolved over the last five to 10 million years of human evolution. There's remarkable genetic variability in those regions, which predisposes some individuals carrying certain haplotypes to recurrent microdeletions, and others not. The last goal I'll mention was essentially this: to obtain nucleotide-level resolution of every conserved functional element in humans. We are not there yet. We heard some great stories yesterday about the power of comparative genomics in helping to identify regulatory elements: the story from David Kingsley of finding the mutation in the regulatory element for KITLG, and how it wasn't detected by ENCODE but was picked up based on comparative analyses. The data that are out there right now, roughly the 30 mammals, get us down to about 12 base pair resolution. Simulation suggests that if you pushed this to 100 to 200 mammalian genomes sequenced deeply, you'd get down to single base pair resolution. And I think that's an easy target; it could be generated right now without any advances in sequencing technology. Some people said, well, maybe this will be done by the Genome 10K project or other projects that are out there. I don't think so. It may be, but that's not their mission. The mission here should be to sequence the genomes and make the data publicly available so everybody can analyze them as quickly as possible. This would allow us to quantify the selective constraint on each element across mammals and integrate it with existing ENCODE data sets in both mouse and human.
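One rough way to see why added genomes buy resolution, under a deliberately simplified neutral model of my own (a gloss, not something presented in the session): the chance that a neutral base escapes substitution decays exponentially with total branch length, so the window you must pool to call constraint shrinks as branch length grows.

```latex
% Sketch: let T be the total branch length of the tree in expected
% substitutions per site. A neutral base shows no change with probability
\[
  P(\text{no substitution}) = e^{-T},
\]
% so rejecting neutrality for a window of w bases at false-positive
% rate \alpha requires roughly
\[
  wT \;\gtrsim\; \ln\frac{1}{\alpha}
  \quad\Longrightarrow\quad
  w \;\gtrsim\; \frac{\ln(1/\alpha)}{T}.
\]
% Under this toy model, resolution w scales inversely with T: multiplying
% total branch length several-fold (30 mammals -> 100-200 mammals) is what
% pushes the ~12 bp window toward single-base resolution.
```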
And if advances in computational technologies and advances in sequencing technologies came along, it wouldn't, I think, be beyond the pale to think about doing additional mammalian genomes at high quality, as we have done for the mouse. All right, I'm going to turn this over to Andy. I wish there was a button like that where I could blank all your screens too. If you think of the top causes of mortality and morbidity in humans and consider cardiovascular disease, COPD, stroke, diabetes, a list that Mike Boehnke gave, they have two things in common. One is that they're adult onset, and the other is that they're remarkably sensitive to the environment, so that your risk is a function of your environmental stresses as well as some attribute of your genome. So we're all unique; some of us are more unique than others. And we're unique not only because of our genomes but because of the trajectory of environmental stresses and exposures we've had during our lifetimes. Now, when you're trying to infer causality in a situation that's this high-dimensional, where the factors are as badly confounded as genotype and environment are in humans, you have a very tough problem ahead of you, and we're all familiar with that. There are two major impediments to studying and understanding causation in adult-onset diseases of this sort. One is that there isn't really a good controlled experiment any of us can do; the data are observational. The other is that we can't replicate the experiment: we can't take the same genotype, put it in a bunch of environments, and ask what happens. So there's good news and bad news. The good news is that model organisms do precisely this. They allow us to put the same genotype in multiple environments and take apart genotype-by-environment interaction very carefully. The bad news, of course, is that when you do the right experiment, taking a set of zebrafish or mice or flies or worms through a set of different environments, the rank order of the phenotypes you score almost always flips around. That is, genotype-by-environment interaction is almost universal. So we're in a situation where we need to understand how genotypes respond to different environments, and the answer is that we need to figure out the best way to work with model organisms. Now, we all know examples where model organisms are terrible for modeling specific diseases; there's a human disease where the mouse doesn't even have the gene. But we need to move beyond that. We need to ask, now, using genomic technologies: what really is the best model for each specific disease? I think Aviv Regev's talk was particularly informative in thinking about how genomic technologies could sharpen our ability to focus model organisms on specific diseases. If we had catalogs of the sort she described for mouse genes, for instance, there's great opportunity for taking this forward. So goal four is to leverage the power of model organisms in functional genomics. Of course this resonates very well with goal one, where Eric Boerwinkle emphasized that we need to understand basic biology before we can understand disease. I would argue that model organisms are an important path to that.
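To make the genotype-by-environment point concrete, here is a minimal sketch of the controlled experiment that model organisms permit, assuming a reference panel with replicated, fixed genotypes; the strain labels, effect sizes, and the statsmodels-based test are all illustrative choices, not anything from the session:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Toy panel: two fixed genotypes, each replicated in two environments.
# The rank order of phenotype means flips between environments -- the
# signature of genotype-by-environment (GxE) interaction.
cell_means = {("A", "control"): 10.0, ("A", "stress"): 6.0,
              ("B", "control"): 8.0,  ("B", "stress"): 9.0}
rows = [{"genotype": g, "environment": e,
         "phenotype": mu + rng.normal(0, 1.0)}
        for (g, e), mu in cell_means.items() for _ in range(30)]
df = pd.DataFrame(rows)

# The interaction term asks whether the genotype effect depends on the
# environment -- the question that is confounded in human observational data.
fit = smf.ols("phenotype ~ genotype * environment", data=df).fit()
print(fit.pvalues["genotype[T.B]:environment[T.stress]"])  # tiny: GxE detected
```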
And of course in goal two, Rick Myers and Mark Gerstein argued quite well that model organisms do have the ability to let us infer function at the adult stage, at the whole-organism stage. Some of the points for how we would proceed: for instance, applying large-scale genomic and other omics technologies to reference panels. Many of the model organism communities are working on functional genomics; there are panels such as the Collaborative Cross and the Diversity Outbred in mice, and many others in other model organisms. There's the idea of taking human mutations forward into model organisms and studying them at both the cellular and the organismal level. This will scale beautifully now with CRISPR technology: we'll be able to make thousands of human mutations, put them in adult mice, and put those mice in different stressful environments. By doing this across different environments, we'd have a really good handle on the full degree of genotype-by-environment interaction in the particular physiologies relevant to these human disease states. One other idea: as we study other organisms with the more comparative, evolutionary view of the world, we look at, for instance, the naked mole rat and find that they don't get tumors, and we might identify genes we suspect are important in that process. What about doing the reverse experiment, taking some allelic states from these non-model organisms, putting them into human cells, and seeing how they behave? That's a kind of out-there idea that I won't take credit for. Okay, so that's the end of the model organism sermon. Getting back to the grand scope of comparative and evolutionary genomics, none of the things that Evan told you about could be done without serious improvements to the computational infrastructure. We need to develop informatics infrastructure to produce, display, and quantify these multiple-species genome alignments. An alignment of species' genomes is a central tool for inferring where along the phylogeny particular changes occurred, and when we layer functional information on top, we get terrific insight into the way genes and phenotypes have evolved. This requires development of algorithms, software, and alignment methods, and it requires development of new browsers. Anybody who's been involved in these projects knows that you have a constantly shifting coordinate system for the genomes as you discover huge insertions in one species that aren't in the others, and so forth. We need to devise methods for analyzing complex chromosomal rearrangements and methods for representing genomes in the face of those rearrangements. And finally, we need to produce benchmarks, quality control metrics, and assessments of the accuracy of these methods. So I'll summarize by listing all of the goals, going through them quickly, and reminding you that evolution is the single most powerful unifying principle in all of biology; the history of biology is that we have learned an enormous amount from it. I'll warn us against the arrogance that we are all somewhat subject to, given the power of the tools of genomics: to think that now that we can do this in human cells, we don't need to think about anything else anymore, that we can just do all the manipulations in human cells.
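Returning for a moment to the coordinate-system point: as a toy illustration, here is a sketch of mapping a position through gap-free aligned blocks between two genomes (all coordinates invented; real multi-species alignments add gaps, inversions, and duplications on top of this):

```python
from bisect import bisect_right

# Each aligned block: (start_in_reference, start_in_other, length),
# assumed gap-free. The second genome carries a 2 kb insertion between
# the first and second blocks, shifting every coordinate after it.
blocks = [
    (0,     0,     500),
    (500,   2_500, 1_000),
    (1_500, 3_500, 250),
]

def lift(ref_pos, blocks):
    """Map a reference coordinate into the other genome,
    returning None if it falls in an unaligned region."""
    i = bisect_right([b[0] for b in blocks], ref_pos) - 1
    if i < 0:
        return None
    ref_start, other_start, length = blocks[i]
    offset = ref_pos - ref_start
    return other_start + offset if offset < length else None

print(lift(750, blocks))    # 2750: shifted by the 2 kb insertion
print(lift(5_000, blocks))  # None: beyond the last aligned block
```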
I think that evolution still has an enormous amount to teach us, and model organisms have also marched forward in their technologies for manipulating and perturbing genomes. We need to develop strategies and technologies to obtain high-quality de novo reference sequence; this will be applicable throughout all of biology, not just to the goals that Evan outlined. We need to target multiple primate genomes to infer, with high confidence, all the human-specific genome attributes. This is enormously useful in comparative biology, in seeing what traits are uniquely human and how we differ from our closest relatives. That's also of tremendous intellectual interest, and I think we could bring the public in to share the excitement over these aspects of the science we do; the fundamental question of how we evolved from our most recent ancestors is one that resonates very deeply with the public. Then, sequencing multiple mammals: we were told last night there are only 5,400 known mammals, so eventually we might be able to get there, but starting with the first 200 using current technologies is not expensive, and we could easily get to the point of identifying all the conserved elements in the human genome from those alignments and comparisons. The fourth point is, again, the model organism theme, and I'll beat that drum one last time: they are still of enormous utility for understanding many aspects of basic biology, particularly including these context-dependent variant functions, where the contexts include anything from diet to drug treatments and so forth. Those studies scale beautifully in model organisms, so sequencing reference panels of them would let those studies proceed at a much greater pace. The fifth goal was, again, the ongoing development of software tools for dealing with multiple sequence alignments. And I'll close by emphasizing again that of all the things we talked about, none really falls within the purview of other institutes. The National Institute of General Medical Sciences does do a lot of evolutionary biology and has funded some model organism work, but given the scale of the sequencing, there is an aspect of this problem that is uniquely NHGRI, and we would like to see NHGRI do it. So with that, I'll take questions. David. I want to point out that this resonates strongly with the goal we heard from Heidi: detect all types of clinically relevant variation in a single genome-scale test. That's very, very consistent with the $10,000 genome goal. And I would say that I love these charts we've been seeing with Moore's law and the approach to the $1,000 genome, but there's a lot of wishful thinking in there. Those aren't genomes, right? We really need to get back and do this the way Evan described so beautifully, so we can really say we're sequencing the whole genome. I agree. I was really pleased; I was hoping another group would come up with the importance of being able to do a single genome without alignment to the reference, because that is fundamental human genomics. A wise man once told me that our field's fundamental algorithm is skin deep: understand all the genetic variation comprehensively, and once we do that, that problem is finite; we can assess links to human phenotype. And this is where we will be, whether NHGRI leads the charge or not, five or ten years from now.
We won't be doing these alignments anymore; we will be doing de novo assembly. And there has been spectacular improvement, starting from $300 million in 2000 to what we can do now in terms of getting a real high-quality genome. It's not as if the private sector isn't going to play an important role; companies like PacBio and Oxford Nanopore can continue, but if there is an incentive, a push at some level to drive this even more, I think we could accelerate and get to the point of single-genome assembly as part of a routine clinical test in five years instead of ten. Jim. Just to follow up on those statements: I think this is so important, particularly from the evolutionary aspect, since new mutation frequency on a locus-specific basis is much greater for copy number variants than for single nucleotide variants, so we really need better structural information and assays for this. And it ties right into an institute nobody here has talked about yet: environmental health, the NIEHS. We already have evidence from Tom Glover's work that hydroxyurea, a chemical we use clinically, induces CNV mutations at high rates in both yeast and mammalian cells in vitro. So we should be studying these mutational mechanisms; the Ames test doesn't even test for copy number, it gets mostly at single nucleotide variants. How the environment interacts with our genome with respect to copy number is a total black box. A related idea to this inference of mutational processes is inference about differences in recombinational processes, which are also fundamental to evolution and everything else about the map. So, yes, I agree. Richard. And to follow up on that: Alec Jeffreys has nicely shown that PRDM9 alleles influence genomic disorder rates, and that this can change in different environments. I think we have to go after mutation in a big way. Richard. There seems an opportunity to move the center of gravity of disease models from mouse towards primates. And whilst you mentioned primates in the other context, I think you didn't strongly state that building and exploiting primate models for human disease should be a priority. Was that discussed? Well, we actually didn't discuss it much. I think the impediment there is the difficulty of working with primates; there are limits to what we can do. They're not totally insurmountable, but that's a limit. Well, I think you could argue, too, that there have been technological changes in all areas that really change the complexion of that. I was interested that in goal two you didn't mention archaic humans in this high-confidence list of human-specific genome attributes. I mean, it's not science fiction, obviously. No, we discussed this specifically, and it came up in this context: yes, there will be more archaic hominins sequenced. Most of that will be done with short-read technology, largely because the fragment lengths in these archaic samples are so short that the data don't really lend themselves to de novo assembly of Neanderthal, or of Denisova for that matter. But it's generally something we feel is going to happen whether NHGRI invests or not. So we were looking at those seven characteristics laid out at the beginning: high throughput, consortia, technology advance, and so on. I mean, that's a focus on the data generation component rather than the data analysis component.
I think it would be very foolish, in the data analysis component, to ignore the archaic humans. Absolutely. Absolutely. It's there. And just to push on the model organisms: I think there are these huge gene-by-environment effects, and it's very clear that you have opportunities there, especially with fixed genotypes, to explore things that are very, very hard to do observationally in humans. So I really believe that, Andy. A personal plea is that we don't narrow ourselves down to just mice and zebrafish as model organisms. I just think we've got a much bigger repertoire of organisms out there than a very small list of species; there's a bigger diversity of useful organisms beyond those two. So we'll include medaka. Mark. I just wanted to say that I really agreed with Evan's point about the impact of structural variation and the importance of having high-quality genomes. I want to mention that in the functional breakout group we also really did talk about the functional impact of structural variants. It's much more complicated, and potentially much larger, than that of single nucleotide variants, and we really have to think about ways of assessing that impact. I agree. Carlos. Just to add to the earlier point on the potential of archaic and ancient DNA: in fact, there's been a ton of technology development recently that is upending this question of how much you can really get out. When they do, for example, the single-stranded library prep, it turns out you recover many more molecules than with the double-stranded prep, and that's why they were able to take the Denisovan to high coverage. It's actually an area in which the US has invested almost no money; all of the development has happened in Europe. And you could imagine that there may be some bones somewhere with somewhat larger fragments that could be sequenced. So I wouldn't totally rule it out, and nobody knows how far back you could go. We don't have a Homo erectus sequence yet. That doesn't mean it can't be done; it just hasn't been prioritized. And there are basically two or three labs in Europe leading in that area. So, just on goal five, point five: was there discussion in the group about partnering with, say, NSF on advancing some of the alignment algorithms and comparative data analysis tools? They also expend a lot of money in this area; it would be a good partnership. There wasn't, but that is a very good idea. The NIH-NSF joint programs in quantitative biology have been very encouraging, and that's a very, very good suggestion. I guess my feeling on this is that most of the genome browsers being used, and that's just one obvious way to do this, have been driven largely by the genomics community, funded by Wellcome and NIH. We have a lot of experience in this, and there's been a lot of discussion, at several meetings I've attended, about how you would display 50 high-quality human references so people could access the information, optimize their mapping so they could find the right genome, and still be able to communicate these ideas. Not trivial at all. And, for sure, if there is value in adding an NSF partnership, we should take advantage of everything we've got.
But I think my feeling on this is that we as a community have taken the leadership role here, and we should continue to push on it, because this is, again, not an easy or solved problem. Eric. So I wonder if it's okay to make a comment across the four sessions so far, because we're, I assume, coming to the end. It's a spectacular range of projects, and I want to share the general enthusiasm about the specific things that have been proposed. I think many of the things in this particular breakout are incredibly important. But I'm starting to think about what's missing in what we've been talking about this morning. And I think it's probably there, but maybe hidden a little, when we're spending a lot of attention on structure: what are the nucleotides, what are the variations in them, how do they correlate with disease? And then we mention disease-relevant functions, and we talk a lot about mutating the gene and seeing in an assay what effect it has. What I wonder is missing is the connection with cellular circuitry: to accomplish our goal of interpreting variation in the context of disease, we may be able to interpret variation in the context of the protein it affects, but to truly interpret it in the context of disease, there's a set of NHGRI-ish activities of systematically dissecting cellular circuitry. When this enhancer gets affected, when this protein gets affected, what are the consequences? 108 genes in schizophrenia get identified, or 60 genes in heart disease; how do we recognize what effect that is having on the cell? And so I would not like NHGRI to have to pay for all of it, but I do think there is a set of infrastructure. It's a little related to the LINCS project; it's related to what Aviv was talking about yesterday; it's related to going from individual enhancers to whole circuits. And somewhere, NHGRI ought to be the intellectual leader of that. It ought to be paid for maybe by the Common Fund or others, but we're not going to be able to interpret disease without it. Based on everything that's been laid out here, we're going to get a great description of the structural problem, I agree, down to completeness. We're going to get great correlations. We're going to get protein structure responses. But there is a piece, and we haven't defined it here, that we had better define, and it's a set of databases about circuitry, circuitry responses, cells. I was unclear whether it was in-bounds or out-of-bounds, but as I think about what we're doing, if we don't make sure that piece gets done, it's going to be really hard to interpret these subtle mutations, even with all the assays and all the other things we're doing. So I don't mean to destabilize anything here by arguing this, and I think we should go forward with these things, but somewhere we had better also launch a process that does that. But I think to make sense of those, Eric, you really have to start by making sure that you've got the finite aspect of our universe, which is our genome sequence, totally understood. Because all of those things really make sense in the context of the variation in which those mutations are found. Look, I'm agreeing we want to get that sequence; what I'm disagreeing with is that you must first do that. To meet our goal, which is to understand disease, we must do that and then interpret it in terms of physiology. And all of the structural stuff, very important as it is, isn't going to get us the physiology that we owe for disease.
So it's not at the expense of it; we just, in addition, separately, not competing, had better be doing it, that's all. I mean, it reminds me of something 10 years ago, when we were first analyzing Venter's genome and comparing it to the human reference. And Venter's genome, if you remember, and I'm sure you do, was significantly shorter. And where it was shorter were all the genes and all the segmental duplications that were not in Venter's genome. So we could not, as a community, have begun to understand breakpoints of genomic disorders and copy number variation without NHGRI's investment in building a better reference. Because what you can't see, you can't assay. So I think. You're defending the need to get complete structure; I'm totally agreeing. My comment is independent of your comment. Let me totally endorse all of what you're saying and then add: we still are not going to be able to reach our goal of interpreting disease without additional things. So all of it is great, but we have an obligation to go all the way, and I was just trying to figure out what piece feels like it's not been discussed in this meeting, that's all. It does, and I was going back through the slides, and again, a lot of the emphasis was per variant: understanding the effect of this variant. And my concern is exactly that: when you take the variant-centered point of view, you don't, for example, have a circuitry-centered one. You don't see what happens across large numbers of interacting things. For example, take cellular circuitry and break it down into a catalog of 2,337 processes, so that we have a finite list of those processes and we understand the context in which that variant functions. A lot of what we have is bottom-up construction from individual variants, interpreting them for patients in the clinic. Again, incredibly important; I'm not arguing against it. I'm just saying that bottom-up inference is going to miss things if we don't also have a sort of top-down completeness of a wiring diagram, and many of the things, even in presentation number two, didn't really get at that higher-order picture. So again, not that we shouldn't do all those things; I'm just concerned. We do have some key players in systems biology. I agree. In this room. Let's start with Manolis Kellis and then Mark. So, first of all, I just want to briefly second Eric's point and say that we had this fifth panel that was never created, and I think systems biology could have been one of them. This paradigm of learning what's common across all of the variants associated with a disease, then learning common properties of those variants, like what tissue they're in, what type of enhancer they're in, what motifs are nearby, and then applying that knowledge back to individual variants, is something that's emerging a lot in our community and something that has been a paradigm of genomics. The fact that you have the whole genome allows you to learn global properties and then go back to the individual regions, armed with those properties, and interpret them better than you could in isolation. I think it's a pervasive paradigm, and I think systems biology and regulatory genomics approaches could be one of these fifth-panel-type recommendations.
Going back to the comment I wanted to make on comparative genomics: consider the heroic effort we saw for the blond hair variant, where that particular nucleotide could only be interpreted through not just sequence-level conservation but an understanding of exactly which regulatory regions are active in these other genomes, what the motifs are, and how they've moved and changed. I think that should be routine. It's something that rests on comparative genomics to provide as a resource to the whole community, just as ENCODE has provided a set of regulatory annotations at varying degrees of resolution and sophistication. I think comparative genomics should have a mandate to provide such a list, so that the next time we find such a motif, we don't have to go through years of experimentation; we'd have the catalog of exactly how all these motifs have changed. And the recent realization, from mouse ENCODE, which is not yet published, that there's a huge amount of regulatory conservation between human and mouse that is not reflected in nucleotide-level conservation means it's imperative for us to develop better methods for understanding regulatory evolution and regulatory conservation, because there's bound to be much more conservation than sequence models allow us to infer; that's what we're starting to realize with mouse ENCODE. And if we had a better way of detecting that, and this goes back to the NSF proposal, then I think we would provide a great resource for understanding disease. You're straying a bit from Eric's primary point, which I completely agree with: there was an awful lot of language in this meeting reflecting the view that the effect of a SNP is this sort of unitary thing that you can study outside of the context of the rest of the genome, as well as of different environments. The genotype-by-environment point was one way we're stepping away from that view, but that genetic variant is also embedded among other genetic variants; there are gene-by-gene interactions and other sorts of things. And the way we word this now is to think of human diseases as perturbations of the networks of genes involved in them; for metabolic disorders particularly, there's a lot of literature there. I do agree completely. Andy, I'm sorry, and everyone, I'm sorry to stop the discussion, but we're well over time at this point and we do need to break. Thanks.