I think most of you know the format now. We're going to have a series of short talks from the researchers and then we will go to Q&A from the phones. We will start off with a brief introduction from Chris Gunter, who's a senior biology editor at Nature. And then we've got Francis Collins, director of the National Human Genome Research Institute at the NIH in the United States, followed by Mike Snyder, director of the Yale Center for Genomics and Proteomics. And then a conclusion by Ewan Birney, who is the head of the Genome Institute at the European Bioinformatics Institute. Okay, Chris, I'll hand over to you.

Thanks, Ruth. As Ruth mentioned, I'm a senior editor at Nature who handled this paper, and I just want to thank everyone for being here today and also want to let you know that this paper will be freely available once it's published, which is later today. It will be freely available right when it's published, which is great for us. We're really honored to have this paper here. I also want to let you know that there will be a web focus about this, which will have a number of links for people that are interested, including an archive of related papers. And this is at nature.com/nature/focus/encode. So that is a resource that will be available to people afterwards. And if you have any questions during the day, you can let me know. I can direct you to the right people or whatever you need. And now we're going to hear from Francis Collins.

Well, good morning. Good afternoon. It's our absolute pleasure to be part of this press conference revealing some very exciting data that's come out of this project called ENCODE, the Encyclopedia of DNA Elements. The National Human Genome Research Institute was the major supporter of this in terms of the funding. And we are elated with the results that have come out of this, in the way that it has annotated, in a really remarkably detailed way, the function of about 1% of the human genome.
And we aim, based upon that, and I'll come to that in a minute, to now scale this up in the not too distant future and apply the same approaches to the entire human genome, given the success of this pilot project. Sequencing the human genome, which was, of course, the major flagship goal of the Human Genome Project, succeeded in April 2003 in giving us the entire instruction manual for human biology. But it is written in a language that we are still trying to learn how to understand. And in fact, what ENCODE is all about is exactly that: building an encyclopedia that helps us interpret the language of the human genome in a way that tells us what functions are encoded within this remarkable 3-billion-letter script. And that script, of course, is written in this apparently simple alphabet with just four letters, but somehow carries within it all of the instructions necessary to take a single-celled embryo and turn it into a very complex biological entity called a human being. So the encyclopedia before ENCODE was limited pretty much to what we knew about the parts of the genome that code for protein, the exons. What we've learned through ENCODE, and you'll hear about this in much more detail from Mike and Ewan, is that in fact there's a lot more going on outside of those exons that's critical to function. This is, I think, therefore a landmark in our understanding of human biology. I would point to a connection here that might be relevant for today's discussion. All of you who have been following what's happening in the field of genomics will have noticed in the course of the last few months an outpouring of discoveries about genetic variations that are associated with common diseases like prostate cancer or diabetes or heart disease. And in fact, when you look at those discoveries, it is striking to notice that the vast majority of them do not appear to be due to variations in the exons, variations in the coding region of a gene.
But in fact, they appear to be regulatory changes. ENCODE is in fact a very powerful tool now to begin to understand how those regulatory variants may affect function and confer risk of disease. So the fact that these projects have been proceeding in parallel, one understanding human variation, the other trying to understand genome function is going to help us a lot in moving into this deeper understanding of how life works and how sometimes things go wrong and disease occurs. ENCODE is also a prime example of team science at its best. This paper being published in Nature today represents the joint efforts of 35 groups working together side by side from 80 organizations in 11 nations. And all of these groups agreed at the outset to share information, technologies, data, everything, and to get together as part of this consortium focused on this same carefully chosen 1% of the human genome. And I think that has worked remarkably well. I think none of the data that came out of this would have been as rich as it is without the ability to cross-compare between different groups, different technologies, and see what you could learn when you intersect those kinds of discoveries. We are delighted to have had a chance to support this, but I think it's important to say that the success of this now puts us in a position to scale this up. We spent $42 million on the pilot project that's being published today in Nature, and now we are prepared, based upon what we've learned from that, to go from 1% to the whole thing. And in fact, grants have already been solicited and received and are about to be reviewed, and we will make those awards by the end of September. You might wonder if it costs $42 million to do 1%, how can we possibly afford to do this thing? Well, I'm happy to say that the experience that has come out of the pilot project has really helped us understand efficiencies of scale and the technologies have advanced as a result. 
And so we believe that over the next four years, we can apply most of those same technologies with a rigorous approach towards data quality for less than a total of $100 million. So that tells you just how rapidly this has been progressing. And of course, all of the people who are working on other parts of the genome, who didn't happen to be working in a place where ENCODE was studying the details, are looking forward to seeing the ENCODE project come to their town and basically lay out the same rich detail of genome function that we now have for this carefully chosen 30 megabases. So I just want to conclude by saying what a remarkably positive experience this has been in terms of scientific collaboration, and particularly to recognize the leadership of this project, people who basically put their own scientific efforts very much into this collaborative work, recognizing that the output of this would be a collaborative paper occupying some 18 pages in Nature, but feeling that the goals were so important that this was the right approach to take. And I think the results speak for themselves. I particularly want to thank Ewan Birney, who will speak in a few minutes, who has led the analysis effort, a very critical part of this. And one of the delightful aspects of this whole ENCODE project has been seeing how the experimentalists and the computational experts have really gotten together around a very explicit data set and learned a great deal from each other, really moving us into a new phase of computational biology. So I'm delighted to be part of this press conference and I will be happy to answer questions when we get to that point. But now I would like to turn this over to Mike Snyder from Yale, who can tell you something about some of the experimental methods and some of the results.

Okay, hi. It's also a pleasure for me to participate in this conference and tell you my perspective on this.
I think, to echo some of Francis's comments, understanding the human genome is not a simple process. It really required a diverse set of technologies and really a diverse set of researchers to be able to experimentally characterize the genome and then analyze it with the various analysis tools that Ewan Birney will tell you about. And so what was done in this project was really to bring together this very diverse set of tools to focus on 1% of the human genome. Of the 1% that was selected, about half was areas of very high interest where there had been quite a bit of characterization already. And then the other half was randomly selected regions that should be representative of the human genome. And based on the amount of effort that was devoted to this, this is an incredibly large project. And I would say a project of this size and scale had never been done before for this type of work. And just to illustrate this point, over 400 million data points were generated to be able to get the information that led to the maps that are essentially the product of this project. So it required, as I say, the expertise of many, many different researchers and many different groups; over 35 different groups were involved, as Francis said. Now a lot of the projects were really directed at trying to understand which regions of the genome produce the information that ultimately leads to forming cells and forming a human body from a single cell. And these projects were largely devoted to trying to understand where the genes are. And genes, as many of you probably know, encode RNAs that represent the information from the genes that ultimately gets used to form different cells and program our developmental pathways. And one of the goals of this project, and one of the things that came out, was, in fact, understanding where all the different RNA-expressed regions lie in the human genome.
And so where the various parts of genes are, and to get a better understanding of the protein-coding genes. And most important from that part is to try and understand exactly the RNAs they're making. And then also to try and identify new genes and new sequences that are expressed as RNAs. And so one product was, in fact, to basically lay out this information of where the expressed regions lie in the human genome. I think another major emphasis was to try and understand where the regulatory sequences are that control the expression of the genes, so we can understand when the genes are expressed and in which cell types, when and where, and also what happens when aberrant expression occurs, such as in disease states. And so it's very important to map these regulatory sequences. They're very hard to find and you need comprehensive experimental tools to be able to find these things. And basically a large effort was devoted to mapping these out. And we now have a much better understanding of where a lot of the regulatory information lies, and certainly of regions that were not known to encode functional elements before, that is, they were thought to be junk DNA. It's now clear that some of this encodes regulatory information. A lot of the projects, as I say, were centered in those two areas. And just to give you an example of two of the highlights that came out of this project, one is that the complexity of the RNAs that are made is much, much higher than had been appreciated previously. So that is to say, I think we knew before that genes can often encode more than one RNA. And one product of this project was to discover that, on average, each gene was making at least five different kinds of RNAs. And the complexity then is much, much higher than people had realized before. Another thing that was appreciated is that, as I say, we can map out many regulatory elements throughout the entire region.
And one consequence of this is that we can now understand where some of the key regulatory regions lie. And there are elements called promoters that lie at the starts of genes. And we now, for example, have mapped out twice as many promoters as people had appreciated previously, in fact, over twice as many. So what's happened is we now have a much better map of all of the RNAs, the information that's expressed from the genome, and also where the regulatory sequences lie. And so we have a much better understanding of where these lie as a consequence of this. And in my view, analyzing the human genome is a lot like looking at a sports car. When you first look at it, it looks pretty simple and elegant, but as soon as you start prodding around under the hood, you realize exactly how complicated this is. It's very complicated, and it's a lot of fun to actually try and sort this out. Now, a few other things to emphasize that made this project possible. One is that a lot of the technologies that were used to be able to analyze the human genome were all essentially invented over the last seven years or so. So this project probably could not have been carried out in the previous century. It really had to be done in this one. And so there's been a lot of new experimental methods that were all brought to bear, and they've all been invented quite recently. Another big challenge is that because so many different researchers were involved in generating different kinds of data, we really had to establish standards. And as Francis indicated, it required a big community effort to be able to compare data sets and work with us. And Ewan will probably talk about this further. But basically, what we needed were standards so that we could actually compare the results from the different groups and get them integrated so that they made a lot of sense.
It would be like two different people playing a sport: if they didn't have the same rules, it would be a mess. And in this particular case, we had to set the same rules so we could compare results and such. And so that's all I wanted to say at this point. And obviously, once all this data were generated, it required an incredibly intensive analysis. And that's actually the effort that Ewan has very, very nicely led. So with that, I'll turn it over to Ewan.

Okay. So thank you to Francis and Mike. I'm going to reiterate many of the points previously made. And just to put myself in context, I'm really a computational person. I was, in fact, trained as a biochemist, but I am one of these computational people who was brought in, in some ways, to analyze the data. And there was a great interdisciplinary team spanning many, many countries and spanning groups, some who generated the data and some who brought their own sort of statistical methods. And we went from physicists through to mathematicians to biologists in the entire group. So I think the last kind of big experiment across the 1% of the genome was finished in early 2006. And long before that last experiment was completed, we established many analysis groups that worked together. We met physically a number of times, but mainly this was done by email and phone conference. And over about two and a half years, we therefore analyzed this data in quite a lot of depth. And it is very daunting to get your head around this set of results. And I'm afraid part of that is just that we are complicated creatures. Although our genome is simple to write down, there are only four letters, we are complicated. And it's not surprising that the understanding of this is also a complicated process. So what I'm going to give you now is this huge 10,000 mile view of what we discovered.
And the main thing is that if you go back 10 years, when people first started sequencing the human genome, they were surprised at how little DNA was involved in protein-coding regions. So only about one and a half percent of the letters actually make proteins at the end of the day. And until that point, proteins were the main thing that people understood. And people rather dismissively called the rest of it junk DNA. Now I think everybody who was in genomics knew that this stuff wasn't hanging around for the hell of it. And what the ENCODE project has really underlined is that the junk is not junk. It is very active; it does a lot of different things. And as Mike mentioned, one of the big surprises was that the junk, the intergenic regions between genes, seems to be alive, not only with regulatory regions, which we suspected, but also with transcription. There's a lot of transcription that happens across the genome, far more than we thought of previously. Now there are a myriad of little stories, and depending on your biological geekiness, you get excited about these stories at different levels. So some people genuinely get excited about the fact that, say, for example, for early replicating regions of the genome, rather, sorry, late replicating regions of the genome, we have discovered a histone modification, which is histone H3 lysine 27 trimethylation, that is correlated with that. Now that excites, would you believe, some people in the biological community. I really doubt it's going to make front-page news, but it's one of those kinds of details in understanding that is coming out from this process. But in making all of these detailed investigations, we also discovered a real conundrum. And the conundrum was that, alongside this experimental evidence on the human genome, we also generated the most in-depth comparative mammalian sequence.
So not only sequences from genomes which we have genome-wide, such as mouse and rat, but also some weird hedgehogs, the platypus, the baboon; a bunch of other things were targeted, and we got targeted sequence just across this 1%. And this allowed us to define the set of bases that are important for mammalian evolution, where seemingly all mammals have needed this piece of genome to do something, and therefore it has resisted change over evolution. And the surprise was that many of the functional elements that we found on the genome were not conserved across mammals. And that was against our expectation going into the project. We expected maybe a little bit of things that were sort of human-specific, but for many of the regulatory elements, and even more for this RNA, we're finding between 50% and 70% of those functionally defined regions as not being conserved across mammals. Now there are certain worries about whether there's some technical issue here. I will save you the details and reassure you that this problem is not a technical problem. And so you have to have an explanation for what these additional experimental elements on the genome are doing that are not being used in other mammals. And one explanation is perhaps these are all the things that make humans humans. But to be honest, there's a lot of them, and that's against our sort of basic understanding of how common mammalian biology is. And it doesn't really fit with the diversity of different elements, many of which seem to be doing very basic, not merely mammalian biology things, but vertebrate biology things. And so we think quite a few of those are what's called neutral. That is, they appeared by chance over evolutionary time, so a sequence mutated to form a new structure that bound a protein to this DNA sequence. But that was neither to the organism's benefit nor to its hindrance. So it just hung around for a while.
I often think of these as kind of gatecrashers at a party, and they're just hanging around, taking the drinks, looking at the scenery, and not necessarily involving themselves too much in the business of a meeting or something like that. And so it feels like we have quite a few of these gatecrashers that are hanging around over time. And that aspect, of many of these elements being neutral, neither positively selected (that is, important for the organism) nor under negative selection (that is, if you remove them, it will cause the organism harm), is quite an interesting shift in perception for many biologists as well. Now we're not sure what the proportion is, how many bystanders there are versus how many of these elements are doing very specific and important things. This is more a question of speculation and taste at the moment, but it gives you an insight into what sort of new discoveries we're learning with this data. Now, even though one can get quite excited about the sort of new bits of biology one's trying to understand, let me emphasize that the utility of this data is much more basic. It's about what Francis mentioned, about regions of the genome that come out of whole genome association studies as genetic risk factors. So here there may be an element that predisposes someone to diabetes. One variant means that you're high risk for diabetes, another variant means you're low risk. And as Francis mentioned, many of these regions are not occurring in protein-coding regions. In fact, they're occurring out in these regions that people previously thought of as junk. Now, what we have is an incredible catalog of elements to be able to start to say why it is that this part of the genome is changing the risk.
And I think as ENCODE goes across the genome, we'll be providing researchers with a broader and broader set of annotations to understand how their biology, and in particular how disease biology, really happens, and therefore hopefully get more insight into how to cure those diseases. So I think I'll leave it there. And I guess it's back to you, Ruth.

Thanks, Ewan. And just an apology, I stumbled over Ewan's title. He is Head of Genome Annotation at the European Bioinformatics Institute. So now we'll go over to questions from the phone. Just to remind you, if you could say who is speaking when you answer a question; and also, to the journalists on the line, it can be easier if you direct your questions to a particular speaker. So yeah, over to the questions.

Thank you. Ladies and gentlemen, if you would like to ask a question, please press seven on your telephone keypad. If you change your mind and wish to withdraw your question, please press seven again. All questions will be asked in the order received and you will be advised when to ask your question. The first question comes through from the line of Rick Weiss of the Washington Post. Please go ahead with your question.

Hi, thank you. I have two questions. One is, I'm wondering if the new revelations that sequence alone is only a small part of what makes DNA behave as it does undermine the value of the relatively simple genetic tests that are now being produced to help people predict disease and recognize new therapeutic approaches to diseases. And secondly, I wonder if someone could address the question of what this means for the definition of the word gene.

I think, Francis, you should take the first one and I'm happy to take the second.

Sure. Good morning, Rick.

Hi.

I think the question you asked is a very interesting and appropriate one, basically reflecting the increasing recognition of the importance of epigenetics, that is, the things that modify the function of DNA but don't actually change its sequence.
And in fact, ENCODE, as being published today, is perhaps the most insightful, detailed look we've had at epigenetic changes that affect DNA function. Many of the things reported in this paper fall into that category, especially the ones that look at binding of modified histones, which basically determine whether a particular area of DNA is available for transcription or not. DNase hypersensitive sites are another very useful indicator of whether a particular part of the genome has an active functional role. All of those things are being cataloged by ENCODE on this 1% of the genome in a much richer degree of detail than had previously been possible. But your question really was, what's the relevance of all that then to genetic testing? If we're talking about a hereditary contribution to disease, there may be a few examples, and you can count them on the fingers of one hand, I think, where epigenetic changes are proven to be heritable, where you have a DNA methylation change that predicts a disease which appears to have been passed through a family. There's a couple of instances of that, for instance, with colon cancer. And even there, there's some considerable skepticism about whether what is being detected in terms of an epigenetic change of the DNA is actually reflective of some sequence change a little further away that got missed. So I think when it comes to hereditary predisposition to illness, DNA sequence is still going to be an extremely valuable window into that, if we can build the catalog of information about which sequence changes are associated with which risks. And that's not going to go away. When it comes to another approach, as one might want to take, for instance, in assessing whether a particular tumor is going to be a bad actor or whether it's going to be relatively easily treated, then DNA sequence will be very useful, but certainly we will want to be taking advantage of all of these epigenetic marks that tell you what's going on at the functional level.
And of course, you can already see that happening with some of the array studies that are used to tell whether a breast cancer, for instance, is going to need chemotherapy or not. Those are basically looking at gene expression. That in turn is an indication of the epigenetic state of the genome. And when you combine this, ultimately, with both an epigenetic look and a DNA sequence look, then I think you'd have the richest possible array of data for looking at something like cancer. But again, to come back to, I think, the fundamental question about whether DNA sequence data is going to be useful in the future: certainly for hereditary risks, it will remain the mainstay. And you had a second question.

Yeah, about genes.

So let me answer, let me try and take that one on.

I mean, is it Mike who's speaking?

No, it's Ewan who's speaking. Okay. Yeah, sorry, Ewan speaking, yeah. This is one of those sort of Wittgenstein-like questions about how one uses language. Before molecular biology was even invented, the word gene was around. And it meant something then, and that was about the way information was transmitted in what were very discrete units as one monitored the information between generations. In the revolution of molecular biology in the 70s, it was quite impressive to see that that basic genetic definition of the word gene matched seemingly a very clear-cut molecular biology definition about a set of RNA transcripts that are on a particular point of the genome. There may be multiple transcripts, but they're all localized in one place. Now, what we've shown here, I think, and there's a number of other papers that have supported this, is that that view of transcription is not really correct. Transcription looks like a much more complicated, intercalated, inter-networked set of transcripts, some of which we had previously recognized as protein-coding transcripts.
Now, the fact that transcription is much more complicated, I don't think removes the concept of what a gene is. There's still a rather discrete unit that gets passed from generation to generation, and that evolution seems to care about. When we look across usually quite divergent species, this concept of a gene still seems to be a useful one. What I think is going to change is that the concept of a gene is going to have less of a clear-cut correlation to the molecular biology, and so we'll just have to use our words carefully and know when one's talking about transcripts and transcription, and when one's talking about what's always been a much looser concept, that of a gene. I hope Mike would agree with that.

Yeah, I think that's quite true.

Thank you.

I would like, if I could, to add something to Francis's comment, though. This is Mike Snyder speaking. What this project really does is help define the functional elements, which helps us zoom in on where to look for differences in sequence that might relate to disease. So I actually think this project helps us interpret the genome a lot better, and when combined with the sequence, it'll really further our ability to zoom in on the relevant regions that might be leading to disease. It's something Francis touched on, but I want to emphasize it.

I totally agree with that, and I think, again, what's now happening with these discoveries about genetic variations in common disease is illustrative. When people report, as they have been doing in great profusion, variations in the genome that clearly are predictive of a heightened risk of a disease like diabetes or breast cancer, basically you're identifying a sequence variation that somehow, some way, confers that risk, but to understand the functional elements of the genome through this encyclopedia is the critical step to try to make sense out of what is otherwise a statistical statement.
We want to go from statistics to functional understanding; this is the way to do it.

Thank you. The next question comes from the line of Roger Highfield from the Daily Telegraph. Please go ahead with your question.

Hi, yeah, we've had all these genome-wide studies coming out in the last couple of months, and, as you said in the sort of preamble, some of them are linking to these regulatory regions that we used to write off as junk. The fact that they don't link to coding regions, do you think that's going to set back efforts to turn understanding of these genetic linkages into treatments?

So this is Francis Collins. Let me take a whack at that and see if others want to add. I don't think this is particularly surprising. We expected that variations that play a role in common disease would be relatively subtle. These were not expected to be knockout blows sustained by genes, as you might expect to find, for instance, in a highly heritable condition like cystic fibrosis. We expected these would be subtle changes that altered the function of the gene a bit, enough to confer a heightened risk but not a certainty by any means of illness. And I think there was a debate, until we had the data, about whether those would be subtle changes in the coding region or subtle changes outside the coding region. And clearly, as the data is pouring in, it looks as if the majority of them are going to be outside the coding region and in the regulatory regions. Now what that means for treatment, I think, is hard to predict. One can imagine that this could actually be a good thing, because it would tell you that there's a subtle tweaking of the expression of that particular gene, and therefore that particular protein, in a person at higher risk. They're making a little too much or not quite enough, and the ability to compensate for that by providing a drug, a small molecule, is certainly something one could approach with some enthusiasm about the likelihood of success.
You could contrast that with perhaps a circumstance where the finding in a particular disease is that the protein being made is toxic in some way, and that might be more difficult in some ways to try to compensate for. What it does mean, and I think this is a serious part of it, is that it's going to take us a little longer perhaps to sort out exactly what the mechanism of disease risk is for those situations where the causative variation is not in the coding region but somewhere nearby, because we'll have some work to do to figure out exactly how that works and what the consequences are. Is the gene expressed too high, too low, in the wrong tissue, at the wrong time? What's the actual detail here that a drug would need to compensate for? But that can get sorted out, and once it has been, it is not clear to me that the pathway towards therapeutics is going to be particularly altered in terms of the time required depending on whether it's a regulatory or a coding change.

Actually, could I just ask one little follow-up? I remember famously when you unveiled the genome in the White House, you likened it to an occasion of worship and you said it was humbling and awe-inspiring to look at something that was only known to God. Francis, why did God make it so awfully complicated?

Well, I think we are intended to be complicated and we obviously are. And I think you can't imagine a circumstance where a one-dimensional script of three billion letters would be sufficient to generate the kind of awesome complexity of a human being without a great deal of elaborate regulatory network operating upon this instruction book.
So I don't think it should be considered surprising at all that what we have uncovered is in fact complex, and I think we all are, from whatever our philosophical perspective, rather awed by what we're seeing, perhaps a little daunted by the complexity of trying to understand it, but also I think feeling really fortunate to be here at this point in history, looking for the first time at some of this amazing complexity of how the human genome works. Thank you. The next question comes through from the line of John Lauerman from Bloomberg News. Please go ahead with your question. Hi, thanks for taking the question. So two questions. Does this term now or is there some DNA that we still can call junk? And then I have a second question, which I can wait to hear. You got cut off, I couldn't hear. Okay, we missed the first part of the question. Okay, so the first question is, does this mean that there is no such thing as junk DNA? Is the term junk DNA junk itself? Can we just get rid of that term? And then I have a second question, which I can give later or give now. Okay, well, this is Ewan Birney. I never liked the word junk for DNA. It's a very dismissive term for something that we just basically didn't understand. I still think there's a class of DNA which definitely looks like it's parasitic. These are these repeat elements. And in different genomes, they've had an awful lot of fun copying themselves around and about. Now, as parasites sometimes do, perhaps some of them are useful. There's a nice little sort of side story there about whether they've actually helped us out. But there's certainly a class of elements that biology understands relatively well, which is parasitic and copied multiple times. For the rest of the genome, I just don't find the word junk very useful, and I don't think it is useful to talk about junk. 
What I do think is something to think about is this concept of sort of bystanders, about bits that are along for the ride, are just sort of hanging out at the moment, but in some sense may become useful in the future. If we sort of run evolution forwards a couple of million years, then maybe that region of the sequence may become useful. And that concept of neutrality, which has been around for a while in thinking about the way proteins evolve, is I think a new idea as a more general thing about how these different elements appear and disappear. So I guess, can I just ask for clarification here? What percentage of the genome would you say that we now think is actively involved in making a human being compared to what we thought before? I don't think we know the answer to that, so there's still a lot more work to be done. This is Mike Snyder speaking. I mean, it's clear that 5% of the genome is constrained, meaning that it cannot evolve, so one could use that as a number, but I don't think we really know for sure. Again, 1.5% of it codes for protein, a certain fraction of it encodes regulatory elements, and we have not mapped all of the regulatory elements yet in the human genome. This is I think a great first start, but there's still much more to be done. And although there are a lot of these new transcribed regions and new RNAs that are appearing, we still don't know exactly what those do, whether they have a function or not, and whether they are contributing. So it's still not clear, there's still a lot more to be learned. Yeah, this is Francis Collins, let me try an analogy on you. 
I think the point that Ewan made both in his prepared remarks and in the answer to this question is a really critical one, maybe one of the most surprising fundamental findings of this paper, namely that there is a lot of biological activity going on in the human genome for which we cannot show evolutionary evidence for constraint, suggesting that this is in fact sort of like clutter in the attic. It's not the kind of clutter that you would get rid of without consequences, because you might need it. And if natural selection comes along and needs to operate on something, you're much better off if you've got clutter in your attic than if it's spick and span. But then we come to a definitional question. I think what you could say is that most of the time, the human genome is operating on the first and second floor, with maybe 5% of the genome doing whatever needs to be done in terms of daily activities. But over evolutionary time, a much larger fraction of the genome, the stuff that's up in the attic, becomes important and is probably responsible for having got us to where we are in terms of complexity, and it's still there, waiting for natural selection, perhaps, to call upon it. Okay, and my second question is: over the years, we've seen various comparisons of humans, from one human to another, and the amount of genetic similarity among humans, as well as the genetic similarity between humans and non-humans. And I'm wondering, does this change any of that? Does this mean that, in fact, we're much more varied from one person to another? Is that a possibility? I mean, this is Ewan Birney here. As part of being head of sort of genome annotation, I kind of look after many, many species at the EBI, a whole range of little beasties, from chickens through sea squirts and other things. The striking thing about humans is how similar we are. 
We are remarkably similar. Compared to nearly any other sort of species that you go out and look at, we are incredibly similar. And not only are we incredibly similar, but to be honest, it's my opinion that the only sensible way to view our genetics is as one population. Although there are certain aspects of that where we pick up on certain things that we feel are important, such as skin color, when you actually go back and you look at the genetics, these distinctions are not useful distinctions for understanding the genetics. In the way we think about the species, we just seem to be one population. Now that said, it is the case that these new regulatory elements show more diversity than the protein coding genes. That is either because they genuinely contribute to more of the sort of functional differences between individuals, or because many of them are perhaps these bystanders, the clutter in the attic, and it doesn't matter that they're more diverse than the bits on the ground floor, which in Francis's analogy have to be quite locked down. You can change your attic without sort of much consequence. But even though these are the regions which show more diversity, that's got to be seen in the context of a remarkably small amount of diversity across us as a whole species. And as a population, we really do I think look like one population far more than we look like separate populations. And that's sometimes hard to get your head around, but that's the way the data is. So no increase really, no change in the amount of genetic diversity between say one human being and the next. Not really. I mean, we do see these regulatory regions showing more diversity, but I wouldn't want to interpret that as some strong implication about what the differences between individuals are. 
And any time we talk about the differences between individuals, I think the thing to absolutely stress is that we are far, far, far more similar to each other than we are different. And you should meet a sea squirt to really understand what differences between individuals mean. And that's a whole new world out there. Thank you. Thank you. We have no further questions. Just a reminder, if you would like to ask a question, please press seven on your telephone keypad. Okay. We have no further questions from the phone. Okay, great. So thanks so much for dialing in and a big thank you to all the speakers. Just a final reminder that the embargo on this is 6 p.m. London time, 1 p.m. US Eastern time today. Thanks very much. Thank you. Thank you. Thank you.