So while people are digesting their coffee, one of the challenges, Debbie, that you alluded to, and Wendy as well, is that these do require very large collaborations, and publication is the coin of the realm here. There is great angst about how one gets appropriate credit when you are the 103rd author on a 200-author paper. Debbie, you were involved in several of those, from the HapMap and so on. Can you give some comment as to how that has affected the careers of the people who were involved?

It hasn't really affected their careers that much, because when they've participated, they've been full participants. A hundred people worked with them on this, and they could ask any one of those hundred individuals for a reference letter for promotion or anything else. People know exactly what each person contributed, because it's out in the open and it's clear. Also, people can come in from the outside and say, hey, I have a great idea, I want to try this, and there is a feeling of, ooh, this sounds really good, and people inside say, well, let's have this person come in and collaborate with us too. They don't even have to do that; they can just do it on the outside, because the data goes into the database and anybody can touch it before publication. What I also want to say is that the journals have been respectful. You may not know this, but many papers were submitted on the human genome sequence before the people who sequenced the human genome actually had a paper, and they were similarly rejected from the journals. So this does happen, and by forming large consortia, the major journals are all aware that this is an issue, and they make sure they quality-control what comes in to them from a particular group. So I think people should think about that. Do you want to comment on that?
Well, I think the promotions committees are hopefully coming to understand the changes in the way first authorship works, and that these are large collaborative studies to which many people have contributed significantly. Now, with NIH allowing multiple PIs on a grant, we're also seeing more and more manuscripts with multiple first authors, with asterisks saying these two or three authors contributed equally. So I think we're evolving in that arena, and I'm optimistic it will continue to evolve as we continue to have these large collaborative efforts.

Hi, Vasant from Framingham. Just to add to the discussion, I think one of the groups we need to protect is the young investigators, because they are evaluated on the basis of first authorship. The consortium model works very well for the senior people, so in parallel with that progress, some of the evaluation criteria for young investigators also need to change, so that they are not based solely on the number of first-author publications they have. Otherwise they will be left out of this important field, which is moving ahead. Some of our fellows have raised this issue: when we participate in this, what do we get out of it? Would we be the first author? That really is not very practical when we are talking about the consortium model. So it would be nice if there were strategies to protect the careers of young investigators who are interested in genomics and population epidemiology.

One other comment is that there are going to be so many things to write about, because there's so much data, that there should be something for everyone to be able to have a first-author paper on, even if it's not the major finding that comes out of the collaborative work. In MESA, for example, we have hundreds of paper proposals, and then we count the number of papers that come in.
So there's a wealth of potential opportunities, and although we all worry about it, in reality there's more than enough for everybody to do. I just want to reiterate that last point: there are so many paper opportunities in these types of studies, there's no limitation. Many of the junior people in genomics have as many first-author publications as they can write. And they participate, as junior investigators, at a level that, frankly, many people would kill to be a part of. So I think there are many advantages for young investigators.

Yes, on the floor, please. Hello. Dr. Dickerson talked quite a bit about non-coding sequences that may regulate transcription and translation. In understanding these better, what new techniques do people need to think about to start understanding the importance of those regions?

Well, I think we need to tackle some of those regions. Unfortunately, we tend to tackle the regions that have a gene first. We can often predict risk without really addressing mechanism, even though we know mechanism is important for treatment. For example, in the CAD locus that everybody saw, there's a microRNA and a couple of genes, but it's mostly non-coding sequence. We don't even have a complete picture of the sequence variation in those regions. For example, we've never sequenced complement factor H in multiple individuals with macular degeneration associated with complement factor H. So we need to go in and look at the variation in this region, get a more complete spectrum, and then begin to apply some of the tools we know, like transcription profiling across it (though we can't get all the tissues), computational predictions, multi-species comparison, and then begin to actually do animal model studies on some of these if we can. Those are very costly studies, so we need to know a lot more before we do that.
But I think with projects like ENCODE, we go across and apply everything that we know in a very large-scale way. It won't be perfect, like the HapMap, but it will capture a lot of what we know. But we need to know more about even the variation we have inside the genes and regions to explore that.

Debbie, could you comment a little on how population studies might be useful in this kind of work? For instance, in the sequencing you talked about looking at the tails of the distribution, but are there other ways? People who have one variant, whom you could then consider almost a knockout, a human knockout, and look for other sequence variations in them?

So it's unclear, actually, how to do this sequencing right now. Even when you have a hit, do you just take your affected allele from multiple individuals and compare it to what you consider the wild type? And what is the wild type? I'm not sure I know what a wild-type human is. When I look around the room, boy, we're wild, I'll say that. Me too. So I think knowing what to choose is hard, but there are ways of using multiple populations. If you have a phenotype that's present in multiple populations, for example, you can exploit the fact that correlations among SNPs differ between populations: individuals of African ancestry have less correlation among SNPs than individuals from populations like Europe or Asia. So you could think about looking across populations to narrow regions. But I think early on we should sequence as many people as we can afford. That's one of the areas where people are getting all these hits but not really thinking about how to actually go in and look at these regions. We're just going to start piling these things up and not really have a clue what the actual functional alleles are.
And I think people will believe that the functional allele is the SNP that was on that chip, and most likely it's not. That's an excellent point. And I think, too, we ought to be thinking about how we can use population studies, in addition to sequencing, for things like transcriptional profiling. And I confess I don't know what samples you even need to do that. Are those just DNA samples you could get, or do you need to stimulate someone? Or is it RNA? No, no, no. Most of the time what we have is B-cell lines, sometimes, or blood. And that's just one type of tissue. We don't often have cardiac biopsies of the same people we're studying for cardiovascular disease, for example. It'd be great if we did; I'm not sure we'd get too many donors. But we do need to think about making repositories of some of these tissues over time, so that, for example, even if we had 100 individuals with a complete genome scan, we could think about expression profiling on those 100, and that might contribute more to the knowledge database using the best technology of its time. These could be independent studies that we develop over time. We don't know what the appropriate tissue is, because when we get our hit we're not really sure where it's expressed, although we know expression is very pervasive: almost every region is expressed, and we have no idea whether that's meaningful.

Yes? That actually brings up my question, which is: what about the role of abnormal methylation and these epigenetic kinds of mechanisms? That is something the ENCODE project is getting at, profiling on a large scale. They have the patterns of methylation; they have almost every pattern on 1% of the genome. What they did there was whatever they could at the time, and they're now thinking about how they could extend all of this across the genome.
Now, methylation is also very tissue-specific. There are many things we need to get at in tissues that we're not getting at right now, because we don't have the archives of the tissues. But I think if we could develop even a small collection of these tissues over time, it would be extremely important. I wouldn't be surprised if eventually there were something that allowed people to donate their bodies to science for a genome anatomy project, something like that.

Yes, please. Can you comment on the portability of the HapMap for the populations not represented? Yes. So there are a couple of things I should say about the HapMap. The HapMap gives very good coverage of many different populations, and people have looked at that coverage going to different populations. Is it effective for other populations? Its transferability drops a little, maybe by 10%. But what's important to know is that the chips are not actually mirror images of the HapMap; they also have a drop in how much information they capture. So you get drops in coverage, like 10% or 20%; the coverage is not as good for all populations, but the drops are about the same from where you're starting. And obviously we still have significant power to detect. If we have a scan on individuals, we can add to it and get better coverage as we understand more about the human genome. The HapMap is just a starting point, not the end point; we can continue to add to our data sets. But people have looked at portability, and it's extremely portable. And we might note that the original three HapMap populations, which were the CEU population, I forget what it stood for, but anyway the European-ancestry population, the Yoruba in Nigeria, and the Han Chinese and the Japanese, have recently been expanded by seven other population samples.
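The 10% to 20% coverage drops mentioned above are conventionally quantified as the fraction of common SNPs tagged at r² ≥ 0.8 by at least one SNP on the chip. A minimal sketch of that calculation, using made-up 0/1/2 genotype vectors rather than real HapMap data, and approximating pairwise LD r² by the squared Pearson correlation of genotype counts:

```python
# Sketch: estimating chip "coverage" as the fraction of common SNPs
# tagged at r^2 >= 0.8 by at least one SNP on the chip.
# Genotypes are coded 0/1/2 (minor-allele counts); all data here are
# hypothetical, not from HapMap.

def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype vectors
    (a standard approximation to pairwise LD r^2)."""
    n = len(g1)
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:  # monomorphic SNP: no information
        return 0.0
    return cov * cov / (v1 * v2)

def coverage(all_snps, tag_snps, threshold=0.8):
    """Fraction of SNPs captured (max r^2 >= threshold) by some tag."""
    captured = sum(
        1 for g in all_snps
        if max(r_squared(g, t) for t in tag_snps) >= threshold
    )
    return captured / len(all_snps)

# Tiny worked example: snp_b is perfectly correlated with the tag,
# snp_c only weakly (r^2 ~ 0.72, below the 0.8 cutoff).
tag   = [0, 1, 2, 0, 1, 2, 0, 1]
snp_b = [0, 1, 2, 0, 1, 2, 0, 1]
snp_c = [2, 0, 0, 2, 1, 0, 1, 1]

print(coverage([snp_b, snp_c], [tag]))  # → 0.5
```

In a real analysis the "all_snps" set would be every common variant in a reference panel and the population-specific coverage drop falls out of rerunning the same calculation with each population's genotypes, since the LD between a tag and its neighbors differs by population.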
Again, I believe these were all trios, if I'm not mistaken. They could be. But it was to get at this issue of transferability. Many people, though, have tested this independently in their populations and have seen that it's quite transferable, and I believe it will be for most of those populations. But we will have actual tests of it, which is really important. And keep in mind, too, that the HapMap was looking at common variation, and that's probably why it's so transportable; it may be that some of the rare variants will not be as...

I just want to bring up something, if I can, on the CEPH families. The CEPH families, for those of you who are interested, were the original families used to build the linkage maps of the human genome. They were carried over into building the association maps of the human genome because we had cell lines from these individuals and unlimited DNA resources, and the individuals had mostly consented to the types of studies we had already planned to do. So it seems like an unusual population, but it was the population that was used for the linkage maps that were developed.

Yes, please. What about the centromeres and telomeres? Is ENCODE going to try to tackle those at all? The structural variation project may get at that, because there's a lot of structural variation there. Those are probably the most difficult regions for us to tackle, and they won't be great regions for the whole-genome sequencing strategies we're coming up with, because they're the most repetitive. Those regions are extremely important, and we're getting more and more information, but they are the tough regions. What you should know about genomics is that we do whatever we can do in the quickest way possible. Getting 80% of the data is always easy; it's the last 20% that takes the next 20 years. Jim could probably tell you what percentage of the genome we actually have in the database now. It's not every single nucleotide, but...
Yeah, right. So, because we're missing some sequences around the telomeres and centromeres, we know genes exist that are not actually on the reference sequence we have. Where are those genes in the assemblies? We know it's not perfect. Over time, though, we go after more and more of those regions. And actually, those regions have been the start of some great genomic careers. I can think of a young person, Evan Eichler, who started out as an assistant professor working on these regions of structural variation and is now the leader of the structural variation project, because that's how he started. He was looking at these regions of the genome that nobody else wanted, these segmental duplications that everybody wanted to stay away from, and he made his career around that. So there are lots of exciting and interesting findings that will lead to brand-new projects in these areas. And I think it's the same way for epidemiology: these datasets provide such an opportunity for young people to really find an area that can be their own and take it forward.

I would agree, and maybe even come back to a comment you made at the beginning of the discussion session, Debbie, when you said that you felt participating in the HapMap had no effect on people's careers; I assume you meant no adverse effect. Because I can think of several very young folks, Mark Daly, Paul de Bakker, Itsik Pe'er, and others, who basically had questions, wanted to play with the data, late at night or whenever, came at it and looked at it in a very different way from anyone else, and really came up with some wonderful insights and perspectives.
And you don't have to, you know, come up with fantastically brilliant things to contribute dramatically; just looking at associations and controlling for confounding and the other things that epidemiologists do extraordinarily well, and that we have the patience to do, is the substance of many, many careers. So I think rather than being concerned about our young investigators, we should find ways to fund them to do these kinds of analyses and encourage them to put their data into databases so that others behind them can do the same kind of work.