I would like to propose that we move on from our invited speakers, as it is high time. We have had a good discussion, but let's move on to the main discussion now and address the specific points that Paul has raised in the context of that wider discussion. I think that it might make sense, as we address what works and what is missing, to divide it into the population variation and then the patient variation, that is, variation associated with phenotype. Maybe if we start with population variation, I think it would be good to get a sense of people's satisfaction with what's available in terms of population variation, both in terms of single nucleotide variants, but also actually all forms of genetic variation. Is it as dense as people would like? Is it in the form that people would like? Does it come with the annotations that people would like? I'm sure Steve would be very interested in what's the single thing that you would like to see that isn't there.

So I just want to add an informatics comment: that the NHLBI Exome Variant Server, which is hosted out of the University of Washington on the SeattleSeq webpage, has 5,200 exomes available for people to look at, to look for their variants.

Naz?

Hi, I'm Naz Rahman from the Institute of Cancer Research. Yes, sort of going on from that, and this question of the things that we often think are known and robust but actually aren't. One of the things that I think often doesn't get talked about, but I think is very important, is the annotation of what the actual variation is. We increasingly talk about how it's going to be easy to detect these variants, but what do we do about them? Actually, the calling of those variants and the annotation of those variants are really, really big problems.
I'm sure anyone who's been doing exomes will have seen this: deciding which transcript to use, deciding how you're going to call it, and then, if you're going to try to get things out of the databases historically, there have been quite a lot of changes in how we've defined those clinically and in which reference we've used, so that if you look historically, for a number of genes you'll see the same mutation called different things, and that's obviously going to impede our ability to maximise the use of that information. I think there's been quite a lot of movement to try to standardise that, and often that's in relation to the genomic reference, and I can see that that's probably the way it has to go. But what it does mean is that you end up with very, very long strings, like rs numbers, that are impossible to keep in your mind in a clinical way, so a mutation that you may feel quite comfortable about is now some seven-digit number and you can't remember it. But I think this is a really important problem: if we can try to standardise now how we're actually going to annotate variation, that's what we're going to need if we're going to clinically interpret it.

Just to follow up that point, is the HGVS nomenclature a comfortable system as an alternative to rs numbers? I mean, it does give you some transcript context.

Yes, I think, well, in some ways. Clinically I think most of us use that. It has changed a lot of things, and it has changed over the years, and I think there are a number of ways in which it doesn't naturally integrate very well with a genome-based system, and I think there needs to be a coming together that will allow that. And it's not automatable in the way, sometimes, that one would want for a simplified system based on genomics. I'm sure a lot of people have different views about it, but it's a problem, and deciding what the gene is, obviously, is a problem.
So to what degree does the LRG initiative that Paul mentioned address some of the concerns that you have about that?

I'm not sure. Paul, do you want to just... I mean, because I sense some doubt out there.

The goal of the initiative is to create a stable reference sequence: something that is not changing, not versioned, not an Ensembl transcript or a RefSeq transcript that over time may have been reported in a way that in fact doesn't map to today's reporting. It gives the ability to report across versions of the human genome assembly as it changes, and to collect information that exists in genome coordinates and put it back into a smaller set of locus coordinates. As I said, this is an informatics solution. It largely does solve these informatics problems, but there are, I think, still challenges to adoption.

Howard?

So I've actually got two points to make now. One is in direct response to this: while I think it would be great to have a single reference sequence, to define it and be done, I'll report on a conference that NHLBI hosted a couple of months ago where we discussed the same issue. The general consensus of that group was that while it would be ideal to have a single, never-changing reference, the realistic thought process is that with evolving technology and ever-increasing knowledge, that may be utopian, and it may require that references be date- and time-stamped and methods of sequence generation be annotated, so that as things inevitably evolve it's possible to go back and look at what it was compared to. A reference standard is critical, but in the real world it probably is going to have to evolve.

The other point that I would make, which kind of follows on that and the previous part of the discussion, is that, thinking as a clinician, I don't really care if it's an rs number, or if it's C677T, or if it's the words "thermolabile MTHFR". What I need to know is how do I translate from one language to another?
So one of the databases that I think is missing is something that very easily lets me move from one to the next, so that no matter what a patient or someone else comes to me with, I can say: oh, that foreign language that you're using translates to this language that I'm familiar with, and then I can go out and figure out what to do with it. I think that that's where the... I'm so glad to hear that you're working on standardised vocabulary and structured language and this sort of thing, and obviously, with you and with NHLBI both working on this, that goes a long way to creating the functionality at the user side to be able to navigate more seamlessly. Because in my vision, the report comes into the electronic health record; there's an interpretation with a link that, again, the clinician doesn't need to understand the underlying language for, but can click on to get to where they need to go to understand how to use it.

So my question is: obviously you're two big players in developing the standardised vocabulary and structured representation. Are there major players who contribute to these types of databases who aren't a part of that effort to develop a structured vocabulary, and what do you think we would need to do to bring them in, so that we can all be working off the same page?

So I think, in general... it shows both the importance of the problem, but there's also an unfortunate nature to it: for almost every big player, there's an effort to create some standardised something. This actually shouldn't be a surprise, but we certainly have worked closely with a number of different groups and we continue to do so. The collaboration between EBI and NCBI, I think, adds weight and a centre of gravity to things that makes it more difficult to ignore. But for us to be effective, we do have to... we can't just sit in a room together, a whole bunch of informaticians, and come up with the answer, because the answer won't be useful.
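The translation problem described here, moving between an rs number, an HGVS description, and a legacy literature name for the same variant, can be sketched in a few lines. This is a hypothetical illustration, not a real database schema: the record layout, the function names, and the transcript version shown are assumptions, with the familiar MTHFR C677T / rs1801133 aliases used only as an example entry.

```python
# Minimal sketch of a cross-nomenclature lookup for variants.
# Each record carries every "language" a clinician might arrive with.
# The table contents and structure are illustrative, not authoritative.

VARIANT_ALIASES = [
    {
        "rsid": "rs1801133",                # dbSNP identifier
        "hgvs_c": "NM_005957.5:c.665C>T",   # HGVS coding notation (transcript version assumed)
        "legacy": "MTHFR C677T",            # historical literature name
        "gene": "MTHFR",
    },
]

def build_index(records):
    """Index every alias string back to its full record,
    so any nomenclature resolves to the same variant."""
    index = {}
    for rec in records:
        for key in ("rsid", "hgvs_c", "legacy"):
            index[rec[key].lower()] = rec
    return index

def translate(index, query):
    """Look up a variant by whatever name the user knows (case-insensitive)."""
    return index.get(query.lower())

index = build_index(VARIANT_ALIASES)
hit = translate(index, "MTHFR C677T")
print(hit["rsid"])  # the same record is reachable from any of its aliases
```

The design point is simply that every alias keys back to one canonical record, so a clinician can arrive with any of the three "languages" and leave with all of them.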
It might be useful, but it will be more useful if we are properly integrated in the right communities to do this. I think I can speak for Donna, though she may want to speak for herself: we have made a number of efforts to contact these groups and we are working closely with them, be they locus-specific databases, people who run diagnostic labs, or people who are also trying to aggregate information for use in clinical interpretation.

So as co-chair, I'm going to, from time to time, pull out something that I at least think might be an important action item to take forward, and it seems to me that this would be something where we would need to explore the role that perhaps NHGRI could play as a convener, to bring the different people around the table, because we definitely don't want to be in a VHS-versus-Betamax situation and let the market decide.

I think that you're absolutely right. We need to get the right people around the table so that we can do it and make sure that we don't have a sort of balkanisation of this whole process.

So before we move away from the discussion of population variation and get into the possibly thornier issues of the clinical information, now is people's opportunity to give their wish list to Steve and Paul. Are there any burning desires for what people want to see from population variation resources?

Bruce Blumberg, Kaiser Permanente. I want to follow up on a point that Mark made earlier: that we're not all from Wisconsin. We need a database that will be just as relevant for underserved populations as for the people who've had their genomes studied so far. And, stating the obvious when I look around the room, the people in this room aren't very representative of the population of the United States either. So we really need to be careful to provide a database that's relevant to all populations, not just white upper-middle-class populations.
So one point I'd like to make, relating to that, is that we saw Les show that a frequency of one and a half percent, I think, was the cut-off used. Now, it's one thing to study a population and be confident that you've discovered all the one-and-a-half-percent variants. It's another thing to be confident that you've got the frequency right, so that when you say it's a one-and-a-half-percent variant, it actually is a one-and-a-half-percent variant. And I think something that needs consideration is the depth to which we go within each population to be able to make those statements carefully.

Just on those reference populations: people have been talking about the reference genome. The reference genome actually now isn't just a single sequence, because capturing point variants is one thing, but there are regions where there are completely alternative alleles in the population, and the reference genome is beginning to capture that in the form of patches right now. So ultimately maybe it has to become a whole graph structure, and that's what you should be doing your analysis on. So there are the individual point variants, but there's also this internal structure of the reference which people should be aware of.

Steve, last word on population variation?

Well, there's another dimension to this that dbSNP has struggled with. There's a tension between trying to be ecumenical for all variation and then realising that some people are interested strictly in germline variation and others in somatic variation, both of which could be benign, just neutral polymorphism that comes up in sequencing. Right now we're trying to put all of it in dbSNP, but clearly partitioned, so that through the downloads you can get at the things that are germline and the things that are somatic as they come in. I'm just wondering, from a group like this: is that the right approach? Are there other categories? We're going to face this again with epigenetic marks coming in.
Is there thinking about how we want the repositories to accumulate these data: separate resources, or one place to go but clearly identified? Just some feedback on that would be helpful for planning.

One point on population databases. One of the challenges we have is that we look toward databases like dbSNP for what the control population is, but in fact a lot of the populations from which that data came are diseased populations, and when you dive into that data it's very, very difficult to figure out where it came from: whether it was part of a disease cohort or part of a true control cohort, which in and of itself is challenging, because sometimes these are younger individuals when you're trying to study a late-onset disease. Ways to have a little better description of the populations from which these large-scale data sets are being submitted would be extremely useful.

Another minor point in using dbSNP in particular: rs numbers are not variants, they're locations, and so we have numerous examples of both a pathogenic and a benign variant mapping to the same rs number, because it's a location, with a C-to-A being pathogenic and a C-to-T being benign, and that makes it very, very difficult to decipher what's in there. So on the population level, those are some of my wish-list items.

Along those lines, it's pretty easy for me to take an rs number, go into dbSNP, find the information I want, and confirm it against the HGVS nomenclature. But my analysis programs don't work that way: they give me the HGVS nomenclature, and I can get to the rs number, but it takes me about three or four clicks rather than one. So even though, yes, I need a translation between the two, as Howard was saying, I'd really like to just work in one nomenclature.
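The point that an rs number names a position rather than a specific allele can be made concrete with a small sketch. The rs number, coordinates, and classifications below are invented placeholders, not real annotations; the structure simply shows why keying on (chromosome, position, ref, alt) rather than on the rs number alone is what disambiguates the two alleles described above.

```python
# Sketch: two different alternate alleles at the same genomic position
# share one rs number but have opposite clinical classifications.
# All identifiers, coordinates, and classifications are made up.

variants = {
    # key: (chromosome, position, ref, alt) -- uniquely identifies one allele
    ("chr1", 123456, "C", "A"): {"rsid": "rs0000001", "classification": "pathogenic"},
    ("chr1", 123456, "C", "T"): {"rsid": "rs0000001", "classification": "benign"},
}

def alleles_for_rsid(rsid):
    """An rs-number query returns *every* allele at that location,
    which is why the rsid alone cannot carry the clinical interpretation."""
    return [(key, rec["classification"])
            for key, rec in variants.items() if rec["rsid"] == rsid]

for (chrom, pos, ref, alt), cls in alleles_for_rsid("rs0000001"):
    print(f"{chrom}:{pos} {ref}>{alt} -> {cls}")
```

Querying by rs number returns both alleles with their conflicting classifications; only the full allele tuple picks out one of them.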
Right, so we move on to the variation attached to phenotype. I want to raise one very high-level point, just to gauge people's views on it. We've been talking about clinically actionable variants, rather than clinically actionable collections of variants or clinically actionable genomes; we've been taking a very variant-by-variant approach. Does it concern people that we're taking a variant perspective rather than a genome perspective? Howard?

So, yeah: duh. Wait, there's more. What I would love to see, of course, is the research to tie together one sequence variant with 10, 50, 100 other sequence variants. But at the next scale up, we need to tie that into epigenomic variation, microbiome variation, oh, I forgot copy-number variation, whatever the next cool genomic or proteomic thing will be. But that's still not enough: we have to tie that to environmental variation (let's not call it the "envirome", please). As far as database construction goes, assuming for the moment that we can snap our fingers and all that data exists, what I would love to see is smart informatics people building these discrete databases in such a way that they all speak to each other seamlessly behind the scenes, so that I don't have to look at it, and I can build my own search clinically, or my own resource clinically, that says: show me the sequence variants and the microbiome variants among smokers, and just let me put it all together and tell me what I want to know, and cross-reference all of that. So that if I want to drill down and get all the way back to the primary literature, so I can ask, as I said, whether that was a reliable study that the clinically actionable information was based on, I can do that if I want; and if I just want to get the answer and go, I can do that too. So, maybe by next week?

Lots of nodding heads when I mentioned the question, though. Howard McLeod from the University of North Carolina. I would strongly encourage us to not build
one database. I think we're trying to build something that is purpose-built for a bunch of different purposes, and I think that was called the Yugo, and it didn't work very well; but even the Hummer is no longer in production. So we can have multiple databases; they just have to interact with each other. There are multiple ways of translating from Italian to English; none of them maps perfectly, but those ways are still there. So I think we need to get away from "one". We're talking about variability, and yet we're trying to build one-size-fits-all. It's okay to have something that's geared towards clinical application and something that's geared towards genomic science, as long as we are thoughtful about how we build it.

So, not to tell you what to do, but to build on that: I think that's a really important point for the meeting as a whole. The purpose, as I see it, is not to try to create *the* database, because that will be doomed to failure, particularly since we don't know where 90% of the heritability currently lies. We're in something like the same boat as the physicists, who have somehow misplaced 95% of the universe. But the issues that are salient, what I would like to see come away from the meeting, are: an identification of what we currently have in terms of resources; where the big hairy gaps are; and, of those big hairy gaps, which are the ones that would be most appropriate for NHGRI to take some ownership of, to say we need to fill that gap, and which are the ones where we'd say this is an important gap but it would fall better to another group, with the underlying assumption that there will be the ability to aggregate. I think that's something that Kent is going to be talking about a fair amount in his talk: how that type of multiple-database integration will work.

I also think it's important to think about how this is ultimately likely to be used. I mean, if this is going to be used in, as I
suspect it will be, electronic health records, which are increasingly commercial products, it's going to have to be done in a way that there is a place these commercial EHR vendors can point their software at and get feeds from on a regular basis. So it's not going to be a clinician going and looking in a database; it's going to be a clinician looking in an electronic health record, where the electronic health record system has taken that data in and, through some magical process that we all still need to think about, has presented them with the information they need to see. So it's really a big step to go from being able to look in Ensembl to actually getting a regular data feed into an electronic health record system that a vendor is going to use.

Rex, I agree with you, but I think I also hear around the room that we still need, or there's a reason to develop, some sort of common resource that holds the common clinically actionable variants to date, something that could be useful for the clinician and that does need access to the EHR. Andrew?

Andrew. Because you have to build that anyway, the underpinnings for any of these pipelines. Actually Ensembl, Paul mentioned this predictor thing, which already is a service that sits on top.

Andrew Johnson, NHLBI. In terms of thinking about areas not currently covered, I think it was alluded to in one talk that phasing is something that is maybe not well captured and could be important, at least for some genes. So I'd be interested to hear from Paul or Steve or others, thinking in a forward perspective: genome sequences may allow us to phase variants over long distances, but then that's going to be harder to store in the databases and will require a different framework.

So, a quick reply. What we're doing now is looking at genotypes as a high-throughput deliverable, orders of magnitude larger than SNPs, because it's the product of every genome that gets typed. But coming in with these data is phase information, and so we are looking
towards storing it as haplotypes, doing data reduction on that dimension as it's delivered. The 1000 Genomes Project is probably the leading project doing this, and with deep population samples it's a little easier than in a clinical context. So I don't think the methods are there yet, but getting the infrastructure worked out, and how we would exchange that information, is the necessary next step, and then I think we can piggyback clinical linkage interpretation on whatever we've worked out for the big population projects.

Bill. Can I say, I like the way this conversation is going, because I think it is going to be a network of networks. We have to understand that we're going to be using modern network technology, just like the finance industry does, just like the entertainment industry does. Well, I don't know... I can put all my financial records into mint.com and get some pretty sophisticated analysis out.

And just two other points related to that. One: let's please not make this dependent on EMRs. EMRs are undergoing massive consolidation; I'm not sure that the ones that exist now will exist 10 years from now. They are not nimble, they are not flexible enough to do this. What they will be is one source of information, a primary source of information that we can combine with other sources.

And I don't think it's too early to put one other resource on the table that hasn't been put on, and that's the citizen of the United States. The citizen of the United States is increasingly going to own, control, and distribute their information, and they're going to do that with genomes. Genome-wide scanning is going to be on people's Facebook pages. To exploit the fact that people have the time and opportunity to do this: sites like Ancestry.com have 26 million family histories that people have invested time to collect and to date, from various sources.
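The phasing question raised here, and again later in the discussion (are two mutations in the same gene on the same chromosome, in cis, or on different chromosomes, in trans?), maps naturally onto how phased genotypes are stored, e.g. VCF's `0|1` versus `1|0` convention, where the pipe separates the two haplotypes. The sketch below is a simplified illustration of that idea, not 1000 Genomes or Ensembl code; the positions and genotypes are invented.

```python
# Sketch: phased genotypes stored per sample as an ordered pair
# (allele on haplotype A, allele on haplotype B), mirroring VCF's "1|0".
# Two heterozygous variants are in *cis* when their alt alleles sit on
# the same haplotype, and in *trans* (compound heterozygous) otherwise.
# All positions and genotypes below are invented for illustration.

phased = {
    # position -> (haplotype A allele, haplotype B allele); 1 = alt, 0 = ref
    1001: (1, 0),   # VCF-style "1|0"
    2002: (0, 1),   # VCF-style "0|1"
}

def phase_relation(pos1, pos2, genotypes):
    """Classify two phased heterozygous variants as 'cis' or 'trans'."""
    a1, b1 = genotypes[pos1]
    a2, b2 = genotypes[pos2]
    if (a1 and a2) or (b1 and b2):
        return "cis"    # both alt alleles on one haplotype
    return "trans"      # alt alleles on opposite haplotypes

print(phase_relation(1001, 2002, phased))  # -> trans
```

This is exactly the distinction the clinical lab cares about for recessive genes: `1|0` plus `0|1` means the two mutations hit both copies of the gene, while `1|0` plus `1|0` leaves one copy intact.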
There is enormous potential out there; they are a resource.

Do you have a proposal for how that potential can be captured?

Sure. You take the capabilities of an Ancestry.com and you begin to transition into medical histories. They're interested in ancestry DNA and ethnicity DNA, which is going to be important, but who knows more about their epigenetic and environmental exposures than the individual does? That individual can annotate their own information, and most people are altruistic: they would like that information to be used, in aggregate, to advance research. So there is an agent out there that can help us build one component of this network of networks, one that has the time and interest, and for whom it matters most, to be able to do it. So I think we have to look outside our traditional scientific spectrum, not just to criticise these people who are offering DNA tests direct to the public, but to say: okay, which of those are people we might be able to work with in the future, to take this forward in a better and faster fashion?

I wanted to make two comments, really, on some aspects of practicality too. I was really struck by the contrast: Howard got up and said the clinician wants it hard, fast, and now, and then Paul gave a talk, and I just thought, gosh, I can see why his wife screams at him. I should say I'm joking, I'm being tongue-in-cheek, but you actually said "using the command line you can customise this in a really simple way". That was a real contrast, because it was exactly not what the clinician was asking for. So what I wanted to ask you, in a really practical way, is this: I know who your audience, your client base, has been up until now, but is EBI beginning to rethink what its client base should be? Would some of those clients start to be more clinicians and clinical genetics labs, as distinct from the research side, and if so, what initiatives are happening around
that? And then, before your answer, just to make the other point, which I think picks up on your point: I think we need to remember today that causation and prediction are two completely different things. They have some overlap, but they are not entirely overlapping. And so one of the challenges will be that, yes, we can look variant by variant and think about causation, but what's more challenging to think about is, as Rex alluded to, not just whether you compute a score from 33 variants or whatever, but that your entire GWAS printout could itself be a predictive tool at that level. And that poses a whole other set of challenges: how you might present, how you might collate, that information. Do you see what I mean?

You're right. The top bullet on the final slide that I had: we are not, we do not want to be, and we are not marketing ourselves to be, a clinical decision support tool. I believe that we should be at the next level down, so that the people who want to build those interfaces for the clinicians find in us basically what was just described: the place that they can point their resources at to get updated information on a regular basis. Now, I'll put it very strongly: if, after 20 years of collecting the world's biomolecular data, this data turns out to be useless for human health, we will have really screwed up. So we are interested in making our resources absolutely as useful as possible for interpretation and healthcare approaches. But, to be honest, there are probably two MDs out of 500 people at the EBI, and neither of them is practising, so our role is deeper in the infrastructure. Generally, we want to make better connections, so that when people ask for things we're giving them the types of answers that they want. It's like the point made about the genome as a graph: I don't even want to consider the genome as a graph, and I'm certain the clinicians don't, but somebody
has to make this work, and that's got to be deeply buried in the system.

That's very helpful. A question from the back, or a comment?

Gail Herman, clinical geneticist, Nationwide Children's Hospital. I think this is only going to work as well as the clinical phenotyping that goes into it, if we want to look at clinical utility. So how do we address getting as much clinical information as possible that's accurate and updateable? If you start with kids, they're going to get older and they're going to have new problems, and you want to incorporate that. I really like the idea, raised over there, of having people put in their own information. It may not all be accurate, but it's probably going to be as accurate as a lot of the information you'll get from other sources. I think clinicians are going to need to educate people about how to put clinical information into these databases, but the clinical labs know that when you send out a detailed form, maybe one in ten times you get the information you ask for, so you can't often go back from the lab data to the clinical phenotyping that you'd like. So that really is a big question about phenotyping.

In the last couple of minutes, we might want to hear voices that we've not heard from.

Thank you very much. My name is John Parkinson, I'm from the UK, and I have the privilege of running a one-million-patient EHR database, soon to be a 52-million-patient EHR database. I would just request, following a comment made a while ago, that we don't generalise about EMRs (I call them EHRs) and throw them all away. There are some very good ones out there, and I think we need to respect that. I think you disagree?

Um, we could disagree; perhaps we should talk about it.

Okay, any last final comments that people wanted to make? A few, briefly, one or two sentences.

Okay. So basically, I got nervous when something I said in my talk about phase, that I need to know whether this mutation and this mutation in the same gene are on different chromosomes, turned into a
discussion about haplotype blocks. And I think that's my concern about the gap between what I'm trying to do in the clinical lab and what people are trying to do at a genome level. Haplotype blocks, unless these two variants on the same chromosome a long way apart actually contribute to the phenotype, I don't really care about. It jumped from here to here, and I'm in the clinical lab. I need to be able to dig down more than what Howard was saying his clinicians want, the two-minute answer; I'm the one having to go in and do that. But I can't go from here to there and sift through all of that data, so I need to be able to do it, not in two minutes, but within 30 minutes, and get it. So it really made me a little nervous to see how a simple little phrase jumped and all of a sudden got to a point where I couldn't use it.

Two other comments? There we go.

Not taking sides in the EMR discussion, I think the broader perspective is that we have to be open to the idea of non-traditional ways to access data, and I agree with Bill in that sense. We've had some discussions with Ancestry.com about incorporating family history data along with the family structure. As we look at some of the information about where the missing heritability is, it seems like at least some of it may be in SNPs that are probably family-specific, so understanding family structures at a genealogical level is probably going to be important, and so is thinking about that type of information. The phenotyping information is critically important, and I think patients can probably supply some of that, but that is a role, I think, where electronic data warehouses generated through EMRs, if we can figure out a way to get the data out, can also help, if for no other reason than to validate patient-entered data. But I would agree with Bill in the sense that we can't lock
ourselves into thinking that the way we've always done medicine is going to be sustainable in the future. It's going to have to be more distributed, and so we need to be able to create an innovation space that allows us to test out different ways to do it and see what works and what doesn't.

I think we're going to have to leave it there, because some of us are fueled by caffeine and we don't have very long to consume it now. So thank you very much for the discussion, and we'll return at five minutes past eleven, I believe.