Great. Can you hear me okay? If we can hear you. Okay. Fabulous. So as Dan said, I'm Marilyn Ritchie. I'm a faculty member at Penn State University, I'm the PI of the genomics portion of the Coordinating Center, I'm a member of the Marshfield team in eMERGE, and I've been part of eMERGE since it started. I briefly want to mention some of the things that the genomics workgroup and members across the sites have done in terms of genomic discovery, and I will be brief because Rex covered many of these. In eMERGE 1, we did genotyping, and some sites started some sequencing. A large emphasis was placed on QC, specifically related to merging the data set. This is, to use Dan's word, one of the unique features of eMERGE: rather than focusing on meta-analysis, we've really emphasized mega-analysis, merging the data set across sites, which has given us a lot more power to detect associations. In eMERGE 2, we've spent a great deal of effort on imputation, and we've started some GWAS studies, some interaction studies, and others that I'll talk about. So as I mentioned, QC was a major emphasis, specifically related to merging data sets; we have multi-ethnic groups and multiple genotyping platforms in eMERGE 2. A real benefit of eMERGE is access to individual-level data across the consortium, which is very different from a lot of other consortia. We've published a number of papers on QC, merging data, and the like. We've done GWAS studies in a number of areas, including binary disease outcomes as well as continuous traits; they went on throughout eMERGE 1 and have started in eMERGE 2 as well. So I think we've had a lot of success in that area, but we want to be able to push new areas of discovery. As I said, in eMERGE 2 we've spent a lot of time on imputation, and the impetus for this was that we had so many different platforms and we wanted to be able to merge across sites. Right now we have a QC-clean version of 51,000 samples, with several thousand more to include soon. These data have anywhere between 8 and 15 million SNPs, depending on what QC criterion you want to use, and we can use these to do a number of different genome-wide association studies. As Rex mentioned, we've spent some time on this null variant idea: looking for variants, perhaps more rare, that are predicted to be null, such as stop-gain variants or loss of a transcription start site. We did this using a bioinformatics approach, and we've looked at both the genotyped and the imputed sets to see the occurrence of null variants. You've already seen this slide, so I won't spend a lot of time, but we wanted to look at where we observe these null variants across the genome, and specifically right now we're focusing on regions where we see stop-gain variants. These are variants where we want to go back to the EMR and do a PheWAS analysis to find out whether we have clinical characteristics or traits associating with some of these null variants. Because they're rare, they may not have been observed much in the literature, but because of the sample size of eMERGE, we may actually have the power and ability to find such clinical associations. As was mentioned, we've been embarking on the PGx project, which is really our first sequencing endeavor as a group.
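As a rough illustration of the PheWAS idea just described for stop-gain carriers, here is a minimal sketch, assuming a hypothetical carrier file and a hypothetical phecode case/control matrix; it is not the eMERGE pipeline, just a single-variant Fisher's exact scan in Python.

```python
import pandas as pd
from scipy.stats import fisher_exact

# Hypothetical inputs (not eMERGE files): carrier status for one predicted
# stop-gain variant, and a 0/1 case matrix of EHR-derived phecodes.
carriers = pd.read_csv("stopgain_carriers.csv")   # columns: sample_id, carrier
phecodes = pd.read_csv("phecode_matrix.csv")      # columns: sample_id, one 0/1 column per phecode
df = carriers.merge(phecodes, on="sample_id")

results = []
for code in [c for c in phecodes.columns if c != "sample_id"]:
    table = pd.crosstab(df["carrier"], df[code])
    if table.shape != (2, 2):
        continue                                  # skip codes with no cases or no carriers
    odds_ratio, p_value = fisher_exact(table)
    results.append({"phecode": code, "odds_ratio": odds_ratio, "p": p_value})

phewas = pd.DataFrame(results).sort_values("p")
phewas.to_csv("stopgain_phewas_results.csv", index=False)
```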
These are the 84 pharmacogenes on the platform developed by the PGRN, and the Coordinating Center has been doing multi-sample variant calling on this set. So far, we have over 2,000 samples that have been sequenced and called as a group. We're doing a lot of quality control, and we will be embarking on some PheWAS and some medWAS or drugWAS association studies with those. These variants are currently available via SPHINX, our public repository, so you can search that site to see the number of variants we have found in each gene for the 2,000 samples so far. What I'll spend most of the time talking about, to try to drive the discussion, is what we might do in terms of discovery in eMERGE 3. I took this discussion to the full genomics workgroup as well, so some of these ideas are coming from the various sites. We think there's enormous potential for future discoveries in eMERGE, and this goes along with some of what's been going on in the chat room. I don't think we want to necessarily just do more GWAS and find more of the small-effect variants that other consortia are finding. One benefit that we have is the current set of 50,000 samples with shared individual-level data, so we don't have to rely on meta-analysis; we can actually do a combined joint analysis of these data. And from the earlier slides, it sounds like there is potential to add another 50,000 samples to this over the next year of the grant. Many other consortia are only sharing summary statistics. And as we just talked about, the EHR enables vast potential for phenotyping, and there are a lot of things that we haven't done much of in eMERGE yet. Part of this is just that it takes a lot of time: it took a lot of time to get the data imputed, merged, and cleaned, and now we're ready to do analysis. The phenotyping also takes time. But because of the EHR, we can start to think about treatment outcomes. We could do disease subsets based on different clinical characteristics; because of the EHR, there are different features of disease that we could start to think about in terms of sub-phenotyping or endophenotyping. I've participated in a lot of GWAS consortia, and often we don't have those other traits captured on samples to do that. Extreme phenotypes are an area that is gaining popularity, especially when thinking about rare variants. We can do things about direction of causality because we have the data over time. And a big thing that's come up a lot is this idea of doing longitudinal GWAS. Every time we present about eMERGE in meetings, we tout how we have this strong EMR and longitudinal data and how we can use that as an advantage in our association studies, yet for the most part we haven't really exploited that; we've only started to scratch the surface of what we could do. In addition to those types of analyses, I think there's a lot more that can be mined out of the data we have. There's a list here, and I'll just talk about some of them briefly. We do have a highly multi-ethnic set included in eMERGE, and so we have the opportunity to look at racial or ancestral disparities in allele frequency for important variants, in addition to disease disparities. We know that there are certain traits where we see disparity based on race or ethnicity, and that is something we could start to explore in much more detail than we have. Structural variants, or CNVs: this is something that we've really just started.
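One way the longitudinal angle mentioned above could look in practice: a minimal sketch of a single-SNP test that uses repeated EHR lab values rather than one value per person. The file, column names, and the LDL example are hypothetical assumptions, and a real analysis would add covariates, medication handling, and genome-wide looping.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format table: one row per lab measurement, with columns
# sample_id, ldl, age_at_measure, and dosage (imputed 0-2 allele dosage for
# one SNP, repeated on every row for that subject).
long_df = pd.read_csv("ldl_longitudinal.csv")

# Linear mixed model with a random intercept per subject, so every repeated
# LDL measurement contributes instead of a single collapsed value.
model = smf.mixedlm("ldl ~ age_at_measure + dosage",
                    data=long_df,
                    groups=long_df["sample_id"])
fit = model.fit()
print(fit.summary())   # the 'dosage' coefficient is the per-allele SNP effect
```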
David Crosslin and his group have led an effort to call CNVs in the eMERGE 1 data set, because we had the raw data such that CNV calls could be made. As part of eMERGE 2, on the last genomics workgroup call, we discussed getting those raw files from all of the eMERGE 2 sites so that we could call CNVs broadly across the network. A lot of work has been coming out lately looking at the burden of CNVs across the genome, and there are different ways that could be calculated; that's something we could do with existing data in eMERGE that we just haven't done yet. Low-frequency variant analysis: certainly within the realm of PGRNseq we have specifically targeted a lot of rare variants by doing sequencing, but even within the imputed data we have a lot of low-frequency variants. In many association studies, people tend to apply a minor allele frequency filter and only look at common variants. I think we have the opportunity to look at rare variants and consider doing burden-based analysis: bin rare variants based on different functional characteristics, gene or regulatory regions, et cetera, and analyze the data that way. We've only recently started exploring gene-gene and gene-environment interactions. This is something we're seeing more and more of in the field; people have been suggesting that interactions may be important, but because sample size has been limited, there hasn't been statistical power to find interactions. In eMERGE, we have actually published a few from the Marshfield project looking at gene-gene interactions in cataracts. We've also done gene-gene interactions in lipid levels, and very recently we've started doing some gene-environment interactions as well. The gene-gene interactions that we're finding for cataracts are actually replicating across the network: they were discovered in Marshfield and then replicated at the other sites. So I think this is something we should be pushing more of, and I've started collaborating with CHOP on some of their interaction analyses, and I think we're going to bring some of those to the rest of the network. Pathway analysis has become much more popular in the field, as has integrating functional data from ENCODE to help target regions of the genome to look at and get away from this protein-coding-region emphasis, especially when looking at polygenic or interaction models. Integrating other epidemiologic data: a lot of these academic medical centers are positioned in areas of the country where vast epidemiologic data are being collected by other units. There is a lot of GIS data, there is air pollution data, and there are data on heavy metals that the CDC and other epidemiologic units are collecting. We could integrate that information with the data we have from the EMR and use it toward better gene-environment analyses, which I think would be really useful. And then we've also talked some about other molecular data, and I won't spend much time on this because Debbie is going to talk more about it, but I think there's a lot we can do with the existing data in eMERGE without generating more data. Of course, if we had the opportunity to generate more data, whether RNA-seq, methylation, more sequence data, or more targeted high-throughput genotyping platforms, and especially if we started considering RNA-seq or methylation, we would really need to consider different tissues rather than just DNA from blood.
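The burden-based binning of rare variants described above could look roughly like this minimal sketch: collapse rare alleles per gene into a single count per sample, then test that count against a case/control phenotype. The file names, column layout, and the 1% frequency cutoff are illustrative assumptions, not eMERGE conventions.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical inputs: a 0/1/2 genotype matrix, a variant annotation table
# with gene assignment and minor allele frequency, and a phenotype table.
geno = pd.read_csv("genotypes.csv", index_col="sample_id")    # columns = variant IDs
anno = pd.read_csv("variant_annotation.csv")                  # variant_id, gene, maf
pheno = pd.read_csv("phenotype.csv", index_col="sample_id")   # case (0/1), age, sex (0/1)

rare = anno[anno["maf"] < 0.01]                               # the "rare" bin
results = []
for gene, grp in rare.groupby("gene"):
    variant_ids = [v for v in grp["variant_id"] if v in geno.columns]
    if not variant_ids:
        continue
    burden = geno[variant_ids].sum(axis=1)                    # rare allele count per sample
    X = sm.add_constant(pheno[["age", "sex"]].assign(burden=burden))
    fit = sm.Logit(pheno["case"], X).fit(disp=0)
    results.append({"gene": gene,
                    "n_variants": len(variant_ids),
                    "beta": fit.params["burden"],
                    "p": fit.pvalues["burden"]})

pd.DataFrame(results).sort_values("p").to_csv("gene_burden_results.csv", index=False)
```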
But if we added those data to what we have molecularly in the GWAS and imputed data in eMERGE, as well as the PGRNseq data, it would add great opportunities for discovery, coupled with the wealth of information in the EHR that we just haven't mined to its full capacity. So to summarize, I think we as a group believe there's a lot more discovery that can be done in eMERGE, and as part of eMERGE 3 these discoveries can be made by performing much more comprehensive analysis with the data that we already have. I think we'll see a lot more of that in this last year and a half of the network as it stands now, because it took us quite a while to get the data imputed and cleaned; there really was not a literature out there on how to merge data sets that have been imputed, since the field has focused on meta-analysis, so we really had to pave the way on how to do this. We're actually writing that up now as part of a special topic in Frontiers in Genetics, so that other groups interested in doing more of a mega-analysis instead of a meta-analysis will know the lessons we've learned in eMERGE. And if we perform more data generation via sequencing or other technologies, of course that would allow us even greater opportunity for discovery in eMERGE. I think we as a group should try to incorporate more types of data, such as data about the environment, or at the very least consider multiple phenotypes or sub-phenotypes within the EHR. Some of the groups have been working toward this end, but I think in eMERGE 3 it would certainly benefit us to allow for methodologic approaches to analyze these types of data; they're needed, and they're critical to really push some of this forward. Some methods are out there, and some of us within eMERGE are working in that space, but in order to really capitalize on the data in front of us, I think bringing in new ideas is going to be to our benefit. And I will stop there.

Thank you, Marilyn. We'll pass the gavel to Debbie Nickerson as the reactor.

I was just hoping I wouldn't have pushed the wrong button. I think I need to get off of here. Is that all right? Is that better? Yeah, it's fine. Is that better? Yes. It's certainly audible, and we have your slides up. Okay, so I'm getting a lot of feedback on you. No, we're hearing you clearly.

All right, so just thinking about eMERGE 3 and reacting: I certainly don't want to suggest that sequencing is the only way; I just wanted to bring it up because eMERGE 2 is moving into sequencing. I think there are a lot of reasons to think about sequencing in the future, and it brings together all the phases of eMERGE. If I can have the second slide. Applying sequencing on a large scale was certainly a big part of this, and everyone has highlighted the fact that there is an extensive amount of rare variation in the human genome. If you go to the next slide, most of the variation in the genome is rare. And if you go to the slide beyond this, what I've done is plot the minor allele frequency on a log scale, and if you just advance, I'm highlighting 1%, which most GWAS chips with imputation will capture. You can see that's just one part of the picture of natural variation in the population. If you advance again, you can see that there's a large fraction that we're not capturing. So I think everyone is interested in running the gamut of allele frequencies across the population.
So if you advance to the next slide: if you look at deleterious alleles, there are lots of ways to look at this. You can look at loss of function and splice variants, as eMERGE is doing, and you can also apply every predictive method that you have; there are more than ten of these. But if you look at rare alleles in the population that are predicted to be deleterious, those alleles that are predicted by all of the methods to be deleterious are very young. This is the age of the mutation in the population, so smaller is younger. These are alleles that we would also like to get at. If you advance to the next slide, eMERGE 2 is certainly getting at that through a collaboration with the PGRN, by applying a targeted sequencing platform developed by that group. For the genes covered there, looking at extensive data, although individual variants may be very rare in the population, if you add up their frequencies, just for the rare variants, it's suggested that 7 to 12% of the population may carry a rare variant that is unique to the individual but, across the population, is very important for phenotype. So looking at these questions of the overlap, or the in-between, for rare and common variants is a very interesting and important question. If you go to the next slide: the case for sequencing is obviously that the majority of variants are rare but can be collectively common. The most impactful are not only rare but also young and ancestry-specific, maybe even family-specific. When we talk about family history, is there some way to bring families into the study? That's just a concept. And I think that sequencing is the perfect segue from eMERGE 1 and 2 into 3. Just to restate, sequencing is not the only way, but expanding the dataset across the genome will be very important. If you advance to the next slide, it interfaces with large projects like ENCODE, which has given us new knowledge about non-coding variation and where it's present in regulatory regions within the genome. And if you go to the next slide, this is just a slide that I stole from a colleague, John Stam, whose graduate student, Matt Maurano, actually looked at the intersection between GWAS hits and DNase hypersensitivity sites and found that there was an enrichment of GWAS hits in these areas. I know this is an area that ENCODE is very interested in: the intersection of these non-coding variants and where they overlap with the eMERGE hits. If you go to the next slide, there are external groups interested in mining electronic medical records. This is just an example that was in Cell in the past year, where 110,000 medical records were looked at for connections, and they found a connection between associations and Mendelian diseases. It's intriguing to think about how Mendelian disease variants may interact to contribute to common complex diseases or traits. We've known in cancer for many years that underlying susceptibilities in specific genes provide the first hit; is this true for other diseases beyond cancer, like cardiovascular disease, diabetes, et cetera? Sequencing would give us the ability to look at this. If you go to the next slide, there is also great interest in reporting incidental findings from sequencing, and obviously recommendations have been put forward about the genes that you would want to look at in this regard. That is certainly something very important for eMERGE to look at.
So if you advance to the next slide, sequencing would allow us to explore the spectrum of actionable variants in the sequence, and at the same time it will permit looking at the intersection of rare and common: the contribution of rare, perhaps Mendelian, variants, since many of those actionable genes are related to Mendelian diseases. How these contribute to common diseases is an important question. If you go to the next slide. So the idea of sequencing, whether it's selected targets like the pharmacogenetics panel or the return-of-results targets, or exome sequencing. I mean, genome seems perfect: the price is dropping, we're hearing $1,000, and it's coding plus non-coding, which extends into some of the areas of interest for mining from the GWAS studies. And then, in terms of looking at the phenotype, I just want to end with what has been most successful in applying sequencing in some of the large-scale sequence-based consortia I've been a part of. If you advance to the next slide. We've really skipped a slide. Okay, so go to the next slide. This slide, this is it. Okay, so: sequencing the tails of the distribution. I think that phenotype, and mining phenotype the way you've done in eMERGE, is perfectly suited for finding the best types of phenotypes to think about sequencing. The outcomes that have been positive from the large-scale application of sequencing have come from looking at the extremes of a trait, any time we really went to an extreme. And I'm talking about using an eMERGE-sized record base, like 100,000, and picking out the extreme tails there. Obviously, you also find mistakes out there, which eMERGE is very familiar with, but you can get down to the most important set. I think Josh Denny talked a little bit about that with Stevens-Johnson syndrome and the fact that, if you can get down to that small handful of individuals, sequencing adds something beyond just finding the location: you can find the location by GWAS, but you get to the functional variants by sequencing. And just to give an example of this. So we'll ask you to wrap up? Yeah, I'm wrapping up. If you take the example of high and low LDL, go to the next slide. If you look at the distribution of variants, shown by these little triangles at the tails of the distribution, at the high end of LDL as a trait, LDLR had a burden of rare variants in the population. This is expected from Mendelian hypercholesterolemia, which is pretty common in the population, estimated at 1%. But at the low tail, PCSK9 actually fell out; there was a burden, but also a more common rare variant that was present. And this was sequencing hundreds of individuals, not thousands. So that's it; I think there's a discussion, and Steve Leeder is going to take it from here.

Okay. Well, Steve, if you're there, we'll segue immediately to you.

I am here. Can everybody hear me? Yes, we can. Good. In retrospect, I guess I should have... can we go to the next slide? In retrospect, if I had used the smaller font like Howard did, we could have gotten these all on one slide and looked at them at the same time; instead it's split over two slides. On this first one, I think you should get the sense by now, from both Marilyn's and Debbie's presentations, that the consensus of our group was that discovery research should remain a high priority for the future. And working with the phenotyping group, one step is obviously to decide what traits or phenotypes are high priority for the next phase of the network.
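The "sequencing the tails of the distribution" design described above reduces, in its simplest form, to pulling a per-patient summary of a trait from the EHR and keeping the extremes. A minimal sketch, with hypothetical file and column names and arbitrary 1st/99th percentile cutoffs; real selection would handle medications, repeat measures, and ancestry much more carefully.

```python
import pandas as pd

# Hypothetical input: all LDL measurements pulled from the EHR, with a crude
# flag for values taken while on a statin.
labs = pd.read_csv("ldl_measurements.csv")        # sample_id, ldl, on_statin (0/1)

# One value per patient (median of untreated measurements), then take the tails.
per_person = (labs[labs["on_statin"] == 0]
              .groupby("sample_id")["ldl"]
              .median())

low_cut, high_cut = per_person.quantile([0.01, 0.99])
low_tail = per_person[per_person <= low_cut]      # candidates for the low-LDL extreme
high_tail = per_person[per_person >= high_cut]    # candidates for the high-LDL extreme

print(f"{len(low_tail)} low and {len(high_tail)} high LDL candidates "
      f"out of {len(per_person)} patients")
```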
And then the topic for discussion, and some of this has been occurring in the online chat, is whether to just go with existing data and work on improving the analytical tools and methodology to use existing data, or whether there should also be some effort put into denser data generation, whether by next-gen sequencing or exon arrays. As a non-genomics person pursuing this more from a pharmaceutical perspective, I would add that existing data could also include the longitudinal phenotypes that several people have alluded to, used in the context of the contribution to disease progression. This also lends itself to gene-environment interactions, as well as the impact of therapeutic interventions on the trajectory of progression. The second point that came out of our discussions was the whole issue of not throwing the baby out with the bathwater and looking at the importance of rare variants. Again, up for discussion would be the most appropriate platforms to capture them, whether genotyping or sequencing, but also the resources that may need to be put into developing appropriate tools to detect their effects. Next slide. This next point gets at one of Debbie's last points, and that is considering study designs other than a straight GWAS-type format for discovery purposes; for example, the design she gave of looking at extreme discordant phenotypes, at least for continuous variables, and coupling them with your platform of interest, and I've put whole genome sequencing here. This particular approach has the potential to be a little more efficient in identifying causal variants, especially rare causal variants. And then the last point that our group would propose to the larger group as a whole would again be something that was mentioned by both Marilyn and Debbie, and that is looking at other sources of genomic material, RNA, or going back into the DNA and looking at methylation, for example, for those genomic analyses. And then on the EMR side of things, looking to see how additional data can be captured or parsed to look at environmental factors and comorbidities for gene-by-environment interactions, for example. So those are the four issues that were raised by our group as something to pose to the rest of the group for their thoughts and comments, and I'll toss it back to the chair.

Okay, the floor is open for comments or questions on EMR and genomic discovery.

Mark Williams here. A thought occurred to me as Debbie was talking, again trying to bridge the tension that we have between discovery and implementation, and this was in the context of the rare variants. I think one of the issues that we're all going to be dealing with, as we receive secondary findings from our genomes, exomes, and high-density chips that we're thinking about returning clinically, is the lack of information that we have on the clinical impact of some of these rare variants, even in genes that we know quite well. One of the things that we'll be doing is to try to use our traditional methods of contextualizing that data, using family history and other sorts of things, to understand the potential impact. To me, that seems to lend itself to the idea that if we did a rare variant focus, we could study how we could use electronic health record mining to try to contextualize rare variant information and add additional information for clinical return and implementation.
So that could be a potential study topic for eMERGE 3 that would bridge, again, this discovery and implementation chasm.

This is Dan Roden; I have to say Roden now because there's another Dan on the phone. I agree with Mark, but at a practical level, I think you have to make some attempt to limit the minor allele frequencies down to which you're willing to go. If you find a rare variant that is one in a million or one in 100,000, it's going to be very, very tough, unless you know something about the biology, to assign any kind of phenotype to it. So I think the sweet spot for us is probably minor allele frequencies around 0.1 percent, plus variants in disease genes that have been implicated. And as Zach said a couple of hours ago, you find that variants that have been implicated as causes of hypertrophic cardiomyopathy or channelopathies are actually much more common than you'd give them credit for when you start to look across very large populations, and we're finding that along the way. So one thought is exactly which rare variants you would want to focus on. And I think that the variant of uncertain significance at one in 10,000 or one in 100,000 is something that eMERGE is really, really well suited to attack.

It seems to me that we should report all variants, even if we only see them once in 100,000 people, and put them in a database, because other people are going to be putting that data forward and annotating those variants; even if we can't determine on our own whether they're pathogenic, it's going to be really important going forward.

So, Dale, I totally agree with that, and that's what we're going to be doing in eMERGE-PGx, and as data accumulate worldwide you can start to make some sense of that. But I think over the next five years, a one-in-a-million variant, unless there's some biology around it, is going to be hard to make sense of. But yes, I totally agree that we have to figure out a way of archiving this worldwide.

So this is Hakon here. As you know, the new platform from Illumina, the X Ten, is currently tailored toward whole genomes. It's very likely going to be adapted to exomes, even though that will probably take some time, and an exome could probably be sequenced for about a hundred dollars, say, a year and a half from now. So in the interim, one strategy would be to customize a chip with this rare variant content, particularly content with potential or putative damaging impact, loss-of-function variants, and so forth. That could be typed now extremely cost-efficiently across thousands and thousands of samples for a relatively low amount of money, even though it's going to cost some money. In the interim, that would potentially be a very powerful strategy across all the sites; it would open up the rare variant content for all the phenotypes that we have, and we don't have that today.

This is John Harley, Cincinnati. I just want to ask the question: when we concentrate on rare variants and we don't have all of our samples genotyped, we rely on imputation, and as the frequency of the variant drops, the accuracy of the imputation is disastrous. So how do we take advantage of our huge numbers when the error introduced by imputation is so big? Is there anybody that has a solution to this problem? You need to sequence. You need to sequence all those people.

So this is Rex. I'd like to weigh in. I'd really like to endorse the idea of thinking about environmental factors.
We've played around a little bit with GIS tools, and I think one of the things that we could do very well, which isn't done very well in most cases, given the longitudinal nature of the people we're following, is to think about some of these environmental factors. I think there are going to be increased opportunities to capture some of these bits of data. Marilyn talked a little bit about Environmental Protection Agency measurements that are being made. To be able to start to tackle gene-environment interactions using GIS approaches and some of these environmental measures is also something that would uniquely be possible for us to take a look at in an eMERGE 3.

Well, this is Chris, and while I find that idea elegant, I want to make sure we're somewhat cautious and thoughtful about this. For some populations, your Chicago population, Rex, might be a superb candidate for this, but other populations are not always, as we say, population-based, and hence the density of sampled cases in any environmental geocode is low. You run into power problems very quickly with environmental association, particularly when you're treating it as a covariate and a substrate.

Chris, this is Marilyn Ritchie. One of the other things that folks could think about, and this is something that Marshfield has done in eMERGE 2, is to use the PhenX toolkit as a mechanism to collect environmental data. We were awarded a supplement as part of the PhenX RISING program by NHGRI, and so some of the PhenX toolkit measures were sent out to the eMERGE participants. We've actually started mining that data, and we're finding really interesting environmental results for type 2 diabetes and some for cataracts, and we only implemented a few of the PhenX toolkit measures. That's something other sites could do, either electronically or with paper forms. It's something you could port to an iPad that people could do in clinics, or put on the web so that people could do it through their MyHealth portal at Geisinger or Vanderbilt or what have you. And that's another way you could efficiently collect environmental data on the participants in the biobanks.

I certainly agree that would be hugely more efficient and wouldn't suffer the broad association problem that you have with geocoding, and I actually think the PhenX toolkit would be the appropriate choice for collecting that kind of data, so I agree with that.

That might be another agenda item to put on the discussion with large health systems and the vendors of health records, because the patient portal is eventually going to become part of the mandated electronic medical record, and as they're building them, it would be nice to have patients uploading various lifestyle data that can be merged with their electronic medical records.

One question: are there other things that the Coordinating Center should be working on in the future? They did an awful lot of work with data cleaning and then the imputation, but are there other things that would make the data set more effective for other analyses? That would be a good focus for eMERGE 3.

So one focus there, this is Hakon, is on the copy number variation analysis, because that's another whole dimension. Focusing there from the rare variant standpoint, and since most of the data are typed on Illumina, that can open up a very fruitful discovery focus across all the phenotypes, again from a data mining standpoint.
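Picking up the gene-environment thread from the exchange above: a minimal sketch of a single SNP-by-exposure interaction test. The input file, column names, and exposure source are hypothetical assumptions; the same form would apply whether the exposure came from a PhenX-style instrument or a geocoded environmental estimate.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical merged table: one row per participant with case status, imputed
# SNP dosage, an exposure measure, and basic covariates.
df = pd.read_csv("gxe_input.csv")    # case (0/1), dosage (0-2), exposure, age, sex (0/1)

# Logistic regression with a SNP-by-exposure product term; 'dosage * exposure'
# expands to the two main effects plus their interaction.
fit = smf.logit("case ~ dosage * exposure + age + sex", data=df).fit(disp=0)
print(fit.params["dosage:exposure"], fit.pvalues["dosage:exposure"])
```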
And we have algorithms that can be applied to these data at the individual sites or jointly, and the whole thing can then be meta-analyzed together.

This is Terry. I did want to ask about the issue of sequencing. When we've approached large-scale sequencers, the question they often ask is, well, how many cases of a given disease do you have? They're very interested in looking at thousands or tens of thousands of cases of disease X, and that has not been something that eMERGE has really focused on, because we're sort of phenomic, as it were. So how do we address that question, other than saying, gee, we've got so many wonderful phenotypes, isn't this just as good or better?

So, Gail, you? So I think we don't need to have a disease focus. As one specific example, I'd be really excited to see the 56 ACMG genes. We know what those genes do, but we don't know what the variants in those genes do. We could look both for variants annotated as pathogenic and, importantly, not pathogenic; everybody's going to have sequence variants where you don't know what they do. And then we could also look for pleiotropic effects of those same genes. So there's a discovery possibility there too, and then there are lots of implementation questions. How do you manage that? My health system is very concerned about those 56 genes now because of the ACMG recommendations. So how do you implement that? How do you do decision support? How do you educate providers? What do patients want to know? What do they need to see? I think it really fits all the things that we can do really well, and having the phenotypes that we have in such depth gives us a unique resource for that kind of annotation. And I think even at the pediatric sites there is really important work: 49, I think, out of the 56 have pediatric phenotypes, plus the pediatric sites really could look into this idea of mandatory return of adult-onset findings to children, which has been a hugely controversial recommendation. They could really ask their families what they want, ask their providers what they want. But I think that is a space where there's a lot of controversy, a lot of interest for the health system, and we have a really unique capability. And I don't think we have to stop at the 56; I would add a couple more, by the way.

You know, I agree with Gail. This is Irwin. One could add to such a panel things that I think would be extremely meaningful and that can be done uniquely in eMERGE, such as a list of the highly penetrant monogenic forms of diabetes and others. I think that would be very helpful for understanding, among common complex diseases, what forms are diagnosable at a molecular level and how frequent that is.

I'd be interested in Steve Leeder's and Debbie Nickerson's comments on that. Let me just ask: both of you had pointed toward non-coding variation, and here we're talking about really focusing on genes, even though there are some non-coding regions, obviously, in the introns. So, Steve or Debbie, any thoughts?

I think it's great to look broadly at genes, but I think that different platforms have different outcomes in terms of what you look at.
I mean, many people are sequencing whole genomes, but they end up looking at only the coding regions and the 20% that are well annotated by ENCODE as being highly functional. But I think, broadly, whole genome is an important route to go, because you can look at variants that are difficult to look at, like indels and CNVs, just by sheer capture.

Thanks. Steve, what do you think?

Well, for me, non-coding really points more to regulatory regions as being of interest. But you have to understand I'm coming at this from a pediatric perspective as well, in that when we are looking at things in kids, there is so much change going on between birth and adulthood that you have to look somewhere besides the coding region of a gene for what's changing as kids grow and develop. And to some extent we know very little about how this really works in senescing adults as well, as we move toward a geriatric population. So for the non-coding stuff, I'm really thinking about important regulatory regions and being able to identify and characterize them.

Steve, this is Dick Weinshilboum. In all of our studies of variation in cancer drug response, the majority of the hits that are functionally important regulate transcription; they're in non-coding regions.

So, this is Murray Brilliant. It's clear that from an economic standpoint we can't do whole genome sequencing of 50,000 people, but we could look at a smaller number of genes, and since these genes have been implicated in human disease, they're reportable, they're actionable. We could look both at exons and introns. If we focused on a subset of genes, it would be a paradigm for looking at the whole genome, and it would be scalable. There are 50-some genes here; maybe all of us have some other favorite genes. If we had 100 genes, it's kind of catchy: instead of 1,000 genomes, we have 100 genes that are looked at across a large number of people. And again, this would be a paradigm for what we are going to do when we have a large number of whole genome sequences. It really cuts across all of that.
I would point out that's very reminiscent of the decision early on in the ENCODE project to tackle 1% of the genome. I don't know whether my analogy is ideal or not, that's a separate issue, but it is reminiscent in that the same rationale went in: we thought we were going to interpret the whole human genome, so there was a whole process to pick the 1%, which was kind of complicated, but we got there, and everybody worked on the 1% until you felt comfortable enough to scale to the whole genome. So this would be a similar circumstance, however many genes you pick. I think we could do really well in that avenue.

So is the idea to do the non-coding regions of those genes as well? I think so, if you know the regulatory regions. I was actually going to clarify: you would take a gene and just go end to end, maybe X number of bases upstream and downstream, and do the whole segment, as opposed to what Terry was implying, which was sort of known functional non-coding regions. I'm just trying to stimulate the conversation, maybe toward exome versus targeted panel: what's the difference there? If you do exome, what's the difference? These are actually asking very different questions. If you're only going to go exome, you're making the assumption that that's where you're going to find it. That's not the idea; the idea is that you have these genes of interest, including their non-coding regions, so you want to get a complete inventory, deep, in lots of people. That's what I wanted you to say, Eric.

I don't know particularly what I thought I heard, but what I also heard was a variant of that approach, which is that you take the X number of genes and you take all the exons and the introns, any regulatory regions you know of, and the regulatory regions elsewhere to the extent you know them; there are some genes that have regulatory regions identified on other chromosomes, so maybe look at those as they become added in.

But it takes two years, Debbie, or more, to develop a targeted platform like this. I think it's much easier now than it was; I think we have a lot more experience, and I think the PGRN has great data with PGx, so they can look at these questions. Debbie, what do you think, is molecular inversion probe technology going to take the cost down now? You know, I think it's a matter of cost and ease of implementation. I think some can be cheap, but whether they're broadly accurate across many genes is not known.

If we put this into an RFA, I would suggest that we could be agnostic as to the technique and let the people that are putting in proposals discuss how they would do it, because, as was pointed out, there are a number of us that are going to be generating large numbers of exomes and genomes, and so that would also allow for a methodologic comparison about the best way to actually do it.

Okay, and we're going to need to wrap up the discussion. Iftikhar here. There are these commercial entities that are trying to make panels of two or three thousand genes, so that if you have a patient with Marfan, or a patient with hypertrophic cardiomyopathy, you just order that set and then you can pick and choose and analyze. Because what's happening is that, realizing there are many variants that may cause, for example, aneurysm, I end up ordering a panel of fifteen candidate genes, which is like five thousand dollars, and I may still not get the information because they may only do certain variants. So I think that's another option: it's not the whole exome, but it's asking, what are
the thousand, or hundred, or fifteen hundred genes that are most often used in the clinical setting? Perhaps go with those. It also goes back to Debbie's point that if you're using some of these in the clinical setting, you would have a family structure to interpret the variants much more efficiently.

Okay, so I'd like to thank all of the participants, actually all of our panels this morning, for a very rich and thoughtful discussion of future directions in eMERGE. I guess we're down to about a twenty-minute break for lunch, so be careful, because you're not going to get all these people down to the cafeteria. We'll do our best. We'll plan to start somewhere around half past the hour. The idea is that everybody out there on the call should go grab lunch and bring it back; it's on the first floor here when you come up.