So, again, for the people that are watching via the Internet, I'm going to introduce Rick Lifton. He's going to present on behalf of the Centers for Mendelian Genomics consortium. Rick is a former member of this advisory council, and he is the Sterling Professor of Genetics and Chair of Genetics at Yale University. He's also an HHMI investigator and a grantee of the Centers for Mendelian Genomics program. And welcome back, Rick.
Thanks very much. I'm delighted to be back to see a bunch of old friends and have the opportunity to speak today on behalf of the Centers for Mendelian Genomics. We've been running for two and a half years, and the centers are composed of the University of Washington center, the Yale center, and a combined center from Johns Hopkins and Baylor. And if we think about the goals, not necessarily the goals in the first funding session, I think we have eliminated most of the barriers to understanding the contributions and causes of all human monogenic disorders. And this, writ large, ought to be among the challenges that we see in this field going forward. So if we just think for a moment about the importance and impact of Mendelian genetics, certainly rare mutations with large effect provide very fundamental insights into human biology and allow diagnostic testing that is important for disease prevention, therapy, and prognosis. I've heard a lot about this in the last session, but certainly the discovery of BRCA1 and some of the mismatch repair genes in colon cancer leaves little doubt about clinical utility following discovery. Similarly, the ability to use rare mutations with large effect to identify tractable therapeutic targets has really been a boon to the pharmaceutical industry, and a number of targets that are under active development in industry have come from the recognition of distinct phenotypes from rare mutations.
Nav1.7 in pain, ROMK in blood pressure, PCSK9 in LDL cholesterol, the orexin pathway in sleep, the amyloid precursor protein pathway in Alzheimer's disease, the ability to make bone mass by inhibition of SOST with monoclonal antibodies. These all came from rare Mendelian traits, and the ability to see what the clinical impact of these rare mutations might be, coupled with the knowledge of what adverse effects do or do not occur with rare variants of large effect, is really a motivating factor for understanding more of these in the human population. So if we just think broadly about the status of Mendelian genetics, we focus largely on the protein coding region, because that is where most of the business that we have recognized to date has occurred, which of course does not mean that that will always be the case. But simply to make the point that there is a lot of discovery that remains to be done: there are about 21,000 protein coding genes in the human genome, and only about 3,000 of these have been linked to about 4,000 Mendelian phenotypes. And so, not to put too fine a point on it, even if we allow for 15 to 30% homozygous embryonic lethality, there's an awful lot of genes that haven't been accounted for by rare mutations to date. So there's a lot of opportunity for new discovery in coding regions, and of course this leaves 99% of the genome open for discussion as to what we might find when we get around to actually being able to investigate, in a cost-effective way, the other 99% of the genome.
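The gene-count argument above can be made concrete with a rough back-of-envelope sketch. It uses only the round figures quoted in the talk (about 21,000 protein coding genes, about 3,000 already linked to Mendelian phenotypes, 15 to 30% homozygous embryonic lethality). The function name is hypothetical, and the calculation assumes, as a simplification not stated by the speaker, that none of the already-linked genes fall in the embryonic-lethal group.

```python
# Back-of-envelope estimate from the round numbers quoted in the talk.
# Assumption (not from the speaker): the ~3,000 genes already linked to
# Mendelian phenotypes are all outside the embryonic-lethal group.
TOTAL_CODING_GENES = 21_000
GENES_ALREADY_LINKED = 3_000


def genes_unaccounted_for(lethal_percent: int) -> int:
    """Genes that are neither embryonic lethal nor yet linked to a phenotype."""
    # Integer arithmetic keeps the result exact for whole-number percentages.
    viable_genes = TOTAL_CODING_GENES * (100 - lethal_percent) // 100
    return viable_genes - GENES_ALREADY_LINKED


if __name__ == "__main__":
    for percent in (15, 30):
        remaining = genes_unaccounted_for(percent)
        print(f"{percent}% lethal: about {remaining:,} genes unaccounted for")
```

Under these assumptions, even the higher lethality figure leaves well over ten thousand coding genes with no known Mendelian phenotype, which is the point being made.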
But what really has motivated the Centers for Mendelian Genomics was the recognition five years ago that the cost of DNA sequencing had come down six orders of magnitude and that we really ought to be able to tackle problems that we hadn't been able to previously. This then led to the recognition that one could sequence the coding regions of the genome rapidly and inexpensively. And of course this is the language of the genome that we do understand. We're not very good at reading that language, but we're better than we are at the rest of the genome, so many of us thought we would be able to use this to fairly rapidly begin expanding our understanding of the consequences of rare variation. So the centers have converged on fairly similar platforms and analysis strategies and are currently largely sequencing about six exomes per lane on the Illumina HiSeq platform. There's new technology that's coming out this year that will continue to evolve the capability, but you can do this at high sensitivity and high specificity for quite low cost. At Yale we're at about $500 per exome, and I think that's a pretty good standard after you amortize the cost of the equipment and include all the labor and the cost of data production and storage. So the field really got its founding back in about 2009 with two papers that appeared in short order from groups that are involved in the Centers for Mendelian Genomics: at Yale, making a clinical diagnosis, and at University of Washington, identifying the cause of a Mendelian disorder. The groups at Baylor and Hopkins have made very broad and deep contributions to the field of Mendelian genomics, and I think these centers have really put together quite a strong program. So if you think about it, where would we be looking for new Mendelian diseases?
Well, certainly if you go to Online Mendelian Inheritance in Man, you'll find about 1,700 Mendelian traits that have been reasonably well characterized but have not yet been solved. So those are fairly obvious substrate for new investigation, but we think that that is really the tip of the iceberg, because there are parts of the genetic landscape that have been unapproachable until the advent of this new high-throughput technology. So for example, there are a number of extreme phenotypes that arise from consanguineous union where you typically do not get enough information for mapping, either because you only have a couple of individuals with these extreme phenotypes and you can't adequately map the locus, or because there's high locus heterogeneity, so every little family that you study has a different gene as its cause and mapping approaches have not been successful. A second area that we think is quite attractive is potential dominant reproductive lethals. So if you have a dominant trait that is reproductively lethal every time it occurs, those mutations are virtually always going to be de novo, and there's almost no mapping information there unless you're fortunate enough to capture a deletion that localizes a disease gene. And then it's increasingly apparent that there are diseases that are commonly caused by somatic mutations, other than the classic cancer syndromes that we all already know about. So we think that there's a lot of ground for opportunity, and what I want to tell you about in the brief time is some of the progress that's been made. So key to this effort has been patient recruitment, and the Centers for Mendelian Genomics have worked through a very extensive worldwide network of collaborators. There's been a small contribution from a web portal where anybody can go online and say, "I've got interesting samples. Are you guys interested in sequencing?"
But this makes up a small fraction of the total number of samples that have been recruited for this study. There currently are more than 400 collaborating investigators from more than 25 countries on five continents, and collectively across the three centers we've recruited more than 17,000 index cases in the two and a half years that we've been funded. This covers about 800 different disease entities, but depending on how you define that it could be a little bit less or a little bit more. So in terms of progress to date, in the two and a half years we've sequenced more than 12,000 exomes for this project. They've published 84 manuscripts, including gene discovery papers in Nature, Science, Cell, Nature Genetics, New England Journal, American Journal, Human Molecular Genetics. These include reports of 60 new disease loci and genetic disorders, and it's estimated that there are about 200 more disease genes and traits that have been identified and are as yet being worked up. As you know, in order to have high-impact papers these days a lot of functional data is required, and many of our collaborating investigators are in the process of pursuing these. I want to make the point that our mode of interaction with our collaborators depends on the collaborator. Our favorite collaborators are the ones who say, "You give me the data, I'll do all of the sequencing, all of the downstream work, write the paper," and do it themselves. That's wonderful. That's music to our ears, and we have many investigators who are capable of doing that.
We also have investigators who are completely clinical, who have no molecular capabilities but are very shrewd at recognizing interesting phenotypes, and they come to us and say, "I've got an interesting cohort or an interesting group of families, I know that this is distinct from any of the disorders that I've seen previously, and I would love to collaborate with you," and we do the sequencing and the analysis and most of the follow-up work within the centers. So it really depends on the nature of the collaborating investigators. So I want to give a few examples, to start with classic Mendelian syndromes, where people have recognized that there are new disorders, disorders that have not yet been solved. This is one such example. This is a group of disorders, spondylometaphyseal dysplasia, and these patients are unique in that they also have cone-rod dystrophy, suggesting that this was different from any of the other disorders in this class. Classic segregation suggested a recessive trait, and exome sequencing fairly quickly identified mutations in this gene, PCYT1A, in six unrelated pedigrees: independent mutations that co-segregate with the trait and leave no doubt that this is the gene that causes this disorder. And this gene is of known function. It's involved in the classic pathway in the biosynthesis of phosphatidylcholine, and homozygous loss-of-function mutations are the cause of this trait with short stature and this cone-rod dystrophy. This was published this spring. Similarly, in a very interesting story, mutations in a mechanosensory channel, PIEZO2, cause three related phenotypes that are related in having contractures of the hands and feet. They have variable associated features, sometimes with cleft lips, sometimes without, deep-set eyes. And it turns out that in the study of these patients there are 35 families with 13 different mutations in this gene that cause different phenotypes along this spectrum.
And there's quite striking genotype-phenotype correlation on this spectrum, with patients that have one particular mutation being particularly prone to having cleft palate. And if you don't have that particular mutation, you don't have that particular clinical feature. Again, published this spring. So there are many more diseases that we've studied and characterized the molecular cause of that were previously recognized as being clinical syndromes. But there are many more in which the clinical syndrome has not been previously described, and the genetics has driven the recognition of the clinical syndrome. One that was published this month on the cover of Cell came from two groups in the Centers for Mendelian Genomics. The two groups independently identified the same mutation, resulting from a founder mutation that occurred about 16 generations ago in Turkey. They succeeded in identifying nine distantly related kindreds that all shared the identical recessive mutation segregating with a distinctive trait. So the mutation is in a gene, CLP1, which is a kinase that is required for tRNA splicing. And that recessive loss-of-function mutation prevents splicing of virtually all tRNAs, with the accumulation of improperly spliced tRNAs and activation of oxidative stress signaling pathways. And the results are an interesting constellation of neurologic phenotypes. So looking at the MRIs of normal and affected individuals, the affected individuals have microcephaly; they have large, fluid-filled ventricles as a consequence; they have hypoplasia of the corpus callosum, as you can see compared to normal; they have hypoplasia of the pons; they have profound loss of cerebellar mass. And again, the cisterna magna is enlarged as a consequence of the hypoplasia. So these mutations are recessive, apparently loss-of-function mutations. And the phenotypes have been mimicked in both fish and mouse, demonstrating the consequence of these mutations.
Just this week in the New England Journal of Medicine, there was another one-of-a-kind set of families with a regional phenotype, extended families in Iran who are segregating a unique phenotype of early-onset coronary artery disease with age of onset around age 44, obesity with mean body mass index of 33, hypertension, and diabetes. And 25 affected individuals all share the identical haplotype and mutation in this dual-specificity kinase, DYRK1B. Not much is known about the biology of DYRK1B, but one of the findings published in this paper demonstrated that this appears to be an activating gain-of-function mutation, in that the mutant DYRK1B promotes increased adipogenesis in 3T3 cells in culture. And there are several other interesting phenotypic effects of this mutation. So again, this was recognized to be a potentially Mendelian trait, and the genetics helped establish that. Similarly, going to areas where genes have previously been found, but there are leftover patients and families that are not yet explained, has led to the identification of a number of disease genes. And this is one example, hemolytic uremic syndrome. So there are a number of genes in the complement cascade that have been identified as underlying this thrombotic phenotype. And yet there are patients who are unexplained, and this led to exome sequencing of these patients. This has turned out to define a unique clinical subset of patients who present in the first year with hemolysis, thrombocytopenia, kidney failure, and episodic recurrences. They progress to kidney failure by age 15. Importantly, from an immediate clinical standpoint, these patients do not typically respond to anti-complement treatment, but unlike patients with the complement problems, they are cured by renal transplant. So this immediately has clinical implications for the patients who have these mutations. The mutation that causes this syndrome is a recessive loss-of-function mutation in diacylglycerol kinase epsilon.
And DGKE is involved in the metabolism of diacylglycerol to phosphatidic acid. In the absence of this enzyme, you have elevated levels of diacylglycerol, which is a potent activator of protein kinase C in the professional thrombotic cells, the endothelial cells and platelets. And this is known to be activating for thrombosis. So we think we can explain the pathogenesis of this disease from this mutation. So we've also been able to identify a number of mutations and mechanisms that would have been very difficult to imagine solving in prior eras. And one, I think, very interesting example is a form of facioscapulohumeral muscular dystrophy, in this case type 2, FSHD2. So FSHD1 is known to be caused by mutation on chromosome 4 at the DUX4 locus. DUX4 is a homeobox transcription factor; it's normally only expressed in the germ line and is methylated in somatic tissues, and it's found in a tandem array on chromosome 4. And there is a facilitating allele in which DUX4 has a polyadenylation sequence that allows it to be expressed in somatic cells. In conjunction with a mutation that causes hypomethylation at the chromosome 4 locus, this is sufficient to cause FSHD1. But there are patients who have a clinically indistinguishable syndrome, FSHD2, who do not have this contracted array at the chromosome 4 locus, suggesting that there might be a second locus that modifies this and enables the disease phenotype to be expressed. And that turns out to have been identified by exome sequencing of these patients, which has identified mutations in the gene SMCHD1 in 15 of 18 patients with this phenotype. So they have the normal array, they have the polyadenylation signal, but they have to have this mutation in SMCHD1 in order to display the phenotype, and they have the phenotype because this gene is required for the normal methylation of that locus.
And so in the absence of that, you have a facilitating mutation at the chromosome 4 locus. In conjunction with this mutation that causes it to no longer be fully methylated, you get this phenotype where this gene is now expressed in muscle cells, and this leads to activation of a pathway that causes expression of genes that are normally expressed only in early development and leads to death of skeletal muscle cells. Further, somatic mosaicism has become apparent as underlying a number of monogenic disorders, and there are several examples that have come out of the work in the Centers for Mendelian Genomics thus far. One of these is a defect in glycosylation. There are many defects in glycosylation that have been described, and this is a nice demonstration where combined biochemistry, identifying patients who don't fit with known syndromes, led to recognition of a unique syndrome featuring loss of both the galactose and sialic acid from multiple branches of complex-type N-glycans. That led to exome sequencing, which led to recognition of unique loss-of-function mutations in this UDP-galactose transporter. And strikingly, in all of the patients that have been identified with this mutation, the mutation is somatic mosaic. This is an X-linked gene, and in males you find a residual wild-type sequence on the X-linked gene along with this mutation. The assumption is that you can't be hemizygous null for this gene: there are tissues that absolutely require the wild-type gene in order for survival to persist, but in the presence of this mutation, you end up with selective defects in particular organs that are compatible with survival but leave patients with intellectual disability, seizures, and other afflictions.
Similarly, a new spectrum of disorders that I think are particularly interesting has come from the recognition of somatic mutations in the skin. The skin obviously is a great candidate for looking for somatic mosaicism, because you can actually see the defect in the skin. And it turns out that, quite remarkably, there are three distinct skin lesions that all arise from activating mutations in RAS. We've known about RAS mutations since 1983, and one might have thought we knew everything that we needed to know about RAS, but it turns out that there are three distinct lesions. There is nevus sebaceus, these cutaneous nevi that give this waxy yellow appearance with alopecia, and these are associated with mutations in either HRAS or KRAS, classic activating mutations. There's a second, quite distinct phenotype featuring an epidermal nevus syndrome but, in addition, striking woolly hair. It's hard to appreciate from this image, but there are these tight little spirals of woolly hair that are heterogeneous in the background of this otherwise completely straight hair, and if you take these hair follicles and do PCR on the straight and the woolly hair, there's a mutation in RAS unique to the woolly hair that is not found in the wild-type hair, and the patients that have been identified thus far all have the identical mutation in HRAS. And then lastly, there's a syndrome of cutaneous nevi, and you can see that they follow these dermatome lines, indicating that these likely result from the same somatic mosaic mutation, but they're also associated with renal hypophosphatemia, where the kidney can't hang on to phosphate, owing to elevated levels of FGF23, a syndrome that has been characterized as caused by mutation in a variety of genes. But quite strikingly, in these patients both features, the skin features as well as the dysplasia in bones associated with renal hypophosphatemia, come from somatic mutations in HRAS or NRAS, G13R or Q61R
mutations in these two, and this mutation is actually present in the dysplastic bone as well as in the skin. FGF23 is expressed in the bone, not in the skin, so we think we know where the phenotypic effect of these mutations lies. So these are examples of new somatic syndromes. Thinking about de novo mutations and where we might go to find de novo mutations that haven't been previously described, severe diseases that impair reproductive fitness seem obvious targets. And so in collaboration with NHLBI we sequenced a large cohort of patients with severe congenital heart disease, disorders of the heart with which patients typically do not survive the first year of life without surgical intervention, and we thought these were good candidates for de novo mutation. In this study we took advantage of the fact that we can actually get developing heart tissue from the mouse. Jon and Kricket Seidman's lab at Harvard sequenced the RNA in the developing mouse heart and identified the top quartile of genes expressed in the developing mouse heart compared to the bottom quartile. We then asked what the frequency of de novo mutation is in parent-offspring trios in these kindreds, and found an excess of de novo mutations in genes expressed in the developing heart. Getting back to Jim's point, we're not very good at identifying from missense variants which variants are pathogenic and which are simply neutral. So we enrich, we think, for damaging mutations by going from all missense mutations, to missense mutations at completely conserved positions plus damaging mutations, and then to just the overtly damaging premature termination, frameshift, and splice site mutations.
There's an increase in odds ratio, and all are significant. But most strikingly, these mutations, which are virtually all heterozygous, converge on chromatin modification. There's a notable enrichment in genes involved in chromatin modification, and in particular modification at three positions: H3K4 methylation, H3K27 methylation, and H2BK120 ubiquitination. There are mutations that introduce methylation at this site and remove methylation at this site, mutations that increase ubiquitination and remove ubiquitination, and importantly, ubiquitination at this position is required for methylation at H3K4. So we think that this is a compelling pathway that's involved in congenital heart disease, and it strongly suggests that this is a very dosage-sensitive pathway: a little bit too much or a little bit too little and you get phenotypic effects. We anticipate that these will be quite variable in their phenotypic manifestations as well as their penetrance, but of course we're identifying these by ascertaining cases. So again, getting back to our earlier point, we don't at this point know much about what the penetrance of these variants would be, except to say that about 10% of congenital heart disease in this study appeared to be attributable to de novo mutations. And then lastly, combining both somatic mutation and de novo mutation, we have the unusual occurrence of identifying patients with de novo germline mutations, or the identical mutations in benign tumors. These tumors cause aldosterone production in the absence of the normal stimuli for aldosterone, resulting in hypertension, and these patients with germline mutations, where every cell in the body has these mutations, also develop primary aldosteronism. But in addition they have profound problems from seizures and complex neuromuscular disease, because the same calcium channel is used in the brain and in other parts of the body. These mutations electrophysiologically are
activating mutations; they are activated at less depolarizing potentials. This is the normal mechanism by which aldosterone production is regulated, and one of these mutations also impairs inactivation of this channel. So in this quick tour through what we've been up to, what I've tried to show you is that we're succeeding in recruiting interesting patients, analyzing them, and identifying interesting mutations that are providing new insights into biology. I think they point toward a future where we really have the ability to start thinking in a coherent way about determining the consequence of mutation in every gene in the genome, and there's a commentary in Science just last week on this point. Just to look at our data: in the first 2,000 Swedes we sequenced at the start of this work, we found a new gene with a homozygous loss-of-function mutation in about one in 18 of these subjects. If we do the same experiment in the offspring of first cousins, we find instead a new homozygous loss-of-function mutation in almost every patient that we sequence. So there clearly is a path there that suggests a way to try to identify mutation in every gene in the genome. Of course the trick in this genotype-driven screen would be how to append a phenotype to those individuals. That is challenging, and it requires careful thought as to how you ascertain your patients if you were to try to do this kind of study. So among the lessons that we have learned thus far in the two and a half years that we've been running, I think we all are impressed that there is no substitute for both clinical domain and genetics expertise. I think our general experience has been that the people who have been successful in identifying interesting families that have novel genetic disorders in the past are very likely to come to you with interesting things that will turn out to yield interesting results. That of course is difficult to universalize, but we've been quite impressed with the talent,
the specific talent of our clinical collaborators from around the world. It's clear that extreme phenotypes from consanguineous union will continue to be fertile ground for discovery of new recessive loci. It's also clear that some of the traits that we have studied have extremely high locus heterogeneity. One of the largest projects that we've done across the CMGs has been cerebral malformations of the brain. We've sequenced collectively about a thousand patients with congenital malformations of the cerebral cortex. There are probably about 70 new genes that have come out of that set, but it's striking how few are recurrent; the gene burden at any individual locus remains quite small. This then raises the potential for rapid functional screening using CRISPR in model systems. For example, if you have a gene that causes microcephaly in a human and you make a CRISPR mutant in a mouse or fish, you may be able to rapidly decide which genes are likely worth going forward with. Haploinsufficiency due to de novo mutation appears to play a significant role in several congenital disorders: there's the work that I described on congenital heart disease, there's further work on congenital abnormalities in kidney development, and there's existing work in autism from the groups at Yale and Washington, not part of this project, that points in the same direction. I think going forward there clearly is a need for extensive worldwide collaboration as well as data sharing, and these of course raise as many issues as we can hope to solve this afternoon. But I think I'll stop there, with thanks to all of my collaborators at University of Washington, at Baylor and Hopkins, and at Yale. Thanks very much for your attention.
Okay, Jim.
So that's really neat stuff. I know you guys are predominantly pursuing whole exome sequencing, right? I was wondering what your thoughts were about, you know, non-genic regions, and are there any data to suggest that the answer
may lie there?
So I'm so glad you asked, because we are at this wonderful transition point. Up until the last year, if you had asked this, I would have said, well, yeah, we could think about that, but it's so expensive and so arduous to think about. We now have practical experience of going down this road with a number of disorders and finding that at the end of the day there are some patients who look either the same or, more interestingly, a little bit different, where we haven't found anything in the exome. These are the kinds of cases where we're really enthusiastic about going after whole genome sequencing. You know, if you were to go through all of OMIM and ask, after linkage has been found, how many well-mapped Mendelian loci are there where after going through everything you're left with nothing in your hands, that number is very small. We can probably count on two hands the number of loci in which mutations are exclusively outside the protein coding part of the genome. So that is not to say that there will not be more interesting mutations that cause large phenotypic effects outside the exome, but my own view is that we'll have to be somewhat selective in how we deploy that technology. But we're very enthusiastic, particularly in these cases where we think we've excluded the obvious mutations in genes and we're left with nothing, and we think whole genome is an obvious thing to do in that setting.
And then real quick, if I can just follow up: you kind of implied at the start that you suspected that we have 22,000 genes and there should be 22,000 diseases, right? But wouldn't you think, from our knockout experiments in mice, etc. I mean, do you think every gene will be associated with a disease?
Well, so as I indicated, we think 15 to 30% will be embryonic lethal, so we'll never see viable humans who have those mutations. But yeah, I think evolution is a pretty strong force, and most genes that
are not rapidly evolving and are, you know, on their way to extinction are being maintained in the human population because of selection. Now of course the challenge is that many of the phenotypes will only be displayed if you're in the right environment, right? And so I think this gets back to the question of, if you were to pursue this, who should you be sequencing? Well, if you sequence healthy individuals, you're going to have a very difficult time appending a phenotype to those loss-of-function mutations, should you find them. So my personal bias is to start with individuals who have significant medical illness, preferentially those who are not already carrying an obvious diagnosis that doesn't suggest some genetic factor that you haven't already measured. But those are biases. I think there are different ways to skin a cat, and I think it'll require exploration to see what works.
So Rick, it sounds like the program's been spectacularly successful, but I'm intrigued as you think about looking out to non-coding, and then I'm also intrigued by something else, and there could of course be many explanations for it. There are a lot of papers in the pipeline, a lot of hits that have been found and not yet published. And so it seems to me that in your whole study, if somebody wanted to do a psychological analysis, I imagine there are publication biases, and I'm thinking about interpretability. In a sense you and your colleagues are collecting very large numbers of variants in individuals who a priori, I presume, by everyone's reckoning, have a rather high likelihood of having a genetic etiology.
Yep.
And of course my sense is, the higher profile the publication, the greater the likelihood that one has found highly interpretable mutations. And so if you could speak to this question of the challenges of interpretability of the data that have arisen, in the sort of a
meta-analysis of the overall. Yeah, so I'll focus my comments on the congenital malformations of cortical development, because there we have a really large data set of patients from consanguineous union, and Murat Günel and Joe Gleeson have really contributed very large cohorts, largely recruited from the Middle East, with consanguineous union. And the state of that analysis is there's an excess of genes hit more than once, but there is a striking paucity of genes hit five times or ten times, where you'd say, oh, this is an incredible slam dunk. So you're more in the situation of realizing that there is likely very high locus heterogeneity. There are a lot of interesting genes in there. The best estimate is somewhere around 75 from Günel's data set, which I'm most familiar with. By analogy, Joe's set is probably of similar size. And thinking about how to deal with this is, I think, an interesting one, because it ultimately pits the investigators' interest a little bit counter to what might be the more public interest. And so one of the goals in the program is to get this data, to the extent possible, and that's dependent upon what's allowed by each IRB under which the sample was approved, into the public domain. So as you might imagine, the investigators want to end up with a great paper where they've done functional analysis all the way down the line, and that sometimes runs counter to getting it out quickly, which would be potentially of more general utility. There of course is nothing wrong with having something that nails something into the ground, that says there's no question that this is the disease gene, and the two Cell papers that came out back to back this month are a nice demonstration of very careful work. Two groups found nine families with the identical mutation, identical by descent, and independently did functional work that really characterized the gene and its biochemistry and its consequences. And balancing
those, I think, is a bit of an art form. Howard. Thank you very much. Both during the concept clearance stage and in the kickoff stage at council, we heard about the European efforts and the Canadian efforts and these people's efforts and those people's efforts, and the U.S. effort was talked about as, well, we'll take 200 of the genes, or we'll take some number, at that point. How has it played out now, in terms of both your interactions with the other places that are trying to do the same thing, and then also, you've alluded to maybe not really a network nature to this, where two groups within the same network published two separate papers at the same time finding the same thing? Maybe that was data that was from before the network started, but how's this thing going in terms of the social interactions? Yeah, so the sociology of this, I think, is really quite interesting. Most of it is built on personal relationships that investigators within the network have with investigators all over the world. As I indicated, there are over 400 investigators who are ascertaining patients that are coming in to the CMGs. Most of those are either one-offs of, I've got a small group of patients with a unique syndrome who I think are still left to have a diagnosis, or they have a cohort of a particular type of patient. The cortical malformations of the brain would be a nice example, where most of those samples have come from a small number of individuals who have collected cohorts of these samples. So it's largely a chance ascertainment, but it's chance that has been shaped by 20 years of prior experience of the investigators who are leading each of the CMGs, in that they already have existing relationships with people from around the world who typically have worked with us in the past.
So by its very nature it is not as coherently organized, where you would say, let's open OMIM and just go down the list, and we'll take A through L, you take M through Z, and we'll go that way. It is by its nature a bit of a disorganized free-for-all, and it was, I think, a happy occurrence that, because there's communication among the groups, the two groups said, I've got an interesting mutation in CLP1. I've got an interesting mutation in CLP1. Let's figure this out and submit the manuscripts together. Now you might say, well, everything should be just in one pot and it should all be shared. I think that argues against the way scientists are driven. People actually have personal interests. They pursue science because they're passionate about solving a particular problem. I'd be reluctant to try to suck all of that out and say, let's make all of the data immediately publicly available, and thereby take away anybody's motivation to actually finish projects and publish them. But I think that's a balancing act, as to how you strike the right balance with immediate, complete open access to all of the data. I can tell you, if our collaborators were required to make all the data publicly available immediately, it would be very difficult to get any of them interested in collaborating in the program. They of course want some time, and the negotiation has been how long people should have to look at the data, and the number is six months right now. Rick, that was very nice, first of all. I want to drill down a little deeper into the science of the somatic mutation. I mean, in each of the two examples you showed, we had a hook. One was X-linked, and the other was a skin phenotype, so you can see it, like the colored hamsters or something. How do we systematically get a handle on the role of somatic variation in human health and disease?
Given that many of the tissues are inaccessible, we tend to rely on circulating lymphocytes, and every time I see these slides and we talk about it in the group, it makes me shudder that maybe we're not seeing the genomic DNA as it was inherited. Do you have a feeling of how we get at this more systematically and deeply? Well, a great question. And of course the challenge is, among the limitations that we have in medicine, aside from cancer we rarely get tissue. I can tell you, in the renal field we almost never biopsy kidneys anymore unless there's a very specific question that you want to answer. And I think it is the major limitation of humans, which are otherwise a spectacular model system, if you will, because we understand human physiology. We run circles around any model system in terms of our level of understanding of human physiology, and also, of course, the ability of the subject to tell you what's wrong with them. Our mice are not very good at that. Our fish are really bad at that. And so we understand human phenotypes in ways that we can't begin to approach in others. But the major limitation is exactly as you note, that we do not routinely get tissue, and it's chance ascertainment when we do, and frequently those aren't the patients that we're most interested to get. And I wish there were a convenient way around that, but it's not obvious to me.
Rick, can you clarify something for me? You've gone on before about whole genome versus whole exome, when you gave the example of, if you go to OMIM and blah, blah, blah, you only need a few hands, or a couple of hands, to count all the cases that are, but isn't there an ascertainment bias there? I mean, there's still a lot of, wait a second, but in OMIM there are a lot of conditions for which we still don't know the genomic basis, thousands, right, or hundreds? That's right. And so do we know that those are going to reflect the ones where they are known? They may not. Maybe the reason they're not characterized yet is because it's harder to figure out what's wrong. So the question that I posed was, if you take the loci that have been mapped by traditional linkage approaches, right, recessives, dominants, X-linked, and say, how many of those, after going through exomes, do we not have an explanation for, the answer is extremely small. There are very few well-mapped loci where we don't know what the underlying basis is in a substantial fraction of the patients. There are always, for any gene that has protein-coding mutations that cause a disease, about 15 percent of the alleles that you never find by sequencing the coding region, and those presumably are deep intronic mutations or promoter and enhancer mutations, and those are quite plausible. But there are very few, right? Hereditary persistence of fetal hemoglobin is, you know, one example, and there are a handful of examples where the mutations are exclusively in non-coding regions, based on linkage studies. So I'm not holding my breath that way. So if I understand, you're excited about the prospects of doing some whole genome sequencing in a subset of these cases, but you're not optimistic that that's going to solve a lot of them, unless they're hiding under these de novo mutations that we never would have mapped, that escaped all the traditional mapping approaches? I
think those will not turn out to be a common cause of Mendelian disease. Do you see, I'm sorry. No, that's right. No, go. Do you see a potentially kind of interesting paradox between what you just said, which I don't doubt, and the fact that so many of the relevant GWAS hits are in non-coding, or seem to be non-coding? Oh, so I think that's a great point. So one potential explanation for that, so the paradox would be, if the GWAS hits are all pointing to Mendelian loci, we must be really bad geneticists not to have recognized the underlying Mendelian loci all these many years. And so that's one, and I think perhaps the explanation is many of the GWAS loci point to the only mutations that you could get at a locus that are compatible with survival. So one possibility is that these variants only occur in regulatory sequences because, if you had large-effect mutations, they'd be lethal and we'd never see them. And again, that's another limitation of human biology, that we're not very good at capturing lethal alleles. Bob, you want to just go up to a microphone at the table and push the button? You're allowed from the peanut gallery. So, Rick, it seems like a lot of these projects are international, and on your last point of data sharing and all that, I was just wondering, if this is the direction we're going, into global science and all that, is there anything we should be paying attention to about sharing of data, IRB, ethics review, any issues that have emerged that we should be paying more attention to? Well, it is certainly clear that the international standards for IRB are very different from what we have evolved in the United States, and our IRB issues have been significantly shaped, historically, by concerns over the ability to maintain insurability and employability, and those are not issues that typically arise in other parts of the world that have nationalized healthcare systems. So some of the issues are qualitatively different in the United States versus internationally. So there is heterogeneity, and I
think some of the heterogeneity is a social construct that probably should not concern us too much, but I think there are requirements that we try to keep the same standards internationally for protection of human rights that we would subscribe to in the United States. I think those are critical. On the data sharing side, I hope that the other groups get as far along as we are, because I think it will be of great benefit to us. As I mentioned, we've got a lot of interesting singletons where, again, as we said in the previous session, you look at a mutation and say, I bet that causes this disease, but you've only got one allele or one homozygous mutation to work from, whereas if somebody else found another one, that would strengthen the case enormously, or if two other groups found independent mutations in the same gene. And that's where I think data sharing will make a very big difference. In the CMG, the group at Johns Hopkins has come up with a matchmaker algorithm where you put your gene onto a server. You don't disclose anything about what the phenotype is. You just say, I've got an interesting mutation in this gene, or in this phenotype. Does anybody else have anything that matches that, that they care to exchange data with? And this matchmaking platform, I think, has some potential to provide a path forward where you could turn these singletons into things that might more quickly turn into genes where you really could say you've solved something. I actually had a slide about that in my director's report this morning, talking about this program you guys put out on the web. Other questions for Rick? A simple-minded question, probably not a simple answer: what's keeping us from completing the catalog here, defining all Mendelian disease? Yeah, so ascertainment, right? It takes scaling up the collection of patients, and dollars. And the good news is the cost keeps coming down, right? As you know, with the next generation of instruments, once we get to a point where
we can sequence on the new Illumina platform and do exomes, despite what they might want us to do with the data, it's going to cost about two hundred dollars to do an exome. There's still marginal benefit to doing that over doing whole genomes if you're looking for protein-coding variants, so I think this will be with us for a while, although I remain enthusiastic about whole genome. So the barriers are cost and patients. And I think in the CMGs we under-budgeted initially for ascertainment, with the hope that the web portal would provide a huge influx of patients, and we realized about halfway through year one that that simply was not going to happen. And so we really had to scale up our ascertainment, and that really has been ramping up, and now, going forward, our ascertainment is roughly matching our capacity. So we're in pretty good shape for collectively sequencing somewhere around six thousand exomes for each of the next two years in this program. But I do think it's something that we really ought to be getting on with. I think it clearly has to be part of the to-do list in genetics and genomics. Rick, was that your biggest disappointment, in your opinion, over the last two and a half years, that you didn't get a bigger, or a greater, number of people coming to you? Or was there anything else on that disappointment list? Because obviously lots of things have gone really well. No, I can't say anyone in the program was really surprised, because most of us who have done human genetics have realized that this is a retail business. It's built on personal relationships. People want to work with people who they know and have built a level of trust with, if you're going to send samples halfway around the world, and that's very hard to replace with a web portal. So I don't think that came as a great surprise, but I think the program is doing very well now on the recruitment side, and interesting things are
coming in or being discovered, so I've got very few complaints. Time to put you back on council, Rick. That would be a complaint. Thanks very much. Thank you, Rick.