Good morning, and thanks to the organizers for asking me to present, and thanks to Chris for a sort of challenging overview of where we currently stand. So for the next 30 minutes, I'm going to talk to you about the Centers for Mendelian Genomics. I thought I would start out with some interesting commonalities and contrasts I see between the CMGs and KOMP. Then I'm going to tell you where the CMGs stand right now, and then finish up with a quick overview of the burgeoning collaborative effort that we have going on. So I think that the CMGs and the KOMP program have a shared objective: to annotate the human, or the mammalian, genome with one or more phenotypes for every gene in the genome. Essentially it's a whole-animal functional test for every gene in the genome. This goal assumes that highly penetrant variants, likely rare in humans, can be found for the vast majority of genes. Now since these are rare, the search space has to be very large on the human side, so we have to look across really the world's population, and we have to be very effective in identifying those few rare individuals (you've heard a lot about rare disease already) who have some particular phenotype. As I always say to my residents, Mother Nature is trying to tell you something, and you have to be smart enough to figure out what she's saying. So it's basically this simple model, the classic phenotype model: some perturbation in the gene (and I'm talking largely about protein-coding genes here) leads to a perturbation in protein function and, in the intact organism, some phenotype, and it's this phenotype or set of phenotypes that we want to obtain for every gene in the genome. Now the CMGs and KOMP have this common goal, I would say, but they have different approaches, and the approaches I think are complementary.
First of all, in the CMGs we start with the phenotypes. There are some exceptions to this, but for the vast majority of cases we collect individuals, as I say, from anywhere around the world, and then have at it with modern genomic and genetic techniques to try to figure out the variants and the genes responsible for these phenotypes. In KOMP, of course, you're starting from the gene and determining the phenotype as a consequence of gross perturbation of the gene, and each of these approaches has its strengths and weaknesses. In KOMP I would say the strengths are that it's a bounded scope, that is to say, you know roughly how many genes there are, and you're well on your way, as we heard, to making a knockout for every one of these genes, and I see no reason why you will not continue until you've done that very last one. So you have real focus in the program. The weaknesses are that for the most part you're dealing with knockout alleles, and because of the pipeline, production-style approach to phenotyping, you have to balance all of the possible phenotypes you could measure against those that you can put into your pipeline in an efficient way. For the CMGs, the strengths are, I would say, that we see individuals with alleles of all types at any given locus. There is a tremendous allelic series, and very often different alleles have different consequences for the phenotype, so there's a lot to learn from that set of alleles. The weaknesses are that the individuals we study are rare, as we've already heard, and very often they're imperfectly phenotyped. That is to say, we get some patients that come into the CMGs who are very well phenotyped by some very interested investigator at a high-quality academic institution who somehow or another has the resources to support this phenotyping.
But more often than not we get referrals from a healthcare provider in a third world country that has neither the resources nor the aggregated group of experts to really do as much phenotyping as we would like on these individuals. So these are just the realities as I see them. The goal is a common goal, and the reality is that we have complementary approaches, which suggests that there's every reason for us to work together toward it. Now let me turn to the centers, with more detail about the Centers for Mendelian Genomics. We're now seven years into the project. When the centers were formulated, the goal was to identify all genes with highly penetrant variants and link those variants to phenotypes. The strategy varies a little bit from CMG to CMG, but in general we recruit well-phenotyped probands, and that may be a single family with one or two affected individuals, or cohorts of patients with a single, presumably homogeneous phenotype, from anywhere around the world. And then we typically perform whole exome sequencing, and a little bit more so now whole genome sequencing, on the relevant family members, that is, family members picked to give us the greatest bang for the buck in terms of balancing sequencing cost against information. And then we use genetics, family relationships, allele frequency data, functional predictions and so forth to try to sort through the long list of variants that we obtain, to figure out which variants are likely responsible for the phenotype that caused the person to be submitted. And then in the case of the CMGs we return the information to the submitters, so it's a real partnership between the CMG centers and the submitters, who can be anywhere around the world, as I've said. And we then urge the submitter to publish the results as quickly as possible, because it's important that the work get out.
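As an editorial illustration of the variant-sorting step just described (allele frequency, family relationships, functional predictions), here is a minimal Python sketch. It is not the CMGs' actual pipeline: the `Variant` record, the `candidate_variants` function, the 0.001 frequency cutoff, and the simple "present in all affected, absent from all unaffected" segregation rule (which assumes a fully penetrant dominant model) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified variant record for illustration only."""
    gene: str
    population_af: float      # allele frequency in some reference population
    predicted_damaging: bool  # result of some functional predictor
    carriers: frozenset       # IDs of sequenced family members carrying it


def candidate_variants(variants, affected, unaffected, max_af=0.001):
    """Keep variants that are rare, predicted damaging, and segregate
    with the phenotype under an assumed fully penetrant dominant model."""
    affected = frozenset(affected)
    unaffected = frozenset(unaffected)
    keep = []
    for v in variants:
        rare = v.population_af <= max_af
        # Every affected relative carries it; no unaffected relative does.
        segregates = affected <= v.carriers and not (unaffected & v.carriers)
        if rare and segregates and v.predicted_damaging:
            keep.append(v)
    return keep


if __name__ == "__main__":
    family_variants = [
        Variant("GENE_A", 0.00001, True, frozenset({"proband", "sib"})),
        Variant("GENE_B", 0.2, True, frozenset({"proband", "sib"})),
        Variant("GENE_C", 0.00001, True,
                frozenset({"proband", "sib", "father"})),
    ]
    hits = candidate_variants(
        family_variants,
        affected=["proband", "sib"],
        unaffected=["father", "mother"],
    )
    print([v.gene for v in hits])
```

In practice the inheritance model, the frequency threshold, and the choice of predictor would all depend on the family and the center; a recessive or de novo model would need a different segregation test.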
Increasingly, we now have a good pipeline to post our results on the CMG website, so that one can go and at least see that a particular gene has been tagged with a particular variant that we think causes a particular phenotype. Down at the bottom are the symbols; the four centers are the Broad, Johns Hopkins and Baylor together (Baylor-Hopkins), the University of Washington, and Yale. So this is the way I think of the CMGs. We're looking at the entire population of the world. We're searching for patients who have rare and unexplained phenotypes. So I sort of see each of these families as almost like a colony of cells on a bacterial plate. And each one is an example of Mother Nature trying to tell us something. And each one then is both a challenge, to figure out what is being said, and an opportunity to learn from this patient something about biology and medicine, and hopefully something that will be beneficial to the patient and their family going forward. So we tend to keep score using OMIM, and Chris already mentioned some of this, but as of early this morning OMIM lists about 8,500 Mendelian phenotypes. There are 3,961 genes that can house a variant that causes a Mendelian phenotype. Those 3,961 genes explain 6,259 phenotypes, as Chris mentioned. There are some genes which have variants that cause two or more very discrete phenotypes, such that a clinical geneticist would never have guessed a priori that they're due to mutations in the same gene. And Lamin A really has, by my count, somewhere between 13 and 15 discrete phenotypes. So some genes are really quite good at causing many different phenotypes. The biology behind that is a very interesting question. So that leaves us with, at least in OMIM, roughly 2,000-plus unexplained phenotypes. And as Chris also indicated, there's a steady influx into OMIM of new phenotypes.
Increasingly, I might say (I don't have the numbers), many of the new phenotypes coming in already come in with a gene. In the past the phenotypes came in as a clinical description in a clinical paper. Nowadays they come in more often than not as a clinical description plus some information about the genes and variants that are responsible. But the numbers are not the main thing; the point I want to make is that there's plenty of room for new discovery, and the number keeps going up every year. So before I give you the current state of the CMG results, let me just be clear about some definitions. First of all, we use the term Tier 1 to mean the situation where we are at least 95 percent confident that we have identified the variants and the gene responsible for the phenotype. The evidence that gets us to Tier 1 is multiple families with the same phenotype and rare variants in the same gene, or one family with variants in a candidate gene plus model organism data that recapitulates the phenotype, and/or robust mapping data and supportive functional data. Tier 2 is when we're quite confident but not as strongly confident as Tier 1, and very often that's a single family with a strong candidate based on variant frequency, gene function, pathway knowledge and so forth. So you will see on the slides that follow a breakdown of our discoveries in terms of Tier 1 and Tier 2. The other term that will be coming up is what we call phenotypic expansion. Many of the phenotypes that we deal with are rare, and there may only be a handful of patients in the medical literature. So if you have a set of three patients in the medical literature, and the phenotypic features of those three patients are whatever they are, it's unlikely that you will have plumbed the depths of the phenotypic abnormalities associated with this disease.
So as you find more patients you gain a more nuanced understanding of the phenotype, and we call that phenotypic expansion: the discovery of features not previously known to be associated with a syndrome or disorder. And that happens over and over again. Here is how it typically goes: we evaluate a submission and we say this looks like something new. We do the study and we find that it's not something new. It's an example of a known disease with a known disease gene, and we didn't recognize it before we did the study because the phenotypic features didn't overlap perfectly with the phenotype that we already knew. You learn a lot of biology, but you don't find new disease genes. So there's a review written by Jen Posey at Baylor that sums up where the CMGs are currently. The paper was accepted just a few days ago and will come out soon, but these data are from Jen's paper and represent the first six years of the CMGs. As I said, we're right now in year seven, Q3. And so I will only point out a few things here. On the left-hand side, you can see it's a pretty big project: all the CMGs together have nearly 60,000 samples from 22,000 families, and a large number of collaborators around the world, more than 2,000, coming from 82 countries. So it really is global in its reach. Having said that, there are major countries with huge populations that we have not touched at all. For example, China and India we really have not touched at all. And we have not done very much in Africa, and better in South America but not as much as one would like. In the middle, we've studied more than 2,000 phenotypes, and you can see the whole exome sequencing and whole genome sequencing numbers. And on the right-hand side, the publication number is up to 465, with lots of authors from lots of countries. So it really is reaching around the world. We're finding lots of phenotypes and we're making a dent in this problem.
There are a couple of reviews if you want to read about what we're doing: this one in the American Journal of Human Genetics in 2015, and then, as I already mentioned, the one by Jen Posey, which will be out shortly in Genetics in Medicine. One of the things that the CMG centers have done is that invariably each center has developed some sort of database to keep track of their patients and to evaluate their patients, and they've modified it to some extent for their own uses. So everything I'm saying is about all the CMGs, but in this case I'm going to show you a Baylor-Hopkins-centric database, which we call PhenoDB, and it's where the members of the Baylor-Hopkins Center put their submitted cases. We currently have 7,573 submissions. A submission may be one family or it may be a cohort, so there are more than 10,000 affected individuals in the database. We've sequenced around 9,600 currently, and the VCF and ANNOVAR files are in there. This database is searchable by all parameters (phenotypic features, genes, variants), and it's interfaced with OMIM, PhenomeCentral, and lots of other resources. So it's a very useful database for us. It gives a chance for cases that come in through Baylor to be searched by Hopkins and vice versa. The submitters have some say in how much of their data is available, and we're now trying to make the data in PhenoDB more accessible to the public. We've recently instituted something called VariantMatcher. So if you're interested in a particular variant in a particular gene, you can search PhenoDB and see if that variant has been recorded. And if so, you can see whose sample has that variant, and you will get the email of that person, and then you can reach out to that person to see if you have some commonalities.
There's also an educational instance of this database, which was created by Nara Sobreira, where she took well-sequenced CEPH families and then spiked each CEPH family with a particular disease gene or variant known to be inherited in a certain way and so forth. So we give these unknowns to medical students, and they work their way through PhenoDB like a computer game. It's a good way to attract their attention, and they learn a lot of genetics doing this. And we make PhenoDB in all of its varieties freely available. It's been downloaded by more than 450 centers. So if you're interested, let me know. It can be modified, and it might be useful for various aspects of the KOMP project. Now, what are the CMG gene discovery rates through six years? In the left-hand panel are what we call novel genes. These are genes that were not previously known to be disease genes. We currently have 1,252, and you can see the breakdown between Tier 1 in blue and Tier 2 in whatever that color is, and that makes up about 39% of our discoveries. Known genes with phenotypic expansion number about 400, and then there are known genes, that is, known disease genes where we didn't recognize the phenotype and ran them through the process; there are about 1,550 genes in that collection as well. So lots of new disease genes, lots of phenotypic expansion, and so forth. This is a key slide. The y-axis is the number of discoveries, and the x-axis is the years of the CMG project. The key data are in the reddish columns, which are the cumulative novel disease gene discoveries. And the point we want to make here is that the rate seems to be going up steadily, and currently it's running at about 200 novel discoveries per year, and one novel discovery per about 30 whole exome sequences. Now, this curve, or this slope, is interesting from a number of points of view.
First of all, at the beginning of the project, people thought that there might be a very limited number of Mendelian disorders, and that we would get to that asymptote pretty quickly, and that the curve would slope over as we finished harvesting the low-hanging fruit and really had hard work to do. We don't see evidence of that. On the other hand, we've gotten a lot better at it. We have a lot more resources and a lot better technology, so it may be that the continued rise of this curve reflects some mix of less low-hanging fruit and better technology, so that we're able to keep up the rate. I think it's probably some of each, but I can't say for sure. The other thing is, I think you'll quickly realize that at this rate of 200 novel discoveries per year, it is going to take us a long time to tag every gene in the genome with a phenotype. So we need to get better at this, and there are lots of improvements, as I said, so it may well be that this rate will pick up as we get better, particularly at identifying patients with phenotypes that need explanation. One of the tools that has been very helpful is in gene identification. Remember, I said one route into the Tier 1 category is to find multiple affected individuals with a similar phenotype and rare variants in the same gene. The same Nara Sobreira that I mentioned earlier, and Ada Hamosh at our place, developed this GeneMatcher program, where all a user has to do is register and enter genes of interest. If you're a mouse geneticist, enter the human orthologs of the mouse genes, or if you're a fly geneticist or a fish geneticist, enter the human orthologs of the genes that you're interested in studying, and enter those genes into GeneMatcher. You can also enter phenotypic features and other things, but you don't have to do that. All you have to do is put your gene of interest in there.
And if anybody else around the world has said that they have an interest in that gene, you will each automatically get an email saying someone else is interested in this gene, and here is their email, and you can then do with that match what you wish. We don't see the match. We get a readout of the number of matches, but we don't see the match per se. So it's, in that sense, confidential. Here are the data as of August 1st. There were 72,849 total matches. 9,487 genes were entered. About half of those genes have been matched, and so far there are 96 publications citing the value of GeneMatcher in catalyzing the ability to find a new family. So it's turning out to be very popular. It's used around the world. And it seems to be getting more and more valuable as time goes on. Now, here's a lot of data from the CMGs on one slide. The bottom line is that all of the different parts of this slide measure in some way the transition of these research results to the clinic. So we heard from Chris concern about the time it takes to move molecular discoveries to the clinic. I think this is going pretty quickly for the CMGs, in large part because the clinical whole exome sequencing labs use the known disease genes to help interpret their results very quickly. In panel A at the top (I'm sorry, there's some overlap there) is the number of molecular diagnoses for novel disease genes discovered by the CMGs in the Baylor diagnostic clinic over the last few years. And panel C below it shows the percentage of the CMG-discovered new disease genes already in the Genetic Testing Registry. So the vast majority of them are in the Genetic Testing Registry. So it's been a pretty quick migration of these discoveries into clinically relevant databases to improve molecular diagnosis going forward. Obviously this is an important responsibility for the program.
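As an editorial illustration of the gene-matching behavior described earlier in this passage (two users who register the same gene each receive the other's email, while the operators see only match counts), here is a minimal Python sketch. It is not GeneMatcher's actual implementation or API: the `GeneMatchRegistry` class, its `submit` method, and the notification tuples are hypothetical names invented for the example.

```python
from collections import defaultdict


class GeneMatchRegistry:
    """Hypothetical in-memory sketch of gene-based matching."""

    def __init__(self):
        self._by_gene = defaultdict(set)  # gene symbol -> set of emails
        self.match_count = 0              # the only aggregate operators see

    def submit(self, email, gene):
        """Register interest in a gene.

        Returns a list of (recipient, contact, gene) notifications, one in
        each direction for every existing submitter of the same gene.
        """
        gene = gene.upper()
        already_registered = email in self._by_gene[gene]
        others = sorted(self._by_gene[gene] - {email})
        self._by_gene[gene].add(email)
        if already_registered:
            return []  # re-submitting the same gene creates no new match
        notifications = []
        for other in others:
            notifications.append((email, other, gene))
            notifications.append((other, email, gene))
        self.match_count += len(others)
        return notifications


if __name__ == "__main__":
    registry = GeneMatchRegistry()
    registry.submit("clinician@example.org", "LMNA")
    for recipient, contact, gene in registry.submit(
        "mouse.lab@example.org", "lmna"
    ):
        print(f"to {recipient}: {contact} is also interested in {gene}")
    print("total matches:", registry.match_count)
```

The sketch matches on gene symbol alone, mirroring the point in the talk that entering a gene is sufficient; matching on phenotypic features as well would need additional fields.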
Now, for the CMGs going forward, there's going to be lots of continued work on gene discovery. There are lots of new sequencing technologies and so forth that I've listed here, and each of the CMGs is testing various aspects of these to try to improve their solution rate. In addition to gene discovery, we're learning a lot about mechanisms of Mendelian disease. It's not the simple system that Gregor Mendel described in 1865. It has a lot of similarities. He did a pretty good job, but there are a lot more nuances. We need a lot more functional studies, and this is one of the things that the mouse project makes possible. And we increasingly are interested in overlaps between Mendelian phenotypes and variants and those variants causing common complex traits. So there's a lot to learn at that interface, in that overlap. Now the CMG and KOMP collaboration began roughly two years ago. I remember having a lecture with Bob Braun and Steve Murray, and we all thought there was a clear win-win value. For the CMGs, information about the mouse models really helps us be confident in our discoveries, and allows us to evaluate high-quality Tier 2 genes and pick out the one that really is the responsible gene. It also gives us a resource for doing functional studies. And for the KOMP project, it helps you prioritize your long gene list into those genes which are most medically relevant. And hopefully the back and forth over phenotypic features of individuals with variants in a particular gene helps enrich both of our studies in terms of understanding the consequences of molecular variation. So this discussion led, last year, to a KOMP-CMG satellite meeting at ASHG in 2017. That was a very productive meeting. We all agreed to move forward. Working groups were established, and we've had monthly calls for two working groups since then.
Steve Murray and his colleagues evaluated the CMG Tier 1 and Tier 2 gene lists, asking which ones had already been knocked out by KOMP, which ones are already in the KOMP pipelines, and which ones KOMP had not yet touched and should be distributed among the KOMP groups. Currently I think there are 140 that Steve randomly divided among the three KOMP centers listed there for the knockout pipeline. The first mice of this collaboration are expected towards the end of this year. And obviously it will be ramping up thereafter. We've put in place mechanisms for the KOMP group to see the CMG Tier 1 and Tier 2 gene lists as quickly as possible. "As quickly as possible" means our progress reports, which occur quarterly, or perhaps they're monthly, or perhaps they're weekly. I can't tell. It feels like they're weekly, but I guess they're quarterly. And then that data goes to the CMG Coordinating Center, Tara Matise and her colleagues at Rutgers. And we're also now, in the supplemental projects funded to various KOMP groups, talking about selected high-value variants put in by gene editing: specific alleles which we think will have interesting biology and interesting phenotypic consequences, and be really high value for both of us. So that is moving well. This CMG production list is visible to KOMP PIs on iMits. It includes both the public and the private genes; private genes are ones for which the submitter won't let us post yet. And so we talk to the submitter and say, you know, this is a really good thing. You better take advantage of it. Both Tier 1 and Tier 2 are there, along with where those genes stand in the mouse pipeline, and other bells and whistles are being added that I think will make this platform quite valuable as we go forward. There's a CMG landing page that's been created with the features shown here, and it shows the status of the CMG genes, where we stand with those.
And this interesting table down at the bottom, which I'm sure you can't really see, shows a phenotype overlap score between the OMIM clinical features and the features in the knockout mouse. And there's the website down at the bottom. So going forward, I think we will continue to improve the informatic resources for this collaboration. As we go forward, we need to improve communication between the KOMP centers and the relevant CMGs. I think as we begin to see mice, the CMG investigators are going to be reaching out to the relevant KOMP investigators to talk about phenotypic features and so forth. A lot of this will be learned as we go. I think we have good momentum now, and I expect it will build more and more over the next year or two. And I think all of us involved in this would agree that communication and efficient data sharing are key to making this successful and optimizing its output. So with that, thanks for your attention, and thanks also to my CMG colleagues, who are listed there for you to see. Thank you.