All right. Good morning, everyone, and welcome to week nine of Current Topics. Last week, as you'll recall, Karen Mohlke from UNC discussed how we use genomic approaches to study complex genetic diseases. Today, we're going to turn our attention to the intersection between genomic technologies and the study of single-gene disorders, and how a comprehensive understanding of the variants that underlie Mendelian disorders has great potential for bringing new therapeutic approaches for the treatment of these diseases to the fore. It's my great pleasure to introduce to you this morning Dr. David Valle, who is the Henry J. Knott Professor and Director of the Institute of Genetic Medicine at the Johns Hopkins University School of Medicine, where he also holds the titles of Professor of Pediatrics, Ophthalmology, and Molecular Biology and Genetics. He's a very busy man, and he's also the Founding Director of the Center for Inherited Disease Research. For those of you who don't know, CIDR is an incredible national resource that works with investigators throughout the country to better understand which genes are responsible for a wide range of human genetic diseases. His own research program is focused on understanding the clinical, biochemical, molecular, and therapeutic aspects of a number of human diseases. And his work in this area has led to the conferral of many honors and accolades for his contributions to the field of medical genetics, including his election as a AAAS fellow, being named a diplomate of the American Board of Pediatrics, and his election to the Institute of Medicine of the National Academy of Sciences.
David also has a very strong commitment to education, having directed the pre-doctoral training program in human genetics at Hopkins and the very widely known short course in medical and experimental mammalian genetics that takes place at the Jackson Lab up at Bar Harbor, Maine, each and every year, which I think is on, what, the 53rd or 54th iteration now. Finally, he's been a very longstanding friend of the Genome Institute, having served as a member of NHGRI's National Advisory Council for Human Genome Research, and in his leadership role in NHGRI's Centers for Mendelian Genomics Program. We're very excited to have David with us this morning to give us his perspectives on the search for Mendelian disease genes, and more importantly, his vision for what lies ahead. So please join me in welcoming today's speaker, Dr. David Valle.
Thanks, Andy. Thank you very much, Andy. It's a great pleasure to be here. The escorts on campus this morning asked me if I knew my way around the campus, and I said, well, I'm down here about every other week. But it's always fun to come back, and right now I'm particularly enthusiastic about the title of my talk. We're heavily involved, as you will hear, in the search for Mendelian disease genes, as are many others around the world. But we're very eager to see where this process leads us and what the field will learn from it as we go forward. So I just got my slides here. I want to make sure I did all the right CME things. I only have one disclosure. I'm fired up about genetics. I don't have any other commercial deals. So at the start of my talk, what I'd like to convey, I hope, is that looking for Mendelian disease genes is actually a lot of fun. And if you look back in the history of genetics, I think you will see that both the fascination and the excitement with this process start from the very earliest times.
So for me, the fascination is exemplified by Gregor Mendel, who, as you all know, published in 1865 and spent a few years before 1865 trying to understand what accounted for the variation in pea plant phenotypes. And he was very eager to make the connection between the phenotype of the pea plant, here the color of the flower, and what he called factors. He didn't have the term gene. He didn't know about DNA. But he was a very careful and thoughtful mathematician. And he recognized that whatever these factors were, they were transmitted from generation to generation by a very simple, clear set of rules. And the rules he enumerated are actually used every day in genetics clinics around the world. What an amazing accomplishment. Now, the excitement for me is exemplified by this. This is a picture of T. H. Morgan, the so-called father of the fly room at Columbia University. He did get his PhD from Johns Hopkins, I must say. And Morgan had been looking for an experimental system in which he could do genetics, basically. And about two years before the date that this paper appeared, he became interested in the possibility of fruit flies. Now, you have to put yourself back. At the time he developed his interest, fruit flies and genetics were a non-entity. I mean, they just weren't on the radar screen. So for about a year or two, he worked hard to learn how to culture Drosophila. But like any geneticist, and like Mendel before him, he needed a phenotype. And he didn't have a phenotype. I don't know if any of you have worked in fly labs. I spent all four years in college working in a fly lab. And you get to understand the look and the feel of fruit flies quite intimately if you spend all of your days looking down the microscope or watching them fly around your head, around the room, and so forth. So this is the beginning of Morgan's paper.
And he said, in a pedigree culture of Drosophila, which had been running for nearly a year through a considerable number of generations, a male appeared with white eyes. Now to me, I can just imagine the excitement he must have felt when he saw, after a year of looking at all these flies, one that looked different, an outlier. And so of course, he either saw it in the bottle, or, as some legends say, he saw it on the wing and had to catch it and put it into the bottle. Normal flies, as you can see below here, have brilliant red eyes, and the mutant that Morgan saw had white eyes. And the rest, they say, is history. But to me, I just want to emphasize the fascination with trying to find and understand the biology that's behind these clinical phenotypes, and also the excitement that when you find it, it opens the door to a whole new area of biology and a whole new understanding of human health. So it's fun, that's the bottom line.
Okay, a little background: the burden of Mendelian disorders. There are about 8,000 Mendelian phenotypes enumerated in Online Mendelian Inheritance in Man (OMIM). I'll mention that again in a moment. About 65% are autosomal dominant, 30% autosomal recessive and 6% X-linked. Most Mendelian disorders, probably around 90% of them, present in the pediatric age range. That is to say, they're fairly egregious disruptions of physiologic development or homeostasis, so they make their appearance early in life. They have an incidence of about one in 200 live-born infants. So we think of them as rare disorders, but in aggregate they're quite common, and there are various surveys that have been done indicating that they account for about 10% of hospitalized children. And these have been done in such a way as to account for the fact that not all the surveys were done in very high tertiary care level hospitals. So they make up a major burden of illness. So it's an interesting and important topic.
Now, many of you, I think, are relatively new in science, and you may think that the way one clones a gene is that you go to the genome browser and you look up the name of the gene, and then you have it right there in about 30, 35 seconds, and you maybe make your own cDNA or maybe you dial up and order a cDNA and it comes in the mail the next day. And so that's the way we're doing it right now. I just want to remind you that prior to the Human Genome Project, disease gene identification was a slow process. Here are two prominent cloning papers: phenylalanine hydroxylase by Savio Woo, and the NF1 gene by Francis Collins and perhaps others who may be in this room. And it was roughly, if you were lucky, two to three years per disease gene back then. So for a PhD student, the thesis would be to clone this gene, and then the interesting biology would have to be done by someone else who came behind you. Here's just the kind of publication that we would do with Sanger sequencing, and we'd be so proud to link the exons together into a cDNA. So I hope you realize how different that was compared to what it is now.
Okay, so what changed, of course, was the genome project, and I think you all know, but I'll just reiterate, that it was conceived in the mid-80s. It was debated and argued for a few years as to whether or not it was a boondoggle, whether it would drain all the money out of R01-based research, and many other concerns. It finally started October 1, 1990, after the so-called Alberts Committee of the National Academy of Sciences brought people together, and after a lot of debate they decided it was a good idea. The human draft genome was complete in 2000. This New York Times headline is the headline that announced it. There were two competing teams that were pushing on getting this first, one led by Francis and one led by Craig Venter, and miraculously, after this many-year race, they completed the genome on exactly the same day.
So, announced on June 27th, 2000. And then of course we went on, and particularly the public project went on, to develop a high-quality reference genome, and Francis declared victory in 2003. So that really changed things in terms of the idea of how we could find genes, how we could find the genes responsible for Mendelian phenotypes and so forth. However, there was one little hurdle we had to get by. Once we got the reference genome, a number of people said, but wait, geneticists are interested in individual variation, and this reference genome is actually DNA derived from very few individuals. In the case of the Celera project it was mainly derived from Craig. So we had to look and see how much variation there is in sort of control or normal human genomes around the world. And the end result of that effort, HapMap and 1000 Genomes, is summarized, for me at least, here: an inconvenient reality. Each of us varies from the reference sequence by about three million single nucleotide polymorphisms, plus a large number of copy number variants. So if you're interested in finding the mutation or the variants that are responsible for a Mendelian phenotype that you are particularly eager to solve, then your challenge is to filter out all of the neutral variants, or all the variants that are not relevant to the phenotype, and find the one out of these three million SNPs plus CNVs that's responsible for the phenotype. And there's a real sort of pit that you can fall into here: you see a child or a patient with a phenotype, you sequence a genome, you find a variant in a gene that seems to you to make sense, and you make a little hypothesis, sometimes a straightforward hypothesis, sometimes a fairly elastic hypothesis, and you say this must be the variant.
But we know that you cannot do that, that you have to really use every tool in your armamentarium to figure out which is the actual causative variant. So I think the seminal paper in this effort is the one in which Mike Bamshad and his co-workers and Jay Shendure and his co-workers said, wait, we can use genomics and next-generation sequencing to solve this problem for us. Most of you know this paper, but I just want to reiterate it because it is a very important paper in my view. They did an experiment, and like any good experimentalist they started with a positive control. They said, let's take a rare genetic syndrome for which we know the causative gene, the gene is MYH3, and we will sequence an individual with this disorder and see if we can find that variant. We know what we're looking for, we just want to see how easy it is to find. So of course they sequenced one individual, and they looked actually just at the exome, and they found in this individual about 4,500 variants that were deemed to be possibly functionally significant. They said, well, this is a rare disorder, so let's eliminate everything that's in the common databases, and that got them down to about 500 variants. They said, okay, let's do a set of controls and eliminate everything that's in the controls; that got them down to 800 variants. They combined these two criteria or filters. They got down to 360 genes that had variants that were not in either of these two resources, and they said let's pick the ones that are really predicted to be damaging, and that got them down to 160 genes. That's still a lot of bench work to do if you want to work through those 160 genes. So they said, well, let's just do another unrelated case and go through the same exercise, except we'll combine the second case with the first case on the hypothesis that whatever the gene is, both of them will have variants in that gene.
Not necessarily the same variant, probably different variants, but they should be in the same gene. Well, that really helped. They got down to 10. That's manageable, and they said, well, that's great. Let's do a third. They got down to two. They said let's do a fourth, and they got down to one, which was the correct gene that they knew to begin with. Of course, some of you are thinking it's a good thing they didn't have locus heterogeneity, because that would have blown this exercise completely. But this says that if you use genomics and you use some clinical genetics, then this is a way to solve this conundrum of how to sort through all of these variants to get to the causative variant. I'll just say parenthetically, for the dysmorphologists in the room, I never was a fan of dysmorphology, but I think this result and many, many, many results since then really have verified that when dysmorphologists categorize disease, they're often right, even just simply from the appearance of the patient. This would not have worked if it had not been for the hard work of clinical geneticists and dysmorphologists who recognized this as a syndrome and said here are four individuals with this syndrome, whatever the story is. So that's great.
Now, at the risk of advertising our own work, we followed that paper and we said, look, that's sort of a brute-force genomics approach coupled with clinical genetics. It clearly worked. But let's not forget the rules of genetics, the rules that Mendel taught us. So we collaborated with David Goldstein at Duke. David, for the students in the room, came to give us a talk. During the lunchtime with the students he mentioned that he was interested in doing a whole genome on a patient that had a clearly Mendelian phenotype that was unexplained. Now most of the students were interested, but they didn't follow through. Fortunately for me, a student in my lab was interested and followed through.
And within about three days, this is not an exaggeration, within about three days she was in contact with David with a family and with a patient. And that patient is here; this is a patient of Julie Hoover-Fong's. And so the plan we came to was, well, let's just do the whole genome sequence on this patient. We will still have the problem that I've mentioned to you. But the family, while not big enough to do classical linkage studies and get a LOD score of three, certainly is big enough that if we do a linkage study we'll be able to eliminate large regions of the genome which clearly are not shared by all individuals with the phenotype. So we did just exactly that: we sequenced that patient, and we genotyped the other 12 individuals shown with the red asterisks. And we did a linkage calculation not with the aim of finding a statistically significant linkage peak, but simply of finding all the regions in the genome that could not fit this family and eliminating them. And that allowed us to eliminate about 98% of the genome. We got down to something like 278 genes in the areas of the genome that were left. And we started searching through these positive LOD score peaks. You'll notice they're quite modest. This, I think, is one, I can't read it from here. So sure enough, in the second or third peak we looked at, not the highest, there was a gene, PTPN11, which had a clear loss-of-function variant, a small frameshift mutation. We said this could be it. We did the segregation, but of course the linkage had already been done, so we knew the segregation was going to be okay. It was, and we got another family with the same phenotype and sequenced this gene, and sure enough, in that family the gene also had a loss-of-function mutation. So basically QED in about six weeks' time. So we would argue that you sort of couple genomics plus genetics, use genetics where you can, and you can develop a strategy to find the disease genes. So where do we stand now?
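The exome strategy described earlier (discard variants seen in common databases or controls, keep the predicted-damaging ones, then intersect the surviving genes across unrelated cases) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published pipeline: the function names, the record fields (`id`, `gene`, `damaging`), the variant identifiers, and the placeholder genes `GENE_A` to `GENE_D` are all hypothetical; only MYH3 comes from the talk.

```python
from functools import reduce


def candidate_genes(variants, common_db, controls):
    """Genes with at least one variant that is absent from the common
    database and from controls, and is predicted to be damaging."""
    genes = set()
    for v in variants:
        if v["id"] in common_db or v["id"] in controls:
            continue  # seen in unaffected people: unlikely to cause a rare disorder
        if not v["damaging"]:
            continue  # keep only predicted-damaging variants
        genes.add(v["gene"])
    return genes


def shared_candidates(cases, common_db, controls):
    """Intersect candidate genes across unrelated affected individuals.
    The variants may differ between cases; only the gene must match."""
    per_case = [candidate_genes(c, common_db, controls) for c in cases]
    return reduce(set.intersection, per_case) if per_case else set()


# Toy data (hypothetical identifiers; MYH3 is the gene named in the talk).
common_db = {"rs_common_1"}
controls = {"ctrl_1"}
case_1 = [
    {"id": "m1", "gene": "MYH3", "damaging": True},
    {"id": "rs_common_1", "gene": "GENE_A", "damaging": True},
    {"id": "a2", "gene": "GENE_B", "damaging": True},
    {"id": "ctrl_1", "gene": "GENE_C", "damaging": True},
]
case_2 = [
    {"id": "m2", "gene": "MYH3", "damaging": True},
    {"id": "b7", "gene": "GENE_B", "damaging": False},
    {"id": "d1", "gene": "GENE_D", "damaging": True},
]
cases = [case_1, case_2]

print(shared_candidates(cases, common_db, controls))
```

Each added case can only shrink the intersection, which is why the candidate list fell so quickly with the second, third, and fourth individuals. The same property is the weakness noted in the talk: with locus heterogeneity, a case whose causal variant lies in a different gene would remove the true gene from the intersection.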
Now I'm thinking, as I'm giving this lecture, that I usually recheck OMIM the night before, and I didn't do that for this lecture. I apologize. So these data are as of about three weeks ago, April 8th. And I mentioned that OMIM is the idea of my colleague, Victor McKusick, now deceased, at Hopkins, and the resource has been run quite ably by Ada Hamosh, the clinical director of the IGM, for the last dozen years or so. And so we go there to find out these kinds of numbers. About 8,700 Mendelian phenotypes on April 8th, and 3,181 disease genes. So those are genes for which the evidence strongly supports that variation in that gene, some sequence variant in that gene, causes a human phenotype. A big number, but as I put here, it's only 14% of the total. And there's still around in OMIM about 5,000 unexplained phenotypes. Or no, I'm sorry, that explains 5,000 phenotypes, and in OMIM there's still more than 3,000 unexplained phenotypes. Now, any of you that work in a genetics clinic know, and this has been the subject of many studies, that for about 30 to 40% of the patients that come into a genetics clinic, you do your thing and you don't get a diagnosis. So there are many, many unexplained Mendelian phenotypes, many, many that don't even have a name. We can't put a diagnosis on it. We say whatever this is, it's some constellation of clinical features and we have to figure out what the story is. So this gives you a sense of where we stand and also the magnitude of the challenge. And I think I will just mention here that this number is very interesting to me. There are sort of two schools of thought right now. One is that if you're smart enough and you look carefully enough, you will find a Mendelian phenotype for every gene in our genome, as a rough approximation. That in a way is thinking evolutionarily. Why is the gene there? What are the selection factors that keep it functional if it's not important for something, right?
So that says we've got 85% of the way to go here yet. Now there's another school of thought that says, no, I don't think there will be Mendelian phenotypes for every gene. I think there'll be some subset of genes, maybe it's only going to be three or four thousand, in which variation can have a strong enough effect on the phenotype that it will cause something that segregates as a Mendelian disorder. So that's an interesting long-term question. I don't know the answer to that. As I said, I favor the view that every gene has a phenotype, but I recognize that often that means you have to look very, very, very carefully for the phenotype. And you all can think of reasons why variation in a gene might not have an immediate phenotype. Maybe perfect redundancy in overlapping biological systems, but again, evolution would begin to take care of that, or maybe early in utero lethality. And I think we do need to look at the Mendelian causes of early in utero lethality. I think that's going to be less of an issue, though, because any disease gene that we know has a spectrum of mutations across a spectrum of functional consequence. So I think if a gene is very highly conserved, most mutations may be lethal, but there'll be a few that just modestly tweak the function of the gene and will result in viable phenotypes. For example, think of our ability now to diagnose males with Rett syndrome, which we didn't think was a biological possibility before. So just keep looking. This shows the rate of increase in identified genes over time. And it's going up. To come back to the argument, small number of Mendelian disease genes or big number of Mendelian disease genes, I'm a little surprised. I would have expected it to start to rocket up because of the development of the strategies that I mentioned. And it's not.
However, if you look at a recent paper by Kym Boycott in Nature Reviews Genetics in 2013, she actually makes this calculation; she worked very hard to do it, surveying PubMed and so forth. And here's the number of new disease genes discovered sort of pre-development of those strategies and post-development of those strategies. So that says that, as expected, there is a substantial bump in our ability to find these genes. And I'll just have to come back a year or two from now and tell you which way the story goes.
Okay, now let's talk about what the consequences are of finding Mendelian disease genes. And I'm sure you can all think of them. I just want to enumerate them so that we are all on the same page. So first of all, and importantly, it connects a particular gene to a particular phenotype. We have many genes still in the human genome that are relatively unannotated, and we really don't know what the function of the protein product is, or what would happen if that protein product was in some way defective. So these Mendelian disorders make these connections between genes and phenotypes. It also connects a particular phenotype to a biological system, that is, a set of proteins that work together in some system, and typically there's a hierarchy of biological systems. And it shows us how the system works normally and how the system works when it's perturbed by mutation. Again, very important for understanding human biology. It unravels locus heterogeneity. I'm sure you've heard of that; that's when one phenotype is caused by mutations in several genes. Think of retinitis pigmentosa, for example. And so it's a way to find all of the genes that can give this common phenotype that we're not able to unravel at the level of the clinical exam. It enables precise diagnosis and counseling.
This is, of course, critically important for our patients in terms of what the diagnosis is exactly, what the prognosis is for patients accurately diagnosed at that level, and what the counseling issues are around that diagnosis. And it's the first step in the path towards informed treatment. Now, we've often in the past sort of thrown off as the last thing, well, this will help us learn how to treat disease, but I hope I can show you later in the lecture how this knowledge really allows us to develop rational approaches to developing treatment, and that in those treatments, in my view, I don't have data, but in my view, we're seeing increasing successes. And I think this Mendelian disease gene effort will really support that in a very positive way. It turns out also to be a tremendous research stimulus, because you have a phenotype, you know the gene, you want to understand the biology, you want to understand the biology to develop treatment, you want to take that treatment back to the patient. And just as an aside, not on this slide, it also is a great educational activity for those of you that are in training. The ability to see a patient with an unknown diagnosis in the clinic and to do these kinds of studies to find the gene that's responsible, think back to Mendel and T. H. Morgan, will excite you. You'll become interested in that biological system, and you'll want to know what's wrong with it in your patient and what you can do about it. So it's very educational as well.
Okay, so I would give, as what I consider the gold standard for how rational development of treatment can really make a difference, the work of my colleague Hal Dietz at Hopkins, who, as I think almost all of you know, has spent the last 35 years, I guess 30 years, working on Marfan syndrome, and he was a co-author on the paper in Nature in 1991 in which the gene Fibrillin-1 was identified as having mutations responsible for the development of Marfan syndrome. Here's a man with Marfan syndrome: tall stature, long limbs, dolichostenomelia, pectus carinatum, sometimes scoliosis, dislocated ocular lenses, and of course, most critically, dilatation of the aortic root, which can lead to aortic root rupture and sudden cardiac death. So Hal and his team, which included Victor and Reed Pyeritz and Clair Francomano and others, Gary Cutting, found the mutation and showed convincingly that mutations in this gene, Fibrillin-1, caused this phenotype. The idea was that Fibrillin was an extracellular matrix protein that played some kind of structural role, think Kevlar or something like that, in the extracellular matrix, and that when this gene was mutated and the protein Fibrillin-1 was defective, the connective tissue was weakened and could tear and break in response to physical stress. That seemed to make sense for the aorta, and it was a little hard to figure out why they had long arms and long legs, but we sort of brushed it under the table and so forth. So we thought we had an understanding, and at that level of understanding, the idea of treating Marfan syndrome was quite hard to contemplate, because basically you would say, well, if you have to replace that protein, you'd have to get a normal amount of protein, or at least a functionally significant amount of protein, into the extracellular matrix throughout the body, and that's really a tall order when you think about it.
So time passed, work was done, Hal worked harder and harder and harder on this disorder, and the breakthrough really came when he realized that Fibrillin-1 actually has protein motifs that are like those of a protein in the extracellular matrix that binds TGF-beta. And it turns out that one of Fibrillin-1's functions, not a structural function, was to serve as a TGF-beta depot in the extracellular matrix. Most of the TGF-beta binds to these motifs, which keeps the free TGF-beta level low but is able to supply it in response to various perturbations. So the idea of the pathophysiology of Marfan syndrome switched from a weakness in the extracellular matrix to excessive free TGF-beta, essentially a TGF-beta storm, and that was amenable to small-molecule therapy, for example losartan, a drug already in use that was known to block TGF-beta signaling. And of course Hal went on to do all these studies. What's shown here is the measurements of the aortic root diameter, and these are on a set of young patients with very severe Marfan syndrome. This shows when losartan is started; you can see the rate of expansion of the aortic root and how that slope changes dramatically when patients go on losartan, and this is one particular, very severely affected patient. Essentially it stops the aortic root dilatation dead in its tracks. Amazing, absolutely amazing, and it turns out, I won't go into it, but many of the other phenotypic features of Marfan syndrome are blocked or improved by going on losartan. In fact, I recommend you all take a little losartan when you go home this evening. So I think it taught us that we should investigate each of these disorders carefully and try to understand the pathophysiologic mechanisms, and there may well be opportunities to intervene. And so this should have been clear for a long time.
This is sort of an attempt on my part to cite a few Mendelian disorders that really make predictions for druggable proteins. That is to say, if you perturb the function of this protein and it's a protein that can really affect the phenotype, maybe this is a node where, if you need to reduce the function of that protein, you can develop a small molecule to inhibit that protein. Or if you need to stimulate that protein, you can develop a small molecule to stimulate it. So of course the granddaddy of this would be familial hypercholesterolemia and the LDL receptor, which resulted in the development of statins, a very widely used class of compounds. Marfan syndrome we've already talked about. Familial amyloidosis, where drugs are being developed to improve the folding of these large extracellular molecules in the endoplasmic reticulum. The recent successes in cystic fibrosis of developing allele-specific molecules that are helping an increasing fraction of patients with cystic fibrosis. Narcolepsy: the inborn error, narcolepsy, led to the discovery of this gene, HCRT, and the development of a drug that perhaps will help with sleeping disorders. An interesting one that came out relatively recently was looking at patients with congenital insensitivity to pain, where the defect was in the gene that encodes this protein, Nav1.7, and if you can develop small molecules against that receptor, the prediction will be that you can have dramatic effects on relieving pain therapeutically. So these Mendelian disorders are pointing the way; they're predictive, they're pointing the way to targets that are pharmacologically interesting. I show this paper, which just came out a couple of months ago, I guess. And it's the same idea. It wasn't really discovered as a Mendelian disorder, but all I have on the slide is enough space to put the authors.
But this is a paper with Flannick as the first author and David Altshuler as the last author, where, after combining GWAS studies on 150,000 individuals, they recognized that for a particular gene, SLC30A8, which encodes a zinc transporter called ZnT8 that is expressed in the membrane of the granules that contain insulin in the pancreatic beta cells, if you have a loss-of-function mutation in this gene, your risk of developing type 2 diabetes, regardless of all the other variables, is reduced by roughly threefold. Roughly threefold, that's amazing. And I think all of you know that type 2 diabetes is rather prevalent these days. Developing a small molecule that could hit that target is likely to be important.
Okay, so that's some immediate benefits of solving Mendelian diseases rather rapidly. More interesting in some ways to me is what questions we could ask if we had phenotypes for, let's say, more than 50% of the genes in our genome. I showed you that we have about 15% now. So these are more long-term questions that have more of a biological basis. But I think these are questions that we will increasingly be able to answer if we have a really significant number of disease genes. So one of them is the relationship between genes and phenotypes. These are data from OMIM. And what I show you here is the number of disease genes versus the number of phenotypes. So most disease genes cause one phenotype. But you see the curve tails off here, and some disease genes actually have variation that is responsible for two phenotypes or three phenotypes. These are phenotypes that clinical geneticists have thought to be discrete phenotypes, and they were surprised when they saw that mutations in the same gene cause two discrete phenotypes. And there are actually a few out here that cause about 13 phenotypes: Lamin A, which is a protein in the nuclear membrane that is responsible for certain kinds of muscular dystrophies and progeria and other such things.
And two extracellular matrix proteins, one is an elastin and one is a collagen. So what is the biology behind this? Why is it that mutations in some genes can perturb the biological systems in such a way that it causes a discrete phenotype, and a different variant in the same gene, affecting the same protein, can cause a completely different, or at least what we currently believe clinically is a completely different, set of problems? Well, that's some interesting biology. What is that? How do we explain that? We've made a rough attempt at it. We failed, but we're going to go back when we have more disease genes. Here I just emphasize this point. For many phenotypes, we have one gene, many variants, one phenotype. Many inborn errors of metabolism are this way, and that may tell us something right there. In the inborn errors, of course, we not only see the clinical phenotype, but we have some very precise biochemical markers. So maybe our diagnostic precision is far greater on average in the inborn errors. That may be part of the explanation. So is this biology or more precise diagnostic mechanisms? But at the extreme end of the scale, which I already mentioned, it's one gene, many variants, many phenotypes. Lamin A is the classic, with 13 discrete phenotypes, shown here, one of them progeria. Here's another one, a form of muscular dystrophy. Clinical geneticists would not have thought that this person had mutations in the same gene as that person. So what's the biology there? Let me show you another example. This is an amazing example, also from Hal Dietz's work. So here's Marfan syndrome: long arms, long legs, hyperextensible joints, scoliosis, distension of the aortic valve, all of these features. And it's caused by loss-of-function mutations in FBN1. Here's another disorder; it's called stiff skin syndrome.
Actually, I was involved in taking care of, I think, this family for some time. They used to be followed by Victor. They have short stature, progressive fibrosis, stiffening of the joints, joint contractures. Fibrillin-1 missense mutations. Same gene, amazing. I would have bet my entire fortune, whatever that tiny amount is, that they don't have mutations in the same gene, and yet they do. Now in the case of stiff skin syndrome, all the mutations are in a particular domain of fibrillin. Fibrillin has many 8-cysteine domains. This is one of the many 8-cysteine domains. So that says there's some interesting biology there. Something about that domain, versus the other 8-cysteine domains in fibrillin, has the consequence that when it is altered by mutation, it affects the phenotype in a completely different way. So basically what you're looking at here is allelic heterogeneity, pretty amazing. So we need to understand that biology well. I personally think that part of the answer to the question is this: all of the proteins that have a lot of phenotypes also have a lot of domains, protein motifs. And so it may be that we start thinking about motif-specific phenotypes rather than gene-specific phenotypes. So here's another long-term consequence of finding a lot of Mendelian disease genes, and we've come to call it phenotypic expansion. Basically I define this as adding additional features to a known phenotype or adding additional phenotypes to a known disease gene. So medicine is very circular in its reasoning. You see a patient in the clinic who has all the features of a particular disorder. The doctor says, I think I've got the diagnosis. I'm going to do the test for that disorder. And sure enough, the patient has that disorder. But if you think about it, that's a very circular approach. We're not really looking agnostically at what might cause that disorder.
And if we look in a different way that is agnostic to cause, let's say surveying the genome for mutations, we will find aspects of the phenotype that we wouldn't have thought were related to this particular disorder in the past. It just expands your thinking. And that makes you a better doctor and a better biologist. So history shows we find the phenotypes that we know. New technologies, new ways of looking, expand the classical phenotype. The classic example of this for me is that Victor McKusick was interested in Marfan syndrome for a long time. He had a clinic with hundreds of patients with Marfan syndrome. In, I think, '52 or '53, homocystinuria, an inborn error in methionine metabolism, was described, I think, by Harvey Mudd and colleagues here. And those patients had many of the same phenotypic features as Marfan syndrome. Now, PS, homocystinuria is autosomal recessive instead of autosomal dominant. Victor went to his Marfan clinic and said, wait a minute, I better rethink this. And sure enough, in that Marfan clinic were several patients, several families, that had homocystinuria. We had just lumped them together because we sort of had blinders on to the idea that these subtle variations in phenotype would be important. So, new understanding from a gene-based view. This is an example that makes the point very dramatically to me. This is a patient that I currently follow, a girl who presented at two months of age with dilated cardiomyopathy. She had a family history positive for a sib who died of dilated cardiomyopathy at eight months of age. That child was worked up by our genetics group and by our cardiology group. And that child had a panel of tests for dilated cardiomyopathy genes. And it was negative. And the conclusion, and it's the conclusion that pediatricians always fall back on, was that this must be viral myocardiopathy. And the family was counseled that it was likely to be something that would not ever recur in their family.
They tested the Mendelian hypothesis and had this little girl here, the girl who presented at two months of age with the same phenotype. So this girl was on ECMO for several weeks, actually, and then underwent a cardiac transplantation at seven months of age, which was successful and restored her cardiac function. When we started our Mendel project, we thought that this was probably a good one to include in some of our pilot studies. And it turns out she's a compound heterozygote for mutations at the ALMS1 locus, a gene well known to medical geneticists as the cause of Alström syndrome. Now, in the medical literature at the time, a fraction of Alström syndrome patients were known to get dilated cardiomyopathy, but it usually began around 10 or 15 years of age. So that gene was not on the dilated cardiomyopathy panel. That's why it was missed at the beginning. And it is a very nasty disorder, not that there are any that are not nasty, that involves many, many systems: the central nervous system, the eyes, the ears, the lungs, the liver, the kidneys, and so forth. So this girl now has all the manifestations of Alström syndrome save the cardiac problem. So that's an example of phenotypic expansion. That is, early cardiomyopathy is a feature of Alström syndrome. We just didn't have that in our mindset at the time. If we had, perhaps we would have made the diagnosis on the first girl, and then the parents could have made their reproductive decisions with full information. Here's another patient that we've seen in our Mendel Clinic. He has these very striking skeletal features, a characteristic facies, and very prominent what is called sclerocornea, which is growth of the cornea over the iris, and sometimes the pupil, which leads to blindness. Intellectual functioning is normal. Our clinical phenotyping committee for the Mendel Disorders group looked at him and could not make the diagnosis.
They were particularly intrigued by this sclerocornea. We sequenced him, and we found mutations in a gene called SCARF2. They were loss-of-function mutations. That turns out to be the gene that's responsible for a disorder which I'm sure is right on the tip of your tongue, the Van den Ende-Gupta syndrome. And there were about three or four families in the literature, over the 12 months before we sequenced this young man, that had all of the same skeletal features but did not have sclerocornea. So we were focused on sclerocornea. If we had set that aside and focused on the skeletal features, we probably would have made that diagnosis and just tested SCARF2. But now we know that sclerocornea is another feature of Van den Ende-Gupta syndrome, expanding the phenotype. Okay, so that's all I'm gonna say about the phenotypes. Now, here's another sort of long-term question. What can we learn about the basic principles of disease from studying a large number of Mendelian diseases? We know that genes do not work in isolation. They encode proteins that work with other proteins in biological systems. Jacob called these integrons, a great word. It didn't stick, unfortunately. And he said that all of biology, or life, was essentially a hierarchy of integrons: all the proteins in one integron, interacting with another integron, and building the hierarchy that eventually ended up as an individual living organism. But we are not so good right now at understanding the rules of how these systems or networks work. And we could ask, in terms of disease, are all biological networks equally vulnerable to disease? And if not, what are the rules that govern which ones are more vulnerable and which ones are less vulnerable? Are all components of a biological system, that is, each component of that biological system, equally vulnerable? And if not, what are the rules that govern that variability?
Can we predict the consequence of variation in one component? Can we predict the behavior of the system in response to variation in one component? That's what I think we would like to be able to do. We can enumerate the biological system. Maybe it's so many genes, and we say, if a variation in this gene reduces the protein function by 50%, what will that mean for the function of that biological system? That's what we need to understand, in my view, and we can't. So let me just give you a rough example on the question of whether all components are equally vulnerable. On the left is the RAS/MAP kinase pathway, which has about 30 genes and a large number of phenotypes, certainly more than 15 different phenotypes, and no one gene predominates. That is to say, each of the components is represented by a set of rare mutations in a particular gene. On the right is another biological system, the one involved in biogenesis of the organelle known as the peroxisome, also about 30 genes. One to two, actually two phenotypes, that's all. And about 65% of the mutations are in one gene, PEX1. So you can see that these two systems are quite different at that level of comparison. Could we look at these systems a priori and predict that this system's gonna have lots of variation in lots of genes and lots of phenotypes, while that system's gonna have mainly mutations in one gene and phenotypes that are all similar? We can't do that. So we need to learn those rules. We made an effort a few years ago to learn those rules. Barton Childs and I collaborated with László Barabási, and Joe McInerney, who is in the room, participated in many of these discussions. And we wanted to understand what we could learn about disease writ large by having a collection of many, many disease genes.
When we originally decided to do this, we had this idea. László had just published a paper about an analysis of the yeast gene network, and in yeast they had knocked out every gene, so they know which ones are lethal and which ones aren't and so forth and so on. So we thought, great, we'll take all of the human disease genes, this was in about 2003 or 2004 and there were about 1,500 disease genes, and we'll find the yeast orthologs. Then we'll ask, for the yeast orthologs of the human disease genes, what was the phenotype in the yeast that were knocked out for that gene? And could we take whatever lessons we learned and bring them back to the human system? And the answer to that question was no, we couldn't, because by the time we got down to how many orthologs and so forth and so on, we had an n that was a very small n. We revisited it with this paper, and we could begin to group systems. And the main thing is I have the ability to show a slide like this with one of these neat network diagrams. And we understood some things about disease from this effort. I think at this time we had 1,700 disease genes, and we could see a statistically significant relationship between the number of connections, that is, the number of edges between a particular protein product and all other protein products, and the likelihood that mutations in that gene would be neonatal lethal. The hubs, the so-called hubs that had many connections, were more likely to be lethals. We had to go to mouse knockout data to make that estimate. But it's nowhere near what I think we could do if we had a whole bunch of these, maybe better than 50% of our genome, and could really learn something about the principles of biological systems and their relationship to disease. Are there unappreciated principles, and if so, what are they?
So then you might say, well, maybe I could look at one system that has been deeply explored, and to some extent that is true in the eye. This is a wonderful review written back in 2010 by Alan Wright and his colleagues. They decided to look at one well-defined system, retinal degeneration. That's an accessible system with an easily recognized phenotype, and at the time they wrote this review there were 185 mapped retinal degeneration loci, and for 146 of these the disease gene was known. So here's a system that's pretty well saturated. So what did they learn? This is the eyeball, the layers of the retina, and the photoreceptors and the pigment epithelium down here. One of the questions they asked is, what is the cellular function of these 146 photoreceptor degeneration genes? And the idea at the time was that, since most of these phenotypes were limited to the retina, these were probably genes that were only expressed in the photoreceptors. And the bottom line is that most of these genes actually show widespread rather than photoreceptor-specific function. So that says that the reason the phenotype is limited to the retina is that the retina has some special demand on this gene product and on the biological system in which this gene product works, and that special demand makes it particularly vulnerable to variation at this locus. It's not the other way around, that this gene is only expressed in this particular tissue. That's a lesson that we had already learned, right? If you went after phenylketonuria by saying, I'm gonna only look at genes that are expressed in the brain because the phenotype is mental retardation, you wouldn't find phenylalanine hydroxylase, which is expressed in the liver.
They also found, not unexpectedly, that these alleles were extraordinarily rare, and that when they tried to compare this set of genes with the genes that were being pulled out for age-related macular degeneration, the kinds of genes and the spectra of variants were quite different. So there seemed to be a discernible difference between the common complex disorder and the Mendelian disorders. They also were able to look at the rate of photoreceptor decline, and it turned out that for many of these retinal degenerations it fell on one curve, regardless of what the defect was. That led to some calculations as to the mechanism of cell death and the fact that it's a stochastic cell death with an increased rate in these disorders. So they found some principles. So I think if we can find lots of disease genes, we'll find some principles. I won't go into this, but the same argument can be made with what we're learning about cancer, or somatic cell disease genes, where my colleagues Bert Vogelstein and Ken Kinzler wrote this wonderful review. They summed the data from more than 3,000 tumors that had been sequenced by whole exome sequencing. They had a collection of 18,000 genes that had mutations, so a huge number of genes that have mutations, but by applying a set of criteria they came down to 125 driver genes, that is, genes in which mutation is capable of causing malignant transformation. These were then broken down into 71 tumor suppressors and 54 oncogenes, and they populated only 12 biological pathways. They took what seemed like an enormously complex biological problem and actually were able to distill it down into some key pathways, which begins to be sort of a manageable number for us to get at. So, a principle garnered out of a lot of mutation study. And so that's the second long-term question. The third question is, what is the relationship between rare Mendelian diseases and common complex traits?
And so there are two schools of thought. One is that the variation in the genes involved in common complex traits will be completely different from that involved in Mendelian disorders. The other is that they're gonna be the same genes with alleles of different functional significance. And there are examples of rare Mendelian disorders leading to an understanding that can be applied to common complex traits. Here, familial hypercholesterolemia led to a better understanding of lipoprotein metabolism and all of the variants that go into coronary artery disease. There is this paper, which has a great title: "A Nondegenerate Code of Deleterious Variants in Mendelian Loci Contributes to Complex Disease Risk." The authors are there. This is mainly from the University of Chicago and other places. They surveyed 110 million medical records looking for connections between the genes that are involved in Mendelian disorders and complex traits. The methods and the statistical analysis are totally beyond me, so I cannot comment on whether or not they're valid. But I do like their conclusion. Here's some of their data. The bottom line was that they interpreted their results as indicating that the genes identified as responsible for Mendelian disorders are also enriched among those genes that seem to be involved in common complex traits. So it suggests a biological connection, which I think most biologists would have guessed would be exactly that way. But this is an area that's gonna be studied much more intensely as we go forward. Now, in the last few minutes, let me just tell you about the Centers for Mendelian Genomics. Andy mentioned that. This is an NHGRI-sponsored project, or currently NHGRI and Heart, Lung, and Blood as well. There was an RFA put out to find as many patients as possible with unexplained Mendelian disorders and to use the tools that we've been talking about to find the disease gene and the variants responsible.
And three centers were funded: one at the University of Washington with Debbie Nickerson as the PI, one at Yale with Rick Lifton as the PI, and at Hopkins, we partnered with Baylor, with me as the PI and Jim Lupski as the co-PI, to form what we call the Baylor Hopkins Center for Mendelian Genomics, BHCMG, and there is our website. On the left, you'll see the website for the community portal for all three centers. So the goals: identify Mendelian phenotypes associated with more than 50% of our genes, improve diagnosis and inform pathophysiology, increase opportunities for treatment, increase our understanding of disease principles, and education. These are all the things that we've been talking about during the course of this lecture. Now if we want to find all of the disease genes that affect Homo sapiens, we have to look at the entire population of Homo sapiens. We can't just look in Baltimore or the mid-Atlantic region or the United States. We have to look around the world. In my mind, I think of it like this: the world is sort of like a Petri dish, and we have to look around the world to find all of those rare families and individuals that are affected with a Mendelian disorder. It may be a known Mendelian disorder, or it may be a constellation of symptoms that we've not even been able to diagnose yet, but from the segregation in the family or for other reasons, it looks like a Mendelian phenotype. We sample those individuals and do our thing and hopefully, with some reasonable rate of success, find the genes that are responsible. We've been looking around the world. This sort of sums up where we were almost a year ago, but we certainly now are getting samples from, I think, somewhere around 45 or 50 countries around the world, with many more in the pipeline right now. The overall strategy is to find well-phenotyped cases and families. This is obviously, if you think about it, critical, right?
We don't wanna be using our resources over and over again to study individuals who have some disorder that clearly is due to mutations in a gene that we've already hooked up to that phenotype. So we have to do very rigorous phenotyping. Then we perform whole exome sequencing on relevant family members. If it's a one-family case, right now we're sequencing about three to four individuals in a family to have a reasonable chance at solving the disorder. We use family relationships, allele frequency data, functional predictions, model organisms, and functional studies, that is, real experiments, to identify the responsible genes and variants. And we return the information to the submitter, so that the submitter, who's done all of the hard clinical work, gets some academic benefit from this endeavor. If you're interested, come to our website and we'll tell you more about it. We've developed a web-based tool that we call PhenoDB, and by we, I mean Ada Hamosh and her colleagues, that allows for rapid and efficient entry of families or cohorts. The remit was for a person who knew the family to be able to submit all the information in less than two minutes. It probably takes a little bit more than that, but it's certainly less than five minutes. And if you've used it a few times, you get close to two minutes. It provides unique identifiers, and it provides a uniform set of clinical features so that our review committee can look at it and say, this is a new phenotype, or this is something that we clearly already know. It accepts image data, so x-rays, photographs, and so forth. It's searchable, and it organizes the phenotypic features in a standard format for easy review. We've recently added an analysis module, and all of our analysis is done in the same tool, so that we can easily go back and forth between the clinical phenotype and the analysis. This is the pipeline for the Baylor Hopkins Center. It's a similar pipeline at Yale and at UW. But here's the submitter.
The submitter comes into PhenoDB and submits the information. The information is reviewed by our phenotype review committee. We have a sample acquisition coordinator who goes back and forth between this committee and the submitter to help the submitter. If it's accepted, and the ELSI review committee says the consenting is okay, then it goes through sequencing and analysis. Ultimately, the data go to dbGaP, but they go first to the submitter, and the submitter has a reasonable period of time to write up a manuscript with that information before it becomes public in dbGaP. And at Baylor Hopkins, the sample acquisition coordinators are Corinne Boehm at our place and Samantha Penney at Baylor. Here's the front view of PhenoDB. Check it out. MendelianGenomics.org. You just have to put your name, your email, and your password in there, and then you hop through and you can submit your patients. You might wanna try it out. When I made this slide, we had accounts in more than 35 countries, and it's many more than that now. The analysis, as I think we all know now, is a little bit more of a tricky issue. It's a work in progress. We're getting better and better at it. For these disorders, it turns out you have to give individual attention to a particular patient and their family. Success is variable in terms of mode of inheritance. Autosomal recessive disorders are the easiest, autosomal dominant are the hardest. I think you can appreciate why that might be so. It also depends on the number of samples available from the family and how good the phenotype is. And we think, and that's certainly been the case, that deeper and smarter levels of data mining will improve our success rate. We have a paper in preparation right now by Nara Sobreira talking about the analysis tool. But let me give you an example of how it works. Our sequences are all done at the Center for Inherited Disease Research at the Bayview campus, which is a joint effort by NIH and Bayview with NHGRI taking the lead.
CIDR sequences the exome and gives us an ANNOVAR file on a 51-megabase exome capture. We filter the variants according to frequency, and we plug in the inheritance patterns that we think are appropriate. We can change that at will, whether it's homozygous autosomal recessive, compound heterozygous, proband only, maybe a new dominant, so forth and so on. In an example case, that would get us to 85,000 variants. Only heterozygotes, let's say we think it's a compound heterozygous case: 54,000 variants. Only those that affect coding and splice sites: 12,000 variants. Exclude synonymous variants: down to 6,000 variants. Exclude those variants that are in dbSNP 126, 129, 131: 750 variants. You can dial in each of these filters as you want. Exclude those that have a minor allele frequency greater than 0.01 in the EVS database out at U-Dub or in 1000 Genomes: we're down to 450 variants. Exclude those in our control database: we're down to 260 variants. Genes with two hits, remember the model, for this case we thought it was an autosomal recessive compound heterozygous model: we're down to 20 genes. So that's pretty helpful. Then we start looking at the biology of those genes, what's known about those genes in model organisms and so forth and so on, using links in the analysis tool to OMIM, MGI, PubMed, expression networks, and so forth. These analyses can be done in less than five minutes. So it's great. Now, once you think you've got it, what is the burden of proof? That is a tricky question. Right now, as far as we are concerned, there's nothing that beats the value of an unrelated case with mutations in the same gene and a similar phenotype. Recall the experience with metachondromatosis, the large pedigree I showed you. That, and certainly searching the literature to find another family.
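The filtering cascade described in this passage can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the center's analysis tool: the `Variant` fields, the function name `compound_het_candidates`, and the gene labels are all hypothetical, and real pipelines work from annotated variant files rather than hand-built records. The order of the steps follows the talk: heterozygotes only, coding or splice-site effects, no synonymous changes, not in dbSNP, minor allele frequency at or below 0.01, not in the in-house controls, and finally genes with two remaining hits.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record for one annotated variant."""
    gene: str
    zygosity: str      # "het" or "hom"
    effect: str        # e.g. "missense", "nonsense", "splice", "synonymous", "intronic"
    in_dbsnp: bool     # present in the chosen dbSNP builds
    max_maf: float     # highest minor allele frequency across reference databases
    in_controls: bool  # seen in the in-house control database


CODING_OR_SPLICE = {"missense", "nonsense", "splice", "synonymous"}


def compound_het_candidates(variants, maf_cutoff=0.01):
    """Apply the filters in sequence, then keep genes with at least two hits."""
    steps = [
        lambda v: v.zygosity == "het",            # compound heterozygous model
        lambda v: v.effect in CODING_OR_SPLICE,   # coding and splice sites only
        lambda v: v.effect != "synonymous",       # drop synonymous changes
        lambda v: not v.in_dbsnp,                 # drop known dbSNP entries
        lambda v: v.max_maf <= maf_cutoff,        # drop common alleles
        lambda v: not v.in_controls,              # drop in-house control variants
    ]
    kept = list(variants)
    for keep in steps:
        kept = [v for v in kept if keep(v)]
    hits = Counter(v.gene for v in kept)
    return sorted(gene for gene, n in hits.items() if n >= 2)


example = [
    Variant("GENE_A", "het", "missense", False, 0.0, False),
    Variant("GENE_A", "het", "nonsense", False, 0.001, False),
    Variant("GENE_B", "het", "nonsense", False, 0.0, False),    # only one hit
    Variant("GENE_C", "het", "missense", False, 0.0, False),
    Variant("GENE_C", "het", "synonymous", False, 0.0, False),  # second hit is synonymous
    Variant("GENE_D", "het", "missense", True, 0.0, False),     # known dbSNP entry
    Variant("GENE_D", "het", "splice", False, 0.0, False),
]
print(compound_het_candidates(example))  # ['GENE_A']
```

Each filter is a separate step so that, as in the talk, any one of them can be dialed in or out, and the two-hit requirement is applied last because it only makes sense for the recessive compound heterozygous model.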
We've built a tool, it's not on here, I don't think, called GeneMatcher, because it's so important to find other people that are interested in the same gene. So it's called GeneMatcher, all one word, genematcher.org. You can go in there. There are a couple of optional questions that you don't have to fill in. Go down to the third question, which says, what genes are you interested in? You punch in your gene symbol, and you're done. Then, if someone else three months later from halfway around the world is interested in that same gene for whatever reason, either a basic biologist or a clinical geneticist or both, they come to GeneMatcher, they punch in their gene list, and all of a sudden the two of you have a match. That is, you're both interested in the same gene, and each of you will get an email saying, here's somebody else who's interested in the same gene. It's up to you to contact them and find out what the story is. We have had really useful matches: between two families with a very rare disorder; between one family with a rare disorder and a mouse model, which made the connection a very perfect fit; and between one family and a fly model with orthology to the human gene that looks very good, a similar phenotype to the extent you can say that between flies and humans. So it's very useful. I urge you to dial into it and give it a try, genematcher.org. Where do we stand? At the end of two years, here's the track record for Baylor Hopkins. We'd collected 4,326 samples, we'd completed 609 probands with complete analysis, we'd recognized 189 new disease genes, and we'd found 121 known disease genes that often had phenotypic expansion, which is why we didn't recognize them initially. So we like this progress, but we've got to do a lot better. I'd like to see it ramped up by 5x or 10x over the next two years, as would our colleagues at NIH, including Lu Wang, who's the project manager for this. Does that mean my time is up?
Is that the little... So, for those of you that are new to Mendelian disease, most textbooks have these three principles: pleiotropy, penetrance, and variable expressivity. Pleiotropy begins to go away because we understand biology better. Penetrance we still don't really understand. That is when two people have exactly the same allele, and one of them has the disease and the other doesn't. Even if we look very hard, they don't have it, so that's incomplete penetrance. So what is the biology behind that? Obviously, if we could understand that, that might lead to a lot of phenotypic insights, or a lot of therapeutic insights, I'm sorry. So this is a big question, and I hope that as we get more disease genes and better biology, we'll be able to solve that one. Variable expressivity, that is, the same disease gene with phenotypes of different severity, or perhaps some different features, we're beginning to understand much better. We are finding, interestingly, that individuals will have not one but two Mendelian diseases at an appreciable frequency. So you would say, gee, I never saw a patient with this disease who has this feature, and it turns out, now that we can explain it, that they actually have two diseases. So that's something we're learning. Now, let me just close with what is really the take-home message. That is the predictive power of Mendelian disease. This is a phenotype that we've solved recently, and I hope in these last two slides I can give you a sense for why it's so powerful and why it's exciting. So this is a disorder, perhaps not well known: spondylometaphyseal dysplasia with cone-rod dystrophy. It's a rare disorder, but if you have the phenotype, it's 100%. So it's an important disorder, not only for those folks who have it, but also because it has some interesting biology. They have postnatal short stature, and they develop loss of visual function.
They have a cone-rod degeneration with macular changes, as shown in the panel on the right. In some patients, this develops in the first few years of life. In others, it doesn't develop until middle age. It's a rare autosomal recessive trait. We looked at three unrelated probands, and our analysis showed that they were segregating two missense mutations, A99V and P150A, in a gene known as PCYT1A, located at 3q29. That gene encodes phosphocholine cytidylyltransferase, the key regulatory enzyme in the major pathway for phosphatidylcholine biosynthesis. Now, phosphatidylcholine is the major membrane structural lipid in almost all cells of the body. So if you said to someone, what is the thing that unifies skeletal dysplasia and retinal degeneration, not many people would have come up a priori with this gene or even this pathway. And if you turn it around and ask what loss-of-function mutations in this gene in this pathway would cause, you would guess some global disaster, basically, because all cells require this compound. Here's the pathway. You start here with choline in the extracellular fluid. There's a transporter, then a kinase, and you make phosphocholine. That's the substrate for this enzyme, CCT-alpha, that's encoded by the gene I mentioned. It takes CTP and phosphocholine and makes CDP-choline, and then that's converted to phosphatidylcholine. Okay, great. So we've now looked at several other families, and they all have loss-of-function mutations in this gene. Some patients in Brazil with mutations in the same gene were published at the same time our paper came out, a few months ago. And we now have more patients that have come our way, including one who had been in the clinic at Johns Hopkins for about 20 years, and we just figured out what the problem was. So here are the two tissues most involved, a cross-section of the retina and a cross-section of the growth plate of long bones.
So any of you that have taken histology at some time in your career, I started taking it in high school, will recall these long lines of chondrocytes growing, lining up on the growth plate, migrating down, and eventually being calcified. That's how long bones grow. And then of course here is the retina, with the photoreceptors sticking into the pigment epithelium. The photoreceptors are at the outer layer of the neural retina, and these membrane discs are where the light-sensitive pigment is. A certain number of them are shed every morning before sunrise, and they are replenished constantly. Well, obviously if you sit down and think about it, one thing that's apparent is that both of these cell types have a tremendous demand for membrane biogenesis. The photoreceptors do because they're constantly shedding their outer membrane discs and replacing them every day. Now what about the chondrocytes? Well, I started taking histology in high school and I've looked at this picture for that many years, I won't say the number of years. And it never occurred to me that what's going on, in addition to this wonderful lining up, is that these cells are getting bigger. See, they start off at this size and they end up at this size. That point was made in a paper by Cliff Tabin et al. from Harvard earlier in 2013: these cells, in the course of their biology, undergo a 30-fold increase in volume. So then I went to my colleagues who are mathematically smarter than I am and asked, how much of an increase in plasma membrane does it take to encompass a 30-fold increase in volume? If you assume a perfectly spherical shape, it's about a 10x increase in membrane to enclose that 30-fold increase in volume. So we're working right now on the idea that these cells have a really unusual demand for membrane biogenesis and perhaps also quite limited access to the other pathway that makes phosphatidylcholine.
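The arithmetic behind that 10x figure can be checked with a short sketch. It assumes, as stated in the talk, a perfectly spherical cell; the function names and the starting volume are illustrative only. For a sphere, volume scales with the cube of the radius and surface area with the square, so surface area scales as volume to the two-thirds power.

```python
import math


def sphere_surface_area(volume: float) -> float:
    """Surface area of a sphere with the given volume."""
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * radius ** 2


def membrane_fold_increase(volume_fold: float, start_volume: float = 1.0) -> float:
    """Fold increase in surface area when a sphere's volume grows by volume_fold."""
    before = sphere_surface_area(start_volume)
    after = sphere_surface_area(start_volume * volume_fold)
    return after / before


fold = membrane_fold_increase(30.0)
print(round(fold, 2))  # 9.65, i.e. 30 ** (2 / 3), roughly the 10x quoted
```

The result does not depend on the starting volume, since only the ratio matters. Real chondrocytes are not perfect spheres, so this is a lower-bound style estimate rather than a measurement.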
The alternative pathway, which runs through phosphatidylethanolamine, has been looked at in chondrocytes, and they do not express it. We don't know about the retina yet. So here's a rare Mendelian disorder that's predicted a relationship between two disparate tissues. It looks like it's going to be a lot of interesting biology. It also predicts that this pathway may hold other retinal degeneration genes and other skeletal dysplasia genes. So these rare disorders have not only informed these patients and their management but also taught us a lot about biology and opened a lot of doors. And that's going to be going on over and over and over again. Thank you very much for your attention. Glad to answer your questions.
How do we sequence? So for the Centers for Mendelian Genomics, it's purely up to the centers. I think it's safe to say, however, that all three of us, certainly we, are using Illumina entirely. Now, right now we are still validating every presumed disease mutation at our place by Sanger. I think other places may be beginning to use Ion Torrent as an orthogonal method to validate the mutations. Right, yep.
Great talk, David. Thanks. I have sort of two interrelated questions. So if you look across the human genome, the rate or frequency of polymorphisms is obviously variable, and moreover, if you look at the hotspots within genes for mutations, that varies too. Do the Mendelian phenotypes or genes that you and others have identified fall predominantly within these kinds of high-frequency mutation regions, within either genes or intergenic regions?
Yeah, that's a great question. And this question and the thinking behind it are described very nicely in a recent paper by David Goldstein. So I urge all of you to search for it. I think it's in PLoS Genetics. What David did is he looked at all of the genes known in the genome and calculated a rate of polymorphism in each of the genes. And there's a spectrum.
Some genes have a huge amount of polymorphism, and you can calculate it as the total number of polymorphic nucleotides, or you can calculate it as polymorphic nucleotides per kb to normalize for gene size. Some genes have a lot and some genes have almost none. So there's a real spectrum. And I think David called that genic intolerance or something like that. We use that a lot, so that when we're down to a few genes, one of the things we ask is: is this a gene that has a lot of variation in it that seems to be well tolerated? That is to say, it's more or less neutral variation and it's at high frequency in the population. Or is this a gene that has very little variation, one that looks like it's under very strong selective pressure and probably is less likely to tolerate variation? And certainly that kind of thinking and that kind of analysis, for us at least, is quite helpful. It's not so helpful that we get a number and can make a cut point and throw out everything above that cut point. So it's not black and white, or at least it hasn't gotten to that point for us yet, but it certainly helps us as we consider each gene. So let me put an example on it. Everybody may have heard of the gene called titin, which encodes a cytoplasmic protein in muscle. Everybody has mutations in titin, and there are papers showing that mutations in titin were found in patients with cardiomyopathy. Therefore, the argument goes, titin mutations must cause cardiomyopathy. I'm sure there are a few mutations in titin that do cause cardiomyopathy, but the vast majority of titin mutations don't cause anything. So we have to incorporate that thinking into our evaluation, and that's some interesting biology. We need to understand that biology. Go ahead, please.
Thank you so much for this great talk regarding variable expressivity.
There is something we call genotype-phenotype correlation in Mendelian disorders, but we have several Mendelian disorders that really do not have it, and it goes beyond variable expressivity. So how do you explain this, and are there any new approaches for that?
Right, so of course, from the earliest days of genetics, geneticists have been interested in the question of genotype-phenotype correlation. And the question makes the point that for some disorders, there is a very tight correlation between genotype and phenotype. I would cite achondroplasia. 98.5% of patients with achondroplasia have the same missense mutation. And so you can recognize that phenotype from a distance. Your clinical exam is essentially the equivalent of an Illumina HiSeq 2500, and much cheaper. So that's a very tight genotype-phenotype correlation. For other genes, in fact probably the majority, the genotype-phenotype correlation gets looser and looser and looser, to the point that it's almost non-existent. So for example, the X-linked disorder adrenoleukodystrophy can cause a very severe childhood phenotype, cerebral ALD, in which the neurologic symptoms start and the patient usually proceeds to a neurologic death in less than two years. And in the same family, males will present at middle age with spasticity of the lower extremities. Neurologists call that adrenomyeloneuropathy, and they didn't think it was the same disorder, but it's the very same mutant allele, in the same family, causing those two widely different phenotypes. We don't understand that yet. The same is true even in a disorder like cystic fibrosis, where you have a large number of patients, or in sickle cell disease. Sickle cell disease is essentially a monoallelic monogenic disorder, if you think about it. Everybody has one allele. They're all homozygous for exactly the same allele. And yet the clinical variation is quite wide.
So what are the factors that account for that variable expressivity? One baby with sickle cell disease will present with pneumococcal pneumonia and hand-foot crisis and not make it to the end of the first year of life. And another individual with sickle cell disease will walk in at age 25 or 30 with a little arthritis, and it turns out to be some bone problems related to sickle cell disease. So we don't understand all of that variation yet. Some of it clearly is environmental variation. You see clear environmental effects in cystic fibrosis, for example. But we see fewer opportunities for environmental effects in a disorder like sickle cell disease. That's very interesting biology. We don't understand it yet. Very interesting set of questions.