So, hi everyone. My name is Mike Brudno, and I'll be doing the first module of this course. I'll give you a general introduction to genetic disorders: what goes into collecting data about the patient, and then how that data gets analyzed in the context of the genome, which you will hear a lot about in all of the other modules of the course as well. As an overview, I'll give a very brief introduction to genetic diseases (here I'm going to be talking about germline diseases, not cancers, which are somewhat different beasts). Then I'll talk about phenotyping, what goes into actually describing patient features; then identification of candidate variants, what goes into identifying which of the variants in somebody's exome or genome actually causes their disease; and finally something called matchmaking, which is used to establish which variant is actually causative. Sometimes you know, and sometimes you actually need to run a little dating game between patients with rare diseases, and we'll do that as part of the lab. So, very broadly, about genetic variants: single nucleotide variants, indels, and CNVs happen naturally in every single generation. Every new generation has some number of variants that weren't present in the previous one. For single nucleotide variants it's, ballpark, 100, give or take, and it depends a lot on the father's age. Most of these variants will do absolutely nothing. They're benign: they don't contribute to any disease, and either they have no phenotype whatsoever or they cause small phenotypic differences like the ones we see among the people in this room. Different hair color, different skin tones, different heights: all of these things that we understand as completely normal. A small fraction of the variants will be disease causing.
And they will either cause a disease on their own or cause a disease only if you have two bad copies of a specific gene, which is what we call recessive disease. Some of these will be under selection. What does that mean? It means that each generation is less likely to carry the variant than the previous one, because people with that mutation have a lower chance of having children. Most of the very severe rare diseases, which cause severe intellectual deficits or strong physical manifestations, make it less likely that the affected person will have children. Other disease-causing variants don't really have to be under selection; they can persist perfectly well in the population. This applies to pretty much any variant whose effect is triggered by a drug, that is, a pharmacogenomic variant: unless you're given a very specific drug, the variant has no effect, so there's no real selection on it. For example, there's a specific variant that leads to a very severe reaction in combination with anesthesia. Until very recently, in evolutionary terms, people didn't get anesthesia, so that variant had absolutely no effect in terms of selection. The same goes for diseases that manifest themselves only in old age, such as Alzheimer's disease: if there are variants that cause them, they're not going to be under selection, because they don't make a difference to whether somebody can successfully have children. Finally, a tiny, tiny fraction of variants will be advantageous. They will actually make you smarter, faster, better than the previous generation, and what happens with those is that they eventually take over: everybody ends up having them, because carriers are more fit and have more children. So genetic diseases are largely caused by changes to the DNA. There's also an environmental factor, but we sort of sweep it under the rug because we don't fully understand it. Not that we understand genetics all that well either.
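To make "under selection" concrete, here is a minimal sketch using the textbook one-locus haploid (genic) selection recursion, p' = p(1 - s) / (1 - s·p), where p is the variant's frequency and carriers have relative fitness 1 - s. The model and the selection coefficients are assumptions for illustration, not something from the lecture; a positive s gives a deleterious variant that fades away, s = 0 gives a variant that selection cannot see (like the drug-triggered or late-onset cases above), and a negative s gives an advantageous variant that is pushed toward taking over the population.

```python
def next_frequency(p: float, s: float) -> float:
    """One generation of haploid (genic) selection.

    Carriers of the variant have relative fitness 1 - s; non-carriers have 1.
    s > 0 means the variant is deleterious, s < 0 means it is advantageous.
    """
    mean_fitness = p * (1 - s) + (1 - p)
    return p * (1 - s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Deterministic frequency trajectory: no drift and no new mutations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical selection coefficients, all starting at a 1% frequency.
    cases = {
        "severe early-onset (s = 0.5)": 0.5,
        "late-onset or drug-triggered (s = 0)": 0.0,
        "advantageous (s = -0.05)": -0.05,
    }
    for label, s in cases.items():
        freqs = trajectory(p0=0.01, s=s, generations=200)
        print(f"{label}: {freqs[0]:.3f} -> {freqs[-1]:.3g} after 200 generations")
```

In this simple model the odds p/(1 - p) are multiplied by (1 - s) every generation, which is why even a 5% advantage drives a 1% variant close to fixation within a couple of hundred generations.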
And people often talk about two types of genetic diseases, rare and common. In reality it's a continuum. You go from the ultra, ultra, ultra rare, where five people in the whole world have it, to your garden-variety common disease that has some genetic component, but also a strong environmental component and a huge "we don't really know" component. Rare diseases are typically caused by highly penetrant rare mutations. These are variants present only in the people who have the disease: if you have that variant, you will have the disease, or if you have two copies of it, you will have the disease, depending on whether it's dominant or recessive. For common disease there are several hypotheses, and the truth is probably a combination. It could be caused by rare variants with variable penetrance: there's a rare variant that contributes, but it doesn't always have an effect, because of other modifiers or because something environmental is needed to actually trigger the disease. Or it could be caused by the aggregation and epistasis of more common variants: if you have, say, three of these variants, you're okay, but once you get to five of them, you start getting a phenotype. As a result, we can't really fully understand it. We look at people's genomes and ask whether a variant is correlated with the disease. Yeah, it's kind of correlated: if you have this variant, you have a 10% higher risk of the disease. Okay, but what is that going to do for you? It takes you from a 4% risk of the disease to a 4.4% risk. So that data is hard to interpret. There may be many variants acting jointly, and models of this kind have been built for certain quantitative traits.
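The 4% to 4.4% example is the difference between relative and absolute risk. A minimal sketch of that arithmetic, assuming (as the example implies) that "10% higher risk" means a relative risk of 1.1 that multiplies the baseline risk:

```python
def carrier_absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Absolute disease risk for a variant carrier, assuming the variant
    multiplies the baseline risk by `relative_risk` (capped at 1)."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a probability")
    return min(1.0, baseline_risk * relative_risk)


baseline = 0.04   # 4% risk in non-carriers (the lecture's example)
rr = 1.10         # "10% higher risk", read as a relative risk of 1.1
carrier = carrier_absolute_risk(baseline, rr)
print(f"carrier risk: {carrier:.1%}")  # carrier risk: 4.4%
print(f"absolute increase: {(carrier - baseline) * 100:.1f} percentage points")  # absolute increase: 0.4 percentage points
```

A 10% relative increase sounds substantial, but on a 4% baseline it moves an individual's risk by less than half a percentage point, which is why such associations are of limited use for any one patient.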
But it's harder to build such models for a disease that's either on or off. So what are quantitative traits, or quantitative phenotypes? Things like height, head circumference, blood pressure, and, sort of, IQ. If you look across the population, these tend to be Gaussian distributed, or close to it. If we took everybody in this room, controlled for sex, and plotted our heights, we would get something that looks like a relatively normal curve. Obviously there are exceptions and variations. The reason everything tends toward a Gaussian distribution is basically the central limit theorem. If you have lots of variants, each with a small effect, then on average you will have about half of them "on" and half "off", so you'll be somewhere around the middle; if you have a few more on, you start shifting in one direction or the other. For height, we think there are hundreds of genes and hundreds of variants in your genome that contribute. This is also where terms like mid-parental height come from, which is something people actually use when looking at genetic disorders. You compare somebody's height to the average height of their parents, and if the child is way out of whack with the parents' average, something interesting is going on. On average you expect the child to be close to that average, adjusted for the generation, because every generation is a little bit taller. All right, so how do we find the causes of genetic disease? First, common disease, because there's a bit less that we can do there. The main way people do this is through genome-wide association studies (GWAS) and variations on them. In the scientific community GWAS has a bit of a bad name, because there have been a lot of results and many of them don't replicate when you go to a different population.
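The central-limit argument for height can be checked with a small simulation. This is a minimal sketch under an assumed toy model, not a real genetic model: every variant has the same small additive effect, each person carries each variant independently with probability 0.5, and there is no environmental contribution. The parameter values (500 variants, an effect of 0.1 units each) are hypothetical.

```python
import random
import statistics


def simulate_trait(n_people: int, n_variants: int, effect: float,
                   seed: int = 0) -> list[float]:
    """Toy additive model: each person carries each variant independently
    with probability 0.5, and every carried variant adds `effect` to the trait."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_people):
        carried = sum(rng.random() < 0.5 for _ in range(n_variants))
        values.append(carried * effect)
    return values


if __name__ == "__main__":
    # Hypothetical parameters: 500 variants, each shifting the trait by 0.1 units.
    trait = simulate_trait(n_people=2000, n_variants=500, effect=0.1)
    mean = statistics.fmean(trait)
    sd = statistics.stdev(trait)
    within_1sd = sum(abs(x - mean) <= sd for x in trait) / len(trait)
    # Theory for this model: mean = 500 * 0.5 * 0.1 = 25, sd = 0.1 * sqrt(125), about 1.12,
    # and for a Gaussian roughly 68% of people fall within one SD of the mean.
    print(f"mean={mean:.2f}  sd={sd:.2f}  within 1 SD={within_1sd:.2f}")
```

No single variant matters much here, yet the population-level distribution comes out bell-shaped, which is the pattern the lecture describes for height.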
So people often say "I'm doing X, and it's not GWAS", but if you think about what the words genome-wide association study mean, that's exactly what they're doing; they're just calling it by a different name. Basically, it's about coming up with better ways of doing statistical correlation between variants and the disease. You take a whole bunch of people with the disease and a whole bunch of people without it, and ask: is the variant more common in the people with the disease than in the people without it? Obviously there are lots of variants to look at, so if we test every single one of them, we're going to get a lot of false positives. What people do is pre-filter the variants to reduce the set they actually have to test, for example to variants with functional links, ones that are already involved in the disease pathway somehow. That gives you some additional information. People also try to aggregate variants: I'm not going to care which variant you have in this gene; do you have a variant in this gene at all? Or, how many variants do you have in this pathway? Then they look for enrichment. One thing that's important to realize about all of these GWAS-style analyses is that they identify correlation, not causation. There's this brilliant thing known as linkage disequilibrium, which says that when you inherit a SNP, you actually inherit a whole bunch of nearby SNPs along with it; they tend to travel together. So just because you grab onto a variant and say "this one is correlated with the disease", it could be that the actual causative variant is somewhere nearby and we just haven't looked there. You found a marker, but not the actual cause. Okay, what about rare diseases? Here we're looking for one or two variants responsible for the disease. In this case I will ignore all of the recent literature saying that a good fraction of people with rare diseases have two, three, or four rare diseases at the same time.
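The basic case/control comparison behind a GWAS can be sketched for a single variant as a chi-square test on allele counts, together with a Bonferroni correction for the number of variants tested. This is one simple choice of test, assumed here for illustration; real studies use more elaborate models (covariates, population structure), and the pre-filtering and gene- or pathway-level aggregation mentioned above are not shown. The counts and the one-million-test figure are hypothetical.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2
    table of allele counts, cases vs controls. Returns (statistic, p-value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-variant threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical counts: 1,000 cases and 1,000 controls (2,000 alleles each).
    stat, p = allelic_chi_square(case_alt=600, case_ref=1400,
                                 ctrl_alt=500, ctrl_ref=1500)
    threshold = bonferroni_threshold(0.05, 1_000_000)  # assume ~1M variants tested
    print(f"chi2={stat:.2f}, p={p:.2e}, threshold={threshold:.0e}")
    print("genome-wide significant" if p < threshold else "not significant after correction")
```

With these made-up counts the p-value is roughly 4e-4: nominally significant for one test, but nowhere near the corrected threshold of 5e-8 (the conventional genome-wide cutoff). Even a hit that does pass only flags a region, since the causal variant may be a neighbor in linkage disequilibrium.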
I'll be happy to discuss offline, with anybody who's interested, why so many people with rare diseases turn out to have more than one. So here we're looking to take the variants across the genome (ballpark, several million variants if you sequence your genome and compare it to a reference) and distill them down to the one or two that cause the disease. There are many that we can throw out right away. If a variant is present in 30% of the population and you're looking for a disease that's present in 0.00001% of the population, the 30% variant goes out the window. That's filtering out common variants. Similarly, you can filter out variants in intergenic regions: even if such a variant causes the disease, we really don't understand that part of the genome, so you would never be able to explain how the variant causes the disease, and there are lots of variants like that which you can just throw out. But at the end of the day we need some way to drill down to the exact variant. For the past 30-odd years, up until about seven or eight years ago, the way this was done was through linkage. You find a family with a lot of cases of the disease, and then you map which part of the genome all of the people with the disease have in common and all of the people without the disease don't share. There are pedigrees with hundreds of individuals and a specific disease running through them, and people searched through them to narrow down the region. This is how we mapped lots of genetic diseases up until about 10 years ago, when we became able to sequence whole genomes cheaply. Before that, it was: find the family, find the region of interest, sequence that small region, and look at what variants are there. Because you go all the way down to the exact variant, this identifies the actual cause of the disease.
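The first-pass filtering described above (drop common variants, drop variants in regions we can't interpret) can be sketched as a simple list filter. The `Variant` record, its field names, and the region labels are hypothetical; real pipelines take these fields from an annotated variant file. The default cutoff of 0.1% reflects the "one in a thousand" threshold mentioned elsewhere in the lecture, which also notes that 1% and 0.5% have been used.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical minimal variant record (field names are illustrative)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    population_af: float  # allele frequency in a reference population, 0.0 to 1.0
    region: str           # e.g. "exonic", "splicing", "intronic", "intergenic"


def filter_candidates(variants: list[Variant],
                      max_af: float = 0.001,
                      drop_regions: tuple[str, ...] = ("intergenic",)) -> list[Variant]:
    """Keep variants that are rare in the population and fall outside the
    regions we have chosen not to interpret."""
    return [
        v for v in variants
        if v.population_af <= max_af and v.region not in drop_regions
    ]


if __name__ == "__main__":
    variants = [
        Variant("1", 1_000, "A", "G", 0.30, "exonic"),     # common: filtered out
        Variant("2", 2_000, "C", "T", 0.0, "intergenic"),  # rare but intergenic: filtered out
        Variant("3", 3_000, "G", "A", 0.00002, "exonic"),  # rare and coding: kept
    ]
    kept = filter_candidates(variants)
    print([(v.chrom, v.pos) for v in kept])  # [('3', 3000)]
```

Keeping the thresholds as parameters makes it easy to tighten the frequency cutoff as population databases grow, which is the trend the lecture describes.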
So that's how linkage mapping worked in the past, and a lot of what we're going to do today is about how variants that cause a disease are identified now. When we talk about identifying a variant that causes a disease, one of the things that's really important to realize is that you need to know what the disease is. It sounds obvious, but it's actually a huge paradigm shift in the way genetic labs work. Clinical genetic labs doing genetic testing used to get requests like "please sequence this gene for me and tell me if there's a variant there." That's because the clinician already knew that the phenotype fit this gene, and they didn't really need to tell the lab what the patient presented with, because the order itself was the phenotype: I'm only interested in this one gene, so tell me if there's anything there that could cause the disease explained by this gene. I assume everyone here knows what an exome is: it's basically a slice of the genome, the protein-coding part, that's more easily interpreted from a clinical perspective. With exome or whole-genome testing, the request changes completely.
The test is: look at this whole genome and tell me if you find anything interesting there. Well, you'll find something interesting there, but can you tell me more about what you had in mind? What kind of interesting things are we looking for: something that causes a heart defect, or something that causes bone deformities? Those are very different genes. So the lab needs to understand the phenotype of the patient, what the patient presents with, and this has been a huge change in how genetic labs operate since the introduction of the exome and the genome. As I said, before it was "please do the epilepsy panel for me", which means the patient has epilepsy, and you don't need to go into much more detail. You may think that we can just go into the electronic health record and get this data. Well, a modern electronic health record actually sucks for collecting patient phenotype data. There are many reasons for this, the primary one being that electronic health records are exactly what they sound like: they're made for recording things, not for looking things up. The idea is that you put some information in, and then if you know exactly what you put in and where you put it, you can go in and find it, sort of like the big broom closet on the slide. If you know exactly where you put the relevant piece of information, you can find it, but if somebody else is looking for that piece of information, good luck, and the same goes for you if it's been a while and you've forgotten. There are other reasons why electronic health records don't work well, especially in the rare disease context, which I'll go into. This is more of a philosophical slide, but what electronic health records should be doing is guiding the user around tests, genomes, and phenotypes and helping them in the interpretation process, and, to my knowledge at least, that's not happening today. So when I talk
about the patient phenotype, I want to make the terminology a bit clearer. I'll talk about deep phenotyping, and what that means is describing the features of an individual rather than of a disease. We don't want to just say they have diabetes; we want to talk about whether they have a BCD or not, what their insulin levels are, and so on, breaking the disease down into its constituent features. When you're working with a rare disease, you're often trying to establish what the diagnosis is; if you already knew the diagnosis, you'd be done. So the right level to be talking at is the symptom level, not the disease-name level. When people used to do phenotyping, they did one of two things: they used either free text or checkboxes. The issue with free text is that clinicians, when left to their own devices and let loose on a keyboard, come up with different ways to describe the exact same thing. A patient with dysmorphic features, meaning some kind of non-standard facial structure, could be described in the note as "DF", "dysmorphic", "dysmorphic facies", or "dysmorphic features". They mean exactly the same thing, but they're written differently, and "DF" is an abbreviation that obviously requires some context to interpret. These are actually all of the ways that the terms "congenital malformation" or "congenital anomaly", which basically mean something wrong at the time of birth, showed up in lab notes at the Hospital for Sick Children. I hear at least a couple of chuckles: "anomaly" and "abnormalities" spelled every which way. People can't spell or can't type, which is fine; these are highly paid experts, and they're not paid to type. There are abbreviations of different types and just different word choices, and all of this makes the problem very difficult for computational approaches. You actually get lists like this: "DD congel behav pro". Anybody want to
venture a guess what that is? Sorry? Behavioral problems, developmental delay. Yes, DD is developmental delay. Or "DD DF MR": developmental delay, dysmorphic features, and mental retardation. Of course, we're not really supposed to use the term mental retardation anymore; it's now supposed to be intellectual deficit or intellectual disability, and I think even that is going out of vogue and there's a newer term for it. But the issue is that these are things that are very hard even for humans to interpret, and next to impossible for computers without a lot of context. The alternative is checkboxes, and clinicians love these. The problem with checkboxes is that you can't get sufficient granularity. Here there's a checkbox for language delay, but it doesn't say whether it's receptive language delay, speech delay, or both. Those are both types of language delay, but they could be indicative of different underlying problems. Similarly, for cardiac there are five issues listed, and then there's an "other" box where you can go wild again, which is quite problematic. So there are lots of problems with how things work. Descriptions that make a lot of sense to a human are uninterpretable to a computer. Take "first words at five years". That means a lot to a human: the child clearly has a speech delay, possibly a broader language delay. But no AI available today has any hope of mapping that phrase down to the term speech delay, because you need a lot of context: what "first words" means, and the fact that it is an indication of speech. Or take just "first words at five": that phrase makes a lot of sense to you, and you don't need to be told it's five years and not five months, because first words at five months would be very impressive. These are things that are easy for a human but difficult for a computer. Multiple terms with the same
meaning, as I already discussed, are another problem, and as a result it's very difficult to do computation with phenotypes. Because of this, what we want to do is map everything to ontologies. Ontologies assign conceptual meaning to specific words, including words that, as we use them, have multiple meanings. Take the word football as an example. What does it mean when I say the word football? Maybe you're thinking about this, which is American football and quite popular, but I'm actually from Europe, so I'm thinking about that. It's the exact same word, but it means a different game in a different setting. Okay, we can probably sort that out, but it gets worse. We could be talking about this: for the visitors to this country, that's Canadian football. It's sort of like American football, but the field is bigger, there are more players, and the rules are a little bit different; it's a close relative, but still different. Or this: does anybody know what this is? Sorry? No, it's not Aussie rules; that's the next one. This is Gaelic football, which is played only in Ireland, where it's simply called football. And then there's Aussie rules football, which is the next one. The same thing happens in medicine: when you talk about fibrillation, it could be muscle fibrillation or ventricular fibrillation, and the context needs to be clear for you to understand which. So ontologies are terms with relationships. If we go back to our sports example, we can have an ontology of sports with, for example, ball sports as opposed to puck sports, and football-descended sports, which separate into North American football (including American football and Canadian football), association football, where we can have soccer as a synonym because some people call it that, and rugby derivatives: rugby union, Aussie rules football, and Gaelic football. And this actually
shows that Aussie rules football, even though it's called football, is much closer to rugby than it is to soccer, which is association football. The same thing can happen in biology, and there's something called the Human Phenotype Ontology, the HPO, that helps organize this medical knowledge, especially in the genetic disease space. You can go from something very general, like phenotypic abnormality, to abnormality of the eye and abnormal eye morphology, down to more specific terms like coloboma versus globe abnormality, which is a different type of abnormal eye morphology. Then there are different sections for neurologic, skeletal, and many other areas and systems of the human body. There are over 12,000 terms now. It's linked to OMIM, which is a big database of diseases, so that given a list of phenotypes you can ask what diseases match. It's really the way the rare disease, genetic disease, germline disease world goes about describing features nowadays; it's become the lingua franca of the rare disease field. From the medical informatics space, you may have heard of SNOMED CT. SNOMED CT is very broad and covers everything in medicine; the HPO covers genetic diseases much better, but doesn't have all the other things you need in medicine outside the genetic disease sphere. This gives you an idea of the area of abnormal behavior, and it shows you the depth of the ontology. It's also an area that's great for phenotyping graduate students: it has features like apathy, inappropriate behavior including disinhibition, irritability, and my favorites, lack of insight and lack of motivation. So this shows you how deep the ontology goes, and it's really great for multiple things, including medicine. We've built a system called PhenoTips to allow for deep phenotyping. I'll show you the system today, and it was initially a
project led by a postdoc, Marta Girdea, and it started with our frustration with mining clinical records. The structured data in a clinical record is often misused and lacks specificity. ICD codes, which make up most of the structured data in the clinical record, are made for billing: they're mostly used for telling the insurance company what it should pay for. Obviously what the patient presented with is relevant to what you bill for, but it's not exactly the same thing. You can conduct the same procedure based on two slightly different presentations, and that difference could be important in the rare disease space, so billing codes lack specificity. The other problem with billing codes is something called up-billing and something called down-billing. Up-billing is when a doctor has a choice of two codes, one that will pay a thousand dollars and one that will pay five hundred dollars, and they use the thousand-dollar code. Both are valid; it's not that they're lying, they're just choosing the one that's more convenient for them. Down-billing happens when, of two codes, one may be a thousand bucks and the other five hundred, but the doctor knows that the patient's insurance won't pay for the more expensive one (this happens more south of the border), so they choose the five-hundred-dollar one so that the patient's insurance will cover it. This happens more in academic hospitals, where the doctor's salary is potentially not directly tied to their yearly billings. So the really valuable data in EHRs is in the unstructured free-text notes that the doctor takes, but mining it is very messy. A recent study had 73 percent accuracy in determining whether a patient had dementia from their clinical record, compared to a human reviewer. This is because things like dementia are very hard to describe: it can be described
clinically in many different ways, like the phrase "at this point the patient only recognizes close relatives". Given the context, that potentially indicates dementia, but it doesn't actually contain the word dementia, and it's hard to interpret. On the other hand, using an ontology like the HPO at the patient's side, so that clinicians just select "dementia", is difficult: there are more than eleven thousand terms in the HPO, so going through and checking each one off is going to take time. Going back and remapping data to an ontology post-visit is time-consuming and prone to error, so you really want to do it at the time of the patient visit. So the goals of our work were to make deep phenotyping simple and to make it faster than paper. I'm going to give you a little bit of a demo, and you'll use it in the lab. The slides are going backwards... okay. So you have fields to enter the name and date of birth, and then you can, for example, draw the pedigree, because that's really important for genetic studies. You can go into the system, create a new family, and say, here's the patient. I didn't give a gender; okay, let's make them male. Those are the two parents, and let's say the parents had their own parents: boom, you click on that and it creates grandparents; you click on this and it creates more grandparents. Let's say the patient had a sibling: see this line right here that looks like it goes to a sibling? If you click on that, it asks brother, sister, or unknown, and so on. Click, and we have a sister. Potentially the father may have remarried and has a second partner, so this goes to a partner: you create a partner right here, and you can actually create a third partner, a fourth partner. The other thing that happens very commonly in genetic diseases is consanguinity. It's not the consanguinity itself that causes the disease, it's the mutations, but consanguinity is how you end up with two copies of the same mutation. So you can actually take somebody and say, oh well, you know, this
sister, this sibling link, I can actually drag it to another individual and say, well, these two are actually sisters. So, is all this online, web-based? Yep, it's online and web-based; you'll play with it in the lab, and it's actually open-source software that you can download and install on your own computer if you want. So it creates this, and this line becomes a double line. For those of you who've drawn pedigrees in a clinical setting, a double line indicates consanguinity. The program inferred it; it's a trivial inference, but it's inferred. Similarly, you can add phenotypes for an individual, like cancers and other clinical phenotypes, and personal information such as dates of birth and dates of death. If you say somebody has a date of death a couple of years ago, they get crossed out, the way pedigrees are supposed to work. So basically it's meant to do pedigrees the way a genetic counselor or clinical geneticist would draw a pedigree in the clinic, and the reason we built this into the system is that it's a carrot that gets the user to use the other parts of the system, which are about really describing the patient well. If you go further, there are many other sections in PhenoTips; I'll just quickly show you a couple. There's a measurement section. This is a one-year, three-month-old baby, and if you say they weigh 15 kilos at this point, right away it'll tell you that's the hundredth percentile, three standard deviations above the mean, and right away you get a growth chart, which shows you where the baby is relative to normal development charts. Then there's a clinical symptoms and physical findings section, and you can see we've already selected increased body weight; that was inferred from the data that was entered. Here you can easily type and search for other phenotypes, so you can type "seizures". Sorry, I can't type, but that's okay, because even if I can't type, it finds the right
phenotype. And if you type in "MR", it'll ask: do you really mean intellectual disability? Why does it know that? Well, if you click on this "i" button, it turns out that intellectual disability has lots of synonyms: dull intelligence, intellectual disability, low intelligence, mental deficiency, mental retardation, mental retardation nonspecific, nonprogressive intellectual disability, and so on. So it's able to take "MR" and say, that's an abbreviation of mental retardation, which is one of the things it could mean. You can select these features, and given the ones you've selected, it suggests a few others to check that may be important for resolving the differential diagnosis, and it gives you an actual differential from OMIM: given these three features, here are the matching diseases that we know of. So basically it's meant to fit into the clinician's workflow, help them collect the data, and then analyze the data, all in one place, connecting the various features, and there are ways of integrating the genome into this to see which variants are more likely to cause a disease. Okay, that's a very quick overview; you'll have enough time to play with it yourself, but being cognizant of the time, I'm going to move on unless somebody has a quick question. Yes? Sorry, so where's the data coming from? It really depends on the hospital; different hospitals approach this in different ways. In some hospitals they get a lot of the data from the patient pre-visit. Also, people generally don't just walk into a genetics clinic off the street; they're often referred from a more general clinic or another specialist, so there's information coming from that other clinic, a referral letter of some kind. Some genetics clinics are very good about getting the patient to fill out part of the pedigree, like give me information about your parents, your grandparents, your brothers and sisters, and so on,
online before you actually show up, and that can be integrated into the pedigree before the patient ever walks in the door. But a lot of the data gets collected during the actual visit. Genetics clinics are very different from your GP or almost any other doctor you are likely to go to, in that a patient visit typically lasts one to two hours, so there's a lot of time to collect a lot of information and a lot of time for the doctor to look through it and do some extensive phenotyping. Yes, great question; definitely it can be. There are ways this can be used to fill in gaps. There are certain things that you really don't want patients to self-report, or where you'd need to check whether they understand them, because it's quite tricky: for example, one of the features of some syndromes is a flat philtrum, having no bumps right here, and most people are not going to be able to figure out whether their philtrum is flat or not. So there are certain things that you don't want people to self-report, but we know of cases of patients going above and beyond, learning a lot about their disease, and actually becoming more of an expert on it than their doctor. So certainly we are thinking of ways of doing it, and I'd be glad to discuss them with you during a break or during one of the other times we have in the course. But yes, that's a great question; getting the patient involved is very important. So, some of the advantages of using a system like PhenoTips: you can integrate data between different studies, because all the data under the hood is mapped to the HPO. There's a little bit of free text if you really didn't find anything in the HPO, but if there's anything at all similar in the HPO, you're presented with that option. It helps you collect data that's structured, and as a result you're able to connect data collected in one study with data collected in another, because the terminology is the same. You don't have to
worry whether seizures and epilepsy are the same thing or aren't the same thing (they really are), and you can do better, more thorough analysis of the genome data, as you'll play with during the lab. Our hope was to get clinicians to use PhenoTips in the exam room; we had variable success, some do and some don't. The other important thing that something like PhenoTips gives you is that it helps train the next generation of clinicians: you can get diagnosis suggestions from the system, and you can identify previously seen patients who are similar to your patient. This is a feature I didn't show you, but you can search the database of patients that has been built up over time, and as a result you can make decisions based on prior outcomes.

So then, now that we have phenotyped the patients, how do we go on to identify the candidate variants in an exome or a genome? There are a lot of candidate variants in any patient's exome or genome. This is a list of candidate variants identified in a patient from the Canadian Care4Rare project cohort, so a patient who was seen because of a rare genetic disease and sequenced on a research basis. It's a whole bunch of variants, and looking at a table like that and trying to figure out which one of them may or may not have caused the disease is not so simple. A lot of what you will hear today and in other modules will be about filtering this data and trying to understand which variants are more or less likely to contribute. What people do is run a sort of pipeline where they first filter by population frequency: anything that's common goes out the window. And what we mean by common has become rarer and rarer over time, as we get a better idea of what the actual population looks like. It started off being around 1%, it used to be a minor allele frequency of less than half a percent, and I know some people now use one in a thousand. Then, at that point, you do variant classification and
prioritization: you take the non-synonymous variants, the ones that actually change the protein in some way, and try to filter by some kind of priority score, either using software that looks at the variant and tries to predict its functional effect, or using other external information. For non-synonymous variants you can ask things like: would this variant likely change the protein structure? You do some modeling of the structure: what would the structure look like if you changed that amino acid? You look at the amino acid chemistry: is this changing the amino acid from, say, hydrophobic to hydrophilic, which would probably change the structure quite a bit? And the most powerful tool is looking at homology: looking at other, really similar proteins and asking, have we actually seen that change in other similar proteins? Because if we have, or if that site is generally variable across lots of these proteins, that probably means it's okay to change it, and it's probably not going to do too much to the protein. So that's the functional information.

But the other thing you want to do is look at the variant in the context of the family structure and say: well, I have a variant; is it also present in the previous generation, in other affected individuals? Because if it is, that increases my confidence, and if I have a huge pedigree, I can become very, very confident that that is actually the variant that causes the disease. Unfortunately, if this is what your pedigree looks like, with just two affected individuals, the affected father shares any given inherited variant with his child with a probability of 50% anyway, so that really doesn't give you much. So what do you need? You actually need other families, where you can find the same variant, or at least variants in the same gene, with the same phenotype, and identify whether the match is real or not. But finding these new families is going to be difficult. Why? Because rare diseases are rare; there aren't that many patients with any one of them. If there are 10 people in
the whole world who have this disease, the odds that two of them walk into your office, if you're a geneticist, are pretty low. So people use all sorts of genome interpretation tools to help with a task like this. They use patient symptoms together with gene function information: if a gene is known to cause specific changes in a specific pathway, and changes in that pathway are known to yield specific clinical phenotypes, then maybe if you change your gene, it will also lead to the same clinical phenotype. People also use mouse models. What they do is take a mouse, knock out that gene to make a knockout mouse, and then phenotype the mouse: see what's wrong with it. Maybe it has an abnormally shaped snout, and that could map to facial dysmorphism in a human; or it has some brain abnormalities or a seizure disorder, and that could map to epilepsy in a human. There are actually huge high-throughput mouse knockout studies happening to knock out pretty much every single gene and see what phenotype results. This is mapped to something called the Mammalian Phenotype Ontology, which is then linked to the Human Phenotype Ontology, so you know that an abnormally shaped snout could map to things like facial dysmorphism.

So the last thing I'm going to talk to you about is matchmaking. How do we actually find other individuals out there in the world who have the same disease, when you've found a mutation that may cause a new disease? Well, each rare disease is rare, but all together they're actually pretty common. I don't believe the 6% number; that's what you get if you add together the prevalence of all the rare diseases in the database called Orphanet. We don't have 6% of the population affected by a rare disease, but numbers around 2 or 3% are actually quite believable, because many of the rare
diseases don't have extremely visible features, and patients sometimes don't even know that they have one if it's really mild, or if it just doesn't show up until you look very closely. And when you're a doctor and you see an ultra-rare genetic disease, you have trouble: you may not recognize it even if it's known, just because you don't have experience with it, or you may not have a sufficient sample size to establish that this is a novel disease gene. Let's say this is the first time you've seen mutations in the gene. It usually happens like this: the clinician, or actually the researcher in this case, looks at the mutation and says, well, the gene is in this well-described pathway, and other mutations in this pathway lead to muscle diseases. This patient has a de novo mutation in this pathway, which means that the patient's parents don't have that mutation; it's just the patient who has it. The phenotype matches quite well, but that specific gene has never been described in the literature as contributing to a human disease. What do you do?

In the old days, what you would do is go to a conference, present your case, and say: look at this interesting thing I found, has anybody seen anything like this? And geneticists are amazing at remembering, like, oh yeah, I saw that X years ago. What really amazes me is the dysmorphologists, the people who actually look at facial structure: you show them a picture of a patient and they'll say, I saw a patient with that same facial structure five years ago, and they'll go into their records, pull out a photograph and show you. How that brain works I don't understand, but they're amazing at this. That's the way things have been done, or you publish a case report about your patient. What we wanted to do is take matchmaking into the 21st century: take all of these rare cases that doctors are seeing all around the world and bring them together so that people
can actually share information effectively. This has been part of a broader effort called the Matchmaker Exchange, where different groups have built these matchmaking tools, and now we've got all of them talking to each other. So in general, how does making a match work? You put data into one matchmaker, and another doctor puts their data into a different one. Then the databases talk to each other, figure out, hey, we actually have something in common, and let both of the users know. Then the users talk to each other and decide: is this really interesting, or are the database systems completely wrong and this is not something worth following up on? I didn't update this slide, it's from a year ago, but these are the various efforts that have contributed to the Matchmaker Exchange. Specifically, we've built one tool called PhenomeCentral, which has a lot of data from the Canadian rare disease cohorts, and something called the Undiagnosed Diseases Network International, which has a lot of data from the US and Europe, aimed at diagnosing undiagnosed patients.

PhenomeCentral, again, which you'll play with, is a matchmaker that allows different users to connect with each other. The way it works is you submit a patient. To do this you use the PhenoTips interface that I've already shown, but you can also (this looks ugly) import data directly from other PhenoTips instances. Then you can add a VCF file, which you will also get to do in your project, and PhenomeCentral goes into its database and searches for other similar patients. The way it does that uses the Human Phenotype Ontology. If you think of this tree structure as the ontology, and you have these features annotated for one patient and these features annotated for another patient, we compute what's similar by going up the tree and then looking for the overlap between those trees, really
identifying what's common between these. So this could be one kind of abnormality and this could be a different kind of abnormality, but both of them fall under the same broader abnormality, while this patient has a feature that maps under two different systems. From that you can compute the similarity, and there's a little bit of math that happens: we look at how likely each feature is to occur just by chance, so really rare features get a higher score than very common features. We compute the information content, basically how much information you get from the fact that both of these patients have these phenotypes. It then also incorporates gene data: it takes the variants and scores them based on the phenotypes present in both patients, using a tool called the Exomiser, to see whether candidate genes are shared. In this case it would say, oh, the red gene is actually pretty high scoring for both of these patients, so maybe that's the likely cause of the disease. It actually looks at lots of other patients as well, to see what's common to just a couple of patients but not to too many. Once you do that, you can see the patients that are similar to your patient and identify the highest-matching ones. For each such patient you get to see your patient's phenotypes, the other patient's phenotypes, and which variants are in both. In this older version you didn't see that directly, but after you contacted the other user you would get the full information revealed to you, and you could see that this mutation actually causes the disease in both patients. So this is sort of the end result of PhenomeCentral: here are the two patients, here's what's similar about them; is that interesting enough to go and write a paper? And there are things like terms of use about who can access this data.

So that brings me to the end of the presentation. Any questions before we go on with the lab practical right after this?
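As a rough illustration of the phenotype-similarity scoring described above (expand each patient's terms up the ontology tree, take the overlap, and weight shared terms by how rare they are), here is a minimal sketch. It is not PhenomeCentral's actual scoring function: the toy ontology, the term frequencies, and the choice to simply sum the information content of shared terms are all assumptions made for illustration, and the gene/variant component (the Exomiser step) is left out entirely.

```python
import math

# Toy ontology (hypothetical, loosely modeled on HPO): each term lists its parent terms.
PARENTS = {
    "Phenotypic abnormality": [],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Seizure": ["Abnormality of the nervous system"],
    "Focal seizure": ["Seizure"],
    "Generalized seizure": ["Seizure"],
    "Abnormality of the face": ["Phenotypic abnormality"],
    "Flat philtrum": ["Abnormality of the face"],
}

# Hypothetical fraction of patients annotated with each term (or any descendant).
# A child term is never more frequent than its parent; the root is 1.0.
TERM_FREQ = {
    "Phenotypic abnormality": 1.0,
    "Abnormality of the nervous system": 0.5,
    "Seizure": 0.2,
    "Focal seizure": 0.05,
    "Generalized seizure": 0.1,
    "Abnormality of the face": 0.3,
    "Flat philtrum": 0.02,
}

# Information content: IC(t) = -log p(t), so rare terms score higher
# and the root (present in everyone) scores 0.
IC = {term: -math.log(p) for term, p in TERM_FREQ.items()}


def ancestors(term):
    """Return the term itself plus every ancestor reached by walking up the tree."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(terms):
    """Union of the ancestor sets of all terms annotated for one patient."""
    result = set()
    for term in terms:
        result |= ancestors(term)
    return result


def similarity(patient_a, patient_b):
    """Sum the IC of every term shared by the two patients' expanded term sets."""
    shared = closure(patient_a) & closure(patient_b)
    return sum(IC[term] for term in shared)


if __name__ == "__main__":
    a = {"Focal seizure", "Flat philtrum"}
    b = {"Generalized seizure", "Flat philtrum"}
    c = {"Generalized seizure"}
    print(round(similarity(a, b), 3))  # 7.419: shares the seizure branch and the rare flat philtrum
    print(round(similarity(a, c), 3))  # 2.303: shares only the seizure branch
```

Patients a and b never share an exact term, yet they score well above a and c because walking up the tree reveals a common seizure branch plus a rare facial feature. A production matchmaker would likely also normalize for how many terms each patient has and fold in variant scores, which this sketch deliberately omits.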
Questions before we sort of go into that? No? Yeah. Yeah, so that's a great question. The Matchmaker Exchange was really built around the use case of: we want to find new disease genes that have never been described before and write papers about them. What about the disease genes where we've now identified the disease, we know it's a new disease, we published the paper, but we have only two patients? Patients three through ten will give us a lot of information about the course of the disease, how it changes, and the variability of the presentation. What do we do with that data? In PhenomeCentral you can actually set a flag on your patients to say: I'm no longer interested in being informed about new matches, because I already know that this is the cause of the disease, but if others are looking for that case, here is the paper that we published on it. And they can contact you, and then, oh, we now have a cohort of ten, let's look at a broader cohort. Some of the studies in the Matchmaker Exchange have looked at, say, we now have 15 patients with this disease, what do we know now that we didn't know when we had three? But there is no structured way of doing it, and at this point there's nothing like the Matchmaker Exchange in the clinical context rather than the research context. So what do you do as a frontline clinician who is just going in and says: this is my patient, I don't care about writing a paper, I just need to know what to do about my patient and how to treat them, and it's an ultra-rare disease on which the only thing that's published is a study of two patients? This is something people are talking about; some people have called it clinical MME, a clinical Matchmaker Exchange. It's really no longer a matchmaker exchange; it's more of an information-gathering tool. But this is something that people are thinking about but haven't really got