Hi, everyone. As I introduced myself, I'm Mike Brudno, and I'm going to give you an introductory lecture about genetic disorders in general. As I point out here, we'll talk more about germline disorders than about cancer or cancer predisposition disorders, even though many of the same ideas can certainly be applied to cancer predisposition syndromes like Li-Fraumeni. Here is how I've organized this. First, I have a few introductory slides about genetic disorders in general. For those of you who do genetics, this is at the level of a first-year genetics class; it's there to make sure all the bioinformatics folks are caught up on what kinds of things we're after. Then we'll talk about phenotyping. I think this is new to the CBW series: people are very interested in genotypes and in how to analyze big genomic data, but I don't think enough thought goes into how we actually phenotype the patient, that is, how we find out what is actually wrong with them. I will talk about some of the ontologies available for this and show a tool that we have developed and that you will end up using. Then I'll talk about how we identify candidate variants for a genetic disease: once we have the phenotypes and a genome, how we can distill the data down to a candidate variant for a rare disease. In sections three and four I will really concentrate on rare genetic syndromes rather than common ones, because it's a lot easier to do things with rare syndromes; common disease is a more complicated beast. Finally, I will show you how to do genomic matchmaking. 
This is when two clinicians on opposite sides of the globe happen upon patients with the same rare genetic disease: how can they discover that these cases exist across countries and continents? The lab will concentrate on parts two and four. In lab one, for this lecture, you will phenotype some patients, and part of the lab for module two will be about looking at the results of the matchmaking and seeing whether you have diagnosed some patients that we have given you to diagnose.

So, very quickly, what do I mean when I say genetic variants? Genetic variants can be single nucleotide variants, indels, copy number variants or structural variants, and they arise naturally in every generation. Each of you has about 100 genetic variants that were not present in either of your parents. These are novel variants that arose in the germ cells that formed your zygote and became part of you. Additional variants arise during your lifetime: every cell in your body probably carries about one more variant than the cell it came from. So variants are everywhere, and when we look at the variants present in an individual, most are going to be benign. Most variants have no visible effect on you, and if there is a visible effect, it may be something like a change in your hair color or in the exact shape of a specific facial feature, with no real impact on your health. That is the vast majority of all variants. However, a small fraction will be disease-causing. Of the disease-causing variants, some will be what we call under selection, which means that evolution tends to reject them. 
If you have one of these variants, you're less likely to reproduce because of the genetic disorder it causes. As a result, those variants stay rare in the population and quickly die out, and we can use evolutionary features to help identify them. Other genetic variants are neutral, and a variant can be both disease-causing and neutral. How can that be? Because "neutral" describes evolutionary pressure. If you have a genetic variant that, for example, predisposes you to Alzheimer's, it's not going to be under evolutionary selection, because you can have kids and a family before the disease appears, and what happens after you've reproduced is irrelevant as far as evolution is concerned. So "benign" versus "disease-causing" is one axis, and "neutral" is a separate one: neutral is about evolution. Very, very few variants will be advantageous to an individual or to a population, and if they're really advantageous, they undergo what's called a selective sweep: they take over the whole population, and everybody ends up carrying the variant. Can somebody give me an example of an advantageous variant that arose relatively recently, or that we know about? (Audience: protection against the Black Death?) Okay, that would be a good example, though I don't know about that specific one. One of the best-known ones is lactose tolerance. 
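The two regimes described above, variants under negative selection that die out and strongly advantageous variants that sweep through a population, can be illustrated with the textbook deterministic selection recurrence. The following is a minimal Python sketch under loudly simplifying assumptions: a haploid model, an effectively infinite population (no genetic drift), no new mutation, and a constant selection coefficient `s`. The function names and the starting frequency and `s` values are hypothetical and purely illustrative; they are not estimates for lactase persistence or for any real variant.

```python
"""Deterministic haploid selection: how an allele's frequency changes when
carriers have relative fitness 1 + s (s > 0 advantageous, s < 0 deleterious).
Assumptions (illustrative only): infinite population, so no drift;
no new mutation; constant selection coefficient s."""


def next_frequency(p: float, s: float) -> float:
    """One generation of selection: carriers have fitness 1 + s, others 1."""
    mean_fitness = p * (1 + s) + (1 - p) * 1.0
    return p * (1 + s) / mean_fitness


def trajectory(p0: float, s: float, generations: int) -> list[float]:
    """Allele frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # Hypothetical values: a 1% starting frequency and a 5% fitness effect.
    advantageous = trajectory(p0=0.01, s=0.05, generations=300)
    deleterious = trajectory(p0=0.01, s=-0.05, generations=300)
    for gen in (0, 100, 200, 300):
        print(f"gen {gen:3d}: advantageous {advantageous[gen]:.4f}, "
              f"deleterious {deleterious[gen]:.2e}")
```

With `s > 0` the frequency follows the S-shaped curve of a sweep, rising slowly while the allele is rare and then racing toward fixation; with `s < 0` it decays roughly geometrically toward zero, which is why variants under selection stay rare. Real populations add drift, which dominates while an allele is rare, and recurrent mutation, which keeps deleterious alleles at a low but nonzero frequency.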
So lactose tolerance is a variant that arose not a few hundred years ago but a few thousand, maybe up to 10,000, years ago, mostly in European populations, and it allowed people to keep digesting milk throughout their lives rather than only as children. That was obviously very advantageous if you lived in a northern climate where food is hard to get during the winter: if you had a cow, you could stay fed through the winter. That's why the variant is present in a large fraction of Europeans and a much smaller fraction of, say, people of African descent; the variant may have arisen in Africa too, but without a similar pressure. In fact, lactose tolerance has developed multiple times during human evolution: multiple groups of humans have independently acquired genetic mutations that led to it. So that's a classic example of an advantageous mutation. But those are very rare, and if they're really advantageous, then very soon after they happen everybody has them, because as long as the evolutionary pressure persists, the people who carry the mutation are more likely to reproduce. At least that used to be the case, until we had medicine, which gets in the way of evolution.

Okay, genetic disease. And by the way, throw questions out; this is meant to be interactive, and I'm trying not to make it too dry. Diseases that are caused largely by a patient's DNA we call genetic, and people typically split them into rare genetic diseases and common genetic diseases. Obviously it's a continuum: there are things that are ultra-rare, things that are more common, and things that are really common. On the ultra-rare side there are diseases you've never heard of unless you watch Dr. 
House, and on the common side there are things like type 1 diabetes, which is pretty genetic, or autism. Rare genetic diseases are typically caused by highly penetrant mutations. What does highly penetrant mean? It means that if you have the mutation, you pretty much have the visible phenotype: you have the disease. There are exceptions. There are actually studies of what people sometimes call superheroes: people who carry a genetic mutation that should cause a highly visible phenotype, a severe disease, but who are perfectly happy and healthy. We've identified some of these people, and researchers are looking at why 99% of the people with a given mutation have a severe disease while this unusual 1% does not. Common genetic diseases, on the other hand, often have an environmental component: it's not the case that if you have a specific mutation, you have the phenotype. At least that's what we understand from things like twin concordance studies. Identical twins theoretically have identical genomes. That's not completely true, because somatic mutations accumulate through mitosis, but their genomes are very, very similar, and yet for such a disease they may show, say, a 60% concordance rate: not 100%, but 60%. So we think the environment has a big role to play. So, rare disease versus common disease. When people talk about rare diseases, we think they are always (or, in parentheses, almost always, because nothing is always true in genetics, medicine or biology) caused by highly penetrant rare variants. It's very unlikely that a rare disease would be caused by a common variant: something common is present in lots of individuals, so it's unlikely to cause a very rare condition; the math just doesn't work. For common diseases, there are a few hypotheses about what could be happening. Obviously it's a combination of the two, but the question 
is which one predominates. Some people think common diseases are largely caused by rare variants with variable penetrance: it's still rare variants that cause these more common diseases, and in fact many rare variants contribute to one common disease, but the penetrance is variable, so some people who carry a variant don't develop the disease while others do. Others think common diseases are caused by the aggregation and epistasis of common variants. When I talk about a rare variant, I typically mean definitely less than 1%. I use 1% mostly because our ascertainment isn't great; the cutoff would probably be less than 0.1% if we knew the exact frequency of every allele in the human population, but we just don't, and our analysis is imperfect. So people typically say: if a variant is present in more than 1% of people, it's a common variant and we throw it out; if it's below 1%, we're interested. For common diseases, that's not how people approach things, because we think that carrying a multitude of common variants can itself contribute to disease. And if you think about more common diseases, many of them are more like quantitative phenotypes. Think about intelligence or obsessive-compulsiveness: you can treat it as a quantitative trait. You can't really say this person is obsessive-compulsive and that person isn't; it's a continuum, where some people are way off the chart on one side of the distribution or the other, and that's what we call the clinical disease. In reality we're dealing with a distribution, of intelligence or of many other things, and quantitative phenotypes are often Gaussian distributed, especially when enough loci contribute to the phenotype. If 
enough genetic variants contribute to your phenotype, each genetic change pushing you a little bit further in one direction or a little bit further in the other, then by mixing all of these together you end up with a Gaussian distribution, by the central limit theorem or a slight variant of it. If you have a few hundred things that each contribute a tiny amount, you end up with values distributed around the mean, with individuals pushed off to one side or the other. For traits like height, we think hundreds of genes contribute: hundreds, if not thousands, of genetic loci affect your height. That's why a very good predictor of height is mid-parental height: take the average height of your parents, and that's a good predictor of your height, modulo the fact that every generation is a little taller than the previous one due to improved nutrition and things like that. Make sense?

All right, so how do we look for the cause of a genetic disease? For common disease we basically use what I call GWAS. People who work on some of these approaches yell at me when I say this, because they say it's not GWAS, it's a much more robust method. But if you think about what GWAS stands for, genome-wide association study, that's really what they're doing: an association study between variants and a phenotype, done on a genome-wide basis. Done in the most straightforward way, you take a variant and ask whether it is correlated with the phenotype, then the next variant, then the next. It's an easy test to do, but you run into the problem of multiple-testing correction. If you have a million variants, that's 10^6 tests, which means your p-values had better be below about 5 × 10^-8 (0.05 divided by 10^6) before you're 
considered significant. That basically means you need a lot of patients before you can identify any associations, and once you start looking for multiple variants acting together, forget it: there aren't enough humans in the world to power such an association study. So people use tricks: they aggregate variants across a gene, they pre-filter variants based on function, and they look at gene networks to identify genes that may be working together, all to improve statistical power. One important thing to remember about GWAS is that it identifies correlation, not causation. If you see a paper saying a variant is linked to a specific disease, that doesn't mean the variant causes the disease; it means the variant is correlated with the disease. The actual causative variant may be a different variant sitting not too far away, linked through linkage disequilibrium, or linked in some other way without being causative itself. For rare disease it's actually simpler: we're looking for the one or two variants responsible for the disease, typically one if it's a dominant disease or a homozygous variant, or two if it's a compound heterozygous recessive disease. The problem here is that there's really nothing statistical about it anymore: you usually have too few patients to power an association study. You are really looking at a specific variant and asking whether it could have caused this disease. So how do you narrow the search from several million variants in a single whole genome down to a smaller number? If you have a large family with a pedigree, you can do linkage: you can see which portions of the genome are shared among all 
of the affected members of the family. You can also do functional filtering: identify variants that are predicted to have a functional role and filter based on that. At the end of the day, once you have a candidate variant or a couple of candidate variants, you need to somehow prove it. To prove it, you either put it into a model organism and run functional studies showing that you can recapitulate the phenotype in a tissue, a mouse, a fly or whatever your model organism of choice is, or you identify other patients with the same disease. The key here is unrelated patients, because related patients could share the variant simply due to their heritage, because they descend from a recent common ancestor. If you share a grandparent with someone, something like 20 new variants arose in that grandparent alone, and you may both have inherited some of them, so there's a good chance you share a rare variant by descent rather than because it causes the disease.

All right, that's a brief overview of genetic diseases. The next thing I want to talk about is phenotyping: describing what the patient actually has. Sometimes when you're working with cancer, you have a thousand patients who all have glioblastoma, and the phenotype seems obvious because you've selected your patient group based on it. But in reality, even within those patients you will have different outcomes, different reactions to drugs, different survival rates and different comorbidities, which could have contributed to the survival rates and which you may or may not have information about. You may just know that the patient has glioblastoma and here is their sample; you'll probably have a couple of other things, like age. So you might think: okay, we have electronic 
health records for many of these patients; can't we just take the electronic health record data and use it? It turns out that's not so easy, because this is what a modern electronic health record looks like. Even places that have Epic and other great electronic health record systems have records that are meant for collecting data, not for using the data that has been collected. I consider it like a broom closet: if you know exactly what you put in and where you put it, you can go in and find it, but if you want to ask how many records in that closet have X, you can't. I work at SickKids, the hospital across the street. How many kids with microcephaly, which means a small head, have we seen at the hospital in the last year? There's absolutely no way for anybody to run such a query; it just does not exist. For small heads it's a bit easier, because people sometimes measure heads, so we do have that number somewhere, even if it could be in different places. But take a specific facial abnormality, cleft palate: how many kids with cleft palate? Somebody will write cleft lip, others will write cleft palate, others will write compound cleft because it may involve both the lip and the palate. There will be very different words, so even if you had a search capability, a text search would not identify all such cases. And this is really what an EHR should be: my goal here is to explain what I would like an EHR to look like, namely something that helps you navigate tests, the genome and phenotypes, completing this triangle and going back and forth. You identify what phenotypes the 
patient has and what's in their genome, decide which test you need to run, which contributes a new phenotype or identifies something new about the genetic variants present, and then you continue around the triangle. When I talk about phenotyping here, I mean deep phenotyping. What's deep phenotyping? Instead of saying the patient has disease X, I want you to give me all of the patient's features. Our goal is to identify what the disease is, because in rare diseases reaching a diagnosis is difficult, and there are many very similar diseases, so specifying all of the features is important for deciding whether it's disease A or disease B, and obviously for keeping track of genotype-phenotype associations. How do people phenotype their patients? In most places today, people use either free text or checkboxes. Take free text first. If a patient has dysmorphic features, or a dysmorphic face, which basically means an abnormal facial shape, this is what you may find in the medical record: DF, dysmorphic, dysmorphic face, dysmorphic features. There are multiple ways of saying the exact same thing. It gets even worse for congenital malformation or congenital anomaly, which are the same thing. These are all taken from the records of the diagnostic lab at SickKids, and they all mean exactly the same thing. They include abbreviations, like cong M, cong malf or congenital M; they include creative spellings of anomaly; they include abnormality and abnormalities. Basically there are many, many ways of saying the exact same thing, all totally incomprehensible to a computer and inaccessible to bioinformatics methods. Here is an actual patient description from the diagnostic lab: "DD cong malf behav prob". As a human you can say, oh, that actually makes some sense: DD is developmental 
delay, with congenital malformations and behavioral problems. But for a computer that's not interoperable. Or "DD DF MR", which is developmental delay, dysmorphic face and mental retardation, the last of which is not exactly the same as developmental delay, but similar. The alternative is checkboxes: you have forms where you check off the relevant phenotypes for the patient. The problem is that they have limited granularity, and as soon as you want to specify something more granular you have to use the "other" box, and then you're back to the same problem of entering free text. So there are many problems with the status quo. Free text is very difficult for a computer to interpret. You may see in a doctor's note something like "first words at five years". That makes a lot of sense to a human (the patient has language delay), but it means nothing to a computer, and good luck identifying all of the patients with language delay: somebody will write "first words at five years", others will write "language delay", others something else. "Has trouble spelling" could indicate dyslexia, and "recognizes only close relatives" could indicate dementia, but obviously these are expressed in human terms, not in terms a computer understands. We also have the problem of multiple terms with exactly the same meaning. As a result it's difficult to compute with phenotypes, and to compute with phenotypes what you really need are medical ontologies. (Audience: Will you define an ontology?) Yes, I will define it; I may have a slide later. An ontology is really a conceptualization of knowledge, of meaning: we have concepts and synonyms for those concepts, so that multiple synonyms can map to the same concept, but the same textual description can also map to multiple concepts, in which case you have to specify which concept you're talking about rather than which textual string. And these 
concepts are also organized in a way that shows how they relate to each other. I'll show you an example, and hopefully that will demonstrate what an ontology is. If I say the word football, it may mean something very different to you than it does to me. When I said it, you may have thought of this, but I was actually thinking of that. And you think, okay, there are two footballs, American football and soccer, European football; can't we just disambiguate between those two? It actually becomes more complicated, because others of you may have thought of this, and I probably don't need to explain the difference between those two to this audience: American football versus Canadian football. But others may have thought of this. Does anybody know what that is? (Audience: Gaelic football.) Yes, Gaelic football. It's a game played in Ireland, sort of similar to rugby but played with a spherical ball, and you can score both inside the net and above it. It's pretty rough, and only the Irish play it. And you could also have thought of this. Does anyone know what this is? Australian football, Aussie rules football. Yes. You may say, fine, that's just us humans misusing the same word, but the same thing can be true in medicine: there's muscle fibrillation versus ventricular fibrillation, and people may just use the term fibrillation without specifying which one they mean, which requires disambiguation into the underlying concepts. So what are ontologies? Ontologies are terms with relationships. If I were to build a sports ontology, we could have something like ball sports, then football-related sports, containing North American football, under which you have American football and Canadian football; association football, or soccer, as it's known on this continent; and rugby derivatives, which include rugby union, Aussie rules football and Gaelic football, which 
actually shows that something like rugby union is more closely related to Aussie rules football than association football is, even though two of them have the word football in their names and the third does not. This allows us to start doing logic on these terms. The same thing can be done in medicine. There is an ontology that we love called the Human Phenotype Ontology, and it has about 11,000 medical phenotypic terms. These are phenotypes, not all kinds of medical terms; I think I have a slide later about other kinds of medical ontologies. The problem with medical ontologies in general is that there are so many things that can be wrong with an individual, so what we want is something limited to phenotypes that can be present in a genetic disease. These terms are structured: coloboma, a specific abnormal eye morphology, is under abnormal eye morphology, which is under abnormality of the eye; neurological abnormalities are their own separate category, and skeletal abnormalities are their own separate category. So it's a humongous effort. (In answer to a question about MeSH:) There is overlap with MeSH, but MeSH is more for the literature; HPO is used more for clinical workflows. Unfortunately, it does not relate to ICD, which is a billing code. There are lots of medical ontologies, which is why we use HPO, but they don't all relate to one another; there's a long list, and I could talk for a while about the advantages and disadvantages of each one. In the hospital, some people use ICD, the billing codes, which basically say: I saw the patient for this problem, and that's what I'm billing for. If a patient came in with a broken leg but also has Down syndrome, you're not going to report the Down syndrome, because they came in for the broken leg. So those codes are usually about billing, not about the phenotypes present in the patient. SNOMED is better for phenotypes, but SNOMED 
has its own issues: it's too big, and it's not very refined in terms of genetic abnormalities. So there are multiple ontologies in use, and none are great. The good thing about HPO is, in addition to its granularity... What's granularity? Granularity basically means how specific you are. If you look at this hierarchy, you could say that a patient has an atrial septal defect; you could also say the patient has an abnormality of the atrial septum, or a cardiac malformation, or a cardiac abnormality. So granularity is the level of detail of the phenotype: you may report only that the patient has heart disease, or you can get very specific about the exact problem with the heart. That's what I mean by granularity: how high up the ontology we go. In addition, the terms are linked to OMIM, the catalog of Mendelian genetic diseases, so for each phenotype there are links saying it's related to this disease or that one. You know what, I'll just skip this. (Audience: Do HPO terms map to OMIM?) Yes, each OMIM disease has many HPO terms that map to it. HPO is the official ontology of the International Rare Diseases Research Consortium. It's developed by an international group, and it's pretty much the lingua franca of rare disease research at this point, the main thing used in rare disease. It's not something we developed; we've contributed to it, but we did not develop it. What we have built is a system that makes it easy for clinicians to enter HPO terms. The key problem is that ontologies are large. Remember the checkboxes I showed? Imagine giving a clinician 11,000 checkboxes to mark yes or no: they will either never finish or laugh at you, and I'm not sure which is more likely; it depends on the person. We really wanted to make it easy, so that they can do it during the patient 
visit. The goal of our work was to make deep phenotyping simple, and faster than doing it on paper. I'm going to very quickly show you PhenoTips, and you will have a chance to play with it in the lab. I'm going to use this version of it, although you will be using a different one. You can create a new patient, and for a patient you can specify things like their name and date of birth. Scrolling down, you can draw the family tree, the pedigree. There's Jim, and there are Jim's parents; you can click here to get Jim's paternal grandparents, and maybe his maternal grandparents; and maybe Jim's mom has a new partner, and this is now a separated union, so you can indicate such things. Jim may have a sister, and there's the sister. For those who are not familiar, squares are males and circles are females, and every node has these handles, which add children, partners, siblings or parents. You can also drag a node's handle onto another person. For example, if you wanted to indicate that this is actually a consanguineous union, you could grab this handle and drag it here; that indicates that these two are siblings, and the union becomes a consanguineous union between cousins. That's actually quite common in genetic disease: you often see consanguineous families. Once you save it, there are many other sections that you can explore on your own. The key thing you will need to know is how to enter clinical symptoms. There's an area where you can do a quick phenotype search: if your patient has something like a small head, you just type that, and it'll say, well, for small head you really should have said microcephaly. You can click on that to record that the patient has microcephaly, and it will tell you how good your 
description is, that is, how informative it is. You can also search for something like a heart defect, and it'll say: well, there's conotruncal defect, which is a specific heart defect, and there's abnormal heart morphology, which is very general. If you're looking for something more detailed, there's this i button: click on it, and it gives you a whole bunch of synonyms for the term and lets you click "browse related terms", which gives you much more detail about other types of abnormal heart morphology that may be present. You can go into each one of them, zoom in, or zoom in further, and then select the specific defect you had in mind. So it lets you browse the whole hierarchy right inside the system and then select the one you meant. Once you have selected a number of terms, a few things happen. One is that you can look at the diagnosis: given the phenotypes, it searches for matching genetic disorders so that you can see what matches. I clicked a few random things, so here's one disorder, and you can see what is actually associated with it. You can also see a gene panel: the genes in which mutations are known to cause these phenotypes. You can click on any one of the phenotypes to recompute without it, so here's the panel without that phenotype. So this gives you the ability to very quickly enter all the phenotypes of your patient. (In answer to a question:) There is a way to re-index it inside the system. We do not have a direct connection to OMIM: everybody has their own instance, and it's up to them to keep it updated. When you install it, there is a way to do that, and whenever we have a new release of PhenoTips, we download the latest version of OMIM and 
So usually people have it on their own computers, within their hospital. There is one public version of PhenoTips, called PhenomeCentral, which I'll talk about next, but that one does not allow PHI, that is, personal health information. Yeah, I mean, it's open source: as long as you're doing it on your own computer, you can download it and use it there. If you want it for your hospital, then there is a bit more to talk about license-wise, but if you're just using it for personal use, you can just download it, install it, and run it. Well, that's done, but not by us: OMIM and HPO are already linked, so we just use the links. So no, it's HPO, for all of the terms. When you do the search here for, you know, CHD, and it shows abnormal heart morphology, if you click the "i" button (sorry, I'm having trouble with the navigation), it'll show the HP term from the Human Phenotype Ontology, and you can click on this and it will take you there, to HPO. So, for example, if you want to find all patients with an FBX mutation who had microcephaly in both the son and the mother, can you search the pedigrees? Not the pedigrees. It can search the patients; the pedigrees you'd have to script yourself, but there are RESTful APIs for pretty much everything, so you can write your own code, and there are Excel exports as well. Sorry? Oh yeah, everything goes in, there's a data model behind it, and there are RESTful APIs which you can use instead of the nice UI. I like the UI, but you can actually write code to work with PhenoTips. We won't do that here. So the next part I want to talk about is how we actually identify a causative variant. This will be pretty quick. When you sequence a patient's genome and you're looking for the cause of a rare disease, you're looking at millions of variants, and you can use different tricks in order to narrow them down to a relatively small
number. So you can look at things like filtering out common variants: if a variant is present in more than 1% of the general population, throw it out. We also typically look for nonsynonymous, stop-gain, or splicing variants, variants which are thought to have a larger impact, and throw out things like synonymous variants. That's not really the right thing to do, though: synonymous variants have been known to cause disease, so the right thing to do is to prioritize them lower but still look at them. In practice, people often throw them out. Once they have done that, for the nonsynonymous variants they often look at functional prediction: what could the effect on function be? For this they look at things like protein structure: where within the protein does the mutation lie, and is it likely to actually change a functional unit of the protein? They look at amino acid chemistry: what is the chemistry of the amino acid that's changed, does it change from hydrophobic to hydrophilic, and as a result could it change the structure, and so on. And they often look at homology: they look at other related proteins and ask whether that site is variable. All of this is encapsulated in lots of tools, like PolyPhen, SIFT, MutationTaster, and so on. Almost all of these tools are heavily biased toward homology: they're very much driven by the conservation of the site. This is important to remember if you're looking for the cause of a rare disease that doesn't manifest until late in life, or of a pharmacogenetic disease, for example something that only manifests in combination with a specific medication. Those are not likely to be under selection, so looking at selection is not necessarily the right thing, and if you're looking for a rare disease with onset at age 50, PolyPhen filters are not necessarily going to do a good job. So you should be very careful with that.
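The filtering strategy described above can be sketched as a small prioritization function. The variant record, its field names, and the consequence labels are simplified assumptions, not the output format of any particular annotation tool. Following the advice above, synonymous variants are ranked last rather than discarded. The predictor calls are plain booleans standing in for PolyPhen, SIFT, and MutationTaster results (how each tool's score becomes a yes/no call is left as an assumption), combined with the "two out of three" rule of thumb that the lecture mentions next, used to order variants rather than to remove them.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_POPULATION_AF = 0.01                           # drop variants in >1% of the population
LOSS_OF_FUNCTION = {"stop_gained", "splice_site"}  # hypothetical consequence labels


@dataclass
class Variant:
    gene: str
    consequence: str      # e.g. "missense", "stop_gained", "splice_site", "synonymous"
    population_af: float  # allele frequency in a reference population
    # Hypothetical yes/no "damaging" calls per predictor, e.g. {"SIFT": True}.
    predictions: Dict[str, bool] = field(default_factory=dict)


def damaging_votes(v: Variant) -> int:
    """Count how many predictors call the variant damaging."""
    return sum(v.predictions.values())


def tier(v: Variant, min_votes: int) -> int:
    """1 = look first, 2 = look next, 3 = rare but low impact (kept, ranked last)."""
    if v.consequence in LOSS_OF_FUNCTION:
        return 1
    if v.consequence == "missense":
        # The majority vote is a guide for ordering, not a hard filter.
        return 1 if damaging_votes(v) >= min_votes else 2
    return 3  # e.g. synonymous: deprioritized rather than discarded


def prioritize(variants: List[Variant], min_votes: int = 2) -> List[Variant]:
    """Drop common variants, then order the rest by tier (stable within a tier)."""
    rare = [v for v in variants if v.population_af <= MAX_POPULATION_AF]
    return sorted(rare, key=lambda v: tier(v, min_votes))


def calls(polyphen: bool, sift: bool, mutation_taster: bool) -> Dict[str, bool]:
    """Helper to build the hypothetical per-predictor calls."""
    return {"PolyPhen": polyphen, "SIFT": sift, "MutationTaster": mutation_taster}


variants = [
    Variant("G1", "missense", 0.20, calls(True, True, True)),     # common: removed
    Variant("G2", "synonymous", 0.0001),
    Variant("G3", "missense", 0.002, calls(True, False, False)),  # 1 of 3 votes
    Variant("G4", "stop_gained", 0.0),
    Variant("G5", "missense", 0.005, calls(True, True, False)),   # 2 of 3 votes
]
print([v.gene for v in prioritize(variants)])  # ['G4', 'G5', 'G3', 'G2']
```

Because every rare variant survives and only its position changes, a late-onset or pharmacogenetic variant that the conservation-driven predictors miss is still on the list, just further down.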
All of these tools are imperfect. If you're looking for a clinical diagnosis, even if all these tools agree, that's not proof. And yes, the tools often disagree, and usually people say, well, if two out of three call it damaging, I'll put it on my list of things to look at. It's a guide, not a hard filter; all of these tools are things that can be a guide. Finally, when you identify a mutation, seeing it in other affected individuals within a family can make you more confident that this actually is the mutation you're looking for. Often people do an exome or whole genome on one patient, and when they find the mutation they're interested in, they will just Sanger sequence that specific location in the parents to see what the inheritance pattern is: is it de novo, or is it inherited? Or, since sequencing is cheaper now, people sometimes just do whole trios off the bat, so that they can see the de novo mutations right away. And obviously the final proof you would like is multiple families, all of them having the same mutations, or at least mutations in the same gene, and the same phenotypes. That's what the clinical genetics community holds as proof that you have identified a variant, or a gene, that's related to a disorder. One thing I want to throw into this: there are genome interpretation tools which actually look at the phenotypes. One of the most commonly used of these is called Exomiser. What happens there is that they look not just at the harmfulness of variants, which is what's computed by tools like PolyPhen and SIFT and whatnot, but they also build, for every gene, a phenotypic relevance score. The way they do this is based on model organism experiments. For example, if they knock out a specific gene in the mouse and the mouse has a small skull, they say, oh, a small skull is sort of like a small head
in a human, so maybe mutations in the same gene will also cause a small head. Or, for example, let's say that when you knock out a specific gene in a mouse it causes some kind of brain abnormality. That would probably correlate with abnormal neurological function, so it could correspond to a gene that in humans causes seizures, or developmental delay, or some other kind of neurological abnormality. For this we can use the ontology, because the ontology gives you the relationships between the phenotypic terms: you can say that structural brain abnormalities are neurological abnormalities, as are seizures, and as is developmental delay; these are all problems of the brain. So what they do is take the phenotypic profile of the patient, map it to the mouse phenotype ontology (the Mammalian Phenotype Ontology, the mouse equivalent of HPO), and look for genes that have been associated with those phenotypes in mouse knockout studies, of which there are extensive ones. It turns out that if you take the patients who walk in the genetics clinic door and sequence every single one of them, automated tools will help identify the causative mutation in about 50% of the cases, without much manual involvement. It's the other 50%, though, that are the difficult ones. So, finally... Yep, that's true. Not every single gene has been studied in the mouse, although there are extensive knockout studies for lots of genes at this point. And when you look at the mouse phenotype, there's usually something there, but it could be from a full knockout, or an embryonic lethal, and what we really need are specific mutations introduced. Then you have to go and look at the specific mutation and see if you can recapitulate it in the mouse, with CRISPR or whatnot.
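The point that an ontology lets you relate a structural brain abnormality, seizures, and developmental delay can be made concrete with a toy is-a hierarchy. The terms and parent links below are a simplified, hand-written fragment loosely modelled on HPO, not the real ontology structure, and the score is a plain Jaccard overlap of ancestor sets. Tools such as Exomiser use more sophisticated semantic-similarity measures and compare across species, so treat this only as a sketch of the intuition.

```python
from typing import Dict, Set

# Toy is-a hierarchy (child -> parents), loosely modelled on HPO; NOT the real ontology.
PARENTS: Dict[str, Set[str]] = {
    "Phenotypic abnormality": set(),
    "Abnormality of the nervous system": {"Phenotypic abnormality"},
    "Abnormal brain morphology": {"Abnormality of the nervous system"},
    "Seizure": {"Abnormality of the nervous system"},
    "Developmental delay": {"Abnormality of the nervous system"},
    "Abnormality of the cardiovascular system": {"Phenotypic abnormality"},
    "Abnormal heart morphology": {"Abnormality of the cardiovascular system"},
}


def with_ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in the hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two terms' ancestor sets (1.0 = identical terms)."""
    sa, sb = with_ancestors(a), with_ancestors(b)
    return len(sa & sb) / len(sa | sb)


print(similarity("Abnormal brain morphology", "Seizure"))  # 0.5
print(similarity("Seizure", "Abnormal heart morphology"))  # 0.2
```

Two neurological terms share the "Abnormality of the nervous system" ancestor and score higher than a neurological and a cardiac term, which only share the root. Real measures usually also weight a shared ancestor by how specific it is, so that sharing a very general term counts for little.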
So the last thing I want to talk about is matchmaking: how do we identify all of these extra families? There are a lot of very rare diseases. People throw out numbers anywhere between 7,000 and 14,000 rare diseases out there in the world, and many of them have very, very few patients. But all together, rare diseases are pretty common: the total prevalence of rare diseases is thought to be about 5%, meaning that about 5% of people will develop some kind of rare disease over their lifetime. I'm not sure it's that high, but that's the number people throw out. When a clinician sees a patient with a rare disease, they might not recognize a known disease just because they don't have experience with it, or it's the first patient in whom they've identified a new gene, and they want to find others with mutations in the exact same gene. The key is to share the data: clinicians have to be able to put their data together in order to draw conclusions. To tackle this challenge, there's an ambitious international effort called the Matchmaker Exchange. The idea is that when one clinician puts their case into one database and another clinician puts their case into a different database, the databases should talk to each other, to help the clinicians identify the match, let them know that the two of them have the same kind of patient, and let them then talk to each other to really confirm that this is the right match. The Matchmaker Exchange has multiple members from around the world, and I will talk more about PhenomeCentral, which is our group right here. It's the tool that we have developed for this, and it's the one that you will use in your lab. So it's a portal
tool for sharing phenotype and genotype data. You will all play with it, though not with the main one: we set up a separate one for this course. It lets you connect with other clinicians. The idea is that you phenotype your patients, using the PhenoTips interface that you already saw. Once the patient is phenotyped, you can add the VCF file corresponding to the patient's exome or genome, as you will do, and decide how you want to share it: is it just for you, is it something you want to make public, or is it something you want to make matchable, which means that it will be matched when new similar cases appear? Then, under the hood, the system will find similar patients for you. It identifies them based on phenotypic similarity using the HPO: when you have two patients, you can identify what they really have in common using the HPO, and score that. Similarly, it can take the genetic data and help you identify patients that are similar genetically, by running Exomiser to prioritize all of the genetic variants in each patient and figuring out which genes the two patients have in common. At the end, you can see the patients that are similar to your patient, in a view that looks a little bit like this, where you can see all of your features but you don't really know what the other patient has; you have just very general terms. You can see some of the genetic similarities, and you can contact the other submitter, and then you will be able to see the full extent of the similarities between your case and the other case. So that's the end of the lecture. Questions? First, I don't know how it works in cancer; cancer is not something I work on nearly as much. But in genetics, it's often the genetic counselor, who is sort of like the nurse of the genetics world, who is responsible for interacting with the patients.
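The matching step described in the lecture, phenotypic similarity plus shared candidate genes restricted to cases their submitters have made visible, could be sketched as follows. This is not PhenomeCentral's actual scoring: the `Patient` fields, the equal weighting of phenotype and gene evidence, the 0.3 cut-off, and treating "public" and "matchable" cases as equally eligible are all assumptions for illustration. The candidate genes stand in for the top-ranked genes that an Exomiser run might produce for each patient.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Patient:
    pid: str
    visibility: str  # "private", "matchable", or "public"
    terms: Set[str] = field(default_factory=set)            # phenotype terms
    candidate_genes: Set[str] = field(default_factory=set)  # e.g. top prioritized genes


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_matches(query: Patient, others: List[Patient],
                 gene_weight: float = 0.5,
                 min_score: float = 0.3) -> List[Tuple[str, float, List[str]]]:
    """Score every shareable patient against `query`, best matches first."""
    results = []
    for other in others:
        # Never match a patient with itself or with a private record.
        if other.pid == query.pid or other.visibility == "private":
            continue
        pheno = jaccard(query.terms, other.terms)
        shared = query.candidate_genes & other.candidate_genes
        score = (1 - gene_weight) * pheno + gene_weight * (1.0 if shared else 0.0)
        if score >= min_score:
            results.append((other.pid, round(score, 3), sorted(shared)))
    return sorted(results, key=lambda r: -r[1])


me = Patient("P1", "matchable", {"Microcephaly", "Seizure", "Developmental delay"},
             {"GENE_A", "GENE_X"})
others = [
    Patient("P2", "matchable", {"Microcephaly", "Seizure"}, {"GENE_A"}),
    Patient("P3", "private", {"Microcephaly", "Seizure"}, {"GENE_A"}),
    Patient("P4", "public", {"Abnormal heart morphology"}, {"GENE_F"}),
    Patient("P5", "matchable", {"Microcephaly", "Seizure", "Developmental delay"}, {"GENE_Z"}),
]
print(find_matches(me, others))
# [('P2', 0.833, ['GENE_A']), ('P5', 0.5, [])]
```

Here the private case P3 is never considered even though it would score highly, and P5 still surfaces on phenotype alone, which is the situation where the two submitting clinicians would need to talk to confirm the match.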
Also, genetics visits are much longer than regular clinic visits: the geneticist may spend two hours with their patient, so it really is a bit of a different game there. And a lot of these cases are research cases, where the patients are being studied for research purposes. So it's not a huge overhead: as you will find out, entering the data for a single patient takes a few minutes if you know what the patient has. A lot of the time is actually spent reviewing the notes. What we will do is give you summarized notes for the patients, and you will see that you can enter all of the data pretty quickly. Yeah, so for the broader matchmaking, no identifiable data goes in. It's the clinicians who are communicating; the patients do not communicate directly. That has been one of the issues: patients have wanted to get involved, but we've had trouble working out how to bridge the clinician world and the patient world, and it's almost like you want two matchmaker exchanges, one for the patients and one for the clinicians. So it's actually interesting. For research, yes, but we've also said that this is something that should be part of standard of care, in which case, if you're consenting to care, you're automatically consenting to very rough data about you being shared through systems like this. So if it's a research patient, then yes, they need to consent explicitly; if it's a clinic patient, then you actually need less consent. This was a decision that we ran by the appropriate ethics and policy people, and different places have done different things as far as their REBs go. At SickKids, we've taken PhenomeCentral through the SickKids REB, and they've signed off.