We're honored to have Nancy Cox here from Vanderbilt, and she's going to tell us about her recent work looking at the genetic contribution to gene regulation and how learning about that can help us understand the role of genetics in human traits and human disorders. So please join me in welcoming Nancy. Thanks. Thanks. It's a real pleasure to be here. This is such a great meeting, and I've been really looking forward to sharing some of the newest findings, which I really want to get some feedback on from an audience like this, and to more collaborations with my ENCODE colleagues, because while this project was very much grounded in and started by research that we were doing in GTEx, it is very much going to end up deeply looking at ENCODE. About a year ago I was talking about initial results in about 5,000 subjects from BioVU, and what I'm going to talk about today are results of studies that we've been doing with our PrediXcan methodology in about 20,000 BioVU subjects, on our way to about 120,000. So there'll be some novel gene-to-phenotype discoveries and more on the continuum from Mendelian to common disease in a number of different dimensions, some new ways that I think we have to really get more kinds of insights into biological mechanisms of disease, and new big-picture biology that relates to the medical phenome as a concept in and of itself, separate from individual diseases. So just to review, a lot of what I'm going to talk about is centered on this analysis tool that we had worked on through GTEx called PrediXcan. The idea is, you know, when we measure gene expression, there's some component of that that is entirely genetically determined.
Most of it is attributable to other kinds of exposures and to the feedback of traits and diseases on what we measure as transcript levels, but we can use what we measure in something like GTEx to develop SNP-based predictors of gene expression that we can then test with phenotype. So this is already published, last year in Nature Genetics, and all of the software, it's on your bingo sheet, is actually publicly available on GitHub and continuously updated as we're able to get more data from GTEx for building the predictors. So we use GTEx as a reference panel to build these prediction equations and just create a really big database: for each gene in each tissue, all of the SNP predictors that come into the prediction equation and the associated weights. We're using elastic net for building the predictions. Once that's created, in any data set where we just have the genome variation, and that can be GWAS level, it can be whole genome sequencing, we can essentially impute transcript levels for each gene in each tissue and associate that calculated endophenotype with our traits of interest, in this case the medical phenome that we have in BioVU. In any given tissue, there are only about four to nine thousand genes that have really high quality prediction, but across all of the tissues that we measure in GTEx, we're now getting high quality prediction in about 18,000 genes, and this is with just the first half of the GTEx data that will come in, so that will improve over time.
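The imputation step described above can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the published software: the SNP identifiers, the weights, the toy genotypes, and the helper names `predict_expression` and `pearson_r` are all hypothetical, and a plain correlation stands in for whatever association model is actually used.

```python
from statistics import mean

# Hypothetical predictor for one gene in one tissue: SNP id -> weight.
# In the talk, weights like these come from an elastic net fit on the
# reference panel; the ids and numbers here are made up.
WEIGHTS = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 0.1}

def predict_expression(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) over the predictor SNPs.
    A SNP missing from a person's genotypes is treated as dosage 0 (assumption)."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson_r(xs, ys):
    """Pearson correlation, used here as a stand-in association measure."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Toy genotypes for four people and a toy 0/1 phenotype (has the code or not).
people = [
    {"rs_a": 2, "rs_b": 0, "rs_c": 1},
    {"rs_a": 1, "rs_b": 1, "rs_c": 0},
    {"rs_a": 0, "rs_b": 2, "rs_c": 2},
    {"rs_a": 0, "rs_b": 1, "rs_c": 0},
]
has_code = [0, 0, 1, 1]

predicted = [predict_expression(p, WEIGHTS) for p in people]
r = pearson_r(predicted, has_code)
print(predicted, r)
```

In this toy data the people with the phenotype have lower predicted expression, so the correlation comes out negative, which is the "reduced genetically predicted expression" direction discussed later in the talk.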
We've also not used ENCODE annotations in this first pass through in building the predictions. That's something that I think people are looking at and working on, so the quality of the prediction may improve as we can include more information from ENCODE annotations in building the predictors. Plus we've only built cis predictors of gene expression at this point; as GTEx matures and we have larger samples, we expect to be able to use at least the top-performing trans biology in the prediction equations. So this is all being applied now in the context of the biobank at Vanderbilt University. Vanderbilt built their own electronic health record more than 20 years ago, and there's a clinical data warehouse, as it were, that we call the Synthetic Derivative, that's a de-identified and continuously updated, enriched image of the EMR. There's actually a lot of correction that comes in, so outlier variables get looked at hard and corrected. That's about two and a half million subjects today. We have DNA on more than 215,000 subjects, with GWAS-level genotyping on about 20,000 and exome chip data on about 36,000 today, but we're in the midst of some bigger projects that'll get us by this time next year to dense genotypes in about 120,000 and whole genome or exome sequencing on thousands to tens of thousands, and that's going to be decided by some of the lots of grants that are in the system now. And we'll have DNA on more than 225,000; the Synthetic Derivative will have more than three million observations. These are really rich data, so just the Synthetic Derivative alone is a fantastic resource for learning about the phenome, and being able to look at DNA samples with genome interrogation on tens of thousands to more than 100,000 is of great value, but the really cool thing is the phenome.
I had no idea before coming to Vanderbilt how much more science, how much more biology we can learn about disease by being able to look at the entire medical phenome as opposed to individual diseases. Josh Denny conceived the concept of the phenome-wide association study, and people have thought of it as kind of a curiosity, a cool little twist, flipping a GWAS on its head. It's way more than that, and I'm going to try to convince you by the end of the day that we can learn much more disease biology considering the phenome in its entirety than looking at one disease at a time. So the original phenome-wide association study concept was taking a single SNP that's, say, the top GWAS finding for Alzheimer's and looking at all of the phenome associations with that one SNP, or taking a SNP that's a loss-of-function variant in a particular gene and looking at the entire phenome associated with that. When I say the entire phenome: Josh Denny, who's one of the 41 members of the Department of Biomedical Informatics at Vanderbilt, which is a fantastic resource for doing phenome studies, has set this up so that the tens of thousands of diagnostic codes get boiled down into about 1,600 phenome codes that we look at. Those have been set up so that there's much more information in them than in medical diagnoses; it takes multiple visits to instantiate a diagnosis, for example. We also have a number of algorithms for getting to 95% sensitivity and specificity for any given single diagnosis, so this is not that really fine-grained look at a single diagnosis, but it's a much more nuanced look at these 1,600 codes that boil down from the tens of thousands.
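The roll-up from diagnostic codes to phenome codes, with more than one visit needed before a code counts, might be sketched as follows. Everything specific here is an assumption for illustration: the code labels, the lookup table `CODE_TO_PHECODE`, the function name `assign_phecodes`, and the threshold of two distinct visit dates (the talk only says "multiple visits").

```python
from collections import defaultdict

# Hypothetical mapping from billing/diagnostic codes to phenome codes.
CODE_TO_PHECODE = {"DX-001": "P-10", "DX-002": "P-10", "DX-003": "P-20"}

# Assumed threshold: a phenome code counts only if seen on at least this
# many distinct visit dates.
MIN_VISITS = 2

def assign_phecodes(records, code_map=CODE_TO_PHECODE, min_visits=MIN_VISITS):
    """records: iterable of (person_id, visit_date, diagnostic_code).
    Returns {person_id: set of phenome codes that meet the visit threshold}."""
    visits = defaultdict(set)
    for person, visit_date, dx in records:
        phe = code_map.get(dx)
        if phe is not None:
            # Count distinct dates so repeated codes on one visit count once.
            visits[(person, phe)].add(visit_date)
    assigned = defaultdict(set)
    for (person, phe), dates in visits.items():
        if len(dates) >= min_visits:
            assigned[person].add(phe)
    return dict(assigned)

records = [
    ("p1", "2015-01-03", "DX-001"),
    ("p1", "2015-06-10", "DX-002"),  # different code, same phenome code P-10
    ("p1", "2015-06-10", "DX-003"),
    ("p2", "2016-02-01", "DX-003"),
    ("p2", "2016-02-01", "DX-003"),  # same date twice: still one visit
]
assigned = assign_phecodes(records)
print(assigned)
```

Here only p1 is assigned P-10, because two different diagnostic codes on two different dates map to the same phenome code; a single visit, however many times it is coded, does not instantiate anything.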
And when we do PheWAS there's a hierarchical organization. So for example, if you're looking at the phenotype end-stage renal disease, in the controls you don't have kidney failure not otherwise specified, kidney transplant, kidney dialysis; all of the related phenotypes are not used at all in the analysis. Your control set is a group of phenotypes not on the sort of hierarchical diagnostic spectrum of the phenotype that you're looking at. So keep in mind, about 1,600 diagnostic codes. Now, of those, we're not looking at anything that doesn't have at least 30 individuals, and I actually don't look at anything myself that doesn't have at least 50 individuals, so that actually boils down to closer to 1,300. And then a subset of those are normal pregnancy, normal delivery, well-child care, you know, sort of part and parcel, work physicals. They're codes for billing, they're diagnostic entities, but they're not disease, and so it's actually less than 1,300 codes that we're reporting out on. What we're doing is building this gene-by-medical-phenome catalog, so we're doing PheWAS not on a single SNP but on these predicted gene expression phenotypes, and trying to create a comprehensive gene-by-medical-phenome catalog. You can think of it as essentially knocking down each gene in each tissue and reading out the consequences of that across the entire medical phenome, and up-regulating each gene in each tissue and reading out the consequences of that across the entire medical phenome. Of course we're not manipulating the human beings to do that; we're using natural variation, and we've used GTEx to learn how to read the natural variation and translate that into these imputed, genetically predicted transcript levels. We think this makes BioVU a really cool discovery engine, and it actually works on the larger scale. So the deviance of the transcriptome: I'm going to try to convince you the only tail of a distribution you ever want to be in is this one, which is
the tally of the number of transcript levels where people are at least three standard deviations from the mean, plus or minus. Okay, so when you're low here, that's a good thing, because this deviance of the transcriptome is significantly correlated with the burden of medical disease, the number of phenome codes people accumulate in their lifetime. So you can see, while most people in BioVU have relatively small numbers of phenome codes, 15 or 20, there are plenty of people with very large numbers. Once you have a chronic disease you just accumulate phenome codes like crazy, because disease begets disease, and there are plenty of people with, yes, very large numbers. I've talked before about some of the early findings. In just 5,000 subjects in BioVU we saw that the reduced genetically predicted expression of this gene, GRIK5, was associated with many different eye phenotypes. None of these are genome-wide significant, so if we do a Bonferroni correction for the number of genes we look at with quality prediction, the number of phenome codes, and the number of tissues where we have quality prediction, none of these would meet genome-wide criteria for significance. But it's a really interesting pattern, and if the phenotype were just eye disease it would meet genome-wide criteria for significance. In less time than I could get the analyses done on the next 15,000 people, they'd knocked it out in zebrafish, with notable eye phenotypes, and have been able to show in subsequent studies of zebrafish embryo eyes that the protein product of this gene is very highly expressed in all of the parts of the eye that give rise to the particular eye phenotypes that we looked at. So it's highly expressed in the lens, it's highly expressed in parts of the eye where retinal detachment is going to be an issue, it's highly expressed in the cells that form the myelin sheath around the optic nerve, and in parts of the eye that control fluid dynamics. So the association of
the reduced genetically predicted expression of GRIK5 with the different eye diseases makes sense given where the protein product is highly expressed, and they've now been following some of these into adulthood and seeing little zebrafish with cataracts and things. So it's a really cool biological validation, but to my mind the ultimate biological validation of what we're discovering is not with model system knockdown and knockout experiments, it's with the human knockdown and knockout experiments. Mendelian diseases give us a window into human phenotypes that are observed with large-scale knockout of a particular gene across all tissues, and with the reduced genetically predicted expression of Mendelian disease genes we definitely are picking up the phenotypes that you see associated with the Mendelian disease. So there is this continuum from loss-of-function mutations to deleterious amino acid substitutions to just reduced expression of these same genes. For example, here's one: nuclear factor I X-type. Autosomal dominant mutations at this gene are associated with two different Mendelian diseases, Marshall-Smith syndrome and Sotos syndrome 2. They have quite similar features but some differences. With Marshall-Smith: accelerated bone formation, which then you see as fractures in the kids; diminished muscle tone, especially in the upper body, that leads to breathing difficulties; the larynx and trachea are characterized as floppy; they have characteristic facial features, especially for the eyes, and blue sclera; mental and motor delays; speech may be absent or abnormal; intellectual disability and impairment. With Sotos syndrome 2 you see overgrowth in childhood, but a prominent scoliosis or curvature. They also have distinctive facial features and muscle weakness, but more congenital anomalies, so congenital anomalies of the kidney, heart, eyes, ears, and then sometimes deafness is reported. They have some benign tumors, low-grade malignancies, but not increased
rates of cancer for the most part, but may have epilepsy and seizures, intellectual disability, behavior problems, and specific speech and language disorders. Stuttering was mentioned, other speech and language disorders, some features sometimes characteristic of autism, the insistence on sameness that you see with OCD, but also ADHD, etc. So these phenotypes... and it's the breathing difficulties that led to early death of these kids until more recently, when physicians were much more aggressive in treating the breathing difficulties that arise because of the muscle weakness; they would get respiratory infections, very serious respiratory infections. Across GTEx the gene's very highly expressed in the brain, but you also notice muscle, and it's highly expressed in cervix and uterus and some in the heart. So what I'm showing, sort of fully, is the top signals for reduced predicted expression of NFIX in blood; I'm going to show you some of the other tissues in a second. In the red are classic features of one or the other or both of the disorders, so you see the cardiac and circulatory congenital anomalies. The right way to think about genome-wide statistical significance with a conservative Bonferroni correction is something in the range of 3.8 times 10 to the minus 8, although because of the correlations among phenotypes, if we do permutations it might be something like 7 times 10 to the minus 7, but think of it as 3.8, 3.4 times 10 to the minus 8. So cardiac and circulatory congenital anomalies, some of the eye congenital anomalies, but then there's all these other features. So we see congenital anomalies of the esophagus, not reported to be a feature of this disease, but it wouldn't be surprising, because by definition Mendelian diseases are rare, fully characterized in some handful of kids with the disease, and so over time I would be unsurprised if we see kids with either or
both of those diseases characterized as having congenital anomalies of the esophagus. This is just salivary gland cysts, but it's an interesting phenotype because it's not all that common and it's highly significantly associated with reduced expression of this gene. It might help physicians to get to a canonical diagnosis, that what they're looking at really is about this particular gene. But you notice one of the other things: as these kids are living longer lives, what parents always want to know is what's going to happen to my child in the future. And one thing that will happen in all likelihood to the girls who live longer is pelvic inflammatory disease, because we see that very highly significantly associated with reduced predicted expression of this gene, which, given what this transcription factor is known to do and where it's expressed highly, is not necessarily surprising. We also see giant cell arteritis and pemphigus and pemphigoid associated, and that's going to be potentially interesting to see if the kids develop as they age. It's not clear whether these infections are part of the disease or just a consequence, sometimes, of the fact that they do have trouble clearing secretions from the lungs because of the muscle weakness. In other tissues we get the facial weakness; we get pneumonia, which is, you know, one of the canonical manifestations of the disease; diseases of the larynx and vocal cords; symptoms of the respiratory system; symbolic dysfunction; speech and language disorder. These are among the top signals for these phenotypes, where we don't have that many diagnoses in a health care system, so to see them come up in a disease characterized by speech and language disorders and intellectual disability is interesting. Disorders of the tympanic membrane. We also see some neural tube defects, which actually I did find one report of in, I think it was, Sotos syndrome 2. But fractures and kidney anomalies, over a range of significance from 10
to the minus 5 to 10 to the minus 9, seen, I mean, across many tissues, and seizures and convulsions and epilepsy. So cardinal features of the disease we see with reduced predicted expression of the gene. We see it for autosomal dominant disorders, we see it for autosomal recessive, and we see it sufficiently that we're creating a database of Mendelian disease genes and the phenotypes associated with them in BioVU. As more and more is done to sequence when we don't know what Mendelian disease is segregating in a family, it could be useful to look at this and see the expanded set of phenotypes in much larger numbers of individuals with just reduced expression of the gene, to see whether you might be able to hone in on a gene more rapidly, but also for the outcomes as patients live longer with some of these diseases. We're also creating a database of Mendelian genes in waiting, because in just the first 20,000 there are hundreds of genes where reduced expression of the gene is associated with multiple congenital anomalies, intellectual disability, and other really bad stuff, and some of these at least may be Mendelian disease genes that the Mendelian sequencing centers and Undiagnosed Diseases Network people are looking for. So the idea is having a database where they can look very quickly at what phenotypes are associated with the genetically reduced expression and also genetically increased expression. Some of the genes in this category are already Mendelian genes, but where it's loss of function that's driving the known Mendelian disease, and we see increased genetically predicted expression of some of those same genes associated with, as I say, many congenital anomalies, but not the same ones, and other bad things. So there might be other kinds of mutations, gain-of-function mutations, that could give rise to some of these. It may be one of the few ways to predict what kinds of phenotypes would arise with de novo mutations in
genes where we don't normally see germline mutations at all. But it could improve diagnoses, and it allows us to iterate between the phenotypes and the Undiagnosed Diseases Network and the Mendelian sequencing studies. They'll often re-phenotype the people when they think they know what the gene is, looking for phenotypic manifestations that might have been missed, and these are some ways that we hope we can help with some of those. But to drive home what I think could be a surprising new idea with respect to translation in the common variant spectrum, I want to focus on a zinc transporter that's a cause of autosomal recessive acrodermatitis enteropathica. So this is, as I say, an autosomal recessive disease associated with this blistering skin condition around all openings in the body, but also with chronic diarrhea and gastritis, serious behavioral problems, anemia. It was fatal in early childhood, like by four or five years of age, until the gene was cloned and found to be a zinc transporter. Five days after zinc supplementation the rash clears; within a week the diarrhea and gastritis clear; the behavioral problems are reported to be gone in the first month. So here's a Mendelian disease where we have an effective therapy. This gene is very highly expressed in intestine, stomach, colon, not surprisingly; skin, not surprisingly; also brain. This is a log base 10 scale; it's really highly expressed in the gut, brain, skin, but also in the thyroid, and we'll come back to that, and prostate. So again in the red you see some of the canonical phenotypes, so the anemia, mineral deficiency, Cation Bectizies, and iodine hypothyroidism are characterized as phenotypes that can arise in mineral deficiencies including zinc deficiency, and the gastritis and duodenitis. You see schizophrenia at three times ten to the minus nine, so with the behavioral problems an interesting parallel, but a top association with cardiomyopathy. So a question of, you know, what if you
weren't treated, would these kids eventually have developed cardiomyopathies? You see some benign neoplasms of female genital organs, cervical incompetence, so there was some higher expression in both male and female sex organs. Also in other tissues, notably skin, you're picking up a lot of different blistering skin conditions, so impetigo, pilonidal cyst, pruritus and related conditions, bullous dermatitis, psoriasis, lots of skin conditions that are probably the same skin phenotype but not recognized as the Mendelian skin disease because it doesn't occur at birth but a little later in life, and it doesn't have the full set of symptoms. Most people have one of these rather than multiple of them, and in adulthood, or early adulthood even, rather than from birth. But you also see type 2 diabetes, which has an association with another zinc transporter. Acute renal failure, so you see all the kidney failure: acute renal failure, chronic kidney disease, kidney failure not otherwise specified, kidney transplant, renal dialysis. Primary pulmonary hypertension, suicidal ideation or attempt. A known consequence of zinc deficiency is cerebral degeneration, and you see that associated. Gout and crystal arthropathies, in a range also. So again, all the canonical features, the diarrhea, gastritis, skin condition, behavioral problems, and these people would likely benefit from zinc supplementation too. They're three, four standard deviations below the mean, some of them, but even people two standard deviations below the mean are at significantly increased risk for all these bad things. And how pissed off would you be if you go through most of your life with a nasty blistering skin condition that is never treated the right way, that you have all of your life, chronic diarrhea, schizophrenia? I mean, if you knew that there were geneticists realizing that a nickel's worth of zinc might have improved your conditions. So we're looking at starting some trials, so basically going into the GI and Derm
clinics, trying to identify people who might be seen in both, testing just the SNPs that predict the expression of this gene, and it's an innocuous therapy; lots of people take zinc supplementation just to be healthier. So there are dozens of Mendelian diseases that can be treated reasonably effectively with innocuous therapies like vitamin or mineral supplementation or dietary interventions, dozens, and there will be more people with increased risk of bad diseases due to the reduced expression of just those genes, just those dozens of Mendelian genes, than there are people who have any Mendelian disease today, because they're rare. This acrodermatitis enteropathica is one in five hundred thousand live births, but we have more than five thousand patients in BioVU at high risk of kidney failure, cardiomyopathies, not to mention the blistering skin conditions or schizophrenia and chronic diarrhea. So the number of people who could benefit from, in this case, zinc supplementation, but in other cases vitamin supplementation or removing a particular food group from the diet, is a large number of people. These are relatively modest interventions, and there's real biological support for the idea: it's just reduced genetically determined expression of the same gene that gives rise to a Mendelian disorder with these phenotypes. Big picture observations. We look at the transcriptome coefficient of variation, and it's really interesting. Daniel MacArthur's loss-of-function-tolerant genes tend to have high coefficients of variation, like Mother Nature saying, oh yeah, I don't care, you can have none, you can have a lot, it doesn't matter how much you have. But Mendelian and mouse embryonic lethal genes actually have narrower coefficients of variation, despite the fact that they are just as heritable as most other genes, and actually these ones tend to be somewhat less heritable. And the phenome burden: so if we tally up, you know, think of kind of an area under
the curve, whether you want to add significance into it or just the number of highly significant phenome associations, the phenome burden is way higher down here than it is here. Most genes have no genome-wide significant associations at all, and genes that are Mendelian disease genes, or that have associations with congenital anomalies and intellectual disability and look like they should be a Mendelian disease gene, also have much more phenome association, much more highly significant phenome association, in either direction. And there's a class of mouse embryonic lethal genes that are not Mendelian disease genes that are also really highly associated with phenome. One of the things I talked about in just the first 5,000 subjects was that there is a genetic opposite to at least some diseases. This is what the genetically predicted expression of many of the transcripts looks like in 20,000 people in BioVU, and if I tell you that at this end of the distribution you have increased risk of myeloid leukemia, where in the distribution do you want to be? It's not here. You don't want to be at the other end, because at the other end you're at high risk for something else. In fact, when we pull together all of the genes highly significantly associated with acute myeloid leukemia, at the opposite end there are people that are at increased risk for sepsis. And it turns out when I start with sepsis and look, it's all leukemia and lymphoma at the opposite of sepsis and SIRS. If I start with breast cancer, the opposites are diverticulitis and diverticulosis, GI hemorrhage from ulcers, cholelithiasis and cholecystitis, so inflammation and infection in the bile duct. The opposite of bone cancer is acute bronchitis, bronchiectasis, pulmonary edema, post-infection pulmonary inflammation. Taken as a whole, the opposites of these cancers are like the immune system in overdrive, too much overreaction. Sepsis is not just a big infection, it's the overreaction of all of those
tissues to the insult. It's your kidneys shut down, your lungs shut down, and there are plenty of people with the same sort of magnitude of infection that never develop sepsis. So it's really interesting that in different organs you're seeing the same sort of thing. Lots of people have ulcers and never know it; they don't have GI hemorrhage from their ulcers. Most of us have respiratory infections all our lives and never have acute bronchitis, bronchiectasis, pulmonary edema after respiratory infections. So again, this is raising the question of whether the commonality to this polygenic risk for cancer is kind of a failure of immune surveillance, the immune system not quite up to full task. But as we went to the 20,000, yes, we replicate those general ideas, but it's bigger than that. There are these major biological axes that genes and diseases are piling up on, and I'm going to give you some examples. So top genes affecting risk of kidney failure are characterized as pivots on this axis of innate immunity and wound healing, and a bunch of different diseases and genes are piling up on that axis along with kidney failure. So a variety of additional phenotypes are consistently observed as associated with these same genes, and you see the consistency in both directions, so there are certain phenotypes always on the same side with kidney failure and other phenotypes always associated with the opposite end of things. So this is one of the ones characterized as a pivot on this innate immunity and wound healing axis. You see renal failure not otherwise specified at one times 10 to the minus 17, end-stage renal disease at 10 to the minus 13, nephritis and nephropathy, acute renal failure, renal dialysis, so that's in red. But you see anemias; there are inevitably anemias, and yes, anemias can be a consequence of kidney failure, but I think this is more than that. You see diabetic retinopathy, but other retinopathies as well, so non-diabetic retinopathy; you always see some
retinopathy. The uterine leiomyoma and symptoms of uterine leiomyoma are frequently associated as well, but you see some congenital stuff with all of these too, because they have the really big, highly significant associations. They're not Mendelian disease genes, but they might as well be. Actually, this one is a Mendelian disease gene. Also gangrene. Also notice the substance addiction and related disorders; that's a really interesting thing. So far all of the top genes I've identified for alcoholism and substance addiction disorders are on this axis. I mean, kidney failure usually has the more significant association, but they're on this axis. It's a really interesting thing, because morbid obesity, not just obesity but morbid obesity, and bariatric surgery looks like it's on the axis too. Anyway, so another gene, also from blood, but it doesn't matter whether the predictors were built in a solid tissue or blood, the pattern's exactly the same. You get the renal dialysis, end-stage renal disease, renal failure not otherwise specified, cystic kidney disease, hypertensive chronic kidney disease, so the things that give rise to kidney failure, type 2 diabetes with renal manifestations, but you also get the bullous dermatitis, burn, you know, skin conditions, congenital anomalies. You know, another one: end-stage renal disease, renal failure, type 2 diabetes with renal manifestations, and you see the retinopathies. We also often see glaucoma. And I just wanted to pull in one with some interesting associations that are not on this axis. This is just one of several different axes that we've been able to identify and name because some of the top genes are characterized as pivots on the axis. This one was interesting, so this is neuropeptide S receptor 1, because this gene has been studied in rats for decades for the fear response, and to see it associated with phobia just gave me a kick. I remember from my psychology classes this gene,
it had also already been associated with asthma with exacerbation. It's associated with a number of endocrine things here as well, which I think is, you know, potentially interesting. But not all genes are on that axis. I pulled ones that are clearly on that axis, and it's a major axis, don't get me wrong, because there are lots of phenotypes on these axes. If I look for the top genes for schizophrenia, many of them are on this axis, more than half, well, 40 to 50% are on this axis. There are other axes that just have to do with brain biology that schizophrenia is on as well. But you see cardiomyopathies, retinopathy, primary pulmonary hypertension all on one side of this axis, and there are highly significant phenotypes always associated on the other end. So you get some of those: relatively benign skin conditions, actinic keratosis, seborrheic keratosis, rosacea; some not so benign, melanoma, but you get basal cell carcinomas, a bunch of neoplasms and dysplasias; hypothyroidism; Alzheimer's and essentially all other dementias; atrial fibrillation. It has not escaped our notice that I'd rather have these than these, but I put this as a real axis, a real pivot, because these are just as highly significantly associated at the opposite end as these are. So when you see kidney failure at, you know, 10 to the minus 20, you're likely to see Alzheimer's or melanoma or rosacea or one of these others at 10 to the minus 20 in the opposite direction. So it's not that you're in sort of health and you can fall into disease. You want to be here in the middle, and these are opposite ends of a real axis, and as long as you're balanced you're fine, but you get too many of those genes with expression in the wrong direction and you fall off of it. Some of the Mendelian diseases are clearly on this axis; they're associated with many of the same phenotypes, but a Mendelian disease gene is able all by itself to pull people over into disease, and there
are lots of axes. So some that we can easily name from results in just the first 20,000 patients include wound healing, innate immunity, TGF-beta signaling, apoptosis and growth, and calcium signaling, but there are other signaling pathways in the brain that I'm still trying to figure out. So there are lots of axes, and it's like back to Aristotle, and I think it offers new ways of thinking about disease and biomarker development, for monitoring people as they come off these axes in one direction or the other. If we could develop therapies related to keeping people sort of on an even keel and not falling over, that would be an interesting way to think about drug development. So these are lots of new kinds of ideas. I really am interested in feedback, but these ideas are screaming for some ENCODE now, to see all these genes. Remember, we build only cis predictors, and these are uncorrelated. They end up associated with all the same phenotypes, yet they're uncorrelated because the top things that are associated are different for different ones of them. Sometimes it's the cardiomyopathies that are, you know, at 10 to the minus 20 and the kidney failure at 10 to the minus 11, and sometimes it's just the opposite. And so they're uncorrelated, but it still looks the same, because it's like this Chinese menu of sets of phenotypes that are always there on the axes, on both sides. So, yeah, sometimes it's glaucoma, sometimes it's retinopathy, and sometimes it's both. So uncorrelated, but clearly driven by the same biology, and screaming for ENCODE to get at some of this larger biology. And of course, after QC this is about 18,000 people that we've got results on. In a few months we'll have 36,000 and we'll be able to look at it all again, and then 72,000, and more than 120,000. So, my colleagues at Vanderbilt, we're singing a new tune. Eric and Lisa did most of the heavy lifting on analyses, and on work keeps the computers going, our zebrafish 
colleagues, and BioVU is part of our CTSA; my GTEx colleagues, and GTEx is just also a fantastic sister project to ENCODE. Thanks. Go ahead with the mic. OK, very interesting work. So I have a question: if you look at all the top hits in all the GWAS studies, such as type 2 diabetes, cardiovascular disease, and even if you do the PheWAS study, do those traits show up on your list? Yeah, they show up on the top. So, for example, if we run type 2 diabetes as a phenotype, just the ICD-9 codes, you get all the top GWAS hits. If you refine the quality of that phenotype, you get closer estimates to exactly the same odds ratios that you get in GWAS with research-quality diagnosis of type 2 diabetes. But the fact is, if you use only the diabetes mellitus code, you get all the same things, because 90% of diabetes is type 2 diabetes. That said, now when we do PrediXcan analysis on all of the world's data in type 2 diabetes, we are able to characterize more of the genes that are probably the targets of the GWAS-associated SNPs. But there's also a really interesting thing that we see with type 2 diabetes and others at very large scale. I think some of the very top SNPs, from the perspective of effect sizes, are often associated with expression of multiple of the local genes, which may affect more biology, and that may be why we see sort of outlier effect sizes for a small number of SNPs with each disease. Those SNPs may be part of the predicted expression of multiple local genes, potentially in different tissues, that all help to drive the biology of the association that's observed. Yes? Hi, so I have a question. Do you actually look at, and maybe identify, genes or SNPs which are protective of certain... Well, so, you know, people are very interested, drug companies are very interested, in rare protective variants, so loss-of-function mutations associated with reduced risk of disease. OK, the direction of effect of our genetic 
prediction is getting them the same information, in the sense that it's increased predicted expression of most of those genes that's associated with kidney failure and all those other bad things. Part of the reason drug companies want that is that it's sort of a guaranteed gene target, but it also suggests that down-regulating or inhibiting that gene or protein is the way to improve health, because it's so much easier to inhibit a gene or protein than to upregulate it. Now, I mean, when you're in the middle, you just have average predicted expression, you have the population-average predicted expression, and no associations with the predicted expression of that gene. But when you have too much, that is a good drug target, in the sense that it's always easier to drag a gene's expression or a protein's level down than to increase its expression. Yeah, just two very quick questions, Nancy. One is: is the availability of the data going to be filtered in any way? Does it go to dbGaP, or is it open access? Right, so historically, as we published papers relating to phenotype, those data have gone to dbGaP. That's what Vanderbilt's always done. I think, you know, for the database that we want to set up for Mendelian disease genes right away, I mean, even just at the 20,000, I think people would probably have to certify that they wouldn't try to re-identify subjects. The good thing is this genetically predicted endophenotype is based on many SNPs, often even with, you know, a penalized regression sort of approach like elastic net, so going backwards, re-identifying, would be much more difficult. It's a really deep end product, as it were, a results-oriented database. So I think it would be safe, but we'd still have to have certification that it wouldn't be used outside... I mean, you know, that you wouldn't try to re-identify. Right. So, it's a good segue: the mutations that have been associated, as you know, in GWAS, most of the mutations lie outside 
the coding regions of many of these genes, and many of them lie in the regions of non-coding RNAs. Are these filtered out of your... No, no, no. So we have a whole set of non-coding RNAs whose expression we predict, and that's a really interesting subset of what we characterized as Mendelian genes in waiting. There are some with huge associations to cardiac congenital anomalies and other things. I mean, they totally look like Mendelian disease genes, but they're not. Long non-coding RNAs, and really, really interesting patterns of association with their reduced genetically predicted expression.