Yes, so today what I'm going to tell you about is data integration: genome by transcriptome by EMR. But it's on the way to genome by transcriptome by ENCODE to EMR, so I'm here to learn more about using all the new tools in ENCODE, just like the rest of you. The purpose for me is all about the genetic component to common disease. I think data integration is one of those things that is extremely helpful in getting us to just the few remaining questions we have on the role of genetic variation in disease: What are the genes? What are the mechanisms? What are the directions of effects? And where are those relevant?

We've spent a lot of time in my group trying to get at functional genomics, at the function of genome variation most specifically. Most of what we have worked on is this sort of pathway where we start with genome variation and assume that genome variation has an effect on transcript levels in some cases. Now, of course, things besides genetics influence transcript levels, so we get some distribution around that genetically determined effect, and we presume that beyond some threshold of expression in some genes you can become affected. So that's a sort of generic model.

Within GTEx, people have used creative ways to look at how regulatory variation concentrates heritability. So this is the heritability estimated with GCTA for type 1 diabetes, using all SNPs that are interrogated in the Wellcome Trust Case Control Consortium GWAS for type 1 diabetes. You see about 48 percent heritability for type 1 diabetes and about 50 percent heritability for Crohn's disease. But concentrated within just the few thousand SNPs that are characterized as eQTLs in adipose, you see almost 40 percent of that heritability, in just a few thousand SNPs that are eQTLs in adipose or heart or lung or muscle. And so across the board for type 1
diabetes, you see this regulatory architecture concentrating heritability. And this is essentially the same set of SNPs regulating the same set of genes, so it's shared regulatory architecture across all of these tissues. But for Crohn's disease, it's a completely different pattern. You see concentration of heritability in whole blood, which of course has all those cells that are so important in inflammatory disease, but you don't see much concentration of heritability across all these other cell types.

So a key take-home message is: yes, you get this concentration of heritability, but it's not eQTL-ness per se, because it's the very same set of eQTLs concentrating heritability here for type 1 diabetes and failing to concentrate heritability for Crohn's disease. So this ability to concentrate heritability in regulatory elements is a feature of the genetic architecture.

Our colleagues at Harvard and the Broad did us one better, with the paper from Alkes Price's group at the end of 2014, where they combined 11 common diseases together as "disease," compared it with controls, and used basically ENCODE annotations to show that about 40 percent of all the heritability that you get with all variants can be concentrated into the variants that you directly interrogate that lie in DNase I hypersensitivity sites. But if you use the imputed data, closer to 80 percent of the common variant heritability is concentrated in SNPs that map to DNase I hypersensitivity sites, which is remarkable. And yes, it varies by disease.
So this is a concatenation of a bunch of diseases, and it's heavily representing autoimmune disorders, for example. So this does vary, but whether it's 60 percent or 80 percent, that's a remarkable concentration of the common variant heritability into what we see as DNA variation that must be in a sense regulatory, and probably regulatory at the level of transcriptomes.

That led us to think about the analysis of these data in a different way: think of it as a missing data problem. If we really believe that a substantial fraction of the genome variation affecting risk of common disease is regulatory, why not focus our analysis on endophenotypes that are more direct measures of what we really want, the genetically determined part of transcript levels? Instead of testing individual SNPs and asking after the fact whether they are regulatory, let's aggregate those variants into SNP-based predictors of transcript levels and then test these predicted transcript levels directly for association with disease or relevant quantitative traits.

This is the brainchild of a talented young faculty member at the University of Chicago who I sincerely hope will be a talented young faculty member at Vanderbilt soon.
She's a really out-of-the-box thinker, and this is an out-of-the-box idea. It takes a while to wrap your mind around the notion of testing a calculated endophenotype, genetically predicted expression. The manuscript describing the method and initial applications is in press at Nature Genetics now as a GTEx companion paper.

If you think about a decomposition of measured transcript levels, some part of it is completely genetically determined. A substantial part of it is, of course, all those non-genetic factors that influence what we measure as transcript levels. Over a lifetime, the environmental influences on transcript levels and the genetically determined transcript levels lead us to have certain diseases, and that feeds back: the diseases you have change your measured transcript levels as well. That is why it's been so confounding to try to use measured gene expression to figure out what drives disease. If we measured transcript levels in blood cells in kids with asthma and kids without, there are thousands and thousands of genes differentially expressed between cases and controls, most of that as a consequence of disease, not a cause. But here we're going to focus on just the genetically determined part of transcript levels and look at the association of that phenotype with our disease. That finesses this problem, so we don't have the confound.

It's very analogous to genome imputation. With genome imputation, we use 1000 Genomes as the reference panel to learn the correlations between DNA variants due to linkage disequilibrium, and then use that information so that when we've genotyped just a subset of the total sequenced variants in 1000 Genomes with some genotyping chip, we can impute millions of additional variants.
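The idea of testing genetically predicted expression can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published implementation: the SNP IDs, weights, dosages, and trait values are invented, the helper names `predict_expression` and `pearson_r` are hypothetical, and a plain correlation stands in for whatever association test a real analysis would use. It shows predicted expression as a weighted sum of allele dosages, followed by a test of that predicted value against a trait.

```python
from math import sqrt

# Hypothetical per-SNP weights for one gene in one tissue, of the kind that
# would be learned in a reference panel with genotypes and measured expression.
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.125}

def predict_expression(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) over the SNPs in the predictor.

    Assumption for this sketch: a SNP missing from `dosages` counts as dosage 0.
    """
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson_r(xs, ys):
    """Pearson correlation; its sign gives the direction of effect."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Toy cohort with genotypes only (no measured expression needed).
cohort = [
    {"rs1": 0, "rs2": 2, "rs3": 0},
    {"rs1": 1, "rs2": 1, "rs3": 0},
    {"rs1": 2, "rs2": 0, "rs3": 2},
    {"rs1": 1, "rs2": 0, "rs3": 0},
]
trait = [1.0, 2.0, 4.0, 2.5]  # invented quantitative trait values

predicted = [predict_expression(d, weights) for d in cohort]
r = pearson_r(predicted, trait)
print(predicted, round(r, 3))
```

Because the test is on one predicted value per gene rather than on each SNP, the result is gene-level and carries a direction: a positive correlation here means higher predicted expression goes with higher trait values.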
Here, we're going to use GTEx as our reference panel and learn the relationship between the genome variation and the measured transcript levels. We can store the weights from these prediction equations, and then in any data set where we have genome interrogation at at least the common variant level (that can be whole genome sequencing, but it can just be a GWAS as well), you can basically impute transcript levels in all tissues measured in GTEx and use that as an endophenotype to do a gene-based test of whether genetically determined transcript levels are associated with your phenotype of interest.

Now, of course, heritability represents the upper limit on how well you could do with such a method, because we're using only the genome variation to impute the transcript levels. Fortunately, there are plenty of transcripts that have very high heritability. What you see here is the prediction performance, R squared. This is an out-of-sample comparison of the predicted transcript level to the directly measured transcript level, and we're showing this against the GCTA-based, linear mixed model based heritability of the trait. The black line is the estimated heritability; the red dots are the R squared prediction performance. So the more highly heritable the trait is, the better the prediction performance is. And for any given tissue there are a substantial number of genes with quite reasonable heritabilities. So all the genes down here,
you're not going to do very well predicting. But in some tissues, most of those genes will be up in this part of the curve.

On average, the significance of the correlation between predicted and directly measured expression levels has q-values less than 0.05 for 40 to 50 percent of genes and less than 0.1 for 60 to 70 percent of genes. That's the significance of that correlation. Eighty percent of genes have a correlation between predicted and measured expression greater than 0.1, and 50 percent of genes have a correlation greater than 0.2. A substantial number of genes have that correlation greater than 0.5 or 0.6, and there are plenty with that correlation greater than 0.8, which is a remarkable set of genes where the environment is hardly impinging on the expression of those genes, and yet the population has a wide range of variability that is essentially completely genetically determined. From an evolutionary perspective, it's a really interesting set of genes.

The prediction models that turn out to be the best probably tell us something about the genetic architecture of cis-regulatory variation, because with the sample size in GTEx now we're still using just the cis-regulatory variants. Polygenic prediction did not do as well as lasso or elastic net, suggesting that more penalized, more sparse models actually do a better job than polygenic prediction. As your sample size increases, it makes less and less difference which predictive model you use. But right now we've been using elastic net because it's the most robust: regardless of which genotyping product was used in your GWAS and the quality of your imputation, the elastic net tends to be more robust in terms of the prediction.

The advantage of the framework is that we're iteratively using more and more of what we do know, all of this information on how
genome variation relates to transcript levels, to learn what we most want to know. What are the genes? We get genes with this test, right? It's a gene-based test. We get directions of effects. It's a much more natural way to move into pathway and network analyses. So the discovery-level information is better.

The other thing we're looking at is using this in the analysis of whole genome sequence data as a way to unify the common and rare variant contributions. Because you're getting the gene-level information, you can imagine combining the common variant gene-level information on the regulatory end of things with the rare variant information on the functioning of the protein itself.

So now I'm going to tell you about the integration with the electronic medical records. We worked with Andrey Rzhetsky on this Cell paper, where Andrey used 130 million electronic medical records to show that each common disease has a characteristic set of Mendelian diseases that are overrepresented among the common diseases. He called it a Mendelian barcode. Ever since we worked on that paper, I've been obsessed with understanding the continuum between Mendelian and complex common diseases, and the continuum between loss-of-function mutations, deleterious amino acid polymorphisms, and just reduced expression of genes.

So now we've been thinking about how we can use PrediXcan in this context to potentially validate and prioritize rare variant discoveries. The idea is that PrediXcan itself involves an integration of transcriptome and genome variation, using GTEx, for example, as a reference panel. To really use this relationship to validate and prioritize rare variant discoveries, you have to do further integration with what I'd call a phenome reference panel. And having moved to Vanderbilt, have I got a phenome reference panel for you.
At Vanderbilt, the clinical data warehouse is called the Synthetic Derivative. You get a de-identified and continuously updated image of the electronic medical record, with now 2.3 million subjects. Vanderbilt built their own EMR about 30 years ago, so you have a really longitudinal set of data. In addition, until just this year they had an opt-out procedure for collecting samples, so their biobank has more than 200,000 subjects. They moved to prospective consenting, as per NIH requirements now, but there's really no change in the rate of sample accrual with the prospective consents, because with the opt-out we removed a random set of individuals to make sure that nobody knew who was in or out of the database, and now we don't have to do that. So it ends up being a wash in terms of the rate of accrual.

Right now there are about 20,000 samples with GWAS-level data and 42,000 with exome chip. But by the end of 2015 or beginning of 2016 there will be more than 2.5 million subjects in the Synthetic Derivative, and we'll have dense genotype or whole genome sequencing on more than a hundred thousand, with exome chip interrogation on all of those individuals. That's a dandy phenome reference panel.

You're all familiar with the idea of GWAS, but with BioVU you can do a PheWAS, a phenome-wide association study. The way they've done that in the past is to use individual functional variants, or individual variants that were, say, the top signal for atrial fibrillation, and then look at what other phenotypes across the entire medical record are associated with that apparently functional SNP. So what, besides atrial fibrillation,
is associated with the top SNP for atrial fibrillation? To do PheWAS you have to have a very large cohort of patients with both genotype data and many different diagnoses. The idea with BioVU now, though, is that we can use PrediXcan to do gene-based PheWAS across the entire cohort, and gene-based PheWAS for expression of genes in different tissues: cardiac tissues for heart disease, brain for neuropsychiatric phenotypes. That really makes BioVU into a phenome reference panel.

Given my obsession with the questions around Mendelian disease genes, the natural question was: what phenotypes are associated with reduced genetically regulated expression of Mendelian disease genes? Our hypothesis was that if this is going to work, you would expect the entire spectrum of the Mendelian disease to be represented in individuals across BioVU who just have reduced expression of these genes.

Of course, even though we've got 20,000 with genotype data, it's on all different GWAS products, and the quality of the imputation wasn't quite where it needed to be, so we're re-imputing. These analyses have been done on just the 5,000 BioVU subjects genotyped with the Illumina 1M, which is the largest number of individuals done with any single GWAS product. And we focused these preliminary studies on the genes where all the SNPs in the prediction equation are directly interrogated on that product, in either a whole blood predictor, which we actually built from the Depression Genes and Networks data because it's bigger than what we have in GTEx now (it's more than 900 individuals), or the cardiac tissue predictor (I think I used left ventricle), built from more than 300 individuals in GTEx. So we have 125 genes. That's the data set I'm showing you. But it's already really interesting.
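A gene-based PheWAS of the kind described here can be sketched as a loop over diagnosis codes. This is a toy illustration, not the BioVU pipeline: the subject IDs, phenotype labels, and expression values are invented, the function names `welch_t` and `gene_phewas` are hypothetical, and a Welch t statistic is swapped in for simplicity where a real analysis would more likely use regression with covariates.

```python
from math import sqrt
from statistics import mean, stdev

def welch_t(cases, controls):
    """Welch's t statistic for mean(cases) minus mean(controls)."""
    var_cases = stdev(cases) ** 2 / len(cases)
    var_controls = stdev(controls) ** 2 / len(controls)
    return (mean(cases) - mean(controls)) / sqrt(var_cases + var_controls)

def gene_phewas(predicted, diagnoses, min_cases=2):
    """Scan every phenotype for association with one gene's predicted expression.

    predicted: subject -> genetically predicted expression (one gene, one tissue)
    diagnoses: phenotype code -> set of subjects carrying that code
    Returns (code, t) pairs sorted by |t|, strongest first.
    """
    results = []
    for code, cases in diagnoses.items():
        case_vals = [v for s, v in predicted.items() if s in cases]
        ctrl_vals = [v for s, v in predicted.items() if s not in cases]
        if len(case_vals) < min_cases or len(ctrl_vals) < min_cases:
            continue  # too few subjects to estimate a variance
        results.append((code, welch_t(case_vals, ctrl_vals)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)

# Invented data: six subjects, three phenotype codes.
predicted = {"s1": -1.2, "s2": -0.9, "s3": 0.1, "s4": 0.3, "s5": 0.8, "s6": 1.0}
diagnoses = {
    "phenotype_A": {"s1", "s2"},  # cases have low predicted expression
    "phenotype_B": {"s2", "s5"},  # no clear pattern
    "phenotype_C": {"s6"},        # skipped: only one case
}

ranked = gene_phewas(predicted, diagnoses)
for code, t in ranked:
    print(code, round(t, 2))
```

A negative t means the phenotype goes with reduced predicted expression and a positive t with increased predicted expression, so the same scan covers both directions of effect.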
So PEX19 is one of a set of 12 genes from this gene family where recessive mutations lead to a peroxisomal biogenesis disorder, the Zellweger syndrome spectrum of Mendelian phenotypes. The kids with this disease are born with hypotonia; they have seizures; they have what's called bony stippling in the patella and the long bones, showing bone resorption in those areas on X-ray. You see kidney and liver cysts. The liver cysts lead to coagulopathies; the kidney cysts, and also kidney stone formation, lead to renal failure. PEX19 actually is one of the rarest causes of Zellweger syndrome, because it's the most severe form of this Zellweger syndrome spectrum.

So we looked at what BioVU phenotypes are associated with reduced genetically determined expression of PEX19, because all of the SNPs in the predictor for PEX19 were directly genotyped. The top association is with an ICD-9 code given to kids that they think need a screen for chromosome anomaly or genetic disorder. So the kids probably have hypotonia, maybe have seizures at birth. But you also see kidney failure, hypertensive heart and renal disease, hypertensive chronic kidney disease, epilepsy, kidney replaced by transplant, fracture of the patella (so again, this bone resorption of the patella is seen), partial epilepsy, and recurrent seizures. So essentially all of these features you're seeing as associated with reduced genetically determined expression. You also see disorders of calcium and phosphorus metabolism and hyperparathyroidism. The peroxisomes are a calcium store in cells, and if kids with Zellweger syndrome lived long enough, they would probably have these disorders.
So my Mendelian experts tell me. So it's a really interesting set of phenotypes associated with reduced expression of PEX19. But because of the work that's been done on the genes that regulate peroxisomal biogenesis genes, we actually know what to expect for increased expression of PEX19, and you see exactly the phenotypes that you would expect associated with increased expression of this gene as well: primarily cancer phenotypes and metabolic syndrome. So again, pretty much exactly in line with expectations.

Another Mendelian disorder that we looked at leads to a mitochondrial depletion syndrome. The gene is a DNA synthetase for mitochondria, so you basically don't replicate DNA, and you lose mitochondria rapidly over time. The kids are often born with droopy eyelids. After a while they lose the ability even to move their eyes. If they ever walk, they lose the ability to walk, even to breathe, so they really lose the ability to use muscles. A recent study in Finland on a series of patients suggested that they have increased fractures, although the paper said they didn't know whether that would be because they fall down more because of the muscle weakness or because they actually have bone weakness due to mitochondrial depletion in bone. And so what are we seeing in BioVU with reduced expression? We actually see the ptosis of the eyelid, which is sort of the canonical thing.
We see fractures everywhere: skull fractures, tibial fractures, rib fractures, femur fractures. So, really interesting potential. It's possible they just fall down more, but I think it'll be interesting to test, and I've got a guy who is very interested in mitochondrial diseases working on the possibility that bones really are more fragile. Increased expression of this same gene is associated with paraproteinemia and multiple myeloma. That would be unexpected, but it's a really interesting idea for what might happen if you have mitochondria on steroids, able to reproduce more rapidly than expected. So we're looking at this as well in a much larger series in multiple myeloma.

This raises the question of whether altered expression, increased or decreased, of Mendelian disease genes might contribute disproportionately to common disease. You saw kidney failure with PEX19. A lot of Mendelian disease genes have kidney failure as part of their spectrum, and if reduced expression of those genes leads to kidney failure, that may be a substantial contributor to what we see as garden-variety kidney failure.

In addition, for some Mendelian disease genes we see additional phenotypes. Hemochromatosis we think of as being associated with certain phenotypes, and we see some of those phenotypes in individuals in BioVU, cardiomegaly for example. But we see additional phenotypes. Most people with hemochromatosis in the US have missense mutations as a cause, as opposed to loss-of-function mutations; they just have amino acid changes that are deleterious and lead to the disease. But in BioVU we see really strong associations with kidney failure. We know that iron actually does accumulate in kidneys in people with hemochromatosis. It's not reported to cause kidney failure, but significantly reduced expression of this gene may lead to kidney failure, and that's something we'll be looking at as we go forward.

I'll just close with
a few examples. We think that if reduced expression of Mendelian disease genes is essentially reproducing the Mendelian phenotypic spectrum, then you could use something like BioVU to refine the phenotypes of existing Mendelian diseases and characterize new ones, basically predict what the Mendelian phenotype will look like. As the undiagnosed diseases programs and the Mendelian diseases programs go forward, we'll have a way to match things up. In these undiagnosed diseases programs you've got an individual with a set of phenotypes that suggests strongly that there's a genetic basis for the disease they have, but nobody knows what it is. When you do exome sequencing, you often have half a dozen or a dozen genes that could be the driver. When we have more than a hundred thousand subjects genotyped, we can go into BioVU and look at which of these dozen genes have a set of phenotypes for reduced expression most similar to what we see in the individual.

And of course with many of the sequencing studies that we're doing even today, in type 2 diabetes for example, with very large numbers of exome sequences done, we don't have many new findings. We can find some of the things we already knew about, but we have a lot of genes that are close to being genome-wide significant. We can take those genes into BioVU and look at which ones show, with reduced genetically predicted expression, evidence of being associated with diabetes, glucose traits, insulin traits, BMI, across the board.
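The matching step for an undiagnosed case, comparing a patient's phenotypes with the phenotypes associated with reduced predicted expression of each candidate gene, could look something like the sketch below. The talk does not say how similarity would be measured; Jaccard overlap is an assumption made here for illustration, and the gene names, phenotype labels, and function names `jaccard` and `rank_candidates` are all hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union of two phenotype sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(patient_phenotypes, gene_phenotypes):
    """Rank candidate genes by how closely their associated phenotypes
    (for reduced predicted expression) match the patient's phenotypes."""
    scores = {
        gene: jaccard(patient_phenotypes, phenotypes)
        for gene, phenotypes in gene_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Invented example: one patient and three candidate genes from exome sequencing.
patient = {"hypotonia", "seizures", "renal failure"}
gene_phenotypes = {
    "GENE_A": {"hypotonia", "seizures", "renal failure", "fracture"},
    "GENE_B": {"cataract", "glaucoma"},
    "GENE_C": {"seizures", "anemia"},
}

ranking = rank_candidates(patient, gene_phenotypes)
for gene, score in ranking:
    print(gene, score)
```

The same ranking could be applied to near-significant genes from a sequencing study by replacing the patient's phenotype set with the set of traits of interest.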
So it's another way to help prioritize the rare variant discoveries, we think.

There are just a few others that I was going to show you that are kind of fun to look at. These are not Mendelian disease genes. Some of them were just genes, like this one, where I had some idea what phenotypes should be associated: a cholecystokinin receptor. Indeed, biliary disease is associated with increased predicted expression of this gene. But what was kind of intriguing was to see the other associations. This is a receptor for regulatory peptides not just in the gastrointestinal tract but also in the brain, and you do see a few other interesting ones: suicidal ideation, post-traumatic stress disorder. It's hard to know what that means in just these 5,000 samples, but when we have more than a hundred thousand, that's going to be a really interesting thing to keep an eye on.

There was a gene, GRIK5, that looks like an eye super-gene. Increased expression of GRIK5 was associated with retinal detachment, cataracts, glaucoma of various sorts, disorders of the vitreous body, and other retinal disorders. And again, that's increased predicted expression, which of course makes it ideal as a drug target for developing a therapeutic.

Reduced predicted expression of this gene, which is one of those genes that acts as a chaperone for proteins from the endoplasmic reticulum to the Golgi, was associated with schizophrenia, with psychosis, with other nonpsychotic or transient mental disorders, psychogenic and somatoform disorders, and epilepsy. But it has an interesting pattern of association with other things too.
So you see transient alteration of awareness; thyrotoxicosis, which can sometimes be, I mean, that's delusions in people thought to have thyroid disease; lack of normal physiological development; failure to thrive; alteration of consciousness; and meningitis. So it looks like it just makes brains more sensitive to any kind of insult. Again, I wouldn't make much of it now, but when we have a hundred thousand, this is going to be an interesting one to look at.

So what I've shown you is just a little bit of a flavor. These are the results of our studies in 5,000 in BioVU, with the predictors built only through statistical methods. We're modifying those now to use ENCODE annotations, which should improve the quality of the predictors. Within a few weeks we'll have results with this improved methodology on all genes (we have 125 genes now) in 20,000, and then in a year or so we'll have results in more than a hundred thousand with all genes. So I think it's going to be really fun to use this as a general discovery tool, but also to really try to get an understanding of whether altered expression of Mendelian disease genes really does contribute disproportionately to human diseases.

These are the members of our GTEx team, and I need to especially point out Haky Im and Eric Gamazon. I hope Haky will be signing on the dotted line to come and sing in our new Nashville group. With that, I need to acknowledge my GTEx colleagues. It's a really fantastic project. Many of you that work in ENCODE know how much fun it can be to work with a like-minded group of people on these big projects. I'm happy to take questions now.

Well, I don't think the mic is on, but it should be. You probably have some interesting data also taking into account sexual dimorphisms in expression levels.
Yeah, so there's a whole group in GTEx working on context specificity with respect to sex. So Barbara Stranger, we got one of those supplements to really study sex context specificity in GTEx, and they've really been doing a ton of work on it. One of the things that's challenging with respect to context specificity, sex, and even the cis-trans thing is how you choose to normalize, because the normalization process, and even the way we use PEER factors prior to general analysis, can remove a lot of the sex effects, and can clearly remove trans effects also. So I think there's a lot of deep thinking and diving into the exact best way to do this, to make sure that we find the sex effects that you know are there. But I think there are some really interesting preliminary studies out already from other groups suggesting that the X chromosome is a much more interesting chromosome with respect to sex effects in transcription than we, I mean, not that we should be surprised, but really disproportionately interesting.

I am [inaudible] from Penn. So, for PrediXcan: in my reading of GTEx, you find two kinds of eGenes or eSNPs, ones that are shared and others that are cell type specific. So do you use those separately? And then the other question is, if you take the cell type specific eSNPs, do you actually see enrichment for diseases that are associated with that specific tissue?

Yeah, so it's definitely true that this is a feature of genetic architecture. For some diseases, if you don't use the regulatory variation identified very specifically in the right cells and tissues, you don't see anything. There are a lot of examples for neuropsychiatric diseases where you really have to have the regulatory variation from brain in order to see effects. Not true for all, but probably for 80 or 90 percent of neuropsychiatric disorders.
That's true. And this cross-tissue versus single-tissue architecture is important as well. So what we've been doing is building both tissue-specific predictors and a cross-tissue predictor. For brain, there are several brain regions sampled in GTEx, so we're building brain-specific predictors but also a general cross-brain predictor, because there's definitely shared architecture. When you use that shared architecture across multiple tissues from the same set of individuals, you improve your power and your resolution for the predictions. So when what you want to study is the cross-tissue regulatory architecture, using this cross-tissue sort of predictor is going to be a better idea. When you really need to home in on the specificity, you want to use the tissue-specific one. And of course at some point, through collaborations with the single cell consortium, we hope we'll have even cell type specific predictors, and you can get that for some cell types now. But for now, the tissue definitely helps for diseases that have that sort of architecture.

Okay, so I have a few announcements, and I'll do these in reverse order. We need to be back here at 10:05 for our next talks, so please be back a few minutes early. We'll have coffee if you go outside and back to the left; I think there will be signs that say how to get there. So we'll be on break for coffee. Please join me again in thanking our keynote speaker, Nancy Cox, for her talk.