All right, good afternoon, everyone, and thanks for joining us for this April installment of the NHGRI DIR seminar series. It's great to see all of you in the halls, some of whom I haven't seen in a very long time, and thanks to all of you who are joining us remotely. Our guest today is Dr. John Greally. He is a professor of genetics and pediatrics at the Albert Einstein College of Medicine. He wears numerous hats, also serving as the chief of the Division of Genomics in Einstein's Department of Genetics, as the director of their Center for Epigenomics, and as an attending physician at the Montefiore Medical Center, which is located in the Bronx. His research program is centered on understanding models of genetic susceptibility to human disease, especially those affecting children. His group focuses on understanding phenotypes through genetic or environmental influences that change the innate properties of a canonical cell type, or perturbations that alter cell lineage choices during differentiation. His group also studies the effects of environmental and genetic influences on stem cell differentiation in order to understand the mechanisms of cellular memory and ultimately to reveal the functional variants in the non-coding majority of the human genome. It's a pleasure to have you with us here today, John. We're all very much looking forward to learning more about your work, and thanks for making the trip down from New York to be with us. Please join me in welcoming our guest today, Dr. John Greally.
Thank you all for coming, and thanks to all who are on Zoom. I appreciate Sophia, who was a graduate student in the west of Ireland; I actually met her a long time ago through that connection, so Irish connections obviously run very deep. As Andy was saying, I want to talk about variation in the non-coding genome. 
You think in terms of DNA sequence variation, but it's also about how the non-coding genome is used, which gets us into this area of epigenetics, and that's one of my major foci. Before starting, I want to make a land and labor acknowledgement, recognizing that I'm honored to visit the land of the Piscataway and Nacotchtank nations today, and that there aren't many Piscataway or Nacotchtank in the audience because of the displacement through theft and genocide that has happened to our Indian, Native American, Indigenous family members. I also recognize that my economic foundation, and that of many others, is based on the enforced labor of slavery, and that the ongoing, lingering injustice of this is the anti-Black racism that affects our Black family members. So I hope that you're able to take away things from my talk today that are interesting from the point of view of medicine and science, but I'd also like to recognize that some of what is enabling me to be here is also worth thinking about. So I'm going to bring four components to this talk. The first part I've never talked about before in public, which is what we're doing with genomics in the Bronx. The Bronx, I'm going to argue, is possibly one of the most fertile places where we could be doing genomics research and medicine in the United States. We have a lot of initiatives, and I wanted to describe those. In terms of trying to understand human disease, which is what we're trying to do with these programs, I'll talk to you about this field of epigenetics, and then about how the altered perspective that I've had to develop in epigenetics is causing me to apply new insights to how we interpret the results of the so-called epigenomic assays. Then I'll get into the part that I think is possibly the most interesting area within this large epigenetics umbrella, which is what happens when you have a variant in the DNA sequence at a regulatory locus. So this is the Bronx. 
This is an income map, and for those of you who don't know New York City, you can see the Bronx because it's at the top and it's quite pale. The Bronx is diverse. 9% of the Bronx looks like me, northern European white. 91% is everything else imaginable, and it's going to get even more diverse with time, because what you're seeing is that a lot of the older folks are disproportionately northern European white. We're going to see this non-gentrifying borough become more diverse with the passage of years. It's where the poor and immigrant populations live. 26% of Bronx residents live in poverty, and in fact it's 36% in Congressional District 15 in the South Bronx, which is the poorest congressional district in the United States. It's just across the Harlem River from some of the wealthiest people on planet Earth, and these patients of mine do not deserve to live like this. 34% of Bronx residents are foreign born, and quite a few are undocumented and uninsured. These are the languages spoken in the Bronx. The North Bronx largely has English as the primary language. The South Bronx is mostly Spanish speaking. But if you take away those two primary languages and ask what the third most common set of languages spoken is, the yellow is West African. We have very substantial West African communities throughout the Bronx. What gives us, in the health system and the medical school, a very interesting position within the Bronx is that we're basically on our own up there. There are some city hospitals, but otherwise Montefiore Medical Center is the sole provider of care within the Bronx, and the medical school, which is very research intensive with lots of NIH grants, is right in the middle of the communities that are the most diverse in the United States. So we have had people, you know, the usual suspects, approach us to say, "We'll sequence hundreds of thousands of your individuals in the Bronx. 
There'll be millions of dollars' worth of sequence that we can give to you." We've declined those offers, so we don't have sequence of everybody in the Bronx, because we have a set of principles in the health system that have to be adhered to before we'll have a partner work with us. These are largely the ones that you would have seen in other venues, but the top one is very important and is part of the trust with the community, where we say: the reason for health disparities affecting Black and brown people is not Black and brown DNA; it is the racism and the socioeconomic structural inequalities that have been built up. Everything else here you will probably recognize as good general principles that you would have heard in other forums. So we have got some very good partners that we have welcomed. I've managed to set up, with colleagues at Einstein and Montefiore, a clinical testing service for every new cancer diagnosed in the Bronx through Caris Life Sciences, which allows us to get exome and transcriptome sequencing on every new tumor that's diagnosed and to have genomically directed, tailored therapy for these patients. We have been inventive in terms of finding sources of genomic information. Sri Raj is an extraordinarily brilliant population geneticist who works with me, and we have been working with the All of Us data. This is the release before today's big announcement, which doubled the number of people and genomes that are out there, but as you can see it shows a great diversity of the genomes in New York City, and you'll see that within the Bronx there is a relatively small number of people who have only blue in their genomes, which would be the northern European white folks, far fewer than in, say, Manhattan, Queens and Brooklyn. All of Us is a laudable approach in so many ways, because it has placed the ELSI issues of sequencing at the forefront, which has made it attractive to us. 
So what Sri has done with these sequences is some identity-by-descent analysis, to see where the more related groups of people exist within these 13,000 genomes. It's prompted in part by the recognition of clinicians like myself that what is rare globally can be common locally. So if I have a patient who turns up in my clinic with albinism, I don't think of the typical textbook causes of albinism. I ask if they've got Puerto Rican ancestry and I look at one of the two Hermansky-Pudlak genes, because those carry founder effect mutations in Puerto Ricans. Likewise, if a child of Puerto Rican ancestry comes in with short stature and skeletal dysplasia, the first thing I think of is Steel syndrome. So founder effect mutations, variants that cause disease and that are enriched in a population that you serve in your health system, are something you need to know about. This is why Sri looked to see what's happening in New York City in general. These are clusters: the size is the number of people, the distance is the F_ST, the relatedness, and it makes a lot of sense. 
Europeans with Europeans, African with African, and so on. The one that I'd like to focus you on is this South Caribbean group here with African ancestry. Through our colleagues at Mount Sinai we're able to deduce that these are Garifuna, a very specific ethnic group from the South Caribbean. I want to point out that the cardiomyopathy variant at the top of the list there has been previously described by our Mount Sinai colleagues but not yet published, so I want to give them appropriate attribution. What you can see here in the top list of genes is all of the things that you'd recognize as variants associated with disease in Ashkenazi Jewish individuals. These are known founder effect mutations, although we are picking up some that have not been previously described in Ashkenazi Jews, and pretty well everything that we're going to find in these underrepresented groups represents new founder effect variants. So the ability to go into populations and to understand the specific patterns of disease that you should be seeing in them, in the rare disease space, is also a possibility. We want to get a better sense of what the genomics looks like in the Bronx, and we have an ongoing study of 3,000 de-identified specimens that represent the 25 zip codes of the Bronx, proportional to the population of each zip code. That's going to give us two axes of information. One is where in the world all of these genomes came from and how they have become admixed. It gives a really granular view of where we think the complexity and the opportunities exist within the Bronx, and we know that the Garifuna, for example, have a big community here in the central Bronx. The other is to have the Bronx become a laboratory for environmental exposures and how they interact with disease and genotypes. 
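As an aside on the sampling design just mentioned (3,000 de-identified specimens spread across 25 zip codes in proportion to population), here is a minimal sketch of how such a proportional allocation could be computed. It is not the study's actual procedure, which the talk does not describe; the function name, the zip code labels, and the population figures are all hypothetical, and the largest-remainder rounding rule is simply one common way to make the per-group counts add up exactly to the target.

```python
def allocate_proportionally(populations, total):
    """Split `total` specimens across groups in proportion to population.

    Uses the largest-remainder method: every group first gets the integer
    part of its exact quota, then the specimens still unassigned go, one
    each, to the groups with the largest leftover fractions.
    `populations` maps a group label (e.g. a zip code) to its population.
    """
    grand_total = sum(populations.values())
    if grand_total <= 0:
        raise ValueError("total population must be positive")

    # Integer arithmetic keeps quotas and remainders exact.
    counts = {k: (total * p) // grand_total for k, p in populations.items()}
    remainders = {k: (total * p) % grand_total for k, p in populations.items()}

    leftover = total - sum(counts.values())
    ranked = sorted(populations, key=lambda k: remainders[k], reverse=True)
    for k in ranked[:leftover]:
        counts[k] += 1
    return counts


# Hypothetical populations for three made-up zip codes (not real Bronx data).
example_populations = {"zip_A": 97_000, "zip_B": 45_500, "zip_C": 7_500}
example_allocation = allocate_proportionally(example_populations, 3000)
print(example_allocation)
```

The rounding rule matters only at the margins: with 25 groups, naive rounding of each quota can leave the total a few specimens over or under the target, which the remainder step avoids.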
Our partners Aclima have been driving converted hybrid vehicles up and down the streets of the Bronx, as you can see on this map, and they have about 16, I believe it is, sensors for different types of pollutants. They just shared this with me as preliminary data; they still haven't completely covered the Bronx, as you can see. But what you can see running west to east like a gash through the Bronx is Robert Moses's Cross Bronx Expressway, and those big red lines are the big highways that run through the Bronx. Even in the back streets you can see that there are a large number of communities where there's a huge amount of this PM2.5 pollution. In fact, if you look down here, this is the food distribution area for the entirety of New York City, the Hunts Point area, where the big trucks go to deliver and pick up food, and the pollution down there is dreadful. So the other thing that has to happen better in general, but in particular in my communities in the Bronx, is to improve on our poor genomic diagnostic rates in rare diseases. 
Montefiore is an unusual place to practice genetics because we're a pioneer accountable care organization. Only 15% of the Bronx is commercially insured; the rest is either uninsured or government insured, so in effect Montefiore is the insurance provider for the majority of our patients. If I need to order an exome or a whole genome, I don't have to go through a pre-authorization process with a commercial insurer, so I can order anything. And what happens is that I can order exomes and I still get this very low diagnostic rate. So it's not because of a lack of testing in the Bronx; it's because we're just not able to make diagnoses. We formally studied this in our CSER2 project, NYCKidSeq, and as you can see the overall diagnostic rate was about 18%. These are kids presenting with neurological, cardiological, and immunological problems, the bread and butter of any sort of clinical service. As to why this is the case: in the DDD study out of the UK that came out last week, even in the abstract I could see things that were resonant. The determinants of their diagnostic success included trios: if you have both parents, you're going to do more to make diagnoses. Because of the instability of families who live in poverty, we only have about 55% of our patients from whom we can get samples from both parents, so immediately half are facing a struggle to get a diagnosis. Maternal diabetes is rampant in the Bronx, and most of the Bronx has African ancestry as part of its heritage. So there are many good reasons why it's difficult to make these diagnoses in the Bronx, and it means that if we can solve those problems there, we can show the way for a lot of other people. So we're setting up the New York Center for Rare Diseases through Montefiore and Einstein, and we have three good partners there: GeneDx, the diagnostic company; PacBio, to do the long-read sequencing; and Google. What we're going to do in this is have three sets of innovations. I'm 
very interested in how we can do advanced phenotyping of our patients, especially patients who have not been represented in the dysmorphology textbooks, so there's a lot of natural language processing we want to do, harvesting of HPO terms, working with facial imaging, and so on. We are going to do long-read DNA and RNA sequencing, and we're going to build advanced analytical approaches for these data and make that an open sandbox, so that if people have good ideas or good products that we can add into this analytical system to try and help us work out what's going on in our patients in the Bronx, we will be happy to have those dropped into the sandbox. And here's the other issue. Right now we only make diagnoses based on the 3% of the genome that encodes proteins; the other 97% of the genome remains a mystery to us. But if you've got the FN1 gene, and you've got a patient whose family has had multiple aortic dissections, and there's nothing showing up in the coding sequence of this and related genes, I want to know what's happening at that promoter. But I also want to know: how would I know whether something that's happening at that promoter could be damaging in some way? So that's going to be the last part of the talk today. Of course, when you have anything genetic that doesn't fully explain phenotypes, whether it's GWAS, or missing heritability, or low diagnostic rates in rare diseases, somebody is bound to say, "But what about epigenetics?" What about epigenetics indeed. So let me talk to you about non-coding DNA and transcriptional regulation, trying to use the word epigenetics as little as possible, because I've evolved very much in how I think about this very ambiguous term, to the point that I took a sabbatical back in 2014-2015 just to try to straighten out my head about how we should be doing this kind of work. The insights that I generated during that period have proven to be what I'm currently using. First of all, though, I want to give you a bit of history.
I'm hopefully going to have this book on epigenetics coming out sometime in 2023, and I'm mocking up a cover there. It just happens to use the Irish colors; thank you very much to the artist who gave that. But I want to give you the history of epigenetics and why the term itself is, you know, nothing special. In the early 20th century, the embryology community had just about got past the idea of preformationism. They knew that there wasn't a homunculus in a gamete, that specialized structures were formed from non-specialized precursor cells, and this was the process of epigenesis. But then the geneticists came along and said, "Hey, we irradiated this mouse and it has a T-hairpin phenotype in its tail," or "We irradiated these Drosophila and they have these white eyes. What's interesting is that when we breed these animals, their offspring also have the same phenotype, so that genetic material in the nucleus, those chromosomes, must have something to do with heritability." And the embryologists went wild. They were pretty hostile to the idea that their turf was being invaded by these geneticists. I've read some of the accounts of this time, and they're using Nazi Germany language to describe the geneticists; it was, for some reason, completely out of control. So in steps Conrad Hal Waddington. Pro tip: smoke a pipe, you look so intellectual. Waddington was a dropout from his PhD program in geology in Cambridge, and he was a dabbler. He decided that he would go over and start dabbling in biology because he found it a little bit more interesting. He was a very successful dabbler; he eventually got a doctoral degree based on published work. As part of his dabbling he went over to Germany to work with Hans Spemann, he of the Spemann organizer, a classically trained embryologist and leader in the field, but he also got a grant to go out and visit Caltech, where two of the former members of Morgan's fly room at Columbia had ended up, Alfred Sturtevant and Theodosius Dobzhansky. So he 
learned what they were doing from a classical genetics point of view, and he was unique in having a foot in both of these warring camps. So in true Rodney King style he said, "Why can't we all get along?" And here was his solution. He said: imagine this landscape, and a ball is rolling down it, making these decisions. That's like epigenesis, right? It's like cell fate determination, differentiation decisions, and so on. But imagine that this landscape is pulled into place by these guy wires, and those guy wires are shaping the landscape, creating the cell differentiation potential, but if you snap one of them, it changes. Snapping a guy wire is like mutating a gene, and that's why you can have altered cell fate and altered morphology. It's epigenesis meets genetics; it's an epigenetic landscape model. That is a really simple and somewhat boring model, and gravitationally it's very implausible, because for some reason he had gravity pulling the ball down the hillside while the guy wires were acting on a landscape that was trying to go upwards. So while he dabbled a lot, he obviously never went over to the physics department. Then David Nanney, about 10 years later, after a 14-hour propeller plane ride from New York to Paris, was describing to a Greek speaker why he wanted to call his cytoplasmic inheritance of mating-type locus characteristics in Tetrahymena "paragenetic," and was told that that was linguistically incorrect and that he needed to call it "epigenetic." And he did, and so this was the second use of the term. He was absolutely right about the cytoplasm; we now know it's an siRNA-mediated mechanism. But he introduced the idea of epigenetics as a cellular memory phenomenon, and again this kind of came and went until the 1970s, when John Pugh resurrected the term. This is one of two pictures you'll ever see of John Pugh; he's the longer-haired guy in the glasses there. John was another dabbler. He was working with Robin Holliday in the MRC in north London 
and he was meant to be working on recombination repair, but he was fascinated by X chromosome inactivation, Mary Lyon's work, and he couldn't figure out a mechanism for how a signal early in life could be propagated to later in life. But he went to see Ruth Sager, who was doing a sabbatical in London at the time, and she was talking about her work with restriction-modification systems, with the modifications being the addition of these covalent groups to the DNA sequence. That was John's aha moment. He said, "If it's passed on to the daughter chromatids, this could be a way of maintaining a memory of a prior event." So he went back to Robin Holliday, and Robin said a couple of years later, "Okay, we're ready to write this up as an abstract for the Royal Society." So that's me and John Pugh in best of the green just before the pandemic, and I said to John, "Why did you use the word epigenetic?" He said, "I had a 200-word limit. I needed to be able to talk about mutation being a change in the nucleotide, and I needed to be able to talk about it being a change in the presence or absence of this covalently added methyl group, and so I called it epigenetic." I said, "Were you inspired by David Nanney or Conrad Waddington?" and he said, "I'd never heard of either." So here's the problem. Oh, and then the other thing, of course, is that Robin Holliday ultimately started to talk about epigenetics and this DNA methylation as something involved with genetic regulation, and that became the definition that he ran with over time. So the word epigenetic was independently come up with by these three individuals, and our evolutionary biology colleagues would say that this is like convergent evolution. There is no thread that carries through these three individuals and their use of the word. And Art Riggs, who passed away about a year ago, a wonderful scientist, was coming up with exactly the same ideas about DNA methylation and X chromosome inactivation, but to my 
eternal admiration he never used the word epigenetic. So why do we think that epigenetic mechanisms mediate disease? These mice here are enormously influential in this field. You can see the one on the left is not only yellower, it's bigger. These were experiments done on the viable yellow mouse, and what you would see in a litter was a mixture of these obese, metabolically disturbed animals and other animals that looked like they didn't have any mutation at all. It was because there's a retrotransposon that landed upstream of the agouti gene. If it was constitutively active, it was causing the phenotype; if it was silenced through these regulatory mechanisms, the mouse looked like it didn't have a mutation at all. What George Wolff did in Arkansas was feed a diet high in single-carbon donors to the pregnant mother, and he was able to shift the balance from the yellow mice over to the pseudoagouti mice and influence their adult phenotypes. Think of what this means. It means that an environmental exposure, mediated through some sort of effect on the expression of this retroelement, was leading to a phenotype later in life, so there's a memory component to it as well. And when DNA methylation was shown to be involved with the silencing of the IAP element, it seemed to tie all of these epigenetic regulatory mechanisms together. So I think this is probably the most influential model that we kind of grew up with. Let's think about what we're saying here. We're saying that there's some sort of influence extrinsic to the cell, and that it's reflected by some sort of change in either chromatin structure or DNA methylation and, consequently, a change in gene expression. That's how we think about it, right? That's the model that's in our heads. What are we considering to be happening at the cellular level? If you've got a healthy individual and a diseased individual, or an exposed individual, what 
we think is that cells have reprogrammed: cells have changed their DNA methylation, their chromatin structure, their gene expression, and this is what I will call cellular reprogramming. So the idea then became that if cellular reprogramming has occurred, we should be able to look at these molecular events that regulate the cells, and if we find changes in the individuals with the disease, that's proof of the hypothesis. Wrong. The problem is that we've done so many of these genome-wide, sorry, epigenome-wide association studies that have ended up being uninterpretable, and it was one of the reasons why I took a sabbatical, because I didn't know what the hell we were doing. One of the reasons you can have these changes of DNA methylation that have nothing to do with cell reprogramming is that you just have a change in the proportion of cells that are methylated at this allele, where they go from being a high proportion to a low proportion. The bulk pattern is going to show high methylation going to low methylation, but there is no actual reprogramming of any cell. The other thing is that you could have DNA sequence variant effects (sorry, let me just go back), where on one allele here we've got a SNP, which is shown in blue, the other in red, and everything nearby is unmethylated, while on the other allele it's methylated sequence. The DNA sequence polymorphism effect has nothing to do with cell reprogramming; it's just the genotypes driving these changes. So then I started thinking, "Well, why did you choose this particular locus in response to this environmental stimulus?" Because DNA methyltransferases, TET oxidases, chromatin remodellers, and modifiers have no sequence specificity, nor do you want them to have any sequence specificity, because they have to go to different places in different cell types, right, all with the same genome. So what we're probably looking at is the footprint of transcription factor biology, where the transcription factors 
are either binding or not, and that's what's being reflected by the chromatin changes and the DNA methylation changes. The way that they change the chromatin and DNA methylation is basically by recruiting enzymatic complexes. This is just well-known biology, but for some reason we weren't putting it together in our heads with all the DNA methylation and chromatin stuff. This is a paper from John Stamatoyannopoulos at the University of Washington that I think kind of got a bit lost because it came out early in the pandemic. What John did was find all of the nuclease-sensitive sites, basically the open chromatin, in several hundred human cell types. He was able to show where the actual transcription factor was binding, because he was using DNase and it gives a specific fingerprint. Then he went into the TOPMed data and asked the question: what is happening at these loci? Are they under purifying selection, so that they're less variable than the regions around them, or otherwise? And he showed that they're actually massively increased in their diversity compared with the 200 base pairs flanking them. So the loci where these transcription factors are binding are highly polymorphic between all of us, and so it's completely unsurprising that we would have differences in chromatin structure, DNA methylation, and so on at these loci. Then the next question becomes: well, if transcription factors are so important, what regulates transcription factors? Some transcription factors are nuclear hormone receptors: something binds in the cytoplasm, it gets into the nucleus, blah blah blah. But many, if not most, are actually living downstream of cell signaling pathways. So what I like to think now is that if we can infer from our genomic information which transcription factors are mediating whatever it is we're looking at, and we know something about which cell signaling pathway is acting on that transcription factor, we have an insight into the cell signaling that could be associated with this 
phenotype. So this is where I would plead with you to stop doing hypothesis-based research. What I think you should be doing instead is question-based research. This PMID here is work from David Glass, where it's very philosophical, but he's saying that if you do hypothesis-based research, not only can you fail, but you'll also be biased towards trying to swerve your results towards proving your hypothesis. But if you do question-based research, where you say, "I'm open to these other influences, other than cellular reprogramming, being informative in terms of understanding the pathogenesis of the disease or the mechanism of the phenotype," then if you're open to all of those, you don't call them confounders, you're going to get insights, and you will succeed. So I think this is a very useful way of thinking about how to do genomics in general, and it's definitely the way that we should be thinking about this complex set of things that we talk about with functional genomics. So I'm going to give you a few vignettes of how this altered perspective has helped us with a few studies. The first one is a very simple study. We infected Toxoplasma gondii into some human cells and looked to see what was happening to the host transcriptome and chromatin. We also looked at the Toxoplasma itself, although I won't show you that. So we did RNA-seq and ATAC-seq in bulk, and what you see first of all is this massive overexpression of an extracellular matrix metalloprotease, ADAMTS15. It and some other members of its family are overexpressed as a result of the Toxoplasma infection. That's kind of interesting, because when you dissolve the extracellular matrix you allow the cell to kind of slide within the tissue, and that's thought to be part of helping to spread the infection within the tissue. When we looked at the other upregulated genes, they fell into three main categories: immune, which you expect because the cell is fighting off an infection, but also cell division and 
metabolic processes. We're interested in the metabolic in particular because Toxoplasma is an auxotroph: it can't make all of the nutrients it requires for its own survival, and it needs to use those from the host cell. So the idea that it was inducing expression of some host cell genes involved with metabolism was really interesting from that point of view. So again, with this transcription factor-centric idea, we took the ATAC-seq data and looked for the overrepresented motifs. FOSB/JUNB is AP-1, RELA represents NF-κB, and RREB1 is basically a RAS-related transcription factor. We then asked the question: well, which genes are associated preferentially with each of these transcription factors? While the NF-κB response is solely, as far as we could see, an immune response, the AP-1, as well as inducing immune responses, was involved with cell division genes and also metabolic genes, while the RAS-related pathways really focused on the extracellular matrix. So this is our model now. Instead of thinking in terms of what the genomic information is telling us on its own, we're working our way back to transcription factors and cell signaling. We already know that there are some molecules secreted by Toxoplasma that influence p38 MAP kinase, but we now know that we should be looking for things that influence the ERK1/2 and the SAPK/JNK MAP kinase pathways as well, because they're upstream of JUN and RREB1. So that's one vignette, and that's reprogramming of cells. The second one here is just to illustrate something quite simple: that you can also alter subpopulations within a population of cells. We were taking CD4-positive T cells from peripheral blood from younger and older individuals. This is a study of aging, where epigenetic events are thought to be important. We did single-cell RNA-seq, and we saw that there was one group of cells that seemed to decrease with age and another that seemed to increase with age, and it will be of no 
surprise whatsoever to people with a bit of an insight into immunology that you lose naive CD4-positive T cells with age and you gain cytotoxic T cells with age. I'm just putting this in to illustrate that I'm not ignoring the cell subtype effects that can occur to confound your cellular reprogramming ideas, but this is also the first 12 of 400 single-cell RNA-seq experiments that my graduate student Maria Tomatos has done, and this is going to be an amazing data set for studying aging. I want to say that Fordham University is also in the Bronx. Marija Kundakovic is an extraordinarily talented neuroscientist, and she was resistant to the idea that we just take female mice out of neurobehavioral experiments because their behavior is variable depending on where they are in the estrous cycle. She said, "Let's positively use that as a physiological model that could be studied," and it has relationships to things like anxiety, depression, and so on. It is well known that if you're in a low-estrogen state as a female mouse you are much more timid, whereas the high-estrogen state, which is like a four-to-five-day period, causes you to be much more bold and less anxious. There are actually some cellular phenotypic changes that she's found. The ventral hippocampus actually undergoes a lot of dendritic genesis in response to the estradiol being at high levels, and that regresses when the estradiol goes down, so there's this amazing remodeling of dendritic connections happening in the ventral hippocampus. So we helped her with the ATAC-seq on the nuclei that were NeuN-positive, so neuronal nuclei from the ventral hippocampus. What we saw there was that some loci were starting out closed and becoming open, others were open and becoming closed, and this was happening over a four-day period. It's a really beautiful demonstration of the dynamism that can happen in these cells that are fixed in the brain; it's not like they're 
they're circulating cells being replaced by other cells. The genes nearby were increasing and decreasing their expression as part of this process. So again, what do we look for? We looked for the transcription factors that were present at these loci that were changing their chromatin states, and while there were a lot of the MEF2 transcription factors, one thing that you don't see here is the estrogen response element, and the thing that you do see here is EGR1. When Marija saw this she said, I know what's going on. Estradiol binds one of two receptors: one is the nuclear receptor for estradiol, which is present in the cytoplasm; the other is the membrane-bound version, and if the membrane-bound version gets bound it kicks off a number of kinases, in particular MAP kinase and ERK, which control EGR1. So when she saw EGR1 lighting up she realized that this is the way the estradiol was probably working, and she subsequently showed that the membrane was where the estradiol was binding. The final vignette is to look at one of these endocrine-disrupting chemicals, a chemical called tributyltin, which is an obesogenic agent in everything from C.
elegans to mammals. Taylor Thompson made this the focus of her MSTP project, and she wanted to understand what was happening at the regulatory level, because it was thought that TBT was binding to PPAR-gamma/RXR and that by doing this it was facilitating its action and causing adipogenesis. So we wanted to know: is it doing this by speeding up the binding, or by binding at atypical loci that were brought into this model? This principal component analysis plot is showing the variation that you see with expression over the several time points. Regular differentiation is shown with the red line. We used a PPAR-gamma agonist, rosiglitazone, and showed that there really wasn't that much of a difference in terms of gene expression, but when you hit these cells with tributyltin they really started to change in many ways. What you're seeing here is gene expression differences, but what you don't see is that they also became hyperplastic and hypertrophic, with an increased amount of lipid within them earlier. When we looked at the gene expression patterns it looked like there was beiging of the adipose tissue: it started off as white, and then this was inducing genes that were associated with beige adipose tissue, so it's almost like a different cell type is being produced. When we looked at the data, and I'm summarizing an awful lot in this one slide, we did not see any evidence for PPAR-gamma/RXR activation. What we did see was decreased TEAD transcription factor activity, and the TEAD transcription factors are associated with adipogenesis. We also saw a big Ras-related GTPase response, and both of these independently were saying to us that something's happening with the actin; there's presumably some sort of damage to it. So what David has done is he's started to look at what's happening with the F-actin in the cells, and he's showing that there is damage which is related to the dosages that Taylor was using in her
experiments. So now let's finish with the last section, which is a less complete story, but it's so fascinating that I thought I'd put it in front of you today, because I think it's where we need to be thinking about a lot of questions going forward. We have been developing this thing which we call the regulatory landscape enrichment analysis approach. So if you have certain things that should be randomly distributed in the genome, are they non-randomly overlapping the open chromatin regions of one or more cell types? We used to do this through a permutation analysis approach, so for 52 cell types we're doing a thousand permutations for each of the sets of loci of interest that we're putting in, and that becomes computationally very expensive. Because we have enough observations, we were able to use a central limit theorem approach, and now what used to take 10 hours takes three seconds. This is a package that we'll be releasing shortly, generated by Eric and Sam, and this is work that we're doing with Donald Shane back in Ireland. We decided we'd focus on the trait of autism, neurodivergence, and you can see the numbers of individuals: we had 7,637 individuals with autism, and we looked at de novo variants with the assumption that they should be randomly distributed in the genome. We looked at 52 cell types, we performed these permutation tests and we also performed the central limit theorem approach, and we had concordant results. So what did we find? I think you can ignore the missing siblings column on the left because it's actually a very small number of individuals, but what you see for almost all the cell types in almost all these cohorts, whether affected or unaffected with autism, is that we were seeing significant enrichment for de novo variants in cis-regulatory elements of all of these cell types. This was blowing our minds a little bit, because how in a sperm or an egg do you know that
this locus here is going to become an enhancer in a liver cell or in a breast epithelial cell or whatever it might be? How is that marked in some way? This is very consistent with John Stamatoyannopoulos's ideas, right, because if your de novo variants are landing right in cis-regulatory regions, of course you're probably going to have increased nucleotide diversity, and perhaps it could be related to where the transcription factor is actually binding. This is also something that made me think of Ryan Hernandez's work, because by definition these are extremely rare variants if they're de novo variants, and what Ryan was showing was that the effect you get on the heritability of gene expression is there but it's low until you get to ultra-rare variants, variants that are n of zero or n of one in gnomAD. So are we potentially identifying variants that have an increased effect on DNA methylation, chromatin structure and gene expression? The clue that we got was when we looked at the triplet context of these variants. What you can see is that everything with a little red CG in it is one of these triplets that has a CG, and the vast majority of the de novo variants are happening in a CG context. This is a highly mutable sequence in the genome, obviously, because if the cytosine is methylated and you deaminate it, instead of deaminating to uracil it deaminates to thymine, and it's very difficult to repair because the machinery doesn't recognize it as being foreign to the DNA sequence. We then looked at the ultra-rare variants in TOPMed where n equals zero in gnomAD, and we found exactly the same pattern, and when we looked at those TOPMed rare variants by window, they're clustered. So we think that what we're seeing is clustered CGs in the genome, and for anybody who's spent any time looking at the
genome, and I think in this place people probably have, we recognize these to be like CpG islands, the regulatory elements of genes. As such they are more likely to have transcription factor binding sites, so that's John Stamatoyannopoulos's observations explained, and they're more likely to have regulatory effects on genes if you're enriching for, say, promoter sequences, so that's Ryan Hernandez explained. So we think that we're probably looking at de novo variants which are ending up at cis-regulatory loci in somatic cells not because the chromatin is marked in some way during gametogenesis but because of base composition, which makes a lot more sense. We then asked the question: when these events occur, what do the genes nearby look like? There are about 162 genes that the SPARK folks have put together that they say are autism genes, and what we were able to focus in on were the ones where the cis-regulatory elements within 10 kb of the transcription start site with a de novo variant, what we are now calling de novo regulatory variants or DNRVs, were enriched, in these 13 cell types here. So now we're trying to figure out how to predict whether these de novo regulatory variants are damaging, and we're taking some inspiration from Stephan Sanders's approach, which is the category-wide association study approach. What we're looking at here is de novo regulatory variants from two separate individuals with autism in these cohorts, and they're side by side at the KRAS gene, which causes Costello syndrome and is associated with autism. There's a constitutive open chromatin region at this promoter site, as you'd expect. When you look at the sequence in detail, it's an ETS transcription factor canonical binding site, and you can see that to put a T into either of those positions is
enormously disruptive of what the transcription factor is expecting at that locus, and you can also see that the second T is in a CG context, just as we were describing earlier. What we're trying to do now is an enormous amount of work, because we're doing two comparisons. One is autism versus neurotypical, and I'm not going to call autism a disease because there are sensitivities with our patients about that, but within the autism group there's high IQ and low IQ, and low IQ is where you start talking about pathogenesis and disease. As you can see here, we've got a number of different properties of each of these de novo regulatory variants, and we want to annotate them and see if there are differences in these, sorry, let's say 13 cell types we've narrowed it down to, times two comparisons, times all of these different things here. So Eric Sosa in the group has been very busy. He's looking at things like the phastCons score, and it looks like there could be an increase in the conservation of these loci in the individuals with low IQ versus high IQ. He's also taking the phyloP scores, and we're doing more analysis on that, and also looking at the distance of the de novo regulatory variant to a canonical transcription start site for a gene. That's looking de novo regulatory variant by de novo regulatory variant, but what if we do it person by person, for people with autism in these different categories? That will give us an indication of the genomic architecture, what needs to be present to have either the trait of autism or the low IQ. The first clue that we have that says we're going to find something here, especially with the low IQ issue, is when we studied the number of de novo regulatory variants. We chose glutamatergic neurons here because we're interested in them and we can create them relatively easily from iPSCs, so we think it'll be a model for us. As you can see, these curves start moving further and further over to the left
into the area of the low IQ as you have more and more of these de novo regulatory variants. This tells us a few things. Number one, there's probably information outside the coding sequence that helps us to understand who's at risk of autism and who's at risk of low-IQ autism, and it fits with an oligogenic model: it fits with the idea that multiple events in one individual are going to add up to an increased susceptibility or a worsened phenotype, if low IQ is described as a worsened phenotype. So how do we test these loci? It's all very well for us to predict stuff, but we have to close the loop and test it. This is where long-read sequencing comes in. I'm showing PacBio data here, but I want to emphasize that Oxford Nanopore appears to be an extraordinarily good technology as well, and I want to be neutral about that. I have a couple of interesting observations here, and I need you to geek out with me for a minute, because this is me looking at these data and going woo. You've probably seen these kinds of data before. There are two samples here, there's blood and there's a lymphoblastoid cell line, and you can resolve the haplotypes. I don't know which is paternal or maternal, but it's one of each. What you can see here is the same variant that I showed earlier from the bisulfite sequencing, with a difference in methylation between the two chromosomes, but now with PacBio data we can look at a much broader context of what's going on, as opposed to the 200 base pairs around the nucleotide itself. What this minor allele does is reconstitute a PU.1 binding site, so this is an unusual situation where you actually lose methylation at the locus if you have this minor allele. That's why you see the blue, which should be only in this area here, extending downstream to beyond the methylation QTL, but the normal allele is as shown here. What you can see is that I'm flagging everywhere where there's a polymorphism, and this lower
haplotype here is the same as this upper haplotype here in the lymphoblastoid cell lines, and that's important because when we zoom in here I'll show you there's the area of demethylation, so things have now turned from red to blue, I should say. But there's a resistant allele. There may be another one a little bit higher up, but I want you to focus on this one here. There's an allele where you don't lose the methylation despite the fact that it's in cis with this variant; you can see the read extending over to the variant. Now you say, oh come on, John, that's over-interpreting it, that's looking at one read, maybe an experimental artifact. But what happens when you go down here to the same haplotype in lymphoblastoid cell lines? Everything is methylated despite the presence of the functional variant. All of these cells are EBV-transformed B lymphocytes, and B lymphocytes you would expect to be present at about five percent of peripheral blood. So, this is me geeking out, I freely admit that, but I would guess that this is a B lymphocyte here; it is resistant to the spread of demethylation, as it is in EBV-transformed B lymphocytes. So this tells us two things. Number one is we can use this five-base sequencing to see where a variant is located and see a change of DNA methylation, so it picks up functional variants, but it also helps us to identify the cell subtypes in which the functional variant is active or not, and that is a deconvolution exercise that we have to work on separately. I've kind of jumped ahead there, obviously, but there's your punch line. So to finish, I want to emphasize that I want this kind of research to be happening in the Bronx so that there's no head start for European genomes. In other words, we don't make the mistake that we made in the GWAS era and even the rare disease era, where everything gets worked out in Europeans and then we play catch-up with every other group. I don't have that option in the Bronx because I
only have nine percent northern European white individuals, non-Hispanic white individuals. So if we're going to understand non-coding variation not as a confounder of epigenetic studies but as a source of information about phenotypes that we can use diagnostically with rare diseases, and indeed with common diseases, so we can improve our diagnostic rates, this is the path forward that I see, and I'd be really interested to get the input and active involvement in this quest from other people. I'm going to slow down a little bit and just say Marlia Tomatos is extraordinary. She's been doing 400 single-cell RNA-seq experiments and 400 ATAC-seq experiments and 400 genotyping experiments on these samples from our aging project, and she is becoming really good analytically as well. She's a force. Jacob Starver is doing work on clonal hematopoiesis. David Yang has been working on what we hope to create as a new field of population epigenetics, which hopefully I will get to tell you about some other day. Cassidy Londi is transdifferentiating hepatic stellate cells into myofibroblasts to understand how to prevent cirrhosis in patients with liver disease. Eric is doing the autism work, and Christine is working with us on natural language processing in our rare disease patients. The GenomeDiver team, the Center for Epigenomics collaborators, the New York Center for Rare Diseases: I've mentioned them, thanks to them, and thank you to you for having me here today. Hi, great. I was wondering, particularly as you're seeing these clusters in the CpG islands, are you thinking about trying to get long-read RNA-seq for single cells to see if you're changing start sites of some of these transcripts, and see if that's one of the driving things in terms of changing gene expression and the kinds of things you're seeing with the genes? We hadn't thought of that. Yeah, thank you, that's a really interesting idea. Obviously
isoform use is a little bit more difficult to link to pathology, because they're still meant to be all normal, but if there was something different about the upstream component of the gene in terms of splicing or, you know, the start site of the protein-coding component, possibly, yeah. So sometimes the initiation site can actually change the downstream splicing determination as well. Yeah, I hadn't thought of that, thank you, that's a really interesting idea. We have a question online. Leslie Biesecker says: great talk, John; going forward, what do you think are the best candidate phenotypes for an epigenetic etiology and practical clinical diagnostics? Les, I'm sorry that you're unwell, and next time. So it depends on what you want; epigenetic means a lot of things. If epigenetic in this case means you're going to use an epigenetic assay to try to understand a disease, I think there are two answers there. One is that it's going to be very difficult at the functional genomics level to be able to say, I can predict this particular sequence change to be changing the function of this regulatory element, because you'll never see it twice if it's a very rare event. So if you are, say, CRISPRing these individual nucleotides into a locus to try to understand their effects, these epigenomic assays could actually help you, because you may have a common outcome of all of these variants, in that you lose the regulatory locus or it opens up the chromatin or whatever it might be. So it will kind of collapse the information from the individual rare variants down to a common output: the chromatin structure has changed or the DNA methylation has changed. So I think those assays could be helpful from that point of view. As regards just taking DNA methylation data and using it for understanding how to predict diseases, there's really nice work being performed by Eran Halperin and Noah Zaitlen at UCLA
and what they're doing is they're taking samples of blood from patients in the UCLA health system and they are doing DNA methylation assays while they also look at the genotypes of those individuals. The classical approach, classical meaning the last few years, is to do polygenic risk scores, and for those same risks they're finding that the DNA methylation risk scores are outperforming the polygenic risk scores. That's probably because the DNA methylation is already influenced to some extent by the DNA sequence variation, so it's probably capturing some of that variation, but it's also probably reading out things like cell subtype effects and perhaps cellular reprogramming as well, which are adding information that you don't get from the genotype on its own. So I think those were the two things that I'd particularly be interested in. Yeah, great talk. So I was getting the title of your book, and actually the cover there, where you have, you know, genetics and the word we're not going to use, and I've always kind of looked at the chromatin architecture as dynamic. So you have a transcription factor binding, you open the chromatin, things get in to modify the DNA, and here you have this great pool of variants in New York, and it looks like you're putting all those things together. But is that dynamic model of chromatin structure leading to downstream events, whatever they may be, is that the model you're testing, or are you waiting for that to kind of emerge as a conclusion?
There's a lot in that, isn't there? I think that we are making the assumption right now that there's a reference epigenome no matter what your ancestry is, but if we know that sequence variants exist between individuals and populations, and that sequence variation can influence whether the chromatin is open or not, just as a shorthand for a regulatory locus, it's quite possible that polymorphism of the regulatory landscape exists between individuals and could be linked to ancestry, in which case we can't go in there with the assumption that if you're Black or Hispanic or whatever population it might be in the Bronx, there is an open chromatin locus here just because it was shown in a cell line from, you know, the one we use is from a white woman in Italy. So I think we need to go in there and make these discoveries independently in each group. What I'm not sure of is whether I've answered your question. All right, yeah, you kind of got to it. Let me refine the question a little bit. I would think that, just like you're saying, you're going to have these genetic variants that lead to downstream consequences, but do you think there's going to be an order of operations? Are you going to open the chromatin first or are you going to do something else first? I'm not really talking about the context of disease, I'm talking about the [unclear] biochemistry. Yeah. Mark Ptashne is very interesting on the topic of epigenetics, but what Mark has told me, which I thought was very interesting, is that these so-called pioneer transcription factors like AP-1 are distinctive because they can open up the chromatin, they can get in there, they can recruit remodelers, but as Mark says, if you overexpress any transcription factor highly enough it'll get in there and it'll cause the same effect. That may be hyperbole, there may be some that actually don't, but the idea that pretty well any locus can
be opened up by the binding specifically of a DNA-binding protein, presumably with the recruitment of the appropriate complexes, makes me feel that it's probably a generalizable phenomenon. How DNA methylation and DNA sequence polymorphism work at that stage becomes interesting, because I think DNA methylation probably exists blanketing the genome to reduce the space that the transcription factor can actually bind to easily, and DNA sequence variation obviously is going to influence this ability as well. And the other thing: we always talk about transcription factors acting individually. They don't, they act as cis-regulatory modules. So there are some things that Chris Glass at UCSD is showing that terrify me, where he shows in his different lines of mice that have polymorphism at these loci that there are some places where he just doesn't see a transcription factor binding in Black 6 or whatever it is, but there's no sequence variant there that explains it. So my worry is that it could be the effect of something which is coming together with it in three-dimensional space, and it's even more complex than just the three transcription factors that are binding at that locus. So, complex. We have one more question online. One more question: how would long-read sequencing of people in population studies, like the new data released from All of Us today, be helpful in this future work? Sounds like Dr.
Denny's on the line. Obviously, if you're making the assumption that the structure of a locus is some canonical structure, and you're wrong about that in people who are of different ancestries to whoever was used to make that original canonical genome, then everything downstream is going to be affected, right? You're going to have your chromatin studies and your DNA methylation studies and everything influenced by that. So not only is the long-read sequencing going to help us with the understanding of what could be variable in your individual patient with their individual ancestry, but if you're reading out DNA methylation at the same time you'll be able to say, not only do I see a variant here, I also see a change of DNA methylation. And one of the great things about rare variants is you have the other DNA haplotype for comparison. The medical joke about orthopedic surgeons: why do we have two arms? So the orthopedic surgeon has a normal one to compare to. If we have a sequence that has not got the variant, it's a wonderful comparison, because it's within the same cell, they've had the same environmental influences, they're at the same stage in the cell cycle and so on. So having long-read sequencing reading out the DNA methylation at the same time is going to be a first clue as to whether the variants in that locus could be influencing the regulatory properties of that locus. Okay, so we're a little after the hour, so we'll end things there. Thank you all for attending, and thank you for an absolutely wonderful talk.
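[Editor's note] The talk describes testing whether variants that should be randomly distributed overlap open chromatin more often than chance, first by permutation and then by a central limit theorem shortcut. The sketch below is only a toy illustration of that general idea, not the speaker's unreleased package. It assumes a single linear "genome" of `genome_size` positions, uniformly random placement of variants under the null, and a normal approximation to the resulting binomial overlap count; all function names and parameters are hypothetical.

```python
import bisect
import math
import random


def merge_intervals(intervals):
    """Sort half-open [start, end) intervals and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_overlaps(positions, merged):
    """Count positions that fall inside any of the merged intervals."""
    starts = [s for s, _ in merged]
    hits = 0
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits


def permutation_p(observed, n_variants, merged, genome_size, n_perm=1000, seed=0):
    """Empirical one-sided p-value: re-place the variants uniformly at random n_perm times."""
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        shuffled = [rng.randrange(genome_size) for _ in range(n_variants)]
        if count_overlaps(shuffled, merged) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


def normal_approx_p(observed, n_variants, merged, genome_size):
    """Closed-form alternative (assumption): overlap count ~ Binomial(n, p), approximated as normal."""
    p = sum(end - start for start, end in merged) / genome_size
    mean = n_variants * p
    sd = math.sqrt(n_variants * p * (1 - p))
    z = (observed - mean) / sd
    return z, 0.5 * math.erfc(z / math.sqrt(2))
```

The permutation version costs one full re-sampling per permutation per cell type, while the approximation needs only the covered fraction of the genome and the observed count, which is the kind of saving the speaker alludes to. A real analysis would also have to handle chromosomes, mappability and local mutation-rate differences, none of which this sketch attempts.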