Well, it's great to present for this group. John, thank you for the introduction, and a lot of this work is in close collaboration with John's lab. So I am going to talk about how epigenetics may actually control genetics. Throughout this meeting, there were multiple talks pointing to the importance of genetic variation in understanding the epigenetic landscape. So there is a lot of justifiable interest in how genetics controls epigenetics. Briefly, all these studies can be summarized as: take your favorite epigenetic feature and run a QTL study, right? So you have eQTLs, methylation QTLs, chromatin accessibility QTLs, all types of QTLs. And of course we believe that understanding the effect of genetic variation on epigenetic features can allow us to go a long way in understanding the mechanism of disease association, the biology and so forth. However, we're interested in the inverse problem: what is the effect of the epigenomic landscape on genetics? And one of these effects is how the epigenomic landscape controls mutation, right? Because the source of variation is mutation. So what we've been doing, we've been looking at data on mutations both in the germline context, where we now have sequencing data for multiple trios and quads, and I won't be talking about this today, and on somatic mutations, cancer somatic mutations. In the germline case these are differences between parents and children, where changes are happening in the DNA. And here the differences are usually between blood, or some control cell type, and the cancer cell. So this is the idea. And the idea is to see what the effects of epigenetic variables are on these changes in the DNA sequence. Why are we interested? We're interested for multiple reasons. One interest is in the statistical and medical genetics fields, because understanding of mutation rate models would inform methods for gene mapping. And I'll talk about that in a second. Another big interest of ours is evolutionary biology.
And there are two reasons we care from the evolutionary biology perspective. One is that mutation rate is a key parameter in a lot of evolutionary models, right? If we want to infer selection, if we want to understand differences between populations, differences between species, date speciation events, we have to have some understanding of mutation rate. The other interest is the evolution of mutation rate itself, right? Because the cell controls mutational events. And mutation rate is one phenotype which is under selection. So the question now is not only what the mutation rate is, but why the mutation rate is what it is. Also there is, of course, interest from the biology perspective, the biology of DNA repair and the biology of DNA replication. So I'll, for maybe a couple of minutes, talk about the statistical genetics piece of the work. There is a growing interest in gene mapping using de novo mutations. There are two areas specifically: genetics of neuropsychiatric diseases and cancer genomics. And the idea here is to map genes involved in disease progression, or cancer driver genes, using recurrence. So this is not your classic genetic mapping, for example LD-based association or linkage. This is mapping using mutations, and this is the only mapping strategy which is possible in asexual systems. The idea is very simple. You find different patients carrying mutations in the same gene, collapse them by gene, and you can make an inference that this is a significantly mutated gene, right? There are more mutations than you expect. Now the big question is what do you actually expect, right? Because in these studies you cannot run case-control. You cannot really look at how many mutations in this gene happen in cases versus how many in controls, because you would lack the statistical power to do so. So the idea is to use some sort of model.
So for example, the simplest approach, and this was used in early papers on the subject: you take some estimate of the genomic mutation rate using independent samples, then you evaluate the probability of observing recurrent events in a given gene and correct for multiple testing, right? So why is this not the correct strategy? Because if you have heterogeneity among samples, which is especially a problem in cancer genomics where you have some samples basically filled with mutations and others with much lower mutation densities, you will make fallacious inferences, and this mapping would generate a lot of false positives. So there is another strategy, and it is the following. You look at your real data and just permute the data around. You run multiple permutations and you can evaluate how frequently you see these two mutations independently hit the same gene. And the problem here, of course, is mutation rate variation, because if the mutation rate is heterogeneous along the genome, this may simply be a mutational hotspot which you don't know about. So what we need is a careful model of local mutation rate, and the problem in cancer is that because of accessibility to specific mutagens or specific genetic changes in repair systems, and I'll be talking about it, you may have a situation where this mutation rate heterogeneity is patient specific, not just cancer type specific, but specific to an individual patient. Now, over five years ago, we, again in collaboration with John's lab, made an observation that both the density of human SNPs and human-chimpanzee divergence are increased in later replicating regions of the genome compared to earlier replicating regions of the genome. So we have certain epigenomic variables that potentially control mutation rates. So this is a stratification of the S phase of the cell cycle into four regions, and we see an increase in both divergence and polymorphism.
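The simplest recurrence test described above (one genome-wide background rate, a per-gene probability of seeing that many hits, then a multiple-testing correction) can be sketched as follows. This is only an illustration under stated assumptions: the Poisson model, the Bonferroni correction, the function names and the toy numbers are all hypothetical choices, not the procedure used in any specific paper. As the talk points out, a single uniform rate ignores sample and regional heterogeneity, so a test like this will tend to produce false positives on real cancer data.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf_below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf_below_k)

def significantly_mutated(genes, rate_per_bp, n_samples, alpha=0.05):
    """Flag genes with more mutations than a uniform background predicts.

    genes: dict mapping gene name -> (length in bp, observed mutation count).
    rate_per_bp: assumed uniform mutation rate per base pair per sample.
    """
    threshold = alpha / len(genes)  # Bonferroni correction (an assumption)
    hits = {}
    for name, (length_bp, observed) in genes.items():
        expected = rate_per_bp * length_bp * n_samples
        p_value = poisson_upper_tail(observed, expected)
        if p_value < threshold:
            hits[name] = p_value
    return hits

# Toy input: three hypothetical genes, 100 samples, rate 1e-6 per bp per sample.
toy_genes = {"GENE_A": (1000, 5), "GENE_B": (1000, 0), "GENE_C": (2000, 1)}
print(significantly_mutated(toy_genes, rate_per_bp=1e-6, n_samples=100))
```

Replacing the single `rate_per_bp` with a local, per-sample expected rate is exactly the refinement the rest of the talk argues for.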
So this fueled our interest in the question, and it turned out that the same effect is observed in cancer genomics. So here, this is in collaboration with Gaddy Getz's lab. We see that there is an effect of replication timing in pretty much every single cancer type we analyzed. So there is an increase of mutation density in late replication compared to early replication, and some genes that are located in late replicating regions are sort of the usual false positives of mutation mapping in cancers. There is another variable, which is the level of gene expression. Genes that are expressed at high levels have fewer mutations in cancer genomes. And the standard idea for the culprit is the transcription-coupled repair mechanism. And I'll show you the pathway, and I'll show it again later, because I'll be talking about this pathway throughout the talk. So the idea is the following. If there is a lesion in DNA, one of the mechanisms is nucleotide excision repair, which starts with TFIIH, which is a helicase that unwinds DNA. There is an excision step in both directions, and there is resynthesis using the other strand as the template. Now this mechanism is a very accurate repair mechanism which can be recruited in two different ways. One way is a stalled RNA polymerase: if there is a lesion on DNA, transcription cannot proceed forward and the polymerase recruits the nucleotide excision repair system downstream. The other mechanism, what we call global genome repair, is an active search by the XPC complex for lesions in DNA. So the first thing we decided to check is, okay, we think that this mechanism leads to a reduction of mutation density in actively transcribed genes. What happens in active regulatory elements? We decided to look within DNase I hypersensitive sites. I don't have to introduce them for this audience. You're all familiar with that. My naive expectation was that mutation density may be elevated because these sites are not protected by nucleosomes.
Maybe they are more accessible to some sort of damage and so forth. So when we looked at multiple cell types, and this was published last year, multiple myeloma, colon cancer, melanoma, lung cancer, CLL, and this scale depends on the number of samples we had, we see a reduction in every single cell type, a reduction of mutation density within regions of open chromatin. What's important, the effect is very well localized. I'm not talking megabases or hundreds of kb, this is one kilobase resolution. The reduction is compared to the immediate flank, and I'm not going through the many regression models for how to take into account the effect of location, the effect of nucleotide composition, the chemical spectrum in this cancer type and so forth. What can be behind this effect? We decided to look at one system specifically, melanoma, and there are several reasons. One is that there are multiple samples available, it's a high mutation rate cancer, and most importantly we know the mutation source. We have a signature, and we believe the signature corresponds to UV damage of DNA, and we know that the major repair mechanism acting on this signature is nucleotide excision repair, so we can make some biological hypotheses from looking at the system. Now it's a little more quantitative presentation of the same data. These are intergenic regions, these are intronic regions, we have mutation density and we have chromatin accessibility in quantitative fashion. This is just the number of mapped DNase I cleavages. So what we think is that this is the action of transcription-coupled repair, the difference between intergenic and intronic regions; however, within each of those there is a very strong dependency on chromatin accessibility. Why is this happening? There are many possibilities. One is that what we're seeing is purifying selection in regulatory elements, so maybe mutations are happening but negative selection purges them and we're not seeing them.
So I don't have time to discuss this in detail, but as somebody who has unsuccessfully spent almost three years now looking for signatures of purifying selection in cancers, I don't believe in that. In order to assume that this is the case, selection must be dramatically stronger than in coding regions of the genome, and we never observed that. Another possibility is that this is an association with replication timing or another epigenetic feature, not necessarily specifically with chromatin accessibility. So we tested it in two ways. You can run multiple multivariate regression models and see that this is not the case, and also the scale of the effect is very different, right? This is a very localized phenomenon. Okay, so another possibility is the accessibility to DNA repair. And here the hypothesis is that XPC in global genome repair is a large bulky complex, like DNase I, right? With a footprint which is much larger than the distance between nucleosomes. So it has to work with chromatinized DNA, and there is an active mechanism to assist nucleotide excision repair to work on chromatinized DNA, and if you look through the experimental literature, the access of DNA repair to naked DNA is always much faster. So again, the idea is that global genome repair may work more efficiently in open DNA compared to chromatinized DNA and recruit the same nucleotide excision repair machinery downstream. Now, even as bioinformaticists, we can test the hypothesis without running any experiments, because in cancer genomic data, when you look at mutations, you have phenotype and genotype in the same data set, right? So I have a phenotype, which is the drop of mutation density in DNase hypersensitive regions, and I have the genotype of the tumor, and I have a hypothesis that nucleotide excision repair is involved.
So we can stratify all our melanoma samples into those where we do not see any change in nucleotide excision repair, which are marked green, and samples where we do observe a potentially deactivating mutation anywhere in the nucleotide excision repair pathway. And we see that there is a statistically significant enrichment of samples with potentially deactivated nucleotide excision repair among samples where the drop in mutation density associated with chromatin accessibility is very small. We can further exploit the structure of the pathway, because if mutations deactivating nucleotide excision repair happen downstream, in the actual repair part of the pathway, then we should abrogate both effects: the dependency of mutation density on transcription, so the correlation with expression level, and the correlation with chromatin. So as we see here, these three samples for example, where mutations happen downstream in these genes in the core repair part of the pathway, have a very small or no decrease in mutation density associated with either transcription or chromatin accessibility. Unfortunately we had only one sample, this is sample number four, with mutations upstream, specifically in global genome repair, and this fits the hypothesis, but I probably wouldn't make a very strong inference from a single sample. So, concluding this part of the talk: we think, well, we know, we observe that mutation density is markedly reduced in regulatory regions marked by DNase I hypersensitive sites, and the effect is likely mediated by global genome repair, as can be shown by the association of this effect with the presence of an intact nucleotide excision repair pathway in the sample. So this is very focal. So what have we learned so far? We learned that mutation density in cancers is shifted towards later replicating regions, regions cancer doesn't really need, because most expressed genes and active elements are located in earlier replicating domains.
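The stratification argument above (samples with a potentially deactivated repair pathway are enriched among samples with a small chromatin-associated drop in mutation density) amounts to a test on a 2x2 table. The talk does not say which test was used; the sketch below assumes a one-sided Fisher's exact test, and the counts in the example are made up purely for illustration.

```python
import math

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the top-left cell.

    Table layout (all counts are numbers of tumor samples):
                      small drop   large drop
      NER mutated         a            b
      NER intact          c            d
    Returns P(top-left count >= a) under the hypergeometric null.
    """
    total = a + b + c + d
    row1 = a + b          # samples with a mutated repair pathway
    col1 = a + c          # samples with a small drop in mutation density
    denom = math.comb(total, col1)
    upper = min(row1, col1)
    numer = sum(
        math.comb(row1, x) * math.comb(total - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return numer / denom

# Hypothetical counts, not taken from the talk.
print(fisher_one_sided(a=6, b=2, c=3, d=14))
```

A small p-value here would only say that repair status and the size of the drop are associated; which part of the pathway matters still has to come from looking at where the mutations fall, as the talk does next.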
We observed that mutation density in cancers is reduced in actively transcribed genes, in genes cancer needs versus genes cancer doesn't need, and we also learned that mutation density is reduced in active regulatory elements. So this is kind of the thing. These are primarily observations, especially on expression and DNase I accessibility, specifically within functional, or potentially functional, elements. So what happens if we change the resolution and look at the megabase scale, and we use the data collected by the Epigenome Roadmap consortium from multiple cell types and multiple epigenomic variables? So first, again looking at variation in DNase I hypersensitivity, just the density of peaks per megabase versus the number of mutations, and again I use melanoma as an example and I use the classic UV-induced mutation density, there's a pretty good correlation. However, one interesting feature we noted is the following. I can look at three different skin cell types, melanocytes, fibroblasts and keratinocytes, and I see that there is a decrease in mutation density associated with the density of open chromatin regions in each of the three cell types; however, in melanocytes this decrease is much more profound. The correlation coefficient, the negative correlation coefficient, is much greater. The general phenomenon again is that activating marks are anti-correlated with mutation density and repressive marks are positively correlated with mutation density; again, places where cancer needs functional genes to work have a reduced density of mutations. And I'll come back to that point. Now back to specific cell types. If I take, for example, mutations in liver cancers and information about methylation marks in liver, and information about melanocytes, and I also look at melanoma mutations, what I observe is that if I condition on the right cell type, the other cell type carries no information.
So if I check liver cancer and melanoma, and I check data on methylation in liver cells, hepatocytes, and in melanocytes: if I know about melanocytes, liver cells add no information about mutation density in melanoma; if I know about liver cells, melanocytes don't add any information about mutation density in liver cancer. So these are observations. They hint at the importance of multiple features, and they hint at the importance of the correct cell type. Now what are we going to do? We have a highly dimensional data set. Now, for some reason Paz, who was involved in the study, likes random forest regression, and I know there are many methods, and probably many bioinformaticists in the room like other methods, but I just follow Paz in the study. So Paz and Rosa selected random forest regression for the analysis, a machine learning method, so what you do is you throw everything into it, and we show that we can actually predict mutation density per megabase with fairly remarkable accuracy. Not in every cancer, but over 80% of the variance can be explained in a whole bunch of cancer types.
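The conditioning statement above (once the matched cell type is known, the other cell type adds no information about mutation density) can be illustrated on simulated data with a partial correlation. This is a sketch under loud assumptions: the talk does not say how the conditioning was done, partial correlation is a stand-in chosen here, and the per-megabase values are synthetic, generated so that the unmatched cell type is informative only through its similarity to the matched one.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(y, x, z):
    """Correlation of y and x after removing the linear effect of z."""
    r_yx, r_yz, r_xz = pearson(y, x), pearson(y, z), pearson(x, z)
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

def simulate(n=2000, seed=0):
    """Synthetic per-megabase windows (hypothetical, for illustration only)."""
    rng = random.Random(seed)
    matched = [rng.gauss(0, 1) for _ in range(n)]           # mark in cell of origin
    other = [m + rng.gauss(0, 1) for m in matched]          # same mark, unrelated cell type
    density = [-2 * m + rng.gauss(0, 1) for m in matched]   # mutation density
    return density, other, matched

density, other, matched = simulate()
print("marginal r(density, other):       ", round(pearson(density, other), 3))
print("partial  r(density, other | matched):", round(partial_corr(density, other, matched), 3))
```

In this simulation the unmatched cell type looks predictive on its own, and the association disappears once the matched cell type is accounted for, which is the pattern the talk describes for liver and melanocyte features.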
Now, because it's a random forest, you can look at the features that contribute to this classifier, and this is the pattern. If we look at melanoma, I see some of the methylation features, but most of the features come from melanocytes. If I move to liver, and this is of course a small chunk of a very large matrix like this, I would look at what features significantly contribute to the predictor for liver cancer, and these features come from liver cells. Then I would look at colon cancer, and there is the same match, multiple myeloma and so forth. There is one cancer where it doesn't work, and I think probably we didn't have the right cell type: it is lung cancer, so in lung cancer this trick didn't work. Okay, now I can do the following trick. I can take all of my features and collapse them by gene, and I can look at, for each of the tissues collectively, what the variance explained by the classifier is if I take only the relevant cell type versus all the irrelevant tissues and cell types. And again for melanoma I see that I can explain most of the variation looking only at melanocytes. The effect is not as dramatic, but I can also select the right cell type in liver cancer and so on. So, looking at this, we decided to develop a simple classifier, so now we're turning this on its head. What I told you so far is this: there are regions of the genome where genes are expressed, where chromatin is active, and these regions have fewer mutations than regions which are heterochromatic, late replicating, not associated with active chromatin and transcription. And I told you that, looking at the epigenomic data, if you have the right cell type you can actually predict the mutation profile per megabase. So now what we decided to do is turn it on its head, because we can develop a predictor of the cell type of origin of a cancer from mutational data. So I look at the genome and I scan the database of the Epigenome Roadmap and I'm trying to predict which cell is the cell of
origin of this cancer, right? Again, we never ran the true experiment of taking tumors of unknown primary, predicting, and acting on them clinically; this wasn't done. So what we did was a very simple experiment. We took individual samples from our data sets and we developed a classifier, again looking at significant features that explain variation of mutation rate per megabase, and what we see is that for most cancers we predict the right cell type with an overall accuracy of 88%. We do not predict lung cancer, as I mentioned; again, probably we don't have the right epigenomic profile. There was almost an anecdote with esophageal cancer, because we believed the original cell type which the algorithm selected was a false positive, but then, looking at the literature, we realized that these are the exact cells that people believe give rise to esophageal cancer. So at least with some reasonable accuracy this trick works. Okay, so now there is an important question. The important question is that these are cells of origin, and we heard today about epigenomic modification due to cancer progression. This was my original thinking, and this whole talk is about failures of my original hypotheses, by the way. So my original thinking was the following. We observed that cancer avoids mutations in regions it needs, and we know that this is determined by the epigenomic profile. Now we can think about the evolution of mutation rate, and this is what we're doing on the theoretical side of things, which I don't have time to present, and you may think about the following idea. Okay, so cancer frequently starts on a high mutation rate background, then mutations keep happening, and of course many of these mutations may potentially be deleterious for the tumor, so there would be selection to suppress these mutations. If you look at expression data, both the base excision repair and nucleotide excision repair systems are overexpressed late in melanoma compared to early melanoma, so I thought that this is active
selection on mutation rate, right, to eliminate mutations in regions the tumor needs. So then we asked the following question, and we didn't have plenty of data, but there are two cell types where we did have data. We can see how mutation density is predicted by epigenomic features of liver cells versus epigenomic features of liver cancer cells, right? And what we see is that we can predict much better using liver cells than liver cancer cells. In melanoma there is an even more interesting experiment, because we take a melanoma cell line and we can see that all peaks in the cell line don't predict as well as all peaks in melanocytes, but if we take peaks specific to the cell line, these are pretty much non-predictive, and melanocyte peaks that are not observed in cancer still predict mutation density. I found it very surprising. I think one possible explanation is that a lot of the mutations we observe in tumors actually arise very early, before the epigenomic changes associated with cancer. Okay, so I see John standing there, so I'm going to my conclusion slide. Basically, again, mutation density at the one megabase scale in cancer is very strongly associated with chromatin organization. This association is very highly specific with respect to the cell of origin, and it looks like the cancer genome has enough information about the cell of origin that you can actually predict what the cell of origin is based on the cancer genome. Thanking my lab, so this is how seriously we think about our projects. Paz Polak, who recently left the lab, contributed to most of this, so he's listed here with the lab members, and of course thanks go to John Stamatoyannopoulos and Bob Thurman and to Rosa Karlić and Amnon Koren, who were our collaborators. Thank you.
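The cell-of-origin idea from the talk (scan a panel of epigenomic profiles and pick the cell type whose features best explain the tumor's per-megabase mutation density) can be sketched in a few lines. This is a deliberately simplified stand-in: the study used random forest regression over many features, whereas the sketch below assumes one feature per cell type and scores it by squared Pearson correlation. The cell type names and numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def predict_cell_of_origin(mutation_density, profiles):
    """Return the best-matching cell type and the per-cell-type scores.

    mutation_density: mutations per megabase window for one tumor.
    profiles: dict mapping cell type -> one epigenomic feature per window
              (for example, density of open chromatin peaks).
    The score is the fraction of variance explained by a single feature.
    """
    scores = {
        cell: pearson(feature, mutation_density) ** 2
        for cell, feature in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy example with five windows and two candidate cell types.
tumor = [10, 8, 6, 4, 2]
panel = {
    "melanocyte": [1, 2, 3, 4, 5],   # open chromatin rises as mutations fall
    "hepatocyte": [3, 1, 4, 1, 5],
}
print(predict_cell_of_origin(tumor, panel))
```

With real data one would want held-out windows and many features per cell type, which is where a multivariate method such as the random forest mentioned in the talk comes in.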
Fabulous. Do you think that the tumors are actively going at silencing some of these mutations in order to transit from a normal state to a tumor state, if indeed the mutations are more likely to arise in the normal tissues? Is that an active process?
So I'm in a little bit of disarray with my thinking right now. My original thinking was that if you look at mathematical models of the evolution of mutation rate, you find that in asexual systems selection on mutation rate is much more efficient than in sexual systems, so in principle cancer would have the ability to change mutation rate, especially if what we're seeing is cell type specific, to silence mutations in regions that it needs. And I found this model intellectually pleasing. I don't think this is what we're observing. I think what we're observing is possibly the very simple fact that most cancers are clonal and most of these mutations possibly accumulated very early, before cancer progression. But to tell you the truth, by now I don't know. I don't have any good model anymore.
Fantastic. So I was wondering, in the later part of the talk you showed the correlation when you get the chromatin states from tissues versus cancers. The cancers that you showed us are cell lines, so would that be a factor, that cell lines are very selected and they probably have very selected chromatin states, very different from what the original cancer would be? So would your mutation rate prediction be better if you take cancer tissues directly rather than cancer cell lines?
This may be the case. In principle, if there is a genetic control of mutation rate, I would be surprised if it were different in cell lines compared to cancers, but the observation is absolutely correct. The main results in the paper were done on primary tumors, and the last couple of slides were a comparison with cell line data, and we didn't have matching data sets, so that's of course a deficiency. But I do not see an obvious hypothesis for why there would be a substantial difference, because cell lines have been there for a reasonably long time, and if mutations keep happening and are associated with the epigenomics of cell lines, we should observe that.
And I have another question, it's a very general question. It's been known in the field, and very much propagated for many years, that the mutation rate is constant between cancer cells and the normal mutation rate. So can you comment on that? Where does it stand now?
It's a very interesting question. I think there is disagreement within the field on whether mutation rate is elevated in cancer or not. People who believe that it is elevated point to a lot of mutator genes associated with cancer, both in germline predisposition and as early events in cancer. For example, we see a lot of samples in melanoma with changes in nucleotide excision repair pathways. Theoretically it fits very well, because you would have a change in a mutator and it would hitchhike together with cancer drivers. Now, there are people who don't really believe that there is a substantial difference, and especially if you look at mutation density, if a lot of these events happen early, people point to the dependency on age of diagnosis and this type of observation.
I don't have a strong opinion either way. I find the arguments for increased mutation rate very logical, and I'm also happy to live in a world where it's gray: in some cases, especially when you have mutator mutations, mutation rate may be elevated; in other cases maybe it's the same and you just randomly hit a driver gene. Thanks.