So welcome to the cancer session, the first scientific session for today. There are three talks this morning, from Mathieu Lupien, Mike Snyder, and Shamil Sunyaev. All are focusing on various aspects of non-coding control of cancer genomes. So Mathieu, I think, is going to start.
All right. So thank you, John, and thank you to the organizers for the invitation to present our work here at this ENCODE user meeting. Before I actually start talking about the research we've been doing, there is one little aspect that I'd like to come back to with regard to what Mike Pazin introduced when we started this meeting, which is the notion that the ENCODE data is rapidly made public for all to use. I'm not an ENCODE-funded individual, I'm an ENCODE user. I've been using ENCODE data since 2009, and when I first started using that data, I felt somewhat that I was using other people's data, that I didn't necessarily have a claim to that data to start with. So I felt to a certain extent like an educated vulture taking a look at all the ENCODE data. Others might call these types of individuals research parasites, but I still prefer the educated vulture concept. Since 2009, we've been extremely lucky in getting a lot of papers out of using ENCODE data. Here's a small collection of the publications we've had using ENCODE data in the last few years. The key thing that I want to stress is the reaction that we got from the ENCODE community. We honestly no longer felt as if we were educated vultures. This was gone. Instead, we felt we actually were leprechauns making the most out of this pot of gold of data that came down from the ENCODE rainbow, if you want to use that analogy.
And so if any one of you is just now starting to investigate the usefulness of the ENCODE data sets and is on the cusp of figuring out whether you're going to be a research parasite, an educated vulture, or a leprechaun, my key take-home message is: let's all be leprechauns. Let's make the most out of it. So don't shy away from using it.
All right. Now on to the study of non-coding variants in breast cancer. We're highly interested in studying mutations that are picked up in tumors for a number of reasons. First, because we can learn a lot about the tumor biology. That's a given. But also because, with the mutations that are found within tumors, we can identify biomarkers that will allow us to effectively trace or monitor disease development. There is an advantage in the fact that tumors tend to shed DNA into the bloodstream, the circulating tumor DNA, which you can access. That circulating tumor DNA can be assessed for mutations, if you know which mutations to look for, in order to assess whether the tumor is highly abundant within the individual, whether there's a recurrence coming, anything of that flavor. With that in mind, a lot of work has already been done in the field of breast cancer on the effectiveness of looking at mutations within the blood to trace disease progression. Two studies are showcased here. The one on your left specifically took the tumors from patients, sequenced them to identify specific mutations to look at, and then took blood samples before as well as after surgery and assessed the presence of these mutations within the blood. As you can see from the bottom left panel, there are four lines that look at four different mutations. The gray one, rearrangement number four, is never picked up in the blood.
But rearrangement number one is rapidly picked up in the blood, and it's actually picked up significantly earlier than clinical recurrence. An average of 11 months was reported in that publication. So in other words, by monitoring in the blood the presence of mutations that are specific to your tumor, you can effectively discriminate patients that have a recurrent tumor coming before clinical symptoms show up. So you can actually start treating these patients differently, to provide a better approach and try to minimize the likelihood that the recurrent tumor will be lethal. In the image on the right side, what they've done is simply take one blood biopsy after surgery, assess the presence or absence of mutations in the blood, discriminate patients that had mutations in the blood from those that didn't, and look at outcome. And you can see that the likelihood of having a recurrent tumor, the red line, is much higher when you have mutations in your blood than when you don't. So monitoring these sorts of events in the blood is very useful.
There are two key caveats, though. The first one is that not all mutations are going to be detectable in the blood. That's the rearrangement one versus rearrangement four figure, where you see that rearrangement four is never picked up in the blood. So you need to be as comprehensive as possible in your selection of mutations to look for, in order to ensure that you get at least one that will show up in your biopsy. The other component that you need to be aware of is that tumors are highly heterogeneous with regard to their genetic landscape. Here's, for instance, a list of 127 individual primary tumors from the breast that have been characterized for a series of different genes that may or may not harbor specific mutations. And you can see that no two patients are identical.
So there's an extent of heterogeneity that you need to take into account. When you're designing panels to screen effectively for the presence of mutations in the blood, you want to be as comprehensive as possible and have as many mutations as possible in that panel, while staying cost-effective, in order to effectively discriminate the appearance or not of mutations in the blood. And so that's why we turned to non-coding mutations. There's a huge opportunity within the non-coding space to identify additional mutations, to take that heterogeneity into account and to have a greater panel of mutations to look at, so as to actually identify mutations that would be detected in the blood. So this is whole-genome sequencing data from three different cancer types: liver cancer, medulloblastoma, and breast cancer. This is published data, as highlighted by the reference down on the right side. As you can see, and as others have shown in the past, the vast majority of mutations identified through whole-genome sequencing map outside of coding sequences. About 2% or so of mutations typically fall within coding sequences. The vast majority are intronic or intergenic. And as you heard yesterday, and as you're very familiar with, the non-coding genome harbors a lot of functional elements that are potential targets of these non-coding mutations. From a gene-centric perspective, for a gene to be expressed you need to have a promoter that's active, you need to have enhancers that are active, and insulators or anchors of chromatin interactions are also critical in the process of gene regulation. And so any of these elements can be affected by mutations. We know how to find these elements. ENCODE and others have clearly shown that specific histone modifications or transcription factors are excellent markers of these different functional elements. So one can effectively identify where these elements are in the model of choice.
If you do that in models of breast cancer, you can then overlay the information as to where enhancers are, where promoters are, and so on and so forth, on top of where non-coding mutations are found, as reported by others. This is not a new claim. It's been shown by many others in the past. You can see that the vast majority of non-coding mutations fall within heterochromatin, compacted chromatin. So these are not that interesting, because they're likely non-functional: there's no selective pressure to have these mutations play a role in the development of the tumor. And there's a small fraction of non-coding mutations that you can see mapping to enhancers, promoters, and insulators or anchors of chromatin interactions. Close to 5% of mutations in breast cancer fall within these regions. Now within that 5%, we still need to figure out which ones are the most relevant to look at. And so before we could actually formulate a clear hypothesis, we reminded ourselves of this basic principle of regulation, i.e. that for a given gene to be regulated, a folding of the chromatin needs to take place in order to bring distal elements, enhancers, in close proximity to the promoter of the gene of interest, and that this organizes, more or less, the set of regulatory elements for that particular gene. With that in mind, our hypothesis was the following: that we could effectively identify recurrently mutated sets of regulatory elements. So instead of looking for single enhancers with a burden of mutations, we said let's take the unit of regulation for a particular gene and ask whether that unit, that set of regulatory elements, has a significant burden of mutations or not.
And the hypothesis would be that if those sets of regulatory elements are important for an oncogene, they'll acquire mutations that will increase the transactivation potential of these elements. Our working model was breast cancer, as I highlighted initially, and here's a case example, the ESR1 gene. It's the gene that encodes the estrogen receptor, which is a potent driver of breast cancer development in over two-thirds of cases. So what you're seeing here is the ER locus. Let's see if the mouse works, yeah. So the ER gene is right here at the bottom, and you see where it stands with regard to other genes upstream. What we've showcased here are all the DNase I hypersensitive sites identified in MCF7 cells, which is our model system for ER-positive, or estrogen receptor-positive, breast cancer. That's all ENCODE data. We could have done it, but it was available through ENCODE, so we used it. We zoom in onto these DHS sites right here, and as you can see, they're all color-coded differently. The easiest one is the orange one. That's the promoter for the ESR1 gene, based on the principle that it's at the TSS. Then there are red and blue DHSs. These are discriminated as red or blue based on whether or not they're predicted to physically interact with the promoter of the ESR1 gene. This prediction comes from what you heard about yesterday, the cross-correlation across cell types of DNase I hypersensitivity, which is becoming, I guess, more and more popular, and which was effectively showcased by John a couple of years ago to be a good approach to predict potential physical interactions on a 3D scale. And so you can see here that we have a number of DHSs predicted to physically interact with the ESR1 promoter, the red ones. The blue ones are all DHS sites, so predicted cis-regulatory elements, enhancers and the like, which aren't predicted to physically interact with the ESR1 promoter.
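The red-versus-blue assignment described here can be illustrated with a small sketch. It is only a toy version under stated assumptions: the signal values, the site names, and the 0.7 correlation cutoff are all hypothetical and are not given in the talk, and the function names are made up for this example. The idea is simply that a distal DHS whose accessibility rises and falls across cell types together with the promoter DHS is predicted to belong to that promoter's set of regulatory elements.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length signal vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def predicted_regulatory_set(promoter_signal, distal_signals, threshold=0.7):
    """Return distal DHS names whose cross-cell-type signal correlates
    with the promoter DHS at or above `threshold` (an assumed cutoff)."""
    return sorted(
        name
        for name, signal in distal_signals.items()
        if pearson(promoter_signal, signal) >= threshold
    )

# Hypothetical DHS signal across five cell types (one value per cell type).
promoter = [10, 1, 8, 0, 9]
distal = {
    "dhs_a": [9, 2, 7, 1, 10],  # tracks the promoter: "red"
    "dhs_b": [1, 8, 2, 9, 1],   # anti-correlated: "blue"
    "dhs_c": [5, 5, 5, 5, 5],   # constitutive, uninformative: "blue"
}
red_sites = predicted_regulatory_set(promoter, distal)
```

In this toy input only `dhs_a` passes the cutoff, so it would be drawn in red and the other two in blue.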
So in other words, the red and the orange define the set of regulatory elements for the ESR1 gene. Now that we have this set of regulatory elements, this SRE, what we do is simply overlay on top of that information the position of mutations called in primary breast tumors. For this, we had access to published data, with the DOI specified right under the discovery number here, which consists of 73 patients, 73 primary tumors that are ER-positive, ESR1-positive. Whenever we have a mutation that maps to a DHS site predicted to be part of the set of regulatory elements for the ESR1 gene, we color-code it red. And when it falls within a DHS that's not predicted to be part of that set of regulatory elements, it's color-coded blue. And just by eye, you should be convinced that there's more red over red than blue over blue. There are more mutations mapping to the set of regulatory elements of ESR1 than there are to the DHS sites outside of it. And you should be like, yes, beautiful, right? But we need statistics. So to get to that, we developed MUSE, which stands for Mutational Significance in Sets of Regulatory Elements. It's a tool that actually quantifies the level of enrichment of mutations within the set of regulatory elements of a gene of interest compared to the mutational background you have locally or genome-wide. The principle is very straightforward. It mimics what people have done so far to find genes with a significant burden of mutations within coding sequences, except that instead of concatenating the exons of a given gene, we end up concatenating the DHS sites that are part of the set of regulatory elements for a given gene. So in this case, we would do it for ESR1. All the elements that form that 3D structure that regulates ER would be concatenated, and then we calculate the mutational load within that set of regions.
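The burden calculation described here, together with a comparison against a background rate, can be sketched as follows. This is not MUSE itself: the talk does not specify the statistical model, so the Poisson upper-tail test, the single-chromosome coordinates, and every name and number below are assumptions made purely for illustration.

```python
import math

def concatenated_length(intervals):
    """Total length of non-overlapping, half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)

def count_mutations(intervals, positions):
    """Number of mutation positions falling inside any interval."""
    return sum(
        any(start <= pos < end for start, end in intervals)
        for pos in positions
    )

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)   # P(X = 0)
    cdf = term
    for i in range(1, k):   # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def set_burden(set_sites, background_sites, positions):
    """Compare the mutational load of a concatenated set of regulatory
    elements with the rate observed in background DHS sites."""
    observed = count_mutations(set_sites, positions)
    background_rate = (
        count_mutations(background_sites, positions)
        / concatenated_length(background_sites)
    )
    expected = background_rate * concatenated_length(set_sites)
    return observed, expected, poisson_upper_tail(observed, expected)

# Hypothetical coordinates on one chromosome.
esr1_set = [(100, 200), (500, 600)]        # "red" DHS sites, 200 bp in total
other_dhs = [(1000, 2000), (3000, 4000)]   # "blue" DHS sites, 2000 bp in total
mutations = [120, 150, 180, 510, 1500, 3500]

observed, expected, p_value = set_burden(esr1_set, other_dhs, mutations)
```

With these toy numbers the set carries four mutations where the background rate predicts about 0.2, which is the kind of excess the red-over-red picture suggests by eye. A real analysis would also need per-chromosome coordinates, per-patient handling, and correction for testing many genes.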
And we compare that to the mutational load that you find within the DHS sites not connected to ESR1, or to the genome-wide background mutation rate. If we do this within breast cancer, this is the outcome. Each dot here corresponds to an individual gene's set of regulatory elements, and ER is a clear outlier. That's the reason for this example. It has the most significant burden of mutations within its set of regulatory elements compared to all other genes within breast cancer. If we were to do this in melanoma, for instance, then we would get TERT. TERT would be number one. So it definitely works well in that situation too, validating what others have shown in the past. And so it looks promising.
But we still wanted to confirm for ourselves that the mutations we were working with from our discovery cohort, or at least the enrichment of mutations within the set of regulatory elements, could be picked up in an independent cohort. And so we got access to 52 patients from the Princess Margaret Cancer Centre who were also breast cancer patients positive for the estrogen receptor, the same type of individuals. We did targeted sequencing within the predicted DHS sites regulating the ESR1 gene, and we found three patients in which mutations fell within DHS sites thought to regulate ESR1. That corresponds to roughly the same percentage of individuals with mutations in the set of regulatory elements of the ESR1 gene in our validation cohort as in our discovery cohort. Now, it's one thing to find an enrichment, but it's another thing to show that these mutations actually play a role, that they actually are functional in promoting differential expression of ESR1. And so we decided to focus next on demonstrating that these mutations were indeed functional. So the first thing we did was to use IGR, so Intragenomic Replicates.
It's a tool we developed in 2012 that specifically allows one to assess the extent to which changes in the DNA sequence can impose changes in the binding intensity of a transcription factor on the chromatin. It's better than a PWM assessment. It's more refined, because it takes into account the signal intensity from ChIP-seq tracks as opposed to just a change in DNA sequence. And so what we've done here is look at six different mutations. Each of these is a different transcription factor, and we're assessing whether there's a predicted change in binding intensity for the different transcription factors based on the mutations we found within the regulatory elements of the estrogen receptor. We can see, for instance, that this mutation is capable of reducing the binding intensity for a collection of transcription factors, many of which are known to regulate ESR1, such as GATA3 in this specific case. Other mutations tend to show a trend towards greater transcription factor binding to the DNA, as showcased for this mutation, that one, this one, as well as this fourth one. So our mutations have the predicted capacity to change the binding intensity of transcription factors. But do they change the transactivation potential of the enhancers, or the regulatory elements, in which they fall? To assess that, we decided to use a luciferase reporter assay. We compared a reporter plasmid with the wild-type sequence to one that had a mutant version of that same regulatory element, and as you can see from these six mutations that I'm showcasing here, all of them except one led to an increase in luciferase activity. So all the mutations we have, except one, increase the transactivation potential of these regulatory elements. In other words, most of these mutations are acting as gain-of-function mutations. And that would agree with the notion that ER is upregulated or highly expressed within ER-positive tumors.
The next thing we decided to do was to make use of the CRISPR technology. We saw yesterday the approach of deleting specific enhancers, so we did that exact same thing. We deleted four enhancers individually, one at a time, and then in the cells where these enhancers had been deleted, we simply asked, what's the impact on the expression of ER? If these enhancers that have mutations in primary breast tumors are indeed regulating ER, you expect a drop in the expression of ER. And that's exactly what we saw. For the second enhancer, this one, number two, we see a significant drop in ER expression, and there are neighboring genes that we've also assessed, and there are always cases where there's no effect. When deleting the fourth enhancer, we also see a significant drop in ER expression. So these two enhancers are clearly regulating ER. This third enhancer shows a close-to-significant trend in reducing ER, ESR1 expression, sorry. And then this first enhancer, when deleted, showed a trend, but not a significant one, which I guess is what you expect, right? Because within this set of regulatory elements of ESR1, there are over 25 elements. So deleting just one should not completely abrogate the expression of that transcript. These mild effects are exactly what we expected, even though some are significant. The fact that some are not significant, but that there's a trend, makes me extremely excited. So I was very pleased to see that.
Another aspect that we decided to look into, to increase our confidence that what we were discovering was not a red earring, a red herring, sorry, blame it on the French language, right, was the following: we expect to have convergence. If there are genetic mutations accumulating in the set of regulatory elements to change its expression, you would expect that in other cases, tumors would find another way to affect the regulatory elements of ER to still impact ESR1 expression.
So in other words, one could expect, for instance, changes at the level of the epigenome, so as to make some elements more active, or so as to repress repressors, which would make ESR1 expression go up. Now, we haven't looked at that yet, but I'm just telling you that that's an interest of ours. What we've looked at is genetic predisposition. If we have somatic mutations, acquired in the course of disease development, that are contributing to a given phenotype, one can expect that inherited variants would be targeting the exact same biology, the exact same mechanisms. And so what we decided to do was to assess whether we could find genetic predisposition variants that converge on the same regulatory elements in which we found mutations that relate to ESR1 expression. Now, we showed in 2012 that the risk variome for breast cancer, as we call it, the sum of all risk loci associated with a given disease, in this case breast cancer, had the propensity to fall within regulatory elements. We've heard about this. So breast cancer is no different from other diseases. There's a significant enrichment of risk variants, or risk loci, mapping within H3K4me1 regions, for instance. And in breast cancer, we also had ChIP-seq for a number of transcription factors, so we could show, for instance, that FOXA1 and ESR1 itself were transcription factors binding to the chromatin regions that harbored these sorts of genetic predispositions to disease. If ever you're interested in running these sorts of analyses, we have a good tool that's public on GitHub to do so. And following up on this analysis, what we showed was that, indeed, these SNPs were changing the affinity, or the capacity, of transcription factors to bind to the chromatin, so as to influence the expression of the downstream gene. Of all these SNPs, there's one that caught our attention.
I don't know if it would catch your attention right away, but it's this one, rs2046210. Why were we excited about that one? Well, if you actually look at where it maps across the genome, it's on chromosome 6 within the ESR1 locus, right upstream of ESR1. And so we thought that maybe we had here an opportunity to identify convergence of genetic predispositions and mutations onto the same element. And so that's what we did. You heard yesterday about taking into account the LD structure of a given risk locus to identify all the SNPs that are putative causal SNPs, and then merging that information with ChIP-seq tracks for histone modifications as well as transcription factor binding in the relevant cell model that you have. That's what this figure is doing. In the end, the key take-home message is that we end up with two SNPs that are putative causal SNPs for this risk locus and that fall within DHS sites bound by a collection of transcription factors. Of these two SNPs, there is one that maps to the same element that we actually found to be mutated in primary breast tumors. The green diamond corresponds to the SNP. It's positioned right here within this DHS site, which is called in MCF7 cells as well as in another model of ESR1-positive breast tumors, the T47Ds. And within this region, within a few hundred base pairs of each other, we had three mutations called in primary breast tumors. So this indicates that there is somewhat of a convergence: this given region is not only affected by genetic predisposition, that specific one, but also affected by mutations acquired in the course of tumor development. And since we have a SNP, and you heard about eQTLs yesterday, we could use that SNP to further demonstrate that that enhancer was indeed capable of regulating ER. And this is what we've done here. Each color is a different gene. ESR1 is orange.
We're subdividing the population into homozygous wild type versus heterozygous versus homozygous risk allele. And what we see for ER is a significant differential expression, an increased expression with the risk allele, in the population of individuals that are homozygous for that risk allele.
And so overall, what we see is that we can identify mutations that populate the set of regulatory elements of a particular gene, as opposed to looking for enrichment of mutations within a single element. We take all these elements together. With regard to the set of regulatory elements for the ESR1 gene, we can actually find a number of significant mutations in that set of regulatory elements that appear to act predominantly as gain-of-function mutations. There's convergence between inherited genetic variants, predispositions, and somatic mutations. And overall, there's a methodology here that one can apply to any given cancer type of interest to identify the regions in which we should be paying more attention to the mutations. So if I'm designing a panel, for instance, to monitor mutations in the blood, I would prioritize these DHS sites that are part of sets of regulatory elements significantly mutated in patients.
So with that, I'll just thank the people that did the work. Swneke Bailey, down in the back here, and Kinjal led this research. They were the key authors, the first authors on this publication, or this work to be published. We also need to thank additional members from the lab and collaborators from the Princess Margaret that are listed here, as well as our international and long-standing collaborators, Richard Cowper-Sal·lari and Nicholas Sinnott-Armstrong. Nicholas is in the room, so feel free to go up to him. He's the guy that has been working the most heavily on IGR and developing that pipeline. So great people to work with. And so a huge thank you to them for this.
Thank you for your attention.
Hi, great talk. Thank you. But I would like to know your opinion about another complexity of breast tumors, and other tumors, right? That is the hierarchy of the tumor cells themselves. There are the cancer stem cells, which perpetually maintain the tumor growth, whereas more differentiated cells don't. So have you considered investigating whether those SNPs are specific to different populations?
Yeah. So the answer to your question is no, but I agree with you, it's extremely important, and we're moving towards that. The key is to be able to effectively discriminate these populations and have models to study them. I don't know if breast cancer is the best model to actually be doing that. There are other models that we're investigating where we can more effectively discriminate the chromatin landscape within the stem population versus the bulk population. So definitely a great question. Definitely worth looking at, yeah.
Right here.
So that's coming. So we got the GWAS hits from the literature. We didn't run GWAS ourselves, so these are all reported within the NHGRI GWAS Catalog. I don't remember the exact numbers, but they've all been done with the proper GWAS format. So thousands.
So could you comment on the function of ESR1, how its expression change could be involved in tumor progression, and whether you are following that up functionally?
Yeah. So that's been well established within breast cancer. Drugs that are actually targeting ESR1, the protein product, are highly effective in curing patients. So that's why it's nice to show you. The field won't be surprised by ESR1 being a target of genetic or epigenetic alterations, the latter of which we're not showing here but hope to show in the future, but genetic alterations for sure, because it's definitely known to be a key driver of tumor progression in two-thirds of patients.
So fantastic talk.
I'm just wondering, if there's a convergence between germline and somatic mutations, did you do any experiments to test the combinatorial effects of the two?
Yeah, we did the luciferase reporter constructs. Okay. Did we do it? I think we did. I don't remember. I have to look back. We definitely looked at interactions between different SNPs. Exactly. I don't remember if we added the mutations in there. But they're typically not thought to be within the same individual. They could be in some instances. So it's definitely worth it. Assessing that partnership between SNPs and mutations, absolutely we need to do it. This is just an n of one. So we haven't explored that in great depth.
Yeah, exactly. To his theory, right? Yeah. You have that. Yeah.
I wondered, what was the frequency of the SNPs, those two SNPs, in the general population?
So the frequency in the population, by heart, I don't remember, 40% or something like that. They're at an r-squared of 0.8 and above with the lead SNP. That's what I can tell you. And the lead SNP is part of the GWAS arrays. Those GWAS arrays are based on high-frequency SNPs, so in that ballpark range.
And I wondered, your program, MUSE, can be agnostic to cancer, right? It could be used for any disease.
Yeah, absolutely. The only thing you need is DNase or whatever, a measurement of where the regulatory elements are, and mutations. Yeah, absolutely. It's absolutely agnostic to disease.
Hi. So thank you for your talk. Again, it's a question on MUSE, the software. Can you enter two different groups? Does it have to be normal and disease? Can it be two subgroups of patients where you might look for enhancer differences?
So you would have to run them individually. The mutations are not called by MUSE. The mutations are called by the classical software that's out there to call mutations, which compares normal to disease.
So as long as you have the called mutations, the SNPs, whatever, you only need to feed MUSE a VCF file of called mutations. That's it. Okay. So if you have two different conditions, then you would give the list of mutations specific to condition one separately from the list of mutations for condition two.
Thank you.
Yeah. Yeah.
Sorry. The genomic alterations that you find to impact transcription factor binding, are those in the canonical motifs for those transcription factors, or is that more complicated?
So there's definitely a tendency to be closer to the motif, but not necessarily within the motif. That's the beauty of IGR. IGR is a fabulous tool, and I highly recommend it, because it's agnostic to the motif, and it can actually predict a change in binding intensity for TFs even if mutations are falling outside of the motif. It's completely agnostic to the motif.
What do you think the mechanism of action is there? Is it an alteration in another bound factor? Is it the structure of the DNA itself?
So there's more research to be done in that respect, but we're definitely seeing more and more mutations falling in the flanking sequences as opposed to... I mean, there are mutations and SNPs that fall within the motifs themselves, but having them in the flanking regions, plus or minus three base pairs or so from the motif, is becoming more and more of a recurrent observation. So maybe we'll hear more about that topic from the next speakers, but it's definitely something that needs to be looked at, and that's where a PWM-based approach to predicting a change in binding intensity for specific transcription factors doesn't make any... Well, I mean, it is restricted, because it only allows you to look at those mutations that are within the motif. So that's a little caveat of that approach.
Hey, Mathieu. Nice talk. I just wonder if you could speculate on...
You know, ESR1, I'm totally convinced, but you also showed some data showing that other genes in the region were co-affected. And so do you think that there's more than one gene participating? Or what are your thoughts?
I have no problem with that hypothesis. I can actually say that... So we also have mutations called in ER-negative tumors, and many of them map to some of these elements that are predicted to regulate ESR1. And there's one gene in particular, RMND1: when you look at the CRISPR-Cas9-based deletion of enhancers, the enhancer that has mutations both in ER-plus, ESR1-positive, and in ESR1-negative tumors also affects RMND1 expression. RMND1 is high in both types of tumors, and if you knock... We've done that in cell lines, but if you knock out RMND1 in models of triple-negative as well as in models of ESR1-positive tumors, proliferation is reduced. So absolutely. I'm not in favor of the single-enhancer-to-single-gene type of approach, even though that's what I'm showing. But yes, there's definitely more than just one gene that's going to be affected.
Great. Thank you.
Thank you. One more? The other... The microphone behind you.
Did you find any regulatory regions for other genes besides ESR1? Maybe that was asked, but then I missed it.
Right. Are there other sets of regulatory elements? Yeah. So this is the first pass. We didn't go crazy in loosening up the stringency. We were as stringent as we could be, and so there is a trend if we go back to the... We won't. If we go back to the QQ plot... Sorry.
No, that's fine. I'm cutting into your time as well.
If we go back to the QQ plot, there's a trend. If you look at all the single orange dots... So ER is a clear outlier. Right. There are two that are somewhat far away from the theoretical expectation, and there's a trend towards getting that enrichment as well. So in our first pass...
What were those other regions?
I would have to look back at the table.
We can sit down and look at these. But the concept is that we only had 73 patients. My assumption is that having more patients will help us. And just a couple of weeks ago, Stratton published 300-and-something additional whole-genome sequencing datasets from ER-positive primary tumors. And so we're very hopeful. Let's phrase it that way. Yeah.
Good. Okay, great. Good discussion. See you in the next talk.