Okay, good morning. Thanks to the organizers for allowing me to speak about this project. Oh, let me go over here. So I'm a postdoc in Matthew Meyerson's lab and I'm gonna talk about a project that I'm co-leading with Alice Berger and Xiaoyun Wu, also in collaboration with Jesse Boehm's lab. So recent cancer sequencing projects have done a really great job of identifying many novel cancer genes. However, we're still underpowered to detect all of the functionally relevant genes, as described in a paper by Mike Lawrence and Gaddy Getz. And what's shown here is that each of these dots shows the mutation frequency of a tumor type and how many samples have been sequenced. And so for many of these cancer types, we're very underpowered to detect the functionally relevant mutations in these genes, particularly those that occur at a very low rate. It's even more of a problem when we're trying to identify the specific variant in a gene that is functionally relevant. And so this is an example of EGFR, which is a very well-characterized cancer gene that has many hotspot mutations, many clinically relevant mutations. But what's shown here are all the mutations that have been observed across the many tumor types that have been sequenced, and there are many mutations that occur in only one patient. So the question is, are these also functionally relevant? Are they also doing a similar function as these hotspot or very well-characterized mutations? And so with Jesse Boehm, we've been thinking about how do we experimentally, in a high-throughput way, test the function of many of these mutations? And so given all these somatic variants we're identifying from these cancer genome studies, we can create specific reagents to test their function and then put these reagents through a host of assays, some of them being very pathway specific, but we're also interested in gene-agnostic assays.
So if you have no idea what the function of this gene is or what pathway it's involved in, we would like these types of assays to give us a broader picture of the function of these mutations. And so to test this approach, we decided to look at mutations in lung adenocarcinoma. Not only is it one of the leading causes of cancer deaths worldwide, but it also has a very high mutation rate. And so identifying those mutations in lung adenocarcinoma that are functionally relevant above the background mutation rate, just purely computationally, is a challenge. So our approach was to take mutations observed in lung adenocarcinoma from two papers, including one from the TCGA project, and create an ORF library of both the wild type versions of 47 genes as well as missense and indel variants that were observed in these genomes. And from this ORF library, we infected A549 lung cancer cells in multiple replicates. And then after 96 hours, we assay gene expression using this approach called L1000, which is a reduced representation of the transcriptome. It only profiles 1000 transcripts, but the benefit is that it's very low cost. And so since we're assaying many, many variants in high replicate, that helps with the cost. So how are we predicting the functional impact of these mutations? We are taking each replicate introduction of an ORF, and here I'm showing a heat map showing the correlation, or how similar the gene expression changes are, between multiple replicates of a wild type ORF of this gene ARAF. And so you can see the heat map is mostly red, meaning that the gene expression changes are very similar. When expressing this mutant version, V145L, we also get very consistent gene expression changes. And then when comparing the mutant and wild type replicates together, we also get very similar gene expression changes. So we would predict that maybe the mutation isn't doing anything to the function of the gene, since the gene expression changes are similar.
In contrast, for another mutation in ARAF, this S214F, the mutant ORF version also gives a very strong, consistent gene expression signature. However, when comparing the mutant and wild type, the gene expression changes are very different. And so in this case, we would predict that this mutation has a functional impact. So for each mutation, what we saw is that these three sort of heat maps could be represented as distributions. And so using a Kruskal-Wallis test, we could then ask, does there appear to be a change in the distributions? And using this test and then doing an FDR correction, we then can predict, for all of our variants, which ones are likely to be functionally impactful versus not. So there's one other feature we saw from comparing these gene expression signatures. And that was a direct comparison between the wild type and mutant signatures. So in the top two rows, I'm showing the ARAF case. Here is a case for STK11, where introducing the wild type ORF gives very strong, consistent signatures, but introducing this mutant does not give as consistent signatures. And so we think that it's a loss of function. And then conversely, with this beta-catenin mutation, the wild type does give some consistent signature, but the mutation actually increases the consistency of the gene expression signature. So we represent these two features on this plot, which we're calling a sparkler plot. So the x-axis shows the corrected p-value of that test to determine if there's a functional impact. And so anything that falls below our FDR cutoff would be predicted to be inert. And then for the comparison between the wild type and mutant, things that look like a gain of function in the mutation are shown with a positive score, and those that are predicted to be loss of function get a negative score.
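As a rough illustration of the test just described, here is a minimal sketch using only the Python standard library. It treats the three heat maps as three lists of replicate correlations (wild type vs. wild type, mutant vs. mutant, wild type vs. mutant), compares them with a Kruskal-Wallis test, and corrects the p-values with Benjamini-Hochberg. Several things here are assumptions and not details from the talk: the use of Pearson correlation, Benjamini-Hochberg as the FDR method, the direction score defined as the difference in median replicate consistency, and all function names.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length signatures (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def within(reps):
    """All pairwise correlations among replicates of one ORF."""
    return [pearson(a, b) for a, b in combinations(reps, 2)]


def between(reps_a, reps_b):
    """All correlations between replicates of two different ORFs."""
    return [pearson(a, b) for a in reps_a for b in reps_b]


def kruskal_wallis_3(g1, g2, g3):
    """Kruskal-Wallis p-value for exactly three groups.

    With three groups, H is compared to a chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-H / 2).
    """
    groups = [g1, g2, g3]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    # Give each distinct value the mean of the ranks it occupies (ties).
    rank_of, start = {}, 0
    for value in sorted(counts):
        end = start + counts[value]
        rank_of[value] = (start + 1 + end) / 2
        start = end
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    tie_correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if tie_correction > 0:
        h /= tie_correction
    return min(1.0, math.exp(-max(h, 0.0) / 2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def impact_call(wt_reps, mut_reps):
    """Return (p-value, direction score) for one wild type / mutant pair."""
    wt_wt, mut_mut = within(wt_reps), within(mut_reps)
    p = kruskal_wallis_3(wt_wt, mut_mut, between(wt_reps, mut_reps))
    # Hypothetical direction score: positive when the mutant replicates are
    # more consistent than wild type, negative when they are less consistent.
    return p, median(mut_mut) - median(wt_wt)
```

In use, `impact_call` would be run once per variant, the resulting p-values passed together to `benjamini_hochberg`, and each variant plotted by its corrected p-value and direction score. The sketch does not guard against constant signatures (zero variance) or empty groups.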
So then when we look at this visualization gene by gene, we see these interesting patterns with known oncogenes, known tumor suppressors, and then we can look at other genes of unknown function. So there are many gain-of-function or change-of-function mutations in these known oncogenes, and I'm calling out the specific mutations. For these known tumor suppressor genes, again, there are many, many loss-of-function predictions with our approach. And then finally, among genes with unknown function, some maybe have no mutations that are predicted to be impactful, and then some have mutations that may have a functional impact. So how do we know these predictions are correct? We have a set of benchmarks of mutations with known function, and we were very accurate with those. We also took an approach to assess our false positive rate. So we had a very high replicate experiment where we did 21 replicates of a subsample of these ORFs. And then we simulate a smaller study where we just take eight reps from the same ORF and determine how many times we falsely call something as having an impact when we're actually sampling the same ORF. So from the simulation, we were able to see that our false positive rate was very close to our expected false discovery rate. Also, we're looking at correspondence with genetic hotspot mutations. So this is one example in FBXW7, which is a known tumor suppressor. These two mutations here were predicted to not have an impact, whereas this one was predicted to be loss of function. And when you look at these mutations and their frequency here in COSMIC, the one mutation that has a predicted functional impact is right near a hotspot, and so you would potentially predict that it would be more functional than these other mutations.
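The replicate-subsampling check described above (many replicates of one ORF, with subsets of eight compared against each other) could be sketched roughly as follows, using only the Python standard library. This is an illustration under stated assumptions, not the analysis from the talk: each replicate is reduced to a single hypothetical summary score, a simple permutation test on the difference in means is swapped in for the signature-based test, and the toy data, function names, and the 0.05 threshold are all made up for the example.

```python
import random
from statistics import mean


def permutation_pvalue(a, b, rng, n_perm=200):
    """Two-sided permutation test on the difference in means (stand-in test)."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # The +1 terms keep the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)


def simulated_false_positive_rate(replicates, n_sub=8, n_trials=100,
                                  alpha=0.05, seed=0):
    """Fraction of same-ORF comparisons that are wrongly called impactful.

    Each trial draws two disjoint sets of n_sub replicates from the same
    ORF, so any call below alpha is a false positive by construction.
    """
    rng = random.Random(seed)
    false_calls = 0
    for _ in range(n_trials):
        drawn = rng.sample(replicates, 2 * n_sub)
        pseudo_wt, pseudo_mut = drawn[:n_sub], drawn[n_sub:]
        if permutation_pvalue(pseudo_wt, pseudo_mut, rng) < alpha:
            false_calls += 1
    return false_calls / n_trials


# Toy data: 21 replicate scores for one ORF, drawn from one distribution.
data_rng = random.Random(1)
replicates = [data_rng.gauss(0.0, 1.0) for _ in range(21)]
rate = simulated_false_positive_rate(replicates)
print(f"simulated false positive rate: {rate:.2f}")
```

The number to look at is how `rate` compares with the chosen threshold: if the test is well calibrated, the fraction of false calls should be close to or below `alpha`.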
Another way we're assessing how correct these predictions are is to look at what the actual identity of the signatures is and compare signatures between different alleles. And one approach to doing that is through clustering. So if we take all of our ORFs and just cluster them together to look at transcriptional classes, we do see some known connections, such as with KEAP1 and NRF2. And then we see this larger cluster here, and if we zoom in on that, many of the ORFs in this cluster are known driver mutations in the EGFR/RAS pathway. And so for any mutation that we predict to be functional and that is present in this cluster, this gives us stronger support that it is indeed functional. And so within this cluster, we see rare variants. Again, these variants were observed at very low frequency across all the lung adenocarcinomas that were sequenced. And these were all predicted to have a functional impact and also cluster with known activating mutations in this pathway. So we can predict that these may also have a similar functional impact. But it wasn't that all rare mutations in these genes clustered in this pathway. So for example, for EGFR, there was one mutation, this H1129Y, that also was predicted to have a functional impact but was not clustering with the rest of the mutations. And so we're currently investigating what this unique signature is. And so what I don't have time to talk about is the other orthogonal assays that we're doing. I have one example on a poster today, but we're also taking this ORF collection and putting it through a host of other assays that are more pathway specific, but also potentially general assays. And so doing this type of impact phenotyping is really important to really understand what the function of all these specific variants is.
And also at a broader level, if we don't know what the gene function is, it's important to try to map these mutations into functional pathways. What's powerful about this gene expression approach is that it's agnostic. We don't need to know the actual function of the gene; we're just comparing mutant and wild type. So do we see a difference in the expression changes that are induced with the mutated version? And we think that this approach can be used for any gene. Again, we're interested in expanding this to other variants, not necessarily cancer somatic variants, and then also an interest of mine is splicing. And so I'm interested in expanding this to look at splice variants. So with that, this is a huge effort by many people at the Broad. And I'd first like to thank Alice Berger and Xiaoyun Wu, who are co-leading this project. Jesse Boehm's, Matt Meyerson's and Todd Golub's labs have also helped a lot with this project. And as a shameless plug, I'm starting my own lab at UC Santa Cruz. So there are many positions open. And I guess I'm a little early, but I can take your questions.

So there'll be more time, I guess, for questions.

Nice work. A question I have is this: when you do this gene expression profiling, obviously you get a signature that is associated with the mutations that score positive in your assay. Can you go back to the tumors, to the primary tumors that harbor those mutations, and see whether you see any enrichment of that particular signature in those tumors?

We haven't done that analysis yet, but that is exactly what we'd hope to do. I did a very superficial analysis of that and we didn't see the same type of signature, but I'm trying out other approaches to see.

Another question: obviously, you look for expression changes.
I mean, is there a likelihood, in your opinion, that a driver mutation may not necessarily influence gene expression, or do you anticipate that any driver mutation, directly or indirectly, will at some point affect gene expression?

So the short answer is I think that not all mutations will affect gene expression. We may see changes at the methylation level. We were only assaying gene expression. We actually tested a splicing factor mutation and didn't see big changes, and we think that's because it's affecting splicing and we weren't able to assay that. So we're hoping that this concept of the wild type versus mutant comparison can be used with other assays. So if a mutation is affecting methylation, maybe you can use those same concepts but with a different assay. And the one that we are testing out now is cell profiling. So imaging different features of cells after introduction of these ORFs and then seeing if we can use the same approaches.

So on a related note, I can imagine there could be some very cell-type-specific mutations or very time-dependent changes in gene expression. So I was wondering, I mean, I'm sure I have lymphoma mutations that won't score in an epithelial cell line. So how did you look at that? Or another way of saying it is, you had great results, but were there any hotspots or anything you were pretty darn sure was gonna be functional from some other point of view that didn't score?

Actually, for everything that we knew about, we did see a functional effect. However, what we've now started to do is take these into different cell lines. So what I was showing was data from A549, which is a lung cancer cell line. It has background mutations in some of the genes that we assayed. And so we're putting these in different contexts. We have primary lung cell lines and a different lung cancer cell line, and we actually do see some differences. So one example is KEAP1. So A549 has KEAP1 mutations.
So yeah, the cell line itself has KEAP1 mutations, and so we still think we're reading off a loss-of-function mutation, because the mutant version can no longer rescue the wild type function. But then when we put it in a different cell context that is KEAP1 wild type, we may actually see potentially dominant negative effects. So we are still seeing a functional effect, but maybe the direction of the effect changes.

How are you gonna deal with real lineage-specific effects, or things that require the presence of other oncogenes that are mutated but are not present in your tests?

Yeah, I mean, we really understand that there are a lot of context effects, and so when we assay them, we'll just have to take those into consideration.

Our next speaker is Hsin-Ta Wu, who's from Brown University and is gonna tell us about CoMEt, a statistical approach to identify combinations of mutually exclusive alterations in cancer.