Well, thanks for the opportunity to present a new way of evaluating mutations in TCGA data. We've seen these plots before: if we normalize the frequency of mutations by protein length and look across many different tumors, certain genes recurrently come up as very significantly mutated. But we also recognize that a very large number of genes have a low frequency of mutation across all these cancers, and they may very well carry deleterious mutations. An example here from stomach cancer shows three samples with mutations at two different positions within this protein. From this alone, you wouldn't necessarily expect these to be mutations you'd be concerned with, but if you start considering data from multiple tumors, an interesting profile starts to emerge. And if, on top of that, we know that what we're looking at is NRAS, and that its homologs HRAS and KRAS have similar mutation profiles, then we become increasingly convinced that these mutations are at functionally important positions and that mutations there would be deleterious. What we're doing in evaluating this example is implicitly relying on the fact that the overall structures and functional sites of these proteins are conserved. So while their functions can differ, the sites and structures are generally conserved, and binding sites, catalytic sites, and interaction sites can be shared from one protein to another. These particular examples are still on the left-hand side of this graph. And while we know about KRAS and its homologs, there are many other genes we can still evaluate using a similar approach. So this is a very typical plot that we would see for a given gene with a low frequency of mutations.
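The length normalization behind these opening plots can be sketched as mutations per residue per sample. This is a minimal sketch, assuming per-gene mutation counts and protein lengths are already available. The GeneMutations record, the function name, and the gene names are hypothetical, and this simple ratio is not a significance test like MutSig.

```python
from dataclasses import dataclass


@dataclass
class GeneMutations:
    gene: str
    protein_length: int  # number of residues in the protein
    mutation_count: int  # mutations observed across all samples


def mutations_per_residue(records, n_samples):
    """Return (gene, rate) pairs sorted by mutations per residue per sample, highest first."""
    rates = [
        (r.gene, r.mutation_count / (r.protein_length * n_samples))
        for r in records
    ]
    return sorted(rates, key=lambda pair: pair[1], reverse=True)


# Toy, made-up inputs: a short protein with 10 mutations outranks a long one with 20.
ranked = mutations_per_residue(
    [GeneMutations("GENE_A", 100, 10), GeneMutations("GENE_B", 1000, 20)],
    n_samples=10,
)
```

Dividing by length keeps long proteins from dominating the ranking simply because they offer more positions to mutate.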
And if we look across many of the different TCGA tumor types, we'd also see that, going by MutSig rank, it wouldn't really rise to the top, so it would be discarded almost entirely. But with this domain alignment approach, we can take different genes across cancers, align their domains, and then look for positions within these domains that have more mutations than you would expect by chance. This approach applies to about 40 percent of missense mutations. It leaves out another class that could very well be identified through clustering, that is, by identifying hot spots elsewhere in the coding sequence. It also excludes nonsense and frameshift mutations in certain genes. But what we're really focused on is identifying genes in the tail of the distribution that have positions where, once aligned, you see recurrent mutations. So here's an example of the type of data you get from aligning a couple hundred homeobox domains across the 20 different tumor types. Qualitatively, we can already see a number of points of interest. The way we quantify this is by calculating a modified Z-score: we find the median count, then for every point we calculate its deviation from that median and compare it with the MAD, the median of the absolute deviations about the median. This particular point here has a modified Z-score of eight. If we do this for all positions in all domains, we get a summary plot like this one, where, as counts increase, there are certain very clear outliers. Many of these are in TP53, PIK3CA, and a lot of the genes we would expect. There are also a number along the baseline here. The point size is related to the total number of genes contributing to the counts.
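The counting and scoring steps just described can be sketched as follows. This is a minimal sketch under stated assumptions. Mutations arrive as (gene, protein position) pairs. A hypothetical alignment_map, for example one built from Pfam alignments, maps each position to a (domain, aligned column) pair. The function names are illustrative. The 0.6745 scaling constant follows the common Iglewicz-Hoaglin convention, which the talk does not mention.

```python
import math
from collections import defaultdict
from statistics import median


def aggregate_by_aligned_position(mutations, alignment_map):
    """Count mutations per (domain, aligned column) and record which genes contribute.

    mutations: iterable of (gene, protein_position) tuples (missense only).
    alignment_map: hypothetical mapping gene -> {protein_position: (domain, column)}.
    Mutations that fall outside any aligned domain are skipped.
    """
    counts = defaultdict(int)
    genes = defaultdict(set)
    for gene, position in mutations:
        key = alignment_map.get(gene, {}).get(position)
        if key is None:
            continue
        counts[key] += 1
        genes[key].add(gene)
    return dict(counts), {key: sorted(members) for key, members in genes.items()}


def domain_column_counts(counts, domain, n_columns):
    """Dense per-column counts for one domain (columns 1..n_columns), zeros included."""
    return [counts.get((domain, column), 0) for column in range(1, n_columns + 1)]


def modified_z_scores(values, scale=0.6745):
    """Return scale * (x - median) / MAD for each value.

    Pass scale=1.0 for the plain (x - median) / MAD ratio. If the MAD is zero
    (e.g. mostly empty columns), the scores are undefined and returned as NaN.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return [math.nan] * len(values)
    return [scale * (v - center) / mad for v in values]


# Toy, made-up example: two genes hit the same aligned homeobox column.
counts, genes = aggregate_by_aligned_position(
    [("G1", 10), ("G2", 12), ("G1", 50)],
    {"G1": {10: ("Homeobox", 5)}, "G2": {12: ("Homeobox", 5)}},
)
```

Under these assumptions, pooling counts across homologous genes lets a column stand out even when no single gene is mutated often there. The length of each per-column gene list corresponds to the point size in the summary plot.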
So larger points have more genes contributing, which is a benefit of the approach we're taking here: by doing this alignment, we actually see a large number of counts at certain positions within certain domains. There are also a lot of what we would consider noisy points here, with low counts, that appear to be significant, but I'll show you how we correct for that. The KRAS example I was giving is this particular point, which represents position 12 in the RAS domain. What's also interesting about some of these outliers is that we can look at their homologs and identify others that are mutated at low frequency but at the same functional position as in these other domains. What we care about more, though, are these larger points that have lower counts and benefit from this approach. If we zoom in on that region, we can still see some of the other positions in TP53, PIK3CA, and some others here, but there are additional points that appear to be outliers. What we've done is look at the mutations, or variants, that we see in 1000 Genomes data from otherwise healthy genomes, and match the profile observed in 1000 Genomes against what we have in TCGA. We can then begin to see that many of these points deviate from what we would expect. The homeobox example I gave is this particular point here, and you can see that a total of 48 genes contribute to the counts we had in the histogram before. So we can evaluate every single one of these positions, see which genes contribute to each point, look at those tumors, and begin to assess whether the mutations are deleterious. We can also structurally interpret what the functional impact would be, because we're relying on these conserved structures and functional sites.
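The talk doesn't specify how the TCGA profile is matched against 1000 Genomes. One possible sketch works from per-column counts for a single aligned domain from both sources and applies a one-sided binomial test. The test asks whether a column carries more TCGA mutations than the germline profile would predict. This is a swapped-in technique for illustration, not the speaker's method, and the function names and pseudocount are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly from the probability mass function."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def position_enrichment(tcga_counts, germline_counts, pseudocount=1.0):
    """Per-column one-sided binomial p-values for a TCGA excess over a germline profile.

    tcga_counts, germline_counts: equal-length per-column count lists for one domain.
    The expected fraction at each column comes from the germline (e.g. 1000 Genomes)
    profile, smoothed with a pseudocount so columns with no germline variants are
    not assigned probability zero.
    """
    if len(tcga_counts) != len(germline_counts):
        raise ValueError("profiles must cover the same aligned columns")
    n_tcga = sum(tcga_counts)
    total_germline = sum(germline_counts) + pseudocount * len(germline_counts)
    p_values = []
    for k, g in zip(tcga_counts, germline_counts):
        expected_fraction = (g + pseudocount) / total_germline
        p_values.append(binomial_upper_tail(k, n_tcga, expected_fraction))
    return p_values


# Toy, made-up profiles for a three-column domain.
p_values = position_enrichment([10, 0, 0], [0, 0, 0])
```

In this toy case, the column holding all ten TCGA mutations gets a small p-value and the empty columns get p = 1. A real analysis scanning many columns would also need a multiple-testing correction.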
In this particular case, we can highlight that mutating the arginine residue at this position, which binds in the minor groove, would have an impact. Homeobox domains are found in transcription factors, so you could imagine widespread transcriptional changes resulting from mutations at those positions. These other points here are on the back side, in the helix that binds in the major groove. So this is relevant for very many of these points, and it helps us assess many of the genes that we see in the tail of the distribution. This particular gene is essentially never identified as significantly mutated by the other typical approaches, but it has several mutations, all at the same position here. So, to summarize: we've gone comprehensively across all positions in all aligned domains and identified the positions that appear to be deleterious, based on modified Z-scores and on comparison with the distributions in 1000 Genomes. What we don't account for in this approach are hot spots that occur outside of domains, or structurally disruptive mutations, so to be useful, this method will definitely have to be integrated with others. Validation can be challenging, because it's hard to take events that occur at low frequency and find supporting evidence for them in these larger data sets, but there may be some interesting analyses we can do around that. Another interesting comparison: as the numbers of cancer genomes and normal genomes increase, instead of doing a profile match between TCGA mutation data and 1000 Genomes, we can look within those domains in cancer and healthy genomes and compare, at each position, which ones appear to be deleterious. Many thanks to Ilya and the group, Tay and Teo specifically, and many others who helped out with the analysis.
I'm happy to take questions. How does this compare to something like Mutation Assessor, which also looks at domains across different related families of genes? That's another really good tool. Mutation Assessor is primarily based on sequence conservation within subfamilies. So rather than comparing the mutation counts at a particular position across multiple genes, it assesses how conserved a particular position is within that gene. So the two are complementary in a way. When you look at the various curves of how well these different programs perform, have you compared yours with all the other programs out there? It depends on what your benchmark is for saying how well something performs. In this case, we can look at genes we know are recurrently mutated and confirm that they have very high modified Z-scores and high counts; those are the more clear, obvious ones. But looking through the tail and saying what is or isn't better is particularly challenging to benchmark. Just a clarification: are these sequence-based or structure-based domain alignments? Sequence-based. Okay. So these are all Pfam in this case, but you could use any other alignment approach. Okay. So you took the predefined Pfam families? That's right. Okay. So would a structure-based alignment find some genes that the sequence-based alignments didn't catch? It would be interesting to evaluate what we could find with structure alignments and not just sequence alignments. Thank you. Now, just a quick comment on conservation versus recurrence. I think you're exactly right that these are complementary. You're counting recurrence across domains that are related, and therefore have some functional connection, and that recurrence count is what matters.
Something like Mutation Assessor, which looks at conservation and specificity (that is, subfamily conservation), adds additional information. The two together would actually be the most powerful approach: you could filter your recurrence results by adding information about specificity and conservation, and that would probably be the best way, or at least one way, to get more information. Agreed. Great suggestion. Thanks. Thank you, Brady. And now for the last presentation of the session this morning: we have Shabarna Sinha on "Integrative analysis of TCGA data reveals Wilms tumor 1 (WT1) mutation is a driver of DNA methylation in acute myeloid leukemia."