So, first, I would like to thank the organizers for giving me this platform, being, I believe, the only yeast geneticist in this room. I'm a little bit thrilled, but I'll try to explain to you why and how a yeast geneticist could get involved in cancer data analysis, and what came out of it. So, APOBEC: by way of brief introduction, and why APOBECs are of interest, or should be of interest, to this group. APOBEC enzymes are strong endogenous mutagens in human cancers. This was established over the last two or three years through the efforts of our group and others. To give just some headlines and a summary of what is known: the biochemistry of these mutagens is very simple. These enzymes remove an amino group from cytidine in single-stranded DNA, creating uracil, and after some DNA repair and replication reactions this leads to C-to-T and C-to-G mutations. This happens at cytosines preceded by thymine and followed by adenine or thymine, the TCW motif. This is known from the biochemical specificity of a subclass of APOBEC enzymes. The APOBEC gene family is extended, but six of its enzymes, this one plus five of the tightly linked APOBEC3 genes, a group of closely located genes, have this TCW signature. Their function, at least their assigned function, is innate immunity through restriction of retroviruses and retrotransposons. But accidentally they get access to chromosomal DNA and cause hypermutation. APOBEC3B and APOBEC3A are the current prime suspects, but without going into details, I can say that the jury is still out. Another summary statement I want to make, to stress again, is that APOBECs are the only strong endogenous carcinogen known so far. We conclude this because, looking at cancer mutations, we see that genes considered important for carcinogenesis are not avoided by APOBEC mutagenesis.
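The TCW rule described above is simple enough to state as code. Below is a minimal Python sketch that classifies one single-base substitution from its reference trinucleotide on the plus strand. A C on one strand is a G on the other, so a G-to-A or G-to-C change in a WGA context also counts. The function names and input format are hypothetical and chosen only for illustration. This is not the speaker's published annotation pipeline.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_apobec_signature(context: str, alt: str) -> bool:
    """Hypothetical helper: does a substitution match TCW -> TTW or TGW?

    context: reference trinucleotide on the plus strand, mutated base in the middle.
    alt:     the new base on the plus strand.
    W means A or T.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or context[1] == alt:
        return False
    # A mutated G on the plus strand is a mutated C on the minus strand,
    # so read the event from the minus strand instead.
    if context[1] == "G":
        context, alt = reverse_complement(context), alt.translate(COMPLEMENT)
    if context[1] != "C":
        return False
    # Preceded by T, followed by A or T, and changed to T or G.
    return context[0] == "T" and context[2] in "AT" and alt in "TG"


print(is_apobec_signature("TCA", "T"))  # True
print(is_apobec_signature("TGA", "A"))  # True: TCA -> TTA on the minus strand
print(is_apobec_signature("TCA", "A"))  # False: C-to-A is not part of the signature
```

The motif is only two and a half nucleotides, and it covers two of the three possible substitutions. As the speaker notes, matching mutations can therefore also arise by chance, so a per-mutation flag like this is most informative in samples that are enriched overall.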
The analysis behind that conclusion is not shown here, but it's in our paper that was published about a year ago. Three groups, including ours, contributed significantly to this understanding. As an example of why APOBEC mutagenesis is of interest to TCGA: it can be really overwhelming, at least in some cancer types. Here the black sector is just nine bladder carcinoma samples that are not enriched with APOBEC mutagenesis. The remaining 130 samples are enriched, and the fold enrichment can be up to five-fold compared with random expectation. Up to 70% of the mutations in an exome can carry the strict, stringent APOBEC signature. Now a little bit about where we came from: right here is a picture of yeast. This is how we got involved, by studying mutagenesis in single-stranded DNA that we artificially created in yeast at double-strand breaks and uncapped telomeres. Then we moved to uncoupled replication forks, pursuing the hypothesis that single-stranded DNA must be hypermutable in response to any kind, or at least many kinds, of DNA damage. Lesions in double-stranded DNA can be repaired, but in single-stranded DNA they persist. After the DNA is restored to double strands, translesion synthesis, which is often error-prone, creates mutation clusters, and we found such clusters in our artificial systems. Since a mutagen often has its own mutation signature, the clusters have a uniform mutation signature; in the ideal world they can even be completely strand-coordinated. So this blue can be, for example, only mutations in cytosine. Then I kept suggesting to people in the cancer world: let's look at cancers, let's look at cancers, maybe there are some clusters there. It turned out that we ended up doing it ourselves, and we found mutation clusters in several whole-genome-sequenced cancers.
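Below is a minimal Python sketch of the clustering idea, assuming a simple distance rule. Consecutive mutations on the same chromosome that are at most `max_gap` base pairs apart are grouped together. A cluster counts as strand-coordinated when all its mutated reference bases are C, or all are G. The 10 kb default, the minimum size of two and the `Mutation` record are illustrative assumptions. A real analysis would also need a statistical test that a cluster is unlikely to arise by chance, and this sketch omits that test.

```python
from dataclasses import dataclass


@dataclass
class Mutation:
    """Hypothetical minimal record for one single-base substitution."""
    chrom: str
    pos: int  # 1-based genomic position
    ref: str
    alt: str


def find_clusters(mutations, max_gap=10_000, min_size=2):
    """Group mutations whose consecutive neighbors on the same chromosome are
    at most max_gap bp apart; keep groups with at least min_size mutations."""
    clusters, current = [], []
    for m in sorted(mutations, key=lambda x: (x.chrom, x.pos)):
        # Start a new group on a chromosome change or a gap that is too large.
        if current and (m.chrom != current[-1].chrom or m.pos - current[-1].pos > max_gap):
            if len(current) >= min_size:
                clusters.append(current)
            current = []
        current.append(m)
    if len(current) >= min_size:
        clusters.append(current)
    return clusters


def is_strand_coordinated(cluster):
    """True if all mutated reference bases are C, or all are G, so every change
    could have occurred on the same DNA strand."""
    refs = {m.ref.upper() for m in cluster}
    return refs in ({"C"}, {"G"})


mutations = [
    Mutation("chr1", 100, "C", "T"),
    Mutation("chr1", 2_000, "C", "G"),
    Mutation("chr1", 5_000, "C", "T"),
    Mutation("chr1", 50_000, "G", "A"),
    Mutation("chr2", 10, "C", "T"),
]
clusters = find_clusters(mutations)
print(len(clusters), is_strand_coordinated(clusters[0]))  # 1 True
```

Sorting by chromosome and position first means a single linear pass is enough to form the groups.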
So every line here is a mutation cluster formed only by cytosines, and you can see that it can be a stretch of up to 12 kilobases with mutations only in cytosines. If we look further, most of these cytosines are preceded by thymines and followed by either thymine or adenine, TCW. Even more importantly, they are C-to-T and C-to-G mutations, with a very small number of C-to-A. So it's an exact match to the APOBEC signature. So there is something in that very cancer we are looking at that can produce this mutagenesis. If this mechanism works in clusters, we asked: does it work across the whole cancer exome or genome? Because we were starting from a specific hypothesis, we were able to formulate a very stringent statistical hypothesis. In fact, our paper came out in parallel with the paper on breast cancers from the Stratton group. They did de novo pattern recognition and came up with mutation signatures that include the very stringent APOBEC signature. But those signatures can be identified only in a group of cancers. In our case, because the hypothesis is so stringent, we can produce sample-specific p-values, and this works even for exome MAFs. It appears, at least to me, that this may be of value to this community. Importantly, our output, the enrichment, which is sample-specific, correlates perfectly with the output of NMF, non-negative matrix factorization, for groups of cancers. So now, some examples of where these sample-specific p-values and the sample-specific APOBEC annotation can be useful in the analysis of TCGA. First, and this is published, we identified several types of cancer heavily enriched with the APOBEC mutation signature: cervical, bladder, head and neck, breast, and two subtypes of lung. The color code here: the black sector is always a Q-value greater than five percent, so everything that is not black has a Q-value, after multiple testing correction, of less than five percent.
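The talk describes the enrichment and the sample-specific p-values only in words, so the following is a minimal Python sketch under stated assumptions:

- Enrichment is taken as the fraction of mutated cytosines (C-to-T or C-to-G, counted on both strands) that fall in TCW motifs, divided by the fraction of cytosines in the surrounding sequence context that are TCW. How that context is defined is left to the caller.
- The p-value is a one-sided Fisher's exact test on the corresponding 2x2 table.
- Q-values use the Benjamini-Hochberg procedure, which the talk does not name.

These choices, the function names and the example counts are illustrative and not necessarily the published procedure. If SciPy is available, `scipy.stats.fisher_exact(table, alternative="greater")` should give the same one-sided p-value.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]:
    P(top-left cell >= a) under the hypergeometric null with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_total = log_comb(n, row1)
    p = sum(
        exp(log_comb(col1, x) + log_comb(n - col1, row1 - x) - log_total)
        for x in range(a, min(row1, col1) + 1)
    )
    return min(1.0, p)  # guard against rounding just above 1


def apobec_enrichment(mut_tcw: int, mut_c: int, con_tcw: int, con_c: int):
    """Assumed definitions (counts on both strands):
    mut_tcw: C-to-T or C-to-G mutations at TCW motifs
    mut_c:   all C-to-T or C-to-G mutations
    con_tcw, con_c: TCW motifs and cytosines in the sequence context
    Returns (fold enrichment, one-sided p-value for enrichment)."""
    enrichment = (mut_tcw * con_c) / (mut_c * con_tcw)
    p = fisher_greater(mut_tcw, mut_c - mut_tcw, con_tcw, con_c - con_tcw)
    return enrichment, p


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (Q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Illustrative counts only, not data from the talk.
enrichment, p = apobec_enrichment(mut_tcw=60, mut_c=100, con_tcw=200, con_c=1000)
print(enrichment)  # 3.0
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```

Under these assumptions, a sample would be called APOBEC-enriched when its Q-value is below 0.05, matching the five percent threshold behind the color code. Working in log space keeps the hypergeometric terms finite even for exome-scale context counts.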
Now we can see that six cancer types are enriched. We also found that some cancers show no APOBEC enrichment in whole-genome analysis, yet they contain mutation clusters that are very heavily enriched with the APOBEC signature. This is why I put up this statement: it probably occurs in the background everywhere, but for some reason it goes up only in some cancers. Either there is more mutagen, or there is more substrate for the mutagen, which is single-stranded DNA. Our DNA is normally double-stranded and becomes single-stranded only transiently, during replication or around some unusual events. We can also see that some types of cancer, in this case endometrial, have a small number of samples enriched with APOBEC mutagenesis. Now let's zoom in. Among these endometrial carcinomas, only very few samples are heavily enriched with APOBEC mutagenesis. Interestingly, and this is unpublished, we just noticed it, most of them fall into the serous subtype. I don't know how much weight we should give it, but it's formally statistically significant. It's probably not enough to make a firm statement; we, or TCGA, need more samples. But this may really be a rare situation, or it may say something about how these endometrial samples are classified. Another published result: we looked at the subtypes of breast cancer from the TCGA marker paper. These are expression-based subtypes, and we used the four subtypes that were present in sufficient numbers.
So again, we split the APOBEC signature by subtype, which we can do because we can identify each sample that is enriched with APOBEC mutagenesis. We see that the HER2-enriched subtype has a higher proportion of APOBEC-mutated samples. One more example, of a different kind: we can actually separate non-APOBEC mutagenesis from APOBEC mutagenesis, because we annotate each mutation as having the APOBEC signature or not. Of course, this makes sense only if a sample is enriched with APOBEC mutagenesis, because the APOBEC signature can also occur through random mutagenesis. The motif is just two and a half nucleotides and two out of three types of base substitution. But in bladder cancer practically all samples were enriched with APOBEC mutagenesis. There was also an observation in bladder cancer that ERCC2, an excision repair gene, is one of the significantly mutated genes. Our group joined the bladder cancer marker working group late, so we didn't have time to reshuffle all the supplemental information to include this analysis. But when we take out the APOBEC mutations, the gray part of this diagram, and leave only the black mutations, which are non-APOBEC, ERCC2-mutated samples clearly have a higher mutation load, or, as cancer biologists call it, a higher mutation rate. So this is another utility, and we would be happy if the community started using it. As for our TCGA-related efforts, we are trying to contribute to the cancer-specific analysis working groups. We are now working in collaboration with Gad Getz and his group to integrate analysis of APOBEC mutagenesis, the way we see it, directly into cancer exome MAF analysis within Firehose. We also plan to analyze updated and new TCGA exomes and MAFs ourselves, or simply to provide this for the community. With this, I want to mention one thing. Well, being at first confused by cancer data but then getting excited, I realized that there is value
in our mechanistic studies. We create stringent statistical hypotheses from the mechanistic field, and that helps to understand mechanisms in cancers, but at the same time it helps us understand mutagenic mechanisms. What is important is that these two fields start to speak a common language. Maybe they could also adjust the format of the questions and of the data outputs, so that we begin to understand each other. I spend most of my time understanding what's there in the cancer field, because then things become simple. I also spend a lot of time explaining, because I'm explaining in my own language. We just need to meet more often and organize joint conferences. On this, I would like to end by thanking all my collaborators; shown in red are our collaborators who are on our poster and on this specific work. Thank you very much.

Yes. So I'm wondering if you've done any sort of co-occurrence analysis with APOBEC, or mutual exclusivity analysis, to see what other aberrations might or might not be occurring with it?

So the answer is yes and yes. First, APOBEC mutations, and even more so APOBEC mutation clusters, colocalize with chromosomal aberrations, with breakpoints. Now, there is another angle to the same question: are there mutations caused by APOBEC that co-occur? This is an excellent idea. Single-stranded DNA may be spread all over the genome, but it comes in small pieces, so co-occurrence would be highly likely for some APOBEC-induced mutations. That is a small fraction of the cell cycle and a small fraction of the genome, but the mutation density there is 10,000-fold higher than in the rest of the genome. So there will be such co-occurrence. One person I was trying to convince that this is an exciting idea is in the audience; we talked today. If somebody else wants to take this idea and try it, we are happy to provide APOBEC annotation of MAFs and of APOBEC-enriched samples. Thank you.

Thank you. Thank you. Okay, I'd like to welcome our last speaker of the session, Matthew
Wyczalkowski from Washington University School of Medicine. He'll talk about integration of multiple data types for the genomic characterization of virus-associated tumors. Please.