Well, I think we can start. I want to welcome you all to this SIB Virtual Computational Biology Seminar Series. Today, we have the pleasure of hosting Gunnar Rätsch, who is professor of biomedical informatics at the computer science department of ETH Zurich. Gunnar earned his PhD at the German National Laboratory for Information Technology, where he also did a postdoc. Then from 2005 to 2011, he led a group on machine learning in genome biology at the Friedrich Miescher Laboratory in Tübingen. In 2012, he joined the Memorial Sloan Kettering Cancer Center as associate faculty. And in May 2016, he and his group moved to Zurich to join the computer science department of ETH Zurich, where he still is. His group's research lies at the interface between methods research in machine learning and sequence analysis and relevant application areas in biology and medicine. Current research focuses include large-scale machine learning, accurate transcriptome reconstruction, identification of RNA processing regulators, developing clinical decision support systems, and developing methods and resources for international sharing of genomic and clinical data. The Rätsch laboratory has extensive expertise in analyzing RNA-seq, exome, and whole-genome data and has contributed to major discoveries in RNA processing regulation, RNA alterations, and their relevance in cancer. So today, Gunnar will tell us more about these novel approaches to identify, understand, and take advantage of RNA alterations in human cancers. Thank you again, Gunnar, for accepting our invitation. The floor is yours.
Thank you so much for the very nice introduction. It's a great pleasure to be here and to talk about our recent work. Some of that work started while I was at Memorial Sloan Kettering Cancer Center in New York, and we are about to finish some of it. I'm happy to talk about this here.
OK, so let me start with some more general motivation before I go to specific topics. I think it's clear that some pieces of medicine are relatively imprecise. There are medications on the market which make a lot of money but which are not helpful for some of the patients who take them. What you see here is a figure from 2015. For instance, the first drug, which has the largest market share and income, only benefits one-fifth of the patients who actually take it. For the next drug, it's one out of 20 people who benefit, and 19 people don't. So there's a great imprecision to some parts of medicine. And I think we have the understanding today that phenotype, namely whether the drug works or not, is influenced by a genomic part, but also by lifestyle, the environment, and so on. We are a complex system, and it's influenced by many different factors. For the genomic part, there's a promise that genomics actually helps to understand when certain drugs work, when certain treatments work, and so on. And there are some initial successes of genome-based medicine. One is in pharmacogenomics, where you see certain mutations in the genome which lead to a different binding pocket. If you have that mutation, then this may change the efficacy of a specific drug, for instance, here with clopidogrel. Another case is rare diseases. Here's an example of a 20-month-old girl who has a rare neurodegenerative disease and has a strange gait, impaired vision, and so on. Only by exome sequencing was it found that this girl had a deficiency in a vitamin B transport mechanism. This revealed how to treat this patient, in this case by giving more vitamin B. And this, I think, almost completely cured that patient. Another example is cancer treatment, where we look at specific mutations which may be common even across different cancer types.
And if you have a certain mutation, then you can treat patients for these specific mutations with specific drugs. So there is a great need for large-scale clinical genomics. I think it's clear that most people who have rare diseases will get their genome sequenced relatively soon. It will become common practice to do so, and similarly for patients who have a cancer: their cancer genomes will likely be sequenced very soon. The estimates are that within five years, probably, 15% of the developed world population will be sequenced, either for rare diseases, or for cancers, or for some other reason. Just thinking about the data sizes which are involved here, this is quite drastic. Maybe a cancer genome is 300 gigabytes total in sequencing, and a germline genome maybe 90 gigabytes in sequencing. If you multiply this by 150 million people, that is about 28 exabytes of data. If you put it all in one place, that would be roughly the size of what Google stored maybe a few years ago. So it's quite large. That needs new algorithms, and it needs new concepts for how to work with these data sets, how to distribute them, and so on. And we did some work, which I can't go into in much detail here, on genome graphs, where we can actually collapse a lot of the genome sequences, which are very similar, into one data structure to more efficiently access and store such large data sets of genomes. OK, but the genomics part is just one part. We have the other challenge, that we have the lifestyle, the environment. We have clinical data, which we also have to capture. And a few years ago, or maybe 20 years ago, the data looked like this. It was in non-digital form. Nowadays, the data is maybe in digital form, or mostly digital form, but it is in some databases. And these databases are hard to access, and they are not designed for research purposes. They're designed for clinical purposes.
So it's really hard to actually get the data out for research purposes. These databases are not fast enough, they don't deliver what we actually need, and so on. What we really would like to have is data in a somewhat more intelligent form and a much more accessible form in the future. Obviously, this data is already used in clinical studies, and these clinical studies are needed in order to show the efficacy of certain treatments. Usually, what happens is that you have this institutional database. You have a patient cohort, and then you have people who go into this database and read off what's stored there, read the medical documents, and extract certain pieces of information which are needed for that clinical trial. Then they enter this information into the clinical research database. At a hospital like MSKCC, there are lots of clinical trials going on, and there are 500 full-time employees who do this all the time. That's why there is this little cubicle farm here, right? Because it's really many people who are involved in extracting information out of electronic health records. From a computer science perspective, it is totally inefficient and totally unacceptable to go this way. And I think we have to solve this as computer scientists. So I think there are quite a few data science challenges which we have to solve at medical centers. Just thinking about efficient search and information retrieval, we know how to do this in principle. Google can do this, right? But it's not done on medical records. Right now, it's very hard to visualize something like a patient's electronic health record in the way we can visualize, for instance, a piece of genome, with a display of quite complex data. We cannot yet do this as well with patient data.
And obviously, we need high-performance data access and computing within the hospital: efficient data structures, scalable computing, all within the hospitals. And we need some data science mindset, I think, also in the hospital. This is a long way we still have to go, but I think that's where we have to go eventually, right? So I think the challenges which we have to solve as researchers, and maybe also as people who push research into hospitals, are these. We have to develop, of course, new data science approaches for medical data. So maybe we need new methods for medical data. But also, we have to provide tools for the community so the hospital can actually use them, reproduce some of our analyses, and use our tools. Then obviously, we have to solve biomedical problems. So solve cancer, or solve a specific part in that quest, and usually do this through collaboration. But I think a big challenge is also to create an environment which allows us to do this, where we can actually solve these first three challenges. So an environment where the medical data is in one place, where the researchers are in one place, and everything can come together. That's, I think, an important step, an important goal. And that's one goal which is attacked by the Swiss Personalized Health Network and by the Data Coordination Center: to create a network of compute systems which are interacting with each other. These compute systems provide means to bring in hospital data, to bring in research data, and to perform the research on that data within a secure IT environment. There are people in this room who are involved in providing this kind of platform. I think this is a very important effort, without which it's going to be very hard to do any meaningful research on medical data and genomic data. OK, so this is ongoing. The SPHN network started last year. The first projects start this year. And we will see in the future how much this will develop.
So now let's get closer to what my lab is doing. A very brief overview before I get into specific projects. As I mentioned, we work on data structures for genomics. We try to develop genome graph data structures so we can actually encode and store information about thousands or millions of genomes in an efficient way. We also work with clinical data. One project which we work on is on intensive care data. The data here is from the hospital. We get data from about 60,000 patients. And we would like to develop an early warning system for kidney failure or for heart failure, so you can actually predict maybe a few hours in advance that something is going to be wrong with that patient, and doctors are warned early about this. So we are interested in developing methods for heterogeneous biomedical data analysis for patient and disease modeling. And we work on cancer genomics. Most of what I'm going to talk about today is about cancer genomics and our cancer-related work. I will need to come back to talk about the other topics at another opportunity.
OK, so project one. This is a project I started when I was at MSKCC. At MSKCC, we had access to some medical data and to some genomics data from the same patients. Here, we performed a joint analysis of clinical notes and somatic mutations. A clinical note looks like this. It has different sections, for instance, chief complaint, history of present illness, and so on. And then there's a summary of what the doctor wrote at a certain point about this patient. In this case, we had about 2 million of these electronic documents, from a total of 200,000 patients. Now, the question is, how do we extract information from this? Because it's quite heterogeneous. Every doctor writes something different. We had to come up with some ideas, and we came up with this relatively simple idea, which I'm happy to explain here. We thought a sentence is a good unit of information.
A sentence is a statement. It says something about the patient. It's relatively short. It's not too heterogeneous. And we thought we could just analyze sentences. It has limitations, of course. But if you just count how many different sentences we have, we have about 100 million sentences, which we found in these 2 million documents. What we did is we removed stop words, which occur a lot. Then we took these sentences and thought that sentences which share similar words probably mean the same thing. So we clustered these sentences. The idea is that each of these clusters probably contains a lot of sentences which mean roughly the same thing, and this is a kind of description of a phenotype or of the medical condition of that patient. Here's just a small piece of the graph which we generated. Each dot here is one sentence. The colors are different clusters. This is just one little piece of a big graph where the nearest neighbors of sentences are connected to each other, just to give you an idea how similar or how different these are. In the end, we chose, I think, 10,000 cluster centers. We essentially reduced these 100 million sentences to 10,000 clusters. And then we had for each patient a description of whether they had that sentence or they didn't have that sentence. So it's a binary vector. That is something which we can then analyze and connect to genomic data, which we also had for these patients. In the case of MSKCC, there is data from tumor sequencing. There's a 1,000x sequencing of about 400 genes. So 342, I think; it got a bit larger by now. And about 100x of germline genome. So we got a list of somatic mutations for a subset of the genome. The idea was, can we connect the data which we get from the text and associate that with the genetic information, maybe conditioning on some clinical variables, like which cancer type and so on, so we get more interesting correlations.
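The sentence-to-vector idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in the study: the stop-word list, the Jaccard word-overlap similarity, the 0.3 threshold, and all function names are hypothetical, and the cluster prototypes are taken as given rather than learned from 100 million sentences.

```python
# Hypothetical stop-word list; the talk does not say which words were removed.
STOP_WORDS = {"the", "a", "an", "of", "in", "was", "is", "he", "she", "which"}

def tokens(sentence):
    """Lower-case the words, strip trailing punctuation, drop stop words."""
    words = (w.strip(".,;:").lower() for w in sentence.split())
    return frozenset(w for w in words if w and w not in STOP_WORDS)

def jaccard(a, b):
    """Word-set overlap, used here as an assumed stand-in for sentence similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_cluster(sentence, prototypes, min_sim=0.3):
    """Index of the most similar prototype, or None if none reaches min_sim."""
    toks = tokens(sentence)
    best, best_sim = None, min_sim
    for i, proto in enumerate(prototypes):
        sim = jaccard(toks, tokens(proto))
        if sim >= best_sim:
            best, best_sim = i, sim
    return best

def patient_vector(sentences, prototypes):
    """Binary vector: entry i is 1 if any sentence of the patient falls in cluster i."""
    vec = [0] * len(prototypes)
    for s in sentences:
        idx = assign_cluster(s, prototypes)
        if idx is not None:
            vec[idx] = 1
    return vec

prototypes = [
    "He underwent colonoscopy which revealed a polyp in the colon",
    "Patient reports shortness of breath",
]
notes = ["She underwent a colonoscopy, which revealed a polyp."]
print(patient_vector(notes, prototypes))
```

Each patient then becomes one row of a patients-by-clusters binary matrix, which is the form that can be tested against per-gene mutation status.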
And here's just one basic result. This is still somewhat preliminary. Here you have the gene. Here you have the sentence prototype, which describes this cluster center. For instance, "he underwent colonoscopy, which revealed a polyp in the colon." The presence of that sentence is associated with a somatic mutation in APC, which is not totally surprising. We found some of these cases which we could easily validate in the literature; they have been published. In some cases, the findings were much more recent or have not been published yet. So we see this as a great source of hypotheses, which we can then go and discuss with the doctors to see whether they are meaningful, something which maybe helps the treatment of the patients in the end. This is ongoing work, but it's one way in which we connect clinical data with genomic data. I will take questions at the end.
The next two projects which I'm going to describe are mostly on genetic data, where we don't have clinical data. These are from large cancer genomics projects. This one is from TCGA, The Cancer Genome Atlas. This is a large US project from the National Cancer Institute. It has about 12,000 donors. It's about one petabyte of data. It has exome data, RNA-seq data, some whole genome data, and a few other modalities. And the second project which I'm going to describe is a subpart of ICGC. That's an international effort. It includes at least 20 countries, 25,000 donors, and also about one petabyte of data in total. So there are probably hundreds of groups generating this data and also analyzing it. And you could say, well, why do we look at this? Probably everything has been done by now. But we have certain ideas how to analyze this data, and we are particularly interested in transcriptome changes. My group has been working on RNA analyses and RNA splicing.
And our view on this data was, can we see anything in the transcriptome which is maybe cancer-specific, and how can we exploit that? That hasn't been done so much. People have mostly looked at somatic mutations themselves. And these two projects which I'm going to describe look at two different aspects of that.
The first one is really about alternative splicing. Here, we look at cancer-specific splicing and its implications in about 8,500 samples. We wanted to find cancer-specific splicing patterns, differences between cancer and normal. And we also were interested in identifying genomic variants regulating splicing in cis or in trans, so we learn something about gene regulation. And, as you will see, we also thought about whether this can be used for immunotherapy. We need two data types for this. We need RNA-seq data to describe splicing, gene expression, and so on. And we used exomes, which describe the variants near the exons. So typically not deep into the intron, but they describe the variants next to the exons and maybe a little bit into the intronic regions. And these are the interesting regions when you look into splicing. So it's a good data set to look at, and it's a relatively large data set. It's an association study with 8,500 samples. It's one of the biggest data sets you can look at which is relatively homogeneous. Okay, so it's a relatively complex project, I would say. We had the PanCanAtlas data, which is, I think, 30-ish cancer types. Each of the cancer types needs to be processed in some uniform way. We analyzed the RNA in a uniform way. We looked at the exome data. We did the variant calling in a certain way. And everything was processed in a uniform way. Then we performed different analyses. We performed the splicing QTL analysis, which I'm going to describe, and some differential splicing analysis.
Then we identify neo-introns, introns which are very cancer-specific, which you can only find in cancer. And then we show that these introns generate peptides, which can be validated by mass spec and which may be used for immunotherapy in the end. I will get to that. So there's a quite complicated data flow, and it's quite massive in scale to do this. Okay, so the main tool which we use is a tool for the analysis of alternative splicing. It's called SplAdder. The idea is that you look at an existing annotation and at the alignment of RNA-seq reads. It takes any alignments, and it takes an existing annotation, which may not be perfect. It uses that annotation and extends it with additional connections between exons which we find in the aligned reads. So there's a read which starts on one exon and goes into the next exon, and we see, oh yeah, that gives evidence for an intron which connects these two exons. It also detects new exons and new introns which have not been annotated before. So it finds novel splicing events and can also quantify existing ones. In the end, you detect splicing events, you quantify splicing events, and you can also do some differential analysis, which is optional. For this analysis, we stopped at quantification. We ran this tool on all of the 8,512 samples and quantified splicing. So you get essentially a quantification for each splicing event: is the exon included or not included? This gives you a vector for each sample, and then you can visualize this vector, for instance, by t-SNE. Each dot here is one sample, and you can see that there are major differences between the cancer types, which is not totally surprising. If you did the same with gene expression, you would find a similar effect. So for instance, here you find the breast cancers, here you find the ovarian cancers, and so on.
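To make the "included or not included" quantification above concrete, here is a minimal sketch of a percent-spliced-in (PSI) value for a single exon-skip event, computed from junction-spanning read counts. The PSI formula is a common convention, not something stated in the talk, and the function name, arguments, and coverage threshold are hypothetical; this is not the splicing tool's actual interface.

```python
def exon_skip_psi(incl_left, incl_right, skip, min_reads=10):
    """PSI for one exon-skip event from junction read counts.

    incl_left / incl_right: reads spanning the two junctions that include the exon.
    skip: reads spanning the junction that skips the exon.
    Returns None when total evidence is below min_reads (an assumed cutoff).
    """
    inclusion = (incl_left + incl_right) / 2.0  # average the two inclusion junctions
    total = inclusion + skip
    if total < min_reads:
        return None
    return inclusion / total

# One vector entry per event; the sample's vector is the list of PSI values.
sample_events = {"event_1": (30, 30, 10), "event_2": (2, 2, 1)}
sample_vector = {name: exon_skip_psi(*counts) for name, counts in sample_events.items()}
print(sample_vector)
```

Stacking such vectors over all samples gives the samples-by-events matrix that can be fed to a dimensionality reduction method for visualization.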
But you can also look in more detail at different subtypes of cancers. For instance, on the right, in panel D here, you see the subtypes of breast cancer. So the basal is here, the luminal A and B I think are here, and this I believe is the HER2 class. No, I'm not sure. So luminal A is here, luminal B is this one, and this one maybe here, right? And I think these are the normals, if I see it correctly. So you can actually see a good spread of the subtypes. You could actually say for each sample, just by looking at splicing, which subtype it is. So that's quite good. But it's fully descriptive. There are limited things you can learn from this. So we were interested in quantifying how much novel splicing there is, and also whether there is more splicing in some cancer types than in others. For this, we did one analysis where we normalized for the different number of samples we have for each cancer type. Obviously, when you have more samples in one cancer type, you find more splicing. So what we did here is we just randomly chose 40 samples for each of the cancer types. Now it's comparable. This is the number of events for each of these cancer types. For some cancer types, you find a very large number of splicing events. These are exon skips. So you find 40,000 exon skips for this cancer type. And this dark bar here, that's the number of exon skips which have been annotated before in the annotation. So it's actually a pretty small part which is annotated, and 80% is new. But there's very good evidence for it. We see it in multiple samples. We see it with many reads. And there are high confirmation rates for these splicing events. So there's a lot of new splicing going on. It is highly variable across cancer types. And we thought, OK, that's interesting. What can we do with it?
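The sample-size normalization described above, drawing 40 samples per cancer type before counting events, can be sketched as follows. The data layout (a dictionary of per-sample event sets), the function name, and the choice to skip types with fewer than 40 samples are assumptions for illustration.

```python
import random

def events_per_type(samples_by_type, n=40, seed=0):
    """Count distinct splicing events per cancer type on an equal-sized subsample.

    samples_by_type: {cancer_type: {sample_id: set_of_event_ids}}
    Types with fewer than n samples are skipped (an assumed policy).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    counts = {}
    for ctype, samples in samples_by_type.items():
        if len(samples) < n:
            continue
        chosen = rng.sample(sorted(samples), n)
        counts[ctype] = len(set().union(*(samples[s] for s in chosen)))
    return counts

# Toy data: type "A" has 50 samples, each with one shared and one private event.
toy = {
    "A": {f"s{i}": {"shared", f"private_{i}"} for i in range(50)},
    "B": {f"s{i}": {"shared"} for i in range(10)},
}
print(events_per_type(toy))
```

Without the subsampling step, a type with more samples would accumulate more distinct events purely because more samples were observed.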
And what we also see is that there's more alternative splicing going on when you compare cancer samples with normal samples. For instance, here you have the number of alternative splicing events on average for the tumor samples versus the normal samples. For some of the cancer types, there's a big difference in alternative splicing between tumor and normal. For instance, for LUAD, lung adenocarcinoma, or lung squamous, there are quite big differences in the amount of alternative splicing which we see. And in some, maybe thyroid cancer, there's not such a big difference. So the hypothesis is that the splicing mechanism is somewhat disrupted and this leads to additional splicing events. That's, I think, one interpretation of this data. So we're interested in finding genetic reasons why certain splicing happens. You can do this with an association study, where you look at the correlation of a somatic mutation with a splicing event somewhere else in the genome. For this, you usually use generalized linear models, which I don't want to explain in detail here. This is the somatic change here, and we try to explain the splicing change. And there are some additional factors, confounding factors, which we try to factor out wherever possible. We did this in cis and in trans. In trans, this is actually quite a bit of compute which we have to do. We have to correlate, essentially, every position in the genome with every splicing event which we find. For somatic mutations, we have thousands of them, maybe tens of thousands. For splicing events, we have hundreds of thousands, and we have to see whether there's a correlation between these two. After proper filtering and a lot of manual work to get this clean, we find, first, not surprisingly, that there's a diagonal, right? Because you have a mutation which is next to an exon, and that exon may be affected in its splicing.
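As a rough illustration of testing one mutation against one splicing event while factoring out a confounder, the sketch below regresses a single covariate out of both variables and correlates the residuals. This is a simplified stand-in, not the generalized linear model used in the study; the function names and the single-covariate restriction are assumptions.

```python
from math import sqrt

def mean(xs):
    return sum(xs) / len(xs)

def residuals(y, x):
    """Residuals of y after least-squares regression on one covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx else 0.0
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

def pearson(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0

def partial_association(mutated, psi, covariate):
    """Correlation between mutation status and splicing level after
    removing the linear effect of the covariate from both."""
    return pearson(residuals(mutated, covariate), residuals(psi, covariate))

# Toy data: splicing depends on the mutation (0.5) and on a confounder (0.1).
mutated = [0, 0, 1, 1, 0, 1, 0, 1]
covariate = [0, 1, 0, 1, 0, 1, 1, 0]
psi = [0.2 + 0.5 * m + 0.1 * c for m, c in zip(mutated, covariate)]
print(round(partial_association(mutated, psi, covariate), 6))
```

In a trans scan, a test like this would be repeated for every mutation and event pair, which is why the compute grows with the product of the two counts and why multiple-testing correction and filtering matter.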
So this is this diagonal. But you also see kind of stripes here, right? This means that there is a somatic mutation in a certain place, for instance, here in chromosome 2, and this affects splicing in many different places across the genome. Some of these cases are known. For instance, I think this stripe here is SF3B1, which is a splicing factor. You have a mutation in the splicing factor and this leads to splicing changes all over the genome. And there are cases where it's not known, right? Here's the count of how many events a somatic mutation actually affects. For SF3B1, maybe you have 500 targets where we see a splicing change when you have a somatic mutation of a certain type. But there are other cases which are not known, which are not even RNA binding; it may be a secondary effect. So there's a lot of food for thought in how to interpret this. But I think we have done the study here quite thoroughly, and I believe these stripes are probably really there. Maybe not totally surprising is the cis splicing effect: when you have a somatic mutation or a germline variant which is close to an exon boundary, then this can have an effect on the splicing of that specific exon. Here's a distribution of somatic or germline variants which we found associated. It shows that variants even maybe 100 nucleotides away can have a significant effect on splicing. It's not only the first two nucleotides of the intron, the GT or AG, which affect splicing; it's a maybe much wider region of the intron which is influencing splicing. Okay, so we looked at the data another way and asked whether there is something specific to cancer. We had this idea that maybe splicing is disrupted in some cancers, in some samples. And what we defined is a cancer-specific splicing burden. So how much additional splicing do we have in a specific sample which goes beyond normal splicing?
So essentially, we look at all the splicing which you see in normal samples and subtract it from the splicing which we see in a specific sample. We remove the annotated transcripts, we remove junctions which we see in GTEx, and then we only keep junctions which are commonly observed in PanCanAtlas. Then we count how often we see a certain junction for one sample. One dot here is one sample, right? And here the splicing burden is the number of additional introns which we find confirmed in that specific sample. If that number is very large, then there are many additional splice events going on, and hence the splicing is probably somewhat disrupted. What you can see here is, in purple, the tumor samples and, in green, the normal samples. You generally see that the normal samples are much less disrupted than tumor samples. So there is really something going on in tumors. We did this analysis across cancer types. Some cancer types seem to be very much affected and some cancer types not so much. But what I think is most exciting about it is that there are now specific splice junctions which are tumor-specific. They're specific to the tumor which you look at, and it turns out that some of these splice junctions actually recur in different samples. Here's an analysis where we look at the recurrence of alternative splicing in different cancer types. Each column here is one cancer type. These are tumors. These are normals. And this is GTEx. What you see as a color here is the recurrence, meaning how many samples have that specific splice change. There are some splice changes which are shared by 60% of all the cancer samples but by essentially none of the normal samples, and which are also not anywhere in GTEx. So that means this is extremely specific to cancer and it is highly recurrent.
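The burden and recurrence definitions above amount to set subtraction followed by counting. The sketch below assumes junctions are represented as (chromosome, donor, acceptor) tuples and uses a hypothetical minimum read count for a junction to count as confirmed; neither detail is given in the talk.

```python
def neo_junctions(sample_junctions, annotated, normal_junctions, min_reads=3):
    """Junctions in one sample that are neither annotated nor seen in normal tissue.

    sample_junctions: {junction: read_count}; a junction is (chrom, donor, acceptor).
    The splicing burden of the sample is the size of the returned set.
    """
    background = annotated | normal_junctions
    return {j for j, n in sample_junctions.items()
            if n >= min_reads and j not in background}

def recurrence(neo_by_sample):
    """Fraction of samples in which each neo-junction is observed."""
    counts = {}
    for junctions in neo_by_sample.values():
        for j in junctions:
            counts[j] = counts.get(j, 0) + 1
    n = len(neo_by_sample)
    return {j: c / n for j, c in counts.items()}

annotated = {("chr1", 100, 200)}
normal = {("chr1", 300, 400)}
tumor_1 = {("chr1", 100, 200): 50, ("chr1", 300, 400): 20,
           ("chr1", 500, 600): 5, ("chr1", 700, 800): 1}
neo = {"tumor_1": neo_junctions(tumor_1, annotated, normal), "tumor_2": set()}
print(len(neo["tumor_1"]), recurrence(neo))
```

Here the burden of `tumor_1` is 1: two junctions are in the background and one has too few reads to count as confirmed.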
That matters because this is something which we can actually target with immunotherapy, if we have a vaccine or a therapy directed against this specific peptide. And if it is highly recurrent, then it would be effective for many patients. Okay, so now the question is, are these peptides really there? We have these splice junctions which are new, right? They connect one exon to another, and this is a new combination which is not seen normally. This generates a peptide. It's different from peptides which are generated by somatic mutations. A somatic mutation would maybe be within an exon and change maybe one nucleotide, and this changes the peptide. But a splicing change introduces a new intron and a new connection between two exons, and that generates new peptides which have not been seen before at all. So we generate a list of all these peptides. We do MHC binding prediction and only let the strongest binders pass. Then we look into mass spec data, which has also been obtained for some subset of the TCGA data as part of the CPTAC project, and check whether we can find the peptides which we predict by splicing. In order to compare it against something, we also did the same for the germline or somatic mutations. We also checked whether we find those. So we can actually compare the number of peptides which are generated by splicing with the number of peptides which are generated by SNVs. What I think is most interesting is here, where we look at ovarian cancer, breast cancer, and colorectal cancer. Each dot is one sample, and this is the number of splicing peptides which are predicted to be MHC binding and which we can confirm by mass spec, and we can compare this with SNV-generated peptides. In many cases, we find exactly zero SNV-generated peptides which are confirmed by mass spec and MHC binding.
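The step from a new exon-exon junction to candidate peptides can be illustrated with a small sketch: join the two exon sequences, translate, and keep only the k-mers that contain residues from both sides of the junction. This assumes upper-case ACGT input on the coding strand with the upstream exon starting in frame 0, and it does not include MHC binding prediction or mass spec matching; the function name and the default k of 9 are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def junction_kmers(upstream_exon, downstream_exon, k=9):
    """Peptide k-mers that span the junction between two joined exons."""
    seq = upstream_exon + downstream_exon
    protein = "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))
    protein = protein.split("*")[0]              # stop at the first stop codon
    last_up = (len(upstream_exon) - 1) // 3      # residue holding the last upstream base
    first_down = len(upstream_exon) // 3         # residue holding the first downstream base
    return [protein[s:s + k]
            for s in range(len(protein) - k + 1)
            if s <= last_up and s + k - 1 >= first_down]

# Toy exons: ATG GCC AAA | GGG TTT CCC translates to M A K | G F P.
print(junction_kmers("ATGGCCAAA", "GGGTTTCCC", k=4))
```

Only the junction-spanning k-mers are of interest here, because k-mers lying entirely within one exon are also produced by the normal transcripts of that gene.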
So there are actually many more of those here, and hence there's much greater potential to find something which you can use for targeted immunotherapy. Okay, so I think this is pretty exciting. The next steps are to show that splicing peptides are actually bound by T cells. We have to show that splicing peptides can be used with immunotherapy, that maybe the gene expression of those is high enough so that they are actually detected. Then we have to show that patients with a high splicing burden maybe respond better to checkpoint inhibitor therapies. So there's new work funded by SPHN and PHRT, together with Mark Rubin and George Coukos, and with George Coukos in particular to show whether these peptides are bound by T cells. Good, so I'm running a little low on time, so maybe I'll do a short version of this. I'll take questions later. Probably I'm too fast anyway.
So project three. This is a project which is more of a consortium work, where the whole consortium is probably 800 people. So there are 800 people who try to publish at the same time, which is about now. Some of that work will come out very soon on bioRxiv and hopefully later in Cell. Let's see. I'm co-leading the transcriptome working group, and this is one aspect of that work which I'm presenting here. We try to integrate diverse transcriptomic alterations to identify cancer-relevant genes and signatures. The idea is that there are many different ways to induce an oncogenic event or to disrupt a gene's function. For instance, for MET there could be a non-synonymous mutation in MET which leads to an activation. It could also be an alternative promoter which leads to a higher expression of MET. It could be a fusion which leads to a higher expression of the relevant part of MET, or it could be alternative splicing which changes some piece of the coding sequence, which then leads to an activation of MET.
Usually, people have looked only at the somatic mutations, to a certain extent at fusions, but much less at alternative promoters, alternative splicing, and so on. Part of that study is to analyze these different alteration types together and come up with a little algorithm which can find genes that are recurrently altered across multiple alteration types. That's the goal of that work here. As I said, ICGC is an international project. There are many countries involved, which I don't want to read out. And we get data from different heterogeneous sources. We get whole genome data on one hand and RNA-seq data on the other hand. The whole project has about 2,500 whole genomes. For this study, we could only use about half of them, because only some of them had RNA-seq data, and we needed RNA-seq data. We came up with a pipeline which was the result of much discussion. We ended up using TopHat2 and STAR. We joined these results and flagged genes where the results did not agree, and so on. That will be part of another paper. Then we looked at different alteration types. We looked at RNA editing, gene fusions, expression outliers, alternative promoters, alternative splicing, and allele-specific expression on the RNA side, and also DNA copy numbers and non-synonymous mutations on the other side. We tried to put them all in the same coordinate system so we can actually do some joint analysis, because RNA editing is quite different from, let's say, copy number variation. So we had to simplify that. We had to summarize alterations per gene. What we did is, for instance, for RNA editing or for fusions, we said this gene is modified if there is a fusion in that gene, period. So one, there's a fusion; zero, there's no fusion. Similarly with RNA editing: one if there is an RNA editing event which leads to a non-synonymous change.
Then copy number, we said, okay, it's one if the copy number is greater than four. So it's somewhat arbitrary. But we did the same with expression outliers. We say it passes the filter if the Z-score is larger than something. Okay, so these are more quantitative filters. Then, essentially, for each gene and each sample, we put these values into a vector, and then we put the vectors, essentially, into a cube. So here we have about a thousand samples. Here we have eight different alteration types, so six RNA-level alterations and two DNA-level alterations, and about 16,000 expressed genes. So this is like a three-dimensional matrix, and now we can slice this matrix in different ways. We can aggregate over the samples to do a recurrence analysis, or we could aggregate over the alteration types to maybe look at different pathways and how they're disrupted, and we can also aggregate over the genes to see whether there are samples which have specific patterns. Okay, and here's a rough summary. When you look at cancer types, you see big differences between alteration frequencies of the DNA pieces. This is for the different, the alternative promoters, expression outliers, and so on, and here DNA, and you see significant differences between the different cancer types. You see, for instance, in kidney, there are many more alternative promoters, and maybe in Lymph-BNHL, there's much more alternative splicing going on. So there are specific characteristics of specific cancer types. So that's good. You can also look at samples, I mean, there's a certain way to look at somatic alterations in cancers. So, for instance, we can look at mutations in PTEN, MTOR, PIK3CA, and so on, for instance, for kidney, and you see that certain samples, each column here is one sample, are mutated. So we find certain mutations for a subset of the samples.
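The per-gene binarization and the cube of samples, alteration types, and genes described above can be sketched roughly as follows. This is a minimal illustration on made-up toy data, not the consortium's pipeline: the gene list, the input arrays, the use of the absolute Z-score, and the Z-score cutoff of 3 are all assumptions (the talk only gives "greater than four" for copy number and "larger than something" for the Z-score), and only three of the eight alteration types are shown.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 genes. Names, values, and the
# Z-score cutoff are illustrative assumptions, not the study's settings.
genes = ["MET", "PTEN", "KLF13"]
alteration_types = ["fusion", "copy_number", "expression_outlier"]

# Raw per-sample, per-gene calls (rows = samples, columns = genes).
has_fusion = np.array([[1, 0, 0],
                       [0, 0, 0],
                       [0, 0, 1],
                       [0, 0, 0]], dtype=bool)
copy_number = np.array([[2, 2, 2],
                        [6, 1, 2],
                        [2, 2, 5],
                        [2, 2, 2]])
expr_z = np.array([[0.3, -0.5, 1.0],
                   [3.5, -0.2, 0.1],
                   [0.0, -4.0, 2.9],
                   [0.4, 0.2, 0.0]])

CN_THRESHOLD = 4    # "greater than four", as stated in the talk
Z_THRESHOLD = 3.0   # assumed cutoff; the talk does not give a value

# Binarize each alteration type and stack into a cube:
# axis 0 = samples, axis 1 = alteration types, axis 2 = genes.
cube = np.stack([
    has_fusion,
    copy_number > CN_THRESHOLD,
    np.abs(expr_z) > Z_THRESHOLD,  # assumption: outliers in either direction
], axis=1)

# Aggregate over samples: recurrence count per alteration type and gene.
recurrence = cube.sum(axis=0)        # shape (types, genes)

# Aggregate over alteration types: is a gene altered in a sample at all?
altered_any = cube.any(axis=1)       # shape (samples, genes)
samples_per_gene = altered_any.sum(axis=0)

# Aggregate over genes: per-sample burden for each alteration type.
burden = cube.sum(axis=2)            # shape (samples, types)

for gene, n in zip(genes, samples_per_gene):
    print(f"{gene}: altered in {n} of {cube.shape[0]} samples")
```

Counting a gene as altered when any alteration type is set, rather than summing across types, keeps a sample from being counted twice for the same gene; whether the actual analysis does this is not stated in the talk.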
So in 37 of these samples, we find somatic mutations or copy number alterations. If you now include additional types of alterations, so we include fusions and allele-specific expression, splicing outliers, and so on, you can actually find quite a few more samples which have some alterations in these key pathways related to kidney cancer. So I guess we have an explanation for a larger fraction of samples, what's happening with those, except that the alteration now is not a somatic mutation, but it is something going on on the RNA side. So we performed a new type of analysis to identify known and new recurrently altered genes. So essentially we were interested in genes which, across the different alteration types, so we didn't care which alteration, are frequently altered in many cancers or many samples. So a similar type of analysis is done with somatic mutations, asking, is that gene frequently or recurrently mutated in specific cancer types? So we wanted to repeat that, but we did it across different alteration types. And so that's the result of that analysis. So here you have the ranking which we came up with, and what you see is, yes, it is highly enriched with Cancer Gene Census genes and with driver genes which are known, but it also shows that there are a few other genes which are very interesting, which have not been picked up as Cancer Gene Census genes before. And I can't go much into detail here. So for instance, KLF13, there's some evidence that this is indeed involved in cancer, but it hasn't been picked up by other analyses before. More details about this work are described in a manuscript which will come out probably next week or in a few weeks. Okay, so in summary, the group works at the interface of data science and biomedical applications.
We do a lot of technical work, for instance, on scalable graph algorithms for large-scale genomics and new training algorithms for recurrent neural networks. I can talk about this another time. I talked about three projects. One is the unbiased analysis of EHR in the context of tumor somatic variants. It needs more data, it needs more clinical data, it needs more somatic variants from Switzerland, and we push forward to get this, but it also needs more discussions and validations with clinicians. So project two, I think I'm pretty excited about, because we actually can show that RNA splicing leads to RNA alterations which are potentially targetable by immunotherapy. And that's new work, ongoing work, which I think is great. And project three is a new way to integrate RNA and DNA alterations to identify key genes in tumorigenesis. So I think overall the collaboration with the life sciences is what we need in order to translate the technology into new science and better healthcare, and initiatives like the SPHN Data Coordination Center or PHRT will make these kinds of collaborations much easier. So with that I would like to thank my team. About half of that team has moved from New York to Zurich. I'm grateful for that and also for their work. And I would like to acknowledge a lot of collaborators in New York, also in Zurich and other places. Thank you so much for your attention.