Okay. Welcome to the medicine and health session of the virtual SIB days 2020. I'm very excited that already about 120 participants are here in this session, and we're very optimistic that it will be a great session, because we have a perfect lineup of speakers. The chairs of this session are Konstantin Popadin from the laboratory of human genomics of infection and immunity at the EPFL in Lausanne, and I, Katja Baerenfaller; I'm a group leader at the Swiss Institute of Allergy and Asthma Research in Davos. Some organizational things: during the presentations, the people in the audience can submit questions through the question and answer feature, and they can also upvote questions that are posted by others. When you submit a question, please keep it short and simple, and also remember to specify the speaker's name. After the 10-minute talks, some questions from the Q&A feature will be asked by the chairs, as long as the session is running on schedule. All the other questions are then passed to the Meet the Speaker session, which directly follows the last presentation. These questions will also be asked by the chairs, who will select questions taking into account the upvotes by the attendees. So with this, I would like to introduce the first speaker, Franziska Singer. She works in the NEXUS group of Daniel Stekhoven at ETH Zurich, and her talk will be on single-cell RNA sequencing analysis of tumor biopsies. So Franziska, the floor is yours, and I'm looking forward to your talk. Thank you very much. So why do we want to use single-cell RNA sequencing to have a closer look at tumors? Well, they are not very homogeneous; rather, they are composed of very different cell types, and each cell has a different function. So single-cell sequencing brings us from a bulk perspective to a closer investigation of the individual cell types depicted here. 
And also how individual cells differ from each other, from their neighbors. And ultimately, when we bring this into a clinical setting, how can this help us? Well, we can look at tumor heterogeneity. We can look at what immune cells are present in the sample, or the microenvironment. But also we can look at gene expression, pathway activation, and ultimately we can try to tackle the question of what treatments might be effective for this patient. And of course, there are certain challenges to bringing single-cell sequencing into a clinical application. First of all, it's a high-throughput method, so we have a wealth of data that needs to be sorted and put into a clinical context. We also need to perform a robust analysis that is well documented and reproducible. So we try to address some of these challenges by creating a workflow that is embedded in a Snakemake environment. And here you see a bird's-eye overview of this workflow, where we try to use single-cell RNA sequencing to have a closer look at tumors and to report information that could be potentially useful for the clinics. So, just an overview: we have these transcripts, which are barcoded so that we know which cell they belong to, and we get a count matrix, where cells and genes are represented. This is further filtered and post-processed so that we can do an expression profile analysis. Mainly, here we compare cells with each other, looking at how similar they are and whether they can be grouped, but we also compare this to a priori defined cell type marker lists in order to do a cell type classification. And ultimately what we want to know is the sample composition, meaning for instance what types of immune cells are in our sample, but also what different tumor populations we potentially see. And this brings us to the reporting part here. 
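The "filtered and post-processed" step mentioned here typically means removing likely empty droplets (very low total counts) and likely broken or dying cells (high mitochondrial fraction), as the speaker details shortly. A minimal numpy sketch of that idea; the thresholds and the human "MT-" gene-name prefix are assumptions, and this is not the pipeline's actual code:

```python
import numpy as np

def qc_filter(counts, gene_names, min_counts=500, max_mito_frac=0.2):
    """Filter a cells-x-genes count matrix: drop likely empty droplets
    (very low total counts) and likely broken/dying cells (high fraction
    of mitochondrial reads). Thresholds here are purely illustrative."""
    totals = counts.sum(axis=1)
    is_mito = np.array([g.startswith("MT-") for g in gene_names])
    mito_frac = counts[:, is_mito].sum(axis=1) / np.maximum(totals, 1)
    keep = (totals >= min_counts) & (mito_frac <= max_mito_frac)
    return counts[keep], keep
```

In the real workflow these steps are done by dedicated tools inside the Snakemake pipeline; the sketch only shows the two criteria the talk mentions.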
I listed some examples of what we can do: of course gene expression, but also the differentiation state of certain populations, where the RNA velocity can be very informative. We can do a pathway analysis, and we can also do differential gene expression, comparing different clusters. And ultimately this can be used for an in silico drug prediction, where the differentially expressed genes are linked to drugs to further investigate them as potential drug targets. Just a few details. First of all, the basic steps, where the take-home message is mainly that the pipeline already takes care of the filtering, trying to help you get a high-quality data set. So we use Cell Ranger for an initial mapping of the reads and the gene annotation. And then, as you can see here with some QC example plots, we do various kinds of contamination removal and filtering, including removing cells that are potentially empty droplets, and also those that have a high mitochondrial gene content (that would be the red ones here), because these are likely dying or broken cells. Another important aspect is the normalization and also the cell cycle correction: we don't want our cells to be grouped just according to their cell cycle phase. So ideally we get something like what is depicted over here, where each cell is colored by its cell cycle phase. And as you can see, after correction we have a well-mixed sample, not driven by the cell cycle phase. Two core functionalities of the pipeline are the clustering, which is the grouping of similar cells according to their expression profile, and the cell type classification. For the cell type classification, we take these rows, these gene expression profiles of cells, and we compare each cell to a priori defined cell type marker lists, which are genes that should be highly expressed in a particular cell type. And what we calculate is a similarity score that tells us what cell type my cell is. 
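The marker-based classification just described can be pictured with a toy scoring scheme: score each cell by the mean expression of each type's marker genes and assign the best-scoring type. The actual pipeline computes a proper similarity score, so the function below is only a simplified stand-in, and the marker/gene names in the test are illustrative:

```python
import numpy as np

def classify_cells(expr, genes, markers):
    """expr: cells x genes matrix of (normalized) expression.
    markers: dict mapping cell type -> list of marker gene names.
    Each cell is assigned the type whose markers it expresses most highly
    (mean marker expression, a simplification of a similarity score).
    Marker genes absent from `genes` are silently skipped."""
    idx = {g: i for i, g in enumerate(genes)}
    types = list(markers)
    scores = np.column_stack([
        expr[:, [idx[g] for g in markers[t] if g in idx]].mean(axis=1)
        for t in types
    ])
    return [types[k] for k in scores.argmax(axis=1)], scores
```

This is what replaces the manual annotation: the argmax over the score matrix is exactly the "more red, more likely" heat map described next.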
So, depicted here in the heat map, each cell is a column, and the more red it is, the more likely it is that this cell is of a particular cell type. So what this does is basically avoid the manual annotation of my sample; instead, it does an automated cell type classification. So, just showing some examples of what could be potential readouts of such an analysis. Here I show you an example of a lymph node metastasis of a melanoma patient. In red you see the melanoma cells, and then we have diverse types of immune cells. And here, of course, you don't have to read all of this, but I think what is obvious is that we can distinguish many different types of cells, even subtypes of immune cells, which are typically hard to distinguish. And very importantly, we can relate this also to the gene expression of individual genes, for instance of cell type marker genes, here depicted as S100B, which is expressed in the melanoma cells as a melanoma marker. But then also important for clinical decision making, for instance for immunotherapy, is the gene expression, for instance, of CTLA4 on the T cells, or of MHC class I on the tumor. And all of this can be addressed by putting the sample composition side by side with the individual gene expression. My final example is on tumor heterogeneity. Here I picked an ovarian cancer sample, and in red you see the ovarian cancer cells. And just at first glance, you would think that this is a very homogeneous tumor. However, if you put this side by side with the clustering, as depicted here, then you see that the tumor is actually made of several populations, which are possibly different from each other, and this could be important for the treatment decision. And if we go further and actually ask ourselves what makes them different, then, for instance, the pathway analysis can help, which you see here. 
These are just example excerpts that show us that we seem to have two major populations, which actually differ, for instance, in terms of oxidative phosphorylation. So at a glance, you see what could be potential differences between these populations. And with that, I want to thank the whole NEXUS team, and in particular our single-cell experts, who worked a lot to bring this pipeline to life, and also our great collaborators from the genomics facility in Basel, from the University Hospital Zurich, and from the University of Basel. And of course you, for your attention. Thank you very much. Thank you a lot, Franziska. There has been a question in the question and answer section by Julian Ruh: how many steps of the single-cell analysis require quality checks, manual thresholding, or a choice of cutoffs, and how do you perform these in an automated workflow? Like, are there checkpoints in your analysis? I'll try to address this point by point; it's a rather long question, so I hope I can answer all of it. So, I mean, ideally, every step should be quality-controlled, right? I think it's most important to have a solid basis for your data, so controlling the mapping, controlling whether the genes and cells that you have are actually not contaminants, the cell filtering that I was talking about; I think it's very important to reach a clean data set here. But then also, for instance, for clustering and cell type classification, it's important to always double-check, I would say. Could this be a doublet? Could this be a wrong cell type that I assigned here? So I would say it's important at all stages to have quality control, if possible. And the second part, how much of this is automatable, or how much automation can we have? How do you perform this in an automated workflow? 
So, in Snakemake, what is very handy is that we have a config file with some default parameters, and based on the cancer type at hand, we would specify certain parameters that by default would give a robust analysis of this particular cancer type, and this can be adapted to other cancer types. Basically, Snakemake is very flexible and gives you the tools at hand to do this in an automated fashion on diverse datasets. So everything that I showed here is really just kicked off once, and then you get all these plots automatically. Okay. Thanks for the answer. We will go over to the next speaker and ask the remaining questions in the Meet the Speaker session following this session. Okay, so our next speaker is Fabian Arnoit, who is the head of the Laboratory of Molecular Tumor Profiling at the University Hospital of Zurich. And he will present today a new software which helps to understand the patient's tumor profile and helps in clinical decision making. So, Fabian, the floor is yours. You're welcome. Hello, everybody. Can you see my screen? Yep. So just a small comment, I'm not the head, just to make that clear, but that's fine. So yeah, thanks a lot for having me. As mentioned, I'm actually from the University Hospital, so it's very exciting for me to be able to give a talk here, and maybe I can also give you a bit of insight into the clinic. So I'm going to talk about MTPpilot, which is a browser-based application that we developed for efficient and interactive visualization of NGS data at molecular tumor boards. I'm going to give you a small background on why we had the motivation to build this application. Our lab is called Molecular Tumor Profiling; it's a lab at the University Hospital, and we are doing high-throughput genomic panel testing of cancer patients. It's DNA sequencing, and we process typically more than 25 samples per week. 
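The config mechanism the speaker describes, defaults plus per-cancer-type overrides, can be mimicked with a simple dictionary merge. The parameter names below are invented for illustration and are not the pipeline's actual config keys:

```python
# Hypothetical defaults and per-cancer-type overrides, mirroring how a
# Snakemake config with a default section might be resolved per run.
DEFAULTS = {"min_genes_per_cell": 400, "max_mito_fraction": 0.2,
            "clustering_resolution": 0.5}
OVERRIDES = {"melanoma": {"max_mito_fraction": 0.15}}

def resolve_config(cancer_type):
    """Start from the defaults and apply any cancer-type-specific overrides."""
    cfg = dict(DEFAULTS)
    cfg.update(OVERRIDES.get(cancer_type, {}))
    return cfg
```

An unknown cancer type simply falls back to the robust defaults, which matches the "adapted to other cancer types" behavior described above.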
It's a collaboration together with Foundation Medicine, and we deliver the Foundation Medicine tests for Swiss cancer patients. So we have a lot of samples, and for many of those patients, the NGS results that are summarized in the medical report are discussed at the USZ Molecular Tumor Board. That's essentially a board where oncologists, pathologists, and bioinformaticians sit together, and based on the NGS results, they discuss the best therapy recommendation for a patient. So essentially, at the end, from an NGS result we get a tumor board report which has a therapy recommendation. And here we encounter some problems. One big problem is that oncologists, especially those from university hospitals, are really very busy people, so there is a limited amount of time to discuss each patient. And as you might know, NGS results are getting increasingly complex. In the clinics, we have larger and larger panels, we have more biomarkers, it's getting more complex, and there is simply not enough time to manually analyze all the data and all the possible aspects. So we came up with a solution, which is MTPpilot, which tries to provide automated, comprehensive annotation for NGS results in a user-friendly interface, with linkage to common databases and interactive tools to visualize the mutations, all of which would not be possible to do on a case-by-case basis. In a nutshell, you take a medical report on the left, which includes NGS data and is very text-heavy and not easy to analyze quickly, and you pack that into a web-based solution, which is more intuitive and faster to understand. Technically, it's a web-based solution: we have a front end in HTML and JavaScript, and the back end in our case is a PHP solution connected to our SQL database. So what does it look like? 
We have a menu on the left side where we essentially have the patients for a certain given tumor board. We have, of course, a section, number two, with the patient and specimen information; by the way, that's all anonymized, so no worries. And we have resources in section three, where we have the patient matcher, which I will come to briefly. Then in numbers four and five we have a panel with the genomic signatures, which in the case of the Foundation Medicine tests are the MS status and the TMB, and an automatically generated ideogram with all the mutations for this patient. These two together, the signatures and the ideogram, are very simple tools, but they already give us a very fast and easy glance at the overall tumor status of the patient: whether the patient has a lot of mutations, whether there are a lot of copy number alterations, which may be indicative of genomic instability, and so on. And then the core section is actually number six, which is this comprehensive, automated annotation of the mutations. And what does it look like? I will go through it step by step. Numbers 1 to 4 are not so interesting for you; that's essentially just the automatic annotation of the mutation at the protein level and the genetic level. However, then, in number six, we automatically link these mutations to commonly used databases, what's very often used in the clinics: databases like COSMIC, ClinVar, or OncoKB. We have an automatic linkage, which will tell you whether this mutation is described as significant or pathogenic, and so on. Very useful is also number seven: we automatically link the mutation to SMART domains, so you immediately see whether your mutation is affecting a functional domain of the protein, which might also help a lot in understanding the mutation, especially in the case of frameshifts in tumor suppressor genes, where you can immediately assess what part of the protein is actually lost. We have then other fields. 
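The automatic linkage to databases like COSMIC, ClinVar, and OncoKB boils down to building lookup URLs from the annotated mutation. A sketch under the assumption of simple search-style URLs; the patterns below are illustrative placeholders, not the exact endpoints MTPpilot uses:

```python
from urllib.parse import quote

def database_links(gene, protein_change):
    """Build lookup URLs for a mutation (e.g. gene='BRAF',
    protein_change='V600E') in commonly used clinical databases.
    The URL query formats are assumptions for illustration."""
    query = quote(f"{gene} {protein_change}")
    return {
        "COSMIC": f"https://cancer.sanger.ac.uk/cosmic/search?q={query}",
        "ClinVar": f"https://www.ncbi.nlm.nih.gov/clinvar/?term={query}",
        "OncoKB": f"https://www.oncokb.org/gene/{gene}",
    }
```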
In nine and ten, we check databases to see whether a mutation is somatic or germline, which is also always an important topic during these tumor boards. And what we also have, in point 11, is actually a 3D viewer. That is automatically generated, and it maps the mutation, here displayed as a sphere model, onto a suitable PDB model. So you can immediately see for each mutation, if there is a PDB structure available, where the mutation sits and whether it might affect the function of the protein. And then some other smaller features I will present quickly. We also have a fusion visualization. Fusions are quite complex mutations, and they are not always easy to understand, especially if you do DNA panel sequencing, which is done a lot in the clinics. The breakpoints are not always very clear, and the orientation of the fusion might not be clear. So essentially this tool helps; it's also automatically generated, and it helps to characterize the fusion: where the breakpoints are, which domains are affected, how the final fusion could look. And this is linked to our database. By this time we have a lot of patients in our database, and we have a patient matcher that allows us to match the molecular profile of a given patient to other patients. And this can be helpful, for example, to confirm a diagnosis. In this particular case you see this patient has a TMPRSS2-ERG fusion, which is a hallmark of prostate cancer, and you see other prostate cancer patients. So that might help confirm the diagnosis, or it might also help with treatment. And with that I'm actually towards the end already. The status quo now is our current local solution; it's obviously integrated with our patient database, which feeds the MTPpilot software, and this is then what we present at the tumor board. 
But in the longer term, we want to publish this as well and then offer a community web-based solution, where you can essentially upload, for example, your VCF files in the browser, and you get a very similar interface, just with your results. And so that brings me already to the acknowledgments. I want to thank of course Martin; he's the actual head, and he enabled the whole project. He will present later as well, and he actually did a lot of the back-end programming for this. Then, also thanks to the pathologists and oncologists who helped with designing it. Thanks for your attention. Thanks a lot for your interesting presentation. So we have maybe time for one quick question. I see a question from Robert Ivanek: since you need to reduce the dimensionality and complexity of what is presented to clinicians, how do you prioritize, or what is the ranking of the mutations or genetic rearrangements that you present to clinicians? How did you decide? So, Foundation Medicine in our case actually already does a certain ranking for us. They already rank by pathogenicity, whether a mutation is significant and pathogenic, and they will rank mutations that are not clearly pathogenic as variants of unknown significance. And that's actually done automatically by many providers of panel sequencing. Okay, but what we can do with the software... So with this, I think we move on to the next speaker. Thank you. Thank you. Next will be Iulian Dragan, who is a scientist in the Vital-IT group, the competence center for mathematics and computational biology of the SIB in Lausanne, and he will present the dsSwissKnife analysis tool. Hello everybody. Can you see my screen? Yes, yes. Yep. Okay. So I'm going to present dsSwissKnife, which is a package we wrote here in the Vital-IT group of the SIB in order to facilitate federated data analysis. As many of you know, we have a problem with sensitive data, and that is that they are often siloed. 
That is because ethical and legal constraints prevent the sharing of clinical data, and that is especially true across borders. Also, historical data are in different formats and at different levels of granularity, with different variable names or non-standard variable measurement units. And the methods used for analysis are not standardized. So we try to solve some of these problems by bringing the analysis to the data. As part of a large European consortium for type 2 diabetes, we have set up a federated database comprising 10 clinical cohorts across Europe. These cohorts are connected to a central server, and in order to perform this federated analysis, the software execution is distributed over the data network, and only summary-level data, for instance partial means, mean values, or regression coefficients, are returned. In this model, the analysis is brought to the data: no individual data ever leaves its origin server; only aggregates are shared. This was achieved using an R package that we call dsSwissKnife. We implemented several modeling and analysis methods, among them principal component analysis, k-means clustering, and random forests. These methods are designed to work on virtual cohorts; by virtual cohorts we mean aggregations of the individual cohorts existing on the remote nodes. Running a model on a virtual cohort will yield exactly the same coefficients or model results that you would obtain on the full data set present on the analysis computer. The package works with open-source software, on the Opal infrastructure and with the DataSHIELD R package. We are actually following the DataSHIELD consortium on this path; it's them who opened it up by implementing remote federated GLM models. So within IMI RHAPSODY, we have built a federated database with about 50,000 patients. The patients are in different stages of the disease. 
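The core idea, that only aggregates ever leave a node, is easiest to see for a mean: each cohort returns a partial sum and a count, and the central server combines them. A minimal sketch of the principle (dsSwissKnife itself is an R package; this is just a conceptual illustration):

```python
def node_summary(values):
    """Runs on a remote node: returns only aggregates, never raw values."""
    return sum(values), len(values)

def federated_mean(node_summaries):
    """Runs on the central server: combines the per-node aggregates.
    The result equals the mean over the pooled individual-level data."""
    total = sum(s for s, _ in node_summaries)
    n = sum(c for _, c in node_summaries)
    return total / n
```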
Some of them are ill, some of them are pre-diabetic. We chose three cohorts of diabetics, totaling about 25,000 patients, to perform a combined k-means cluster analysis. So we built this virtual cohort, on five phenotypic variables, including BMI, C-peptide, glycated hemoglobin (HbA1c), and HDL. We executed the k-means algorithm, modified to work on federated data, and we obtained results like these. This is a plot of the results; the acronyms on the right are known from the literature. The first one, for instance, is severe insulin-deficient diabetes, and we can see from the plot that the measures for this cluster of severe insulin-deficient diabetes are quite different from those for severe insulin-resistant diabetes, which is the second one. Furthermore, the results we obtained are quite similar to what is known from the literature, and quite similar to what we obtained from each individual node; but that wouldn't necessarily have been the case, and we had to combine them to make sure we would obtain the same results. Going very quickly forward, I will show you that we also managed to execute a federated PCA on the virtual cohort, to visualize the separation between the clusters. Here we can see clouds and not individual points, obviously, because we are not allowed to see individual measures, even on principal components. And these are two of the clusters visualized on the first two principal components. I just want to say that this PCA illustrates how data can be combined on the fly to get a global overview. This analysis is not limited by the number of variables; it is not bandwidth-hungry or CPU-hungry. The algorithm runs in a matter of minutes, and as you have seen, 25,000 patients is quite a substantial number. So we are pretty content with the results. And this would be the key message. 
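A federated k-means iteration works on the same principle: each node assigns its local patients to the current centroids and returns only per-cluster sums and counts; the server combines these to update the centroids, which gives exactly the result of running the step on the pooled data. A numpy sketch of one such step (a simplification, not the dsSwissKnife implementation):

```python
import numpy as np

def node_kmeans_stats(X, centroids):
    """Per node: assign local rows to the nearest centroid and return only
    per-cluster sums and counts (no individual rows leave the node)."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    k = centroids.shape[0]
    sums = np.zeros_like(centroids, dtype=float)
    counts = np.zeros(k, dtype=int)
    for j in range(k):
        mask = labels == j
        counts[j] = mask.sum()
        if counts[j]:
            sums[j] = X[mask].sum(axis=0)
    return sums, counts

def combine_centroids(node_stats):
    """Central server: new centroids from the combined node aggregates."""
    sums = sum(s for s, _ in node_stats)
    counts = sum(c for _, c in node_stats)
    return sums / np.maximum(counts, 1)[:, None]
```

Because sums and counts add across nodes, the updated centroids are identical to those of a centralized k-means step, which is exactly the "same results as on the full data set" property claimed for virtual cohorts.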
dsSwissKnife is an innovative toolkit for remote analysis of virtual cohorts without any sharing of sensitive data. I would like to... You already gave us the key message; we're running very late, so we will hand over right now to the next speaker. Thank you, Iulian. Thank you. So, our next speaker is Maria Livia Famiglietti. She is a researcher in the Swiss-Prot group, and today she will present a way to integrate genome variation data with knowledge of protein function and disease. So, you're welcome, Maria. Thank you. I'm trying to share the screen. Yeah, there we are. Okay, thank you very much. Hello everybody. I would like to present work currently done at Swiss-Prot that aims to increase the utility of UniProt as a platform to link genome variation data with knowledge of protein function. And this can be done thanks to the richness of the annotations that are present in UniProtKB/Swiss-Prot. As you know, UniProtKB/Swiss-Prot provides plenty of information on proteins, including protein sequences, sequence features, and protein function, but we also provide information on human genetic variants and their involvement in disease, particularly Mendelian diseases with high penetrance. We also provide information on the functional impact of genetic variants. This information is provided as free text; therefore, in order to make our data more useful, we need to standardize the annotation. Our work now is standardizing annotations of variant clinical significance, as well as annotations of variant functional impact. For the variant clinical significance, in the upper part of the slide, you can see how variants are currently annotated in Swiss-Prot: we provide the position of the variant at the protein level, the amino acid change, and, if a variant is found in a disease, we indicate it together with the disease short name. The standardization work consists in applying the ACMG guidelines and terminology for variant interpretation and reporting. 
So we define the variant at the nucleotide level using the HGVS nomenclature, and we use the five pathogenicity categories that are used in ClinVar to indicate the clinical significance. In this example, the variant is likely pathogenic for lactic aciduria. The standardized annotations are linked to a PMID; they are stored in an internal variant database and are also submitted to ClinVar. So these annotations are not yet available through Swiss-Prot, but through ClinVar. Up to now we have standardized over 1,000 variant interpretations, most of them submitted to ClinVar, and almost 800 are already released, and therefore public, in ClinVar. For the functional impact of variants, we move from free text to machine-readable data using controlled vocabularies from the Variation Ontology (VariO) and from GO, as well as, when needed, from UniProtKB. In this example, I show how VariO terms are used: one describes the protein property that is affected, in this case protein function; another VariO term describes the type of effect, in this case a loss-of-function mutation; and a GO term is used to name the process that is affected. Again, the standardized annotation is associated with a PMID and is stored in an internal variant database. By now we have standardized over 14,000 functional annotations, and we have created a corpus of over 3,000 PMIDs of articles on functionally relevant variants. So, in the future, what we will do is retrofit the standardized annotations of clinical significance and functional impact from the internal variant database into Swiss-Prot. We are also sharing our interpretations with ClinVar, and we are going to use the corpus of PubMed IDs from our internal database to train a deep learning classifier to be able to identify literature on functional variants. This will allow us to prioritize our work, so that in the future we will focus our annotation on novel and rare variants that have a functional impact. 
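The standardization step can be pictured as mapping each free-text interpretation onto a fixed record: an HGVS description, one of the five ACMG/ClinVar significance categories, the condition, and the supporting PMID. A toy sketch, with a made-up variant accession and illustrative field names:

```python
# The five significance categories used by ClinVar / the ACMG guidelines.
ACMG_CATEGORIES = {"benign", "likely benign", "uncertain significance",
                   "likely pathogenic", "pathogenic"}

def standardize_interpretation(hgvs, condition, significance, pmid):
    """Validate the significance against the five ACMG categories and
    package the annotation as a structured record (fields illustrative)."""
    sig = significance.lower()
    if sig not in ACMG_CATEGORIES:
        raise ValueError(f"unknown significance category: {significance!r}")
    return {"hgvs": hgvs, "condition": condition,
            "clinical_significance": sig, "pmid": pmid}
```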
These activities are funded by SERI and by the NIH. I would also like to thank all my colleagues who have contributed to this work, and with this, thank you very much for your attention. Maria, thank you a lot for your presentation. Thanks. Thank you. Precisely on time. Because we are late, we have to go directly to the next speaker. The next speaker is Maria Dmitrieva. She is a PhD student in the group of Christian von Mering at the University of Zurich, and today she will present an analysis of the lung microbiome in patients with cystic fibrosis. Maria, the floor is yours. You're welcome. Yes. Okay. So hello everyone. As Konstantin briefly introduced, I'm a PhD student in the lab of Christian von Mering. Today I will be presenting a project that we did in collaboration with Dr. Kahlert's team at the Cantonal Hospital St. Gallen. Just a brief note on why the lung microbiome is so special: in essence, it's an open system, so it's exposed to the environment. We breathe in microbes from the environment and we breathe them out. There are certain conditions in the lung that allow certain bacteria to grow or not to grow, but mostly the lung is a sterile environment. Now, what happens in cystic fibrosis is that these people have a mutation that results in the airways within the lung being covered in really sticky mucus. This mucus prevents bacteria from getting expelled, and it also provides a nutritious environment. So these bacteria overgrow and colonize the lungs, and as a result, patients with cystic fibrosis frequently suffer from respiratory infections. And even though some therapies can help with that, it's still quite a big problem. So what we wanted to do in this project was to follow a small cohort of patients over a period of several months, and then to use shotgun metagenomics to characterize their lung microbiome, because current clinical methods provide only a limited overview. 
Okay, so here you see an example of one of the patients in our cohort, whom we followed for over 15 months. During these 15 months we took 10 samples; the patient essentially just expelled sputum from the lungs, and this was taken for shotgun metagenomic sequencing. As you can see, there is some variation over time, but in general there are just a few bacteria that dominate the whole lung microbiome. Among these bacteria, there are a couple of well-known cystic fibrosis pathogens, and perhaps the most famous one is Pseudomonas aeruginosa. So what I would like to show you now is that this Pseudomonas aeruginosa is in fact not just a simple single population, but actually contains a mix of different subpopulations. What we did was call single nucleotide variants on the Pseudomonas aeruginosa genome, and because we have several time points, we can obtain a kind of temporal profile of how each variant becomes more or less abundant. We have several thousand variants across the whole genome, and if we cluster them, we observe seven distinct clusters that have a reasonably high allele frequency and also show different temporal patterns. Now, three of these clusters, one, two, and three, have very distinct shapes, and when we check the other four, they seem to be derivatives of these three clusters. So our hypothesis is that these are three different strain variants of Pseudomonas aeruginosa in the same patient. And finally, what I would like to show you is that these strain variants differ not only by single nucleotides, but also in large regions of the genome. What you see here are coverage profiles of the genome from different time points, and there is, for example, a region like the one here that has different coverage at different time points. 
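The clustering of allele-frequency trajectories can be illustrated by greedily grouping variants whose temporal profiles are highly correlated. The actual analysis will have used a proper clustering method on thousands of variants, so this is only a conceptual sketch with an illustrative correlation cutoff:

```python
import numpy as np

def group_trajectories(freqs, min_corr=0.95):
    """freqs: variants x timepoints matrix of allele frequencies.
    Greedily put each variant into the first group whose representative
    trajectory it correlates with above min_corr; otherwise open a new
    group. Variants moving together are candidate strain variants."""
    groups = []
    for i in range(len(freqs)):
        for g in groups:
            if np.corrcoef(freqs[i], freqs[g[0]])[0, 1] >= min_corr:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

The same trick is what links the variable-coverage region to strain variant one in the next part of the talk: its coverage profile over time correlates with that cluster's allele-frequency trajectory.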
And if we look at how this normalized coverage changes over time, we can see that the shape almost perfectly corresponds to strain variant one, indicating that this region is only present in this strain variant and not in the other two. Although we couldn't define the exact function, we know that this region contains some kind of metabolic genes. So with that, I hope I've given you at least some insight into how we can combine metagenomic sequencing and longitudinal sampling in order to learn more about the heterogeneous populations within the cystic fibrosis lung. And with that, I would like to finish. I will have a poster later this afternoon, so feel free to come by and discuss it. And at the top, you see the QR code for our preprint, which is out. So thank you. Okay, Maria, thank you a lot for the presentation, and thank you for being on time. I think we will leave the questions for the later discussion, and we will switch to the next presenter. The next talk will be by Imant Daunhawer, who is a PhD student in the group of Julia Vogt at ETH Zurich. He will speak on an early phototherapy prediction tool to detect hyperbilirubinemia. Hello everybody. Thanks for the introduction. I'll be presenting joint work with the Children's Hospital of the University of Basel, and as already pointed out, I'll be talking about the early detection of neonatal jaundice, which I'll introduce on the next slide. Neonatal jaundice is one of the most common pathologies in newborns. It is caused by overly high bilirubin values; bilirubin is a common blood measurement, and in fact around 60% of newborn babies turn yellow within the first days of life, which is the typical physiological sign of neonatal jaundice. Approximately 10% of children require phototherapy treatment within the first days of life. And the problem with untreated hyperbilirubinemia is that it can cause major disability with lifelong consequences. 
And in particular, due to the recent trend towards shorter hospitalization, there has recently been an increased risk of critical jaundice. Here you see what phototherapy typically looks like. It's not very nice for the child, but it's also one of the milder forms of therapy. So it's a task that is really well suited for early detection with machine learning. In particular, in this instance we'll be analyzing data on 362 newborns, for each of which we have up to 44 variables measured. A particular challenge with this data set is that it is comprised of time series data with missing values and irregular spacing between observations. The model that we're going to use for the prediction is a random forest, which is a state-of-the-art but easy-to-use method in the realm of machine learning. It provides good predictions, in particular on tabular data, and it's easy to use and easy to interpret compared to other methods. This brings us to the task that we are targeting on this data set: we want to provide a tool that is easy to use and achieves early detection of neonatal jaundice; in particular, it allows us to predict up to 48 hours in advance whether or not a child will require phototherapy. With this tool, we also want to personalize the prediction, which is currently based on conventional methods such as nomograms, which are basically rules of thumb based on population averages. And with this tool, which again should be easy to use in a clinical context, we want to provide a safeguard against too early discharge. Here you see a screenshot of the prototype that we've built as an online tool, which is publicly available. This online tool incorporates a machine learning model for the personalized prediction of neonatal jaundice. Again, we provide an early prediction up to 48 hours in advance of phototherapy. With the model that's integrated in that tool, we achieve a strong predictive performance of 95% area under the ROC curve.
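The prediction pipeline described in this talk, a random forest on a tabular clinical data set evaluated by area under the ROC curve, has roughly the following shape. Everything below is a synthetic stand-in: the variable names, scales and label rule are invented for illustration and do not reflect the real clinical data or the reported 95% AUROC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the clinical table (the real study uses 362
# newborns with up to 44 variables; the columns here are invented).
rng = np.random.default_rng(1)
n = 362
X = np.column_stack([
    rng.normal(150, 60, n),    # bilirubin value (hypothetical units)
    rng.normal(3300, 500, n),  # birth weight in grams
    rng.normal(39, 2, n),      # gestational age in weeks
    rng.uniform(0, 120, n),    # hours since birth
])
# Invented label: phototherapy required within the next 48 hours.
y = (X[:, 0] + rng.normal(0, 30, n) > 180).astype(int)

# Random forest evaluated by cross-validated area under the ROC curve.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
```

The appeal of a random forest here is exactly what the speaker notes: it handles tabular data with little tuning and exposes per-variable importances for interpretation.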
And as I said, we wanted to make this tool as easy to use as possible, so we stripped the inputs required for the tool down to only four variables. In particular, if we look at the feature importance, which is a proxy for the influence of variables in a random forest, and here we plot the subset of the 20 most important features in the model, we can, by doing backward variable selection, find that only four variables suffice to achieve the performance that we reported on the previous slide. In particular, those inputs are the bilirubin value, obviously, the weight of the child, its gestational age, and the hours that have passed since birth. So let me just say a big thank you to our collaborators from the University Hospital of Basel, and thanks to you for listening. And let me just point out that we have a poster as well, so please feel free to visit us. Thanks a lot. Thank you a lot, and well done keeping on time. And with this, I would like to hand over to the next speaker. Okay, so our next and last speaker today is Xinrui Lyu, who is a PhD student in the group of Gunnar Rätsch at ETH Zurich, and today she will present a new computational approach to extract cancer mutational signatures from small cohorts of patients. Thank you. Hi everyone, I'm Xinrui, from the Biomedical Informatics group led by Professor Gunnar Rätsch at ETH. Today I'm going to talk about our work on supervised mutational signature learning. This work has also been accepted at ISMB this year, and you can find the publication DOI down here. So to start with, I first need to introduce the concept of a mutational profile for those of you who are not familiar with it.
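As a concrete preview of the mutational-profile concept that this talk introduces: a profile is commonly represented as a 96-channel count vector (6 substitution types times 16 flanking-base contexts), and the expected profile is a weighted sum of signatures, with the weights called exposures. The signatures and exposure values below are invented toy numbers, not real signatures.

```python
import numpy as np

rng = np.random.default_rng(4)
n_channels = 96  # 6 substitution types x 16 trinucleotide contexts

# Two invented signatures: probability distributions over the 96 channels.
signatures = rng.dirichlet(np.ones(n_channels), size=2)

# Invented exposures: how many mutations each signature contributed.
exposures = np.array([300.0, 120.0])

# Expected mutational profile = weighted sum of the signatures.
expected_profile = exposures @ signatures
print(round(expected_profile.sum()))  # 420, the total expected mutation count
```

Because each signature sums to one, the total expected mutation count equals the sum of the exposures, which is why exposures are read as "how many mutations each process caused".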
The mutational profile we're looking at in this project is a summary of the counts of somatic single-base substitutions together with their immediate five-prime and three-prime contexts. A mutational signature looks a lot like a mutational profile, except that it is a summary of frequencies instead of counts. A mutational profile can be seen as a weighted sum of different mutational signatures, and the weights are usually called exposures. So mutational signatures can help us understand the underlying mutational processes in cancer patients. There are some mutational signature extraction methods out there already, but they depend on the size and variance of the patient cohort and do not exploit metadata such as cancer type. Also, they model the mutation counts with a Poisson distribution instead of the negative binomial distribution. So we propose a supervised mutational signature learning model that utilizes the metadata by integrating supervised learning, which could reduce the dependence on the size and variance of the patient cohort and help us learn more robust mutational signatures, even from small cohorts. We also try to do the right thing here by modeling the count data with a negative binomial distribution. So here's the math. We first decompose the mutational profile matrix into a mutational signature matrix and an exposure matrix using non-negative matrix factorization, and we try to find the best solution under the negative binomial distribution assumption by minimizing the negative log-likelihood function. We then add the classification loss from SVM classifiers on the metadata into the picture, and we optimize the resulting objective function using a majorization-minimization method. Here are some results. We evaluated our model on the largest collection of whole-genome cancer data, using the cancer type as the classification label, and we used five-fold cross-validation in our experiments.
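The "here's the math" part can be written schematically as a sum of the two losses just described. This is my reading of the setup from the talk; the exact parameterization and any regularizers in the paper may differ. Here $M$ is the patients-by-96-contexts count matrix, $S$ the signature matrix, $E$ the exposure matrix (with $e_i$ the exposures of patient $i$), $r$ the negative binomial dispersion, $y_i$ the cancer type of patient $i$, and $\{w_k\}$ the SVM weight vectors:

```latex
\min_{S \ge 0,\; E \ge 0,\; \{w_k\}} \;
  \underbrace{-\sum_{i,j} \log \mathrm{NB}\!\left( M_{ij} \,\middle|\, (ES)_{ij},\; r \right)}_{\text{negative binomial reconstruction loss}}
  \;+\; \lambda \underbrace{\sum_{i} \ell_{\mathrm{hinge}}\!\bigl( y_i,\; \{ w_k^{\top} e_i \} \bigr)}_{\text{SVM classification loss on exposures}}
```

Minimizing both terms jointly pushes the factorization toward signatures whose exposures also discriminate the cancer types, which is what the talk credits for the reduced dependence on cohort size.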
Here we compare the signatures learned by our model, which is dubbed SNBNMF, and an unsupervised model, dubbed NBNMF, with the reference signature set. As these two figures down below show, the signatures learned by our model performed best in terms of both reconstruction error and classification accuracy on the 45 cancer types. Another experiment we did is to extract mutational signatures... Sorry, we lost your voice for about ten seconds. We were able to hear you before that, so if you could just repeat your last two sentences, that would be perfect. Starting from here, I think you can go to the next slide. Yes. Okay. Yes, this slide. Thank you. Yeah, another experiment that we did is to extract mutational signatures from a small cohort with only six cancer types, most of which are related to BRCA1 and BRCA2 mutations. And here we compare the best-matching signature from our model and from the unsupervised model with the reference signature 3, which is highly associated with BRCA1 and BRCA2 mutations. We can see from the result that the signature learned by our model is more similar to the reference signature. Sorry, we lost you again. Oh, can you hear me now? Okay, so to conclude: with supervision, the signatures learned by our model are more predictive of the cancer type while still yielding a lower reconstruction error. They are also more robust, even when the training cohorts are small. And there is more cool stuff our model can do, one of which is to force a specific signature to be associated with a certain molecular feature, such as APOBEC expression. For more information, you can also find me in poster session one. Thank you very much for your attention; questions are welcome. Thank you very much. So I think we go to the general discussion. Thanks to all the speakers for the exciting talks. I'm really excited that it turned out so well, and I also thank you for the active participation; we've collected quite some questions now.
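The "more similar to the reference signature" comparison in this talk is typically done with cosine similarity between 96-channel signature vectors. A minimal sketch, using synthetic signatures rather than the actual learned or reference (e.g. COSMIC) signatures:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two signature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 96-channel signatures; real reference signatures would come from a
# catalogue such as COSMIC.
rng = np.random.default_rng(3)
reference = rng.dirichlet(np.ones(96))
learned_close = reference + rng.normal(0, 0.0005, 96)  # near-perfect recovery
learned_far = rng.dirichlet(np.ones(96))               # unrelated signature

sim_close = cosine_sim(learned_close, reference)
sim_far = cosine_sim(learned_far, reference)
```

A learned signature that tracks the reference closely scores near 1, while an unrelated signature scores markedly lower, which is what the comparison figure in the talk conveys.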
And the audience members should now join the meet-the-speakers session, where we will ask the questions from the Q&A feature. For this, as you see here, just leave the current session and click on the camera symbol next to the meet-the-speakers session of session one in the program, and there we will meet again. See you there in a moment. Thank you.