So good morning everyone, my name is Kathy Newtseller and I'm going to be the host for this morning's session of the second day of the summer school. With that, I would like to introduce our first presenter, Joaquín Dopazo. He received his PhD in biology from the University of Valencia and has many years of research experience in both academia and industry; for example, he worked for Glaxo Wellcome, which later became GlaxoSmithKline. Since 2017 he has been head of the Clinical Bioinformatics Area of the Fundación Progreso y Salud in Sevilla, Spain. He is also a promoter of genomics projects such as Future Clinic, which aims to prepare the ground for the introduction of genomic data into electronic health records. His research interests include functional genomics, systems biology, and the development of algorithms and software for the analysis of multi-omics data, particularly their application to precision medicine. Today he will talk about mechanistic biomarkers. We're very excited to have you here, Joaquín; the floor is yours.

Okay, thank you very much, Adrin, for this nice introduction, and thank you to the organizers for allowing me to speak about our work. I'm going to talk about mechanistic biomarkers, but since I understand that for many of the students this topic may be a bit far from their backgrounds, I will try to put the problem in context first. So let me start by explaining what biomarkers are. This is a scheme of how medicine has worked, how it is changing nowadays, and how it will work in the future.
I wouldn't go back as far as Hippocrates, but for probably more than a couple of centuries, modern medicine has worked in an intuitive way: clinicians, who were experts in specific areas of medicine, were more or less good at defining symptoms, and the definition of symptoms is, in many cases, neither accurate nor objective. With the advent of sequencing technologies, we had the opportunity to find a completely different type of marker. Not just physiological markers, but what we call biomarkers: specific mutations in the genome that can be associated with a specific diagnosis, with the prognosis of the disease, or with the response to specific drugs. This is a fundamental change because, no matter how good the clinician is, if you have a biomarker, which is a non-subjective definition of the disease, you will be correctly diagnosed or correctly treated. I mean "correctly" in a probabilistic sense, because the identification of biomarkers is a probabilistic matter: as you collect more information on patients, you eventually find associations between a specific mutation and, for example, the response to a specific treatment. That has meant an important advance for medicine. But these biomarkers are only probabilistic associations; they don't tell us much about the biological mechanisms behind them. We can imagine a future in which we know the mechanisms behind diseases perfectly, so that we can take decisions or actions based on the knowledge of these mechanisms. Because today, if we find somebody with a completely new mutation, we don't have any previous probabilistic association for it.
And probably we will not be able to say anything about this mutation; for this patient, its effect is unknown. But in the future, if we know how the mechanism works, we will be able to take decisions even in situations we have never seen before. That's the idea. So this is the type of biomarker that we use nowadays: mutations in single genes. On the left here you have the Food and Drug Administration page that lists all the biomarkers with an approved drug, which are in the range of thousands. This is an example of how biomarkers contribute to improving the practice of medicine: the improvement in cancer survival from the 1970s to the beginning of this century. The average survival in cancer, which was one in four in the 1970s, became one in two, and nowadays it is around 70 to 75%. This is because we know how to stratify and treat patients more and more efficiently, since we can detect in them biomarkers associated with success, or with probabilistic success, in a given treatment. Most personalized therapies are based on this type of biomarker. So this is the optimistic view. What happened with biomarkers? Well, they work. They work, for example, for the diagnosis of rare diseases, because these are highly penetrant diseases with a very clear phenotype and a very clear symptomatology. They have been more or less successful in cancer for the same reason, but for most complex diseases, like diabetes, hypertension, and many others, the success was not so good. The point is that we are looking for probabilistic associations, and yesterday we saw a very nice presentation showing probabilistic associations of SNPs to traits.
In many cases, the practical success of these associations, that is, the part of the phenotype they explain, is not very high. And in many cases these associations also lack a mechanistic anchoring: an explanation of the fundamental processes responsible for the disease or the therapeutic response. What happens in complex diseases, and in general, is that phenotypes are not defined by a single gene. In most rare inherited diseases the phenotype is caused by one single gene, but in complex diseases it is caused by a combination of mutations or damaged genes, typically together with environmental factors, what is called the exposome: all the external conditions to which we are exposed. In that case it's very difficult to find these relationships. But we know that most complex diseases are caused by such combinations of genes, and there are several proofs. For example, in the very famous paper by Goh et al. in 2007, they demonstrated that, in many cases, genes causing the same or similar diseases map together in the protein-protein interaction network. So genes tend to form substructures, which are responsible for phenotypes, and we call these substructures modules. A module can be physical, like a protein complex, which is like a mega-protein formed by the assembly of different proteins, or it can be a set of genes cooperating, as in metabolic networks, where different proteins transform metabolites into other metabolites, working like a factory. So this is the problem with single-gene biomarkers: they don't explain the system, the interaction between the genes.
Now, I said that most biomarkers are single-gene, but this is not completely true. There are several multigenic biomarkers, and one of the most famous is MammaPrint, a decision-support test that predicts the probability of metastasis in breast cancer. What is interesting is that this predictor was proposed in 2003, as far as I remember, so many years ago, and at the time it was proposed the function of many of its genes was completely unknown. Nowadays we know the functions of all of them (you probably can't see it because the table is very small), and all of them are related to breast cancer. So what this multigenic biomarker is capturing is a module that is active in one way or another. If you have the module with this activity, no matter which combination of genes is causing it, you will probably develop metastasis. So here we get a better picture of the module than with a single-gene biomarker. Our idea was then to change the paradigm a little. MammaPrint was a bottom-up definition: they defined the module before knowing what the genes do. What we can do instead is use all the knowledge we already have on biology to define these modules, and then use the modules as biomarkers. So we have two problems here: the first is defining these functional modules, and the second is defining the behavior of the modules. Historically, we have managed different types of biological knowledge about these modules. The first one was the Gene Ontology.
The Gene Ontology is just an ontological description of functions. Maybe that sounds trivial now, but go back to the beginning of bioinformatics, when the journal Bioinformatics was still called Computer Applications in the Biosciences (CABIOS); we are talking about the early 1990s. No, I'm not as young as I look, right? At that time, one of the main problems was that gene functions were just descriptions in human language; there was no ontology. So it was almost impossible to find genes that do the same thing, because the definitions in different publications, although equivalent for a human reader, could not be read and matched by a computer. The use of a single ontology for functional definitions was therefore a clear advance. But in the end you can think of these terms as a bag of genes that share a function, and the problem with such modules is that you completely lose the information on how the components within the bag relate to each other. The second type of very useful information was the interactome, the network of protein-protein interactions. It was quite good because it describes how proteins relate to each other, but the interactome itself has no information on the functional roles of its components. In the case of the ontology, we know why the genes are together; in the case of the interactome, we know who connects to whom, but not why. But we also have pathways, a completely different type of description of biology. They were inherited from classical biochemistry, where you draw diagrams relating genes to each other with different types of arrows, each representing a different functional relationship between one gene and another.
So this is probably the most interesting type of description of functional modules. Let me show you this slide. There are a number of repositories where you can find pathways for the best-known parts of biology. All these repositories contain similar circuits; the parts discovered more recently may differ, but they tend to be more or less similar. These pathways describe how genes relate to each other to make the cell do things. There are more types, but let me focus on two very important ones. The first is signaling pathways. You can imagine them like electrical circuits: imagine you are in a big building where you can turn switches on and off and things happen. Lights go on and off, doors open, and so on. These switches have an effect on some part of the building, and this effect would be the phenotype of the building. So you can operate the building just by flipping switches. This is a good analogy for signaling pathways: they receive a stimulus, which is your finger on the switch, and transmit the signal to the part of the cell where an action should happen. For example, a stimulus tells the cell that it must produce more lipids to increase the size of the membrane. Anything in the cell happens because of this signaling, and the electrical analogy is quite accurate in this sense. So we have these repositories, and we also have a different type of repository that collects the peculiarities of signaling, metabolism, etc., in different diseases: the disease maps.
So we decided that pathways are very good for defining the modules that probably account for diseases. Now we have to decide how to model their activity. But first we need to define what this activity is, this biological activity within a pathway, because we don't know which elementary actions trigger a function. A whole pathway is like the complete electrical map of the building, and you will get completely lost if you try to figure out what is causing what. So this is an important part of this type of modeling. Here is a very small example. You can work at the gene level, the whole-pathway level, or the sub-pathway level. The gene level is equivalent to using single-gene biomarkers. Here, for example, you may think that the activity of this gene is relevant, and it probably is, but depending on which partners are active, it can trigger a survival signal or a cell death signal. So the activity of this gene alone, out of context, says nothing about what the pathway, here the apoptosis pathway, is doing in the cell. The same can be said at the pathway level: you may say, "I have a lot of active genes here, so the apoptosis pathway is active." Okay, but is the apoptosis pathway killing the cell or making it survive? That is what we need to define. So the important level is the sub-pathway: the part of the circuit that connects the protein receiving the first signal, the stimulus, to the last protein, which triggers the function. If you decompose the pathway into these canonical or elementary circuits, you get an idea of the different functions the pathway can trigger and when they are triggered.
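To make the idea of elementary sub-pathways concrete, here is a minimal sketch, assuming a pathway given as a list of directed (source, target) edges and defining each elementary circuit as one effector node (a node with no outgoing edges) together with every node that can send signal to it. This is one simple way to decompose a pathway, not necessarily the exact decomposition used in the work described, and the node names are hypothetical.

```python
def effector_circuits(edges):
    """Decompose a pathway graph into one circuit per effector node.

    edges: iterable of (source, target) pairs describing signal flow.
    Returns a dict mapping each effector (a node with no outgoing edge)
    to the set of nodes that can reach it, including the effector itself.
    """
    upstream = {}        # node -> set of direct upstream nodes
    nodes = set()
    has_outgoing = set()
    for source, target in edges:
        upstream.setdefault(target, set()).add(source)
        nodes.update((source, target))
        has_outgoing.add(source)

    circuits = {}
    for effector in nodes - has_outgoing:
        # Walk backwards from the effector, collecting every upstream node.
        members = {effector}
        stack = [effector]
        while stack:
            node = stack.pop()
            for parent in upstream.get(node, ()):
                if parent not in members:
                    members.add(parent)
                    stack.append(parent)
        circuits[effector] = members
    return circuits


# Hypothetical toy pathway: one receptor whose signal can end in
# a pro-survival effector or a pro-death effector.
toy_edges = [
    ("receptor", "adaptor"),
    ("adaptor", "survival_kinase"),
    ("survival_kinase", "survival_effector"),
    ("adaptor", "death_kinase"),
    ("death_kinase", "death_effector"),
]
for effector, members in sorted(effector_circuits(toy_edges).items()):
    print(effector, sorted(members))
```

With this decomposition, asking whether the "apoptosis pathway" is active becomes two separate questions, one per circuit, which is the distinction drawn above between pathway-level and sub-pathway-level activity.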
So you essentially decompose the pathway into its elementary functions, and once you know how to do that, you can model it. This is probably the first case that I know of (there may be earlier ones), and I find it quite elegant. It is a very simple and small sub-pathway, the JNK sub-pathway. The inability to activate this sub-pathway, which sends the cell to apoptosis, is associated with a very bad prognosis in neuroblastoma patients. The typical alteration here, an amplification of MYCN, is a biomarker of bad prognosis, but a number of patients with MYCN amplification don't have such a bad prognosis, and some people with no MYCN amplification do have a bad prognosis. So something is missing. When they focused on this sub-pathway and estimated its activity, they discovered that when the sub-pathway was not active, the prognosis was clearly bad. They found a very nice association. Essentially, a clearly neoplastic cell normally realizes it and commits suicide, but in these patients the suicide was eliminated, so the cancer cells could spread. What is interesting here is that the activity of the sub-pathway is more related to prognosis than the activity of any of the genes involved in it. This is exactly the definition of a system: the system explains the phenotype better than the sum of its parts. What is also interesting is that they were not measuring the activity of the pathway; they had no way of measuring it. They inferred it.
So this is a construct. It is not a real measurement but a value inferred from the real measurements of individual genes, which by themselves do not explain the phenotype correctly. When you put them together in a model, you get a better explanation of the phenotype. The problem is that building such a model with differential equations became problematic when scaling it up to the whole system. There are different ways of modeling this, but, given the nice similarity between electrical circuits and signaling, we modeled the propagation of the signal through a circuit much like an electrical circuit, thinking of genes as resistors. When a gene is highly expressed, there is low resistance to the passage of electricity, and when it is poorly expressed, there is high resistance. That holds for a gene that activates the next one; if the gene is an inhibitor, it is the other way around. In that way we can model these two types of activity. You can see the formula here. We start with a signal of 1 and see how much of it arrives at the end. The values in the formula are normalized gene expression levels: we put all of them on a scale from 0 to 1, so they are relative, not absolute, values. Starting with a signal of 1, a gene with an expression of 0.7 lets only 0.7 of the signal pass. The next one is 0.1, so it passes only 10% of that 0.7. When a node receives more than one signal, the signals are combined, so in the end, in this small example, you get 0.516 here.
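The formula on the slide is not spelled out in the talk, so the following is a minimal sketch of a resistor-like propagation rule under an explicit assumption: each node n multiplies its normalized expression v_n (in [0, 1]) by the combined incoming signal, where activating inputs combine as 1 − Π(1 − s_a) and each inhibiting input removes a fraction, giving S_n = v_n · (1 − Π_a(1 − s_a)) · Π_i(1 − s_i). Nodes without activators receive the initial signal of 1. The function and node names are illustrative, and the graph is assumed to be acyclic.

```python
from graphlib import TopologicalSorter
from math import prod


def propagate(values, activations, inhibitions=None, initial_signal=1.0):
    """Propagate a signal through an acyclic pathway graph (assumed rule).

    values:      node -> normalized expression in [0, 1] (every node must appear)
    activations: node -> list of upstream nodes that activate it
    inhibitions: node -> list of upstream nodes that inhibit it
    Returns node -> signal passed on by that node.
    """
    inhibitions = inhibitions or {}
    # graphlib expects node -> predecessors; it raises CycleError on loops.
    graph = {
        node: set(activations.get(node, ())) | set(inhibitions.get(node, ()))
        for node in values
    }
    signal = {}
    for node in TopologicalSorter(graph).static_order():
        act = [signal[p] for p in activations.get(node, ())]
        inh = [signal[p] for p in inhibitions.get(node, ())]
        # Nodes without activators receive the initial stimulus;
        # several activators combine like parallel conductors.
        incoming = 1 - prod(1 - s for s in act) if act else initial_signal
        # Each inhibitor removes the fraction of signal it carries.
        incoming *= prod(1 - s for s in inh)
        # A highly expressed gene is a low "resistance": it lets more through.
        signal[node] = values[node] * incoming
    return signal


# A chain like the one in the talk: 0.7 lets 70% pass, then 0.1 lets 10% of that pass.
chain = propagate({"g1": 0.7, "g2": 0.1}, {"g2": ["g1"]})
print(round(chain["g2"], 3))  # 0.07
```

Under this rule, two moderately active parallel routes give more signal than either alone (0.7 and 0.5 combine to 0.85 before the receiving gene's own value is applied), which is what lets a circuit stay on when one branch is lost.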
Now, you may ask: is 0.516 a high level of signal or a low level? Probably you don't know, and neither do I. It's like asking whether 0.6 is a high level of gene expression: nobody knows. These values only have meaning in the context of a comparison. An expression of 0.5 is high if in your controls it was 0.01, but if in your controls it was 0.9, it's low. So you convert these dimensionless values into other dimensionless values that you can compare with each other. So we have this type of model, and there are a couple of interesting things about it. First, these are mechanistic models. That means the transduction of the signal through the model has a consequence: there is causality here, but a causality provided by the biology. Yesterday we had a very nice talk about inferring causality; in this case we already know the causal relationships, because the model tells us how the genes relate to each other to trigger the function. Second, the model predicts quantitatively, not just qualitatively. What we are doing is transforming measurements that have no clear meaning by themselves into something that does, because these signal activities are what trigger the functions. It's a concept similar to MammaPrint: in the end it is a formula. With MammaPrint you calculate a risk; here you calculate cell activities.
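The point that a value only means something relative to a reference can be sketched in a few lines. The snippet assumes min-max scaling as the 0-to-1 normalization and a log2 ratio against the mean of the controls as the comparison; both are illustrative choices rather than the exact procedure used in the work described, and a real analysis would apply a proper statistical test across many samples.

```python
from math import log2
from statistics import mean


def min_max_scale(values):
    """Rescale a list of expression values to [0, 1] (assumed normalization)."""
    low, high = min(values), max(values)
    if high == low:
        # Hypothetical choice: a constant gene carries no contrast, put it mid-scale.
        return [0.5] * len(values)
    return [(v - low) / (high - low) for v in values]


def log2_change_vs_controls(case_value, control_values, pseudocount=1e-6):
    """Log2 ratio of a case value to the mean of the controls.

    Positive means higher than the controls, negative means lower;
    the pseudocount (an assumption) avoids dividing by zero.
    """
    return log2((case_value + pseudocount) / (mean(control_values) + pseudocount))


# The same 0.5 reads as "high" or "low" depending on the reference.
print(log2_change_vs_controls(0.5, [0.01]) > 0)  # True: much higher than controls
print(log2_change_vs_controls(0.5, [0.9]) < 0)   # True: lower than controls
```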
So we can think of these models as a way of providing high-throughput estimates of cell functionalities. Instead of doing cumbersome experiments, one function at a time, trying to figure out how a given function, metabolic or otherwise, changes in an individual or in a set of mice or other samples, you measure gene expression in one single experiment and get a profile of how all the functions are changing within the cell. And interestingly, you can map these functions to higher-level functions. For example, in cancer you can map specific pathway functionalities to specific cancer hallmarks, the functionalities that are typical of cancer cells. So let me show you the application of these models to several scenarios. Let me show you this graph. This is a function in the cell, DNA replication, which is clearly triggered in cancer. DNA replication is triggered from these two pathways, cell cycle and p53, through these four sub-pathways. You take real patient data from TCGA, The Cancer Genome Atlas, where for each patient you have expression and survival. If you compare the survival of patients with high DNA replication activity with that of patients with low DNA replication activity, you can clearly see a significant difference.
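Comparing survival between patients with high and low inferred activity of a function, as in the DNA replication example, is typically done with Kaplan-Meier curves. Here is a minimal, dependency-free sketch of the Kaplan-Meier estimator plus a median split on the inferred activity. The median threshold and the toy cohort are assumptions for illustration; a real analysis would also compute a log-rank test, for example with a survival-analysis library, rather than judging the curves by eye.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def split_by_activity(activity, times, events):
    """Split patients into high/low groups at the median inferred activity."""
    cut = median(activity)
    high = [(t, e) for a, t, e in zip(activity, times, events) if a > cut]
    low = [(t, e) for a, t, e in zip(activity, times, events) if a <= cut]
    return high, low


# Hypothetical toy cohort: inferred DNA replication activity, months, death flag.
activity = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
months = [5, 8, 12, 30, 40, 50]
died = [1, 1, 1, 1, 0, 0]
high, low = split_by_activity(activity, months, died)
for name, group in (("high", high), ("low", low)):
    t, e = zip(*group)
    print(name, kaplan_meier(t, e))
```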
People with high DNA replication activity clearly have a worse prognosis than patients with low activity, which makes sense: we are detecting in these patients an activity related to cancer. As before, DNA replication activity is a construct; we don't have measurements of DNA replication. We infer the DNA replication activity of the cell from the activity of the genes, given the way they are linked. You may think this could be just by chance. But look at other cancer hallmarks, other characteristics of cancer that can be mapped to cell activities. Apart from DNA replication, for example, cell adhesion is clearly related to metastasis: individuals with low cell adhesion activity again have a significantly worse prognosis than those with normal, higher activity. Then positive regulation of angiogenesis: people whose tumors make more blood vessels, with more angiogenic activity, again have a worse prognosis. And this is apoptosis, or rather its inhibition: negative regulation of cytochrome c release from mitochondria is a marker of inhibition of apoptosis, and people in whom we detect a clear inhibition of apoptosis again have a bad prognosis. So, just by modeling these pathways, we get a nice picture of what is going on in the cell in terms of functionality, with functions clearly related to the phenotype. As in the previous case I showed you, what is interesting here is that the association with survival is much stronger for the inferred activity of the pathway than for the activity of any of the genes composing it. Again, the system is more than the sum of its parts.
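Mapping circuit activities onto higher-level functions, such as DNA replication, angiogenesis, or other cancer hallmarks, needs an annotation of which circuits trigger which function. A minimal sketch, assuming a hypothetical annotation dictionary and summarizing each function by the mean activity of the circuits annotated to it; the mean is one simple choice, not necessarily the aggregation used in the work described.

```python
from collections import defaultdict
from statistics import mean


def function_activities(circuit_activity, annotation):
    """Summarize circuit activities into function-level activities.

    circuit_activity: circuit -> inferred activity in one sample
    annotation:       circuit -> list of function labels it triggers
    Returns function label -> mean activity of the circuits annotated to it.
    """
    grouped = defaultdict(list)
    for circuit, value in circuit_activity.items():
        for label in annotation.get(circuit, ()):
            grouped[label].append(value)
    return {label: mean(values) for label, values in grouped.items()}


# Hypothetical circuits and labels, for illustration only.
activity = {"cell_cycle_c1": 0.2, "p53_c4": 0.6, "vegf_c2": 0.9}
labels = {
    "cell_cycle_c1": ["DNA replication"],
    "p53_c4": ["DNA replication"],
    "vegf_c2": ["angiogenesis"],
}
print(function_activities(activity, labels))
```

Repeating this per patient gives one function-level profile per sample, which is what the survival comparisons above are run on.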
It makes sense that the pathway activity is the better descriptor, because what matters is DNA replication, and DNA replication can be activated in different ways. The gene with the strongest association may reflect the most prevalent way in which this pathway is activated, but not the only one. So the better descriptor of the phenotype is the activity of the pathway. Something else that is interesting is that in different cancers you see the same pattern of activation of functions, but caused by different genes: the gene activity profile that activates the same function differs. For example, this is breast cancer and this is kidney cancer. This has very important consequences for treatment in the clinic. Suppose you decide that a gene is important because it is very active in kidney cancer, and you have a drug that inhibits this gene and works in kidney cancer. If you try it in breast cancer, it will probably not work, because this gene is not especially active there; the activation happens through other paths. We will see this in a more detailed example later. You can do many other studies. For example, you can detect processes that are more important for cancer initiation. This graph shows the different stages of cancer, from initiation to more aggressive stages. For cell cycle there is an increase at the very beginning, and then it stays more or less the same, so cell cycle probably has more to do with cancer initiation. Cell division, by contrast, has to do with cancer progression, because it shows a growing trend across all the stages. So you can do many interesting studies at the level of mechanisms with this type of model. You can also use circuits for many other things, for example in the context of prediction.
Instead of building predictors on what you measure, gene activities, you can transform gene activities into circuit activities, which have a clearer meaning in terms of cell function, and use these circuit activity profiles for any type of predictor: continuous variables, case-control, etc. You train on a training set and then make predictions. Let me show you this paper in which we were able to predict drug sensitivity values from circuit activities. It may not look like a very good prediction, but biology is like this, and what you get with genes is not much better. What is interesting is that, for the features the predictor selects, you also get their importance in defining the predictor. So you are learning biology from the data: you can see what the predictor considers important for distinguishing your cases from your controls, and in that way uncover the mechanism behind the difference. Another interesting application, which we made a few months ago, tries to understand the gender-specific mechanisms of the response to cancer treatment. It is well known that in some cancers, not all of them, there are differences in treatment response between men and women. There are also known differences in gene expression between them, but the link between these gene activities and the response to drugs was not clear. So we applied this method to interpret the gene expression profiles in terms of pathway signaling activities, to see how they account for the different functional activities of the cell and how these activities relate to the cancer hallmarks.
This is a scheme of the publication; I'm not going to go into the details. On the left are the different cancer types, and you can see how, through different circuits belonging to different pathways, they account for the cancer hallmarks on the right: genome instability and mutation, angiogenesis, replicative immortality, evading growth suppressors, sustaining proliferative signaling, resisting cell death, etc. You can see that, for example, cell cycle, apoptosis, and angiogenesis are main players in this game. And this is a different representation of how different cancers are affected by gender-specific gene expression. We computed the impact of gene expression on cell signaling, then transformed cell signaling into cell functionalities, and mapped these functionalities onto the hallmarks. We see different strategies in different cancers, but, for example, sustaining proliferative signaling looks like the most important hallmark for almost all cancers, though not all of them. So this is a very nice illustration of how different genes affect different functionalities. That leads me to another important point that I briefly mentioned before: the relationship between mechanistic models and causality. I have shown you how we can use mechanistic models to understand what is going on within the cell. But we can use the models for more things, like asking: I have this system; what would happen if I change this part of it? I can recalculate the system, simulating a situation that doesn't exist, and compare it with the original situation. So I can ask: what are the effects of this intervention in terms of functionality?
And by intervention I mean many types of intervention: knockouts (switching a gene off), overexpression (switching a gene on), simulating the effect of a drug, and many other things. So we tested whether this works. For example, we used real data on the survival of cell lines. A cancer cell line constitutes a very simple system: it is immortal, it lives by itself, and you can experiment on it. To some extent it tells you what happens in the cancer, but it is not exactly a real cancer, because it is not within a body. Cell lines only survive or die; they don't have a variety of phenotypes. A cancer within you can show angiogenesis or apoptosis (apoptosis can happen in cell lines as well), so it has more functionalities, but cancer cell lines only survive or not. They did massive knockouts in different cell lines, and the results are available. So we know the original gene expression profiles of these cells, the knockouts, and the results of the knockouts. What we did was simulate these knockouts and see what happened. We discovered that there are circuits positively or negatively related to survival. We called them onco-circuits when they were positively related to survival and tumor suppressor circuits when they were negatively related. And it's not a matter of chance that many known oncogenes and tumor suppressor genes sit within the corresponding circuits: they are oncogenes because they are in onco-circuits. And since we are dealing with cell lines, most onco-circuits are related to proliferation, and tumor suppressor circuits to anti-proliferation. So it was not a surprise, but it's nice. Now, how did we validate the model?
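The labelling of circuits as onco-circuits or tumor suppressor circuits can be sketched as a correlation between simulated circuit activity and observed cell viability across knockout experiments. Pearson correlation and the 0.3 cutoff below are assumptions for illustration, not the statistics used in the study, and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no association
    return cov / (sx * sy)


def classify_circuits(simulated_activity, viability, cutoff=0.3):
    """Label circuits by how their simulated activity tracks cell viability.

    simulated_activity: circuit -> activities, one per knockout experiment
    viability:          observed viability for the same experiments, same order
    cutoff:             hypothetical minimum |r| to call a circuit related
    """
    labels = {}
    for circuit, activities in simulated_activity.items():
        r = pearson(activities, viability)
        if r >= cutoff:
            labels[circuit] = "onco-circuit"  # more activity, more survival
        elif r <= -cutoff:
            labels[circuit] = "tumor suppressor circuit"  # more activity, less survival
        else:
            labels[circuit] = "unrelated"
    return labels


# Hypothetical data: four knockout experiments, two circuits.
sim = {"prolif_c1": [0.1, 0.4, 0.6, 0.9], "antiprolif_c2": [0.9, 0.6, 0.5, 0.2]}
viab = [0.2, 0.5, 0.6, 0.9]
print(classify_circuits(sim, viab))
```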
For example, you have a circuit activity here. You know that genes one and two were clearly related to survival and gene three was not. This is because when you knock out gene one or gene two you are switching off the circuit, but in the case of gene three the signal can go through other routes, so you are not switching off the circuit, right? So we can predict the activity of the circuit, and we can predict what the effect of knocking out the rest of the genes would be. What we found is that we obtained 75% correct predictions of what would happen to the cells. When you deactivate a tumor suppressor circuit, you get higher survival, and when you deactivate an onco circuit, you get lower survival. This is what we observed. We have this software in which you can simulate gene inhibition, activation, whatever. Essentially, you upload a given condition and you can change by hand the activity value of a given gene, which is your target: you can raise or lower its activity. If you set the activity to zero, it is like an absolute knockout or absolute inhibition; if you set it to one, it is full overactivation of the gene. Then you recalculate the system and compare it with the original system, and you get something like this inhibition. This is the interface of the tool, which is now within Hipathia, right? You get an effect in this pathway through these three or four subpathways, but you also get some effects in other pathways, because the gene is shared by more than one pathway. So this is what you get. And you can simulate more: you can simulate the activity of a drug. For many drugs we know what the targets are and whether the drug inhibits or enhances them, so you can calculate the potential inhibition in this specific system.
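A drug can be simulated as a set of targets whose values are forced to zero before recalculating. This sketch assumes a simple linear circuit, for which a product-style propagation rule reduces to the product of node values; gene names, expression values and the drug are hypothetical. It shows that the same simulated drug has a large effect when its target is highly expressed and almost none when the target is already nearly off.

```python
from math import prod


def chain_activity(chain, values):
    """For a linear chain with no branching or inhibitors, a product-style
    propagation rule reduces to the product of the node values."""
    return prod(values[g] for g in chain)


def simulate_drug(values, targets, forced_value=0.0):
    """Copy of the profile with every drug target set to forced_value
    (0.0 = full inhibition; 1.0 would be full overactivation)."""
    return {g: (forced_value if g in targets else v) for g, v in values.items()}


def drug_effect(chain, values, targets):
    """Drop in circuit activity caused by the simulated drug."""
    before = chain_activity(chain, values)
    after = chain_activity(chain, simulate_drug(values, targets))
    return before - after


CHAIN = ["G1", "G2", "G3"]   # hypothetical circuit
DRUG_TARGETS = {"G2"}        # hypothetical drug acting on G2

tumor_a = {"G1": 0.9, "G2": 0.8, "G3": 0.7}   # target highly expressed
tumor_b = {"G1": 0.9, "G2": 0.05, "G3": 0.7}  # target already almost off

effect_a = drug_effect(CHAIN, tumor_a, DRUG_TARGETS)
effect_b = drug_effect(CHAIN, tumor_b, DRUG_TARGETS)
print(f"effect in tumor A: {effect_a:.3f}; in tumor B: {effect_b:.3f}")
```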
So, as I mentioned (you remember the slide where we were comparing kidney cancer with breast cancer), a drug which could be very active and very useful in one cancer might not be active in another cancer, because the configuration of the genes is different. You may be inhibiting a gene which is already inhibited, right? This is an example of knockout prediction in this small subpathway. We predicted that inhibition of UPB1, which was not present in the Achilles data, would kill the cell. So that was a completely new prediction, and our collaborators did the experiment and it did kill the cells, right? That was a model of metabolic pathways. We modeled some metabolic pathways in a similar way, almost as if they were signaling, although what we have there is a transformation of one product into another, so the concept of inhibition, for example, doesn't exist in the same way. But we could make similar predictions, right? For example, this is a view of which metabolites would be more or less active in different types of cancer. Here you have the different cancer types, and in columns you have metabolites. Some metabolites are very, very active in cancers. For example, what you have here on the right are nucleotides: cancer cells are producing nucleotides like crazy, because these are the bricks of DNA. They are reproducing, so they are producing DNA, and also lipids for the membranes. This is universal. You also have a universal lack of production of some metabolites which are more specific to normal, differentiated cells than to cancer cells, which lose that specialization.
And in different cancers, because of the peculiarities of each cancer, you have different profiles, so you can analyze them one by one, right? Again, when you look at the survival of patients according to the metabolites that were predicted to be more or less active, you see clearly different patterns of survival, right? So let me change topic a little bit and go to something more specific: we are now going to look at the problem at the level of single cells. Sequencing technologies have advanced a lot, and we can now analyze gene expression at the level of single cells. This is obviously of great interest in cancer, because we always suspected, and now we know, that there is a lot of variability between cells, right? This is an example with glioblastoma cells. When you analyze the cells in terms of their expression pattern, or the corresponding signaling activity pattern, you can see different groups, different clusters corresponding to different types of cells. You have the cancer cells, and you have cells around them: oligodendrocytes, astrocytes, regular brain cells, and other cells from the matrix, right? You cannot have a sample with complete purity; you capture some cells around the tumor. And when you analyze the cancer cells, they are not all similar; you can distinguish more or less three clusters within them. Interestingly, if you transform the signaling profiles into functionality, you see that they have completely different strategies. These two are more similar, but there is one with a completely different strategy, right? This is interesting because it means that within your cancer you have different cells. But this is not good, and you will see why: this is a simulated treatment, right?
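The talk does not name the clustering method used on the single-cell profiles. As a stand-in, this sketch applies plain k-means (Lloyd's algorithm) to hypothetical per-cell functional scores, with fixed starting centroids so the toy result is reproducible. Feature names and values are assumptions for illustration only.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, n_iter=20):
    """Plain Lloyd's k-means with user-supplied starting centroids
    (deterministic, which keeps the toy example reproducible)."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each cell goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-cell functional scores: (proliferation, apoptosis, angiogenesis).
cells = [
    (0.90, 0.10, 0.20), (0.85, 0.15, 0.25), (0.80, 0.10, 0.30),  # proliferative
    (0.20, 0.10, 0.90), (0.25, 0.20, 0.85),                      # angiogenic
    (0.10, 0.80, 0.10), (0.15, 0.90, 0.20),                      # apoptotic
]
labels = kmeans(cells, initial_centroids=[cells[0], cells[3], cells[5]])
print(labels)
```

In practice one would choose the number of clusters and the initialization with more care (for example, several random restarts); the fixed seeds here only keep the illustration deterministic.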
We used the drug that is used in glioblastoma, which is bevacizumab. When we simulate the treatment with bevacizumab, what we see is that for most of the cells you have a big effect, right? Meaning that it interrupts a lot of the pathway within the cell. But there is a small percentage of cells for which this inhibition does not have a big effect, right? Then we can go cell by cell and see what happens, and let me show you this, because it is very interesting. This is one of the relevant subpathways; actually, it is one of the subpathways in which the drug is active. This is the VEGF pathway: bevacizumab is an inhibitor of VEGF, right? So what do we find here in the non-responder cells? This is the first node, which triggers all the signaling through the VEGF pathway, and this first node is composed of many genes (I think there are more than these, but the others are less relevant), meaning that any of these genes can trigger the signal. If the signal is triggered, there is a lot of proliferation, so you have the cancer active, right? What happens is that in the cells which are responding to the treatment, the red ones, you have a lot of VEGF: VEGFA, VEGFB, right? But what happens in the cells which are not responding? There you don't have VEGFA; you barely have any VEGF at all. So you are inhibiting a gene which is not there. Instead you have PDGFD, which is another gene. So in these cells you are inhibiting a gene which is not expressed, and the cells are triggering the signal through other genes. So we have been able to dissect why a cell is not responding.
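The non-responder mechanism can be reproduced with a toy node made of alternative genes, any of which can trigger the signal. This sketch simply takes the highest member value as the node's activity (real tools may use a different summary, such as a high percentile). VEGFA, VEGFB and PDGFD are the genes named in the talk; the expression values are hypothetical.

```python
def node_value(expression, genes):
    """Activity of a node made of alternative genes: any member can trigger the
    signal, so we take the highest member (a simplifying assumption)."""
    return max(expression.get(g, 0.0) for g in genes)


def inhibit(expression, target):
    """Simulate full inhibition of one gene product."""
    treated = dict(expression)
    treated[target] = 0.0
    return treated


FIRST_NODE = ["VEGFA", "VEGFB", "PDGFD"]

# Hypothetical normalized expression in two single cells.
responder = {"VEGFA": 0.90, "VEGFB": 0.40, "PDGFD": 0.10}
non_responder = {"VEGFA": 0.00, "VEGFB": 0.05, "PDGFD": 0.80}

for name, cell in [("responder", responder), ("non-responder", non_responder)]:
    before = node_value(cell, FIRST_NODE)
    after = node_value(inhibit(cell, "VEGFA"), FIRST_NODE)
    print(f"{name}: first node {before:.2f} -> {after:.2f}")
```

In the responder the node activity falls sharply, while in the non-responder it does not move at all: the drug hits a gene that is not expressed, and PDGFD keeps the node active.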
What is the molecular mechanism by which this cell evades the response? We are using this in many different scenarios, but this is a proof of how interesting these models are and how they can be applied in real situations. Let me show you more. This is something we have not published yet, but it is on the way. We took, again, all the results of this massive experiment, Achilles, where they do massive knockouts in cell lines, and we have this proliferation score. In this version we have 128 cell lines corresponding to different tissues, and for each cell line they knocked out gene 1, gene 2, and many others, and we know what happened, right? So we have these values. What we do is relate the effect of each knockout to the corresponding cell line, and transform the gene expression of each cell line into profiles of circuit activity. So we have a big matrix with all these knockouts and about 1,000 circuits, and we are going to predict a binary value (the cell dies or not), which is quite unbalanced, because most knockouts have no effect on the cell, right? Only 10 percent of the knockouts kill the cell. You have to think that many of these genes are completely irrelevant in a cell line, because they account for functions that the cell performs in the body, but cell lines live alone, so they don't need these genes. Then we use machine learning: an ensemble of classifiers with Bayesian hyperparameter optimization that takes this imbalance into account in the parameters.
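The talk only says that the imbalance is handled "in the parameters". One common option, assumed here, is to up-weight the rare class by the ratio of negatives to positives; XGBoost, for instance, exposes this as its `scale_pos_weight` parameter, whose documentation suggests sum(negative)/sum(positive) as a typical value. The sketch computes that ratio and per-sample weights for a toy label vector with the ~10% lethal-knockout rate mentioned above.

```python
def imbalance_weights(labels):
    """Return (ratio, per-sample weights) for a binary 0/1 label vector.

    ratio = negatives / positives; giving each positive (lethal knockout) this
    weight makes both classes contribute equally to a weighted loss.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive examples")
    ratio = n_neg / n_pos
    weights = [ratio if y == 1 else 1.0 for y in labels]
    return ratio, weights


# Toy label vector: 100 lethal knockouts out of 1,000 (about 10%).
labels = [1] * 100 + [0] * 900
ratio, weights = imbalance_weights(labels)
print(ratio)
```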
To measure the accuracy, we use a leave-one-cell-line-out strategy. We call it LOCO, which means "crazy" in Spanish. We used two methods, explainable boosting machines and extreme gradient boosting; the latter is a bit more accurate, but we lose a bit of interpretability. And what we found is very good prediction. The prediction is what will happen with each gene in a new cell line, one which has not been used in training the system. What you observe, which is obvious, is that for the cases in which we have a bigger training sample we get better predictions, but they are still quite good predictions, right? This is the ROC curve, and this is the precision-recall curve, which is more descriptive of what is happening here because we have quite an imbalanced system. These are the most relevant circuits selected by the predictor: we have cell cycle and mTOR, which, I mean, was suspected, explaining most of the survival of the cell, right? And this is again a more intelligible representation of the specific system, where you can see that different cell lines have different strategies. This, for example, is breast cancer: these are specific cell lines, and here you have the number of true positives and false positives among the top genes predicted in each cell line as what they call essential genes, that is, genes that kill the cell when you knock them out, right? As you can see, the prediction is quite good, and that's actually for all the genes. There are some systematic errors, but if you take the first 10 genes, you will kill the cell almost for sure in any of the cases, right?
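Two pieces of this evaluation are easy to sketch: the leave-one-cell-line-out (LOCO) split, where every knockout row of one cell line is held out at a time, and a precision-at-k check on the top-ranked genes of a held-out line. Cell line identifiers, scores and labels below are hypothetical; the classifiers themselves are not reproduced.

```python
def loco_splits(cell_lines):
    """Leave-one-cell-line-out: yield (held_out, train_idx, test_idx), holding out
    every row (knockout) that belongs to one cell line at a time."""
    for held_out in sorted(set(cell_lines)):
        train_idx = [i for i, c in enumerate(cell_lines) if c != held_out]
        test_idx = [i for i, c in enumerate(cell_lines) if c == held_out]
        yield held_out, train_idx, test_idx


def precision_at_k(scores, labels, k):
    """Fraction of truly essential genes among the k highest-scored genes."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(labels[i] for i in top) / k


# Hypothetical rows: one row per (cell line, knocked-out gene).
rows_cell_line = ["CL1", "CL1", "CL2", "CL2", "CL3"]
for held_out, train_idx, test_idx in loco_splits(rows_cell_line):
    print(held_out, train_idx, test_idx)

# Hypothetical predicted scores and true essentiality for one held-out cell line.
scores = [0.9, 0.8, 0.1, 0.7]
truth = [1, 0, 0, 1]
print(precision_at_k(scores, truth, k=3))
```

Splitting by cell line rather than by row keeps every knockout of the test line out of training, which is what makes the score a prediction for a genuinely new cell line.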
We can use this concept not only for that, but also for studying the effect of mutations in complex diseases. You know that in the case of rare diseases there are a lot of predictors of loss of function of a gene, which are used as predictors of the pathogenicity of a mutation. This is because these diseases are highly penetrant and very dependent on only one gene. What happens in more complex diseases, as we have seen, is that one mutation cannot explain the phenotype. This is what we mentioned before: mutations are within a context, so it is not only a matter of whether the gene is broken, but of what happens to its neighbors in the metabolic pathway or in the circuit, right? So what we can do is treat mutations as perturbations, study what the effect of the perturbation would be in a complex disease, and see what happens. In that way the model explains which mutations are important in the definition of the disease and why. This is diabetes: using public data, we found that these three circuits of these three pathways, which have to do with inflammation, are very relevant in diabetes. So something we can do is to ask what happens if we simulate the mutations of diabetes in a normal tissue. We can mutate all the genes massively and see which genes have an effect when mutated. Interestingly, in this pathway, for example (this is the control), when you mutate other genes you get different patterns, but when you mutate these three specific genes you get exactly the same effect on the pathway that we observe in diabetes. And interestingly, these three genes... I mean, this is the concept that people use in the case of rare diseases.
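Treating mutations as perturbations can be sketched as an in-silico screen: knock down each gene in a normal profile, compute the induced change in circuit activities, and rank genes by how closely that change matches the disease pattern. This sketch assumes linear toy circuits (activity = product of node values) and uses cosine similarity as the matching score; both are our choices, and the circuits, genes and disease pattern are hypothetical.

```python
from math import prod, sqrt


def circuit_activities(circuits, values):
    """Toy model: each circuit is a linear chain; activity = product of node values."""
    return {name: prod(values[g] for g in genes) for name, genes in circuits.items()}


def cosine(u, v):
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 0.0 if norm == 0 else sum(a * b for a, b in zip(u, v)) / norm


def rank_mutations(circuits, normal, disease_signature, residual=0.0):
    """Simulate a loss-of-function mutation in each gene of a normal profile and
    rank genes by how closely the induced change mimics the disease signature."""
    names = sorted(circuits)
    base = circuit_activities(circuits, normal)
    target = [disease_signature[c] for c in names]
    ranking = []
    for gene in sorted(normal):
        mutated = dict(normal)
        mutated[gene] = normal[gene] * residual  # residual=0.0: complete loss of function
        after = circuit_activities(circuits, mutated)
        change = [after[c] - base[c] for c in names]
        ranking.append((gene, cosine(change, target)))
    return sorted(ranking, key=lambda item: item[1], reverse=True)


CIRCUITS = {"C1": ["G1", "G2"], "C2": ["G2", "G3"], "C3": ["G4"]}
normal_tissue = {"G1": 0.8, "G2": 0.8, "G3": 0.8, "G4": 0.8}
# Hypothetical disease pattern: C1 and C2 down, C3 unchanged.
disease = {"C1": -1.0, "C2": -1.0, "C3": 0.0}

ranking = rank_mutations(CIRCUITS, normal_tissue, disease)
for gene, score in ranking:
    print(gene, round(score, 3))
```

Only G2, which sits in both affected circuits, reproduces the whole disease pattern; genes that touch one circuit give a partial match, and G4 gives none.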
Pathogenic mutations should occur in a low percentage of the population, because rare diseases occur in a low percentage of the population; if a mutation occurs in a high proportion of the population, it is probably not a disease-causing mutation. But in the case of diabetes or other complex diseases, which are more prevalent, this is not the case. Here we find mutations that have a strong effect, very similar to the disease, but which are quite prevalent in the population. So we are discovering mutations that clearly fit the concept of complex disease, right? Let me show you what's next before entering the last part of the talk. We depend on the models, and the models depend on the pathways. Currently only one third of the genome can be modeled, and the model is represented by these arrows that connect gene to gene and define the functionalities. The generation of this knowledge is a slow process, because it requires years of laboratory work testing hypotheses on particular interactions between genes, molecules, etc. So the question is: is it possible to use machine learning to generate biological knowledge from data? This is something we are very interested in now. In the settings where causal knowledge is usually inferred from data, the scenario is completely different from ours: there you have very few variables and lots of samples, whereas in our case we have lots of variables and not so many samples, so we have a dimensionality problem in many of our scenarios. What we try to do is learn this knowledge from the data while reducing this curse of dimensionality. So the idea is not to learn everything in one shot, but to use the already known pathways to the functionalities and make new links from other proteins around them.
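A back-of-the-envelope count (our own illustration, not a figure from the talk) shows why anchoring new links on known pathways shrinks the search. With $G$ genes, an unconstrained search over directed gene-to-gene links considers $G(G-1)$ candidates; restricting it to links from $C$ candidate proteins into the $N$ nodes of already known pathways considers only $C \cdot N$. Taking roughly 20,000 human protein-coding genes:

```latex
\[
\underbrace{G(G-1)}_{\text{unconstrained}} = 20\,000 \times 19\,999 = 399\,980\,000 \approx 4\times 10^{8},
\qquad
\underbrace{C \cdot N}_{\text{anchored on known pathways}} \ll G(G-1) \quad \text{whenever } C, N \ll G .
\]
```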
In that way we reduce the number of potential causal relationships, and we can use more sophisticated models for causality, like the ones we saw yesterday, or use experimental validation. Let me show you a practical application. This is a project funded by Fundación de la UVEA, in which we did systematic drug repurposing in rare diseases. The idea was to take the disease maps of rare diseases and try to link other proteins outside the map to parts of the map. Just to mention that repurposing in rare diseases is important because companies are not going to invest in new drugs for them, so it's important that we can find these. In the end, we use all this information, gene expression across many organs: in each organ we calculate the pathway activities of the disease map, and we also have the activities of other genes, which are targets of all the drugs, and we try to predict the activity of the disease from these targets. When we did this prediction we found the most relevant targets; in the case we published recently, Fanconi anemia, for example, we found a number of targets which were evaluated later. What is interesting here is that we can use machine learning to help generate biological knowledge in what we call an industrial manner. In the case of Fanconi anemia, one of the targets was EGFR, the epidermal growth factor receptor gene, and two of the drugs that we predicted to be active were demonstrated to be active by a collaborating group.
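The talk uses machine learning to predict disease-map activity from drug-target activity; as a much simpler stand-in for that predictor, this sketch ranks hypothetical targets by the absolute Pearson correlation between their activity and the disease-map activity across samples (e.g. tissues). All names and numbers are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_targets(disease_activity, target_activity):
    """Rank drug targets by |correlation| with disease-map activity across samples
    (a simple stand-in for the machine-learning predictor described in the talk)."""
    scored = [(t, pearson(acts, disease_activity)) for t, acts in target_activity.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical activities across five tissue samples.
disease_map = [0.1, 0.3, 0.5, 0.7, 0.9]
targets = {
    "T1": [0.2, 0.4, 0.6, 0.8, 1.0],  # tracks the disease map
    "T2": [0.9, 0.8, 0.5, 0.3, 0.2],  # moves against it
    "T3": [0.5, 0.1, 0.9, 0.2, 0.6],  # mostly noise
}
ranked = rank_targets(disease_map, targets)
for target, r in ranked:
    print(target, round(r, 2))
```

Correlation only flags candidates; as in the talk, any link found this way would still need experimental validation before being read as causal.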
We applied a similar concept to COVID. We modeled the mechanisms of viral entry, immune response, etc., and we found a number of drugs, many of which were in trials. Some of them are not used anymore; for example, chloroquine is in theory effective, but in practice it affects too many pathways, so even if it is active in theory, in practice it could have a lot of secondary effects, right? Some of the drugs that were in trials were predicted, and all the more recent drugs that target the most relevant proteins here, like interleukin antagonists, JAK inhibitors, etc., were also predicted by the system. We have a version of the Hipathia model for COVID here that you can have a look at. And just my last, or almost last, slide: there is another line that we are also exploring now, interpretable neural networks. The idea is to encode biological knowledge into the neural network architecture, so that some of the layers represent real, known connections between the genes, in terms of pathway activity, regulation or protein-protein interactions, and then we can see the relationships in terms of these biologically based connections. This is the work that Pelin, our student in the ITN, is doing now; Pelin will explain it a little bit more later. This work is also being supervised by Carlos Luzer and Isabel de Pomofeno. And just to finish: with this type of modeling and this type of application, we are probably a step closer to the precision medicine era, in which we can take decisions and actions based on knowledge. All this software is available: the models can be used in a very nice way through a web interface, and there is also a Bioconductor package and a Cytoscape app.
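One common way to encode biological knowledge into a network's architecture is a masked layer: a dense layer whose weights are multiplied element-wise by a fixed gene-to-pathway membership matrix, so each hidden unit only receives input from its member genes. This is a forward-pass-only sketch under that assumption, not the architecture used in the work described; genes, pathways and weights are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def build_mask(genes, pathways):
    """mask[p][g] = 1 if gene g belongs to pathway p, else 0."""
    return [[1.0 if g in members else 0.0 for g in genes] for members in pathways.values()]


def masked_layer(x, weights, mask, bias):
    """Dense layer whose weights are multiplied by a fixed biological mask, so each
    pathway unit only 'sees' its own member genes."""
    return [
        sigmoid(sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b)
        for w_row, m_row, b in zip(weights, mask, bias)
    ]


GENES = ["G1", "G2", "G3"]
PATHWAYS = {"P1": {"G1", "G2"}, "P2": {"G3"}}  # hypothetical memberships
mask = build_mask(GENES, PATHWAYS)
weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]    # untrained, all-ones weights
bias = [0.0, 0.0]

out = masked_layer([1.0, 1.0, 5.0], weights, mask, bias)
print([round(v, 3) for v in out])
```

Because the mask is fixed, each hidden unit keeps a biological name (here P1 and P2) after training, which is what makes the model's internal activity interpretable in pathway terms.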
And well, that's more or less all, so thank you very much, and I'm happy to take any questions you have. Thank you very much, Joaquin, for this very interesting talk. Are there any questions from the consortium on Zoom? Not at the moment... Oh, there is one from Giovanni Bisoda. Hi, hello, thank you for the talk first of all, it was really interesting. What I wanted to ask is: you mentioned that there is a lot of variability between the cells within a tumor. Is there a way to leverage this variability to increase our understanding of how these pathways are affected, or is it just an obstacle to be overcome? Well, both are true. We can use it to understand how pathways are responsible for these different behaviors. At the same time, it's a real problem, as you could see. You remember the slide in which we had this different activity of the drug across cells. What typically happens is that you kill most of the cells, so you are free of cancer, at least phenotypically. Then after some months the cancer starts to develop again, you give the treatment again, and the cells are resistant, and we say: okay, the cells became resistant. No, it's not true. What happened is that the non-resistant cells, the active clones, were killed, and a minor clone, which was very inefficient and present at a very low level but was resistant, was still there. Then, without any competition, this clone spreads out, and when you treat it again, the clone is resistant. So what we should do is re-sequence the cancer and see which drug we can use against this clone. That way we would probably be able to fight the cancer more efficiently. Obviously, we don't have drugs for all the situations.
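The "the cells did not become resistant" argument can be made concrete with a toy selection model: no cell mutates, yet after one round of therapy the resistant clone dominates simply because the sensitive majority was removed. Clone sizes and the kill fraction are hypothetical, and regrowth dynamics are ignored.

```python
def treat(population, kill_fraction, sensitive_clones):
    """Apply one round of therapy: only sensitive clones lose kill_fraction of cells."""
    return {
        clone: n * (1.0 - kill_fraction) if clone in sensitive_clones else n
        for clone, n in population.items()
    }


def fractions(population):
    """Share of the tumor made up by each clone."""
    total = sum(population.values())
    return {clone: n / total for clone, n in population.items()}


# Hypothetical tumor: a dominant sensitive clone and a rare resistant clone.
tumor = {"dominant": 1e9, "minor_resistant": 1e6}
after_first = treat(tumor, kill_fraction=0.9999, sensitive_clones={"dominant"})

print(fractions(tumor)["minor_resistant"])        # rare before treatment
print(fractions(after_first)["minor_resistant"])  # dominant after treatment
```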
This is something that will happen in the future, but we still have a very nice arsenal of drugs, and we could probably manage many of these situations instead of just saying: okay, we can do nothing because the conventional treatment is not working. Maybe another treatment could work. Thank you. There are some questions on Slido. Tom wants to know: do pathway repositories get updated automatically with new publications, or are they curated by humans, or both? It depends on the repository, but typically they explore the literature automatically and then humans supervise the information they put inside. In general they are quite well curated. Okay, thank you. There is another question by Tom: is overfitting a big problem to tackle when trying to develop multigenic biomarkers? Yes, it could be a serious problem. What we try to do is reduce the dimensionality of the problem. On one hand, we try to use biological information: we typically try to exclude genes which for sure would not be related, but this is dangerous, because then you will probably never discover new things. But you can also use data to say: okay, genes that never co-express with the pathway are probably not related. This is what we did in the last example, on repurposing: we tried to see which genes could be used to predict the activity of the pathways, but in that case we restricted ourselves to genes that were already of interest to us because they were targets of drugs. So you can reduce the dimensionality in different ways, but if you want to discover from scratch, it's a problem. Okay, there's one more question, also by Tom. He says great talk, and he has another question: in GO, how is the linking between the supercategories and the subcategories done? I don't understand the question. What, in GO? In GO, how is the linking between supercategories and subcategories done?
In GO, yes. Well, there is a description of these categories and supercategories... So the question is basically how the linking between supercategories and subcategories is done? Well, it has been done by curators. In some cases you go one by one, and in some cases it doesn't make much sense, but it is always under redefinition. Anyway, they have been defined by curators. Okay, thank you. Are there any other questions, also from the consortium? It doesn't look like there are any other questions, so thanks a lot, Joaquin. You get a round of virtual applause for your talk. Thanks a lot for your talk.