Our first speaker of the day is Magnus Fontes from the Institut Roche and Qlucore. It's a great pleasure to have Magnus here. He is a PI in our network, as I mentioned before. He is an extremely successful colleague on many levels. As an academic, he held professorships at Lund University and in Copenhagen. As a founder, he founded Qlucore, a company that is part of our network. And he has held influential and important roles in industry, at Genentech and at the Institut Roche that I already mentioned, where since last year he is even the general manager of this important unit in Roche. Magnus is an expert in bioinformatics, machine learning and mathematics, in particular in the analysis and visualization of high-dimensional data. That is also one focus of Qlucore, the company that is part of our ITN. I'm really happy to have Magnus here and to learn more from him about his work and research in the next one and a half hours. So, Magnus, the floor is yours and we are excited to listen to your talk.

Thank you so much for these very, very kind words, Kerstin. I'm super happy to have this opportunity, so I thank you and all of the organizers as well as the attendees for this opportunity to present. My talk is called "Statistical learning and visualization: defining and looking at the cancer immunity state space", but actually I will take it the other way around. I will start by trying to describe what we are trying to solve within cancer immunity. I think it is an area where the skills and expertise that you in the network have, and in particular the young researchers attending, offer a great opportunity to contribute and make a real difference. So I will try to make a problem formulation and invite all of you to help out and work with us to see what we can do in this area. These are my disclosures and affiliations; Kerstin mentioned them.
I would like to actually start with the coronavirus. You can hardly open a journal or a newspaper today, or look at television, without hearing about mathematics and immunology. That is quite a change, I would say, from before the pandemic. The pandemic has of course changed a lot of things for us; I guess this is the reason why we are not together in Basel at the moment, some of us at least. But I also think it points out the importance of inference from data. We are flooded by data. As I said, you open any newspaper, like the New York Times here, or some other paper, and you get data and you get learnings from that data. I think it points to the importance of actually trying to understand the data you are looking at. Is it biased? Are there outliers in this data? And what kind of robust inference can we actually draw from the data? I just want to highlight one fellow Swede, Hans Rosling, who unfortunately passed away a couple of years ago. He was the founder of Gapminder. In my view, the main message that he wanted to pass on, and this is at least my interpretation, is exactly that you should be aware of bias in the data and of outliers, and you should try to draw robust inference and insights and really learn from the data you are looking at. So let me start with an example here. This is a study that appeared very recently in Science Advances. I was fortunate to be involved in it together with some of my former colleagues from the Institut Pasteur in Paris. I want to highlight it to show you that it is not only about drawing robust inferences from data, but also about using the data, sometimes, in slightly surprising directions. This study was made on a couple of cohorts: a cohort of melanoma patients in France who underwent different kinds of therapies for their melanoma.
They also had a healthy control cohort. The goal of the study was to see if so-called checkpoint inhibition was beneficial during COVID-19 infection. A part of these melanoma patients received checkpoint inhibition, and I will come back to exactly what that is, or they received different types of immunotherapy. And of course, some of these melanoma patients also got COVID-19. In this way we could try to infer the effects of checkpoint inhibition on COVID-19 infection. The list you have to the left here shows you the different therapies that these patients received: you see immunotherapy, anti-PD-1 only, anti-PD-1 plus anti-CTLA-4, etc. This also highlights that the therapy you receive when you have melanoma, for instance, but also many other cancer indications, has changed drastically over the last 10 years. In 2010, when Siddhartha Mukherjee's book The Emperor of All Maladies, which I guess some of you have read, was published, checkpoint inhibition therapies did not exist yet. They were not available. Actually, Siddhartha doesn't even explicitly address immunotherapy in his book. But already in 2011 the first checkpoint inhibition therapy was approved, and then just a year after, another checkpoint inhibition therapy was approved. For the work behind those approvals, Allison and Honjo jointly received the Nobel Prize in 2018. I can recommend The Breakthrough: Immunotherapy and the Race to Cure Cancer, which I thought is a nice description of what has happened over the last 10 years. It's a popular book that I think you might enjoy. So what is checkpoint inhibition? Well, when a T cell tries to recognize non-self and do something about it in our bodies, there are multiple signals that have to happen for the T cell to kill a tumor cell, or kill a cell that is infected by a virus or some other pathogen.
And actually the immune system, as I guess many of you know, is quite a complex system to study, with many feedback loops. Here you can see the antigen-presenting cells, part of the immune system. They can be dendritic cells or other antigen-presenting cells. On their surface they carry peptide-MHC complexes that basically describe the antigen that they have picked up. That antigen is recognized by a specific T cell receptor, TCR. That is one signal that has to happen. Then there are also co-stimulatory signals that have to happen for a T cell to take action. On top of this, the T cell is also covered with other transmembrane protein receptors like CTLA-4 and PD-1. Those are the so-called immune checkpoints, and the discovery of those, and of how they can put a brake on an immune response, was what led to the Nobel Prize I just mentioned. If a PD-1 or a CTLA-4 receptor on the surface of a T cell interacts with its ligand (the PD-1 ligand is PD-L1; there is also another ligand called PD-L2, etc.), if it binds, then proliferation, effector functions and survival are inhibited. So the T cell does not do its job of killing off the cell. The cell presents PD-L1 and signals: do not kill me, I'm a normal cell. This is a way that tumor cells, too, can avoid attack and killing by the immune system. Now, of course, checkpoint inhibition is just one type of immunotherapy. There are many immunotherapies, and you can go to the Cancer Research Institute website and find a lot of descriptions of the different types of therapies that are being explored. These therapies are explored in order to attack or cure cancer of different types, and cancer is of course a group of diseases that is highly diverse and very complex.
What they all have in common is that there are some mutations in somatic cells, the cancer cells, that lead to abnormal proliferation, so the cells start to divide and grow and form a tumor, whether that is in blood, as in different types of leukemia, or a solid tumor somewhere in the body. You see here a diagram: on the y-axis you have basically the number of mutations, and on the x-axis you see different types of cancers. As you can see, there are some cancers where the cancer cells are highly mutated, with many different types of mutations, and those cancers are colorectal cancer, lung cancer and melanoma. What these cancers have in common is that they occur in organs that face the external world. The colorectal system faces everything that we are eating and drinking. For the lung, everything that we breathe, whether that is smoke or fresh air, makes a huge difference. Melanoma occurs in skin, which is in direct contact with the surroundings, like sunlight. As we know, a lot of different things are highly mutagenic, and this leads to the high tumor mutational burden in colorectal cancer, lung cancer and melanoma. These are also some of the cancers where we have been most successful with immunotherapies. I would like to describe to you something called the cancer immunity cycle. This was invented by Dan Chen and Ira Mellman from Genentech, and they published on it, I think it was in 2013, yes. So what happens when a tumor starts to grow? Normally, when you have mutated cells that start to divide, the immune system in our body should recognize this, take care of it and kill off those tumor cells. And how does that happen? Well, there is a release of so-called cancer cell antigens. These are small peptides that correspond to the mutated part of a gene in the tumor cells.
So they should be recognized as non-self. These peptides are picked up by antigen-presenting cells like dendritic cells. This is step two here. These dendritic or antigen-presenting cells then travel through the lymphatic system to a lymph node, where they prime and activate T cells. T cells recognize their cognate antigen and undergo a monoclonal expansion, and then those expanded T cells travel through the blood to tumors and hopefully infiltrate the tumors. They recognize the cancer cells by their T cell receptors and they kill the cancer cells. This is what should happen, and what we want to promote if it doesn't happen. Here you have a lot of known stimulatory factors spelled out, as well as some inhibitory factors that we know about, in green and in red, respectively. Dan and Ira then published another paper a few years later where they highlighted something they called the cancer immune set point. So why does the cancer immunity cycle not become a good cycle, resulting in the killing of tumor cells by T cells? It might simply come down to the stimulatory factors minus the inhibitory factors. If you look at this formula, it is actually a formula from their article "Elements of cancer immunity and the cancer-immune set point". It says basically that the sum of the immune stimulatory factors minus the sum of the inhibitory factors has to be greater than a threshold for this cancer immunity cycle to spin around and for the tumor to be killed off by T cells. That is of course very vague. I would say that part of the work that we are trying to do now, and I say we as a global community of modelers and cancer immunologists, is to understand and make sense of something like the cancer immune set point. The way I think about this, being a mathematician, is that there is a state space.
The cancer immunity state space is a very conceptual thing, but as soon as we start to measure things, we can use those measurements to try to get some approximation or realization of this imaginary cancer immunity state space. And we measure a lot of things, and I will come back to exactly what we measure: genetic, epigenetic and environmental factors across many different scales in cancer. The concept here is that there are normally relations between all the things that we measure. Say we have measured 25,000 different genes. That makes up a 25,000-dimensional space, and of course not every point in this space can be attained by a living organism or human. There are simply certain parts of this measurement space that are viable, where you can actually sustain life. And as you all know, as soon as we have functional relations between the underlying variables, this confines the samples to a lower-dimensional space, whether that can be thought of as a manifold or as some more complex subset of the space we are measuring. Then, obviously, with a lot of the measurements we do on patients or on model organisms, etc., we try to assess the fitness or health status of the individuals, and those fitness measurements can be used to assess where in the state space you are sitting at the moment. If you study cancer and cancer immunity and you have measurements on patients, there will be domains of this space that are more prone to correspond to a healthy state, and some parts of the space that are prone to harbour more non-healthy states. The goal of treatment is always to try to push the patient from a non-healthy position into a more healthy position. And of course everything that I've been talking about so far is actually dynamical processes, while most of the measurements we have at the moment are static, and I will come back to that as well.
This is from the Cancer Research Institute website, and it is to give you an idea of the immuno-oncology development space. Here you see some of the therapies that are being pursued, and you see that there are thousands of different types of immunotherapies for treating cancer. It has grown enormously over the years, and we are still in a very quick expansion phase. You see here different clinical trial stages: phase one, phase two, phase three, approved, etc. You also see that there is a lot in preclinical, so these are the molecules and targets that are pursued in animal models, etc. Currently there are over 3000 immunotherapy combination trials ongoing, involving more than half a million patients. Being a mathematician, I look at every such trial as a perturbation experiment on the immune system, trying to understand, from perturbing the immune system, how the immune system actually works and how it interacts with a tumor. This is Vivli, a clinical research data sharing platform that hosts a lot of clinical trials. Here you can search clinical trials, request access to clinical trial data, and propose what type of research questions you would like to address by accessing this data. So this is at least an effort to make some of all this clinical trial data available for research. Another rich source of data is of course The Cancer Genome Atlas, TCGA, that you all know about. You can access all these data sets through the Genomic Data Commons Data Portal. It has a lot of valuable data and also valuable insights. You can explore the TCGA data through, for instance, the Atlas Explorer. There are also a lot of different options directly from the portal.
If you want to learn about what has been concluded so far from the TCGA data, I would maybe start by looking at the Pan-Cancer Atlas website from Cell Press, where they have selected a lot of articles that give an overview of what has been learned so far through the TCGA project. Obviously there is a lot more information outside of this, but it could be a good entry point if you are starting. So let me now give an example. I have looked a little bit at a data set that is available at the website down to the right here. You can access this data, as well as R-based analytics, and you can interact with the data. I use it just to give an example of what you can learn and how you can start to get an idea of this cancer immunity state space. This data is from a phase two trial. The specific data I'm looking at here is bulk mRNA expression from the tumor microenvironment at baseline. The treatment here is blocking PD-L1 with atezolizumab in metastatic bladder cancer. So let us have a first look at this data. Here we have 300 patients and 28,204 variables, or genes. This makes up a huge sample matrix, samples times variables. What I did here was simply a principal component analysis plot: projecting the samples down from 28,204 dimensions to three dimensions, keeping as much of the variance in the samples as possible in this projection. That is PCA. Then I have colored them according to an annotation, which is response. Response in solid tumors is normally measured by something we call RECIST 1.1, which is a protocol for deciding response. So we have complete response, partial response, stable disease and progressive disease. And this is measured simply by imaging.
So X-ray or other imaging techniques, and we basically just look at whether the tumor is growing, shrinking or disappearing. Complete response here is green, progressive disease is red, and then there is a scale in between. Maybe you can agree with me, when you look at this spinning 3D PCA, which captures 19% of the variance in the data, which is fairly okay, that there are some domains that are more favorable, where you have more green patients, so patients that actually respond to this checkpoint inhibition, and maybe some domains that contain more patients doing badly. But then you would like to filter away some of the variables that contain very little variance and see if we can get a clearer picture. So how do you filter away some of the noise in an objective way? This is a publication from 10 years ago that I wrote with Charlotte Soneson, who is currently at the Swiss Institute of Bioinformatics. You have alpha squared here, where the lambda k are the singular values and r is the rank of the sample matrix. The denominator in the alpha squared expression corresponds to the total variance in the data set. The numerator is basically how much of the variance you capture in a certain subspace corresponding to a subset of the singular values. So we look at the signal we capture relative to the total signal. Then we compared what we had in our data with the expected value you would have from random data. We started plotting this, and the amazing thing was that we got plots like what you see in the middle of the picture here, namely with a clear optimum. This means that when you start filtering away noise, the difference between what you actually capture in your data set and what you would capture in random data has a clear optimum.
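A minimal sketch of the projection score idea described above, assuming the score is the square root of the captured variance fraction for the data minus its average over randomized data. The function names, the column-wise permutation used as the "random data" model, and the variance-based filter grid are illustrative assumptions, not the published implementation.

```python
import numpy as np

def alpha(singular_values, k):
    """Square root of the fraction of total variance captured by the first k singular values."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    return np.sqrt(s2[:k].sum() / s2.sum())

def projection_score(X, k, n_random=20, seed=0):
    """X is samples x variables. Returns alpha(data) minus the mean alpha over randomized data."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)  # center each variable
    observed = alpha(np.linalg.svd(Xc, compute_uv=False), k)
    null = []
    for _ in range(n_random):
        # Assumed null model: permute every variable independently across samples,
        # which keeps each variable's distribution but destroys relations between variables.
        Xp = np.column_stack([rng.permutation(col) for col in Xc.T])
        null.append(alpha(np.linalg.svd(Xp, compute_uv=False), k))
    return observed - np.mean(null)

def best_variance_filter(X, k, n_keep_grid):
    """Keep the m most variable variables for each m in the grid; return the m with the highest score."""
    order = np.argsort(X.std(axis=0))[::-1]
    scores = {m: projection_score(X[:, order[:m]], k) for m in n_keep_grid}
    return max(scores, key=scores.get), scores
```

Scanning `n_keep_grid` over a range of filter sizes and plotting `scores` gives the kind of curve with an optimum that the talk refers to; pure noise should score close to zero.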
And actually when I looked at this projection score, which is the name we gave this, and optimized it over the IMvigor210 data, there was a clear maximum for seven dimensions. You also see on this scree plot that this makes sense just from inspecting the plot. So then I did a projection score optimization in seven dimensions. Then I created a graph connecting the two nearest neighbors in this case, and I did multidimensional scaling on the resulting distance matrix, so this corresponds to the Isomap algorithm. This is what you see here. Now maybe you can agree with me that there are some domains that contain mainly red points, so progressive or stable disease, some domains that contain more of a mixture of green and red, and there might even be some domains that mainly contain green points, so there are some responders down to the right. I have colored according to something called the Lund or TCGA taxonomy. This taxonomy is based on immunohistochemistry plus mRNA expression and gives different subpopulations of metastatic bladder cancer. As you can see, the operations we did, projection score optimization followed by Isomap embedding, pretty much capture those subgroups that are based on immunohistochemistry and mRNA expression. This is the reference for the TCGA or Lund taxonomy. What I then did with this data was to perform a rank regression with respect to response. Of course I adjusted for the false discovery rate: I put the adjusted p-value cutoff at 0.05, and that cutoff resulted in 50 genes. Then I did a PCA biplot where I can see the distinct variable clusters that are driving the signal. In the plot here you have the samples to the left, and to the right you have those 50 variables. This is a synchronized biplot. So take the samples that find themselves in the same direction as variables here.
That means that those variables are highly expressed in those samples. As you can see, I also colored the variables to the right according to the mean expression in the complete response group. And there are a few clusters of variables. I then did a database search, basically a gene set enrichment analysis. I used g:Profiler, which you can find down to the left. This is a website where you can run several different enrichment programs. What they all found was that this corresponded to cell cycle upregulation, as well as interferon gamma signaling, so type II interferon signaling, and CXCR3 chemokine receptor binding. This is what was particular about the response group. So the conclusion here is maybe that the response to checkpoint inhibition in IMvigor210 is highly correlated with CXCL9 expression. To the left you see the discriminatory genes, the top 15 ordered by q-value, or adjusted p-value, and the top association with response is CXCL9. CXCL9 is a chemokine that is induced by interferon gamma, and it recruits T cells and NK cells, etc. So it is a T cell recruitment molecule. The same is true for CXCL10, which is closely related to CXCL9, and they bind to the CXCR3 that we saw on the former slide. You also see CXCL13 here, which is more of a B cell recruitment chemokine. This is a study, also fairly recent, from February 2021, by Charles Swanton and others, where they did a meta-analysis across many different checkpoint inhibition treated cases. They found that tumor mutational burden, as well as CXCL9 and CXCL13 expression, were the strongest predictors overall of checkpoint inhibition response. So that corresponds to T and B cell chemokines.
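The adjusted p-values (q-values) used above to select and rank genes control the false discovery rate. The talk does not say which adjustment procedure was used; a minimal sketch, assuming the common Benjamini-Hochberg adjustment, could look like this. The function name is illustrative.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Genes with q below the chosen cutoff (0.05 in the talk) would be kept.
q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
selected = q < 0.05
```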
When I started digging into the data a little bit more and looked at signals that were more orthogonal in the cancer immunity state space here, I found that one of the strongest signals was an APOBEC signature. Here to the left you see again the data colored according to response. To the upper right the data is colored according to CXCL9 expression, and at the bottom right I colored according to APOBEC, so one of the APOBEC protein signals. As you see, both CXCL9 and APOBEC drive the response signal, in almost orthogonal directions when you take an optimal projection down to 2D. So what is this APOBEC? Well, it is actually a family of proteins that people have noted over the last years and highlighted as a potentially predictive marker for immunotherapy response. I can recommend this review of APOBEC, published a little over a month ago, where they point out the roles of the different APOBECs. As it says here, they orchestrate a wide array of genomic and epigenomic modifications, affecting cellular functions, and it has to do with immune editing, DNA damage response, methylation, gene expression and tissue homeostasis. So it is an interesting potential target. If I look at the overall survival in the IMvigor210 data, this is a Kaplan-Meier plot. In red you see the CXCL9 high group, with a median cutoff, and the green is CXCL9 low, so below the median. As you can see, you basically do better if you are CXCL9 high at baseline. Remember, these are bulk expression values from the tumor microenvironment at baseline. Then here I look at Kaplan-Meier plots for APOBEC3B, to the left, using a median split, and you see basically the same thing as you saw for CXCL9, the T cell attractor chemokine: you do better if you are APOBEC high.
So red is APOBEC high and green is APOBEC low, median split again. Then I took only the CXCL9 high group, this is to the right, and only for these patients I looked at APOBEC high versus APOBEC low. Here you see that you actually do even better if you are both CXCL9 high and APOBEC high. When I looked at the correlation between CXCL9 and APOBEC3B, they are fairly weakly correlated, as expected, so they really pick up two different parts of the response signal. This was just an example of data that you can access immediately and start to interact with and try to draw new conclusions from, like I did here. Of course, this was bulk mRNA expression from the tumor microenvironment. Now we are trying to do much more, and this is where I think there are a lot of opportunities for you, as a machine learning community, AI community and mathematical modeling community, to make a difference, work with cancer immunity researchers and try to better understand the cancer immunity state space. I think you can do a lot by integrating things like systems biology approaches, maybe modeling the dynamical networks between different chemokines and the crosstalk between different cells in the tumor microenvironment. We are now getting a lot of single-cell data from tumors, and we are even starting to generate longitudinal single-cell data. This will possibly allow more holistic models where we actually look at the cellular crosstalk in the tumor microenvironment as well as the dynamics of what is happening there. Obviously, we would like to predict what is happening in the tumor microenvironment from less invasive sampling through peripheral blood, so really looking at, for instance, the T cell receptor repertoire and B cell receptor repertoire in blood over time.
And from that try to infer what is happening in the tumor microenvironment, as well as picking up things like ctDNA, so tumor DNA that is circulating in the blood. Obviously we would love to have more data connecting blood, the tumor microenvironment and the lymph system. So now I will come to a presentation of a method that we are developing. This method is what we call Principal Moment Analysis, PMA. This is work together with Rasmus Henningsson. Here you see a so-called PMA plot to the right. The data is from the Gene Expression Omnibus: mouse dendritic cells that were stimulated by different stimuli. We have a series of time points, and what I did here was basically create the time trajectories. I will give an overview of what this method does, but it is openly available on this web page. If you go there you can find the paper that we put out on arXiv last year, as well as a Julia implementation and also a PMA app that you can start to apply to data. So what is this? The general statistical setup is that we are given a stochastic vector that takes values in a Hilbert space, defined on a sample space with a probability distribution, and we look at the push-forward measure of this probability measure. Then we want to understand mu, the push-forward measure, through sampling. This is very informal. It is really just giving you the flavor; you can look at the arXiv paper for the mathematical details. PMA is based simply on a singular value decomposition of what I call the sampling operator. This is the operator that you see here: you integrate over the Hilbert space, you take the position x, you have some density function u, and you have your underlying measure mu.
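Written out, the spoken description of the sampling operator corresponds to something like the following. This is a reconstruction from the description only; the symbols H for the Hilbert space and T with subscript mu for the operator are notational assumptions, and the arXiv paper is the authority on the precise definition.

```latex
% Sampling operator as described in the talk (notation assumed):
% integrate the position x against a density u with respect to the measure \mu.
\[
  T_\mu u \;=\; \int_H x \, u(x) \, d\mu(x)
\]
```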
And of course you can look at this informally defined operator as a mapping from continuous functions with compact support on the Hilbert space to the Hilbert space itself, or, under suitable conditions on the measure, as an operator from L2 on the Hilbert space with the measure mu to the Hilbert space itself. If mu happens to have finite mass and also a finite second moment, then this operator T mu is actually a compact operator. You can show that, and this is in our paper. The dual in that case is of course the Riesz mapping, mapping a point x to its corresponding continuous linear functional. Directly from estimating the norm, informally here, you see that you can have different norms. If you look at T mu as an operator from compactly supported continuous functions to the Hilbert space, you get the maximum of the expected projection under the measure. If you look at it as an operator from L2 of H with mu to H, then you get the mean value of the squared projection as the norm. And these are of course invariant under the orthogonal group. As you all know, I guess, the singular value decomposition is quite stable. This is from a nice paper by Stewart, "Perturbation theory for the singular value decomposition", from 1990. As he points out here, the basic perturbation bounds say that singular values are stable under small perturbations of the underlying operator. E here is the perturbation. You have these nice theorems by Weyl and Mirsky, telling us that small errors in measurement, etc., do not hugely impact the singular values that we are looking at.
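The stability statement can be checked numerically in the finite-dimensional case. The sketch below, with an illustrative function name and random test matrices as assumptions, compares the shift in singular values with the two classical bounds: Weyl's bound (largest shift at most the spectral norm of E) and Mirsky's bound (root sum of squared shifts at most the Frobenius norm of E).

```python
import numpy as np

def singular_value_shift(A, E):
    """Compare the change in singular values of A under a perturbation E with the Weyl and Mirsky bounds."""
    s = np.linalg.svd(A, compute_uv=False)
    s_pert = np.linalg.svd(A + E, compute_uv=False)
    diff = s_pert - s
    return {
        "weyl_lhs": np.max(np.abs(diff)),      # largest shift of any singular value
        "weyl_rhs": np.linalg.norm(E, 2),      # spectral norm of the perturbation
        "mirsky_lhs": np.sqrt(np.sum(diff ** 2)),
        "mirsky_rhs": np.linalg.norm(E, "fro"),
    }

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 30))         # stand-in for a samples x variables matrix
E = 1e-3 * rng.normal(size=A.shape)   # small measurement error
bounds = singular_value_shift(A, E)
```

In `bounds`, each left-hand side should not exceed the corresponding right-hand side, whatever the size of E.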
So the PMA framework is a framework where you can work both with the measure mu and lower-dimensional projections of mu, as well as with approximations of mu based on sampling and projections of these approximate measures. We have an implementation, as I said, with this app through our web page, where we use simplices to approximate the measure we are looking for. To assess the intrinsic local dimension, we used a dimension estimation technique from a paper a few years ago, together with Kerstin Johnsson and, again, Charlotte Soneson. It is a very fast and also very accurate dimension estimator, also based on the skewness of a simplex, and it is closely connected to this norm of the expected projection, the mean value of the expected projection. Some of the take-home messages on PMA: it is very fast, even though we approximate the underlying measure with, for instance, locally uniform Hausdorff measures on simplices, and the simplices can have different dimensions in different parts of the sample space. The number of degrees of freedom does not grow, so it is as fast as principal component analysis. It is more robust than PCA, because it is basically equivalent to locally creating an infinite number of separate samples and smearing out the distribution over these simplices. It is statistically and conceptually sound. And, very importantly, it is possible to supervise it using annotation information and expert knowledge, so this is where the integrative part comes in. You know, for instance, as I showed in the initial slide, that points are connected through time, so you can actually see those different stimulations.
Over time, you can build simplices that correspond to this time annotation and basically create an infinite number of samples, giving what you could look at as a visualization of pseudotime. There is some other work we have done, something we call SMSSVD (submatrix selection singular value decomposition), which we published a few years ago and which I actually used to pick up the APOBEC signal in the data we looked at before. This generalizes to this new setting. The projection score also generalizes to this new setting, and, as many of you have certainly realized, a kernel version of PMA also immediately comes to mind, so you can use nonlinear kernels here as well. I would just like to finish this talk by addressing some general systems immunology questions. This is from a paper by Petter Brodin from Karolinska in Stockholm, Sweden. He is a researcher working in systems immunology who has really worked with different age groups, and his conclusions are basically those four bullet points here. There is obviously much more to what he has done, but those highlight a few things that he wanted to point out. Human immune systems are relatively stable within individuals, but incredibly variable between individuals, and this is also something we have seen in the Milieu Intérieur work that we have been doing at the Institut Pasteur and that I have been fortunate to be a part of. Induced responses to pathogens differ markedly among different age groups, and they are unique to different kinds of stimuli. Functional gene expression responses to common pathogens differ broadly across different age groups, and the immune cell composition also changes over the course of life. So these are things that we would like to understand in more detail.
We hope to do this by longitudinal sampling, and in particular by longitudinally sampling and sequencing single cells, so that we really get a picture of how different cell populations are growing and increasing over time, and of the cross talk between different cell populations, whether that is in tumor, in healthy tissue or in blood. Some general directions where I invite you to work with us and try to discover more are exactly these longitudinal sampling experiments, where I see a huge potential for systems biology approaches: really building ODE or more complex models describing the time dynamics that we see, and integrating that with statistical learning or machine learning approaches that could highlight the features that are robust, that could be predictive, and that could also provide us with new targets to help patients. So this is my last slide. About immunotherapy in cancer you can say, in a very general, informal and imprecise way, that around one third of patients in some indications do respond, and they actually get cured using different types of immunotherapy, for instance checkpoint inhibition in melanoma or in lung cancer. And these patients did not have any option before checkpoint inhibition: these are metastatic and highly aggressive cancers where patients did not really have a treatment alternative before, and now they have. Around one third of them respond and actually get cured, and this, I think, is the first time you can really say that you cure cancer. Before, we could do surgery, so we could cut away the cancer, and that was sometimes highly successful and still is a very important part of cancer treatment. You could burn the cancer by radiotherapy.
That is also a very useful technique and treatment. And you could poison the tumor with different types of chemotherapy; the basic idea there is that you hit cells that are rapidly dividing, and cancer cells are doing that, but then you get the adverse events that are connected with all parts of the body or organs where you have a rapid division of cells, like hair or skin or the intestines. But now we have immunotherapy, which actually helps our own immune system to do the job it is there to do, namely to take care of the cancer cells and kill them. So one third of the patients do respond and get cured, one third respond but then the cancer comes back, and for maybe one third it doesn't help at all. We are starting to understand what discriminates those groups, and starting to understand what we can do for the groups that do not respond. But there is still a long way to go, and I'm very convinced that, whether you want to call it the cancer immunity state space or not, really understanding the dynamics of patients under treatment in this complex cancer immunity state space will be key to finding new ways to push patients back to a healthy position. And here I am again convinced that we need to work together as a community, so computational scientists working together in close collaboration with biomedical bench scientists. At least from my perspective, I think we need a cultural change in all groups: really, computational scientists who are willing to learn and really interact with biomedical scientists, as you, just by being part of this network, show that you are, and who then collaborate and share insights as well as data. And just one last word before I finish: it is of course extremely important to realize that when you work with clinical data, it is patient data.
There are a lot of ethical and legal restrictions that we need to think very carefully about, and we need to work with patient organizations, and of course directly with patients themselves, to make sure both that we are using the data in an ethical and legally correct way, and that we actually try to make as much use of the data as possible to help patients going forward. I think that is a moral obligation for those of us who engage in this type of research. This was my last slide, and I'm super happy to take questions.
Thank you very much, Magnus. We send a round of applause to you for this great talk. Now we have time for questions, both here live in the Zoom channel and on Slido. In fact there is one on Slido, and maybe I'll start with that. The person asking the question says: thank you for the talk, could you expand a bit more on the intuition behind PMA? How do we interpret the results?
Yes. So, the basic idea is this. Say that you are working in a high-dimensional space and you are given a few sample points. Then you could ask: let's say I have two patients in space, where would my next patient fall? The best guess we can make is maybe that it will fall somewhere along the line between those patients. That could be one guess, and that would create this one-dimensional simplex. If I have three points very close together in the high-dimensional space, where would my next patient fall? Maybe in the triangle that those span, so the convex envelope of these three points. And this is basically what we do. If we have a lot of sample points in high-dimensional space, we estimate the local dimension based on this simplex skewness.
So the cool thing about the simplex skewness is that you can actually estimate a high dimension with very few points. You can use three points and just look at the expected angles that you would have between them if we had a uniformly distributed measure around them. In high dimension, obviously, they would be basically orthogonal. This is the concentration of measure principle, but based on this you can actually, from just three points, estimate and say: oh, this looks like a 17-dimensional place in space. So then we create those simplices, we put uniform Hausdorff measures of the correct dimension on the simplices, and we basically connect them to make this complex of simplices. Then, why I call it principal moment analysis is because it does exactly that: it preserves the moment, the physical moment, of this object. You project down to lower dimension and you keep as much of the moment as possible when you do so. PCA corresponds to basically just putting Dirac distributions at every sample point you have. So PMA is exactly PCA if you just use the normalized sum of Dirac distributions corresponding to the points.
Thanks for this answer. And from Giovanni Visona, who is one of our ESRs, comes the first question from our Zoom channel within the network.
Hi. First of all, thank you, it was a really fascinating talk. I have a couple of questions; one of them is maybe something that I missed, but I wanted to make sure. One of the more, let's say, relevant things for PCA in applications is that it is deterministic. Is it the same for principal moment analysis, and if not, is it robust to iterations?
Yeah, so it is deterministic. But, you know, the key thing is of course how you approximate an underlying distribution with some other distribution.
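The relationship between PCA and PMA described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's implementation: it uses uncentered second moments, takes the simplices as given (the dimension estimation and simplex construction steps from the talk are skipped), and all function names are hypothetical. It relies on a standard fact: for a point drawn uniformly from a simplex with m vertices v_1, ..., v_m, the second-moment matrix is (Σ v_i v_iᵀ + s sᵀ) / (m (m + 1)) with s = Σ v_i. With m = 1 this reduces to a Dirac mass, which is the PCA case.

```python
import numpy as np


def second_moment_dirac(points):
    """Second-moment matrix of the normalized sum of Dirac masses (PCA case)."""
    points = np.asarray(points, dtype=float)
    return points.T @ points / len(points)


def second_moment_simplex(vertices):
    """Second-moment matrix of the uniform probability measure on one simplex.

    Barycentric coordinates of a uniform point are Dirichlet(1, ..., 1), so
    E[w_i^2] = 2 / (m (m + 1)) and E[w_i w_j] = 1 / (m (m + 1)) for m vertices.
    """
    v = np.asarray(vertices, dtype=float)
    m = len(v)
    s = v.sum(axis=0)
    return (v.T @ v + np.outer(s, s)) / (m * (m + 1))


def mixture_moment(simplices, weights=None):
    """Second moment of a weighted mixture of uniform measures on simplices."""
    if weights is None:
        weights = np.full(len(simplices), 1.0 / len(simplices))
    return sum(w * second_moment_simplex(s) for w, s in zip(weights, simplices))


def principal_axes(moment, n_components=2):
    """Leading eigenvalues and eigenvectors of a symmetric moment matrix."""
    vals, vecs = np.linalg.eigh(moment)
    order = np.argsort(vals)[::-1][:n_components]
    return vals[order], vecs[:, order]


# Two hypothetical "patients" in a three-dimensional feature space.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 0.0])

pca_moment = second_moment_dirac([a, b])   # mass only at the two points
pma_moment = mixture_moment([[a, b]])      # mass smeared along the segment
pca_vals, pca_axes = principal_axes(pca_moment)
pma_vals, pma_axes = principal_axes(pma_moment)
```

Nothing here is random: the moment matrix is available in closed form and the axes come from a single eigendecomposition, which is consistent with the answer that the method is deterministic once the approximating measure has been chosen.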
And then you have to have a distance metric between different distributions, and you have to have an assumption about the underlying distribution in order to do this in an objective way. So there is an underlying assumption. If you want to do it concretely, have an analytic bound and basically show robustness without any knowledge whatsoever, I would say that PCA is clearly optimal; there you have no assumptions whatsoever. But if you happen to think that you know something about your data, then you should build that knowledge into the construction of the approximate measure. This is the central thing. And I see it as highly useful when you do integrative analysis. We might have annotations coming from, you know, immunohistochemistry that tell us something about these points, and we can build that into the construction of this underlying measure. There is always an error when you do a measurement, and you can of course regularize for this with Gaussian kernels around each point etc., so there are multiple ways that you can work with this framework. I mean, it is simply a singular value decomposition of this sampling operator; it is very simple and still very, very flexible.
If I can also have a more general second question?
Yes, please.
So, I was a bit curious about the follow-up steps after you perform something like identifying these signatures, like the APOBEC signatures that you mentioned earlier. What is generally the procedure afterwards? For example, do you try to decide whether it's useful for diagnosis or treatment? Do you try to perform experiments first, or try to determine the biology? If you could just say a few words on that.
Yeah, that's also a very good question. It is basically what we call reverse translational research.
Translational research is that you do a lot of experiments with cell lines, then you maybe move to an animal model, and then ultimately, for successful targets, you do a phase one study and look at dose escalation and toxicity etc. What I described here is basically the other way around. You say that we have all these clinical data on different cohorts, and obviously, many people have said this, but the best model for human health and disease is human health and disease. So we try to learn from these clinical trial data. When we have a target, like APOBEC here, we of course then move it back to the wet lab and try to understand what inhibiting or promoting such a signal could look like, and there we do cell line experiments, we do animal experiments etc. So it's basically reverse translation, but then of course it's also going back again: if we are successful in the lab, we say that, oh, we have this antibody, for instance, with which we can block this pathway, and then we of course try to use it. Even though we see a signal, it's a long journey to come up with something that could potentially be a medication, and this is again where I think that computational scientists can help enormously, because we can do a lot in silico. We don't have to do everything in vitro or in vivo in animals; we can do a lot computationally if we have, you know, a systems biology approach coupled to this more machine-learning-oriented approach. I firmly believe so.
Thank you. And out of curiosity, what would be the approximate time scale for this long process that you mentioned, like five years, ten, twenty?
Yeah, that's about correct: five to ten years.
Thank you.
Thanks to both of you. Are there further questions from the network? If there is none at the moment, then I would like to ask a question, Magnus.
When I'm listening to talks about dimensionality reduction like yours, and it was a great talk, I'm always impressed by how mathematical the dimensionality reduction itself is. But then there's usually a step to a very manual analysis, namely the interpretation of whether the principal components that you find are biological signal, technical variation or a form of confounder. At least in my opinion as an, I would say, informed outsider, this is then a very manual or human effort: to think about the role of that principal component, whether it is a biological signal or a confounder. Of course, one can say that this is then the research that you are doing as a computational biologist or as a statistician, to think about whether this reflects biological variation or technical variation. But I often ask: wouldn't that be a prime task for machine learning, to support the user with hypotheses on whether this variation could be biological, or whether it could be a confounder, intra-patient variation, or some complicated dependence on the machine that we are not aware of yet? To close my comment: what I often see is that what a human being can think of as a confounder is often very linear, often very simple, and the real confounders may be much more complicated than that. So do you see a role for machine learning in that?
Yes. That is a brilliant comment, or question, and I completely agree. I would say that this is only a lack of maturity. The data that we need to make those predictions has only just started to be generated. And I think we are at the beginning of this AI and machine learning revolution within, you know, biomedical research, but also clinical care, because this is exactly what I think will happen.
And this gene set enrichment analysis that I performed, you know, manually, should of course be done in an automatic fashion and optimized for biologically relevant signal. We are thinking about those things. I would basically say it's just a lack of maturity, and we need you. We need your network and similarly spirited researchers to work with us over time to create those expert systems. That said, I think there is always room for interpretation, for a doctor to interact with a patient and discuss treatment. And here I think that things like visualization of data and trajectories etc. will be key. We need to have pictures or conceptual models that we can easily understand in order to take decisions, because obviously the end points of a lot of the research we do are now overall survival, objective response rate etc. But for a patient, it's also quality of life. So, yes, you prolong life by so much, but what does that life look like? How do you live your life? It's all very well when we can actually cure; that is fantastic, sometimes we can, and we hope we should always be able to do it. But often, I think, we can just prolong life, and then quality of life is extremely important. Here it is a dialogue between the doctor and the patient, and I think that this dialogue involves conceptual models that are easy to understand and grasp. And there it might be difficult to replace that interaction and conceptualization with an expert AI system.
Thank you. Thank you for a very detailed answer. Are there further comments or questions for Magnus? Lukas Miranda, another ESR from the network, please.
Thanks a lot, Magnus, for such a wonderful talk, and also a very motivating one for working in the field.
I had a couple of questions noted down, many of which were already answered during your last slide, but there is one that remained, out of utter ignorance and curiosity. I was wondering if you know how widely applicable these immunotherapies are, and how accessible they are to different people.
Yeah, this is a great question and remark. Honestly, the first approvals came 10 years ago, so we have only had those options for a very brief time. And we, I say we, I mean a community of biomedical researchers and pharma companies across the world, are developing them at a high rate, attacking new indications, and we learn that a lot of these therapies are indication-specific; they work differently in different settings. And we do expect them to be fairly... I mean, I share this message from the Cancer Research Institute home page, that the goal here is to cure all cancers. That goal is unfortunately quite far away in the future, but I firmly believe that by different combinations of therapies, both immunotherapies and more traditional therapies, we will be able to make a huge impact for cancer patients over the next decade. I am very convinced. Some of these therapies are very expensive, and obviously this also has to be addressed. I would say that the challenge of how the world looks has been clearly exemplified during the COVID pandemic. So we need to learn from this; we need to be better as researchers, as a community and as a world, and this is not an easy thing to do. Still, when I work in science, I think that we scientists speak a universal language, and we have friends all over the world. And I think we can actually also help to build a better world. So this too, I think, is extremely important for all of us: to think about what the underlying motivation for doing things is. And for me, being in science is really that.
It's such a richness to be able to engage with people all across the world and have a common goal. It gives meaning to my life. Yeah, sorry for being very philosophical here, but I couldn't help myself.
Thank you very much.
Thank you. If there are no further questions, then we thank Magnus again for opening our summer school; that was wonderful. And now we have a break until 10:30, and then we continue with the talk by Felix Agakoff. I'm looking forward to that and to seeing you again after a short break. Thank you, Magnus.
Thank you so much. Thank you all. Bye-bye.