So now we're going to have the first lecture, by Cristian Tomasetti from City of Hope, and this set of lectures will revolve around cancer, mathematical modeling, and machine learning applications. Thank you, Chris. Thank you. Can you hear me well? It's good enough. Yeah. So, it's great to be here, and thank you for being here and listening to this; hopefully you'll find it of some interest. Just one second. Disclosures: I always have a disclosure slide. I don't know if this is the case in Italy too, but in the States you always have to disclose your disclosures. There they are. Before starting with the lecture, I wanted to introduce, one second, the place where I work: City of Hope. I was at Johns Hopkins, which is a standard academic institution, and then this January, actually in the middle of the month, I moved to City of Hope, which is a cancer center, but they also have a graduate school. So they do PhDs and all of that, but of course the focus there is on cancer. One of the reasons I went there is that we opened a new division, which is basically a mathematical division. It's based on quantitative methods for understanding cancer evolution, so how you go from a healthy tissue to cancer, as well as on cancer early detection. That is done both through liquid biopsies, so looking at liquids of our body that can inform us about what is happening in a patient, and through imaging: AI and neural network methods for detecting cancer early. And here is a little bit of the structure of the center they opened there. I'm showing this for two reasons. The first one is just to say what is cool about this center; otherwise I wouldn't have moved, I would have stayed where I was.
In my opinion, for the first time, one of these cancer centers created an entity with both a technological branch and a clinical branch under the same center. So, as you can see in the figure here, we have this division of mathematicians. By the way, for those who are interested, Sophie is here; she's on the math side, she works with me, and she leads that effort. But we also have statistics, machine learning, and AI. Working with this group, which is at the bottom, the division in the graduate school and in the research institute, on one side we have a group that develops new technologies. So literally, if, say, one of you, or one of the people working with us, or a student comes up with a new idea, the next day we can have someone, whether in a wet lab or wherever, testing that idea, which is quite cool. I don't know who else can do that. But what is very unique, I think, is that within the same center we also have the clinical side. So we can literally take patient samples and apply these methodologies right away, which makes things very easy. At major centers like, you know, Harvard or Johns Hopkins, if you want to try to do something like that, you essentially have to put together multiple principal investigators, and it's very messy. So I'm excited about the center, and I'm also saying this (by the way, the view from the window is very beautiful, I'm just noticing) because we're always looking for students and people who would like to collaborate with us. In fact, this partly originated from a conversation I had with the organizers of this conference here today. So please let us know if anyone is interested in working on these topics; I'm happy to talk to you.
And we do this with people on site, but there are also plenty of people who work off site with us. Okay, so now let's get into the lectures. I hope you'll like what I prepared. I wanted to present the three main directions I work on, but in the style I try to have in my research anyway, which is that I'm really interested in the actual problem I'm trying to solve in biology and medicine. So, when I was at your stage, and here I'm talking to the PhD students in the room, which I'm assuming there are a few of, during my PhD I learned about using mathematics to understand cancer evolution and all of that. That's when I started. And luckily for me, I realized pretty quickly that there were a lot of mathematicians going around looking for applications where they could use the particular partial differential equation they knew how to solve. They would go to the department next door, the biology department, and say, "Hey, is there anything where you have this type of situation? Because I have these equations, they're really good, I know how to solve them," and blah blah blah. I'm not saying this in any negative way. This is fine; this is absolutely fine. But it's one way to approach it. And we obviously need people who are really interested in a particular methodology, whether it's mathematical, statistical, machine learning, or artificial intelligence. That's their main focus, and the application is kind of a nice consequence of applying the methodologies they develop. I'm not that guy. So this may be very disappointing to you, depending on how you view things. At that time, when I saw that this was the reality in my field, I wanted to do the reverse. I wanted to start with the problem.
Start with a problem I thought was important. And often I ended up using tools that are very, very simple, and you will see some examples. So I'm going to be shameless in showing you that sometimes we use very basic things. But that's what I chose. Again, there is no right or wrong way to do it. But I think it's important to mention, because it's a major career decision, one we always discuss with the PhD students at Hopkins. Are you going to go for methodology? Because it takes a lot of effort to really be at the forefront of whatever is new in machine learning; you have to be very familiar with all the literature and so on. Is that what you want to do? Or do you pick an application, in which case you really have to spend a good amount of time learning the biology of that particular problem, and the math or the machine learning becomes the tool that you also have to know well enough to tackle those problems? So those are two perspectives, and as I was just saying, I'm the second type. All right. As I said, over these three days I plan to show you three directions. They are all related to cancer, because that's what I've been doing; the great majority of my work is on cancer. In the first lecture, which is today, we're going to talk about cancer etiology. Etiology is a fancy word for the causes of something, right? So: what causes cancer? And I'll show you a little machine learning and some methodologies, which we call mutational signatures, that are useful for addressing what causes cancer. Then tomorrow we'll do a little bit more math, just a little bit more. We're not going through methods in detail.
I want to show you the problems, the biological issues, how we think about them, and how we tackle them. So there we'll do mathematical modeling of cancer evolution. That's the part I work on; we just published a paper a week ago, which we're very happy about, and I'll talk a good amount about that. And then finally, in the third lecture, we're back to more machine learning, where I'll discuss something that today, I think, is a pretty important topic in cancer: liquid biopsies. So essentially blood tests, though it can be other liquids; sometimes these things can be done in urine. So liquid biopsies, and then applying some machine learning tools to detect cancer early. It's really an exploding field, and I believe, and I'll tell you a lot more on Thursday, that it can really change the outlook of cancer and how terrible this disease is today. These are, in my opinion, important problems; otherwise, of course, I wouldn't be working on them. So today we'll start with cancer etiology. And by the way, as I already told you, my students feel lost for the first few months when they start with me, because they were doing statistics or whatever it was, and it's like, "You're just having me do biology; for three months all I see is biology." I kind of throw them in the water, because for me it's a test of how serious they are about learning what the actual problem is. So here I'll tell you a little bit of the story of what happened in this research direction. In 2012 or 2013, when I started thinking about this problem (actually, these are screenshots from 2014 or 2015), you went online and Googled "what causes cancer," say because one of your relatives or friends was diagnosed with cancer.
Or a parent whose child was diagnosed with cancer goes home and checks what causes cancer. This is the answer that was given by the main sources of information, both for the general public and from organizations focused on scientists. I put Wikipedia as well as Cancer Research UK, for example. And the answer, you don't really need to read it, the answer was essentially that cancer was caused by the environment: environmental exposures, pollution, for example, or excessive sun exposure. And in the environment category, and this may be weird if you are new to the field (it was weird to me the first time I heard it), are lifestyle factors: smoking, drinking, bad diet. These are all classified under the environment category, because you can think of them as something that comes from the outside into your body. By the way, I call them E factors for brevity, so sometimes you will hear me say "the E factors." And the other source is hereditary factors, so mutations that we inherit from our parents. That's it. As you can see in this first figure from Cancer Research UK: smoking, diet, obesity (with the scale), alcohol, sun exposure, lack of exercise, and so on. Oh, by the way, I forgot to say something very important. Please stop me anytime if you have a question you'd like to ask. We can do questions at the end too, but I would much rather answer a question on the spot if you have one. I like the interaction. So, I'll give you one more example. Again, you cannot really read it, it's too small, but I'll share the PDF of all the slides so you have them. So here is an IARC monograph.
So what is IARC? IARC is the International Agency for Research on Cancer, and it's basically the cancer research organization of the World Health Organization, the WHO for cancer. That's the agency that takes care of that research; in some sense, it's the most important institution for cancer research that we have. As you can see, for every organ there is a list of carcinogens, exposures, and environmental factors. So again, the same type of picture. Here is another example: an important publication in the United States that reports data on cancer every year. I don't know if you can read it, but it says essentially that there are two factors, hereditary factors and environmental factors. So again, H and E. And not just that: they give a number, saying that environmental factors, as opposed to hereditary factors, account for an estimated 75 to 80% of cancer cases. So they also give you an estimate of what they think the proportion of the two is. And by the way, just to be clear from the beginning, it is actually well known today that hereditary factors don't account for a lot of cancer. They can be very important if present in a family, but when you look overall at the population, the share of all cancer cases that we can currently assign to a hereditary factor is relatively small; I'd estimate 5 to 10% or so. Yes? Does a hereditary factor mean something you're born with that is independent of what happens in the environment, so that even if the environment were as good as we can make it, you would get the cancer? Or is it something that you have, and then the environment hits you and you get the cancer? Fantastic question. In fact, this reminds me that I should have had one more slide, because that's exactly why you asked that question, which is: how do we get to cancer?
And now you'll get Tomasetti's version of that answer, which hopefully is somewhere close to the truth. Cancer is complicated, right, but I would say it's agreed upon today that the main engine that takes a specific tissue to cancer is mutational events. By mutations we don't necessarily mean DNA mutations; they could be epigenetic. But it's mutational events, a series of them, that take a person to cancer. I have this for tomorrow, but I probably should have introduced it today anyway: we call the mutations that matter cancer drivers, because they are driving the car, they're driving the process, as opposed to all the other mutations, which we call passengers. There are multiple pathways that can take you to cancer, and depending on the cancer type, a typical rough estimate is that you need three driver events in solid tumors; in liquid tumors it's usually fewer, about two, and there are cases where one is sufficient. But that's by far the exception; I know of very few cases where one event is sufficient. So, to answer your question: if you're born with an inherited mutation and you need three hits to get to cancer, you now have the first of those three in every cell of your body. Every cell of your body contains the mutation, because it comes from your parents. In contrast, when the mutation is new, not germline, not inherited, we call it somatic, and at least at the beginning you have it in only one cell of your body. So yes, that's the answer. But keep in mind that in general you need multiple events together. And to be very clear, when they say, say, that 80% is due to the environment, what they mean is that at least one of those events was due to the environment.
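To make the "head start" of an inherited mutation concrete, here is a minimal sketch under a deliberately crude assumption that is not the lecturer's model: a single cell lineage gains a driver hit at each division independently with a fixed probability, so the number of hits is binomial. A sporadic case needs all three hits; a germline carrier already has one in every cell and needs only two. The parameter values are hypothetical and chosen only for illustration.

```python
from math import comb


def prob_at_least_k_hits(n_divisions: int, p_hit: float, k: int) -> float:
    """Probability that one cell lineage picks up >= k driver hits over
    n_divisions, assuming each division independently adds a hit with
    probability p_hit (a deliberately crude binomial model)."""
    if k <= 0:
        return 1.0  # nothing left to acquire
    p_fewer = sum(
        comb(n_divisions, j) * p_hit**j * (1 - p_hit) ** (n_divisions - j)
        for j in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical parameters, chosen only for illustration.
n, p = 1_000, 1e-3
sporadic = prob_at_least_k_hits(n, p, k=3)  # needs all three hits somatically
carrier = prob_at_least_k_hits(n, p, k=2)   # one hit inherited in every cell
print(f"sporadic: {sporadic:.3f}  carrier: {carrier:.3f}  ratio: {carrier / sporadic:.1f}x")
```

The binomial choice ignores the order of hits, clonal expansion, and the many parallel lineages in a real tissue; it only shows why removing one required hit shifts the odds so much.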
And the logic is that if I could remove that one mutation the environment caused, this person would not have cancer. There's no clean mathematical way to define this, because of course you can imagine the person still getting cancer from something else later in life. But if you stop at the very moment the third mutation hit, and one of those three was due to the environment, that person would not have cancer if the environment had not been part of the equation. Makes sense? So that's what is meant: even one. Yes? Oh, we need to turn it on. It's on. Okay. I just wanted to ask: where do you classify viral infections here? They often change the genome, but they're not inherited in that sense. Right, so viral infections: if you get an infection, that's an environmental factor, as a classification. Maybe we can leave it like this. All right, that's great. I like this; please, let's keep going this way. I love questions like this, and these are all great questions. So, by the way, I'm telling you all of this because this is how this research happened. When I was a PhD student, one thing I felt was that it's a little overwhelming: you're just learning a new field, and you wonder how you can possibly have any type of impact.
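The counterfactual definition above ("at least one of the events was due to the environment") can be written as a one-line rule over cases. This is a minimal sketch: each case is represented as a hypothetical tuple recording the origin of its three driver hits ('E' environmental, 'H' hereditary, 'R' for normal replicative mutations), and the toy cases are invented for illustration, not real data.

```python
from typing import Sequence


def fraction_with_source(cases: Sequence[Sequence[str]], source: str) -> float:
    """Fraction of cases in which at least one driver hit came from `source`.
    Under the counterfactual logic, removing that one hit would have
    prevented the cancer, so the case is counted toward `source`."""
    if not cases:
        raise ValueError("need at least one case")
    hits = sum(1 for case in cases if source in case)
    return hits / len(cases)


# Hypothetical toy cases: each tuple lists the origin of its three driver hits.
toy_cases = [("R", "R", "E"), ("R", "R", "R"), ("H", "R", "R"), ("E", "E", "R")]
print(fraction_with_source(toy_cases, "E"))  # prints: 0.5
```

Note that a single case can count toward more than one source under this rule, because it only asks whether a source contributed at least one hit.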
Or maybe only after 20 years of research. One piece of good news I would like to bring, and you may already be very well aware of it, so forgive me if you already know all this, though I often see students who don't, is that today we live at a time when a lot of science, in particular medicine and biology, has become so quantitative that in many cases the people with quantitative tools like the ones you are learning right now are the first to see or understand a new phenomenon. So I always like to say: when we think about medicine, for example, do you know where modern medicine was born? This is not me saying this; it's Oxford, actually. I don't remember the exact text, but the University of Oxford says that modern medicine was born with pathology, really; if you want to be very specific, with modern pathology. Do you know where it was born? It was born in Italy, in Padua and Bologna, where for the first time, in Renaissance times, someone decided to take a dead body and cut it open: enough with just trusting Galen's and Aristotle's ideas and hoping that the animals we observe have exactly the same organ shapes as humans. Let's actually look at the human body. And that brought a description of the organs of the human body that we had no idea about before. That's how you make progress in science, in my opinion. If I think about mathematics, a lot of it can happen in our brain; some very smart person thinks about something really great. But in medicine, for example, it's often about observing something you couldn't see before, which makes you understand a phenomenon better. So that happened in the Renaissance. Then, of course, we got microscopes.
With those, instead of just understanding the shape and trying to guess the function of an organ, we can see all the way down to a single cell. And then in the 2000s, with the human genome being sequenced, came kind of a third revolution in medicine, where we can now look inside the DNA of a single cell and infer the behavior of the cell, what the cell is doing, sometimes even dynamically, based on this information. Every time you have one of these scientific revolutions, you get a major improvement in how we understand medicine. And today a lot of medicine has become letters of DNA, or other complex objects, where you need tools like machine learning to understand the behavior of the system. It's a great time, as I'm sure you already know, to be in this field. But I'm telling this story because I think it's very important that you understand this is not a field where quantitative people are just serving the field. Let me give you one more example from where I work right now, which is basically a cancer center with a graduate school. Until a few years ago, the majority of the quantitative people were statisticians crunching the numbers for doctors. So you had a famous MD, a medical doctor, running a clinical trial, and they were told they needed a power calculation to know how many patients the trial needed. So they would call the statistician and say, "Please run this power calculation." And that was the role. That's not what medicine is today. Many times it's now the quantitative person who is informing the MD about what is actually happening, because you cannot see it with your eyes.
So again, here's how it happened in this case. I was interested in this topic and reading all of this, and I'll give you one more example, because I think it's really striking. Here is Harvard. Harvard's epidemiology department is, for the United States, the most important department of epidemiology, and they held a meeting. It's not on their website anymore; I had to use the Wayback Machine. In 1996 they essentially wrote a report on cancer prevention. They had a conference with specialist epidemiologists to gather, as it says there, the best epidemiological evidence they had on what causes cancer, and then to write a review and a consensus statement. A consensus statement means pretty much everyone agrees that this is important. A paper came out of that conference; you see a table on the right with the causes of cancer, which gives essentially the estimates for the various causes. Now, when you look at it, first of all it's again what I was telling you: all environmental and lifestyle factors, plus something hereditary. But when you look carefully, they have this statement: nearly two thirds of cancers in the United States can be linked to just the first three lines in that table. When a normal person reads this, they say, "Wow, so if I don't smoke, I'm not obese, and I exercise, I shouldn't be in the two thirds of the population that gets cancer." In fact, they even had a ten commandments page: basically, do these ten things and you will not get cancer. What is striking, and you'll obviously appreciate this, is that when you add all those numbers together you get 200%. So obviously this cannot be true, right, because then I'm not allowed to be obese and smoke at the same time. But look at the conclusion.
This is the conclusion of the consensus statement, from the top scientists in the world on cancer etiology: "One of the most important conclusions to be drawn from this report is that cancer is indeed a preventable illness." All right. I hope I've convinced you that this was what everyone thought, again, all the way up to 2014 or 2015. Now, there were some interesting contradictions to what I just told you. For example, Siddhartha Mukherjee, who won a Pulitzer Prize: I recommend this beautiful documentary if you haven't watched it. There is his book on cancer, The Emperor of All Maladies. It's a very big book; few people actually read it, and I haven't read it. But it's informative, and they made a six-hour documentary from it. He says something in that documentary, which came out literally a few months before the paper I'll show you. He says, very honestly, that we have no idea of the cause of pretty much two thirds of cancer cases. In fact, the documentary uses a crab, which is a symbol for cancer, and shades two thirds of that crab to indicate that for two thirds we don't really know the causes. And what was interesting is that Cancer Research UK in 2014, for example, was saying that 42% is due to the environment. And I was like, wait a second: on one side, the numbers say 42% is due to the environment and maybe 5 to 10% is inherited. But on the other side, I read statements that sound like it's obviously all the environment. I couldn't put the two things together, so I became very interested in understanding what was, to me, a contradiction. I thought something was missing. And I'll talk about this tomorrow, so I won't go into it much here.
I had done some modeling work in which there was evidence, convincing to me and to those who accepted the paper for publication, that our cells normally accumulate a lot of mutations. And that was kind of unexpected. Let me just give you a picture of the idea. Before that paper, the idea in the field was that, if you think about time, the green triangle, under the other two, which is the accumulation of these normal mutations over time, is very small. So the idea was that when I look at the cancer in a patient, do the sequencing, and find 100 mutations, almost all of those mutations were caused by the cancer process. I'll tell you more about this tomorrow, but what I thought I found evidence for was that a lot of those 100 mutations we found in that patient's cancer would have been there even without the cancer. So the picture was reversed, and at that time this was a bit surprising and unexpected. What it meant to me is that this background, as I call it, matters: the fact that when cells divide normally, they accumulate mutations. Why? Because the cell is a living system, and nothing is perfect; nothing is ever error-proof. We estimate that about three to six mutations accumulate in the DNA every time a cell divides and has to copy its DNA. This is very well known. But because it's only three letters out of three billion, the thought was: so what, is this relevant for cancer? And I thought, well, it might be. This is a figure I actually use to explain the concept. I probably don't need all of it, but I wanted to show the lower right, because I'm always asked about that.
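The "three letters out of three billion" point is easy to check with arithmetic: three to six mutations per division over roughly three billion letters is about one to two errors per billion letters per division, tiny per copy but additive over many divisions. The sketch below assumes a constant number of new mutations per division, and the 1,000-division lineage length is hypothetical, not a measured value.

```python
GENOME_LETTERS = 3_000_000_000  # "three letters out of three billion"


def per_letter_rate(mutations_per_division: float) -> float:
    """Mutation probability per DNA letter per division."""
    return mutations_per_division / GENOME_LETTERS


def expected_mutations(divisions: int, mutations_per_division: float) -> float:
    """Expected mutations along one lineage, assuming a constant number of
    new mutations at every division (a simplification)."""
    return divisions * mutations_per_division


for m in (3, 6):
    # 1,000 divisions is a hypothetical lineage length, not a measured value.
    print(m, per_letter_rate(m), expected_mutations(1_000, m))
```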
So imagine that you can be born with or without an inherited mutation, which is the blue in the figure at the bottom right. Then over time we accumulate these mutations with regularity. They are normal; it's not because we're smoking, they just accumulate normally. And of course, if we expose our bodies to environmental exposures, we accumulate more, which are the green dots. I call these normal replicative mutations the R factor, R for replicative. So the question is: how important are these replicative mutations? Because everyone thought they were not important at all. Here I'll show you a little more of the idea, and again, look how simple, almost embarrassingly simple, this is. The idea is the following: these mutations happen when cells divide. So how many of these mutations do I get in an organ? It's mainly a function of two things: how many cells the organ has, because a big organ will have more mutations, and how often those cells divide. Tomorrow I'll show you that this formula is obviously quite wrong, but roughly, let's just consider the product of D, the division frequency of the cells, and N, the number of cells, specifically stem cells, in an organ. Those of you who know biology will understand why specifically stem cells. So, what about computing this product and looking at many, many different tissues? If these R mutations are important, I should see kind of a dose response: the more divisions and the more cells, the more cancer in a given organ. And here's another surprising thing: before this paper, if you asked any doctor why there is more colorectal cancer than bone cancer, there was no answer.
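The N × D idea can be sketched in a few lines: compute the product per tissue, take log10 (the tissues span many orders of magnitude), and rank them. If R mutations matter, tissues higher in this ranking should show higher cancer risk. The tissue names and (N, D) values below are hypothetical placeholders with illustrative orders of magnitude, not the published estimates, and the constant-rate assumption is the simplification the lecture itself flags.

```python
from math import log10


def total_stem_cell_divisions(n_stem_cells: float, divisions_per_stem_cell: float) -> float:
    """Crude proxy for a tissue's replicative (R) mutation load: N x D.
    Assumes all stem cells divide at the same constant rate."""
    return n_stem_cells * divisions_per_stem_cell


def rank_tissues(tissues: dict) -> list:
    """Tissue names sorted from largest to smallest N x D."""
    return sorted(tissues, key=lambda t: total_stem_cell_divisions(*tissues[t]), reverse=True)


# Hypothetical (N, D) pairs: illustrative orders of magnitude only, not real estimates.
tissues = {
    "tissue_B": (1e7, 1e2),
    "tissue_A": (1e8, 5e3),
    "tissue_C": (1e6, 1e1),
}
for name in rank_tissues(tissues):
    print(f"{name}: log10(N*D) = {log10(total_stem_cell_divisions(*tissues[name])):.1f}")
```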
The answer would probably have been that it has something to do with the environment. For example, lung cancer is the most common cancer type: well, because of smoking. Okay. So that's exactly what I tested. Not exactly the product I showed you there, but essentially I looked at that product. On the x axis is the value of that product, for different organs. On the y axis is the cancer risk in those organs. And this figure appeared, with a Spearman correlation of 0.8, which for a biological phenomenon I think is a very high value. I didn't put the p-value on the slide, but it was something like 10 to the minus 8. And this holds across many orders of magnitude. We also did a sensitivity analysis to show that even if some of our estimates of how many cells are in an organ were wrong, the result would stay; there's no change. By the way, who can tell me why? So, one of the criticisms immediately after this paper came out was: "But you're using estimates of how many cells there are in a tissue and how often they divide. How sure are you that they are correct? Depending on that, your result may be totally wrong." Well, they didn't read the supplementary file, where we show with a sensitivity analysis that this was not the case. We allowed each point to move two orders of magnitude to the left or to the right on the x axis, and the result stays. Why is that? I don't think you even need to do that analysis to know it will be the case. That was the whole point of this analysis: you put a lot of points together. First of all, these estimates are on a log scale, so as long as you get the right order of magnitude, you're probably in pretty good shape.
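Spearman's correlation is the Pearson correlation computed on ranks, so it only asks whether tissues with more divisions tend to have more cancer, not whether the relationship is linear. One consequence is that plotting on log axes does not change the coefficient. The sketch below implements it from scratch (tied values get the average of their ranks); the x and y values are hypothetical and only illustrate the mechanics.

```python
from math import log10
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation (undefined if either input is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values spanning several orders of magnitude (not real data).
x = [1e6, 3e7, 2e8, 5e9, 1e11]
y = [2e-4, 1e-4, 3e-3, 8e-3, 4e-2]
print(spearman(x, y), spearman([log10(v) for v in x], y))  # prints: 0.9 0.9
```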
And with 30 points together, even if you're off on some of them, it's unlikely that you're wrong on all of them in the same direction, enough to cancel the true signal. That's basically the point. Here I'm not estimating any one value; I'm just trying to understand whether there is a signal across all of them together. I'm just looking for a correlation, an association. If I wanted to measure the specific risk, to do inference on one point, then yes, I'd be super sensitive to my estimates of the numbers for that tissue. But when I look at all of them together, the analysis becomes a lot more robust. Yes, please. I think they want you to use the microphone. If you push the button on top, I think it turns green. Okay. So, if one had an alternative hypothesis, for instance that these cancers were due to environmental factors, I would also expect bigger tissues to have more cancer, simply because each cell has some fixed probability of being hit by an environmental factor, and therefore the bigger the tissue, the more likely it is to get cancer, right? Again, fantastic question; you have great questions. We did an analysis in the 2017 paper, where you can find the results, in which we looked at the effect of the environment across tissues, and we showed that there is no correlation; in fact, almost a slightly inverse correlation. And you know why? Because, for example, the most environmentally affected cancer in terms of number of cases is lung cancer. In terms of the proportion of cases, it's cervical cancer: cervical cancer is a viral infection, that's it; you get HPV and you get cervical cancer. But both of those, and especially lung, are right in the middle in terms of number of cells. So the point is, environmental exposures are going to affect tissues.
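The robustness argument can be checked by simulation. This is a minimal sketch on synthetic data, not the published estimates or the paper's actual sensitivity procedure: 30 made-up tissues whose log10 risk tracks log10 divisions plus noise; then every x value is shifted independently by up to two orders of magnitude, and the correlation is recomputed many times. The key assumption is that the errors are independent across tissues; errors that all push in the same direction are exactly the case the argument says is unlikely.

```python
import random
from statistics import mean


def ranks(values):
    """0-based ranks; assumes no ties (true for continuous random data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, idx in enumerate(order):
        r[idx] = pos
    return r


def spearman_no_ties(x, y):
    """Spearman's rho via the classic 1 - 6*sum(d^2) / (n(n^2 - 1)) formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def perturbed_rhos(log_x, log_y, max_shift, trials, seed):
    """Shift every log10(x) independently by up to +/- max_shift decades
    and recompute rho on each trial."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        shifted = [v + rng.uniform(-max_shift, max_shift) for v in log_x]
        out.append(spearman_no_ties(shifted, log_y))
    return out


# Synthetic stand-in for ~30 tissues (NOT the published estimates):
# log10(divisions) spread over 7 decades, log10(risk) tracks it plus noise.
rng = random.Random(0)
log_x = [5 + 7 * i / 29 for i in range(30)]
log_y = [v + rng.gauss(0, 0.5) for v in log_x]

baseline = spearman_no_ties(log_x, log_y)
rhos = perturbed_rhos(log_x, log_y, max_shift=2.0, trials=200, seed=1)
print(f"baseline rho = {baseline:.2f}; mean rho with +/-2 decade errors = {mean(rhos):.2f}")
```

Working in log10 units is what makes "right order of magnitude" the relevant accuracy: a two-decade error is large for one tissue but small relative to a seven-decade spread across all of them.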
They don't care whether it's a big organ. Size is part of the equation, yes. But if a tissue is more internal or less exposed, unlike the intestine, for example, then you may not get as much of an effect. So the point of including all these different tissues together is also that, in a sense, we are averaging out the effects of different exposures. And even when you look at them all together, as I said, we showed it. For example, we provided two analyses to answer exactly the question you asked, and one was with Hiroshima, the atomic bomb survivors. Radiation went through them, through every organ in the same way, so if the effect scaled with the number of cells you should see it there, and that is definitely not the case. Okay. Sorry? I mean, if I read correctly, smokers have two orders of magnitude more risk. Correct. So the effect of the environment is huge. Yes, so we are not saying that the environment doesn't do anything. Absolutely not. In fact, we put that on the plot on purpose, to show that that difference was the environment. There are a few other cases. But the point is, from this analysis you cannot really infer what the role of R, the replicative factor, is. If you understand statistics, you understand that there is something important going on, but this is just an association. We're not saying that if I run a regression line, for example, it would give the true value. There is a lot of noise, and I'll show you tomorrow that the picture is more complicated than this. But basically, this says that with a Spearman correlation of 0.8, when you take its square, about two thirds of the variation on the y axis can be explained by the variation on the x axis. Two thirds is a pretty good chunk of explanation. But of course a lot is left over, and we know that the environment is fundamental.
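The "two thirds" figure is the usual squared-correlation arithmetic. Since Spearman's rho is the Pearson correlation computed on ranks, strictly speaking this describes the share of variation in the ranks of the y values explained by a linear fit on the ranks of the x values:

$$\rho_S = 0.8 \quad\Longrightarrow\quad \rho_S^{2} = 0.64 \approx \tfrac{2}{3}$$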
So, let me digress a little more on this. For today, I want to use this as a case study to show how research sometimes works. That figure became a paper in Science, and 10 days later the WHO, which, as I told you, is the most important organization in the world, came out with a press release, number 231. As far as I can tell, and I went back, although I couldn't access the earliest press releases, it was the first time in their history that they issued one against a scientific publication. When you read it, they basically strongly disagreed with the conclusions of my coauthor and me. I say it jokingly now, but at the time it wasn't much of a joke, that my career was over: I was a brand-new assistant professor, and here was the WHO saying very officially that I was clearly wrong. So what happened? I'll answer the two main criticisms, because otherwise we could stay here all day. The first one was that breast and prostate cancer were not included in the original analysis. Initially I left them out because they are affected by hormonal levels. When I looked at how many cells there are and how often they divide, I approximated by assuming a roughly constant rate through life, and tissues affected by hormonal cycles are definitely not constant. So I felt they were more complicated to model. But we did it later, in 2017, when we answered the WHO, and we showed that including breast and prostate didn't change the result. Their main criticism, though, was something else. They said that I did the analysis in the United States, and that if I had done it in another country, the results would have been very different. And why did they say that?
I'm going through this because, since we are going to spend all this time together, I hope one of the things you'll be left with is a bit more understanding of the biological issues behind this topic, and maybe even some excitement about this field. So, the reason they were saying that is the following map. This map is the poster boy of cancer epidemiology. What I mean is that anytime you go to a talk by scientists, especially from the WHO, or by epidemiologists in general (cancer epidemiologists are the ones who study what causes cancer), they will show you the world map: there's clearly enormous variation, so it must be the environment. Now, one interesting thing about this map: as you can see, the countries with the most cancer are Western countries, Australia and so on, while on the other side Africa and India are the lowest. By the way, one thing that's interesting when we talk about the environment: as I told you, by environmental factors we mean external exposures, like pollution and asbestos, and lifestyle. For external exposures, what would be your guess: is the United States more polluted than Africa, or less? What would you say? I'm talking about pollution in terms of what people actually get exposed to. When I think about Africa, I think about the savannah and a clear sky, but the reality is that in all of these countries people tend to live in very big metropolitan areas. And think, for example, about the regulations on the type of gas you can use in your car, or on what factories can release from their chimneys.
If those regulations are not strict, the people living right there are exposed to things that people living where there are regulations are not. There are indices for this. There is a very nice Yale study (you can just Google it and find the dataset, ready to be analyzed) with indices for all kinds of environmental indicators: air pollution, particulates, water quality, and so on. I looked at them, and except for one, I don't remember which, out of about four, they go the opposite way of this map. At some point I even thought it would be funny to publish a paper saying that the environment is inversely correlated with cancer, so go out and get some smog, some pollution, because it must be protective, which obviously is nonsense. By the way, this is a great example of the risk we run in doing machine learning, or in any field like that: if you don't really understand the problem, it's very easy to just grab data and come out with conclusions that are utter nonsense. Obviously, I'm not saying that pollution is not harmful. But I believe that external environmental pollution has a relatively small impact in terms of number of cancer cases; from what we know today, especially in Western societies where we have data, it is not the major source of cancer. So that's what I said, and now you understand how, still today and in many ways, cancer is seen as a disease with an environmental cause. Now I'll show you. Yes? Oh, sorry, please. I think you have to wait for the microphone. So, we saw this map the other day in another talk too, and I wonder whether this difference in cancer incidence is also because in some countries cancer is underdiagnosed. Yes. That's a perfect question.
Where did you see this map before? Last week, in the first lecture, I think. Very cool. Okay, so that's great; it's a very well-known and widely used map. To answer your question, let me show you something we are about to submit. It's a very simple thing. The map on top is the same map I just showed you; I just changed the colors because these are softer than the default. The second map, though, is data quality. By the way, sorry, let me go back one second. Basically, they said that obviously, if you do this in other countries, where cancer incidence is very different, your correlations are going to be all scrambled. Again, if you really think deeply about it, not at all. In fact, even before looking at the data I knew they were wrong. I was actually very surprised, because these are epidemiologists with a pretty good background in statistics, even though they are usually not statisticians or mathematicians. Okay, so: data quality. That's the second map. The third is just one possible normalization, which I'm not claiming is the right one, but it is one way to correct for data quality. Look: Italy is, I think, top quality, like the States and many other countries, and some countries are not. By the way, this data quality score is provided by the WHO. It's their score; I just took it and looked at the results. So now you can all understand that we are not claiming there are no differences, but you cannot just show this map and not mention what you just mentioned, which is: what about data quality? And look, I'm Italian. I was born in Trieste, it's where I did my bachelor's, and my mother is from Southern Italy.
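A toy illustration of why recording quality matters for a map like this. The countries, rates, and completeness scores below are invented, not WHO data, and dividing the observed rate by a completeness score is just one possible normalization: it assumes the score is the fraction of cases that actually get recorded.

```python
from dataclasses import dataclass


@dataclass
class Country:
    name: str
    true_rate: float     # hypothetical true incidence per 100,000
    completeness: float  # hypothetical fraction of cases recorded, 0 < c <= 1

    @property
    def observed_rate(self) -> float:
        # What a registry reports if only a fraction of cases is recorded.
        return self.true_rate * self.completeness

    @property
    def normalized_rate(self) -> float:
        # One possible correction: divide by the data-quality score.
        return self.observed_rate / self.completeness


def spread(rates):
    """Ratio of highest to lowest rate: a crude measure of map contrast."""
    return max(rates) / min(rates)


countries = [
    Country("A", 300.0, 0.95),
    Country("B", 280.0, 0.90),
    Country("C", 260.0, 0.50),
    Country("D", 250.0, 0.25),
]
raw = [c.observed_rate for c in countries]
fixed = [c.normalized_rate for c in countries]
print(f"raw spread: {spread(raw):.2f}x, normalized spread: {spread(fixed):.2f}x")
```

In this made-up example, most of the contrast in the raw map comes from recording, not from the underlying rates; with real data the true completeness is unknown, which is why any such normalization is only a rough check.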
And I love Southern Italy. But some people there would pay so that the doctor doesn't record that their father or mother died of a certain type of cancer, because they don't want their neighbors to know. I'm talking about until some years ago; things are improving everywhere. But the point is, even in countries considered to have really high-quality data there is still a lot to improve, so you can only imagine how much progress needs to be made across the world. Anyway, this is just to say... So then we went and... Yes? It's ready. Oh, it's green. Okay. Another factor that may be playing a crucial role is lifespan. In many countries life expectancy is much lower than in others, so there is less probability of accumulating a lot of mutations, let's say. Yes, so, again, great question. Epidemiologists are not mathematicians, but they do correct for age; they control for that. The problem is that when you look at the 70-to-80 age group in a country where far fewer people reach that age than in another country, you are in a sense biasing the population: in a country where everyone dies at 40, there must be something special about those who reach 70 or 80; they are not typical 70-to-80-year-olds. In any case, they do try to control for age in general, but yes, that's a good point; otherwise it would be a major factor. Okay, so we looked at the data and analyzed it. Yes, please. No, no, this is perfect: the more questions, the better the lecture. I think it's just a matter of seconds before it turns green. I'll plug it in, or maybe I'll leave it out, because I think he reset it. Still nothing on top. Okay. So it's not really a question; I'm saying that the environment is a factor, but it's not the only factor, I guess. Perfect. Yeah.
Yeah, let me say this, because when the paper came out, some of the media and some of the people who criticized my paper said it was very dangerous, because now people were going to go back to drinking and smoking. Obviously we don't want that, and honestly I don't think people are stupid. So yes, the environment has a tremendous impact on cancer. I was wondering: if the number of cell divisions, lifespan, and factors like these are so important because of replicative mutations, it should also have implications for iPSCs, where people take differentiated cells and convert them into stem cells to make organs for transplants. Those cells have gone through a lot of divisions and then become stem cells again, so the number of replicative mutations in them will be much higher. Do you think that poses a risk for iPSC-derived organ transplants? Will those organs maybe develop cancer at a higher rate? Yes. The answer to that question is yes, I think so. I don't think anyone has studied it, but you could. For example, a study I wanted to do when I was a postdoc, and then two years ago I saw a paper on it, was this: the older the father of a child, the more mutations the child has. Why? Because the germ cells that produce the sperm keep dividing, going through cycles. So 10 years later, his germ cells will have more mutations than 10 years earlier, and a child he has later will carry more germline mutations. So yes, with cell division you always have to think about how it works, and this is something we think about a lot. I'll tell you more tomorrow. Okay, so yes, and now... Yeah, so the incidence of cancer has also increased over long periods of time. That's another question. Yes.
I think a popular view is that the human lifespan has increased, so there is more sample space. But do you think this might also be a factor, because of what you said about germ cells accumulating more mutations over generations? Yes. Just to repeat your question: she's asking about the fact that, when you look at historical data, cancer incidence has gone up. Lung cancer has gone down, because people are stopping smoking, but some others have gone up. Why is that? You would think the number of cells doesn't really change much. But if you're looking at absolute numbers, this is a population that lives a lot longer than 30 years ago. And even when you control for age, if you look within the same age interval, you will see that incidence has gone up, but I would claim a major factor is that we do better screening and we record better who has cancer. Years ago, many people died of cancer without knowing it. You may have heard about a recent study, I think in Russia or some other Eastern European country, I don't remember exactly, where they looked at men, and basically every man of 80, or whatever the age was, had prostate cancer, whether it was diagnosed or not. It's just a normal process that pretty much every man goes through. So, okay, I hope I answered. So, we did that analysis in 2017 to answer those criticisms. This was actually a very nice test set, because I didn't even know these data existed. I probably imagined there were cancer registry data from other places in the world, but at that time I didn't know about them. So we got criticized, then these datasets from 69 countries came along, and we had to run the analysis for each one of them. These are 69 independent datasets, never seen before: is the result going to hold or not?
In fact, one thing I was not happy about is that when you look at the overall median value, I think the median was exactly 0.8, the same number we had in the original paper. I wanted it to be a bit different, because I thought these guys were going to think I cheated. In reality the analysis is so simple that you can do it yourself, and there is no cheating: it's their data, and it's a correlation. Okay. This is also to show you how much of a reaction you can cause with just a correlation analysis. Don't think you have to do anything particularly sophisticated to publish in Science, or to do something that has consequences in the field. It's more about the question you're asking and the answer you're finding than about the methodology. Maybe I'll skip this one in the interest of time. Well, let me see. Let me just mention one thing about this paper. Essentially, here we answered another criticism: some people said that you could assess how much of each tissue's cancer risk is due to R versus the environment by drawing a regression line through the middle. This is actually relevant for machine learning too, so I think it teaches something. By the way, I had this idea myself before publishing my own paper, but I thought it was not a good idea, and I'll tell you why. Say you run a regression line. There is clearly an important correlation, so you can run a regression line; everyone does that. Then you take the bottom tissues and create an envelope, the lower one, this red line here. Now you can say: okay, let's say R is indeed an important factor for some cancer types, and the tissues at the bottom are due to R alone. Now I measure from there up.
The proportion between the actual risk and what should have been due to R alone tells me how much is R and how much is the environment, and I do this one tissue at a time. What's the problem? The problem is that you depend very heavily on the value of each point, because the analysis relies on one point at a time. First of all, you really rely on the estimates. Second, I never thought you could use something spanning five orders of magnitude on a log scale to estimate the risk of a single tissue: there is too much noise. So, I'll go very quickly. For example, we said: let's play the opposite game. By the way, it's not simple to define what this line should be, which points you pick to define your lower line. What is the boundary? We defined a rule, and then we played the opposite game: now we say that the upper envelope is basically all due to the environment, and we go down. So if you use the lower line, you say everything is the environment; if you use the upper one, you say everything is R. Not a very conclusive way to give an answer. In fact we showed, and you can find the details in the paper published in 2018, that even if the truth were this green line, a perfect regression line, which I never believed would be the case, so that everything were due to R, the factor we discovered, and you just scramble it with some noise so that the green points become red, then when you use this lower line and measure, you always get the same answer no matter what the truth is. I put an arrow here, but it could have been anywhere.
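The simulation described above can be sketched as follows. This is a hypothetical reconstruction, not the code from the 2018 paper: it assumes log10 values for 31 tissues, a ground truth in which risk lies exactly on a line (so the true environmental share is zero), Gaussian noise with SD 0.5 in log10 units, and a lower envelope defined as the regression line shifted down to the lowest residual, which is only one simple choice of rule.

```python
import random
import statistics


def ols(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def lower_envelope_env_fraction(x, y):
    """Hypothetical 'lower envelope' attribution (the approach criticized above).

    Fit a line to log10(risk), shift it down to the lowest residual, treat
    that shifted line as the 'R-only' baseline, and call everything above it
    'environment'. Returns each tissue's share attributed to the environment.
    """
    slope, intercept = ols(x, y)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    floor = min(residuals)
    # 10 ** (floor - r) is the baseline risk divided by the observed risk.
    return [1 - 10 ** (floor - r) for r in residuals]


def simulate(n_tissues=31, noise_sd=0.5, n_sims=2000, seed=1):
    """Average 'environmental share' reported when the truth is 100% R."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_sims):
        x = [rng.uniform(6, 12) for _ in range(n_tissues)]
        # Ground truth: risk driven entirely by x, seen through noise only.
        y = [a - 11 + rng.gauss(0, noise_sd) for a in x]
        means.append(statistics.fmean(lower_envelope_env_fraction(x, y)))
    return statistics.fmean(means)


print(f"average share attributed to 'environment': {simulate():.0%}")
```

Under these assumptions the method attributes most of the risk to the environment even though, by construction, none of it is: the noise alone pushes most points above the lowest one.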
Essentially, we simulated it, I don't remember, 10,000 times: you will always conclude that about 90% is due to the environment. It's flawed by design; it's not a way to answer the question. Okay. In that paper, and with this I'll conclude this part, we also provided a proportion based on the number of mutations found in cancers: across all cancer types, how many can be attributed to hereditary factors, to environmental factors, and to R. The answer was actually quite shocking: two thirds were attributable to R. And we provided a very conservative lower bound. Again, I'm happy to talk about any of this. So, I think the conclusion of this part is that with a simple regression, not even a regression, a correlation, we brought to the attention of the field the major role that these normally occurring mutations play in cancer causation. By now, I would say, this is very well accepted. Here is something I was asked to write, reporting on research that was not mine at that point, from the Broad Institute, the biggest sequencing center in the United States, and the Sanger Institute in England: yes, our normal tissues are indeed full of mutations, and some of them are dangerous. The WHO now has a section in its book reporting on this, and together with Harvard epidemiologists we wrote a consensus paper where basically everyone agrees that there are indeed three factors that cause cancer, and that the one that had essentially been overlooked is this R factor. Okay, so let me spend the last 20 minutes on this. I promise you tomorrow I'll keep to time, because I hate going over, and I'm the last speaker, so you may hope I finish earlier rather than later. Maybe I'll have to finish some of this tomorrow, but that's fine; I think it was important to have a basic understanding of the problem.
So, what I've shown you until now assesses, through a correlation (and, in a part I didn't show you, through estimating the proportion of mutations), what is due to these three different factors. Another way to do it is using what are called mutational signatures. Today I would claim that there is almost no genomics paper with sequencing data that does not also include a mutational signature analysis; if it doesn't, reviewers will ask: have you also run a mutational signature analysis? The paper that started all of this is a Nature paper from 2013, where they came up with these mutational signatures; here you see about 21, and there are a few more in the supplement. So what are they? I love the fact that with this audience it's very easy to explain. Think about mutations in DNA. C pairs with G, and T with A. So if you're just looking at base substitutions, you can reduce all mutation types to six: C>A, C>G, C>T, T>A, T>C, and T>G, where the symbol means, for example, that you go from a C to an A. That's a point mutation: one base, one nucleotide in the genome, has changed. What is a mutational signature? A mutational signature is, if you want, a probability distribution over all possible mutation types, where, instead of just the six I told you about, you also consider the flanking bases, the ones right before and right after the mutation. So you get a triplet. Consider all possible combinations of triplets: there are 96 of them, since each of the 6 substitutions can have any of 4 bases on each side (6 × 4 × 4 = 96). So now you have 96 possibilities, and you ask what the probability of each one of them is.
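The 96 mutation types can be enumerated directly. This sketch uses the common labeling convention `5'base[REF>ALT]3'base`, writing every substitution from the pyrimidine (C or T) of the base pair; the label format is an assumption for illustration. A mutational signature is then a probability vector over these 96 labels.

```python
from itertools import product

BASES = "ACGT"
# The six substitution classes, written from the pyrimidine of the pair:
# e.g. G>A on one strand is the same event as C>T on the other strand.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def trinucleotide_contexts():
    """All 96 types: substitution x 5' flanking base x 3' flanking base."""
    return [
        f"{left}[{sub}]{right}"
        for sub in SUBSTITUTIONS
        for left, right in product(BASES, repeat=2)
    ]


contexts = trinucleotide_contexts()
print(len(contexts))   # 96
print(contexts[:4])    # ['A[C>A]A', 'A[C>A]C', 'A[C>A]G', 'A[C>A]T']
```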
The way to think about this: imagine I'm a cell about to divide, smoking is really bothering me, and I'm trying to copy the DNA. I make a mistake. What is the probability that I make a certain mistake? Is it going to be a C becoming a T, or a C becoming an A, and so on? So the idea is that a mutational process has a favorite way of producing mutations, which can be represented by a probability distribution over the space of all possible mutation types. That's the only point I think is important to understand about mutational signatures. And so the idea is: can we find what smoking likes to cause, and what alcohol likes to cause? That's why they're called mutational signatures: smoking leaves a signature of how it causes mutations in DNA. In this paper, they used non-negative matrix factorization; I don't know if you are familiar with it or not. It's very simple. The name itself basically says it. Your original matrix is the V matrix there, and it's non-negative, so the entries can be zero or positive, but not negative. In our case, think of the entries as mutation counts: how many mutations in the DNA of this person were Cs that became Ts? I count 15 of them, so 15 for that entry; then two of this other type, and so on. That's the original matrix. NMF is an unsupervised methodology. I'm assuming some of you know about it and some have never heard of it. It's a methodology where you take the original matrix and factor it into two.
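A minimal NMF sketch on a toy count matrix, assuming rows are mutation types and columns are patients. It uses the Lee and Seung multiplicative updates for squared (Frobenius) error, one standard NMF algorithm; the 2013 paper may have used a different variant, objective, or initialization. The two "true" signatures and the exposures are hypothetical, and the six-class alphabet stands in for the 96 contexts. Columns of `W` play the role of signatures and rows of `H` the intensities per patient; `k` has to be chosen by the user.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (m x n) as W (m x k) times H (k x n).

    Multiplicative updates for the squared Frobenius error:
      H <- H * (W^T V) / (W^T W H),  W <- W * (V H^T) / (W H H^T)
    Entries stay non-negative because every factor in each update is >= 0.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(m)]
    return W, H


def to_signatures(W, H):
    """Rescale so each column of W sums to 1; the product W H is unchanged."""
    k = len(W[0])
    totals = [sum(row[j] for row in W) or 1.0 for j in range(k)]
    W_sig = [[row[j] / totals[j] for j in range(k)] for row in W]
    H_exp = [[H[j][c] * totals[j] for c in range(len(H[0]))] for j in range(k)]
    return W_sig, H_exp


def frobenius_error(V, W, H):
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw)) ** 0.5


# Hypothetical signatures over C>A, C>G, C>T, T>A, T>C, T>G (columns sum to 1)
# and hypothetical exposures for five patients.
W_true = [
    [0.05, 0.60],
    [0.05, 0.10],
    [0.70, 0.10],
    [0.05, 0.10],
    [0.10, 0.05],
    [0.05, 0.05],
]
H_true = [
    [100, 50, 0, 80, 20],
    [0, 40, 90, 10, 60],
]
V = matmul(W_true, H_true)  # expected mutation counts: 6 types x 5 patients

W, H = nmf(V, k=2, iters=2000)
W_sig, H_exp = to_signatures(W, H)
rel_err = frobenius_error(V, W, H) / sum(v * v for row in V for v in row) ** 0.5
print(f"relative reconstruction error: {rel_err:.4f}")
```

On this easy toy matrix the factorization reconstructs V closely; the solution is only defined up to scaling and ordering of the signatures, which is why the columns are rescaled to probability distributions at the end.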
Depending on your field, you may have heard of the loading matrix and so on; the names depend on the field, but often the two are called features and coefficients. Essentially, you come out with two non-negative matrices. Of course there is some leftover error, like in regression, where you have some noise left, and you try to minimize that error so that the product of these two matrices is as close as possible to your original one. You can measure that error using different metrics. So that's the idea, but let me show you the intuition behind it. Let's forget the 96 types for now and, for simplicity, look at the six major ones: how many Cs became As, how many Cs became Gs, and so on, in a specific cancer patient. What NMF does is split this into two matrices: in one, the columns are signatures (I'll tell you a bit more in a second), and the other contains the intensities of those signatures. I told you what a signature is: a probability distribution, a pattern that a certain carcinogen likes to cause. The other matrix is how intense each signature is in a given patient: I see a lot of smoking in this patient, not so much alcohol; this patient is old, so I see a lot of aging. So it makes sense. This is just to provide the intuition behind the approach. So that's what they used: NMF. Now I think you understand why we looked at this before: I'm interested in what causes cancer. But I noticed that there were some problems. For example, look at liver cancer. Each rectangle here is one patient.
The height is the total number of mutations found in that patient, and the colors are the proportions of the total that are due to a specific signature. Remember that in this framework a signature is a specific mutational process: it could be smoking, it could be something else. Because it's NMF, there is no supervised learning; the data are unlabeled. I don't tell NMF that this person was a smoker. I'm just splitting, using many patients, asking for signatures, and this is what I get. So, one problem: signature 1 is aging. Everyone has some age, so that's good; that should be the case. But, for example, everyone has this light-blue color. Wait a second: so you're telling me that every patient with liver cancer was a smoker, or a secondhand smoker? That didn't seem very probable. Even less probable is MMR, which is associated with mismatch repair, the third one: everyone has it. Similarly, look at breast cancer. There are five panels here because there are so many patients in this dataset, with mutation counts spanning different orders of magnitude, that you need five. But the point is, again: age, which is the R factor, by the way, in green, is very important in breast cancer. But look at the orange, which is BRCA. We know that BRCA mutations are present in a relatively small fraction of breast cancers in women, yet NMF here says that pretty much every patient has it, in pretty good quantities. So what is the problem? NMF is a good methodology; there is nothing wrong with it. It's an approach we use too, and in a day or two I'll show you how we use it, so I'm not saying it's bad. But it has limits. For example, because it's unsupervised, you have to decide how many signatures you want.
There are approaches to decide the right number of signatures you want the algorithm to spit out, which is basically one of the dimensions of the two matrices in the product. That has to be decided by you or by some method, and there are several, but unfortunately, in my view at least, none of them does a great job when applied to these data. So what happens is this: say there are truly four signatures, but in that paper I think they have 31. Now I've just spread the signal of four signatures across 31, none of which is really a true signature. So we decided to do the following test. Here is the probability distribution I just described for signature 1, which is aging. In a sense, you could say this is the R signature, the signature of these normal processes; I call them normal replicative endogenous processes. As you can see, the big peak is C>T, funny enough my initials. There are 96 triplets, but four of them are particularly high, and those are all C>T, a C changed to a T, where the flanking base on the right, the next letter, is a G. There is a particular reason for this, but I won't go into it; those of you who know biology can imagine what it is. So this C>T followed by G is something we've known for 20 years to be the signature of aging; we knew it just from observation. So that is not something the mutational signature approach brought to the field; it was already known. By the way, at the time of publication it was just called signature 1; afterwards, by observing that older patients had more of it, the inference was that it probably has to do with aging.
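The four peak contexts of signature 1 are the C>T substitutions whose 3' neighbor is G. Using the same hypothetical `5'base[REF>ALT]3'base` labels as a convention, they can be picked out of the 96 types like this:

```python
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
CONTEXTS = [f"{l}[{s}]{r}" for s in SUBSTITUTIONS for l, r in product(BASES, repeat=2)]


def is_c_to_t_before_g(context):
    """True for N[C>T]G: a C>T substitution whose 3' flanking base is G."""
    return context[2:5] == "C>T" and context[-1] == "G"


aging_peaks = [c for c in CONTEXTS if is_c_to_t_before_g(c)]
print(aging_peaks)  # ['A[C>T]G', 'C[C>T]G', 'G[C>T]G', 'T[C>T]G']
```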
So the question is: is there any value in everything else? A signature is a probability distribution, so I have 96 features for each signature. So here's what I did: I produced random signatures. It's as if I asked my daughter to draw 31 mutational signatures. I didn't ask her, but it's the same thing: she draws 31 of these figures, however she wants. One of them is going to have more C>T than the other 30. Now I call that one the aging signature, and I can test how it performs in predicting older people, compared with the real one. These are the random signatures I'm showing you here. In this case, for example, I would pick the green one as my signature 1, because it has the peak right there. Then I ask how this one does. The performance of this completely randomly generated signature was on par with, if not slightly better than, signature 1 for predicting age. What that means is not that the signature is useless, but that, besides the fact that it has a peak at C>T, there is really no information in it: the rest is just noise. Here we are lucky, because we knew about the C>T mutations before using this machine learning approach. So it's nice to see that with NMF you can recapitulate an aging signature that, at least at the peak, looks correct. The question is: what do you do for the others, where you don't know? Is it working, and how much of the actual distribution can you trust? That formed the basis for approaching the problem in a different way, and I guess I'll stop here with my last slide. Tomorrow I'll take 10 or 15 extra minutes to finish this lecture. One point was: why should we not do supervised learning, if we can?
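The "random signatures" construction can be sketched as below. How the random signatures were actually generated is not stated in the lecture; here each one is assumed to be drawn from a symmetric Dirichlet distribution (normalized Gamma draws, with an arbitrary `alpha`), and the one with the most probability mass on the four N[C>T]G contexts is labeled the "aging" signature. Testing it against patient ages would need real data and is not shown.

```python
import random
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
CONTEXTS = [f"{l}[{s}]{r}" for s in SUBSTITUTIONS for l, r in product(BASES, repeat=2)]
CPG_C_TO_T = {c for c in CONTEXTS if c[2:5] == "C>T" and c.endswith("G")}


def random_signature(rng, alpha=0.2):
    """A random probability vector over the 96 contexts (symmetric Dirichlet).

    Sampled as normalized Gamma(alpha, 1) draws; a small alpha (an arbitrary
    choice here) gives spiky, signature-looking profiles.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in CONTEXTS]
    total = sum(draws)
    return {c: d / total for c, d in zip(CONTEXTS, draws)}


def cpg_mass(signature):
    """Probability mass a signature puts on the four N[C>T]G contexts."""
    return sum(signature[c] for c in CPG_C_TO_T)


rng = random.Random(42)
random_sigs = [random_signature(rng) for _ in range(31)]
# As in the thought experiment: call the random signature with the largest
# N[C>T]G mass the "aging" signature.
pseudo_aging = max(random_sigs, key=cpg_mass)
print(f"N[C>T]G mass of the chosen random signature: {cpg_mass(pseudo_aging):.2f}")
```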
I'll tell you how we deal with the case where we don't know, because if you don't have the labels, you cannot do supervised learning. But if we can do supervised learning, why don't we? Because I think you all agree: if I can do supervised learning, it had better beat the unsupervised approach, or there is something really wrong with what I'm doing. Okay. So I thought, let's do it in a supervised way, and also let's not assume, because that was another fundamental assumption of that work, that a mutational process causes a certain signature independently of which tissue you're looking at. So if you look at that work, signature 4 or 5, I think it's 4, is smoking. It doesn't come as "smoking in lung"; it's one smoking signature, forced to be the same everywhere. We said, could it be that even aging, or really any factor, makes different mistakes in different organs? The biology of different organs is different, who knows, right? So that's what we did, and I'll tell you tomorrow about that, which was really the machine learning part. But I guess we talked about NMF, so technically we did some machine learning today. Thank you, so I'll stop here for today. Thank you.

Thank you, Cristian. We had a lot of questions during the talk. Are there any concluding questions?

It was fascinating, thank you. I was wondering: my understanding is that to decide whether a patient has a mutation, the cancer genome is compared with the normal genome of that patient, right? Yes. But of course, the normal genome also potentially has different mutations, passengers or what have you, right? So why don't people sequence more than one normal sample to understand which are the truly random, unstable bases, so as to come up with what are genuinely tumor drivers?

Yeah, fantastic question.
And, you know, I was looking at Sophie, because this is the very paper we are working on right now. It's a very important point. In general, if you're familiar with how sequencers work, the problem is that if you do bulk sequencing, where you're taking multiple cells and making a soup, then the caller will not call things that are not at least subclonal. The reason is that the error rate of the sequencer is so high. Just to give you a sense, and I think I saw somewhere on the program a lecture on sequencing too, or it's coming, here is a rough estimate. Let's say colorectal cancer. In the exome, which is about 1% of the genome, we have, say, 100 mutations; a typical number is 50 to 200. Okay, they are real. The sequencer adds one error every 1,000 bases. The genome is about 3 billion bases, so 1% is about 30 million, which means about 30,000 errors. So now your signal has been completely obliterated. It's not that you can't take the reads and say, oh, these are reads belonging to different cells in the soup; yes, they are, and that's correct, but the sequencer throws noise on them that is orders of magnitude higher. So the only mutations you are actually going to be able to call are mutations that are common among cells, because the probability that the sequencer creates the same error by chance at the same spot in many reads is very, very low. That's how you call mutations in bulk sequencing: you look at clones and subclones, basically at the variant allele frequency. Callers can use 5%, for example, so you want to see the variant in at least 5% of the reads or more. Okay. So the problem, which is, as I said, a fantastic question, is this.
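The back-of-the-envelope argument for why bulk callers need a variant allele frequency threshold can be written out directly. The genome size (about 3 billion bases) is a standard figure; the 1-in-1,000 error rate, ~100 real mutations and 5% threshold are the lecture's numbers; the depth of 100 reads and the simple binomial error model (independent errors per read, ignoring which base is substituted) are illustrative assumptions, and `prob_at_least` is a hypothetical helper.

```python
import math

GENOME_BASES = 3_000_000_000       # approximate human genome size
EXOME_BASES = GENOME_BASES // 100  # exome is roughly 1% of the genome
BASES_PER_ERROR = 1000             # one sequencing error per 1,000 bases (lecture)
REAL_EXOME_MUTATIONS = 100         # typical colorectal exome (lecture: 50 to 200)

# Expected sequencing errors if every exome base is read once.
expected_errors = EXOME_BASES // BASES_PER_ERROR
print(f"~{expected_errors:,} errors vs ~{REAL_EXOME_MUTATIONS} real mutations")

def prob_at_least(min_alt_reads, depth, error_rate):
    """P(at least min_alt_reads of depth reads carry an error at one site).

    Binomial model: each read independently has an error at this site
    with probability error_rate.
    """
    return sum(
        math.comb(depth, j) * error_rate**j * (1 - error_rate) ** (depth - j)
        for j in range(min_alt_reads, depth + 1)
    )

DEPTH = 100                               # assumed sequencing depth at the site
MIN_ALT_READS = math.ceil(DEPTH * 5 / 100)  # 5% VAF threshold -> 5 reads
p_false = prob_at_least(MIN_ALT_READS, DEPTH, 1 / BASES_PER_ERROR)
print(f"P(random errors pass the 5% threshold at one site) = {p_false:.2e}")
```

Under these assumptions, random errors almost never pile up to 5 of 100 reads at the same site (well under one in a million), whereas a heterozygous mutation shared by every cell is expected in about half the reads.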
Okay, but what is good about cancer is that the final cell, the one that had all the mutations required to become cancer, then expands and creates your full-blown cancer. All the daughter cells of that original cell contain the mutations of the mother cell, right? So when I do bulk sequencing, in theory, all the mutations of the mother cell are clonal, present in all cells. When you sequence a cancer tissue, you are really getting a readout of the mutations present in the original cell that became cancer and created your clone. Okay. But in healthy tissue, where no one is doing this expansion for you, every cell has its own private mutations. There are approaches like single-cell sequencing, where you sequence just one cell, but unfortunately they are not there yet, in my opinion at least. But yes, the field is definitely going in that direction, because you want to know what each cell contains; you don't want to just know the clonal mutations. Many are working on this, and in fact we are not necessarily working on improving the technology, even though I'm involved in something else in that direction, but we are working on a methodology to get around this problem in some way. I think there was another question. Yes? Oh, microphone.

Yeah, so I just wanted to ask: are the mutations in the primary tumor and the secondary tumor similar, even if they are in different organs?

In the primary tumor and in the secondary tumor, after metastasis? Yes, it has definitely been shown many times that the original mutations present in the primary tumor are usually present in the metastases in other organs. Yes, of course.

Yeah, thank you.

Oh, I think there is one more. First, thanks for the great lecture.
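Why clonal expansion rescues bulk sequencing can be seen from the expected variant allele fraction (VAF). This is a simplified sketch: it assumes a diploid genome, a heterozygous point mutation and no copy-number change at the site; `expected_vaf` is a hypothetical helper, and the 1-in-10,000 cell fraction for a private mutation in normal tissue is an arbitrary illustration.

```python
def expected_vaf(cell_fraction, purity=1.0, ploidy=2, variant_copies=1):
    """Expected fraction of reads carrying a point mutation in bulk DNA.

    cell_fraction: fraction of the relevant cells that carry the mutation.
    purity: fraction of the sample made up of those cells.
    Assumes no copy-number change at the site.
    """
    return purity * cell_fraction * variant_copies / ploidy

CALLER_THRESHOLD = 0.05  # e.g. require the variant in >= 5% of reads

clonal_in_tumor = expected_vaf(cell_fraction=1.0)           # in every cancer cell
private_in_normal = expected_vaf(cell_fraction=1 / 10_000)  # one cell in 10,000

for label, vaf in [("clonal tumor", clonal_in_tumor),
                   ("private normal", private_in_normal)]:
    print(label, vaf, "called" if vaf >= CALLER_THRESHOLD else "missed")
# clonal tumor 0.5 called
# private normal 5e-05 missed
```

The same mutation process leaves a clonal tumor mutation far above any reasonable threshold and a private mutation in normal tissue far below it, which is why bulk sequencing of normal tissue sees so little.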
I have a question and I want to know your opinion about it. If I want to answer the question "how random is cancer?", does it make sense? And if yes, what is the scientific approach to answer it? Sorry, what is the scientific approach to answer it? Yes.

So, first, that's a good question. Let me be a little bit of a mathematician here. When you say random, it's very important to define what we mean by that, because, for example, when we released our Science paper, one of the things people said was, "we all know that cancer is random." And in a sense, of course, because cancer is a random phenomenon: I can be a smoker all my life and not get cancer, right? What I just showed you is not really about just being random. It's about these mutations that I call R, or replicative, mutations. So, to answer your question, I need to understand: do you mean the R factor, how much of cancer is due to these endogenous processes? Is that the question?

I want to clarify my question. We can talk about randomness in two different ways. First, there is lack of knowledge, and that's why we use randomness to formulate it. Sometimes, maybe, randomness is the nature of the phenomenon. This year I read some papers in Nature and in Science, and they ask, for example, how much of cancer mutations are random. They try to answer it, but I am really not satisfied with the answer, when we ask how random cancer driver mutations are, or how random cancer is. So back to the question: does it make sense?

I completely agree. And in fact, I lived a little bit of that even with our study: some of the media translated our study as how much of cancer is random. What they really meant, and should have said, is how much of cancer is due to these endogenous processes. That's really the question. Okay.
But in a sense, I understand it at the general population level. I think people understand what it means if you say how much of cancer is random: that it is not something you are causing to yourself, but an internal process that happens anyway. All right. But to be more precise, yes, if we're talking about how much of cancer is endogenous, then that is actually a very scientific question, and I think there are many ways to address it. The correlation I showed you was not proof, but it was definitely an association that, to me, was screaming that this plays a major role; how much, I cannot say from that analysis. Another way, which I didn't show (I just showed you the final answer), is the proportion of mutations that you assign to each cause. One way to do that is with the mutational signatures. As you can see, I can take a patient, and although this particular technique might not be particularly good, it is still doing some job, and say: this much is due to smoking, this much is due to aging. It can assign mutations to the different etiological factors. Another way to do it is, for example, by looking at mutational loads. As an example, we know that smokers on average have about three to four times the mutation rate of a healthy non-smoking individual. Okay. And there is a bit of a mathematical rule: you can take that to a power of two, right, so four squared is 16; in fact, the exponent is more like 2.5. So an increase in mutation rate of four implies an increase in lifetime risk of something like 20 to 30 times, which is exactly what you observe for smokers. So now, if I look at the population of smokers, in lung cancer I usually see about three to four, say three, times more mutations than in a healthy individual. Suppose a smoker with lung cancer came to me and asked: can you tell me what proportion of my mutations was due to smoking?
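The two mutational-load calculations in this answer are simple enough to check directly. The sketch below only evaluates the lecture's rule of thumb (lifetime-risk ratio ≈ mutation-rate ratio raised to a power between 2 and 2.5) and the population-level attribution for a threefold load; it does not derive either, and the function names are hypothetical.

```python
from fractions import Fraction

def risk_ratio(mutation_rate_ratio, exponent):
    """Lecture's rule of thumb: lifetime-risk ratio ~ mutation-rate ratio ** exponent."""
    return mutation_rate_ratio ** exponent

for rate_ratio in (3, 4):
    low, high = risk_ratio(rate_ratio, 2), risk_ratio(rate_ratio, 2.5)
    print(f"{rate_ratio}x mutation rate -> {low:.1f}x to {high:.1f}x lifetime risk")

def attributable_share(load_ratio):
    """Share of a tumor's mutations attributable to an exposure, if exposed
    tumors carry load_ratio times the baseline load (population average)."""
    load_ratio = Fraction(load_ratio)
    return (load_ratio - 1) / load_ratio

print(attributable_share(3))     # 2/3
print(Fraction(300 - 100, 300))  # same from raw counts, 300 vs 100: 2/3
```

With these exponents a fourfold mutation rate gives 16 to 32, bracketing the 20- to 30-fold lifetime risk quoted for smokers, and a threefold load attributes two thirds of a tumor's mutations to the exposure, on average.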
I'd say: if I don't sequence your DNA, I cannot tell you, but I can tell you, based on population data, that you probably have 300 mutations instead of 100. So probably two thirds of the mutations in your cancer were due to your smoking. And I think that's a scientific answer; it's population based. So those are ways in which you can answer that question.

Thanks.

Thank you very much. It's time to close, so thank you very much. We will reconvene tomorrow morning with a lecture by Gasper, and then the sessions in the afternoon. Okay. Thank you.