I did want to talk a little bit about my PhD thesis. So back to some more serious stuff. Well, it's not really serious, I just like talking about it. So we talked a lot about QTL mapping, right? And that is just the association of a genetic marker with differences in the mean of a phenotype. So you have a marker in the genome where some individuals are one, other individuals are two, and then you look to see if there is a difference in the mean of a certain phenotype between the groups. So when I was doing this presentation initially, I just had the definition: a quantitative trait locus is a section of DNA which is associated with variation in a phenotype, a quantitative trait. So that's the definition of a QTL. We already saw this slide.
So to detect a QTL in a population, we need to have measured genotypes, for example SNPs, and a phenotype of interest, such as tail length or yield or whatever you come up with. And then at each genetic marker, you just do a regression or a t-test to associate the phenotype with the genetic marker of interest. So again, we go one by one. Oh, this one goes automatically and doesn't have the little thing, right? So we go through the markers and at each marker we just ask: is the A group bigger or smaller than the B group? So that's what QTL mapping is. We do statistics to associate the marker with the phenotype and we show it as a LOD score. We already talked about that. And then the LOD scores are plotted for every marker on the chromosome; we already saw this.
And then in R/qtl, I also showed you how to load the library. So we load the library, we load the dataset, and in R/qtl you have the scanone function to do a QTL scan if you don't want to do the t-tests or the other tests yourself. And then you call the plot function on the result, and the plot function generates this. So there are some serious limitations in QTL mapping, right?
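As a brief aside before turning to those limitations, the marker-by-marker scan described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what R/qtl's scanone does internally: the function names `welch_t` and `scan_markers`, the "A"/"B" genotype coding, and the simulated data are all made up for this example, and the sketch stops at a t statistic per marker instead of converting it to a LOD score.

```python
import math
import random
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch t statistic for the difference in means of two groups."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se

def scan_markers(genotypes, phenotype):
    """Return one t statistic per marker.

    genotypes: one list per individual, each entry "A" or "B".
    phenotype: one trait value per individual.
    """
    n_markers = len(genotypes[0])
    stats = []
    for m in range(n_markers):
        # Split the individuals by their genotype at marker m.
        a = [y for g, y in zip(genotypes, phenotype) if g[m] == "A"]
        b = [y for g, y in zip(genotypes, phenotype) if g[m] == "B"]
        stats.append(welch_t(a, b))
    return stats

# Hypothetical simulated data: 100 individuals, 6 markers,
# and only the marker at index 2 shifts the phenotype mean.
random.seed(1)
genotypes = [[random.choice("AB") for _ in range(6)] for _ in range(100)]
phenotype = [random.gauss(0, 1) + (2.0 if g[2] == "A" else 0.0) for g in genotypes]

t_stats = scan_markers(genotypes, phenotype)
best_marker = max(range(6), key=lambda m: abs(t_stats[m]))
print(best_marker, [round(t, 2) for t in t_stats])
```

The marker with the largest absolute t statistic is the one where the A group and the B group differ most in their mean, which is the question asked at every marker in a QTL scan.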
One of the things that was really difficult for me to deal with when I started my PhD is that QTL mapping only considers a single phenotype at a time. So we're only looking at yield or flowering time or some other phenotype which we associate. And it requires the phenotype to show significant differences. If we want to QTL map the locus on the genome which controls the tail in mice, we can do that for tail length, because every mouse has a slightly different tail length. But every mouse has a tail. So we'll never be able to find the gene which controls whether you have a tail or not, because every mouse has a tail. The same thing holds for eyes, right? Everyone has two eyes. So doing a QTL mapping for the number of eyes, well, the number of eyes might vary, though of course not because of heritable differences, but the point is that the phenotype needs to show significant differences. And the problem is that a lot of the phenotypes which are really, really interesting do not show any difference. Every human has hair. So figuring out where the locus is that controls hair or no hair is of course very hard.
And one of the other big drawbacks of QTL mapping, and we didn't talk about this before, is that when you have two phenotypes which are highly correlated with each other, like the length of your arm versus your height, they result in very similar QTL profiles. Because the phenotype vectors are highly correlated, and because the taller you are, the longer your arms are, the loci that come up will be very similar. And of course biologically that makes sense, but for some things this is really annoying. Because generally we're interested in where differences are controlled, and we also want to know, for example, where in the genome we find the gene that controls whether you have a tail or not.
So: one phenotype at a time, there need to be significant differences, and highly correlated phenotypes result in very similar QTL profiles, which just means that you cannot distinguish loci that make you grow bigger from loci which make your arms grow longer. So that's just a drawback.
So imagine two phenotypes which are very highly linked. Imagine that I am a plant biologist or a farmer and I am growing corn. And I'm very interested in the yield of my corn, so the amount of grain that I get from a single plant, and the susceptibility to infection. Then these two things are highly correlated with each other, because bigger yields generally mean a higher susceptibility to infection. And this is something that we have seen in many, many phenotypes. And here's one of these examples, where people looked at the yields of wheat plants across the years. So you see here that initially there was no real improvement, because we just did landrace selection and then pedigree selection to improve our plants. But when we started using scientific breeding, around 1955, and especially in the 1980s with the advent of QTL mapping, people figured out which locations on the genome of this plant controlled the yield. And we've been doing this now for like 20 years and the yield gains are decreasing. You see that in the beginning, when we started QTL mapping, we had a very rapid improvement in the amount of food that we could get out of an acre of field. But this has been more or less stabilizing since 2000, 2005. So we found the initial loci that were controlling the phenotype, we selected the plants for having the positive loci, but we have now selected a couple of times and we're in a situation where our plants are more or less optimal, and improving plant yield further also makes them more susceptible, which makes the yield go down again.
And this is a very serious problem in science and in production as well. Because the way that we as humans live on this planet is that every year we need more resources, because there are more humans. So we want the increase that we get from the scientific breeding approach, or more or less the selection approach using QTL mapping and these kinds of things, to continue. But we can't, because the more we improve our plants, the more susceptible they become and the less yield we get in the end.
So at the beginning of my PhD I thought long and hard about this and I said: no, we should find a different method. Because we are interested in these two things at the same time, the susceptibility on the one hand and the yield on the other hand, and these two things are coupled together. So instead of mapping the differences in mean, we want to map where this correlation breaks. Because if we find a genetic locus where the correlation breaks, then we can select for this locus and continue improving yield without suffering the effect on susceptibility.
So that's why I defined CTL mapping, where CTL stands for correlated trait locus. The name is very poorly chosen in retrospect, because CTL also stands for cytotoxic T cell, and there's a lot of literature about cytotoxic T cells and the amount of that literature keeps increasing. So no one can find my method. Everyone who searches for CTL mapping on Google gets genome-wide association studies where people look at cytotoxic T cells. So it's very poorly chosen. But a correlated trait locus is a section of DNA, so a locus in the DNA, which is associated with differences in correlation between phenotypes. So CTL mapping is very similar to QTL mapping. The difference is just that it's multi-phenotype. So instead of looking at a single phenotype at a time, we're looking at pairs of phenotypes.
And what we want to do is identify genetic regions where there's a difference in the phenotype-to-phenotype correlation, conditional on the genetic marker that we're currently looking at. And of course, it's an unbiased, data-driven method which uses no prior information, so there's no bias that flows into it. It's the same as QTL mapping: we just measure the genotypes, we measure the phenotypes, and then we do the association analysis. So there's no iterative process or these kinds of things involved.
So the idea, when I came up with it, was that CTL mapping should be applied in classical selection in breeding to improve economically interesting phenotypes. So the idea was that you select for a beneficial CTL locus, similar to a QTL. In the case of yield and susceptibility you want to break them apart. So you want a locus at which these two phenotypes are not showing any correlation, because if they are showing correlation and you're selecting for it, then the next generation will still show correlation. And the idea is that by selecting for these beneficial loci where the correlation breaks, you can break the linkage, so the correlation between these phenotypes, in the next generation.
When we started doing this, we also figured out that when we combine correlated trait loci with quantitative trait loci, we can build a kind of phenotype-by-phenotype causal network. I'm not going to talk much about that, but I want to talk about this initial idea: that by finding loci at which two phenotypes which are normally highly correlated are not showing any correlation, and then selecting for that locus, we will be able to break this linkage. So that's the thing, right? The yield plateau tells us that for both phenotypes, yield and susceptibility, because they are highly correlated with each other, you will get essentially the same QTL profile.
So when you select for high yield you will also select for increasing susceptibility, which you don't want, because that means that you have to use more and more chemicals to get rid of all the bugs every time. So the idea is that CTL mapping finds loci where the correlation between yield and susceptibility is lost. We can then use the CTL information to break the correlation between yield and susceptibility.
So how does this look? This was one of the figures from my thesis, and it is a simulation. We have phenotype A, which is the yield of the plant, and phenotype B, which is the susceptibility. We have the AA genotype, which shows strong correlation, the same correlation that we see overall in the population, and we have here the BB genotype, which shows low to no correlation: you can see that it's just a cloud of points and there's no straight line. So if you see a picture like this, which genotype should we breed? Should we breed the AA individuals or the BB individuals? That's a question to you guys in chat, if you're still listening.
So that was the whole idea behind it: if you have two phenotypes which are highly correlated overall, then at each marker you look at the correlation within the individuals which are AA and within the individuals which are BB, and then, if you're lucky, you will find a locus in the genome where these two highly correlated phenotypes are not correlated. So in this case, if you want to breed the next generation, you want to breed individuals which are BB at this locus, because the BB group shows no correlation. And in this case we have a positive phenotype linked with a negative phenotype. We can also have a negative phenotype linked with a negative one, and then of course you would probably want to keep the correlation and select from the lower tail.
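The simulated figure described above can be imitated with a small Python sketch. It is not the simulation from the thesis: the sample size, the noise level, the seed and all variable names are assumptions chosen only to reproduce the qualitative picture, with a tight line for the AA group and a shapeless cloud for the BB group.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)
n = 200  # individuals per genotype group (assumed)

# AA group: susceptibility follows yield closely.
yield_aa = [random.gauss(0, 1) for _ in range(n)]
susc_aa = [y + random.gauss(0, 0.3) for y in yield_aa]

# BB group: susceptibility is drawn independently of yield.
yield_bb = [random.gauss(0, 1) for _ in range(n)]
susc_bb = [random.gauss(0, 1) for _ in range(n)]

r_aa = pearson(yield_aa, susc_aa)
r_bb = pearson(yield_bb, susc_bb)
print(round(r_aa, 2), round(r_bb, 2))
```

In this toy setting the AA correlation is close to one and the BB correlation is close to zero, so BB is the genotype at which yield could be selected without dragging susceptibility along.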
So you select the CTL genotype to unlink two phenotypes, and the next generation should show a decrease in correlation between the phenotypes. This again allows you to select for high yield in the next generation without increasing the susceptibility. And then in the next generation you would just select individuals based on QTL information like you would normally do.
So the methodology: here I explain it using recombinant inbred lines. There is a package on CRAN which is actually called ctl, and this package handles many more complex crosses. So I just explain it here using AA and BB individuals, but it also works if you have an F2 population where you have AA, AB and BB individuals. So with recombinant inbred lines, the example here assumes that I have four phenotypes which have been measured and six genetic markers. And at every marker an individual can have either AA or BB, just to simplify it for the presentation.
So the way that this methodology works is: first you select a phenotype, call it P1. That's your first phenotype. Then you select a genetic marker, so the first one. You split the individuals into two groups by their genotype. So you have a group of individuals which is AA at the first marker and a group of individuals which is BB at the first marker. So you have a population of, for example, 100 animals, and 50 of them go into group one and 50 into group two, because these 50 have AA and these 50 have BB. Then you calculate your correlations separately for the AA and the BB genotype. So within the AA individuals, you calculate the correlation of your phenotype P1 with all of the phenotypes. And then you just get a vector.
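The split-then-correlate step just described could look roughly like this in Python. This is a minimal sketch and not the code of the ctl package: `correlation_vector` is a hypothetical helper, the genotypes are coded as the strings "AA" and "BB", and the data are simulated so that P4 follows P1 only in the AA individuals at the first marker.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_vector(phenotypes, genotypes, target, marker, genotype):
    """Correlate phenotype `target` with every phenotype, using only the
    individuals that carry `genotype` at `marker`."""
    rows = [p for p, g in zip(phenotypes, genotypes) if g[marker] == genotype]
    columns = list(zip(*rows))  # one tuple of values per phenotype
    return [pearson(columns[target], col) for col in columns]

# Hypothetical data: 100 individuals, 6 markers, 4 phenotypes (P1..P4).
random.seed(11)
genotypes = [[random.choice(["AA", "BB"]) for _ in range(6)] for _ in range(100)]
phenotypes = []
for g in genotypes:
    p1 = random.gauss(0, 1)
    # P4 tracks P1 only when the first marker is AA.
    p4 = (p1 + random.gauss(0, 0.3)) if g[0] == "AA" else random.gauss(0, 1)
    phenotypes.append([p1, random.gauss(0, 1), random.gauss(0, 1), p4])

r_aa = correlation_vector(phenotypes, genotypes, target=0, marker=0, genotype="AA")
r_bb = correlation_vector(phenotypes, genotypes, target=0, marker=0, genotype="BB")
print([round(r, 2) for r in r_aa])
print([round(r, 2) for r in r_bb])
```

Each call returns one correlation per phenotype, so with four phenotypes you get a vector of four values for the AA group and another vector of four values for the BB group.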
So phenotype one, of course, shows a correlation of one with phenotype one, because two things which are equal always have a correlation of one. P1 and P2 at this marker show a correlation of 0.1, P1 and P3 0.5, and P1 and P4 0.8. So here P1 and P4 are highly correlated at this marker. You do the same thing for the BB individuals. So you again get a vector with four correlation coefficients. Of course, P1 versus P1 is still one. And for the other three phenotypes, you also get a correlation. So: correlation in the AA individuals, correlation in the BB individuals.
So then the next step is just to define the effect size. The effect size in QTL mapping is defined as the difference in mean between the AA group and the BB group. But in CTL mapping, the effect size is defined as the difference in correlation between the AA and the BB groups. Just like QTL, but now instead of looking at the difference in mean, we're looking at the difference in correlation between phenotypes. And to make it easy, we're just going to take the absolute difference. So not negative and positive: minus 0.1 becomes 0.1. So here we have the two vectors from before, and now we just calculate the difference vector. Because the correlation of P1 with P1 is one in both groups, the difference there is zero. The difference for P1 versus P2 between AA and BB is 0.1, and for P3 and P4 it is 0.2 and 0.7. So we just subtract these two vectors from each other.
Then the next step, now that we have mapped one marker, is to do all of the markers. So what I do is I take the vector that I just had and put it on its side. So now here we have the markers in the columns and we have the different phenotypes in the rows. So this is the result from the last slide.
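Putting the effect size and the marker loop together, a phenotype-by-marker matrix like the one on the slide could be built as follows. Again this is only a sketch under assumptions: `ctl_effect_matrix` is a hypothetical name, genotypes are coded "AA"/"BB", and the simulated data use 200 individuals (more than the 100 in the lecture example, purely to make the toy result stable) with a planted loss of correlation between P1 and P4 at the first marker.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ctl_effect_matrix(phenotypes, genotypes, target):
    """Rows are phenotypes, columns are markers.
    Each entry is |r_AA - r_BB| for `target` versus that phenotype."""
    n_pheno = len(phenotypes[0])
    n_markers = len(genotypes[0])
    matrix = [[0.0] * n_markers for _ in range(n_pheno)]
    for m in range(n_markers):
        r = {}
        for genotype in ("AA", "BB"):
            rows = [p for p, g in zip(phenotypes, genotypes) if g[m] == genotype]
            columns = list(zip(*rows))
            r[genotype] = [pearson(columns[target], col) for col in columns]
        for j in range(n_pheno):
            # Absolute difference in correlation between the two groups.
            matrix[j][m] = abs(r["AA"][j] - r["BB"][j])
    return matrix

# Hypothetical data: 200 individuals, 6 markers, 4 phenotypes.
random.seed(7)
genotypes = [[random.choice(["AA", "BB"]) for _ in range(6)] for _ in range(200)]
phenotypes = []
for g in genotypes:
    p1 = random.gauss(0, 1)
    p4 = (p1 + random.gauss(0, 0.3)) if g[0] == "AA" else random.gauss(0, 1)
    phenotypes.append([p1, random.gauss(0, 1), random.gauss(0, 1), p4])

effects = ctl_effect_matrix(phenotypes, genotypes, target=0)
for row in effects:
    print([round(v, 2) for v in row])
```

The first row (P1 against itself) is zero everywhere, and the largest value in the P4 row sits at the first marker, which is where the correlation was deliberately broken in the simulated data.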
You can check that it's 0.0, 0.1, 0.2, 0.7, and indeed here you see the same thing. And then this is the difference vector for marker two, the difference vector for marker three, the difference vector for marker four, and so on. And like I told you, we assume that we only have six. So I repeat this calculation for every genetic marker, which gives multiple difference vectors for our selected phenotype. Of course, when we map P1 against P1, it will only ever yield a difference of zero. So we could have skipped it, but for completeness' sake I just want to show you the whole matrix. Let me get a sip of water. It's been a long lecture. All right, so here we map P1 against P1, always zero, and of course for the other phenotypes we don't get that.
So now we need to find what is significant. Is this difference of 0.6 in correlation a significant difference? So what we do is we repeat the same thing 10,000 times. Just like I showed you guys for QTL, we break the link between genotype and phenotype. In QTL mapping, we assigned the phenotypes randomly to the individuals. But now, since we have multiple phenotypes, we want the phenotypes of an individual to stay together, so instead we give each individual a random genotype vector. So we redo the whole analysis, we remember the maximum score, we make a distribution out of these maxima, and then we find our 5% and 1% thresholds for significance. It's just the same way. So we're just going to permute ourselves out of problems: instead of assigning a new phenotypic value to each individual, we're now assigning a new genotype.
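The permutation idea can be sketched as follows. This is an illustration and not the ctl package's implementation: `max_ctl_effect` and `permutation_thresholds` are hypothetical helpers, the data are simulated with a planted effect at the first marker, and only 200 permutations are run to keep the example fast, where the lecture uses 10,000. The key point is that whole genotype vectors are shuffled across individuals while each individual's phenotypes stay together.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_ctl_effect(phenotypes, genotypes, target):
    """Largest |r_AA - r_BB| over all markers and all other phenotypes."""
    best = 0.0
    for m in range(len(genotypes[0])):
        groups = {"AA": [], "BB": []}
        for p, g in zip(phenotypes, genotypes):
            groups[g[m]].append(p)
        cols_aa = list(zip(*groups["AA"]))
        cols_bb = list(zip(*groups["BB"]))
        for j in range(len(phenotypes[0])):
            if j == target:
                continue  # target against itself is always zero
            diff = abs(pearson(cols_aa[target], cols_aa[j])
                       - pearson(cols_bb[target], cols_bb[j]))
            best = max(best, diff)
    return best

def permutation_thresholds(phenotypes, genotypes, target, n_perm, rng):
    """Shuffle genotype vectors across individuals, keep the maximum
    score of each permutation, and return the 5% and 1% thresholds."""
    shuffled = list(genotypes)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(max_ctl_effect(phenotypes, shuffled, target))
    null.sort()
    return null[int(0.95 * n_perm)], null[int(0.99 * n_perm)]

# Hypothetical data: 200 individuals, 6 markers, 4 phenotypes.
rng = random.Random(3)
genotypes = [[rng.choice(["AA", "BB"]) for _ in range(6)] for _ in range(200)]
phenotypes = []
for g in genotypes:
    p1 = rng.gauss(0, 1)
    p4 = (p1 + rng.gauss(0, 0.2)) if g[0] == "AA" else rng.gauss(0, 1)
    phenotypes.append([p1, rng.gauss(0, 1), rng.gauss(0, 1), p4])

observed = max_ctl_effect(phenotypes, genotypes, target=0)
thr_5, thr_1 = permutation_thresholds(phenotypes, genotypes, 0, n_perm=200, rng=rng)
print(round(observed, 2), round(thr_5, 2), round(thr_1, 2))
```

Taking the maximum over all markers and phenotypes in every permutation gives thresholds that already account for the number of tests, which is why a single 5% and a single 1% cut-off can be applied to the whole scan.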
So, CTL uses 10,000 or more permutations to assign significance. Of course, in the package, because I did study this stuff for four years, I also devised a method to directly calculate the p-value using mathematics. So we convert the differences in correlation that we see to probability values. So how likely is it to see a correlation difference this large by chance? And then we do the next step, which is the same as in QTL mapping, where we convert the p-value to the LOD score. So we just take the minus log 10 of the p-value.
Besides this, we perform QTL mapping for P1 as well, because we have the data anyway. So we have P1, which we can CTL map against the four phenotypes that we have, but we can of course also just do the QTL mapping. So for P1, we get four vectors of LOD scores from CTL mapping: every correlation difference is transformed into a LOD score. And beside that, we have the information about P1 itself, because we can also associate P1 with every one of the six markers that we have.
So how does this then look? This is the way that we visualize it. We take the QTL curve of P1 and plot it on the top, and then we take the CTL curves of P1 versus the other phenotypes, which we see on the bottom. So on the bottom is the negative of the CTL LOD score. And of course, on the bottom we then see four or five or six or seven lines, depending on how many phenotypes we had.
All right, so that was the CTL mapping method. It's a relatively easy thing. And in the end, we find loci in the genome where we see that there's, for example, a QTL controlling the variation in P1, but we also see that at this locus P1 loses correlation with some of the other phenotypes. Let me pull up one of the other presentations that I did about CTL mapping where we use some real data, just to show you guys how we can use this information.
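The p-value to LOD conversion mentioned above is a one-liner. In this small sketch the function name `lod_from_p` and the example p-values are made up for illustration; the formula itself, minus log 10 of the p-value, is the one stated in the lecture.

```python
import math

def lod_from_p(p_value):
    """LOD score as used in the lecture: minus log10 of the p-value."""
    return -math.log10(p_value)

# Hypothetical p-values for P1 versus P2, P3 and P4 at a single marker.
p_values = [0.5, 0.01, 0.0001]
lod_scores = [lod_from_p(p) for p in p_values]
print([round(score, 2) for score in lod_scores])
```

Smaller p-values give larger scores, so the peaks in the plotted curves mark the markers with the strongest evidence.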
It should be somewhere in presentations. CTL, transmission ratio distortion, here. So let me open this up and swap the PowerPoint. Yes, I can. So let's go to properties and then switch to this one.
So what we were looking at here is Arabidopsis thaliana. We have metabolites, and these metabolites are known to be in a linear pathway. So we have something called hydroxypropyl, which is transformed by an enzyme into methylsulfinylpropyl, and then, by another enzyme, this is transformed into methylthiopropyl. So it's just three metabolites with two enzymes in between. There is a major regulator of this pathway on chromosome five, and all of this was known. So then I did the CTL mapping.
So the first plot that I'm going to show you, and remember the colors: green, red, orange. Green is at the top of the network, then green gets transformed into red, and red gets transformed into orange. So here we see the QTL profile of hydroxypropyl on the top. We see that there's a major regulator of the variation in hydroxypropyl on chromosome five. The same thing holds for chromosome four: there's a marker on chromosome four which also controls the hydroxypropyl concentration in the plant. But then, if we do the CTL mapping with methylsulfinylpropyl and methylthiopropyl, so the red one and the orange one, we see that we find a little locus on chromosome one. And that is strange, because we never had any indication from QTL mapping that something on chromosome one was driving hydroxypropyl. But we do get the idea that there is something on chromosome one which makes hydroxypropyl lose its correlation with the other two phenotypes that we're looking at. And we see the same thing on chromosome five, right?
So on chromosome five, we learn nothing new, because we already knew that there was a major driver of this network. When we then look at the middle metabolite in the pathway, methylsulfinylpropyl, what we see is: hey, that's interesting. There is a little QTL on chromosome one for the middle phenotype. So there is something on chromosome one which is controlling the concentration of the middle phenotype. Of course, there's also something on chromosome five, which is kind of passed down from the initial one. The initial hydroxypropyl is transformed into methylsulfinylpropyl, so the more you have at the beginning, the more you have of the intermediate product as well. So here we also see the CTL line, and we indeed see that at this locus something is affecting the correlation between the amounts of hydroxypropyl, methylsulfinylpropyl and methylthiopropyl.
So when we then look at methylthiopropyl, now we see something interesting. If you look at the top, we see that when we do the QTL mapping of this phenotype, we don't learn where the concentration of this metabolite is controlled from. There are no significant regions. So if I did an experiment using only methylthiopropyl measurements and scanned across the genome, I would conclude that there is no locus controlling the concentration of methylthiopropyl. However, if we look at the CTL map, we do see significant loci. The method tells us there is something on chromosome five which is influencing the correlation of methylthiopropyl with hydroxypropyl and methylsulfinylpropyl. And the same thing again on chromosome one.
So what we see is that for the first two phenotypes, we don't really learn anything new except that there is something going on on chromosome one for hydroxypropyl.
For the middle one, we don't learn that much, because we already knew that there was something on chromosome five controlling it. But for the last one, we don't get any QTL. And because we get no QTL, as a geneticist you're stuck here. You cannot say what's happening to methylthiopropyl. But from CTL mapping we learn that, no, if you want to influence this phenotype, you have to be on chromosome five. And there might be something on chromosome one which can also influence it.
So the idea is that we have this network. If we combine it with the QTL information that we get, we see that the strongest association with chromosome five is with hydroxypropyl. Then the LOD score gets lower, because the concentration of this one influences the concentration of this one, so we can still see the effect of the chromosome five locus on this one. But we learn that this effect of chromosome five is not a direct effect. It is an indirect effect. It goes via hydroxypropyl. So the thing on chromosome five is not directly influencing methylsulfinylpropyl. It is influencing hydroxypropyl, and that in turn is influencing methylsulfinylpropyl. And we find the exact same thing for this locus here. We don't find any direct association of chromosome five with this phenotype, but we do find one based on the CTL and the strength of the CTL, where the thickness of the line shows the strength. We now start seeing that indeed this is a kind of linear network. We see that the red metabolite should be in between the other two, because there is a very strong CTL from the green one to the red one. But from the green one to the orange one, it is much lower. And from the red one to the orange one, it's also lower, but it's still detectable.
So we can start building up this causal network and seeing that indeed the network should be: hydroxypropyl causes changes in methylsulfinylpropyl, which then causes changes in methylthiopropyl. And without CTL mapping, we would never have looked at chromosome five for this phenotype, because there is no direct association. Only based on the correlation can you see that the correlation is lost at that point.
Good. So that's the whole big idea. That's what people gave me my doctor title for. It doesn't seem like much, but it was a lot of work to do all of this. And thank you guys for actually staying until the end. So you can see that I actually use the same phenotype in this presentation as well. But the idea is that QTL mapping is only a part of the puzzle. QTL mapping gives you direct effects on the means of phenotypes. But in the end, it's not about single phenotypes. It's about the relationships between phenotypes and how genetic loci modify these relationships.
Good. So that was what I wanted to tell you today. As a quick overview: we did phenotypes and heritability. I tried to explain to you guys what QTL mapping is and how you need to use experimental crosses. I told you guys about GWAS. I didn't tell you about the BFMI. It will be in the slides that I upload, but I just skipped that part. And I did talk to you about CTL mapping. Also, the fine mapping part is not directly in the presentation, because I skipped it today; we were kind of running out of time because we took an hour for the assignments.
All right, so for me, that's it for today. If there are any questions, remarks, or other things, then please let me know, throw it in the chat. All of the four guys that made it to the end, you're amazing. Thank you for being here. And of course, thanks to my moderator, who's also probably still here. Bacon, Misha, thank you guys for joining and staying until the end. That was it for me.
It's five, so people on YouTube, see you on the flip side. So, see you next time.