So welcome back, everyone. Also, if you're watching this on YouTube later, or in five years, well, not yesterday or tomorrow, but welcome back. In the second part of the lecture, I explained to you how we make crosses, so how we do things like a backcross or an F2 cross. So once we have this population, once we've taken these two inbred mice, generated the F1, and then crossed the F1 individuals back to the parental strains, we have generated like 50 or 100 or 200 mice. All of these mice will have a more or less unique genome, and with this, they have a unique phenotype. So we can then do linkage analysis. And there are two flavors. The flavor that I'm going to talk about first is QTL mapping. QTL mapping means that we use one of these structured populations.
So what happens is that we first need to create a genetic map. For that, we can use primers and PCR, like I showed you guys in the second part. Generally, we use a genotyping array to measure single nucleotide polymorphisms. So we measure individual points on the genome and see if the individual is AA, AT, or TT. But we can also do this with methylation markers. We don't have to look directly at whether the DNA sequence is variable. We can also look at the methylation state of the DNA, so whether the DNA is methylated or unmethylated. Besides that, we can also create these genetic maps from Mendelian phenotypes, like we did a couple of lectures back, because a Mendelian phenotype is controlled by a single locus. When we have multiple of these markers, then together they form a genetic map. For every individual, we can measure the genotype at each marker. And what we generally do then is calculate the genotype probability at each of them, because when we measure the genotype state, there is a little bit of uncertainty surrounding it.
But when we have this genetic map, then we can perform association or linkage analysis. So here we see one of these experiments. This is more or less the data the way that it is loaded into R or some other language. We have a vector which, for example, contains the yield of a certain plant. So here we see a plant which had a yield of 6.5. And for the genotype, I only write one of the alleles. So when there's an A here, it means AA, and when there's a B here, it means BB, because you have two chromosomes. So this individual has an A at the first marker, an A at the second marker, a B at the third marker, and an A again at the fourth marker. So one row is one individual, with its phenotypic value and with the genome that it inherited from the two parental strains. In this experiment, we have one, two, three, four, five, six, seven, eight, nine individuals.
So what is linkage analysis? Linkage analysis is nothing more than looking at the first marker and calculating the mean of the A group. So if we add up 6.5, 7.1, 8.3, 8.6 and 8.9 and divide by the number of individuals, which is five, then we get an average value. And then we do the same thing for the B individuals. In this case, the average of the A group is about equal to the average of the B group. If we look at the next marker, then we see that something funny happens, because now all of the A individuals have a very low yield, but all of the B individuals have a very high yield. So that means that A is much smaller than B, and we just put a dot there. And then we move along the genome and do this for every marker. Every time that we move to the next marker, we calculate the mean of the one group and the mean of the other group, and we plot to see if there's a difference. Here again, we see something where we say, oh, this is interesting.
The BB individuals now have a small yield, and at this position all of the A individuals have a relatively high yield. So there's a big difference. And that's it. That is association analysis. Go through the genome marker by marker. At each marker, calculate the mean of the A group, calculate the mean of the B group, and then do a test, or look to see if there's a difference.
So here we're just looking at the effect size. And if we look at the effect size in an F2 population, then we can have an additive effect, which means that each allele contributes equally. So an AA individual has a certain mean. An AB individual has this mean plus the effect of the B allele. And when the individual has two B alleles, then we add the same effect again. So the difference between the AA and the AB group is the same as the difference between the AB and the BB group. When we see this pattern across the three groups (instead of two groups as in the example, you generally have three groups), then we call this an additive effect. We can also have a dominance effect. A dominance effect means that as soon as you get one allele, your average shoots up, and then it doesn't matter if you have one or two alleles. When we look at the AA group, we see a relatively small phenotypic value. When we look at the AB group, we see a relatively high phenotypic value. And when we look at the BB group, we see that it is similar to the AB group. This is what we call a dominance effect of the B allele, because having one B allele directly gives you the phenotype, the high yield, and having two of them doesn't increase it any more. So that is the difference between additive and dominance. Of course, there are many other patterns that you could draw, right?
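The marker-by-marker comparison of group means described above can be sketched in a few lines of Python. This is only an illustration: the lecture works in R, and apart from the five yields of the A group at the first marker, every phenotype value and genotype below is made up to mimic the pattern on the slide. The function name `mean_difference_scan` is hypothetical.

```python
from statistics import mean

# One entry per individual. Only 6.5, 7.1, 8.3, 8.6 and 8.9 come from the
# lecture; the remaining four yields are invented for illustration.
yields = [6.5, 7.1, 8.3, 8.6, 8.9, 6.8, 7.4, 8.5, 8.8]

# One string per individual, one letter per marker (A means AA, B means BB).
# The genotypes are invented so that marker 1 shows no difference, marker 2
# shows A < B and marker 3 shows B < A, as in the example.
genotypes = [
    "AABA", "AABB", "ABAA", "ABAB", "ABAA",
    "BABB", "BABA", "BBAB", "BBAA",
]


def mean_difference_scan(phenotype, genotype_rows):
    """Return mean(B group) - mean(A group) for every marker."""
    n_markers = len(genotype_rows[0])
    differences = []
    for m in range(n_markers):
        a_group = [y for y, g in zip(phenotype, genotype_rows) if g[m] == "A"]
        b_group = [y for y, g in zip(phenotype, genotype_rows) if g[m] == "B"]
        differences.append(mean(b_group) - mean(a_group))
    return differences


differences = mean_difference_scan(yields, genotypes)
for marker_number, diff in enumerate(differences, start=1):
    print(f"marker {marker_number}: mean(B) - mean(A) = {diff:+.2f}")
```

A difference near zero corresponds to "the average of A is about equal to the average of B", while a large positive or negative value is the kind of marker where you would put a dot on the plot.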
You could take this blue box and pull it all the way up. So then being AA gives you a certain phenotype, being AB puts you somewhere up here, and then being BB puts you somewhere down here again. So there are other patterns which are possible. We can have an additive pattern, we can have a dominance pattern, but there are two more patterns which we can have. We can have over-dominance, which is also called heterosis or hybrid vigor. That means that the phenotype of the heterozygous group, so the individuals which have two different alleles, lies above the phenotypic range of both homozygous parents. And we can have under-dominance, which is also called negative over-dominance. This is similar to over-dominance, but now the heterozygotes lie below the phenotypic range of both homozygous parents. So those are the different patterns that you can draw, or you can have no effect, which is when every group has the same mean. So four different types of effect: additive, dominance, over-dominance, or under-dominance.
So when we now do QTL mapping, we don't just look at the difference in means, because every mean comes with a standard deviation. So far we mapped the effect of each marker by just making this graph, where we put a dot for every marker and we say, well, here the A individuals are much smaller than the B's, and here the B individuals are much smaller than the A's. But we can use statistics to map the likelihood, because in the end, we want to know how likely it is that a certain marker in the genome is influencing our phenotype. So how can we figure out how likely this is? Well, in the case where you only have two possibilities, like in the backcross, where an individual is AA or AB, we can just use a t-test, right?
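The four effect patterns described above (plus "no effect") can be told apart from the three genotype group means. The sketch below is a rough illustration in Python, not a statistical test: the function name `classify_effect` and the `tolerance` cutoff are assumptions introduced here, and in a real analysis the standard deviations of the groups would have to be taken into account.

```python
def classify_effect(mean_aa, mean_ab, mean_bb, tolerance=0.1):
    """Label the pattern formed by the AA, AB and BB group means.

    `tolerance` is an arbitrary illustrative cutoff for calling two means
    "the same"; it is not part of the lecture.
    """
    low = min(mean_aa, mean_bb)
    high = max(mean_aa, mean_bb)
    midpoint = (mean_aa + mean_bb) / 2

    if mean_ab > high + tolerance:
        return "over-dominance"          # heterozygotes above both homozygotes
    if mean_ab < low - tolerance:
        return "under-dominance"         # heterozygotes below both homozygotes
    if high - low <= tolerance:
        return "no effect"               # all three groups share one mean
    if abs(mean_ab - midpoint) <= tolerance:
        return "additive"                # AB sits halfway between AA and BB
    if abs(mean_ab - mean_bb) <= tolerance:
        return "dominance (B allele)"    # one B allele already gives the BB mean
    if abs(mean_ab - mean_aa) <= tolerance:
        return "dominance (A allele)"    # one A allele already gives the AA mean
    return "partial dominance"           # somewhere in between


for means in [(5, 6, 7), (5, 7, 7), (5, 8, 7), (5, 4, 7), (6, 6, 6)]:
    print(means, "->", classify_effect(*means))
```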
So we look at a marker, we have 75 individuals which are AA and which have a certain mean, and then we test the phenotypic values of these 75 individuals against those of the 25 individuals which have the AB genotype. So just a basic t-test. Of course, when we have more groups, then we can use an ANOVA. So if we have three groups, then instead of doing a t-test, we do an ANOVA. We can also use linear regression. Oh, that's so bad. Let me fix this slide. I don't like that. That's a major typo, because it's called linear regression. Sorry for that, guys. So we can use linear regression as well when we have multiple groups.
In QTL mapping, we generally don't write down the p-value. For each marker we get a p-value, and this p-value tells us how likely it is to see a difference this large just by chance if the marker had no effect on our phenotype. So we do calculate p-values, but when we present them, and I'm going to show you a whole bunch of these plots, we write these p-values down as LOD scores. LOD stands for logarithm of odds. What we do is take our p-value and take the minus log 10 of it. So that means that when I have a LOD score of five, there is a one in 10 to the power of five chance that this has occurred by chance. The log 10 and the minus sign just turn 10 to the minus five into five. 10 to the minus 10 would be a LOD score of 10, and 10 to the minus three would be a LOD score of three. And of course, you can then see the differences better. If you made a plot of the raw p-values, the difference between 10 to the minus five and 10 to the minus six would not really be visible, because if you plot from zero to one, then all of those would be very close to the x-axis.
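The conversion from p-value to the score used in this lecture is a one-liner. The sketch below is in Python rather than R, and the function name `lod_from_p` is just an illustrative choice. It follows the lecture's definition (minus log 10 of the p-value).

```python
import math


def lod_from_p(p_value):
    """Convert a p-value into the lecture's LOD score: -log10(p)."""
    if not 0 < p_value <= 1:
        raise ValueError("a p-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


# Small p-values that would all hug the x-axis on a 0-to-1 scale
# become clearly separated once they are transformed.
for p in (1e-3, 1e-5, 1e-10, 1e-20):
    print(f"p = {p:.0e}  ->  LOD = {lod_from_p(p):.1f}")
```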
But if you take the minus log 10, now you start seeing these nice curves and these nice peaks, because you see, oh, this marker had a five, this one had a six, this one had an 11, and that is much more visible than one times 10 to the minus five. So in QTL mapping and also in GWAS, we don't show p-values, we show the minus log 10 of the p-value, which we call a LOD score. So just remember, 10 to the minus five turns into a LOD score of five, 10 to the minus three turns into a LOD score of three, and 10 to the minus 20 turns into a LOD score of 20.
So when we map QTLs using R, we have several ways to perform this regression analysis. The easiest way is to take the marker and code the genotypes as a factor. So just write down that this individual was AA, this individual was AA, the next individual was AB, the next individual was BB. Now for each marker, we fit the following linear model: in R we fit a linear model that regresses our phenotype on the marker as a factor. So we just treat the marker as three categorical groups. This basically means that for each marker, we ask the question: are the AB individuals or the BB individuals different from the AA individuals? The AA individuals are the reference mean, and we see if one of the two other groups is different from AA. When we do regression using a linear model of the phenotype on the marker as a factor, we get two beta estimates. We get a beta for the AB group versus AA, so for how different these two groups are, and we get a beta for BB versus AA. But we get only one p-value. And this p-value only tells us if there is a significant difference, it does not tell us which group carries the significant effect. So when we see that at this marker we have a p-value which is highly significant, we then have to look at the beta estimates to figure out what type of an effect we have.
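For the factor coding just described, the two beta estimates have a simple interpretation: with AA as the reference group, an ordinary least-squares fit gives betas equal to the differences between each group mean and the AA mean. The Python sketch below shows only that relationship. It is not the R call used in the lecture, it does not compute the p-value, and the data and the function name `factor_model_betas` are made up for illustration.

```python
from statistics import mean


def factor_model_betas(phenotype, genotype, reference="AA"):
    """Beta estimates of a marker-as-factor model with a reference group.

    With one categorical predictor, least squares reproduces the group
    means, so each beta is (group mean - reference group mean).
    """
    groups = {}
    for y, g in zip(phenotype, genotype):
        groups.setdefault(g, []).append(y)
    reference_mean = mean(groups[reference])
    return {
        f"{g} vs {reference}": mean(values) - reference_mean
        for g, values in groups.items()
        if g != reference
    }


# Hypothetical phenotypes for nine F2 individuals at one marker.
phenotype = [5.0, 5.2, 4.8, 6.1, 5.9, 6.0, 7.1, 6.9, 7.0]
genotype = ["AA", "AA", "AA", "AB", "AB", "AB", "BB", "BB", "BB"]

betas = factor_model_betas(phenotype, genotype)
for name, beta in betas.items():
    print(f"beta {name}: {beta:+.2f}")
```

In this made-up example the BB beta is twice the AB beta, which is the signature of an additive effect. If the two betas were about equal, that would point to dominance of the B allele.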
From those two estimates we can see whether we have an additive effect or a dominance effect, because we only get these two beta estimates. And of course the AA group is always the reference group, so we compare the AB individuals to AA and the BB individuals to AA.
We can also do additive mapping. If we're only interested in additive effects, what we can do is code the same genotype groups as numbers: AA is coded as minus one, the heterozygotes are coded as zero, and the homozygous BB individuals are coded as one. So now we fit a linear model where we regress the phenotype on the numeric values of the marker. It's just drawing a straight line: AA, AB, BB, draw a straight line through them. The question that we now ask is: is there an increasing or a decreasing additive effect of the marker on the phenotype? In other words, can we draw a straight line through the groups? In this case, of course, we get one beta estimate, and this beta estimate is the slope of the line that we draw. And the p-value now tells us how well the line fits the data.
Then there is the third way of mapping, and that is looking at both the additive effect and the dominance deviation. Here we code each marker two times. In the model, we regress our phenotype on the marker coded for the additive effect, which is the same as before: minus one, zero and one. And then we add the same marker to the model again, but now we code the genotype groups differently. We code the AA individuals as zero, the AB individuals as one, and the BB individuals as zero. So now we put the marker in twice. We say there is an additive component, and there is a dominance deviation component. So we ask the question at each marker: is there an increasing or decreasing additive effect?
And, after drawing this line, is the heterozygous group higher or lower than the expected value? So let me do a quick sketch for you guys, just so that we can understand this a little bit better, probably, or perhaps less well, that just depends. So let me get an empty thingy. So what we are doing here is this. We have our phenotypic values on one axis. On the other axis we have minus one, which corresponds to AA, we have zero, which corresponds to AB, and then we have one, which corresponds to BB. So we just recode the genotypes at a marker. And we have individuals: we have all kinds of measurements here, we have some measurements here, and then we have some measurements here, for example. In the first step, we fit the linear model of the phenotype based on the additive coding, which means that we just draw a single straight line. And the straight line, because it's regression, will run more or less from the mean of the lowest group towards the mean of the highest group. So in this case, the line fits relatively well.
But what if a whole bunch of these individuals showed a dominance effect? So then, let me actually, I don't know, like this, nope, all right. If the individuals now show a dominance effect, then I would draw the regression line, and the regression line would go something like this. But now we see that the heterozygous group is still above the line. So what the regression does is it fits the additive line, and because it fits the line, the data now starts looking like this. The AA individuals now have a mean of zero, the BB individuals also have a mean of roughly zero, but the heterozygous individuals were too high for the line. So now I code both of these groups as being zero, right?
So this gets zero, this gets zero, so I take these two groups and put them here, and then I take this group and I move it here to where it is one. So now what I'm doing is another regression. I'm drawing another line through the two combined groups here and looking to see if there is still an effect of the heterozygous group. And this is called the dominance deviation. So if the dominance deviation shows a significant effect, then what is happening is that there's an additive effect, but after fitting the additive effect, we end up with the heterozygotes still being above or below the line. That means it can be dominance, under-dominance or over-dominance, but at least we now know more than when we just fit the linear regression line. So basically this means: is there an increasing or decreasing additive effect? And then, after drawing this line and regressing out this effect, is the heterozygous group higher or lower than the expected value?
When we do it like this, we get two beta estimates: the slope of the additive effect, and the beta coefficient of the AB group versus the mean of the residuals of the AA and BB groups combined. We also get two p-values. The first p-value tells us how well the additive line fits the data, just like before, but the second p-value now tells us how different the heterozygotes are compared to what we expect. I hope that that's clear. I've never used a slide like this, it's the first time that I'm showing this slide ever, so I might actually just add a little drawing to the side of the slide to make it a little bit more understandable. So that's it. When you do QTL mapping and you have three groups, AA, AB and BB, you can fit a model in three different ways.
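The additive coding and the dominance-deviation coding can be written down explicitly. The Python sketch below is an illustration under stated assumptions: it fits both terms jointly by least squares, which for three non-empty genotype groups has a closed form (the model has three parameters for three group means), rather than the two-step "fit the line, then look at the residuals" drawing used in the sketch on the slide. The data and all function names are invented, and no p-values are computed.

```python
from statistics import mean

ADDITIVE_CODE = {"AA": -1, "AB": 0, "BB": 1}    # straight-line coding
DOMINANCE_CODE = {"AA": 0, "AB": 1, "BB": 0}    # heterozygotes versus the rest


def additive_slope(phenotype, genotype):
    """Slope of the regression of phenotype on the -1/0/1 coding alone."""
    x = [ADDITIVE_CODE[g] for g in genotype]
    x_bar, y_bar = mean(x), mean(phenotype)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, phenotype))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def additive_dominance_estimates(phenotype, genotype):
    """Joint least-squares estimates for phenotype ~ additive + dominance.

    Because the fitted values equal the three group means, the estimates
    follow directly from those means.
    """
    groups = {"AA": [], "AB": [], "BB": []}
    for y, g in zip(phenotype, genotype):
        groups[g].append(y)
    m_aa, m_ab, m_bb = (mean(groups[g]) for g in ("AA", "AB", "BB"))
    additive = (m_bb - m_aa) / 2             # half the homozygote difference
    dominance = m_ab - (m_aa + m_bb) / 2     # AB relative to the midpoint
    return additive, dominance


# Hypothetical marker where one B allele already gives the high phenotype.
phenotype = [5.0, 5.2, 4.8, 7.1, 6.9, 7.0, 7.0, 7.2, 6.8]
genotype = ["AA"] * 3 + ["AB"] * 3 + ["BB"] * 3

slope = additive_slope(phenotype, genotype)
additive, dominance = additive_dominance_estimates(phenotype, genotype)
print(f"additive-only slope:     {slope:.2f}")
print(f"additive effect:         {additive:.2f}")
print(f"dominance deviation:     {dominance:.2f}")
```

A dominance deviation close to zero would mean the heterozygotes sit on the additive line. Here it is clearly positive, so the heterozygotes are higher than the straight line predicts.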
One is using a factor, where you just ask the question: is any of the groups different from the others? You can draw a straight line and only look at the additive effects. And you can do additive and dominance deviation mapping, where you look to see if there's an additive effect and then ask: after fitting the additive effect, are the heterozygous individuals higher or lower than the line, or are they on the line?
Of course, when we do this mapping, we look at multiple markers and at multiple phenotypes. We have this population, and we haven't genotyped one marker, we generally genotype like a thousand markers. And we don't look at a single phenotype. We don't look only at the yield of a plant, but also at, for example, the resistance to a certain parasite, or the height of the plant, or how long it took to flower. So we look at a lot of phenotypes as well. And of course, we need to correct for multiple testing. If we just say that we accept a p-value smaller than 0.05 as significant, and we have a thousand markers, then just by random chance we would expect 50 markers to come out as associated. So we need to correct for the multiple testing that we do. I think we already talked about multiple testing, but to deal with it, we can take a very basic adjustment. We calculated a p-value for each marker, we transformed that into a LOD score, and now we calculate a threshold. And this threshold is going to be adjusted for the number of tests that we did. We start from 0.05, which is the level of significance that we want. So 0.1 means suggestive, 0.05 means significant, and 0.01 means very significant. Then we take this requested significance level and divide it by the number of markers that we have times the number of phenotypes that we're mapping in our experiment.
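The basic adjustment described here, expressed on the minus log 10 scale used for the plots, can be sketched as follows. This is an illustrative Python snippet: the function name `bonferroni_lod_threshold` is hypothetical, and 1,000 markers with 5 phenotypes are used purely as example numbers.

```python
import math


def bonferroni_lod_threshold(alpha, n_markers, n_phenotypes):
    """Significance line on the -log10(p) scale after dividing alpha
    by the total number of tests (markers times phenotypes)."""
    n_tests = n_markers * n_phenotypes
    return -math.log10(alpha / n_tests)


# Example: 1,000 markers and 5 phenotypes give 5,000 tests.
for label, alpha in [("suggestive", 0.1), ("significant", 0.05),
                     ("very significant", 0.01)]:
    threshold = bonferroni_lod_threshold(alpha, n_markers=1000, n_phenotypes=5)
    print(f"{label:>16} (alpha = {alpha}): LOD threshold = {threshold:.2f}")
```

With these example numbers, 0.05 divided by 5,000 tests is 10 to the minus five, so a marker needs a score of about five before it is called significant.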
And then we take the minus log 10 of that, and that is the significance line that we add to our plot. So this is the first way to deal with multiple testing, but it is very restrictive, of course. We have like 1,000 markers and five phenotypes, so we're correcting for 5,000 tests. But multiple testing correction is based on the number of independent tests that you do. And this is massively overestimating the number of independent tests, because there is linkage. Two markers which are close together will look very similar across the population, while two markers which are far away from each other will look very different across the population. So two markers which are very close are not really independent, because they are bound together by the chromosome.
So a better way of dealing with multiple testing is to do permutation. What we do, 10,000 times, is we take the genotype matrix that we have, we take the phenotype vector, and we just shuffle the phenotype vector. So we assign every individual a new phenotypic value from the phenotypes that we have observed. And because we are breaking the link between genotypes and phenotypes, there should not be an association between a genetic marker and a phenotype, because we just randomly assigned phenotypes to the individuals. So we break the link and then we do the whole scan. We scan all of the markers and we remember the maximum observed score, so the maximum LOD score that we have across the whole genome after doing a single scan. Then we randomize again, we scan all of the markers again, and we remember the maximum observed score.
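The shuffle-scan-remember loop, together with the percentile cutoff that turns the remembered maxima into a threshold, can be sketched in Python as below. Several things here are assumptions made for illustration: the scan score is the absolute Welch t statistic between the A and B groups (a stand-in for the LOD score, since the Python standard library has no t-distribution to get p-values from), the data are simulated with a planted effect at the second marker, and 1,000 permutations are used instead of 10,000 to keep the example fast.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Absolute Welch t statistic between two groups of phenotypes."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / standard_error


def genome_scan(phenotype, markers):
    """One score per marker; `markers` holds one genotype list per marker."""
    scores = []
    for marker in markers:
        a = [y for y, g in zip(phenotype, marker) if g == "A"]
        b = [y for y, g in zip(phenotype, marker) if g == "B"]
        scores.append(welch_t(a, b))
    return scores


def permutation_threshold(phenotype, markers, n_permutations=1000,
                          alpha=0.05, seed=1):
    """Score exceeded by only a fraction `alpha` of the permuted scans."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)              # break the genotype-phenotype link
        maxima.append(max(genome_scan(shuffled, markers)))  # genome-wide max
    maxima.sort()
    index = min(int((1 - alpha) * n_permutations), n_permutations - 1)
    return maxima[index]


# Simulated example: 40 individuals, 4 balanced markers, and a made-up
# effect of 3 units for the B allele at the second marker (index 1).
n = 40
markers = [
    ["A" if i % 2 == 0 else "B" for i in range(n)],
    ["A" if (i // 2) % 2 == 0 else "B" for i in range(n)],
    ["A" if (i // 4) % 2 == 0 else "B" for i in range(n)],
    ["A" if i < n // 2 else "B" for i in range(n)],
]
data_rng = random.Random(42)
phenotype = [10 + (3 if markers[1][i] == "B" else 0) + data_rng.gauss(0, 1)
             for i in range(n)]

observed = genome_scan(phenotype, markers)
threshold_5 = permutation_threshold(phenotype, markers, alpha=0.05)
threshold_10 = permutation_threshold(phenotype, markers, alpha=0.10)
print("observed scores:", [round(s, 2) for s in observed])
print("5% threshold:", round(threshold_5, 2),
      "10% threshold:", round(threshold_10, 2))
```

Note that only the phenotype list is shuffled. The genotype lists stay exactly as they are, so whatever correlation exists between neighbouring markers is preserved in every permuted scan.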
So after doing this 10,000 times, we can make a histogram out of all of the maximum scores that we have observed. And now our 5% significance level means this: if a value that we observed in our real scan, so where we did not randomize, is higher than the value that was exceeded in only 5% of our randomized scans, then we say that the effect is significant. Or if it's higher than the 10% value, we say that it is suggestive. And generally these thresholds are a lot lower than the thresholds that you get when you just calculate them from the number of markers and the number of phenotypes, so by figuring out how many tests you did. Because when we do permutation, we only assign new phenotypes. We don't change the linkage from one marker to another. The genotype of an individual doesn't change. So across the population, the whole genotype matrix stays constant while the phenotype vector is permuted. So this is a better way of finding a threshold for significance. Of course, imagine that we have 1,000 markers, so we do 1,000 regressions. By doing this 10,000 times, we're actually doing 10 million regressions. So it is computationally quite expensive to find a multiple-testing threshold by permutation. But the thresholds that you get are generally much lower than the thresholds which you get when you do a normal Bonferroni correction.
So how does this look? Well, once we have done this, I see that the AA and the AB groups here are slightly different. So this is one of the example data sets in R/qtl. And this is a QTL profile of blood pressure in recombinant inbred line mice. Recombinant inbred line mice means that we only have AA and BB individuals. And here on the bottom, we see the different chromosomes of the mouse.
And we see here, in these little lines, the markers that we have. So we see that we have like one, two, three, four, five, six, seven markers on chromosome two. We have around five or six markers on chromosome three. We have a whole bunch of markers on chromosome four. And now we see here the LOD score. So here at this marker the LOD score is 3.7, which means there is a one times 10 to the minus 3.7 probability of seeing this association by chance. When we do these scans, what we are looking for is these big peaks. Here at the top of the peak, there's a one times 10 to the minus eight chance that this would happen at random. So there is a high likelihood that this marker here on chromosome four is having an effect on the phenotype. What we then do is look at the groups, and what we see here is that the AA individuals have a much higher blood pressure than the BB individuals. Which means that if we want to breed mice which have low blood pressure, then of course we can use this marker to select our parentals. If we select parent mice which are BB at this locus, then what we would expect is that the children of these parents would have a low blood pressure. If we are thinking about things like yield in plants, and we see this pattern and a very high likelihood of this locus influencing our phenotype, then what we would say is, well, we select AA individuals to make our next generation, because then we would have more yield from our plants.
One of the things with QTLs is that a QTL is a statistical association. It is just statistics. Of course, when it's 10 to the minus eight, there's only a very small chance that this effect is not real, that at this locus there's not a gene controlling our phenotype. But however significant the QTL is, you have to confirm the statistics.
You have to do the same experiment again in a different cross. So, for example, we have an Arabidopsis plant from Columbia, we cross that with one from Germany, and when we do the scan in the resulting population we find a very strong QTL for the yield. Now we do the exact same experiment, but this time we take a parent from Germany and cross it with one from Ukraine, for example. So then we check it in another cross, and if in the other cross we also see this effect, then we believe that this effect is real and we say that we have confirmed our QTL, because the QTL that we found in the Columbia times Germany cross we also find in the Germany times Ukraine cross.
Another way to do this, and this is used a lot in plants, is to use near-inbred lines. Near-inbred lines are lines which come from recombinant inbred lines, but instead of making these individuals fully homozygous, we inbreed them only up until like generation 17 or 18. What happens is that you then have individuals which have chromosomes which look like this. Of course, here I'm only showing you one chromosome, but they have two, that's why I'm saying AB. So this individual, for example, at this chromosome four locus: at the beginning it's totally homozygous, at the end it's also totally homozygous, but at the region that we are interested in, so at the marker that we want to validate, it is still heterozygous. So what we do is take two of these individuals and breed them together, which is inbreeding of course, because we take two individuals which are more or less from the same cross. Then what can happen is that we get individuals which are AA, we get individuals which are heterozygous, or we get individuals which are BB.
The AB individuals are not that interesting, because they are the same as what we already had, but now we can look at the phenotypic value of this individual and compare it with the phenotypic value of this individual. In this case we would expect the AA mice to have high blood pressure and these ones to have low blood pressure. So this is also a way of confirming your statistics: take an individual which is more or less homozygous across the whole genome except for your region of interest, breed these individuals together, and then just compare the phenotypic values of the BB group with those of the AA group. You only need like five or six individuals to do the test now, because you're only looking at a single marker. And if we see the exact same direction of effect, then we can conclude that, yes, the initial association that we found is indeed true. For plants, this is really easy. You can just buy these individuals. For plants like Arabidopsis, for example, you can just call the guys in France who generated the original population and say, I have this marker, can you send me a near-inbred line for this marker? And they will say, okay, and they will send you a couple of seeds. You plant these seeds, you self-fertilize these plants, and you generate these three different offspring types. You look at the AA's versus the BB's and you can confirm your QTL. For mice, it is harder, because these near-inbred lines are really hard to keep, but for plants, near-inbred lines are the preferred strategy to confirm quantitative trait loci.
Good. So now we have talked a lot about association analysis. You need a population, and you need to do all kinds of breeding experiments to make sure that the individuals that you have have known genotype states. But of course, we're generally interested in humans. Because if we want to sell medicine, we're not going to make high blood pressure medicine for mice, right?
Because no one really cares if a mouse has high blood pressure or not, but people do care if humans have high blood pressure. So we can do the same analysis which we just did in mice or in plants in humans as well, but in humans we are dealing with an outbred population. We cannot force one human to mate with another human to generate these populations. So in humans, we can only use SNPs to check if an individual is AA, AT or TT at a certain location, and we need to use really, really large populations to do this. And then we don't call it QTL mapping, we call it genome-wide association. So QTL mapping is genome-wide association using a structured population, or genome-wide association is QTL mapping using an outbred population.
The advantage is that you have SNPs across the whole genome. But we need massive sample sizes. A QTL like this, with a LOD score of eight, we can find in mice with around 150 mice. If you want to get a peak which looks this good in humans, then because there are literally thousands of recombinations which have accumulated over the last millennia, we need like 10,000 to 15,000 humans to get the same statistical power as in QTL mapping. So the nice thing about humans is that we have very good resolution. When we find a significant effect, the region is really small. But we have very little power, so we need literally tens of thousands of humans to do this association analysis.
So when is GWAS used? Well, for example, when we map in humans. This is an example of amyloid accumulation in a human population. Here they find these little peaks, and you see that their maximum score is at this IL1RAP gene, a SNP in that gene, which has a LOD score of around 8.7. But for QTL mapping a LOD score of 8.7 is relatively low. Here you also see the threshold line.
So you see their 5% threshold and their 10% threshold. And this is of course based on the permutations that they did. We use GWAS for humans, but also for wild isolates. If we are studying which genes are controlling the phenotypes of this little lizard that we caught in the desert, then of course we cannot breed this desert lizard in one of these very complex structures. We can also use GWAS when we want to fine-map a previously detected QTL. And compare: here we have only like one marker on chromosome 4 which is highly associated with this amyloid accumulation. We literally have thousands of markers on chromosome 4, and only one of them shows a very significant LOD score. But when we look at the QTL scan, then we see that we identify almost half of a chromosome. So we don't really have that much resolution to say it's this gene, because underneath this big peak there are thousands of genes. In a human GWAS, on the other hand, when you get a peak you generally almost know exactly which gene it is, because humans have accumulated recombinations over the last almost 50,000 years that humans have existed.
We can use GWAS in mice as well. Then we call it an advanced intercross line. An advanced intercross line is a structured population starting from two founders which are bred into an F2, and then you do an F2 scan. And then what we can do is start fine mapping using AILs. So we take F2 individuals and then randomly mate them together, just like what is happening in humans. Every generation in the AIL adds more recombinations to the population, meaning that the intervals get smaller, but we still retain the power of association.
So here we see, for example, fat mass in our Berlin Fat Mouse crossed with the standard laboratory mouse, and we see that we get a p-value of one times 10 to the minus 50, which is a significance you would hardly ever see in a human GWAS. But again, here we see that the peak that we get on chromosome 3 is relatively narrow. Generally not down to a single gene, though. This is generation 28 of random mating. So we have had 28 additional generations to rack up recombinations inside of our population, but our region is still big, or at least there are generally still multiple genes located in this region. So a GWAS can also be done in mice, but then of course you have to take an F2 population and start randomly mating them, just like humans do, because humans also mate more or less randomly.
So what are the differences and similarities, right? QTL mapping and genome-wide association: both methods are there to find markers involved in the regulation of a phenotype. In QTL mapping we use experimental populations. We generally have very high power to detect effects, so we can get away with genotyping only 150 or 200 animals, but we generally have low resolution. And when we show a QTL, we show a smooth line plot, and this is because of the linkage in the genome: two markers very close to each other are more or less similar, because we only have a small number of recombinations. So we use a smooth line plot.
Genome-wide association is done not when we have an experimental population but when we have a natural population. It has low power to detect effects, but if you find an effect it has a very high resolution, so you can almost point to the gene and say, that's the gene that's causing it. And we always show it in a Manhattan plot. So Manhattan plots look like this, where you hope to see these little skyscrapers. I hope that that's visible, it's a relatively bad picture, but here you can see this skyscraper. Here we also see one of these skyscrapers, on chromosome 18. So that's why we call it a Manhattan plot.
Good, so this was actually the spot where I originally intended to have the break, but this is more or less my overview slide. So guys, I told you about phenotypes, DNA, and meiosis, and meiosis I is where the recombinations occur. I tried to explain to you what an experimental cross is. I would definitely advise you to just sit down for a minute at home, take two colors of pencil, and do the same thing that I did for two markers, and then do it for three markers. So draw all of the gametes, then do a backcross, so cross it with an individual which has just one single possible gamete. And then make an F2 cross: take an F1 individual which has eight different possible gametes and cross it with another one which also has eight different possibilities, which means that you have to draw, I think, 64 offspring. But trust me, once you start doing it on paper, it will become much, much clearer how we make these experimental populations.
And then of course linkage analysis. If you want to have a data set to do a little bit of QTL mapping on, I can provide you with one. There are also standard data sets in the qtl library in R. So I can show you guys, let me open up the R window. So if we go to the R window, right, we do install.packages and we install the qtl package. Let's just install it, it's
a little bit of time, 6 MB, so it's relatively small. And now we do library(qtl), right? So library(qtl) makes this available for us. And now, when we do data(hyper), which is the hypertension data set, we can for example look at the genotypes. So we can say pull.geno on hyper and then show us the first 10 individuals and the first 10 markers. Then it looks like this. So here we see the individuals and here we see the different markers, and we see that the first individual had genotype 2 and the third individual had genotype 1, right? We can also look at the phenotype. So when we do pull.pheno on hyper we get the hypertension phenotype, right? And here we see the blood pressure of the first mouse, and it was a male, and the blood pressure of the second mouse, and it was a male. All right. So because you have your genotype matrix and your phenotype vector, you can now do the association analysis. So at each marker you can basically just do a t-test between the individuals which are 1 and the individuals which are 2.
Good, so that's it for this hour. And I shouldn't be saying this, but for the exam this is it for the whole lecture. So it's now four. For the people who are still here, thank you for watching. We will do a little break, and then I will talk more about my PhD thesis, and I will make a nice viewer reward drawing for Daniel, for Testosaurus. If you're still here... Daniel is still here. You're going to get your drawing. I didn't forget, I will make your drawing. Good. So for all of the students that think, oh, it's already four and I need to pick up my kids, or I have this appointment at the spa: for the exam, this is it. In the next hour, after the break of course, I will talk a little bit about the method that I developed for correlated trait locus mapping. So it's a different method than QTL mapping, but they are very much related. And of course you can hang around and see me draw a
puffer fish, which will be a massive, massive success again because of my mad, mad drawing skills. For now, this is it for this hour of the lecture, so I will stop the recording for the people on YouTube. So people on YouTube, see you later.
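For anyone following along at home, the R session from this hour can be reproduced roughly as below. This is a sketch, not the exact commands typed in the lecture: it assumes the qtl package is already installed, the helper marker_pvalue is a hypothetical name, and the guard for markers with too few genotyped animals is an addition, since many individuals in hyper are not genotyped at every marker.

```r
# install.packages("qtl")   # one-time install, uncomment if the package is missing
library(qtl)

data(hyper)                         # hypertension backcross shipped with R/qtl

geno <- pull.geno(hyper)            # individuals x markers, coded 1 / 2 (NA = not genotyped)
bp   <- pull.pheno(hyper, "bp")     # blood pressure, one value per mouse

geno[1:10, 1:10]                    # first 10 individuals, first 10 markers
head(pull.pheno(hyper))             # blood pressure and sex of the first mice

# Single-marker association: t-test between the "1" group and the "2" group.
# Returns NA when a marker does not have two groups of at least 2 animals.
marker_pvalue <- function(g, y) {
  ok <- !is.na(g) & !is.na(y)
  counts <- table(g[ok])
  if (length(counts) != 2 || any(counts < 2)) return(NA_real_)
  t.test(y[ok] ~ factor(g[ok]))$p.value
}

# One p-value per marker, moving along the genome
pvals <- apply(geno, 2, marker_pvalue, y = bp)

head(sort(pvals), 5)                # the most strongly associated markers
```

Plotting -log10(pvals) in marker order gives a simple association profile along the genome. R/qtl also has its own scanning functions, which additionally use the genetic map and genotype probabilities.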