just so that I have a local recording as well. All right, so welcome back, everyone. So multiple testing correction needs to be done, of course, because you do literally hundreds of tests. Nowadays, with SNP chips, you might do 50,000 markers across the genome, so you're doing 50,000 tests. So you have to be aware that you run the risk of calling something associated, or calling a QTL at a certain point, which is, of course, false, just because it's a random association. One of the other things that happens a lot is that these groups can be very small, right? So here, if we look at a certain marker, then some of these markers, oh, let me show you guys the mouse, some of these markers might only have like three B individuals or four B individuals. Here, there's only one individual who is BB, and the other individuals are all A. So of course, if this BB individual is very low, right, the mean of the B group here is six and a half, while the mean of the other group is much higher, then of course it looks to be associated, but it's not. And that's why we have to correct for multiple testing. So we generally use Bonferroni correction for that. There is another way of figuring it out, because since we are measuring a lot of markers across the genome, some of these markers are relatively close to each other, right? So if on chromosome one at 10 megabases you are A, then of course on chromosome one at 10 megabases and 100 base pairs, so 100 base pairs away, you are also A. So the markers are linked to each other. They are correlated with each other. So that's always difficult, and to deal with that, you can use permutation. So permutation is a strategy in which, instead of deriving your threshold using a Bonferroni correction, you break the link between the genotype and the phenotype. How you do that is, for each individual, you have the genotypes, right?
So you have a vector of what the marker was, and what you now start to do is you just randomly shuffle these. So every individual gets a new genotype. So you put all the genotypes in a bag, and for individual one, you just pull a random genotype out, and you do that for all of the individuals, and then you redo your mapping, and then you remember the maximum score that you observed during the mapping. Because when you randomly distribute, there should not be an association between a phenotype and a marker, right? Because we randomly assign genotypes to individuals, so there should not be an association. So if you do this 10,000 times, and with a computer, this is relatively easy, then what happens is every time you remember the maximum score, and then you get a distribution which kind of looks like this, right? So you see that often the maximum score is very low, like one times 10 to the minus two, or one times 10 to the minus four. So you make this distribution, and then you can just find your thresholds for significance. So you just say, well, if a score is higher than 95% of the maximum scores that I saw during this random permutation, so it is in the top 5%, then this is significant, or you take the top 10%, right? So these are just the standard thresholds that you take, and these thresholds are generally more realistic. They are more like what you would expect them to be. So how to do permutation? Permutation is also a way of dealing with this multiple testing issue. You just do it 10,000 or 100,000 times: you break the link between the genotypes and the phenotypes by giving individuals new genotypes, and then you just redo the whole analysis. You save the maximum score, and then after a while you have 10,000 maximum scores, and then you get this 5% and 10% threshold, which are generally thresholds which are more realistic compared to the standard Bonferroni cut-off. Generally they're not too far apart from each other, but permutation is a very valid strategy.
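The permutation procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not from the lecture: the function names, the toy data, and the use of an absolute Welch t statistic as the per-marker score (standing in for whatever score your mapping produces) are all made up here. Shuffling the phenotypes across individuals is equivalent to handing every individual a random genotype vector, and it keeps the linkage between neighbouring markers intact.

```python
import random
import statistics


def marker_score(genotypes, phenotypes):
    """Absolute Welch t statistic between the A and B groups at one marker.

    Hypothetical stand-in for a mapping score; groups with fewer than two
    individuals get a score of 0 because their variance is undefined.
    """
    a = [p for g, p in zip(genotypes, phenotypes) if g == "A"]
    b = [p for g, p in zip(genotypes, phenotypes) if g == "B"]
    if len(a) < 2 or len(b) < 2:
        return 0.0
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0:
        return 0.0
    return abs(statistics.mean(a) - statistics.mean(b)) / se


def max_score(markers, phenotypes):
    """Highest score over all markers, i.e. the top of the genome scan."""
    return max(marker_score(m, phenotypes) for m in markers)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """Score that only a fraction alpha of the permuted scans exceed."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        maxima.append(max_score(markers, shuffled))
    maxima.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[idx]


# Toy data (invented): 20 markers scored on 40 individuals, random phenotype.
toy_rng = random.Random(42)
toy_markers = [[toy_rng.choice("AB") for _ in range(40)] for _ in range(20)]
toy_phenotypes = [toy_rng.gauss(100, 10) for _ in range(40)]

threshold_5 = permutation_threshold(toy_markers, toy_phenotypes, n_perm=200, alpha=0.05)
threshold_10 = permutation_threshold(toy_markers, toy_phenotypes, n_perm=200, alpha=0.10)
print(threshold_5, threshold_10)
```

A marker in the real scan would then be called significant only if its score is above `threshold_5`. With real data you would use many more permutations, as mentioned above.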
So how does this then look? So here is one of the example data sets that I took from R/qtl, a software package to which I contributed during my PhD and also after my PhD, and this is the QTL profile of blood pressure in recombinant inbred mice. So these are mice. So at a certain marker they are either AA or BB. We do the analysis, so we do the statistical test to see if the A group is different from the B group, and then you see this pattern occurring. So on chromosome one, you see that there are like two little peaks, but on chromosome four, here you see that the AA group has a much higher blood pressure than the BB group. So this means that underneath this peak, there has to be a gene which is involved in the regulation of blood pressure, and the two parental strains which we used to generate this recombinant inbred line have different versions of the gene. So the father which founded the recombinant inbred line had, for example, a mutation. It had a different amino acid in one of these genes. And now when you look at the children, you see that children which inherited this allele from the father have a higher blood pressure than the individuals which inherited this allele from the mother. So this is how you visualize QTLs. So you see that when you take LOD scores, you see a really nice peak, and this is also the reason why we go to LOD scores and not use p-values. Because p-values are always between one and zero. So then the interesting parts would be the parts which are really close to zero. But on a graph where you plot values from one to zero, it's very hard to see the difference between 10 to the minus five and 10 to the minus eight or 10 to the minus four, right? Because they're all very close to zero. So by plotting LOD scores, you get these nice peaky profiles and you can just say, well, here, there's something.
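To make the point about p-values concrete, here is a small sketch. It assumes, as the lecture does informally, that the plotted score is simply the negative base-10 logarithm of the p-value (a classical LOD score is a log10 likelihood ratio, which is related but not identical). The example p-values and the Bonferroni line for 50,000 tests are illustrative numbers only.

```python
import math

# Three p-values that are indistinguishable on a plot running from 0 to 1.
p_values = [1e-4, 1e-5, 1e-8]

# On the -log10 scale they become clearly separated peak heights.
scores = [-math.log10(p) for p in p_values]

# Bonferroni cut-off for 50,000 tests at a 5% level, on the same scale.
n_tests = 50_000
bonferroni_p = 0.05 / n_tests
bonferroni_score = -math.log10(bonferroni_p)

for p, s in zip(p_values, scores):
    print(f"p = {p:g} -> score = {s:.1f}, passes Bonferroni: {s > bonferroni_score}")
print(f"Bonferroni threshold on the score scale: {bonferroni_score:.1f}")
```

On this scale a higher value means stronger evidence, so peaks point upwards and the threshold becomes a horizontal line across the plot.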
So there must be a gene here which is controlling blood pressure in these mice. So when you do QTL mapping, however significant the QTL is, you always have to confirm your statistics. And that is something that, as a statistician, I can't hammer on enough: a statistical association is just that. It is not proof. It could just be false. It could be based on a single mouse which is a massive outlier, which has a blood pressure which is like five times higher than all the other mice, right? So the group that just has this mouse in there just looks to be higher on average. So there are two ways that you can confirm a QTL. One of them is to check it in another cross, right? So if you take two other mice which also have a difference in blood pressure, then you make an F2 and then you do the mapping and you also find that the same region is associated in the other cross. Then that is proof enough for journals and for reviewers to really believe that there is a gene which is involved in your QTL. One of the other ways, which is pictured here, is to use a near-isogenic line, a NIL, right? So when we are generating these recombinant inbred lines, at a certain point just before they are inbred, right, they have stabilized almost their whole genome, but a little part of the genome might still be heterozygous. So this just needs like one or two more generations of inbreeding to stabilize into either being A or being B. But at this point, if you are lucky, and in many cases you can actually just buy the right NIL, right, if you are dealing with Arabidopsis, then you can just call INRA in France and say, well, I have this QTL on chromosome three at these locations. Do you have a NIL? And then they say, yeah, you can just buy a couple of seeds from us, and these seeds have a fully homozygous background. So they are fully inbred except for this little area that you are interested in. What you then do is you take two of these individuals and you breed them together.
And then what you see is that some individuals will be AA and some individuals will be BB. There will also be heterozygous ones, AB, but these can't really help us. But now we have individuals which have AA here or they have BB here. And if these individuals also show the same difference, so in this case, if the BB individuals here have a lower blood pressure than the AA individuals, then we confirmed that our QTL is really true. So that's how you confirm a QTL. There are some other structures, but this is the most common way of confirming QTLs. Question: for what animals do NILs exist? Mostly plants. Plants have a lot of NILs; mice, some of them do. So yeast and plants and all organisms which are relatively easy to keep. Because for a NIL you have to be able to save this individual, right? And with a plant you can easily do that because you can just save a couple of seeds and not put them in the ground. And seeds you can store for a long time. With mice you can't do that, right? Because if you were inbreeding mice, then you would always breed this mouse, and then the mouse would end up being either AA or BB, or it would end up being heterozygous. But the chances of it ending up being heterozygous get lower and lower the more inbreeding you do. So for what animals do NILs exist? For most of the plants in which you do QTL mapping. So Arabidopsis, Brassica rapa, and these kinds of things. And yeast as well. Drosophila might have NILs as well, but I'm not that familiar with QTL mapping in Drosophila, but I would bet that for Drosophila you could also order a NIL which they maintain. All right, so that was QTL mapping. The other association analysis technique is the one which is used when you cannot force breed in a certain structure, right? So if you can't force breed in a certain structure... Another question: how is this different from just doing the same cross twice with more samples? To confirm?
Yeah, to confirm you could do the same cross twice and then use more samples, but generally you can't really do that. That's really expensive. Well, for Arabidopsis it's not that expensive, but for things like mice, it is almost impossible because a funding agency, and especially the Tierschutz, so animal welfare, where you have to write an animal application, will not allow you to do the same thing twice just with more animals. They will just say, why? Why do you need another 100 mice? You already used 100 mice and now you want to use 150 mice just to confirm? They're not gonna approve that. But yeah, if they would, then you could do that as well. So using a different cross with more samples or doing it twice is also a good way. I know that in Arabidopsis the standard lines are Landsberg erecta, Cvi for the Cape Verde Islands, and you have Col for the Columbia strain. There they usually do a mapping between two of them and then they do a mapping of one of them combined with the other one. So you, for example, do your discovery in Columbia by Landsberg erecta and then you do your validation in Columbia by Cape Verde Islands. That's kind of how they often do it. But yeah, if you would do it with more samples or if you would do it twice, then of course you have to leave the original data set out. You can't include the discovery data set in your validation set, but if your validation set is different, then yeah, you could do that. So that's also a valid way of doing it. And there are more ways. You can use knockouts as well. If you know that, oh, there's only a single gene in this region, and I'm just doing a knockout or I use RNA silencing to knock down the gene, then that is also proof enough. You could also do a phenotype rescue experiment where you take an individual which has an AA genotype and then rescue that individual by crossing it with an individual. There are other ways to confirm it.
But the two general ways are checking in another cross and using a near-isogenic line. Those are the common structures, but yeah, just using a different validation will also work. All right, so genome-wide association: same thing as QTL mapping, but now we are not doing it on a population that we structured ourselves. We are doing it in a randomly mating population, like humans. A human ethics committee will not allow you to say, I'm going to do 20 generations of humans, I'm gonna put them on an island and I'm going to make them force mate with each other. That's not gonna go over well. There was a time in history where you might have been able to do that, but nowadays they're not very much in favor of this. So if we want to do things like QTL mapping in humans, we can do that, but we call it genome-wide association. And then we just have to deal with the fact that we don't know how many alleles there were and how people are related to each other. But the nice thing about randomly mating populations is that there are lots of these recombinations, right? If we use an F2 population, there are only two generations to accumulate recombinations, so to accumulate these crossover events that happen during meiosis. But when we are doing this in humans, then there are literally millions of recombinations because humans have been freely breeding for dozens and dozens and dozens of generations. So they racked up a lot of recombinations. So that means that you have a massive amount of resolution. If you find a QTL in humans, then the region that you identify tends to be in the order of 10 kb, so 10,000 base pairs. If you use an F2, like in mice, then your region is generally in the order of 25 megabases, so 25 million base pairs. And so the resolution of mapping, like in the example that I showed you here, here we can only say that, well, it's the middle of chromosome four which is associated, right?
But you can see that the region, like the chromosome four region, is big. This is really almost like a fourth of a chromosome. So in the order of like 20 megabases to 250 megabases. But in humans, you get a very narrow peak. This peak generally is very, very narrow, but you have little power to detect effects. And that is because you cannot control for the environment. Some humans smoke, some humans don't smoke, others eat fat, others exercise a lot. And no matter how well you are trying to measure all of these covariates, it's never as good as taking a mouse and putting it in a box or having a plant grown in a greenhouse, right? The differences in environmental influences on these plants are just not there. It's not that some plants smoke and some plants don't smoke. No, they all grow under the exact same environment. So genome-wide association needs a lot of individuals. It needs a big sample size. And in genome-wide association, when we do it in humans, we generally use 100,000 humans. And then you can detect an effect. And for this effect, if you would have used mice and you would have made an F2, then you probably would have needed only like 500 animals. So the difference in sample sizes is like almost 100 to 1,000 fold for genome-wide association. So when is GWAS used? When we use humans, or when we deal with wild isolates, or when we do fine mapping of QTLs. When we have detected a QTL in an F2 generation, we can use a genome-wide association in wild outbred mice, for example. So we just go into the field and we start collecting mice from different countries. And then we just do the mapping in those mice. And so that is when we want to fine-map, so to get the resolution, but we don't really care about the power because we already know that the effect is there because of the F2 population that we did. The biggest difference is the visualization. GWAS results are shown like this. Oh, so they are shown like this.
So they are using something which is called a Manhattan plot. So here you see these are all dots. So it's a dot plot. While a QTL plot is always a smooth line. And it is a smooth line because of the fact that these markers are linked to each other, but that's not the case anymore, or not so much. So what you are looking for here is these little towers. This is not the best example. Normally you have these really nice, like massive towers. But you can see here that they can just point to the individual genes, more or less, which are causing it. So that's a big advantage of GWAS. So good resolution, poor power. This is a GWAS that we did in a mouse advanced intercross line. So an advanced intercross line is just taking the F2 and then randomly mating from the F2 onwards. And then we did this in our mice. We did this for 28 generations. So in these 28 generations, a lot more recombinations occurred compared to the recombinations that occurred in the first two generations. And then you move away from a structure where you have good power. Here, the power is still pretty good because we're still dealing with mice. So we have no environmental influences or anything. But you can see that the peak here on chromosome three is much narrower than in this F2 population. All right, so an overview of QTL mapping versus GWAS. Both methods are the same in a way. They find markers likely involved in the regulation of a phenotype. So QTL mapping is... all right, question. What other ways are there to introduce recombinant variation for fine mapping of a QTL locus? There's actually nothing you can do except for breeding. Recombinations only occur during meiosis. So in females they only occur when the female is in utero, right? So before females get born, they already make all of the egg cells that they have during their life. And males only recombine when they create sperm. But there's no other way to introduce recombinations.
You can do chemical mutagenesis of the DNA. People did that in the early 1990s, but that introduces a host of other problems because then you get mice born with all kinds of birth defects and these kinds of things. So recombination happens during meiosis. And in general, the rule of thumb is that every generation, every chromosome has one recombination. So after 28 generations, you would have around 28 recombinations in a certain individual. Another individual, also in generation 28, will also have 28 recombinations, but at different points, right? Because all of these are unique, because every sperm cell has a unique recombination. So the only way to do fine mapping using recombinations is to just generate more generations. So to kind of just wait. How many generations are used in fine mapping of mice? That really depends on how good you want your resolution to be. And that also depends on a lot of factors. If your effect size is large, you need fewer generations. If you have a small effect size, you need more generations, because of the shape of the QTL: the higher the peak, the smaller the interval, in a way. So you want the effect size or the likelihood to be really big, and one of the ways to do that is of course to get rid of as much environmental variance as you can. But in mice, we generally tend to use 28 to like 30 generations. Although we have one project with a short hair mouse where we used 50 generations. And then we had an interval which was 25 kb or something. But that is very uncommon. There's almost no one that's gonna keep mice for 25 generations or for 50 generations and just do random mating every time. But yeah, on average, people will use like 20 generations. So the average length of a PhD project, right? That's kind of how science works. So in like four years of PhD, you can do like five generations of mice per year, sometimes six.
So hey, in a PhD study, you then have people coming in doing the original F2 mapping, writing a paper about that, and then the rest of their PhD is spent on fine mapping plus looking at things like genome sequencing to see if you have a very clear difference between the lines, or using another cross. So generally the length of a PhD study. If you're working with Arabidopsis of course, fine mapping in Arabidopsis is done much quicker because Arabidopsis can go from seed to plant in around like 40 days or something. So then you can do a lot more. Lunar fast Wi-Fi, Lunar fast Wi-Fi. Thank you for the follow. I don't know if you're still here, but I appreciate it. I appreciate it. So yeah. So the advantages and disadvantages, well not the advantages but the differences, are more or less summarized here. So experimental populations versus natural populations, high power to detect effects versus low power to detect effects, low resolution in QTL mapping, high resolution in GWAS. The QTL mapping is shown in a smooth line plot, while the Manhattan plot is used in genome-wide association because every marker gets a score. While in QTL mapping you can also map between markers because of the linkage between markers. So that's better. Or not better, both have their advantages and disadvantages. I did both during my PhD. I did some QTL mapping, but also did some human GWAS, and we used a lot of GWAS structure in advanced intercross lines. Hope that's clear. So a little example that we are working on here: this is our famous, or well, for me famous, Berlin Fat Mouse, and this is the standard B6 laboratory mouse. So I think Stefan Kers made the photo, but I found it here. I don't know exactly why, but this is the Berlin Fat Mouse. So Berlin Fat Mice are white, they are fat, they are happy because they are fat, and they're just lounging around. They don't bite, unlike many inbred mouse strains, which tend to bite and be really aggressive.
So inbreeding tends to bring up aggression, but in the Berlin Fat Mouse that didn't happen. And of course you can see that there's a clear phenotypic difference between these two. These on average weigh between 30 and 40 grams, these on average weigh around 20 grams when they are adults. And these become fat really quickly, so from like week five after birth, they are already much fatter than the Black 6. So we are using it as a model for juvenile obesity. Of course they're still obese when they're adults, but when they are adults they don't add weight. But this is more or less the mouse model that our group is working on a lot. You have a lot of these different inbred strains, like the New Zealand Obese, and they all have different reasons why they are fat. So in 2010 we did QTL mapping in an F2 population. So we used 365 mice, males and females, and we used 132 genetic markers, RFLP markers, so no SNP chips yet in 2010 because they were still too expensive. And here you see kind of the genetic map that they used for the first couple of chromosomes. And then they had fat measurements, so they measured the fat to lean ratio, not the body weight: they put the mouse into an MRI scanner and then they determined the fat percentage. And that was measured from week four through week nine. In Germany you're not allowed to weigh a mouse the first three weeks. So after a mouse is born you're not allowed to do any experiments with it until it is like four weeks old. So putting it in an MRI scanner, you can't do, or it's not allowed. You have to go to the US, there you can do that. But here in Germany you can't. So this was more or less the genetic map that we see. So you see we had a marker at like five megabases, or five centimorgans in this case, at 5.1, and then there was a gap. But when they did the QTL mapping, then this is the profile that they found. So of course, many times when you do this you don't find this massive peak.
This is a peak where the p-value, so the chance that this is just a random association, is 10 to the minus 50. So that's like phenomenal. There's no chance that this is not true. So this is the QTL profile of fat mass in the BFMI using an F2. So they identified a 10 megabase region on chromosome three where the causal variants, or the mutation, or whatever it was that was causing these mice to be fat, had to be. You see that there are other little peaks, but those are all dwarfed by this massive effect coming from here. So this is almost a Mendelian trait. It's not fully Mendelian but it's almost Mendelian. If both of your parents are Berlin Fat Mice then you're almost always a Berlin Fat Mouse. It sometimes happens that you're not, but we don't know exactly why. Anyway, so we then split the individuals on their genotype at the top marker and then we saw this effect, right? So this is the AA group. So these are the Berlin Fat Mouse alleles, so to speak. Here we see the heterozygous individuals, and here we see the individuals that got the alleles from the B6, so from the laboratory mouse. So again, question to you guys. Is this an additive or is this a dominance effect? I'm just gonna wait for a little bit. Testosaurus says additive. This time you get strong Testosaurus. I'm sorry, this time, this time not. Yeah, dumb. So it's a dominant effect, and that is very clear: if you have one B allele or two B alleles then you are lean. So the B allele is dominant. Getting a single B allele will directly make you lean. Although there are some outliers here, you can see that there's no real difference in the average for the BB genotype versus the AB genotype. So this is a dominant effect. The dominance is just determined by the fact that if you have one allele that makes you lean then you are lean directly. There's no mixture. The AB individuals are not in between the BB and the AA individuals. They are the same as the...
So recessive, yeah, the Berlin Fat Mouse phenotype is recessive, or put another way, the lean phenotype of the laboratory mouse is dominant. Of course, dominant and recessive are two sides of the same thing. If you have a coin and you just flip it, then it's still the same coin. So it is a recessive phenotype. So, all right. So, QTL regions from backcrosses, F2s and RILs are very large. They can span like up to half a chromosome. There are many genes. So how do we now find the gene that is causing this? How to find this magical causal gene, right? So there are several options for fine mapping. Like I told you guys, you can use a larger outbred population or you can use an advanced intercross line. And we used kind of a hybrid structure between these. Because this was the original region that we found, right? So now we generated a new generation. And in this new generation, we genotyped individuals in this region very clearly. So if we found a new individual which had the phenotype of being fat and it had a recombination, right, then now we could fine map our region. So here the blue part is from the B6 mouse. The red part is the BFMI allele. So when we find an individual that is fat but has a slightly smaller BFMI region compared to the original region, then now we can exclude this little region, right? We can say that at least the genes in this region do not contribute. So then you look at the next individual of the next generation, and that one is not fat. So this individual is not fat but it has here this little BFMI allele. So we can just say that, well, this region here is also not causing it. So every individual that we add that has a recombination in this region will make the region a little bit smaller. It doesn't matter if the individual is fat or if the individual is lean. Based on the phenotype and where the recombination is, we can make the region smaller and smaller. But of course you need many recombinations in the right place.
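The narrowing logic just described can be written down as simple interval arithmetic. The sketch below is a toy illustration, not the analysis that was actually run: the function names, the positions in megabases, and the three recombinant individuals are invented. It assumes a fully penetrant recessive phenotype, so a fat individual must carry the causal locus inside its homozygous BFMI segment, and a lean individual cannot carry it inside its homozygous BFMI segment.

```python
def intersect(intervals, segment):
    """Keep only the parts of the candidate intervals inside the segment."""
    lo, hi = segment
    kept = []
    for start, end in intervals:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            kept.append((s, e))
    return kept


def subtract(intervals, segment):
    """Remove the segment from the candidate intervals."""
    lo, hi = segment
    kept = []
    for start, end in intervals:
        left_end = min(end, lo)
        if start < left_end:
            kept.append((start, left_end))
        right_start = max(start, hi)
        if right_start < end:
            kept.append((right_start, end))
    return kept


def narrow(candidate, recombinants):
    """Shrink the candidate region using (bfmi_segment, is_fat) observations."""
    for segment, is_fat in recombinants:
        if is_fat:
            # Fat: the causal locus must lie within the BFMI segment.
            candidate = intersect(candidate, segment)
        else:
            # Lean: the BFMI segment cannot contain the causal locus.
            candidate = subtract(candidate, segment)
    return candidate


# Hypothetical 10 Mb starting region and three recombinant individuals.
start_region = [(0.0, 10.0)]
observations = [
    ((2.0, 10.0), True),   # fat, BFMI from 2 to 10 Mb: excludes 0 to 2 Mb
    ((6.0, 10.0), False),  # lean, BFMI from 6 to 10 Mb: excludes 6 to 10 Mb
    ((0.0, 5.0), True),    # fat, BFMI from 0 to 5 Mb: excludes 5 to 6 Mb
]
final_region = narrow(start_region, observations)
print(final_region)
```

Each observation removes a piece regardless of whether the individual is fat or lean, which mirrors the point made above. Real data would also need to handle incomplete penetrance and genotyping errors, which this sketch ignores.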
So you need a lot of generations to do this, to kind of drill down to where you want to go. So many individuals are needed, but this is also a way of doing fine mapping without just doing a full blown. But of course this is relatively expensive because you have to genotype every individual that comes out after your original population. All right, so we did this when we were generating an AIL. So within the AIL, every like four or five generations we genotyped a couple of individuals which were lean or which were fat and which we thought might have a recombination. But then we started intermating within each generation as well. So just random mating. And then we were hoping that we would have recombinations in our regions of interest. So we used SNP chips to genotype the animals and then we used GWAS, which has relatively low power but high resolution. Because after 26 additional generations of randomly mating the animals, you get new recombinations in the region of interest. So here the resolution goes up but the power goes kind of down. So we used 28 generations of intermating. So not 26 but 28. We had 350 individuals. I had to exclude one individual because it was sick during the experiment so it could not be used. In total we had 11,677 genetic markers which we were confident about and which were segregating between the parental strains. And we have several classical phenotypes like body weights and fat and lean mass. So those were measured every week starting at week three, and the MRI measurements started later because of animal protection laws in Germany. Body weights were measured at day 70 and at day 71, and that is because at day 71 they were slaughtered. So we measured them before slaughter as well. And this is how it looks. So here are the results of the GWAS on generation 28. We now have the same peak, but you see that the peak is much finer, right? It's much clearer compared to the original peak, which is kind of broadish.
So here this one is more broad, and this is a 10 Mb region. While, if we look here and we zoom in, then this peak is defined by a 380 kb region. So this 380 kb region only harbors four genes. So we had to just guess. No, we just used microarrays to see if these genes were differentially expressed between the parental strains, and then one gene came out, which we then published in the International Journal of Obesity. So just very, very standard fine mapping work. So some conclusions that we drew were that the BFMI phenotype is recessive and that GWAS significantly reduced the region of interest. So we went from having 10 million base pairs that we had to look at to only 380,000, and this means that we went down from having 150 possible candidate genes to having only four possible candidate genes in the region. So the next step that we had to take was of course to follow up the fine mapping. And now with only four genes, one of these should be causal, and we ranked these genes by data available in molecular databases, like the function, localization, molecular function, and then we found a single candidate gene and we knocked it out. And we're still working on it because it's not the gene itself. So it's something else. It's probably like a little deletion or a large deletion or a microRNA or something else. So it does affect the gene expression, so we know that the BBS7 gene is responsible, but we have no idea how this is happening. So that's the current project that we are working on in our group. So just a very basic classical example, but you can see that it took around six years to generate these 28 generations, and it just takes a long time to do fine mapping. So this is the CTL part, because I really wanted to talk about it and I thought like, oh, I only have like 50, 60 slides, and although I talk a lot about QTL mapping when people get me started about QTL mapping, we might have too little. So for you guys, I think I'm going to save this for next week.
It's like a couple of minutes to five. I don't have another break slide, so otherwise we could go to six, but I think that it's enough, right? Like just let me know in chat if you still want to continue and do more, or if you're sick and tired of hearing me talk all the time and you're thinking, yeah, it's enough with discovering loci for genes. So it's up to you guys. If you want to continue, then we can continue. And otherwise we'll just continue next week. "Next week is better." All right, that's one vote for next week. Thank you for your vote, Commando. Does anyone else have an opinion, or are we just going to base our entire stream on what Commando wants? "I think I could actually just continue because I'm still interested, but next week works too." All right, yeah, I'm just gonna glue it onto next week's presentation. I think next week's presentation, although let me look. Documents, little informatics. So next week we have primer design. Oh, that's gonna be difficult because primer design means that I get to talk about Kary Mullis, the guy that invented PCR, and I love talking about Kary Mullis. Like he's my favorite Nobel Prize winner ever. But no, I will stick it onto the presentation next week. So next week I will tell you guys about CTL mapping and how you should do it and why it is this perfect novel technique that I developed during my PhD in 2010, 2012. And this actually got me my PhD degree. So that's why I like talking about it, because it's the idea that made me a doctor. All right, but for everyone here or for everyone listening to the recording, I will say goodbye. I will hang around a little bit to chat with you guys and say, all right, so I will.