Welcome back, everyone who's watching on Moodle or watching it later. Working in an evolutionary biology lab on cichlids, so I think that QTL mapping is one of these. Yeah, lots of QTL. Yeah, plants: they really use a lot of QTL mapping in plants, and also in model animals. So, good.

So we will go into more detail. Everything here hangs on heritability. This is the basic formula we start off with, right? It only has three terms: the variance in the phenotype is determined by the variance caused by the genetic part and the variance caused by the environment.

Of course, this has a very weird consequence: heritability can change without any genetic change. If you ask how heritable, for example, human stature is, the height of a certain person, then that can change just based on the environment. If there's more environmental variation, then of course the heritability goes down, and if there's less environmental variation, then the heritability goes up. So when people say that a certain phenotype has a certain heritability, they should also tell you under which conditions this heritability was determined, and people usually forget this. People usually think of heritability as something static which cannot change unless there are changes in the genetics. But that's not true: heritability can also change based on the environment, which is an open field of research in QTL mapping and in heritability research.

So when you estimate heritability, you have the phenotypic variance in a trait. The variance of P is determined by the variance in the genetics and the variance in the environment, but there's always a term which people never talk about, and that is the interaction between the genes and the environment, which is two times the covariance between G and E. So Var(P) = Var(G) + Var(E) + 2 Cov(G, E). And why do we never talk about it? Because when we do experiments we can control the covariance, and
we usually assume that the covariance between the genetics and the environment is zero. When we do an experiment, all the animals in the experiment are under similar environmental conditions, or all the plants are growing in a greenhouse, which is very similar. If we would do an experiment where all the Arabidopsis plants of one ecotype are put at 20 degrees Celsius and all the Arabidopsis of a different ecotype are put at a different temperature, then you have to deal with this G times E effect, right? It might be that some ecotypes of the plant do better at, for example, high altitude, where the oxygen is lower. So then the genotype and the environment have a certain interplay with each other.

But the normal definition of heritability, the broad-sense heritability, is just the variance of the genotype divided by the variance of the phenotype in the population as a whole. Broad-sense heritability is very easily determined in a laboratory or in an environmental setting: we can measure how much variance is caused by the genetic architecture, we can calculate the variance that we see in our population, and when we divide these two numbers by each other we get the heritability.

Heritability is always written as H2, so H to the power of two, and this is due to Sewall Wright. That's something that was just decided around 1900, when they were doing the first calculations of heritability and looking at how phenotypes are passed from father and mother to their children. The guy who came up with one of the first papers used H as the correlation coefficient, which nowadays we call R. And because H was the correlation coefficient, the variance explained by the correlation is H to the power of two. So when we use heritability, we always talk about H to the power of two. When we use a capital H, we talk about the broad-sense heritability, and then there's also the narrow
sense heritability. The narrow-sense heritability is denoted by a small letter h to the power of two, and this is only the additive variance. We already talked, I think, about additivity. If we have a genetic marker, you can be homozygous AA, you can be homozygous BB, or you can be heterozygous AB, and additivity is when the AB group is exactly in between the two homozygous groups. So additive variance is variance caused by additive alleles, and this is what the narrow-sense heritability uses. So the formula changes: instead of the variance of G, the variance coming from the genome, we now talk only about the variance which is caused by additive alleles.

The difference is that normally, for example when we are dealing with plants or with animals, we talk about broad-sense heritability, but when we talk about cattle or humans we generally talk about the narrow-sense heritability, because the narrow-sense heritability is easier to change and to use in breeding programs. So when someone says that this cow is very good, that is usually based on the narrow-sense heritability, because that is the part of the heritability which is used in breeding and in breeding estimates for cattle or other animals.

All right, so two types of heritability: broad sense and narrow sense. If you want to estimate heritability, there are actually two schools of thought as well. There's Sewall Wright, who is responsible for why it is H2, and then we have the Ronald Fisher school of thought. The Sewall Wright school of thought is just to analyze correlations using regression. So you have the total variance, then you calculate the correlation, and then you take the variance explained by the correlation.
So if you have a single marker and you split the individuals like this, right, then instead of doing an analysis of variance you would just draw a single straight line, and how well the line is correlated with the original data gives you the heritability. And then you have Fisher, who said no, you should calculate heritability by using analysis of variance, so do an ANOVA, which is slightly different.

The thing that I always like the most is just to have an experiment to calculate the heritability, and this is more or less the narrow-sense heritability which you calculate. So I wanted to explain this figure. Here you see the parent generation. This is just a kind of histogram, and you see that the parent generation has a certain mean and a certain standard deviation. Normally, phenotypes are normally distributed, so you have a normal distribution.

Now, if you want to calculate the heritability, what you do is select certain parents from the parent generation, right? So here we are selecting parents which on average have a phenotype value of 7, while the whole population only has a phenotype value of 5. So we're just selecting parents which are larger than the average. Then we make a new generation based on these parents, and now we see that the children have a mean of 6. So we put some selective pressure on the parents by selecting the largest ones, and then we look at what the response to the selection is in the children. When we do that, the additive heritability is basically just R divided by S.

So a question to you guys: what is the heritability in this little example, where the parent generation has a mean of 5, the selected parents have a mean of 7, and the children end up having a mean of 6? Looking at you, chat.
This is your time to shine. And it's okay to get it wrong; an answer is better than no answer, because otherwise we're just stuck on this slide forever and ever. So, 86%: that's a good guess, but no. Anyone else? Alexander, ten points for participation. 50: 50 as in 50 units, or 50 percent? 50%, I think you mean. Come on. Yeah, percent. Yeah, 50 percent.

It's actually 50 percent, because S is defined as the selection pressure relative to the mean of the parental generation, and the response R is also determined relative to the parental generation. So R in this case is one and S is two, and one divided by two is 50 percent heritable. So the phenotype that we're looking at is 50 percent heritable and 50 percent determined by the environment. Didn't expect to be right? Well, it happens.

Okay, so this is just one of these experiments that you can use to estimate heritability. Of course, there are many, many ways to estimate heritability. In humans we usually look at full sibs versus half sibs. If you're a full sibling, so you share the same father and mother, then you share on average 50% of your genome. If you're a half sib, so you only have the same father, then you share on average 25% of your genome. So in humans we look at the difference between these two.

What if R is higher than S? That should not occur. It does in real-world examples, but it should not, because of the definition, right: the phenotypic variance is genetic plus environment. Would one speak of a hundred twenty percent heritability?
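The arithmetic behind the 50 percent answer can be written out as a short sketch. The function name `realized_heritability` and its argument names are made up here for illustration; the calculation itself is just R divided by S as described above.

```python
def realized_heritability(population_mean, selected_parent_mean, offspring_mean):
    """Heritability estimated as response to selection (R) over selection pressure (S).

    Both R and S are measured relative to the mean of the parental generation.
    """
    s = selected_parent_mean - population_mean  # selection pressure S
    r = offspring_mean - population_mean        # response to selection R
    if s == 0:
        raise ValueError("no selection was applied, so S is zero")
    return r / s


# Numbers from the slide: population mean 5, selected parents 7, children 6.
h2 = realized_heritability(population_mean=5, selected_parent_mean=7, offspring_mean=6)
print(f"R / S = {h2:.0%}")
```

With the numbers from the slide, R is 1 and S is 2, so the function returns 0.5.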
No, no. Heritability is defined as a number between zero and a hundred percent. So if R happens to be higher than S, then something in the environment must have changed. For example, if the parental generation is grown at 18 degrees Celsius and the child generation you grow at 24 degrees Celsius, then it could happen that R is larger than S. But in general it's not. It does sometimes happen, but that usually has to do with environmental effects which are not compensated for.

Additive effects in a multi-locus trait? No, because here we're just looking at the response of the whole genome, right? We don't care. If a trait is Mendelian, then R would be S, so a hundred percent. If a trait is non-Mendelian, is R always in the middle? Yeah, yeah, although I have my thesis here, and in my thesis we actually looked at that in Arabidopsis for metabolites. Let me switch you guys to full screen and put myself in the middle.

So in my thesis we have this really nice figure. In this figure that I made, you see the parental lines at minus one and plus one: the lowest parent is just standardized to minus one and the other one is standardized to plus one. On the y-axis we see all of the different phenotypes that we measured; in this case we measured like a hundred and sixty different phenotypes. And you can see that for many phenotypes the children are actually never larger or smaller than the parents. Even at the top here, where we see a whole bunch of children which are larger than the largest parent and some children which are smaller than the smallest parent, the average of this trait still falls in between the largest parent and the smallest parent. And that is because we're dealing with populations, right?
So we're not dealing with just an individual measurement; we're always dealing with a whole bunch of measurements. R generally tends to fall in the middle, and that's just the way that it works. But yeah, there can be children, and you can see this in the graph as well: the parents have been selected up to 10, but here in this, oh, you can't see my mouse. So here in this graph it could be that some children are even larger than the largest parents, and some children will be smaller than the smallest parent that you took. But the average of the child population generally falls, like 99% of the time, between the mean of the original parent generation and the mean of the selected parents for the next generation.

So this is just a very basic analysis, right? You need to have a population, you need to be able to select animals, and you have to force them to mate with each other. But that of course doesn't work in humans. In humans you have to use full siblings or half siblings and then compare phenotypes within populations of these.

In general, if you want to do it using analysis of variance, for example if you have sires, like in cows, you have a lot of different bulls and you mate these bulls with random cows. Then of course every child gets half of its genes from the father and half of its genes from its randomly chosen mother. You can then analyze it using the following linear model, where you say that the phenotype of individual ij is determined by the father that it had plus a certain error term. Then you look at how much variance is explained by this father term, you multiply that by two, and you divide that by the total variance in the phenotype. And that is your heritability estimate for this phenotype, right?
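A minimal sketch of this sire model, assuming a balanced design (every bull has the same number of offspring) and taking "variance explained by the father" to mean the between-sire variance component from a one-way ANOVA. The function names, the toy data and the `factor` parameter are illustrative and not taken from the slides. The factor defaults to two as described above; many quantitative genetics textbooks use four for paternal half-sibs, so treat it as an assumption to check against your own design.

```python
def sire_variance_components(offspring_by_sire):
    """Balanced one-way ANOVA on sire groups.

    Returns (sire variance component, within-sire variance).
    """
    groups = list(offspring_by_sire.values())
    n_sires = len(groups)
    n_per_sire = len(groups[0])
    if n_sires < 2 or n_per_sire < 2 or any(len(g) != n_per_sire for g in groups):
        raise ValueError("need at least 2 sires with equal group sizes of at least 2")

    sire_means = [sum(g) / n_per_sire for g in groups]
    grand_mean = sum(sire_means) / n_sires

    # Mean squares between sires and within sires.
    ms_between = n_per_sire * sum((m - grand_mean) ** 2 for m in sire_means) / (n_sires - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, sire_means) for y in g
    ) / (n_sires * (n_per_sire - 1))

    # The sire component can come out negative in small samples; it is not clipped here.
    sire_variance = (ms_between - ms_within) / n_per_sire
    return sire_variance, ms_within


def sire_model_heritability(offspring_by_sire, factor=2.0):
    """factor * sire variance / total phenotypic variance (factor is an assumption)."""
    sire_variance, within_variance = sire_variance_components(offspring_by_sire)
    return factor * sire_variance / (sire_variance + within_variance)


# Hypothetical phenotypes for three bulls with four offspring each.
toy_data = {
    "bull_1": [8, 10, 8, 10],
    "bull_2": [9, 11, 9, 11],
    "bull_3": [10, 12, 10, 12],
}
print(round(sire_model_heritability(toy_data), 3))
```

In a real analysis you would normally fit this with a mixed-model package rather than by hand; the sketch only shows where the numbers in the ratio come from.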
So in this case we would have like five bulls or ten bulls, and every bull will have 20 or 40 offspring. What we then do is set up the linear model where the father is the explanatory variable, so we map the variance onto the father that the child came from. Then we compute the heritability by looking at the variance explained by the father. We have to multiply that by two, right, because children only get half of their genes from the father, and then we compare that to the total variance. In this case we leave the cows out of it, because they are randomly chosen. So this is more or less how you do it by analysis of variance. Of course, this becomes very complex if you have multiple environments or other things, but this is kind of the basic structure to estimate the heritability: you look at the variance explained by the father term and you divide that by the total variance of the phenotype.

All right, so the relationship to the DNA is, I think, very clear: heritable traits are known to be passed from one generation to the next via the DNA, which encodes genetic information. So DNA allows you to pass information from one generation to the next. DNA also allows for modification and mutations, right? So a novel mutation could cause R to be larger than S, because we always assume that nothing changes from the parent generation to the offspring generation, but of course novel mutations that occur could push some individuals out of the range of the parents.

And DNA is responsible for generating semi-random offspring genotypes, because you get half of your genome from your father and half of your genome from your mother. But of course, you don't get a whole chromosome one from your father and a whole chromosome one from your mother.
No, there's also recombination in meiosis, which swaps part of one chromosome one with the matching part of the other chromosome one. So how does this work? DNA works via crossover. The first step in creating gametes, so sperm cells or egg cells, is what we have here in the A phase, and then the B step here is where the homologous recombination occurs. We first have chromosome one and chromosome two; they get duplicated, so we have two copies of chromosome one and two copies of chromosome two. What happens then is that these are attached to spindle poles within the nucleus, and then the cell is separated into two cells.

But what happens is that you get these merged chromosomes, right? Your father has a copy of the chromosome from his father and a copy from his mother, and he exchanges parts of the chromosome that he got from his father with the chromosome that he got from his mother. So you end up with these kind of hybrid chromosomes. These are then pulled apart again to create gametes, and in these gametes chromosome one now contains part of this blue chromosome and part of the red one. It could be a whole red one, or it could be split at a different point. I hope that's clear. Meiosis is not really my strong part.

But step one is the replication of the DNA. So you've got chromosome one and chromosome two, and those get duplicated. This is your father, who has chromosome one from his mother and chromosome one from his father. He duplicates the chromosomes, they recombine, so there's recombination here, and then these are pulled apart again. In the end you end up with four different sperm cells, but the DNA in each of these sperm cells is slightly different because of the homologous recombination and then the separation. So this happens in several phases.
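The outcome of one such round can be sketched in a few lines. This is a deliberately simplified picture, assuming exactly one crossover between one blue and one red chromatid; the function name and the string encoding ("B" for blue, "R" for red, one letter per marker) are invented for this example.

```python
def meiosis_single_crossover(blue, red, crossover_point):
    """Return the four gametes from one meiosis with a single crossover.

    Each homolog is duplicated first. One blue and one red chromatid then swap
    everything after `crossover_point`; the other two chromatids stay as they were.
    """
    if len(blue) != len(red):
        raise ValueError("homologous chromosomes must have the same length")
    if not 0 <= crossover_point <= len(blue):
        raise ValueError("crossover point must lie on the chromosome")

    recombinant_1 = blue[:crossover_point] + red[crossover_point:]
    recombinant_2 = red[:crossover_point] + blue[crossover_point:]
    return [blue, recombinant_1, recombinant_2, red]


# Chromosome one of the father: one copy from his father (blue), one from his mother (red).
for gamete in meiosis_single_crossover("BBBBBBBB", "RRRRRRRR", crossover_point=3):
    print(gamete)
```

Two of the four gametes carry an unchanged chromosome and two carry a blue/red mosaic. Real meioses can have zero or several crossovers per chromosome, which this sketch ignores.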
So we start off with prophase I: the chromosomes condense, the nuclear envelope breaks down, and crossing over occurs. So this is this part here. Then we have metaphase I, where pairs of homologous chromosomes move to the equator of the cell, and then they are pulled by the spindles into separate partitions of the cell, after which the cell divides and makes two new cells. This only happens, of course, when making sperm cells and egg cells; it doesn't happen in normal cells. Normal cells do normal mitosis, but this is meiosis: not the standard cell division, but cell division for creating sperm cells and egg cells. And this is another diagram: the spindles form around the chromosomes, the chromosomes line up at the equator, and then they divide, and they match up using homologous recombination. That's just the way that it works.

But what we are interested in is: how are we able to track these changes in the DNA? One of the things that we used a lot in the old days, which is strange because I don't feel myself to be very old, but when I started in like 2010, perhaps 2008, with my master's and after that my PhD, we still used a lot of restriction fragment length polymorphisms, so RFLP markers. This means that we have a big A allele and a small a allele. So this is chromosome one version one, and this is chromosome one version two. And of course there are DNA mutations. In this case we have three cleavage sites in the big A, so if we cut this DNA using restriction enzymes we would get one, two, three, four parts. When we have the small a, there's actually a non-functional cut site, so we would get only one, two, three parts.
This is just based on simple PCR. So you cut up your DNA, you put it on a gel, and in the one case, for the big A, you see three bands, and here you only see two bands. The restriction fragments are just separated according to their length, and then we have this labeled DNA probe. Here we have a DNA probe which labels here. So in this case, in the big A we pull out a fragment which is small, and in the small a we pull out a large fragment.

So if you would look at a gel, if this would be the gel, then normally it has to be turned around, but for clarity I did it like this. It actually should be turned the other way around. No, it shouldn't. So if you have small a, then you have a big fragment; if you are heterozygous, then you have two fragments; and if you are homozygous big A, then you have a smaller fragment. Is this clear? I think I messed up a little bit with the long and small. But yeah, we cut the DNA into random pieces.
We label this part of the DNA, so we have a probe which targets this specific part of the DNA. Then when we put it on a gel, we see that the small-a individuals have a longer fragment, because they are missing a cut site here. These are called RFLP markers, so restriction fragment length polymorphisms. You just use a DNA-cutting enzyme, a restriction enzyme, then you cut up the DNA and you target a very specific piece, and you get long and short fragments based on the fact that some DNA has a cut site while the other version of the DNA does not. So if something is unclear, just shout out in chat.

RFLP can also be done when you have a variable number of tandem repeats. VNTRs are very common in the genome of plants, but they are also common in the genome of animals. So it doesn't have to be a cleavage site which is broken; it could also be a structure which looks like this, where for example you have an AT repeat. Some animals have one, two, three, four, five, six, seven AT repeats in their genome, while others only have four. So now the length of the big A fragment will be larger compared to the length of the small a fragment, because there's a variable number of repeats in the tandem repeat. This is also called RFLP, but this is an RFLP based on a VNTR, a variable number of tandem repeats, while the other one is just based on the fact that a cut site might be mutated so the cutting enzyme cannot cut the DNA at that point.

VNTRs are still used a lot, I think, in plants. In mice we still have the mouse VNTR panel, with RFLP markers based on the fact that you have repeats in the genome and the length of these repeats differs between individuals. So this is the way that we can do discovery. Nowadays we actually don't use this a lot; the last five years I've only been working with SNP chip data. So, SNP chips.
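The cut-site version of the RFLP logic described above can be sketched as follows. All positions are invented for illustration (a 1000 bp region, three cut sites in the big A allele, the middle one lost in the small a allele, and a probe at position 600), and the function names are hypothetical.

```python
def restriction_fragments(sequence_length, cut_sites):
    """Return (start, end) pairs for the fragments produced by cutting at cut_sites."""
    boundaries = [0] + sorted(cut_sites) + [sequence_length]
    return list(zip(boundaries[:-1], boundaries[1:]))


def probed_fragment_length(sequence_length, cut_sites, probe_position):
    """Length of the one fragment that the labeled probe binds to."""
    for start, end in restriction_fragments(sequence_length, cut_sites):
        if start <= probe_position < end:
            return end - start
    raise ValueError("probe position lies outside the sequence")


REGION_LENGTH = 1000
BIG_A_SITES = [200, 500, 700]   # three working cut sites -> four fragments
SMALL_A_SITES = [200, 700]      # middle site is non-functional -> three fragments
PROBE = 600

big_a_band = probed_fragment_length(REGION_LENGTH, BIG_A_SITES, PROBE)
small_a_band = probed_fragment_length(REGION_LENGTH, SMALL_A_SITES, PROBE)

# Bands visible on the gel for each genotype at this marker.
bands = {
    "AA": {big_a_band},
    "Aa": {big_a_band, small_a_band},
    "aa": {small_a_band},
}
print(bands)
```

The probe picks out a short fragment for the big A allele and a long one for the small a allele, so a heterozygote shows both bands, which matches the gel described above.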
A SNP is a single nucleotide polymorphism: some animals have an A, other animals might have a G at this location. So you use a SNP chip to figure out what an animal has. If an animal has AA, then it's homozygous for the first variant; if it has AG, it's a heterozygote; and if it has GG, it's homozygous for the other variant. But there are many different techniques that you can use to measure DNA and determine if a fragment comes from the father or from the mother, like DNA sequencing, but also mass spectrometry, single base extension or hybridization. All of this is aimed at just being able to determine if the DNA came from the one parent or from the other parent.

SNP chips contain these immobilized allele-specific oligonucleotide probes. You fragment the target DNA, you label it with a fluorescent dye, and then you detect and record the hybridization signal. So this is kind of how the human GeneChip from Affymetrix looks. It's just a little glass plate, and on this glass plate you have different oligonucleotide probes. This one has 50,000 different probes on there, I think. SNP Array 6? Oh, this is a SNP Array 6, so this already has a hundred thousand little pieces of DNA on there. Well, a hundred thousand different little pieces of DNA; of course, each piece of DNA is on there like a hundred thousand times as well. And then you just hybridize your sample to this: you extract DNA, you cut it up into little pieces, you put it on the array, and based on the color signal you can see if an individual had an A there, a G there, or an A and a G there.

So, the first paper, and this is a really interesting paper to read, is by Eric Lander and David Botstein. They are the inventors of QTL mapping.
So this paper was published in 1989, so you can see that QTL mapping is a very novel technique. It was only developed when I was already six years old. The paper is "Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps". So only in 1989 did someone come up with the idea: oh, we have this DNA, we now have these techniques like RFLP to determine which parent it comes from, and then they showed how to do this association analysis. So this is where it starts. Before Eric Lander and David Botstein, we did not have any method to figure out what DNA is exactly doing, whether this piece of DNA is controlling growth, or metabolites, or your weight. That only started in 1989. So a very recent, very novel technique, but very ancient in a way already.

If you want to associate your phenotype with a genetic marker, you need a certain population, for example a hundred Arabidopsis plants. You need genetic markers, either using RFLP or using SNP chips. And you need to have a phenotype measured, like the yield of the plant or the weight of the plant. So three things are required for association analysis, for QTL mapping.

So let's talk about these populations, because the populations that you need have to be structured in a very specific way, right? One of them is, for example, the backcross. For the backcross we just take a mouse of which we know nothing. This is just a random mouse that I found in the street, and I'm crossing it with my laboratory inbred mouse strain. So this is the standard strain.
This is a Black 6. This other one I don't know, because I just picked it up from the street, so I also know nothing about its genotype. But the laboratory strain has been inbred for dozens of generations, so it's genetically homozygous. If I look at the first copy of chromosome one and at the second copy of chromosome one in this inbred strain, they are the same. So we just call this mouse AA, and that's just by definition.

Then we cross this unknown mouse with this AA mouse, and now we get a generation of F1 individuals who are all genetically identical. Or are they? That's the big question, because they probably are not: this mouse doesn't have a single question mark, it actually has two question marks. That's one of the issues here. But we assume that this mouse is homozygous. It's probably not, but we just assume it. We cross it with an AA mouse, and then we get mice which are genetically identical: all of these offspring more or less have an A allele from the black mouse and an allele from the white mouse.

And then what we do is take these individuals and cross them again with the original laboratory strain. Now the funny thing is that these F1 mice are all genetically identical, because they have one allele from this parent and one allele from that parent, so they all look exactly the same. If you do an F1 cross between two inbred mice, then the children of these mice are all identical. The only thing that's different is that some of them are male and some of them are female, and that's because of the X and Y chromosomes. But if you then mate these individuals back with the AA, and the question mark I now call B, then we end up with the backcross generation, and in the backcross generation the variance just explodes. The F1 mice generally are a mixture between the one mouse and the other mouse, so they are gray, and they have more or less
a body weight which is in between the two parents. But as soon as we cross children from this cross back to the AA individuals, their children will look completely different. They will have all kinds of colors and different dot patterns, they will have different weights, and that is because they are genetically different. However, if you generate like a hundred of these little mice from this cross and we look at a single marker, then at this marker 75% of the animals will have the AA genotype and 25% will have the A-question-mark, or AB, genotype.

So there are some disadvantages to doing a cross like this, because we can only see the effect of the AA genotype, the homozygote, against the heterozygote. We cannot say anything about additivity or dominance; we can only say that there's a difference, and the effect size is relatively low because we only get half of the additive effect. We cannot compare an AA individual to a BB individual. We can only compare groups of individuals at a certain marker: we have a population of individuals which are AA, we have a smaller population which is AB, and we compare the difference between those.

This is one of the basic crosses that you can do. It's relatively easy to do and relatively quick, especially with mice, and it only takes you two generations: you find a random mouse outside, you cross it with your laboratory mouse, and then you have another generation where you cross the offspring with the laboratory mouse again. In this generation you can then do the genotyping and afterwards the QTL mapping. But the disadvantages are that you cannot know anything about additivity or dominance, and that the effect size is relatively low, because the AB individuals are in the minority.
So statistically, it's not a very balanced test, and you only get half of the total additive effect: you don't see the difference between AA and BB, you only see the difference between AA and AB.

All right. Another way to circumvent this is to make an F2 cross. In an F2 cross you do the same thing as before, so you generate these F1 mice, but instead of crossing these F1 mice back to the known laboratory strain, you cross them brother to sister. So you take brothers and sisters and you mate them together, and that's what happens here. You have the female parent and the male parent, then you have for example one male being born and five females, and then you just brother-sister mate these, and out of this comes the F2 generation.

Of course, when you do that the genotype frequencies start changing, because this one is AB and this one is AB as well. So what happens in the offspring generation? We now get individuals who are AA one fourth of the time, individuals who are AB, so heterozygous, half of the time, and BB individuals also one fourth of the time. And this is just at a single marker, right? If I look at another marker, we would find the same frequencies, but the same individual might be AA at marker one and AB at marker two. So this is just looking at a single marker in the genome.

Another type of cross which is used a lot, especially in plants and not so much in mice, is the recombinant inbred line. There are some recombinant inbred lines in mice as well, but the recombinant inbred lines are made from the F2.
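The F2 frequencies at a single marker can be checked by enumerating which allele each parent passes on. This is a small sketch for one diploid marker with plain Mendelian segregation; the function name is made up, and genotypes are written as two-letter strings.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_frequencies(parent_1, parent_2):
    """Expected offspring genotype frequencies at one marker.

    Each parent is a two-letter genotype such as "AA" or "AB" and passes on
    either of its two alleles with probability one half.
    """
    frequencies = {}
    for allele_1, allele_2 in product(parent_1, parent_2):
        genotype = "".join(sorted(allele_1 + allele_2))  # "BA" and "AB" are the same
        frequencies[genotype] = frequencies.get(genotype, 0) + Fraction(1, 4)
    return frequencies


f1 = offspring_genotype_frequencies("AA", "BB")   # two inbred parents
f2 = offspring_genotype_frequencies("AB", "AB")   # brother-sister mating of F1 animals
print(f1)
print(f2)
```

The F1 comes out as all AB, and the F2 as one fourth AA, one half AB and one fourth BB, as described above.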
So here we see the B6 and DBA mice: this is one laboratory mouse, and this is another laboratory mouse. We cross these two mice and we end up with heterozygotes, the F1 population, which gets one copy of the genome from the father and one copy of the genome from the mother. Then we start sibling mating these individuals, and we generate an F2 generation. But we don't stop with the F2 generation. We don't have just one or two mice, we have like 50 pairs to start off with, and within each of these families we start inbreeding. So we take the offspring of a single F1 cross and we cross these together, and then the children of this get crossed together again, and again. What happens is that the genome stabilizes. Every time you have recombination, but at a certain point the animals start becoming identical, because at each marker some allele, well, it won't be lost, but one allele will become fixed within this family.

So in the end we have here three different lines of this cross. BXD1 will have a mixture, a mosaic, of the two parents, but the animal after seven or eight generations will be fully inbred again. So we're just inbreeding to stabilize the genome. And because the starting point is different, when we look at the second BXD, so not the second mouse but the second line, these animals will be completely different compared to BXD1, but again their genome will be a mixture of the two original parents.

The advantage of this is that the genotype frequency at each marker across the population will be 50:50: half of the lines will be AA and half of the lines will be BB. So the advantage is that we get the full additive effect, right, because we get two A alleles versus two B alleles. But the disadvantage of the recombinant inbred line is that you
cannot look at additive versus dominance effects. And here I say seven or more generations of sibling mating, but in mice we need around 20 generations. Seven or eight generations is generally enough for plants or other animals, but in mice it is relatively hard to do, because you are inbreeding animals. Animals don't like to be inbred, and it generally reduces their fertility, so some of these lines will start dying out, and they won't breed as well because of genetic disorders, because you're forcing them into a certain structure. But the BXD is, I think, one of the most well-known mouse recombinant inbred line populations. And the nice thing, of course, is that if you cross a BXD1 male with a BXD1 female, then you get a BXD1 back, right? The genome of this mouse will be a clone of its parents because of the inbreeding. So these mouse strains, or lines as they are called, are immortal. You can study this mouse, your children can study the exact same mouse, and your children's children can study the exact same mouse as well, because they are homozygous. You can make the same mouse over and over again, and this is a very big advantage in science, because you want to have reproducible research and you don't want to have mice which are one-offs.

All right, so summarized: a backcross is quick and very easy to set up. It has a low resolution, and I will explain why, it has a low effect size, and you don't get any information about additive versus dominance effects. The F2 is kind of in the middle. It's not too complex.
You just sibling mate them. You get information about additive versus dominance effects, but the resolution is generally limited, so medium resolution, slightly better than the backcross. But again, you have low power because of the 25/50/25 structure. The recombinant inbred line is the most powerful of the three for detecting effects, because you have the full additive effect: individuals are either homozygous AA or homozygous BB. It gives you very good resolution, because this splintering up of the genome is different for each of these lines, and generally you make like a hundred or a hundred and fifty of these lines. So the resolution is very good when you do a QTL mapping. But again it doesn't have any information about additive versus dominance, because it suffers from the same issue as the backcross: you only have two genotype classes, here AA versus BB, and you don't have the heterozygotes.

All right, so these are the three most commonly used crosses. There are some other, more complex crosses. For example, there are crosses which don't start with just two mice, so inbred line one and inbred line two. You have for example the Collaborative Cross in mice. The Collaborative Cross is a recombinant inbred line population made via the same structure as before; however, it is made from eight different founders. So you start with eight different inbred mouse strains, you cross them together, and then you start crossing the children again. So you have a more complex crossing scheme, but in the end you have eight alleles floating around instead of just two. You don't have an A and a B, you have an A, B, C, D, E, F, G and H. Yeah, so you have eight different alleles. The same system was used elsewhere as well, but the Collaborative Cross itself is a very interesting project.
I was not really involved, but when I was doing my PhD they were setting up this Collaborative Cross in mice. They spent hundreds of thousands of dollars making this cross. It was made in like five different locations in the world, so it was a massive project between Europe and the USA, Israel was involved, and the Australians were involved as well. They wanted to make a thousand different lines, but in the end they only ended up with like 70 or something, because the other 900-plus lines died out during the inbreeding generations. And that is because mice do not really like to be inbred. If you start inbreeding animals, fertility just drops off a cliff, and you get mice which do not breed anymore and are not able to produce offspring. So the Collaborative Cross is a really, really good resource that people tried to set up, but in the end it suffered from a lot of infertility problems. It did teach us a lot about how fertility works, especially in male mice, since it ends up being the males who are usually infertile, because females do the meiosis stuff when they are still in the womb. But for male mice inbreeding is a massive, massive issue. This cost us, I think, 15 years, perhaps even longer.
So Gary Churchill, the guy who came up with the idea, spent almost half of his career doing this. And then the guys working on Arabidopsis thaliana, so the plant community, did the exact same thing, but they did it in a week. And that is because with plants you don't have to breed them, you just mate them. For a plant you can just take eight different ecotypes of Arabidopsis. You have for example the Arabidopsis from Columbia, you have Landsberg erecta, you have Cvi, so the Cape Verde Islands one. So in Wageningen they took eight different ecotypes of Arabidopsis, and then you just use a little pencil and you mate the plants, and you can do that in a week. So they did in a week what took the mouse community around 10 years to accomplish. The inbreeding still needed to be done, of course, but in the end the MAGIC cross is much more successful, because it doesn't suffer from the same inbreeding depression as the mice do.

The Diversity Outbred is kind of the same. It also has eight inbred founders, the same eight inbred founders as the mouse Collaborative Cross, but it is an outbred population. The nice thing about the Collaborative Cross is that these 70 lines that survived are immortal, so you can keep them forever and ever, and the same thing holds for the MAGIC lines, the Arabidopsis recombinant inbred line population. The Diversity Outbred population is an outbred population, so every animal is a mix of these eight founders, but since they are not inbred you can only get a given mouse once. You cannot have repeated measurements on the same genotype. And of course, when you do more complex crosses, then there's no analysis software.
There are no statistics. They are not readily available and are still being developed. Currently it is more or less done for the Collaborative Cross and also for the MAGIC lines, but it took us a couple of years to get everything in order to be able to map QTLs in these populations.

All right, so linkage analysis, let's start. Now we have our population, and we forced a structure on our population, so at a certain marker an individual can have two or three different genotype states. What do we want to do with it? Well, we take a hundred mice from, for example, an F2 cross, we measure the length of the tail of these hundred mice, and then we do association analysis. So we try to see if there's a locus in the genome where the genotype is correlated, or associated, with the phenotype. There are two flavors, like I told you. There's QTL mapping, which uses these structured populations, and there's genome-wide association, which uses outbred populations, so populations where you are not forcing individuals into a certain structure. And of course linkage analysis only works for phenotypes that are heritable. There are some other drawbacks too, but this is the main one. So if you are interested in a phenotype and you want to know if there are any genes in the genome involved in regulating it, you can only do that when your phenotype is heritable. If your phenotype is not heritable, then you can't use QTL mapping or genome-wide association to find genes involved in this process.

All right. So the first step is to create a genetic map. You use primers and PCR, or you use SNP chips. Nowadays we also do it for methylation markers, where we look not so much at the content of the DNA but at the methylation state of the DNA. In the old days we just used primers and PCR; nowadays we use genotyping assays, just SNP chips. And multiple markers together form a genetic map. So you genotype each individual at each marker.
So you have like 150 markers across the genome which you want to measure, and then you measure these 150 markers in your 500 plants or in your 150 mice. The next step is to calculate your genotype probabilities, which you only do when you do QTL mapping. I'm not going to explain this now; if people are interested, I can talk more about that later. But the idea is that you just perform basic association analysis. So how does basic association analysis look? Here we have the phenotype: individual one had a yield of six and a half kilograms, and here we have individual n, which had 9.6 kilograms. And here we have the genotypes. If we look at the first marker, we see that some individuals were A and some individuals were B. I'm only using a single letter instead of two letters, just to keep the slide a little bit shorter, so this is homozygous AA and this is homozygous BB. So we have marker one, marker two, marker three, four and so forth: a whole bunch of markers across the genome. And what do we now start doing? We just start looking at each marker. Each marker divides the population into two: there are individuals which have the A genotype and individuals which have the B genotype, and we just ask whether the mean of A is smaller than the mean of B, or the mean of A is larger than the mean of B. If we do that for the first marker, what happens is that both of them are around the same, right? To calculate the mean I just add up the B's, 9.1 plus 9.6 plus 7.5 plus 7.0, and divide by how many there are, then I do the same for the A's, and then I just look: is the mean of A different from the mean of B? In this case they're more or less identical. Now the next marker.
We see that something interesting happens here, right, because all of the individuals that carry the AA genotype have a low yield, while all individuals that carry the BB genotype have a high yield. This is what we call a QTL, a quantitative trait locus: a locus at which there is an association, here between the A genotype and low yield and the B genotype and high yield. The next marker, again not a lot of difference. Next marker, again not a lot of difference, and so forth and so forth. So you see that in the end we are creating a kind of profile across the genome, or across a chromosome. Here there's something interesting, because this marker actually gives us the opposite information compared to that marker. Here all the individuals carrying the B allele have a low yield and all individuals that carry the A allele have a high yield. So in this case we cannot decide if it is marker 2 which is responsible for the yield, or if it is marker number 9. Right, we can't decide. There is an association with marker 2 and there's an association with marker 9, but we cannot decide which one of the two it is. But this is just how QTL mapping works. This is all: you get the yields, you get the genotypes measured using your SNP chips, and then you just look whether the mean of A is smaller than the mean of B or the mean of A is larger than the mean of B, and so on. Is this clear? Everyone still awake? Shout out in chat. But that's all, that's what linkage analysis is. One of the most famous papers in the history of genetics just describes doing this: calculate the mean of A, calculate the mean of B, compare them, and if there's a difference, that's where your gene of interest is.
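The marker-by-marker comparison described here can be sketched in a few lines. The yields and genotype calls below are made up for illustration (they only mimic the pattern on the slide: no difference at the first marker, A low and B high at marker 2, and the opposite at marker 9), and the names `marker_effects`, `m1`, `m2` and `m9` are hypothetical.

```python
from statistics import mean

# Hypothetical yields in kilograms for eight individuals.
yields = [6.5, 7.0, 7.5, 6.8, 9.1, 9.6, 9.3, 9.0]

# Hypothetical genotype calls, one letter per individual ("A" = AA, "B" = BB).
genotypes = {
    "m1": ["A", "B", "A", "B", "A", "B", "A", "B"],
    "m2": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "m9": ["B", "B", "B", "B", "A", "A", "A", "A"],
}


def marker_effects(genotypes, phenotype):
    """Return mean(B) - mean(A) of the phenotype at every marker."""
    effects = {}
    for marker, calls in genotypes.items():
        group_a = [y for g, y in zip(calls, phenotype) if g == "A"]
        group_b = [y for g, y in zip(calls, phenotype) if g == "B"]
        effects[marker] = mean(group_b) - mean(group_a)
    return effects


effects = marker_effects(genotypes, yields)
for marker, effect in effects.items():
    print(f"{marker}: mean(B) - mean(A) = {effect:+.2f} kg")
```

In this toy data the two associated markers give the same size of effect with opposite sign, which is exactly the situation where the means alone cannot tell you which of the two markers is responsible.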
There's a gene there which causes your phenotype to change. That was 1989, right, so back then you could still write a paper on that. It is a very smart idea, actually.

All right, so there are two types of effects. If we go back to AA, AB and BB, so if we look at an F2, then we can have an additive effect of a marker. That means that having one copy of the B allele increases your phenotype by x, and having two copies of the B allele increases your phenotype by 2x, right? This is called an additive effect because the B allele has a certain effect, it makes you bigger or it makes the yield bigger, and having two B alleles just makes that increase twice as big. So an additive effect means that the alleles contribute equally, and it looks like this. Then there's also a dominance effect of the marker, and that means that one allele dominates. So if we look at a certain marker in an F2 population and we see this structure, then we say: oh, this is a dominance effect. And the allele here which is dominant, well, that's a good question, because you've been listening to me: what is the dominant allele? I've actually asked this question at a PhD defense, of someone defending their thesis, and they weren't able to answer it. But you guys should be able to now, because you follow the course, right? So in this structure, which allele is dominating? Is it the A allele or is it the B allele? Yes, B.
Yeah, because if you have a single B allele, you jump up to the high phenotype. So this is a dominance effect. You have the additive effect, you have the dominance effect, and there are all kinds of other effects. Like I showed you, sometimes the children are higher than the parents and sometimes they are lower, so you can have overdominance and underdominance, and there are all kinds of different structures. Sometimes the AA group and the BB group have the same mean and the AB group sits above or below them. But that's the basic idea.

So here we are only looking at the effect, right? We're looking at whether the mean of A is different from the mean of B. But since we are doing science, we also want to know how likely this is, because there might be a big difference between A and B, but if the standard deviation of the A group is very big and the standard deviation of the B group is very big, then this effect might still not be real. It might be that there seems to be a big effect, but this is just because there's a massive standard deviation in both AA and BB. So how do we deal with this? In the previous example we used the means, so we are mapping the effect of a quantitative trait locus, and how likely this effect is we can test just using statistics. So we can just do a basic t-test in the case of two genotype groups, right?
If you have only AA and BB, then we can just t-test the A group versus the B group. But in many cases, for example if we do an F2 cross where we have three genotypes, so AA, AB and BB, this effect is of course more difficult to disentangle, and then we need to use something like an ANOVA or linear regression to figure out the effects. I like t-testing; some people like ANOVAs, I like ANOVAs as well; some people like linear regression, I like linear regression as well. It's up to you what you want to use. You can use any kind of statistics to show that this is a significant effect. When we do this, we calculate a p-value for each of these markers. But in QTL mapping we always show LOD scores, and that is because it looks better. So what is a LOD score? LOD stands for logarithm of odds. What we do here is take the minus log10 of the p-value. So if you have a p-value of 1 times 10 to the minus 5, then because we take the minus log10 this just becomes a score of 5. So if you see a LOD score of 5, that means that the probability of seeing an effect like this just by chance is 1 times 10 to the minus 5. If you see a LOD score of 30, that probability is 1 times 10 to the minus 30. And it just looks better when you plot it. Of course, we have to deal with multiple testing as well, because we are doing an association analysis: we're not doing a single test, we're testing every marker, right? So with a hundred markers we do a hundred tests.
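The step from a two-group comparison to a p-value and then to a LOD score can be sketched as follows. To stay within the Python standard library, an exact permutation test stands in for the t-test here; that swap is an assumption of this sketch, as are the made-up yields and the function names. Because multiple testing was just raised, the sketch also shows how a threshold on the same minus log10 scale could be set with a plain Bonferroni correction, which is one simple option.

```python
import math
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Stand-in for the t-test: the fraction of all possible relabelings of
    the individuals that give a mean difference at least as large as the
    observed one. Only practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    hits = total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


def lod_score(p_value):
    """Score as used in the lecture: minus log10 of the p-value."""
    return -math.log10(p_value)


def lod_threshold(alpha, n_markers, n_phenotypes=1):
    """Minus log10 of alpha divided by the number of tests (Bonferroni)."""
    return -math.log10(alpha / (n_markers * n_phenotypes))


# Hypothetical yields (kg) for the two genotype groups at one marker.
low = [6.5, 7.0, 7.5, 6.8, 7.2]
high = [9.1, 9.6, 9.3, 9.0, 9.4]

p = permutation_p_value(low, high)
print(f"p = {p:.4f}, score = {lod_score(p):.2f}")
print(f"threshold for 100 markers at 0.05 = {lod_threshold(0.05, 100):.2f}")
```

With only ten individuals the smallest p-value a permutation test can reach is 2 out of 252 relabelings, so even a perfectly separated marker stays below the threshold for a hundred markers. That is one reason real crosses use a hundred or more individuals.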
So we have to correct for that. Yes, so we have to do a multiple testing correction, and we do that just by adjusting the LOD threshold. The LOD threshold just goes up: the more markers we test, the higher the threshold. The LOD threshold in QTL mapping starts from a significance level of 0.05, or another level depending on whether you want a suggestive, a significant or a highly significant threshold. You divide the 0.05 by the number of markers times the number of phenotypes, so it's just the basic Bonferroni adjustment of your p-value, and then you take the minus log10 of that adjusted p-value, and this is called the LOD threshold. All right, so we're going to take a short break here. I've been talking for more than an hour already again. So I will