For this last lecture, I'm going to tell you about the quantitative genetic analysis of natural variation. The question here concerns evolution within a species, so intraspecific evolution: if I take two organisms from the wild and they are phenotypically different (or not necessarily, actually), I want to know what genetic variation is causing the phenotypic variation between them. This requires looking at the DNA level and finding the causative mutation, and it has only become possible in the last 10 years, mostly because of advances in genomics. I will give you at least some clues about how we proceed. The outline for this morning is this. First I'll give you some background on quantitative genetics. You'll notice there is 'quantitative' in the term: population genetics and quantitative genetics are fields of biology that have been quantitative from the start. Then I'll go through the principle of genetic mapping, for those of you who don't know anything about it, and how we can identify the nucleotide basis of natural phenotypic variation. And then I'll come back to the C. elegans vulva to show you an example of what we did to try to identify one of these variations in a process where it doesn't show at the level of the final phenotype. So first, a quantitative trait. I guess I don't need to define this here: it's just a phenotype, a trait that's measured on a quantitative scale, which is different from a discrete trait. I'm sure you all heard in high school about Mendel's peas and the basics of genetics. Mendel's peas are green or yellow, or they are wrinkled or smooth, so these are discrete traits. Whereas a quantitative trait would be, for example, height in humans, or the level of activity of a signaling pathway.
There was a huge controversy at the beginning of the 20th century between the Mendelians and the school that was called the biometricians. Mendel (there might have been a little bit of conscious selection here, but I think that's how he found his basic laws) looked at characters that would segregate like a single locus in his crosses and that you can score as discrete traits, not quantitative ones. That made it possible for him to find segregation at single loci. Now, another school: Galton was the cousin, the younger cousin, of Darwin, and he and his followers started something called biometry, which was a quantitative part of biology. Galton was interested in human stature, human height, and also in intelligence and this type of thing. But let's stay with human height. The problem here is, first of all, that in humans you cannot do crosses at will, I mean the experimenter cannot. And second, this is a continuously distributed trait, more or less Gaussian, and indeed there is not a single locus. To this day the genetics of human height is a total mess: there is not a single variant that causes a large difference in height. Plus it's extremely sensitive to the environment, to what children eat and things like this. So this is a very different type of trait from Mendel's traits. The biometricians, like Pearson and Weldon a little later, around 1900, saw Mendel's traits as very abnormal recent mutations, and real natural variation as continuous and pre-existing in populations. They were following Darwin, and this was one of the strong points in Darwin's books: evolution occurs through gradual change, very gradual change. How gradual depends on the time scale you are taking, obviously. But they absolutely refused the idea that strong mutations were involved in evolution.
On the other hand you had people like Bateson and de Vries. You may know that Mendel worked in the 1860s, more or less, but his work was not really recognized for 40 years. It was in 1900 that several people came back and rediscovered his laws; I think, for example, Hugo de Vries, with plants. Again, they chose characters that were discrete, with a single locus explaining the segregation. And so they saw the fuel of evolution as being de novo mutation, not variation standing in the population, and evolution as proceeding in jumps. So there was a really very strong scientific fight in the first decade of the 20th century, until a few more quantitatively inclined people realized that as soon as you have a few factors, you can turn a distribution of discrete factors into a Gaussian distribution. I think for you it will be pretty obvious, but this is the work of these two people here, in the 1910s I think. Just take two genes which each have two alleles; an allele is one form of the gene. So we have two genes, A and B, each with two alleles. Usually big A is the dominant one, but here they are actually not dominant and recessive. You have two loci, and two forms at each: large A and small a, large B and small b. Depending on the combination of alleles, you get a color which is more or less intense. And from this you already get this type of curve: if you have a population that's equally populated by all these genotypes, you will get a quasi-Gaussian distribution already, okay? A second factor that contributes to spreading out the phenotypes is the environment. Yesterday I told you that there are several origins of variation. You have genotype and environment, including the parental genotype and environment. And then you have stochastic variation that basically you cannot control in experiments.
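The two-locus color example above can be checked by simple enumeration. The sketch below is only an illustration under stated assumptions that are not in the lecture: the population is taken to be an F2 from double heterozygotes (AaBb x AaBb), the two loci are unlinked, and color intensity is assumed to be the number of upper-case alleles. The names `gametes`, `dose` and `f2_counts` are hypothetical.

```python
from collections import Counter
from itertools import product

# Gametes of a double heterozygote AaBb, assuming the two loci are unlinked:
# AB, Ab, aB, ab, each with probability 1/4.
gametes = [a + b for a, b in product("Aa", "Bb")]

def dose(gamete1, gamete2):
    """Assumed phenotype: number of upper-case ('intense') alleles, 0 to 4."""
    return sum(ch.isupper() for ch in gamete1 + gamete2)

# Enumerate the 16 equally likely gamete combinations of the F2.
f2_counts = Counter(dose(g1, g2) for g1, g2 in product(gametes, repeat=2))

# Text histogram: five color classes in a bell-shaped 1:4:6:4:1 ratio.
for d in sorted(f2_counts):
    print(d, "#" * f2_counts[d])
```

With only two additive loci the five classes already follow binomial proportions; adding more loci, or environmental noise, smooths this further toward a Gaussian shape.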
This experiment by Wilhelm Johannsen at the beginning of the 20th century is really the basis of quantitative genetics. What he did was to take beans with a size distribution (for some reason the large ones are on the left and the small ones on the right here) and do a selection experiment, a little similar to the butterfly experiment I showed you yesterday. He took the extreme of the distribution at each generation for a while, so either the large ones or the small ones. When he ran the experiment like this, he still had a Gaussian distribution, which shifted. At the beginning you have this; if you take this part, you're going to shift your distribution a little bit, and you take this part again, you shift it again, and so on, until there comes a moment where you can take this part and you don't shift the distribution at all, okay? So at that point selection (here artificial selection, not natural selection, but it's the same process) cannot improve the trait. You cannot get it larger by just selecting the end of the distribution. What happened in fact is that he was here in a second phase, where he had pure lines, with either a large size or a small size, and he could not select further. There was no genetic variation left that he could select on, and the reason for the distribution now is purely environmental or stochastic, okay? It's from this type of experiment that he himself coined the terms genotype and phenotype, because here, for a given genotype, you have a range of phenotypes. So you really need to consider two different things. You cannot just take one phenotypic class and assign a genotype to it. So now, if you combine the fact that you can have several or many factors with what's called environmental phenotypic variance, you can get various distributions.
What's represented here is that you take a parent P1 which has a small value and a parent P2 which has a large value, in a diploid. Diploid means that the species has two sets of chromosomes, like us. In the F1 generation you're going to have heterozygotes for every chromosome, and that's where the meiosis that matters takes place. What you're looking at is the F2 generation and the distribution in the F2. For a normal Mendelian trait (I forgot which is recessive and which is dominant between wrinkled and smooth) you get one fourth of phenotype 1, which is the recessive phenotype, and three fourths of phenotype 2, which is the dominant one. Now, if you have a quantitative trait, instead of having just two clear classes of phenotype, you start having a distribution due to environmental variance. This is a case with a single locus: you can have measurement error, environment that you don't control, or true noise in the population. And so you have three quarters here and a quarter here. Now let's complicate things. For example, if the alleles are co-dominant, so the heterozygote has an intermediate phenotype, you get three distributions. And depending on the strength of the effect of this locus, they are going to be rather far apart. Here you have an allele effect of three, which you can read on the scale at the bottom. And here, if the allele effect is smaller, the three distributions merge into almost a single tight distribution.
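The single-locus, co-dominant F2 case described above can be simulated in a few lines. This is a minimal sketch, not the model behind the slide: the function name `simulate_f2`, the baseline of 1.0, the Gaussian environmental noise and the sample size are all assumptions made for illustration.

```python
import random
import statistics

def simulate_f2(n, allele_effect, env_sd, baseline=1.0, seed=0):
    """Simulate n F2 phenotypes, grouped by number of A2 alleles (0, 1 or 2)."""
    rng = random.Random(seed)
    by_genotype = {0: [], 1: [], 2: []}
    for _ in range(n):
        # Each heterozygous F1 parent transmits A2 with probability 1/2,
        # which gives the 1:2:1 genotype ratio in the F2.
        n_a2 = rng.randint(0, 1) + rng.randint(0, 1)
        # Co-dominance: the heterozygote is exactly intermediate.
        genetic_value = baseline + allele_effect * n_a2
        # Environmental variance (assumed Gaussian) blurs each genotype class.
        by_genotype[n_a2].append(genetic_value + rng.gauss(0.0, env_sd))
    return by_genotype

# A large allele effect gives three well-separated distributions;
# a small one gives three heavily overlapping distributions.
for effect in (3.0, 0.5):
    f2 = simulate_f2(2000, allele_effect=effect, env_sd=1.0)
    for n_a2, values in sorted(f2.items()):
        print(f"effect={effect} A2 copies={n_a2} n={len(values)} "
              f"mean={statistics.mean(values):.2f}")
```

Plotting a histogram of all the phenotypes pooled together, for each allele effect, would reproduce the contrast between clearly separated peaks and one apparently continuous distribution.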
Now take a second locus, and say, for example, that all effects are additive. Additive effects means this: if you have locus A with alleles A1 and A2 and locus B with alleles B1 and B2, and the difference between A1 and A2 has an allele effect of two, then on the scale, with a baseline of one, the genotypes A1A1, A1A2 and A2A2 are going to be one, three and five. Then you do the same with B1 and B2 and just add the numbers, okay? This is not very realistic in biology, because if you have variation at two genes, very often the response of the system is not going to be linear, so you're not going to get this. But very often, for simplicity, quantitative genetics assumes additivity. That's one of the many assumptions of quantitative genetics that is not very realistic. Anyway, just for the sake of pedagogy, here we have two loci with additive effects: they are both co-dominant, and they also add up with each other, and so here you really get a quasi-Gaussian distribution. The power of mapping, and I'll show you this better later, is that if you know the genotype along the chromosomes of all these individuals, you can realize that the genotypes are distributed differently along the phenotypic scale: this Gaussian distribution is not due to just environmental noise but actually has a genetic basis. Okay, to come back to history: around the 1930s we had a synthesis, what's called the synthetic theory of evolution, which brought several fields together, especially genetics. Darwin in his time had no idea of the mechanism of heredity; he actually had weird ideas about heredity, and kind of believed in the heredity of acquired characters. When the basis of heredity was found, the synthesis was to bring natural selection together with the laws of heredity, and also to recognize that there was no contradiction between discrete Mendelian heredity and continuous variation.
At that time two different fields emerged. One is population genetics, which basically cares about the propagation of different alleles in a population, not caring about the phenotype, just using a selection coefficient that describes how much they are selected for or against. So population genetics is really the science of the propagation of alleles. And then you have quantitative genetics, which was historically built to help breeding in agriculture. The history of this science is very strange because of this. It was a time when they didn't know about DNA, and their goal was to perform selection experiments like this one, but on chickens or on corn, to improve them. One of their main concerns was to know how much of the variation they had was of genetic origin versus environmental origin. So a lot of it is partitioning the variation that exists in the starting population into a genetic and an environmental basis. Here I'm taking a cartoon from a colleague of mine to present the different types of genetics. What is represented are the genes; the transmission, so really the mechanism of heredity; the genetic effects, especially through development; the phenotype; and the fitness, which is what's important for the dynamics of natural selection. If you take Mendelian genetics, so the genetics of single loci, which is often the lab genetics of developmental geneticists, what you care about are the genes and the phenotype, and obviously the transmission of the genes. In population genetics, the phenotype is absolutely irrelevant. The only thing you assign is a selection coefficient. So it's covering this part: you have the transmission through generations of alleles at different genes, and fitness. The dotted lines here represent the fact that in old Mendelian genetics and in population genetics, you don't identify the loci themselves.
So the genetic basis is inferred, but a long time ago you couldn't identify the genes. Neither could you in population genetics, which is usually theoretical, or takes an average of many genes adding up. Then comes molecular genetics, which appeared in the seventies and eighties. Now you can identify the genes which are the basis of, for example, a mutant phenotype. This is typically developmental genetics. Quantitative genetics, as I said, cares about fitness, and when you're doing artificial selection, the fitness is the artificial fitness where you decide that the milk is good or not good, or enough or not enough. The genes are underlying, but again there are dotted lines: you don't identify the genes themselves. It's only the overall genetic variation that you're trying to get at, and you're selecting on the phenotype. In actual evolutionary quantitative genetics, fitness is the fitness of natural selection, and again you're looking at propagation, at how phenotypes will be transmitted to the next generations. Now, what we can do today, I think, is a new synthesis of all this, where for traits that are relevant in nature in terms of natural selection, we can identify the fitness effects and their genetic basis, really at the molecular level. And we should not forget the steps in between: some part of modern quantitative genetics that goes down to the gene just goes from genes to phenotype and doesn't really care about genetic effects and development. I think an interesting synthesis is to take everything into account. So the question now is how we get to the actual gene that's underlying a phenotypic variation. Do you have questions so far? Sorry? That's a good point. As I said, the environment is a key factor.
And basically, for all this science, especially for breeders, the environment has historically been noise, because they want to improve the genetics. So you're completely right that the environment is a little too forgotten in all these pictures. This is a very good point. The second reason, on our part, is that the environment is more complicated to study. How do you study an environmental effect, for example, as a developmental biologist? You can try to dissect the environment: you can do chemistry, say you have a pheromone and you purify the pheromone. But in the end, what we do as laboratory geneticists is a screen for mutants that don't respond to this cue, for example. So we try to simplify the problem for ourselves by turning it into a genetic problem. Okay. So, genetic mapping. The goal is to identify the variation at the level of the DNA that's causing a phenotypic change, or part of it. It can be a tiny little part of this change. I will go through all this, but the idea is as follows. As you know, an animal or a plant or a fungus has different chromosomes, so let's say chromosome one and chromosome two. You're trying to find sites along these chromosomes where you know that there has been, for example, a substitution from an A to a C. Or here there is a deletion of four base pairs. Or here there is what's called a microsatellite, a repeat that's present 17 times in one parent and 24 times in the other. So you have molecular markers along the chromosomes that you can use, and you have what's called a genetic map that relates them to each other. Okay. A genetic map is a map of the chromosomes, which are called linkage groups because their markers get transmitted together at meiosis. I'll come back to it.
Then, using recombination at meiosis, you try to see whether your causative locus in such a cross is segregating together with these markers or not. Again, this is high school biology, I think. During reproduction, you have meiosis. In a diploid organism you have a pair of homologous chromosomes, here represented in blue and red; they could be coming from the mother and the father, for example. At meiosis, the two homologous chromosomes, which have also duplicated beforehand, undergo crossing over. For example, you have an exchange here, where this little piece of chromosome from the father hooks up to the mother's chromosome. And so in this particular case you end up with this chromatid, so this one half of the chromosome, that is not recombinant: it's completely blue. This one is completely red. But these two chromatids in the middle are recombinant. If you have three markers here, A, B and C, you will see a recombination between B and C. The closer two loci are on the chromosome, the more likely they are to segregate together. The metric that's used, the centimorgan, is the percentage of recombinant gametes out of the total number of gametes. This is the basis for establishing a genetic map like this one: if you take two markers, you can calculate their distance by measuring the proportion of recombinants between them. In the mathematical details it's actually more complicated, because this metric is not additive: you cannot simply add up genetic distances, because you may have two crossovers in the interval, and you can also have interference, meaning that when you have one crossover, you're more likely or less likely to get a second one next to it. Because of this, genetic distances actually need a correction. The other point is that the maximum distance that you can detect is 50 centimorgans, because that corresponds to 50% recombinants.
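One standard way to make the correction for double crossovers mentioned above is a mapping function. The lecture does not say which one is used; the sketch below uses Haldane's mapping function, which assumes no interference at all, purely as an illustration. The function names are hypothetical.

```python
import math

def haldane_cm_from_r(r):
    """Map distance in centimorgans from an observed recombination fraction r.

    Haldane's function (assumes no crossover interference): d = -50 * ln(1 - 2r).
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

def haldane_r_from_cm(cm):
    """Expected recombination fraction for a map distance in centimorgans."""
    return 0.5 * (1.0 - math.exp(-2.0 * cm / 100.0))

# Small distances: 1% recombinants is very close to 1 cM.
print(round(haldane_cm_from_r(0.01), 3))

# Recombination fractions do not add up: two adjacent 10 cM intervals give
# fewer than twice the recombinants of one interval, because double
# crossovers restore the parental combination of the outer markers.
r10 = haldane_r_from_cm(10.0)
r20 = haldane_r_from_cm(20.0)
print(round(r10, 4), round(r20, 4), round(2 * r10, 4))

# Very distant loci approach, but never exceed, 50% recombinants.
print(round(haldane_r_from_cm(1000.0), 4))
```

Under this model the corrected distances in centimorgans are additive along the chromosome, while the raw recombination fractions saturate at one half, which is the same value as for loci on different chromosomes.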
If you have two different chromosomes (imagine that you had a second chromosome here, chromosome two), the probability of reassorting the chromosomes at meiosis is one half. A last point on this: what's the relationship between the genetic map and the physical map? The genetic map is in centimorgans. This may be the map of a species, where you say that the zero is here, that's 50 centimorgans, 15, 22 and 35. So this is a genetic map in centimorgans. Now we have genomes, where we have what's called a physical map. The physical map is the nucleotide sequence, and on the physical genome the metric is the kilobase or megabase. The relationship between centimorgans and kilobases is not one to one. Here, for example, you have what's called the Marey map of C. elegans, with the physical position on the x-axis and the genetic position in centimorgans on the y-axis. What you can see is that all chromosomes have the same pattern. This is true for C. elegans; it's not true for other organisms, and every organism has a specific pattern. Here, in the middle of the chromosome, the slope is much lower, which represents the fact that recombination in this region is low. So in C. elegans, most recombination occurs on the arms of the chromosomes. And now we understand more or less why, because of the mechanics of meiosis. Sorry? Recombination is equivalent to crossing over: you recombine the chromatids. Other questions? Don't hesitate if there is vocabulary like this that I'm not even aware I'm using. Speaking of vocabulary, I put a few definitions here. We saw that a quantitative trait is something measured quantitatively. What's often called a complex trait is one that doesn't segregate as a single locus; human height, for example, is a complex trait. And the word Mendelian is used differently depending on the situation. It can be used to denote that it's a discrete trait.
It can be used to denote that there is a single locus. And Mendelian inheritance is also used to denote something that has a normal nuclear genetic pattern of inheritance, so non-Mendelian inheritance is something like mitochondrial inheritance, or other forms such as DNA methylation or small RNA inheritance. So to the first question, yes. Again, this depends on the organism. In C. elegans it is very clear that the genes in the center are genes that are highly conserved across species and don't evolve fast, whereas those on the arms are gene families, genes that are specific to C. elegans, where we actually have no idea what they do, and that evolve fast. That's also where transposons, repeats and things like this accumulate. Now, what the evolutionary dynamics behind this are is a more complicated issue. This is the general pattern, but you can zoom in here at a given position, and recombination is also very, very local: you also have fine-grained recombination. This is being studied at the moment, and I think very soon we'll understand better where this fine-grained pattern comes from. Whether it's always under selection and there for a reason, I don't think so. For the global pattern, the hand-waving argument is that recombination is good for making evolution go faster, and so having on the arms the things which evolve fast, or where you have to get rid of deleterious mutations, is supposedly good. That's the hand-waving argument. In C. elegans it's particularly hand-waving, knowing that we don't have much outcrossing at all. So just look at the x and y axes. The x axis is the physical position, so the genome position, and this is the genetic position. Here, for example, let's say you have 100 centimorgans and 200 centimorgans.
If you take 10 centimorgans here, it means that in a given cross (the cross here was more complicated than just this) they had 10% of recombinants. So you have a direct link with the genetic position, and the fact that you have a lot of recombinants here, even though on the x axis it's not a lot of DNA compared to this, means that you had more recombination in this region. No, no, here there is no notion of genes. Genetic position has nothing to do with the notion of genes. It's a position on a genetic map, so on the map of recombination. Okay, the word 'genetic'... No, I think I removed it; it was on this slide. At the time when people didn't know what DNA was, 'genetic' meant that there was a genetic basis in this type of setup, right? And here, in terms of recombination, it means that the genetic map is the map of transmission through recombination at meiosis. But again, there is no notion of a gene, no molecular gene as we know it, where you have, let's say, a regulatory region and an open reading frame and so on. You could have DNA which doesn't do anything, which doesn't have any ORF or anything, and you could still have a physical and a genetic map, as long as you have meiosis. Okay. A jargon term that's used a lot is QTL, for quantitative trait locus. It means a genetic locus; ideally, now, we can identify the nucleotide that has changed and affects the phenotype of interest, and then we sometimes talk about a QTN, a quantitative trait nucleotide. A quantitative trait locus is a locus that we can map on the genetic map, as I'll show you later, but where we don't necessarily know whether there is a single nucleotide variation behind it. There might be two different ones, for example. It's an operational definition in a cross. Another piece of jargon, which I think is an important concept though, is genetic architecture.
If you take a trait, for example stature in the human population, you can ask what its genetic architecture is. How many underlying loci are there, and how do they interact with each other? Are they all individual additive contributions? Are there non-additive effects? And so on. A simple genetic architecture is maybe that of Mendel's peas, where you have a single locus. A complex genetic architecture is something like human height. So here I have a genetic map where I have markers one, two, three and so on, equivalent to this one. When we look for QTLs, we're looking for places on this genetic map, relative to markers that we can genotype, where the variation appears to be linked through meiosis with the variation in phenotype. The principle of detection is very simple. You genotype different markers. For example, I have a marker here with two alleles, P and J. You look at those animals or plants that have the genotype PP, those which are heterozygotes, and those which are homozygous JJ, and you ask: what's the distribution of phenotypes? Here you may have a typical case: you have some kind of Gaussian distribution for each of them, but there seems to be an actual relationship between the genotype at this marker and the phenotype. Okay. So you make a statistical test; this field is very heavy in statistics. Here you're going to find that there is a statistical association between the state of this marker locus and the phenotype. The marker itself is not causative. It just means that this marker, say marker one here, is probably located close to the actual causal locus. Is that clear? So you're using the fact that at meiosis, regions of the chromosome which are physically close to each other are going to segregate together, to see whether markers that are present naturally, and that you genotype on the chromosomes, are segregating together with your phenotype.
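The single-marker test described above can be sketched as follows. This is one possible test statistic, not necessarily the one on the slide: it compares a model with one phenotypic mean per marker genotype against a single overall mean and reports the log10 likelihood ratio under an assumed Gaussian noise model. The function names, the simulated effect size and the sample size are all hypothetical, and for simplicity the 'linked' marker is assumed to sit exactly at the causal locus.

```python
import math
import random
from collections import defaultdict

def marker_lod(genotypes, phenotypes):
    """log10 likelihood ratio: one mean per genotype class vs one overall mean.

    Assumes Gaussian residuals, so the ratio reduces to (n/2) * log10(RSS0/RSS1).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    groups = defaultdict(list)
    for g, y in zip(genotypes, phenotypes):
        groups[g].append(y)
    rss_marker = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        rss_marker += sum((y - group_mean) ** 2 for y in ys)
    return (n / 2.0) * math.log10(rss_null / rss_marker)

def random_f2_genotype(rng):
    """Genotype at a marker with alleles P and J: 'JJ', 'JP' or 'PP' (1:2:1)."""
    return "".join(sorted(rng.choice("PJ") for _ in range(2)))

rng = random.Random(1)
n = 300
linked = [random_f2_genotype(rng) for _ in range(n)]     # marker at the causal locus
unlinked = [random_f2_genotype(rng) for _ in range(n)]   # marker on another chromosome
# Assumed additive effect of +1 per J allele, plus environmental noise.
phenotypes = [10.0 + 1.0 * g.count("J") + rng.gauss(0.0, 1.0) for g in linked]

lod_linked = marker_lod(linked, phenotypes)
lod_unlinked = marker_lod(unlinked, phenotypes)
print(f"linked marker:   {lod_linked:.1f}")
print(f"unlinked marker: {lod_unlinked:.1f}")
```

In a real cross the marker and the causal locus are separated by some recombination, which weakens the association; scanning the statistic across all markers along the genetic map then shows a peak near the causal locus.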
In this field, what people use is what's called a LOD score, for logarithm of the odds, based on a likelihood model: the likelihood of a model with a QTL at this position next to the marker versus a null model without the QTL. You mean the distribution is tighter here? That's what you're saying? No, no. This may happen, and that's interesting, but this is just one example. It's not the general case. So, coming back to the sources of residual variance here, there are many. First of all, this marker is not your causative locus, so you still have recombination between the marker and the causative locus, and this may add noise. This is why interval mapping was designed, where you take into account the genotype at several markers to account for this. You also have the rest of the genetic background, and this is taken into account in composite interval mapping, where you do a multiple regression on several places in the genome. And finally, you have the environment, and the best you can do is to control your environment and your design as much as possible. The problem is whether you actually detect the QTL: if you have too much environmental noise, you will not. There are two main ways to proceed experimentally. It's all based on linkage, but in the jargon they are called linkage mapping and association studies. In linkage mapping, you're doing the crosses yourself in the lab, and I will show you examples. So you're using recombination that you produced yourself in the lab. This you cannot do with humans, of course, except that with humans you can use pedigrees of families, which are more or less equivalent, although you don't have as many progeny in humans as you can have with model organisms in the lab. The second method is to make use of recombination occurring in the wild.
That's what human geneticists have done a lot in the last 15 years or so. In the lab, we can produce a few rounds of recombination. But in natural populations, for example of humans, you have had historically a lot more recombination than you can have in two or three generations in the lab. The two chromosomes of each of us are really a mosaic of different parts of chromosomes that each have a history. Because of this, the resolution of recombination on the genetic map is much finer, so you get a much sharper signal. The problem, of course, is the environment. You can control the environment if you're taking C. elegans from the wild and bringing them to the lab. With humans, we cannot control the environment. So association studies have a lot of caveats. And the other problem with association studies in humans is that they are very difficult to actually confirm in any way. It's just based on statistical results, and you cannot check it. If you can cross organisms in the lab, you can check: you can do transgenesis, you can replace the locus, you can actually prove that your locus is causal. No, of course, there is mutation. In the lab crosses, we neglect mutation. Okay. Here, the mutation, I'll come back to it. So yes, the problem is that you get new mutations, and so the linkage decreases over time with this new mutation. There is a whole history of mutation and recombination. So first, lab crosses, which are often called linkage mapping or traditional QTL mapping. There are many designs, and I will show you a few. The easiest one is just an F2 cross, the same cross that Mendel was doing. You take two parents, A and B, and you cross them. The actual recombination occurs here in the F1, and in the F2 you get a mosaic of the parental chromosomes. Okay.
The problem here, as I said, is that we have few recombination events, so you have little resolution on the genetic map. You may have a lot of F2s, though, so you may have a lot of power to detect something. Okay. A design that's often used is a backcross design. This is especially true in organisms where having homozygous pieces of chromosomes is deleterious, so where you have inbreeding depression. It can be difficult to cross F1s and get an even distribution of all possible F2s. So very often the design is then a backcross, where you backcross the F1 generation to one of the parents. Here you get in the end about three quarters blue and one quarter red alleles, but you can still do a similar statistical association between genotype and phenotype. And finally, especially in organisms like Arabidopsis or C. elegans that can self easily: in the F2, the problem is that you have a lot of combinations, with a lot of heterozygotes and mosaics at all the loci. So to simplify, what we do is just self them for 10 generations and obtain in the end what are called recombinant inbred lines: recombinant because they have recombined, inbred because they have been inbred through generations of selfing. In these lines you have a mosaic, but it's a homozygous mosaic of the two parental genotypes. Recently things have changed a lot, because it has become much easier to sequence the whole genome. Instead of genotyping marker by marker along the chromosomes, you can directly sequence the line or the F2 animal. Because it's usually still expensive to sequence 1,000 animals, what you can do is pooled designs, where for example you take the extremes of the distribution and you sequence them as a pool. What you're going to get in the end, given the blue parent versus the red parent, is the frequency of the blue alleles along the chromosomes.
If you are on a chromosome that's completely unlinked to the causative mutation, you're going to have about 50% of each of the alleles. Now if you have a chromosome where you have a causal locus, you may have close to 50% here, and then it decreases here, for example like this; this is the percentage of blue. For example, if your pool is the pool with a low value of the phenotype, it's probable that the red allele here is causing a low phenotype. So this is the way to map the region. And of course, if you get a larger pool, you're going to get more recombinants, and so you may actually get a finer interval here. And the best way is what's called extreme mapping, which is basically that, instead of you phenotyping all the individuals, you let nature, or lab experiments, phenotype them for you. So for example, you can use a selective medium or a pathogen; let's say you're interested in resistance to a pathogen. What you do is separate your F2 population, for example, into one half where you add the pathogen and one half where you don't. And after several generations, you can sequence the population with the pathogen and the one without, and see again differences in allele frequencies. So now the association studies, and here we'll go back to this problem of mutation and linkage. We have this concept called linkage disequilibrium, which is called LD in short. It's also easier to say. If you consider two loci with two alleles each, A and B here, linkage equilibrium is when the frequency of each combination is just the product of the frequencies of its alleles. Linkage disequilibrium is when it is not. And the metric is this, D, which is basically equivalent to this. So it's the frequency of large A large B minus the product of the frequencies of each. So for example, in this representation, I usually have large A with large B and small A with small B.
So here I have a very strong linkage disequilibrium. And D here is zero at linkage equilibrium. Okay. There are other metrics which are scaled versions of the same. So this one, D', goes between minus one and plus one; it is D scaled to the maximum you can have given the frequencies of your four alleles. And then this one, r², is basically the squared correlation between alleles. Now the interesting part is indeed when you get a new mutation: what happens to its linkage with the background? So let's say you start with a piece of DNA. You have a population here where you have this one, and then you have another one; you have a whole population of molecules with different genotypes which have been evolving for a long time. And then comes a mutation, for example here. You have a T here, and suddenly it mutates to a G, for example. And so now this chromosome here is going to be linked to this new mutation. And what you're measuring in this graph is the time it takes for this G to be unlinked, or to be less linked, to another marker. So it's the time for linkage disequilibrium to disappear. Because over time you're going to have recombinations with the other chromosomes. So this G, for example, is very soon going to recombine with this A. And so linkage is going to get smaller over time in the population here. But it will be tighter with this A, for example. Sorry, it's my hand-waving mathematical way of doing this. And so what you're seeing here is this coefficient of LD. This is a simulation, I guess, or an analytical result, I don't know, for different recombination rates. So if you have no recombination, it's going to stay the same. If you have very strong recombination, it's going to decrease very fast over generations. And so this degree of linkage disequilibrium is basically a biological property of each species.
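To make these LD measures and their decay a bit more concrete, here is a small sketch in Python. It is an illustration under stated assumptions, not the calculation behind the slide: the function names are invented, the decay formula D_t = D_0 (1 - c)^t is the textbook expectation for a large, randomly mating population with recombination fraction c between the two loci, and a mostly selfing species like C. elegans would lose LD much more slowly than this.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # zero at linkage equilibrium

    # Largest |D| possible given the allele frequencies; depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # between -1 and +1

    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def ld_after(d0: float, c: float, generations: int) -> float:
    """Expected D after some generations (assumes random mating, no drift)."""
    return d0 * (1 - c) ** generations


if __name__ == "__main__":
    # Large A always with large B, small a always with small b: complete LD.
    print(ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5))
    # No recombination: D stays the same. Free recombination: D halves each generation.
    print(ld_after(0.25, 0.0, 100), ld_after(0.25, 0.5, 2))
```

With c = 0 the curve is flat, and the larger c is, the faster D goes to zero, which is the family of curves described above.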
And so depending on the species you're talking about, you have very different degrees of average linkage disequilibrium in the population. So if you take Drosophila, for example, the distance over which LD decays by half is about 100 base pairs on the genome on average. In humans it's a few kb. So this is kind of ideal for mapping at the level of a gene in human medical genetics. In C. elegans, because there is little recombination, it's on the order of a megabase. So this is really something, and this is not true, for example, for normal male-female species close to C. elegans, where it's very much like Drosophila: you have a lot of mating and recombination. So in natural populations, depending on the size of the populations, on all sorts of structures and on many other parameters, LD is going to have a short or a long range. So the use of association mapping in natural populations is very different in different species. In Drosophila it's difficult to do. There are advantages and disadvantages on both sides. I'll show you an example in C. elegans where we have a five-megabase peak, so this is totally unlike association in humans, where you get very sharp peaks, for example. And the advantage in Drosophila is that then you can do association within a gene. But at least until now, until whole-genome sequencing, it was not used for much, because you would have to sequence the whole genome to find the region. So here is a plot representing in color the degree of linkage disequilibrium on the genome of a cow, actually, interestingly. So here you have the 29 chromosomes of the cow, and as you can expect, the diagonal is pretty red, so linkage is high on the diagonal. But you may also have places in the genome where you actually have linkage in the population between them, either because there is a structure in the population, so in cows you have this race of cow and that race of cow, or it can be geographic in other species.
And so it is very important to have a picture of this in the given population where you're doing the association. So just to come back to comparing the two methods, the QTL mapping through linkage, through crosses in the lab, and association. You're also not looking at the same thing biologically, because if you choose two parents you're only going to address the variation between these two parents, which may or may not be what you're interested in, whereas here you're looking at the whole population. Here the resolution is low because you have few recombinations, whereas here it's much higher, at least in some species, because there has been a lot of recombination in natural populations through many generations. The power, which is the ability to detect something of small effect, is much better in the lab, where you can control the environment and where you can have a lot of progeny. And the other problem is that here you have this structure of populations, for example races of cow or structure of human populations, or in humans obviously environmental differences between people. So you have a lot of confounding factors here that you don't have here. A second point, before I give an example, is this question of power to detect loci which have only a small effect. What is drawn here is from a system that has been studied a lot in Drosophila, which is the number of bristles they have on the body; these are two parts of the fly body, it doesn't matter which. And what you're seeing here is the number of QTLs, so the number of regions of the genome (they didn't identify the nucleotide, just the region) which have been found to have a certain strength of effect. So zero is here in the middle, and you have negative effects or positive effects; it could add up on one side, it doesn't matter.
It looks more or less like a Gaussian distribution, with few loci of large effect and lots of loci of relatively small effect, but there is a hole in the middle, and this hole is due to the fact that you cannot detect things of tiny effect. So if you think it's really a Gaussian distribution, you should imagine that you're losing a lot of genetic variation in the genome that has very small effects, that you have trouble detecting, yet may be very important in the wild. So this is a conundrum for evolutionary geneticists: basically, maybe what we can study in the lab is just the gross part of the variation, and whether this is the most relevant in nature is not clear. If evolution is really mostly through gradual change, we are completely losing this part. If evolution is through de novo mutations with large effects, we're going to detect them more easily. So just by the method we're using, we're basically biasing the result. Okay, now I will give you two examples with C. elegans, which come from my lab. The first one is not on the vulva; for those who get allergic to the vulva, that's good news. The first one is an association study that we did to look at the sensitivity of C. elegans to a virus that we had found. So I just come back to the mode of reproduction of C. elegans, because I missed this in the previous talks. What we've been talking about so far are the hermaphrodites, which have a vulva. The hermaphrodites' body looks like that of a female of the closely related species, and they make sperm first and then oocytes throughout adulthood, and so the oocytes are fertilized internally and then they exit through the vulva.
You also have males, which are X0, so with a single X, that arise either by non-disjunction of the X chromosomes at meiosis or as cross progeny: when you have a male and you cross it to a hermaphrodite, half of the cross progeny is X0, as you would expect. And the male is very different body-wise and makes sperm, okay. So practically, C. elegans reproduces most often by selfing, so an XX hermaphrodite gives rise mostly to XX hermaphrodites and sometimes to a male, but when you have a cross, the cross progeny is half and half. So in this way we can have stocks which are always homozygous, but we can also do crosses when we want. And if you look at natural populations, we've been sampling C. elegans everywhere, and what you're seeing here are the six chromosomes of C. elegans, and on this axis are a hundred wild isolates of C. elegans, and the color represents the genotype. So look for example at this chromosome: for 90% of these isolates, in the center of chromosome five where there is little recombination, you have basically the same genotype throughout the world, okay. So this is very special; I mean, C. elegans is extreme in having very little outcrossing, and this is supposed to come from the fact that if you had a positively selected mutation on this chromosome that spread throughout the world, it took with it basically half of the chromosome, okay. So here linkage disequilibrium on this chromosome is enormous; it's less so on other chromosomes, and here you can see clearly that the arms usually have a lot more recombination. So we isolated many pathogens of C. elegans from the wild, and the one which is of interest here is a virus that we isolated from a C.
elegans strain next to Paris, and so it's called the Orsay virus. It's an RNA virus, so this means that it's the RNA molecules of the virus that replicate, and so when we assay for the presence of the virus, we assay the RNA level. The experiment we did here was to take the 100 isolates that had been genotyped, as I showed on the colorful slide before, so these are the 100 isolates, and for each of them what we did was to infect them in the lab with the virus and to look after a couple of generations at the level of the virus in the line. This is the phenotype that's represented on the y-axis, on a log scale, and here you have a pretty continuous distribution. From that we did an association mapping: we used the genotypes that our colleague Eric Anderson had found here, and for many markers along the chromosomes we asked, in these hundred isolates, is there a statistical correlation between the level of this phenotype and the genotype? And the answer was yes. So here you're seeing the six chromosomes, and here you have the log of the p-value of the association, with a threshold that's calculated by permutations, and what you see is that in the middle of chromosome four we have a very wide peak, which spans five megabases here. I'm not going to give any details on the molecular things, but from this what we did was to continue by taking two parents from the distribution of the hundred isolates, isolate this region by crossing one parent many times into the other, reduce the interval by recombination, and we identified the causal locus here. So here it's a case where we can start from an association, and because it's a model organism in the lab we can actually confirm: we confirmed the molecular identification by rescuing the one which was defective. Now I'm going to give you a second example, which is on the vulva, and that's now using mapping in the lab exclusively, not association. So if you remember we have this, and again we're going to neglect
P3.p for today; it's just looking at the central cells here. I told you we have the same fate pattern in the whole family, and so also in all wild isolates of C. elegans, and the problem then is to detect variation between these wild isolates. So you are on this curve that I showed you yesterday, where you have the level of the signal that's inducing the whole system and the number of induced cells, which is three normally. If you vary the signal here, we found a plateau here, so we think the wild type is here. To see differences between the different wild types, we need to perturb the system. One way, which I showed you between different species, was to remove the anchor cell and reveal different patterns. Here we are within the same species; we can do this, and it reveals a little bit but not a lot. Mostly what we did was to introduce the same mutation in different wild backgrounds of C. elegans. So you basically displace the system, taking a mutant that displaces it here, up or down, and you're asking: with the same mutation, does the background genome of the wild isolates have a different effect on the expression of the mutation? To do this we took many mutations. I mean, this was done a few years ago; now we could use CRISPR, for example, to introduce these mutations in different backgrounds, but at the time what we did was to take mutants that were obtained in the reference background of C. elegans, called N2, and just introduce them by backcrossing them many times into these backgrounds, and then score the phenotype. So here we express the phenotype as the number of induced cells, which is three in the wild type, and what you see here is that, for example, this mutation in the receptor of the signal has a strong effect in the background in which it was found, but it is suppressed in these backgrounds, for example AB1, okay. So in both cases it's a partially penetrant phenotype, but the level of penetrance, or rather of
expressivity here, which is the proportion of induced cells, depends a lot on the genetic background. Conversely, this mutation, which has little effect in this background, has a much stronger effect in other backgrounds, and interestingly this pathway was not found in the reference background in the screens that were done in C. elegans. So to come back to a question I had the other day, one can actually ask: because this pathway is used many times during development and behavior of the animal, is this mutation actually suppressed in this background for all tissues? We looked at this, and this is a very complicated slide, don't look at it, but basically the answer is no: there is also a tissue specificity of the regulation in the different backgrounds. So here you have two different phenotypes, and the main message is that the lines are not parallel, so the penetrance of the mutation in the different backgrounds, which are the different colors, differs a lot. For example, here this phenotype is much more penetrant in the blue line than this one, whereas here they are equivalent in the green line or the orange line. So in the cartoon way, basically, we have here these two isolates, the reference N2 and this other isolate AB1. They have different genotypes, and this results in a different level of activity of the Ras pathway, and we could actually measure this with markers. So this is the pathway of induction, the blue pathway in my slide. But because there is buffering in the system, it actually doesn't change the phenotype. Now we could reveal this using a mutation, and so now we have these two parents which have a different induction index for the vulva, and we can ask: what's the molecular basis for this difference, what's the genetic basis for this difference? So what we did was to take these two lines, so these are the parents with the mutation in them, so one parent is in orange and the
other one in blue, plus they have this piece on chromosome two carrying the mutation. Then we did an F2 cross, and then, as I said, we did this selfing to get only homozygotes, these recombinant inbred lines. From this we genotyped along each chromosome and we phenotyped the lines, and therefore we could obtain this picture. Let's look just at the middle row, at 25 degrees. What you're looking at here is the LOD score, so the logarithm of the odds, sorry, in the likelihood model that there is a QTL. The markers that we took are these black triangles, and we have two replicates. Basically the strongest QTL we get is on chromosome one, here. And here it's a case where it's a temperature-sensitive QTL, so we can also map the plasticity. Plasticity is the difference in phenotype between the two temperatures, and we have a large peak here. Once you have a peak like this, it's still a large part of the chromosome, so we continued. Basically we confirmed that it existed by producing a strain like this, which was orange, so the N2 background, in all chromosomes, but with a blue background on this piece of chromosome one. And you can confirm like this, taking different pieces of chromosome one, that it actually has an effect on vulva induction. So in the orange background here the induction level is around 0.5, here you have two replicates, and the other parent is very high. We don't explain 100 percent of the difference between the parents, but we explain a large part with just this little piece of the chromosome. Now, because the piece was still substantial, we continued by crossing a line like one of those and trying to get more recombinations here. This is the typical experiment of recombination in the lab, where you're taking two different parents, so here it's the all-orange one and the orange one with this little piece, and you screen for recombinants
that would have one marker here with a blue genotype and one with the orange genotype. So you're screening for animals which have undergone a recombination between your two markers, and with this you get a lot of possible combinations, and you can score their genotype along the chromosome and their phenotype. For example, if you take that one, it's only blue up to here and the phenotype is that of N2, so it must be that the causative nucleotide is not in this piece of the DNA. This one, for example, has a high value of the phenotype, so it must be somewhere in between here, and so on. If you continue like this you can obtain a small interval, and in this interval we had few candidates, and so then the approach is to guess which one it is. In this case we had a single mutation that caused an amino acid change, and we actually confirmed by several other means that this was the causative mutation. Again, today what we could do is just use CRISPR to replace this nucleotide by the other one. So here we found a causative mutation in a gene, it doesn't really matter what it is, that explains the cryptic variation between these wild isolates in the signaling network underlying C. elegans vulva patterning. So if we come back to the question we had the other day: is this evolution completely neutral in terms of evolutionary dynamics, or did it evolve under selection? And here we used a fact that was at the beginning bad news but ended up being actually usable, which is that this mutation turned out to have arisen in the history of laboratory cultivation of this strain, not in my lab but much before, before any lab started working on it. So what we did was to ask: is this mutation actually adaptive in the lab environment compared to the other allele? We looked in the vulva, and it doesn't change anything, because irrespective of the allele you have, you have a normal vulva, but do you
have other phenotypes that can change when you vary this allele? And yes, we found that basically it changes the number of sperm the animal produces and therefore the brood size of the animal. It reduces the brood size, but it also thereby reduces the time it takes to reach reproductive maturity. In C. elegans, because, as I said, we first have spermatogenesis and then oogenesis, if you make more sperm you have more progeny, but if you make fewer sperm you're actually becoming mature and laying the first egg earlier. So here you have a compromise between producing more progeny and reproducing with a shorter generation time, so you can expect that there is a trade-off between the two phenomena and that you reach some kind of steady state in a given environmental situation. The second thing this allele does is to increase the rate of egg production once it has started, so the rate of ovulation. So here we have the maximum egg-laying rate, which is increased by this allele, yeah. So finally, what we did was to directly compete the allele of N2, so the wild reference allele, which is actually not wild and which is the mutated allele, with its wild counterpart. What you're seeing here is the proportion of the two alleles along an experiment where we transferred 25 times, so that's about a month, and you clearly have a decrease of the original allele, showing that the allele of N2, the derived allele, is actually adaptive in the lab environment. So in conclusion, what we have here is a case where we have a genetic variation in this gene which has a cryptic effect on vulva formation. It means it doesn't have an effect on the final product, on the cell fate pattern of the vulva; it has an effect on the level of signaling activity of the pathways and therefore on the penetrance of a mutation that you introduce. But it's difficult to explain, so here it could have evolved in a totally neutral fashion, with a neutral drift
type of dynamics. But the fact that it also has a non-cryptic effect, an overt effect, on other phenotypes which are closely related to fitness, sperm production and ovulation rate, suggests, since we know the environment where this has been evolving (it's just one example, you know, so we cannot prove it), that we may have had positive selection in this new environment, and that this has been driving this evolution in the vulva. Of course this is not the ideal situation, so to actually address this type of question better, what we should do is an experimental evolution setup, which we can do with C. elegans, where we put C. elegans in a new environment and see whether adaptive mutations in this new environment may affect, for example, the vulva in a neutral fashion. When you have directional selection, the dynamics of fixation of the allele are much faster than if you have a neutral mutation, so you're more likely to fix an allele if it's under directional selection in another tissue. And I'll just finish by thanking the people: the virus work was done by Tony, and this final work on cryptic variation was done by Fabia