 Welcome back, also if you're watching this on YouTube and Moodle. Let me switch the PowerPoint to the other one, which I should be able to do like this. All right, so this is work in progress. I have written a paper which I will probably submit soon, and the title of the paper is "Linear mixed model multiple QTL time series mapping". It's a long title, but it's about how I use linear mixed models combined with multiple QTL mapping. We'll just go through the slides and I will more or less explain what I did, why I did it, and why I think it is a good idea to do it like this. First we will have a short introduction to linear mixed models. I will tell you about multiple QTL mapping, and a little bit about the Berlin Fat Mouse, because we've been talking about it a lot but I don't think that I actually showed you a picture of the mouse yet, so I will have to do that. I will tell you about model selection: how do you select which model is better than the other ones? That again goes into the AIC. Of course there are many different ways of comparing models, not just the AIC; you also have the BIC, and you can look at the log likelihood. And then I will show you how we dealt with litter size and litter number. Litter size: if you have a female mouse giving birth, then the total birth weight of a litter is more or less a constant number. So it doesn't really matter if you have five offspring or seven or nine offspring; it just means that if you have seven offspring, then every mouse being born is just a little bit smaller, but more or less there is a constant factor there. The same thing holds for litter number. The first time that a mouse gives birth, the litter tends to be a little bit smaller compared to the next one. So if it's the first time giving birth, you generally end up with a slightly smaller litter compared to the second or the third litter from the same mouse.
I will tell you a little bit about the results that we got, and a little bit about the conclusion and discussion. Let me actually move this up a little bit because the logo is out of range, and then I have to move this one as well a little bit. So I'm hoping that, yeah, now the logo fits. All right, so a short introduction to linear mixed models. It's an extension to linear models like we just discussed, allowing for a combination of fixed and random effects. Fixed effects are model parameters that are fixed, non-random quantities, and we've seen a lot of them. Something like sex is of course a fixed effect, and it generally makes a big difference. The thing that you have to remember is that if you are modeling something, then including something as a fixed effect is always correct: it gives an unbiased estimate for this parameter. A random effect is a model parameter which is considered a random variable, and this means that there is a hierarchy in this variable. It's a grouping variable; it groups things together. The advantage of using random effects is that it is efficient, and as such mixed effect models are good at dealing with things like repeated measurements, because you can group the measurements and say: these are all measurements of a single mouse, so they belong together and they should not add to my statistical power. There are a lot of things that you can say about linear mixed models, and there's a whole field of research behind when you are allowed to use a certain parameter as a fixed effect and when you are allowed to put it in as a random effect; you could do a whole PhD study about it. So as an example, and I just took this from Wikipedia: imagine that you have m large elementary schools from a single country.
We have n pupils that are chosen randomly at each school, and then we look at test scores, so Y. The test score is our response, because we want to know if some schools are better than other schools, and then Y_ij is the score of the jth pupil at the ith school. Of course, when we do a random effects example, we are going to say that the score we get is based on the overall mean, because there's just a global mean, then we have U, which is a school-specific random effect, and then we have an individual-specific random effect W: Y_ij = mu + U_i + W_ij. So mu is the average test score of the entire population. U_i is the school-specific random effect: the difference between the average score at school i and the average score in the entire country, so it's the school effect relative to the average. And then W_ij is the individual-specific random effect, which is the deviation of the jth pupil's score from the average of the ith school, because every individual scores slightly higher or slightly lower than the school average.
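To make the school example a bit more concrete, here is a minimal Python sketch. It assumes made-up values for the number of schools, the number of pupils and the two standard deviations (none of these numbers come from the Wikipedia example or from the lecture). It simulates scores as global mean plus school effect plus pupil effect, and then recovers the school effects as school means relative to the grand mean.

```python
import random
import statistics

random.seed(1)

M, N = 20, 30                      # hypothetical: 20 schools, 30 pupils per school
MU = 70.0                          # hypothetical global mean test score
SD_SCHOOL, SD_PUPIL = 5.0, 10.0    # hypothetical spread between schools / pupils

# Simulate Y_ij = mu + U_i + W_ij
scores = []
for i in range(M):
    u_i = random.gauss(0.0, SD_SCHOOL)                      # school-specific effect
    scores.append([MU + u_i + random.gauss(0.0, SD_PUPIL)   # pupil-specific effect
                   for _ in range(N)])

grand_mean = statistics.fmean(y for school in scores for y in school)
school_means = [statistics.fmean(school) for school in scores]

# Estimated school effects: school mean relative to the grand mean
u_hat = [m - grand_mean for m in school_means]
# Estimated pupil effects: pupil score relative to the pupil's own school mean
w_hat = [[y - m for y in school] for school, m in zip(scores, school_means)]

print(f"grand mean: {grand_mean:.2f}")
print(f"spread of estimated school effects: {statistics.stdev(u_hat):.2f}")
```

Every simulated score can be rebuilt exactly as grand mean plus its estimated school effect plus its estimated pupil effect, which is the decomposition described above. A real mixed model would additionally estimate the two variances and shrink the school effects toward zero; this sketch leaves that out.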
Fixed effects can capture differences in scores among different groups. For example, we can have things like the sex of the individual, are you male or female, which might influence your score; we can have something like race, if you are white or black or Chinese; and of course we can have things like the parents' education level, which also might have an effect on the score of an individual. These are all effects which are individual-specific, and that's why they get the ij, while the school only gets i: i is the school effect, and ij is an effect which is measured on a single individual at a single school. So we can extend our model, saying that the test score you get is based on the global mean, and then we have a whole bunch of parameters in which we are interested, because we want to know if guys or girls are scoring better at a certain test, or we can look at the effect of race, or at the effect of the education of the parents. So we have three betas here that we want to estimate. That's kind of how linear models work: you do measurements, and then you try to assign the variance of the different measurements to the different components in your model. When we talk about multiple QTL mapping that's slightly different, but normally multiple QTL mapping is done just using standard linear models. Did I actually explain to you guys what QTL mapping is? I probably didn't, right? Or did I do that? Yeah, I probably didn't, because I think I did this presentation for a bunch of people that knew what QTL mapping is. I think we did it in the bioinformatics course, but we didn't do it here. So let me very quickly get my drawing board back, and let's get it here as well, and I want to have a pen, and I'm just going to very, very quickly tell you guys
what QTL mapping is. QTL mapping is a method to find trait associations with the genome. So what do we have? Well, we generally have a population, for example a population which is created by taking, and I have to go here, our Berlin Fat Mouse, so BFMI, and then we cross our BFMI with a B6. Now all the individuals coming out will be F1 individuals. If you look at the genome, for example chromosome one, then one of the chromosomes will be from the BFMI, in white, and the other one will come from the B6, because you get two of every chromosome, one from your father and one from your mother. So chromosome one will look like this. The individuals coming out are called heterozygotes, meaning that they have one chromosome which comes from BFMI and one chromosome which comes from B6; let's use blue for B6 all the time. So now what are we going to do? We're going to make a new cross in which we take two of these individuals: we take a heterozygous BFMI x B6 and we cross it with another heterozygote, so I have to draw it like this. We take two of these mice and we cross them together. When we cross them we get a new mouse, and this mouse has a genome which is interesting, because in biology you have recombination. What happens when you cross a mouse with another mouse is that the chromosomes will have random crossovers at certain points. Of course this also happens in the first cross, but if a BFMI allele crosses over with a BFMI allele you just get a BFMI allele, and the same thing happens for B6. But now, since every individual has two different versions of chromosome one, we get mice which look kind of like this. Oh, that's bad, I don't want to put this thing here. So now when I look at chromosome one of a certain mouse, then chromosome one of this mouse will look more or less like
this: some parts will be white and white, because it inherited this allele from the father and this allele from the mother. Some parts of the genome will look like this, so I get a part from the BFMI original founder and I get a little part of the, where's my blue color, there, a little part of the B6. Of course it can be that in the next part of the genome I am unlucky, or lucky, and I get two times an allele from the B6. And of course it could be the other way around as well, where I get the opposite coloring, where I for example get a part of the genome from the BFMI on this side and a part from the B6 on the other side. So what we are doing is mixing these two genomes. If we now look at a single genetic marker... How does dominance come in? We will get to that; that's when you have a whole population and you look at a phenotype. This is just about the genetics, so just about which part of the genome is inherited from which founder. So now we can have a certain genetic marker, for example here, and at this marker we can see that this individual, let me switch to white again, has the BFMI allele from the father and the BFMI allele from the mother. If we look at a different marker, a marker that is located here, we see that it has a BFMI allele from the father and a B6 allele from the mother, and we call this a heterozygote. Generally we just shorten the first one down to BFMI: this marker is a BFMI marker because both alleles come from BFMI, and here we have a marker for which one of the chromosomes comes from BFMI and the other chromosome comes from B6. We can also look at this marker here, and here we see that it's the opposite, because we have a B6 allele and then an allele coming from BFMI, and that means that we are also heterozygous. So in total, these mice can
have three different states at a marker: they can be completely BFMI, they can be heterozygous, having a BFMI and a B6 allele, or they can be pure B6, which happens here in the middle, in the blue part, where both alleles come from B6. So every genetic marker in each individual has three different states: you can be BFMI, you can be heterozygous, which is a mix of both parents, or you can be B6. Now we don't do this just for one individual, but for 500 individuals at the same time. So we generate 500 offspring from these two mice, and every individual has these breakpoints at different points in the genome. Let me get a new slide. When we look at a population, and I'm now just going to draw one side of the allele, oh, I'm still learning how to use my pen thing, so I will be better next week. So we have for example individuals one, two, three and four, and we have for example marker one, marker two and marker three. The first individual at the first marker can be completely BFMI, this individual happened to be heterozygous here, the third individual also happened to be heterozygous, and the fourth individual is B6. At marker two we can have this one being BFMI, this one is now also BFMI, this one is still a heterozygote, and this one is also BFMI. When we do QTL mapping we measure a phenotype on all of these individuals, so with 500 individuals we also have 500 measurements, for example the weight. Then what we do is, at a single marker, we group these into three groups, so we get a graph which looks like this: we have BFMI individuals, we have heterozygous individuals and we have B6 individuals. If we look across the population we can see a structure which looks like this, where we have measurements of the one, we have measurements of the other, and we have measurements of
the B6 group, and now we see that there's an association. We can just do linear regression, and we see that depending on whether you get the BFMI allele or the B6 allele, you're either a very fat mouse or a very lean mouse. This is an additive effect, because every B6 allele decreases your phenotype by a certain amount, for example minus five grams, and here again minus five grams. We can also have a different structure, and a different structure would be for example a dominant structure, and that would mean that the picture would look something like this. This is going to be really shaky, but: BFMIs, and now we have the heterozygotes and we have the B6s, like this. Now we have a dominant structure, because we see that there's an effect of which allele you have, but as soon as you have one B6 allele, and the heterozygote has a BFMI and a B6 allele, you are already lean. The B6s are also lean, but the individuals inheriting two BFMI alleles become fat. So here we see an additive effect and here we see a dominance effect. Of course recessive is the inverse of dominant, so here we can say that the BFMI allele is the recessive allele, which means that the B6 allele is the dominant allele. And I'm only writing one of them: BFMI means BFMI from father and BFMI from mother, heterozygous means BFMI from father or mother and B6 from the other one, and B6 means two B6 alleles. Is that clear? QTL mapping is nothing more than this, and you don't do this for a single marker, but you do this for all markers in the genome. You generally use like five or six hundred markers which are spread across the genome, and at every marker you do this test, so you do this linear regression to see if there is an association between the
marker that you are looking at and the phenotype that you have measured. That is called QTL mapping; it's a very, very short introduction. So let's switch back to our non-drawing board. What multiple QTL mapping does is this: when you are looking at a certain marker, for example marker 15, you include another marker, from which you know that it has an effect, into the model. For example, we know that marker 60 has a big effect on the body weight of our mouse, so when we are looking at marker 15, we include marker 60 in the model. By using multiple QTL mapping we gain more power to detect genetic effects, because we're not only looking at the individual marker, but we're also compensating for the effect of a marker which might be on a different chromosome. There are a lot of reasons why you want to use multiple QTL mapping. You account for known genetic effects. You have more power to detect other effects, because you're already catching some of the variance and assigning it to chromosome six, so when you're mapping chromosome one you don't have to deal with the variance that you already assigned to chromosome six. You can disentangle QTLs in close proximity, which is not that interesting here, and you can deal with QTLs with opposite directions of effect. Multiple QTL mapping is again a model selection approach: you're looking at a model where you say that my phenotype is determined by the genotype at a certain marker, while we correct for other markers in the model. QTLs are detected using the best model. So you first do a selection step, saying: I scan across the whole genome and find which markers have an influence, I build a model based on all of these markers that have an influence, and then I add the first marker to the model and see if it has an effect, then the second, then the third, so
you're kind of continuously scanning across the genome, trying to find regions of interest. So how does this look? Here we see a multiple QTL mapping, and we have three different models that we are using. We have two markers which we know are significantly influencing our phenotype. When I'm mapping in this region, so when I'm mapping for example this genetic marker, I have the model where I say: I put this marker into the model, plus this marker, plus the marker that I'm currently mapping. When I'm in region two, I'm only including marker number three, because I'm very close to the other marker, so I'm not going to include that one. So you have three different models here: model one includes the top marker at two and the top marker at three, here we are including only the top marker at three, and here we are including only the top marker at two. It's a complex situation, but multiple QTL mapping is just a way to get more genetic variance explained, by looking at the current marker that we want to associate with the trait while first compensating for the other markers that affect our trait. I really hope that the next slide is easy. Ha! Good!
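The single-marker scan and the idea of compensating for a known marker can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genotypes and phenotypes are simulated, the marker indices and effect sizes are made up, the helper names `solve`, `rss` and `scan` are hypothetical, and the score is simply the drop in residual sum of squares from ordinary least squares, not the test statistic of any particular QTL package.

```python
import random

random.seed(42)

N_IND, N_MARKERS = 200, 20
CAUSAL_BIG, CAUSAL_SMALL = 12, 3     # hypothetical positions of two QTLs

# Genotypes coded as number of B6 alleles: 0 = BFMI, 1 = heterozygous, 2 = B6
geno = [[random.choice([0, 1, 1, 2]) for _ in range(N_MARKERS)] for _ in range(N_IND)]
# Additive effects: every B6 allele lowers the weight (made-up effect sizes)
pheno = [40.0 - 5.0 * g[CAUSAL_BIG] - 2.0 * g[CAUSAL_SMALL] + random.gauss(0.0, 2.0)
         for g in geno]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus the columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bk * xk for bk, xk in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def marker(j):
    return [g[j] for g in geno]


def scan(cofactors=()):
    """For each marker: drop in RSS when it is added to the cofactor-only model."""
    base_cols = [marker(c) for c in cofactors]
    base = rss(base_cols, pheno)
    return [0.0 if j in cofactors else base - rss(base_cols + [marker(j)], pheno)
            for j in range(N_MARKERS)]


single = scan()                             # plain single-marker scan
multiple = scan(cofactors=(CAUSAL_BIG,))    # scan while compensating for the big QTL

print("top marker, single scan:", max(range(N_MARKERS), key=single.__getitem__))
print("top marker, with big QTL in the model:", max(range(N_MARKERS), key=multiple.__getitem__))
```

With the big marker already in the model, the variance it explains no longer sits in the residuals, so the weaker second marker stands out against the remaining markers. That is the gain in power described above, here in a deliberately simplified additive-only setting.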
So let's do this slide in German. Here you can see the very famous Berlin Fat Mouse Inbred line. It's a model organism for polygenic obesity, which means that this mouse has not one genetic location that determines the fat mass of the mouse, but there are several different places in the genome where the fatness of the mouse is influenced. The mouse shows a five-fold increase in fat percentage: when you look at the fat percentage of a B6 mouse, for example, it's two percent, but in a Berlin Fat Mouse it's ten percent, so five times higher. These mice were created through long-term selection for high fat mass. But the mouse is interesting because it also shows different symptoms of the metabolic syndrome. The metabolic syndrome is not only one phenotype, not only obesity, but it also means insulin resistance and other phenotypes associated with type 2 diabetes. What you see here is the Berlin Fat Mouse, the big mouse, and the small mouse is the standard B6 mouse, the reference mouse that everyone around the world uses. Here you see a pie chart, and in the pie chart you see how much variance is explained by the different factors. You see that the subfamily, for example, explains a very large part of the variance, and you also see that the genetic marker on chromosome 3 explains a very big part of the variance. Season, for example, so the time of year, does not have a very big influence on the obesity of the Berlin Fat Mouse. But when you add everything up, there is about 30 percent of the variance in obesity which is not explained by known factors, and of course what we want is to now explain part of this 30 percent. I think that went pretty well. So: it's a model for polygenic obesity, meaning that there's not just one genetic locus but multiple genetic loci, there's a five-fold increase in fat percentage, it has been long-term selected for high fat mass
and it has several features of the metabolic syndrome, so it's not just fat: it also has an increase in liver triglycerides, and it is also not as sensitive to insulin, so it's a very good model for people who have type 2 diabetes. There is actually a lot of variance which we can already assign, but if we assign all of the variance that we know by now, we see that there's also a big part of unexplained variance in the model: 30 percent of the fatness of this mouse cannot be explained by general modeling. So that is what this whole presentation is about: finding out if we can explain slightly more of this unexplained variance. So what do we have? Our materials and methods are 344 individuals who are in generation 28. I just showed you an example where we have an F1 and an F2; these individuals are F28, so there have been 28 generations of mixing the genomes of these mice, meaning that we have very, very small regions in the genome, because a lot of recombinations have occurred in these 28 generations. In total we have around 18,000 genetic markers, so the whole genome of this mouse, which has 20 chromosomes, is more or less tagged by around 18,000 markers. So the genetic matrix, the very small matrix that I just drew, is in this case 344 individuals measured at 18,000 positions across the genome, and for each individual we know where each allele came from: from the Berlin Fat Mouse, from the B6 mouse, or the mouse has both alleles at this position, so the BFMI and the B6 one. We have time series data on body weight: every week we measure body weight, starting from three weeks on. So at day 21 we measure the mouse for the first time, then we measure up to 70 days, and every week we have a body weight measurement for this mouse. All right, so now we have to do model selection. Model
selection during this whole analysis we did using the Akaike information criterion, the AIC. We talked about this last week, and today we also talked about it. Model selection is the task of selecting a statistical model from a set of candidate models, and the AIC can help us to assess which model is the better model relative to the other one. Here we have to remember that the lower the AIC, the better our model is, based on the observed data. In the first case we have to do a selection, because I told you guys that we have litter size, which is the number of individuals in a single litter of mice, and then we have the litter number, so this is the nth litter of a female, because a single female can of course have more than one litter; it can have like three litters. But the first litter generally tends to be the smallest one, and the later litters tend to be very similar to each other. We can encode this litter number in two different ways. We can say litter A, which is the first one, litter B, the second one, et cetera, but then we use five different levels, because the most litters that an animal in our study had was five, so then we have to correct for five different levels. However, since we know that there is an effect of the first litter and that the rest of the litters are very similar, we can also code it as F, the first litter, versus N, not the first litter. So instead of using five different levels and estimating four different effects relative to the reference, so the first litter, we're now just going to estimate a single beta, and this beta is going to tell us what the difference is between the first litter and the following litters. Besides that, we then define something which is called litter type, which is a combination of the litter size and the litter number. So we can have for example LT5, which is the litter type using the five
types, and we can have LT2. With LT5 we can say that A8 means this is the first litter and it had eight offspring, B10 means it's the second litter and it had 10 offspring, and C12 means this is the third litter and it had 12 offspring. We can also code it differently, with LT2: F8 is the first litter with eight offspring, N10 is not the first litter with 10 offspring, and then we can have F10, first litter with 10 offspring, and N12, not the first litter with 12 offspring. So these are different ways of coding it, and then we have to do model selection. The first thing that I'm going to do is define my null model. I'm going to include a random effect for each individual, because we have measured each individual multiple times: we have many different time points and every individual is measured at every time point. Here we see P, our response variable, the body weight, and then we have F, which is the ID of the father. Of course now we want to see which of our codings is the best. So we have all of the different models, where we say that P equals father, P equals father plus litter type coded with two levels, father plus litter type coded with five levels, and so on; these are different ways of writing down the same information. If we do this, we can compare all of these models against each other: we compute the AIC of each of the models and compare the first model against all the other models. Of course we don't care about M0 itself, but we then get an AIC drop, and the model which shows the biggest AIC drop is the way that we want to encode litter type in our model. In the end we learned that the best model is M2_LT2. That means that we write the model as the phenotype, so the weight of the mouse, being determined by the father, so who is your father, plus litter type 2, and litter type 2 is this one, F for first versus N for not first: LT2.
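The two ways of coding litter number, and the combined litter type, can be written down as a small Python helper. The function names and the `levels` argument are hypothetical and only serve as illustration; the codes themselves (A8, B10, C12 and F8, N10, N12) follow the examples given here.

```python
import string


def litter_number_code(litter_number, levels=5):
    """Code the nth litter of a female with five levels (A..E) or two (F/N)."""
    if levels == 5:
        return string.ascii_uppercase[litter_number - 1]   # 1 -> A, 2 -> B, ...
    if levels == 2:
        return "F" if litter_number == 1 else "N"          # first vs not first
    raise ValueError("levels must be 2 or 5")


def litter_type(litter_number, litter_size, levels=5):
    """Combine litter number code and litter size into one grouping label."""
    return f"{litter_number_code(litter_number, levels)}{litter_size}"


examples = [(1, 8), (2, 10), (3, 12)]    # (litter number, litter size)
lt5 = [litter_type(n, s, levels=5) for n, s in examples]
lt2 = [litter_type(n, s, levels=2) for n, s in examples]
print(lt5)   # ['A8', 'B10', 'C12']
print(lt2)   # ['F8', 'N10', 'N12']
```

With the two-level coding, the second and third litters fall into the same N group whenever the litter size matches, so fewer distinct levels have to be estimated than with the five-level coding.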
So writing it down like this gives us the best statistical power and the most accurate model for determining what the influence of the litter size and the litter number is, and we combine them in a single variable so that we don't have to fit two different variables with different levels. That's just the way that we did the model selection here: this is our best way of writing it down, and the second best way is M2L2, which means that you take this structure where the litter information is in there twice, so one variable for the litter number and one variable for the litter size, keeping them separate. The next thing that we want to do is model our growth curves. We do stepwise model selection, and an AIC drop of more than 10 is considered a model improvement. We start off with the model that we just found: the weight of our mouse is determined by who is your father plus LT2, and we include a random effect for individuals. This is the first model that we are going to use. Then the next model is that we add season to the model, and like we saw before, season doesn't have a big effect. We can also see that here, because when we include season in the model, the AIC doesn't drop; the AIC goes up, meaning that season should not be a fixed effect in our model. We then do the next model, where we include time, which is logical because we have a time series. When we include time in our model as a fixed effect, we see that the AIC drops by around 4,700 points, so of course time should be included as a fixed effect. The next thing that we're going to test is whether time should also be included as a random slope. I told you that every individual has
its own intercept, meaning that the birth weight of each individual is unique, but of course every individual can also have its own growth curve. You can imagine that males might grow quicker than females across time, as a linear component. When we include time as a random slope in our model, we see indeed a massive drop in AIC, meaning that time should indeed be included as a random slope effect. Then, because growth curves tend not to be an exact straight line, we also might want to include time as a squared effect. So the next model that we're going to try says that your weight is determined by your father, plus your litter type, plus time, plus time to the power of two, and we also allow time to be a random effect, besides the fact that we have repeated measurements for every individual, so that every individual has its own intercept. Time to the power of two should indeed be included as a fixed effect. We do the same thing for time to the power of three, and that should be a fixed effect as well. Then when we include time to the power of four, we see that the model doesn't improve anymore, so that means that with time plus time to the power of two plus time to the power of three we have our best explaining model. Of course the next step is to include the main locus, because we have this one region in the genome which explains a massive amount of the variance, so we also have to put that into the model. We put the marker into the model, and we also include the marker by time effect, so an interaction between the marker and time. At the start, this marker has a certain effect on your weight: if you get this marker from the BFMI you tend to be fatter than the other mice. But the effect of this marker is also allowed to change along the time curve, and it might be that when you start off bigger, you actually grow slightly slower. So this is our top
marker, and it should also be included, and it should be included with time as an interaction as well. Now we want to visualize these things. The first model that we're looking at is just the basic model, y equals mean. Here we see all of our data points, so all of our 300-something mice at each time point, and what we are looking at starts at zero, so zero days after we started weighing: this is three weeks, four weeks, five weeks, six weeks, seven. We see indeed that the mice are growing, and we see that as they grow the variance also becomes bigger, so there seems to be some heteroscedasticity in our measurements. The first model that we have is P equals mu plus the repeated measure of the individual, and the mean is estimated here, so you see the line for the mean: we estimate the global mean. We're just going to walk through all of these different models that we had. The next thing that we want to do is include time in the model. Here we see the model where we estimate the average and then include time as a component, and now we see that across time animals tend to get bigger, which is logical, and we already saw from the model selection that time should be included in the model. So we go from a model where individuals are just predicted by the mean, with of course massive residuals, so not a very well fitting model, to a model that includes time, and we see that it fits the data points better than the previous model. We also saw that we have to include time as a squared parameter, and then the model starts looking like this. We already see that the predicted body weight of the average mouse seems to fit the distribution quite well, just using the model where we say that your body weight is determined by the mean body weight across
the whole experiment plus time plus time to the power of two. And of course I'm not showing you the random effects, because then you would have 300-something lines in the plot and you wouldn't really see what's going on, because each mouse is allowed to have its own intercept and also its own personal growth curve, so its own personal time curve. So again, this model already fits relatively well.
Now what we are going to do is include time to the power of three, and that will add a little bit more. You can see that the difference between the two models is not that big, but it catches some of the variance, and this is because growth curves tend to follow an S shape: you grow a lot in the beginning, and then when you get older you tend not to grow as much anymore. So based on these two you can see that this model doesn't improve that much, but it improves significantly. It's a relatively minor change.
Then the next thing we are going to do is of course to include this main locus in our model, but first we are going to include the different families. I told you that father has a massive effect, so now each father is a different line. You see the lines here: we allow every father to have an effect on his offspring, and then it looks like this. So there are different families, and each of these lines represents the growth curve of animals within one family, and this reduces the variance even more.
All right, so then the next thing we are going to do is include our top marker in the model. No, not the top marker, I'm so sorry, the litter type. So we go from having a model like this, where we have the family in, to also including the litter type in the model. When we include litter type in the model, we see that the model starts fitting a lot better
and we see that almost all variance at three weeks is more or less explained by this model, but we see that there's still a lot of unexplained variance when we look at the later time points. So what do we do next? Next we include the top marker, so the jObes1 locus, this region on chromosome 3 which we know is influencing the body weight of the mouse depending on whether you got it from the B6N or from the BFMI. If at this position on chromosome 3 you inherited the locus from the BFMI, then you are more or less described by the orange lines, and if you are B6N or heterozygous, then these are the lines. And you can see that going from this model, which does not include the top marker, to the model which includes the top marker, we see a really nice improvement in the fit of the model. So the model starts fitting better and better and better to the data that we have observed. And of course I'm still not showing you the random effects, because the random effects would just add a whole bunch of lines: for each mouse you would have a unique line, which is not that interesting.
So now what do we do? We now start scanning around the genome, because we now have our minimal model. These things need to be included before we can start evaluating a marker on chromosome 1, so that is what we're going to do next. We're going to take this whole big model and then say plus SNP 1, then plus marker 2, then plus marker 3, then plus marker 4, and every time we compare the model which has the current marker in it, so marker 2 for example, to the model which does not have marker 2 in it. So we're just going to fit a new model like 19,000 times, because we have 19,000 markers across the genome, and for each of these markers we compare the new model that we create to the old model and see what the improvement is, what the probability is of putting
this marker into the model. Then if we look at, for example, chromosome 1: we add the marker under consideration to the model. Each of these dots here is a marker, and at each marker we look at what the improvement is in the model compared to the standard model, and we express this as a LOD score. A LOD score here is the minus log10 of the p-value. That means that a score of eight here means a p-value of 1 times 10 to the minus 8, four means 1 times 10 to the minus 4, and here, for example, we see 1 times 10 to the minus 1, so 0.1. We also have two lines in this plot, and these are the significance lines: the orange line is 0.05 and the green line is 0.01. So if the score of the marker is way above the green line, it means that this locus here has an influence besides the jObes1 locus. So it's not just the locus on chromosome 3, but there's also one region on chromosome 1 which in this model seems to have a direct effect on the body weight of our mice.
So we scan all of the chromosomes and we detect five additional QTL in our mice. And what we see is this standard jObes1 locus that's located on chromosome 3. It starts here, it ends here, so it's a very small region, only like 300 kb, and this is the top marker that we included. You can see here, for example, all of the effects relative to the B6N, and we also see the genotype counts: the number of animals that got BFMI/BFMI, the number of animals that got BFMI/B6, and the number of animals that got B6/B6. And then here we see not only the main direct effect but also the growth effect over time, in grams. So if you are heterozygous you start off being like 0.8 grams bigger compared to an animal which has B6 at this locus, and if you are BFMI you're 0.4 grams bigger than a mouse which has the B6 locus. But here what we see is that when we look at the growth curve
we see the increase per day for the heterozygous animals, and it means that these animals are not growing as fast as a B6N animal. So although the main effect of this locus is positive for the BFMI and the heterozygous animals, the effect on the curve is actually a negative effect: it slows down your growth compared to a fully B6N animal at this position.
So, five new QTLs detected. That's really, really good, because we haven't found any new QTLs in these animals for a long time. There is this massive jObes1 locus, which you see here, and when we do the LMM mapping, this is the standard curve: when we do not correct for this massive locus here, we get the dotted line. But when we do this new mapping method, where we use the linear mixed model combined with multiple QTL mapping, you see that something strange happens. You see that this big effect is still there, but now another effect shows up here which used to be hidden in the massive effect of this jObes1 locus, because the jObes1 association shows up as a big region. But if we compensate for this locus, we see that a new region appears here and a new region appears here at the beginning. So originally these did not seem to be associated, or rather, this locus was associated but was kind of swallowed up in the massive effect of this chromosome 3 jObes1 QTL.
So now we want to look at what's actually underneath this peak, and here we zoom in. We see the association again, and we see that it's above the orange line, so it is a significant effect. Here on the genome I've plotted the different genes, and what we see is that more or less in the middle of this QTL there is a massive dip in the association, and this massive dip in the association is because one of the genotype groups is
completely missing: there is no animal here which is actually a B6/B6 individual. So for some reason you can't be B6/B6 at this locus, probably because you die prematurely so you never get born, or there's some other effect going on here. But this kind of drop shows us what the causal gene actually is: we can now conclude that this Foxo1 gene here is the gene which is driving the effect of this region of the genome. So it's a very nice way of looking at growth curves: modeling growth curves and then looking at each marker in the genome and seeing if it is actually still associated. We see that there's an association, but we see that this association is gone in this region because the effect is a dominant effect, and this dominant effect cannot be seen here because the B6 group, which carries the effect relative to the other two groups, is missing: there are no mice which are homozygous B6N at this locus.
Foxo1 is a well-known regulator of insulin signaling, so it makes a lot of sense that this gene has an influence in our fat mice compared to the lean mice. And when we look at it, so here we see the whole insulin signaling pathway from KEGG, we actually see that this Foxo1 gene is more or less a central regulator, from the insulin receptor all the way down to starting up glycolysis or lipogenesis. And in the other regions that we defined we also find genes which all belong to this pathway. So that makes us very confident that this method is really able to pinpoint very accurately which genes are causing the obesity in our Berlin Fat Mouse, or which are the loci which are protecting the B6 from becoming fat.
So this LMM time series mapping is much, much more sensitive than just using LMM mapping: the addition of MQM to the LMM part makes it much more sensitive, and that comes because it uses all the available data and it allows you to correct for known large genetic effects. We detect
five novel QTLs. Within this nr3 region we observe segregation distortion, which occurs exactly on top of this Foxo1 gene, and many genes from the insulin pathway are located underneath these newly identified QTLs, making us think that this obesity is driven by mutations in genes in the insulin pathway.
All right, so that was it for today. This is more or less the way that I use this modeling. We start off with a very basic model, just saying that the fat phenotype is determined by the overall mean. Then we add a factor time, which of course has an influence when you look at a growth curve. We add time to the power of two and time to the power of three, then we add family into the model, we add the litter type into the model, and then we add our main top marker into the model. At each step you see that our prediction improves, with enough certainty to be confident in these models. And this is more or less how you model one of these very complex phenotypes, measured across like seven different time points, using a lot of mice and a lot of genetic information. You just start small and then build up one by one: include things that you think might have an effect, and test. That's why we have this little graph here, where we say: we add season into the model and the model doesn't become better, so we don't include it. After that, time should be included, time to the power of two should be, and we also swap from using only individual as a random intercept, so every individual is allowed to have its own birth weight, to saying that every individual is allowed to have its own birth weight but also its own individual growth curve, so its own slope.
All right, I hope that was interesting. During the first hour we talked about the assignments, the second hour was more or less showing you how you build up one of these models using an example data set, or rather using the tutorial by Bodo Winter, and here I show you how I
applied the tutorial to the data that we collect here in our own group, and how we apply linear mixed models to our own research to do stuff which we couldn't do before. So I hope you guys enjoyed it. I've been talking for almost three hours and it's my holiday, so I'm off for a week. We will have a lecture next week, of course. I will do that from Holland, so I will probably be sitting somewhere in my underwear. You won't see that, of course. I could be sitting in my underwear now and you wouldn't even know. Anyway, I'd like to thank you for listening to me and for being here, and I thank everyone for the different remarks and questions. "You can pack more than underwear." I know, I know, I know, but why should I? It's a holiday. "Have a nice holiday." Yeah, thank you guys, thank you so much. If there are any questions, then feel free to ask them.
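The model-building and genome scan described in this lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code behind the slides: the lecture does not say which test produced the p-values, so a likelihood ratio test between the model without and with the marker is assumed here. The function names, the choice of 4 extra parameters (two genotype effects plus two genotype-by-time effects) and all log-likelihood values are hypothetical. Fitting the mixed models themselves is left out; the sketch starts from log-likelihoods that a mixed-model package would report.

```python
import math
import sys


def aic(log_likelihood, n_params):
    """Akaike information criterion; a lower value means a better model."""
    return 2 * n_params - 2 * log_likelihood


def lrt_pvalue(loglik_null, loglik_full, df):
    """P-value of a likelihood ratio test (assumed test, see lead-in).

    Uses the closed-form chi-square upper tail, which exists for even
    degrees of freedom, so no external statistics library is needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = max(0.0, loglik_full - loglik_null)  # test statistic divided by 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def lod_score(p_value):
    """Score as defined in the lecture: minus log10 of the p-value."""
    # Guard against p-values that underflow to exactly zero.
    return -math.log10(max(p_value, sys.float_info.min))


def scan_markers(loglik_null, loglik_by_marker, df):
    """Compare the base model with 'base model + marker' for every marker."""
    return {
        name: lod_score(lrt_pvalue(loglik_null, loglik_full, df))
        for name, loglik_full in loglik_by_marker.items()
    }


if __name__ == "__main__":
    # Hypothetical log-likelihoods, made up for illustration only.
    base_loglik = -1500.0
    toy_markers = {"marker1": -1499.0, "marker2": -1478.0, "marker3": -1495.5}
    for name, lod in scan_markers(base_loglik, toy_markers, df=4).items():
        print(f"{name}: score = {lod:.2f}")
```

The same helpers cover the stepwise part of the lecture: compute `aic` for the model before and after adding a term such as season or time squared, and keep the term only when the AIC goes down.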