So thanks a lot, Patrice, for a nice introduction, and it's a pleasure for me to be here for the second edition at least of this very popular course. I will talk to you about two different topics. And I made a mistake and included only the first topic in the title, but maybe it was a bit sexier than the other one, so I don't mind. I will first talk about how we can use a Bayesian framework in general, and priors in particular, to improve association results in genome-wide association scans. Maybe let's just check who is awake in the audience: can you raise your hands or make any virtual sign if you know what genome-wide association studies are? Very good, so it looks like a bit more than half, excellent. So I will not spend too much time on this, but there are still some people who don't know, so I will explain a little bit of the details of how genome-wide association scans are done and how we can improve them by using relatively smart Bayesian priors. Then we can have questions related to this topic, and afterwards I'll move on to another, related topic: how we can use GWAS for causal inference, and how we can also improve causal inference by using a Bayesian framework. And as Patrice said, please feel free to stop me, we are not that many. So anytime you don't get something, or you have a question or a comment, please interrupt, either by talking directly into the microphone or by raising your hand. Probably if you raise your hand I see that better, or type any question in the chat. Okay, so as I mentioned, we'll talk about genome-wide association study basics, and then how we can improve them by building priors. And finally, I will basically show all this through an example of how we can tease out more of longevity genetics, basically predicting how long you're going to live based on your genotypes.
So, standard genome-wide association models are looking at a phenotype. In this example, can you see my mouse moving? Yes, we can. Thanks. So you have a phenotype Y; in this case, imagine, I often use this example, body mass index, an obesity measure, for which 20 to 25 is the normal range. These are data from a local study in Lausanne, actual BMI values of people in the cohort, and the G represents a genotype vector. You don't necessarily need to know the details; basically, it measures whether somebody carries a certain genetic variant or not, and in how many copies essentially: whether you have zero, one, or two copies of a certain allele. We can have at most two copies, of course, because you have one maternal and one paternal copy. So for example, you look at a very long DNA sequence, and you ask: at this very position, how many T alleles do you carry? If you carry zero T alleles the value will be zero; if you carry one, you got a T allele from only one parent, then it will be one; if you have two, it means you got a T allele from both parents. So we have the genetic data and the phenotypic clinical data, and we are fitting a linear regression model for continuous outcomes, where we can include covariates, as you can see here, such as age, sex, diet, physical activity, and so on; these variables contribute to the outcome. They are smart things to include as long as you know that they probably have a causal effect on the outcome and are not a consequence of it. But these are just nuisance parameters for us, because they are well known from much larger epidemiological studies that have been conducted in past decades. What we're really interested in is which part of the genome, which genetic variants, which genetic markers are associated with the trait, and the regression slope, the coefficient, is in our field called the effect size; it's basically just the strength of the effect.
So if you carry an extra T allele, how much does it change your BMI, by how many units? If this beta is half, then it increases BMI by half a unit. If it's minus one, it means that having an extra T allele decreases BMI by one unit. You can also include random effects. Of course, there is the general random error, which you're all very familiar with, but we can also include family membership, or people who share the same geographic region or district, or who went to the same school; these can be included here as random effects where you define the covariance structure. Very often in genetic association studies we include a random effect whose variance-covariance matrix is the kinship matrix. Basically, this is a variable that reflects family relationships, and of course you don't want to discover associations which are just based on sharing a family, because sharing a family also means sharing environments; we want to regress those out as well. So these are the four main components of the model: one is a very simple error term with a diagonal covariance structure, and here this can be a more complex variance-covariance structure. And we have the two fixed effects. The really interesting thing for us is this effect size, and what I will tell you about is how we can improve the estimation of this effect size in genome-wide association studies. Of course, as you know from linear regression models, the most important parameters, at least for us, are this effect size estimate and its standard error, and then we can also get the p-value from the linear regression. If the p-value is significant, then we are interested in the effect size; if the p-value is insignificant, of course we get an effect size estimate whose confidence interval is very broad and includes zero. That's probably less meaningful, but it can still inform you about the range of effects that are plausible given your data.
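To make this concrete, here is a minimal sketch of a single-SNP association test in Python. All data are simulated and all numbers illustrative (this is not the actual Lausanne cohort); the true genotype effect is set to 0.7 BMI units, like the example that follows.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)
n = 6000

# Simulated data: genotype g coded as 0/1/2 copies of the T allele,
# plus two nuisance covariates (age and sex). Numbers are made up.
g = rng.binomial(2, 0.4, n).astype(float)   # allele frequency ~40%
age = rng.uniform(40, 70, n)
sex = rng.integers(0, 2, n).astype(float)
bmi = 25 + 0.7 * g + 0.05 * age + 0.5 * sex + rng.normal(0, 4, n)

# Design matrix: intercept, genotype, covariates
X = np.column_stack([np.ones(n), g, age, sex])
beta_hat, *_ = np.linalg.lstsq(X, bmi, rcond=None)

# Standard errors from the usual OLS formula
resid = bmi - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

# Effect size, standard error, and (normal-approximation) p-value for the SNP
alpha, alpha_se = beta_hat[1], se[1]
z = alpha / alpha_se
p = erfc(abs(z) / sqrt(2))
print(f"effect size {alpha:.3f} +/- {alpha_se:.3f}, p = {p:.2e}")
```

In a real GWAS this regression is simply repeated for every marker, one SNP at a time.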
Okay, so here is just a very simple example, where what we do is look at the participants of a local Lausanne study and group individuals according to how many T alleles they carry at a particular position. The position is defined by this rs number, which is a unique identifier of single nucleotide polymorphisms, but really just imagine that, if you look at a particular genetic marker, you can almost always split the population into three groups: those who carry zero, one, or two T alleles, in this case. And what I show on the y-axis of this plot is that the different groups actually have different BMI distributions. You might even think that they have slightly different variances, but what's more interesting for us in general is that they have different mean values, and that's exactly when there is an association: when these mean values are significantly different. In this case, in this study, in a sample of about 6,000 people, the estimate was around 0.7 units, which means every additional allele increases your weight by about two kilograms, when I translate the BMI units to kilograms so it's a bit more digestible. Okay, so that's what we are after. This is actually a real association; this is a SNP close to the gene FTO. Actually, it's not impacting FTO, it impacts another gene nearby, but that's a minor detail. So these are the typical associations we're looking for, but if you look at how large the standard errors are, you can see that we actually can't reject the null hypothesis that there is no effect whatsoever. Because of this we need enormous sample sizes. So the goal is: how can we tease out more associations, how can we find more genetic associations? If you boost the sample size, of course, you will get stronger p-values if the association is real, and that helps your discovery; it increases statistical power. We can't change the size of the effects; those are given.
And what you can also change: if you have a cohort with less environmental noise, then the residual error variance is decreased, and that also increases power. What we propose, and many others have thought about it, is that you can use other related traits to also improve power, and that's exactly where the Bayesian priors will be coming from: using other related traits to boost the association strength of a focal trait, such as BMI. So this is... I have a question. Yes, go ahead. Thank you for your question. Can you maybe go back to the previous slide? So here, the p-value is 0.08. And in the way you talk about it, you just said, okay, it's not significant, as if it were totally clear and unquestionable. And of course we all know the traditional p-value cutoff is 0.05. But most of us also know that this is a kind of arbitrarily set cutoff. So I was just wondering, I mean, I understand that in a presentation like this you need to keep things simple and we're not going into all the details. But I would find it really interesting how you deal with this in your work, in your lab. When you see something like that, p equals 0.08, I mean, it's not that large either; then also the question is, is this a nominal p-value? In this case it is nominal. Do adjusted p-values play a role or not? Just a little bit more context on p-values, what you call significant, and why these thresholds. Okay, I've got 50 slides on multiple testing correction, maybe next time. But basically, here the 0.08 makes us unexcited because we are testing a million markers in the genome. If you test a million markers, you will get about 50,000 of them with a p-value below 0.05, even under the null, when nothing is associated.
So if we see something at 0.08, the problem is that of course you can get excited about it, but then we would get excited about many tens of thousands of SNPs. And of course there is no lab on earth which would follow all of those up: okay, now let's CRISPR this one and let's see what happens in zebrafish, or whatever. That's why we have to be very selective, and it's always a trade-off. If you want to control the type I error rate, which we are very obsessed about, you basically don't want to give away signals in which we don't have much confidence and where there is a very high chance they are not going to replicate. On the other hand, of course, that hinders us from discovering a lot. So if you make the p-value threshold milder, you will have more claimed discoveries, or potential discoveries. There is "suggestively significant" and "significant" and all these kinds of terminologies, which show that the 0.05 is absolutely arbitrary. And indeed, you can play with these thresholds, you can play with different controls. You can control the family-wise error rate, then you'll be very, very stringent, since we do many, many tests. Or you can do false discovery rate control, which is much milder, and that's much more interesting for a fishing-expedition experiment like GWAS. The same thing happens if you just look at gene expression levels and associate them with a trait; say you compare people with and without cancer for differential gene expression. Again, you will get 20,000 different p-values. And then where do you draw the threshold? The idea is that, wherever you drew the threshold, if you then propose a set of genes, or in this case genetic markers, what fraction of them do you expect to be true? And that's what the false discovery rate tells us exactly: if I use this threshold, then I expect probably 97% of my proposed markers to be false.
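The false discovery rate control mentioned here is most commonly done with the Benjamini-Hochberg step-up procedure; a small self-contained sketch on simulated p-values (the counts of null and true tests are made up for illustration):

```python
import numpy as np

def benjamini_hochberg(pvals, fdr=0.05):
    """Return a boolean mask of discoveries at the given FDR level
    (classic Benjamini-Hochberg step-up procedure)."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # largest k with p_(k) <= (k/m) * fdr; reject everything up to that rank
    below = ranked <= (np.arange(1, m + 1) / m) * fdr
    keep = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        keep[order[: k + 1]] = True
    return keep

# 10,000 null tests plus 100 truly associated ones with tiny p-values
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(size=10_000), rng.uniform(0, 1e-5, 100)])
hits = benjamini_hochberg(pvals, fdr=0.05)
print(hits.sum(), "discoveries")  # dominated by the truly associated tests
```

Note how much milder this is than a Bonferroni cut: the threshold adapts to the observed p-value distribution rather than dividing 0.05 by the number of tests.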
People wouldn't really be very enthusiastic about taking those forward into follow-up experiments, but it's a very important point. Here I made the example deliberately stark to show you how unfortunate it is that a million Swiss francs were spent on a cohort and you can't even convincingly discover the SNP most strongly associated with BMI; genome-wide, this SNP would not even rank in the top 5%. This is just to show that we need some smarter methods to boost these associations and to better prioritize the variants that are truly associated with a given trait. So this SNP is indeed a single variant, very frequent in the population: its frequency is more than 30%, and it explains about a third of a percent of BMI variation. That's still very, very small, but still, people who are born with two T alleles versus two A alleles will show about a four-kilogram weight difference, which is of course unfair, if you think about it, because it is given at birth. Thank you. Are there other questions? So this is what happens when, instead of 6,000 people, we look at 100,000 to 130,000 people. Finally we get some associations; this is a Manhattan plot. It's really a nice and very simple summary of a genome-wide scan: you see here chromosomes 1 to 22 (we very rarely look at the other chromosomes, due to ignorance, to be honest), and within each chromosome the x-axis gives the physical positions of the different genetic markers, while the y-axis position tells you how strongly each marker is linked to the trait. It is the minus log10 p-value of the association, but you can also think about it as really how much variance they explain: the higher the tower, the higher the point, the more variance that marker explains in the trait. And this was exactly the earlier point: since we do a million tests.
There's the magical 0.05 threshold, but people use it while very stringently controlling the family-wise error rate at 5%, meaning the chance that even one discovery is false stays below 5%. The means to do this is the Bonferroni correction, where you take this threshold and divide by the number of tests, which is about a million, and then you get the threshold of five times ten to the minus eight, the minus log10 of which is about 7.3, and that's where you see this black line. You see many, many gray points below this line; that's where we are not sure what's happening, and we can't claim discovery for anything below it, but for anything above this line we are pretty confident. And actually, later studies looked at these, and all 32, when we now look at 3 million people, are all super strongly replicating; these results are very, very strong. Basically, the probability that even one of them is wrong is less than 5%. So that's quite convincing. But as you can see, it's quite ridiculous that we test a million and we only find about 30, and the 30 cumulatively don't even explain more than about 2 to 3% of the trait variance. So it's a lot of fuss for seemingly not much. It would be very nice to try to boost discovery without compromising type I error. What we did so far is test one SNP at a time. But what we can do instead is try to estimate the effects of all SNPs simultaneously. So you have the phenotype, again imagine BMI here, and this G will be a matrix: it will have as many rows as individuals in the cohort and as many columns as SNPs, or genetic markers, so typically about a million columns and several hundreds of thousands of rows. And this alpha is a vector, as long as the number of markers, and what we want to estimate is the effect size for each SNP.
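The genome-wide threshold arithmetic mentioned above is simple enough to verify directly:

```python
import math

# Bonferroni correction: nominal 0.05 divided by ~1 million tests
n_tests = 1_000_000
threshold = 0.05 / n_tests      # about 5e-8, the standard GWAS threshold
print(threshold)
print(-math.log10(threshold))   # ~7.3, the height of the black line
```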
Now the difference compared to the equations on my previous slides is that here we put all SNPs in at the same time. Of course, you can't just throw this all into R, because it will just blow up; it won't have enough memory for such big matrix manipulations, so we can't easily do it. And the second reason is that we actually have more variables than individuals, so we cannot estimate this very reliably. What we can very often do, just to simplify calculations, is standardize the phenotype so that it has zero mean and unit variance, and also standardize the genotypes so that they have zero mean and unit variance; and plus we have of course the error term. But now the difference is that we try to get the best linear unbiased predictor, essentially. Instead of trying to really estimate every single parameter, we can ask something different: we can put a prior distribution on these alpha effects. Here we say that these alpha effects, the SNP effects, come from a normal distribution with mean zero and some variance. This is very biologically justifiable, because when we see real data and try to estimate these alpha effects, we can at least estimate the tail of this distribution, and it looks pretty much normal. And as we get bigger and bigger sample sizes, we can more and more accurately estimate these effects in a fixed-effect model, and it all looks pretty much either normal or a mixture of normal distributions; very rarely more than two normals, to be honest. So the advantage of this is that we have a distributional assumption on this parameter, and instead of trying to estimate each individual parameter value, we can afterwards estimate the posterior effects of these values.
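Under this Gaussian prior, the posterior mean of all SNP effects jointly takes a ridge-regression (BLUP) form. Here is a small simulated sketch; the sample sizes, heritability, and variable names are my own choices for illustration, not the lecture's:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 500, 2000                 # deliberately more markers than individuals
h2 = 0.5                         # simulated heritability

# Standardized genotypes; true effects drawn from N(0, h2/m)
G = rng.standard_normal((n, m))
alpha_true = rng.normal(0.0, np.sqrt(h2 / m), m)
y = G @ alpha_true + rng.normal(0.0, np.sqrt(1 - h2), n)

# Posterior mean of alpha under the Gaussian prior (ridge / BLUP form):
# alpha_hat = (G'G + lambda I)^-1 G'y, with lambda = sigma_e^2 / sigma_alpha^2
lam = (1 - h2) / (h2 / m)
alpha_hat = np.linalg.solve(G.T @ G + lam * np.eye(m), G.T @ y)

# The shrunken estimates still correlate with the true effects even though m > n
r = np.corrcoef(alpha_hat, alpha_true)[0, 1]
print(f"correlation with true effects: {r:.2f}")
```

The point is that the prior makes the m > n problem well posed: without the lambda term, G'G is singular and no unique estimate exists.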
So here what we can actually use is something called empirical Bayes, which we often use in human genetics, where your main aim is not to put in a prior for this alpha with a fixed value of the variance, but actually to let the data tell us which value of the variance describes the data best. If you look at the outcome variance-covariance matrix, then after some algebra, which I will obviously skip here in the interest of time, it decomposes into two components. One is the genetic kinship, which tells you how genetically closely related two individuals are, multiplied by the heritability of the trait. The heritability is the total contribution of the genetic predictor to the variance, divided by the variance of the outcome, which was set to one, so it's basically just the variance of the genetic predictor. And the second component is the remaining unexplained variance; you can see that this term corresponds to the variance-covariance matrix of the epsilon, the unexplained variance, with a diagonal covariance structure. So it decomposes into these two matrices, and there's a single parameter we want to estimate, because the kinship can be estimated from the data: we have the genetic data, so we can calculate the kinship between any pair of individuals. And it's a fairly simple thing to do; even with a sample size of a few thousand you can estimate this heritability reasonably well, and with, say, five thousand samples the standard error will be roughly 5%. You can do it in various different ways. One is likelihood maximization, where you have a single parameter: you observe the phenotype, you observe the genetic similarity, and you maximize the likelihood function with respect to this heritability parameter.
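The single-parameter likelihood maximization just described can be sketched as follows, on simulated data. The eigendecomposition trick and the plain grid search are my simplifications (real tools use REML and smarter optimizers), and all sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, h2_true = 800, 1200, 0.4

# Standardized genotype matrix and the kinship matrix K = G G' / m
G = rng.standard_normal((n, m))
K = G @ G.T / m
alpha = rng.normal(0.0, np.sqrt(h2_true / m), m)
y = G @ alpha + rng.normal(0.0, np.sqrt(1 - h2_true), n)
y = (y - y.mean()) / y.std()    # phenotype standardized, as on the slide

# Log-likelihood of y ~ N(0, h2*K + (1-h2)*I), maximized by grid search.
# One eigendecomposition of K makes every evaluation cheap.
vals, vecs = np.linalg.eigh(K)
yt = vecs.T @ y

def loglik(h2):
    v = h2 * vals + (1 - h2)    # eigenvalues of the covariance matrix
    return -0.5 * (np.log(v).sum() + (yt**2 / v).sum())

grid = np.linspace(0.01, 0.99, 99)
h2_hat = grid[np.argmax([loglik(h) for h in grid])]
print(f"estimated heritability: {h2_hat:.2f}")
```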
You can also use Haseman-Elston regression, which I always like to teach because it's much more intuitive. It basically looks at how different two individuals are phenotypically, so that's a phenotypic dissimilarity, and you can regress it on the kinship itself. (Sorry, there's a factor of minus two missing from the slide.) The point is that how phenotypic similarity relates to genetic similarity can be assessed with a regression, and the regression slope is of course related to heritability: if the heritability is zero, they are not related at all, so the slope should be zero; if the trait is strongly heritable, then the genetic kinship should be a good predictor of phenotypic similarity. Imagine the extreme case when the trait is 100% heritable: then, knowing the genotypes, you should be able to know the phenotypes. So if you have perfect genetic similarity, an identical genetic profile, then you should have an identical phenotypic value. This is where we can use this empirical Bayes approach to estimate the heritability of traits, which is very useful. For example, 30,000 samples for BMI give a heritability of 0.2, sorry, 20%, 21%, and for another trait, waist-hip ratio adjusted for BMI, which is a body shape measure, it's about 10%. About these estimates: we shouldn't really follow the people who are obsessed with the idea that there is one heritability value. Heritability changes with age, it changes with sex, it changes with socioeconomic status, it changes with the birth year of the cohort you're looking at, and it changes by geographic region. So it is not just one single value; it can be highly variable. Okay, so now let's get back to the actual GWAS estimates, instead of globally estimating how much the SNPs contribute in total, and let's try to build some priors. How can one trait inform another?
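A minimal simulated sketch of Haseman-Elston regression. Here I regress the pairwise phenotype cross-product, rather than the squared dissimilarity, on kinship, which is an equivalent formulation whose slope directly estimates heritability; sizes and seeds are made up:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, h2_true = 1000, 1500, 0.3

# Simulated standardized genotypes, kinship, and a heritable phenotype
G = rng.standard_normal((n, m))
K = G @ G.T / m
y = G @ rng.normal(0, np.sqrt(h2_true / m), m) + rng.normal(0, np.sqrt(1 - h2_true), n)
y = (y - y.mean()) / y.std()

# Haseman-Elston: regress the phenotypic cross-product y_i * y_j of each
# pair on their genetic similarity K_ij; the slope estimates heritability,
# since E[y_i * y_j] = h2 * K_ij for i != j.
iu = np.triu_indices(n, k=1)
prod = np.outer(y, y)[iu]
kin = K[iu]
slope = np.cov(kin, prod)[0, 1] / np.var(kin)
print(f"HE-regression heritability estimate: {slope:.2f}")
```

As the lecture says: zero heritability means a flat regression, and the more heritable the trait, the better genetic similarity predicts phenotypic similarity.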
What motivated us to do this work was when we looked at genetic association scans on life expectancy. Of course, how long you're going to live, you might argue, is basically just down to the environment: what you do, how you live, whether you smoke or not, whether you drink or not, how much physical activity you do, how much coffee you drink, and so on and so forth. Of course these are major contributing factors. Still, studies suggest that about 20 to 30% might be down to genetic factors, and most probably the narrow-sense, really additive heritability captured by common genetic markers is more like 10%. It's also an interesting trait because it's very much under assortative mating, which also inflates genetic heritability estimates, because, weirdly, you tend to choose partners who tend to live to the same age as you. Of course, at the moment when you choose someone you don't know how long that person will live, but still, through the time you spend together you share a lot of environment, which makes you much more similar in terms of the environmental factors that impact life expectancy; but you are even correlated in terms of the genetics of life expectancy, which of course you can't know at the time, but I will shed light on why it really happens in a few slides. Okay, so it has some heritability, certainly less than obesity or height or diabetes, but still it's decent. Genetic studies usually focused on extreme case-control designs, so they looked at people who lived extremely long, over 90 years, and people hoped to find some magic genetic factors that make these people live extremely long. Often many of these extremely long-lived people were smoking and drinking very heavily, and despite this they seemed to have some solid genes, and people were after those.
Very quickly, what they identified as a very robust gene is APOE, where even multiple different independent genetic variants contribute to lifespan, and I will explain how very soon. There have been a couple of other proposed genes, most of which have never really replicated. So when we looked at UK Biobank data, the early release was only about 120,000 people, the first problem was that everybody was alive. Of course that's good for them, but it's bad for scientists who want to do lifespan genetics. But luckily the participants were asked how long their parents lived. So we can do something called a proxy GWAS, which means that instead of using your phenotype you use the parents' phenotype. Actually, I don't like this explanation; I much prefer to say that we use your genotype as a proxy for your parental genotypes. You are associating everything in the parents, because that's where we have the phenotype, but we don't have the parental genotypes, so we use your own genotype as a proxy. And because you have two parents, you can have two different association scans: you can run one scan for the mothers and one for the fathers. In UK Biobank, since the participants are between 40 and 70 years of age, at least three quarters of the fathers are by now dead, and between half and two thirds of the mothers are also dead. So we have a very accurate estimate of the lifespan of the participants' parents. The aim was, okay, with some genetic markers we can identify new ones, also test the old ones, and see through what kind of mechanisms the effects are exerted. And then we also did a follow-up study in mice. Okay, so now about the priors: how can we get good priors? If I want to estimate the SNP effect on lifespan, I can use other traits to help me.
I know, for example, that if a SNP predisposes me to heart attack, a risk factor, I'm much more likely to die earlier; so that's a risk factor for a short lifespan. And I can use the GWAS studies of those risk factors, because those associations give me the SNP effects on risk factors such as heart attack or type 2 diabetes, which shortens lifespan. Coronary artery disease shortens lifespan; high LDL cholesterol levels also shorten lifespan, and so on and so forth. So there are a couple of potential risk factors here which we know very well, and many studies have been done, and those studies already provide us estimates of the effects of those SNPs on the risk factors. Then we can use external studies to estimate the effect of those risk factors and diseases on lifespan. That gives us the causal effect of those risk factors on lifespan, and then you can imagine what the total effect of the SNP on lifespan should be: you just go through all the possible paths here and sum up the effects. Of course, here we need multivariable causal effect estimates for all the different risk factors, and these are coming from the previous GWASes. So that's where the prior information is, both here and here. And then our prior effect would be the sum of the beta times alpha: the effect from SNP to risk factor, then from risk factor to lifespan. Let's imagine the SNP is increasing your BMI by two units, and each unit of BMI is decreasing your lifespan by five months. Then the total effect of the SNP would be the two units times five months, so it would be a 10-month decrease for that SNP. Of course, rest assured that there is no SNP that shifts your lifespan by 10 months; the strongest one is actually about eight months. Okay, so that's how we get prior effects.
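The worked example (two BMI units times five months per unit) generalizes to a sum over all SNP-to-risk-factor-to-lifespan paths. A toy sketch with made-up numbers; the traits, effect sizes, and causal estimates are purely illustrative:

```python
import numpy as np

# Hypothetical per-SNP effects on three risk factors (as would come from
# external GWAS), and the risk factors' multivariable causal effects on
# lifespan in months per unit. All numbers are invented for illustration.
beta = np.array([
    [2.0, 0.1, -0.3],   # SNP 1: effect on BMI, diabetes liability, education
    [0.0, 0.5,  0.0],   # SNP 2: effect on diabetes liability only
])
causal = np.array([-5.0, -6.0, 12.0])  # months of lifespan per unit of trait

# Prior mean for each SNP's lifespan effect: sum over paths of beta * causal
prior_mu = beta @ causal
print(prior_mu)  # SNP 1: 2*(-5) + 0.1*(-6) + (-0.3)*12 = -14.2 months
```

The BMI-only path of SNP 1 reproduces the lecture's 2 x (-5) = -10 months; the other paths simply add on top.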
And this is the advantage: we haven't used anything about our actual study itself. There is a caveat, which I'm not going to talk about too much: how can we estimate these multivariable causal effects? It's through multivariable Mendelian randomization, and I will tell you a bit more about it in my second slide deck. For the moment, just imagine that this was estimated from some independent source. Okay, so we have these priors. You might be curious what kind of risk factors impact lifespan. Any guesses, just to engage the audience a bit more? Apart from the ones I told you, what kind of environmental factors or risk factors do you think would shorten your lifespan that we should consider, ones with a genetic basis? Yes, exactly, nutrition, very good. Physical activity, absolutely. Smoking is the biggest killer, good. Okay. Air pollution, very good; actually, we didn't consider air pollution, because here we only care about risk factors that are impacted by genetics, but it's still a correct answer to my question. Air pollution of course depends on where you live, and where you live has some small genetic basis, maybe 2 to 5% heritable, but it's not really a direct genetic effect. Yeah, writing grant proposals definitely shortens lifespan, I can assure you; exactly, stress. The measurement of stress is very difficult in cohorts. And the other traits, nutrition and physical activity, seem like something super simple, but people lie a lot about what they eat and how much physical activity they do, and I have another slide deck about this; it's super interesting to compare the two. People wore accelerometer devices so that we knew exactly how much they moved, and at the same time we asked them how much they had moved in the past two weeks.
And the correlation between the two is ridiculously low, about 0.15, meaning it contrasts a lot with what people say. Nutrition again: if you compare, for example, sugar consumption, the people who are heaviest report consuming the least sugar, and that's mostly because of how your other traits impact your nutrition, or how you wish you would change your nutrition, so it's very difficult to assess. Many of these traits are difficult to assess, but a couple of them we can, and some others have a very small genetic basis. And basically this is what happened when we looked at a couple of traits. We started off with about 60 traits and then narrowed it down to the few that were significant and very robust. Unsurprisingly, of course, diabetes, and having high triglyceride or high cholesterol levels, which are here: here's the good cholesterol, here's the bad cholesterol; good cholesterol lengthens your lifespan and bad cholesterol shortens it, by more than the good cholesterol helps. BMI is also something where roughly every extra kilo you carry in your life decreases your lifespan by two months. Smoking, as was correctly guessed, decreases it, but by quitting smoking you gain back the years. Of course, here the time component is very important, and I can't make any guesses based on the data we have; we can't really estimate how quickly you have to quit to gain it back. The best is not to start, anyway. And then intensity, this is smoking intensity, adds an extra effect on top: every pack of cigarettes a day is roughly maybe 10 years lost. Good. What was interesting and unexpected to see is years of education: it means that roughly every year you spend in education makes you live one year longer. Yeah, there's a comment, a question, about lying and memory, and that you can have good intentions; exactly, so I shouldn't call it lying, it's misreporting.
We actually have an interesting ongoing project about reporting bias, about how much and how inaccurately you misreport. The major predictors of misreporting are being male, men are the worst study participants, they misreport the most, and the older you get, the more misreporting you do. Anyway, parenthesis closed. So, very good: it means that we have a couple of traits which have an impact on lifespan, and many of these have a genetic basis, so we use them to build these priors. Again we get back to these priors: we will just plug in education, diabetes, obesity, and so on and so forth, build the priors, and then we can do a Bayesian scan. So what we do is get the GWAS estimates: when you run a genome-wide association scan, for each SNP you get an estimate of the SNP effect on the outcome. Of course the estimate will be coming from a normal distribution with mean equal to the true parameter value and with some variance, and the variance usually goes down as the sample size increases. Now the null model is that the true value is zero, so there is absolutely no effect on the outcome, and we're just seeing some random noise as the estimate of the effect size. The alternative model is that the true effect size comes from a normal distribution, but now our mean value is the one we estimated as expected: these are the effect estimates of SNP i on trait j. So imagine, when j equals 1 it might be the effect on diabetes, when j equals 2 the effect on education, and so on and so forth, and we sum it up over all the traits that are causal and have a genetic basis. And of course each effect that we estimate from GWAS is multiplied by the causal effect of the trait on lifespan, and that gives us the prior effect; that would be our mean for the prior.
And then of course we have an uncertainty, because these estimates are coming from actual studies — they have their own variances. So in reality this causal effect also has a variance, and in reality there are three terms, but this is the most dominant one. Okay, so if these estimates are very imprecise then this tau value will be very large, and we don't have much confidence in our prior. But at least we pushed in as much as possible of the information available at hand. And then we looked at something called the Bayes factor — you probably heard about it in the course — which tells you the probability of the data given your alternative model divided by the probability of the data given your null model. In our case these Bayes factors will be very, very simple. Essentially this model has really few parameters: it's basically parameterized in general by this theta, which in our case is really this gamma, and the two different models have two different distributions — one is fully concentrated at zero and the other is a normally distributed variable — so the Bayes factor simplifies to a ratio of two Gaussian density functions. One with the mean value of the prior, where we simply need to sum up the original uncertainty about the estimate and the uncertainty about our prior estimate. And then we have the null, which has a mean value of zero and simply the uncertainty of the estimator itself. So the ratio of these two. If this is large, it means that we believe model one is much more likely than model zero, and that's what we are of course interested in and get excited about. Basically, imagine here the null hypothesis is that you have mean zero with some variance, and the alternative hypothesis is that we have this prior effect with some error, which is a bit larger than the null error.
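The Bayes factor as a ratio of two Gaussian densities takes only a few lines; a sketch using the standard library only (the helper names are mine):

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def bayes_factor(beta_hat, se, prior_mean, prior_sd):
    """BF = N(beta_hat; prior_mean, se^2 + tau^2) / N(beta_hat; 0, se^2)."""
    alt_sd = math.sqrt(se ** 2 + prior_sd ** 2)   # prior and estimation noise add up
    return normal_pdf(beta_hat, prior_mean, alt_sd) / normal_pdf(beta_hat, 0.0, se)

# The slide's example: observed effect 2, prior mean 3 -> BF around four
bf = bayes_factor(beta_hat=2.0, se=1.0, prior_mean=3.0, prior_sd=1.0)
```

An observed effect near zero with the same strong prior gives a Bayes factor well below one, favouring the null.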
And if we have an observed value — let's say the observed effect size was two and the prior was three — you can see here that the model-one density is still much higher than the null-model density. So the numerator will be the height of one curve at the observed value and the denominator the height of the other, and you can see that in this case the Bayes factor is probably somewhere around four. It really compares the two at the observed value. Of course, if your observed value and your prior completely coincide, that's where you have the best chance to get a really good ratio, and if the observed value approaches zero, your Bayes factor will be very, very low. Great. So we now have some sort of quantity that tells us how much better the observed data fit the alternative model that we derived. We can compute it for every SNP, but the problem is that if I put a huge, very strong prior on some variants, maybe just by chance I will get a very large Bayes factor. So we need some sense of what happens under the null, and the way we did it is by reshuffling: we keep the distribution of prior effects across the genome exactly as it was, but we randomly reassign the priors to different SNPs. So we break the link between a prior effect and the particular SNP it was attributed to, and give it to a random SNP instead. Then we see what kind of Bayes factors we get genome-wide, and we can repeat this experiment — reshuffle the priors again and get new Bayes factors. In our paper we did it about 1,000 times, so for each SNP you get 1,000 null Bayes factors. That's very good, because now we really have a comparison of whether these Bayes factors are meaningful or unexpected relative to just randomly throwing priors on the genome.
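The reshuffling scheme can be sketched as a toy Monte Carlo with simulated effects; the data and the number of permutations here are made up (the paper used about 1,000 reshuffles):

```python
import math, random

def log_bf(beta_hat, se, prior_mean, prior_sd):
    """Log Bayes factor of the informed-prior model vs the null."""
    v1, v0 = se ** 2 + prior_sd ** 2, se ** 2
    return (-0.5 * math.log(v1 / v0)
            - 0.5 * (beta_hat - prior_mean) ** 2 / v1
            + 0.5 * beta_hat ** 2 / v0)

random.seed(0)
n_snps = 200
priors = [random.gauss(0.0, 1.0) for _ in range(n_snps)]   # prior means per SNP
betas  = [random.gauss(p, 1.0) for p in priors]            # simulated GWAS effects
obs_bf = [log_bf(b, 1.0, p, 0.5) for b, p in zip(betas, priors)]

# Null: keep the same set of prior values, assign them to random SNPs
null_bf = []
for _ in range(100):
    shuffled = priors[:]
    random.shuffle(shuffled)
    null_bf += [log_bf(b, 1.0, p, 0.5) for b, p in zip(betas, shuffled)]

# Empirical p-value for SNP 0: how often a random prior does at least as well
p0 = sum(x >= obs_bf[0] for x in null_bf) / len(null_bf)
```

Because the simulated effects are aligned with their priors, the observed log Bayes factors sit well above the reshuffled null ones, which is exactly the excess the talk describes.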
In the interest of time I think I will skip this part, but we did something a bit smarter: instead of the permutation procedure we can do a very accurate analytical estimation of these Bayes factors under the null. It's a neat trick which I will skip for the moment. I will just show you now the distribution of the Bayes factors that you observe for the lifespan data. You can see it sometimes goes even up to three million as a Bayes factor — the alternative model is that much more likely. And you can see that for the majority of the SNPs the Bayes factor is one: the null model and the prior model are pretty much equally likely. That also happens for a reason — very many of these priors are very close to zero, because we don't have much evidence that those SNPs impact the lifespan-related traits. And the black line is what we get under the null: if we just randomly throw these priors onto the genome, this is the distribution of Bayes factors we get. We see a massive excess of high Bayes factors compared to that, so it is really something meaningful, and not just chance, to observe such high Bayes factors. The advantage now is that with these null simulations giving us the null Bayes factors, we can also get p-values — it's a Bayesian approach, but in our field people, I mean journal reviewers, very much like to see p-values, and of course they really want to see type I error controlled. Luckily we can then turn these into p-values. And then you can see that we get back very similar effects: basically the initial GWAS identified only these hits — actually, sorry, the initial GWAS, when you don't use any priors, gives you essentially these two hits, and if you're lucky, depending on the sample size, you may get another one.
Now, with the very same sample size but using these informative priors, we actually get 16 hits — a lot more. When we look at the actual hits, you see FTO comes up, which you might remember from the obesity studies as a top hit. All of these are lipid genes. This gene has the strongest effect on smoking intensity: it's a nicotinic acetylcholine receptor variant which makes the receptor much more active, much more excitable by nicotine, which gives more pleasure — so of course it increases smoking intensity. And this is the well-known APOE locus, which is of course well known for lipid and cholesterol levels and for Alzheimer's disease. So when we actually look at the genes, they look like actual disease genes. These are the values we get, and then we did a replication study, and many of them replicate nicely. These are the effects in terms of months: you see that the largest effect is about seven months, and APOE itself is almost a five-month decrease. So the effects of these individual variants are very, very small — just a few months' change in lifespan. And if you look at these 16 lifespan-associated markers, because the priors were informed by the diseases, we looked back again and checked: how much of this is really just rediscovering disease genes? And disappointingly, for the majority that's what happens. When you see the bottom one, that's the one with the largest effect — it's the APOE SNP, the one with the huge LDL effect, which is what makes this SNP interesting. And the second one is a smoking SNP, the CHRNA3/5 locus. Then there are other cholesterol SNPs, triglyceride ones, HDL ones. These are extending lifespan, these are decreasing lifespan — you can see that if a SNP increases HDL levels, it increases lifespan.
And if a SNP decreases triglyceride levels, it also increases lifespan. Anytime you see a very dark color, it means the variant has already been discovered by a GWAS — coronary artery disease, of course: if a variant increases your chance of coronary artery disease, it will decrease your lifespan. And there are very few examples — there are some here — where you see generally relatively mild effects on each of these traits, but cumulatively they add up. For example, this one is not discovered by any of these traits, but it has a mild impact on each of them, and cumulatively that leads to increased lifespan. And of course the criticism one can have is that, since we use these priors, we really tend to pick up SNPs that exert their effect on longevity via modulating these risk factors. But without the priors we don't discover anything else either. What's interesting is what we also did: you can take the difference between what's expected because of these traits and what you actually see with lifespan as an effect, and wherever you see a discrepancy it means there's some excess effect which is not explained by these traits. And we do see some excess effect — for example, very strikingly for the APOE variant — but that's because we didn't consider Alzheimer's disease here as an exposure trait; had we considered it, that would probably have explained far more of the effect on longevity. APOE is a very particular one, because this allele gives you heart disease at a relatively early age, up to 70, and beyond 70 it hits you again, because then it gives you Alzheimer's disease, which is also expected to shorten your lifespan. So when we look at these discrepancies, we don't see much convincing extra effect on lifespan — so far it seems that the majority of lifespan genetics might be explained just by these risk factors.
What is very interesting to look at, as a confirmation, is the age effect in the cohorts: in every cohort, if you look at old people, they are generally much, much healthier genetically than young people. For example, suppose you have a bad variant which decreases your lifespan and has a 30% frequency in the young population. If you carry that bad variant, you're much more likely to be dead earlier. So when you look at the stratum of the cohort who are above 70 and still alive, they will of course carry less of this variant, because the people who are either dead or too unfit to come and participate in the study are eliminated from it. And because of that, the frequency of lifespan-extending alleles increases with age. So actually, just by correlating the age of participants with the allele frequency, we can also pick up such variants — and very strikingly, the effect on lifespan of the 16 variants we discovered is very nicely concordant with their effect on simply the participant age. We also looked at them in mice, and we could identify a very interesting gene: if you take just 72-day-old mice and measure its expression in the prefrontal cortex, it is a very good predictor of how long the mice will live — the higher the expression, the shorter they live. And this is one of the loci we picked up. We can also link it, using expression QTL data, to the discoveries we made in the genetic markers: some of these effects are probably modulating gene expression, and some of them seem to be impacting the brain. Good. So in summary, we used this Bayesian approach to build informative priors in order to boost genetic discoveries, and the priors are also, on the other hand, very useful to understand some of the results.
Because if you just run a standard GWAS and also build your Bayesian prior, and you see discrepancies between the two, that can give you new research directions to understand where those discrepancies might be coming from — there might be some additional direct genetic effect on a given trait. So these kinds of Bayesian approaches with priors are interesting because they boost discovery, but the priors are also interesting because you can distinguish between direct and indirect genetic effects, and that's a super interesting topic: how you can partition the heritability of a trait into direct heritability and mediated heritability. That's all I wanted to say for this part. I would like to thank collaborators from different parts of Lausanne and Europe; the lifespan study was spearheaded by a former postdoc in my group, Aaron. I'll stop here for the first presentation, and I think we have some questions. — Yeah, me again. In the first part of your presentation you talked a lot about BMI, and you also had this other measure, waist-to-hip ratio adjusted by BMI. I seem to recall that you have done some work on these different ways of assessing obesity. What are good measures and what are not-so-good measures? Could you talk a little bit about how useful BMI is, what exactly this WHR adjusted by BMI is, and the advantages and disadvantages of the various proxies for obesity or adiposity or overweight or whatever. — So yes, it has been quite a bit of a disappointment in my career so far. When I looked at it: BMI is such a crude measure, because of course it doesn't distinguish between bone density, muscle, water mass, lean or fat mass — nothing. But it's available in enormously large samples, and that's why people were very enthusiastic about using it in genetics and discovering its genetic basis.
But its genetics is disappointingly little — our current estimates are more likely around 30% in terms of heritability. When we look at other measures, such as body shape, the waist-to-hip ratio is an interesting measure because it basically tells you where the fat is stored. It has a correlation of about 0.5 with BMI, so it partially shares information and partially gives orthogonal information; especially in women the correlation is even higher. It really tells you whether the fat is more on the belly as opposed to on the hips. The hip fat itself is mostly subcutaneous fat, and that's what is often referred to as metabolically healthy obesity. Of course you don't necessarily know whether you have that kind of excess fat, because this ratio is just a very imperfect measure of it — you would need a full DEXA body scan to really know your body fat percentage and the location of the fat, to be able to tell whether it's internal-organ-related fat or subcutaneous. The visceral fat is really the bad fat, and the subcutaneous one, which is captured by this waist-to-hip ratio adjusted for BMI, is expected to be the metabolically healthy obesity. But when you correlate BMI with very sophisticated measures of obesity — real body fat percentage, arm fat, leg fat, trunk fat and so on and so forth — the correlation is super high, about 70% for almost any of these traits, so BMI still captures them very well. And the GWAS that have been run on these other measures, let's say lean mass versus BMI, barely discover anything new: it's basically the same signal, whether BMI-increasing or lean-mass-related, in terms of relative mass. So the next step we took was to add many, many other different traits — all these sophisticated traits which are now also available in hundreds of thousands of samples in the UK Biobank.
You can see four major obesity groups. The first group is general size increase. Somebody can have increased BMI for different reasons: increased BMI can be generally increased size, because BMI is such a bad measure that it corrects badly for height — if you're taller, you seem heavier, basically fatter, just because of the height. So the first group, if you take all these kinds of obesity measures together and do a clustering based on genetic or non-genetic factors, is basically increased height and increased body mass, but not really increased fat percentage. The second obesity component is really shorter stature with higher fat percentage, and that's the worst kind of obesity — and it's not the subcutaneous fat. The third component is fat distribution: basically whether the fat is on the waist or the hip, the proportion of the two, and there the more on the hip the better, especially for women. And the fourth component was a little bit the opposite, a combination of shorter stature and more leanness. These are all orthogonal, and what we've seen is that the subcutaneous fat has a beneficial effect on a couple of other traits, while most of the other fat types, or sub-classifications, have adverse effects. The second type had a large adverse effect on all cardiometabolic traits. And the first kind, which is coupled with generally larger body size and not necessarily higher fat percentage — and which actually explains most of the variation and affects a lot of people — has a detrimental effect, but by far not as strong as the second group, which is about 20%. I'll just wait a moment, because maybe you're still digesting some of this information; anyway, if you have more questions, just put them in the chat and we can discuss them during or after my second presentation. All good. Everybody's ready to hear something different?
Then let me find my slides — I assume you can see everything now. I already alluded a little bit to this: we use a lot of the so-called empirical Bayes approach, where we put a prior on something such as the SNP effect sizes. But we don't use a prior with a fixed variance; rather, we estimate the variance of the prior from the data. I will now show you how we can push this further and apply this kind of Bayesian approach to structural equation modeling. Because so far everything I told you about was mostly a SNP's effect on a given trait, but now we'll move to SNP effects on two different traits and how we can model that. Okay, so first a very short introduction on correlation versus causation. When you see this image, I see this person pushing this truck, and the truck is actually moving. So you might correlate that this person is pushing and the truck is moving at the same time — the two things are correlated. But actually what happens is that there is another set of people pushing the truck, and they are the cause of the truck moving, while this person is just a correlate. They all happen at the same time, maybe because somebody shouts "push" and there's just somebody silly or funny enough to join in. So that's in a nutshell what differs between correlation and causality: when correlation happens between two events — pushing something and the truck moving — it does not mean that if this person stops pushing, the truck will stop. Causality, on the other hand, most often also implies correlation — there are rare examples where it doesn't, but most often it does — and the key difference is that if the causal person stops pushing, the truck will stop.
And if the person starts pushing, the truck will move — so intervening on a factor which is causal for the outcome is very important, of course. It's a silly example, but if you think about medical applications, you can imagine that what you want is to apply some intervention on a risk factor in order to modulate an outcome and improve health. An example of correlation which is, again, not causation: plot chocolate consumption per capita per country on the x-axis, and Nobel Prize winners per 10 million population of a country on the y-axis. Switzerland is topping both categories, and the correlation is extremely strong. But of course eating chocolate doesn't make you smarter, and doesn't make a nation a Nobel Prize country. What's really going on is that there's a confounding factor — in this case the GDP of the country. The higher the GDP, the more the country can invest in research, and that simply translates into better-quality research and eventually a few lucky ones, for example Nobel Prize winners. But improved GDP also allows people to spend a higher fraction of their salary on luxury products such as chocolate, and that boosts chocolate consumption. So the two have nothing to do with each other in terms of causality, but since there is a very strong confounding factor impacting both of them, they correlate. And we love to draw these kinds of DAGs — directed acyclic graphs — though this one is not really acyclic because it has a cycle; in general I like to draw these graphs because they really help us understand many relationships in medical genetics or human genetics. And when we look at BMI and diabetes, there's a causal effect here.
And these graphs describe it very nicely: here there's practically no confounding factor, so essentially the correlation is almost the same as the causal effect. Physical activity and BMI: higher physical activity of course reduces BMI, but if you have higher BMI you're less likely to do vigorous physical activity, so there is also a feedback loop here. BMI and education is a very tricky question. Clearly — we have solid evidence — higher education decreases BMI. There is probably some marginal effect remaining in the other direction, higher BMI leading to lower education, very, very mildly, and it's still contested; but there is a very key confounder, socioeconomic status, which impacts both of them. Okay, so these graphs can get very complicated, and this is the central question we face: if I want to estimate the causal effect of a risk factor on an outcome while there are confounders, it can be very tricky. You can use Mendelian randomization to tackle this — it was alluded to before — and it's a causal inference technique. For example, if I want to estimate the causal effect of obesity on diabetes, I can use genetic markers as instruments. You might have heard about randomized controlled trials, where the aim is to create two groups: one will be an exposure group and the other a control group. Say you apply a treatment: one group takes placebo tablets, the other takes the real tablet, you follow them up for some time depending on the effect you're looking at, and if you see a difference between the two groups, you're pretty sure it's because of the treatment. Mendelian randomization is very similar: we split the population into two groups. If you remember, for obesity I talked about this FTO allele.
This FTO allele: if you carry one extra copy, it increases your weight by two kilos; if you carry two, by four kilos. So what happens if I take two groups: one group carrying zero A alleles — the TT group — and the AA group. This is like a randomized controlled trial. These people were born with these alleles; from birth they carry them, from birth the alleles change the expression level of the IRX3 and IRX5 genes in the region, and that eventually leads to different weight. We already know it leads to different weight. Now the question is diabetes. So I'm asking what's different in diabetes prevalence between the two groups. These are two groups where essentially an intervention was done — of course not by us: nature played the lottery and intervened, gave some people A alleles and some people T alleles, and this happened randomly in the population. And now we look at these two groups — they had this obesity "intervention" from birth — and we just look at these people at 50 or 60 and ask what the diabetes prevalence is. If it is different, then we can say there is a true causal effect of obesity on diabetes, and we can estimate the effect in a relatively simple way. If you know the effect of the genetic marker on the exposure, and you know the genetic effect on the outcome, and if this genetic marker only impacts the outcome through the exposure — through obesity — with no other arrow going through some confounder or some independent path, then the total effect from the genetic marker to the outcome should be the product of the genetic effect on the exposure times the causal effect. You may remember: this was exactly the prior effect from the Bayesian prior derivation in the first part. So the total effect is the product of these two effects.
This is very good, because from genome-wide association scans we know, for tens of thousands of traits and tens of millions of variants, what the effect of the SNPs is on the different diseases and outcomes. So if I have a diabetes GWAS and a BMI GWAS, all I need to do is take the effect size on diabetes divided by the effect size on BMI, and I get the causal effect. Of course it's very noisy, and there are many problems — for example confounding: take BMI on education. Socioeconomic status is a confounder, and social status has a genetic basis, so that's bad. But if I now take many, many different genetic markers, each will provide me with a different causal effect estimate, and each of these estimates will be a little bit biased. So I take one, say the FTO variant, which is a BMI-increasing variant: it gives me a causal effect estimate just by the ratio of the effect on diabetes divided by the effect on BMI. Then I take another — say an MC4R variant, which is also a BMI SNP — and that gives me another estimate of the causal effect, and so on and so forth; I can do it for hundreds of SNPs associated with BMI. Each of them gives me an estimate and a standard error. And then you can look at these different estimates and choose your favorite way of summarizing them — the median, the mean, the mode — and each choice corresponds to a Mendelian randomization method. It's very nice, because each of these estimates can be biased in its own way, but on average these biases hopefully cancel out. So this is what happens when I actually do BMI on diabetes: this is the FTO variant; each dot is a SNP; on the x-axis I'm showing the effect of the SNP on the exposure, on BMI, and on the y-axis the effect on diabetes.
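A minimal sketch of per-SNP ratio estimates and a median-based summary; the effect sizes are invented, and the last SNP is deliberately an outlier to show why the median is robust:

```python
# SNP effects on the exposure (e.g. BMI) and on the outcome (e.g. diabetes);
# hypothetical numbers, not real GWAS estimates.
beta_exposure = [0.10, 0.08, 0.05, 0.12, 0.07]
beta_outcome  = [0.051, 0.043, 0.024, 0.058, 0.080]  # last SNP: pleiotropic outlier

# Each instrument gives its own Wald ratio estimate of the causal effect
ratios = sorted(by / bx for bx, by in zip(beta_exposure, beta_outcome))

# Simple (unweighted) median-based summary, robust to the single outlier
n = len(ratios)
median_est = ratios[n // 2] if n % 2 else 0.5 * (ratios[n // 2 - 1] + ratios[n // 2])
```

Four of the five ratios sit near 0.5 while the outlier is above 1, and the median ignores it; a mean-based summary would be pulled upward.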
Now I take a second SNP, and that also gives me another pair of associations, and basically I just fit a linear regression line through this cloud of points. That gives me an estimate of the causal effect which now takes into account every single so-called instrument that I use. So basically I split the population first by the FTO allele; then I can re-split the same population by an MC4R allele; then I can re-split it by another obesity-related allele, let's say, and so on and so forth. Each of these different splits gives me a different prevalence difference in terms of diabetes, and that leads to a different estimate of the causal effect, and then fitting this regression line gives me the overall causal effect estimate. Okay, but real life is more complicated, and there are violations all the time. Here, the genetic marker influences my exposure, and the exposure has an effect on the outcome, but the outcome can also have a reverse causal effect on the exposure — which is not included at all in the framework I just described. Then you can have confounding factors: remember, for BMI on education, socioeconomic status is a confounding factor which has an effect on both x and y. I know there are many notations on this slide; just keep in mind that a and b are the key: a is the forward causal effect, b is the reverse causal effect, and all the rest are nuisance parameters. The genetic markers have effects on the exposure — these are the direct effects — but they can also have indirect effects which act through the confounder and then reach x, and they can have direct effects on the outcome. All the red arrows are violations of the classical assumptions. And now what we want to do, of course, is not just use one SNP but use many.
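Fitting that line through the cloud of points corresponds to an inverse-variance-weighted estimator: a through-the-origin regression of outcome effects on exposure effects. A sketch with made-up numbers:

```python
def ivw_estimate(beta_x, beta_y, se_y):
    """Through-the-origin weighted regression slope: the IVW causal estimate."""
    w = [1.0 / s ** 2 for s in se_y]              # weight SNPs by outcome precision
    num = sum(wi * bx * by for wi, bx, by in zip(w, beta_x, beta_y))
    den = sum(wi * bx * bx for wi, bx in zip(w, beta_x))
    return num / den

beta_x = [0.10, 0.08, 0.05, 0.12]       # SNP effects on the exposure (hypothetical)
beta_y = [0.050, 0.040, 0.025, 0.060]   # SNP effects on the outcome
se_y   = [0.01, 0.01, 0.02, 0.01]
causal = ivw_estimate(beta_x, beta_y, se_y)  # every ratio is 0.5, so the slope is 0.5
```

In this toy case all the per-SNP ratios agree, so the slope reproduces them exactly; with noisy real estimates the weighting decides how much each instrument contributes.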
So G here will actually be the full genome. For every single SNP we could, in theory, estimate its direct effect and its indirect effect, but the problem is that U is unobserved. So what we do is look at the whole genome and include some priors — these are called spike-and-slab priors — on the effect size distributions of the different genetic markers on the exposure. This is work done with Lisa in a very nice collaboration — a joint work where the gammas are random effects coming from a spike-and-slab distribution, and I will show you right now what that really means. What we believe very strongly is that the majority of SNPs have zero effect. I just didn't draw the full spike at exactly zero, because otherwise you couldn't see the rest. So really, let's say 90% of the genome has no effect on the trait, and maybe 10% of the genome has some effect, and that effect comes from a normal distribution. It has been shown empirically that this is very reasonable for different traits: for height, for example, we expect about 5% of the genome to be related to height and about 95% to have zero effect; for BMI it's more like 10% of the genome impacting BMI — it's more polygenic — and the remaining 90% has no effect whatsoever. So the actual SNP effects come from this spike-and-slab distribution, a mixture of two Gaussians. This is how it looks — it still very much looks like just a spike, but you can see there is some bump there. Okay, so that's the assumption: we use these prior distributions. But again, we don't want to re-estimate each of these effects individually, especially because we don't even know what U is, and we don't know the effect of the genotypes on the unknown confounder. So our structural equation model is as follows.
So we have: x, the exposure, is determined by U, the confounding factor, times the effect of the confounder on the exposure (and similarly the confounder has its effect on the outcome). And there are the causal effects: y is impacting x, and also x is impacting y. To be honest, it's difficult to grasp what this really means, because if x increases y and y increases x, it goes on infinitely — it's essentially the same as applying an infinite loop here and seeing where the system converges. People in general don't like cyclic graphs, but this cycle really happens if you think about it temporally. If you have higher BMI, that increases your chance of developing diabetes; but if you develop diabetes, you tend to go to the doctor, and the doctor tells you to lose weight. So there is a positive forward effect and a negative feedback loop — but only if you look across time. Initially your BMI increases your diabetes risk, but then in turn, at a later time point, your BMI goes down. So this is not really a static network; you should imagine it happening across time, and we are just looking at where it converges — the steady state. Then come the genetic effects: the direct genetic effects from G to x act here, plus all the rest, which is unexplained by these three components. For y it's the same: the confounder has an effect on y, x has an effect on y, and the genetics have some direct effect on y. And the confounder itself has its own genetic basis, so it is the genotypes times their effects, plus the respective error term.
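The "infinite loop" view of the bidirectional model can be made concrete: iterate the cycle until it converges and compare with the closed form a/(1 - ab) for the steady-state effect of a unit shock on x (toy effect sizes, assuming |ab| < 1):

```python
# Forward effect a (x -> y) and reverse effect b (y -> x); toy values only.
a, b = 0.4, -0.2

x, y = 1.0, 0.0          # apply a unit shock to the exposure x
for _ in range(200):     # propagate around the feedback loop until steady state
    y = a * x            # x raises y ...
    x = 1.0 + b * y      # ... and y feeds back (negatively here) on x
steady_state_y = y

closed_form = a / (1.0 - a * b)   # where the geometric series converges
```

With a negative feedback loop the steady-state effect on y ends up smaller than the one-shot forward effect a, matching the BMI-diabetes-doctor story in the talk.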
What's nice is that we can eliminate U and directly express x and y from this with some algebraic tricks, and everything now depends just on the genetic effects, because that's what we really care about, plus some different error terms. These are not the same errors as the previous epsilons, because they now include the environmental contributions. OK, so given these equations, we can calculate the genetic associations. The genetic association means I take a SNP k, a genetic marker k, and regress x onto it. If everything has been standardized, so the genotypes have zero mean and unit variance, and both x and y have zero mean and unit variance, then the univariable linear regression is as simple as the product of the transposed genotype vector times the phenotype vector (divided by the sample size). These are the marginal associations; really nothing else than the correlation between the genotype and x. And these marginal effects are what we have access to: as I mentioned, they are available for tens of thousands of traits and for tens of millions of markers. This is really good, because we have access to this data on a very large scale. To get from here to the multivariable effects, we simply need to multiply the full equation by G_k transposed. If I multiply everywhere by this, the key term that appears is G_k transposed times G, which is the local correlation: basically the correlation between SNP k and all the other SNPs. If the SNPs are too far apart, the correlation will be zero, so it's really just the local correlation; if you have heard about LD score regression, it's related to that. And these multivariable effects are the ones coming from the spike-and-slab distributions.
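The relation between marginal and multivariable effects can be demonstrated in a few lines. This is a toy sketch (sizes and LD structure invented for illustration): with standardized genotypes, the marginal effect of SNP k is G_k'x/n, and stacking these over all SNPs gives approximately R·beta, the LD matrix times the multivariable effects.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5000, 50                      # individuals, SNPs (toy sizes)

# Correlated genotypes: induce local LD between neighbouring SNPs,
# then standardize to zero mean, unit variance
G = rng.normal(size=(n, m))
G[:, 1:] += 0.8 * G[:, :-1]
G = (G - G.mean(0)) / G.std(0)

beta = np.zeros(m)
beta[[10, 30]] = 0.3                 # sparse multivariable effects
x = G @ beta + rng.normal(size=n)    # exposure = genetics + noise

beta_marg = G.T @ x / n              # marginal (univariable) effects
R = G.T @ G / n                      # local correlation (LD) matrix

# Marginal effects are the LD-smeared multivariable effects, up to noise
err = np.abs(beta_marg - R @ beta).max()
print(err)
```

Note that SNPs neighbouring the two causal ones get non-zero marginal effects purely through LD, which is exactly why the model has to account for local correlation.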
In the interest of time I will skip a few points, but this local correlation really just tells you how strongly close-by SNPs are correlated with each other. Now, the marginal effects that we have available, for SNP k on trait X and SNP k on trait Y, take this form, where the z's are essentially local correlations times spike-and-slab distributed effects, and where all the remaining errors depend only on the sample overlap, the phenotypic correlation between the traits, and the sample sizes. This is very nice, because then we model something that is easily available and accessible, and there is no privacy issue: we don't need access to the actual individual-level genetic data. Then we can model this spike and slab. It is pretty complicated, but we can derive the distribution of such a weighted sum of spike-and-slab variables; we do it through a Fourier transformation. I will skip some of the details. For real data, the local genetic correlation looks like this, and if you look at the histogram of local genetic correlations, it is also very much like a spike and slab: basically a mixture of two Gaussian distributions. So we have a product of two such variables: this is a mixture of two Gaussians, this is also a mixture of two Gaussians, and you take the product and sum it up. You can imagine that this becomes a pretty complicated distribution, and the PDF is not analytically tractable: each and every term in the PDF is a Bessel function of the second kind, so it becomes super complicated. But what's nice about these Bessel functions is that their characteristic function is very tractable. The characteristic function is just the Fourier transform of the PDF. So what we need to do is simply work with the characteristic functions of these variables and back-transform only at the very end. And that's exactly what we did.
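To see why the characteristic function route is so convenient: for a single spike-and-slab variable the characteristic function has a simple closed form, φ(t) = (1 − π) + π·exp(−σ²t²/2), because the spike contributes a constant and the Gaussian slab contributes a Gaussian in t; and for sums of independent terms the characteristic functions just multiply. A quick Monte Carlo sanity check of the closed form (my own illustration, not the authors' derivation):

```python
import numpy as np

rng = np.random.default_rng(2)
pi, sigma = 0.1, 0.5

def cf_spike_slab(t, pi, sigma):
    """Characteristic function of a spike-and-slab variable:
    point mass at 0 with prob 1-pi, N(0, sigma^2) with prob pi."""
    return (1 - pi) + pi * np.exp(-0.5 * sigma**2 * t**2)

# Monte Carlo estimate of E[exp(i*t*Z)] for comparison
n = 1_000_000
z = np.where(rng.random(n) < pi, rng.normal(0, sigma, n), 0.0)
t = 1.7
mc = np.exp(1j * t * z).mean()
diff = abs(mc - cf_spike_slab(t, pi, sigma))
print(diff)   # close to zero
```

The intractability arises only at the next step, when products of such variables are summed; but in Fourier space each piece stays manageable, and a single inverse transform at the end recovers the likelihood.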
Eventually, we derive the likelihood function of the data given all the parameters. What's nice about it is that we integrated out all these random effects, because we just use the priors, so it depends only on a few parameters, including the polygenicity and the heritability. Another way to imagine what we work with is this: the effect sizes of millions of SNPs on the exposure and on the outcome, and this is how it looks in a theoretical scenario. There will be some SNPs with no impact whatsoever on the exposure, maybe just some impact on the outcome. Some SNPs are related to the confounder, so they have a direct effect on the confounder and an indirect effect on the exposure; those will give rise to one slope here. Those that have a direct effect only on the exposure will have a different slope here. And there will be a third slope: SNPs with a direct effect on the outcome, which will lie on the inverse of the reverse causal effect slope. So in reality you observe a confounder slope, a causal effect slope, and the inverse of the reverse causal effect slope, plus there will be the null set of SNPs which are not related to anything. This is actually what we try to model; our real aim is to estimate these slopes from the data, in a rather convoluted way. We have shown that in such situations we can estimate the heritability of the traits, the direct heritability; we can estimate how strongly the traits are confounded and how heritable this confounder is; and we can estimate the causal effects, both the reverse and the forward causal effect. So it's all very good. The true values are in blue: these are simulated values, and these are different methods, and you can see that most of the methods can't recover the true value.
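The multiple-slopes picture can be reproduced with a noiseless toy simulation. Below, SNPs acting only on the exposure lie exactly on the causal slope alpha, while SNPs acting through the confounder lie on the slope q_y/q_x, which is generally different. All parameter values and class proportions are invented for illustration; real summary statistics would add estimation noise around these slopes.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 0.3                      # forward causal effect x -> y (hypothetical)
q_x, q_y = 0.4, 0.6              # confounder's effects on x and y

m = 3000
cls = rng.choice(["null", "x", "conf"], size=m, p=[0.8, 0.1, 0.1])
gx = np.zeros(m)                 # SNP effects on the exposure
gy = np.zeros(m)                 # SNP effects on the outcome

direct_x = cls == "x"            # SNPs acting directly on the exposure
gx[direct_x] = rng.normal(0, 0.1, direct_x.sum())
gy[direct_x] = alpha * gx[direct_x]        # these lie on slope alpha

conf = cls == "conf"             # SNPs acting through the confounder U
u = rng.normal(0, 0.1, conf.sum())
gx[conf] = q_x * u
gy[conf] = q_y * u               # these lie on slope q_y/q_x

slope_x = gy[direct_x] @ gx[direct_x] / (gx[direct_x] @ gx[direct_x])
slope_c = gy[conf] @ gx[conf] / (gx[conf] @ gx[conf])
print(slope_x, slope_c)          # ~0.3 (causal) and ~1.5 (confounder)
```

Classical Mendelian randomization fits a single slope through all SNPs, which is why a heritable confounder biases it; modelling the mixture of slopes is what allows the causal and confounder components to be separated.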
When you simulate data under the null, we want to see a zero causal effect, and most methods are then biased because of the confounding: once you have a heritable confounding factor, essentially almost all methods will be biased. If we combine this confounder with a real, non-null causal effect, the effect will be underestimated in this case, when the two have opposite signs. And under another typical violation, a reverse causal effect, our method still works and the others again underestimate heavily. So that's good. Of course, we simulated the data based on our own model, but at least we can get back the parameters we wanted. What's also nice about this structural equation model and this random effect model is that you can estimate the genetic correlation between the different traits, and we can recover it pretty well. And we can now understand why two traits might be genetically correlated, or even just correlated: we can estimate how much of it is because there is a forward causal effect, how much because of a backward causal effect, and how much is the contribution of a confounder. These three reasons give rise to a correlation between the two traits, and we can now break it up into the three different components. When we apply it to real data to get actual causal effects, we see very nice and convincing results. For example, higher BMI increases blood pressure; higher LDL levels make you more likely to develop coronary artery disease; higher BMI of course leads to more diabetes; and smoking increases your chances of having a myocardial infarction. What's very nice is that we also see how education actually reduces coronary artery disease and diabetes: if you are more educated, you are less likely to smoke, and it even tends to increase HDL levels, and so on. So it's all very much in line with what has been medically established so far.
What's very nice is that we can also get at the confounders, because that's what people have ignored so far: everyone wants the causal effect, but can we also characterize the confounders? For example, for birth weight and type 2 diabetes, most methods claim that there is a causal effect from birth weight to type 2 diabetes. We don't see any significant effect; there is just an estimate, but it's compatible with zero. Our model claims instead that there must be some confounder, and that this confounder has to have the same sign of effect on both traits. And when we look at potential confounders, so now we actually scan through traits and estimate their causal effects on each of these two factors, we get a couple of good candidates, and the conclusion is that parental obesity might be a reasonable confounding factor. Our model treats the confounder as a latent variable. It can still estimate the ratio of the effects of this confounder on the two traits, but of course it cannot tell us what the confounder is, because we only use data for these two traits; it can't say anything about other traits. But it raises the suspicion: the data tells us that the relationship cannot be explained by causal effects alone, there must be a confounder. And it tells us that there is actually no causal effect here, it must be confounding only, and that confounder is compatible with parental obesity. Another example is HDL and systolic blood pressure, where HDL seems to be beneficial through the good lipids, but the model again claims that there must be a confounder, and there has been a recent study suggesting that alcohol consumption might be such a confounder, positively increasing both of those. And of course, there are many other trait pairs where we suspect there is a confounder, but we don't yet know what it is. So I think with this I'll stop, and there will still be a few minutes left for questions before the lunch break.
And of course I would like to thank very much Lisa and Nino, who were spearheading this work and were really key for this project to succeed.