So today, as Jacques just said, we're going to talk about interaction with the host genome. In particular, what is interesting here is the variation in the host genome at the nucleotide level: polymorphisms, SNPs, in the host that can have an effect on, or can modulate, what happens in the microbiome, in particular in the gut, but we'll see a few examples that are not necessarily limited to the gut. So this is just a brief outline. Basically we'll go over some of the measures of the microbiome that will be assessed and that we're going to try to correlate or associate with the genetic variants in the human. And we're going to go through three different kinds of host interaction in this context. The first one is how the human SNPs, the human variations, can affect the gut microbiome, in broad terms. The second one is how the gut microbiome can interact with drugs: how drugs have an effect on the microbiome and vice versa, and I'll give you a few examples of that. And we're going to end with a new kind of analysis, which I think is really exciting. It's a kind of work where you're doing genome to genome. It's like a GWAS-to-GWAS analysis, looking at genetic variants throughout the genome in the human versus genetic variants throughout the genome of a bug. So before we go through some of those examples, let's start with a few words of wisdom. First of all, the major differences between the human genome and the human microbiome. And I'm sure that by now you're quite familiar with the human microbiome, which has been the topic of conversation. So when we're considering the human genome, obviously there is only one genome and it's stable.
And so when we do analyses with that, we're basically taking the DNA from anywhere, it could be a buccal swab, it could be blood, and we assume that the genome is going to be stable, that the variation in that genome represents the variation we had at birth, and that it is a variation that has an impact on our health throughout the lifespan. But in the human microbiome, of course, there are many genomes and recombination occurs. Somatic recombination in humans is not that frequent, and when it happens, we tend to be concerned. But in the microbiome, it does happen, and it can happen independently of reproduction. And so funny things can happen. Also, the human microbiome will change in response to the environment, so it can evolve, it can adapt to changes. This is definitely not something that is happening to the human genome, which is fairly stable. And so those are some of the highlights of the difference. Now, when we do genetic association, when we're trying to establish a correlation between the human variation and the microbiome, we're going to have to define what we want to test. That's what we call in statistics the endpoint, or the outcome, or the dependent variable. What are we trying to model? So in broad terms, this is what can be modeled, and I hope you're familiar with this, so hopefully these are the kinds of outcomes you've been working with. We can be interested in the diversity of the microbiome. This can be characterized as alpha diversity or beta diversity. We may be interested in the abundance of a specific species. So how much of one species is present? And how does that correlate with the human genetic variant? So remember, those are the outcomes. This is what we're trying to model in the microbiome. And we may be interested in functional profiling. So instead of asking how diverse it is or how much of one bug is present, we may be interested in asking, what does it do?
Regardless of what the bugs are behind it, what is the function? And this may be performed in a number of different ways. We're going to go through a few examples of that, but I'm sure you're familiar with that by now. And of course, we may be interested in strain-level variation. So if you take one strain, that strain will have variability. And that variability will be dynamic over time. And we may be interested in using that as an endpoint. How does the human variation associate with specific variants in the bug of interest as it mutates over time? A few last words of wisdom. So what would be the advantages, or what are the distinctions, between doing microbial functional profiling versus just abundance? Well, when we do statistical associations, as you know, you want the outcome that you're trying to model to be as stable and as replicable as possible. You want the variability within individuals and between individuals to not be so great that you can't model it, okay? See, some variability is essential, otherwise you can't model anything. But too much variability that is explained by unknown factors is just too much. So when we're looking at the abundance of specific species, we know, and you probably know this, that there's a lot of variability, both within an individual and between individuals. And the idea of doing functional profiling is that we're attempting to reduce that variability. In a way we're trying to reduce features, to reduce error that couldn't be accounted for. So I think that's the major reason why one would be interested in functional profiling. Okay, so first example, and this is an example of how genetic variation in humans is modulating the microbiome. And I don't know if any of you prepared for this lecture, but I saw on the poster that there was a reading that was assigned to the course. And this is the paper of the reading. Anybody looked at it? Okay, of course.
So Jacques chose this paper and it's a good paper. It's not my favorite, Jacques, but we're going to go over it. So it was published in an issue of Nature Genetics in November 2016. And in that specific issue, they had three papers back to back on this specific topic of how genetic variation can modulate the microbiome. This is one of them and we're going to see a second one. We're not going to see all three. We're just going to look at two of them and it's going to give you a pretty good idea as to how this can happen. So this one is very simply called the effects of host genetics on the microbiome. So what did they do? Well, first of all, when we're studying human variation in the host, we need large numbers. In the world of genome-wide association studies, GWAS, we typically work with large cohorts. And here they have what would be considered a small cohort in the GWAS world. Typically, when we do a GWAS, in order to be published in Nature Genetics these days, we need cohorts of above 150,000 samples. Okay, so this is a smallish cohort. Still, it's providing interesting results, and I think the reason why those three papers made it into Nature Genetics is the novelty. It's just the mere fact of being able to tackle all of the variation, defining the outcome properly. So this was of high interest and that's why it made it there. So in this study, it's 1,500 individuals coming from three cohorts. And the three cohorts are homogeneous in terms of the geographical origin of the people sampled: they're Dutch cohorts. And they use a funny statistical approach in the model, and that's why I said it's not my favorite paper. They use a staged approach. They had a discovery cohort and then they had replication cohorts. This is the kind of statistical analysis we used to do in the GWAS world 10 or 15 years ago.
Nowadays, we tend to lump all the cohorts together, over 100,000 samples, and do a big meta-analysis and take the results at face value. You get the most power. When you do a staged analysis, what happens is that at every stage, whenever you choose not to look at something, you obviously lose power. If you're not looking at it, there's no chance you're going to find an association with it. So the most power you can have in the statistical analysis is if you put all your samples together. So here they use the staged approach. Okay, so they did germline genotyping. Basically they drew blood from those patients. It was a cohort that already existed, and I think they already had the blood, and they genotyped. This is typically what is done in the GWAS world. So you extract DNA and you do genotyping. And in this particular study, they genotyped less than a million SNPs. But as you see here, I said 8 million SNPs were tested. So what they did is a very standard procedure. And that's one of the differences between the microbiome and the human genome analysis: in human genome analysis, we have very standard procedures. The methods and approaches that we use have been tested and validated, and there's robust documentation as to what is the best statistical approach to use. So this is a very standard way of doing a genetic analysis in humans. They genotyped about 1 million SNPs and then they imputed, probabilistically, the genotypes of unseen SNPs based on external reference samples from the 1000 Genomes Project. So thousands of genomes have been sequenced and those references are available in the public domain. When you have genotyping on a subset of SNPs, you may ask, given my SNPs, what's the probability of the unseen SNPs? And you assign a probability to those genotypes and you include those in your analysis. So that's what was done here. Yes, question? I sure do, yeah.
So we've been using that for many years in my group and in many groups, I'm no exception. And the way it works is that, as you know, the human genome is not totally random. It's not like assigning SNPs at random, because of the lineage, of the history of the people, of the populations. The chromosomes are shuffled in chunks. And those chunks create linkage disequilibrium in the human genome. And we can exploit that linkage disequilibrium to infer the probability of a neighboring SNP that is unseen, but so close that we're pretty confident as to what it would have been. And we can reconstruct those probabilistic predictions based on haplotypes. Not only on a single SNP: you're actually using a series of haplotypes. And you're asking, how many times have we seen this particular combination? And what are the most likely SNPs in between those combined SNPs? So it works very well. In my group, we made discoveries in the past based on imputed SNPs. And of course, we're not going to publication until we validate the imputation. And each time we did, the correlation was super high. Just a few glitches here and there, but they have a minor impact on the statistics. Because remember, the models that we use for association take into account the probability of that value being correct or not. So it's all weighted out. Maybe I'm wrong, but I feel that there was a real bottleneck of haplotypes in the human genome, like, around 30,000 years ago. So that probably helps in the imputation as well. It helps, yeah. It's totally different from the microbiome world. In the microbiome, there are selection pressures that vary. And it's very dynamic. It's a totally different environment. In humans, of course, the linkage disequilibrium patterns will vary from population to population. There are populations that have more diversity and populations that have less. And we know that allele frequencies will differ from one population to another.
And so that's also very relevant. So whenever we do genetic analyses in humans, the most important covariate used in our association modeling is some sort of representation of the geographical origin of the samples. Otherwise, we could get confounding, because a disease is more frequent in some population, and we would just assume that because the SNP is also more frequent there, the SNP is causal, when it's not causal. So those situations may occur in human associations. And we have to take that into account. Yes? This could be a very basic question, but I'm curious about your last comment. How does that work if people are from mixed origins? OK, so how that works. I think you use similar approaches in the microbiome. Basically what we do is we try to take as homogeneous a group as possible. So let's say we define our analysis. If I'm doing an analysis, and 95% of my samples are Caucasian, and 5% are of mixed origin, Asian, African origin, then because the representation of the other populations is so small, I may choose to exclude them. Otherwise, I wouldn't be able to model them properly. If I had a good representation, it would be a different story. But if there is one dominating population, I may choose to homogenize my set and analyze that. But that's not the end of the story. You still need to take into account the population structure within that broad definition of Caucasian. And the way we do that is we typically use principal component analysis based on all SNPs throughout the genome, or a selection of SNPs throughout the genome. We reconstruct orthogonal axes of similarity between the individuals, and we use those as covariates in our model. That's how it's done. Okay, so back to our GWAS-to-microbiome analysis. So they did a very typical GWAS approach that was based on 8 million SNPs. And they did gut microbiome shotgun sequencing. Now the outcomes of interest, and remember we saw what we are trying to model in the microbiome, what is going to be the target of our model.
They used different outcomes. So one thing they looked at was just the abundance of different microbial taxa. And they defined over 200 such species. And then they looked at functional units, defined using different approaches. And they had over 1,000 different functional units that they were modeling. So how many statistical tests are we talking about here? We're talking about 8 million SNPs. Each one of those 8 million SNPs we're going to test with over 200 species. So that's already 8 million by 200, plus 8 million by 1,000 for the functional units. When we do multiple testing like this, we expose ourselves to false positive associations just because we're testing multiple, multiple, multiple times. And we need to adjust for that multiple testing. Am I going to get a microphone? Oh yeah, super. So in this paper, and the reason why I was so concerned about this paper, is that the approach they use for adjustment for multiple testing is unusual. And in the methods section of the article, if you ever get curious enough to go look into it, they actually have almost a full page of justification of how they handle multiple testing, because had they used the traditional approach to correction, none of this would have passed the threshold of evidence. Now what about covariates? The covariates that they use here were limited to adjustment for age and sex, and I'm going to show you a different example right after where the covariates are more extensive. Okay, and they didn't actually model those things as covariates. They normalized their outcomes based on age and sex. Okay, so it was taken into account that way. What did they get? So these are the results. We're looking at a very typical, classical display of results for genome-wide association studies. It's called the Manhattan plot. So those blue dots represent p-values.
So we should be looking at 8 million p-values or even more. I think they simplified the plot. It would be too dense on the page. And they drew a line, and that's the significance line. So you will understand that the higher the dot, the more significant: the higher, the smaller the p-value. They actually use the negative log 10 of the p-value. So the higher, the smaller. And they're telling us that, I'm not sure how this works, that everything that passes the line would be significant. Is this on? Does this work? It's a fake. It's a sham. Question? Yeah. I have heard that there was a consensus in the GWAS field on the p-value threshold. Correct. So in the GWAS world, the threshold for significance is 5 times 10 to the negative 8. This represents correction for approximately how many multiple tests? Approximately 1 million: it is 0.05 divided by approximately 1 million genome-wide statistical tests. We used 8 million. Why are we adjusting for 1 million? Because there is correlation between the SNPs in the human genome. So the SNPs are not independent tests. Each individual SNP is not entirely independent from its neighboring SNP. And we don't want to overcorrect. Otherwise, we'll never publish in Nature Genetics. So we use that standard. And it's been broadly accepted that we adjust for approximately 1 million tests when we do a genome-wide study, regardless, really, of how many SNPs we use. Now clearly here, they did not use that significance threshold. Or they did, but without taking into account the multiple species and the multiple functional units that they tested for. And this is justified in the paper through the staged approach that they use. So any more questions on this plot? OK, we're moving on. I didn't really want to go through each and every single result they have. I think this is not the point.
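The threshold arithmetic from this part of the lecture can be written out as a minimal sketch. The counts are the rounded figures quoted in the talk ("over 200" species, "over 1,000" functional units). Applying a plain Bonferroni correction across all outcomes is an assumption made here purely for illustration: it treats the outcomes as independent, and it is not the procedure the paper used.

```python
# Rounded figures from the lecture; the exact counts in the paper differ.
ALPHA = 0.05
N_SNPS = 8_000_000                 # genotyped plus imputed SNPs
EFFECTIVE_SNP_TESTS = 1_000_000    # conventional number of independent tests genome-wide
N_TAXA = 200                       # "over 200" species
N_FUNCTIONAL_UNITS = 1_000         # "over 1,000" functional units


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


n_outcomes = N_TAXA + N_FUNCTIONAL_UNITS

# Raw number of SNP-by-outcome tests actually performed.
raw_tests = N_SNPS * n_outcomes

# Standard genome-wide threshold: 0.05 / 1 million, i.e. 5e-8.
genome_wide = bonferroni_threshold(ALPHA, EFFECTIVE_SNP_TESTS)

# Illustrative stricter threshold if every outcome counted as a separate GWAS
# (assumption: outcomes treated as independent, which overcorrects if they correlate).
study_wide = bonferroni_threshold(ALPHA, EFFECTIVE_SNP_TESTS * n_outcomes)

print(f"raw tests:             {raw_tests:,}")
print(f"genome-wide threshold: {genome_wide:.1e}")
print(f"study-wide threshold:  {study_wide:.1e}")
```

The gap between the two thresholds is why a result that clears the usual genome-wide line for one outcome may not survive once the number of outcomes is taken into account.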
Today, let's try to focus on the concepts, understanding how those studies are conceived and how one would go about testing them. But I just want to point out one result that they highlighted in their paper. And it's interesting that they decided to highlight it, because it didn't directly stem from their genome-wide statistical approach. It actually is the result of candidate genes that they had pre-specified, saying, oh, we're also going to look at those genes because they have been reported in the past. And this is one example. It's the lactase gene, LCT, in humans. We're talking about the gene in the human. And it's broadly established that this gene is associated with the adult ability to digest milk, so adult expression of lactase. And they wanted to see whether that gene would modulate the gut microbiome. And they say, well, we have some interesting signal here. What they see is that individuals in the study who were homozygous GG at that specific SNP, previously known to be modulating lactase in adults, had a higher abundance of one bacterium, Bifidobacterium. And they tested the results in their three cohorts. They had three cohorts in their staged design. But what is particularly interesting in this model is that they saw what they call an interaction effect with the consumption of dairy products. So in fact, that association was mostly apparent in individuals who had a higher intake of dairy products. If individuals had a low intake of dairy products, then they did not see the correlation with abundance. But as the individuals were taking in more dairy products in a day, then they saw the separation. And they did a statistical test for interaction. And the statistical interaction was significant. This is probably the highlight of the results. Questions on this interaction? It's quite elegant. Yeah? Actually, it's a question. Oh, fine. Oh, you don't have a question?
I have a question. Go ahead. Most of them, yeah. I think they are. Because I found the microbiome data, but sometimes... Oh, the host DNA. So usually with the host DNA, what they typically do is they publish the summary statistics. So you don't get the individual-level data of the genotypes, but you'll get the beta value for the association with each endpoint, not knowing how each individual contributed to that beta. And the reason why they do that is to protect the identity of the individuals. And we can still use the betas to do meta-analyses. And actually, it's interesting, the amount of things you can do with summary statistics. You can dissect them. So it's still quite valuable. Yeah, question? So I haven't read the paper, so I'm going to ask you about the interaction itself. How did they go about modeling that? Was it just the product of the genotype times the intake of dairy? Well, basically, it's quite simple. So you have your outcome. And the outcome here is just the abundance. And you're testing for association with your SNP. And your SNP is basically dichotomous here. You have GG versus AG and AA. So you have two categories. So let's say 0, 1, or in reverse, 1, 0. And it's a regression model. And you add your diet. Here, the diet is portions of milk, so you may have different levels. And then you add an interaction term. In statistics, it's just the multiplication of your two variables. So you have your interaction term, and you can report its p-value. And the p-value they reported was significant. So that's how you model it. And this was a meta-analysis, because there were differences in how the diet was captured in the three cohorts. So it's a meta-analysis p-value. So what they're saying is that it's a multiplicative effect rather than an additive effect. It's a GLM.
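The regression with an interaction term described here can be sketched as follows. Everything in this block is illustrative: the data are simulated, the 0/1 genotype coding, the effect sizes, and the helper name `fit_interaction` are hypothetical, and ordinary least squares in a single cohort stands in for the paper's per-cohort models combined by meta-analysis.

```python
import numpy as np


def fit_interaction(y, genotype, diet):
    """Ordinary least squares for y ~ genotype + diet + genotype * diet.

    Returns (coefficients, standard_errors), ordered as
    intercept, genotype, diet, interaction.
    """
    X = np.column_stack([np.ones_like(y), genotype, diet, genotype * diet])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    return beta, np.sqrt(np.diag(cov))


# Simulated cohort (all values made up for illustration).
rng = np.random.default_rng(0)
n = 1500
genotype = rng.integers(0, 2, size=n)   # 1 = GG, 0 = AG or AA (hypothetical coding)
diet = rng.integers(0, 4, size=n)       # dairy portions per day, 0 to 3

TRUE_INTERACTION = 0.5                  # genotype effect grows with dairy intake
abundance = (
    1.0
    + 0.1 * genotype
    + 0.0 * diet
    + TRUE_INTERACTION * genotype * diet
    + rng.normal(0.0, 1.0, size=n)
)

beta, se = fit_interaction(abundance, genotype, diet)
t_interaction = beta[3] / se[3]         # test statistic for the interaction term

print("interaction estimate:", round(float(beta[3]), 3))
print("interaction t-statistic:", round(float(t_interaction), 2))
```

The interaction coefficient is the quantity of interest: it measures how much the genotype difference in abundance changes per additional portion of dairy, and its test statistic is what would be carried into a meta-analysis across cohorts.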
It could be a super-additive effect. Yes? What I've been looking for in this kind of study is an interaction between host SNPs, like checking whether there is an interaction between multiple SNPs, or a joint SNP effect. Because this is a very complex trait, so it's less probable that only one SNP can have an effect on it. OK, so that's a very ambitious question. So we're asking, throughout the 8 million SNPs, how are they modulating the outcome? But you want to take it one step further. Any of the 8 million in interaction with any other of the 8 million, modeling your outcome in 1,500 people? I think that maybe in the future, when we put all those data sets together, we may start tackling those questions. Gene-gene interaction models have been attempted in the GWAS world, but have not been very successful. So typically, we'll do a targeted gene-gene interaction. If we have a very good reason to believe that two genes would interact together, then we'll model that specific interaction, but not necessarily go out fishing for any SNP-SNP interaction, because there are too many possibilities. I mean, just taking the SNPs which have the highest odds ratios. So the question is, can we take only the SNPs that have the highest effect size as candidates for the gene-gene, SNP-SNP interaction? We can. But this brings me back to my initial comments about staged design. Whenever you do a staged design, anything you're not taking to the second stage, you lose power to discover. Question. So how exactly did you take into account the three different cohorts? Did you use, like, a random effect? Just remember, this is not my study. How did they take into account the three cohorts? So it was a meta-analysis. In the GWAS approach, it was a staged design. Anything that passes a certain p-value in the first cohort, then they would test in the second cohort.
And then for this interaction model, they tested it in the three cohorts and combined them in a meta-analysis. They combined the beta terms and got a p-value on the combined estimate using meta-analysis. Question. I have a question regarding the SNPs which have a minor allele frequency of less than 5%. Okay. Is there any value in studying them? Because there is not enough statistical power to detect them. Okay, the last question on this topic, and then we'll move on. The question is, how do we deal with SNPs that have a small allele frequency? And this is a question that's probably very relevant to your microbiome analysis. What happens when you have a species in your analysis that has very low counts? So in genetics, how we deal with it is, for a GWAS like this, if the number is too small, we won't have power to detect associations. And what happens when you're trying to establish associations with things for which you don't have power? You expose yourself to false positive findings, which become more likely than true positive findings. Okay, so typically we will not use statistical approaches that are underpowered to detect something. In your microbiome world, I've seen different approaches used for this. This may be a hot topic. So some people will say, well, if the strain is absent, we'll ignore it. And in individuals where we see the strain, then we'll count it in that individual. So you're basically ignoring samples where the strain was not observed and putting them aside. This is, in my view, not the best approach. You need to have a modeling approach that can capture, in your regression model, in your statistical test, those individuals where that species was not observed. Okay, so in genetics, what we do, we can use binning.
And I'm not a big fan of binning. So you take a gene and you say, if I have a mutation that's very rare, less than 5%, in one position, I will combine it with a mutation that's very rare in another position, bin them together, and use that in a test. In your world... But sorry, do these two positions have to be in linkage disequilibrium? No, the two rare variants don't have to be in linkage disequilibrium. There has to be a functional, biological reason for binning them together. So typically, they're in the same exon, they're in the same gene. In your microbiome world, it would be that they're in the same family of species, okay. So that would be the kind of binning that could be considered for low counts. Okay, I'm moving on because I'm never going to finish. No more questions. Okay, so this brings us to that second paper in the November Nature Genetics issue, where they decided to go all out for the microbiome. Statistically, I preferred this one. You may not like it as much, for the reason here. So in this study, they had slightly more individuals, 1,800 Europeans from two cohorts. The GWAS approach was very similar. They were a bit more strict on the quality of SNPs, which I like. But here's where it differs. They didn't do shotgun sequencing, they used 16S. And so as some of you will agree, you lose some information here, it's not as precise. So they had 38 phyla and over 300 genera that were analyzed. They did do shotgun analysis, but it was follow-up work and it's not part of the main discovery. The outcomes, what they were modeling in this study, are beta diversity and bacterial abundance. So I think it's quite a classical kind of approach. And they titled their paper: genome-wide association analysis, so GWAS, identifies variation in the vitamin D receptor and other host factors influencing the gut microbiome.
So they found a number of things, but they decided to highlight one of their findings in their title. So quite clearly they thought that the excitement of their discovery was in the vitamin D receptor. And I would tend to agree. What's really nice about this analysis is the care they took in modeling the different factors that can have an impact. And here the outcome is the beta diversity. So they're basically looking at the factors of diet, age, body mass index, smoking or not, and sex, and how they have an impact. What we're looking at here are principal coordinates analyses, looking at how age and BMI will push the principal coordinates, if you can picture in your imagination how the effect would be acting. And also with respect to the different elements in the diet. So they had a food frequency questionnaire, and they structured the different parts of their food frequency questionnaire into units: protein, energy, sugar, fat, water. And by creating those scores of units in the diet, they could include them as covariates in the model. And so what we see here is the proportion of variance in the outcome, the beta diversity, that can be explained by those variables. So we see that diet is quite important, then age, BMI, smoking, and, slightly, sex. We had this discussion last night: sex has a very small effect. So we see here that the effect of sex is only small. I'm moving on. This is a display of the results they have in a circular plot, and we like colors in it. What we're looking at on the right-hand side of that circle are the 22 human chromosomes. You'll say, wait a minute, there are 23 chromosomes, and that's true. And these people made a major mistake. They did not analyze the X chromosome. It's simple: you have the sex and gender person giving you a talk, so you hear about the sex chromosome. And on the other side of that circle, what we're looking at is the mouse genome. Now they did not do a mouse genome analysis in here.
They're basically cross-linking any of the hits that they had with hits previously known and established in the mouse, as if this would be a reinforcement of evidence towards the hits that they're finding. Okay, so we see here the genes, and it's hard to read, but a number of the genes were associated with beta diversity and with individual taxa, and we see whether or not they link. And they don't go through the details of all of those hits, but they focus on the vitamin D receptor. Oops, wrong way. So, why are we interested in the vitamin D receptor? Well, the vitamin D receptor is a human transcription factor. It plays many roles, but it's a transcription factor, and it's very well studied. It's implicated in the regulation of a number of genes, it forms a heterodimer with the retinoid X receptor, another nuclear transcription factor, and it basically lives its life by activating the expression of genes. That's what it does. And so it's particularly interesting here because it's not the first time that it's been associated. So this is like a nice confirmation, and they felt confident about this finding. What they saw here, and they did model this: there are individuals who did not have this specific species present in their gut. They set those individuals aside for the abundance analysis, which I don't agree with, but they did try to model it. So this is the percentage of people in their sample with a non-zero value for Parabacteroides, with respect to the vitamin D receptor genotype. So we see that TT individuals here behave differently from the CC and CT people, and I'm going to go back one slide. Oh no, I guess I didn't show you that. And so we see that the TT individuals at this SNP, which was associated with beta diversity, have a smaller abundance of that specific species. So this is what happens when people do GWASs with diversity.
Once they establish the association of a SNP with diversity, right away they want to know, well, which species is overrepresented in that diversity. So that's what they attempted to do and this is what they report. So we're looking at statistics that come out of 1,800 individuals, and this is the highlight of the paper. This brings us to the second set of examples that we're going to look at. Remember, we said we were going to look at GWAS association with the gut microbiome, then drugs modulating the microbiome, and lastly we're going to look at genome-genome interaction. So now we're on the drugs, the second set of examples. So two things: drugs can impact the microbiome, but the microbiome can also impact drug activity. And we have a few examples here of how this can occur. So first of all, for those of you who are not familiar, when we take medications, and I'm not talking about recreational drugs, I'm talking about medication to treat diseases. When we take medications, some of those medications will be active, so we open the box, take it in, and the medication we're taking in is active. Sometimes that medication is not active, it needs to be metabolized to become active. This second kind is called a pro-drug. So it's like a proto-drug. It will become a drug, but it's not a drug yet. So pro-drugs typically need to be metabolized before they become active, and there are examples of drugs that can be metabolized by the gut to make them active. And then there are active drugs that we take in that can be metabolized by the gut and then become inactive. And we're going to see one example of that. There are also examples of drugs that may become toxic after being metabolized. There are drugs that can inhibit the microbiome: antibiotics, quite clearly, will do that. And there are some drugs that could favor the growth of specific bugs. And we're going to see an example of that.
So we're going to see the example of metformin, with selective bacterial growth, and of digoxin, where the gut microbiome is making the drug inactive. Okay, so let's start with metformin. This is a recent publication. It was published in May 2017, so quite fresh. And I was personally very excited when I saw this publication. Metformin alters the gut microbiome of people with type 2 diabetes, and this alteration contributes to the therapeutic effects of the drug. Now, this was big news because metformin is a drug that's broadly used for the treatment of diabetes. It's an old drug, it's cheap, and its mechanism of action is poorly understood. For most drugs that are developed and put on the market these days, we understand the mechanism of action, but the older drugs that passed approval years ago sometimes passed without anyone really knowing how they work. Metformin is one of those drugs, and this is why it was so interesting. Now, this is an example of a microbiome-drug interaction. It's not a GWAS. We're not looking at genetic variations in the host that would interplay. I'm sure there are variations that impact metformin and impact the microbiome, but this is not the question that was asked in this study. It was a randomized, placebo-controlled, double-blind intervention. So this is an experimental study; it's not observational. They had a hypothesis and they tested it. So this is generating solid evidence as a result of hypothesis testing, not a fishing expedition where we're looking for things that are modulated. This is a dedicated study where they wanted to see whether metformin affected diversity in the gut. There were fewer than 40 individuals in the study, a small study by any genetic standard, but remember it's not a genetic study. First thing, they drew blood, and, okay, let's look at those graphs. We're looking at measurements: we have BMI, glycated hemoglobin and glucose.
And on the X-axis, they're basically just plotting, in a histogram, measurements from individuals in the placebo arm and measurements from individuals in the metformin arm. Measurements were taken at baseline, so time zero, after two months of treatment and after four months of treatment, in both placebo and metformin. And I have to say, those individuals had never been exposed to any diabetes drug. They were randomized to placebo or metformin, so metformin would be the first drug that they were using for the treatment of diabetes. Plus, they were put on a strict low-calorie diet. Obviously they had diabetes, so something had to be done, and lifestyle intervention is typically recommended. So this was a normal course for this intervention. The first thing we see is that in both arms, placebo or metformin, they lost weight. Good news, that's the impact of the diet. And this is very important for us because if we're going to see how metformin affects the gut, we want to know: is it just because the person is losing weight that the gut is changing, or is it really because metformin is doing something? So people lost weight in both arms. The glycated hemoglobin, which is a measurement of how severe their diabetes is, improved in the intervention arm, the metformin arm, but not in the placebo arm until they did the extended crossover study. After six months, they took the individuals in the placebo arm and put them all on metformin, to control for intra-individual variability. And fasting glucose also improved in that crossover arm of the placebo group. So that's not microbiome; that's just how the patients are doing. Now what happened? They looked at different things. They looked at strain abundance. Here we're looking at the significant strains, where there has been a change between baseline and two months or between baseline and four months. In the placebo arm, there was no change in the strains that changed in the metformin arm.
And those symbols represent statistically significant findings. What we see here is that among those significant findings, there was some enrichment of some species. They did something interesting in this study. They looked at microbial growth induced by metformin. Previously we saw the results of the primary study, where the intervention was happening. But they also did fecal transplants from the individuals who were exposed to metformin into diabetic mice, to see whether it could rescue the diabetic mice. And it did, for the metformin-treated individuals. They did a number of analyses; they really wanted to understand what was happening in the metformin-treated gut. One analysis they did, and I'm not an expert, I'm just telling you because I thought it was quite neat, is something they called the peak-to-trough ratio. They're basically taking the shotgun sequencing and looking at the DNA reads that mapped at the origin of DNA replication versus the end of DNA replication within strains, so that they could identify those strains that were actively replicating more. And they saw a significant difference between the metformin-treated gut and the placebo gut for Bifidobacterium adolescentis, which is a very common bug. They also cultured the fecal specimens and saw that in the presence of metformin this bug was growing more, which supported the finding. And so this is part of the results and the conclusion of this study. That's all I had to say about metformin. I'm going to give you another example of how there can be a drug-microbiome interaction, and it's the example of digoxin. Digoxin is a drug that's used for the treatment of heart failure and some cardiac arrhythmias. It's another old drug, cheap, and still broadly used.
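As an aside on the peak-to-trough ratio mentioned for the metformin study, the idea can be sketched in a few lines of Python. This is a simplified illustration, not the method used in the paper: it assumes the positions of the replication origin and terminus are already known, and the coverage values are invented. The function and variable names are hypothetical.

```python
from statistics import median

def peak_to_trough_ratio(coverage, origin_idx, terminus_idx, window=2):
    """Ratio of median read coverage near the origin to that near the terminus.

    coverage: read depth per genomic bin along a circular chromosome.
    origin_idx / terminus_idx: bin indices of the replication origin and
    terminus (assumed known here; real tools estimate them from the data).
    """
    n = len(coverage)

    def local_median(center):
        # Wrap around the ends because bacterial chromosomes are circular.
        return median(coverage[(center + k) % n] for k in range(-window, window + 1))

    return local_median(origin_idx) / local_median(terminus_idx)

# Toy coverage profiles (made-up numbers): origin at bin 0, terminus at bin 5.
growing = [40, 36, 32, 28, 24, 20, 24, 28, 32, 36]   # more reads near the origin
resting = [21, 20, 21, 20, 20, 20, 21, 20, 21, 20]   # roughly flat coverage

ptr_growing = peak_to_trough_ratio(growing, origin_idx=0, terminus_idx=5)
ptr_resting = peak_to_trough_ratio(resting, origin_idx=0, terminus_idx=5)
print(ptr_growing, ptr_resting)
```

A population that is actively replicating has more copies of the region near the origin than of the region near the terminus, so its ratio sits well above 1, while a non-replicating population stays close to 1.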
And this drug, and this has been known for a while now, since 1981, can be inactivated by bacteria in the gut, in particular by E. lenta (Eggerthella lenta). What happens is that when people take digoxin, if they have E. lenta, it will metabolize the drug and render it inactive. So when the doctors are dosing, trying to find the right dose for the patient, they need to up the dose to get physiological effects, and they up the dose in those patients who have more of the bug. This can be a concern because digoxin has a very narrow therapeutic index. If you change the dose, you can be exposing the patient to adverse events, or if the patient is undertreated, well, patients with heart failure are sick patients and they need treatment. This treatment is important for them. So monitoring is essential. And this led to the creation of a project that is ongoing between Jacques Corbeil's lab and my group. We have Jeremy, who's a student with you in this course, working on this. The hypothesis we put forth is that when patients who are using digoxin are treated with antibiotics, this could change the metabolism of the drug and expose the patients to overdose. So what Jeremy has been doing is looking at what happens to the abundance of E. lenta in gut samples of healthy volunteers exposed to a broad-spectrum antibiotic. And this is what we see. So this is cefprozil administration for seven days, and we have day zero, day seven and day 90, so after three months. Some individuals did not have the bug, some did. What we see is a trend toward a reduction in the presence of E. lenta after exposure to the antibiotic, and this reduction is stable even after three months. Now, the sample size was on the small side, and of course remember that those individuals were not taking digoxin. So it's just a theoretical experiment. Now, Frederic and Jacques have been working with cultures in the lab.
So samples from those individuals have been grown in the lab. And what we see is that when we expose the samples to another broad-spectrum antibiotic, we also see a reduction in the presence of E. lenta in those samples. This time it was significant because the sample size was slightly larger. And I want to finish with another example which I think is quite exciting going forward, for the kind of interaction and analysis we can do between the host genome and the microbiome. This has been called genome-to-genome analysis. Now we're not talking about the gut microbiome. We're talking about hepatitis C virus in circulation in patients. So this genome-to-genome analysis, published in May 2017, highlights the effect of the human innate and adaptive immune system on hepatitis C virus. Remember, we said that the microbiome and viruses adapt. We're not born with them; they adapt. Whenever we treat ourselves there's an adaptation, and this adaptation and evolution occurs for hepatitis C as well. So they did this very interesting study using 600 hepatitis C infected patients who were taking part in a clinical trial, but this study is not a clinical trial. It is an observational study of samples that were collected as part of a clinical trial. They had baseline plasma that they used for whole-genome sequencing of the virus. So they sequenced the virus. They also had the host DNA, so they did genotyping and, as we saw before, imputation, so they had genetic variants for the 600 individuals in this study. One thing to remember is that human T lymphocytes will attack the virus. Depending on the kind of immune system the patient has, HLA variants can have an effect on how well it attacks the virus and on what escape mechanisms are left for the virus to survive. So this was the major interest in conducting this study. Now a few words on the hepatitis C virus, and I'm not an expert. There are seven major types.
And hepatitis C type 3 is the type that was of particular interest, as the most prevalent virus. It's also the type that has the most resistance to treatment. So they did this genome-to-genome analysis. It's a busy slide, but it's a beautiful slide. Again, we see the circle plot and we have the human chromosomes, and again they only have 22 chromosomes. Basically what they're doing is going over millions of SNPs, and with each one of those SNPs they're asking the question: SNP number one, is it associated with specific positions in the virus genome? But they didn't use the nucleotide sequence directly. They did a very elegant thing here. They had sequenced the genome of the virus, remember? And of course viruses are dynamic. There isn't one single copy of the virus in a patient; there are multiple different copies. So what they did is ask the question: does that nucleotide change alter an amino acid, and what amino acid is encoded? So the association was performed at the amino acid level. That was one way of reducing the variation in the viral genome. Now, amino acids are not like nucleotides with four bases; there are many possible amino acids. So they had different observed amino acids encoded at each position, and they normalized this to have a quantitative endpoint. What we're looking at here on the top circle are the genes, or the proteins, encoded by the virus. The viral genome is pretty small. It encodes over 3,000 amino acids, and they saw variation at only about 1,200 of them, and they tested for association at each position with each amino acid. So how many statistical tests are we talking about here? We said that when we do a genome-wide association analysis, we adjust for a million tests. So we have at least a million SNPs in the genome and we have roughly 1,000 amino acids, so a million by 1,000. So the significance threshold here was established at 10 to the negative 11.
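The arithmetic behind that threshold can be sketched as a Bonferroni-style correction. The snippet below uses the round numbers from the talk (a million SNPs, about a thousand viral amino acid positions) and assumes a conventional family-wise error rate of 0.05; these are not the paper's exact counts, and the function name is hypothetical.

```python
# Illustrative round numbers from the talk, not the paper's exact counts.
n_host_snps = 1_000_000      # "at least a million SNPs" tested in a GWAS
n_viral_positions = 1_000    # roughly 1,000 variable viral amino acid positions
alpha = 0.05                 # conventional family-wise error rate (assumption)

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

gwas_threshold = bonferroni_threshold(alpha, n_host_snps)
g2g_threshold = bonferroni_threshold(alpha, n_host_snps * n_viral_positions)

print(f"single GWAS:      {gwas_threshold:.0e}")   # 5e-08
print(f"genome-to-genome: {g2g_threshold:.0e}")    # 5e-11
```

Multiplying the number of tests by a thousand simply divides the usual genome-wide threshold by a thousand, which is why the cutoff lands in the 10 to the negative 11 range.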
So we start from five times 10 to the negative eight, we add a few zeros to that, and that brings us to 10 to the negative 11. Okay, viral diversity. So they tested for viral diversity. They had, oh, this is what we're looking at. This plot is the density of SNPs. Obviously the distribution of SNPs in the genome is not uniform. For the tests that we do, the SNPs can clump in specific regions, and they wanted to express that. Typically in the genome we have less diversity near the centromeres, so we have fewer variants located near the centromere, and this is what is depicted. For the analysis, they did logistic regression, so the outcome was modeled as zero or one, with an additive model for the SNP, and they adjusted for sex. There are sex-based differences in the human genome, so it's quite relevant. Even though you may think the microbiome is not influenced by sex, here we're talking about viruses, and it's not even the gut; the way that the host genome acts in the body can very well be modulated by sex. And we're safer adding sex to our model than risking confounding our results by sex. They adjusted for the geographical origin of the individuals by using principal component analysis based on the genetics of the people. And they also adjusted for the diversity of the virus by using principal components from the virus, as covariates in the model. Findings: so it's tables and tables, and this is a truncated table of the findings. Very few signals passed the pre-established significance threshold, but in the tables they're showing the sub-significant ones also, because people like to look at data; there's a fascination with data. So what are they reporting?
They're reporting the P value for the association in the whole population, and they did a subgroup analysis in the Caucasian white population, just because genetic associations are so sensitive to confounding by geographical origin that you want to do a subgroup analysis to be safe that what you're seeing is not a fake result. Okay, so what's the take-home message from this study? Here we're looking again at a Manhattan plot, all of the chromosomes in the human, sort of, looking at viral load, and they found an interesting association with one gene, IFNL4, involved in the immune response, and they show how different genotypes affect the viral load of their patients. And I'm ending on this. I think that this study is quite promising, looking forward, for the possibilities that we have to do genome-to-genome analysis with large numbers, and you will have understood by now that we're going to be needing large numbers, large cohorts. So Jacques, if you want to do shotgun sequencing on thousands of samples, I'll analyze the data. And what have we seen? We saw that host genetics can impact the gut microbiome. We had two examples from that Nature Genetics issue. We've seen that there are drug-microbiome interactions. We saw two examples, metformin and digoxin. And we saw an example of genome-to-genome analysis. There are very few of those, only two in the literature: the hepatitis C virus, as we saw, and there was also an analysis with HIV, which I didn't talk about here. Looking forward, sequencing is cheap. It's getting cheaper and cheaper. So there is a real possibility for large data availability. Longitudinal analysis and repeated measures of gut samples and microbial samples are very important. They help reduce the variability. They help us explain the part of the observed variability that is due to within-individual effects and account for that in the statistical model.
I've heard talks, I've heard things, and there's interest from the pharmaceutical industry in studying the microbiome, to see how it affects drugs that are in development, but also in the potential for developing drugs that target the microbiome, which is now a real factor in health and disease. And for you guys in this workshop, my two cents of wisdom is that you should look forward to the establishment of guidelines, to standardized methods, and to benchmarking novel methods. When you develop a new tool, don't stop there. Benchmark it against as many available tools as possible. Making the data available and promoting benchmarking will help establish useful guidelines for methodologies. That's it, questions.
Question. Thanks for the talk, great talk. So we mentioned earlier, it was a genome-wide association study. There was a high risk of false positives, just based on the number of associations made. So did they find a lot of findings, or fewer than you would expect because of that?
Okay, so in the field of human genetics, we had to tackle the problem of false positive associations due to multiple testing in the late 90s and early 2000s. When SNPs appeared, when genotyping panels appeared, people started testing small cohorts of a couple of hundred people with many, many variants. And we realized that a lot of the findings weren't being replicated. It was a time when we did not have guidelines, strict guidelines, to help us develop our methodological approach. And it's with years of hindsight that I can tell you that you will benefit from those methodologies, to make sure that there are some sort of community-wide standardized methods that can be used and that people can look for. So in the human genetics world, there was a time, an epoch, when false positive signals were rampant. Around 2007, 2008, the major journals said, that's it, from now on we're not publishing a single GWAS paper unless it has a replication embedded within the study design.
And so all of a sudden, the number of publications dropped radically, and it promoted collaborations between teams and the generation of super-large cohorts. So this is how our host genomics community has adapted to the pressure of making sure that we're not publishing false things. I'm not sure that answers your question.
Well, I was just curious, in your opinion, did they find a lot of things or few findings?
So there are a lot of GWAS findings. If you're curious, there's an online resource called the GWAS Catalog, and it lists thousands of associations that have been established. And actually, going forward, as I was looking at the microbiome publications coming out, a lot of people are now interested in focusing on SNPs in the host that are known to be correlated with disease and asking precisely, for that specific SNP, how does it affect the microbiome and how could that relate to health and disease? So I think that's also one avenue that we may be seeing in the near future. I had a question here. Am I, no?
In regards to the hepatitis C system, have you looked for an increase in SNP variability in the regions that the MHC would bind to, in order to escape the...
Remember, this is not my study. I'm just teaching you guys what's out there.
Have they looked at specific variability in the viral genome that may be adapting? If I were a bacterium and I entered a human that had a specific MHC binding complex, I would try to mutate those regions more so than the other ones.
Yeah, and the finding that they highlighted, the IFNL4 finding, was precisely about that, regarding the host variant and how it has an impact on how well the virus adapts. Sorry, I went quickly. I wanted to be on time. But that was precisely what they were looking for in the paper.
Question. I have a question.
It's not related to the paper; it's a methodology question. I was interested in your take on something from when you started, with the words of, you know, wisdom. You did highlight the fact that it's a dynamic process. So from an analysis standpoint, how do you see the future of how we analyze data that changes, when you're going to have samples at time one, time two, time three, time four? Going forward, people are going to collaborate and we're going to get more data. So what type of methodology, in terms of statistical analysis, or just how do you deal with this type of data? Are we going to be able to do this at all?
Oh, we can. We can use hierarchical modeling. We can do a lot of nasty things with it. My take on it is that we're going to need to understand better how the dynamics work. We, as humans, like to simplify things so that we can model them. But I think we first need to understand the dynamics of the different strains in different situations, the daily cycles and how those are taken into account, but also the longitudinal changes. Some will replicate faster than others. Some will adapt faster than others. And going longitudinally, as we saw with digoxin and E. lenta in the presence of antibiotics, some will come back easily, but some won't. And what are the factors involved in that? So the factors involved in the dynamics, and how different strains cycle: is it daily, is it weekly, is it monthly? And how fast do they bounce back from an intervention? I think that will help us model the dynamics of the data. And going forward, in terms of longitudinal analysis, the more data, the better. That's always the bottom line. And then there's the sampling, and that's methodology, not analysis: as systematic as possible, as consistent between sites as possible, and on as many people as possible.