I think you come from a quite different background than I do, so it's very exciting for me to learn what you can get out of more experimental settings combined with a lot of theory and strong modeling. I apologize if I use slightly different terminology to describe some things, as I come from biology and medicine, and a lot of the focus here will also be on technology. I'll talk about the human gut microbiome, how it can impact the serum metabolome that we measure in humans, and how all of that connects to insulin sensitivity in humans. I would also like to say that I work at a consultancy company called Clinical Microbiomics. It is a contract research company, but we work pretty much as we would do in academia; we just do it for clients, and the clients are both academic and industrial.

Okay, so in analyzing and understanding the microbiome of humans, the preferred method has recently been metagenomics with deep sequencing, and we have learned that you get a lot more precision out of shotgun metagenomics than out of 16S amplicon sequencing. I think that is now fairly well established in the literature, and everybody who works with this is aware of it and of some of the limitations of 16S amplicon sequencing. If I just back up a little bit: in 16S amplicon sequencing you sequence one gene that has good taxonomic resolution, and then you compare all the sequences you get to databases. There are a number of problems with that. One of them is that you have a PCR step, and PCR is an exponential amplification of your signal that is known to have a lot of biases in it, so different taxa will amplify to different extents.
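A minimal sketch of why that exponential step matters, assuming a simple model in which each PCR cycle multiplies a template by (1 + efficiency). The taxon names, the per-cycle efficiencies, and the cycle count are made up for illustration and are not numbers from the talk:

```python
def amplify(initial_copies, efficiency, cycles):
    """Copies after `cycles` rounds, each multiplying by (1 + efficiency)."""
    return initial_copies * (1.0 + efficiency) ** cycles


def relative_abundances(counts):
    """Convert absolute counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical input: two taxa present in equal amounts before PCR.
start = {"taxon_A": 1000, "taxon_B": 1000}
# Assumed per-cycle efficiencies; a small difference is enough.
efficiency = {"taxon_A": 0.95, "taxon_B": 0.85}
cycles = 25

after = {t: amplify(start[t], efficiency[t], cycles) for t in start}
before_frac = relative_abundances(start)
after_frac = relative_abundances(after)

print(before_frac)
print(after_frac)
```

Under these assumed numbers the two taxa start at 50% each, and after 25 cycles taxon_A makes up a bit under 80% of the amplicons, even though nothing changed in the sample itself.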
Also, with 16S you have a lot of problems resolving the lower-level taxonomy. In most cases you can get to genus-level taxonomy, but in many cases you can only get to family level, only rarely can you get down to species, and certainly not to strain level. Most of those things you can overcome with random shotgun metagenomic sequencing, where you sequence the whole thing. In addition to good taxonomic profiling, shotgun metagenomics also gives you knowledge about the functions, the proteins, that are in the system.

But there is a big problem here, and it is a general problem for metagenomics. If this is all of the sequence reads from a sample, you will be able to account for about 20% of them by mapping them to reference genomes. So for the majority of the sample you don't know what it is, and obviously that's dissatisfying. Moreover, if you use reference genomes for measuring the abundances of your species, you'll sometimes be very confused. You'll see that some region of the chromosome appears to be at high abundance whereas other regions of that chromosome seem to be at lower abundance, so you're left wondering: is this species here or is it not? These differences can be dramatic. The reason, of course, is that in addition to the known species there are a lot of unknown species in the system, and they may even look identical to your reference genome in certain regions, so you get a confusion of signal there. Altogether we call this the metagenomic problem.

Back in 2010, when we first got a very large metagenomic data set from human stool, we started out trying to understand this. One of the ideas we came up with was that genes from the same chromosome should, of course, be at approximately the same abundance in a sample, and if we had a lot of samples they should follow each
other in abundance across those samples. We call that the co-abundance principle. You could say it's a very simple and even naive principle, but it turns out to work extraordinarily well. I'll show you an example. Here is the abundance of one gene across 1,200 human stool samples. In some samples this gene is highly abundant, and in other samples it's absent or at very low abundance. And this gene doesn't travel alone; it travels together with all the other genes from that chromosome. Here I'm now showing you 4,000 genes rather than just one, drawn with extremely thin lines. You see an extremely coherent signal, and such a signal is what we call a metagenomic species.

Now, these are relative abundances. What you get out of the sequencing machine is a number of reads, say 10 or 20 million, and then you ask how many of those reads map to an entity. That is of course relative, because it also depends on how much you sequenced in total. So yes, it is relative.

And gene copy number, yes, right. Occasionally you have genes that are duplicated in the chromosome, and that should distort this. So in order to capture these signals we use Pearson correlation, which doesn't care about the absolute abundance but cares about the profile. To make this work really well you need many samples. That depends on the complexity of the system, of course, but on the order of a hundred samples, and then you can capture a lot of those species.

So we built a pipeline for doing this. I don't want to go into too many details about how it works, but basically you take a large number of samples, you assemble genes from them and make a non-redundant gene catalog, and then
you take the reads from the original samples and measure the abundance of all the genes in that gene catalog. Then we have an algorithm we call canopy clustering that is able to segregate these signals, and we get what we call metagenomic species. We believe that most of them are species, but we also see smaller entities that are phages. Actually, it's not true that the majority are species; by count the majority are certainly phages, plasmids, and things like that. But the metagenomic species are mostly what is interesting, and we identify them basically by size.

Here you can see that by going from reference genomes to metagenomic species you can suddenly account for on the order of 73 percent of your content. That's pretty good, actually, because if you look at a bacterial chromosome about 85 percent of it is coding, and we're only counting coding regions. We're still missing some percentage, and that's mostly very rare things.

If you look at the size distribution of these entities in terms of number of genes, you get a distribution that is bimodal, here plotted on a log-log scale. You have what we call metagenomic species, and in addition to that the smaller metagenomic units, or CAGs. In humans we can identify 1,500 metagenomic species, so large entities that we believe are bacterial genomes in most cases.

Now, this segregates extremely well, as I said. You have probably seen some of those plots that try to segregate genomes with this kind of method based on two samples or a very small number of samples, or alternatively using base composition and so on, and you get these islands of genes that can be segregated. But if you use many dimensions, as in this case where we use 2,300 samples from humans, so that you get a high-dimensional space, you
can separate things extremely well. This plot illustrates that. Here are our 10 million genes, plotted in a space where one axis is the correlation to the profile of one metagenomic species and the other axis is the correlation to another metagenomic species. These are the genes that belong to one of those metagenomic species, and these are the genes that belong to the other. You see a very strong segregation, and it's not so critical where you put the cutoff; you get a very nice segregation.

We benchmark these on taxonomy. You take one of those metagenomic species and see what it looks like. Some of them look like this: metagenomic species number three has a lot of genes that are similar to Blastocystis hominis, which is a eukaryote. It has very high similarity to 15,000 genes of that species, and that corresponds to 99% of all the genes in this metagenomic species, and this is with a high threshold for similarity. So that is the set of those that are almost identical to reference genomes.

However, this is the minority of the metagenomic species in humans. With all the data that has been accumulated, and I know there are papers that say, oh, we have a thousand reference genomes: yes, you have a thousand reference genomes, but most of them are for the same species. So the actual number of species that we can recognize in the human gut is 300, with all the effort combined, and that is of course only a fraction of what is there. If you look at this picture, I'm showing you the same measurement as I just showed you, the percentage of the genes from a metagenomic species that are similar to a reference genome. These are the ones I just showed you, the ones that are almost identical to a reference genome, and here are all those that are, well,
completely unknown, or similar to something, which may mean similar to a reference genome at the genus level or so. So you can work with this at different levels of sensitivity, and you get the coherence at different taxonomic levels.

If you look at human stool samples, you'll see that this actually describes quite a lot of the data. This is 300 individuals, and this is the richness of those 300 samples in terms of how many metagenomic species we can identify in them. I've colored those that are similar to a reference genome by phylum, and the black part is the unknown species. What you can see is that the richness at the species level corresponds quite nicely to the richness at the gene level. The gene level, the blue line, is of course on a different scale. Also, if you sort the samples by number of species, as I did here, you can see that individuals with Crohn's disease segregate very clearly, with very low richness. This segregation of Crohn's can also be done using other methods, but it's much clearer if you have this complete view of the system.

As I said, this gives you a much more comprehensive overview of the system, but it also gives you a much more precise one. In a number of cases we have looked at intervention studies where people were eating a probiotic. Using the reference genome for exactly that probiotic, it was very hard to tell whether the probiotic was actually in the system, because it has all of this cross-signal to other related species in the system. But if you design a metagenomic species for that reference genome you get a very clear signal: those who didn't take it didn't have it, and you can even see the different abundances that people had of it, and that goes down to about five parts
per million in accuracy. It can also be used to go even further down in taxonomic resolution, so we're aiming for what we call ultra-high-resolution microbiomics. I realized yesterday that that's not a proper term, actually, because I was thinking that ultra-high would mean that we could account for the SNVs of the samples, but now I realize that people here actually want single-cell genomics, and we're not there yet.

But anyway, if you take one of these metagenomic species, and this is the one that corresponds to Akkermansia muciniphila, and map all the SNVs across that chromosome, you find that there are about 21,000 SNVs across our cohort. These are the different samples, 700 of them. In these samples the species is at very low abundance, but up here it's at pretty high abundance, and you can see that there are what I will call subspecies population structures. You may want to call this a strain or a clade, and up here we have something that's more complex but clearly different from this one. One of the interesting things is that up in this block you have a lot of the individuals who have IBD, so Crohn's disease or ulcerative colitis, where the Akkermansia strains apparently seem to be more, sort of... but they have different strains.

We can use all of this information to build phylogenetic trees of this species directly from the gut data. And actually we found that rather than trying to call strains, it's much more informative to use a tree for our association analysis. This picture shows an attempt at an association analysis between this structure and human phenotype. What you can do is label the nodes of the tree according to human phenotype, for example, and then you can make an association between the distances in the tree and the phenotype, and potentially you can even zoom in
on which SNVs are associated with that phenotype.

I'm going to switch now to focus on diabetes, and when I say diabetes I mean type 2 diabetes, not type 1. As you probably know, it's a global phenomenon and a very substantial problem. It's not only in the rich countries; it's very severe in the Arab countries, but it's basically everywhere. It's one of the biggest cost problems for our health care systems worldwide, and it's increasing.

If you break type 2 diabetes down, it's basically two things. You have beta-cell dysfunction, so a lower amount of insulin produced, and you have insulin resistance, so a lower level of sensitivity to insulin. A number of factors have been associated with insulin resistance: age, obesity, your diet, how much you exercise, and of course genetics. But recently there have also been a number of papers indicating that the gut microbiome could be a factor.

The first paper that made a real association study between the gut microbiome and any human disease was this paper, which showed that there was an association between type 2 diabetes and the microbiome. There have been a number of studies since then that have confirmed it, among them the Swedish study that showed that this was indeed also the case in elderly women. But then in 2015 we observed that the treatment the diabetes patients were taking, in most cases metformin, the first-line drug against diabetes, seemed to influence our signal so much that it was basically what we had been seeing in those papers. So the actual effect of type 2 diabetes was much harder to observe. And very recently there has been a paper that claims that part of the mechanism of metformin is to change the microbiome in such a way that it is healthier,
alleviating some of the gut microbiome problems in type 2 diabetes.

Anyway, all of these problems with the people who were using metformin made us go for people who were pre-diabetic. Instead of looking at people classified with type 2 diabetes, we looked at pre-diabetics and used their insulin sensitivity measures to associate with the microbiome. To get further toward a mechanistic understanding, we also looked at the serum metabolome of these individuals.

In doing this relatively complicated association work, we came up with this outline. We get around 1,100 metabolites from 300 patients, and then we make some effort to reduce the redundancy in this data. For the metabolites we do co-abundance clustering, which is a classical way of dealing with metabolites. It's known that metabolites that come from the same pathway tend to have similar abundances, and very similar metabolites also tend to have similar abundances across a cohort of individuals. So in order to reduce the complexity we did co-abundance clustering and reduced that to 74 metabolite clusters. Similarly, as I just explained, for the microbiome we used co-abundance clustering to identify a little more than 700 species, and also a number of functional groups. Next we used the clinical data to filter for the metabolites and the species that were more interesting to look at, and that reduced it further to 19 metabolite clusters and 81 species that were associated with the phenotype. After that we tried to see if we could make associations between those domains of data, connecting species to metabolites in a way that could explain the human phenotype.

Now, in doing that you're getting into a huge complexity problem, with the many different species that are able to
produce the same metabolites. The host is also able to do that, and you're eating food, and probably a lot of this is going on in the small intestine while we're measuring in the stool. So we're pretty far from an ideal situation.

To deal with this we came up with a new concept that we call the functional species concept. The slide is chopped off a little bit here, but anyway, the idea is that across different ecosystems you may have different species that fill the same ecological niche or the same function. For example, in macroscopic ecosystems you could have different top predators across different ecosystems: on a savanna you can have a lion, in the jungle maybe a tiger, and in the ocean a shark. If you were trying to do an association to the top-predator effect across those systems you would completely miss it, because you have three different species. But if you could somehow group the species that share a property, you could use that grouping to make your association, and that is what we did here.

It looks like this in the microbiome. For example, here you have a human phenotype, which could be BMI, and you have a number of species that you can associate with it, but each of them may have a weak association, because different people may have different species that fill that function. If you didn't know that they shared a function, you would only have this level of association, but if you're able to group them, we believe you will be able to find a stronger association.

In saying that, I should say that some of the early papers in microbiome association were just trying to combine random sets of species. Remember, we have on the order of 700 common species in the gut, so if you combine just three of those it's 700 times 699 times 698, so you have approximately 300
million combinations, and that's not good for your multiple testing. But by using a system where you look for shared properties, or just for species that have a given ortholog, you reduce that complexity by about 500,000 times, down to on the order of a couple of thousand groups of species. These groups can also vary in size: some have 50 species in them and some have only very few.

So it looks pretty much like this. In the simplest form you use the eggNOG annotation, so the ortholog annotation of the genes, and you just group the species by that. This is maybe a little bit similar to building guilds, but here any one species can actually be a member of many groups. Many of those groups are not meaningful or not relevant for this case, but you can do associations with them.

Here's an example of such an association, in this case to BMI. Here is the expected association between BMI and all species, and here is the association for the group of species that can deconjugate bile acids, which is the first step in the degradation of bile acids. It shows a very nice association between species that can degrade bile acids and being lean. In addition to being a nice way of doing associations, these associations also come with a legend, right? They come with the description that these species are the ones that can degrade bile acids. And if you ask a physiologist, bile acids would be one of the things they see as key for the regulation of satiety and a lot of the metabolism in humans.

In addition to this, there is a big problem in annotating functions for genes. You would think that the similarity of a gene should indicate its function, so that you can transfer the function from one known gene to another gene by similarity, and it works pretty well in many
cases, but there is a large false discovery rate on this. Because we have these metagenomic species, and we have pathway systems such as KEGG, we're able to look for pathway completion in species. Here is an example of that, the LPS pathway. You see that there is a huge number of species that have just one annotated gene from that pathway. That is likely not meaningful: maybe it's a false annotation, or the function is also used in another pathway, which is likely. But you see that there is a bimodal distribution of the number of genes you find from this pathway, and so we believe you get a much more accurate description of which species are actually able to carry out the function if you require a certain number of the genes in a pathway to be present.

We did this across all functions, with association to the metabolites of the humans. What we found was that there were a number of functional species groups, defined based on pathways, that could be associated with metabolites, and in very many cases the products of those pathways were actually some of the elements in these metabolite clusters. For example, the branched-chain amino acid metabolites were nicely associated with the capability to biosynthesize them, and inversely correlated with the bacteria's ability to take up branched-chain amino acids.

Branched-chain amino acids are already a known indicator for pre-diabetes: pre-diabetic individuals who have elevated levels of branched-chain amino acids in the serum have a higher chance of becoming type 2 diabetic. It's a funny note here that if you go to a gym, one of the things they will sell you is exactly branched-chain amino acids, to build muscle. So I think it's very likely that branched-chain amino acids are a good thing
to have if you exercise a lot, but if you don't, it's probably not such a good idea.

So the model we built based on this looks like this. You have some gut microbes that are able to produce these branched-chain amino acids, and they contribute to a metabolite pool that will in turn influence the insulin resistance of the host. On the other side you also have some bacteria that can take up branched-chain amino acids and consume them, and they of course count in the opposite direction.

To zoom in on which species were the worst species for this, we made a leave-one-out analysis. That basically means that we repeated the association analysis with just one species taken out, and asked how much worse the association became. So, for example, if you have a correlation between the human phenotype and the presence of the species in such a functional species group, and you remove some species, you may get a much reduced association. When we did that for branched-chain amino acid biosynthesis, we found that a lot of species had a very minor effect on the association, but a couple of species had a quite dramatic effect. One of those was Prevotella copri and another was Bacteroides vulgatus, and then there were another couple of Prevotella that were important for this association. For branched-chain amino acid transport, however, we didn't find any species that were really strong drivers. It seemed that there are a lot of species that can take these up and that can be important for the system.

To conclude on that: I agree that this is all associations, things that fit nicely together, but just associations. So to go further with this, we isolated the Prevotella and gave it to 12 mice on a high-fat diet, and we could see
already after three weeks elevated levels of branched-chain amino acids in the serum of these mice, and we could also see that their insulin resistance was significantly higher. Actually, a number of those mice had what you would classify as type 2 diabetes after four or five weeks. So this suggests that the study is not only association; you can actually inoculate these bacteria into, at least, mice and see that the phenotype occurs.

So this was my main story, and I have very little time for additional stuff here, so if there are questions I could take them.
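As a postscript to the co-abundance principle described in the talk, here is a minimal sketch of the idea of grouping genes by the Pearson correlation of their abundance profiles across samples. It is not the canopy clustering algorithm from the talk, only a simplified seed-and-threshold illustration. The gene names, the abundance values, the function names, and the 0.9 cutoff are all hypothetical:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / math.sqrt(var_x * var_y)


def co_abundant_genes(profiles, seed, min_r=0.9):
    """Genes whose profile correlates with the seed gene at or above min_r."""
    seed_profile = profiles[seed]
    return sorted(
        gene
        for gene, profile in profiles.items()
        if pearson(seed_profile, profile) >= min_r
    )


# Hypothetical abundances of five genes across six samples.
profiles = {
    "gene_1": [10, 0, 5, 20, 1, 8],
    "gene_2": [21, 1, 9, 41, 2, 15],  # about 2x gene_1, like a duplicated gene
    "gene_3": [9, 0, 6, 19, 1, 7],
    "gene_4": [0, 12, 3, 1, 15, 2],   # follows a different profile
    "gene_5": [1, 25, 5, 2, 29, 4],
}

group = co_abundant_genes(profiles, "gene_1")
print(group)
```

In this toy data gene_2 sits at roughly twice the abundance of gene_1 but still groups with it, which is the property the talk attributes to Pearson correlation: it follows the shape of the profile rather than the absolute level. With only six samples such a grouping would be fragile; the talk suggests on the order of a hundred samples for real data.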