Well, thanks so much for the invitation to speak today, to the NIH for organizing the meeting, and especially to Lita and Owen for putting together such an interesting agenda for the next few days. I'll talk a bit today about high-throughput functional analysis methods and techniques for the human microbiome, focusing to a large degree on sequencing-based methods such as metagenomics and metatranscriptomics, and then touching briefly on other high-throughput methods such as those Janet was covering earlier. I think the question that functional analysis methods really start to address for the microbiome is what the underlying biomolecular networks are that dictate or cause the emergent phenotypes we've been seeing today, and will see over the next few days, that manifest in the microbiome as a microbial community and in the host as human health and disease. A parallel that I often draw for the microbiome is with the last several decades' worth of cancer research, where we've built up a detailed understanding of the interactions between genes and gene products that drive emergent phenotypes in cancer. This is important in the microbiome, first, because it links our translational applications back to basic biology. We have a long history in microbiology of in-depth understanding of microbial metabolic and regulatory networks, a level of understanding we have not yet reached for microbial communities and host interactions. But it's also important because it identifies actionable targets for, say, molecular therapy in the microbiome, which is hopefully something we'll be reaching soon. If you think again about cancer, we wouldn't have personalized, precision molecular targets, such as those hit by Gleevec or Herceptin, without this deep understanding of the individualized underlying regulatory and molecular networks.
So if we want to reach this level of understanding for the microbiome, we need to build up our suite of tools and analysis methods and, of course, our sets of experimental designs and results, in order to take advantage not just of a structural understanding of the composition of the microbiome but of its functional potential and functional activity in health and disease. Here's a quick overview of the state of the field right now in terms of analysis methods, focusing specifically on shotgun sequencing approaches such as metagenomics and metatranscriptomics. Of course, this type of data can provide information about taxonomy and phylogeny, the structure and composition of a community, just as 16S amplicon sequencing would. And it's worth calling out, following that nice slide we saw this morning, that the rapid decrease in sequencing costs has brought the cost of a typical metagenomic assay down to only about three to eight times the cost of an amplicon survey. So we're getting to the point where there's not a large price or effort differential between the two types of assessment. And, of course, from such an approach you get not only structural information about a microbial community but also information about its functional potential, or about its functional activity if you're sequencing RNA to look at a community by metatranscriptomics. Both of these are areas where there are now robust suites of tools: for assessing who's there from shotgun sequencing data, and for assessing which genes or pathways are active from shotgun metatranscriptomic data. A third area, fairly distinct from either of these, that has started to be investigated using these high-throughput sequencing approaches is comparative genomics within microbial communities: within an individual, between body sites within an individual, or across the communities of many individuals in a human population.
It's worth mentioning that all of these approaches can be tackled with or without metagenomic assembly, that is, putting large contigs or whole genomes back together from shotgun sequencing data. Again, this is a very rapidly changing field methodologically; you'll see that all the dates on these papers are within the past year. Assembly has already had a number of really nice successes in pulling genomes out of metagenomes. It integrates very well with single-cell sequencing, which provides another complementary high-throughput tool for a lot of these approaches, and it can be performed upstream of some of these other downstream analysis tasks or skipped entirely. To provide some background on how some of these analysis methods work, I'll talk very briefly about one of the tools we worked on specifically for taxonomic profiling from shotgun metagenomic data during the Human Microbiome Project, and I'll use a few examples from the HMP during the talk today. MetaPhlAn is a system that specifically leverages the thousands of microbial isolate genomes now becoming available in order to optimize taxonomic profiling of metagenomes. Just as 16S sequencing uses a single marker gene that acts as a microbial name tag, amplifying it and using it to count up which microbes are there, MetaPhlAn catalogs a very large number of unique marker genes, typically several hundred per microbial clade, where a clade can be anything from a species or subspecies on up to a genus, a family, or a whole phylum. By pre-computing a catalog of many unique marker sequences, we can get much more precise and robust identification of which organisms are present than you typically get from 16S sequencing. It's tough, but not impossible, to get species-level identification from 16S sequences; it's very easy to get species-level, subspecies-level, or, as we'll see later, strain-level identification from shotgun sequencing data.
MetaPhlAn uses this pre-computed information to identify the relative abundances of the organisms present very rapidly, within minutes per metagenome. To give you some idea of how this takes advantage of the thousands of available microbial isolate genomes: a year ago we had just over 3,000 finished and draft bacterial and archaeal genomes with which to put this system together. Together these represented about 2 million total marker genes, each of them a microbial ORF family, and we kept roughly the 400,000 most informative ones. That amounts to 200 to 300 markers per species, so several hundred times the robustness of a single marker gene. We're now working on updates that incorporate almost 9,000 genomes, a tripling in the number of available microbial genomes in one year, which gives you some idea of the growth of microbial genome sequence in addition to metagenomes. Something I can't emphasize enough is that shotgun sequencing approaches such as metagenomics and metatranscriptomics really do give access to all four groups of microbes typically of interest in the microbiome (bacteria, archaea, viruses, and eukaryotes). We've heard a great deal about viral sequencing already this morning, and I'll give a couple of eukaryotic examples later today. This brings our available gene pool up to about 5.5 million total families, boiled down from a total of 12 million microbial genes in reference genomes, and, as we heard again this morning, we have a pool of 8 million genes that we're working from in the human microbiome as well. So there's a tremendous amount of reference and metagenomic data to combine in these new analysis approaches. What this allows us to do is very quickly and easily assess which species or subspecies are present in hundreds or thousands of metagenomes, assess their relative abundances, and meta-analyze very large data sets together.
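To make the marker-counting idea concrete, here is a minimal Python sketch of marker-based taxonomic profiling. It assumes reads have already been aligned to a marker catalog (the real tool relies on a read aligner for that step, which is not shown), and the catalog layout, the function names, and the trimmed mean of per-marker coverage are illustrative assumptions, not MetaPhlAn's actual code.

```python
from statistics import mean


def clade_coverage(hits, markers, trim=0.1):
    """Robust coverage estimate for one clade (hypothetical helper).

    hits: dict marker_id -> number of reads aligned to that marker
    markers: dict marker_id -> marker length in nucleotides (at least one marker)
    trim: fraction of markers dropped from each end before averaging
    """
    # Per-marker coverage normalizes read counts by marker length.
    cov = sorted(hits.get(m, 0) / length for m, length in markers.items())
    k = int(len(cov) * trim)
    # Trimming guards against a few markers that are inflated (e.g. by
    # shared or transferred genes) or missing (e.g. strain-level gene loss).
    kept = cov[k:len(cov) - k] if len(cov) > 2 * k else cov
    return mean(kept)


def relative_abundances(catalog, hits, trim=0.1):
    """catalog: dict clade -> {marker_id: length}. Returns clade -> fraction."""
    cov = {clade: clade_coverage(hits, markers, trim)
           for clade, markers in catalog.items()}
    total = sum(cov.values())
    return {clade: (c / total if total else 0.0) for clade, c in cov.items()}


# Toy catalog and read counts, made up for illustration.
catalog = {
    "clade_A": {"a1": 1000, "a2": 1000, "a3": 1000},
    "clade_B": {"b1": 500, "b2": 500},
}
hits = {"a1": 100, "a2": 120, "a3": 110, "b1": 50}
print(relative_abundances(catalog, hits))  # clade_A ~0.69, clade_B ~0.31
```

Averaging over many markers per clade is what gives the robustness mentioned above: one odd marker moves the estimate much less than it would if it were the only name tag.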
This is an example combining about 120 individuals from the HMP with another hundred individual gut metagenomes from MetaHIT. If you have very good eyes, you can see that the Prevotella copri example Dan mentioned is right on the bottom row there. And of course we can perform the same kind of taxonomic profiling for known viruses, using viral genomes to identify markers in either DNA or RNA sequencing data. I won't talk in a great deal of depth about the HMP specifically today, but I do want to emphasize that one of the nice aspects of the project was the way in which it integrated large-scale shotgun metagenomic sequencing with very large-scale amplicon profiling by 16S. By putting those two together in this population and surveying individuals over time, we now have a nice base of about 1,500 individual microbial genomes detected at significant levels across the HMP population: their phylogenetic relationships, their prevalence in this healthy population across, in this particular example, seven different body sites, their relative abundances, and, in this particular case, their pathogenic potential, all of which can be visualized and used to summarize the 3.5 terabases of sequencing that were introduced so well this morning. Those are some examples of identifying microbes using shotgun sequencing approaches, but I said I'd talk about function: tying together which microbes are present with the kinds of genes and pathways they carry and carry out in the microbiome. We've seen this example a couple of times today: when we view variation from person to person at the genus level, or in this case at the whole phylum level, it's very high. Even in health, in the absence of obvious disease perturbation, we range from very high loads of specific phyla in the gut to very low loads.
Despite this, there seems to be maintenance of a distribution of microbial pathways that is habitat-specific. Summarized at a high level, there's a set of habitat-adapted pathways that are maintained stably by different microbes in different individuals, as David mentioned this morning, and I'll talk about that in a little bit. First, I want to give a half-slide summary of the kinds of methods we currently have to detect genes and pathways in a metagenome or metatranscriptome from the human microbiome. These of course also rely on the very large catalog of available microbial genomes, in this case their open reading frames, using that as a reference database and comparing it to a characterized amino acid sequence database of interest. So if you're interested in fishing out, say, carbohydrate-active enzymes or antibiotic resistance proteins from the human microbiome, you can start by clustering gene families of interest within antibiotic resistance proteins, carbohydrate-active enzymes, or your favorite amino acid sequences, and then comparing these to a reference database to identify any non-unique sequences. That is, you screen out any regions of a gene or protein family that do not uniquely identify that particular functionality. The system I'm describing, ShortBRED, catalogs a small number of optimized sequence markers for functional gene product families, just as we saw a few slides ago with MetaPhlAn cataloging sequence markers for taxonomic clades. Here we're looking for sequence markers that can be used to quickly and accurately identify functional gene families and get this sort of overview of habitat-specific functionality in the human microbiome that David mentioned this morning.
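The screening step described above can be sketched as follows. This is a simplified, hypothetical illustration: it lets one representative sequence stand in for a clustered family and screens it with exact k-mer matches against out-of-family proteins, whereas ShortBRED works from clustered family consensus sequences and alignment-based searches against a reference database. The k-mer shortcut, the function names, and the default `k` and `min_len` values are all assumptions.

```python
def kmers(seq, k):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def family_markers(representative, others, k=5, min_len=8):
    """Return stretches of `representative` built only from k-mers that
    never occur in any out-of-family sequence (hypothetical helper)."""
    background = set()
    for seq in others:
        background |= kmers(seq, k)
    # True where the k-mer starting at position i is unique to the family.
    unique = [representative[i:i + k] not in background
              for i in range(len(representative) - k + 1)]
    markers, start = [], None
    for i, ok in enumerate(unique + [False]):  # sentinel closes the final run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            # k-mers starting at start..i-1 span representative[start : i-1+k]
            segment = representative[start:i - 1 + k]
            if len(segment) >= min_len:
                markers.append(segment)
            start = None
    return markers


# Toy sequences: the out-of-family protein shares the stretch "KQRQISF".
print(family_markers("MKTAYIAKQRQISFVKSHFSRQ", ["AAAAKQRQISFAAAA"]))
# ['MKTAYIAKQR', 'QISFVKSHFSRQ']
```

The shared region is cut out and only the flanking, family-specific stretches survive as markers, which is the "screen out non-unique regions" idea in miniature.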
As he pointed out, there's a great deal of stability within an individual and within a habitat in the human microbiome, and a great deal of variability between habitats in which microbial processes are enriched or depleted to adapt the community to that particular habitat; which processes are selected for varies, of course, from habitat to habitat. This is one high-level way of looking at that, in which each dot around the outside of the circle represents a single metabolic module, in this case from the KEGG catalog, which I'll talk about in a little bit. A dot is colored if it is significantly enriched in at least one body site, and it turns out that about two-thirds of the pathways we were able to analyze in the HMP were enriched or depleted among the seven habitats where we assessed them. However, it was very rare for a process to be lost entirely from the community metagenome: most processes were core, and less than 10% were actually lost from the metagenome in any particular body site or individual. You can contrast this with microbes, none of which meet that level of coreness or stability between individuals, and certainly not between body habitats, in the human microbiome. This suggests a model in which our metagenomes are shared despite variability in which microbes carry those particular genes and processes. They're differentially regulated, in a sense, by enriching or depleting metagenomic abundance within particular habitats, and, although I won't show a lot of transcriptomic data today, they are personalized within individuals, just like our own genomes, by differential regulation: differential transcription from person to person on a much more rapid time scale than metagenomic abundance changes. And, as I mentioned already, most processes are differentially abundant metagenomically even when they are not lost from the metagenome.
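A minimal sketch of how "core", "enriched", and "lost" might be operationalized on a pathway abundance table. The detection threshold, the 90% prevalence cutoff for calling a process core, the module names, and the toy numbers are all hypothetical choices, not the HMP's actual criteria; a real enrichment call would also need a statistical test across body sites, which is omitted here.

```python
from collections import defaultdict
from statistics import mean


def site_summary(values, sites, detect=0.0):
    """Per body site: (prevalence, mean abundance) of one pathway.

    Prevalence is the fraction of samples at that site above `detect`.
    """
    by_site = defaultdict(list)
    for v, s in zip(values, sites):
        by_site[s].append(v)
    return {s: (sum(v > detect for v in vs) / len(vs), mean(vs))
            for s, vs in by_site.items()}


def classify(abundances, sites, detect=0.0, core_frac=0.9):
    """Label each pathway as 'core', 'lost at some site', or 'variable'."""
    labels = {}
    for pathway, values in abundances.items():
        prevalence = [p for p, _ in site_summary(values, sites, detect).values()]
        if all(p >= core_frac for p in prevalence):
            labels[pathway] = "core"          # present nearly everywhere
        elif any(p == 0 for p in prevalence):
            labels[pathway] = "lost at some site"
        else:
            labels[pathway] = "variable"
    return labels


# Toy data (not HMP values): three gut and three oral samples.
sites = ["gut"] * 3 + ["oral"] * 3
abundances = {
    "module_1": [0.020, 0.030, 0.025, 0.004, 0.005, 0.003],
    "module_2": [0.0, 0.0, 0.0, 0.001, 0.0, 0.0],
    "module_3": [0.01, 0.0, 0.02, 0.01, 0.01, 0.0],
}
print(classify(abundances, sites))
print(site_summary(abundances["module_1"], sites))
```

Here `module_1` is core at both sites yet several-fold more abundant in the gut, which is the "enriched but not lost" pattern described above.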
To show a couple of examples of what this looks like in specific data, again from the HMP, this is sulfate transport. Each of these dots is an individual, and the y-axis is metagenomic (again, not transcriptomic in this case) abundance. You can see this is a process carried by the communities at almost every body site. It's enriched throughout the GI tract but rarely, if ever, lost entirely from the metagenome. A counterexample comes from one of the rare cases in which a pathway can't be detected in particular metagenomes at our limit of detection: complex I, NADH dehydrogenase. This is a uniquely eukaryotic form of the pathway; it's actually the mitochondrial complex. It is detected almost exclusively in the body sites where there is a significant fungal presence, on the skin and in the nose, and in the four individuals among the HMP's healthy cohort in which we also taxonomically detected Candida albicans in the vaginal microbiome. I showed a couple of viral examples earlier; here we can pick up eukaryotic function. It's important to emphasize that these shotgun sequencing approaches to function do pick up all of these different domains of life. There are limitations, though, of course. A big one is that any catalog we currently have access to, KEGG being one example and MetaCyc another, has been developed over years or decades of work, typically on individual single-microbe metabolism, without taking into account the specific microbes that occur in the human microbiome, the specific processes that occur in its habitats, or especially the interactions that only occur among microbes in communities. So there are big gaps in our knowledge of which genes and pathways are active under those circumstances.
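The phrase "at our limit of detection" can be made concrete with a back-of-the-envelope calculation. This sketch assumes reads are sampled uniformly at random from the community (a Poisson model), that a gene counts as detected with a single read, and uses hypothetical read depths, gene lengths, and genome sizes; it ignores mapping errors and is not how the HMP pathway calls were actually made.

```python
import math


def expected_reads(total_reads, rel_abundance, gene_len, genome_len):
    """Expected reads landing on one gene, assuming uniform genome coverage."""
    return total_reads * rel_abundance * gene_len / genome_len


def detection_probability(total_reads, rel_abundance, gene_len, genome_len):
    """Poisson approximation of P(at least one read from the gene)."""
    lam = expected_reads(total_reads, rel_abundance, gene_len, genome_len)
    return 1.0 - math.exp(-lam)


def min_abundance(total_reads, gene_len, genome_len, power=0.95):
    """Smallest relative abundance giving the requested detection probability."""
    return -math.log(1.0 - power) * genome_len / (total_reads * gene_len)


# Hypothetical numbers: 10 million reads, a 1 kb gene, a 5 Mb genome.
for a in (1e-5, 1e-3):
    print(a, round(detection_probability(1e7, a, 1000, 5e6), 3))
# 1e-05 0.02
# 0.001 0.865
print(round(min_abundance(1e7, 1000, 5e6), 5))  # 0.0015
```

Under these assumptions, a gene from an organism at 0.001% relative abundance is usually missed, so "not detected" means "below roughly this abundance", not "absent".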
Some initial work to remedy this that I wanted to highlight today comes from Jeroen Raes's lab, recently of K.U. Leuven, where they've been manually curating a set of pathways unique to the gut microbial community, and starting to curate the genes, and the relationships between them, that carry out microbial metabolism in different compartments of the gut. This is one initial step toward curating and organizing our knowledge of which genes work with which other genes specifically in the human microbiome, but this is an area that will need a lot more computational and high-throughput work to assemble that knowledge and apply it to the thousands of samples we now have available. Even then, this still mostly addresses the functionality of individual organisms, and, as David mentioned this morning, it's very hard to even define what we mean by whole-community functionality or ecological functionality. One way in which we and others have started to approach this is by looking for ecological interactions among organisms in the microbiome. We did some work on this, again with Jeroen, during the Human Microbiome Project to identify microbes that co-vary or co-occur within body sites, and across body sites within individuals. This proves to be a surprising methodological challenge; I could give a whole separate talk about how hard it is to actually detect from the kinds of measurements we can make today. But you can start to see patterns such as this example, where two different groups, Streptococcus and Veillonella species here, or Streptococcus and Aggregatibacter species here, co-occur across the HMP population in, in this case, three different oral body sites. Each individual tends to have either this particular pair of microbes or that particular pair, and the two pairs do not occur together.
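As a rough illustration of what "co-varying" means computationally, here is a naive sketch that scores every pair of taxa by Spearman rank correlation across samples. It is deliberately simplistic and is not the method used in the HMP work: it ignores the compositional nature of relative abundances (which can create spurious correlations on its own), and it has no permutation testing or multiple-testing correction, all part of why this problem is hard. The taxon names, the 0.6 threshold, and the abundance values are hypothetical.

```python
import math
from itertools import combinations


def ranks(xs):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else math.nan


def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def cooccurrence_pairs(profiles, threshold=0.6):
    """profiles: taxon -> relative abundances across the same samples.
    Returns (taxon1, taxon2, rho) for pairs with |rho| >= threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if not math.isnan(rho) and abs(rho) >= threshold:
            pairs.append((a, b, round(rho, 2)))
    return pairs


# Toy profiles over six samples: two mutually exclusive pairs.
profiles = {
    "Streptococcus_1": [0.30, 0.25, 0.28, 0.01, 0.02, 0.00],
    "Veillonella": [0.10, 0.08, 0.12, 0.00, 0.01, 0.01],
    "Streptococcus_2": [0.01, 0.00, 0.02, 0.26, 0.31, 0.29],
    "Aggregatibacter": [0.00, 0.01, 0.00, 0.09, 0.07, 0.11],
}
for pair in cooccurrence_pairs(profiles):
    print(pair)
```

Positive scores flag pairs that rise and fall together; negative scores flag the "either this pair or that pair" exclusion pattern.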
In this example on the right, the Streptococcus and Veillonella pair has previously been observed by microscopy to co-occur due to a specific receptor interaction on the cell surface. So this is one example where independent functional data let us confirm a co-occurrence we found using high-throughput approaches. But for the example on the left, we need additional, more specific functional data to identify why these species co-occur and, on top of that, what functionality they might be contributing to in the whole-community sense. To highlight another recent project that has started to address this, this time from Elhanan Borenstein's lab, in a very timely paper that came out just last week: the goal was to identify metabolic rather than physical interactions between microbes in a community. They started from a very similar species co-occurrence set and added information from microbial reference genomes, in this case whole-cell metabolic reconstructions for individual microbial species. For potentially co-varying pairs of microbes, they were then able to identify metabolic products or needs that would either conflict between the two microbes or complement each other, suggesting either ecological competition for resources or cooperation, with one microbe providing resources to another. This starts to identify mechanisms by which functionality might be contributed to the whole community, in this case by interacting pairs of organisms.
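The competition-versus-cooperation idea can be expressed as simple set overlaps. This sketch is loosely in the spirit of the seed-set approach from that line of work, where a species' "seed" compounds are those it must take up from its environment. Here the seed sets and network compounds are simply given as inputs (deriving them from a metabolic reconstruction is not shown), the compound names are placeholders, and the exact published definitions may differ.

```python
def competition_index(seeds_a, seeds_b):
    """Fraction of A's seed (externally required) compounds that B also requires."""
    if not seeds_a:
        return 0.0
    return len(seeds_a & seeds_b) / len(seeds_a)


def complementarity_index(seeds_a, seeds_b, compounds_b):
    """Fraction of A's seed compounds that B can make itself, i.e. compounds
    in B's network that are not among B's own seeds."""
    if not seeds_a:
        return 0.0
    return len(seeds_a & (compounds_b - seeds_b)) / len(seeds_a)


# Hypothetical compound sets for two species, for illustration only.
seeds_a = {"c1", "c2", "c3", "c4"}
seeds_b = {"c1", "c5"}
compounds_b = {"c1", "c2", "c3", "c5", "c6"}
print(competition_index(seeds_a, seeds_b))                   # 0.25
print(complementarity_index(seeds_a, seeds_b, compounds_b))  # 0.5
```

Both scores are asymmetric: they describe A's situation relative to B, so a pair can be scored in each direction, with high competition suggesting conflict over shared resources and high complementarity suggesting that one partner could feed the other.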
We heard a lot in Janet's talk about multi-omics, but I'd be remiss to give a talk about function in the microbiome without at least mentioning some of the other ways this could be approached outside of shotgun sequencing. Shotgun approaches, of course, focus on giving us genomes, genes, variants, and catalogs of which microbes are there, all the things I just talked about, and, if you're lucky, shotgun RNA sequencing or metatranscriptomics, which tells us a bit more about transcriptional regulation and gene expression. These are both very microbially focused assays. They don't independently tell us about protein activity, which again we just heard a lot about, and even then we don't necessarily know from this type of high-throughput information which extracellular small molecules are being used, say, for signaling, or which might serve as metabolic products or nutrient needs for intake, as we might learn from metabolomics. Most of these assays also don't tell us explicitly about the host, which has its own genome, its own transcriptional regulation, and its own proteins, and of course all of these differ by cell type and by biogeography, just as the species, strains, and biomolecular activity we can measure in the microbes themselves do. This hasn't yet touched on more exotic mechanisms of regulation: we heard a little about viral epigenetics today, but I don't think we'll hear much yet about either host or microbial epigenetics in the microbiome. All of these are the types of information we'll need to get functional models in the sense that we have for single model organisms, for example the signaling, regulatory, and metabolic pathways built up over years or decades of work in those organisms. To give just a few vignettes of how we can start integrating different types of multi-omic data to get a better sense of functionality in the microbiome: I've mentioned a few
times metatranscriptomics, and this really is starting to be a more accessible functional data type; you can run it alongside shotgun DNA sequencing fairly easily now. There is a great deal of basal transcriptional activity in the gut: this is from a study we did of the gut metatranscriptome, in which about 50% of the metagenome was not differentially regulated but basally transcribed. There are of course exceptions, and this is one we've highlighted here, in which a low-abundance organism, Methanobrevibacter, over-transcribes specifically its methanogenesis pathways, by about 100-fold. We know both the functionality and the organism responsible for it, and it very characteristically over-transcribes this particular set of pathways. We heard a lot earlier about metabolic profiling differentiating subtypes within IBD, which I won't go into very much; that work is with Janet and Ramnik Xavier at MGH and the Broad. We've had several projects with Ramnik now looking at the host's functional contribution, often specifically in IBD, and I'll talk about that on the next slide. We have a couple of examples where we've started to look at the interaction of host transcription with microbial presence, absence, and transcriptional activity in pouchitis-associated inflammation. And we heard a lot from Ruth this morning about the longer-term host genetic contribution to disease risk and to the composition of the microbiome. One comment I do want to make on functional data, and its contribution to our understanding of microbiome structure as well: there have been a couple of comments made today about the uniqueness or variability of the microbiome, which is something we've found more tractable and easier to understand by looking at it with strain-level shotgun data, in this case in terms of uniquely and stably identifying an individual from their
microbiome composition over time. It's easy to see that there are differences among individuals based on amplicon profiling, but differences do not equate to uniqueness, nor to sufficient stability to identify someone over time. Again, this is an area where I could give a whole separate talk, but it has turned out, for us at least, to be very difficult to stably identify someone over an 8- to 12-month period using amplicon sequencing data, and much easier using optimized strain-level markers from shotgun sequencing, which can reach up to about 75% accuracy over that year or so for less exposed habitats like the gut. Of course, as we saw in the example of strain-level persistence this morning, this drops in microbial habitats that are more exposed to disruption. To wrap up with an example of how these functional analysis techniques can be combined in an application to human health, I'll talk a little about IBD, which we heard a lot about earlier in Janet's talk. It's a condition with a really rich history of investigating the involvement of the microbiome in Crohn's disease and ulcerative colitis. There were some good early indications that the ecology of the gut microbiome was disrupted in IBD, and over time this has evolved into an understanding of specific clades that might be involved and how they might vary between IBD subtypes. One of the things I like about Crohn's and colitis as an example of a microbiome-linked health condition is that it really parallels the kinds of models we've built up in genetics. There are Mendelian genetic diseases, in which we know one gene of large effect is involved, and there are complex genetic disorders like IBD, in which many genes of individually low effect are involved. Microbiology provides a nice parallel, in which infectious diseases may involve one microorganism of individually large effect, but IBD remains a complex disease here too, in which there seem to
be many microbes, or the whole community, involved, each of individually small effect. So we wanted to understand how these structural changes in IBD were actually associated with the disease itself, rather than with other environmental perturbations that might be going on, like disease treatment, and which of them might be functional: what processes or pathways might be driving these differences, rather than just the overall ecology of the resulting differences. I'm running out of time, so I won't go into a lot of depth on this particular study. We set out to answer these questions with Ramnik Xavier and Bruce Sands in a pair of cohorts totaling about 200 individuals. If I start from an overview of the differences between subjects in terms of their microbial composition, it's easy to see that the different IBD subtypes were, in this particular case, not the major factor driving differences in microbial community structure. Because we had really nice metadata for these cohorts, we could unravel which other environmental and treatment effects distinguished microbial community composition among these subjects, as opposed to disease, and again as opposed to microbial functional differences. I think Rob will talk a good bit more later about the effects that exogenous factors such as sample handling and treatment can have on our measurements of microbial community structure. What I'll wrap up with instead is what we were able to determine about microbial function in these conditions, in this case using not a functional measurement but a functional analysis method, PICRUSt, which we developed to make general inferences about community function beginning just with composition, here 16S sequencing data. So even if you just have amplicon profiling, you can still make an educated guess about microbial community
function. The basic process, again, takes advantage of those isolate genomes. When we measure a community using 16S sequencing, sometimes, when we're lucky, our reference genome catalog contains exactly the microbe we detect by amplicon sequencing, or a very close relative. Other times we have a nearby relative, or, in the worst-case scenario, we can take advantage of the many genomes available from isolate sequencing, perform an ancestral state reconstruction, and make an educated guess, with known confidence intervals, about the gene content of each microbe we observe by 16S sequencing. By doing this for the entire microbial tree of life and then multiplying our taxon abundances by their putative or known gene content, we can get a good estimate of the virtual metagenome, the abundances of genes in a microbial community, beginning from just this phylogenetic information. We've performed a wide range of validations on this process. In general, it's "easy" to get a good correlation between 16S-based predicted abundances of microbial genes and measured abundances of those genes in validation data where we have paired 16S and shotgun data sets for the same samples. We performed these validations on hundreds of such paired examples from the Human Microbiome Project, where we tend to achieve very high correlations between predicted and measured abundances, and carried them out in other communities as well, where we generally have lower reference genome coverage. I'm keeping an eye on the time, thanks. So to wrap up: one can recover general community function (again, this is just a prediction), and we were able to determine this way that over six times as many processes as microbes are disrupted in IBD. I won't go into the details, but this allows us to start assembling pathway maps for the whole community specifically as they are perturbed in a dysbiotic state such as Crohn's disease and ulcerative colitis. So some of the gaps that
we need to close, I think, to take our next steps in functional analysis of the microbiome start with tools that make these analyses really easy, and these can span a wide range of modalities, as I mentioned earlier: systematic, cross-species protein function and pathway cataloging; quantitative models that go from single organisms to whole-community interactions in metabolism, signaling, and regulation; and better identification of host-microbe and microbe-microbe interaction mechanisms. We've heard about this earlier today, and I'm sure we'll hear more over the next few days. We haven't heard a lot yet about temporally resolved microbial biogeography: where are these bugs, and when, so that they can interact with each other? And we haven't heard much yet about in vitro models that allow controlled experiments on functionality in systems made to parallel the human microbiome. Finally, a point I didn't talk about at all today, but which is just as important as functional profiling, is maintaining high standards for both the experimental and the analysis protocols we use on these complex high-throughput data, to ensure reproducibility and translational quality of the results. So that's my last slide. I want to thank the folks in my lab who worked on these examples and our collaborators, both those I mentioned and one I haven't yet, who have been central to almost all of these studies, as well as the Human Microbiome Project and our funders, including the NIH. Thanks so much. Thank you, Dr. Huttenhower. Do we have one question? I didn't expect to get time for it. Yeah, you don't. Okay, so thank you to all of the speakers. We're going to have a break until 3:30, when we'll start promptly again, 3:30, and please visit our sponsors for today's meeting in the right corner; they are excited to have questions from you.