Thanks, Michelle, and I should be able to rise to your challenge. I'm from New Zealand originally, and we're allegedly among the world's fastest native English speakers, so I guess we'll be putting that to the test. Anyway, what I'm going to talk about is how we can move from association patterns, many of which are intriguing but only suggestive, to ideas about causation and mechanism. As we've seen over and over again today, there's a tremendous amount of diversity in the microbiome between different subjects. In this 2005 paper from Dave Relman's lab, you see a lot of bands that are unique to the groups of samples from individual subjects A, B, and C. We confirmed this with Jeff Gordon's group in a study of lean and obese twins, where we see very large phylum-level differences between individuals, and MetaHIT has seen the same thing in other populations. Even the species they define as core vary in abundance by orders of magnitude between people: something that makes up 10% of the microbiome in one person can be as rare as one cell in 10,000 in another. And of course the HMP has extended this to other body sites. Keata showed you some of this data before, colored by bacterial genus, where essentially, even within one body site, any genus that you find a lot of in some people is very rare in others. This leads us to some of the key questions in the field. First, we need to figure out which aspects of this diversity actually matter: are we seeing these differences because they matter, or because they don't? Second, how can you tell cause from effect? That is, which of the differences you see between individuals are causes, as opposed to consequences, of the phenotypes they differ in? And finally, a lot of people want to know how they can leverage large existing investments in data sets such as the HMP, which have cost millions of dollars and could really complement their own studies.
So let's go through these one by one, starting with the first question: how can I tell which aspects of diversity matter? As we've been hearing throughout the conference, there are a lot of reasons why you should care about your microbes, and I've just picked out a few of my favorite stories here. Apparently they determine whether Tylenol is toxic to your liver. Pete Turnbaugh is going to talk more about drug interactions, and Luca Essel, one of my students who's also been working with Pete, has a poster on that as well if you want to hear more about drug effects. In Drosophila, at least, they determine mate preferences. This hasn't yet been established in humans, but maybe it's just a matter of time. There's all this exciting gut-brain axis work going on. And it turns out that your anecdotal experiences of camping are true: different people really do differ tenfold in how attractive they are to mosquitoes, and what the mosquitoes are zooming in on is basically volatile organic compounds (VOCs) produced by your skin microbes. But to get to these stories, what you have to start with is a whole lot of sequence data and phylogenetic trees. What I'm showing you here is just the first nine out of 130,000 sequences from a study we did recently with Noah Fierer's group. Here's a piece of the phylogenetic tree file, and here's the visualization of the tree, and you can tell immediately that you've really got your work cut out for you, right? It doesn't help you that much if I tell you the study design. We were looking at biogeography, but not on the grand scale of Darwin and Wallace and the other 19th-century explorers who mapped macroorganisms across the Earth. Being microbiologists, we did it on a smaller scale: we looked at computer keyboards. Initially, the idea was: does the space bar have more kinds of microbes simply because it's larger than the other keys? Are the keys a desert where a few microbes survive, compared to the lush valleys of your fingerprints? And so on.
Another view of the same data makes the main things going on immediately clear. Now each point represents an entire microbial community, either from a key or from a fingertip. If I color them by person, you see that the first, the second, and the third person each have all their keys and all their fingertips clustered together, whereas if I color the points by key (blue) versus fingertip (red), you see no separation. So you can tell immediately that each of us has a unique skin community, and that we transfer it from our fingertips to our keyboards as we type. As part of the same study, we showed we could match the palm of a hand to the computer mouse that person uses with up to 95% accuracy. This came out in PNAS a couple of years ago, but more importantly, it was on CSI: Miami, so you really know it's true. So how do we do this? Greg Caporaso and a number of others who were in my lab at the time (Greg's here at the meeting, in the second row) developed a pipeline called QIIME, which stands for Quantitative Insights Into Microbial Ecology. Like the tools Curtis showed you, it makes it much, much easier to integrate analyses across hundreds of samples. What I'm going to show you in this talk is essentially all 16S data, although the same tools work for metagenomic data and other multi-omic data as well. Basically, it provides a complete pipeline: once your samples are sequenced, it takes the data right off the instrument, performs a whole bunch of statistical analyses, and winds up with pictures and statistics that you can tell these stories about. This lets us start getting at the effect size of different phenomena. For example, when we reanalyzed Dave's 2005 data, what we're looking at here is six sites along the distal gut from three different subjects, plus a matched stool sample. When I color those by subject, you see all the samples from the same subject cluster together nicely.
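One way to make the "each point is a whole community" idea concrete, and to show how a keyboard or mouse sample can be matched back to a person, is to compute a pairwise dissimilarity between abundance profiles and assign each query sample to its nearest reference. The sketch below is a minimal illustration under stated assumptions, not the published method: it uses Bray-Curtis dissimilarity as a stand-in for the phylogenetic distances (such as UniFrac) that QIIME-style analyses typically use, and the taxon counts and sample names (`alice`, `mouse_1`, and so on) are made up.

```python
from typing import Dict, List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no taxa are shared."""
    if len(u) != len(v):
        raise ValueError("profiles must list the same taxa in the same order")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def match_samples(
    queries: Dict[str, List[float]], references: Dict[str, List[float]]
) -> Dict[str, str]:
    """Assign each query community (e.g. a mouse swab) to its least-dissimilar reference (e.g. a palm)."""
    return {
        name: min(references, key=lambda ref: bray_curtis(profile, references[ref]))
        for name, profile in queries.items()
    }


# Hypothetical taxon counts (same four taxa, same order) for three palms and three mice.
palms = {"alice": [40, 5, 0, 55], "bob": [0, 60, 30, 10], "carol": [10, 10, 70, 10]}
mice = {"mouse_1": [35, 8, 2, 55], "mouse_2": [2, 55, 33, 10], "mouse_3": [12, 5, 75, 8]}
matches = match_samples(mice, palms)
print(matches)
```

With real data, the accuracy of this kind of matching is simply the fraction of query samples assigned to the right person.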
If I color them by site, you can see the stool is offset. But the distance between the stool and the other samples from the same subject is less than the distance between subjects. So we can tell that this technical variability is less important than the biological variability we're hoping to uncover, and this will be a recurring theme throughout the talk. Liz Costello, who was in my lab at the time and is now working with Dave, put together this first overall map, as you've seen a couple of times already. The reason I'm showing it again is to point out that we can use this kind of display to figure out which variables are important and which aren't in the same data set. What you can see immediately is a lot of clustering by body site: the mouth and the gut are very distinct from the skin, for example. When we color the samples by sex, by person, or by day, you see much less separation by those variables, although antibiotics have a large enough effect to be visible even on the scale of differences between human body sites. We can use this kind of thing to start putting an explicit scale on effects, using the distances between microbial community samples. Here a bigger bar means a greater distance. Two samples from the same body habitat are less different than two from different habitats. Within a habitat, two from the same person are less different than two from different people. And within a person and habitat, two samples a day apart are less different than two samples three months apart. The mouth was the most stable habitat we looked at, the skin the least stable, and the gut in between. But this is kind of frustrating with a handful of time points, right? Because what you'd really like to do is figure out dynamics.
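Coloring the same ordination by body site, sex, person, or day is a visual comparison of effect sizes. A common way to put a number on that comparison is PERMANOVA (Anderson, 2001), which partitions squared pairwise distances into within-group and between-group parts. The talk does not say which statistic sits behind these plots, so treat this as a minimal sketch: it computes only the pseudo-F statistic and the fraction of variation explained (R-squared) for one grouping, whereas a real analysis would also permute the labels to get a p-value. The 4-sample distance matrix is invented so that body site dominates and sex barely matters.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def permanova_stats(dist: List[List[float]], groups: Sequence[str]) -> Tuple[float, float]:
    """Return (pseudo_F, R_squared) for one grouping, given a square, symmetric distance matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    if a < 2 or a >= n:
        raise ValueError("need at least two groups and fewer groups than samples")
    # Total sum of squares: sum of squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for label in labels:
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_between = ss_total - ss_within
    if ss_within == 0:
        return float("inf"), 1.0
    pseudo_f = (ss_between / (a - 1)) / (ss_within / (n - a))
    return pseudo_f, ss_between / ss_total


# Samples: gut/male, gut/female, skin/male, skin/female (hypothetical distances).
dist = [
    [0.0, 0.2, 0.8, 0.8],
    [0.2, 0.0, 0.8, 0.8],
    [0.8, 0.8, 0.0, 0.2],
    [0.8, 0.8, 0.2, 0.0],
]
site_f, site_r2 = permanova_stats(dist, ["gut", "gut", "skin", "skin"])  # large: site separates samples
sex_f, sex_r2 = permanova_stats(dist, ["M", "F", "M", "F"])              # small: sex barely does
print(site_f, site_r2, sex_f, sex_r2)
```

Ranking variables by R-squared on the same distance matrix is one way to build the kind of "explicit scale" described above.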
We've heard about this several times already during the meeting. This is a viewer we use for it: the gut is down there, the mouth is over there, and so forth. If I rotate this around to give you an explicit axis for time, you can kind of see the three-month separation, but you can't see the one-day intervals. It looks like there might be some changes, but it's kind of hard to figure out what they are. So what you'd really like to do is go back to some of the same people, as Dave mentioned earlier, and sample them every single day for about a year and a half. With the switch from 454 to Illumina, we could afford to do that, and it looks like this. The correlation between the 454 and the Illumina data is really nice, right? The difference between platforms is a lot less than the difference between body sites. When you have data this dense, Greg and Antonio figured out how to process it and animate it. What we're doing now is using the Costello et al. data as the frame of reference, all these little points, and then looking at an animation in which each frame is one day, over six months, of two people's skin, mouth, and gut communities. You can immediately see how variable the skin is, how static the mouth is, and that the gut is intermediate. This is all kind of fun, but you might wonder: can you do the same thing for something that's actually clinically relevant? With Alex Khoruts and Mike Sadowsky, we recently did the same sort of thing for Clostridium difficile infection. What we're looking at here is a bunch of samples from C. diff patients and a bunch of samples from healthy subjects, and what we're about to look at is the result of fecal microbiota transplant (FMT). What you can see is that almost immediately, the subjects receiving FMT wind up much closer to the healthy distribution.
And then they bounce around within that healthy distribution, but they maintain that healthy state for the four weeks they were followed in this animation. So, with higher-throughput sequencing and more samples, we now have the prospect of tracking in a very detailed way the changes in the microbiota associated with health and disease. You can imagine building an index that clinicians could use to track this kind of property and ask how stably someone stays in the healthy state after FMT. This might lead you to wonder where our microbes come from in the first place. If you have dogs or kids, as I do, you probably have some dark suspicions about that, all of which, it turns out, are completely true. We can match people to their pets pretty accurately according to the microbes they share, as well. With Maria Gloria Dominguez-Bello, who's going to be speaking as a keynote tomorrow, so I won't say too much about this, we were able to show that babies' first microbes depend very heavily on delivery mode. Basically, if you're delivered vaginally, all of your microbes resemble the vaginal microbiota; if you're delivered by C-section, all of your microbes look like skin. Fascinatingly, despite the very high degree of differentiation between adult body sites that you've heard about several times during this meeting (here are the mothers' mouth samples, the mothers' vaginal samples in red over here, and skin samples in dark blue), the babies' body habitats, at least initially, are not differentiated. You can see that all of the body habitats from all the vaginally delivered babies cluster with the vaginal samples, and all the body habitats from all the C-section babies cluster with the skin samples from all the mothers. So again, we get a sense of the scale of effect sizes. And as I think Ruth mentioned, we can also start to understand how the microbiome develops in infants.
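The post-FMT index imagined above is not specified in the talk. As a purely hypothetical sketch, one could score each sample by its mean Bray-Curtis dissimilarity to a set of healthy reference samples, call it healthy-like if that score is no worse than the worst score any healthy reference gets against the other references, and report the fraction of sampled days spent in that state. The function names, the cutoff rule, and the toy counts are all assumptions for illustration; a clinical index would need a validated reference set and threshold.

```python
from statistics import mean
from typing import List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles over the same taxa."""
    total = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / total if total else 0.0


def healthy_index(sample: Sequence[float], healthy_refs: List[List[float]]) -> float:
    """Hypothetical index: mean dissimilarity to healthy references (lower = more healthy-like)."""
    return mean(bray_curtis(sample, ref) for ref in healthy_refs)


def healthy_cutoff(healthy_refs: List[List[float]]) -> float:
    """Hypothetical cutoff: the worst index any healthy reference gets against the others."""
    if len(healthy_refs) < 2:
        raise ValueError("need at least two healthy references")
    return max(
        healthy_index(ref, healthy_refs[:i] + healthy_refs[i + 1:])
        for i, ref in enumerate(healthy_refs)
    )


def fraction_days_healthy(daily: List[List[float]], healthy_refs: List[List[float]]) -> float:
    """Fraction of daily samples whose index is within the healthy cutoff."""
    cutoff = healthy_cutoff(healthy_refs)
    in_state = [healthy_index(day, healthy_refs) <= cutoff for day in daily]
    return sum(in_state) / len(in_state)


# Hypothetical counts over four taxa: three healthy donors and four daily patient samples.
healthy = [[50, 30, 20, 0], [45, 35, 20, 0], [55, 25, 20, 0]]
patient_days = [[0, 5, 5, 90], [40, 30, 25, 5], [50, 30, 20, 0], [48, 32, 20, 0]]
frac = fraction_days_healthy(patient_days, healthy)
print(frac)
```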
We see what looks, on this scale, like a very steady progression from the meconium towards the mother over the first two and a half years of life, coupled with a steady increase in diversity. But all you can really see on this plot is change in resemblance to the mother's stool; it's hard to tell what's going on, and I'll return to this later. We can also do the same sort of thing across cultures, which I'm basically going to skip since Dave went through it. I do want to make the point that we converge on very different end points in Americans versus Amerindian and Malawian populations. In this context, it's important to remember that the HMP, despite being a very valuable project, only covers a small part of human diversity, right? It's only looking at healthy Western adults, and there's all the rest of the space of possible microbial configurations that it does not yet cover. What we're starting to do at the moment is to cover the rest of this diversity through crowdfunded projects like American Gut. What's exciting about this is that it gives members of the general public the ability to participate directly in microbiome research, and it has generated an enormous amount of excitement by this point. This is Daniel McDonald in my lab handing out kits at the PGP meeting earlier this year. So far we've raised about half a million dollars and had about 6,000 people sign up. We've had 1,200 kits returned, including about 400 samples from participants in the Personal Genome Project who either have or will have their complete genome sequenced. You can imagine how spectacular this will be with tens or hundreds of thousands of samples, giving a really detailed sense of what kinds of microbes are out there, linked to all kinds of different conditions. This leads us to the critical question: how can I tell which of this diversity matters, in a sense that's going to allow us to tell cause from effect?
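The talk doesn't say which diversity measure was plotted for the infant time series. One common choice for within-sample diversity is the Shannon index, H = -sum(p_i * log p_i) over taxon proportions p_i; phylogenetic diversity is another common option. A minimal sketch, with invented counts standing in for an early, low-diversity sample and a later, more even one:

```python
import math
from typing import Sequence


def shannon(counts: Sequence[int], base: float = math.e) -> float:
    """Shannon diversity H = -sum(p_i * log(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero counts contribute nothing (p * log p -> 0)
            p = c / total
            h -= p * math.log(p, base)
    return h


early = [90, 5, 5]            # hypothetical: dominated by one taxon
later = [30, 25, 20, 15, 10]  # hypothetical: more taxa, more even
print(shannon(early), shannon(later))
```

In practice, samples are usually rarefied to the same sequencing depth before comparing diversity, since deeper sampling tends to find more taxa.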
There's a sense that we should be establishing causality with Koch's postulates. Basically, the idea is that you find the microbe in sick individuals, you isolate it, you reintroduce it into healthy individuals and make them sick, and then you recover it from those newly sick individuals. But even Koch didn't believe his own postulates, right? Even as far back as his original work, he noticed that a lot of subjects were asymptomatic carriers of what we think of as pathogens. This was reinforced by the HMP work on pathogen carriage, where a lot of alleged pathogens turn out to be prevalent in the healthy population. Obviously your IRB is not going to let you try Koch's postulates in humans, but we can use mice, and one thing that's been especially effective is the use of gnotobiotic isolators. These are the ones in Jeff Gordon's lab at WashU, which I'll describe some research from, and we also collaborate with a number of other gnotobiotic facilities around the country. Basically, you're growing the mice in a bubble with no microbes of their own, and you can then introduce a defined microbial community from a person, or specific strains. So do these differences matter? Well, in mouse models they matter a lot. These are the ob/ob leptin-mutant mice, which have a defect in an appetite hormone, so they're much heavier than a regular mouse. They also have a profound shift in the ratio of two of the major groups in the gut, the Firmicutes to the Bacteroidetes. Now, from this data alone you can't tell whether that shift in the community is a cause or an effect of the obesity.
But in additional work in Jeff's lab, they took fecal pellets from the leptin mutant and transplanted them into a genetically normal mouse with no microbes of its own, and found that the genetically normal mouse gained weight substantially faster than an equivalent mouse inoculated with fecal pellets from a normal mouse, showing that you could transmit the adiposity phenotype by transmitting the microbes. More recently, with Andrew Gewirtz's group, we looked at a different mouse model, the TLR5 knockouts, which are unable to make Toll-like receptor 5, which normally recognizes bacterial flagellin. Again you see an obesity phenotype: larger fat pads in males and females, high triglycerides, high cholesterol, high blood pressure. And again you see a profound alteration in the microbiome between the knockouts and the wild types, and again you can transmit it to a genetically normal mouse by transmitting the microbiota. What's fascinating here, though, is that this is not an energy-harvest phenotype, which is what it appears to be in the ob/ob mutants: in the TLR5 knockouts, if you do bomb calorimetry on the fecal pellets, you find the same amount of energy left behind. Instead, it seems to be a behavioral phenotype. The TLR5 knockouts eat more than a regular mouse does, and germ-free mice that receive the TLR5 knockout community also eat more than a regular mouse. Andrew found that you could cure them with antibiotics, or by putting in their cage only the amount of food a normal mouse would eat. Just as a reminder, we have about 10 times as many microbial cells as human cells associated with our bodies. Andrew had this cartoon commissioned to reinforce the point that perhaps they are outvoting you when it comes time to make choices in the cafeteria line. You might be wondering whether these results extend to humans.
Some work that Ruth and Pete did when they were in Jeff's lab suggests that perhaps they do: subjects on a fat-restricted or carbohydrate-restricted diet for a year came to resemble lean subjects more, microbially. And fascinatingly, Dan Knights, who's now working with Ramnik Xavier, showed when he was in my lab that you could classify people as lean or obese with 90% accuracy using their microbiome. This might not seem like a very impressive test, given how accurately you can do it with a mirror and a set of bathroom scales, right? But it does establish the plausibility that microbes are involved in the mechanism. What's interesting in this context is that if you take all the genes ever identified by GWAS and build a predictive model for obesity, the area under the curve is only about 58%. So your microbial genes are far more predictive of whether you're obese than your human genes are. So can we actually do something with this and change the microbiome in a way that lets us cure disease? Where we're taking the work on the microbiome associated with disease is to developing countries. This is a photo that Tanya Yatsunenko, one of Jeff Gordon's grad students, took at one of the field sites we work with in Malawi; this is a malnutrition clinic. You'll notice that although food is scarce in this population, people do have access to cell phones, for example. The remarkable fact is that there are now over 5 billion active cell phones on Earth, which is pretty impressive when you consider there are only about 7 billion people. Of the billion poorest people in the world, 20% have their own cell phone. I was reading a couple of weeks ago that the current penetration of cell phones in Kenya is such that 95% of households have one. So this is a technology that has penetrated everywhere, and a large part of the reason for that is Moore's law, the decline in the cost of computation.
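To read the 58% figure: the area under the ROC curve (AUC) is the probability that a randomly chosen obese subject gets a higher risk score than a randomly chosen lean one, so 0.5 is chance and 1.0 is perfect separation, and an AUC of about 0.58 is only modestly better than a coin flip. Note that the 90% figure for the microbiome classifier is an accuracy rather than an AUC, so the two numbers are not on exactly the same scale. A minimal sketch of the rank-based AUC definition (ties count as half), with made-up scores; it is not the model used in either analysis mentioned above.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive (1) and one negative (0) label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores; label 1 = obese, 0 = lean.
print(roc_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0: every positive outscores every negative
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75: 3 of 4 positive-negative pairs ordered correctly
```

This pairwise loop is quadratic in the number of subjects; for large cohorts the same value is usually computed from ranks (the Mann-Whitney U statistic).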
The digital signal processing that supports that technology is so cheap that it's effectively free. And this is true not just in Malawi but basically everywhere: I took this photo in Bangladesh a little over a year ago, and even in relatively bad parts of Dhaka you can buy a sizeable hunk of a cow and then, right next to that, have your cell phone minutes recharged. One thing that's getting cheaper even faster than computation, though, is DNA sequencing. I think you're all probably sick of this diagram showing the decrease in the cost of computation versus the decrease in the cost of sequencing. It's on a log scale, so in the time that computation has gotten about 100-fold cheaper, DNA sequencing has gotten about a million-fold cheaper. What we're doing at the moment is studies in humanized mice. The idea is to take fecal samples from individuals, from our Gates-funded study on malnutrition and our NIDDK-funded study, again with Jeff Gordon, on obesity, transplant those microbes into germ-free mice, and then look at how those mice gain or lose weight depending on whose microbes they got. This works really nicely. This is data from Jeff's lab: each interval is about two weeks, there are five different mice, and this axis is principal component 1 (PC1) of the community. Starting off on mouse chow, you switch them over to the Malawi diet, which is 90% corn and 10% leaves. You see a rapid decline along PC1, consistent across mice, that you can model with a single exponential decay curve. You switch them to ready-to-use therapeutic food (RUTF), a peanut-butter-based supplement that's used in the clinic, and they start to recover. You switch them back to the Malawi diet and, very reproducibly, they go back to the same state. The first data from this we published in Science earlier this year; Michelle Smith in Jeff's lab was the first author on that.
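The talk says the decline along PC1 after the diet switch can be modeled with a single exponential decay, but doesn't describe the fitting procedure. One simple sketch, assuming the model PC1(t) = a + b * exp(-k * t), where a is the plateau the community settles to, b the initial displacement, and k the decay rate per time step (here, per roughly two-week interval): for a fixed k the model is linear in a and b, so the sketch grid-searches k and solves ordinary least squares for a and b at each candidate. The trajectory below is synthetic.

```python
import math
from typing import Optional, Sequence, Tuple


def fit_exponential_decay(
    t: Sequence[float], y: Sequence[float], k_grid: Sequence[float]
) -> Tuple[float, float, float, float]:
    """Fit y ~ a + b*exp(-k*t); returns (a, b, k, sum_of_squared_errors) for the best k in k_grid."""
    n = len(t)
    my = sum(y) / n
    best: Optional[Tuple[float, float, float, float]] = None
    for k in k_grid:
        x = [math.exp(-k * ti) for ti in t]
        mx = sum(x) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:  # k == 0 makes exp(-k*t) constant, so b is not identifiable
            continue
        # Ordinary least squares for a and b with the decay rate held fixed.
        b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        a = my - b * mx
        sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[3]:
            best = (a, b, k, sse)
    if best is None:
        raise ValueError("no usable decay rate in k_grid")
    return best


# Synthetic PC1 trajectory: plateau 1.0, initial offset 3.0, rate 0.5 per interval.
times = list(range(11))
pc1 = [1.0 + 3.0 * math.exp(-0.5 * ti) for ti in times]
a, b, k, sse = fit_exponential_decay(times, pc1, [i / 100 for i in range(1, 201)])
half_life = math.log(2) / k  # intervals for the displacement from the plateau to halve
print(a, b, k, half_life)
```

A fitted rate or half-life gives a single number per mouse that can be compared across diets or donors; with noisy real data, a finer grid or a nonlinear optimizer would refine k.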
Basically, what you can see here: we're looking at members of a twin pair who are either healthy (the red line) or have kwashiorkor, a severe nutritional deficiency (this blue line). The mice that receive the kwashiorkor community do really badly: they lose more than 30% of their body mass, and they recover with RUTF. This is coupled with a very distinct change in the microbiota and also in the functional response, although I won't get into that. So basically, we can take microbial communities from individuals and put them into mice, and how those mice respond to treatment depends on whose microbial community they got. Taking this a step further, we can now use what are called personalized culture collections, an approach really pioneered by Andy Goodman, who's here at the meeting, and Jeremiah Faith when they were in Jeff's lab. Instead of transplanting the primary fecal specimen, you dilute the cells down into 384-well plates, coax them through a few cell divisions, and then sequence their genomes. Unlike single-cell sequencing, you don't destroy the whole organism when you sequence it; you only take part of the culture, so you have the rest left over and could potentially mix isolates together to test ecological hypotheses about which species are important. To put this in perspective, wolves greatly affect the Yellowstone ecosystem, but we didn't find this out by going out to Yellowstone, taking a cubic mile of stuff, grinding it up, and looking for the wolf DNA, right? The reason we know is that people shot all the wolves, the ecosystem changed profoundly, then the wolves were reintroduced and the ecosystem came back. So the ability, instead of trying to get rid of a species with antibiotics, to leave it out of the
community in the first place, see whether there's a functional difference, and then reintroduce it and see whether the function changes, is really exciting. Prospective longitudinal studies can also be critical for testing causality. The idea is that you enroll subjects before disease onset and figure out whether you can predict who's going to get the disease and who isn't from the microbes at their earlier time points, and there are some examples, like risk, TEDDY, BMMI, and so forth, that are doing this now. As Robbie Klein pointed out to me earlier, for this, instead of Koch's postulates you want to use Hill's criteria: looking at the strength and consistency of the association, whether the association is plausible based on what you know about mechanism, and so forth, rather than necessarily being able to prove transmission. In the last couple of minutes, I'm going to say a little bit about leveraging existing investments and large data sets. There are a lot of technical factors that could affect readouts of the microbiome, including storage conditions, DNA extraction conditions, PCR primers, which sequencing platform you used, analysis software, and all the rest, and we know for sure that all of these factors are going to affect the results. So the question is: will they affect them so much that you can't see the biological pattern you're looking for? One thing that's really exciting in this respect is standards like those produced by the Genomic Standards Consortium and by the Earth Microbiome Project, which allow us to do a lot more data integration by having standardized data sets where we often don't need to worry about these things. Another key aspect is being able to take QIIME into the cloud: NIH is currently running a cloud pilot through NIAID with Amazon that makes these techniques accessible to a lot of people who don't necessarily want to set up their own cluster. For example, when we were in Bangladesh we found that
it's very hard to set up computational infrastructure in Dhaka, whereas you can basically give Amazon your credit card number (and I'm guessing they already have your credit card number, right?), put together an instant cluster, run your jobs on it, and then let it dissipate, without having to set up your own supercomputer. Anyway, the vision here is to see your samples in the universe of other samples. There's a data integration workflow that I'll skip in the interest of time, but when we use it, what we can see is that the HMP clustering patterns across different body sites match very nicely with the community clustering pattern. This was work with Curtis's group and with Ruth Ley's group. On the scale of the differences between body sites, all of the technical variability between these different studies didn't actually matter all that much. This kind of makes sense, right? On that scale, the different body sites we're seeing here are very distinct from one another, and that outweighs things like the differences between the V1-V3 and the V3-V5 primers in the HMP: the primers have an effect, but it's smaller than the body-site effect. However, as soon as you start going into stool specifically (this is stool and mucosa), you start to see things you don't like. Here we have the stool clustering into two completely separate categories, and what's going on is that we're seeing the separation between the V1-V3 and the V3-V5 primers in the HMP, which greatly outweighs the variation we're trying to see in these individuals, like whether they have IBD or colon cancer. So when you start asking more specific questions, you suddenly care a lot about this technical variation, which will otherwise obscure what's going on. As Cata talked to you about for integrating multi-omics data, it's just as important, when you're integrating data of the same type, to have some
consistency in how you do that integration. Now, this is another view of some more refined data where we're just looking at the Illumina platform. When we look across body sites, you can see a lot of separation, and these different experiments, including in this case the HMP experiments, which were on 454, separate not by project but by body site, like you would hope. But again, when you go into stool, and in this case we're restricting it to Illumina data (you can see we have a ton of metadata about these individuals), what you see is pretty clear clustering by study rather than by clinical variables, which is what you'd want to see. This is very disappointing, because we know there are some interesting clinical variables in here. For example, when we went to Bangladesh, we tracked ourselves over time. What we're looking at here is the cross-body-site effect, which is very large, and if we look at country on this scale, there's basically nothing that separates the time we were in Bangladesh from anything else. But if you zoom in on the stool specifically and look at what happened with travel, you can see that the trip to Bangladesh caused some substantial shifts in the microbiota of some of our individuals, who then wandered off into different regions of configuration space afterwards and did not return, despite no treatment with antibiotics. So being able to integrate this kind of data across many studies is really important, because for this kind of experiment it's often difficult to get enough volunteers within one study to do it many times. To cut a long story short, taking the same data I showed you before, with that very clear per-study clustering, and doing some additional analyses with it, we're able to
bring it all back into the same distribution, and we can now go looking for potentially clinically interesting variables that vary within that cloud, including in the American Gut data, which we're cleaning up at the moment. One problem this points out, though, and you can see some examples of it here, is that there are a lot of cases where we have no data and a lot of cases where the data are not consistently annotated; for example, you have "NA" and "no data" as separate categories rather than integrated into the same category, and that's something we're cleaning up at the moment. So I'm going to talk a little about gaps, needs, and challenges, since I assume I'm about out of time. The major gaps we're facing are that we don't know what effect sizes to expect; the techniques for dynamic analysis are just emerging; we don't yet know whether prospective human studies or mechanistic animal studies will, in general, yield more information; and most of the sample information is not yet in standard formats and databases, including the SRA. What we need are standardized scales for effect size and better ways of doing power calculations for microbiome studies going forward, better ways to relate animal data to humans, prospective longitudinal studies, a huge library of microbial strains of known provenance that we can use for those kinds of microbial wolf-hunting experiments, and, as Curtis mentioned, better annotations at all scales. The challenges are: how can we balance subject privacy against the utility of having rich clinical metadata publicly available? How can we integrate data across omics levels in these longitudinal studies, and how much data do we need of each type? And how can we efficiently integrate the knowledge of gene and taxon functions produced by multiple researchers using multiple techniques? But I don't want to put you off doing this. Combining studies is hard, but as I
mentioned, some biological effects are really large. This is from a recent Genome Research paper we did with Janet and with Jeff where, even across different studies (this is colored by study), effects like those of age are still very visible against that background. And going back to Ruth's data set on infant development, you can get so much more out of it by combining it with the HMP data set. When I do that (this was put together by Antonio and Yoshiki in my lab), you can see immediately that the baby starts out very much in the vaginal distribution. These are all the adult samples from all the HMP subjects, showing the vagina, the skin, the mouth, and the stool. Initially you can see a whole lot of inoculation of the baby's gut with sequences from the skin and from the vagina, and then, probably with ongoing inoculation from the skin, an approach towards the adult gut microbiota. The time scale here runs from birth up to 27 months, and what you can see is an approach to the adult microbiota state, although it's much less stable than it looked on the previous slide. Fascinatingly, and this is very relevant to Marty's talk, you can see treatment with antibiotics for an ear infection, followed by a rapid recovery in the microbiota, so there's a lot of resilience there. By the time we get up to two and a half years, that infant is basically in the adult stool distribution. You could never have seen this by looking at the infant data alone, right? You can only get this by integrating that data set with the HMP. With that, I'd like to thank the large number of people currently or formerly in my lab, many of whom I've thanked throughout this talk; our many collaborators on various projects, including the C. diff project and the IBD project that I talked about briefly; and our various sources of support, including of course NIH and the HMP specifically. Thank you; I'd be delighted to take a question or two at
this time. Thank you, Rob, for a very nice talk as usual. When you travel, you have shown that there is a shift in your microbes, but of course there is also a shift in your diet. The first thing most people think is that you are getting new microbes from the local place, but how much of it is just the shift to the new diet you are eating there? That's a great question. In this case we're pretty sure it was a shift in the microbes, because we all got sick on the same day, about a week into the visit, and had some substantial diarrhea and so forth. So in that particular case we think it's really likely to be a specific microbial cause, and through the sequencing we know what that microbe was. In terms of dietary shifts, studies that have looked at diet over the short term, like the WWR one that I dropped from this talk in the interest of time, generally show that you can get a statistically significant effect in humans that's relatively small but visible; it's really diet over the longer term that causes a big shift and determines who goes where. Thanks. Thank you, Rob. My name is Maria Giovanni, I'm from the National Institute of Allergy and Infectious Diseases, and it is my great pleasure to introduce Owen White, who's from the University of Maryland School of Medicine and who's going to talk to us today about large data management, data standards, and data sharing. Owen.