First, I'd like to thank Lita Proctor for the invitation, and also the organizing committee. It's great to have a chance to present our research and also to have an opportunity to give my own opinion about some future directions in the area. So I'll be discussing multi-omics technologies and giving a few examples of how we use them in our own research.
But first, I'd like to give a little bit of background. We already saw this slide earlier this morning. I think we've seen it a couple of times already. And I think it's important to remember that the information we get from these DNA-based studies takes us only part of the way towards where we want to go. So if we look at these particular graphs, we can see the 16S data that shows different body sites and the variation in the microbial community composition between different individuals. So just as a reminder, these are different persons and different body sites. You can see there's considerable variability in the composition of the communities, whereas the functional genes are relatively constant at this very gross level of examination of function. So this is a DNA-based, COG-level examination of function, and things look very similar.
Sometimes, however, composition does matter. I did take away a few slides where I was going to show this, because I know that Alex Khoruts is going to give a talk about this later, so I won't talk about it. But we do have examples where we know, for example through fecal transplantation, that the community composition is very important for health. And so I'll let Alex talk about that.
So this is actually one of my own photographs. I was just in Greenland a couple of weeks ago, and it's great to have your own iceberg to show. But what we know is really just the tip of the iceberg. We have information, and quite a bit of information, about the community composition if we specifically focus on the gut environment.
We know a lot about which microorganisms are there, which are the most dominant species, and how variable it can be across human populations. So this knowledge is starting to really consolidate. What we don't know is what lies underneath the water. We don't really know what these organisms are doing, just at a very basic level. Most of the microbes are still not very well characterized. There are many that have not yet been cultivated.
So in order to understand function, we do have access to omics tools nowadays. And for those of you that aren't familiar with these technologies, I kind of refer to this as an omics pipeline. Depending on where you are in the pipeline, you can get different types of information about expression. The very beginning of the pipeline is where we've really focused a lot of our work so far, and it's on the composition. So using 16S sequencing, for example, to understand the microbial community composition. I would call that the microbiome. The next level is the metagenome, so sequencing not only the phylogenetic genes but also the functional genes, so we know the gene composition. The next step would be to look at the RNA. So which of those genes are expressed? Then the proteins: of those expressed genes, which ones are translated and form proteins? And then finally the metabolites, or the metabolome, and the metabolites are important for carrying out a lot of the reactions in our bodies.
So I'd like to think a bit about these different kinds of omics tools. If we think about genomics or metagenomics, this is information about gene content that has the potential for being expressed. So just because you see a functional gene, that doesn't mean that it's being expressed. It has the potential to be expressed. And what is particularly problematic is that the DNA that you're looking at could be extracted from dead cells or from dormant cells. So the cells don't even need to be active or even alive in order to get DNA.
I'm not saying that this information isn't important, but it's also important to keep in mind that this is a limitation.
If we go to the next level and look at RNA, that provides you with a snapshot of activity at that moment in time when the RNA is extracted. And it's also very important to understand that that's the expression profile under that given set of circumstances and conditions. And as we do know, cells are experiencing a lot of different kinds of regulation. And so not all genes are going to be transcribed, but at least you get information about activity at that moment in time.
If you look at the proteins, or the metaproteome, which would be the community proteins, or the complement of proteins, that provides evidence that this protein has passed all of the regulatory steps at the RNA level and also has been translated to produce the protein. A caveat with that is that, if you do detect a protein, you don't know in all cases whether it is actually active. However, the genes must have been transcribed and translated to produce the protein. And I would say that looking at the proteins is better for assessment of microbial function. This does require an annotated gene database, so you do need the metagenome information anyhow, because otherwise it's not possible to identify what the proteins are.
So what do we know from model microorganisms? I'm just going to go back a little bit here. This is an E. coli cell, and a lot of work has been done on systems biology of single organisms. And so this is a reference from Corbin in PNAS in 2003, where they detected a positive relationship between protein abundance and transcript abundance during exponential growth of E. coli. So it's a nice confirmation that the kind of information you get from RNA and protein is very consistent, and that you can use both kinds of data sets interchangeably.
However, if you look at the single cell level, and this is a reference from 2010 from Taniguchi, they found no correlation between messenger RNA and protein levels in single E. coli cells at a particular point in time. So I think that even with a single organism, we're still really at the beginning of understanding, at a systems level, how we can use these different kinds of information to understand function.
Now there hasn't been that much done in the microbiome yet using these different kinds of tools, but there has been quite a bit done in the marine environment. And what can we learn from other ecosystems? Well, this is a very recent paper from Mary Ann Moran, and what she did was she looked at the amount of macromolecules in a single milliliter of seawater. And here you can see the amount of genes, transcripts, and proteins from the same milliliter of seawater. So you can see that the abundances vary dramatically, and this is a log scale.
So this is a quote from Mary Ann Moran. She said that the most important factor responsible for the poor correlations between messenger RNA and protein is the long half-life of proteins relative to messenger RNAs. And I like it that she actually did these calculations, that a typical bacterial protein half-life is about 20 hours, which is about two orders of magnitude longer than a messenger RNA half-life. That means that most proteins persist in a bacterial cell long after the messenger RNAs that encoded them have been degraded. So this is important also to keep in mind if you're using a different omics method. And again, I think this is a reason that I like proteins, because at least they're going to give you more information about the history of activity of those cells.
So this was the first publication using this relatively new technology to look at fecal samples, so to understand what the protein complement was in human fecal samples.
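To put rough numbers on the half-life argument above, here is a minimal sketch. It assumes simple first-order (exponential) decay, takes the 20-hour protein half-life quoted in the talk, and derives an mRNA half-life of 0.2 hours purely from the "two orders of magnitude" statement. The function name `fraction_remaining` and the chosen time points are made up for this illustration.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of a molecule pool left after first-order decay."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours_elapsed / half_life_hours)


# Value quoted in the talk.
PROTEIN_HALF_LIFE_H = 20.0
# Assumption: "two orders of magnitude" shorter, i.e. 20 h / 100 = 0.2 h.
MRNA_HALF_LIFE_H = PROTEIN_HALF_LIFE_H / 100

# Compare how much of each pool survives with no new synthesis.
for hours in (1, 5, 20):
    protein = fraction_remaining(hours, PROTEIN_HALF_LIFE_H)
    mrna = fraction_remaining(hours, MRNA_HALF_LIFE_H)
    print(f"after {hours:>2} h: protein {protein:.3f}, mRNA {mrna:.3g}")
```

Under these assumptions, an hour after transcription stops only a few percent of the message is left while nearly all of the protein is still present, which is why a protein measurement reflects a longer history of activity than an RNA measurement.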
And I have to say that this technology has really been a major revolution in the use of proteomics for microbiome studies, because up to this point, everything was based on 2D gel separations and extraction of spots and sequencing those. This, however, is a shotgun approach. The sample is taken, and in this case we use differential centrifugation to extract the bacterial cells. The cells are lysed, the protein is extracted, and then directly digested with trypsin into fragments. Those fragments are separated by 2D LC-MS/MS, so it's completely gel-free, on this column, and then sent by electrospray into a very high mass accuracy mass spec. Now there are even better mass specs for this purpose, but this was in 2009.
So then you get these spectra, and those need to be searched against your databases. And this is where the metagenome data and also reference genome data are extremely important, because you need to have those annotated genes. And you rely on exact matches to understand what your proteins are. For those that can be identified, you can then predict their functions, but in the case of hypothetical proteins, it's also possible to look at the sequence and to be able to do a hypothetical protein identification.
So this is an average of the metagenomes that we had available at the time, just looking at average COG categories of function at the metagenome level and comparing that to the average COG categories for metaproteomes. And what we can see is that if you look at the metagenomes, it's a relatively even distribution of COG categories. But if you look at the proteomes, they're really enriched in certain functions, for example translation, energy production, and carbohydrate metabolism. And these are functions that you would expect to be dominating in the gut environment because the organisms have to, for example, metabolize carbohydrates.
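One way to quantify "relatively even" versus "enriched in certain functions" is an evenness index over the category counts. The sketch below uses Pielou's evenness (Shannon entropy divided by its maximum); this is a swapped-in illustration, not the analysis used in the study, and the category names and counts are invented purely to show the calculation.

```python
import math


def pielou_evenness(counts):
    """Shannon entropy of the counts divided by log(number of categories).

    Returns 1.0 for a perfectly even distribution and values near 0
    when one category dominates.
    """
    if len(counts) < 2:
        raise ValueError("need at least two categories")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    shannon = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return shannon / math.log(len(counts))


# Hypothetical counts per COG category (illustrative only, not real data).
categories = ["translation", "energy production", "carbohydrate metabolism",
              "replication", "cell wall", "unknown function"]
metagenome_counts = [110, 95, 120, 100, 90, 105]   # roughly even
metaproteome_counts = [300, 220, 260, 15, 20, 25]  # skewed to a few functions

print("metagenome evenness:  ", round(pielou_evenness(metagenome_counts), 3))
print("metaproteome evenness:", round(pielou_evenness(metaproteome_counts), 3))
```

With these made-up numbers the metaproteome scores lower than the metagenome, matching the qualitative pattern described: the gene content is spread across categories while the detected proteins concentrate in a few.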
So this is a good sign that the information that we're getting from the proteome is more indicative of the function that is actually being carried out in that system.
Another nice thing about doing the proteome, and I always say this, is that as a microbiologist I at first thought the human proteins were contaminants, but they actually turned out to be very useful, because you can study the microbiome's interaction with the host by looking at the human proteins. So we get the human proteins for free, at least the proteins that are attached to the bacterial cells, because we enrich the bacteria. And when we look at the human proteins, the largest groups of proteins are usually digestive enzymes and those involved in cell adhesion. However, we can see these very interesting proteins, including antimicrobial peptides. And this is just an example of one protein that was identified early on. It's DMBT1, which is thought to play a role in cellular immune response. These little blue bars just show the peptides that were lined up along this protein, or the gene for the protein.
So that's proteomics. What about metabolites? Well, metabolites are really the ultimate proof of processes and pathways that have occurred, and they're really the final signature of metabolic processes. The thing that's different about looking at the metabolites compared to the other kinds of omics is that it's not so easy to key a metabolite to a particular organism. You don't have a way to track back to a gene. Instead, you're dependent on massive data correlations.
So metabolomics is, of course, very important. When we consume food, the food is digested. You can have rather insoluble carbohydrates or more soluble polysaccharides and oligosaccharides. And depending on the organisms that encounter those in the intestine, you're going to have different kinds of metabolites that are produced.
And you can have primary degraders and also hydrogen utilizers that are consuming the hydrogen that's produced. And eventually, some of the metabolites are used by the community, but some are actually taken up into the systemic circulation and can have impacts on the body.
So that's a background about the omics technologies. Now I'll give you a couple of examples of projects that we've carried out where we used multi-omics approaches. The first is for IBD cohorts. We have a twin cohort and also a longitudinal study where we looked at microbiomes, metagenomes, metaproteomes, and metabolomes. And the second is the dietary study, which is looking at microbiomes, metagenomes, metaproteomes, and metabolomes. The first was an earlier study, so it was using the 454 sequencing platform, whereas we migrated over to Illumina for the second.
So first I'll show you the example for inflammatory bowel disease. This is a disease that has many different consequences for the body, and it has a very complex etiology. But one thing that is of interest for this meeting is that a dysbiosis, or an altered microbiome, has often been reported in individuals that have inflammatory bowel disease compared to healthy persons. So this is just an example of publications that have reported dysbiosis. There are many more papers than this, so these are just a few examples. I know you can't read it, but that's fine. It just lists a lot of different bacterial species that are reported to be either more prevalent or lower in individuals with inflammatory bowel disease.
And so I summarized some of the key points here from the publications. One is that dysbiosis in IBD is characterized by an overall decreased diversity of bacteria in the gut compared to healthy people, and typically a greater relative abundance of Proteobacteria, such as Enterobacteriaceae.
And another important point is that there's often a loss of beneficial microbes, such as butyrate producers and other producers of short-chain fatty acids.
So the study that we did was to study twins. The reason for doing that, and I think we heard a beautiful example from Ruth Ley this morning, is that you have these genetically matched individuals. And therefore, you can discount a lot of the confounding impact of genetics and early childhood exposures when you're looking at dysbiosis. So it's a Swedish twin cohort, 46 twin pairs. They included healthy twins, those that had ulcerative colitis, and those that were discordant for the disease. So the healthy one is the smiley face, and the sick one is not smiling. And then those that were concordant, so both were sick. And the same for Crohn's disease: we had discordant twin pairs and concordant pairs.
So these were all the tools that were used on the same samples. It was the same fecal sample, and we used everything on the same samples. And it included also biopsies taken from five locations. So we got a lot of information from these individuals, including all of the different parts of the pipeline. For the microbiome, at that time, we first started with a fingerprinting method called T-RFLP. We used qPCR, but then we moved to pyrosequencing, then metagenome pyrosequencing, and then did the shotgun proteomics and metabolomics.
So this is a T-RFLP profile survey of 90 different children. The reason I like to show this is that it was one of our first indications that every single one of these children had an individual fingerprint. That's the only reason I'm showing this here. But when we looked at the identical twins, it was amazing to me that they were so similar, and these were adults that had lived apart for decades. The T-RFLP profiles of their fecal samples were very, very similar. And this also supports what Ruth Ley was talking about earlier today.
By contrast, if we looked at these discordant twin pairs, their fecal microbiomes were very different. So this, again, is an indication that there is a dysbiosis. There's something different in these individuals.
So if we look at this pipeline again, and this is just looking at data from one pair of healthy identical twins and the correlations between the two individuals in the twin pair, we can see that at the microbiome level we have a very high correlation between the OTUs present, an R squared of 0.9. If we look at the proteome, we start to get more of a separation, some more individuality, an R squared of 0.396, and at the metabolome, even more individualized, an R squared of 0.301. So this means that as you go through this pipeline, you start to get more and more individual characteristics. It gets to be more discriminating.
So when we looked at the 16S data, we could see very distinct clusters. This is all of the patients that had inflammation in the ileum, which I'll call ileal CD, and sometimes I abbreviate ICD. They clustered separately from those that had inflammation in the colon, so colonic CD, which I sometimes abbreviate CCD, and from those that were healthy, which are in green, and those in blue that had ulcerative colitis. Now this grouping was much more significant than the twin pair similarities. So the healthy twins did cluster together, but disease was the major clustering factor over zygosity. So once we saw this data, we focused more on the disease comparisons.
Now the reason that we looked at the biopsies was to study the mucosa-associated microbiota. These are just different locations, ileum, distal colon, but what we basically saw was that when we included the biopsies and the fecal samples, we again had this distinction between ileal Crohn's disease and those that were healthy or had colonic Crohn's disease.
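For readers unfamiliar with the R squared values quoted above for the twin pair, the sketch below shows how such a figure can be computed from two matched abundance profiles as the squared Pearson correlation. The talk does not say exactly how the reported values were derived, so this is an assumption, and the two profiles are invented numbers used only to exercise the function.

```python
def r_squared(profile_a, profile_b):
    """Squared Pearson correlation between two equally long profiles."""
    n = len(profile_a)
    if n != len(profile_b) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    # Sums of squares and cross-products of the deviations from the means.
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    s_aa = sum((a - mean_a) ** 2 for a in profile_a)
    s_bb = sum((b - mean_b) ** 2 for b in profile_b)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (s_ab * s_ab) / (s_aa * s_bb)


# Hypothetical relative abundances of the same five features in two twins.
twin_1 = [0.40, 0.25, 0.20, 0.10, 0.05]
twin_2 = [0.35, 0.30, 0.15, 0.12, 0.08]

print("R squared between twins:", round(r_squared(twin_1, twin_2), 3))
```

The same function could be applied at each layer (OTUs, proteins, metabolites) to see whether the value drops along the pipeline, as described for the healthy twin pair.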
And also the individual biopsies and fecal samples clustered together.
So when we look at the composition at a phylum level from these individuals that were either healthy or had colonic Crohn's disease, just averages of ileum biopsies and fecal samples, we can see that there are differences in the biopsies compared to the fecal samples. If you look at the blue, which is Lachnospiraceae, you can see that it's much greater in the biopsies of the healthy, which is H here, compared to the fecal samples. So there are differences, but still, when you look at one person, their biopsies cluster with their fecal samples.
And so we were interested in seeing which particular organisms were higher or lower in abundance. Now, I already told you that some of these butyrate producers are known to be more abundant in healthy people compared to those with some of these IBD phenotypes. And definitely with ileal Crohn's disease, we see that this organism is basically absent in the biopsies, in either the ileum or the colon, and in the fecal samples, compared to healthy people and those with colonic Crohn's disease. So here again, we see that separation between these different Crohn's disease phenotypes. Other organisms, by contrast, were more abundant. And this is an example of E. coli, where these are the five different biopsy locations. It was much more prevalent in Crohn's disease compared to healthy. And we found one, Ruminococcus gnavus, that was higher in the biopsies of those with ileal Crohn's disease compared to the other locations and to healthy.
So that was looking at single time point studies. A gap is really to look at this on a longitudinal scale. Previous studies have focused only on these single time points, which really provides limited insight, especially for something like IBD, where you can have flare-ups, remission, drug therapy, and different things going on over time. IBD has active and quiescent disease states.
Therefore, it's really important to have a temporal study to properly assess IBD. So more recently, we did a longitudinal study with 139 subjects. Up to 10 time points were collected for these individuals, one every three months. And for that time, we have information from our clinical collaborators about remission, drug therapy, etc.
So what we find when we look at all of this data is that we still get this major clustering based on disease. It might be a little bit hard to see, but this is ileal Crohn's disease here in purple. Ulcerative colitis is the light blue, colonic Crohn's disease is in darker blue, and healthy is in green. So even when all of these time points are taken into account, we still get this clustering.
But this is a super interesting thing here, if I can get it to work. Rob Knight's group did this for me. This is looking at the trajectory of these individuals over time. And so if you follow the orange and the yellow, so the healthy and ulcerative colitis, they are starting to form a cluster here on one side, whereas the different IBD phenotypes are varying dramatically. They're jumping back and forth in this space over time. Each of these segments represents a three-month sampling period. And here you can see another healthy person is still continuing in this plane. So when it finishes rotating, here you can see the healthy and ulcerative colitis are almost as flat as a pancake on that plane, this is where they rotate. But the IBDs occupy a different space. So I think it's really important to understand what's going on there.
If you look at individual temporal dynamics, you can see this is a healthy person. There is some variability. For example, there's a bloom in this Bacteroides, I think, I can't really see the color. So there's some difference over time, but not nearly what you see when you look at the IBD phenotypes. So here's an example where there's a real enrichment of Enterobacteriaceae.
And then the Bacteroidaceae come in, and then Lachnospiraceae. So it's a lot more dynamic. And it's individual, though. You have a different pattern for each person. So what we're currently doing is the metagenomes and metaproteomes for five of these patients at five time points. I don't have that data yet, but that is ongoing.
So, I have to go faster. For our HMP demonstration project, we examined a subset of these pairs that had matched metagenomes and metaproteomes. And these are just showing the proteome similarities in the twin pairs. We see a lot of similarity within the healthy twin pairs and the colonic pairs, but much less within the discordant twin pairs. And the metaproteomes cluster according to disease phenotype, and you can see that here as well. This is ileal Crohn's disease, healthy, and colonic Crohn's disease.
And when you look at individual pathways, this is the lowest phylogenetic level where we can identify the proteins and what they're assigned to. We can see that all of these pathways are less abundant in ileal Crohn's disease at the protein level. But there are some proteins, especially outer membrane proteins, that are more abundant in Crohn's disease. And that's just saying what I just said. And if we look at the human proteins, in healthy people we find more proteins that function in mucosal integrity. And in ileal Crohn's disease, there is a higher abundance of proteins involved in inflammatory response, such as human alpha-defensin, and of pancreatic enzymes. So we think that that's demonstrating a defective epithelial barrier, or a leaky gut.
Looking at the metabolites, again, just to emphasize, these are from the same samples. The pellet was sent for proteome analysis, and the fecal water was sent to Germany for mass spec analysis of the metabolites. Again, the same pattern. We see the clustering here: red is the colonic Crohn's disease, blue is the ileal Crohn's disease, and green is healthy.
And so we get this very distinct clustering. And this is just showing some of the differentiating metabolites. We had so many differentiating metabolites. Almost 8,000 metabolites differed significantly between these groups. And we had over 18,000 metabolites, and most of them are unidentified. One example is bile acid biosynthesis, which was higher in Crohn's disease. And we think that this may also be due to inhibition of bile acid absorption by inflammation.
So I just want to mention this study. I won't have time to really go through it. This is an ongoing dietary study funded by General Mills and NIH NIDDK. We're looking at different high carbohydrate and low carbohydrate diets in a crossover study. And this is just showing the study and the different kinds of analysis. One thing we find is that with the high resistant starch diet, we do get lower insulin resistance. But these are different patients. Now, we're interested in differences in the microbiome. And so with these different arms of the diet, when you do the crossover, there is definitely a significant difference between the high carb and low carb, and also between the high resistant starch and low resistant starch in both branches. And we find our favorite, Faecalibacterium, is enriched in a high resistant starch diet. And these are metabolites that were detected. And we do find that the metabolites separate according to high resistant starch diet.
Okay, I'm going to have to finish here. So I need to mention where we should go from here. I think the current grand challenge is how to analyze all of this multi-omics big data. I'm so thankful there's a call coming out for big data analysis, because this is really an enormous amount of data. We generate it and we want to correlate it. And what we want to avoid is this, interpreting the hairball. Because often I get the data back and this is what it looks like. As an example, an anonymous hairball.
So I think that what we need is more multidisciplinary collaborations with microbial ecologists, clinicians, bioinformaticians, and biostatisticians to be able to really dig down into this data. We have a huge resource of data, but we need to be able to analyze it. And I'd like to conclude with acknowledgments. Thank you very much.
We have time for one question. If not, then we can move on to our next speaker. We thank you, Dr. Janet Jansson, for an excellent presentation, and we're moving on to our next speaker, Dr. Dan Littman from NYU and the Skirball Institute. He will talk about approaches for host, immune, and microbiome studies.