Hi, I'm Caitlin. I'm a postdoc at the University of Ottawa, and today I'm going to be presenting PEPFUNC, peptide-centric functional enrichment for metaproteomic gut microbiome data, and how I use R to create tools for biological data analysis. First, I'm going to give a brief introduction to the gut microbiome, and then talk about some challenges with metaproteomics, which is the technique that we use to study the gut microbiome. Then I'm going to talk about PEPFUNC's methodology using some real data as an example, and then how we've integrated PEPFUNC as a Shiny app with MetaLab, our platform for metaproteomic data analysis.
So the gut microbiome is a collection of bacteria, fungi, yeast, archaea, and viruses that live in our digestive tract. These organisms perform essential functions, such as the fermentation of the fiber that we eat into beneficial short-chain fatty acids, and our intestinal cells actually use these short-chain fatty acids for fuel. So humans have a very close relationship with our microbiomes, and it's no surprise that the function and dysfunction of the gut microbiome can influence our health and well-being. Some examples of diseases and conditions that have an association with the gut microbiome are immune-system-associated diseases such as inflammatory bowel disease, asthma, and multiple sclerosis, some metabolic disorders like diabetes and obesity, cardiovascular disease, and even our mental health.
The composition of our microbiomes can be influenced by our genetics, or our DNA, but it's mostly influenced by our environment and our lifestyles. Some examples of those influences are our geographic locations, or where we live in the world, how we were delivered at birth, our diets, our exercise levels, our stress levels, how old we are, and even the medications that we take every day. And it's this influence, our medications, our drugs, that is a large focus of study in the lab that I'm part of.
So we study the gut microbiome through fecal samples. We specifically extract microbial proteins out of these fecal samples and study microbiomes using metaproteomics, which is the study of proteins from a community, or the study of proteins from multiple species all at once. We study proteins because they're the functional players of the microbiome. This means that metaproteomics allows us to look at what the gut microbes are actually doing. We analyze proteins on an instrument called a mass spectrometer, and this mass spectrometer returns hundreds of thousands of spectra that we match back to a database. Using this database, we can computationally infer which protein was identified, from which microbial species, and how much of the protein was in the sample. This means that we can analyze both the composition of the microbiome and its function at the same time.
However, there are some challenges to metaproteomics, and they stem from the fact that proteins are too large to be measured by a mass spectrometer, so we first have to cut them into smaller peptides. We use enzymes to cut proteins into smaller peptides at predictable sites, in this case at the amino acids K and R, which are lysine and arginine. This is good because it means we can create a database in silico in order to match spectra back to peptides. But it's also bad because the same peptide can be found in multiple proteins, due to evolution or just the presence of multiple species. This means that it's impossible to match some peptides back to the original protein.
A typical workflow matches each peptide back to a parent protein and then uses proteins for the functional enrichment analysis. In this slide, you can see that I've matched peptides back to their parent proteins in the protein database, but there are two peptides that we just can't match back to a single parent protein. And so it's really difficult to do functional enrichment on this type of data.
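To make the shared-peptide problem concrete, here is a minimal Python sketch of an in silico digest. It is only an illustration: the cleavage rule is simplified to "cut after every K or R" (it ignores missed cleavages and other enzyme-specific exceptions), and the protein names and sequences are made up for this example, not taken from a real database.

```python
from collections import defaultdict


def digest(sequence):
    """Cut a protein sequence after every K or R (simplified cleavage rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Keep the trailing fragment if the protein does not end in K or R.
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy database: two proteins from two different species.
proteins = {
    "protein_A_species_1": "MSTAKGLVEDRALNK",
    "protein_B_species_2": "MQTWKGLVEDRAAYK",
}

# Map every peptide back to all of the proteins it could have come from.
peptide_to_proteins = defaultdict(set)
for name, seq in proteins.items():
    for peptide in digest(seq):
        peptide_to_proteins[peptide].add(name)

# Peptides with more than one possible parent cannot be assigned uniquely.
shared = {
    peptide: sorted(names)
    for peptide, names in peptide_to_proteins.items()
    if len(names) > 1
}
print(shared)
```

In this toy case the peptide GLVEDR comes out of both proteins, so a protein-level workflow would have to guess which parent it belongs to.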
So what if we just skipped this step and looked at the functional enrichment of the identified peptides themselves? To do this, we created PEPFUNC, a peptide-centric functional enrichment methodology. PEPFUNC highlights molecular functions that are over- or under-represented in microbiome samples. We do this using an adapted gene set variation analysis (GSVA) that was modified for use with peptide intensity data. What's great about PEPFUNC is that it can also handle peptides with multiple functional annotations, which are very common in microbiome samples, by weighting the intensity, or how much of each peptide we identified. PEPFUNC is also available as a Shiny app for simplified analysis, or for biologists who are not as comfortable with computational tools.
We did modify GSVA for peptide intensity data. However, we kept the same five major steps that were outlined in the GSVA paper. Those steps are as follows. First, we have to subset peptides into peptide sets. We chose to subset them by function using something called KEGG pathways. We then have to estimate a peptide intensity statistic for each peptide, and we do this using a nonparametric kernel estimation of the cumulative density function. We then rank each peptide by that expression statistic in every sample, from the highest to the lowest intensity, that is, by how much of each peptide we found. We then calculate a KS-like rank statistic for each peptide set in each sample. And then we calculate a GSVA enrichment score for each peptide set in each sample.
So I'm going to illustrate these steps with an example from real data. The dataset I'm going to show today is from a single microbiome that was treated with metformin, which is a drug that is commonly used to treat diabetes. We treated the single microbiome with metformin for 24 hours using our published assay called RapidAIM. After this treatment, we extract proteins, cut those proteins into peptides, and then analyze them on a tandem mass spectrometer.
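Going back to steps two and three for a moment, here is a rough Python sketch of what a kernel-based intensity statistic and the per-sample ranking could look like. This is a sketch under assumptions, not the PEPFUNC code: it uses a Gaussian kernel with a bandwidth of one quarter of each peptide's standard deviation across samples, which is my reading of the GSVA approach for continuous data, and the peptide names and log intensities are invented.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_cdf_statistics(intensities):
    """Estimate, for one peptide, where each sample's intensity falls
    relative to that peptide's intensities across all samples."""
    n = len(intensities)
    bandwidth = statistics.stdev(intensities) / 4.0  # assumed bandwidth choice
    if bandwidth == 0:
        # A peptide with identical intensities everywhere carries no signal.
        return [0.5] * n
    return [
        sum(normal_cdf((x - other) / bandwidth) for other in intensities) / n
        for x in intensities
    ]


def rank_peptides(stats_by_peptide, sample_index):
    """Order peptides from highest to lowest statistic within one sample."""
    return sorted(
        stats_by_peptide,
        key=lambda peptide: stats_by_peptide[peptide][sample_index],
        reverse=True,
    )


# Hypothetical log-transformed intensities: three peptides, four samples.
log_intensities = {
    "PEP_A": [20.0, 21.0, 25.0, 26.0],
    "PEP_B": [24.0, 24.5, 23.5, 24.0],
    "PEP_C": [27.0, 26.0, 21.0, 20.0],
}

stats_by_peptide = {
    peptide: kernel_cdf_statistics(values)
    for peptide, values in log_intensities.items()
}

for sample_index in range(4):
    print(sample_index, rank_peptides(stats_by_peptide, sample_index))
```

The point of the statistic is that each peptide is compared against its own behaviour across samples, so a peptide ranks highly in a sample when it is unusually intense there, not just when it is abundant everywhere.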
After we detect the peptides with our mass spec, we then create peptide sets according to their KEGG terms, which describe their functions. On the left, you can see an example of what those functions can be. In this case, we're looking at different types of metabolism. These peptides are assigned to functions according to our own in-house curated database, and we weight peptides with multiple functions according to our confidence in their functional assignment. This is what is used to calculate the potential enrichment of these functions in each sample.
After we've created these peptide sets, we calculate a KS-like statistic for each peptide set in each sample. We use a random walk along the peptide rankings, where we go from highest to lowest intensity, to determine if a peptide set is more highly ranked or more lowly ranked than we expect. If a peptide is in the set, the walk rises according to a previously calculated peptide expression statistic that's specific to its sample. We do the same thing for all the peptides that are not in our set of interest. So in this slide, the orange line represents all of the peptides that are in the set of interest, and the purple line, which is straighter, represents all the peptides that are not in the set of interest. And we compare those two distributions.
As an example, this slide shows what a positive GSVA score would look like, where we have a peptide set that is ranked more highly than we would expect. We compare the random walk of the peptide set, which again is the orange line, to all of the peptides that are not in the set, which is the purple line. We add the most positive deviation to the most negative deviation. So in this case, we add 0.25 to negative 0.01 for a final GSVA score of 0.24. An example of a negative GSVA score is something that looks like this, where you can see that the orange line doesn't rise until much later on in the peptide rankings.
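The random walk and the score described so far can be sketched in a few lines of Python. Again, this is a simplified illustration rather than the PEPFUNC implementation: the function name, the `tau` exponent, the peptide names, and the rank statistics are all assumptions made up for the example, and the functional weighting of peptides is left out.

```python
def gsva_like_score(ranked_peptides, rank_stats, peptide_set, tau=1.0):
    """Walk down the ranking and return the most positive deviation plus
    the most negative deviation between the in-set and out-of-set walks."""
    in_set_total = sum(
        abs(rank_stats[p]) ** tau for p in ranked_peptides if p in peptide_set
    )
    n_out = sum(1 for p in ranked_peptides if p not in peptide_set)
    if in_set_total == 0 or n_out == 0:
        raise ValueError("need peptides both inside and outside the set")

    in_walk = 0.0   # the orange line: rises when we hit a peptide in the set
    out_walk = 0.0  # the purple line: rises evenly for every other peptide
    max_pos = 0.0
    max_neg = 0.0
    for p in ranked_peptides:
        if p in peptide_set:
            in_walk += abs(rank_stats[p]) ** tau / in_set_total
        else:
            out_walk += 1.0 / n_out
        deviation = in_walk - out_walk
        max_pos = max(max_pos, deviation)
        max_neg = min(max_neg, deviation)
    return max_pos + max_neg


# Hypothetical ranking for one sample, highest intensity first, with
# made-up rank statistics that are larger at both ends of the ranking.
ranked = ["p1", "p2", "p3", "p4", "p5", "p6"]
stats = {"p1": 3, "p2": 2, "p3": 1, "p4": 1, "p5": 2, "p6": 3}

top_score = gsva_like_score(ranked, stats, {"p1", "p2"})     # set at the top
bottom_score = gsva_like_score(ranked, stats, {"p5", "p6"})  # set at the bottom
mixed_score = gsva_like_score(ranked, stats, {"p1", "p6"})   # set at both ends
print(top_score, bottom_score, mixed_score)
```

A set concentrated at the top of the ranking gets a positive score, a set concentrated at the bottom gets a negative one, and a set split between both ends cancels out to around zero. The slide's positive example works the same way: 0.25 + (-0.01) = 0.24.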
We do the same thing, where we add the most positive deviation to the most negative deviation. So in this case, we add 0.06 to negative 0.58 for a final GSVA score of negative 0.52. We do this for every single peptide set in every single sample, and we're able to visualize it on a heat map, which looks like this. This is an example of a very small subset of the results, because there are so many different peptide sets. In this case, we're looking at the significantly changed functions from our control microbiome to our metformin-treated microbiome, where now we can go in and understand what's functionally changing in our microbes after they've been treated with this drug.
PEPFUNC is actually part of the MetaLab suite. MetaLab is an entire platform dedicated to metaproteomic data analysis that was created in the Figeys lab. PEPFUNC is specifically included in the iMetaLab platform, which is the online version of MetaLab, along with other Shiny apps for metaproteomic data analysis. Our PEPFUNC Shiny app looks like this. You can upload your peptide intensity file, choose what type of data normalization or transformation you'd like, and then look further into the functional enrichment of your microbiome samples. However, if you don't have any peptide intensity data and you still want to play along with the app, you can use our sample data that's up there as well.
We can use PEPFUNC to study the functional changes or the functional contributions of a microbiome, like in the previous example, after drug treatment, maybe to better understand the side effects of different drugs. We can also look at how microbiomes change due to diet changes, or what it looks like when somebody has a certain disease, or when their activity levels change. However, PEPFUNC is not limited to studying just gut microbiomes.
It has the ability to use your own custom functional database, so you can study whatever type of microbial community you have. For example, you can study soil microbes that are important to agriculture, water microbes that are important to water treatment research, or even what's inside your sourdough starter, if you're really interested in making the ultimate sourdough bread.
And with that, I would like to thank you all for listening to this presentation. I've included the link to the GitHub repo with PEPFUNC's code if you're interested in it. At that link, there are also links to the Shiny app and our publication. I'd like to say thank you to my supervisors, Dr. Daniel Figeys and Dr. Mathieu Lavallée-Adam, and all my other co-authors on this paper, Dr. Zhibin Ning, Dr. Xu Zhang, Dr. Leyuan Li, and Krystal Walker. I'd also like to say thanks to Patrick Smith for some code that he helped me with for this presentation itself. And finally, thank you to all my funding agencies, specifically the TECHNOMISE program that is funded by NSERC and the Government of Canada, and the University of Ottawa.