Welcome to the Galaxy Training Network's smorgasbord tutorial on metaproteomics. My name is Pratik Jagtap, from the Galaxy-P team at the University of Minnesota, and I'll be introducing you to the Galaxy workflows for metaproteomics research. So what is metaproteomics? It is the characterization of the proteins and functions expressed by a microbiome in response to its environment. The GTN tutorial for metaproteomics can be found at the link shown here. Microbiome research has been at the forefront of addressing questions about the effect of microbiomes on human health and disease. For example, the scientific literature on gut microbiome research includes multiple studies that correlate the taxonomic composition of gut microbiomes with physiological conditions. Environmental researchers have also studied the effect of microbiomes on soil, aquatic systems, and even oceans. Microbiome researchers have benefited greatly from studying the DNA of microbes from clinical or environmental samples. Metagenomics, which involves analyzing sequences to determine the taxonomy of the microorganisms present, is a mature field with advanced bioinformatic approaches for correlating microbial composition with phenotype. In recent years, researchers have also started studying the RNA expressed by microbiomes to understand gene expression in response to changing conditions. Metatranscriptomics research not only helps us understand the microbial taxonomic composition but also helps in characterizing the functions expressed by the microbiome. Metaproteomics, which is the topic of this tutorial, uses mass spectrometry methods to characterize the microbial proteins themselves. The identified proteins are assigned to biological functions, thus helping us understand how the microbiome reacts to its immediate environment.
Peptides that are identified and are unique to taxonomic units such as a genus, species, or even strain are used to determine the taxonomic composition of the microbiome. Metaproteomics researchers, including us, believe that metaproteomics has huge potential to unravel the mechanistic details of microbial interactions with the host or the environment. The definition of metaproteomics has changed over the years as the field has evolved and as newer sample preparation, data acquisition, and bioinformatics analysis methods have emerged. In general, there is an appreciation of the fact that metaproteomics offers insight into the functions that are expressed by a microbiome. To illustrate this point, imagine a microbiome that is exposed to dissimilar conditions, such as different nutrition. Even though the taxonomic composition stays similar, as shown here, the proteins and functions expressed in response to each condition are different. On the other hand, you can start with dissimilar taxonomic compositions of a microbiome, treat them with a similar condition or nutrition, and the microbiomes express a similar set of proteins or functions. To illustrate this point further, here is an example in which we analyzed a dataset from Magnus Arntzen's laboratory at the Norwegian University of Life Sciences. A sample from a thermophilic biogas plant was inoculated into a large lab-scale reactor for growth on cellulose. The bottles were incubated at 65 degrees Celsius, collected at nine different time points, and processed in triplicate. Mass spectrometry data were acquired at all time points, and we performed metaproteomics analysis on four of them (8, 23, 33, and 38 hours), which we call T1, T4, T6, and T7. Based on the proteins, functions, and their abundance values, principal component analysis separates time point T1 from the other time points, which cluster together.
On the other hand, if one uses taxonomy abundance values based on the metaproteomics data, it does not separate these time points as well as the functional data does. This highlights the importance of understanding the functional state of the microbiome in order to understand how it responds to the environment. With this, let us move on to the hands-on session. For this we will go to the Galaxy Training Network website, which hosts this metaproteomics tutorial, and then we will use its inputs as well as workflows on the Galaxy Europe server. In this session, we will learn how to generate a new history, how to access a data library, and how to transfer the inputs from that data library. To locate the metaproteomics tutorial on the Galaxy Training Network, go to the GTN website and scroll down to the topic Proteomics. On the Proteomics topic page you will find the metaproteomics tutorial, and if you click on this link here, it takes you to the tutorial's page. The tutorial has various sections. It starts with an overview that lists questions and objectives, points to the input datasets and workflows, and shows you on which Galaxy instances this particular tutorial is available. As you can see, it is available on the Galaxy Africa, Europe, Norway, and Australia servers. We are going to carry out our analysis on usegalaxy.eu today. If you keep scrolling down, it gives you an introduction, tells you how to upload the datasets, and takes you through the analysis steps for this tutorial. At the end of the tutorial, you will find a feedback form, which we recommend you fill in so that we can improve this tutorial as needed.
Next, we will go to the Galaxy EU website and start by generating a new history so that the input datasets can be transferred into it. To do this, use the plus sign; when you click on it, it generates a new, unnamed history, which you can name, for example, "metaproteomics GTN". Now, in order to transfer the input datasets into this history, we'll go to Shared Data and, within it, to Data Libraries. Once you click on Data Libraries, you'll find a library called GTN Material; click on that. You will then see different categories of tutorials, including one on proteomics. If you scroll down a little, you'll see a Proteomics link here; click on that, and within Proteomics you'll see the metaproteomics tutorial, third in the list. If you click on that, you'll see a DOI or Zenodo link that holds all the input datasets we need for this tutorial. Clicking on it shows six input files, of which we are going to use the top five; we do not need the sixth. We'll export these five input files to the history as datasets. You can export them to the current history that we generated, or, if you forgot to create one, you can import them into a new history as well. Let's go with the "metaproteomics GTN" history and import these files into it. Once I click import, the files are imported, and when I go back to the history, you can see they have been placed there. The first thing we'll do is remove the website link from each dataset name and keep only the file name. Starting with the first one, I have removed the website link and saved. The way to do it is to click on the pencil icon; it shows the name, and I select only the file name and delete everything else.
We repeat the same procedure for the FASTA file and, lastly, for the tabular file. Now we are ready with our input datasets. One more thing we will do, something generally performed in most proteomics analyses, relates to fractionated samples: since we would like to search these three raw files together and obtain a single output, we will generate a dataset collection. For that, click on the checkbox icon here, select the files you would like to include in the collection, and then, under "For all selected", choose "Build Dataset List". I am going to name it "Bering Strait MGFs", and I will explain soon why it is called Bering Strait. Once this is done, Galaxy generates the dataset collection. What the dataset collection does is group these MGF files so that, when they are searched against our database, a single output is generated. Just to summarize: there are three MGF files, which are the mass spectrometry files; there is a protein FASTA file, or metapeptide FASTA file; there is a tabular file of gene ontology terms that will be used for assigning gene ontology later in the workflow; and the last item, number nine, is a collection of the three MGF files, which will be processed together to generate a single output. With this, let's try to find out what these input files are. Now that we have downloaded these datasets into our Galaxy history, let's try to understand what they are about. These mass spectrometry datasets were acquired in the lab of Dr. Brook Nunn at the University of Washington in Seattle. Ocean water samples were collected from the Bering Strait and the Chukchi Sea, and the oceanic marine bacteria were retained using a filter.
The mass spectrometry data was acquired on a Q Exactive HF instrument, and associated shotgun metagenomics data was also acquired. Attendees are strongly encouraged to read this manuscript to learn more about this dataset and its analysis. In order to understand metaproteomics data acquisition, let's step back and look at how mass spectrometry data is acquired. On the left are proteins that are isolated from the sample and then digested using an enzyme such as trypsin. This digestion generates peptides, and since these peptide mixtures are complex, one needs to separate them using multi-dimensional fractionation methods such as liquid chromatography so that they can be ionized in the mass spectrometer. These ionized peptides, also called MS1 or precursor signals, are then fragmented in the mass spectrometer: the peptides are isolated, fragmented, and mass analysis is performed, generating a mass spectrum with intensity on the y-axis and the mass-to-charge ratio on the x-axis. In single-organism proteomics, the mass spectra acquired in your experiment are matched against protein FASTA files to identify peptides. Shown here is one such mass spectrum that was assigned a peptide, and this peptide was then assigned to cytochrome c, a protein. In single-organism proteomics, it is relatively easy to assign a peptide to a protein or a protein group. The situation in metaproteomics is different, and the challenges are more pronounced. In single-organism proteomics, where you are studying, say, a human, yeast, or bacterial sample, your database is generally small, around 10,000 to 100,000 sequences, and the complexity is not very high: you have the proteins that come from this particular organism, along with contaminants.
If you are studying a metaproteomic sample, however, you are starting with multiple microorganisms, and hence the database you search, especially one derived from a public repository, often contains a million sequences or more. Many proteins are homologous and shared across different organisms, so there are quite a few challenges in metaproteomics analysis. For example, search algorithms have had to be developed to handle large and complex database searches; protein grouping needs to be done at a multi-organism level; identification statistics are affected by large databases; taxonomy assignment is based on unique peptide identifications; and functional analysis is based on the proteins that can be identified. All of this requires quite a few software tools and multiple processing steps, and this is where Galaxy as a platform has really helped us: you can bring in these multiple software tools, build a workflow, and then run that workflow on multiple samples. To get an idea of how a metaproteomics workflow functions, let's start with the mass spectrometry data, as we discussed earlier. The mass spectrometry experiment generates multiple spectra, and from FASTQ files, which come from your metagenomics data, you can generate a protein FASTA file. You then match your mass spectrometry data against this protein FASTA file to identify peptides. This first step of generating a database is important because you want to ensure you have the right composition of proteins for the sample, and depending on whether your database is large or small, you might end up using different database search strategies to identify these peptides.
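The point above about identification statistics in large databases is commonly handled by target-decoy searching: decoy (for example, reversed) sequences are added to the database, and the score threshold is set so that the estimated false discovery rate (decoy hits divided by target hits above the threshold) stays below a limit. The following is a simplified sketch of that idea under those assumptions, not the actual implementation used by the tools in this workflow:

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """psms: list of (score, is_decoy) pairs, higher score = better.
    Walk down the score-sorted list and return the lowest score cutoff
    at which the estimated FDR (decoys / targets) is still <= max_fdr."""
    cutoff = None
    targets = decoys = 0
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Everything scoring >= `cutoff` passes the FDR limit
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff

psms = [(0.99, False), (0.95, False), (0.90, False), (0.80, True), (0.70, False)]
print(score_cutoff_at_fdr(psms, max_fdr=0.25))  # 0.7
```

Note how a single decoy hit costs more "budget" when the target list is short, which is exactly why huge metaproteomic databases make it harder to keep many confident identifications at a fixed FDR.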
Once peptides are identified, you can try to determine the taxonomic composition of the sample. Peptides come in different forms: you could have unassigned peptides, shared peptides, or unique peptides, that is, peptides unique to a particular species, genus, or even strain, and these unique peptides are the ones used to identify a particular taxonomic unit. Peptides can also be used to infer proteins. One thing to remember, as mentioned earlier, is that these proteins are sometimes shared among multiple organisms, but that does not prevent us from identifying their function. So these four components, namely database generation, database search, taxonomy analysis, and functional analysis, constitute a metaproteomics workflow. Now that we have the basics of a metaproteomics workflow, let's move back to the hands-on session, download a workflow, and run it against the inputs we downloaded earlier. We will observe the outputs that come out of this workflow, and while we wait for it to generate them, we can go back and look at some basics of the software tools used in the workflow. So let's go back to our Galaxy EU instance. There are two ways to obtain the workflow. You can go to Shared Data, click on Workflows, and in the list you'll see the metaproteomics GTN workflow; click import and you can start using it. That's one method. The other method is to go to the metaproteomics tutorial website, where there is a link for workflows under supporting materials.
Click on that link, click on the workflow .ga file, and it downloads the .ga file to your Downloads folder. Now you can go back to Galaxy, click on import, browse, find the workflow file under Downloads, and import it. So those are two different ways of obtaining the workflow file, and for this tutorial either will do. Let's use the second one, the file we downloaded from the GTN website. To invoke the workflow, you click on the run-workflow arrow button here, and once you do, it shows you the various inputs you can use, based on the history you have generated, and also the various steps or tools that are used. You can also use the edit-workflow option to look at the layout of the workflow. To give you an idea of what this workflow contains, let's look at the input files. The inputs here are the dataset collection of MGF files that we generated, and the protein FASTA file generated with Sixgill. These two are searched using multiple search algorithms within SearchGUI (as you can see, it uses X!Tandem and MS-GF+), processed with PeptideShaker, and then the workflow uses tools such as Query Tabular, Unipept analysis, and eventually another Query Tabular step to generate outputs that give us information about the functional as well as taxonomic analysis. I can also run the workflow from here: if I click on the run-workflow option, it takes us to the same page we saw earlier. The first input is the protein FASTA file, number 4, and the workflow chooses this file based on the data input type; since there is only one protein FASTA file, that is what is chosen.
In some histories you might have multiple files of the same data type, and that can mean you have to choose the right input file yourself, but in this case that's not a problem because there is only one protein FASTA file. The dataset collection of MGF files, the converted mass spectrometry files, has been selected here, and the third input is the gene ontology terms file. Feel free to scroll through and look at the parameters used for these searches. As you can see, two search engines were used. For the protein digestion options, trypsin was used to digest the proteins, and hence we selected trypsin. For the precursor options, the precursor ion tolerance was set at 10 ppm, and the fragment tolerance at 0.5 Da; there are also various other parameters chosen so that peptides can be matched against the spectra. For protein modifications, no fixed or variable modifications were chosen here. These are a few of the options you have, at least for the search part, and so on. You can keep scrolling down to see the parameters for each tool. One thing to note is that once you start analyzing your own datasets with these workflows, it is important to check these parameters so that you get optimal outputs. Once all of these parameters are checked (and for this dataset we are quite certain they have been properly set) and the input files are in the right shape, I go ahead and run the workflow.
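To see what these tolerance settings mean in practice: a precursor tolerance in ppm is relative to the mass being measured, while a fragment tolerance in daltons is absolute. A quick sketch (the function names here are just for illustration, not tool parameters):

```python
def within_ppm(observed_mz, theoretical_mz, tolerance_ppm=10.0):
    """Relative (parts-per-million) tolerance, as used for precursor ions."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6 <= tolerance_ppm

def within_dalton(observed_mz, theoretical_mz, tolerance_da=0.5):
    """Absolute tolerance, as used here for fragment ions."""
    return abs(observed_mz - theoretical_mz) <= tolerance_da

# A 10 ppm window at m/z 800 is only about +/- 0.008 Da wide:
print(within_ppm(800.004, 800.000))  # True  (5 ppm off)
print(within_ppm(800.020, 800.000))  # False (25 ppm off)
```

This is why a high-resolution instrument like the Q Exactive HF can use a tight ppm precursor tolerance while fragment matching still uses a wider absolute window.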
The first thing that happens after you run a workflow, as you may have seen in other tutorials, is that the workflow steps start loading as history items; there will be as many as 13 steps in this history, and you can see them start showing up. There is SearchGUI, the first step, in which your spectra are matched against your protein FASTA file; the PeptideShaker outputs that are generated from it; then a Query Tabular step, which we'll talk about a little later, as well as the Unipept tool that is invoked in this workflow. Eventually, at the end, four outputs are generated: taxonomy information in terms of genera with numbers of PSMs and peptides, and GO terms as biological processes, molecular functions, and cellular components. While this workflow is running, let's learn a little more about SearchGUI and PeptideShaker, and then come back once all these history items have turned green. As you can see, an item initially turns yellow, and it turns green once that step is complete; the same applies to the rest of the history items.

While we are waiting for our history to complete, let's try to understand some of the tools in this workflow. The first tool, SearchGUI, matches your MS/MS spectra against your protein sequences and can use multiple search algorithms; in our workflow, for example, X!Tandem, MS-GF+, and OMSSA are among the tools that can be used to search our dataset, which helps increase the number of identifications. You can also specify the enzyme that was used to digest your proteins, and, based on your mass spectrometry measurements and the quality of the data, you can specify the mass error at both the precursor (MS) and fragment (MS/MS) level so that you can identify peptide-spectrum matches. Lastly, SearchGUI also allows you to specify post-translational modifications, fixed or variable, so that modified peptides can be identified.

The results generated by SearchGUI are processed by another tool, PeptideShaker. PeptideShaker filters the SearchGUI results using false discovery rate analysis to yield the most confident peptide-spectrum matches. Remember that PeptideShaker also generates outputs such as a protein report, a peptide report, and an mzIdentML file, apart from the PSM report that we have been processing; these reports help you perform subsequent analysis of your dataset, especially if you are interested in the protein level, the peptide level, or even visualization of some of the spectra. Now that we have covered SearchGUI and PeptideShaker, let's go back to the history and see whether our searches are over. Now that your history has run, you can see that SearchGUI ran through, and one way to look at the parameters that were used, or even to rerun the step, is to click on the circular-arrow (rerun) icon and inspect them; we already looked at these earlier: trypsin was used, and the precursor options were 10 ppm and 0.5 Da. This helps you ensure you have run SearchGUI with the appropriate parameters. The next step is PeptideShaker, and as you can see, an mzIdentML file is generated. We are not going to use the mzIdentML file in this particular tutorial, but it can be useful for other applications, such as the Multi-omics Visualization Platform, which helps you analyze your data and even visualize some of the spectra or perform genomic-coordinate visualizations, as you will learn in the proteogenomics tutorial. What we are going to use in this tutorial is the PSM report, and a PSM report is one of the
multiple reports that PeptideShaker generates. If you look at the outputs, you can see there are quite a few; the protein report and the peptide report are also generated, though in this particular history they were hidden. If you go to "hidden" and unhide them, you will see the protein report and peptide report here, numbers 14 and 13, so even though you do not see them in the history, they are still present, and you can always unhide them in case you want to look at their contents. Coming back to the PSM report, which is what we are going to use in this tutorial: if I click on the eye icon, I can see which proteins each match comes from. The reason these protein entries look like peptide sequences is that this protein FASTA file was generated with Sixgill, which takes in FASTQ files and converts them into metapeptide FASTA outputs, and those peptide sequences were used as headers; usually you would see protein names or protein accession numbers in this place. If you keep scrolling to the right in the PSM report, you will see the peptide sequence and other information associated with it. Feel free to scroll through and understand what it means; if you have any questions about these columns, feel free to reach out to us, but for the sake of this tutorial I'll keep proceeding so that we can cover more ground. One other important thing to mention here is the confidence parameter: in this particular case, we have decided to use only those peptides that have at least 95% confidence.

The next step uses a tool called Query Tabular, and to understand what it does, you can click on the rerun icon. Query Tabular takes in tabular files; it leverages an SQLite database and uses regular expressions to parse useful information out of those files. One advantage of Query Tabular is that if multiple tabular outputs have been generated in your workflow or your history, and you want to correlate them or parse information out of each of them, you can do so: the tool loads them into an SQLite database, from which you can extract information with SQL queries. To build a workflow with Query Tabular you do need some knowledge of the Structured Query Language (SQL), although using Query Tabular has significantly improved the development and application of multi-step workflows in Galaxy, and we have used it in multiple metaproteomics and proteogenomics studies. What this tool has done here is take the PSM report and, using regular expressions, generate a list of distinct peptides present in the sample; as you can see, it has selected only those with a confidence of at least 95 and generated this list of distinct peptides. These distinct peptides are then processed by a tool called Unipept, so let's learn a little more about Unipept and then come back to the outputs we get from it. Unipept is an open-source web application developed at Ghent University that is designed for metaproteomics data analysis.
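As an aside on the Query Tabular step just described, its core behavior, loading tabular data into SQLite and filtering it with SQL, can be sketched with Python's built-in sqlite3 module. The peptide sequences and confidence values below are invented stand-ins for the real PSM report:

```python
import sqlite3

# Hypothetical miniature PSM report: (peptide sequence, confidence %)
psm_rows = [
    ("AELIVTK",  99.2),
    ("AELIVTK",  97.8),   # same peptide matched by a second spectrum
    ("GGSLFAER", 96.1),
    ("LMNPQSTK", 80.4),   # below the 95% cutoff, will be dropped
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE psms (sequence TEXT, confidence REAL)")
con.executemany("INSERT INTO psms VALUES (?, ?)", psm_rows)

# Conceptually what the workflow's Query Tabular step does here:
# a DISTINCT query with a confidence filter
peptides = [row[0] for row in con.execute(
    "SELECT DISTINCT sequence FROM psms WHERE confidence >= 95 ORDER BY sequence")]
print(peptides)  # ['AELIVTK', 'GGSLFAER']
```

The real tool adds regular-expression filters for cleaning up input lines, but the SQL idea is the same: one confident, de-duplicated peptide list ready to submit to Unipept.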
Unipept is powered by an index containing all UniProt entries and a custom lowest-common-ancestor algorithm. In the example shown here, the peptide in the first column can be assigned to multiple proteins, since this is a metaproteomic sample, and therefore to multiple taxa; Unipept then performs taxonomy analysis to assign it to the lowest common ancestor, in this case the taxonomic unit Streptophyta. Unipept performs this analysis for all submitted peptides, and by using this method one can identify the specific organisms present in a sample. For this, Unipept offers a suite of tools such as pept2prot, pept2taxa, pept2lca, and so on. Unipept also generates quite a few visualization outputs, as shown here, and it can generate functional outputs such as gene ontology terms, InterPro terms, and EC terms for the identified proteins. Attendees are strongly encouraged to read the Unipept publications at this link. Looking at the Unipept outputs now, the first one generated is the pept2prot output, and if you look at this output, number 16, you'll see the peptides and the UniProt IDs. As I mentioned earlier, Unipept searches against the UniProt database and finds out which organism each peptide comes from, the EC numbers associated with the matching proteins, and the GO terms associated with them; that's the information you get from this particular table. There are also other outputs generated, such as RefSeq IDs, but for this tutorial we will be using the gene ontology (GO) term output. The other output that Unipept generates is the taxonomy output, which we are looking at as dataset number 17.
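The lowest-common-ancestor idea can be sketched in a few lines. Unipept's real algorithm handles missing ranks and invalid taxa; this simplified version, with made-up lineages, just truncates the lineages at the first rank where they disagree:

```python
def lowest_common_ancestor(lineages):
    """Given full taxonomic lineages (root -> leaf) for every protein a
    peptide maps to, return the deepest rank shared by all of them."""
    lca = None
    for lineage in lineages:
        if lca is None:
            lca = list(lineage)
        else:
            # keep only the leading ranks on which all lineages agree
            common = []
            for a, b in zip(lca, lineage):
                if a != b:
                    break
                common.append(a)
            lca = common
    return lca[-1] if lca else None

lineages = [
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Pelagibacterales"],
    ["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhodobacterales"],
]
print(lowest_common_ancestor(lineages))  # Alphaproteobacteria
```

A peptide whose matching proteins all share one leaf taxon resolves to the species level, while a widely shared peptide climbs up toward "Bacteria", which is exactly the behavior seen in the pept2lca output.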
If you click on the visualize-data icon with the bars here, it gives you a choice of visualization plugins; the one we are going to use is the Unipept Taxonomy Viewer. If I click on it, it opens a tree view that you can explore interactively to look at the different taxonomic units that have been identified. If you click on this large blob here, which is Proteobacteria, you can keep clicking deeper, and if you hover around, it tells you, for example, that at this particular taxonomic position there are 29 unique sequences. If you click on this one, the sequences start getting associated with different genera, and if you click on this other one, Candidatus Pelagibacter, you'll see there are at least four sequences specific to it. You can also click further to check for species-level identifications, and as you can see, there are as many as 25 sequences associated with this species. You can keep exploring this tree view for more information, which is quite interesting because it helps you interactively check whether particular species of interest were observed in this dataset.
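The per-genus PSM and peptide counts reported by the workflow can be reproduced from a pept2lca-style table with a simple tally; the rows here are invented stand-ins for the real output:

```python
from collections import Counter

# Hypothetical (peptide, genus) assignments; one row per PSM
psms = [
    ("AELIVTK",  "Pelagibacter"),
    ("AELIVTK",  "Pelagibacter"),   # second spectrum for the same peptide
    ("GGSLFAER", "Pelagibacter"),
    ("TQVNPDR",  "Polaribacter"),
]

# PSM counts tally every row; peptide counts tally distinct rows only
psm_counts = Counter(genus for _, genus in psms)
peptide_counts = Counter(genus for _, genus in set(psms))

print(psm_counts["Pelagibacter"], psm_counts["Polaribacter"])          # 3 1
print(peptide_counts["Pelagibacter"], peptide_counts["Polaribacter"])  # 2 1
```

The distinction matters for interpretation: PSM counts track how often a taxon's peptides were sampled (a rough abundance proxy), while distinct-peptide counts track how much independent sequence evidence supports the taxon.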
The other output that UNIPEP generates is this UNIPEP PEP2LCA output so remember we talked about the lowest common ancestor so using that algorithm if you click on this or if you use the I icon to look at what's the output from this it'll show you that there is this peptide sequence and then it gives you which super kingdom which kingdom which sub kingdom and so on do these particular peptides belong to and you can see that in some cases some of these peptides are present in all bacteria while some peptides can be identified at order level in this case rhodobacterials or if you keep scrolling now you'll also find some of these that are found in genera for example there is this one polaribacter and you know so this particular peptide the first peptide here is specifically identified or uniquely identified to this genus polaribacter. In some cases you might also find that there are some some of these peptides found in pelagibacter ubiqu which means it's at the species level and so it basically gives you an idea about the depth at which you can identify these peptides at right depending upon how unique these are so so this is really useful information and then we basically use all of this information from the PSM report as well as from your UNIPEP outputs both at functional and taxonomy level and again use this SQLite database the query tabular tool that I mentioned earlier to generate these outputs so one of the outputs that it generates which is which is useful for a biologist to look at is the various number of genera that are identified so as you can see here the candidate candidate pelagibacter was identified with at least 398 PSMs and 29 peptides so remember we looked at this pelagibacter and that were 29 peptides that were associated at the general general level so that's what's been shown here there is nitrosopumulus which is identified with 64 PSMs and so you know so this basically gives you an idea about the abundance of this particular taxa that are present 
in this ocean-water sample. The final three outputs are also generated through Query Tabular. The first one we will look at is the GO terms for biological processes, which gives you an overview of the biological processes represented among the identified proteins. Remember, we identified peptides, from these peptides we identified proteins, and these proteins were then assigned to Gene Ontology (GO) functional terms. You can see that translation seems to be the most prevalent GO term when we look at biological processes. The molecular functions give you a slightly more granular view of the data: rather than just telling you "translation", they tell you, for example, that a protein is a structural constituent of the ribosome, which, not surprisingly, is one of the most abundant GO terms here, along with ATP binding and so on. This gives you an idea of how this particular microbiome is reacting to its environment, or at least which proteins and gene ontology terms are most abundantly expressed in this case. The last output is the cellular localization, which tells you where these proteins are localized. As you can see, most of them seem to come from the cytosol of these bacteria, and there are quite a few ribosomal proteins, which is also not surprising given that ribosome-related terms were the most abundant among the molecular functions and biological processes. Having said that, this is just looking at one sample, and we have made some tools available, and
I will talk a little more about that later in this tutorial. There are tools available that help you compare different conditions based on these quantities. This is an important point to note: when you are comparing two conditions, you need to run this workflow on both conditions, or on multiple replicates of each condition, so that you can make a biological interpretation or look for differentially expressed taxonomic units or functional terms. With that, we are nearly at the end of this tutorial. To summarize: we started with the input data sets, we invoked the workflow that we downloaded either from the GTN site or from the data library shown here, and we ran it. The eventual outputs were the genera present in these samples and the biological process and molecular function GO terms that are expressed. Unipept also gives you information about EC terms as well as InterPro terms, so if you want those, you can always alter this workflow to use them; it entirely depends on what biological answers you are seeking from your data set. With this, I would strongly encourage you to provide us feedback, because that will help us improve this content, add any new tools or parameters that you think would be important, or, since these workflows are flexible, see whether other outputs can be generated that would be useful for the metaproteomics community. To summarize, the SQLite relational database helps us summarize the taxonomic information present in your data set: it uses the PSM report along with the Unipept LCA table and parses out information so that we get an idea
about how many PSMs and how many distinct peptides contribute to each genus. Similarly, the Query Tabular tool also helps us address the question of what these organisms are doing in this microbiome data set; for this, it uses the PSM report and sequence information, as well as the information from the GO term analysis, to give us outputs that can be used to determine which GO terms are present in this data set.

I hope this tutorial will help you analyze your mass spectrometry data, especially if it is from a metaproteomics experiment, to detect peptide sequences against a database derived from shotgun metagenomic data or any other protein database you have available. I also hope it helps you answer the question of the taxonomic composition of your metaproteomics data; for this, it uses Unipept, and in particular the lowest common ancestor algorithm, to detect the unique peptides present in your sample. Lastly, this workflow and tutorial will help you analyze your data to detect the functions present in your metaproteomics data; for this, you can use Gene Ontology terms, as shown in the tutorial, or you can use InterPro outputs as well as EC term outputs. All the material presented here is available on the website shown here, and I strongly recommend that you provide us feedback so that we can improve this tutorial going forward.

In the next few slides I will discuss other tutorials in the field of microbiome analysis that we have at the GTN. We went through the metaproteomics workflow discussed earlier in this tutorial. Beyond this, we also have the ability to perform quantitative analysis on these data sets: for example, with mass spectrometry data you can use either spectral counts or intensity data to get quantitative information that can be used to determine functional as well as taxonomic expression. To make this possible, we developed a tool called metaQuantome, which was published in the journal shown here in 2019. metaQuantome uses the peptides identified in the workflow we went through earlier, but combines them with precursor (MS1) intensity along with functional and taxonomic annotation. All of these inputs are processed by metaQuantome, a suite of statistical analysis tools, to generate data exploration outputs, volcano plots to determine differential abundance, and heat map cluster analyses, as shown here on the right. For more information about metaQuantome, you can visit the three links shown here on the Galaxy Training Network. The first tutorial takes you through how to generate inputs for metaQuantome analysis; this includes some tools we have already covered, such as SearchGUI, PeptideShaker, and Unipept, along with newer tools such as FlashLFQ. The second and third tutorials cover functional analysis and taxonomic analysis, respectively, to generate outputs for biological interpretation using metaQuantome. Apart from metaproteomics, we also have a tutorial on metatranscriptomics, developed based on a manuscript published in 2018 on the ASaiM suite of tools; we have tested this suite and optimized it for metatranscriptomics analysis, and you can find the tutorial at the link shown here. Lastly, we would like to thank the Galaxy-P team, our collaborators, and our grant funding agencies. We would also like to thank you for attending this tutorial. Please do not forget to fill in the feedback at the end of the GTN metaproteomics tutorial website so that we can continue to improve the tools and workflows. The tools and workflows used in this study are available on the Galaxy Europe instance and the Galaxy Training Network.
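To make the Query Tabular summarization described in this tutorial concrete, here is a minimal sketch in plain Python/sqlite3 of the kind of join the tool performs behind the scenes. The table layout, column names, and rows below are simplified, hypothetical stand-ins, not the tool's actual schema or the tutorial's real data:

```python
import sqlite3

# Toy in-memory database; in Galaxy, the Query Tabular tool loads the
# PSM report and the Unipept pept2lca output as SQLite tables.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE psm (spectrum TEXT, peptide TEXT)")
con.execute("CREATE TABLE lca (peptide TEXT, genus TEXT)")

# Hypothetical example rows (illustrative only).
con.executemany("INSERT INTO psm VALUES (?, ?)", [
    ("s1", "PEPTIDEA"), ("s2", "PEPTIDEA"), ("s3", "PEPTIDEB"),
    ("s4", "PEPTIDEC"), ("s5", "PEPTIDEC"),
])
con.executemany("INSERT INTO lca VALUES (?, ?)", [
    ("PEPTIDEA", "Pelagibacter"),
    ("PEPTIDEB", "Pelagibacter"),
    ("PEPTIDEC", "Nitrosopumilus"),
])

# Count PSMs and distinct peptides contributing to each genus,
# most abundant genus first.
rows = con.execute("""
    SELECT lca.genus,
           COUNT(psm.spectrum)         AS psms,
           COUNT(DISTINCT psm.peptide) AS peptides
    FROM psm JOIN lca ON psm.peptide = lca.peptide
    GROUP BY lca.genus
    ORDER BY psms DESC
""").fetchall()

for genus, psms, peptides in rows:
    print(genus, psms, peptides)
```

The real PSM report and pept2lca tables carry many more columns, but the grouping logic, PSMs counted per spectrum match and peptides counted distinct, is the same idea that produces the genus table shown in the tutorial.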
Please contact us if you need any information. Thank you for your attention.
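As a rough illustration of the intensity-based comparison that metaQuantome automates, the sketch below computes a log2 fold change per GO term between two conditions. metaQuantome itself provides proper statistics, volcano plots, and clustering; the GO terms and intensity values here are made-up placeholders, not results from the tutorial data set:

```python
import math
from statistics import mean

# Hypothetical summed MS1 intensities per GO term, three replicates
# per condition (condition A, condition B); values are illustrative.
intensities = {
    "translation": ([9.1e6, 8.7e6, 9.4e6], [3.1e6, 2.8e6, 3.3e6]),
    "ATP binding": ([4.0e6, 4.2e6, 3.9e6], [4.1e6, 4.0e6, 4.3e6]),
}

# log2 fold change of condition A over condition B for each GO term;
# |log2FC| >= 1 flags a candidate differentially abundant function.
fold_changes = {
    term: math.log2(mean(a) / mean(b))
    for term, (a, b) in intensities.items()
}

for term, fc in fold_changes.items():
    flag = "candidate" if abs(fc) >= 1 else "unchanged"
    print(f"{term}: log2FC = {fc:+.2f} ({flag})")
```

A real analysis would also attach a significance test to each fold change (the volcano plot's y-axis), which metaQuantome handles across replicates.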