Hello, my name is Emily and I am part of the Galaxy-P team at the University of Minnesota. Today I will be explaining how to run the EncyclopeDIA workflow, including an explanation of the workflow's inputs and its different steps. As Pratik mentioned, EncyclopeDIA is a very powerful tool for the analysis of DIA data, and I will be happy to walk you through running the EncyclopeDIA workflow on Galaxy EU.

Specifically, in this presentation I will give a brief explanation of the datasets and the workflow before we run the EncyclopeDIA workflow. Then we will go into more detail on the different tools and parameters included in the workflow, as well as variations on this standard EncyclopeDIA workflow that you can use to tailor it to your specific data. Lastly, we will go over the results of our workflow test, and I will give you a summary of the outputs generated and their results.

When running the standard EncyclopeDIA workflow, four inputs are required: one, an experimental DIA dataset collection of the wide-window variety; two, a gas-phase fractionation (GPF) dataset collection of the narrow-window variety; three, a spectral or Prosit-predicted library in .dlib format; and four, a background protein FASTA file.

The initial experimental DIA dataset collection and the GPF dataset collection will both be in .raw format. These datasets are both made up of the same DIA data; however, while the experimental DIA dataset collection uses wide windows, the GPF dataset collection uses narrower windows on the pooled data, with multiple acquisitions per precursor scan. The GPF dataset collection is then used to generate the chromatogram library that will be searched against in the analysis of the experimental dataset collection, hence providing the context of the specific DIA sample being analyzed. The last two inputs required are a .dlib spectral or Prosit library and a background FASTA file. These two inputs are both required, in addition to the GPF dataset collection, for generating the chromatogram library. Once the chromatogram library is generated, it takes the place of the input .dlib library and, along with the background FASTA file, is used in the quantitative analysis of the experimental DIA data.

Here is an image of the EncyclopeDIA workflow. I will briefly explain the three steps of the workflow now, before we run it as part of the tutorial. The first step is the conversion of files using msconvert. As the experimental dataset and the GPF dataset are both in .raw format in this tutorial, they must be converted to the .mzML file type, which is the appropriate input type for the SearchToLib and EncyclopeDIA Quantify tools. The next step is the generation of the chromatogram library using the SearchToLib tool. As mentioned, the chromatogram library is generated using the background FASTA file, the .dlib library, and the .mzML GPF dataset collection previously converted by msconvert. The generated chromatogram library, in .elib format, takes the place of the original .dlib library input; I will discuss the importance of this later in the tutorial. The last step of the workflow is the EncyclopeDIA Quantify step, the analysis of the experimental DIA data. This step uses the chromatogram library, the background FASTA file, and the .mzML experimental dataset collection as inputs.
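To make that data flow concrete before we start clicking through Galaxy, here is a small, purely illustrative summary of the inputs and steps in Python. The names are placeholders of my own and are not Galaxy identifiers or tool parameters.

```python
# Illustrative summary of the standard EncyclopeDIA workflow
# (placeholder names, not Galaxy identifiers).
inputs = {
    "experimental_dia": "wide-window DIA runs (.raw)",
    "gpf_dia": "narrow-window gas-phase fractionation runs (.raw)",
    "library": "spectral or Prosit-predicted library (.dlib)",
    "background_fasta": "background protein database (.fasta)",
}

steps = [
    "msconvert: convert both .raw collections to .mzML",
    "SearchToLib: GPF .mzML + .dlib + FASTA -> chromatogram library (.elib)",
    "EncyclopeDIA Quantify: experimental .mzML + .elib + FASTA -> peptide and protein tables",
]

for step in steps:
    print(step)
```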
Now that you are familiar with the three steps of the workflow and the general EncyclopeDIA inputs, we are going to run the EncyclopeDIA workflow together on Galaxy EU. Go to the Galaxy EU site at usegalaxy.eu. If you do not already have an account, it is easy to make one by clicking log in or create an account. Once signed in, we are ready to begin.

The first step in running the workflow is to create a new history. To do this, click on the plus sign here in the upper right corner of the screen. Once the new history is made, give it a name that is informative about the workflow being run and the data being analyzed. In this case I am going to name the history "EncyclopeDIA GTN iPRG analysis June 24th", as I always like to add the date to my history title.

Next we will import the data. For this tutorial we are using iPRG DIA data. Accessing the data is simple: go to the linked Galaxy Training Network tutorial for EncyclopeDIA that is associated with this video and scroll down to the import data section. Here you will find a number of Zenodo links, which is how we will import the data. Zenodo allows online storage of files and direct import into your Galaxy history for analysis, without the need to download very large files onto your computer, which is slow and a big hassle. To do this, press the copy icon. This copies all of the links, each of which is associated with a data file that we need to run the EncyclopeDIA workflow. Then go back to the Galaxy history that we created before, click the upload data symbol on the left panel of the screen, and at the bottom choose Paste/Fetch data. This box will pop up, and at this point you can paste the links that you previously copied from the EncyclopeDIA GTN tutorial. Then click start and the files will begin to upload to your history one by one automatically. Again, this method is much faster than downloading the files onto your computer and then uploading them. At this point I fast-forward through the recording, because uploading the files takes a few minutes.

Once everything turns green and your data are uploaded, we will rename the datasets so that each name contains only the sample name, without the %20 characters, and we will remove one of the ".raw" occurrences so that only one remains. To do this, click on the pencil icon (Edit attributes). As mentioned, we will edit the name so it only contains the sample information; so in this case, remove "GPF DIA dataset collection" as well as the "%20" and the extra ".raw". Now the file name contains only the sample information and a single ".raw". We will repeat this process for all six GPF files; again, I fast-forwarded through the editing because it is the same for each of the six GPF files. Now we will rename the experimental files similarly. Click on the pencil icon of one of the experimental files and remove "experimental DIA dataset collection" as well as the "%20" and the extra ".raw", so that the file name only contains the sample information. We will again repeat this process for all four of the experimental files; again, I fast-forward through this process because it is the same for each of the files. The FASTA file and the .dlib file do not need to be renamed, so once we have renamed all of the GPF and experimental files, we can move on.
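As a side note, everything we just clicked through can also be scripted against the Galaxy API with the bioblend Python client, which can be handy if you rerun this tutorial often. This is only a rough sketch under my own assumptions: you would need an API key from your user preferences, and the Zenodo URLs and the renaming rule are placeholders that you would adapt to the actual file names.

```python
# Rough bioblend sketch of the history creation, Zenodo paste/fetch upload,
# and dataset renaming steps (API key, URLs, and renaming rule are placeholders).
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")

# Create and name a new history, as in the video.
history = gi.histories.create_history(name="EncyclopeDIA GTN iPRG analysis June 24")

# Paste/fetch: let Galaxy pull each Zenodo link directly into the history.
zenodo_urls = [
    "https://zenodo.org/record/.../files/sample_GPF_01.raw",  # placeholder
    "https://zenodo.org/record/.../files/sample_GPF_02.raw",  # placeholder
]
for url in zenodo_urls:
    gi.tools.put_url(url, history_id=history["id"])

# Rename datasets so only the sample information remains, the equivalent of
# the pencil / "Edit attributes" step; adjust the cleanup rule to your names.
for ds in gi.histories.show_history(history["id"], contents=True):
    cleaned = ds["name"].replace("%20", " ")
    gi.histories.update_dataset(history["id"], ds["id"], name=cleaned)
```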
Now that the data files are renamed, we can create the dataset collections that are required to run the workflow. We will create two dataset collections: one containing the experimental .raw files and the other containing the GPF .raw files. To accomplish this, click the checkbox icon in the right corner that says "Operations on multiple datasets", then check the GPF files; you will check six boxes. Once they are selected, click the "For all selected" tab and then select the "Build collection from rules" option. When the dataset collection window pops up, add an informative name to the collection; in this case I am going to name it "iPRG GPF dataset collection". Then click create. We will follow the same steps to generate a dataset collection for the experimental files: clear your GPF checkmarks by clicking the "None" tab at the top, then select the experimental files, and click "For all selected" and "Build collection from rules" again. This time I will name the collection "iPRG experimental dataset collection" and then click create. Finally, click the "Operations on multiple datasets" icon in the right corner again to hide the checkboxes, as we are done creating dataset collections. At this point you will have created two dataset collections: an experimental dataset collection with four items and a GPF dataset collection with six items.

Now that we have uploaded the data and created the dataset collections, we will import the workflow. For this we go to Shared Data and then Workflows at the top of the screen. Let's use the search bar at the top of the screen and type in "encyclopedia"; the workflow that we are going to use is labeled "June 22nd GTN EncyclopeDIA workflow raw inputs". We can use the drop-down arrow and click import to add the workflow to our list of workflows. Then let's go to the Workflows tab at the top. Here we see the imported "June 22nd GTN EncyclopeDIA workflow raw inputs". Once the workflow is pulled up and we are ready to run it, we can click the run-workflow icon on the right-hand side of the EncyclopeDIA workflow.

Each of our files and dataset collections from our history has a specific place to go in the workflow. Specifically, the GPF dataset collection that we made goes in input number one, the GPF files; the .dlib file goes in input number two, the spectral or Prosit library; the experimental dataset collection goes in input number three, the experimental DIA data; and lastly the background FASTA file goes in input number four, the background protein FASTA file. Once all of these inputs are in their correct slots, we can click run workflow. Once you have run the workflow, this page will pop up with information on the invocation of the workflow as well as the progress of which steps have yet to be completed. The nice thing about running workflows on Galaxy EU is that they do not require constant attention, meaning that you can hit run, complete other tasks, and do not need to be at your computer to watch the workflow run. In this case that is very helpful, as the EncyclopeDIA workflow can take five to six hours to run fully, and you do not really want to be tied to your computer for that entire time. Therefore, while we leave the workflow to run, I am going to go through the steps of the workflow in a little more detail.
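For completeness, here is the same idea in bioblend form: building one of the dataset collections and invoking the imported workflow with its four inputs. Again, this is only a sketch; the dataset, collection, and workflow IDs are placeholders, and the input keys depend on how the workflow inputs are ordered, so check them through the API before relying on this.

```python
# Sketch: build the GPF collection and invoke the EncyclopeDIA workflow
# (all IDs are placeholders; input keys mirror the four numbered workflow inputs).
from bioblend.galaxy import GalaxyInstance
from bioblend.galaxy.dataset_collections import CollectionDescription, HistoryDatasetElement

gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
history_id = "YOUR_HISTORY_ID"

gpf_dataset_ids = ["GPF_ID_1", "GPF_ID_2"]  # ...all six GPF dataset IDs
gpf_collection = gi.histories.create_dataset_collection(
    history_id,
    CollectionDescription(
        name="iPRG GPF dataset collection",
        elements=[HistoryDatasetElement(name=f"GPF_{i}", id=ds_id)
                  for i, ds_id in enumerate(gpf_dataset_ids, start=1)],
    ),
)

invocation = gi.workflows.invoke_workflow(
    "IMPORTED_WORKFLOW_ID",
    inputs={
        "0": {"src": "hdca", "id": gpf_collection["id"]},          # 1: GPF files
        "1": {"src": "hda", "id": "DLIB_DATASET_ID"},              # 2: spectral/Prosit library
        "2": {"src": "hdca", "id": "EXPERIMENTAL_COLLECTION_ID"},  # 3: experimental DIA data
        "3": {"src": "hda", "id": "FASTA_DATASET_ID"},             # 4: background FASTA
    },
    history_id=history_id,
)

# The run takes hours, so just poll the invocation state now and then.
print(gi.invocations.show_invocation(invocation["id"])["state"])
```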
As previously mentioned, the first step of the EncyclopeDIA workflow is the conversion of files from the .raw file type to the .mzML file type, an important step because SearchToLib and EncyclopeDIA Quantify both require .mzML input files. While file type conversion is the simplest step, it is also the step that is most tailored to the input data. More specifically, the parameters used by msconvert to convert the files will vary between different input samples and need to be set by the user accordingly. In this case, the specific parameters that were changed from the defaults are shown on the screen. I will now take you through a quick peek at the parameters on Galaxy EU.

To examine the parameters of msconvert, we can go into the editing mode of the workflow. To do this, go to this drop-down menu and click edit. Once it loads, go to the msconvert step. As you can see, the output type is mzML. The parameters that were changed from the default settings are: "apply peak picking", which was toggled to yes; "demultiplex overlapping or MSX spectra", which was also toggled to yes, and under that parameter "optimization" was changed from none to overlap only; the "SIM as spectra" option, which was changed from yes to no; and lastly the "intensity encoding precision" option, which was changed from 32 to 64. These same parameters were changed for the conversion of both the experimental DIA data and the GPF data. Again, it is important to realize that if you use the EncyclopeDIA workflow with your own data, you should alter the msconvert parameters as needed to accommodate it.

The second step of the workflow is the generation of the chromatogram library. A challenge in analyzing DIA data is that DDA-generated libraries or predicted spectral libraries are not always a reasonable representation of DIA data: differences in retention time, in data collection methods, and in spectral convolution cause significant differences between DIA and DDA data. As Pratik mentioned, using a chromatogram library can bypass issues that arise with a DDA-generated library or a predicted spectral library. The inputs to SearchToLib are the GPF dataset collection in .mzML format (after msconvert), the background FASTA file, and the .dlib library, either DDA-generated or a predicted spectral library. As the GPF dataset collection is generated from the same DIA sample as the experimental dataset collection, it provides context for the data to be analyzed and accounts for the differences between DIA and DDA data. An important distinction between the experimental DIA data and the GPF DIA data is that the GPF data use multiple acquisitions for each precursor scan, which makes the isolation windows much narrower than in the experimental DIA dataset collection. The narrow windows used in the GPF acquisition mean that the GPF dataset collection offers a very rich and in-depth view of the experimental DIA sample contents, which makes it very useful in library generation. Therefore, when used in combination with a DDA-generated or predicted spectral library and the background FASTA file, the GPF DIA data not only provide the context of the experimental DIA data but also allow for the generation of a richer and more finely tuned library for analysis. Additionally, as mentioned, the chromatogram library generated will take the form of a .elib file, while the spectral or Prosit library is in a .dlib format. The .elib file contains more information, including retention time, mass-to-charge ratio, and intensity, compared to the .dlib file type, making it a more thorough library to use in data analysis.
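One small aside on these library formats: to the best of my knowledge, both .dlib and .elib files are SQLite databases, so if you download a library from your history you can peek inside it with Python's standard sqlite3 module. The file name below is a placeholder.

```python
# Peek inside an EncyclopeDIA library file (.dlib or .elib). These library
# formats are SQLite databases, so the standard-library sqlite3 module is
# enough to list what they contain; the file name is a placeholder.
import sqlite3

con = sqlite3.connect("chromatogram_library.elib")
tables = [row[0] for row in
          con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print("tables:", tables)

# Count rows per table to get a feel for how much richer the .elib is
# compared to the input .dlib (retention time, intensity, etc.).
for t in tables:
    (n,) = con.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()
    print(f"{t}: {n} rows")
con.close()
```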
Now let's take a look at the SearchToLib parameters, the same way that we did with msconvert. As you can see when going into the SearchToLib parameter settings, all of the settings are either toggled to no or left at the default options. Additionally, you can see in this SearchToLib box that an input datasets log text file is also generated; however, as we are mainly focused on the .elib chromatogram library, that is the emphasized output file.

The actual data analysis of the experimental DIA files comes with the last tool of the workflow, EncyclopeDIA Quantify. The inputs for this tool are the background FASTA file, the experimental dataset collection to be analyzed, and the chromatogram library, which takes the place of the .dlib library used in the previous step. If we examine the parameters of the EncyclopeDIA Quantify tool in this workflow, we see that the default parameter settings were used again. Additionally, we see that EncyclopeDIA Quantify generates many different output files, including a log text file, a "quantify input datasets" .elib file, the concatenated quantify input datasets in a tabular format, and then two tabular files, one quantifying the peptides and one quantifying the proteins. These two files are the primary files that we use to analyze the results of our workflow in this tutorial (I will show a quick way to take a look at them a little later in this section).

Here is the agenda slide again; now we will discuss how the EncyclopeDIA workflow can be tailored to your specific DIA data by altering the parameters of the workflow, as well as the variations of the workflow that are available. If you converted your experimental and GPF files to the .mzML format outside of Galaxy EU, you can alter the standard workflow to use your .mzML inputs directly, which I will demonstrate now. Simply put, you take out the msconvert step. To edit the workflow to accept .mzML inputs, go to the Workflows tab at the top (I am already here) and make a copy of the EncyclopeDIA workflow with raw inputs that we imported from the Shared Data, Workflows page. Now open this copy in edit mode, and for both msconvert steps click remove, which is this little box with the X. Now connect the GPF DIA input directly to the "spectrum files in mzML format" input of SearchToLib, and do the same with the experimental DIA input and the "spectrum files in mzML format" input of the EncyclopeDIA Quantify tool. Next we rename these inputs to indicate that they will be in the .mzML format, and we do the same for the experimental data. Once we have done that, we can click save workflow; then, back on the Workflows tab, we rename this workflow to indicate that it uses .mzML inputs, so I am going to add "mzML input compatible". Simply put, that is how to edit the standard EncyclopeDIA workflow from accepting .raw inputs to accepting .mzML inputs, in case you converted your files outside of Galaxy EU.
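Before we move on to the Walnut variation, here is the quick look at those two Quantify tables that I mentioned earlier. Once your run finishes, you can download the peptide and protein quantification outputs from the history and inspect them with pandas; this is only a minimal sketch, and the file names are placeholders for whatever you download from your history.

```python
# Minimal sketch: load the EncyclopeDIA Quantify peptide and protein tables
# (tab-separated) and report how many entries were quantified. File names
# are placeholders for the datasets downloaded from the Galaxy history.
import pandas as pd

peptides = pd.read_csv("encyclopedia_quantify_peptides.tabular", sep="\t")
proteins = pd.read_csv("encyclopedia_quantify_proteins.tabular", sep="\t")

print(f"{len(peptides)} quantified peptides, {len(proteins)} quantified proteins")
print(peptides.head())  # typically one row per peptide, with intensity columns per run
```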
If you are missing a predicted spectral library, you can still run the EncyclopeDIA workflow; however, you will be running a variation on the EncyclopeDIA workflow called the Walnut EncyclopeDIA workflow. If you do not have a spectral library, the number of quantified peptides and proteins will likely not be as high as if a library were used. More information on this can be found at the link on the screen, for a poster in which the Galaxy-P team compared the Walnut and standard EncyclopeDIA workflows. The Walnut EncyclopeDIA workflow is easy to generate from the standard EncyclopeDIA workflow: we simply remove the .dlib input file. To do this, again go to the Workflows tab in Galaxy EU and create a copy of your workflow, as we did before. Open the workflow in edit mode, select the spectral or Prosit library input, and simply click remove. Then click save workflow, go back to Workflows, and rename this copied workflow to indicate that it is the Walnut variation; this name specifies that the Walnut EncyclopeDIA workflow uses .raw inputs. If you followed the protocol that I described for producing a workflow that uses .mzML inputs, you could create a Walnut EncyclopeDIA workflow that uses .mzML inputs as well. So this workflow can be tailored very well to your specific data and to what works best for it.

Now that I have explained the different parts of the EncyclopeDIA workflow in a little more depth, as well as the variations on the workflow that you can make, let's take a look at the outputs from the test that we ran together on the iPRG inputs. The EncyclopeDIA workflow can take many hours, as I mentioned, so while your workflow might not be completed at this point in the video, I have a history here that I ran on June 22nd, a few days ago, in which the EncyclopeDIA workflow ran successfully on the same iPRG inputs. If we scroll down here, items 15 and 22 in the list are the dataset collections of the experimental and GPF data once converted to the .mzML format. Items 27 and 28 are outputs of SearchToLib: the log text file and the chromatogram library in .elib format. Lastly, items 29, 32, and 33 are outputs of EncyclopeDIA Quantify. Outputs 32 and 33 are the quantitation outputs for the peptides and proteins, which very clearly state the number of peptides and proteins found in the sample: for example, it looks like 24,457 peptides and 4,460 proteins were quantified. The tabular format of these files means that they are compatible with Excel and easy to examine in other platforms. Additionally, statistical analysis tools such as MSstats, mapDIA, and DEFACTO are recommended for further analysis of these files. Again, while items 29, 32, and 33 are shown as the outputs of EncyclopeDIA Quantify, there are also hidden files, items 30 and 31, that you can find by clicking the hidden button at the top: the other EncyclopeDIA Quantify output files, namely the Quantify .elib and the concatenated results in a .txt format.

Well, that concludes our run of the standard EncyclopeDIA workflow. Thank you so much for following along. We would also like to thank the Galaxy Training Network and the Galaxy EU team for all of their essential help in creating this tutorial. If you have any questions after running the tutorial, or have any trouble with the workflow, please feel free to reach out through the Galaxy Training Network page that is linked. Additionally, please review this training and leave feedback, as it helps us improve where we can. Again, thank you so much for listening, from the Galaxy-P team.