Hi, my name is Alex Ostrovsky, and today I'm going to be introducing you to Galaxy using our Galaxy 101 for Everyone training. Before we begin: although most of the Galaxy interface is visible to you before you have done anything, to gain access to the full suite of Galaxy capabilities we will first have you make an account. To do so, go to the Login or Register tab on the masthead. Here, select Register here to create a new account. This new account is only available on the specific Galaxy server on which you create it. Here, I am using usegalaxy.org. You would need to create a new account if you wish to use an alternative server such as usegalaxy.eu or usegalaxy.org.au. Now, before we begin our analysis, let's take a look at the layout of the Galaxy main page. On the left side of the screen, you'll find our toolbar. Here, you can see any tool that is available on this Galaxy server. These include data import tools, such as pulling from SRA, or more personalized methods of data input, which will be discussed later. You can also access manipulation tools for data cleanup or for use at the end of an analysis. You can also access more analysis-specific tools, such as those used for RNA-seq or for genome assembly, or whatever you need for the analysis you will be performing. The center panel is the main workspace for Galaxy. It's where tool forms will become available. It's where you will visualize your results, as well as handle various other aspects of interacting with Galaxy. On the right is your history. This is where any data sets that are imported or generated by your analysis will become available for you to view, edit, or use in future steps. Now that you've been introduced to the Galaxy interface, let's run an analysis to get you used to using Galaxy. We're going to rename our history first to make it identifiable for future reference.
We're going to click the pencil next to the name of the history, and in this case we're going to name it Galaxy 101 for Everyone. When you're ready, you can click Save. To begin our analysis, we'll first need some data. So go to the Upload Data button on the top left of your toolbar. This will bring up a modal, which will be your central method of importing data into Galaxy. Here you can choose local files, choose remote data from specified online repositories, or paste and fetch data, either as text or from an online URL. We're going to be using the data provided in the training itself, so we're going to choose Paste/Fetch data. We're going to name the data set by editing this name box here, which sets the data set's title in our history once it is imported, and then we are going to paste the URL in the box provided. When importing data into Galaxy, most of the time the data type will be automatically sniffed out by our Galaxy data type sniffers. However, if you want to be sure, you can always manually set the data type here. We're going to be setting it as a CSV file, and when you're ready, you can click Start to perform the import. When a data set is queued in the history, it will appear gray. When it is running, it will appear yellow, and when it is ready to be used further, it will appear green. Once complete, we can also tag the data set to make it more identifiable within the history. To do this, expand the data set by clicking it in the history, click the tag button, and type your tag. If you add a hashtag at the beginning, it becomes a propagating tag, and all child data sets of this original data set will carry that tag automatically. We'll add the propagating tag Iris, and then we will click the pencil to edit the metadata of this data set. You can see here that you can change the name and the associated genome of the data set, but we're going to be looking at the data type of this file.
If you click on the data type drop-down at the bottom, that will let you manually assign the data type of this data set without changing anything about the data set itself. However, if you'd like to convert this data set to another format, you can use the Convert tool at the top. Here, we're going to be converting this from a CSV file to a tabular file. Select your chosen conversion, and then click Create Data Set, and it will be generated in the history. You can see here that the new child data set carries the same tag, and once the conversion tool has run and we have our new data set, you can click the pencil and rename that data set. So that it's identifiable, we're going to be changing the name here to Iris Tabular, and then clicking Save to apply that change. Now that we've converted our data to the proper format, we're going to run our first tool on it to begin preprocessing. We're going to look up Remove Beginning, which is a tool that will let us cut out the first lines of a file. We're going to remove the first line of the tabular file, because that is a header line. You can view the file by clicking the eye on the data set in the history. Once that's done, we're going to click the pencil again to rename the data set, and this time we're going to be renaming it to Iris Clean. And then we click Save to apply that. Now we can take a look at our new data set by clicking the eye on the clean data set. You can also click the data set itself to expand it within the history and get a preview in that little window there. So this is what our data looks like after this step. Next we're going to get a little bit more information about what our data set contains by isolating the fifth column. We're going to look up the Cut tool, Cut columns from a table, and cut out the fifth column. This file is delimited by tabs. Double-check that we're using Iris Clean, and then run the tool when we're ready.
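For anyone who prefers to see these preprocessing steps as code, here is a minimal Python sketch of roughly what the conversion, Remove Beginning, and Cut steps do to the file. It is only an illustration, not how Galaxy implements these tools; the header names, the three sample rows, and the function names are all made up for the example.

```python
import csv
import io

# Hypothetical sample in the same shape as the Iris CSV: one header line,
# four measurement columns, and the species in column five.
SAMPLE_CSV = (
    "Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)


def csv_to_tabular(text):
    """Convert comma-separated text to tab-separated text."""
    rows = csv.reader(io.StringIO(text))
    return "".join("\t".join(row) + "\n" for row in rows)


def remove_beginning(text, n=1):
    """Drop the first n lines (here, the header line)."""
    return "".join(text.splitlines(keepends=True)[n:])


def cut_column(text, column):
    """Keep a single tab-delimited column, counting columns from 1."""
    values = [line.split("\t")[column - 1] for line in text.splitlines()]
    return "".join(value + "\n" for value in values)


iris_tabular = csv_to_tabular(SAMPLE_CSV)       # like "Iris Tabular"
iris_clean = remove_beginning(iris_tabular, 1)  # like "Iris Clean"
species_column = cut_column(iris_clean, 5)      # the fifth column only
print(species_column)
```

Each function takes text and returns new text, which mirrors how every Galaxy step leaves its input untouched and adds a new data set to the history.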
The Cut tool simply creates a new tabular data set that contains just the fifth column from the original data set. Now we can go look up the Unique tool and use the new data set that was generated, and that will isolate only the unique entries within the fifth column. We can click Submit when we're done, and we're going to rename these data sets so that we know what they are within the history. So we're going to rename this one to be Iris Species Column. And once this one's done, we're going to rename the unique one to Iris Species, because all this new file contains is the unique instances of the species names. You'll see that there are three values within that column. Now we can also click the eye on the data set within the history to take a look at the data set itself in more detail than the preview that is shown when expanding the data set. This is the whole file. This is not the only way to generate this type of data from our original file, so we're going to show a different method of doing so by using the Group tool. We're going to use our Iris Clean data set, and we're going to group all of the species by the fifth column. The results will be the same as when we ran the previous tool. We're then going to rename this data set Iris Species Group, because that is how we generated it, even though it will be the same as the Iris Species file. If you want to rerun this tool, either because you did something wrong or because you want to run the same step on a new data set, you can click the little rerun wheel on the expanded data set in the history. Now we can get a little bit more information about the data set that we have by using the same grouping tool to figure out how many samples per species are in the data set. We're going to put in the same original Iris Clean data set, grouped by column five for the species, but this time we're also going to insert an operation.
We're going to get the count for each unique item within column five, running the count operation against column one, and this should give us, once we submit, all three of our species along with the counts of associated instances. This, once we expand it, shows that all three species had 50 instances. We're going to rename this to be Iris Samples Per Species Group, and then we're going to click Save. Although the Group tool is great for getting information about the data set, Galaxy has another tool called Datamash, which allows you to run calculations against your data set. So here we're going to get a lot more in depth with the information we're looking at. We're going to be selecting the Iris Tabular data set. We're going to be grouping by the field, which is in this case column five. We're going to sort the input. The input does have a header line, because this is the file before we took it out. We're going to print the header lines. We have more information at the end. We don't care about printing all fields from the input file, and we're going to ignore case when grouping. Now, for each column other than five, because five is what we are grouping on, we are going to be calculating the mean and the sample standard deviation. That means we're going to have to put in eight operations here, and I'm going to skip ahead so that you don't have to watch me redo the same thing eight times. And now we're back, having input everything, and we're going to run the tool. And once that's done, we can quickly rename the data set. We're going to call this one Iris Summary and Statistics. And then we can click the eye to take a look at what Datamash has calculated for us. Here we can see that for each column in the original data set, we have been able to group by species and then find the mean and standard deviation for each, which allows us to see the mean and standard deviation for sepal length, sepal width, petal length, and petal width.
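To make the grouping and statistics steps concrete, here is a small Python sketch of the same idea: group rows by species, then compute a count, a mean, and a sample standard deviation per group. It uses Python's standard statistics module rather than Datamash itself, and the rows, the ROWS and summarize names, and the dictionary keys are all hypothetical, chosen only for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical rows of (sepal length, species); the values are illustrative
# and are not the full Iris data set.
ROWS = [
    (5.1, "setosa"),
    (4.9, "setosa"),
    (4.7, "setosa"),
    (7.0, "versicolor"),
    (6.4, "versicolor"),
]


def summarize(rows):
    """Group values by species and compute count, mean, and sample stdev."""
    groups = defaultdict(list)
    for value, species in rows:
        groups[species].append(value)
    return {
        species: {
            "count": len(values),
            "mean": statistics.mean(values),
            # statistics.stdev is the sample (n - 1) standard deviation.
            "sstdev": statistics.stdev(values),
        }
        for species, values in groups.items()
    }


summary = summarize(ROWS)
for species in sorted(summary):
    stats = summary[species]
    print(species, stats["count"], round(stats["mean"], 3), round(stats["sstdev"], 3))
```

The keys of the result are the unique species names, which is the same information the Unique and Group steps produced, and repeating the calculation for each of the four measurement columns would give the eight values per species described above.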
Galaxy is also fantastic for creating visualizations of your data. For example, we have access to the scatter plot tool from ggplot2 right in your toolbar. In this case, we're going to be using the Iris Clean data set, which, if you remember, is the one we removed the header line from. We're going to plot column one against column two, which are sepal length and sepal width, and we're going to label our axes as such and create a title for our plot, which is sepal length as a function of sepal width. We can customize our plot a little bit more. Within advanced options, we're going to be changing data point options to user-defined point options. Then we're going to be changing the relative size of points to 2.0. We're going to be changing the shape of our data points to circles. Then we're going to be plotting multiple groups of data on one plot. We're going to be differentiating these by column five. We're going to be setting our color scheme to Set 2, a predefined color palette. And we're going to add the additional output of a PDF file in addition to the normal PNG. Now, as many of you may have noticed, when I was running this tool, I accidentally set the column differentiating the different groups to 15 instead of 5, which will cause this job to fail. What's really nice about Galaxy is that it's very easy to fix these kinds of mistakes. I can expand a data set once it has failed, and then click the rerun button to very quickly run the same job a second time with all of the inputs the same way I had them before. I find the error that I made before, which is in advanced options, change the column differentiating the different groups to the proper number of 5, and run it as normal. And once we know what was wrong and how to fix it, we can click the trash can on the failed data sets to make them disappear from the history.
These failed data sets can be retrieved from the trash can icon at the top of your history. You can see that the new data sets have populated in our history, and they will run properly to completion this time. We can view our new graphs once the data sets finish running by clicking the eye icon as before. This is what our graph looks like. You can download these files as you wish. One of the nice functions of Galaxy is that, now that we have run a full analysis, we can very easily extract that analysis as a full rerunnable workflow. So we're going to go to the little drop-down arrow on the history and click Extract Workflow from History. This new screen shows all of the steps that were used to generate data sets within the current history. You can select steps to include or exclude by clicking the checkboxes next to their bubbles, and you can name your new workflow. In this case we're going to be naming it Exploring Iris data sets with statistics and scatter plots, so it can be more easily identified later. Additionally, during this workflow generation process you can more easily check and uncheck data sets by clicking Uncheck All or Check All, as opposed to having to go and find specific ones, and when you're ready you can click Create Workflow. You'll then get a message that says the workflow is created, but the screen will have changed. You can access the workflow you just created by clicking the Edit or Run Workflow buttons on this pop-up, or you can click the Workflow tab to see it among all of your workflows. Here you can run your workflow directly by clicking the play button, or you can edit it by going to the drop-down button and then selecting Edit. Now, in the workflow editor, you can fine-tune your workflow by specifying which outputs should be hidden or shown within the history when it is run. You can specify and change tool parameters by clicking their associated bubbles and then changing those parameters in the right column.
To show a data set in the history when a workflow is run, make sure that the checkbox is filled in for the outputs that you want to see. In this case we're going to be showing the unique output, the group outputs, and the scatter plot outputs. So we're going to check those boxes to make sure that those will appear in a history for a user. Once we've changed everything we need to change, we can click Save and then use that workflow on a new data set. To do that, let's create a new history, and then into this new history, in the same way we did last time, we're going to upload a new data set. This one we're going to call Diamonds, and the URL for it is also available in the tutorial. Then we will click Start to upload it to your Galaxy instance, close the modal, let the data set finish importing, and then add the propagating tag Diamonds. And although I forgot to do so here, you can click the pencil at the top of your history to name it, as before, so you might want to name it Diamonds or something more identifiable. When you're ready, you can go to the Workflows tab on your masthead and access the workflow we just created. Then you can click the play button to access the workflow form. You can either run it as default with the new data set, or you can click to expand and access all of the individual tools to manually change parameters or view them. We're going to open up the scatter plot tool, find the plot title, and click the pencil to be able to edit the title. We're going to rename the plot to Diamond price as a function of carat with cut as a factor. We're going to change the axis labels as well. For the x-axis we're going to use Weight of the diamonds in carat, and then we'll change the y-axis to Price in US dollars. When we're ready, we can scroll to the top and click Run Workflow.
Once you click that, your workflow will be run against your new input data set, and you'll see the workflow invocation page here, which will show you the status of your workflow. Then, as they load in, data sets will appear in your new history. You can also share or publish your history once you have finished it, to make it easier for others to see a completed analysis rather than just a workflow. You can go to the caret in your history and click Share History, and then you can make your history accessible, which means that somebody who has the URL can view it. You can specify who has the ability to see the history. You can also make the history directly viewable in the Shared Data tab from the masthead. Shared histories will be available on the server on which you shared them. And with that we have reached the end of this training. Congratulations! To summarize the key points: Galaxy provides an easy-to-use graphical user interface for complex command-line tools; keeps a full record of your analysis in a history; generates workflows, which enable you to repeat your analysis on different data; can connect to external sources for data import and visualization purposes; and provides ways to share your results and methods with others. Thank you very much for using Galaxy, and have a nice day!