Hi everyone, I'm Anne. I work at the University of Oslo, and together we'll go through the Galaxy 101 for everyone tutorial from the Galaxy training material. So where can we find this training? It is at training.galaxyproject.org, under Introduction to Galaxy Analyses, and then Galaxy 101 for everyone. All the training materials look the same: at the top you will always have this panel with the scientific question we want to answer. Here it is: what are the differences between the Iris species? I will explain later what it is about. Then come the objectives of the lesson: we'll familiarize ourselves with the basics of Galaxy, and we'll learn how to import data from external sources, how to tag datasets, how to run tools, how to use Galaxy histories, how to create a workflow and how to share work. This tutorial takes about one hour and a half, and we'll go through it step by step. Here you can see all the different Galaxy instances you can use for running this tutorial. If you left-click, you can see that most of them can run this basic Galaxy 101 for everyone tutorial. I will be using usegalaxy.org, the main instance. If I go here, one of the first steps will be to register and log in. But let's look at the interface first. Here you have the left panel, which is the tool panel with all the different Galaxy tools; we'll be using a few of them. This is the main panel, where mostly everything goes on when you run tools and inspect datasets; by default you end up on the landing page with news and information about the Galaxy project and the Galaxy instance. And on the right you have the history panel, which is empty here. We need to log in, so I will click on this Login or Register. If you don't have an account yet, you need to register: put in your email address, a password and a public name.
Once you click on the Create button, you will get a confirmation email; it can take a few hours. Then you will be able to log in. I already have an account, so I will log in to this Galaxy instance. There is no difference between what we had before and what we have now, except that at the top here, if I click on User, I can see I'm logged in. And this is what we want. Now I can actually close the other panel for this tutorial, because we will be using this very nice feature here: if I click, I get the Galaxy training website with all the different trainings. I can click on Introduction to Galaxy Analyses and again find this Galaxy 101 for everyone. This is nice, because we have these two panels here: I have Galaxy 101 for everyone, and if I click outside of it, I go back to my Galaxy instance. So let's have another look at the Galaxy 101 tutorial. You have some introduction and background: we'll be using the Iris flower dataset. You can get more information by reading Fisher's paper from 1936, and on the wiki page here. If I look at it: it is a tabular dataset, and each row contains one flower. Each flower has different characteristics. They are all Iris flowers, but depending on the petal length and width and the sepal length and width, it will be a different species of Iris, and this is what we will try to analyze in this tutorial. You can read more if you want to. So we have logged in and registered, and we have looked at the different panels: tools, main and history. Now we will create a history. It's really good practice, when we start an analysis in Galaxy, to start with a clean history and have only the datasets and the results of the tools you have used for this analysis, because, as we'll see later, when we create a workflow from the history it will be much easier.
It will only contain the steps of your analysis. So we'll do this part, where we create a new history and name it Galaxy 101 for everyone. I'm going to click here to get back to my Galaxy instance, and you see this plus button here. It's loading. To change the name, I left-click on the name and paste Galaxy 101 for everyone. For this to take effect you need to press Enter; if you don't, it will not rename your history. So we have renamed the history. Let's go back here. The next step is to upload the Iris dataset. You see this little copy button here: I click on it and it copies this line to the clipboard. Then you can go here to Upload Data, choose Paste/Fetch data, paste it, click Start and then Close. It sometimes takes a bit of time, and you may need to refresh. When it's gray, it means your task is queued. Now you see it has changed from gray to orange, which means it's running; you can also see that because it's spinning here. Once this is done, it turns green if the task is successful and red if it fails. Now it's green, so we have the dataset, and the next step is to rename it. This is what is explained here in the next step. We have uploaded the dataset, and now we rename it to Iris. We also check the data type and make sure it is CSV, for comma-separated values. If not, we change the data type to make sure it is handled properly as a CSV file. And we add a tag, which is the iris tag. We want the tag to be propagated to all the datasets that use this dataset as an input, for instance as an input to a tool: you want the output of the tool to have the same tag. For that we put this hash sign in front of it; I will show you right now. So now we need to rename the dataset, mostly because, as you see, this very long name with a web address is not very convenient.
In addition, some Galaxy tools do not work when we have such a complex name. So I change the name here and click on Save, and it changes. Then I check the format. The format is CSV, which is correct. You can change it here, under Datatypes: you could select the right type and click on Change datatype if it were not correct. Here this is CSV, so we are all set. The last step is to add a tag. To add a tag, I click on Edit dataset tags, and then I can add a tag here. Remember, I said we need to use the hash sign: I type #iris and press Enter. It takes a bit of time to take effect. And now I have it; it's refreshing. So we have the iris tag here and the right name, and I'm all set. Now I can start my analysis. We'll do some pre-processing on the dataset: we'll convert it from CSV to tabular. Why do we do that? Because many tools in Galaxy use the tabular format instead of CSV, so it is much more convenient to always convert your CSV file to tab-separated values. For this, we go to the Convert tab, which we reach by clicking on the edit button, and we convert from CSV to tabular. And again, we rename the dataset. It's always good practice to rename datasets to have meaningful names; it makes it much easier to go through your history. And we'll answer the question: how many header lines does our file have? Let's go back here. So what we need to do is first convert from CSV to tabular, then rename the dataset, and then answer the question. I click here, Edit attributes. I go to the Convert tab, and here I choose to convert from CSV to tabular. I click on Convert, and you see, now I get a new dataset. Every time we process or analyze a dataset, that is, every time we run a tool, we get a new dataset in the history. So now I wait, and then I rename the dataset to have a much more meaningful name: I rename it to Iris tabular and click on Save.
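For anyone who wants to see what a CSV-to-tabular conversion amounts to outside Galaxy, here is a minimal Python sketch. It is only an illustration and not what Galaxy runs internally; the function name csv_to_tabular and the header names in the sample are assumptions made for this example.

```python
import csv
import io


def csv_to_tabular(csv_text):
    """Return comma-separated text rewritten as tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Tiny hand-typed sample: one header line and one data line.
# The header names are assumed for illustration, not copied from the real file.
sample_csv = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
)
tabular = csv_to_tabular(sample_csv)
print(tabular)
```

Using the csv module rather than a plain string replace means that quoted fields containing commas would still be split correctly.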
Now I have a new dataset, Iris tabular, and you see the iris tag has been propagated from this dataset to this one, because the new dataset used the first one as its input. It's just a conversion here. Can we answer the question, how many header lines do we have in this dataset? I click on the eye icon here, which is View data; depending on the size of the dataset, it can take more or less time. Yeah. And now I can see all the different values, with the sepal length, sepal width, petal length and petal width, and the species here in the last column. So we have only one header line, which holds the name of each column. Let's go on to the next step. Now we'll do some real analysis: it's time to run a first tool. What we will do is remove the header, because many of the tools we want to use would not be able to run with this header. And we know we have one header line. To find a tool, you can use the search bar here and type the name of the tool. When you are following a tutorial, there is a very nice feature: you can click here and you get direct access to the tool. So we'll run this tool, which is Remove beginning, with the following parameters. We'll remove the first line, because we want to remove the header. We select the Iris tabular dataset and click on Execute. The next step will be to rename the result of the tool to Iris clean. So let's do that. We click here, and we get Remove beginning. Yes, we remove the first line. We take Iris tabular. Yes. And we can click on Execute. If you want to get a notification, you can select Yes here, and then you will get a notification when your tool has finished, which is very useful if the tool takes quite long. So here I click on it, and it runs. It's running already. And remember, I will rename the dataset here. I can check it.
I can see there is no header anymore, and I rename it to Iris clean. And I save. Yeah. And again, you see the iris tag has been propagated, because the input to Remove beginning was a dataset carrying this tag. So let's go back here. I have renamed the dataset; can we answer these questions? Which tags are present on the resulting dataset, and how many sample lines does our dataset contain? I have already answered the first question, about the tags: we always have this iris tag, because we put the hash sign in front of it when we first added the tag, so it is always propagated. Let's see how many lines we have. Here it is written that we have 150 lines, which is very handy. You can also see some of the lines here. So we have answered the questions. Let's go to the analysis part. In the analysis part, we want to answer the question: what does the dataset contain? So, how many different species are in the dataset? For this, we will cut out a column, and we'll look at column five. Why column five? I count here: 1, 2, 3, 4, 5. This is where we have the name of the species. And remember, at the end we want to have a workflow, so we want to automate all the different steps. So we need to look at this column. We see that the name changes from setosa to versicolor and virginica, which are the different species names, so we want to select this column. And we want to have unique values, to avoid repeating each species many times. So how do we do that? We first use a tool called Cut, to cut a column from the table, and we take column five. We say that the dataset is tab-delimited; it is still tabular, even if it now has no header. And we rename the resulting dataset to Iris species column. I copy the name here.
Then the second tool we will use: because we'll have only the fifth column, with all the different names repeated, we'll use a new tool, which is Unique, to get only the distinct values in this column. Let's do that. First I get Cut. Here I put c5. It's tab-delimited, and I cut from Iris clean. And remember, we need to rename the dataset to Iris species column. I can make it faster by copying again. It's running. I can inspect it, and I can see I have only the last column. Let's rename it, as is good practice, to Iris species column. And I save. Again, I see the tag has been propagated. The next step is to have unique values in this dataset, because each species is repeated many times. So if I go back here, I will use this tool, which is Unique. As input I take this Iris species column, which is the result of the previous tool. Then I execute the tool and rename the output to Iris species. Then I will try to answer the questions: how many different species are in the dataset, and what are the different Iris species? Let's click on this Unique tool. Yes. I select Iris species column. I click on Execute. So it's gray: it's queued. And I can rename the dataset, as is good practice, to Iris species, because this is what this dataset contains. Yes. Again, I have this iris tag, which is very nice. I click on the eye, and I can see. So we can now answer how many different species are in the dataset: we have three species, which are the three lines. And the species are setosa, versicolor and virginica. So we have answered the question here. And you can always check against the solution you see here; you can expand the solution. What we could also do is use different tools, and this is very typical in Galaxy: there are so many different tools that most of the time, one question can be answered with different tools.
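The two steps just shown, cutting column five and then keeping only the unique values, can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the code behind the Galaxy tools: the helper names cut_column and unique are made up for this example, the four rows are a small hand-typed sample, and the real Unique tool may order its output differently.

```python
def cut_column(lines, column):
    """Return the given 1-based column from a list of tab-separated lines."""
    return [line.split("\t")[column - 1] for line in lines]


def unique(values):
    """Keep the first occurrence of each value, preserving the original order."""
    return list(dict.fromkeys(values))


# Small hand-typed sample standing in for the header-less Iris clean dataset.
iris_clean = [
    "5.1\t3.5\t1.4\t0.2\tsetosa",
    "4.9\t3.0\t1.4\t0.2\tsetosa",
    "7.0\t3.2\t4.7\t1.4\tversicolor",
    "6.3\t3.3\t6.0\t2.5\tvirginica",
]

species_column = cut_column(iris_clean, 5)  # one species name per row
species = unique(species_column)            # each species name once
print(len(species), species)
```

The length of the final list is the answer to "how many different species are in the dataset" for whatever rows are passed in.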
So here, for instance, instead of using the Cut and Unique tools, we could have used the grouping tool, Group, which groups data by a column and aggregates. So, for instance, if we want to use this Group tool, you click on it. You group the data by column five. You will not take the Iris species dataset; you take the Iris clean dataset, and you group by column five. And you click on Execute. It's a different way to get the same answer to our question. It's running now, and I will rename the dataset once it has finished. It's finished now. I can edit the attributes, and I call it Iris species group. If I click on the eye, you see I get exactly the same kind of output as before, with only one tool. So I can answer the same question, but with a different tool, the Group tool. A very useful tool. The next question we will ask is: how many samples per species are in the dataset? We now know we have three different species in our dataset, but we want to know how many samples we have per category. Again, we have to look at column five, but instead of selecting unique values, we have to count how many we have in each category. The Group tool we just used is quite useful here, and we'll use the same tool, but look at it a bit more closely to see if there are more options to gather and count the samples per species. So let's do that. We go back to the training material here, to the question of how many samples by species are in the dataset, and we use this Group tool. It's the same tool, so we can actually rerun it. We'll add a new operation: we click on Insert Operation and use the type Count, on column one. We basically want to count the number of rows we have for each category. Otherwise the rest stays the same: column five for grouping, and we count. Let's go back here.
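The counting operation described above, grouping on column five and counting the rows in each group, can be sketched in Python as follows. It only illustrates the idea and is not the Group tool's implementation; the function name count_by_column and the sample rows are assumptions made for this example.

```python
from collections import Counter


def count_by_column(lines, column):
    """Count how many tab-separated lines fall into each value of a 1-based column."""
    return Counter(line.split("\t")[column - 1] for line in lines)


# Small hand-typed sample; the real dataset has 150 rows.
iris_clean = [
    "5.1\t3.5\t1.4\t0.2\tsetosa",
    "4.9\t3.0\t1.4\t0.2\tsetosa",
    "7.0\t3.2\t4.7\t1.4\tversicolor",
    "6.3\t3.3\t6.0\t2.5\tvirginica",
]

counts = count_by_column(iris_clean, 5)
for name in sorted(counts):
    print(name, counts[name], sep="\t")
```

The keys of the counter are the distinct categories, so this one pass answers both questions: which species exist and how many samples each one has.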
For rerunning, we can take the last tool run and click here, on Run this job again. Very handy: you get exactly the same parameters and the same input dataset as before. But we insert an operation here, which is not the mean but a count, and we execute. And we'll see if we can answer the question, how many samples per species are in the dataset. Now it's gray. It will run. It's running. And as usual, we will rename the dataset once it has finished. We rename it, for instance, to Iris samples per species group, and click on Save. I can now look at my dataset. And what do I see? For each category I have a number, and this number is actually the number of samples per species. So we can answer our question: we have 50 samples per species. This matches what we have in the tutorial, where you can always check the solution. So we indeed have 50 samples per category. The next thing we would like to see is whether we can differentiate the different Iris species. We know we have three species. We know we have 50 samples per species. Now we would like to see if the information we have in the dataset is sufficient to distinguish the three different types of Iris flower. Let's look at the figure here. We know we have three different kinds of Iris flowers: versicolor, virginica and setosa. And we have some information in the dataset: the petal length and petal width, and the sepal length and sepal width. With this, can we distinguish between the different categories? Just to make sure we know what we are talking about: the petal is this one, the small interior one, and we have its length and width. And the same for the sepal: we have the length and the width here. From the picture, this is not so obvious.
It's very hard for us to distinguish the three different species by eye, but let's see if, using this information, we can learn something about the different species and distinguish them. We will use a very useful tool called Datamash. It gives us descriptive statistics and a summary of the dataset. We'll perform several operations on the dataset with it. We'll always group by column five, because this is where the species are, and we want to do an analysis per species. As input we'll use Iris tabular. It has a header, yes. We'll print the header line. We'll sort the input, yes, to have all the different categories together. And then we'll add some operations: we'll compute the mean and the standard deviation for each of the columns c1, c2, c3 and c4. If we look back at the Iris tabular dataset, this one here, you remember that c1, c2, c3 and c4 are the different characteristics: the sepal length and width, and the petal length and width. Let's do that. So we click on this Datamash, and make sure you select the right dataset: we want to select Iris tabular. We want to group by field, and the field is five. Here we don't put c5; we put only a number in this field. Does the input file have a header line? Yes, it has a header. And we want to print the header line in the output, so it will be easier to tell the different values apart. We want to sort the input. And we don't want to print all the fields. The option to ignore case when grouping is not relevant for us here. Now we can add the different operations. For the first operation, instead of count, we choose the average, the mean value, and we do it on column one. Yes. And we can insert a second operation, again on column one: instead of the mean value, we take the sample standard deviation.
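As an aside, here is what the two operations being configured, the mean and the sample standard deviation per group, compute. This is a minimal Python sketch using the standard statistics module, not Datamash itself; the function name summarize, the header names and the numbers in the sample are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize(lines, group_column, value_column):
    """Return {group: (mean, sample standard deviation)} for one numeric column.

    Columns are 1-based and the lines are tab-separated, without a header.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split("\t")
        groups[fields[group_column - 1]].append(float(fields[value_column - 1]))
    # sorted() mirrors the "sort input" option: groups come out in name order.
    return {name: (mean(vals), stdev(vals)) for name, vals in sorted(groups.items())}


# Made-up rows with one header line, standing in for Iris tabular.
iris_tabular = [
    "sepal_length\tsepal_width\tpetal_length\tpetal_width\tspecies",
    "5.0\t3.4\t1.4\t0.2\tsetosa",
    "5.2\t3.6\t1.6\t0.4\tsetosa",
    "6.0\t2.8\t4.0\t1.2\tversicolor",
    "6.4\t3.0\t5.0\t1.6\tversicolor",
]

# Skip the single header line, then summarize column 3 per value of column 5.
petal_length_stats = summarize(iris_tabular[1:], group_column=5, value_column=3)
for name, (avg, sd) in petal_length_stats.items():
    print(name, round(avg, 3), round(sd, 3), sep="\t")
```

statistics.stdev divides by n - 1, which is the sample standard deviation rather than the population one, and it needs at least two values per group.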
I don't see the sample standard deviation in the list, so you can start typing. Yeah, sample standard deviation, again on column one. And we repeat these operations on the other columns, c2, c3 and c4. So I add the same again, but here make sure you select column two: this is the mean value. I insert again, and this is the sample standard deviation; make sure you use the right column, column two. Again, I insert a new operation, on column three. Yeah, sorry. This is the mean value, and then the sample standard deviation on column three. And finally the last two operations: the mean value on column four and the sample standard deviation on column four. And then I execute. Again, I will rename the dataset. It is running. Okay. So again, don't forget to rename your dataset; it's always very good practice. We rename the dataset to Iris summary and statistics, and I click on Save. Now we can click on the View data button, and we have all the different statistics. So the question is: can we differentiate the different Iris flower species? You can pause the video if you want, take a closer look and try to answer the question. Here, in the results, we have setosa, versicolor and virginica. We have the mean sepal length, and this is its standard deviation. This is the mean sepal width and its standard deviation. And the same for the petal. If I look at this column, the values are very close to each other. The same goes for the standard deviation, even though setosa has a smaller standard deviation for the sepal. Here we have a few differences, but nothing very significant; the standard deviations are very similar. But here it gets quite interesting. If we look at the first species, setosa, the mean petal length is much smaller compared to versicolor and virginica. Then the column for its standard deviation: nothing very significant. And the petal width is also quite a bit smaller compared to the two other species.
What we can see is that for the petal length here, the difference is quite significant between this species, which is setosa, and versicolor and virginica. So what we can say is that we can quite easily distinguish the setosa species from the two others. But, at least from looking at the values here, it's much harder to say anything about versicolor and virginica; it's not so easy to distinguish these two species. So what we will do next is visualize these features in a two-dimensional scatter plot. Maybe we'll see a bit more of what is going on, and we'll see if we can distinguish these three different groups. So let's go back here. This is what we have done: can we differentiate the different flowers? You can read the solution, but it is more or less what I have told you: we can differentiate Iris setosa from the two others, but we cannot differentiate Iris versicolor from Iris virginica. So let's make a scatter plot with the tool we see here. We'll take the Iris clean dataset as input; if you remember, Iris clean doesn't have a header. And we'll plot column one. To recall what column one is, let's go back here and look not at Iris clean but at this one, Iris tabular, because it has the column names. We have the length and the width of the sepal, and the length and the width of the petal. So on the x axis we'll plot the sepal length, and on the y axis the sepal width. Let's do that. We always give the plot a title, and we also give axis labels, sepal length and sepal width. In the advanced options, we can also choose the size of the points, to make them a bit larger and easier to see. We'll plot all the different groups: we group the data by species, which is column five, and put all the groups on the same plot. And the output will be PDF.
Okay, so let's do that. I click here. For the input, make sure you take the right one: we take Iris clean, since it needs to have no header. Instead of column eight, we take column one, and column two as y. We put a title, which is Sepal length as a function of sepal width, because we are using the first two columns. We give labels for the x and y axes, which is nice for making a proper plot: Sepal length and Sepal width. Then we click on Advanced options. You see these options are collapsed by default: there is an eye that is crossed out, to say they are not visible. If you left-click, it expands and you see all the different options you can use. Here we keep Points only. Yes. For the data point options, instead of the default, we choose user-defined point options. We set the size here and change it from one to two to make the points a bit larger. We don't change the transparency. Another thing we do is plot multiple groups, yes, and we put them on one plot, to have all the different groups together. We need to specify the column that differentiates the groups, and we know this is column five; this is where we have the category. Then the color scheme to differentiate your groups: we choose Set 2, a predefined color palette. You can test a different scheme if you want to; let's keep the default. We change the default for the output: instead of PNG, we choose PDF. And I click on Execute. Once this has run, we'll have two outputs, obviously. I open one and check that this is the PDF. Yes. So I have my plot here, and I look at it carefully. What does this scatter plot tell us about the Iris species? Here again we see that setosa is very different from the two other species. For Iris versicolor and virginica, as we said and as we observed before, it's very hard to distinguish the two species.
We can rerun the same tool, but instead of using the sepal length and width, we make the same plot with the petal length and petal width, that is, using the two other columns. Let's do that. I will rerun; remember, to rerun you click on this. We don't want columns one and two; we want three and four. Instead of sepal, this is petal, and here I need to change the title: Petal length as a function of petal width. On the x axis we have Petal length, on the y axis Petal width. We will not change the advanced options; I think we can keep the same. And we can run it. So it's quite easy to rerun. Yeah, it has already run. Let's view it. Oops, it didn't work for some reason. I click again. It took a bit of time to load; now I have it. Again, it is very easy to distinguish the setosa species from the two other ones, and still not so easy to distinguish those two. We always have some points where we are not sure if this is a versicolor or a virginica. It is actually a bit better for differentiating between the two species, but for some samples, petal length versus width is still insufficient, as for all these points here. This one, setosa, is always super easy to distinguish. So we have done quite an extensive analysis of this dataset. We would probably need more tools, or different tools, to distinguish the three species, but we already have a fairly good idea of the different species in this Iris flower dataset. So we have finished our analysis. The next step, as we see here, is to convert the analysis we have in the history into a workflow. Why? Because we will apply the same analysis to a completely different dataset. This is one of the most powerful features of Galaxy: creating a workflow out of the history of the steps you have done, and using it again. So it is fully reproducible, with the same dataset and with a different dataset.
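The idea of a workflow, the same fixed sequence of steps applied to whatever dataset is given as input, can be pictured as a single function. The Python sketch below is only an analogy and has nothing to do with how Galaxy stores or runs workflows; the function name explore, the header names and all the sample values are invented for this example.

```python
from collections import Counter


def explore(tabular_text, group_column=5, header_lines=1):
    """Hypothetical stand-in for a workflow: drop the header, cut one column,
    then count the rows per category in that column."""
    rows = tabular_text.strip().split("\n")[header_lines:]
    categories = [row.split("\t")[group_column - 1] for row in rows]
    return Counter(categories)


# Two invented inputs with the same layout: four numeric columns, then a category.
flowers = (
    "sepal_length\tsepal_width\tpetal_length\tpetal_width\tspecies\n"
    "5.1\t3.5\t1.4\t0.2\tsetosa\n"
    "7.0\t3.2\t4.7\t1.4\tversicolor\n"
)
diamonds = (
    "carat\tprice\tcolor\tclarity\tcut\n"
    "0.3\t500\t2\t4\tIdeal\n"
    "0.4\t700\t3\t5\tIdeal\n"
    "0.5\t900\t1\t3\tGood\n"
)

print(explore(flowers))
print(explore(diamonds))
```

Because the steps only refer to column positions and not to anything specific to flowers, the same function works on any input with the same layout, which is why the second dataset is trimmed to five columns.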
We'll first extract the workflow, make sure it has only the steps we want, give it a name and save it, and then apply it to a different dataset. So let's do that. First I will extract. Oops, it's a bit slow. I first check that we have only the steps I want to keep in the workflow. So I click on Extract workflow. I can check again here, and you can tick or untick each step. The first step, which is the upload of a file, is treated as an input dataset, which is what we want, because we will apply the same analysis to a different dataset. Make sure you give a meaningful name to your workflow. For instance, here we call it Exploring Iris dataset with statistics and scatterplots, which is what we have done. And finally we click on Create workflow. All your workflows are available when you click here on Workflow, and the latest one is always the one at the top. So here, for instance, we have Exploring Iris dataset with statistics and scatterplots. You can run it directly, but we will first edit it. You see, if you click on the small triangle, you get a menu with different options, and we first click on Edit. We go to the workflow editor, which shows all the different steps of your workflow. If you click on one step, the details of the tool are available on the right-hand side here. In particular, what we can do is, for instance, remove all the intermediate files created and keep only the results of the analysis. So for instance, this here is a result we want to keep, which returns the different categories. This is also one of the results. But this one is not. So we can remove the intermediate file by unticking the box here. And you can do the same for all the intermediate steps for which you do not want to keep the file. It reduces the number of files you see in your history.
So it's cleaner, and it's easier to view the results of your workflow. Another thing you can do: when you click on a step, you see that out file here is the name of the output. It's not very meaningful, but you can give it a better name for your history by clicking here on Configure Output. It's hidden by default; you click and it expands, and you have this Rename dataset. So here, for instance, we can call it categories, or unique categories. We can do the same for every single output. Here these are the same categories, but from the Group tool. We could also call it categories; let's say categories from group, so we can tell them apart, because this is what we will see in the history. This one is the number of samples, so here it is samples per category. And for the scatter plots, you can give a name to each of the plots: figure one, for instance, and this is the PNG file. Make sure you do both, since we have two outputs: now we rename the other dataset to figure one PDF. And we do the same for the second plot, figure two here: we rename the dataset to figure two PNG, and the same for figure two PDF. Okay, so now we are more or less ready. Let's see what kind of dataset we can use. Here we will use an existing dataset, which is a diamond dataset; so instead of analyzing flowers, we can reuse the same workflow. Here at the top you see Save workflow. Very important: you don't want to lose all your changes. It looks perfect. Now we can click on Analyze Data again, and we are ready to apply the same workflow to a different dataset. Instead of analyzing flowers, we'll analyze diamonds, but we'll do a very similar analysis. So we'll upload a new dataset, and we'll create a new history. If you remember, I said at the very beginning that every time you do a new analysis, it's good practice to create a new history.
So we'll create a new history and call it Diamond analysis, and we'll upload the dataset. Then we'll rename the dataset, as we always do, to remove the whole prefix here from the website, from Zenodo. And we'll add a tag, with the hash sign to make sure it is propagated when we run the workflow. We have to copy the address here. And then finally we will run it. Before I run it, I will explain the dataset itself. So let's do that. Let's create a new history. Oops, sorry. I change the name: Diamond analysis. I press Enter. Upload, then Paste/Fetch: I paste the address and click Start. And I wait. Yeah, now it's green. Remember what we have to do. I first rename: I call it diamonds and click on Save. I check the data type: this is CSV. Perfect. And I can have a look at the dataset itself, to get an idea of the content. It is very similar to the previous dataset. We have one, two, three, four, five columns, and this is the header with the column names. The first column is the carat, which is proportional to the weight of the diamond. The second one is the price; I guess this is in US dollars. Then we have the different categories for color and clarity. And this here is the cut, so different cuts: Ideal, Premium, Good or Very Good. I think there are a few categories, and this is what we want to know: how many categories we have, and whether there are any links between, for instance, the weight and the price and, for instance, the clarity or the cut. Let's go back to the training. The diamond dataset is very well known from the ggplot2 package, developed by Hadley Wickham, so it's a very widely used dataset. What we have here is quite a simple dataset, simpler than the original one: it only has five columns, because we want to reuse the same workflow on this new dataset. And so we have only the four characteristics.
We have kept the carat, cut, color and clarity, which are what we call the four Cs. The carat refers to the weight of the diamond, as measured on a scale, and the cut refers to the quality of the cut, which takes different grades. This is what we will see later. And we have different colors. The color is like the tint of the diamond, from colorless, so white, to yellow. There is a letter scale ranging from D to Z, where D is the best and Z is the worst. If we look here, you will understand more: this is the color scale from D to Z. And of course, in Galaxy, instead of letters, we have replaced them with numbers, starting from one. We did the same for the clarity. The clarity describes the amount and location of naturally occurring inclusions found in all diamonds. It has a scale of 11 grades ranging from flawless, which is the best, the ideal situation, to I3, which is like level three, the worst quality. And this is what we can see here, from flawless down to level three. Again, we have put numbers rather than categories in the data set. So if we look at the data set here, color and clarity are both integers. Okay, let's go back here. Now what we want to do is run the workflow on this diamond data set. We will run it in exactly the same way, but for the plots, we'll always plot the first two columns, so the carat and the price, and the color will not be taken into account. We know the most interesting part is the carat, so the weight of the diamond, and the price, and it's interesting to see whether this is related, for instance, to the clarity and to the cut. So let's do that. The workflow is here. If you remember, I will take the last one, which is this "Exploring Iris dataset with statistics and scatterplots", and this time I want to run it. So we click on Run, and make sure the input data set is the right one.
Yes, it is this one. I will expand the workflow because I want to customize it, and in particular I want to customize the plots. As we said, we always want to plot the same thing: the diamond price as a function of the carat, so of the weight. These are the first two columns. But the first plot will use the cut as a factor and the second plot will use the clarity as a factor. So let's find the plots. The first one: we have a scatterplot here, and we want column one and column two. Yes. The title of the plot I will customize, because this is definitely not sepal length: diamond price as a function of carat, so as a function of the weight, with cut as a factor. Let's first change the labels. The X label will be the weight of the diamond, which is the carat; we said this is proportional. And the Y label will be the diamond price, and we can say this is in US dollars, which is useful information. Then we want to set the factor, which is the cut. Let's check. Yes, we are using the last column: one, two, three, four, five. So this is the cut, and this is correct for this one. Now we can close this one and take the second plot. We'll change it too, because we still want to keep the same two columns. Again, this is the diamond price as a function of carat, but this time we'll take the clarity as a factor. We again have the same labels: X is the weight of the diamond, the carat, which is proportional, and the Y label is the price in US dollars. Advanced options: you click here to expand. We want to make sure we take the right column, so this time we don't take the last one, we take the clarity. We change it from five to four. Yes. And that's it. We have everything, and we can run the workflow. You can see all the different steps; they will start running. Yeah, so it starts to launch some jobs. Let's look at the questions we want to answer. So we are here.
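While the jobs run, it may help to see what the counting steps of this workflow amount to outside Galaxy. The following is only a minimal Python sketch, not what the Galaxy tools do internally. It assumes a CSV with the column order described above (carat, price, color, clarity, cut); the header names, the helper `count_per_category`, and the six sample rows are made up for illustration and are not the real diamond data.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample using the column order described in the video:
# carat, price, color, clarity, cut. Header names and values are illustrative only.
SAMPLE_CSV = """carat,price,color,clarity,cut
0.23,326,2,7,Ideal
0.21,326,2,8,Premium
0.23,327,2,5,Good
0.29,334,6,6,Premium
0.31,335,7,7,Good
0.24,336,7,4,Very Good
"""


def count_per_category(csv_text, column_index=4):
    """Count rows per distinct value of one column (0-based index), skipping the header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header line
    return Counter(row[column_index] for row in reader if row)


counts = count_per_category(SAMPLE_CSV)  # samples per cut category
categories = sorted(counts)              # distinct cut categories

print("Number of categories:", len(categories))
for name in categories:
    print(name, counts[name])
```

The fifth column (index 4 here, because Python counts from zero) plays the same role as the factor column five chosen for the first plot; passing `column_index=3` would count the clarity grades instead.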
We are running the workflow on different data. We have done this. And what we want to answer is: how many cut categories are there in the diamond data set, and how many samples are there in each cut category? If you remember, the outputs will be the categories and the samples per category. And we'll see if we can notice anything about the relationship between the price and the carat. And, based on the plot showing price versus carat with clarity as a factor, do you think clarity accounts for some of the variance in price, and why? Okay, so let's wait for the analysis to be done. This is nice, everything is running. We have the unique categories, so we can already answer the first question. How many categories do we have? One, two, three, four, five. So we have five categories, and these categories are fair, good, ideal, premium and very good. So how many samples do we have per cut, per category? We have this categories from group output. Is it this one? Yeah, I forgot to rename it, obviously. So for the fair category we have 1,610 samples, and for the very good category we have 12,082 samples. So we have a very uneven number of samples per category, and quite a lot of ideal, actually. Now let's go back to the questions based on the plots. Let's have a look. What do you notice? Do you notice any relationship between the price and the carat? If we look, for instance, at this Datamash output, these are the summary statistics. We cannot really differentiate much here, I think. Let's look at the figures. If we look at figure one, for instance, this is the weight and the price. There is obviously a link between the price and the weight. The price more or less increases; it's not fully linear, but there is a trend toward a higher price for a larger diamond, when the weight is higher. Now let's look at the other plot, figure two, which has the different clarity grades as a factor. If you remember, the clarity scale is here.
This is flawless, so a very good diamond, and this one is I3, which is the lowest category. So do we see anything here? What can we see? First, we already see these vertical stripes; they are quite clear on these plots. If we take one weight, for instance this value here for one carat, we can see there is a link between the price and the clarity. If you have a flawless diamond, which is this one here, the price is much higher than for I3 clarity, where many of the diamonds of the same weight are down here, so much cheaper. So there is obviously a link between the two. We can see that the clarity explains quite a lot of the price; they are quite linked. Okay, so now we have analyzed this new data set. The last step here is to share the work, to share the process, both the history and the workflow. This is one of the most important features of Galaxy, and you do it at the end. So we can share the history. Let's do this. If I want to share my history, I go here to the history actions, and you can see Share or Publish. By default, this is a private history, so I'm the only one able to see the history and the data. I can make the history accessible, I can make the history publicly available in the published histories, or I can choose to share it with a user, so only with a single Galaxy user, which can be useful. For instance, you have a colleague and you want to share the results of your analysis, or your history, but these are preliminary results and not something you can really publish. If I want to make my history public, I can go here to share, make the history accessible, and give the URL here, and anyone can access it. And I can also tick here to make the history publicly available in the published histories.
If I click on it, you will see all the different histories that everyone has shared. It takes a bit of time because the list is quite big. The default settings for the history and for the data sets can differ from one instance to another, so on some Galaxy instances you will have to specifically make each data set public. Be aware of this and check when you share your history. Okay, here are all the latest published histories. I can see mine here, which is quite useful. And if a user shares a history with you, you will see it here under histories shared with me. Here I have none, because no one has shared a history with me. And I think that's it for this tutorial. To summarize what we have done: we have learned the basics of Galaxy, and now we know how to keep a record of all the steps of the analysis. We can reuse the same steps with a different data set, which is very powerful. We can import data from external sources; you have seen that we imported data from Zenodo, but we can also import data sets from other external resources. And we can easily share the results, and you can choose how you share them: with everyone, by publishing your history, or only with one colleague or your group, for instance. And that's it. Don't forget to give some feedback on the training. I would like to thank you for your attention, and if you have any questions, please ask in the Slack channel. Thank you.