Hi all. In this video, I'll touch on a few subjects around creating high-quality, publication-level visualizations using R through Galaxy. This is a continuation of a previous set of videos, following up on the RNA-seq lesson of the Galaxy Training Network. All of this material is available on the Galaxy Training Network, and I will follow those instructions and guide you through what we are trying to do here. As before, we use Galaxy to launch RStudio, and it stays available while it's running; you can find it among your active interactive tools, as you can see here. We ended the last video by looking at how to use dplyr and tidyr to wrangle and work with data. I'll close those scripts and create a new script that we will use for visualizations with a new library called ggplot2. I'll save it as our visualization script, and that's the script I'll be working on; you can see it's available down there. Our environment is completely empty. If yours isn't, feel free to click the broom icon in the Environment pane to remove everything, but be aware that doing so removes every object you have loaded. As I said, we will continue from the transformations we did on the output of the RNA-seq data analysis, and we will use as input the final table of differentially expressed genes, which includes some statistics and so forth. So I'm going to use the `read.csv` function that we saw earlier, together with the URL of the file given in the tutorial. If I run this, we can see the new data frame right here. Now that we have loaded our file, we are going to try a few visualizations, specifically a volcano plot, and we'll see how to make a rather nice-looking volcano plot from this information.
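A minimal sketch of why the column names used later look the way they do (`log2.FC.`, `P.value`). It assumes, without confirmation from the video, that the original headers contained characters such as parentheses and hyphens, e.g. `log2(FC)` and `P-value`. A tiny made-up table written to a temporary file stands in for the tutorial's URL, which is not reproduced here.

```r
# Tiny made-up table standing in for the tutorial's annotated DE genes file.
# The header names are an assumption chosen to illustrate name mangling.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "GeneID,log2(FC),P-value,Strand,Chromosome",
  "geneA,-2.5,1e-06,-,chr1",
  "geneB,1.8,3e-04,+,chr2"
), tmp)

# read.csv() uses check.names = TRUE by default, so make.names() replaces
# invalid characters such as "(", ")" and "-" with ".".
de <- read.csv(tmp)

# Should list: GeneID, log2.FC., P.value, Strand, Chromosome
names(de)
```

Passing `check.names = FALSE` would keep the original headers, but they would then need backticks inside `aes()`.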
We'll also try splitting the plot into several panels, for example by chromosome, and finally we're going to make a bar plot of the differentially expressed genes. We'll be using ggplot2, so first of all I load it into our environment. As you can see, I've run it with Ctrl+Enter (or Cmd+Enter), and the library has loaded. The overall idea is that ggplot2 builds a plot by putting layers one on top of the other. The base function is called `ggplot`. It expects as input the data that will be used for plotting, and a mapping (I can type it out as `mapping`). Think of the mapping as an aesthetic, `aes`, that states what the axes are: x is this column, y is something else, and so forth. On top of that base you add further layers with the plus sign. The first one is usually a geom function, and they all start with `geom_` something, so you have your geom function, plus another layer, and so forth. This will not execute; it is not an actual command, just the structure of how ggplot works. You might find it very similar to the piping we saw earlier in dplyr, where you pass the output of one step as input to the next. Although it looks similar, the concept here is that you have your base level, meaning the data and the axes, and you add layers on top of one another to build a more complex plot at the end. I'll keep this as a comment just to keep it in mind. Now I'll start by actually loading some data into the plot. I'll be very explicit; as you can see, RStudio already shows information on how this works. I set `data` to the annotated differentially expressed genes data frame that we saw earlier. I can run this directly, and as you can see, something happened in the Plots pane: we actually see a plot.
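The layered structure described above, written as runnable R. This is a sketch assuming ggplot2 is installed; the small data frame reuses the column names seen in the video (`log2.FC.`, `P.value`), but its values are invented for illustration.

```r
library(ggplot2)

# Made-up values standing in for the annotated DE genes table.
de <- data.frame(
  log2.FC. = c(-2.5, 1.8, 3.1, -1.4),
  P.value  = c(1e-06, 3e-04, 2e-09, 4e-03)
)

# General pattern (comments only, not runnable):
#   ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) +
#     <GEOM_FUNCTION>() +
#     <MORE LAYERS>

# Data only: printing this shows an empty grey panel.
empty <- ggplot(data = de)

# Data plus mapping: printing this adds axes scaled to the two columns,
# but still draws no points because no geom layer has been added.
with_axes <- ggplot(data = de, mapping = aes(x = log2.FC., y = P.value))

length(with_axes$layers)  # no layers yet
```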
What we see is basically a grey background. It means ggplot tried to create a plot but doesn't have enough information yet; the only thing it knows is that we are going to use this particular data. Given that we are going to make a volcano plot, I'm going to use `mapping`, with the aesthetic setting x and y. The x value comes from the log2 fold change column, and the y value comes from the P-value. You can check the actual names here to be absolutely certain: it's `log2.FC.` and `P.value`. As a reminder, R is case-sensitive, so if I change the capitalization, I get an error. If I run this again, you see that in addition to the grey background, which basically means there will be some data points here, we now also have the axes: log2 FC on the horizontal axis and P-value on the vertical axis. RStudio and R are clever enough to work out the ranges of the log2 FC and P-value columns and scale each axis accordingly. So now we have the log fold change on the x-axis and the P-value on the y-axis. Using the same structure, I'm now ready to add a function that actually does the plotting, and one of the most basic ones is `geom_point`. As you can see, typing `geom_` lists all the different functions that can be applied on top. We're going to use `geom_point`. If I run the whole thing now, we see the points from this particular file, and there it is: our first ggplot. Congratulations.
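Adding the first geom layer to the base. The same made-up data is used; `layer_data()` is a ggplot2 helper that returns the data a layer will draw, used here only to check that every row became a point.

```r
library(ggplot2)

# Made-up values standing in for the annotated DE genes table.
de <- data.frame(
  log2.FC. = c(-2.5, 1.8, 3.1, -1.4),
  P.value  = c(1e-06, 3e-04, 2e-09, 4e-03)
)

# Base (data + mapping) plus one geom layer: one point per row.
p <- ggplot(data = de, mapping = aes(x = log2.FC., y = P.value)) +
  geom_point()

# The built data for layer 1: x and y come straight from the two columns.
pts <- layer_data(p)
```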
This is called a volcano plot because it sort of reminds you of a volcano, with points pushing upwards. It's a type of scatter plot that shows statistical significance (the P-value) against the magnitude of the fold change. The most up-regulated genes are towards the right, the most down-regulated genes are towards the left, and in a volcano plot the most statistically significant ones sit at the top. So by looking at this plot, we can quickly identify how many genes, and which genes, have a large fold change and are also statistically significant, and hopefully biologically significant too. For those who are already familiar with volcano plots, this might not look right. The reason is that we have a lot of data points with P-values close to zero, which squashes them into a band along the bottom of the plot. So what we can do, and I'm going to copy the exact same command so that we can tweak it slightly, is change what is being plotted: instead of the P-value, we plot the negative log10 of the P-value. In other words, we put the vertical axis on a logarithmic scale. As you can see, this is much closer to what you may have already seen in the literature and in publications. A good question at this point might be: everything looks good, but why is there a gap here in the middle? As a reminder, at the end of the RNA-seq pipeline we filtered the genes, keeping only the statistically significant ones, with an adjusted P-value below 0.05, and with a fold change greater than two in either direction. In other words, we wanted the log2 fold change to be either less than minus one or more than one. That is how we ended up with 130 genes at this point.
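A sketch of the `-log10` transform together with the filtering that produces the gap. The "full" table values and the adjusted P-value column name `P.adj` are assumptions for illustration; the thresholds (adjusted P < 0.05, |log2 FC| > 1) are the ones described above.

```r
library(ggplot2)

# Made-up "full" results table; P.adj is an assumed column name.
all_genes <- data.frame(
  log2.FC. = c(-3.2, -0.4, 0.2, 1.5, 2.7, -1.8, 0.9),
  P.value  = c(1e-10, 0.30, 0.60, 1e-05, 1e-08, 2e-04, 0.01),
  P.adj    = c(1e-08, 0.50, 0.80, 1e-03, 1e-06, 0.01, 0.04)
)

# Keep significant genes with a fold change above 2 in either direction,
# i.e. |log2 FC| > 1. Nothing between -1 and 1 survives: that is the gap.
de <- all_genes[all_genes$P.adj < 0.05 & abs(all_genes$log2.FC.) > 1, ]

# Plot -log10(P) so the most significant genes rise to the top.
volcano <- ggplot(de, aes(x = log2.FC., y = -log10(P.value))) +
  geom_point()

pts <- layer_data(volcano)
```

With these toy values, four of the seven genes pass the filter; the row with log2 FC 0.9 is dropped despite a small adjusted P-value.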
That filtering is why there is a gap. If we used the entire annotated output of the RNA-seq analysis, we would get a much more complete picture, but that would also introduce some noise. Now that we've seen how to create a plot, there's something else worth keeping in mind: we can save an entire plot as a variable and use it to build layers on top of one another. So I will assign a plot to a variable; let's call it the differential genes plot. I take the first part of the command, the one that sets the data and the mapping, and assign it. If I run this, you can see there's a new variable here. I can use the broom button in the Plots pane to clear all the plots so that we can see the new ones as we create them. If I run the assignment again, no plot is drawn, because assigning a plot to a variable does not display it. Now I can take this variable and use the plus sign to add a new layer; the layer is `geom_point` if we want the exact same plot. By running this, we see that it takes the base plot stored in the variable and adds the new layer, so we get exactly the same plot. A useful tip here concerns where we put the plus sign. We use the same visual structure as in dplyr, starting each layer on a new line below the previous one so that we can see them as actual layers. Writing everything on one line also runs fine and produces exactly the same plot. Some people may then be tempted to move the plus sign to the start of the next line instead of leaving it at the end of the first line, thinking it looks similar and should work.
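Storing the base plot in a variable and adding a layer later gives the same result as writing the command out in one go. A sketch with made-up data; `de_plot` is just an illustrative variable name. The last line shows, in a form that can be checked, what happens when `+` starts a new line: R evaluates `+ geom_point()` on its own, which is an error.

```r
library(ggplot2)

# Made-up values standing in for the annotated DE genes table.
de <- data.frame(
  log2.FC. = c(-2.5, 1.8, 3.1, -1.4),
  P.value  = c(1e-06, 3e-04, 2e-09, 4e-03)
)

# Base plot (data + mapping) stored in a variable; nothing is drawn yet.
de_plot <- ggplot(de, aes(x = log2.FC., y = -log10(P.value)))

# Keep "+" at the END of a line so R knows another layer is coming.
from_var <- de_plot +
  geom_point()

in_one_go <- ggplot(de, aes(x = log2.FC., y = -log10(P.value))) +
  geom_point()

# With "+" at the START of the next line, the first line is a complete
# expression, and the lone "+ geom_point()" raises an error.
err <- tryCatch(+geom_point(), error = function(e) e)
```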
If I put the plus sign at the start of the line below instead and run this, you see the base plot, so R understood the variable itself, but no points are drawn, and you get an error saying that you cannot use `+` with a single argument. R is clever enough to ask whether you accidentally put `+` on a new line. Basically, you need to keep the plus sign at the end of the line so that R knows a new layer is coming and expects something more. If I run those two lines now, it runs perfectly. Now that we have a first plot, let's try to make it more visually appealing. We'll take it step by step: I'm going to copy this entire command again and keep updating it, so that every change is absolutely clear. Let me remove the old output so we can follow the evolution of the process. Let me run this; this is our base plot. We see a lot of black points, and it would be interesting to know whether many of them overlap. A good way to check is to add transparency, which is controlled by `alpha`. Setting, for example, `alpha = 0.5`, meaning half transparency, and running this, you can see that the points are now more transparent. You can see places where many points overlap and others where they are sparser. It's a bit easier to understand, but it's still a black-and-grey plot. We can add some color in `geom_point` directly, for example by setting the color to blue. If I run this, all the points have changed to blue. That's nice, but how about coloring the points based on a particular piece of information? For example, we could color each point based on the strand its gene is located on. This is an interesting point to keep in mind.
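Fixed values such as transparency and a single color are passed directly to the geom, outside `aes()`. A sketch with made-up data; `layer_data()` is used only to confirm that every point received the same fixed values.

```r
library(ggplot2)

# Made-up values standing in for the annotated DE genes table.
de <- data.frame(
  log2.FC. = c(-2.5, 1.8, 3.1, -1.4),
  P.value  = c(1e-06, 3e-04, 2e-09, 4e-03)
)

# alpha = 0.5 gives half transparency; color = "blue" paints every point.
# Both are constants, so they go in geom_point(), not in aes().
p <- ggplot(de, aes(x = log2.FC., y = -log10(P.value))) +
  geom_point(alpha = 0.5, color = "blue")

pts <- layer_data(p)  # ggplot2 stores the color under the name "colour"
```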
So far, I've been changing things directly in `geom_point`, because I'm giving it fixed values. Let me copy this one. Now, however, I'm asking ggplot to use a variable from my data to add extra information. That means I need to go through the mapping: the color has to be defined not at the `geom_point` level but where we do the mapping. So I go into the aesthetic and add another argument saying that the color is the strand column, and I no longer need the fixed blue. If I run this command, we can see that R has picked two colors, red and teal. The transparency is still the same, but now we get much more detailed information about where each point comes from: the minus-strand genes are red and the plus-strand genes are teal. This already looks a bit better. How about also changing the axis labels? `log2.FC.` is useful to know, but it's basically just the column name as-is. To do that, I add another layer with `labs`, which stands for labels. I set the x label to "log2(Fold Change)" (fold change, not "hold change"; let me fix that typo, there it is) and the y label to "-log10(P-Value)". Now the axes are much more informative, so I can easily tell what I'm looking at: this is the log2 fold change, and this is minus log10 of the P-value. That's enough for someone to understand what information is shown, so this is a much more elegant plot. Let's take it one step further. So far, all the information is in one plot.
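Mapping color to a column (inside `aes()`) and relabeling the axes with `labs()`. A sketch assuming a `Strand` column holding `"+"` and `"-"`, as described in the video; the values themselves are made up.

```r
library(ggplot2)

# Made-up values; Strand is assumed to hold "+" or "-".
de <- data.frame(
  log2.FC. = c(-2.5, 1.8, 3.1, -1.4),
  P.value  = c(1e-06, 3e-04, 2e-09, 4e-03),
  Strand   = c("-", "+", "+", "-")
)

# color = Strand sits inside aes() because it comes from the data;
# alpha stays in geom_point() because it is a constant.
p <- ggplot(de, aes(x = log2.FC., y = -log10(P.value), color = Strand)) +
  geom_point(alpha = 0.5) +
  labs(x = "log2(Fold Change)", y = "-log10(P-Value)")

pts <- layer_data(p)  # one color per strand, chosen from the default palette
```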
ggplot2 has a technique that splits one plot into several based on a particular variable, which usually tends to be a factor. With our data set, we will split this volcano plot into five panels, each panel being a distinct chromosome. To do that, I'll copy the command again to keep continuity and add yet another layer, called `facet_grid`. Inside it I write a dot, a tilde, and Chromosome (`. ~ Chromosome`); I'll explain that in a second. I think I've typed this correctly, so let me run it. This time around, you can see that every chromosome has its own panel. So instead of a single plot where all the chromosomes are lumped together, we now see how the points are distributed across the five different chromosomes. It's important to note that `facet_grid` expects as input the layout you want: what defines the rows and what defines the columns, and you can write much more complicated formulas here. In our case, the dot means "no variable", so there is a single row, with as many columns as there are chromosomes; in this case, five. I can easily turn this the other way around by writing `Chromosome ~ .`. As you might expect, I then get five rows, one per chromosome, and a single column. Let me run this, and as you can see, the same information is now laid out vertically. This is how you can create multiple plots from the same one, using a factor to split it into individually structured subplots. Let's keep it like that and try to go one step further to make it even more interesting.
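Both facet orientations side by side. A sketch with made-up data; the chromosome names are placeholders, not the ones in the tutorial's table, and the `Chromosome` column name is assumed from the video.

```r
library(ggplot2)

# Made-up values; Chromosome names are placeholders.
de <- data.frame(
  log2.FC.   = c(-2.5, 1.8, 3.1, -1.4, 2.2, -3.0),
  P.value    = c(1e-06, 3e-04, 2e-09, 4e-03, 1e-05, 5e-08),
  Strand     = c("-", "+", "+", "-", "+", "-"),
  Chromosome = c("chrA", "chrB", "chrC", "chrD", "chrE", "chrA")
)

base <- ggplot(de, aes(x = log2.FC., y = -log10(P.value), color = Strand)) +
  geom_point(alpha = 0.5)

# rows ~ columns: "." means no variable on that side.
by_column <- base + facet_grid(. ~ Chromosome)  # 1 row, 1 column per chromosome
by_row    <- base + facet_grid(Chromosome ~ .)  # 1 column, 1 row per chromosome
```

Either way there is one panel per chromosome; only the arrangement changes.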
Let's say we want to print this out. For that we can use themes, and here I'm going to use the black-and-white theme, `theme_bw`. If I run this, you'll see that it changes the background and increases the contrast, so it's much more readable when printed. Using the arrows in the Plots pane, I can go back and forth so you can see the difference: the grey background is gone, the text is a bit sharper, and it's much easier to see what's going on. Also, because this is a fairly small plot, the grid lines may be a bit too much. So I can add another layer asking the theme to remove the panel grid, by setting `panel.grid` to `element_blank()`. If I highlight the whole thing and click Run, you see that it removes the grid lines in the background. As you can see, there are many ways of interacting with the plot, and everything is done as layers. At any point it's easy to disregard one particular layer and focus on something else. For example, if I no longer want the facet grid, I can comment it out, and hopefully this will work just as well. You can see that the theme is kept, with no grid lines because we removed them, but now everything is together in the same panel. There are different themes you can work with; common ones are `theme_minimal` and `theme_light`, and `theme_void` is useful if you want to start from a completely blank state and handcraft your own theme. The ggplot2 website has a complete listing of the themes. Interestingly enough, some extension packages, such as ggthemes, even include an Excel 2003 theme, so you can create plots that look like Excel but are actually made with ggplot2. So, as you can see, it's easy.
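A complete theme followed by a tweak, plus a layer commented out. A sketch with made-up data. Note that a commented-out line between `+` lines is fine: R skips comments while it waits for the rest of the expression.

```r
library(ggplot2)

# Made-up values standing in for the annotated DE genes table.
de <- data.frame(
  log2.FC.   = c(-2.5, 1.8, 3.1, -1.4, 2.2, -3.0),
  P.value    = c(1e-06, 3e-04, 2e-09, 4e-03, 1e-05, 5e-08),
  Strand     = c("-", "+", "+", "-", "+", "-"),
  Chromosome = c("chrA", "chrB", "chrC", "chrD", "chrE", "chrA")
)

p <- ggplot(de, aes(x = log2.FC., y = -log10(P.value), color = Strand)) +
  geom_point(alpha = 0.5) +
  labs(x = "log2(Fold Change)", y = "-log10(P-Value)") +
  # facet_grid(. ~ Chromosome) +   # disabled layer: all points in one panel
  theme_bw() +                      # white background, higher contrast
  theme(panel.grid = element_blank())  # then drop the grid lines

# Order matters: a complete theme such as theme_bw() resets earlier theme
# settings, so the theme() tweak goes after it.
```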
Working with ggplot2 is convenient enough, and I suggest that you go through the material on the Galaxy Training Network website and try a few of the exercises listed there. All right. Finally, now that we've made a volcano plot and played a bit with the different layers and how they work, let's try another type of plot: a bar plot showing the number of differentially expressed genes, for example across the chromosomes. I'm going to use the exact same structure, so `ggplot` again. As data, I use the annotated differentially expressed genes. As you can see, this time I don't write `data =`: because I pass the arguments in the order the function expects, R understands that the first one is the data. For the aesthetic, I set x to the feature and the fill color to the chromosome. Because this is a bar plot, I only need to define x: there will be one bar per feature, its height being the number of genes, and the fill of each bar depends on the chromosome. Let's create that. The bar layer is `geom_bar`. Let's also do some faceting directly, with `facet_grid` again, using the chromosome to create the different subplots. If you run this, you can see it creates a very interesting bar plot, with the features on the x-axis. If you recall from the earlier part of the lesson, the annotated differentially expressed genes have only three feature types, and these are the ones shown here. The vast majority are protein-coding, but there are a few genes of another type, mostly on this chromosome, and some long non-coding RNAs on these two chromosomes.
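The final bar plot from this part of the video, including the legend removal, theme and export. A sketch with made-up data: the feature labels and chromosome names are placeholders, and the `Feature` and `Chromosome` column names are assumed from the video. The plot is saved as PDF because that needs no graphics display; a `.png` file name works the same way wherever a PNG device is available.

```r
library(ggplot2)

# Made-up rows: one row per gene, so bar height = number of genes.
de <- data.frame(
  Feature    = c("protein_coding", "protein_coding", "lncRNA",
                 "protein_coding", "other", "protein_coding"),
  Chromosome = c("chrA", "chrA", "chrB", "chrB", "chrC", "chrC")
)

bars <- ggplot(de, aes(x = Feature, fill = Chromosome)) +  # data passed by position
  geom_bar(show.legend = FALSE) +  # default stat counts rows; facet labels replace the legend
  facet_grid(. ~ Chromosome) +
  theme_bw()

# Export from code; the device is inferred from the file extension.
out <- file.path(tempdir(), "de_genes_barplot.pdf")  # illustrative file name
ggsave(out, plot = bars, width = 8, height = 4)      # inches by default

counts <- layer_data(bars)  # one row per bar, with a "count" column
```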
There's a bit of redundancy here, which I hope you can see: the chromosome is labeled on each individual facet, and there is also a legend mapping each color to a chromosome. To remove that, since the legend comes from `geom_bar`, I can set one of its common parameters, `show.legend`, which includes the legend by default, to `FALSE`. If I rerun the whole thing, you can see that now it's only the bar plots; each color is already identified by its facet label, so all the information is still there. Again, I'll add the black-and-white theme and rerun, and I have my final plot, ready to be added to my article. You also have the option of exporting it directly from RStudio as an image or as a PDF, and if you want to do more, you can create a PNG file directly in R. ggplot2 gives R enormous functionality and flexibility, letting you do many things that would otherwise take a lot of time. It also has the added advantage that everything you do is easily repeatable and reproducible, as long as you keep the code. Data manipulation and visualization are important parts of any RNA-seq data analysis, which is why we followed the RNA-seq pipeline with this introduction to R visualization. Galaxy, as you can see, makes it easy to connect the two, bringing RStudio's interactive functionality together with the rest of the Galaxy tools. I urge you to go through the tutorials; they are very detailed, so you can easily follow them. I know that working with a programming language can be a bit intimidating, especially the first time, but pushing through is very rewarding.
And it definitely outweighs any frustrations you might have along the way. R might not be the easiest programming language ever created, but even with a little knowledge of R and some experience, you can do a lot of incredible things, and you can get well on your way to becoming a more advanced user just by doing the exercises we've done so far. Thank you, I hope you found it useful, and feel free to review all of this material on the Galaxy Training Network as well.