Okay, so this is my empty history. I'm going to call it Galaxy 101. At this point, I'm going to upload data: I choose Paste/Fetch data, paste the two URLs from the tutorial here, and click Start. This will create two new datasets in the history. You can expand datasets to take a peek at what's inside. I'm going to rename them so they have more descriptive names: I'll call this one Exons, and this one I will call SNPs.

The first thing to do is to intersect exons with SNPs, to figure out which exons actually contain SNPs. To do this, I will go to BED and find the tool called bedtools Intersect intervals. This tool intersects one dataset with another: dataset A and dataset B. In our case dataset A is exons and dataset B is SNPs, so let's select exons here. You can either select your datasets from the history using this drop-down, or you can drag them in directly, like this. So we're intersecting exons with SNPs, we don't care whether they're on the same strand or not, and the way we're going to intersect them is by selecting this option. What this means is that for every exon we're going to list all SNPs that lie within that exon. So our file A is exons, file B is SNPs, and here we choose "Write the original entry in B for each overlap". Let's leave all the other options as they are and click Execute.

You can see that this creates a new dataset; every tool you run is going to create additional datasets in the history. Let's look at this dataset by clicking on the eye icon. What you see here is the exon information, and next to it the SNP information. Here it's very easy to see how many SNPs per exon we have. For example, this is the same exon, because it has the same ID, and it has two different SNPs. So really we can count how many SNPs we have per exon just by counting the lines for each unique exon ID.
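As a rough illustration of what the intersection step does, here is a minimal Python sketch. It is not how bedtools is implemented; the function name intersect and the toy records are made up for this example, and it assumes BED-style coordinates (0-based start, end not included).

```python
from typing import List, Tuple

# A BED-like record: (chrom, start, end, name). BED coordinates are
# 0-based and half-open, so two intervals overlap when each one starts
# before the other ends.
Interval = Tuple[str, int, int, str]


def intersect(a: List[Interval], b: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return one (a_record, b_record) pair per overlap, ignoring strand."""
    pairs = []
    for rec_a in a:
        for rec_b in b:
            same_chrom = rec_a[0] == rec_b[0]
            overlaps = rec_a[1] < rec_b[2] and rec_b[1] < rec_a[2]
            if same_chrom and overlaps:
                pairs.append((rec_a, rec_b))
    return pairs


# Hypothetical toy data, not taken from the tutorial files.
exons = [("chr22", 100, 200, "exonA"), ("chr22", 500, 600, "exonB")]
snps = [("chr22", 150, 151, "rs1"), ("chr22", 180, 181, "rs2"), ("chr22", 300, 301, "rs3")]

# Each output line holds the exon fields followed by the SNP fields.
for exon, snp in intersect(exons, snps):
    print(*exon, *snp, sep="\t")
```

In this toy example exonA appears on two lines, once per SNP, which is the same pattern you see in the Galaxy output.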
Look at this file a little more closely. Here you have the chromosome, this is the start position, this is the end position, this is the ID of the exon, this is a score field that we're not using here, and this is the strand. For the SNP we have a similar layout, just without the strand information: chromosome, start, end, and the ID of that particular SNP.

To do this in a more scientifically accurate way, let's group this dataset by exon ID. For example, these lines will form one group, a group of four, because they have the same exon ID. So this exon has four SNPs, because here you have SNPs with different IDs. The proper way of doing this is to group by exon ID and count how many unique SNP IDs we have within each group.

We have a very good tool for that purpose. It's called Datamash, and it mashes data. So let's use it. The dataset we want to run this operation on is the output of our intersect. We want to group by field 4, because remember, this is where our exon ID is. The operation we're going to do is count unique values, and we're going to count unique values in column 10, because column 10 contains SNP IDs. Let's run it and see what happens. This is the dataset that we got: it has the exon ID and the count of unique SNPs within that exon.

Let's see which five exons have the highest number of SNPs. For this we're essentially going to sort this dataset. I go here, click Sort, select the output of Datamash, and sort on column 2, because it contains the counts. It's a numerical sort in descending order, meaning that the maximum value will be on top. Now this dataset is sorted. Take a look: the winner here is the exon with 27 SNPs in it. Next, let's restrict this dataset to the top entries.
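The grouping, counting and sorting steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the rows are invented, each row stands in for columns 4 and 10 of the intersect output, and count_unique and sort_descending are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical intersect output reduced to (exon_id, snp_id) pairs,
# standing in for columns 4 and 10 of the real file.
rows = [
    ("exonA", "rs1"), ("exonA", "rs2"), ("exonA", "rs2"),
    ("exonB", "rs3"),
    ("exonC", "rs4"), ("exonC", "rs5"), ("exonC", "rs6"),
]


def count_unique(rows):
    """Group by exon ID and count distinct SNP IDs in each group."""
    groups = defaultdict(set)
    for exon_id, snp_id in rows:
        groups[exon_id].add(snp_id)
    return {exon_id: len(snp_ids) for exon_id, snp_ids in groups.items()}


def sort_descending(counts):
    """Numerical sort on the count, largest value first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


counts = count_unique(rows)
print(sort_descending(counts))
# [('exonC', 3), ('exonA', 2), ('exonB', 1)]
```

Using a set per group is what makes the count "unique": the repeated rs2 row for exonA is counted only once.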
Just the top five exons. For this, I'm going to go to Text Manipulation and select Select first, which selects the first lines from a dataset. I'll tell the tool to select the first five lines from this dataset and run it. Now let's look at this. We have these five winners, the five exons with the highest number of SNPs in them.

Wouldn't it be nice to visualize this data in a genome browser? The only problem is that while doing all these operations we lost the start and end coordinates; remember, they were there in the original files. So we need to somehow get them back. Let's do this. I will go to this section and find the tool called Compare two Datasets. I will be comparing exons here against this last dataset; it's already selected. Let me scroll here. In the first dataset, exons, we're interested in column 4, because it contains the name, but in the last dataset it's column 1 that contains the name. So we're essentially comparing exons, using column 4, with this Select first dataset on column 1, and we obviously want to keep the rows that match. You will see that now we have all the information about these exons, including their coordinates.

So now we can visualize them in a genome browser. To do this we expand the dataset, and normally you would see a "display at UCSC" link here. But in this case Galaxy doesn't know which version of the genome to render these data against, so we need to tell it that this data is derived from human, hg38. You can see there are a lot of patches, but what we need is hg38 proper, just like that. If I click Save, you will see that the database is now set to hg38, and you have this "display at UCSC main" button, and we can click it. Usually the genome browser goes to some random location, so to visualize our exon of interest we need to navigate to it.
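Here is a small Python sketch of taking the first few rows, recovering the coordinates by matching on the exon name, and building a position string for a genome browser. All records and helper names are hypothetical. The "+ 1" reflects an assumption worth flagging: BED starts are 0-based, while browser position strings are conventionally written 1-based; pasting the raw BED coordinates lands you one base off, which makes no practical difference for viewing an exon.

```python
# Hypothetical exon records in BED column order:
# chrom, start, end, name, score, strand.
exons = [
    ("chr22", 1000, 1200, "exonA", 0, "+"),
    ("chr22", 5000, 5300, "exonB", 0, "-"),
    ("chr22", 9000, 9100, "exonC", 0, "+"),
]

# Hypothetical sorted counts: (exon_id, unique SNP count), highest first.
sorted_counts = [("exonC", 27), ("exonA", 12), ("exonB", 3)]


def select_first(rows, n):
    """Keep the first n rows, like selecting the first lines of a dataset."""
    return rows[:n]


def keep_matching(exons, top_rows):
    """Keep exon rows whose column 4 (name) matches column 1 of top_rows."""
    wanted = {row[0] for row in top_rows}
    return [exon for exon in exons if exon[3] in wanted]


def browser_position(exon):
    """Format chrom:start-end; BED starts are 0-based, so add 1 to the start."""
    chrom, start, end = exon[0], exon[1], exon[2]
    return f"{chrom}:{start + 1}-{end}"


top = select_first(sorted_counts, 2)
for exon in keep_matching(exons, top):
    print(browser_position(exon))
```

Note that the matched rows come back in the order of the exons file, not in the order of the counts, because the exon file is the one being filtered.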
Just select its coordinates from here and enter them there, in the particular format that UCSC likes. And this is the exon, and you can see that in fact we have lots of SNPs. The green ones are synonymous: they don't change the underlying codon. The red ones are non-synonymous: they do change the underlying codon, so there's an amino acid change caused by this SNP.

Let's go back to Galaxy and collapse all datasets so we have a neat view of the history. This history is essentially an outline of our analysis. Suppose you want to perform the same analysis on a different set of data. Instead of redoing it step by step, we can go ahead and extract a workflow from this history. This interface asks which steps from the history you want to include; in this case we want to include all of them. Let's call this workflow "Find exons with the highest number of features" and click Create Workflow.

Now you can edit or run the workflow, but you can also access it from this Workflow tab. I'll click on that tab, and you see this is the workflow we just created. If we click on this drop-down, let's choose the Edit option so we can see what this workflow looks like. We have all these nodes, which are the individual tools. This is the output of the workflow, the dataset with the top five exons, and these are the inputs. One input is called Exons, the other is called SNPs. But maybe we don't want to overlap with SNPs again, so let's make it a little more generic: I'm going to call this Features, since maybe we want to do this analysis on some other type of genomic feature. You can see it's changed here now. For the last dataset, I want to tell Galaxy to rename it, so I can clearly see in my history what's going on. I'm going to rename it to "Top five exons", for example. At this point I want to save; you should not forget to save. One other thing I can do here is uncheck these checkboxes.
Then I will not see these intermediate datasets in my history. They will be hidden, and that makes the history easier to look at.

So let's run this workflow. Let's save it and go back to Galaxy, and let's create a new history. This is a new, empty history. Now I can go to this multi-history view and copy the exons dataset; let's reuse it. I'm just going to drag it over, like this, and go back to the analysis view. I will rename this history, because we are going to use repeats as the other set of genomic features. Now let's upload the repeat information: I click Upload, paste the URL of the repeats dataset that you also have in your tutorial, and give it a more descriptive name.

Now I can simply run my workflow on these two datasets. Go to Workflow, click the Run button, and use exons as the Exons dataset and repeats as the Features. You can see it says "as BED", because when we uploaded this, Galaxy identified it as an interval file; but Galaxy knows how to convert interval to BED, so it's not going to be a problem. Now all I need to do is click Run. You will see this workflow invocation display, and eventually you will see the final dataset: the coordinates of the five exons containing the highest number of repeats. And here you have the five exons with the highest number of repeats.
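One way to think about the extracted workflow is as a single reusable function with two inputs, exons and features, so that swapping SNPs for repeats needs no other change. The Python sketch below is an analogy only, not how Galaxy stores or runs workflows; top_exons and the toy records are invented for illustration.

```python
from collections import defaultdict


def top_exons(exons, features, n=5):
    """Return the n exon records overlapping the most distinct features.

    Both inputs are lists of (chrom, start, end, name) tuples with
    0-based, half-open coordinates.
    """
    hits = defaultdict(set)
    for chrom, start, end, name in exons:
        for f_chrom, f_start, f_end, f_name in features:
            if chrom == f_chrom and start < f_end and f_start < end:
                hits[name].add(f_name)
    # Rank exon names by number of distinct features, highest first.
    ranked = sorted(hits, key=lambda exon_name: len(hits[exon_name]), reverse=True)[:n]
    # Recover full records (with coordinates) by matching on the name.
    return [exon for exon in exons if exon[3] in ranked]


# Hypothetical toy inputs.
exons = [("chr22", 100, 200, "exonA"), ("chr22", 500, 600, "exonB")]
snps = [("chr22", 150, 151, "rs1"), ("chr22", 550, 551, "rs2"), ("chr22", 560, 561, "rs3")]
repeats = [("chr22", 90, 120, "rep1")]

print(top_exons(exons, snps, n=1))     # same steps, SNPs as features
print(top_exons(exons, repeats, n=1))  # same steps, repeats as features
```

Naming the second parameter features rather than snps mirrors the renaming of the workflow input: the steps stay the same, only the data changes.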