Before we can begin analysis, let's upload data. We'll simply copy these URLs, select them, and put them in the clipboard. Then I'm going to click the Upload button, click the Paste/Fetch data button, and paste these datasets into the box. Now I'll set the datatype to fastqsanger.gz, because that is the format these data are in, and click Start. You can see that the datasets are being added to the history; in a few seconds they will turn yellow, and ultimately green, which means they're ready for analysis.

Okay, now we're ready to go. There are eight datasets, but actually four samples: this is paired-end data, so each sample is represented by two datasets. Let's organize them as a collection. I'll click this checkbox button, select all the datasets, and under "For all selected" choose "Build list of dataset pairs". This brings up the pairing wizard. Our datasets already have _1 and _2 in their names, which is why Galaxy automatically knows how to pair them. But let's pretend that's not the case; let me just get rid of this and unpair them all, so normally you would see something like this. If you look at these datasets carefully, they have _1 and _2: _1 is the forward read and _2 is the reverse read. So let's give Galaxy a clue. All forward datasets have _1, and you see that once I enter this, the list is filtered to only those datasets containing _1. Likewise I'll tell Galaxy that all my reverse datasets have _2, and that side is also filtered, and sorted in the same order. So we can pair them one by one like this, or just click "Auto-pair" and it will pair them all automatically. This is our collection structure: it has four samples, and each sample consists of two datasets, forward and reverse.
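The auto-pairing idea can be sketched in a few lines of Python. This is not Galaxy's actual code, just an illustration of the logic: group filenames by the prefix shared before the _1/_2 hint, putting _1 files in the forward slot and _2 files in the reverse slot. The sample names are hypothetical.

```python
def auto_pair(filenames, fwd_hint="_1", rev_hint="_2"):
    """Return {sample: (forward, reverse)} keyed by the common prefix."""
    pairs = {}
    for name in filenames:
        if fwd_hint in name:
            sample = name.split(fwd_hint)[0]
            pairs.setdefault(sample, [None, None])[0] = name
        elif rev_hint in name:
            sample = name.split(rev_hint)[0]
            pairs.setdefault(sample, [None, None])[1] = name
    # keep only complete forward/reverse pairs
    return {s: tuple(p) for s, p in pairs.items() if None not in p}

reads = ["M117-bl_1.fastq.gz", "M117-bl_2.fastq.gz",
         "M117-ch_1.fastq.gz", "M117-ch_2.fastq.gz"]
print(auto_pair(reads))
```

Changing the forward/reverse hints here corresponds to typing different filters into the wizard's two text boxes.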
And let's name it; say, "M117 collection". We can leave these checkboxes the way they are. What they really mean is that the original datasets will be hidden in the history, so they won't pollute our view, and that the .gz extension will be removed from the dataset names. Let's create the collection. Okay, now I have one item. If I click on it, I can expand it: it has four datasets, and if I click further, each dataset has forward and reverse reads.

Now let's do something with this. What I want to do is map these datasets against the human mitochondrial genome. Let's upload this genome; I don't have it separately. I'll go back to our tutorial, scroll down a little, and select this URL. Then I go back to Upload data. The dialog still contains information about my previous upload, so I will reset the page by clicking the Reset button, click Paste/Fetch again, and paste this URL. This time the dataset is fasta.gz, so again I tell Galaxy what it is and click Start. This creates a single dataset in my history: the reference genome against which I would like to map my reads.

Okay, I'm going to use BWA as my mapper, so I'll search for it. Here it is, "Map with BWA-MEM". First I need to tell BWA what I'm mapping against, and in this case, instead of using a built-in index, I will use a genome from the history; Galaxy automatically chooses the chrM.fa.gz dataset. Next I need to tell BWA what I'm mapping, that is, where my reads are. There are several options here, and the one we need is "Paired collection", because that is how we organized our data. You can see that as soon as I choose "Paired collection", our collection becomes visible here, so that's the one I select. I will keep all other options at their default settings and click Execute.
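Behind the form, Galaxy assembles one command line per forward/reverse pair. A rough sketch of that invocation, with illustrative flags and filenames (this is an assumption about the shape of the call, not Galaxy's exact command):

```python
def bwa_mem_command(reference, forward, reverse, threads=4):
    """Approximate bwa mem invocation for one forward/reverse read pair."""
    return ["bwa", "mem", "-t", str(threads), reference, forward, reverse]

# hypothetical sample filenames
cmd = bwa_mem_command("chrM.fa", "M117-bl_1.fastq.gz", "M117-bl_2.fastq.gz")
print(" ".join(cmd))
```

With a paired collection of four samples, four such commands run, one per pair.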
You will see that this creates another collection. BWA takes my collection of reads and runs jobs on them; I have four samples, so I'm actually running four distinct BWA jobs, and you can see four entries here. But because we started with a collection, Galaxy knows that the outputs of these jobs belong together as another collection, and this will be that collection. Once the BWA-MEM jobs are finished it becomes green, and it is green now; you can see it's done.

We can click on it to see what it is. It's now a flat collection. We started with a paired collection, meaning it had two levels: if you click, you see the first level, the level of samples, and then the next level, the level of reads. But BWA-MEM converted the FASTQ reads into BAM datasets: it mapped the reads, and mapped reads are represented in a different format, BAM. There is no longer a distinction between forward and reverse; all that information is contained within the BAM dataset. So now it's a simple flat list, just a collection containing four datasets. (You'll notice that I'm using the terms "list" and "collection" interchangeably here.) If an input is a collection and you feed it to a tool, the tool knows that it needs to process the datasets individually, and it outputs a collection as well.

So let's go further and call variants on that collection. I will use a tool called lofreq, specifically its "Call variants" tool; lofreq is a package with several tools, and I'll use the one that calls variants. You can see that no BAM dataset is visible, because in fact we don't have individual BAM datasets, we have a collection. If you click on the collection button, the interface changes and you can now select that BAM collection, because that's the BWA-MEM output: it's in BAM format, and the "Call variants" tool takes only BAM datasets as input.
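The flattening described above can be pictured with a small data-structure sketch: a paired collection is a two-level mapping, and mapping each forward/reverse pair to a single BAM leaves a flat, one-level list. The names below are illustrative only.

```python
# Two-level (paired) collection: samples -> forward/reverse reads
paired = {
    "sample1": {"forward": "s1_1.fastq.gz", "reverse": "s1_2.fastq.gz"},
    "sample2": {"forward": "s2_1.fastq.gz", "reverse": "s2_2.fastq.gz"},
}

# One mapping job per sample yields one BAM; the forward/reverse
# distinction disappears into the BAM file itself.
flat = {sample: sample + ".bam" for sample in paired}
print(flat)
```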
So here we are again mapping against the genome in our history, that mitochondrial chromosome. Let's leave the insertion/deletion (indel) options as they are, keep everything else at the default, and click Execute. This will create yet another collection, and this one will be in VCF, the variant call format; that's what variant callers produce. So again: we start with four BAM datasets, lofreq starts four individual jobs, and then produces a single collection with four VCF datasets in it. Okay, it's finished, and now we have a collection with four VCF datasets. We can actually look at them, and we do have some variants here.

The next step is to convert these VCF datasets into something a little easier to look at: tab-delimited datasets. We will do this with the SnpSift tool, in particular with "SnpSift Extract Fields", right here. The fields I'm going to extract are listed in the tutorial, so I'll just paste that list in here, set one effect per line, and keep everything else the same. Oh, I see, I didn't specify what to run it on. Again, it does not see any VCFs in my history because there aren't any individual VCF datasets; I only have a collection. So I need to click on the collection tab and select the "Call variants" output, and now I can go ahead and run it.

It's done; let's expand it. We have four tabular datasets. Each dataset has a header line specifying what the fields are, followed by the content. Suppose this is all you want to do: you want to generate a final report out of this, but you still have these four separate files. Now I'm going to use a tool from the "Collection operations" section; in particular, the tool I'm going to use is called "Collapse Collection". It will effectively merge the datasets.
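A toy version of the field-extraction step helps show what's happening (SnpSift itself is a Java tool; the field list and the VCF record below are illustrative, not taken from the tutorial): named columns are pulled out of each VCF record and written as tab-delimited rows under a single header line.

```python
def extract_fields(vcf_lines, fields):
    """Extract named VCF columns into tab-delimited rows with a header."""
    cols = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
    out = ["\t".join(fields)]
    for line in vcf_lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        rec = dict(zip(cols, line.split("\t")))
        out.append("\t".join(rec[f] for f in fields))
    return out

vcf = ["#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
       "chrM\t73\t.\tA\tG\t999\tPASS\tDP=8000"]
for row in extract_fields(vcf, ["CHROM", "POS", "REF", "ALT"]):
    print(row)
```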
But the problem here, of course, is that when I merge datasets, if we look at their contents, how do I know which variant comes from which sample? Well, Collapse Collection provides an option for dealing with this. First let's choose the collection to collapse: that's going to be this collection, the output of SnpSift. Let's keep one header line so we know what the columns are. And now I'm going to tell the tool to prepend the name of the dataset, specifically on the same line, to each line in the dataset; we'll see in a second what that means. Let's run it.

This tool takes a collection as input but produces a single dataset as output, and here is that dataset. You can see that it contains the variants from all four samples merged together, with the sample name prepended to each line of the output. So I know that, for example, these lines correspond to blood from the mother, these to blood from the child, and these to cheek samples from the child. There is a variety of tools for operating on collections, and they can all be found in the "Collection operations" section.
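The collapse-with-prepended-names behavior can be sketched like this (a simplified imitation, with made-up sample names and columns, not the tool's actual code): keep one header line from the first dataset, and tag every data line with the dataset it came from.

```python
def collapse(collection, keep_one_header=True):
    """Merge {sample: lines} tables, prepending the sample name per line."""
    merged = []
    for i, (sample, lines) in enumerate(collection.items()):
        for j, line in enumerate(lines):
            if j == 0:  # header line of this dataset
                if i == 0 or not keep_one_header:
                    merged.append(line)
            else:
                merged.append(sample + "\t" + line)
    return merged

collection = {
    "mother-blood": ["CHROM\tPOS", "chrM\t73"],
    "child-blood":  ["CHROM\tPOS", "chrM\t263"],
}
print("\n".join(collapse(collection)))
```

The prepended first column is what lets you trace each merged line back to its sample of origin in the final report.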