We have a collection with three samples here. Let's look at this data. It's some kind of variant file: this is position 10 in the genome of an unknown animal, and at that position we have an A-to-C change. Sample 2 is obviously in a similar format, and so on. If you look carefully, sample 1 contains a site at position 20, sample 2 doesn't, and sample 3 again does contain that change at position 20.

Let's suppose that position 20 is very important; it's some kind of functionally important site in our organism. So what I would like to do is retain only those elements in my collection that do contain site 20. That would be sample 1 and sample 3. Let's do that. I'm going to do it using a combination of tools, so I will sort of do conditional filtering here. Actually, while I'm doing this, let's name this history: it's called "Filtering collection".

So let's go to Collection Operations and select the Collapse Collection tool. This is the tool we used in the previous screencast. We're going to be collapsing this collection. Yes, we're going to keep one header line, and we are going to prepend the file name, on the same line and each line in the dataset. So that's the collection, keep one header line, prepend file name. Let's run. Once it's finished, let's see what we got. If we look at this dataset, you can see that this is just an aggregation of all the data from all the collection elements.

Now let's filter this file to figure out which samples contain that site 20. For this I'm going to use the Filter and Sort group of tools. I'm going to use Filter, and here I'm going to write the following expression: c2==20. Here c2 corresponds to column 2; these numbers here show how many columns there are in the given dataset and what their numbers are. I would like only those rows from that dataset that contain the number 20 in the second column. And this is why I'm using two equal signs: in programming languages one equal sign is the assignment operator, and two equal signs, that's
actually a test for equality. So give me all the rows where c2 equals 20. This file has one header line here, so that's the line I would like to skip: I would like to skip one header line. Let's run.

All right, so what we have is the header line and only those rows that contain the number 20 in the second column. So these are really the identifiers of the collection elements that I would like to pull out of my initial collection. I need to do two things: first, I need to get rid of that header, and second, I need to retain only that first column. I'm going to do this in two steps. First I'm going to Text Manipulation, and I'm going to use the Remove beginning of a file tool. I just want to remove the first line. Okay. Now I need to extract that column only, so while I'm back in the text tools, I'm going to use the Cut tool, and here I'm just going to cut column 1; c1 again stands for column 1. Okay, so those are the labels of the elements that we need to pull out.

To finally do this, let's go back to Collection Operations and use the new tool, and this screencast is really about this tool, which is called Filter List from contents of a file. For the input collection, let's collapse all the datasets so we can see better. Our original collection has three datasets; we only want to extract two. You can see that these lines are identical to the names of the two datasets that we want to pull out. So that's the input collection. Now, we want to remove datasets from the collection if they are absent from the file we just generated, and that's dataset number 8. Let's execute.

Okay, so that tool produces two collections: one with the datasets that we wanted to keep, sample 1 and sample 3, and the other with the ones that were not included, sample 2, which we wanted to exclude from the collection because it doesn't contain that site 20.
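
If it helps to see the whole chain in one place, here is a rough sketch of the same logic in plain Python. This is not how Galaxy implements these tools; it only mimics what each step does to the data. The column names, the rows, and the function names (collapse, filter_rows) are all made up for illustration.

```python
# Hypothetical stand-in for the collection: element name -> tab-separated text.
# Column names and all rows are invented for this example.
collection = {
    "sample1": "pos\tref\talt\n10\tA\tC\n20\tG\tT\n",
    "sample2": "pos\tref\talt\n10\tA\tC\n",
    "sample3": "pos\tref\talt\n20\tG\tT\n",
}


def collapse(coll):
    """Mimic collapsing: keep one header line, prepend the element name to each row."""
    rows = []
    for i, (name, text) in enumerate(coll.items()):
        lines = text.splitlines()
        header, body = lines[0], lines[1:]
        if i == 0:
            # Keep the header only once; "sample" is an assumed label for the new column.
            rows.append(["sample"] + header.split("\t"))
        rows.extend([name] + line.split("\t") for line in body)
    return rows


def filter_rows(rows, predicate, header_lines=1):
    """Mimic filtering: pass header lines through, test only the remaining rows."""
    return rows[:header_lines] + [r for r in rows[header_lines:] if predicate(r)]


collapsed = collapse(collection)

# c2 == 20: column 2 is index 1 here, and == tests equality (a single = would assign).
filtered = filter_rows(collapsed, lambda r: int(r[1]) == 20)

# Remove the first line (the header), then cut column 1 (c1) to get the labels.
labels = [row[0] for row in filtered[1:]]

# Split the original collection by whether each element name appears in the labels.
wanted = set(labels)
kept = {name: data for name, data in collection.items() if name in wanted}
discarded = {name: data for name, data in collection.items() if name not in wanted}

print(labels)
print(sorted(kept), sorted(discarded))
```

The last step gives two results, just like the tool does: the elements whose names are in the label list, and the ones that are not.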