Hi there everyone, my name is Vinod Koning, and in this video we will go through the tutorial on detecting antibiotic resistance genes using Nanopore sequencing data. In this tutorial we will use nanopore.usegalaxy.eu for all the steps, because that site brings together the tools for Nanopore sequencing analysis, but you are free to use your own instance, and all the tools are also available on usegalaxy.eu.
So first I will open the tutorial, which you can find here. This is the starting page, and for this specific tutorial you go to Metagenomics, where you find the antibiotic resistance detection tutorial. As you can see, we will use Nanopore data and we will see some plasmids.
There are some questions that we are going to try to answer. How do I assemble a genome with Nanopore data? How do I get more information about the structure of genomes? How do I get more information about the antimicrobial resistance genes? To answer them we will perform some specific steps: we will perform quality control on the reads, assemble a genome with Minimap2, Miniasm, and Racon, determine the structure of the genome, and scan for antimicrobial resistance genes with staramr. Those are the main objectives, and there will be some small extra steps that you will see throughout the tutorial.
Here you can find the introduction; feel free to read through it yourself, I won't go through it now. I just want to point out that for this tutorial we are using data from Li et al., and you can find the publication by clicking the link. In this tutorial we will go through the workflow, starting with the Nanopore sequencing data. Here you can see all the steps that I have just described in the objectives, and here you can see the last part: we will do antimicrobial resistance gene detection.
We will predict whether a sequence is a plasmid, we will do some visualization, and we will show some quality control reports that you can create.
So the first step is to obtain the data, and for that we will import it into Galaxy. You start by creating a new history, and then we import the sample data by copying the links here into that history. So we create a new history; I will call it AMR, for antimicrobial resistance. Then we upload the files by clicking this button, and here you can paste and fetch the links that we have just copied from the tutorial. You fill the links into the box, press Start, and the files will start to upload.
Because we have to perform some of the steps on each of the files, we will create a data collection. You can do that by clicking this button, Operations on multiple datasets. Boxes then show up where you can select the datasets that you want to put into the collection; you can select them one by one or all at once. For everything we have selected, we want to build a dataset list. Here you can see all the files that we have just selected (you can still discard them), and we give the list a name. I will call it Plasmids, and we create the list. When these files are uploaded we will go to the next step.
While the files are being uploaded, I will show the starting page of nanopore.usegalaxy.eu, where you can find some of the tools, some tutorials, and some workflows.
So nanopore.usegalaxy.eu is a Galaxy dedicated to Nanopore sequence analysis, and here you can see some of the tools that we have imported into it. You can find polishing, quality control, and preprocessing tools like Porechop and Filtlong; genome assembly tools like Minimap2, Miniasm, and Flye; visualization tools like NanoPlot and Bandage; and taxonomy and metagenomics tools like PlasFlow, staramr, and Kraken2. Today we are going through the antimicrobial resistance tutorial, but here you can find the training website with more Galaxy trainings. Here you can find some pre-made workflows that you can use to run your analysis; if you are interested in, for example, a basic workflow with the Nanopolish tutorials, you can find it here, along with the history of an already-run workflow.
In the meantime the files have been uploaded, and the next step is quality control. When you get sequence data, you want to check whether the quality is actually sufficient to go on to the data analysis. There is a tutorial on quality control on the Galaxy website as well. In our case we just uploaded FASTA files, which means that we don't know anything about the quality of each nucleotide in the files, so there is not much left to do in quality control. I would still like to show you NanoPlot, so that you can at least see the distribution of the lengths of your sequences and check whether that is what you expect.
When you go through the tutorial, you can click on the tool names and be directed to them. So when we click here we go to the NanoPlot tool page, and here we have to select several options. First of all we will select batch.
There is also the option to combine. The difference is that with batch a report is created for each input file separately, whereas with the combined option you get one report and, for example, one histogram for all the FASTA files together. In our case we would like batch, so that we can look at each file separately; if there is a FASTA file that we think contains too many short reads, we can see it on its own. Which type of files are we working with? FASTA in our case. Then we have to select which files we are going to use, and we just created a data collection, so here you see the data collection button and we select Plasmids, the collection we have just created. When that's done we can execute. You can see that the execution has started: six jobs are generated for each of the outputs, corresponding to the six FASTA files in our collection. This will take a bit, so I will come back to you when the files are processed.
Welcome back. The output has been generated, and I will go through the HTML report for now, but feel free to click through the other outputs. Within the newly created collection of HTML reports for the six FASTA files, we can view one of them. I will look at the RB01 FASTA HTML report, and here you can see some basic statistics: the mean read length, the median read length, the number of reads, and so on. These are also shown as histograms throughout the HTML report. This is very useful if you have your own experiment and want to see whether your sequencer actually produced long reads or not: what kind of reads do I have in my files?
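As an aside, the basic statistics in that report can be approximated with a few lines of Python. This is only a minimal sketch and not how NanoPlot itself is implemented: the function names (`read_lengths`, `n50`, `summarize`) and the tiny inline FASTA are made up for illustration, and N50 is included as an assumption, because it is a commonly reported long-read statistic rather than something named in the video.

```python
from statistics import mean, median


def read_lengths(fasta_text):
    """Return the length of every sequence in FASTA-formatted text."""
    lengths = []
    current = None  # stays None until the first '>' header is seen
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequence lines may be wrapped, so add up every line.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(fasta_text):
    """Collect the read-length statistics shown in a typical QC report."""
    lengths = read_lengths(fasta_text)
    return {
        "number_of_reads": len(lengths),
        "mean_read_length": mean(lengths),
        "median_read_length": median(lengths),
        "n50": n50(lengths),
    }


# Hypothetical three-read FASTA, only for illustration.
EXAMPLE = ">read1\nACGTACGT\nACGT\n>read2\nACGT\n>read3\nACGTAC\n"
stats = summarize(EXAMPLE)
print(stats)
```

On a real file you would read the text from disk first and pass it to `summarize`; a very low median or N50 would be the hint that the run produced mostly short reads.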
Now that this is done (and in your own case you might do more quality control if you have FASTQ files), we are ready for the next step, which is alignment, and for that we will use Minimap2. The parameters are also explained here, but I will go through them with you. So we go to the tool Minimap2, which you can find like that. What we are going to do now is map the reads from each of our FASTA files to themselves. So for RB01, we will map the reads in that file to each other so that we can build on the reads we have, in a sense extending the reads by overlapping them with each other. For that we choose some parameters here. Under the reference genome option you could use a built-in reference genome if you just wanted to do mapping, but as I just explained we want to use our own reads, so here we select "Use a genome from history and build index". Here we want to use the collection that we created in the beginning, which is indeed Plasmids.
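For anyone who wants to see what this self-mapping looks like outside Galaxy, here is a small Python sketch that only builds the equivalent minimap2 command line and does not run it. The helper name `build_overlap_command` and the file name `RB01.fasta` are hypothetical; the `-x ava-ont` preset is minimap2's preset for Nanopore all-versus-all overlap, and minimap2 writes PAF by default. How the Galaxy wrapper assembles its own command may differ.

```python
import shlex


def build_overlap_command(reads_path, preset="ava-ont"):
    """Build a minimap2 call that overlaps a read set with itself.

    The same file is passed twice: once as the target and once as the
    query, which is what "mapping the reads to themselves" means.
    """
    return ["minimap2", "-x", preset, reads_path, reads_path]


# Hypothetical file name, only for illustration.
cmd = build_overlap_command("RB01.fasta")
print(shlex.join(cmd))
```

In a shell you would redirect the output of that command to a file such as `overlaps.paf` (again a made-up name), since the overlaps are written to standard output.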
Then we have to select whether the reads are single or paired end. In the Nanopore case it will be single end, but for Illumina you might choose paired end. Then we have to select a FASTQ dataset, and again we select Plasmids, so now you see that we have selected Plasmids twice, because we want to map the reads to each other. Then there is a profile of preset options, in which some of the more advanced options are predefined. In our case we use the Oxford Nanopore all-versus-all overlap mapping, because that is exactly what we would like to do: we want to overlap our Oxford Nanopore reads, and here you can see the options that are used while doing that. So you click that, and because we want to use Miniasm afterwards we need an output format called PAF. Therefore we go to the advanced output options and select PAF as the output format, and then we are ready to execute.
Something nice about Galaxy, in my opinion, is that now that we have mapped with Minimap2 there are some recommendations for tools to use afterwards. I think Racon, for example, is a very good suggestion; we won't use it directly now, but later you will see that we do use it after the Minimap2 step. Meanwhile the mapping is running, and I will come back to you when it is finished.
The mapping step is done, so now we have the output from Minimap2 and we will have a look at it. I will again show RB01, but feel free to look at the others. Here you can see the output, the so-called PAF file. It is a tab-separated file with information on the query sequence name, the query sequence length, the start and end positions, the relative strand, and the read that it has mapped to, followed by more information about that other read. We will use these files in the next step, which is the assembly. So again we go back to the tutorial: we
have gone through the upload, we have looked at the quality, we did the pairwise alignment with Minimap2, and now it is time to do the assembly. What we would like to do is assemble the overlapping reads that we have just found with Minimap2, and for that we will use Miniasm, which opens like that. We select our sequence reads, which is not a separate file but again our data collection Plasmids; these are the original reads that we have used. Then we have to select our PAF files, which you can do by selecting this, and we want the output of Minimap2, in this case Map with minimap2 on collection 13. We leave the rest of the options as they are and start executing. So Miniasm is running, and I will come back to you when that's done.
Now that Miniasm is done we can look at the output again. We have GFA output, which is an assembly graph, but we aren't finished yet, because we still want to improve our overall assembly, our overall contigs. So we want to rerun the Minimap2 step once more, but because Minimap2 cannot use the output from Miniasm directly, we will convert these GFA files back to FASTA. We can do this with the following tool, as you can see here. We have to choose our input files, which are the Miniasm output, so again we click dataset collection, select the Miniasm output, the assembly graphs, and execute.
When that is done we are going to run Minimap2 again, and I will just continue with the next step, because that is also something nice in Galaxy: while your previous step is still running, you can already start your next step. So we open Minimap2 again; this time I will use the search option of Galaxy, and this is the tool that we want to use. I will go through the options. In this case we again want to use a genome from the history and build an index,
and we still want to use a dataset collection, but instead of the original Plasmids, this time we want to use the FASTA files that have just been created with GFA to FASTA. Then for single or paired we choose single end, and this time we don't want to map the data to itself; instead we select the original Plasmids set as the reads, so that they are mapped to the assembly. For this next step, the improvement, we use a slightly different preset: the PacBio/Oxford Nanopore read to reference mapping. And again, because we want PAF as output, we set the output format. That is it; then it's ready to run.
When this mapping step is finished, what we have done is mapped and assembled all the reads. But if there are multiple nucleotides possible at one position, we would like to clean that up and make one consensus, and for that we will use Racon, which is, so to say, the clean-up tool, the consensus module that creates that one consensus in the end. Here you can see Racon: a consensus module for raw de novo assembly of long uncorrected reads, which is exactly what I just described. For the sequences we use the original Plasmids; the overlaps are the PAF from the newest Minimap2 run; and the target sequences are the ones created with GFA to FASTA, so basically the files that came out of Miniasm, converted from GFA to FASTA. Then we are ready to execute.
So by using these three tools, Minimap2, Miniasm, and Racon, all we have tried to do is go from separate sequences to one cleaned-up assembly, something that we trust. What we would then like to do is visualize what we have created, and for that we will use Bandage. So, to go over it once again: we have obtained and prepared the data we have
imported, we used NanoPlot to look at some of the properties of our sequences, we did the alignment with Minimap2, the assembly with Miniasm, and the remapping with Minimap2 again, and then we used the consensus module Racon to clean up our consensus. Now we want to visualize the assemblies with Bandage, and for that we will use the Bandage Image tool. As input we want the assembly graphs created with Miniasm. We can't use FASTA files here, so we again select a collection, but this time the assembly graph files produced by Miniasm. For now we leave the parameters as they are, but you can change the height and width of the image, add labels, and change the font size or the output file type. Then we press execute, and when this is done I will show you one of the Bandage images.
So now Bandage is ready and we will look at one of the outputs. This is what it looks like. Well, what can we see? This might be a plasmid, this could be some chromosome, but actually we are not that sure. That might be because we have gone through the whole process only once, and you may want to do more rounds of going from Minimap2 to Miniasm to Racon, and so on. This is actually wrapped up in a tool called Unicycler, so the next step is to run Unicycler on our original input files. We can find the tool like that: Unicycler, "Create assemblies with Unicycler". Here we are in the options section of Unicycler. For the paired or single-end data we select None, because we don't have that, and the long reads are where we select our original FASTA files, so here we select the Plasmids data collection, and the rest of the
options we will leave as they are. So again, what Unicycler does is repeat the steps that we have just taken one by one, and that is what we will use it for. Unicycler has a couple of other advantages: it is specifically made for circular DNA, it does multiple rounds of improving the sequence accuracy (the steps of Minimap2, Miniasm, and Racon), and it will even rotate the sequences over and over again to get the most polished output. We have selected all the options, so we can press execute, and when that is done we will again create an image of the output with Bandage Image and compare the two.
So we want to run Bandage Image again, and this time the input is the output of Unicycler, the final assembly graph. Here we select the dataset collection, and we don't want the final assembly but the final assembly graph. We leave the rest of the options as they are and press execute. Now you can see that the Bandage Image tool has been queued, and it will start running as soon as Unicycler is finished. This will take a while, so I will come back to you when that is done.
Now that that is finished, we would like to look at the images again and see whether there is a difference. I will open the one for RB01 again and view the data. I think you can see that there is definitely a huge difference with the previous one: here you can see two clear plasmids, maybe one chromosome, and some leftovers. But what we would actually like to do is compare them, and for that there is an option in Galaxy to open multiple datasets at once: the Scratchbook. We enable it, then view our data, and we can resize the window. Then we go back and do the same for the previously made output. We can search for our Bandage output, and here you can see the previously created output; we open it and
then we view the data for RB01 as well. So now you can see the two outputs next to each other. On the left you see the output from the steps that we ran manually, with maybe a plasmid and some chromosomes. But when you use Unicycler for all the steps, which is optimized to detect plasmids and chromosomes, the output is much clearer and more likely to be correct. So here you have two plasmids, one chromosome, and some leftovers, and here you have a blob that might be a plasmid but isn't very polished, and some long strings which might actually be a plasmid. So you can use the Scratchbook to view multiple images or outputs at the same time. I will turn the Scratchbook off again, so that we look at outputs one by one.
Now that we have decided that we have an output that looks good, we want to find out whether it really is what we think it is. For that we will use PlasFlow, which predicts whether a sequence is plasmid or chromosomal DNA. It does that with a model trained on full genome and plasmid sequences, and the authors claim that it can differentiate between plasmids and chromosomes with an accuracy reaching 96%, so that is pretty good. We open the tool PlasFlow like that and select our final assembly created with Unicycler, and then it's ready to execute. What we will look at when this is finished is whether individual contigs are classified as plasmids or as chromosomes, which is another confirmation of whether what you found is actually a plasmid or not. PlasFlow will take a while to run, so I will come back to you when it's finished; get a coffee or something to drink if you want to take a break now.
Welcome back. I will now show you the output of PlasFlow, specifically the probability table that it has created. Again I would like to look at RB01, and when
you view the data you get this table, corresponding to the four parts of the image that we have just seen. You may remember there was one chromosome-like structure, two circles which could indicate plasmids, and some leftovers. PlasFlow actually says that all four sequences are likely to be plasmid, so it is possible that we couldn't reconstruct the correct plasmids in all cases, but at least in two of the cases we did. You can see that it classifies them as most likely Proteobacteria, and these are all the probabilities per class, for example a plasmid belonging to Firmicutes or some other phylum; there is also a probability that it is a chromosome belonging to one of these. So that is how you interpret the output of PlasFlow.
Now it's finally time to look at whether there are antimicrobial resistance genes on the plasmids that we have found, and for that we will use staramr. staramr is a tool to detect antimicrobial resistance genes, and it uses ResFinder, PointFinder, and PlasmidFinder to do so. We search for staramr, there it is, and we again select our output from Unicycler, but you can also choose to use only the plasmids or the chromosomes that you have identified with PlasFlow. Then we execute. staramr creates multiple outputs, which are described in the tutorial, but I think the most interesting one for now is the ResFinder output, the TSV file, which states the different genes that are present. So we open that output, and here we can see, per input contig, the gene that has been identified.
When you have found one of these genes, for example dfrA12, you can go to the CARD database, which is a database with information about antimicrobial resistance genes. So here you see CARD, the Comprehensive Antibiotic Resistance
Database, and you can search for the gene that we have just found; it will give you all the information about it. Feel free to go through the information per gene on your own, in your own time; it describes the resistance mechanism as well. So this is going from your input sequences to some findings, and that is the tutorial.
I would like to end with a small summary, so let's go back to the tutorial and go to the conclusions. As I have shown, we have gone through the process of mapping, assembly, remapping, and making a consensus, but I have also shown you that there is a specialized tool for doing these steps automatically, Unicycler, and that it improves the outcome quite a lot. Nevertheless, I think it is important to understand which steps are taken in a tool like Unicycler, and you might want to swap some of the tools; for example, instead of Miniasm, Flye is nowadays another commonly used tool for the assembly step. So you can change your workflow based on what you think fits your data best. Once we had a consensus sequence, we scanned for resistance genes using staramr, and we predicted whether it really is a plasmid sequence using PlasFlow. On the Galaxy side, I have shown you how to use collections, how to run multiple files all at once, and the Scratchbook, where you can view multiple outputs next to each other. I think those are the key points on the Galaxy side.
With that I would like to conclude this tutorial. I just want to point out that for NanoGalaxy we have also published a paper describing some of the workflows that we are using, a use case, and some of the tools that we use here. You can find it by going to the NanoGalaxy publication. So this is the article I'm referring to, "NanoGalaxy: Nanopore long-read sequencing data analysis in Galaxy", and it will
explain a bit more about the workflows that we have put in, some of the tools that we have added, and so on. That's it. Thank you for listening, and I hope you enjoyed it.