Hello everyone. I'm Franziska Bonath, and I'm today's host. With me is Yuki, who is giving an introduction to nf-core/nanoseq. Welcome. Yeah, thanks for the introduction. I'm Yuki, and I have been maintaining nanoseq for the past two years, since I picked it up from Harshil Patel. I've seen it through its DSL1 days, converted it to the initial DSL2 syntax, and updated it to the newest DSL2 syntax with a lot of help from Chris Hakkaart. Chris is from Seqera Labs and was previously at the University of Tübingen. So if you have any questions on nanoseq, you can always reach out to us. Without further ado, I'll tell you a little bit about nanoseq. nf-core/nanoseq is a bioinformatics analysis pipeline for nanopore DNA and RNA sequencing data that can be used to perform basecalling, demultiplexing, QC, alignment, and downstream analysis. In this bytesize talk, I will briefly introduce what nanopore sequencing is and why we need a specific pipeline for nanopore sequencing data. From conversations on the Slack channel, I've found that people have trouble understanding how to run nanoseq itself, so I'll go over the basics of how to run the different parts of nanoseq and also talk about some of the latest additions to the pipeline, which we are very excited to introduce to you.
I don't know how familiar the audience is with nanopore sequencing, so: nanopore sequencing is a sequencing technology provided by Oxford Nanopore Technologies. Here you see a strand of nucleic acid going through a pore, which is called the nanopore. As the strand goes through the pore, current signals are emitted, and these current signals can be translated into their specific nucleotide bases. Most people know nanopore sequencing as one of the third-generation sequencing technologies, and it outputs long reads; the longest read is around 2.3 megabases. Nanopore sequencing was used by the Telomere-to-Telomere (T2T) Consortium, which finally completed the human genome, and it is also used to identify RNA isoforms because it can sequence full-length RNAs. Nanopore sequencing is also different because of the current signals output by the pore, which are not available in other sequencing technologies. With these current signals, one can use machine learning algorithms to extract biological information such as DNA modifications, RNA modifications, poly(A) tail length, and RNA secondary structure, without having to do extra wet-lab assays. So here's nanoseq. It is relatively convoluted, but Chris Hakkaart created this figure, which is so nice. It is color-coded based on the kind of sample that you have, and we have three different subway lines: the blue line is for DNA samples, the green line is for direct RNA and cDNA aligned to the genome, and the orange line is for direct RNA that's
aligned to the transcriptome. In the subsequent slides I'll keep it bite-sized and talk about what we can do with these three lines. The first part of the pipeline involves basecalling, demultiplexing, QC, and alignment. Basecalling starts from a FAST5 directory containing a bunch of FAST5 files, so for the pipeline input you pass the FAST5 directory with the input path flag, and to basecall your sample correctly you have to specify the flow cell. Demultiplexing can start from either the FAST5 directory or an undemultiplexed FASTQ. If you have undemultiplexed FAST5 files, you can demultiplex them with Guppy and output demultiplexed FAST5 files with ont_fast5_api; if you have an undemultiplexed FASTQ file, you can demultiplex it with qcat. You also need to specify the barcode kit for the reads to be demultiplexed correctly. After demultiplexing the FAST5 files, we implemented pycoQC and NanoPlot for quality checking, and for the demultiplexed FASTQ we have FastQC and NanoPlot for quality checking the FASTQ files. Alignment can take in FASTQ files either from upstream processes or from user input, where the FASTQ file is already demultiplexed. For the blue line, which is DNA, Chris added DNA variant calling tools for small variant calling and for structural variant calling. You can choose between Medaka and DeepVariant as the small variant caller.
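As a rough illustration of the preprocessing routing and caller choices described above, the sketch below maps each starting point (a FAST5 directory, an undemultiplexed FASTQ, or an already demultiplexed FASTQ) to the steps and tools named in the talk. It is a hypothetical Python model, not nanoseq's own code: the function names, the `start` labels, and the rule that demultiplexing is skipped when no barcode kit is given for FAST5 input are assumptions. The real pipeline is configured through its own Nextflow parameters.

```python
from typing import List, Optional, Tuple

# Hypothetical model of the preprocessing routing described in the talk;
# not nanoseq's actual implementation.
FASTQ_QC = "FASTQ QC (FastQC, NanoPlot)"


def plan_preprocessing(start: str,
                       flowcell: Optional[str] = None,
                       barcode_kit: Optional[str] = None) -> List[str]:
    """Return the ordered preprocessing steps for a given starting point.

    start is one of 'fast5_dir', 'fastq_undemultiplexed' or
    'fastq_demultiplexed' (labels invented for this sketch).
    """
    if start == "fast5_dir":
        # Basecalling needs the flow cell to be specified.
        if flowcell is None:
            raise ValueError("basecalling requires a flow cell")
        steps = ["basecalling (Guppy)"]
        # Assumption: without a barcode kit the run is treated as non-barcoded.
        if barcode_kit is not None:
            steps.append("demultiplexing (Guppy) + FAST5 output (ont_fast5_api)")
        steps.append("QC (pycoQC, NanoPlot)")
    elif start == "fastq_undemultiplexed":
        # qcat needs the barcode kit to demultiplex correctly.
        if barcode_kit is None:
            raise ValueError("demultiplexing requires a barcode kit")
        steps = ["demultiplexing (qcat)", FASTQ_QC]
    elif start == "fastq_demultiplexed":
        steps = [FASTQ_QC]
    else:
        raise ValueError(f"unknown starting point: {start!r}")
    # Every route ends in alignment.
    steps.append("alignment")
    return steps


SMALL_VARIANT_CALLERS = ("medaka", "deepvariant")
STRUCTURAL_VARIANT_CALLERS = ("sniffles", "cutesv")


def pick_dna_callers(small: str = "medaka",
                     structural: str = "sniffles") -> Tuple[str, str]:
    """Validate the DNA caller choice; the defaults follow the talk."""
    if small not in SMALL_VARIANT_CALLERS:
        raise ValueError(f"unsupported small variant caller: {small}")
    if structural not in STRUCTURAL_VARIANT_CALLERS:
        raise ValueError(f"unsupported structural variant caller: {structural}")
    return small, structural


if __name__ == "__main__":
    # Example values only: a real ONT barcode kit code and the cuteSV option.
    print(plan_preprocessing("fastq_undemultiplexed", barcode_kit="SQK-PBK004"))
    print(pick_dna_callers(structural="cutesv"))
```

Keeping the allowed caller names in tuples makes an unsupported choice fail immediately, before any long-running step would start.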
And you can choose Sniffles or cuteSV as the structural variant caller. The defaults are Medaka for small variant calling and Sniffles for structural variant calling. If you have any questions on DNA structural variant calling, you can reach out to Chris; he knows a lot more about this than I do, and he is also on the call, so he'll take questions if you are interested. For RNA, if you have a cDNA or direct RNA sample aligned to the genome, you can do transcript discovery and quantification. These processes take in a sorted BAM from SAMtools. The default is bambu, which does both transcript discovery and quantification; another option uses StringTie2 for transcript discovery and featureCounts for quantification. After transcript discovery and quantification, if you have more than one group of samples, you can also do differential expression analysis at the gene level with DESeq2 and at the transcript level with DEXSeq. Also on the green line, and this is a new functionality, we included JAFFAL for detecting RNA fusions. It takes in a FASTQ file either from the sample sheet, given that it is demultiplexed, or from upstream processes if you start from FAST5 files or undemultiplexed FASTQ files. The last part, and the newest addition to the pipeline, is RNA modification detection. This one is a little different in that you should have a directory per sample, and within that directory you should have a FAST5 subdirectory and a FASTQ subdirectory, and within
the FAST5 subdirectory you need to include all the FAST5 files; in the FASTQ subdirectory, please include only one basecalled FASTQ file. The reads then go through alignment to the transcriptome and conversion to BAM, and prior to RNA modification detection, Nanopolish is run for segmentation. If you only have a single sample, you can detect m6A with m6anet. If you have multiple groups of samples and want to see the differential modification across them, you can run xPore, which does the differential modification analysis. So, to summarize, here is the nanoseq pipeline. There are a lot of tools included, but you don't have to install anything other than Nextflow and Docker, Singularity, or Conda, depending on whether you are using AWS or an HPC. With the latest release, nanoseq supports DNA variant calling, transcript discovery and quantification, RNA fusion detection, and RNA modification detection. Thanks for listening, and I'm happy to take any questions. Thank you very much. Am I visible? I have to remove the spotlight. Anyway, I have now allowed everyone to unmute themselves if they want to ask questions; otherwise, you can also put questions in the chat. There's actually a comment in the chat from Olaitan (sorry if I butchered the name). He says that T2T completed a human genome, not the human genome. Perfect. He also points out that Sniffles 2 exists, which is an enhanced caller for structural variants. Have you thought about Sniffles 2? I know this was not even your main part, but... Yeah, I can probably jump in. So I've seen that come out very recently, maybe in the last four months. Sniffles was initially added about 12 months ago; it was kind of the first caller that we were interested in.
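The RNA modification input layout Yuki described (one directory per sample, with a FAST5 subdirectory holding all the FAST5 files and a FASTQ subdirectory holding exactly one basecalled FASTQ) is easy to get wrong, so it can help to check it before launching a run. The sketch below is a hypothetical Python pre-flight check, not part of nanoseq; the literal subdirectory names `fast5` and `fastq` and the `.fast5` suffix test are assumptions. It also encodes one reading of the tool choice from the talk: m6anet detects m6A per sample, and xPore is only added when there is more than one group to compare.

```python
import tempfile
from pathlib import Path
from typing import List


def check_modification_sample_dir(sample_dir: Path) -> Path:
    """Validate one sample directory and return its single FASTQ file.

    Assumed layout: <sample_dir>/fast5/*.fast5 and <sample_dir>/fastq/<one file>.
    """
    fast5_dir = sample_dir / "fast5"
    fastq_dir = sample_dir / "fastq"
    if not fast5_dir.is_dir() or not fastq_dir.is_dir():
        raise ValueError(f"{sample_dir} needs 'fast5' and 'fastq' subdirectories")
    if not any(p.suffix == ".fast5" for p in fast5_dir.iterdir()):
        raise ValueError(f"no FAST5 files in {fast5_dir}")
    fastqs = [p for p in fastq_dir.iterdir() if p.is_file()]
    if len(fastqs) != 1:
        raise ValueError(
            f"{fastq_dir} must hold exactly one basecalled FASTQ, found {len(fastqs)}")
    return fastqs[0]


def modification_tools(n_groups: int) -> List[str]:
    """m6anet works per sample; xPore compares groups, so it needs at least two."""
    if n_groups < 1:
        raise ValueError("need at least one group")
    tools = ["m6anet"]
    if n_groups > 1:
        tools.append("xPore")
    return tools


if __name__ == "__main__":
    # Build a throwaway example directory to exercise the check.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample1"
        (sample / "fast5").mkdir(parents=True)
        (sample / "fastq").mkdir()
        (sample / "fast5" / "read_batch_0.fast5").touch()
        (sample / "fastq" / "sample1.fastq").touch()
        print(check_modification_sample_dir(sample).name)
        print(modification_tools(n_groups=2))
```

Failing on a second FASTQ file, rather than silently picking one, mirrors the "only one basecalled FASTQ" instruction from the talk.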
In my opinion, it has also been superseded by cuteSV, which was actually the best caller when we did our testing. I haven't tested Sniffles 2 with a full dataset, so I can't be sure whether it's better or worse, but it's something we can definitely look into as a quick addition in the future. Yeah. Thanks. Hello. Hello. Hi, can you hear me? Hi. Thanks, great talk. My interest lies mostly in native RNA transcriptomics. I wondered if there is a way to add poly(A) tail measurements. Yeah, that's on our radar. We actually talked about adding poly(A) tail length detection, because one of our lab members added the poly(A) tail functionality from Nanopolish, but I'm looking into tailfindr right now. Do you have any specific poly(A) tail length prediction tools that you want us to add? Well, the Nanopolish tool works really well with native RNA. I think tailfindr can do cDNA; however, it runs on the CPU and is rather slow if you have a big dataset, while Nanopolish is much faster, and there is a Shiny app where you can visualize the results. One more thing, while we're at the 3' end: are there any tools available to add to this pathway for alternative polyadenylation characterization? Do you have any tools specifically in mind for alternative polyadenylation? For long reads, at the moment I am looking at something called LAPA. It's on GitHub; I think the group is still working on a paper. It's for long-read alternative polyadenylation, LAPA. LAPA? Yeah. Yeah, because I am aware that there are short-read polyadenylation tools such as QAPA and LABRAT, and quite a few other tools out there that do that, but I'm not exactly sure how translatable those tools are to nanopore reads.
So it would be great if you could suggest some long-read tools that we can look into too. Yeah, because I think there is a long, roundabout way of getting it through FLAIR, and then I think tappAS has got something as well. tappAS, as in... because I'm aware of tappAS. Yeah, okay, cool. Because I'm aware that there's another TAPAS, which is TAPAS in all caps. No, I saw that one. The thing with this one is that I think they've got their own Java user interface, but underneath it there is a lot of FLAIR and SQANTI3, which is where it does some of the transcript variant and poly(A) detection and the finding of APA sites. Obviously it would complete the pathway quite nicely, taking it from everything that you have currently to APA and polyadenylation. Thank you very much. Yeah, just as a follow-up to that: the group behind tappAS built a very robust piece of software, I would call it now, because I saw the demo by the PI of that group. It does a lot of things; if you're doing anything isoform related, they've done so much work. Since you're doing transcript discovery, you could figure out a way to pick up some of the features they have implemented in tools like tappAS and integrate them into your pipeline, since you're already doing transcriptomics and structural variant work with long reads. That would really help your pipeline become more robust. Okay, great. Thank you for these suggestions. There's also another question in the chat. Edo is asking: do you plan to include genome assembly tools in the pipeline?
Yeah, we actually talked about this during the lab meeting last week, and we hope to include Raven for genome assembly because, I suppose, it is also nanopore based. So we are looking to include that. Do you have any specific tools in mind that you would suggest we include? Edo, you could also unmute yourself if you want; otherwise you can write in the chat. Yeah, thanks. Can you hear me? Yes. Yeah, thanks. Well, I don't have anything in particular in mind, just the common pipeline tools, whether it's Racon or Nanopolish, the normal ones, and obviously it depends on which organism we're looking at. I would say just the common pipelines that either take nanopore reads by themselves or take hybrid Illumina and nanopore reads to do the assemblies. Putting them inside this workflow would make it very easy to use, unless there's already an existing pipeline designated for genome assembly. Okay, great. Awesome. Thanks for these suggestions. Okay, there is another question in the chat: what is the best-practice process for nanopore sequencing metagenomics? That's a complicated question, actually. What's your answer? Yeah, that's a good question. I don't know much about metagenomics. Yeah, I think I can jump in. Somebody did a benchmark of metagenomics tools and came to the conclusion that there's no best one. So I think that's the simple answer to that question, because some of the well-established tools didn't even perform well in his benchmark.
He presented this at a conference about a month ago that I was part of; it's a long-read conference where people were presenting different kinds of tools. When he came to the metagenomics part of his presentation, his conclusion was that there was no best metagenomics tool for long reads. Yeah, interesting. I have a question for the nf-core core team: is there a pipeline for metagenomics? I suppose mag is for metagenomics, right? Is someone from the core team here? I was just checking that. I think there is something that's pretty close. In terms of nanoseq, doing metagenomics might be slightly outside the scope. One thing we've found while developing this pipeline is that it's already a bit of a beast. You could easily split this pipeline into three different pipelines: one for DNA, one for standard RNA, so you can run this kind of isoform detection... Yes, we could look at trying to include it, but personally I think it might be a step too far. And I suspect there is another pipeline that does some form of metagenomics, but I can't remember its name, so please don't quote me on that. Hi guys, just one more comment, or kind of a question, about including short reads. I think that's what the tappAS pathway does: it includes a lot of short reads for transcript variants. So it could be beneficial both for assembly and for more complete transcript variants if there were an option to include some short reads. It also increases depth if you want to do DESeq-style differential expression analysis, because I don't know if the depth is high enough with some of the sequencing. So that's another comment. One more comment is about de novo modified base detection with Tombo. Can that be added?
I know Tombo is not really well... Yeah, we are exploring that right now. Okay, thank you very much. Great work. So there's a comment in the chat about the scale, or the scope, of the work you are currently doing; he was asking whether this workflow is becoming too big. I was actually going to say the same thing: you could focus for now. Maybe you don't have to take my suggestions on the RNA-based analysis; just make that as robust as you can, or focus on the DNA-based analysis, and when you think that one is decent enough in terms of scale, move on to the other one. I think somebody is making a similar comment. Yeah, I think that's a good suggestion. Chris and I are a good team: he does the DNA part of it, and people from the lab I work in do the RNA part of it. We also have some concerns about when the pipeline goes out of scope, so that's definitely on our radar. Chris, do you have anything to add? Yeah, look, I completely agree. Initially, last year, when we started adding all of these new features, it made sense, because the front end of the pipeline was more or less the same and it made sense to recycle it. But now, with a lot of pipelines building these really awesome subworkflows that can be shared and integrated into multiple pipelines, it would be nice to lean into that side of the community, share what we're doing, and be shared with as well. One thing about nanoseq is that the sample sheet is a little atypical, in that you can specify a different genome for different samples, different alignments, and things like this. So nanoseq is already coloring a little bit outside the lines of your typical nf-core pipeline.
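To illustrate Chris's point about the atypical sample sheet, here is a hypothetical Python sketch in which each row carries its own reference FASTA and annotation, and samples are grouped by reference so that each distinct genome would only need to be prepared once. The column names (`group`, `replicate`, `barcode`, `input_file`, `fasta`, `gtf`) and all file names are assumptions for illustration; check the nanoseq usage documentation for the sample sheet format of the release you are running.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample sheet: two human replicates and one yeast sample,
# each row pointing at its own reference files.
SAMPLE_SHEET = """group,replicate,barcode,input_file,fasta,gtf
human_ctrl,1,,ctrl_rep1.fastq.gz,GRCh38.fa,GRCh38.gtf
human_ctrl,2,,ctrl_rep2.fastq.gz,GRCh38.fa,GRCh38.gtf
yeast_wt,1,,wt_rep1.fastq.gz,R64.fa,R64.gtf
"""


def samples_by_reference(text: str) -> Dict[Tuple[str, str], List[str]]:
    """Group sample IDs (group_R<replicate>) by their (fasta, gtf) pair."""
    by_reference: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        sample_id = f"{row['group']}_R{row['replicate']}"
        by_reference[(row["fasta"], row["gtf"])].append(sample_id)
    return dict(by_reference)


if __name__ == "__main__":
    for (fasta, gtf), samples in samples_by_reference(SAMPLE_SHEET).items():
        print(fasta, gtf, samples)
```

Grouping by the (FASTA, GTF) pair rather than by FASTA alone keeps samples apart when they share a genome but use different annotations.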
It is something we've spoken about, and I think we'll speak about it again very soon: bringing it back to the nf-core way of doing things. As part of that, I could see us considering splitting the pipeline, but that's something we'll have to talk about soon, I think. Yeah, for sure. Thank you. Okay, if there are no more questions at this moment, I thank you again, Yuki. Yuki, yeah. I'm so sorry. I would also like to take the chance to thank the Chan Zuckerberg Initiative for funding these talks. If there are any more questions, you can always come to the Slack channel for bytesize talks, or the one specifically for nanoseq, and ask your questions there, and you might get an even more detailed answer. So thank you very much, everyone. Thank you.