So hello everyone, Maxime here. I'd like to welcome Annick Renevey from Clinical Genomics. She's going to talk about the rnafusion pipeline, which is a pipeline I like a lot, because I used to work on this one. I'm still helping out a bit, but she's definitely doing way better than I was doing at the time. So yes, thank you Annick for that already. But before we start, I'd just like to quickly thank the people initiating and organizing these bytesize talks. And to our listeners: you'll be able to unmute yourselves at the end of the talk for questions. So Annick, you're on.

Hi everyone. So yeah, I'm Annick, currently the main developer of rnafusion. I will lead you through a hopefully short technical introduction: a little bit about what our goal is with the pipeline, what we can get out of it, and how to use it, in a few words. I will start with our angle on the pipeline. We are coming from the Clinical Genomics unit in Stockholm, which has a lot of links with clinical diagnostics. We are providing analysis tools that help diagnosticians when they are reporting back to patients, so we are really part of routine clinical care. And Clinical Genomics sits within SciLifeLab, which is a conglomerate of four different universities that work together. So we are part of an organization that is part of another organization, et cetera, hence the multiple affiliations. So why fusions? Because they have been detected increasingly in many common cancer types, and they are a very valuable tool for diagnostic purposes. The first versions of the pipeline were developed during Martin Proks' master thesis at SciLifeLab. He was building on the work of others, and notably Maxime has contributed to it a lot, but we have had many different contributors over the years.
Unfortunately, it got outdated, as much scientific software does, because we got a lot of other things on our desks, the underlying tools got updated, and the databases got much better. All of a sudden, when we wanted to use it at the end of last year, the pipeline was effectively broken: we couldn't download the references that we needed to run. So there was a need for some rework, and that's when I came into play. So there is now version 2.0, which has come out already. It was a complete rewrite and an upgrade to DSL2 syntax. It includes flexibility so that you can make the pipeline do more or less what you want; otherwise, just open an issue and we'll see what we can do. We added a lot of options, as well as visualization and quality control tools. So the main goal is to detect fusions in RNA sequencing data, but there are many different tools to detect fusions. The idea is to combine the power of the tools available and to compare them, both between themselves and also with databases of fusions that are already known. That can help you in case you're looking for a common fusion type, but if you're looking for another fusion, you might want to go further than just the databases, so this is just an indication. It is also complemented with visualization tools and quality control. The pipeline overview looks a little bit like this: you can imagine it like a network of different subway lines. You can take any of the subway lines, all of them, or just maybe Arriba, Squid and Pizzly, if you don't care about FusionCatcher and STAR-Fusion, something like that.
You also have a parallel line consisting of the quality control, and the core analysis tool, which lies here: fusion-report, a tool developed by Martin Proks that basically takes all of the fusions detected by the five different subway lines, puts them together and checks, okay, is this fusion identified by this tool, and is it present in this database? Once we have looked at this, we take every fusion that has been identified by two tools or more and we look at it again in more detail with FusionInspector, collecting statistics, et cetera. So here is how the output of fusion-report looks. You can see that it has an interesting dashboard where you have all the tools, known versus unknown fusions per database, and by how many tools each fusion was detected. Here we can already see that Pizzly, in our case, was very sensitive, so it detected many fusions. Now I'm going to try to do an interactive demo, let's see how it works, because the table here is very nice to look at in a bit more detail. Do you see the browser? Yes, no problem. All right, good. So you can see that here I can highlight how many fusions were identified with Pizzly, and if I hide the fusions identified by Pizzly, I can have a look at which tool identified how many fusions, which is quite interesting. And if I remove the fusions that were detected by only one tool, probably Pizzly, then I have a bit more of a detailed panel here. This table is very interesting because you can sort it how you want, change the order, et cetera. Basically, this is an artificial sample: you will hopefully never see this in a natural sample, but it is a sample consisting of 20 fusions. So you have those in the sample, and as you can see, they are found by all the tools, corresponding to the five tool hits.
And then you have a scoring function that depends on the number of tools that have found the fusion and also on the different databases in which it was found. So that's quite a valuable tool if you want to compare between different tools. Now, coming back to the different results, we can have a look at FusionInspector. Here is just one page of the HTML output of FusionInspector, but there is a lot more, and I really encourage you to run it and look for yourself if there is something that can be of interest to you, if you're interested in a specific part of a fusion. There are BAM files, there are a lot of tables of statistics. So this is just an overview, but again, it's an interactive table, so you can look at fusions, and you also have some visualization possible in the browser, among others. You can see some statistics here and a bit more about the genes and their positions. The last visualization tool that I wanted to show you is the Arriba visualization tool. It is only run for fusions that have been identified with Arriba. You basically get a PDF file out with one page per fusion, and this is one fusion. You can see that you get a very detailed view of where the breakpoints happen. So you, oops, sorry. So you can really get an idea of the sequence, et cetera. You can also have a quick look at the retained protein domains, which might already be of pathological importance, and a few statistics like supporting read counts. Now, a little bit about how to use the pipeline. What you would have to do first is build the references, and this requires patience, because at the moment we are building the STAR-Fusion references from scratch, and that takes about 24 hours on HPC. So, yeah, just don't be surprised that it takes a long time; it is what it is for the moment. I'm hoping to make it shorter at some point if I can host the already-built references directly instead of building them, but this is something I'm working on.
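[Editor's note] The scoring idea described above, a score growing with the number of tools and the number of databases that report a fusion, can be sketched roughly like this. This is a toy illustration only: the weights are invented, and it is not fusion-report's actual formula.

```shell
# Toy score for one fusion candidate: reward detection by more tools
# and presence in more known-fusion databases.
# NOT fusion-report's real formula; the weights 10 and 5 are made up.
tools_hit=3      # number of detection tools that reported the fusion
db_hits=2        # number of known-fusion databases containing it
score=$(( tools_hit * 10 + db_hits * 5 ))
echo "score=$score"   # prints: score=40
```

A fusion seen by all five tools and present in several databases would thus rank well above a single-tool, database-absent call, which is exactly how the dashboard ordering behaves.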
So you would have to start by creating a COSMIC account and passing your username and password to the pipeline. Then you would have to specify --build_references and the tools that you want. Here I put --all, because I find that it makes sense to build for all tools, but if you only want to use Arriba, then you can just do --arriba. If you want to use Arriba and FusionCatcher, you would use --arriba --fusioncatcher, and you would only download those references. Then you need to provide --genomes_base, which is the path to your references, and --outdir, which will be the output directory of the run. In this case it will not contain very much, because all of the data, the references so to speak, will be generated under genomes_base, but you will still have the execution trace, logs, versions, et cetera in the outdir. Now, to run the pipeline, you simply do not specify --build_references, and it will run the actual analysis. Again, you have the possibility to do all of the analysis, or any combination of the tools that you want. If you want FusionCatcher and Squid: --fusioncatcher --squid. If you want everything except Pizzly, just specify each tool except Pizzly. You also need an input this time. The pipeline will not complain if you do not have an input when you build the references, but if you try to run the analysis, it will complain if you don't have a sample sheet. So you need to create a sample sheet with your samples. The first three columns are standard in nf-core: the sample name, fastq_1 and fastq_2, and on top of that you have the strandedness, which depends on your library preparation kit. You need to give genomes_base, the path to your references, and the outdir is this time very important, because it will contain all of your analysis. Yes, so I included a few things to help you gain more flexibility in your usage of the pipeline. You might just use it in a very standard way.
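[Editor's note] Putting the two invocations just described side by side, a minimal sketch could look like the following. The flag names (--build_references, --all, --cosmic_username, --cosmic_passwd, --genomes_base, --outdir, --input) are assumptions reconstructed from the talk and rnafusion 2.x conventions, and the credentials and paths are placeholders; check the pipeline documentation before relying on them.

```shell
# Step 1: build references for all tools (slow: STAR-Fusion alone takes ~24 h).
# Flag names are assumptions from rnafusion 2.x; credentials/paths are placeholders.
nextflow run nf-core/rnafusion \
    --build_references --all \
    --cosmic_username <your_cosmic_user> --cosmic_passwd <your_cosmic_password> \
    --genomes_base /path/to/references --outdir /path/to/refs_run

# Step 2: write a sample sheet with the standard nf-core columns plus strandedness.
cat > samplesheet.csv << 'EOF'
sample,fastq_1,fastq_2,strandedness
patient1,patient1_R1.fastq.gz,patient1_R2.fastq.gz,forward
EOF

# Step 3: run the actual analysis (same tool flags, no --build_references).
nextflow run nf-core/rnafusion \
    --all --input samplesheet.csv \
    --genomes_base /path/to/references --outdir /path/to/results
```

The outdir of step 1 holds only logs and traces, since the references themselves land under genomes_base; the outdir of step 3 is where all analysis results go.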
You don't even have to look into these options, but if you are looking into doing something more specific, or gaining some time at runtime, then they might be useful for you. You have the possibility to skip the visualization if you're just interested in the results of the different tools but not the visualization: with --skip_vis you would skip the Arriba visualization and FusionInspector. With --skip_qc you would skip the entire QC line. You can manually feed reference paths for each tool if you have them in different directories. And you can also run just FusionInspector with the option --fusioninspector_only; you will then have to provide --fusioninspector_fusions with the path to a file that you manually construct, containing the fusions that you want to look into for a sample, and then only FusionInspector will run. So as you can see, you have a few possibilities to enter at different points of the pipeline. It has been suggested to me to maybe also add an alignment shortcut, so you could feed alignments to the pipeline manually. This is a great idea. At the moment, as you can see, we are aligning separately for each line, and this is because we each time use alignment parameters that are optimized for the different fusion detection tools. So it should perform slightly better, but if you want to save time, you might want to bypass that step. This is something that I think I will work on in the very near future. And about the future, I will talk a little bit about what's going on. There will be a next release, hopefully very soon, with adapter and quality trimming, and with the possibility to run StringTie as an extra line. That will be helpful, because there is a type of fusion that is not detected by any of the tools implemented at the moment, and that is when, for example, an exon is skipped within the same gene; that should be resolved by using StringTie.
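[Editor's note] The flexibility options just described could be combined like this. Again, the exact flag names (--skip_vis, --skip_qc, --fusioninspector_only, --fusioninspector_fusions) are assumptions reconstructed from the talk, so verify them against the pipeline's help output.

```shell
# Sketch of the flexibility options described above; flag names are assumptions,
# check `nextflow run nf-core/rnafusion --help` for the ones in your version.

# Run all tools but skip the visualization line (Arriba plots + FusionInspector)
# and the entire QC line, to save time:
nextflow run nf-core/rnafusion --all --input samplesheet.csv \
    --genomes_base /path/to/references --outdir /path/to/results \
    --skip_vis --skip_qc

# Re-examine only a hand-picked list of fusions with FusionInspector,
# using a fusion list file that you construct manually:
nextflow run nf-core/rnafusion --fusioninspector_only \
    --fusioninspector_fusions /path/to/my_fusions.txt \
    --input samplesheet.csv \
    --genomes_base /path/to/references --outdir /path/to/results
```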
But again, if you're not interested in this type of fusion, you could skip it completely, or run just this. And now, no pressure, but I'm really looking forward to seeing if we can find a solution for the AWS megatests, so that we can host demo results on the website and you can have a look yourself at the results the pipeline can give you. At the nf-core summit in Barcelona, I will hopefully present in a bit more detail our implementation in production, what sort of issues we are facing, and a little bit about data. I'm also hoping to release a how-to video with more details and a hands-on demonstration of each command line option. Now, if you have any questions, I'm happy to hear them now, or feel free to reach out to us on Slack or open an issue on GitHub. It's been great to hear about the different experiences. Yeah, thank you for your attention.

Thanks a lot, I think that was super good and super clear, I really liked it. So now people should be able to unmute themselves if they have any questions. I think we have Anne, who wanted to say something.

No, just... go ahead.

I think I had a question. Thank you for the nice talk. So the visualization and everything is amazing, because it really puts these things into perspective. For a while this pipeline has stood out here, because we don't really have that sort of thing for most other nf-core pipelines, other than maybe the MultiQC report. So it's quite nice having something customized that you can use to organize and query the results. I guess we should probably start thinking about how we do that for other pipelines as well at some point. In terms of references though, is it just human samples that the pipeline works with, or does it work for others, and how easily? Because I try to keep up to date with what's going on in the Slack channel, but things move very quickly on nf-core, as you know.
What I've always been confused about is which references are compatible with the pipeline out of the box, and how easy it is to create references to use with the pipeline, because that's been quite a big issue recently in terms of creating these references and using them.

Yes, so I won't say it's easy, but it's possible. You could basically feed any references that you built yourself to the pipeline; the recipes are there in the pipeline from when I built them. So if you feed a non-human FASTA and GTF, mouse or something like that, then you would be able to build the references for non-human, but again, I'm not guaranteeing that it's easy. So what's the problem, where's the complication? You might not find the exact same types of files, or you might be missing databases. Of course, the whole fusion database is human-based, so you won't be able to compare against it, but mostly I think it should be possible. It's more a matter of searching for the right files and testing a bit.

Okay, cool. Do we have these databases anywhere on nf-core that are easily pullable, or is that part of what you are going to do next?

That's something I would love to have. And that would also reduce our tests a lot, because at the moment, as I said, it's 24 hours to build the references. So if I could host them somewhere... if you have some space, shout out.

Yeah, I'm sure we could make it available somewhere. I mean, I guess, and I'm slightly going off topic now, but the AWS iGenomes bucket we have on S3 has typically been used just for that, and so far we haven't really added any other custom files to it, but it could be something we could maybe just push there. If it's tested and it works and we know that, then we can just have an S3 path and upload it. So if you get a list of assets together, then maybe post in the iGenomes channel and we can try and make it happen.

Definitely, yeah.

Cool, thanks for the talk, and see you at the summit. Yes. Does anyone have any other questions? Then I guess we're good.
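[Editor's note] As a sketch of the non-human case discussed here: assuming the pipeline accepts a custom genome via standard nf-core-style --fasta and --gtf parameters (an assumption, not confirmed in the talk), building, say, mouse references might look like the following, bearing in mind that the known-fusion databases remain human-only, so the database comparison step would not apply.

```shell
# Hypothetical sketch of building references from your own (e.g. mouse) genome.
# --fasta/--gtf follow common nf-core conventions but are NOT confirmed here;
# the human-based fusion databases will not be usable for comparison.
nextflow run nf-core/rnafusion --build_references --all \
    --fasta GRCm39.fa --gtf GRCm39.gtf \
    --genomes_base /path/to/mouse_references --outdir /path/to/refs_run
```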
Thank you very much again, Annick. Now let me stop recording.