Hello everyone and welcome to the first bytesize talk of 2023. I'm very happy to have Sofia Stamouli presenting today a new pipeline called nf-core/taxprofiler, which is soon to be released, I've heard. So off to you, Sofia.

Hello everyone. I'm going to talk about nf-core/taxprofiler, which is, using the GitHub description, a bioinformatics best-practice analysis pipeline for taxonomic classification and profiling of shotgun metagenomic data. In the talk today I will briefly introduce what shotgun metagenomics is and how the development of taxprofiler started. I will give an overview of the nf-core/taxprofiler pipeline and how you can run it, as well as our upcoming development plans.

To start with, what is shotgun metagenomic sequencing? I borrowed the description from Quince's paper from 2017, which describes shotgun metagenomic sequencing as the untargeted sequencing of all microbial genomes present in a sample. It allows for the determination of the taxonomic diversity in a sample, and we may be looking at bacteria, viruses, fungi, archaea, or a combination of those.

The development was started in February 2022 by James Fellows Yates and Moritz Beber, and we at Karolinska Institutet joined during the online hackathon in March. With that I would like to mention that this is really a community-based development. There are a few nf-core pipelines, like eager and mag, that support some sort of taxonomic classification. But they only support one classifier each, and each classifier is tailored for a specific purpose and has its own custom output format. So there was really a need for a pipeline that would support taxonomic classification and profiling of metagenomic reads with multiple tools and, where possible, with multiple databases.
At the moment, there are a few examples of how you can use nf-core/taxprofiler in different contexts: pathogen detection in clinical metagenomics, comparative microbial diversity analysis, and detection of food DNA from ancient microbial samples. But of course it is not limited to those.

This is an overview of what the pipeline looks like, and I will go into more detail in the next slides. To start with, it supports both short reads and long reads. The first step is sequencing quality control. Right now FastQC is used as the default, but during the hackathon in October in Barcelona, Falco was added as a drop-in replacement with supposed improvements, especially for long reads. The user can choose between FastQC and Falco.

Next, we have the preprocessing steps. All of those are optional and up to the needs of the user, and we have dedicated tools for each sequencing technology. The first step is adapter removal, where fastp and AdapterRemoval are supported for short reads and Porechop for long reads. Then taxprofiler allows for removal of low-complexity reads with BBDuk and PRINSEQ++ for short reads, and Filtlong for long reads. The user can also choose to remove host reads, using the Bowtie2 aligner for short reads and minimap2 for long reads. As the last of the preprocessing steps, taxprofiler allows concatenation of multiple FASTQ files from the runs or libraries of a sample.

The last step of taxprofiler is, of course, taxonomic classification. Right now we support nine classifiers/profilers: Kraken2, which can be paired with Bracken, KrakenUniq, MetaPhlAn3, MALT, DIAMOND, Centrifuge, Kaiju, and mOTUs. Each profiler can be executed with multiple databases, each with its own settings, and each profiler has its own output.
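The optional preprocessing steps just described can be summarised as a small lookup table. The sketch below is only an illustration of that description: the tool names come from the talk, while the dictionary layout, the step keys and the function name `tools_for` are hypothetical and are not part of nf-core/taxprofiler itself.

```python
# Tool names are taken from the talk. The dictionary layout, the step keys
# and the function name are hypothetical, chosen only for illustration.
PREPROCESSING_TOOLS = {
    "adapter_removal": {"short": ["fastp", "AdapterRemoval"], "long": ["Porechop"]},
    "complexity_filtering": {"short": ["BBDuk", "PRINSEQ++"], "long": ["Filtlong"]},
    "host_removal": {"short": ["Bowtie2"], "long": ["minimap2"]},
}


def tools_for(read_type, enabled_steps):
    """Return {step: [candidate tools]} for the optional steps a user enables."""
    if read_type not in ("short", "long"):
        raise ValueError(f"unknown read type: {read_type!r}")
    plan = {}
    for step in enabled_steps:
        if step not in PREPROCESSING_TOOLS:
            raise KeyError(f"unknown preprocessing step: {step!r}")
        # Every step is optional, so only the requested ones appear in the plan.
        plan[step] = PREPROCESSING_TOOLS[step][read_type]
    return plan


# Example: a long-read sample with adapter removal and host removal enabled.
print(tools_for("long", ["adapter_removal", "host_removal"]))
```

The point of the table is that each step has dedicated tools per sequencing technology, and that skipping a step simply leaves it out of the plan.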
Because each classifier has its own output format, taxprofiler supports standardized and aggregated taxon count tables. This is done with taxpasta, a Python package whose development Moritz Beber is leading. The name stands for taxonomic profile aggregation and standardisation. I have also added the link to the GitHub repository.

In this slide I'm going to talk about how taxpasta works. Here you can see an example of what the output of the Kraken2 classifier looks like. It has six columns: the percentage of reads covered, the number of reads covered, and the number of reads assigned. This column here describes the taxonomic level, this one the NCBI taxonomy ID, and this is the scientific name of each taxon. And this is what the output from the Kaiju classifier looks like. It has five columns and a header, and it is very different from Kraken2. This is the case for all the different classifiers. With taxpasta we are able to have a standardized output format for each classifier. The output format looks like this: it has two columns, the first describing the taxonomy ID and the second the read counts.

About how to run the pipeline: one needs two input sample sheets, one describing the FASTQ files and one describing the databases. Regarding the sample sheet describing the FASTQ files, this is what the format looks like. One should give a unique sample name. The user can add a run accession, and should also give the name of the sequencing platform as well as the path to the FASTQ files. Regarding the sample sheet describing the databases, this is what it looks like. It has four columns. In the first column one should give the name of the classification tool. The second is a unique name for the database.
In this column the user can specify the parameters they would like to use, and the fourth column gives the path to each database. And about tool 1 and tool 2 in the arguments here: those can be replaced by each classifier or profiler that the user wants to run. The last argument, the perform-step one, can be replaced by the preprocessing or post-processing steps.

About our future plans: we'd like to support more taxonomic classifiers, particularly for long reads. We would like to add an assignment validation step by aligning matched reads to the identified genomes, and we would like to add a workflow for database construction. But before we go on with the implementation of those plans, please stay tuned for the first release in January.

With that, I would like to thank James Fellows Yates in Germany and Moritz Beber in Denmark, as well as my colleagues here in Sweden, Tanja Normark, Mahwash Jamy and Lauri Mesilaakso, and of course all the collaborators who contributed different classifiers and issues to taxprofiler. If you have any questions, please reach out in our Slack channel, #taxprofiler. That's it, and I'm happy to answer any questions.

Thank you very much, Sofia. Are there any questions from the audience? You can either write your questions in the chat or unmute yourself. I've allowed that now for everyone. If there are no questions at the moment, I actually have one. I was wondering why there are so many of these profilers, because if there was one that actually worked properly, then you would only need that one.

Yeah, the metagenomics field is very broad, and those classifiers are based on different algorithms and cover different needs.
And the final output that you have now, is that an average of what the different ones detect?

We have a different output for each classifier, and with the help of taxpasta we are able to have a standardized output for each of those classifiers.

Okay, but you will get a separate output for each classifier?

Yes.

Then we have questions here in the chat. One is from Juan: do you have to download the databases manually?

Yes, we do not support that right now. It's in our future plans, maybe, to add a workflow for database construction, but the user has to do it themselves right now.

And then a comment from James, I guess about the profiler question I had. He says it's also a fun problem for computer scientists. Thank you. Okay, are there any more questions? It doesn't seem so. If there are questions later on, you can always reach out, as you mentioned, in the Slack channel for taxprofiler, or also in the bytesize channel. Otherwise, I would like to thank Sofia again for this great talk, and of course also the Chan Zuckerberg Initiative for funding these talks. Thank you very much, everyone, and I hope to see you next week.
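As a footnote to the standardisation idea discussed in the talk and in the Q&A, here is a minimal Python sketch of what reducing a Kraken2-style report to a two-column table could look like. It is not taxpasta's implementation. The function name is hypothetical, the three report lines are invented toy data, and taking the directly assigned reads as the count is an assumption of this sketch.

```python
def standardise_kraken2_report(report_text):
    """Reduce a Kraken2-style report to a list of (taxonomy_id, count) pairs.

    Assumes six tab-separated columns without a header: percentage of reads
    covered, reads covered by the clade, reads assigned directly, rank code,
    NCBI taxonomy ID, scientific name.
    Assumption of this sketch: the directly assigned reads are used as count.
    """
    rows = []
    for line in report_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        _percent, _clade_reads, direct_reads, _rank, tax_id, _name = fields
        rows.append((int(tax_id), int(direct_reads)))
    return rows


def to_tsv(rows):
    """Render the standardised rows as a two-column, tab-separated table."""
    lines = ["taxonomy_id\tcount"]
    lines += [f"{tax_id}\t{count}" for tax_id, count in rows]
    return "\n".join(lines) + "\n"


# Invented toy report, for illustration only (not real results).
EXAMPLE_REPORT = (
    "10.00\t10\t10\tU\t0\tunclassified\n"
    "90.00\t90\t5\tR\t1\troot\n"
    "85.00\t85\t85\tS\t562\t        Escherichia coli\n"
)

print(to_tsv(standardise_kraken2_report(EXAMPLE_REPORT)), end="")
```

Each classifier would need its own small reader like this one, which is why a shared two-column target format makes the per-classifier outputs easier to compare.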