Hello, everyone. My name is Franziska Bonath, and I'm today's host for the bytesize talk. With me is Robert Petit from the Wyoming Public Health Laboratory, who is going to talk about Bactopia, which is not part of nf-core but is using nf-core components and nf-core pipelines. So, off to you. Awesome. I'm so sorry, I did mute you by mistake. You can unmute yourself, I'm sure. There we go. Oh, I'm so sorry. No, don't worry about it, it's me rubbing off on you. Thank you for having me, I'm super excited to be here today. I'll be introducing you to Bactopia and giving you a few use cases of how I'm using nf-core components in Bactopia. But first, a little motivation on how Bactopia came about. In the last 10 years we saw quite nice growth in the publicly available bacterial genomes from ENA, SRA, and DDBJ. We went from about 7,500 in 2010 to about 1.5 million. While that pales in comparison to COVID, which I think is at 12 million plus now, for bacteria that is quite a lot. Over these last 10 years we've also seen the rise of containers and package managers such as Docker, Singularity, and Bioconda, and of workflow managers and the communities behind them, such as Nextflow and nf-core. So in 2010 maybe we couldn't use all the data, but here in 2022 it really makes me wonder: can we make use of all these publicly available genomes? In 2010, I remember passing tarballs with binaries around, emailing sequencing and instrument groups to ask, "Can I get a binary of your assembler?" There was no real standard; it was just "make install" and hope it installs. But now in 2022 I think we're there: we have the tools and we have the talent to start gluing all this together and using these data in our own analyses. And why would we want to use these data?
A good example: you have a small outbreak at your local hospital, or a foodborne illness that comes out of some carnival, and you want to compare your genomes to what's already been sequenced. That's a nice use case for those 1.5 million available genomes. To address this, Tim Read, who is at Emory University and was my master's and PhD mentor, and I developed Bactopia, a Nextflow DSL2 pipeline for the complete analysis of bacterial genomes. Because it's written in Nextflow, you can go from a single genome to tens of thousands of genomes with a simple parameter change. As a proof of concept, we used Bactopia to process 67,000 genomes in just five days using AWS, and a lot of that came from being able to prototype on a laptop, then switch to our AWS profile and, boom, we're off and going. Kudos to Nextflow for most of that. In Bactopia we try to include as many nf-core practices as we can, to ensure things like reproducibility and audit logs. And again, because it's Nextflow, it's extremely portable: you can move between a laptop, the HPC at your university, and any of the cloud platforms with just a few parameter changes. So that's Bactopia. It supports Illumina and Nanopore reads, either from your local machine or from publicly available databases such as SRA and ENA. It includes more than 145 tools. There are 45 Bactopia Tools, which are completely separate workflows that I'll get into shortly. It's been extensively tested, with more than 100 tests checking more than 10,000 output files. It's easily installed through Bioconda, Docker, or Singularity. And I've gone to great efforts to make sure it's well documented. Some design principles behind Bactopia.
First, Bactopia requires every tool it includes to be available from Bioconda. One of the main reasons is that it's 2022; people shouldn't have to figure out how to install a tool anymore. They should be able to use a container or conda; it should be an easy, simple process. And because Bioconda has downstream containerization, every recipe gets a Docker container through BioContainers and a Singularity image through the Galaxy Project, so we have everything necessary to start using these tools immediately. I also require that all modules in Bactopia Tools be available from nf-core/modules, and if one isn't there, we add it. And Bactopia should be easy to install and adaptable to the user's needs; converting to DSL2 has made this much, much easier. There are three sides to Bactopia, and you can think of them as having checkpoints between the three: the Bactopia helpers, Bactopia itself, and the Bactopia Tools. The Bactopia helpers help you get started using Bactopia; these are your pre-analysis steps, plus some commands to pull out information post-analysis. One is bactopia citations, which prints citations for all the tools used by Bactopia. The bactopia datasets command lets you download publicly available datasets that can supplement your analysis, including RefSeq and GenBank sketches, PubMLST schemas, and many more. The bactopia download command will pre-build conda environments, pull Docker containers, or download Singularity images as a pre-step, so you're not doing that while starting a Nextflow run. The bactopia prepare command creates a file of filenames, similar to the sample sheet you see in many nf-core pipelines; this lets you process as many genomes as you want.
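To make the file-of-filenames idea concrete, here is a minimal sketch of what a prepare-style helper could do: pair up forward and reverse FASTQ files and emit a tab-separated table. This is a hypothetical illustration, not Bactopia's actual implementation; the column names and filename patterns are assumptions.

```python
import tempfile
from pathlib import Path

def build_fofn(fastq_dir):
    """Pair *_R1/_R2 FASTQ files and return a tab-separated table:
    one header row, then sample name, forward reads, reverse reads."""
    rows = ["sample\tr1\tr2"]
    for r1 in sorted(Path(fastq_dir).glob("*_R1.fastq.gz")):
        r2 = r1.with_name(r1.name.replace("_R1", "_R2"))
        if r2.exists():  # keep complete pairs only
            sample = r1.name.split("_R1")[0]
            rows.append(f"{sample}\t{r1}\t{r2}")
    return "\n".join(rows)

# Demo with throwaway files: sampleB has no mate file, so it is skipped.
tmp = Path(tempfile.mkdtemp())
for name in ("sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz", "sampleB_R1.fastq.gz"):
    (tmp / name).touch()
print(build_fofn(tmp))
```

A real helper would also handle single-end reads and Nanopore data, but the core idea is the same: one row per sample, pointing at its input files.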
And bactopia search, one of my favorites, takes a query, queries ENA's API, and returns a list of experiment accessions that you can then feed to Bactopia to download and start processing. The main Bactopia pipeline includes all the standard steps of a bacterial genome analysis: gather the samples, QC the reads, assemble the genomes, sketch your genomes and query them against RefSeq and GenBank, call SNPs; all the standard things you would expect in a bacterial genomics pipeline. It accepts Illumina and Nanopore reads, SRA accessions, NCBI assembly accessions, or local assemblies if for some reason that's all you have. There are also some early exits, where a sample will stop being processed if there are issues such as poor quality; if something is likely to cause a downstream failure, Bactopia will do its best to catch it so that it doesn't stop the whole pipeline. Once everything is processed, you get this nice standard directory structure, and it's this directory structure that the Bactopia Tools take advantage of. Bactopia Tools are essentially more workflows for more science. By looking at that standard directory structure, you can run a Bactopia Tool, which can wrap a single tool like Kleborate or TBProfiler, and it'll go find the files it needs and run everything for you. You can also connect multiple modules together for something like a pan-genome analysis, where you run PIRATE and create the core-genome phylogeny; because of the directory structure, the Bactopia Tools will find all the files you need. There are currently more than 45 different Bactopia Tools, and again, because it's DSL2, I've been able to turn this into a streamlined framework. So in just a few steps you can go from raw data to investigating results. One, install Bactopia through conda, Docker, or Singularity.
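Since the search helper described above is built on ENA's API, here is a rough sketch of the kind of request such a helper might construct. The endpoint and field names follow ENA's public Portal API, but treat the exact parameters as assumptions; this is an illustration, not Bactopia's real code, and it only builds the URL rather than performing the request.

```python
from urllib.parse import urlencode

def ena_search_url(query, limit=100):
    """Assemble a hypothetical ENA Portal API search URL for sequencing runs."""
    base = "https://www.ebi.ac.uk/ena/portal/api/search"
    params = {
        "result": "read_run",  # search sequencing runs
        "query": query,        # e.g. a taxon filter such as tax_eq(1280)
        "fields": "experiment_accession,fastq_ftp",
        "format": "tsv",
        "limit": str(limit),
    }
    return f"{base}?{urlencode(params)}"

# The experiment accessions in the response could then be fed back in
# so the pipeline downloads and processes those samples.
print(ena_search_url("tax_eq(1280)", limit=5))
```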
Two, if you want to include public data, you can use bactopia search. Three, if you want to include publicly available datasets, which I always recommend to supplement your analysis, there's the bactopia datasets command. Four, you can create a file of filenames to process thousands of genomes, if you want, using bactopia prepare. Then you use the bactopia command to process all your samples independently, and further analyze them with the Bactopia Tools. By the end of it, you have a bunch of output files to sift through, to figure out whether we can answer the question we hopefully asked before sequencing these genomes. Most of this has been made easier and more achievable in Bactopia by adopting nf-core components. If you're on the outside wondering, "Should I make an nf-core pipeline? Should I keep doing what I'm doing and adopt some of their practices? Or should I just go do my own thing?", then over these next few slides you can get an idea of how I'm making use of numerous nf-core components without actually being an nf-core pipeline. And honestly, I don't think it's so much about the nf-core practices and components as about the people behind nf-core. You jump on the Slack group with a question, and there are many, many people willing to help out; they've probably seen it before, especially many of the error messages you come across. So at minimum you should hop onto the Slack group, start participating, and get an idea of all the things happening with Nextflow. But here are a few ways I'm using nf-core components in Bactopia. First, the nf-core library, that lib folder in the nf-core pipelines.
Bactopia has 45 different workflows that you can execute from a single entry point, so there's a parameter that says "I want to run the pangenome Bactopia Tool" or "I want to run the main Bactopia workflow", and those all come in through the same main file. To achieve this I adopted the nf-core library, because it handles all the argument parsing and has super nice outputs; it does audit logs, and you can set it up to send emails and all that. Also, by using it you set yourself up to be compatible with Nextflow Tower, which is quite nice. But I wanted to be able to programmatically import config and parameter JSON files, so on the Bactopia side I have a dynamic import that looks at a workflow config and determines, based on that, which files it needs to import. That way I can run 45-plus different workflows from the same entry point, which is quite nice, because previously it would have been 45 different main workflows that I was maintaining. When I converted to DSL2, it was suggested to me that I should consider making use of nf-core modules. I had previously participated in some of the hackathons and was quite fond of nf-core modules, so it was super easy to say: if a tool is going to be included as a Bactopia Tool, it should also be on nf-core/modules. On the Bactopia side I make some slight modifications; these slides will be available later, and there are links in the slides comparing the two. Those modifications are mostly just adapting to use pre-built conda environments and the way I import and export files. I also adopted a pytest framework for Bactopia similar to the one implemented in nf-core/modules, which allows me to test every step in Bactopia and the Bactopia Tools. This has saved my butt quite a bit when it comes time to submit a new release.
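The single-entry-point dispatch described above, where a per-workflow config decides which files get imported, can be sketched roughly like this. Bactopia does this in Nextflow; the structure below is an illustrative Python analogy with made-up workflow and module names, not its real code.

```python
# Each named workflow declares, in its config, which modules it needs;
# the dispatcher then imports only those, instead of maintaining 45
# separate main workflows.
WORKFLOWS = {
    "bactopia":  {"includes": ["gather_samples", "qc_reads", "assemble_genome"]},
    "pangenome": {"includes": ["pirate", "core_genome_phylogeny"]},
    "mlst":      {"includes": ["mlst"]},
}

def modules_to_import(workflow):
    """Look up the workflow's config and list the module files it needs."""
    if workflow not in WORKFLOWS:
        raise ValueError(f"Unknown workflow: {workflow}")
    return [f"modules/{name}/main.nf" for name in WORKFLOWS[workflow]["includes"]]

print(modules_to_import("pangenome"))
```

The payoff of this design is that adding a new workflow means adding a config entry, not another copy of the entry-point logic.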
It's usually the conda side, where something has changed with the package solver. I use a self-hosted GitHub Actions runner, and modules like GTDB that use large databases are actually tested with a real database on that runner, so there's the side effect that we're indirectly validating the nf-core module as well. Finally, I use the meta.yml template for documentation. When I first saw that meta.yml, I thought it would be nice to build documentation from it. So I add things like citations, Markdown tables, and output trees, and the YAMLs are then used to build the documentation with Jinja2 templates. This has really saved me a lot of time by letting me write documentation while building Bactopia. So, what's next for Bactopia? I'm always watching to see what's next for nf-core, asking myself in the background, "Should I do this or not?" I'm looking into MultiQC modules, because Bactopia needs some sort of report generation. There will always be more nf-core modules that I'm submitting, because there are always more Bactopia Tools I want to implement. I have my eye on that Nextflow issue about the future of the config files, because the way I use config files means that could have some downstream effects on Bactopia. I'm interested in making custom workflows for surveillance, and the more I use rich-click, the more I want to rich-click everything, so expect an enhanced CLI here too. Don't hesitate to reach out if you think I can help you get started using nf-core modules in your non-nf-core pipeline. So thank you, and I will take any questions folks have. Thank you very much. I have enabled it now for everyone to unmute themselves if they have questions. Otherwise, we can start with the one question that is already in the chat.
The question is: it seems like Bactopia is not in the APT repository; could you work on including it? I don't know much about including tools in the APT repository. It is available from BioContainers, so you can install Bactopia that way. There would be many components of Bactopia that aren't in the APT repository, so I don't know how that would work; I imagine you would have to add all the dependencies to the APT repository, and I don't have the bandwidth at the moment for the time and learning that would require. I would definitely consider using the BioContainers install, and from there you can use conda, Docker, or Singularity. Okay, thank you. Are there any more questions from the audience? If not... Yeah, thanks, I do have a follow-up question. It's not really about the APT repository; it's about the workflow, in terms of the different steps that Bactopia does. I didn't get what Robert meant by the final step about the analysis. I was wondering, what are the specific things, what are you actually measuring at the end of the day, specifically in terms of the omics analysis? So it's going to include pretty much all your standard bacterial genomics. You're going to QC the reads: how well did you sequence your sample, what's the average read length, all that fun stuff that characterizes your sample. What MLST schema does it match? Does it have certain antibiotic resistance genes you may be interested in? Does it have SNPs and indels against a reference genome you selected? How does it compare to publicly available genomes; does it look like what you expected to sequence? If you thought you sequenced Staphylococcus aureus and it came back looking more like Enterococcus, that's something you'd want to know. Those are the types of analysis results, and they're covered in the Bactopia documentation.
There's an overview of the workflow describing what happens at each step, and an output overview of all the output files you get for both Bactopia and all the Bactopia Tools, with a description of what's in each file. Okay, thanks. So my final question is: I saw you integrated Illumina reads and, I believe, Nanopore reads, so what happened to PacBio? Yeah, I just haven't been exposed to PacBio data much so far in my analyses and my studies. I think if I start using PacBio, then PacBio support will come; otherwise, I would need support from the community to add that type of data, just because I don't have the opportunity to use it on a daily basis like I do Illumina and Nanopore. So yeah, we need someone else to help out there. Cool, thanks, good job. Thank you. Thank you very much. Are there any other questions? I don't see anything pop up, so I would like to thank you again, Robert, and everyone else. There's always the chance to ask more questions if they come up later in the bytesize channel on Slack, and I guess you can also contact Robert directly. This video will also be uploaded to YouTube. And I would like to thank, apart from Robert of course, the Chan Zuckerberg Initiative, which is funding these talks. And thank you everyone for joining in.