Hello, everyone, and welcome to this week's Bytesize talk. I'm very happy to have Phil here, who is talking today about converting Python scripts into packages for PyPI, Bioconda, and BioContainers. So it's your stage, Phil. Thank you. Hi, everybody. Thank you for joining me today. We're going to have a little bit of fun together, hopefully. Today's talk was inspired by a conversation that's come up a few times within nf-core, which is when people have scripts within a pipeline, typically within a bin directory, or within the script or shell block of a process. Instead of bundling that script with the pipeline, we prefer to package that script or set of scripts as a standalone software package instead. There are a few different reasons why we like to do this. Firstly, it makes the package and the analysis scripts available to anyone to use, even if they're not using Nextflow and not using this pipeline. So that's kind of the greater good of the community: more usability, more visibility. It can sometimes help with licensing, because we're no longer bundling and modifying code under a potentially different license within the nf-core repo. The nf-core repo can be MIT and can just call this external tool. And it also helps with software packaging, as mentioned in the intro. For free, we then get a Docker image, a Singularity image, and a conda package with all of the different requirements that you might need. So you don't need to spend a long time setting up custom Docker images and all this kind of stuff. You just package your own scripts as a standalone tool and you get all of that for free. So much better, and all the maintenance can sit alongside the pipeline rather than being integrated into it. So it's a nice thing to do.
And for me, the main reason is that first one: it makes the tool more usable for anyone, not necessarily tied to running within Nextflow, which I think is great, because it's nice to use tools on a small scale and then scale up to a full-size pipeline when you need it. So I've told people in the past that this is easy. Which it is, if you've done it lots of times before, but I thought it's probably time to put my money where my mouth is and actually show the process, and hopefully convince you too that it isn't so bad. Now, a few things to note before I kick off. Firstly, I'm gonna live code this. I haven't run through it earlier. I've got a finished example on my side, which you can't see, which I will copy and paste from occasionally, and hopefully refer to if everything really goes wrong. But in the words of SpaceX, excitement is guaranteed, because something will blow up at some point. So join me on that. Secondly, there are many, many ways to do this. The way I'm gonna show is not necessarily the best, and there are probably recommendations from other people that you should listen to over mine. My aim today is to show you the easiest way to go from Python scripts to something on Bioconda, and I wanna try and make that big and friendly and as bite-size as possible. So let's start by sharing my screen up here and we will kick off. Spotlight my screen for everybody. Hopefully you can still see my face. To start off with, a famous xkcd comic about Python environments, which are famously complicated, so we're going into something which is known for being difficult and varied. But that's fine, we're gonna keep it as simple as possible and you don't need to worry about all this stuff. So I've got a little toy Python script here. It doesn't do very much. It just makes a plot, and I wanted some kind of input.
So it takes a text file, called title.txt, with some text in it. It reads that file in, sets it as a variable, sets the plot title to whatever it found, and then saves the plot. So this is our starting point. I can try and run this now. So if I do python analysis.py, there we go, we've got our plot with my nice title. So it works, first step. This is where I'm assuming you're starting off: you have a Python script which works. So we have a few objectives to take this script to a standalone Python package. Firstly, we want to, as far as possible, make things optional and variable. So instead of having a file name hard-coded as a string like this, we want a better way to pass this information in to the tool. So we wanna build a command line tool. We wanna make it available anywhere on the command line, on the PATH. So make it into a proper command line tool rather than a script which you have to call using Python, so we can call it my-analysis-tool or whatever and run that wherever. And once we've done all that, we wanna package it up using Python packaging, so that we have everything we need to push this package onto the Python Package Index. And we're gonna focus on that. Once we've got this as a tool on PyPI, where anyone can install it, then the step from PyPI to conda is fairly easy. And once it's on conda, you get BioContainers for free, which is the Docker image and the Singularity image. So really our destination for today is just Python packaging, just PyPI. There's another talk, fairly old by now but still totally valid, by Alex Peltzer on nf-core Bytesize, which takes you through the Bioconda packaging steps, so you can follow on this talk with that one. Right, hopefully that makes sense. So first things first, let's try and make this into a command line tool. Now there are a bunch of different ways to do this.
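To make the starting point concrete, here is a minimal sketch of the kind of script being described. The talk's actual code isn't shown, so the plot data, the output file name, and the use of matplotlib's Agg backend are illustrative assumptions; only title.txt and the general shape come from the talk.

```python
# analysis.py -- a sketch of the starting script described above.
# Plot data and output file name are illustrative assumptions.
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt

# For this demo, create the fixed input file the script expects
with open("title.txt", "w") as fh:
    fh.write("My nice plot")

# The hard-coded input we'll replace with a command line option later:
# read the plot title from a fixed file name
with open("title.txt") as fh:
    title = fh.read().strip()

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_title(title)
fig.savefig("my_plot.png")
```

Running `python analysis.py` then saves a plot whose title is whatever text was in title.txt.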
Probably the classic Python library to do command line parsing is called argparse, which many of you may be familiar with. Personally, I've tended to use another package called Click, and more recently I'm tending to use a package called Typer, which is actually based on Click. So if I just open the web browser, this is the URL: typer.tiangolo.com. Gosh, it's quite big; on a bigger screen it looks better, so I'll just make my window bigger for a second. Not reading anything here, just seeing what the website looks like. It's got a really good website. It explains a lot about how to use it, and you can click through the tutorial here and it tells you about everything: what's happening, why it works the way it does, and how to build something. So we can start off with the simplest example. We're gonna say import typer here, so go up to the top, import, oops. Then wrap our code in a function. Ah, why can't I copy from the VS Code browser, apparently? Indent all of this code, and then I'm gonna copy in that last bit down at the bottom. So what's happening here? I'm importing a Python library called Typer, which is what we're using for the command line tool. I've put everything into a function, which is just called def main. And then at the bottom I've said if __name__ == "__main__", so this is telling Python: if this script is run directly, use Typer to run this function. If I save that, now I can do python analysis.py and nothing will change; it should just work exactly the same. But I can do python analysis.py --help, and you can see we're starting to get a command line tool here. Right, next up, let's get rid of this file. We don't really care about the title being in a file, that was just a convenience. So let's instead pass the title as a command line argument. And with Typer, we do that just by adding a function argument to this function. And I can get rid of this bit completely.
And to prove it, I'll delete that file as well. So let's try again: python analysis.py --help. And sure enough, now we have some help text saying, hey, it's expecting a title, which is text, and we have no default. And if I try and run it without any arguments, it gives me a nice error message. And now if I say "hello there", it's passed that in, and our plot now has a different title. Right, so that is our first step complete. We have a rudimentary command line interface. We've got rid of that file and we now have command line options, which makes it a much more usable, flexible tool. And that was not a lot of code, I think you'll agree with me. You can do many more things with Typer. You can obviously add lots more arguments here. You can say an argument should be an integer or a Boolean and it will craft the command line for you. You can use options instead of arguments, so --whatever; you can set defaults; you can write help text; loads of stuff like that. So as your tool becomes more advanced, maybe dig into the Typer documentation a little bit and learn how to do that. But that's beyond the scope of today's talk. Okay. Next up, let's think about how to make this into an installable package and something we can run on the command line anywhere. Those two things kind of go together. Now, if someone else comes and wants to run this package, they're gonna need to be able to install these same Python packages. So I'm gonna start off by making a new file called requirements.txt, and I'm gonna take these package names and, whoops, pop them in there. We'll come back and use that in a minute. And in the short term, if someone wanted to, they could now do pip install -r requirements.txt and that would install all the requirements for this tool. What else am I gonna do? I'm gonna start moving stuff into some subdirectories.
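For comparison, here is the same idea sketched with argparse, the standard-library option mentioned at the start of this section. The talk itself uses Typer; this stdlib-only version is swapped in so the sketch runs without extra dependencies, and the function name, option behaviour, and default title are all illustrative assumptions.

```python
# A stdlib-only sketch of the command line step described above, using
# argparse (the "classic" option mentioned in the talk) instead of Typer.
# Names and the default title are illustrative assumptions.
import argparse


def main(title: str) -> str:
    # Stand-in for the plotting code: just report the title that was passed in
    return f"Plot title set to: {title}"


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="My amazing tool")
    # nargs="?" with a default keeps this demo runnable without arguments;
    # the Typer version in the talk makes the title required instead
    parser.add_argument("title", nargs="?", default="My plot",
                        help="Title text for the plot")
    args = parser.parse_args()
    print(main(args.title))
```

The Typer equivalent is shorter still: the `title: str` function argument is the whole CLI definition.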
By convention, I'm gonna put it into a directory called src, but it doesn't really matter, you can call it whatever you want. Inside that I'm gonna make a directory called mytool, and I'm gonna move that Python file into it. I'm also gonna create a new file called __init__.py, with double underscores, or "dunder". This is a weird-looking file name and it's a special case: having it there tells the Python packaging system that this directory behaves as a Python module, which is what we want to install later. So I can add a docstring at the top saying "my amazing tool". And I'm actually not gonna put anything else in here for now apart from a single variable, which I put here by convention, but really you can do whatever you want. Again using dunders, I'm gonna call it __version__, and we'll say, you know, a semantic version of 0.0.1dev. We'll come back and use this variable a bit later, but for now it doesn't do anything. What else? Let's take a slightly more complicated Typer example. We're gonna now actually create a Typer app, like this. We're gonna get rid of this bit at the bottom because we don't need it anymore: if we're not gonna be running it as a script, we're not gonna be calling the Python file directly. Get rid of that. And we're gonna now use a Python decorator, @app.command(), to tell Typer that this function is a command to be used within the command line interface. That first very simple example is so simple that you almost never actually use it with Typer; this is basically what you always do. You can have multiple functions here, each decorated with @app.command(), and that way you can have multiple subcommands within your CLI, and groups of subcommands, and all kinds of things. That's like nf-core, where you have groups of commands, you do nf-core modules update, for example, and those are separate subcommands.
So that's how you do it here. But for now, this works in exactly the same way as the example I showed you a second ago. Okay. Because this is gonna be a public Python package, it's really important to tell everybody how they can use it. So I'm gonna create a new license file; I'm a fan of MIT, so I'm gonna make it the MIT license and just paste in the text that I've grabbed off the web. And I'm gonna make a README file, because this is gonna turn up on GitHub, so we want people to know what the tool is and how to use it when they see the repo. Okay, hopefully you're with me. That's all the simple stuff. Now we get onto a slightly more complicated bit: how to take this and make it installable. And this is one of the bits where it gets very variable. Within Python, you can use a range of different installable Python packages to do your Python packaging, which is quite meta. There's a very old one called distutils, which you shouldn't use. There's one called setuptools, which is the most common, and that's what I'm gonna use today. Other people like other packaging setups, such as a popular one called Poetry. There are quite a lot of them, so if you have a preference, great, go for it, and maybe in the discussion afterwards people can suggest their favourites. But for now, I'm gonna stick with setuptools, and I'm gonna create setup.py, which again gets a bit confusing but you don't necessarily need, and setup.cfg. I should say here: you don't need to remember how to do this. I don't remember how to do this. I don't think anyone really remembers how to do this. If I do some browsing and type in setuptools.pypa.io, you can see there are quite good docs on the setuptools website, and they tell you how to do everything. It's quite easy to read actually, and they also talk through all the different options of how to build this stuff.
You can do it with what's called a pyproject.toml file, which is probably what I'll start doing soon, when it becomes slightly more standard. There's a setup.cfg file, which is what I'm gonna do now, and there's also some documentation about the old-school way of doing it, which is setup.py. These days, the setup.py file is just for backwards compatibility, and I'm gonna do exactly what it tells me to do here. I'm gonna say from setuptools import setup, call setup(), save, and then I just forget about this file and never look at it again. Everything else goes into this setup.cfg file, and you can work through the examples here. For now, I'm gonna cheat for the sake of time and copy in what I did earlier, and just walk you through what these keys are quickly. Again, I always copy this from the last project I did, but you can copy it from the web very easily. Name is important. Version is important, because when you're updating a Python package it needs to know which version number it is, and this is using the special variable I set up earlier. Now, remember where it lives: it's in the Python module I made called mytool, and the variable name is __version__. So here I'm saying use an attribute. I could hard-code the version in this file if I wanted to, but instead I'm pointing at the attribute mytool.__version__. You could call that whatever you want, or you could just hard-code it in this file. Then author, description, keywords, license, license files, long description, and saying it's Markdown; that's just what shows on the PyPI website. Classifiers, which are just sort of categories; I basically always copy these without thinking, but you can think a bit more about them if you want to. And then some slightly more interesting stuff down here: the minimum required version of Python, which might be important for you, and where you put your source code.
In this case, I say: look for any packages you can find, any Python modules you can find, and look in the directory called src. So if you called that something different, you put that here, and then it's looking for basically those __init__.py files. And then we say we require a bunch of other Python packages, and here I'm saying: look at this file called requirements.txt. Again, if you didn't want to have that file, for whatever reason, you can also just list them directly in this file. And then finally, console scripts. This is the bit which actually makes it into a command line tool. Here I say I want to call my tool my-awesome-tool, and that when someone types that into the command line, what I want Python to do is find the module called mytool, which we've created with the __init__.py file. Inside that I've got this script called analysis; again, this file name could be whatever you want. And then look for a variable called app, because here our Typer app variable is called that. I could also put a function name here instead if I wanted. So the entry point I'm going to write ends with :app. Okay, so now Python will know what to do when I install my tool. And moment of truth, let's try and install it and see what breaks. So the Python Package Index uses pip, and I'm going to say pip install. Now I could just do a full stop, meaning the current working directory, and that would work. But I'm actually gonna add the -e flag here, for editable. What that does is, instead of copying all the files over to my Python installation directory, it soft-links them. And that's really useful when developing locally, because I can make edits to a file, hit save, and I don't have to reinstall everything every single time. So I'm just always in the habit of using -e, pretty much all the time. And then let's see what happens. Yeah, it breaks: setup not found.
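Putting those keys together, the setup.cfg being described might look roughly like this. Every name and value is an illustrative assumption reconstructed from the walkthrough, not the actual file from the talk, and reading requirements from a file this way needs a reasonably recent setuptools:

```ini
# setup.cfg -- a sketch reconstructed from the walkthrough above;
# all names and values are illustrative assumptions
[metadata]
name = mytool
version = attr: mytool.__version__
author = Your Name
description = My amazing tool
long_description = file: README.md
long_description_content_type = text/markdown
keywords = plotting, demo
license = MIT
license_files = LICENSE
classifiers =
    Programming Language :: Python :: 3

[options]
python_requires = >=3.6
package_dir =
    = src
packages = find:
install_requires =
    file: requirements.txt

[options.packages.find]
where = src

[options.entry_points]
console_scripts =
    my-awesome-tool = mytool.analysis:app
```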
That's because I got the import wrong. Right, sorry: from setuptools import setup, and then call setup(). That should work. Let's try again. Great, you can see it's running through all those requirements, it's installing all the dependencies, which is matplotlib and Typer and stuff, and it installed. So now, what did I call it? my-awesome-tool. If I do my-awesome-tool --help: hey, it works, I've got my command line tool. And now I can run this wherever I am on my system; I don't have to be in this working directory anymore. It doesn't matter. If I, for example, do mkdir testing, cd testing, and then my-awesome-tool "this is a test": there we go, we've got that file created in there, because that was my working directory, and sure enough it's got a nice title. Brilliant. Okay. So we have a command line tool, it installs locally, it works, and it's got a nice command line interface. We're nearly there. The final thing, then, is to take this code and publish it: put it onto the Python Package Index. Now, again, if you start digging around on Google, you will find instructions on how to do this, and they'll say: run a whole load of command line functions, do this and that, and that will publish it. And there's a sandbox environment where you can test first, and you have to sign up to PyPI, obviously, and register and everything. But my recommendation, to keep things simple, and the only way I do it now, is to use GitHub Actions and automate the publication of your package. That's all I'm gonna show you today, because I can walk you through it quite easily, and it's the same logic. If you've not used GitHub Actions before, the way it works is you create a hidden directory called .github, and a subdirectory called workflows. And then in here, I'm gonna create a new file, which can be called anything.
I'll call it deploy-pypi.yml. And then I'm gonna cheat and copy, because otherwise it's gonna take me a while to type all this in, but I'll walk you through it. So this is a YAML file that tells GitHub Actions what to run and when to run it. We have a name up here, which can be anything. Then firstly, we have a trigger, and this tells GitHub: run this GitHub Action whenever this repository has a release and the event type is published. So basically, whenever you create a new release on GitHub and you click publish, this workflow will run, and it'll run on the default branch. Then we have the meat of it: what is it actually doing? It's running on Ubuntu, it's checking out the source code first, and setting up Python. Now, I install the dependencies manually here. I'm not totally sure if this is actually required or not, but it was in the last GitHub Actions workflow that I did, so I thought I'd do it again. The first command is just upgrading pip itself and setting up setuptools and stuff. And then we do the pip install . command again, just to install whatever's in the current working directory. So now, on GitHub Actions, your tool is installed. Then we run this Python command with setup.py, which is just calling setuptools and saying: create an sdist, a source distribution, and a bdist for you. You don't need to know exactly what those mean or why they're there; they're just the files that the Python Package Index needs. So now it's built the distribution locally, and then finally, you can see where I copied it from, we publish it to the Python Package Index. This is a check just to make sure that if anyone has forked your repository, it doesn't bother trying to do this, because obviously it won't work. So I usually just put this in: check if the GitHub repository is called whatever. And then we use this Python Package Index action, which is a GitHub Action that someone else has written. And here I'm using a password, and this is a GitHub Actions secret.
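Putting those pieces together, the workflow file being described might look roughly like this. The action versions, the repository and secret names, and the exact step layout are assumptions; only the overall shape (release trigger, install, build, publish via the PyPA publish action) comes from the walkthrough:

```yaml
# .github/workflows/deploy-pypi.yml -- a sketch of the release workflow
# described above; versions, names, and the secret name are illustrative
name: Publish to PyPI

on:
  release:
    types: [published]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # Skip on forks, where publishing would fail anyway
    if: github.repository == 'yourname/mytool'
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install --upgrade setuptools wheel
          pip install .

      - name: Build distributions
        run: python setup.py sdist bdist_wheel

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          password: ${{ secrets.PYPI_API_TOKEN }}
```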
And this is an API token that you can get from the Python Package Index website when you're logged in, and that gives the GitHub Action the credentials it needs to be able to publish the Python package for you. And that's it. If everything works well, you stick all this on GitHub, you make it all lovely, you hit release, and then you'll be able to watch that workflow running, and it will publish the package. Remember to change the version when you run it more than once, because if you try and publish the same package twice with the same version number on the Python Package Index, it will fail. As long as you bump that, everything should work and you should end up with a package on PyPI. And when you have that package, anyone will be able to do pip install with that name, from anywhere, and it will just work. And that's it. At that point, you can pat yourself on the back, think what an amazing job you've just done and how anyone can now use your analysis tools, prepare yourself for an onslaught of bug reports on GitHub, and take the next step: scaffold that PyPI package into a Bioconda recipe and do all the rest. But like I say, that's in a different talk, so I'm not gonna swamp everyone by talking about it too much today. Right, hopefully that made sense to everybody. Shout out if you have any questions, and I'd love to hear what workflows other people have, whether I made a mistake, whether you think I should do it a different way, and if your way is better. Thank you so much. It's nice to see how some of the magic actually happens in the background. So do we have any questions from the audience? I've got one: have you tried cookiecutter to automate all of this?
Yeah, when I was prepping this with like five minutes to go, I was desperately trying to find a link for a really nice project which I've seen, and I've spoken to the authors, and I cannot remember the name of it. There are a few of them floating around, but there's definitely one for bioinformatics where you can use a cookiecutter project and it scaffolds an entire Python package for you with all of this stuff in place, and it's probably much better and quicker. But I purposely chose not to show that kind of thing today, because I was thinking of going from someone who already has a working script, and trying to explain what all the different stuff is doing. But yeah, if you're starting from scratch, I would absolutely do that. And if anyone has any good links for projects, or can remember the project I'm talking about, please post them here or in Slack. I just dropped a link in the chat, if someone doesn't know what we're talking about. So that link is for cookiecutter itself, right? Which is just a generic templating tool. Yes, but there are cookiecutter projects which people have created, like template repositories, specifically for Python, if that makes sense. We do have another question in the chat. Someone is asking: why not pyproject.toml? Yeah, so this is something else I was debating at the start. There's a bit of history here. When I started creating my first Python projects, you always used that setup.py file, and you still can. And it's a bit like how Nextflow config files are just a Groovy script where you can do whatever you like: setup.py is the same, just a Python script where you can do whatever you like, which is wonderful and horrifying. So slowly, over the last many years, the Python community, which moves slowly, has been moving away from that way of doing things towards more standardized file types. And there are two which are being used.
There's the setup.cfg file, which is basically exactly the same thing but in a structured file format, and the other one is pyproject.toml, which is the newer and better way of doing things. pyproject.toml is nice because it's also a standard for many other Python tools' configs. So if you want to use Black to format your code, which you should, because Black is amazing, you'll put your settings in pyproject.toml. If you use, I don't know, mypy for type checking, or any of these flake8-style linting tools and stuff, they all stick their settings in pyproject.toml, which is great, because you have one config file for everything to do with your Python project, which is much nicer. And you can also do all of your setuptools packaging stuff in there. There are a couple of things which I think it's missing, correct me if I'm wrong: I don't think you can point it to a requirements.txt file for the requirements, and it's quite useful having that file sometimes. Maybe it doesn't matter. And also, I think the setuptools website says that support is in beta and it might change. So I thought I'd play it safe today and go for setup.cfg, which is new-ish, but fairly safe. But yeah, if you can make pyproject.toml work for you, it's probably a nicer way to do it. We have some more comments. There was a link posted to Moritz's cookiecutter package, which has not been tried out, at least not by the person who posted it. And another comment: ironically, flake8 can't actually read settings from pyproject.toml, or at least couldn't a couple of months ago. Yeah, so cookiecutter might look familiar to anyone who's seen the nf-core template. We used to use cookiecutter for nf-core back in the early days, and we still use the underlying framework, which is called Jinja. So that's where these double curly brackets come from; it's a templating system.
And you can see here you've got all these different settings, with license options and the name and stuff, and these will go into all these double-bracket placeholders. So the idea is you do cookiecutter run, or cookiecutter build, I can't remember what the command is now, then you give it this GitHub URL, and it will ask you a few questions, which just replace these defaults here, and then it will basically generate this package, but with all the template placeholders filled in. Great, do we have any more questions? It doesn't seem so. So thank you very much for this great talk. Before we wrap this up entirely, I also have something to mention: next week's Bytesize talk is going to be one hour late. I will also post this again in the Bytesize channel on Slack. And very interestingly, there will be a talk from people that were part of the mentorship program. The deadline for the mentorship program just got extended, so for anyone who is still questioning whether they should join or not, this is your chance to actually listen to people who have been part of it and hear their impressions. So with this, I would like to thank Phil again, I would like to thank everyone who listened, and of course, as usual, I would like to thank the Chan Zuckerberg Initiative for funding our talks. And have a great week, everyone.