Thank you. This is my first talk at your PyCon, so I probably got a few things wrong, like the title or the abstract. So, do you like packaging? Do you like fixing it? Actually, I wanted to give a different talk, about debugging, because it's more dramatic: you have a bug, you spend three days on it, and it's only your fault. But when you have a problem with packaging, you always blame the guy who invented setuptools or distutils; there's always someone else. It's not that dramatic, but it turns out to be more interesting. The reason I'm giving this talk is that I've worked on quite a bunch of packages. I made some of my own, I adopted and maintained quite a few abandoned packages (for example, I forked this slide system), and I worked on a big-ish project with lots of dependencies. I also worked on a rewrite of virtualenv; it's not merged yet, but you can try it, and if you have problems with virtualenv on Windows, it probably fixes those. This talk is going to be mostly about setup.py, so if that makes you angry, it's going to be a problem. I'm going to go through the content: at the beginning there's some introductory stuff, and I'll go fast through that. If you have questions, just wave or yell and ask right away; don't wait until the end. Okay. So the first thing about setup.py: it's basically configuration for a very complicated archiver, an archiving system that also does compilation and weird things like that, and it has lots of options. There are quite a lot of options from distutils, and then there's setuptools, which adds a bunch more options on top of that. setuptools is quite an improvement, so you should use it; nowadays everyone uses it. I mean, even pip depends on it.
So don't use raw distutils, because you're going to have a bad time. I'm going to talk about the part of making the package, not installing it; making the package is the packaging part. Now, there's almost always a bit of confusion with the terminology here. There are packages and there are distributions. Originally, "package" meant the directory that you import, the one with an __init__.py file in it. The distribution is the archive. Now, in the packaging guide, they kind of switch the names around: they call it a "distribution package". So people who go through the guide and search for "package" find "distribution package", and okay, it's a distribution package, and there are different kinds of those. I don't really like the terminology; it doesn't sound right in my head. I mean, you package a package and the result is a package? So we have two main categories of distributions. There's the source distribution, the sdist, and there are the binary or built distributions, the bdists; the plain bdist you normally don't use. There's the bdist wheel and the egg; nowadays you should only use the wheel, because that's the way the tools are moving forward. And of course they have different rules. They look kind of similar, but there are subtleties, so you get confused by that, and it's kind of hard to figure out what to use, because you have so many options. And file gathering, that's about the options. For the source distribution, some files are collected by default: the readme, the setup.py, and the test directory, strangely enough, with only those files.
And then there are all the files you list in packages, in py_modules, the extension modules' sources (though not the headers, for some reason), everything you have in package_data, everything in data_files, the scripts, and the manifest. The manifest is the more interesting one; we're going to talk about that a bit later. And there's the bdist. The main difference is that the bdist doesn't use the manifest by default, unless you use setuptools and the option that setuptools added, include_package_data. So in order to simplify things, you use that option. Okay, about packages. Suppose we have this structure here. With a directory tree like that, we have four packages, which is fine. When you write your setup.py script, you shouldn't hard-code those packages, because if someone one day decides that, say, the utils module has too much stuff in it and needs to become a package, then it changes, and it needs to be added to the packages list, and it's quite frequent to forget to do that. For me, it happens all the time. That's why setuptools, having seen this problem, added a collecting function, find_packages(), that finds all the packages in the current directory or a directory you give it. Now, there are some alternatives to this. There's pbr, if you've heard of it, which does this automatically: you only specify the main package and it figures out the rest on its own. But pbr does quite a lot of other things; there's even a project that wraps pbr, called packit. And there's a newer project called flit, which completely replaces setup.py, but it only makes wheels.
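As a rough sketch of what that collecting function does (the myproj and utils names here are made up for illustration):

```python
import os
import tempfile

from setuptools import find_packages

# Build a throwaway tree in which myproj/ and myproj/utils/ each have
# an __init__.py, i.e. both are packages.
root = tempfile.mkdtemp()
for pkg in ("myproj", os.path.join("myproj", "utils")):
    os.makedirs(os.path.join(root, pkg))
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

# find_packages() walks the tree and returns every package it finds,
# so setup(packages=find_packages()) never goes stale when a module
# is later promoted to a package.
print(sorted(find_packages(where=root)))  # ['myproj', 'myproj.utils']
```

In a real setup.py you would simply pass `packages=find_packages()` and let it walk the project root.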
So if you have binary stuff, like C extensions, it won't work for you. And there's Bento, which is an older thing. Okay. Now, back to the manifest. The manifest is quite easy to get wrong. It has lots of commands: include, exclude, recursive-include, recursive-exclude, global-include, global-exclude, graft, prune; the list gets old quickly. And the problem I had most of the time is with a too fine-grained manifest. You specify everything down to the last extension, so you hard-code in your manifest everything that you have in the file system, which is kind of wrong, especially if you look at the file system as a database. You wouldn't hard-code the IDs from your database in your code, right? So why would you hard-code the file names, or extensions for that matter? From what I've found, most projects can work just fine with graft; graft just takes a whole directory and collects everything in it. And then there's global-exclude, with which you can strip out temporary files, compiled files, byte-code files, anything you don't want. For example, a bad manifest is too fine-grained: it specifies every extension. So if one day you decide to add a web font, it works in development, because you already have the files there; but when you deploy, you no longer have the files, and you don't know it until it's too late. It's broken, people are going to get angry. So the best is to just use graft. graft works on the root; you can also give graft a path, but it needs to be a directory. Then you just exclude the byte code and all the stuff you don't want. Now, some of this advice might not apply; you might not be able to use graft in all situations. But from what I've seen, it's better to have some extra files than to miss the important files, because if you have extra files, you just have a bigger archive.
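A minimal MANIFEST.in in the spirit of that advice might look like this (the src directory and README name are just examples):

```
graft src
include README.rst
global-exclude *.py[cod]
global-exclude __pycache__
prune .tox
```

graft pulls in the whole tree; the global-exclude lines strip byte code, and prune drops a directory you never want shipped.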
It's embarrassing, but it works. If it doesn't work, that's worse, in my opinion. And for the stray-files problem, you can just run git status to see what untracked files you have; it's not that bad. And there are tools that can check your manifest. There's check-manifest; you can get it from PyPI, it's very nice, and you should use it. There's also a setuptools extension called setuptools_scm, which generates the manifest for you from all the files you have added in the source tree. So if it's committed, it's going to get collected with this extension. Now, there's package_data. It's another option, which overlaps in features with the manifest and other things; it's for specifying the data files inside packages. This has the same kind of problem: it's too fine-grained, too specific, and when you get that specific, you make mistakes. That's why you should use include_package_data instead in this situation, because this is the problem with the bdist: if you don't use the include_package_data option, then you have to specify the package_data, which is kind of nasty. You have to collect the files manually; you need to write code. It's better to have less code and more configuration — well, technically it's not more configuration, because the configuration ends up smaller than the code. So don't use package_data. Avoid it, simple as that. Just use the manifest and include_package_data. And as you saw, you can even avoid writing the manifest if you can use setuptools_scm. Okay, another option, another overlap in features: data_files. These are still data files, but they go outside the packages. And they are so inconsistent, it's mind-boggling. The tools don't really respect this option, maybe because it's too obscure, I don't know. For example, setuptools puts them inside the egg.
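So the setup.py side of the package_data advice above boils down to one flag; a sketch (the project name is illustrative):

```python
from setuptools import find_packages, setup

setup(
    name="myproj",  # illustrative name
    version="1.0",
    packages=find_packages(),
    # Instead of enumerating data files in package_data, let the
    # manifest decide what gets shipped, for bdists and wheels too:
    include_package_data=True,
)
```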
So if you use egg installs — and that happens if you run setup.py install — they go nowhere, practically. pip install and such tools kind of do the right thing, but then the problem is absolute paths; that's the problem with absolute paths. So let's get down to it. Basically, you would use data_files in applications: for a configuration file you want to stick somewhere, an init script, something in /etc, a man page, things like that. These are almost always better served by a specialized package format, like debs or RPMs. And there are plenty of tools to help you with that, especially if you have dependencies and you need to manage them. There's dh-virtualenv. There's the newer py2deb; it's a quite new project. For Windows you can use installers: there's pynsist, a thing that generates the configuration for NSIS, the Nullsoft Scriptable Install System. And you can even make your own custom thing; you don't need to use deb or RPM, just make a shell script or whatever. Set it up yourself if you have very specific needs. There's the makeself tool, which generates a self-extracting shell script: it makes a shell script, sticks the archive at the end, and it unpacks automatically, so you don't have to deal with copying and unpacking. I mean, okay: Python packages are not for applications, not really. An application has configuration, right? Where do you put the configuration? You need services; how do you configure those services? It has very specific needs. It needs pre-install actions, post-install actions. There are talks about that, but currently it's not supported: you cannot run pre-install or post-install actions. If you need those — and for a serious application you surely do — then use a specialized package. It's best not to try to shoehorn; use something that's designed for that.
Don't do it just because it works. Okay, about the dependencies. This ties into the previous one. If you have dependencies and you need to bundle them, and you don't want to use dh-virtualenv or py2deb or do your own manual bundling of the dependencies, you can use pex. pex makes a self-extracting binary that you can just run; it automatically creates a virtualenv somewhere and installs the wheels that you specified, or that you depend on, which is nice, and you could use that. Okay, another problem: importing code inside a setup.py, because it's such a critical place. If you try to import your package in there, just to get something like the version, you might end up in a chicken-and-egg problem, because you're trying to import something you're not prepared to import yet: you don't have the dependencies installed. The import fails because you don't have the dependencies; it's too early to have them, because you need to run this very script to figure out the dependencies. So then it doesn't install at all. Don't do this. There are very many ways to handle this, for the version for example: the packaging guide lists quite a number of them, and there's also setuptools_scm, which pulls the version from the tags for you, so it's even less work if you can use that extension. Okay, the __main__ module. It's not really related to setup.py, but packages frequently have binaries, right? Something you run on the command line. In that situation, from Python 2.7 on, you can write a __main__.py file, and then you can run your package with python -m, and it will just execute the __main__.py file. Now, the thing is that it will execute it as a script, so the name of the module will be __main__; it won't import it as a module, it will just execute it as you would execute a script.
So it becomes a problem if you try to import stuff from the __main__ module in other parts of your package, because then it gets executed again, right? The module is not registered under its real name, so it's going to execute again. So if you define a class inside __main__, you get two classes, and your isinstance checks won't work anymore; weird things like that. So don't put too much stuff in the __main__.py file; make a separate module for that, like a cli module for the command-line interface, and keep __main__.py minimal: it just imports that and runs the main function. Now, to use this in a setup.py script: setup.py has an option called scripts. This is a very crude thing, because it's a literal script, and it won't work on Windows unless you give the proper extension, like .bat or .cmd or .exe, which is kind of hard to do — you'd have to handle each — and that, of course, won't work with wheels. So setuptools has a nice feature called entry points, and it has built-in support for one special entry point group called console_scripts. You should use this over scripts in all cases. For Windows it creates the nice .exe wrappers, and on Unixes it also fixes the shebang, so if you use virtualenvs, you get the proper shebang. So you should almost always use console_scripts, not scripts. Here's another feature: extras_require. They designed an extras system: say you have a big package that does lots of things, or can use backends to render stuff, and some backends are optional. For that you can use extras_require, and when you install the package, you can specify the extras if you want the special thing. That will just make the package depend on, say, reportlab if you want to activate the PDF feature. This is a very flexible feature, in the sense that people kind of abuse it: they use it for development and test dependencies. They just stick the test dependencies in an extra and then install them, and it works.
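The scripts-versus-console_scripts advice above, sketched as configuration (package, module, and function names are all made up):

```python
# myproj/cli.py holds the real, importable logic; __main__.py and the
# console script both stay thin wrappers around it:
#
#     # myproj/__main__.py
#     from myproj.cli import main
#     main()
#
# setup.py:
from setuptools import find_packages, setup

setup(
    name="myproj",
    packages=find_packages(),
    entry_points={
        # Generates a 'myproj' launcher: an .exe wrapper on Windows,
        # a script with the correct shebang on Unix.
        "console_scripts": ["myproj = myproj.cli:main"],
    },
)
```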
It works, but there are tools that can manage that very well. There's tox, which can also manage the virtualenvs, so it's quite an improvement. tox basically does three things — that's the workflow it's designed for; you can abuse tox too, but this is the workflow it's designed for: installing dependencies, installing your package, and running some test commands, all of that over multiple Python versions. It's quite nice, especially nowadays when we want to port everything to Python 3; you kind of have to test both Python 2 and 3. And there are other ways to manage the virtualenvs, if you don't like tox or you need more flexibility. There are vex and pew, just for running things in virtualenvs. These are an alternative to the activation script, because the activation script changes your current shell environment, and sometimes it breaks, and then you don't know what happened. It's kind of bad to have something mess with your environment variables; it's not very explicit. So there are tools that just spawn sub-shells or sub-processes: vex does it with a sub-process, and pew with a sub-shell. They are quite worthy alternatives to using activation scripts. There's also another thing called pyenv, which is kind of popular on Macs, because it can manage the interpreters. It's not for virtualenvs; it manages whole interpreters. Okay, environment markers. There's a common pattern in setup.py scripts: sometimes you don't want to have a dependency for all Python versions — say you want argparse only for Python 2.6, which doesn't have it, but not for the others. So you want to handle this specifically. Plenty of people just do an if and generate a different requirements list, but that doesn't work with wheels. That's the big problem here, because wheels are something very static: they don't have a setup.py script, they're just metadata and the files. That's it.
No setup.py script; it's too late for them. So they kind of shoehorned this environment-marker feature into the extras. You get a kind of markup language: you can have very basic expressions there, and you have a few variables available; you can consult the PEP for more details. Okay, another thing: coverage for C extensions. This one was tricky for me to figure out. There are the CFLAGS, which you can use to activate coverage for the C extensions. But to actually get the coverage collected, you need to build the extensions in place, just inside your source tree; for that, there's the --inplace flag of build_ext, which you should use to get it working. It was tricky to figure out, but it works, and it's quite nice. Okay, about uploading: there's a tool called twine. Okay, it's not new, it's quite old. It's for uploading your distributions, an answer to setup.py upload, and it does a secure upload. It also uploads the metadata, so if you use twine, you're going to see the dependencies your package has on its PyPI page; if you use setup.py upload, you're not going to see them. An interesting change regarding uploading: PyPI no longer allows you to re-upload stuff, you can only delete. So you can make a post release, maybe, to re-upload in case you messed something up, I don't know. Also, the package names — sorry, distribution names; yeah, it's very hard — are normalized: the underscore and the dash mean the same thing, so you can specify either. Underscores, dashes, same thing. And twine uses HTTPS, basically; it doesn't do anything fancy, just HTTPS. setup.py upload goes over plain text, so you're kind of vulnerable to a man in the middle. Okay, about versioning. This also changed recently, since setuptools 8: now we have automatic normalization for versions. A few examples here: the dash means a post release.
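The static, wheel-friendly way to express that argparse example is a marker attached to an extra; a sketch (the project name is illustrative):

```python
from setuptools import setup

setup(
    name="myproj",  # illustrative
    # The expression after ':' is an environment marker. It is kept as
    # metadata and evaluated at install time, so it survives being
    # baked into a static wheel, unlike an if in setup.py.
    extras_require={
        ':python_version == "2.6"': ["argparse"],
    },
)
```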
The dash-dev form is a dev release, and the alpha just gets normalized; so if you have variants like "release candidate", they all become something very consistent. There's also semantic versioning, which is something people quite like. It's not about the normalization, but there are some differences here with regard to the version scheme. You probably don't care about most of them, but there's a difference, so you cannot use some of those: you cannot use pre-releases written like that, because the dash means a post release, and there's the build info, which is not supported at all, so you'll probably get an error for that. Okay, getting close to the end. The last thing: I made a template with cookiecutter — there's a presentation about cookiecutter tomorrow or something — which has all these things baked in, and some more things about testing, like a default configuration for testing, measuring coverage, stuff like that. Any questions? Go ahead. Yeah, so the question is that he has a package that shouldn't ship a certain module on Python 3, because Python 3 already has that module, and the module is inside his package, as I understood it. Okay, so he wants to exclude it there. You can do it with setup.py, because you can have code there, but it won't work with a universal wheel; you can make a wheel for Python 2 and a wheel for Python 3, and that will work, but it's extra work to release. It will work, if you want to do it. Go ahead. Speed. The main advantage of wheels, as I see it, is speed: there's no build process at install time. With an sdist, there's a build process, files are copied around two times, and after that they're copied to your installation path. With wheels, they're just unpacked there, done. Fast. And it's also fast for the compiled stuff, of course.
Well, eggs and wheels are kind of the same thing, but eggs are being phased out. They have some issues with the metadata; people don't like them, for some reason. Go ahead. Okay, pbr: as I see it, it's very tailored to a certain release workflow that OpenStack has. For me, I wouldn't really like everything pbr does. It's a nice idea, but it's very customized for OpenStack; it's not for every project — except, well, the ones with the release process they have. I'm not saying it's bad; just look at what it does, and it does many things, so you should check everything it does before using it. That's all I'm saying. There was some... Does everybody Google around when doing this kind of stuff, and find all sorts of things? Okay, I mean, yeah. The problem is that when you make official documentation, you cannot put in only what you think is most important; you need to put it all in, so there's lots of documentation, and you have to read it all, and it's very hard to read it all. I think specific hints would help, like "don't use this, it's old crap; use this instead". I mean, it's up for discussion; they have this packaging guide and they take pull requests. Go ahead. The package, or packaging? Oh, okay. Stop. Okay, we can talk about it later. In the back. There's the packaging guide, packaging.python.org; that's the place. Sorry? No, it's not up to date? If it's not up to date, just make a pull request. They are maintaining it; it's maintained, and it's the thing you should go to. Go ahead. The local version. Just to add, there's another tool called versioneer, which seems to do a similar thing; it overlaps with setuptools_scm. But setuptools_scm is now owned by the Python Packaging Authority, so it seems to be the thing to go to nowadays. There's some overlap, I'm not sure what's going on there, but setuptools_scm seems more maintained. Go ahead.
This is about pinning the dependencies in the setup.py script. It's a question of concrete versus abstract dependencies: which would you put where? The abstract dependencies, the ones without versions, you would put in the setup.py script. The concrete ones — you most definitely need to have them tested, right? You need a set of known, well-working dependencies — those you would put in your testing requirements, docs requirements, anything like that. Go ahead. There's tox, which you can use; tox is a good alternative. I'm not sure of any other alternative to extras_require. I mean, it works, but it seems like you're just sticking stuff in there. I'm not saying it's really bad; there are better tools for it, that's all. Okay?
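The split he describes might be sketched like this (the package name and the pinned version number are made up for illustration): abstract, unpinned dependencies go in setup.py, and the concrete, tested pins go in a separate requirements file.

```
# setup.py -- abstract: any compatible version will do
#     install_requires=["requests"]

# requirements.txt -- concrete: the exact set you tested against
#     requests==2.9.1
```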