Hello. Welcome back down to the ground floor. I hope you're enjoying your day so far. We need to keep to time, so a quick introduction of Bernard, another senior engineer here at Bloomberg, this time from our data technologies department, helping all of our global data analysts get data onto the system as quickly and efficiently as possible. He's a big-time contributor to tox and various other packaging tools, so he's going to take the opportunity today to talk to us about how best to package our libraries. Thank you.

Thank you very much. So yeah, you heard in the earlier talk about how you install packages. This talk is going to be more about how you get to the phase where you can actually install stuff, and hopefully also about understanding what actually happens under the hood when you tell pip, or whatever other tool you're using, "OK, install this package." This talk contains a lot of, let's say, not the nicest things, so I brought a few nice puppies to cheer you up a little bit.

So, who am I? As I said, I work at Bloomberg as a senior software engineer. I'm also the maintainer of the virtualenv tool, which makes me a member of the Python Packaging Authority, and the maintainer of the tox tool. So hopefully I have a good picture of what I'm actually talking about next.

This talk, again similar to the previous one, is from the point of view of the Python Packaging Authority, meaning we will not talk about these lovely technologies such as Conda or your favourite operating system package manager, whether that's apt, yum or whatever else you're using.

In order to tell my story, I need an example project, so I brought this lovely little one. This is roughly how a basic Python library looks nowadays. It has some kind of logic, whatever you want it to implement. Alongside the logic you write some tests, because you don't know it actually does what you want until you write tests. Then you have some packaging information, which just tells the tooling, OK, this is how I want you to package my stuff. And then some additional files, which are basically just maintainer extras: your CI configuration, your Git repository, that kind of thing. The only thing this library does is give you lovely quotes from the point of view of the lovely pugs; in this case, it tells you that an enlightened pug knows how to make the best of whatever he has to work with. So let's try to make the best of whatever we have to work with in Python packaging.

So the question is: if I write a project, how do I make it available for someone else running on a totally different machine on the other side of the Earth? When I say make it available, I want them to be able to type import pugs, in this case, and do the same thing I'm able to do on my machine, which here is just calling that quote generator. Whenever someone types import pugs and you look at the representation of that module, you see that it is backed by a file on the file system somewhere, inside a folder called site-packages. Python doesn't really know up front whether a package is available or not; the way it finds out is by trying.
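To make that concrete, here is a minimal sketch of what the speaker describes; it assumes the talk's example pugs package is installed (substitute any installed third-party package), and simply shows that an imported module points back at a file under site-packages and that the search locations come from sys.path.

```python
import sys
import pugs  # the talk's example package; any installed package behaves the same

# The module object is backed by a concrete file the file-system loader found.
print(pugs)           # e.g. <module 'pugs' from '.../site-packages/pugs/__init__.py'>
print(pugs.__file__)  # the resolved path inside site-packages

# These are the directories the import machinery searches, in order.
for entry in sys.path:
    print(entry)
```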
It's kind of like the mantra of Python: try first, ask forgiveness later. It just tries to import it, and if it manages to, it gives you back that package. The way it does this is that it has a few package loaders registered, and these loaders have some configuration. One piece of that configuration is sys.path, which is just a list of locations where the file-system loader will look for packages.

Going from this, what it means to make our package available to another developer or another machine is that we have to get it into that other machine's site-packages somehow. And this is basically the goal of Python packaging: how do we take something from my developer source tree, package it up, ship it to some index in the cloud, from where other people can pull the package down and install it into their site-packages folder. You can see a few tools here which you've probably used if you've done this: setuptools is how we usually package stuff; then we use twine to upload to PyPI, which is the "cheese shop" of Python; and then we use pip, both to download and install into site-packages and to discover the additional dependencies the user might need.

Now, if you look inside the site-packages folder, you will see that for our package there are two things. One is the actual package files: my business logic, with some compiled .pyc files generated alongside. But more importantly, we also have some additional metadata. So when we talk about installing a package, we want both of these pulled into site-packages: the business logic and the metadata. The question is, how do we generate them? We have two options here: we make our library available as a source distribution, or we make it available as a wheel distribution. And there are some differences between these.

So what is a source distribution? A source distribution is basically whatever you have in your working directory, minus the project management and maintainer files; those are the CI files, the actual Git repository, that kind of thing. It still has all the business logic, the packaging information and the tests. Basically, you take this project, remove that special information, but keep everything else, which still allows the user to do the packaging themselves. And the reason you should always also package the tests is that when the user does that packaging, they need something to validate that the packaging succeeded. That's why you should always include your tests in your source distribution: it's a validation for the people on the other side before they go ahead with the installation.

A wheel, on the other hand, no longer cares about the tests. The only thing you care about is your binary files, your business logic. A wheel is basically a one-to-one mapping: it contains just that information, the actual business logic plus the metadata, exactly as it will be placed inside the user's site-packages.
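Before moving on to how these get shipped, here is a small sketch of inspecting those two pieces in site-packages, the installed files and the accompanying metadata. It assumes Python 3.8+ for importlib.metadata, and again uses the talk's pugs name as a placeholder for any installed distribution.

```python
from importlib import metadata

dist_name = "pugs"  # the talk's example; substitute any installed distribution

# The metadata that lands in the *.dist-info directory next to the code.
print(metadata.version(dist_name))
print(metadata.metadata(dist_name)["Summary"])

# The actual files that were copied into site-packages for this distribution.
for path in metadata.files(dist_name) or []:
    print(path)
```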
Now, if you look further ahead, how do we actually generate and ship a source distribution? We start with the developer source tree, collect some files from the file system, and package them up into an archive, a tarball for example. We upload that to PyPI, and on the other side the installation works like this: the source distribution gets pulled down, you extract it into a folder, you generate the metadata (the additional information about the package that we saw in the wheel case), and then you take the business logic and copy it over into the user machine's site-packages folder. So we end up with what we wanted in the end: the metadata and the package files.

Now, how do you ship a wheel compared to this? In the case of the wheel, the only difference is that all these build operations, generating the metadata and selecting the business logic, no longer happen on the user machine. On the developer source tree I directly generate exactly the information I want to end up installed, so by the time we get to the user machine, I literally just have to extract whatever is in the wheel and copy it over into the user's site-packages. You can already see that this means the operation we perform on the user machine is very simple; there is a lot less that can go wrong, and that's why this is the preferred way to do it. But there are cases when this is not possible, and we'll see those later.

So how do we generate the wheel? As I said, we generate the metadata and generate the .pyc files. And this fourth bullet point is the reason why it might not always be possible: if you have C extensions, you have to know the target platform to be able to generate the binary files, and you might not have access to all the platforms your users are on. So if you have a C extension, you will most likely ship a few wheels for the most important platforms, and leave the source distribution there so that anyone not on those platforms can fall back to building their own installation packages.

This is where the Python Packaging Authority and Conda differ. pip says: we are not responsible for providing you the C/C++ toolchains, at least as of today; maybe this will change in the future. Conda takes it upon itself to also provide that compiler if the build needs one, and even more: dependencies, libraries, header files, all of that. The approach of the Python Packaging Authority at the moment is that it's up to the user to provide these, OK?

Now there is an additional constraint: you have to generate wheels following a specific file name format, which means that from the file name alone an installer can already tell whether the wheel is compatible with its platform or not. This is just a technical detail that makes life easier for the installers.

So let's see what you actually need for this operation to happen correctly on the user machine. To select and copy the files, if you use setuptools for your packaging, you require the correct version of setuptools. To generate the metadata, you need the wheel package, which is on PyPI.
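Going back to that file name constraint for a moment: a wheel's name encodes the distribution, version and compatibility tags. Here is a small sketch of how an installer can read compatibility straight from the name, assuming a recent version of the third-party packaging library; the file name itself is purely illustrative.

```python
from packaging.utils import parse_wheel_filename  # pip install packaging

# Wheel names look like: name-version(-build)-pythontag-abitag-platformtag.whl
name, version, build, tags = parse_wheel_filename(
    "pugs-1.0.0-py3-none-any.whl"  # illustrative example file name
)
print(name, version)            # pugs 1.0.0
print([str(t) for t in tags])   # ['py3-none-any']: any Python 3, no ABI, any platform
```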
The wheel package is what knows the format of the metadata files that have to be put into site-packages. It also knows which Python you're targeting, because, for example, it might need to know how to generate the .pyc files. And you also need the other information, like the target architecture, if you have C/C++ files that you compile. But most importantly, for this operation to succeed in any case, you need to provide the correct setuptools and wheel versions. The reason this is needed is that if you're using a feature of setuptools that was added in a newer version and you try to run the build with an older version, what happens is basically undefined, because you're using features that are not available there. It might work, it might not work, or worst of all it might only appear to work; then you're not sure whether it actually succeeded or just happened to succeed, and you have latent bugs in your software. So it's important that whenever you perform this installation, you run the build with exactly the right versions of your packaging tooling, meaning, in this example project, setuptools and wheel.

OK, so a short history of how the Python packaging ecosystem developed. It all started back in 2000, when enough projects were being shared around that people in the community felt we should have some kind of way of defining how we package stuff. And when you're just starting out and you don't yet know what you actually need or what the best way is, it seems like a great idea not to standardize anything for now and just give people a script file where they can specify what they want to do. This is how distutils started out: it introduced setup.py and provided a core framework of what packaging is, but you could write anything in that setup.py file. That meant you could easily iterate, extend that base, and adapt it to exactly what you needed. Now, distutils was baked into CPython, which meant it had a long turnaround: if you needed a new feature, it took a while to get it out. But in 2004 setuptools came around, added a lot of syntactic sugar on top of distutils, and made it just a lot nicer to use. It was quickly adopted, and it became so much the de facto standard that pip started assuming it was already there; that's why, if you go through the support workflow for a packaging issue, the first step is to upgrade your setuptools if something doesn't work. In 2008 we got pip, which separated out the installation, and then in 2014 we got wheels. This was the first approach to solving the problem of not having the correct setuptools version: maybe let's not require setuptools on the user machine at all, and just build exactly what is needed ahead of time. But as I said, this doesn't fully solve the problem, because if you have C extensions, that kind of thing may still need to be rebuilt, and in that case the user is still expected to somehow know what your build dependencies and their versions are.
OK, now one problem is that setuptools and wheel still use setup.py, which allows you to run arbitrary code. That is great for flexibility, but it's horrible for usability, because when less experienced people come into the ecosystem and start writing random Python code in that setup.py file, it's a great recipe for messing things up, and people start complaining that things don't work. So flit was introduced in 2015; it was the first attempt at a declarative build system, and it uses pyproject.toml to define what your build dependencies are. And this seemed to work. Because it was easier to understand and harder to get wrong, we tried to see how we could move further in this direction: how we could allow the ecosystem to have build tools with nicer interfaces than setuptools usually offers.

OK, so how does the build work? The build in the previous picture, if I show the actual nitty-gritty details, is that you call setup.py sdist, and then you upload using setup.py upload, which by the way is deprecated, because there is no safe way to make it work over HTTPS. So instead we created a separate tool called twine, which is guaranteed to do the upload over a secure protocol. And whenever you do a pip install, what happens in the background is that pip goes to PyPI, discovers the package, pulls it down, and after extracting it, calls the setup.py install command, which then uses setuptools to generate and copy the files over into the target site-packages folder.

OK, so now the thing is, what happens in this case if the user doesn't have the correct version of setuptools or wheel? We end up with these cryptic errors, where the installer starts complaining with messages like "No module named ...". You can see this is an error message targeted at a programmer who knows what an import is and what is doing the importing; it is definitely not a message a front-end user can use to figure out what is wrong.

So how can we solve this? The way we decided to solve it is: let's have a declarative way to define our build environment dependencies. Instead of just assuming that the user's machine has whatever setuptools, wheel or other build dependencies we actually need, we should allow the package to declare them and say exactly: this is what this package needs in order to build. So what happens in this case? We create a virtual environment, and because the user's machine now knows our build dependencies, we can install exactly those into that environment, and then we perform the wheel generation inside this isolated build environment. This guarantees that on the user's machine we always run with exactly the same versions as the ones the project requires. Now, this is what PEP 518 is about: it allows you to specify exactly what packages you need for the build to run correctly. It uses pyproject.toml, and TOML mainly because we did not have any better alternative; and yes, YAML was not considered a better alternative, and neither was INI.
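For reference, the PEP 518 declaration is just a small table in pyproject.toml. A minimal sketch for a setuptools-based project like the talk's example might look like this; the exact version pins are illustrative, not prescriptive.

```toml
# pyproject.toml
[build-system]
# Exactly what the build front end must install into the isolated
# build environment before it runs the build.
requires = ["setuptools>=40.8.0", "wheel"]
```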
So, moving on from this, at this point you can see that we now have two jobs to do. Instead of just running the build operation and copying the result, we first have to create this isolated build environment, and then we generate the actual wheel and install it. Now, the question is who should do this isolated environment creation, and the decision was made that it should not be the build backend; the build backend in this case is setuptools, flit, or whatever else. Instead, pip is the one responsible for this operation, and pip is considered the build front end. Meaning it will not do any build operations itself, but it will provision the environment for the backend to do its job, and it guarantees that the dependency declarations specified for that backend are met when it gets called. OK?

So, yay, we won: we should no longer get these awkward errors where the user says, hey, the installation didn't work on my machine. But there's another problem here. If you look at our operation, we're still calling setup.py, OK? In the case of flit, the way it managed to work was that it had to bridge everything back to setup.py and the setuptools ecosystem, because that was the only thing pip actually knew how to call. So we now have another standard, PEP 517, which says: let's throw away the setup.py legacy, which is there from the old ages and lets the user run arbitrary code, and instead let's generate the wheels through a proper API. And this is the API we came up with: in the same pyproject.toml file you can specify, this is my backend, and this is its build API endpoint that the front end can use. So now we have a programmatic API, and the other advantage is that the front end has a much nicer communication protocol with the backend than scraping textual output; we now have a more Pythonic way of talking between the tools.

Various backends started adopting this new interface: setuptools provides it from version 40.8.0, flit provided it from very early on, and Poetry is another builder that has provided it for a long time now. We also have front-end support: pip 19 already ships with PEP 517 support, so if you have pip 19 on your system you can literally use any of these backends, you no longer need setuptools, and your builds are guaranteed never to run with bad build dependencies, unless you declared them wrongly. And tox is another tool, another front end, that does this.

There are a few caveats though. One is that PEP 517 and PEP 518 did not really address editable installs. When the PEPs were initially designed, this drew a lot of disagreement, so the decision was: let's just drop the subject and not focus on this. It's something we hopefully will manage to address in the next year, meaning that you will be able to use editable installs too in the future. It also requires a fairly new pip, which may or may not be a problem for you.
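To make that "proper API" concrete, a PEP 517 backend is essentially a module exposing a couple of well-known hook functions. The sketch below uses the mandatory hook names and signatures from PEP 517; the my_backend module name is hypothetical and the bodies are placeholders, not a real implementation.

```python
# A toy PEP 517 build backend, selected in pyproject.toml via
#   [build-system]
#   requires = [...]
#   build-backend = "my_backend"   # hypothetical module name
#
# The front end (pip, tox, ...) imports this module inside the isolated
# build environment and calls the hooks below.

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """Build a .whl into wheel_directory and return its file name."""
    ...

def build_sdist(sdist_directory, config_settings=None):
    """Build a source distribution into sdist_directory and return its file name."""
    ...
```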
On that pip requirement: happily, if you distribute as a wheel, it only requires a new pip on the developer machine; but if you distribute as a source distribution, it also requires a new pip on the user-side machine.

OK, so the next question is: how do I now build my wheels and source distributions to upload to PyPI? There is a pip build command hoped for, slash planned, if anyone is willing to implement it. But in the meantime you can use the pep517 library directly. It's something you can just pip install, it has an interface to generate wheels or source distributions with a single command, and this is what you should prefer using going ahead.

OK, so now we have the benefits: reproducible, declarative builds, and no more need for setup.py, well, at least once we have editable installs ironed out, and it should all be a lot simpler. Now there's one caveat for tox: at the moment, if you're using it in your CI, it doesn't enable this by default, mostly for backwards-compatibility reasons, so you have to say explicitly, with this lovely flag, that you want the isolated build environment.

More importantly, going ahead we have these two PEPs, 518 and 517, which ensure that we no longer need the legacy setup.py and that we have declarative, reproducible build environments, and this should hopefully make every one of us happier. In the future, if you want to build wheels or source distributions, please check out these lovely backends. You don't have to use setuptools; setuptools is very powerful, but it's not the only option. You have flit and Poetry, which may be simpler and much harder to get wrong. Consider setuptools, especially in this setup.py format, as something you only need when you have something complicated, something advanced, not for simple use cases. And use the lovely pep517 library to build your source distributions and wheels; it should mean you get a lot fewer errors, and even if you change machines it will automatically pull in the correct dependencies for the build to succeed. If you got lost anywhere in my talk, I have a blog article which explains all of this in much more long-winded form and in much more detail. And yeah, I'll take some questions now.
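As a sketch of that single-command workflow, the pep517 package (pip install pep517) exposes helpers that create the isolated environment and call the backend for you. The module and function names below are an assumption to verify against the version you install, since the library's layout has varied between releases; for tox, the flag mentioned above is the isolated_build setting in tox.ini.

```python
# Sketch only: verify these names against the pep517 version you install.
from pep517.envbuild import build_sdist, build_wheel

# Build into ./dist using an isolated environment containing exactly the
# requirements declared under [build-system] in pyproject.toml.
sdist_name = build_sdist(".", "dist")
wheel_name = build_wheel(".", "dist")
print(sdist_name, wheel_name)
```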
Thank you, Bernard. We'll make sure that gets onto the attendees' chat on Slack; if anyone needs an invite to Slack, let me know. Have we got any questions for Bernard before we break for lunch? Oh, right in the corner; you just want to see me run before lunch.

Yeah, thank you for the talk. I had one question: is there any layer of security, or do we all pip install packages at our own risk?

A layer of security, as in...?

As in between us uploading packages to PyPI and then pulling them back down.

There is a security layer on PyPI itself; PyPI does some inspections, but you should not expect actual checking of what gets imported or what gets run. Now, Python 3.8 does have a runtime audit hook API, which means you can inspect what is running in your application and do some evaluation of it, so that's something to look at if you actually want to check what running that library would do to you. But otherwise, PyPI will remove a package if something horrible is detected in it, though that's more a case of it having to be reported and flagged as a bad package.

OK, I think we've got time for one more question over here.

Yeah, thanks for your talk. Just in terms of running this system in production, from a practical point of view, do you have any insights or tips to share? Would you have an internal PyPI server, or Git submodules? What are the best ways to distribute packages, let's say not to the world, but in a controlled manner, for organizations or corporates, that kind of thing?

Pulling directly from Git submodules, I would not say it's horrible; it kind of works. The problem is that you don't have a way to easily change versions, and you're pulling down the entire version control history that you don't need. I generally, definitely recommend having some kind of PyPI server running locally on your company's machines, something that already takes care of a lot more features; for example, it automatically lets you cache whatever you're downloading. If you're pulling from Git every time, even the download takes a lot of operations, whereas a PyPI server and its protocol allow for various improvements in its communication with pip.

Cool, thank you very much, Bernard. Thank you very much.
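As a footnote on the Python 3.8 runtime auditing mentioned in that answer (PEP 578), here is a minimal sketch of registering an audit hook to observe what code does at runtime, for example which files it opens or which modules it imports; real tooling would filter or block on these events rather than print them.

```python
import sys

def audit(event, args):
    # Observe a few interesting runtime events raised by the interpreter.
    if event in {"open", "import", "subprocess.Popen"}:
        print(f"audit: {event} {args}")

sys.addaudithook(audit)

# From this point on, imports, file opens and subprocess launches raise
# audit events that the hook above can inspect.
open("example.txt", "w").close()
```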