So we're coming to the next session. Our next speaker is Joel Christiansen. Hi Joel. So, where are you streaming from?

I'm streaming from the Optiver office in Amsterdam.

Okay, nice. You'll be talking about dependencies, so packaging in Python?

Yeah, I'm talking about dependencies and build tools, and a kind of open source ideology.

Okay, great. So let's start.

Yeah, hello everyone. Huge thanks to the organizers of EuroPython for making this happen. I've had a great time the last few days, I've already seen a bunch of awesome talks, and I'm really looking forward to more. So thank you for joining me today. My name is Joel Christiansen. I'm an application engineer at Optiver. Optiver is a trading firm and we use Python just about everywhere: from monitoring to data analysis to operational tooling to risk controls. We use Python all over the place, across 800-plus servers all around the world. And as a consequence of that, we care a lot about Python packaging.

Before we jump in: you should join us in the sponsor Optiver channel in Matrix. I'll be hanging out there after the talk and I'd love to answer any questions you might have, or just chat about Python, build tooling, type systems, or anything else.

So today we'll be talking about a wide variety of topics around build tooling. The idea that spawned this for me was transitive dependencies and how to deal with them. Transitive dependencies are dependencies of dependencies. For example, if you have a direct requirement on requests, then you have a transitive dependency on urllib3. This is just a small aspect of the larger build tooling discussion I want to have, but it's a good focal point and a good jumping-off point.

We'll talk more about those in a bit, but first I'd like to tell you about a really nice tool called shiv. This is a bit off topic, and I won't be talking about shiv after this point, but I did want to give a shout-out to something we use quite extensively and absolutely love. shiv is an awesome tool developed by LinkedIn for creating single-file executables for Python applications. The idea is that you give it something that can be pip-installed (a wheel, a source distribution, a local source directory, anything like that) plus an entry point to be called. It then produces a Python zipapp that runs some bootstrapping code and extracts not only your application but all of the dependencies that were bundled with it, and then executes the entry point you gave it in the environment it just extracted. It's an awesome tool that makes deploying Python applications a joy.

shiv is actually a really good example of a lot of the principles I'll be talking about in a second. It exists at a somewhat different level of abstraction than what I'm going to focus on, but it's a really great tool and I highly recommend it. There's a huge range of build tooling in Python at all different levels, and I think that's really, really cool.

So, with that out of the way, let's talk for a second about what we care about in a build tool. There are a lot of different aspects to consider when it comes to building; this list is not comprehensive, and reasonable people can disagree on the ordering. This is part of my list of things that are important to me.
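A brief aside to make the transitive-dependency example concrete: one way to see it from Python itself is to ask an installed distribution what it declares as requirements. A minimal sketch (the output shown is illustrative):

```python
# Ask the installed "requests" distribution for its declared requirements.
# urllib3 shows up even though we never depended on it directly.
from importlib.metadata import requires

print(requires("requests"))
# e.g. ['urllib3<1.27,>=1.21.1', 'certifi>=2017.4.17', ...]
```

The lock files that come up throughout this list are essentially a frozen snapshot of that entire tree, not just of your direct requirements.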
So what do we care about first? Reproducibility. If you build a package, make no changes, and a year later build the exact same package with the same commands, I would expect the output of those commands to be identical to the first time, or, failing that, for the build to fail. This is to me the most important aspect, because it's the one that can potentially impact production. If you get this one wrong, there's no point in worrying about anything else.

Compatibility. We want people to actually use the tools we make. This means those tools have to be able to integrate with the outside world. In a few minutes I'll talk about a number of PEPs that relate to build processes, and this is the reason why. How good a tool is doesn't really matter if nobody wants to use it.

Flexibility. Both at Optiver and in the wider Python ecosystem, we have applications that are built in a lot of different ways. Some repositories output shiv applications, some are libraries, some have C++ or C or Rust or other compiled bindings, some have data files, et cetera. As long as a repo is standards-compliant and has the requisite configuration, a given tool should be able to build it.

Ease of use. This is in the same world as compatibility: if people don't want to use our tooling, it doesn't really matter how technically good it is. This depends on a lot of factors, some fuzzy and some specific, as well as on the domain the tool is in and who the target audience is, so it's a difficult one to narrow down.

And the last thing I want to consider today is modularity. The idea here is basically: does this tool fit into other workflows as part of a cohesive whole? For example, if I switch my editor from Vim to PyCharm or VS Code, will this tooling still work?

So given these goals, how do we achieve them? For reproducibility we need a very specific concept: the lock file. Most languages have some implementation of this idea, typically tied to that language's common build tool. In Python we are spoiled for choice in this area, with more than enough tools that implement the idea of a lock file.

Compatibility is a broader but just as explicit area. The "how" here can be summed up as: be standards-compliant. Compatibility is, after all, one of the major goals of standardization. Which standards are applicable to a given process or application is a more difficult question, and it depends heavily on the specific area you're working in. Again, I'll cover a few relevant PEPs in just a moment that are part of what makes a standard build process in Python.

For flexibility, we need to be capable of converting most current projects without any significant changes. Luckily this is mostly pretty easy at a high level, though more complex builds get increasingly hard to convert. Wheels at least standardize this by solving the distribution problem: all you need to do is output a wheel, and everything else is handled.

Ease of use: keep it simple. We evaluated a few different external tools to try to achieve the results we needed, and in my opinion this is the most difficult bit to get right. There aren't really any PEPs to help here, and there aren't a ton of standards to tell you when simplicity has been achieved. There's not even an objective way to decide whether one thing is more or less complex than another. And ease of use doesn't necessarily have a one-to-one relationship with complexity anyway, although they are correlated.
So it's important to get other people, people who are not writing your tool, to try it, experiment with it, and beta test it, to make sure that they can use it too.

And then the last point is modularity. Again: keep your tools small and single-purpose. This is essentially the shell philosophy of "do one thing and do it well", which lets you chain and combine a series of single-purpose tools to achieve very complex and variable goals. We're not necessarily working with pipes and stdin/stdout here, but the same ideas are generally applicable.

The last thing I want to talk about before I get to practical things is the possible outcomes of a build process. This is a little bit of a detour, but hang with me. Generally speaking, there are three possible outcomes of a build. Actually four, since this is a table of two variables with two values each, but the last case, where the build fails yet the output works, we don't really care about: it's annoying but harmless from a production standpoint.

The first and second outcomes are the ones where the build result and the output health match: either the build fails and the output is also broken, or the build passes and the output works. Either of these is totally fine from the perspective of the build itself. The third outcome is the real problem: the build itself passes, but the output product is defective in some way. There are a lot of ways this can be divided up, but they all point to the build itself being incorrect somehow.

Just a few examples of how this can look. Maybe the build is misconfigured and fails to run the tests for the project, so the build passes even though the tests do not. Maybe the requirements are missing something, so the output doesn't have, or doesn't specify, all the dependencies it actually needs. Maybe you have a MANIFEST.in that accidentally excludes some mandatory data files, which are then present when testing but missing when distributed as a wheel. The point is: a build passing is good, a build failing is good, but a build passing while outputting a bad distributable is a problem.

The reason I bring this up is to underscore the value of simplicity and ease of use. The problem tends to come from people misunderstanding how a build works. Maybe they have the wrong intuition for how tox testenvs relate to the envlist, so their build job is not getting run. Maybe they're missing some flag on some tool. Ideally, the most correct way to call something is also the easiest. I put a lot of emphasis on the idea that for a given project I should be able to run pytest with no arguments and no parameters, and have it automatically run with all the right flags and against the right paths, run all my tests, and pass. That can be done through a config file: the configuration goes in the pyproject.toml or the pytest.ini or equivalent, and gets committed as part of the project. Basically, the easier a tool is to use correctly, the less likely it is to be used incorrectly.

So with all that said, let's take a look at some of the new building blocks available to us for making a composable tool. These are PEPs 517, 518, and 621. The key insight here is that when you are building a Python package or application, sometimes the build itself has external dependencies. The prototypical example of this is setuptools itself.
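The slide at this point shows a pyproject.toml combining the three PEPs. A minimal reconstruction of that shape (the backend and versions here are just for illustration) looks something like this:

```toml
[build-system]
# PEP 518: the "requires" line, third-party build-time dependencies
# that pip installs into an isolated environment before building.
requires = ["flit_core >=3.2"]
# PEP 517: the "build-backend" line, the import path of the code
# that pip will call to produce a wheel or sdist.
build-backend = "flit_core.buildapi"

# PEP 621: the "project" section, standard tool-agnostic metadata.
[project]
name = "my-package"
version = "0.1.0"
dependencies = ["requests"]
```

The "requires" line, the "build-backend" line, and the project section referred to below are these three pieces.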
The way this is currently handled is that when you create a new virtual environment, setuptools and wheel are automatically installed into that new environment for you. That works for things that only depend on setuptools and wheel, but it leaves everything else out in the cold. If you want to build something with a lock file, you can do that, because setup.py is just a Python script, but it's not very nice to have to do that by hand in every project.

So we have three parts of the solution here. The last part, PEP 621, is the quickest to explain: it defines a standard set of metadata keys in the pyproject.toml, and that's it. That's the project section you see in the picture here. The other two allow us to move away from setuptools if we want (setuptools is also compatible with them). PEP 517 lets us define an import path, along with hooks that pip can call, to build a source distribution or a wheel. This allows us to swap out setuptools for, for example, Poetry or Flit. That's the build-backend line in the picture. PEP 518 then lets us define a set of third-party libraries that pip will install into an isolated environment and use to build the package. This neatly solves the chicken-and-egg problem of how to declare the dependencies required to build a package without also needing to import those dependencies at the same time. That's the requires line in the picture.

Together, these three give us a powerful combination. We now have the ability to have build-time dependencies on any library, to have the build call any code that's compliant with the PEP, and to specify metadata in a way that is, in theory, acceptable to any build tool. Now, in practice, PEP 621 was only finalized in November and very little supports it yet, but that will almost certainly change over time.

With these in mind, I want to give some examples of the values I was talking about and how they interact with the PEPs I just described. I wanted to give some varied examples, so if I mention a library for one thing, I don't mean to imply that the library is bad at the rest; and if I don't mention your favorite library, that similarly doesn't mean I'm implying it's bad. I'll also not be looking into any of these tools in anywhere near the level of depth they deserve. One more thing before the disclaimers are done: unless I explicitly say otherwise, these tools are all compatible with PEPs 517 and 518, and none of them are compatible with 621 yet.

So, let's take a look at Flit. Flit is a pretty small build tool, especially compared to, for example, Poetry or Pipenv. It specializes in publishing artifacts to PyPI and not a ton else, and in my estimation that's a good thing: it keeps the CLI small and the complexity low. Flit has a really good command line interface, in my opinion. All of the commands are simple, and there are only four commands in total. Just from looking at the slide, you can already get a pretty good idea of everything Flit does and how to make it do those things. Keeping such a tool simple is a difficult task in and of itself, and I'm sure quite some work has gone into making and keeping Flit as simple as it can be. The other side of having a small interface is that it has less functionality. As I've said, my personal taste in build tools leads me in that direction, so I don't see that as a downside, but it is there. For example, Flit does not handle lock files.
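For reference, the four commands in question (as of current Flit) are these; a sketch of typical usage:

```
$ flit init     # interactively create a pyproject.toml for a new project
$ flit build    # produce a wheel and an sdist in dist/
$ flit publish  # build and upload to PyPI
$ flit install  # install the project into the current environment
```

That really is the entire CLI, which is exactly the point being made here.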
And again, I can't emphasize this enough: I don't see the lack of lock-file handling as a bad thing. Different tools excel in different situations, and that's as true for build tools as it is, for example, in woodworking. You don't use a screwdriver to cut a piece of wood in half.

The next one I want to look at is Poetry. Poetry is the other side of that coin: a tool that aims to take care of the whole stack of dependency and packaging management, including development and publishing. And it does this quite well. Personally, I like tools that I can mix and match, put together and take apart at will; I like to manage my own virtualenvs and I like to have my own REPL. You can do that with Poetry too, but it's a little bit clunky. But one aspect of Poetry is absolutely to my taste, and that's the lock file. Like I said earlier, the key to reproducibility in an application is generally a lock file. Poetry translates the set of top-level dependencies in the pyproject.toml into a lock file that specifies not only the exact version of each library, but also the expected PyPI repository, the requisite Python version, and all the other metadata you'd expect in order to pin down the exact version of a library. I wanted to put a picture of the lock file on this slide too, but unfortunately there's only so much vertical space available and lock files tend to grow extremely quickly, so you get the pyproject.toml here as well.

pip-tools is a bit of an odd case, actually, in that it exists alongside the other tools I'm talking about. pip-tools does not create build artifacts like wheels or source distributions; what it does is just the locking bits. If you give it a requirements.in file, which holds the high-level dependencies, it will compile a requirements.txt file for you with the very specific locked dependencies. The upside of this is that a repo using pip-tools does not need to change any other aspect of itself. It can use whatever metadata format, whatever other build tools, whatever process the authors or maintainers are comfortable with, and pip-tools just slots into the locking role naturally. The downside is that it's a bit piecemeal: less of a cohesive whole than, say, Poetry. Whether that's a downside at all, or an upside, will vary from person to person. Like I've said a few times, I do prefer smaller tools, but I think there's a sweet spot between pip-tools and Poetry that fits my purposes. This is the only tool I'll mention that does not support PEP 517, though I believe that's planned for the future.

The second-to-last tool I want to mention is setuptools. setuptools gets a special mention as the basis of Python packaging. Obviously the true holder of that title is distutils, but setuptools takes distutils and makes it nice to use for normal people. So yeah: yay, setuptools. setup.py, as I mentioned before, is just a Python script. This has some surface-level implications, which I mentioned earlier: you can do basically whatever you want in a setup.py and have that be part of the build. It also has more extensive implications, since a lot of other build tools use setuptools under the hood. And it has a real downside: because you can do anything as part of the build, it's potentially a very large security issue. If you're distributing source distributions, the install procedure involves running an arbitrary Python script as whoever is installing the package.
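Glancing back at pip-tools for a moment, since its flow is the clearest illustration of locking: you hand-write the high-level requirements.in, and pip-compile generates the pinned requirements.txt, transitive dependencies included. A sketch, with purely illustrative versions:

```
$ cat requirements.in
requests

$ pip-compile requirements.in

$ cat requirements.txt
certifi==2021.5.30    # via requests
idna==3.2             # via requests
requests==2.26.0      # via -r requirements.in
urllib3==1.26.6       # via requests
```

Note how the transitive dependencies from the start of the talk show up here, pinned.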
That problem is addressed with wheels, which can also be built with setuptools. So I'm by no means saying to abandon setuptools or anything; just stop distributing source distributions, and move to wheels.

Each of these tools has lessons we can take and learn from, but no tool quite made us happy. So we decided to create our own. We looked at this plethora of options, took inspiration from what we consider to be the best aspects of each, took advantage of some of the newer standards and features, and came up with Vulcan.

I've tried to broadly characterize the goals of the other tools I've looked at so far, and I'll do the same here. The intention of Vulcan is to make doing the right thing the easiest thing. You should never need some obscure combination of flags or commands to get the output you want, and it should be difficult to do the wrong thing. That's the goal. We've just finished open-sourcing it, so you can use it too if you like. It's designed to be small and simple. It has only a handful of commands, with at most one or two options each. It allows everything you could want to configure to be configured in the pyproject.toml. It lets you build wheels with dependencies taken either from the lock file or not, as you desire. And it's dead easy to use.

Vulcan can do basically three things: generate a lock file, build wheels, and do editable installs for pyproject.toml projects. Because it's PEP 517 compatible, Vulcan works with pip install on a directory, with python -m build, or with any other build front-end tool. Because it's PEP 621 compatible, it's quite future-resistant. I won't say future-proof, to avoid tempting fate, but I doubt PEP 621 will be overthrown in the near future. A Vulcan project should just slot into whatever other stack of development and deployment tooling you prefer. And because it outputs wheels, an artifact created by Vulcan, whether via vulcan build, python -m build, pip wheel, or any other command, is fully compatible with the rest of the modern Python world in a completely transparent way. And if it doesn't do everything you want, it also supports a very basic plugin mechanism for build-time plugins, if you want to, for example, generate some build info. In theory you could even use that to do things like compile C++ extensions, though I've not tried that.

I do hope you'll go take a look after this talk, and maybe install it and play around with it a bit. It comes with a conversion script to make that easy, too. If you take anything away from this talk, I want to emphasize that making this kind of tool is something you can do. I want people to try out Vulcan, and I hope you all take a look, but if it's not for you, if it doesn't do what you want, you can make one yourself that does. That's the freedom that having these standard building blocks offers. I can imagine a future where there's a build tool that specializes in building projects with compiled C++ extensions, for example. So for my taste in build tools, and for anybody who shares that preference, Vulcan will be a good tool. For others, I hope this can still provide some inspiration to make your own tool that makes you happy. So yeah: go forth, make new tools. I say this from a slightly self-serving position, in that I want to experiment with all the cool stuff you all will create. Find a niche, fill it, and make your tooling available for everyone else to experiment with.
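If you do want to build your own, the surface PEP 517 asks for is genuinely small. A skeletal backend is just a module exposing a couple of hooks with these signatures; here is a sketch (the module name is hypothetical, and the bodies are where your tool would do its actual work):

```python
# my_backend.py: a hypothetical, minimal PEP 517 build backend.
# A project would select it in pyproject.toml with:
#   [build-system]
#   requires = ["my-backend"]
#   build-backend = "my_backend"

def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
    """Build a .whl into wheel_directory and return its filename."""
    raise NotImplementedError("wheel-building logic goes here")

def build_sdist(sdist_directory, config_settings=None):
    """Build an sdist tarball into sdist_directory and return its filename."""
    raise NotImplementedError("sdist-building logic goes here")
```

Front-ends like pip and python -m build discover and call these hooks for you; beyond a few optional hooks, that's essentially the whole mandatory contract.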
And remember to be standards-compliant, so that everyone else can use it too. So, with that said, I'm open for questions or comments. I'll also hang around for a bit in the sponsor Optiver room in Matrix, if anyone wants to chat. Thanks, everyone.

We do have many, many questions; let me choose some. The first question is actually three questions. It starts with: your firm probably needs to support a wide range of applications that require different Python dependencies. But yes, no question there, I guess. Then the first part of the question is: do you let your users choose versions of key packages (pandas, Dask) or do you mandate that they use firm-approved versions only?

It depends on where in the stack we're deploying. For data analysis tools, we pretty much let people use whatever they prefer. So pandas, definitely; I've not heard of Dask, but we might use it. If an application is in a more key position, a bit further inside our infrastructure, then we do keep a close eye on what people are using and which versions they use. So it depends.

Okay, next part of the question: how often do you upgrade package versions across the multiple environments, and why?

I'm not sure I understand the question. We tend to upgrade things when we want new features, or if there's a security release; we upgrade things as we come to them.

Okay, and how do you manage multiple Python environments?

Like I said, we use shiv to isolate the different Python applications and their dependencies. If we're talking about Python versions, then we try not to: we try to keep the number of Python versions we have going at once as small as possible, so one, or two if we're migrating. We try to stay consistent there.

Okay, and the next question is: given that lock files are a luxury not everyone has, for many reasons, does anyone have a recipe for making sure the minimum versions are correct? It continues: run unit tests with the minimum versions rather than the latest compatible ones? And further: I've tried to use pip constraints files for that, pretty much a special lock file, but it's tedious, hard to update, and easy to get wrong.

Yeah, so I mentioned a tool in my talk, pip-tools, and that is exactly a tool for automating the use of pip constraints files. Basically, what pip-tools does is take your requirements.in, which is the constraints file, and give you back a requirements.txt with the specific pinned versions, and that can then integrate with whatever other process you're currently using.

Okay, next question: when you say editable installs, does that mean PEP 660? And I'd add to that: PEP 660 is the one for pyproject.toml-based, wheel-based builds.

Yeah, that's the idea. Previously, with setup.py projects, you could do pip install -e, and that would install the project as an editable install, which means that if you make code changes locally, those are reflected automatically. We lost that when we moved to pyproject.toml, for some pretty good reasons, but obviously this is a really popular feature that everyone really likes. So there's PEP 660 to bring it back. I don't know if that's approved or implemented yet, but in the meantime Vulcan hacks around it: it generates a setup.py, calls pip install -e on that, and then removes it again. It's a bit messy, but it brings the feature back, which is nice.
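For the curious, the general shape of that workaround, as a sketch only (this is not Vulcan's actual implementation, and the metadata values are hypothetical), is:

```python
# Sketch of the editable-install workaround described above: write a
# throwaway setup.py shim, run "pip install -e .", then delete the shim.
import pathlib
import subprocess
import sys

SHIM = """\
from setuptools import setup, find_packages

# A real tool would fill these in from the pyproject.toml metadata.
setup(name="my-package", version="0.1.0", packages=find_packages())
"""

shim = pathlib.Path("setup.py")
shim.write_text(SHIM)
try:
    subprocess.run([sys.executable, "-m", "pip", "install", "-e", "."], check=True)
finally:
    shim.unlink()  # remove the generated setup.py again, as described
```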
Okay, and we'll get rid of that once PEP 660 is fully done.

Okay, we have time for one more question. The description on PyPI seems to not use PEP 621 but its own namespace; is that just not done yet?

So we initially released this tool internally before PEP 621 was done, so we had a specific tool.vulcan namespace for it. The migration over to full PEP 621 is done; it's possible that we missed some of the documentation in terms of updating it to reflect that. I'll take a look at that.

Since we have a shorter session and we lost a little bit of time at the beginning, I can ask more questions. One other one that popped up was: when multiple people in a firm are working on a common library at the same time, how do you manage the versions between different branches and between different environments, Prod and UAT?

What we generally try to do is keep the Prod and UAT environments as closely in sync as we possibly can. If there is a major divergence there, then it's going to take some effort to bring those back into alignment. But we usually try to avoid that problem in the first place by keeping everything as in sync as possible on a consistent basis.

Okay, so thank you very much again for your talk, and see you around at the conference, virtually.

Yeah, thanks for having me. I'm really enjoying the conference so far. Thank you so much. Thanks.