Good morning everyone, and thanks a lot for coming to this session. I'm Pradyun, and this is my first technical talk at a conference, so I'm nervous. So: Python packaging. Where are we, and where are we headed?

Who am I? I'm Pradyun. I'm a member of the Python Packaging Authority. I'm a maintainer of pip, virtualenv, packaging, and a bunch more things. I'm a Python Software Foundation fellow, and, perhaps most importantly, I'm a college student, which means I had a lot of time to do these things.

With my introduction out of the way, let's get into it. Where are we? That's the first part of the talk, and we're going to try to answer two questions: what's the state of Python packaging today, and what tooling do we have today, and what does each tool do? These are good, and fairly loaded, questions, so let's take a moment to build ourselves a vocabulary and a framework for answering them. I am not going to cover how we got here; that's something Dustin Ingram covered excellently in a talk which is linked here. These slides are available, and you'll see links to them at the bottom of a few slides. So that's not what I'm going to do. What I'm going to do is cover where we are today.

The question I'm going to cover right now is: what's packaging? And I'm going to start with a very simple real-world use case. Say you're moving house, or you're going to courier something. You're familiar with how you package stuff, right? If I have a sheet of paper that I want to take somewhere else, I might put it in some sort of package, possibly a folder, possibly rolled up in my bag, and then take that package and move it. On the other side, I just take the thing back out of the package. That's straightforward, and it really scales well: if I have a lot of packages, I'll just dump them into a container or a ship and take them a long distance away. But we're not moving bananas here.
We're here at a software conference, so we're going to talk about software distribution. And, well, it looks pretty much the same. The main difference is on the other side: we don't want the exact thing we put into the system, we want working software. We start with our source code, we manipulate it at some point in this process, and we want working software at the other end. It turns out this is basically an unsolved problem, and I'm not the only one saying that; there are more people saying it. It's really difficult because there are a lot of things in that transformation that affect what the result looks like: everything from which OS you're on, all the way down to the little nitty-gritty details of the files you have locally on that machine.

It turns out you can make this problem a little easier to solve by constraining yourself, which is what basically everybody does. That's why you have programming-language-specific package managers and OS-level package managers: they constrain the software distribution problem to be smaller and easier to deal with. But yeah, the general case is too tricky.

You might have noticed that between these slides, where I was talking about packaging, I changed from "package" to "distribution". The reason is that in Python, "package" is a very loaded term with multiple meanings. You import numpy, and numpy is a package; but you also download a package from PyPI, which is different from what you have when you import numpy. Naming things is hard, so I'm just going to call the thing you download a distribution, to keep our lives easy.

Now, in this overly simplified model, I'm going to create two kinds of people: publishers, who have the source code and want to give it to other people, and users, who want to get working software and use it. What do these personalities do?
Well, the publisher has the source code to start with. They can build a distribution from it. Note the nomenclature I'm using, because these words are going to be the vocabulary we use to keep track of things.

Cool, I built a distribution. To build it, you need a process: a build process, or build mechanism. And you have an environment you build it in. By "environment" I mean anything that's not the source code itself: it could be your laptop, it could be some CI system you have, it could be some VM running in a data center somewhere. It's everything external to the source code that determines what the distribution looks like. And that means the distribution is determined by what's available where you build things.

Now we have the distribution, and the next step is to start moving it, getting it to a place where the user can get it. I'm going to call that "upload", because in most cases you are literally going to upload the file to some intermediate server, and that's where the user is going to get it from. This also has some mechanism or protocol: it could be SSH, it could be HTTPS, it could be something else entirely. But usually you want a single file, plus some information about that file, possibly contained within the file itself.

So we have the software somewhere. Now we're moving it over to the user. The user needs to get the distribution that we built, so we're going to download it: upload, download. Similar to how uploading has a mechanism, so does downloading. But there's an extra step here: you have to choose what you're going to download, because in most cases (software is difficult) we make bug-fix releases and feature releases, and you end up with multiple releases of a single thing.
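As a toy sketch of that "choose what to download" step, here's my own simplified illustration. It is not how pip actually resolves versions (real tools implement PEP 440 semantics, usually via the packaging library); it only handles plain X.Y.Z version strings and a minimum bound:

```python
def parse(version):
    # Turn "2.24.0" into (2, 24, 0) so versions compare numerically.
    return tuple(int(part) for part in version.split("."))

def choose(available, minimum):
    # Pick the newest available release that satisfies the bound.
    candidates = [v for v in available if parse(v) >= parse(minimum)]
    return max(candidates, key=parse) if candidates else None

releases = ["2.9.2", "2.23.0", "2.24.0", "2.10.0"]
print(choose(releases, "2.10.0"))  # prints: 2.24.0
```

Real version selection also has to deal with pre-releases, post-releases, and epochs, which is exactly why it's standardized in PEP 440 rather than reinvented per tool.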
I can get any version of, say, requests, as long as it's available and published. That's what the second step here is: choosing what you want.

Now you've downloaded whichever distribution you decided on. How do I make this into a working bit of software? Well, you install it, and this has two parts as well: the install mechanism, which is how you go from the distribution to working software, and the environment, meaning where you are doing this. Am I doing this on the final machine? You should be.

Overall, this is what the flow for packaging and distributing software mostly looks like: the publisher builds the source code into a distribution, which gets uploaded, then downloaded by the user on the other end, and then the user installs from it. All the words on this slide should make sense by now; we're going to use them a lot.

There's another way to look at these things. If you look at the upper half of the diagram, all you're doing is taking the distribution from one place to the other. These two steps are closely related, and often what upload mechanism you have determines what you can do in the download mechanism. This part is actually fairly straightforward: we know how to move files across a network or between computers. It has quirks around making sure you get the exact data you wanted, making sure that what the publisher published is what the user is downloading, but in general we have solutions for this.

The other two steps, on the other hand, are tricky, because they have environments. These environments, these details external to the code itself, are where a lot of the complexity in software distribution comes from. It's dealing with the fact that the environments you are working with may differ in a way you care about, and when there is such a difference, things break.

So we constrain our general problem of "how do we distribute software?" down to, say, how do I distribute Python software?
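That integrity quirk, making sure what the publisher published is what the user downloaded, is usually handled by comparing cryptographic hashes of the file; pip, for instance, can verify hashes listed in a requirements file. A minimal sketch with made-up data:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    # Hex digest of the given bytes, as published alongside the file.
    return hashlib.sha256(data).hexdigest()

published = b"pretend these are the bytes of a distribution file"
expected = sha256_of(published)   # advertised by the publisher / index

downloaded = published            # what arrived over the network
assert sha256_of(downloaded) == expected, "corrupted or tampered download"
print("hashes match")             # prints: hashes match
```

If even one byte differs in transit, the digests won't match and the installer can refuse the file instead of installing something the publisher never shipped.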
Or how do I distribute Windows software, or how do I distribute Ubuntu software? You're constraining yourself in a manner that lets you design a distribution format that makes things easier to work with.

Dealing with environments for Python is not something I'm going to go into. There's a talk, a blog post, and an overview of Python packaging that have all covered this before. I will touch upon some of these things, but those are the links you want if you want more detail.

Now, with that overview done, let's talk about PyPA tooling: Python Packaging Authority tooling. These are tools that I work on, along with a bunch of other interesting people.

PyPA tooling is mostly built on top of standards. I say "mostly" because there are a few things that don't have a standard yet, but we want to standardize them. Why do we do it this way? Because we want these tools to be interoperable not just among themselves, but with the broader community as a whole. There are going to be Linux redistributors, and redistributors of basically any kind, who want to take these distributions or tooling outputs as inputs to their systems and do stuff with them. This lets us have one mechanism to do our stuff, and then other people can take that as input and transform it into their distribution format, for their constrained way of solving software distribution. In an ideal world, this means each tool that the PyPA builds is replaceable by somebody who is not a member of the PyPA, which is something we'd love to see, and we are seeing it as well.

Now, the PyPA makes two kinds of standards. One is package index interfaces: how you interact with the Python Package Index, or really any package index for Python, which may not be pypi.org. One example of this is devpi, which you should probably use internally if you care about having a constrained PyPI where not everybody can... well, where not everything is
accessible to everyone. There are standards for the download side of things, but the upload side isn't very well standardized, so that's one area for us to improve. The other kind is package distribution metadata. This bit, the tricky bit, is where we have a few standards, and I'll get into those later in the talk.

Talking about standardization: we went too far once. This was before I was involved, but it's something you'll see referenced a lot: PEP 426, or metadata 2.0. This was an effort at standardization that was trying to solve all of Python packaging's problems. It never really became a reality. It was not feasible, and the reason it was not feasible is that all of Python packaging today, or most of it, is volunteer driven. There was very little or no funded developer time on these tools until very recently, and even that is not all-encompassing. So since PEP 426 we have moved to a much softer approach: make incremental improvements, one bit at a time, so that eventually we get to a point where everything is standardized and everything is interoperable. We're not there yet, but we're working towards it.

Now, coming back to our diagrams: what does the publisher side of things look like today? Well, you have source code, and you build it. Right now you have a build backend. It's usually setuptools, but recently we have standards which let you replace setuptools with something else, which is why I say "build backend" here: PEP 517 and PEP 518. Don't worry too much about the numbers.
I'm just going to call these modern source distributions. These have a file called pyproject.toml which defines how you're going to build things. It's very useful, because now setuptools and all of its legacy... I won't say it can be done away with, but not every user has to deal with it, because you can pick an alternative for building things.

On the other end, uploading. Uploading is fairly straightforward: you'd use twine for it today. If you're using setup.py upload, don't. Twine is the tool for uploading packages to PyPI, pypi.org. It's a relatively small tool: it takes a file and uploads it to the right URL. Occasionally you'd want to run twine check to make sure that what you're uploading actually looks right and works well with PyPI.

On the other side, for the user, pip is the main thing for getting packages from PyPI. I'll get into more details later, but in summary, pip handles all the download parts and all the installation parts. The one thing it's not good at is environment management, which is why virtualenv was developed. Python and pip, as two components, are not really good at isolating you from the global system, so folks realized it would be a good idea to have isolation, and that's what virtualenv was. Basically every modern Python also has venv in the standard library, which is a redesigned but essentially identical take on virtual environments. It's lighter, and it's usually better.

But this picture is too straightforward; that's not what things actually look like.
This is what things look like. We'd have to depart a long way from those pretty boxes to go into the internals of pip, but we're not going to do that. And this complexity is not bad; it's inherent, given that environments are difficult to deal with. A lot of it comes from the fact that there are source distributions, which are very common in the Python community, and there are wheels. Source distributions have to be converted into wheels, because we don't know anything about source distributions, but we do know things about wheels. This complexity is essentially there for dealing with that. But a lot of it isn't really necessary for working with these tools: as long as you know the simple model, you have a good overview of what the tooling looks like.

But this assumes you're using PyPA tooling, right? There is non-PyPA tooling as well. Pipenv is not non-PyPA tooling today, but initially it was. What Pipenv does is essentially take you from using pip and virtualenv separately to using a single tool. This is super useful for web applications, where you don't really have too many complicated C dependencies, and by having a lock file you get a reproducible environment.

Oops, I missed a slide. Let's talk about build backends. I was talking about how we now have a standard for build backends; well, there's flit. flit simplifies the build experience. It's a non-PyPA tool, built by a pretty awesome person, and it simplifies a lot of things by making assumptions, and by making choices for the user so that the user does not have to deal with them. Functionally, all you need to do is write your version in your __init__ file and have some metadata in the pyproject.toml file, and you can publish. It's very straightforward, and it's wonderful that this exists. Now, over on the user side.
There's pex. pex is a library and a tool for building Python executables; it's called pex for the extension it gives those files. Essentially, it takes the Python application and the libraries that you want to ship, and crafts a fancy environment for them, such that when you want to deploy the application, all you have to do is copy a single file or a single directory. It's just a single call on the shell.

Then there's pipx as well. This tool is also for end users, and what it does is create an isolated environment for every single tool that you want to use.

Then there's poetry. Poetry tries to be a single tool that does everything. I have opinions on this, and I will keep them to myself, but this is a very smart approach when you are able to take a cohesive view of things, and clearly people love this tool. Really, this is where we want to get to as well. The PyPA's tooling is not at a point where it's as cohesive as that, but there are inherent issues with taking a cohesive view of things, one of them being that you end up with a monolith that is very difficult to work with.

But there's another approach here: conda. Conda is excellent, in my opinion, for the same reasons that poetry is excellent, but conda takes it a step further. Instead of constraining itself to Python, it constrains itself to a conda environment, which can contain a lot more than Python. By defining their environment in a way that's pretty smart, they essentially have a cross-platform, language-agnostic package management system. This resolves a lot of the environment management issues you have with Python, such as the C dependencies for NumPy and SciPy. A lot of the scientific Python community depended on C libraries, understandably so, and conda was developed by a few of them to resolve exactly that problem.

All of this stuff is really hard, though. This is a quote from Paul, from when we were trying to
standardize one of these things. And it really is hard: so many of these details were not completely understood, or clearly documented in a manner that works well for us.

But enough about what we're up to right now. Where are we going? Where is Python packaging headed? Well, I don't have a time machine, and there's no concrete roadmap; this is stuff done by a bunch of volunteers. But there are things we want to do soonish, things we're motivated to do, and these are clearly preferred over the others.

These are the ongoing improvements. We want to make sure that PyPI, in the case of a compromise, cannot be taken away from the users, and that nobody can ship malicious code if that's not what the publisher published. There are newer manylinux standards; manylinux is a very neat approach to solving the environment problem, where we basically say: these are the expectations we have from a system, and if the system matches them, hey, you can install this wheel. There's yanking packages from PyPI: this is something we want for things like security vulnerabilities. If there's a vulnerable version of, say, requests, we'd want the requests maintainers to be able to yank that release, so that pip and other packaging tooling does not install it by default. And there's better licensing metadata: Python packaging as a whole does not have good metadata on what the license of a given package is, and this is an effort to fix that.

There are more things we want to do. We want structured lock files in pip. We want pip to be a little better at handling environments. We want to give publishers an easier experience when publishing packages to PyPI, because right now you push it and that's it, but we'd want it to essentially be a lot simpler.
You'd push it, check that everything works, and then publish it. We want security notifications for vulnerable packages: if you're using a vulnerable package from PyPI, you should know about it. GitHub did a lot of work on this front and it's gotten better, but it's still something we can improve a lot. And there's the pip dependency resolver, a thing that's been broken for a long time. It's something I have personally worked on, and it's a lot trickier than I expected it to be, but yes, this is also something we're going to fix in the future, hopefully.

We want a better user experience for our end users, on the user side as well as the publisher side. This amounts to better defaults in pip, and feature flags on PyPI for making things easier to roll out. There's interoperability testing: making sure that all of the popular tooling we have works together well, and that a new release of, say, pip does not break tox or setuptools or any of the other packaging tooling that we use. And we want a professional UX review of these tools, because we're mostly just general software developers, and someone who does UX day in, day out would be able to spot things a lot better than we would.

One very important thing we really want to do, to make all of this happen, is reduce technical debt. A lot of these tools have been built over time by volunteers with limited time, and that's a very good recipe for technical debt. For anyone who's not sure what technical debt means: it's basically when you do something quickly rather than properly, and if you do things quickly too many times, the next thing becomes a lot slower. That's a weird way to put it, but we're going with it.

So: we want to rewrite virtualenv. Not change any end-user functionality, but rewrite it, because at this point it's accumulated so much technical debt that it's difficult to work with as a maintainer. We want to
transition away from distutils, which is sort of what the old standard for Python packaging was, to setuptools, so that everyone gets the new shiny things. We want to clean up pip's build logic: remember that convoluted graph I showed you? The cleaner version is an ideal-world scenario, and that's not where we are today. And we want to clean up pip's installation schemes, so that it doesn't install to your system when it's not supposed to, doesn't break your Ubuntu OS, and stuff like that, which would be really nice for end users.

We also want some more standards, because we're still not at the point where you can replace any tool at any time. We want editable installations to be standardized, we want more powerful extras, and we want less ambiguity around licensing.

How do we get there, though? Well, that's tricky, right? All of the projects I listed are fundable, targeted projects that are listed on the Python wiki. You can volunteer to help us actually fix these; we could use the help. And it would help us if you told us how you deal with these interesting issues, because more information about how users work with these tools helps us solve these problems better. Thank you.