Okay, welcome back. Second talk. This is Domen Kožar. He's been into packaging since about 2009 — I think he got introduced through a Google Summer of Code project, working on the Gentoo tooling — he's been in the Python community for about eight years, contributing to Nix for about two years, and he's talking today about the state of Python packaging and how that reflects on Nix.

Hey, everyone. I hope you have stretched, got coffee, and are awake. I'm going to talk a little bit about how terrible packaging is in Python, and how we're slowly fixing that in Nix and in the Python community. I assume you know a little bit of Nix. I assume you know almost nothing about Python. And if you have any questions during the talk, please ask — it's easier than at the end. Just raise your hand and I'll repeat the question.

So, a bit of an outline. We'll go through history, how setup.py works, what the caveats in packaging are, how buildPythonPackage works, then what the caveats in Nix are, and what's next from my perspective.

A bit about history. There was distutils, and still is, and this is how packaging looks in distutils. You have this setup function that does everything you can think of: you give it some metadata and tell it what the modules are. Then you can install that, and it will store it in a special path — more about that later. You can build C extensions. You can create a tarball. So it does everything, of course. But distutils was very basic. It was mainly meant for building and installing stuff. It doesn't know about downloads, and it doesn't know about dependencies at all. There is no way to reproduce an installation — you just run imperative commands and hope everything works. It was included into Python in 2000, so a long time ago. And then setuptools came four years later, and it's using the same interface.
In fact, it monkey-patches distutils in order to provide that support. You have this easy_install command that will fetch from the Python Package Index — that's the central repository for packages. You have this binary format, the egg, which is a zip file: it compiles all the C extensions and zips them up. And you can also use all the distutils commands. But because it monkey-patches distutils, the situation is even worse. It doesn't support uninstall. It's the worst code I've read — at least as of 2009; I haven't touched it since, because I don't want to. And in 2009 there had been no release for years. So that was a pretty depressing era in packaging. And still, we do everything from setup.py, and it handles the state of the world.

Just to show you how bad this is: easy_install, for each dependency, would fetch the description page. And then for every link in the description, it would go to that page and look for a possible tarball release. So not only going to the Package Index — it would try to crawl the Internet for more releases, which is like, what? The install times were pretty terrible. And you'll see later that this was fixed only in 2014, and there was a lot of politics around whether this kind of backwards compatibility should be kept or not. This is why everything is going slowly: once you give something to people, they don't want you to take it away from them.

Then in 2007, Ian Bicking, who is now working at Mozilla, wrote virtualenv, which is kind of like nix-shell for Python. It symlinks Python into a folder — `test` in this case — and then you can install Python packages in an isolated way.
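A rough sketch of that isolation idea, using the standard library's venv module (which later built on virtualenv's approach); the directory name is just for illustration:

```python
import os
import tempfile
import venv

# Create an isolated environment in a temporary directory, like
# `virtualenv test` would; with_pip=False keeps the example fast.
envdir = os.path.join(tempfile.mkdtemp(), "test")
venv.create(envdir, with_pip=False)

# pyvenv.cfg marks the directory as an isolated environment with its
# own site-packages, separate from the system interpreter's.
print(os.path.exists(os.path.join(envdir, "pyvenv.cfg")))
```

Packages installed into that directory's site-packages don't leak into the system Python, which is the whole point.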
And he wrote pip, which is kind of a fork of easy_install. That was a really nice move, because you want the installer to be separate from the build tool in a language ecosystem. It has a nice interface, it has phases, it can freeze versions of dependencies, and so on.

Then Tarek Ziadé, in 2008 — he also works for Mozilla — basically said, OK, we have to fix this. And again, after lots of politics, he forked setuptools and wrote distribute, tried to document things and improve that code base. But the same year he figured out the only way forward is a rewrite, which became distutils2. And that, at least as far as I know, is the first actual implementation of static metadata in Python, using a setup.cfg file. So you would have a static file that can be parsed and stored and so on, instead of evaluating setup.py every time. But in 2012 he pretty much stepped down. There's a long story around that, and it's not really that important, but the end result is that distutils2 was pretty much left alone, and distribute at least got merged back into setuptools. So people were very confused: we had setuptools, distribute, distutils2, and then things merging back and forth. A lot of effort, and some results, but still not that good.

That was the era when people actually started to talk, and they said: okay, now we have to start with a specification first and discuss things before we go and implement them. That happened in 2012, and after 2012 a lot of PEPs came — PEPs are Python Enhancement Proposals, they're pretty much how you improve Python. So there's a lot of work there, and we're really thankful to all the people working on that. Now it's a community effort, but you see that Metadata 2.0 — the static metadata format — is still a draft, and the PEPs that follow it are pretty much specific parts of that metadata.
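The point of static metadata — something you can parse without executing any code — can be sketched like this; the setup.cfg content is a hypothetical example in the style distutils2 introduced:

```python
import configparser

# A hypothetical setup.cfg in the distutils2 style.
SETUP_CFG = """\
[metadata]
name = mypackage
version = 0.1.0
summary = An example package
"""

# No setup.py is executed: the metadata is just parsed as text,
# so an index or a package manager can read it safely.
parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)
metadata = dict(parser["metadata"])
print(metadata["name"], metadata["version"])
```

Compare that with setup.py, where you have to run arbitrary code just to find out a package's name and dependencies.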
So it's going kind of slowly, but that's how we want it to be. We've tried to rush it and it failed, so rushing is not the way.

So, yeah, setup.py. This is a typical call of that function. There's not much here — I'll talk about specific parts later — but you have entry points at the bottom, where you specify what executables to generate, you have different kinds of dependencies and metadata, and that's pretty much it.

Then, how Python actually finds packages is via the PYTHONPATH. This is an example with the PBR package. You see it depends on setuptools, and on Python, of course. And if you actually run Python and print sys.path, that's what the interpreter builds from that PYTHONPATH, and you'll see it adds a bunch of Python-related paths. At the top you see an empty string — that's the current directory. So when you import a package, Python goes through sys.path from the top and tries to find a file or a folder with that name and use it. It's very simple.

But there are a lot of things that are bad. I'll just talk about some of them — we could have a few-hour talk about this topic. So we have three different dependency graphs, basically: setup requirements for build time, install requirements for run time, and extras for optional dependencies. For starters, even the names are inconsistent — sometimes they have an s at the end and sometimes not, so I always get that wrong. Another thing: the extras include testing, but there is no convention for what the testing set should be called, so some people call it test, some people call it testing, and so on. These are all things that make this really hard to parse. And we have three separate directed graphs which can contain cycles — I'll talk about why that's hard in Nix later. So — Python supports circular dependencies, of course.
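Going back to the sys.path lookup described above, it can be demonstrated directly: drop a module into a directory Python knows nothing about, then put that directory at the front of sys.path, which is what a PYTHONPATH entry amounts to. The module name and contents are made up:

```python
import os
import sys
import tempfile

# Create a directory containing a module the interpreter has never seen.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "mymodule.py"), "w") as f:
    f.write("ANSWER = 42\n")

# Entries earlier in sys.path win on name clashes, just like the ''
# (current directory) entry at the top of the list in the slide.
sys.path.insert(0, tmpdir)
import mymodule

print(mymodule.ANSWER)
```

That's the entire resolution mechanism: a linear scan of directories, first match wins.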
And that's pretty much because pip, the installer that everyone uses, runs in phases: it will first download everything, then run the build command on all the packages, then the install command on all the packages — and it doesn't run tests at all. So its phases apply to the whole package set. That's why it can handle these kinds of things. But in Nix this is problematic, because we build and install each package as one unit, and there are no easy ways to get around circular dependencies. One is to bootstrap, like we do for the standard environment and like we have to do for pip and similar packages. The other way is to do overrides in Nix: A depends on B, but in that context B is overridden so it doesn't depend back on A, and so on.

If there is one thing I want you to remember from this talk, it's this: in Python, tests are the single source of truth. If you don't run the tests, you don't know what you packaged — you packaged a bunch of things you never really ran. Because everything happens at runtime, even in our packaging the tests are the only way to really check that something works. So when you package Python stuff, most of the effort goes into making sure those tests actually execute, and if a single test fails we don't want to disable the whole test suite — we want to disable that one test and leave the others in place. But a lot of the time the source doesn't come with tests. A lot of the time people use mocking in Python, so if you upgrade a dependency there is no guarantee that it will work even though the tests pass. People try to do everything in tests, and as I said, a lot of effort goes into fixing those things. People even do things like "run this command and it should succeed within 200 milliseconds" — but when you run that on Hydra, because of the I/O load, that's not guaranteed, so a lot of the time we have to disable those tests.
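A sketch of that "disable one test, keep the rest running" approach, using the standard unittest module — the test case, test names, and skip reason are hypothetical:

```python
import unittest

class TestMyPackage(unittest.TestCase):
    def test_addition(self):
        # A stable test we want to keep running.
        self.assertEqual(1 + 1, 2)

    @unittest.skip("timing-sensitive; unreliable under Hydra's I/O load")
    def test_finishes_quickly(self):
        # The "must succeed within 200 ms" style assertion being disabled.
        self.fail("would depend on machine load")

# Run the suite: the flaky test is skipped, the rest still execute.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyPackage)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, len(result.skipped))
```

In Nixpkgs the same effect is usually achieved by patching the test file or passing a deselect flag to the test runner, but the principle is the same: surgically disable one test, never the whole suite.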
Also, on Python versioning: we currently support 2.7, 3.3, 3.4 and 3.5, and I think PyPy too. So, a lot of versions to support, and Nix has to support all of them. The way a package declares what's supported in its metadata is with classifiers, but the problem is that no one uses them, or they're outdated, so nobody is using that information to automate anything — to say, okay, I'm going to gate the build based on these classifiers. That feedback loop doesn't exist, at least I haven't seen it. So the only way is to really look up the documentation and specify by hand which Python versions are supported. It's not too much work, but we have to do it for every package.

The next thing is the manifest file, which is kind of important in packaging. It basically specifies which files to include besides the Python files — icons, or anything else your program uses. Usually you would just create a package and say `graft mypackage`, and everything under that directory gets included into the tarball. But people don't want to maintain that file, because a lot of the time they'll forget to add something, they'll deploy, and an icon will be missing. So they made this package that a lot of people use, setuptools-git, which uses git to list all the files in the repository to know what belongs to the project. But that means we depend on the .git folder at build and install time, and we know that in Nix that's a problem because it's not deterministic. So that's sometimes a bit of a pain — something to be careful about.

And just so you get an idea of what can be done in setup.py, people do a lot of stuff like this: they check the current interpreter's version and conditionally include argparse. This is because argparse was added in Python 2.7, so they declare a dependency on argparse for Python 2.6,
and this is problematic because if you run setup.py with different versions of Python, you get different metadata, which makes things even more complicated. Or they do things like this: try to import a module, and depend on it only if the import fails. PEP 496, which is from this year, actually adds environment markers, so for every dependency you can state under what environment it applies. So we're progressing — it's not that bad.

And just to give you an idea of how bad this can get: Pillow is a Python imaging library, and its setup.py is almost 1000 lines. It tries to run brew, tries to find the prefix from brew, and if it's there it adds some paths where it should find stuff, and so on. And it has hard-coded stuff for different Linux distributions for finding things. Anything can be in setup.py, and really, I don't see many contributions fixing this. It's something we should improve and something we should talk about, but that's the current state.

All right, enough about that. Now we move on to how Nix supports Python and what can be done to improve it. I'm going to go through the whole buildPythonPackage source, really quickly, because it's very small. Maybe it will scare you away, but maybe it will give you a bit of insight into what's going on — and hopefully we'll get more contributors. I skipped the first lambda with the dependencies, because I'll explain all the arguments throughout the source.

First, we throw an error if the disabled flag is set — that's when a specific version of Python is not supported. It's a little bit of an abuse of Nix, but I think it's nice, because it gives you the name of the package and which version it's not supported on, so it's user friendly. All right, then we define the Python mkDerivation. We remove the disabled flag, because we don't need it anymore. We inherit doCheck, and doCheck is always true in Python — that's the most important thing. We have the name prefix, so it
will prepend the Python name and version to every package name. We add a bunch of helpers; we have a line that automatically adds unzip if the source is a zip file. And then we add the propagatedBuildInputs, with setuptools always included.

All right, then the configure phase. Here we export DETERMINISTIC_BUILD — this makes sure that the compiled Python files don't include timestamps. It's there so you can disable it, but I guess nobody will ever do that. And then here comes the fun part: we basically have to import setuptools before distutils, because setuptools monkey-patches distutils, so it has to be imported first. So yeah, that's how we do it. There are better ways to do it — by first importing setuptools and then evaluating setup.py — but that's how we currently do it, and it kind of works. And people do the same thing, by the way, if you look at sources in the wild.

Okay, then the check phase — this is just setup.py test. The build phase runs setup.py build with a bunch of flags, and all the hooks. And then the install function — that's kind of the meat of it. It creates site-packages, which is where Python packages are installed, it exports the PYTHONPATH, and then calls setup.py install with a bunch of flags. Then it moves and renames a few files: if you have easy-install.pth in every package, it will conflict when you install two packages into the same environment, so we move it and rename it. And there is one flag you see there, --old-and-unmanageable, that I'll talk about a bit later — that's something we have to get rid of.

All right, then the fixup phase. This is where we wrap all the Python executables in bin with the propagatedBuildInputs, basically, because they expect those Python packages to be available. And then we use these .pth files, and I'll talk
about those a bit later — that's what this snippet does. And then we have this shellHook: if there is a setup.py file, it will run setup.py develop, so it actually enables you to develop Python software and have it installed and importable. It's not that nice, but it works really well.

Oh, I forgot to explain the .pth files. This is kind of a hack in Python: if you point the PYTHONPATH at just one package — say, Django — Python will use these .pth files to discover all the other packages, recursively, so you don't have to always specify the whole dependency tree. That makes the PYTHONPATH entries kind of self-discovering. I didn't write this, and I don't know if it's a good idea, but that's what we currently have.

So, caveats. We don't support namespace packages, because of that --old-and-unmanageable flag. You will see things like the __init__.py files colliding. What's really going on is that setuptools has this feature where, in an __init__.py file, you can declare a namespace, so that it knows the logilab package — the logilab folder — can come from different packages. Which is a terrible idea in my opinion, but it's a feature people use; it's not encouraged anymore, but there are still packages out there that do it. The way to fix that currently is to just ignore the collisions: those files just merge, and it should work more or less. But we are going to support this properly with wheels, which I'll talk about a little later.

Okay, here's another thing — and not something only Python has a problem with: if you have two different versions of a package in the graph, the first one on the Python path will be used and no conflict is reported. And again, that's because of the --old-and-unmanageable flag.
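The .pth mechanism mentioned above can be sketched with the standard site module; the directories here are temporary stand-ins for store paths:

```python
import os
import site
import sys
import tempfile

# One directory plays the role of a package's site-packages; the .pth
# file inside it names another directory to pull onto sys.path.
sitedir = tempfile.mkdtemp()
extra = tempfile.mkdtemp()
with open(os.path.join(sitedir, "extra.pth"), "w") as f:
    f.write(extra + "\n")

# Processing the site dir reads the .pth file and extends sys.path —
# this is how pointing at one entry can recursively pull in others.
site.addsitedir(sitedir)
print(sitedir in sys.path, extra in sys.path)
```

So a single PYTHONPATH entry can fan out into the whole dependency closure at interpreter startup, which is exactly the runtime cost discussed in the Q&A later.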
And again, this will be improved with wheel support, so that this gets detected at install time.

Then there's imperative package management. If you install numpy into your profile and then run python and import it, you'll get that it's not available. That's because in profiles we don't populate the PYTHONPATH — you have to specify it yourself. And this is up for discussion, whether it's a good idea or not. I think that if we pollute the PYTHONPATH in profiles, it's a bit problematic: whatever you do, Python will also find those packages, so if you're not using a chroot or something to build stuff, you're going to leak packages, and a Python build won't detect that it's missing a dependency, and so on. I have an opinion about this, but I'm easy — if somebody wants to convince me, I'm open for discussion.

All right, so now a little bit about the problems that come up when you package stuff. One of them is that Python 3 doesn't ship a `python` binary, but other packages assume there is one, so currently we just replace `python` with the full path to the interpreter where that's needed. Then we have cases where the tests actually depend on the installed files, and because the check phase comes before the install phase in Nix, this is a bit problematic. But you can skip the check phase and use the installCheckPhase, which comes after install, and then you can set up the path and run the tests there — that happens quite a lot.

Then there's another thing: sometimes the interpreter already ships with a library. For example, enum comes with Python 3.4, so you only want to include the backport before that. And it's problematic, because you get very weird errors if you don't do it — people don't expect you to install it on 3.4, for
example, and you'll see it's just not supported there. Those are the things that are really hard, because you get a very weird error message if you get them wrong. Something similar goes for PyPy, which is an alternative implementation of Python, written in Python itself. It ships with the cffi library, so if you add cffi as a dependency, PyPy will pick it up instead of using its internal one, and you'll get conflicts, because the released version might be different from the one that ships with PyPy — and again, very weird errors.

Yeah — so the question is, can we do this more automatically? And yes, of course: in buildPythonPackage we could say, if the interpreter is PyPy, skip these dependencies. Well, it's kind of questionable whether to use that technique: in buildInputs it will work, but not if you, for example, use the package in string substitutions. That's one of the problems if you use null: if you then use that package in a string, you get a confusing error message. So I never quite know whether to use that technique — like for the Darwin stuff, iconv and such. That's why I prefer not to do it, but we could filter it out or something; it's definitely something that can be improved in the Python packaging.

There's another thing: when Python reads files from the file system, it does that based on the locale. So a lot of the time you'll get a Unicode error in the tests, and then you have to do something like this — depend on locales and export the locale variables. And a lot of the time people hard-code an exact dependency version; most of the time the solution is to un-hard-code it, but not always — it depends on what they're trying to do, of course.

All right. CPython ships with some
built-in modules that we package separately, because, for example, tkinter depends on X, and you don't want the whole X dependency tree. So those are packaged separately, and then you just use python.modules.curses and so on. The tricky part is that we build Python 3 as a whole, while Python 2 is modularized, so that's something to improve there too. You have to be careful which versions you use — and in Python 3 all those modules are actually defined as null.

All right, so: lots of stuff to improve. I haven't had too much time, but at least we're slowly getting somewhere in Nix too. So here's something I'm going to work on at the sprints, and if somebody is interested in helping out, I think together a prototype shouldn't take more than a day or two — it's going to be a bit more work to fix all the packages that break.

So what are wheels? A wheel is basically the next-generation egg, and it's a well-defined, documented standard in Python: when we build the package, we put it into a zip file, and the standard specifies what everything is and where — where the metadata is, where the stuff that should be installed goes, the scripts, and so on. In other words, this is what will effectively change. Why we want to do this: this is where most of the improvements in Python packaging are happening, so we want to use the latest upstream, where we get all the goodies, and as I said, this will fix a lot of bugs.

And some day we probably want to generate packages automatically, but because there is no static metadata, that's really hard. So this is one part where we could research how much metadata we can extract into a central repository or something, so that it's easier to then generate all the packages from that static metadata. But as you saw before, it's really hard to do, it's going to need a lot of work, and
mostly we'll have to work with upstream — so that's a lot of effort, contrary to Haskell or something, where they have only static files and can just use those. And I think the Haskell API in Nix is really nice now — not perfect, but way better than Python's — so I think we should move in that direction with Python too. In general, for all the languages we have, we should try to have a similar kind of API, so that not everyone is doing their own thing. I think Haskell is currently the most advanced and the simplest at the same time, so that's something to aim for.

So yeah, that's a quick overview. At the bottom there are two links. The first one is the guide about Python packaging in general — it's very well written, and it has the latest recommended tools and ways to package. The second one is a bit of a reference manual I wrote for Python in Nix; it's not perfect, so feel free to tell me what's missing and I'll do my best. And the first line is just from Python — it's actually one of the guidelines — so I don't know what it means. Yeah, thanks. Questions?

Okay, so one thing that's not completely clear to me: you mentioned the wheel format, and you also mentioned that static descriptions would be good. Does the wheel format still use setup.py, or does it have a static description?

Well, kind of both. It still uses setup.py, but it will generate the static metadata and put it into the wheel. So you only have that once you've built it — and that's a bit too late for us to use. It's meant more for people building packages and trying to figure out stuff later on. The static metadata is something that's still in draft mode and currently not used anywhere. The OpenStack people have their own pbr package, which does that, and
the effort is very fragmented, and we cannot use it. So again, we'll have to do it our own way somehow.

So it wouldn't really be possible to take a wheel file and generate a Nix expression out of it?

That's actually what Florian tried last year, and yeah, that's one way to do it. But the problem is, as I said before: if you run setup.py with different Python versions, you get different dependency graphs, so it's not that simple. That's just one of the problems. We still have to run the script to generate the metadata, and it's dynamic, based on the environment and so on — so you might not get everything you need from that.

Next question, from Rok: there is one problem you forgot to mention, especially if you package any command-line tool. Because of this recursive thing, the Python path is actually computed at runtime, so a command-line utility runs a lot slower — it needs to figure out the whole Python path first, and that's quite a performance hit. At build time you already know the Python path and all the dependencies — why not just set it, since you know about it, instead of this recursive thing? That's not how Python should work; it should be removed.

Yeah, I agree. The .pth hack is Nix-specific, it's a hack, and it has drawbacks. If you asked me, I would remove it, but that's something we should talk about too.

More questions? — No, it's all clear. — That's good.

So, you said that quite often in these Python packages some metadata — like the compatibility with specific Python versions — is incorrect or incomplete, right? So does Python do any continuous builds? I mean, what's their CPAN called — is it PyPI?

Yeah — no, there isn't any. They
actually don't run builds, because setup.py is a script — they specifically don't run setup.py, so the index doesn't even know about dependencies. They don't want to run arbitrary code on their servers, so it's pretty much just a description there, and that's it. And it's problematic because of how you generate the whole dependency tree: if you depend on Django, you fetch Django, you run its setup.py, then you know the first list of dependencies, then you have to download all those packages and run their setup.py to get more. So you discover the dependency tree by running these setup.py files. That's why we want to go to static metadata, but it's a lot of effort, and a lot of people don't want that, because of course it will break stuff, and so on.

So I guess we should sort of infiltrate PyPI and provide Hydra as a continuous build service for PyPI? Yeah — once Python developers depend on that, they will automatically fix their own bugs and we don't have to do it. I was actually talking to Nick from Red Hat, and they're maybe interested in sponsoring that work, but we've been too busy so far — but maybe, yeah, that's something we could do, and it would fix a lot of Python packaging stuff. The problem is where this static metadata lives: it's good if it goes upstream. We could solve a lot of problems on our end, but that wouldn't go upstream, and that's kind of problematic, because upstream in Python — it's not as bad as in JavaScript, where releases happen constantly, but it's still pretty fast — you don't want to maintain all that in the Nix ecosystem alone. It has to be a community effort. So there is a lot
of work to be done in this area, actually, to come to sanity.

Yeah, one more thing — thanks for the talk. It still sounds like the Python situation in general is a little bit of a mess, despite all the improvements of the last couple of years. Seeing that in the Haskell community the packaging situation in their ecosystem has driven a lot of people towards Nix, do you see any movement of Pythonistas towards Nix for similar reasons?

Yeah. In Python, unfortunately — well, for us — there is the conda package manager, built in Python, which is pretty much a clone of Nix written in Python. And I've talked to the original developer — when I was at the Python conference I asked, why split this effort? — and they basically said that Nix would only support Windows through Cygwin, which is a bad idea, and even we don't claim that as official support. What conda does differently is they have separate specifications for Windows and Unix, so they maintain both at the same time and Windows gets native support. Of course, the same could be done in Nix if Nix supported Windows natively, but unfortunately they're doing it their own way, and they even got a big investment recently. I think in the end we have the better ecosystem, but it's going to take more time to show that, basically, and to convince people.

Okay, any further questions? We would still have five minutes. Otherwise, I think we can start the break and see each other again at — I think it must be 11:15, yes, I guess that's the right number. So thanks a lot, see you soon.