Okay, good morning. Thank you for coming. Please welcome to the stage engineer Domen Kožar. — Hi everyone. Welcome to EuroPython. I'm really excited to be here for yet another year. Just before I start my talk, I'd like to say a little bit about myself, so you'll better understand the context of it. I've been interested in software distribution basically since I was a student, when I was using Gentoo and developing, as a Google Summer of Code project, a tool to package Python software automatically for the Gentoo platform. And for the last three years I've been working on NixOS. It's a Linux distribution, you've probably heard of it. I'm tackling the problem of how to distribute all those packages to people and make them easy to use, and it turns out it's not easy. So I'll talk about how Haskell does it, how that compares to Python, what we can learn, and which things we already know but just can't get to, because it's complicated, because of our legacy. Currently I'm working for a company called Snabb. We're doing open source networking software, and I'm an infrastructure engineer, so I'm setting up the whole pipeline for testing and benchmarking it. So, mypy, right? We got types in Python, so clearly we are improving Python even though it's more than 25 years old, and Haskell is definitely an inspiration here. So clearly there are things to improve and learn from. Let's start with how Haskell does packaging. Their tool is called Cabal, and you would have a file like this. It's a special kind of syntax. At the top you see some metadata about the package, and below that you can say: okay, my software is a library, but there is also an executable, and it has these dependencies, it lives in this source directory, and so on. One thing you'll notice compared to Python is that this is just a file you can parse, whereas in Python we have a script you have to run for anything to actually happen.
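As a rough sketch of what such a cabal file looks like (the package and module names here are invented for illustration, not taken from the slide):

```cabal
-- Static metadata at the top, then one stanza per build product.
name:                my-package
version:             0.1.0.0
license:             BSD3
build-type:          Simple
cabal-version:       >=1.10

library
  hs-source-dirs:      src
  exposed-modules:     MyPackage
  build-depends:       base >=4 && <5
  default-language:    Haskell2010

executable my-package-exe
  hs-source-dirs:      app
  main-is:             Main.hs
  build-depends:       base, my-package
  default-language:    Haskell2010
```

The whole file is declarative data: any tool can parse it and recover the dependencies without executing anything.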
I'll dive into why that's a big difference later, and how it affects pretty much everyone. If you think about the API: in Haskell you parse this file and get the metadata back. In Python, the API is the setup() function, which does everything, like literally everything. So the Haskell format is more approachable, and we'll see that a bit later. One thing, if you were careful enough, you noticed this build-type line in the file. If it's specified as Simple, that means you can parse the file and you have all the information you need to install that package. But you can also say build-type Make or build-type Custom. In the case of Make, it will run the makefiles and skip the Haskell build process, and in the case of Custom, it will run a Haskell program with specific hooks where you can plug in your own code. So you have the power to go from very simple to overriding everything. The Custom method is not used much, because it's very poorly documented, but that's also a good thing, because then people fall back to Simple. In Python we have PEP 518, which I think is not accepted yet, but it basically describes how to replace the setuptools build process: you can define your own build process. This is in progress, and you'll be able to not even touch the setuptools machinery and do whatever you want. You'll have the freedom to, for example, write a makefile backend for Python packaging, and of course this will be integrated into the PEP and all the tools, which is really nice, because finally we'll be able to move forward from the legacy we've been stuck with. Now a little bit about advanced features in Cabal. In Haskell you can say: okay, I want a flag that you can toggle. For example, if we have a flag called debug, we can describe it, provide a default, and then throughout the file we can write conditionals: if this flag is enabled, then this option is configured, and so on.
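The PEP 518 idea mentioned above boils down to a small static file. This is how it ended up being standardized (note that the `requires` key is what PEP 518 itself defines; selecting the backend came slightly later, in PEP 517):

```toml
[build-system]
# PEP 518: what must be installed before the build can even start
requires = ["setuptools", "wheel"]
# choosing the backend itself was specified later, in PEP 517
build-backend = "setuptools.build_meta"
```

Like the cabal file, this is data a tool can read without running any project code.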
So it's a very simple language, just if statements and nothing more. And this gives you the flexibility of saying, for example, if you have a library: do we want HTTPS support or not? But there are downsides in Haskell too. For example, at runtime, once the package is compiled, there is no way to know which flags were used. You just don't know. Also, you can say: if the HTTPS flag is enabled, then add these dependencies. But it also works the other way around: if for some reason those dependencies are already in the environment, that flag will be enabled by default. So there is some magic, and they have problems too. One thing you learn in packaging is that features are really problematic: once you start introducing them you have to support them, and these kinds of things are really, really painful in the long run. In Python we have PEP 508, which is environment markers. For example, you can say: this dependency applies only on Python 3 and only on Windows, and so on. This is already supported, but not many people are using it because they don't know about it. The idea is that you don't write imperative Python code saying "if we are on Windows"; you just declare: okay, this dependency, and the marker is Windows, and you're done. And this gives everyone else the possibility to get this information too, to parse the marker and do something with it. I'll talk later about what we are doing with that. So, Hackage is the Haskell equivalent of the Python Package Index. You publish your packages there and people can download them. But just as an example of a feature that's really painful to support in the long run: on Hackage you can edit the cabal files in place, through the website. That means if you release version 0.1, for example, somebody can go and edit that cabal file and remove a dependency. And then it's not really 0.1 anymore. It's a whole new thing. It's slightly modified, but it's still not the same thing.
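The PEP 508 markers just described look like this in a requirements file (the package names here are illustrative):

```
requests
pywin32; sys_platform == "win32"
typing; python_version < "3.5"
```

The part after the semicolon is the environment marker: pywin32 is only pulled in on Windows, and the typing backport only on interpreters older than 3.5. Because the condition is declared rather than coded, any tool can read it.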
In that case, Hackage will add a revision field to the cabal file. And when you start to think about it: okay, now I have this local process where I release software, but I can also edit it online. What happens if I bump this revision and push it to Hackage, and so on? So suddenly there is a lot of stateful stuff going on. And while this might be a good idea for maybe the one percent of people who wanted it, for everyone else using Hackage to download packages, having to figure out this state is really, really problematic. Especially if you want reproducible builds: once you edit that file, the hash of your cabal file changes. So all the people who say "okay, download this file, and this is its hash" will suddenly get a mismatch. And we really don't want to enforce a culture where you just say "okay, it's a new hash, whatever", because then there is really no point. So these kinds of features are present in Haskell, and they're also present in Python, and they give us headaches every day. The Hackage API then lets you request, say, revision/2.cabal, and you get these revisions. But you basically have two versions: first you have a version, and then a revision, and it becomes a pain handling those. So, Haskell is one year older than Python, and they've also been on this path of improving the packaging ecosystem. Until about two years ago they had this problem: in your cabal file you had to specify the dependencies, and we all know that not all software packages work well together. In the case of Haskell, because there are types, you would get a new version of a package, the types would change, and suddenly your usage of that package wouldn't type-check, so your code wouldn't compile. This is the biggest problem they had; it's called cabal hell. So when a package got a new version and things wouldn't compile, you would start putting in these version constraints, and so on.
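The two-level versioning just described is visible directly in Hackage's URL layout, where a revision of a cabal file lives at /package/&lt;name&gt;-&lt;version&gt;/revision/&lt;n&gt;.cabal. A tiny helper makes the (version, revision) pair explicit (the package name here is illustrative):

```python
def revision_url(package, version, revision):
    """Build the Hackage URL for one revision of a package's cabal file."""
    return (f"https://hackage.haskell.org/package/"
            f"{package}-{version}/revision/{revision}.cabal")

# A release is now identified by a (version, revision) pair, not a version:
print(revision_url("aeson", "0.1", 2))
# -> https://hackage.haskell.org/package/aeson-0.1/revision/2.cabal
```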
So every developer would do this for himself or herself, and it's just a big waste of time trying to figure out which packages really compile together. I'll talk about how Haskell solved this, but first an interesting aside on how Elm, which is another functional language, solved it. They basically said: in your dependencies, you always have to specify the bounds on the major version. You say: I depend on package http, and it has to be between version five and six. And if you uploaded a new version of that package and an API changed, it wouldn't allow you to upload it unless you bumped the major version. So it's basically enforcing semantic versioning of the package: the package manager forces you not to change the types, the signatures, unless you bump the major version. And that's really nice. We cannot do that in Python, unfortunately, because there's no way to really check whether an API changed. Well, of course we could parse the APIs and so on, but that's a gray area there. Hopefully something we'll be able to do one day. So, how did Haskell solve it? The solution was released in 2015, so just one year ago, and it's called Stackage. Stackage is "a stable source of Haskell packages: we guarantee packages build consistently and pass tests before generating nightly and long-term support releases." So what does that mean? They built a site where you, as a maintainer, can log in, enter some information, and say: okay, I'm the maintainer of these packages on Hackage. Then they pick the dependency tree of your package, build it, and see whether all the tests pass. And then they say: okay, we used these versions, and these versions compiled. And they provide an API for that, so you can get those versions. If you think about it, in Python we have requirements.txt, but everyone has their own set of versions.
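The Elm-style rule described above can be sketched in a few lines. This is an illustration of the policy only: here an "API" is just a set of names, whereas Elm diffs real type signatures, and `upload_allowed` is an invented stand-in for the registry's check.

```python
def parse_version(v):
    """Turn '5.1.0' into a comparable tuple (5, 1, 0)."""
    return tuple(int(part) for part in v.split("."))

def upload_allowed(old_version, new_version, old_api, new_api):
    """Allow the upload if the API is unchanged, or if major was bumped."""
    if old_api == new_api:
        return parse_version(new_version) > parse_version(old_version)
    # API changed: only a major-version bump may ship it
    return parse_version(new_version)[0] > parse_version(old_version)[0]

print(upload_allowed("5.1.0", "5.2.0", {"get"}, {"get"}))          # True
print(upload_allowed("5.1.0", "5.2.0", {"get"}, {"get", "post"}))  # False: API grew
print(upload_allowed("5.1.0", "6.0.0", {"get"}, {"get", "post"}))  # True: major bump
```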
In Haskell they pretty much crowdsourced that: they have a website where all those versions are tested and compiled, and people use that as a community effort, not as something you commit to your repository and hope for the best. So if you want, for example, backwards compatibility, you depend on Stackage LTS 6, and then all the minor versions, 6.7, 6.8, guarantee you that the API didn't change, but they still ship security updates and so on. And when you're ready — usually a new major version means a new GHC, which is their main compiler — then you go and fix those compiler errors and move to the next version. I think that's very interesting, because they're doing all the work together in one place, instead of everyone in their own garden. I'm not sure we could really do something like this in Python, because it's way more complicated than just compiling a package and saying it works. But I still think it would be worth the effort to at least have the major software we use in Python have these versions community-managed, instead of having this work done by each individual or company. So yeah, our solution is requirements.txt. Together with Stackage they also released a tool called stack, which is like a wrapper around Cabal, so it can do more things than Cabal alone. You specify a configuration file like this, and you say: okay, I'm going to use these flags, which get passed to Cabal when I compile the software. I'm going to use these packages: you say the package is in the current directory, there's the cabal file, and that's the one we'll use to build this project. And you can have multiple of those. If you think about how Python does that, you have to say pip install -e . or something like that. So that's imperative: you actually have to run it, and if you develop on two packages, you have to run it for both of them.
In this case it's declarative: you open the file and you know which packages are being added. There are no imperative steps; instead you just say stack build, and that executes the whole thing. So it's way more declarative. And at the bottom you see the resolver. This is where you get that big set of pinned versions: you say LTS 6.7, and there you go, you have most of the packages pinned down and you're sure those work. There's also a field called extra-deps, and those are the dependencies that are not in the LTS. Not everything is pinned down there; it's a community effort, so of course if people don't do the work, a package isn't in the LTS. For all the packages you have that are not part of the LTS, you can specify them there, and stack will complain if you don't. It has a bunch of simple commands. stack setup is something like a virtual environment for us: it will download a compiler and set it up for you, based on the resolver you're using. And stack init will generate the files; it's like mini-templating for starting Haskell packages, and so on. So that's what stack does, and the community was really, really happy when this happened. A lot of problems went away. Right. So now that we have this package set, with Stackage as a set of pinned versions, my job, and what I'm doing, is: how do we distribute all this software to users so they get it seamlessly and it works on whatever platform? We're doing this with Nix. It's a functional language, based on a PhD thesis by Eelco Dolstra. It's a very short and nice thesis; I recommend it to anyone who cares about packaging and how the functional-language concept can change the thinking dramatically. It improves a lot of the things we have problems with today. So for Haskell, this is roughly the stack that we have.
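The stack configuration described above fits in one small file. A minimal sketch (the package and flag names are placeholders):

```yaml
resolver: lts-6.7        # the pinned Stackage snapshot, including the GHC version

packages:
- '.'                    # build the cabal package in the current directory

extra-deps:
- some-package-1.2.3     # pin anything not covered by the LTS

flags:
  my-package:
    debug: false         # cabal flags to pass when compiling
```

Everything the build needs is stated here; `stack build` just executes what the file declares.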
Nixpkgs, then, is a collection of Nix expressions that specify how some piece of software should be built, similar to APT or other distribution package sets, except that we're not tied to one Linux distribution: we support Darwin and Linux. Why would you need this layer on top of the upstream package set or PyPI? Because we take care of system dependencies. We have a build system that will compile these packages and provide binaries for you. And we have a really powerful API, which you'll see later, so you can actually go in and change those packages, tweak them the way you want, apply some patches, bump versions, whatever you want to do. So we're not an upstream where you either take what we have or get nothing; you have the power to change it. Most importantly, in Nixpkgs we have all the Haskell packages. We don't compile all of them, because that's a lot of compute power and disk space. We take just one GHC version, the latest stable one, and for that compiler we compile all the packages, or most of them. But theoretically we could distribute binaries for all of them. The user can then say: okay, I have this project, I have these packages, I want binaries, and the Nix package manager will download them, and there you go, you didn't compile anything except your own package. And that's really nice, especially because it's shared between Darwin and Linux. Okay, so how does that work for Haskell? How do we get it done, and why is it so hard for Python to accomplish the same? This is the infrastructure that we have. Let me explain what's really going on here. In the upper-left corner you see Hackage, the API that has all the packages. Then there is a script that goes and downloads all of them, calculates the hashes and everything, and commits that into a repository.
So you have a Git repository called all-cabal-hashes, and you have all the cabal files there. You can go through all of them, parse them, generate dependency trees, whatever you want to do. Then those cabal files and hashes are taken and built into Stackage Nightly, which gives you a view of what currently builds and what doesn't; that's a continuous process, of course. And based on Stackage Nightly, when things look okay, they make this LTS Haskell release, which you've seen before: okay, these versions all compile together now, let's take those. So that's Stackage and the upstream infrastructure that Haskell provides. Then we have hackage2nix, which parses the all-cabal-hashes repository and the Stackage repository and generates hackage-packages.nix and configuration-lts.nix. In hackage-packages.nix, every version of every package is specified along with how you should build it. This is all generated from the cabal files. It's a one-to-one mapping; some features of Cabal we don't support, some we do. There is room for improvement, but in general it works. And configuration-lts.nix basically just says: based on the LTS version and its long list of version pins, pick these versions to be the default ones when you use these Haskell packages. So it's just pinning in Nix: take these versions. Because hackage-packages.nix will otherwise always use the latest version, which, as I've said before, doesn't always mean things will work. All right. Then there are two more files, configuration-common.nix and the per-compiler configuration-ghc files. Those are the files that have to be manually crafted and maintained. In there, if a cabal file, for example, doesn't specify its system dependencies, we will override it and say: okay, for package http, also take this system dependency, and so on.
So basically everything that's not in the upstream cabal file, we override there. And in configuration-ghc we do that per GHC version: some GHC versions might need different flags, or tests disabled because they don't work, and so on. Those are the two files that we maintain; everything else is generated from what the Haskell community provides upstream. Then you have cabal2nix in the middle, and this is what the user gets. When you have your project, you have your cabal file, you run cabal2nix, and it automatically generates a Nix expression out of it, specifying all the dependencies. And in there you can say: I want a specific LTS version, or I want the latest packages, or whatever. So as a user, you just run cabal2nix on your cabal file and you get basically the whole set of dependencies that you know are going to work. And the generated expression has this function called packageOverrides, where you can basically override anything from upstream. You can say: take this package, but a different version; take this package, but apply this patch; whatever you want. Then you install the software, and you have a binary-distributed Haskell pipeline. All right. I hope that was not too fast and it's clear enough. So this is probably the hardest slide, but I would really like to say a few words about this infrastructure in Nix and how these files all work together. It all fits on one slide; it's just not that easy to explain. Basically, what we want is some kind of inheritance. We have different files, and we want those files to override each other. We want this powerful overriding mechanism. At the top you see a function called fix, and that's a fix point. That's how you do recursion in a functional language: it's a recursive function that just calls itself. And the way it works is that it takes the output and feeds it back into the input.
And because the language is lazy, it will only do that when you reference something. For example, in the middle I define something you would call a dictionary; in Nix it's called an attribute set, but it's pretty much the same. You can say: okay, I have an attribute foo with the value "foo", and bar with "bar". But foobar is actually self.foo plus self.bar. And that self is really just the input of this function: it's a lambda function, and it gets self as a parameter. But that self is actually the output of the function itself. So when foobar references self.foo, self is the very same attribute set, and it looks up foo and gets it back. It's just recursion and a function, nothing really fancy. When you call the fix point on this function, on this dictionary, and you ask for foobar, you get the value "foobar" back. This is how we do dependencies and how you can reference different things. Okay, now that we have that, we want a bit more flexibility, so we define a function called extend. I won't go into how it's defined, but look at the override: that's the API you get. This override function accepts two things, self and super: self is the input and super is the output of this dictionary. So you have the power to take the previous configuration and reference either its inputs or its outputs. In this case I say: okay, take the output super.foo and reverse it. So if I then call fix (extend d override) — meaning: extend the dictionary d, overriding it with this function — you will see that the value of foobar is different, because we have reversed foo. That gives us the power to override the dictionary at the top, either by inputs or by outputs. And if you apply the override twice — it's not shown here — you get "foobar" back.
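The fix/extend mechanism just described can be imitated in a few lines of Python. This is a sketch for intuition only: the names fix, extend, self and super mirror the Nix library, while the dict-of-thunks encoding (zero-argument lambdas forced on access) is a stand-in for Nix's laziness, not how Nix actually works.

```python
def fix(f):
    """Fix point: feed the output of f back in as its own `self` argument.

    Laziness is emulated by storing zero-argument lambdas (thunks) as
    values and forcing them only when accessed.
    """
    class Self:
        def __getattr__(self, name):
            return f(self)[name]()
    return f(Self())

def extend(f, override):
    """Layer `override(self, super)` on top of the attribute set f(self)."""
    def extended(self):
        super_ = f(self)           # the un-overridden outputs
        merged = dict(super_)
        merged.update(override(self, super_))
        return merged
    return extended

# The attribute set from the slide: foobar references self.foo and self.bar.
d = lambda self: {
    "foo":    lambda: "foo",
    "bar":    lambda: "bar",
    "foobar": lambda: self.foo + self.bar,
}

print(fix(d)["foobar"]())  # -> foobar

# Override foo through `super`, reversing the previous output:
reverse_foo = lambda self, super_: {"foo": lambda: super_["foo"]()[::-1]}
print(fix(extend(d, reverse_foo))["foobar"]())  # -> oofbar
```

Note that foobar picks up the overridden foo automatically, because its thunk goes through `self`, the fixed point, rather than the original dictionary.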
So that gives you all the power you need to override these files. Okay, how do we use it? This is all it takes to combine the files: first there's a fix point, which takes care of the recursion, and then I take all the Haskell packages, the common configuration file you've seen before, the compiler-specific config, the package-set config, and then, at the bottom, all the overrides where you can hook in. And you can change everything about how upstream builds things. In Python, currently, in Nix, we edit files manually. Why? Because of this problem: we have all these setup.py scripts, and you have to run every one of them to actually figure out what's going on. Someone would need to take everything in the Python Package Index and generate a JSON file, or something like it, with all this metadata, which we could then use to generate all of this automatically. And we would need to maintain a requirements file that's global for the whole Python Package Index. Those are the two big projects one would need to tackle in order to have the same infrastructure. Then we would be able to build basically the whole Python Package Index and distribute it to people. The first problem is kind of being solved; the community is trying to get there, but we still don't have a way to do it today. The infrastructure is improving: we got wheels, we're getting a new Python Package Index called Warehouse, which is going to be well tested and easily changeable, and so on. Everything around it is changing, but this is still not doable today. And with the build-system hook I talked about before, we'll be able to have tools other than setuptools build Python packages, and hopefully one day we'll have a standard one that is statically specified, instead of a script you have to run.
As for the second problem, I don't know if anyone is currently solving it, or crowdsourcing the versions, but it's definitely something we'll have to solve ourselves, or someone will have to do it for us. So Python is actually doing quite well, in the sense that all of these things are being worked on. But one thing that's really missing, if you think about it, is that it's still not declarative enough. There are so many files you have to touch: setup.py, setup.cfg, requirements.txt, MANIFEST.in; now pyproject.toml is coming, tox.ini, and it's just a lot of different things you have to set. In Haskell there are just two files, the cabal file and stack.yaml. It's really hard to get rid of these, because they're our legacy, but it's a lot of information people have to know just to use the ecosystem. This is improving, but it's still an ongoing process. All right. This talk was based on Peter Simons' talk about the Nixpkgs Haskell infrastructure; if you want to see that talk, it goes deeper into the details of how it all works. I hope you've seen where the current implementations stand, and at the same time I would like to thank the Python Packaging Authority and everyone who's working on improving the ecosystem. It's really hard to take 25 years of legacy and just replace it all and say: okay, we have this new shiny thing, it's going to work out. It's going slowly, but there's progress. So thank you. — We have time for questions, right? Thank you very much, Domen. Does someone want to ask a question? Any questions? Okay, thank you for coming.