Okay, so the next talk is going to be about the Haskell infrastructure. A little backstory is that I discovered Nix last year because of Rock's enthusiasm. And then I discovered that the guy who was sitting across the table from me was actually one of the main Nix people, namely Peter Simons. So Peter has been working on Nix for seven years now. And he's gained a lot of fame in the last months because of his big efforts in improving the Haskell infrastructure. And that's what he's going to talk about today. So please welcome, everybody, Peter Simons.

Thank you. Okay, the slides for this talk are online right now. So in case you want to read ahead and don't want to wait until we get to the point, you can do it. I think that right now the Haskell users of Nixpkgs are in a fairly happy place, because the packages sort of appear mysteriously and nobody really knows where they come from. Nobody has to worry about updating them. They are just there and they are up to date. And I want to look in this talk at the process that gets Nixpkgs into that state. How do packages end up in Nixpkgs, and how does the machinery work that makes this stuff build and makes it work for the user? So I'm mostly interested in interesting aspects of the implementation. I'm not going to look at the user side in the sense of how do you install a Haskell package, how do you compose an environment with libraries. I won't cover that at all. That's a different talk.

Okay. When you're a user of Nix and you have an installation that involves Haskell packages, then these are the players, the entities that are involved in getting that stuff onto your machine. Obviously Hackage, I have it as a separate box there, but it's obviously part of the internet like everything else. But what I mean is that you have Hackage, which is the central repository of Haskell packages.
So everything that's deemed worthy of being published to a worldwide audience is typically registered on Hackage. And then there is a separate ecosystem of packages that exist on GitHub or SourceForge or whatever kind of servers people use, where they also publish Haskell packages, but they don't necessarily register them on Hackage. And when you have Nixpkgs, you get packages from both. So you have all of Hackage, all that's registered in the central repository is there, and you get other packages in addition to that, which we pull in from other sources. For instance, the cabal2nix utility, which Haskell users typically use, is not registered on Hackage. That's a tool that's specifically for us. And so there seems to be no point in making it available or advertising its existence to everyone.

So the data flow is like this: packages show up on Hackage, then there is a separate effort which is called Stackage, which stands for Stable Hackage. And what these people do is that they take a subset of Hackage and make sure that they pick versions of all the relevant packages that are compatible with each other. Every Haskell developer knows that you can't just update one single package and everything still compiles. That's not going to work, because these packages are very intricately interwoven with each other. If you update one, then this typically requires that you update others, because they depend on the newer version. And when you update those, something else will not compile because it hasn't been updated for the new versions yet. It's a bit of a mess. The Stackage people resolve that for a subset of Hackage and publish essentially a list of package names and versions. And whenever you pick a package in that version from that list, then you know it will work together with everything else on that list. And we take advantage of these efforts to provide a stable user experience. Okay, so we get packages directly from Hackage.
We have obviously all of Hackage, not just the packages that are in Stackage. We have the rest too. But from Stackage, we take the additional version information, then we take additional packages from other places on the internet, and we provide that Nixpkgs to the user, who then has the ability to configure local overrides, add his own local packages, add packages we don't even know about to his installation, and use them within Nix, all nicely integrated.

The process is pretty obvious. It works like this. A package shows up on Hackage. From there, it goes into this Stackage arena, which has one repository that's called all-cabal-hashes, and that's a Git repository that contains all of Hackage, versioned in Git. You have to know that Hackage does not provide that. Hackage does not provide any kind of versioning or anything like that. They have this one tarball that contains everything they have, and there you go. And then you have this all-cabal-hashes repository that continuously downloads this tarball and checks in the differences, and so you have a Git repository where you can actually see a bit of the history of when a package was added and when a package was edited. These kinds of things are visible in that Git repository, but they are not visible on Hackage. So it's a service added on top, but it's essentially just Hackage in a Git repository. Another nice feature that they add is that they download all those tarballs, compute the SHA hashes for those, and add them into the repository. So when we generate build instructions, we don't have to download the packages to figure out those hashes; they are in there already. I'll show you how it looks.
So that's the basis that the Stackage people use to do their work. Every day they do a snapshot, a Stackage Nightly snapshot, they call it, where they basically update everything to the latest version that they can and then try to compile that. Then they run into errors and see, this doesn't compile, this doesn't compile, and then they manually configure: okay, don't use this update yet, or use this update but specify some special flag, or disable the test suite, or somehow mess with the build to make it work. At the end of that process, when everything compiles, out comes this list of names and versions that is a nightly snapshot, which is the most recent stable package set that you can get.

Then, from the same source, there is another family of package sets, which is the long-term support package sets, LTS Haskell, and these work like so. Typically, whenever a new compiler version comes out, they start a new LTS package set, and they call it LTS Haskell 3.0, and they pick the latest versions from their nightly build, whatever they have right now. Then every week they release a new version of that package set, and they include only minor updates that don't break the API. So when a new package version comes out that changes the interface of the library, it will never end up in that package set. So you as a user can say, I follow this package set, and I have production software that depends on the contents of those libraries, and then you will never get an update that requires you to modify your software in order to build, right? You have a stable API, but if someone fixes a bug or fixes a security vulnerability in a way that's not exposed in the API, then you will get that update. So you're not stuck on a fixed version, but you have stable updates, which is the idea.
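The "only minor updates that don't break the API" rule can be sketched in a few lines of Python. This is only an illustration, not Stackage's actual curation logic, which also involves manual decisions. It assumes packages follow the Haskell Package Versioning Policy, where the first two version components form the major version and change whenever the API breaks. The function names are made up for this sketch.

```python
def version_key(version):
    """Turn '2.1.3.1' into (2, 1, 3, 1) so versions compare component-wise."""
    return tuple(int(part) for part in version.split("."))


def pvp_major(version):
    """First two components, e.g. '2.1.3.1' -> (2, 1); '3' is padded to (3, 0).

    Assumption: the package follows the Package Versioning Policy, so a
    change in these two components signals a breaking API change.
    """
    key = version_key(version)
    return (key + (0, 0))[:2]


def lts_accepts(current, candidate):
    """Accept a candidate only if it is newer and keeps the same major version."""
    return (pvp_major(candidate) == pvp_major(current)
            and version_key(candidate) > version_key(current))


if __name__ == "__main__":
    print(lts_accepts("2.1.3", "2.1.4"))  # bug-fix release, same API
    print(lts_accepts("2.1.3", "2.2.0"))  # new major version, API may break
```

Under these assumptions a bug-fix release is picked up by the weekly minor release, while a release with a new major version has to wait for the next LTS major version.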
Okay, so these two build products, the Stackage Nightly and this family of LTS package sets, they all live in Git repositories too, and these are huge JSON files that basically say for every package: this is the version you ought to choose, these are the flags you ought to specify, run the test suite, yes or no, build the Haddock documentation, yes or no. All the information is in there. And then we have a tool called hackage2nix, which consumes all of this. So it consumes the entire Hackage repository, it consumes the build information from the Nightly snapshot and from the LTS Haskell snapshots, and then it allocates some 30 gigabytes of memory, because it's a Haskell program, and you're not a serious Haskell program if you don't allocate 30 gigabytes of memory. And then it writes a whole bunch of files into the file system, into the Nixpkgs repository, and these are obviously the one on top, right, the hackage-packages.nix, and these configuration files further down. So those are generated by hackage2nix, these are automatically generated.

The hackage-packages.nix contains build instructions for every package that we contain. So it's build instructions for every Hackage package, well, almost every Hackage package. Then there is a common configuration, which modifies the build instructions in there with fixes, so that stuff works that the auto-generating tool doesn't know how to generate correctly for some reason. Then there is another configuration layer, which applies fixes that are required if you're building this package with a particular version of GHC. And then finally, there is the last step, which configures the versions that you see in those package sets so that they match the specification from the LTS Haskell package set.

Now users, they basically cannot install any of this stuff directly. It's not possible. You can install the executables.
That's possible, but if you have a Haskell library, you can't basically say, I install this library and then I use it. It's not feasible. What you do is you build this ghcWithPackages environment, in which you specify, these are the libraries I want to have, and then Nix builds you a GHC binary that knows exactly those libraries that you want. And the place where you configure that is this overrides attribute set, which is in your config.nix file in your home directory. So there you choose packages either from the GHC-specific package set or from the LTS-specific package set, and then you can optionally pull in additional packages, where you generate build instructions with cabal2nix automatically. And then this whole process ends up in a place where you can install it, and then you have pretty much the entire Haskell ecosystem at your fingertips.

So, Hackage. In the rest of the presentation, I'm basically going through those boxes one by one, right? So Hackage contains over 60,000 Cabal files, which group into about 9,000 packages. This means that on average, every package has released approximately seven versions. This box plot shows you that half of all packages lie within this range. So if you take any random package from Hackage, chances are that it will have between two and eight versions released. There are some packages, and they are rare, that have 150 releases by now. So they release, I don't know, once a week or something like that. And there are packages that release a new version every couple of months, and many never ever release a new version. They have to have at least one, otherwise they wouldn't be there, right? So the Git repository that contains Hackage is just a long, long list of directories. The directory always matches the name of the package.
In every directory, you have another directory which matches the version number of the release, and inside of that directory you have the Cabal file, which contains the build instructions, and you have a JSON file, which is added by Stackage and contains the hashes.

And these Cabal files, they look like this. You have this general section which defines, you see it, right, your home page, the version number, the package name, general information about the package, and then the components of the package are defined below that. You can define a library, you can define executables, you can define benchmarks and test suites, and they all have the same syntax and the same structure. So typically, if you have a library, you define what publicly visible modules you expose and what packages you depend on, and this here is the part that makes installing Haskell packages so much fun. You can very accurately restrict the versions of your dependencies. So you can say, in this case, I want the base library only if it's older than version six. This is a convention in the Haskell ecosystem; you can't upload a package to Hackage unless you configure that.

And, okay, but from my point of view, why do you do that? Why do you say, if it's base version five then, okay, compile, but if it's version six, which doesn't exist, there is no version six, then don't compile? You have no idea whether version six is actually going to cause any problems or not. But that's reason enough for them to say, since we don't know that it will work, we'll prevent you from using it just to be sure. And since everybody does that, and most people actually don't understand the implications of that, people have no idea what versions to choose. If you have 10 dependencies and each of those dependencies has four different versions, then you have a huge number of permutations of things that you would have to test to see if it works.
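To put a number on that, here is a small Python sketch of an upper bound like "base older than version six" and of the size of the test matrix for 10 dependencies with four versions each. The helper names and the version numbers in the example are invented for illustration. The comparison treats a version as a list of numbers compared component by component, which is how Cabal orders versions.

```python
def version_key(version):
    """Turn '4.8.1.0' into (4, 8, 1, 0) so versions compare component-wise."""
    return tuple(int(part) for part in version.split("."))


def satisfies_upper_bound(version, bound):
    """True if `version` is strictly older than `bound`, as in `base < 6`."""
    return version_key(version) < version_key(bound)


def count_combinations(versions_per_dependency):
    """Number of distinct ways to pick one version for every dependency."""
    total = 1
    for versions in versions_per_dependency.values():
        total *= len(versions)
    return total


if __name__ == "__main__":
    # Hypothetical dependency names and versions, 10 dependencies with 4 versions each.
    deps = {"dep%d" % i: ["1.0", "1.1", "2.0", "2.1"] for i in range(10)}
    print(satisfies_upper_bound("4.8.1.0", "6"))  # allowed by `base < 6`
    print(satisfies_upper_bound("6.0", "6"))      # rejected by `base < 6`
    print(count_combinations(deps))               # 4 ** 10 combinations
```

With four candidate versions for each of ten dependencies there are 4 to the power of 10, a little over a million, combinations, so nobody can test them all before choosing bounds.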
And then you would have a permutation that reflects reality, but obviously you can't do that. So what people will do is say, yeah, I'll restrict it to the version I've been using and it's fine. And so when people update the base library or the transformers library or whatever, then all those builds are going to say, I won't compile because you have the wrong version, and they would compile just fine if the restriction hadn't been there.

There is another important thing, it's this line here. It says that this is a Simple build. A Simple build means that this build configuration is actually entirely declarative. So this is a text file, there's no code ever being executed, and just by parsing that file we know everything about the build that there is to know, which is extremely convenient if you want to translate these things. There is another build type, which is called Custom, and when you have that build type, then basically all bets are off. What that means is that you ship the source code for an executable program called Setup.hs, and the Haskell build system will compile that and run that to configure the build and do the build. So at that point you can declare in the file basically whatever you want, the program does whatever it wants, and those two things don't have to be related. So basically, when you have a Custom build type, there is no way to extract the information about the build that you need from the file. That's bad for Nix. The good news is that the interface to this kind of custom build system is so complicated and so poorly documented that nobody ever uses it, and if they use it, they use it for super simple things like, I want to add an additional compiler flag on the command line somewhere, and that doesn't affect us, so it's fine, right?
But if they did something that would affect us, like adding additional dependencies or making the existence of a dependency depend on some property of the system, then we would have no way of figuring that out.

So this is the other file, which is fairly obvious. There are these tarballs which contain the software, and then we have a whole bunch of hashes for those, and we need them to generate a Nix expression, which looks like this. So this is essentially meta information, we could do without that. We can describe that build essentially in those three lines, that's all you need. Obviously we need the package name, the version, and the hash of the tarball. Then we have to specify the dependencies. We distinguish between dependencies of the library components, of the executable components, and of the test components. The reason why we do that is that when you run the build and you disable tests, then we don't need the dependencies that are required purely for the test suite. So we specify them separately, and if you run the build without tests, then the test dependencies will not be included and they won't be required. And a lot of Haskell builds can be fixed by disabling the test suite, because oftentimes people update their software to cope with the new versions, but they don't run their own test suite and they don't update their own test suite. And then we run their test suite and say, yeah, you haven't updated the test suite, and then we just disable it and it still works. But they should probably have a good CI system.

There are a couple of things in Hackage that make the packaging process very interesting. One of the extremely nice features is destructive editing. What you can do is you upload a tarball to Hackage, and that contains the Cabal file plus all the source code, that's the entire package, and then you can go to the website and edit the Cabal file and change it. And what happens is that it's edited in place, so the version changes. So they did that, then
they realized, gee, this changes the hash of our release tarball, this is bad, and lots of people would complain about that. So what they do is, when you download the release tarball, you get the old version, and then you look into the Cabal file and see that there is this x-revision header, and if it's there, then this tells you that there is a new Cabal file, which you download separately and replace the existing one with. Obviously, there is nothing in place that prevents people from including that line in the Cabal file when they upload in the first place, which just breaks things. I think they fixed it by now, but it broke a lot of software, obviously. The second thing is that this revision number doesn't show up anywhere. So when you install a package and then you go to the website and see, okay, the build has been edited in some way, then there is no way to tell whether the version that you installed is the new one or the old one.

For Nix, obviously, this is a bit of a nightmare. So what we do is that we track this revision number. We need the revision number only to construct a URL for the Cabal file, we don't use that information otherwise, and we have the hash of the new Cabal file. And when you have a destructive edit taking place, then the build log will say something at the beginning like, okay, I have a new Cabal file, and then this is downloaded and replaced, and then, yeah, you have in-place editing, because just making a new release would have been more difficult, I suppose.

There is another nice feature, and this is a particularly extreme example. All the items in a Cabal file can depend on things and make choices conditionally on the value of those things. You can configure your build differently depending on whether you compile with a new or an old version of GHC. You can change your behavior depending on your operating system, right? For instance, here you say, okay, if I'm a Windows build then I won't need that, but if I'm not a Windows build then I need that library.
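A conditional dependency like the Windows example can be modeled as a small tree that only turns into a flat dependency list once a concrete build environment is chosen. The following Python sketch is a made-up illustration of that idea, not how the Cabal library represents it. The `resolve` function, the environment dictionary, and the use of the `unix` package as the non-Windows dependency are all assumptions for the example.

```python
def resolve(dependencies, env):
    """Flatten a dependency tree for one concrete environment.

    Each entry is either a package name or a tuple
    (condition, deps_if_true, deps_if_false), where condition is a
    function of the environment dictionary.
    """
    resolved = []
    for entry in dependencies:
        if isinstance(entry, str):
            resolved.append(entry)
        else:
            condition, if_true, if_false = entry
            branch = if_true if condition(env) else if_false
            resolved.extend(resolve(branch, env))
    return resolved


def is_windows(env):
    return env["os"] == "windows"


# Hypothetical package: needs `unix` everywhere except on Windows.
DEPENDENCIES = ["base", (is_windows, [], ["unix"])]

if __name__ == "__main__":
    print(resolve(DEPENDENCIES, {"os": "linux"}))
    print(resolve(DEPENDENCIES, {"os": "windows"}))
```

Picking one fixed environment makes every conditional disappear, at the price that the resulting list is only correct for that environment.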
And obviously this makes the interpretation of these files a little less straightforward, because when we generate the build instructions, the right choice would be to say, okay, we parse all those conditional trees, and then we do some cool optimizing and filter out things that don't affect us, and then we generate build expressions that cover those conditionals, right? We say, if I am on system Linux, then add these dependencies, and if I'm on system Darwin, these dependencies, and we would basically capture that in Nix. It's just that doing that is actually difficult, so we don't. What we do is we assume all builds run on Linux using the latest GHC, then we resolve all those conditionals, and these are the build instructions that we generate. So this means that we have builds that work fine on Linux, but if you run the exact same build on Darwin, it's not going to succeed. Fortunately these cases are rare, and so rather than fixing our generator, which is hard, we edit this level of configuration which I showed you, right, this configuration-common, where you can add build dependencies and mess with the build expressions in a manually edited way. And we do that for most of the packages that need it, and it's not that many, fortunately. But anyway, this is an area where the cabal2nix tool essentially has to be improved so that it expresses this kind of intelligence in the build.

Last but not least, there is a feature which is perhaps the nastiest of all of them. Builds can define flags. These flags are Boolean values, and a flag can be named anything, it's just a string, and then you can have conditionals based on whether this flag is set or not. Now, first of all, it's important to know that whether this flag is set or not is never visible after the build has run. So you've run a build, and then you wonder, does it have HTTPS support, and there is no way to tell except for trying an HTTPS URL and seeing whether it works. But the values of those flags are not visible in the version
information. There's also no way for Haskell packages to depend on a library saying, I want this version of the library with this flag enabled. It's not possible, you can't do it. So it's a tricky feature. The nice thing about those flags is that users can specify their values. As a user you can say, I want HTTPS support, and then you enable it. But if you don't do it, then Cabal will guess the value of the flag depending on your build environment. So if you have these libraries installed in your environment, then Cabal will say, aha, I can use this branch, and it will enable HTTPS, and if you have this library but it's, say, version 2.1, then it says, yeah, that's too new, and then you don't have HTTPS support.

And also this is something that we can't express in Nix, because sometimes people use these flags for something that's actually useful. They say, I don't know, use LLVM, and then they enable additional optimizations or something like that. So we would like to offer our users the ability to say, I want this package and I want it with LLVM. So our build instructions would technically have to contain some Boolean parameter that's called HTTPS support, and then as a user you can pass true or false and get the build with the appropriate flag set or disabled. And this is also something we currently can't do. What we do instead is that we have one global list of flags, where we specify for every package the flags that are to be set, and for the most part we just let Cabal guess and see what happens. It works well enough.

Okay, the next box in this diagram was Stackage. Stackage has about 18% of Hackage covered, and when you want your package in Stackage, you actually have to register with them. You have to say, this is my email address, this is my Twitter name, this is my GitHub account, these are the packages I'm responsible for. You have to promise that you will fix build errors within a reasonable amount of time, you have to have a backup administrator in case you're on vacation, and then once you've registered in
the database, your packages will be part of Stackage, and then you're part of a sort of process that updates things and informs the authors if there are problems, and then they respond quickly, and then you have this stable package set, this curated package set. Stackage runs all these builds on Travis CI, which means it's only Linux builds. So when you get information from Stackage that says this package in this version is going to build fine, then it's not necessarily going to build fine on a Mac. In Nix we do have support for that platform, so Mac users, or basically anything other than Linux users, don't get the full value out of this whole LTS effort, because their platforms aren't tested.

Then, I talked about that already, the build products are these nightly snapshots, which are the latest possible versions such that everything compiles. There are LTS minor releases, which are released every week and contain updates that don't break the API, and the major releases, which may break APIs, but they are rare. We are now at LTS major version 3, so 4 is going to come out together with GHC 7.12... I don't know, we are not at 12, we are at 10, right?

Okay, so the code that does all the automatic stuff is these two tools, cabal2nix and hackage2nix. They live on GitHub, and actually all the intelligence about how you generate a build expression, what kind of exceptions have to be configured in and so on, is included in one library, and both of these executables are just front ends to that library. The cabal2nix tool is supposed to be used by users. When you say, I have a Cabal file, give me a Nix expression that builds it, that's what this does. The hackage2nix executable, on the other hand, says, give me Hackage and Stackage and then I generate your Nixpkgs. So it updates the whole package set, and people are typically not supposed to use that. Instead we have this update Nixpkgs script that's run automatically, and so basically once an hour we
take all the new versions, all the new information, generate a new version of Nixpkgs, and commit it into a separate branch first of all, so that there is some testing. We have a Hydra instance that continuously builds all that stuff, and only after we have seen that this stuff is really stable and it really works, then we merge it into master. So basically, updates on master could in theory appear once an hour. In theory, if someone uploads a new package, we would have it 60 minutes later at most. The fact is, the merge to master I do manually, so whenever I think of it I merge it, and it's every two or three days. So basically, if you're following the master branch of Nixpkgs, you have an accurate representation of Hackage that's within two or three days, which is, I think, pretty good.

Okay, so now the Nixpkgs machinery is interesting in itself. I just have to check the time, because we're running out of time. The package set: what we have is this one attribute set, which contains the builds for all the packages that we feature, and this is what it looks like. We define for every package an attribute with exactly the name of that package, and then there is this callPackage thing; you probably know how packages are defined in Nix. The only difference is that here we don't refer to a separate file, but instead we have all the expressions in place. So there is one file and it's completely self-contained. And then we have for every package the latest version, which is the one that has only the package name as its attribute, and then we typically have older versions that are required for LTS support, and these have their version number added to them at the back. When dependencies are resolved, only the packages without this kind of suffix are used. So when you say, I depend on mtl, you will get the latest version, and when you want to depend on mtl 2.1.3.1, then you have to say that explicitly. You get the latest version by default. So this package set is
implemented as a recursive function, and this is a lot of fun when you know how it works, but I suppose it's new for many people. The idea is that this package set is a function. That function produces a package set, and as an argument it gets the package set that it's going to produce. Here is a nice example where you can see how that works. You have this package set which gets itself as an argument, then it defines those two attributes, and then it defines this attribute, which refers to these attributes via self. And so when you compute a fixed point, this is the point where the argument and the output are the same. When you expand that self argument again and again and again, you end up just repeatedly calling ps, ps, ps, ps, and at some point this package set contains no more reference to self, and when it doesn't reference self, self is not computed, because it's lazily evaluated, and then the computation finishes.

Yeah, yeah, you have this construct in Nix, you have this recursive attribute set, but the recursive attribute set doesn't work this nicely, and why will be clear after this example. One reason why we structure the package set in this way is that this makes it really convenient to modify the package set in a way that looks like object-oriented inheritance. So what you have is this function which as arguments takes a recursive package set and another function, and this other function takes a recursive package set, computes the changes it's going to make to that package set, and returns the result as another recursive package set. So when you take this example and you compute the fixed point without any customization, you will get foobar as a result. But when you apply this function to modify it, this function here, then you have the function, right, it has its own output, it has its input, and this is a change it specifies to be applied to the thing. So in here we replace foo with foo
reversed, and when we apply this, we get this output. So we have modified the package set. If this had been a recursive attribute set in the Nix sense, then the self argument wouldn't have been necessary, right? But in that case this would have bound tighter than any modification you could make. If you modified the foo value of that package set, you would not modify foobar, because this has already been bound. So basically, whenever you want to modify the package set, you can choose between, give me the value from above and I modify it and return it as a result, or you can say, give me the result of another modification. So you can refer to basically a series of overrides that happen, and you can, here in the inner one, refer to a result that's going to be here. So this is an extremely flexible construct.

I'm going to come to an end. And the package set, this is it, that's the actual code. We have the Haskell packages, this is the file that I showed you which contains all the build expressions. To that we apply this common configuration, which fixes missing dependencies, or adds other libraries that we want, or enables flags during compilation. Then we extend that with the compiler-specific configuration. Then we extend that with the package-set-specific configuration that gives you the version information. Then at the end we extend that with the user-configured overrides that you can specify to change it, and then we say fix. Then we get a result where this original package set that we once generated automatically may look completely different. It has different versions, different build inputs, you can do with it whatever you please. Okay, and the other three slides I'll save for another talk.

Okay, thanks for the talk, Peter. We have a couple of minutes to understand what the fixed point is, so don't be shy.

Okay, I think I already asked you, probably a few months ago, but how could we reuse this functionality which extends and overrides into
other languages as well? Because I see it as a common piece: if you generate something from whatever, Stackage or PyPI or Bundler, and then you build this set and you need to override a few different layers, could we then kind of abstract this and create a more unified way to approach languages and import them? Would that be an option?

Technically it's absolutely feasible. Basically the entire infrastructure that you need for this is those two functions. This is the real thing, they are not more complicated than this. This code is in the package set, it's there, it can be reused. So everybody is free to use that approach and structure the packages in such a way that you have this base package set, which is kind of the bare default, and then you can have layers of configuration added on top of it. And it's not difficult to do. The thing is, it's effort, you have to do it. So for instance, I also generated the package set for R, for the R utility, and there are some, I don't know, 7,000 packages in there, and it's not structured in this way, because at the time when I did that I had no idea about this stuff. So these days, every time I have to work with the R package set, I think, man, this is a mess and it should really be cleaned up.

Actually, I once grepped Nixpkgs for fix and const and such functions, and there are maybe three or four places where those functions are defined, because they are so useful. So maybe include them in lib.

Yeah, yeah. So actually I think we should probably use this for Nixpkgs at the top level, because right now Nixpkgs has a pretty ad hoc override mechanism, which is really an undisciplined way of doing this. So Nixpkgs is actually defined by passing itself into itself, with some complications for being able to refer to the un-overridden version. So yeah, this would be much better. We should probably look into how we can do that. Somebody should do that. We should open a GitHub issue. Sorry? That's right, Nicholas
has actually taken the effort upon himself.

Andy?

I think there may have been some start already, by factoring out the package overrides mechanism, which is, well, the same mechanism again, or very similar. And there have been some changes recently so that it's factored into a function, I think, where you can say "I want these packages", apply package overrides, and get the result as a package set, not just in the general way where you edit it somewhere in your .nixpkgs config.

I'm sorry, I have a very practical question. The older versions of libraries, we keep them just for LTS, so we can't rely on them being in Nixpkgs tomorrow, or rather after the next release of LTS?

Theoretically we could say we drop support for LTS Haskell 0.something, and if we did, then packages required only by that version would go away. That's true. The reality is, however, that the number of versions we ship is only going up. Personally, I think that this whole intelligence that selects which packages to include and which ones not should go away entirely. We should just say we have every version on Hackage, period. That's what we should do. It's only a matter of what the implications are for the Nix tools in terms of memory requirements, parser performance, these kinds of things; that we have to figure out. But I think the number of versions that we distribute is only going to go up; it's not going down.

So I actually have two questions. The first question would be: what about the stack tool, do you have any thoughts on this?

The stack utility is a kind of advanced version of cabal install. It's a build driver. You write a very simple YAML file where you specify "I have this package, I have these dependencies", and then you just say stack build and it does everything automatically: it downloads the packages, compiles the dependencies, sets up a sandbox for you. It's all very convenient and very nice, and I use it myself. It's a great tool. It interacts
perfectly with Nix; there is no problem. If you use Nix to install a compiler and a library environment, and stack sees that it needs some of those libraries that you already have installed, then it will reuse them and won't compile them. So in theory you could say in your stack file "I use LTS version 3.3", and then you configure a GHC environment from the LTS 3.3 package set, which contains exactly those packages that you need, and then you would run stack build and it wouldn't compile anything, because everything is already there. So in a way the two tools complement each other. I guess the advantage of stack is that it's very ad hoc-ish: you don't have to write a Nix expression, save it somewhere, enter a nix-shell, leave a nix-shell. You don't have to bother; you can just run stack and it works. But if people want to develop with stack and use Nix at the same time, it works just fine, no problem at all.

Thank you. And my second question would be: when you clone Nixpkgs, what's the percentage of Haskell packages in there?

I actually wrote an email about that a while ago, but I don't recall the numbers. It's significant. I think the Haskell packages are something like 12,000, and Nixpkgs as a whole has maybe 15,000 or something like that, not including the Haskell packages. So it's a large, large chunk. In terms of how much space it takes up in the repository, the percentage of Haskell packages is probably even bigger, because we have package synopses, all this stuff that we take from the Cabal file, which many packages don't have. So it's hard to say, but it's a fairly good part of the distribution.
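
Editor's note on the fix point discussed before the questions: the talk describes two small functions, fix and extend, that layer overrides on top of a base package set. The slide code is not part of this transcript, so what follows is only a rough Python sketch of the idea, not the Nixpkgs implementation. The names View, base and reverse_foo, and the foo, bar and foobar values, are assumptions reconstructed from the spoken description. Python is strict where Nix is lazy, so the sketch stores every attribute as a zero-argument function (a thunk) that is only called on lookup.

```python
from typing import Any, Callable, Dict

Thunk = Callable[[], Any]
Thunks = Dict[str, Thunk]


class View:
    """Read-only view of a package set that forces a thunk on lookup."""

    def __init__(self, thunks: Thunks) -> None:
        self._thunks = thunks

    def __getitem__(self, name: str) -> Any:
        return self._thunks[name]()


def fix(f: Callable[[View], Thunks]) -> Dict[str, Any]:
    """Tie the knot: f receives a view of its own final result."""
    thunks: Thunks = {}
    # The view shares the dict that is filled in right after f returns,
    # so lookups through `self` see the fully overridden set.
    thunks.update(f(View(thunks)))
    return {name: thunk() for name, thunk in thunks.items()}


def extend(rattrs: Callable[[View], Thunks],
           override: Callable[[View, View], Thunks]) -> Callable[[View], Thunks]:
    """Layer an override on top of a package-set function."""
    def extended(self: View) -> Thunks:
        super_thunks = rattrs(self)  # the set "from above"
        # Attributes returned by the override win over the ones from above.
        return {**super_thunks, **override(self, View(super_thunks))}
    return extended


# Hypothetical base set: foobar refers to foo and bar through `self`,
# so it picks up later overrides instead of being bound early.
def base(self: View) -> Thunks:
    return {
        "foo": lambda: "foo",
        "bar": lambda: "bar",
        "foobar": lambda: self["foo"] + self["bar"],
    }


# Hypothetical override: take foo from above and reverse it.
def reverse_foo(self: View, super_: View) -> Thunks:
    return {"foo": lambda: super_["foo"][::-1]}


plain = fix(base)
overridden = fix(extend(base, reverse_foo))
print(plain)
print(overridden)
```

Because foobar looks up foo through self rather than through a fixed binding, overriding foo also changes foobar, which is the behaviour the talk contrasts with a recursive attribute set. Several layers can be stacked by nesting calls, for example fix(extend(extend(base, first), second)), where first and second stand for any two override functions.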