Hello and welcome, dear Arch Linux developers, users, people interested in Arch Linux, and people who are not here because of Arch Linux at all. It is my pleasure to welcome you to my presentation about distri, a project of mine to research fast Linux package management. My name is Michael Stapelberg; you can follow me on Twitter. I'm going to start with a one-minute introduction, followed by a couple of demo videos comparing package installation speed on Arch Linux versus distri. And then, if you're still willing to stick around and learn more, I'll show you how distri works in detail.

I was a Debian developer for seven years, from 2012 to 2019, until I finally left Debian because of antique tooling and slow change. You can click the link on the slide to read my blog post about it. I've been using Arch Linux for about a year now. Previously I had also tried Fedora and NixOS, each for a few months, and in a previous life I used Gentoo, Ubuntu, and NetBSD, so I've seen a couple of different systems. You might be familiar with my name from my work on the i3 tiling window manager, which I started in 2009. I have since also done a couple of other open source projects, such as Debian Code Search, a regular expression search engine over all of the software available in Debian; RobustIRC, an IRC network without netsplits, built on top of Raft; gokrazy, an application platform for Go programs on the Raspberry Pi; and a couple of smaller projects.

But let's talk about distri specifically today. On this slide, I'm showing the installation of the ack program, a tiny Perl script that lets you search source code conveniently. On Arch Linux, this takes a little under three seconds, which is not too bad. But let me show you how quickly distri does it: here, the installation of ack finishes in 0.5 seconds.
This difference in installation speed is even more pronounced as we move from the tiny ack program, which is just a Perl script, to the large QEMU package, which comes in at many hundreds of megabytes. On Arch Linux, when the user asks to install QEMU, it takes a long time until the package manager actually steps out of the way and QEMU is available. I apologize for the tiny font in this example; I didn't have time to do a better recording, but you'll still get a sense of what's happening. The package manager resolves the package dependencies, then downloads all of the dependencies it thinks it needs, until finally it extracts them onto the system and configures them, and only afterwards can we use the QEMU program. As a user, this is way too long for me: I would have task-switched to something else at this point, and switching back and forth between projects is frustrating. I find it kind of telling that I can keep talking during this long pause. So that's 40 seconds on Arch. Now let's compare the same package on distri, where, when you say you want to use QEMU, it takes 3.5 seconds until distri steps out of the way and you can start using QEMU. That seems much better to me.

So let's talk through why installing and updating packages can be so fast in distri. One important factor is transport compression, meaning the compression algorithm used when transferring packages from the mirror to the end user's computer. Arch does pretty well here: it switched to Zstandard in January 2020, which is the same algorithm distri uses. It's a great algorithm; it decompresses very quickly, and the file sizes are good too. Next, we should ensure a fast connection to the mirror where the packages are located. In distri, we have just one mirror behind a CDN, whereas in Arch, users are asked to maintain their own mirror list.
Granted, there are tools that make it a little easier, but it's one additional thing to take care of, and to do repeatedly. So I wonder: why can't Arch also default to a content delivery network that's fast everywhere? Even in Debian, that's the status quo. Then, in distri, there are no hooks or triggers at all, which allows distri to both download and install packages with maximum parallelism. In Arch, it seems we have neither: parallel downloading might be present in the Git version of pacman, but I think it's not yet released, and package-specific hooks and triggers rule out fully parallel installation. Arch is moving from per-package hooks to pacman-level hooks, for example for sysusers, which I'll show in a bit more detail in a couple of slides, and that's certainly a step in the right direction. I'll come back to this later, but the idea is to remove hooks and triggers entirely so that you can really go maximally parallel.

Lastly, a crucial architectural difference is that in distri there is no unpacking stage, because distri uses images instead of archives for its packages. This also makes package installation more robust. Arch, for example, does not support partial upgrades: if I want to install a new program, I not only need to update my package lists beforehand, I also need to update my entire system, which is lengthy and a big distraction from what I wanted to do in the first place. In distri, however, each package depends on the specific transitive closure of dependencies it was built with, so it can always be installed and will always use exactly those dependencies. Packages always work, and there's no need to keep the whole system consistent.
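To make the pinned-dependency idea concrete, here is what such package metadata could look like. This is a hypothetical sketch, not distri's actual schema; the field names and version strings are made up for illustration:

```
# Hypothetical per-package metadata: every runtime dependency is pinned
# to the fully qualified version the package was built against.
source_pkg: "nginx"
version: "1.14.1-7"
runtime_dep: "glibc-amd64-2.31-1"
runtime_dep: "openssl-amd64-1.1.1-2"
runtime_dep: "zlib-amd64-1.2.11-3"
```

Because the closure is pinned at build time, installing this package never requires upgrading the rest of the system along with it.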
On Arch, upgrades frequently require manual intervention. Some of these interventions are caused by file conflicts, which are entirely impossible with distri packages because they use a concept called separate hierarchies. distri packages are also hermetic, meaning they're not easily broken by other packages on the system.

In terms of debugging experience, installing GDB should be all that a user needs to do. Afterwards, they should be able to generate backtraces and do basic debugging whenever a program crashes on them. Imagine that your mail client or your browser crashes: you should be able to generate a backtrace without having to identify and install any other debug packages. Arch does not yet provide debug info for all of its packages, and it also does not yet make debug symbols transparently available. Luckily, this specific use case of debugging will be solved by debuginfod, a distribution-agnostic debug information daemon. In distri, the whole thing is solved at the package manager level, meaning the same experience also works for other use cases; in theory, you could also install packages that provide binary programs on demand.

In distri, not only is the debugging experience great, but the packaging experience also gives very quick feedback, which makes the whole thing more engaging, and a more engaging packaging experience results in more contributions to the distribution. Packaging in distri is also isolated from the host system by default, which really should be the default everywhere, whereas Arch package maintainers are asked to do manual chroot management. And in order to make changes to your distribution over time, I think it is key to adopt a declarative packaging format. In Arch, both the format that specifies what goes into a package build and the formats used in the resulting package metadata are custom formats that are not clearly defined anywhere.
I don't think they support automatic formatting, or machines being able to make edits. In addition to automated edits, which let you write migration programs covering every package in your distribution, it would also be good to have a uniform way of reading and writing these package descriptions. An easy way to accomplish this is a monorepo: one single Git repository containing all of your package descriptions. Lastly, by declarative packaging I also mean that you should express intents and end states instead of specific mechanisms. The sysusers transition I already mentioned is a good example of this. On the left-hand side of the slide you can see how it was before: every package had its own set of commands to run whenever a new user and group should be added. On the right-hand side, with the new declarative format, the package just specifies which user should be present on the system, and systemd-sysusers does the rest. Now, I'm not saying this is an example of a great declarative packaging format, because I think the specification could be clearer: it's just a set of separated fields that you need to keep in the right order, and that's not a great format to work with.

So how does distri work? Let's take a look at existing distributions' package managers and how they fare in terms of installation speed. We already looked at Arch Linux, where it takes a little under three seconds to install the ack program. The other distributions usually come in slower, except for Alpine, which takes just one second. NixOS takes five seconds, Debian ten seconds, and on Fedora we need to wait over half a minute to install a couple of kilobytes of Perl. On Fedora, the metadata alone is huge, at 114 megabytes. And you can see that the resulting data rates are abysmal.
Even Alpine, with its 10 megabytes per second, is nowhere close to any bottleneck in a modern system: my disk, my network, and of course my CPU can all process data at much higher rates. So what gives? Why are these package managers so slow?

Well, if we look at the two most widely used package formats, we see that they're both archive formats. In Debian, .deb files are actually just tarballs wrapped in an ar archive. In RPM, the Red Hat package format used by Fedora and derivatives, we have a cpio archive with a little bit of metadata around it. In Arch, the situation is the same: a tarball with a little metadata, compressed with zstd. The task of these package managers is always to make the package contents available: if you run, for example, pacman -S nginx, you expect /usr/bin/nginx to end up on your computer for you to run. Traditionally, that means resolving dependencies, downloading the archives, extracting the archives onto the system, and then configuring the packages. All of these steps need to carefully use the fsync system call to make the I/O as safe as possible, so that if your notebook's battery dies during a system upgrade, your system can still start up.

Now, the question is: how can we go faster? In distri, we use an append-only package store of immutable images. What does that mean? We use an image format instead of an archive format for distributing our packages. In our case that's SquashFS, but any image format would work; I only chose SquashFS because I already had code lying around that can read and write SquashFS images. We then mount these images under their own paths, a concept we call separate hierarchies. So, for example, on a distri system, nginx would be available under /ro/nginx-amd64-1.14.1.
So the fully qualified version number and architecture are always part of the path. This applies to all of the system's packages; the zsh package, for example, would be available under /ro/zsh-amd64-5.6.2, and so on. The rest of the system, however, is laid out as usual: you have your standard /etc, /var/cache, and so on.

There are a couple of advantages to this architecture. Most importantly, mounting images is much faster than extracting them onto your system, which results in faster package installation, and notably also in faster build environment composition. Because the package store can now be append-only, we can also use unsafe I/O: it's not a problem if a package is half installed into the package store, since it will only be used once it's fully installed, and that switch is an atomic operation. So unsafe I/O is entirely okay now. And because the package contents are immutable, it is no longer possible to screw up your installation, whether by accident or by malice.

Hermetic packages in distri means that packages run with the same versions of their dependencies as when they were built. We accomplish this with a wrapper script or program that sets, for example, LD_LIBRARY_PATH, PYTHONPATH, PERL5LIB, and the other environment variables that programs use to locate their dependencies.

Despite living in separate hierarchies, packages sometimes need to exchange data with each other via directories with well-known paths. There are many examples of this, but I want to list just two on the slide. The man page viewer in the man package exchanges data with the nginx package by looking up the corresponding page in /usr/share/man. The GCC compiler exchanges data with the libusb package, when you link software against libusb, by looking at the corresponding header files in /usr/include. The prudent approach to make all of this work is to emulate these well-known paths on the system.
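To make this concrete, here is a minimal sketch of emulating one such well-known path with a symlink, runnable in a scratch directory. The package name, version, and header contents are made up; only the mechanism matters: a well-known path that points into a fully qualified hierarchy.

```shell
# Sketch of exchange-directory emulation (all paths and contents made up).
cd "$(mktemp -d)"
mkdir -p ro/libjpeg-amd64-9c/include exchange/include
echo '/* stub standing in for the real jpeglib.h */' \
  > ro/libjpeg-amd64-9c/include/jpeglib.h
# The well-known path is a symlink into the fully qualified hierarchy:
ln -s "$PWD/ro/libjpeg-amd64-9c/include/jpeglib.h" exchange/include/jpeglib.h
# A compiler looking at the well-known path finds the package's header:
cat exchange/include/jpeglib.h
```

A program that resolves the well-known path never needs to know which fully qualified package version is behind it.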
For example, in distri, /usr/include/jpeglib.h is a symlink to the fully qualified location of jpeglib.h. The advantage of separate hierarchies is that they move conflicts from package installation time to program execution time, which is a good place for users to disambiguate what they mean. This means you only need to resolve whether /bin/python should point to Python 2.7, Python 3, or some other version entirely when assembling /bin, not when installing programs onto your distribution. Now, many distributions actually make Python 2.7 and 3 co-installable because it's such a lengthy transition, but in distri, all packages are always co-installable. For example, you could install zsh 5.6.2 and 5.6.3 on the same system at the same time, which also means that partial updates and partial rollbacks are easily possible. Notably, in terms of speed, this also means our package manager can be entirely version-agnostic, which eliminates a large source of slowness: dependency resolution. There is no need in distri for global metadata; package-specific metadata, downloaded at installation time, is enough.

The immutability means that package contents, and also exchange directory contents, are read-only. Rarely, though, we have programs that expect the system to be writable. For example, GNOME's GSettings schemas want a writable cache directory in the exchange directory. I think such designs need to be improved upstream. In fact, here are three ideals that good caches should strive for. First, a good cache is not required: there should always be a fallback to the slow path. You can think of it as recreating the cache in memory and then just using that. Second, good caches are transparently created whenever they don't exist yet. And third, wherever possible, good caches should be automatically updated when needed.
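These ideals can be sketched with a toy lookup function whose cache is never required and is transparently recreated when missing. This is a minimal illustration, not distri code:

```shell
# Sketch of the "good cache" ideals (all paths are throwaway).
cachedir="$(mktemp -d)"
lookup() {
  if [ ! -f "$cachedir/answer" ]; then
    # slow path: recompute the value, refreshing the cache as a side effect
    echo 42 > "$cachedir/answer"
  fi
  cat "$cachedir/answer"
}
lookup                     # first call populates the cache
rm -f "$cachedir/answer"   # someone wiped the cache...
lookup                     # ...and the next call transparently recreates it
```

Callers never see whether the cache existed; a missing cache costs time, not correctness.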
Of course, in the edge case where figuring out whether to update your cache takes just as long as rebuilding it, that might not be worth it.

I've mentioned it before, but let me spell it out very clearly: in distri, there are no hooks or triggers. Hooks, sometimes also called maintainer scripts, post-installation scripts, scriptlets, and so on, are essentially programs that run after package installation. A trigger, on the other hand, is a program that runs after some other package has been installed. As an example, the man package on Debian comes with a trigger so that whenever another package containing a man page is installed, the trigger updates the full-text search index for your local man system. This is work done at package installation time that might be unnecessary: maybe I never actually use the full-text search feature of my man page viewer, and then all of that work is wasted. Also, hooks and triggers preclude concurrent package installation, because they aren't implemented in a concurrency-safe way. And lastly, because they're arbitrary code, they can be slow and they can have bugs; we've seen this plenty of times in Debian.

The claim I'm making with distri is that we can build a fully functioning system without any hooks or triggers, and there are two main strategies to accomplish this. The first is that packages declare what they need, as in the sysusers example of declarative packaging that I mentioned earlier. The second, whenever declaration is not the right choice, perhaps because the required work is very specific to the package itself, is to move the work from package installation time to program execution time. For example, the SSH package comes with an SSH server that needs a host key to function. Instead of requiring a hook at package installation time, we just create the host key in an sshd wrapper script before sshd starts up.
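The pattern of moving work from installation time to first program start can be sketched like this. It is a toy illustration: ssh-keygen is replaced by a stub (random bytes) so the sketch runs anywhere, and a real wrapper would exec sshd afterwards.

```shell
# Sketch: create state lazily at first start instead of in an install hook.
# (Stand-in for an SSH host key; a real wrapper would call ssh-keygen.)
statedir="$(mktemp -d)"
hostkey="$statedir/ssh_host_key"
ensure_hostkey() {
  [ -f "$hostkey" ] || head -c 32 /dev/urandom > "$hostkey"
}
ensure_hostkey   # first start: the key is generated
ensure_hostkey   # every later start: nothing to do
[ -s "$hostkey" ] && echo "host key ready"
```

Because the work happens at execution time, installing the package stays a pure, parallelizable data copy.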
Now, you might be wondering: these ideas sound nice in theory, but how practical are they really? The biggest chunk of complexity is certainly the /ro mount point in distri. Instead of making the kernel manage the separate mount points, overlays, unions, and other building blocks we would need to fit together for the flexibility the distri architecture requires, we found it both easier and faster to implement our own FUSE file system that provides the /ro mount point. Aside from this, you only need to build your packages with the --prefix argument so that they know where they will be installed on the target system.

A small number of packages need to be patched. Some of them have path-related issues; for example, some systemd service files come with fully qualified hard-coded paths such as /usr/sbin/nginx, and these need to be patched to use the fully qualified location within the /ro mount point instead. This affects a couple of packages, such as GCC, gobject-introspection, and automake, but not too many overall. The other class of packages that need patching is those with deep system integration, such as dracut for building your initramfs. I don't think this applies much to Arch Linux, because I'm not aware of such a configuration layer, but I will note for completeness that removing hooks from the system might not be for everyone: some users might actually appreciate the configuration layers, such as debconf or YaST, that some distributions provide.

So, to recap: why is distri so much faster? Traditionally, you need to resolve dependencies, download packages, extract the packages, configure them, and use careful fsync calls to make the I/O as safe as possible. In distri, you don't need to resolve any dependencies; you just download images that need no extraction and have no configure step, and unsafe I/O is totally okay.
distri's approach scales to 12+ gigabytes per second on a 100-gigabit link, which I tested using just the Go standard library's net/http package. In conclusion, I want to make the following observations. Append-only package stores are more elegant than mutable systems: they have a simpler design and result in a faster implementation. Exchange directories make all of this look normal to third-party software, so on distri we can compile unpackaged software manually, and we can run closed-source binaries, no problem. And lastly, all of these ideas are actually practical, because live CDs, with their read-only environments, and cross-compilation have paved the way for the distri architecture.

Let me also state very clearly that my project goals do not include building a community or user base, and I'm not trying to get you to switch away from Arch Linux to distri. distri is merely my vehicle for Linux distribution research, and I do regular proof-of-concept releases so that you can try it out and see how it feels, but it's not for actual end users. But maybe, now that you know some of these pain points, I've motivated some of you in the audience to improve things: maybe some of the low-hanging fruit, but maybe you also want to think about where your package manager can go longer term. If you want to learn more about distri, check out distr1.org, with a one instead of an i. If you have any feedback on this presentation, I would be very happy to receive it; you can leave feedback either by clicking the link on this slide or by scanning the QR code on the slide.

And now I just want to say thank you so much for your attention, and if you have any questions, now should be a Q&A session.

Well, hi, thanks for that talk. I am Thor, and here's Michael. You ready for some questions and answers?

Hello everyone, thank you. Yeah, of course I'm ready.
Well, I think we got a good bunch of interesting questions, and I just have to start off by saying thanks for the talk, that was quite interesting. Here's one that multiple people have asked and that I have to admit I'm curious about myself. It's a joint question from Diabanos and Daniel R. Parks, wondering: are there any performance implications when mounting a large number of SquashFS images simultaneously?

Yeah, indeed, there are various slowdowns in the Linux kernel when it comes to mounting file systems, especially if you have many of them, which is the main reason why in distri we use a FUSE daemon to transparently and automatically provide one single mount as far as the kernel is concerned, and then we open these SquashFS images under the hood. So from that point on, everything is really, really quick, and the lazy loading is an additional performance optimization, if you will: when you start your computer, you don't need to mount all of these images, only the ones for the programs you actually start.

That's pretty cool, lazy loading. Nice. We have a question from Fox Barron asking: is there a trade-off between complexity and simplicity when using distri?

Absolutely, that's always the point, right? With distri, I'm pushing this slider to the very far end. The whole idea is that it's a vehicle for Linux distribution research, so very intentionally I'm choosing the most simple architecture, in the hope that the rest of the system can stay equally simple. Because sometimes you remove complexity from one end and it pops out the other end. But with distri, I'm relatively pleased with how it has been going: it seems that if you just constrain yourself around a simple architecture, the rest of the system stays largely simple too. Now, of course, different people have different understandings of what simplicity is.
Is it, for example, a simple architecture that is not very easy to use, or is it something that's simple in my day-to-day but has a heavy architecture behind it? So there's some subjectivity here. I'm not saying distri is super simple from start to finish, but I am pretty pleased with how simple the key architecture is that I presented in the talk.

That's pretty cool. There are some additional questions here, and I guess this is quite an interesting area for a lot of people, especially with how coupled package managers are to their distributions, wondering what some of the differences are. There are multiple questions on this, but here's one I think we'll start with: what are the differences between distri and Nix (the package manager, not the NixOS distribution), and would it be possible to implement the performance improvements from distri in Nix?

Yes, it's a great question. I have used NixOS myself for a little while, here and there. A number of the concepts are similar, which I think is great validation for each of the projects: it says, the thing you're doing is great, I'm going to do the same thing. And then there are a couple of differences. Most notably, while Nix has much of the declarative packaging and everything is machine-readable, packages in Nix are actually evaluated functional expressions, which seems pretty cool. But what I found in my day-to-day is that the Nix language itself is not very approachable: it's a domain-specific language, and I would need to learn it. And then, when you're traveling in a foreign country, say, and in the evening you just want to try this one cool program you read about, you can't use it, because you're not able to write a package for it. I think that's an additional hurdle. In distri, the concept is to make everything very simple.
Many concepts overlap with NixOS, but I think Nix's distinctive focus on functional programming, and the intensity with which it wants to own all of the configuration on your system and generate and configure everything through one interface, is actually a mistake. In distri, I've paid attention to there being a very simple language (declarative, key-value, not functional) in which you can express everything, like all of the package build files, et cetera. So they're very approachable; I think it's much, much easier to get started packaging for distri. And to get back to the original question: yes, some of the performance improvements in distri can definitely apply to Nix. I think there is some low-hanging fruit in Nix. One example: when I first did my research on this, I was using nix-env -i to install things, and then somebody told me I need to use nix-env -iA (with a capital A), because that doesn't evaluate the whole package set, and by default it did. So just switching that around and providing a nicer user interface is a huge opportunity: Nix could make the very, very common task of just installing a package onto my computer much, much simpler. If you have something in your package manager that makes it faster, bring it to the users by default. Only the defaults matter; nothing else really does, because people are always going to be able to tweak something. So if I had to pinpoint two significant differences between Nix and distri, I would say: a focus reduced to only the packages, with no ambition of configuring your whole system and no functional language, and a much bigger focus on the packaging experience.

I guess that sounds like quite a sensible approach. In your experience, have you noticed images being smaller in size compared to packages compressed with zstd? That's a question from Lietto.
Yeah, let me just try to disambiguate, and then answer the question like that, because it doesn't entirely make sense to me as asked. The images that we use are SquashFS images, but we do not actually use SquashFS's compression features. We compress the images using zstd for transport, much like Arch does, but there's no additional compression beyond that. So I can't compare the two, because we don't have any compression at the image level.

And DJ Dord asks: how does distri perform in terms of storage overhead compared to other package managers?

Right, it depends a little bit on how you use it. If, for example, you say, well, the distri feature of having all the different minor versions of zsh on my computer really appeals to me, then you can amass more and more packages, and your disk will get fuller and fuller with those packages and their dependencies. But in general, if you just use distri like any other distribution, where you have a release schedule and at some point you say, yes, I want to update and get new packages, the storage overhead is not actually large, because we do deduplicate. If I'm using libusb in one program and then install another program that uses the same version of libusb, it will still be on your computer only once. So it shouldn't be particularly noticeable.

Exactly. Again, we're going back to the comparison arena. We have, well, I'm not quite sure how to pronounce it, but we have Satefen and R-O-T-S-I-X asking whether distri shares some similarities with Snaps and Flatpaks. Specifically: could you have images that contain everything, independent from the system, and are there similar security issues?

Right. I'm going to make a couple of small notes here.
So, I mostly used AppImage, not so much Flatpak or Snap, but conceptually they're all relatively similar. It turns out that some of these actually only focus on bringing applications to people on known and supported base system versions. For example, I tried running the KiCad AppImage on distri and it wouldn't work. I raised a bug and asked, well, is your package broken? It's supposed to be hermetic, right? No dependencies on the system. It turns out they actually do require that you have a relatively recent Ubuntu base system. So the approach that Snap, Flatpak, and AppImage take is very much focused on the applications themselves. They're not targeting distributions or distribution package managers; they're targeting third-party software companies and the like. I think it's great that they exist, and I use them from time to time. But with distri, I really wanted to ask: this approach of putting everything into images, does it only work for applications, or could it work for every single package on your system? And the answer I got with distri is a resounding yes: it does work, even if you put every single distribution package into such an image.

Now, regarding the security issues, I'm not entirely sure what you're referencing, but I'm guessing it's the question that always comes up when shipping a static version, or a static dependency closure, of your program: well, then I need to rebuild it if a library changes, right? The famous example being a bug in OpenSSL: do you need to rebuild all of your software, or is it sufficient to just install a new package? In distri, you would absolutely need to rebuild everything, but that is considered a feature. There are always trade-offs with having a stable dependency closure.
It means everything keeps working, but when you want everything to change, yes, that's where you need to put in a little more effort. That being said, given that updating all of your packages is very simple and very quick in distri, it's not actually a huge deal.

I guess a follow-up question, just from myself, would be: to what extent would you even need to rebuild? I mean, it depends. If the dependencies change, you'd have to rebuild, but if it's a shared dynamic library, you might just be able to swap out which image you depend on.

Yeah, you could, in theory. But then you might risk subtle breakage: you say, well, this is ABI-compatible, but then it wasn't. That has happened plenty of times.

Yeah, I guess it's risky territory to do that. Better to stay safe. We have a question from, ooh, let's see if we can get this one right, Newsea Franjiballoon, wondering, and I guess this is a speculative or maybe hopeful question, what the chances are of pacman implementing some of distri's features or architecture.

Right. Well, I have no plans myself, but the whole idea of me speaking at Linux conferences is to bring these ideas to more people. Even if it's not the exact ideas I have that you end up implementing, maybe it's still the way of thinking about things. I'm putting a benchmark out there; I'm saying package installation can be really, really fast. Look, try it out, see how fast it really can be, and then compare pacman to it. So maybe some people will be motivated to tweak the defaults here and there. We've already covered that parallel downloading is on its way to end users in pacman on Arch Linux, which is great. So maybe it helps rally people around that, or some other small features here and there, or maybe even a larger feature.
I don't know what the pacman development plans are, but maybe the developers can at least think about some of these things, like what could be done in this space. We've got a question from Kieto, asking whether reproducible builds would be easier to apply with this kind of installation system. I mean, builds should definitely be easier to reproduce in this system. I'm not 100% sure what you mean by apply, though. If you just mean installed, then I don't think there's any difference. We don't have any special support for checking that anything is reproducible right now. But yeah, I think in terms of architecture it's well positioned for any work related to reproducible builds; we just haven't done anything in that space, because it's not the focus of my research. A question from Orhun, wondering how flexible the packaging process is, and how much the overhead cost differs between scenarios, say an Electron desktop app versus a simple command line app. I'm not sure if it's related exactly to the textproto format, but... Right, so there's always an escape hatch, right? In distry packaging, you can just specify: at this point, run this particular shell command. Of course, then you lose a number of the benefits. The idea behind declarative packaging really is that you never need to use the escape hatch of putting in arbitrary commands with the full flexibility of your packaging process. And the benefit you get is that it's much easier to make sweeping changes, where you update all of the packages with just a code change, instead of having to update each individual package to new best practices and so on. In terms of the overhead cost, I don't specifically know why it would be different between simple command line apps and Electron desktop apps. From the point of view of a package manager, those are pretty much the same, right? They come with data files and executable files.
And yeah, I think it would be linear. There's no big difference here. Another question, from Diabonus, asking: how do you handle things like generating the initramfs, which needs to be done after package installation on the user's system and is conventionally done using hooks? Yeah. So the initramfs in particular is kind of a special case, right? Because as I mentioned in the talk, there are some cases where the distry model, where we just have no hooks or triggers, doesn't entirely go far enough. And that's whenever you have something that you need to install outside of your file system. In this case, the initramfs goes onto the unencrypted hard disk partition that is available when you boot, and not into your system's file namespace. So in distry, the package manager knows that this needs to be done, right? It knows: I'm running on a Linux system, so I always need a kernel installed and I need an initramfs, and it can then do that. The point that I'm trying to make here is not that the functionality that hooks provide is worthless or should be done manually or something like that. No, I'm trying to say that the packages should tell the package manager at a programmatic level what they need. For example, in this case, the package could say that it affects the initramfs somehow, though I'm speculating here and not sure that even needs to be declared. But yeah, the idea is that the package manager has all the knowledge and can then apply a very optimized strategy for all of the changes on your system. And I guess this is related to it as well: a question from Zvyn asking how you set up a base system with this approach. One of the examples brought up here is the filesystem package in Arch, asking: if there are no dependencies, how are base requirements such as having a TTY guaranteed? Right.
I think this might actually be a little misunderstanding, because there are dependencies in distry, right? It's just that they're always fully qualified. So there are never any conflicts between them, and you don't need to resolve them at installation time, but you still specify dependencies at build time, and they then get encoded with fully qualified version numbers, transitively. So for example, if you install distry on a fresh computer, it would indeed install a package called base, and there are different flavors of that. For example, you could choose to install base-x11, and that depends on base, and that also depends on the Xorg server, just like you're used to, right? It's just a little more static in that it depends on a fully qualified version of the Xorg server. So whenever you need to update something, you would also install an updated version of the base package. At least in that sense, you entirely avoid the problem of partial upgrades. Yes. Well, I'm not sure which choice exactly we're talking about here, but Archcom 46 wonders: if Arch might not be the first choice, which distributions and use cases would qualify best? And I'm not sure if that's about the package manager? Yeah, I have to say, I don't fully understand the question. So whoever asked it, maybe you can use the remaining time to follow up and clarify. I'm not sure what you mean by first choice. I mean, for what? I never said that I wanted to use Arch for something and it wasn't good enough. In fact, I'm actually quite happy right now. So I don't know what you think qualifies for what; maybe clarify, thanks. Next, a question here from U1106 asking: what about a libc update? Isn't that a full reinstall? Or is it more than that, because there are many copies of libc? Yeah, there shouldn't be too many copies of libc, right? In this particular case, let me try to clarify with an example.
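As a quick aside before that example, the fully qualified dependency model just described can be sketched in a few lines. All package names and version strings below are made up for illustration; they are not actual distry packages.

```python
# Every package pins the exact versions of its dependencies, so
# "resolution" at install time is just collecting a fixed closure.
# Names and versions are hypothetical.
PINNED_DEPS = {
    "base-x11-3": ["base-3", "xorg-server-1.20.8-7"],
    "base-3": ["glibc-2.31-4"],
    "xorg-server-1.20.8-7": ["glibc-2.31-4", "libxfont2-2.0.4-1"],
    "glibc-2.31-4": [],
    "libxfont2-2.0.4-1": ["glibc-2.31-4"],
}

def closure(pkg, seen=None):
    """Collect the transitive, fully qualified dependency closure."""
    if seen is None:
        seen = set()
    if pkg not in seen:
        seen.add(pkg)
        for dep in PINNED_DEPS[pkg]:
            closure(dep, seen)
    return seen

print(sorted(closure("base-x11-3")))
```

Because every edge carries a full version, two installed closures can never conflict, and updating anything means installing a new base package whose pins reference the new versions.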
In the first release of distry, I think we were on glibc 2.26 or so. And then only for the next big release, like a year later, did I package a new version of glibc. Now, of course, if you have more active maintenance of the glibc package, then maybe you would actually have a couple of copies. But is it a full reinstall? Yes and no. Not everything actually depends on libc, right? There are plenty of programs that are statically linked, like Go or OCaml programs. But yeah, you're right, it would definitely bring in a bunch of packages. libc is very much at the core of the strongly connected set of dependencies in the middle of a Linux distribution. But so are many others, actually. I think we have an update to the previous question. Yeah, but they're still typing, so maybe we use the next one and jump back. We'll get back to that a little later. So here's a scenario from ROT6, wondering: let's imagine we have two versions of sshd installed. Can we have those two versions running simultaneously? And how do we decide which version is preferred? Yeah, it's a great question. Yes, you could have these running simultaneously. Which version is preferred? Well, that's sort of up to you, at a certain level. There are multiple places where you can do disambiguation in distry. For example, if you use the convenient symbolic links in /bin, you would see that there is just one link in /bin for sshd, and it points to the most recent version that you have on your system. However, if you say, well, I want to run the old sshd from a terminal for debugging something, then you can just use the fully qualified path, and then you run that precise version. Now, if you said, well, I want my whole system to be on the old version whenever I restart, I don't want to just run it from a terminal, then the place to do that would be in your systemd service files.
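A minimal sketch of that disambiguation scheme, assuming a hypothetical /ro directory that holds one directory per fully qualified package version, with each /bin symlink pointing at the newest one:

```python
import re

# Hypothetical layout: every package version lives in its own directory,
# e.g. /ro/openssh-8.4p1-1/bin/sshd, and /bin/sshd is a symlink that the
# package manager keeps pointed at the most recent installed version.
installed = ["openssh-8.2p1-3", "openssh-8.4p1-1"]

def version_key(image):
    """Crude version ordering for illustration only: compare the
    numeric chunks of everything after the package name."""
    _, _, version = image.partition("-")
    return [int(chunk) for chunk in re.findall(r"\d+", version)]

def default_target(images):
    """Where the /bin/sshd symlink would point: the newest image."""
    newest = max(images, key=version_key)
    return "/ro/%s/bin/sshd" % newest

print(default_target(installed))
# Running the old version explicitly just means using its fully
# qualified path, e.g. /ro/openssh-8.2p1-3/bin/sshd.
```

The same fully qualified path can then appear in a systemd service file to pin a daemon to one specific version across reboots.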
So wherever you make the choice, be it in your terminal or in your service files or in your shell scripts or whatever else you have that runs programs, that's where you say which one you want. And if you don't specify anything, the most recent version is the default. A good question here from Sarah Volt, asking: how do you feel about large runtimes that could be shared? I guess: creating runtimes where you just don't care about the space usage? Yeah, I think large runtimes that could be shared will be shared in distry, right? If they're used at the same version by multiple packages, they will be shared. If you're implying that large runtimes make the problem of having multiple versions more pronounced, I would say true. If people don't care about space, and I think that's a legitimate approach with the very large drives that we have in many, many computing environments these days, that's fine. If people do care about space a lot, they can just run a garbage collect more often; whenever they update their system, they make sure that old versions of the large runtimes that are no longer referenced or strictly necessary are pruned from the system by doing a garbage collect. Next one from DJ Dorn, which I think we can simplify to: do distry images include all optional dependencies, or just the required dependencies? Right, yes. Great question, because I have a blog post stating that optional dependencies do not work. They're always broken, and they impose so much more effort on distribution package maintainers. I think optional dependencies should just be avoided. There are obviously a couple of cases where you want some sort of flexibility, like in plug-in systems where you have dynamic loading and so on, but in general, distry does not have the concept of an optional dependency versus a required dependency.
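The garbage collect mentioned a moment ago amounts to a mark and sweep over installed package images. Here is a minimal sketch under assumed names; none of these packages or data structures are real distry APIs.

```python
# Mark: start from the roots (e.g. whatever /bin currently links to)
# and follow the fully qualified dependency pins.
# Sweep: anything installed but unreachable can be pruned.
deps = {
    "qemu-5.0.0-1": ["glibc-2.31-4"],
    "qemu-4.2.0-1": ["glibc-2.30-2"],   # old, no longer linked from /bin
    "glibc-2.31-4": [],
    "glibc-2.30-2": [],
}
roots = {"qemu-5.0.0-1"}

def mark(roots):
    """Return the set of images still reachable from the roots."""
    live, stack = set(), list(roots)
    while stack:
        pkg = stack.pop()
        if pkg not in live:
            live.add(pkg)
            stack.extend(deps[pkg])
    return live

def sweep(installed, live):
    """Everything installed but not live is safe to prune."""
    return sorted(pkg for pkg in installed if pkg not in live)

print(sweep(deps, mark(roots)))
```

Here the old qemu image and the old glibc it pinned are the only unreferenced images, so only they would be removed.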
We say everything is required, and if you have a plug-in system, then build it outside of the dependency mechanism. A similar question, which we touched upon earlier with regards to storage overhead. I'm not sure if there's too much to add there, but: have you measured and compared the storage overhead to other package managers? Yeah, because people keep asking about overhead, I think it's important at this point to clarify what sort of overhead you're thinking about. Essentially, when you install a package, you say you're interested in its contents, and afterwards the contents are on your disk. So is the overhead the extra bytes that the SquashFS container needs? If so, SquashFS is very efficient. Is the overhead because we only build packages in the default configuration, without paying much attention to storage space? Then yeah, distry packages are a little larger, because we haven't cared to optimize them yet. Or maybe you're thinking about metadata about packages, like package lists in archives and so on. In that case, I can reassure you that the overhead is intentionally very, very small. In distry, there are no package lists that you would need to have. Whenever you install a package, only the relevant metadata about that particular package is fetched. So there's no need for package lists, and maybe that makes the overhead come out favorably. I don't know; let me know if you want this clarified further. I think we have time for a couple more questions. One here from Sean K. H. Liao, and I'm sorry if I get the pronunciation wrong. The question is: how do you deal with global configuration files, such as things that exist in /etc? Yeah, currently, just like you would on any other Linux distribution. People have been interested, though.
Actually, there has been academic work, I believe a bachelor's thesis, though I think it's not yet published, where somebody built a snapshotting mechanism for the particular case of /etc and similar directories on top of distry. So it's not something that distry in and of itself addresses, but you could, for example, use etckeeper on top of distry, or Ansible, or the snapshotting thing somebody built, I don't know. Whatever you like to manage your config files in /etc; for distry, it's intentionally out of scope. Similarly to how I mentioned earlier that in NixOS you have the possibility to configure a lot of packages on your system: in distry, we say, well, whatever the package wants to provide in /etc, we'll copy it out of the package if it doesn't exist yet. But if it's there, it's yours to update, right? That's pretty much how it works in many other distributions as well, similar to pacman's .pacnew files. And I think we have time for one more before we go back to the clarification question from earlier. And I guess this is an interesting one depending on what sort of network access you have. Fred Marcus wonders: isn't resolving dependencies faster than re-downloading dependencies for every package, if they're going to be part of the images? All right, yeah. We don't actually re-download deps. The things that are already installed on your system, we skip them, of course. So maybe that already answers the question. To be precise, when you install a package in distry, it goes and fetches the package-specific metadata, then it looks at what it needs to fetch out of that, and then it fetches only that, right? So maybe that is where you were thinking a little differently. And then we can go back to the clarification question. I guess the person is pointing out that you mentioned that Arch wouldn't be the first choice to implement distry on.
But I guess the better question is: what distribution do you think would be a good candidate for adopting distry, if you were to replace the package manager? Yeah, this is actually a great question that I haven't thought about a whole lot; I have thought about it off and on. It's a hard question for sure, right? How do you get an existing Linux distribution into the distry model? The reason why I built distry apart from any established distribution is so that I would have full freedom and flexibility. I think the more flexible your developer team is, the better. For example, in Debian you have many, many developers and the general model of achieving consensus, so it's slow to move. I think Arch is a little faster to move, but maybe has a little less manpower. So there are always questions around that, and I think it's more of a process and community question than really a technical question. Because, as I mentioned, the only changes you really need to make at the package level are to rebuild each package with the appropriate --prefix argument; then of course you need to add the FUSE daemon, or some replacement for it, and a package manager binary. That's pretty much all there is, right? So, technically thinking about it, it's not so hard, it's not so many steps. But it is a gigantic transition to execute, right? You would need to replace all packages; I think that's where the trouble is. Yeah, I imagine that's quite something, in addition to all the infrastructure as well. Exactly. Well, I think that's all we had time for. Thanks so much for watching, you guys at home. And thank you so much, Michael, for answering all these questions. No problem at all. Thanks so much.