I know almost everybody in this room, but in case somebody doesn't know me: my name is Zbyszek, I work at Red Hat in the Plumbers team on systemd, and I'll be talking about stuff that happened in the last year. I submitted a version of this talk for DevConf a month and a half ago and gave it there. I think it wasn't terrible, but I also didn't get much feedback from the audience, so I asked somebody from our team, Lukáš Nykrýn — you probably know him too. He said that, well, it was a good talk, but mostly if you're doing large-scale infrastructure, because I was talking about immutable images and PCRs and signatures and verification and stuff like that; if you're an individual contributor, it's not terribly useful. You don't usually get second chances in life, but in conference talks you do get a second chance every once in a while, so this time I will try to do better and focus on end-user features in systemd. If you care about the large-scale stuff — I'll try to stay away from those topics this time. So, a few days ago we released systemd 254, and we try to do regular releases. Our goal is to make six releases a year; we average at 2.5, so there's still some room for improvement. Compared to, I don't know, five years ago, we have increased the quality quite a bit. We go through a bunch of RC releases — we basically copied the kernel workflow. We make an RC, and once the RC is made we stop accepting major features in general and try to fix bugs until any known regressions since the last release have been removed. This means that between the first RC and the final release we block, and sometimes this takes quite a while. Occasionally we just revert things if we cannot fix them. So 254 had three RCs, and we plan to do another release this year. Another thing that has been improving the quality and the cross-distro collaboration is point releases.
As soon as any given version is released, we create a stable branch for it and start pulling in backports of commits. I think we are at, like, 251.18 right now, and every year we make more of those point releases — we have made 29 in 2023 so far, and if we keep up this pace it will be around 50 this year, rounding up. Most distros have switched to building from the stable releases. The stable branches started out as the patch set that was used in Fedora, and then we added tags to it, and I think it's good that other distros are reusing this work and actually contributing to it now. The number of open issues on GitHub, which is used by systemd upstream, is growing — 10 or 12% per year, which is quite a bit, but also not terrible. As long as the project is alive this number will have to grow; there's just no way we will be able to close more issues than are opened. But in Fedora the opposite has happened, which I think is pretty nice. A bunch of people made an effort to clean up the queue in Fedora — David Tardon, Yu Watanabe, and other people worked on just going through the bugs and closing some out. So over the last year we have closed roughly half of the bugs in Fedora, and especially the RFEs that were resolved or moved upstream — I think that's good. So let me talk about some specific features. systemd is big on synchronous operations: you request a service to be started, and systemd wants to know when the service has finished bringing itself up and when it's actually ready to serve requests. Things have been like this since the beginning, and the easiest, nicest way to implement this is with Type=notify. I mean, there are other protocols.
But issuing notifications — using the sd_notify() library call, the systemd-notify helper from shell scripts, or reimplementing the whole thing in your own code if you want to — is trivial, because it's just a text string sent over a socket. And this works nicely. We made it slightly easier to use from shell scripts in the latest release: there's a new --exec option, so that systemd-notify sends a notification and then execs something, which makes it a bit easier from a shell script. But one gap that we had was reloads. It is traditional for Unix daemons to do a reload after getting a SIGHUP or another signal, but this is asynchronous, and if you wanted to do it the right way, you actually couldn't use a signal, because there would be no notification going the other way. So the recommended way was to implement your own binary that communicates with the daemon, triggers the reload, and reports back. And this was very annoying. We figured out it's actually quite easy to do it properly: send a signal, and then have the daemon send a notification the other way. The problem is that this cannot be introduced by default in a backwards-compatible fashion, because the daemon needs to send the notification back. So there's a new type called Type=notify-reload: systemd sends SIGHUP on its own, the daemon sends back a notification, and now we have synchronous reloads. So in general, for backwards compatibility, all the service types that have been there are still there, but the recommended ways to do things have changed. For services which don't have a startup phase but are simply ready as soon as they are started, don't use Type=simple, use Type=exec. The difference is that with Type=simple, systemd forks, and as soon as the fork happens — before the child execs — systemd considers the service to be started. In hindsight, this is not very useful, because the exec of the binary might fail.
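The readiness protocol is really just a datagram with a text payload. Here is a minimal, self-contained sketch in Python — the manager side is simulated with a local socket so it runs without systemd, and the paths are made up for the example:

```python
import os
import socket
import tempfile

def sd_notify(state: str) -> bool:
    """Send a state string (e.g. "READY=1") to $NOTIFY_SOCKET, if set.

    This mirrors what sd_notify(3) / systemd-notify do: one datagram
    over a unix socket whose address the service manager exported.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not running under a notify-aware manager
    if addr.startswith("@"):  # "@" denotes an abstract-namespace socket
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state.encode())
    return True

# Simulated manager side, so the sketch is runnable on its own:
path = os.path.join(tempfile.mkdtemp(), "notify")
manager = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
manager.bind(path)
os.environ["NOTIFY_SOCKET"] = path

sd_notify("READY=1")               # the service signals readiness
msg = manager.recv(1024).decode()  # the manager receives it
print(msg)                         # → READY=1
```

For Type=notify-reload, the daemon additionally sends "RELOADING=1" (together with a MONOTONIC_USEC= timestamp) when it gets SIGHUP, and "READY=1" again once the reload is done.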
With Type=exec, the point of readiness is not the fork but the exec that happens in the child. This is much more useful. If you use Type=forking, which follows the traditional Unix protocol where you fork and then fork again to detach from the parent — with systemd this is all useless. Just don't do any forking; use Type=notify or Type=dbus. If you have a D-Bus service, the point of readiness is when the child acquires the D-Bus name (Type=dbus) or when it sends a notification (Type=notify). And actually you can do one better and switch over to Type=notify-reload if your service supports reloads. Type=oneshot is still okay too. Another kind of useful but not very well known thing is the dependency called Upholds=. We have Wants= dependencies, where you start a service and it pulls in a bunch of stuff that is needed for it to function, but this happens once, at start; and then if those dependencies die, crash, or are stopped, the Wants= or Requires= dependencies have no further effect. Upholds= is like a variant of this which is effective for the whole lifetime of the upholding service, and the dependency will be restarted. A classical example is a container or machine that has an Apache httpd service which also requires a database to actually provide any answers — one is not useful without the other. So you make the top-level service have Upholds= dependencies on all the other services that are necessary for it to function, and then systemd will do its best to restart the dependencies as long as they are needed. And it's now easier to do this: there's a new .upholds directory — like .wants and .requires — and you can create those symlinks in those directories at install time using the UpheldBy= directive. And there is a bunch of new unit settings.
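A minimal sketch of what that looks like in unit files — the unit names here are made up for the example:

```ini
# myapp.service -- the upholding service
[Unit]
Description=Web application that is useless without its database
# As long as myapp.service stays active, systemd keeps restarting
# mydb.service whenever it stops or crashes.
Upholds=mydb.service

[Service]
ExecStart=/usr/bin/myapp

[Install]
WantedBy=multi-user.target
```

Alternatively, mydb.service can declare `UpheldBy=myapp.service` in its own [Install] section, and enabling it creates the symlink in the myapp.service.upholds/ directory.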
So OpenFile= is kind of a convenience thing where systemd will open an arbitrary file for a service, in read or write mode. The nice thing is that this allows the service to be less privileged: if the service needs to read a certificate from a file, for example, or write to a file somewhere in the file system, we can make the service less privileged, have the manager open the file for it, and the service just receives a file descriptor. This uses the same protocol that socket activation uses: you get a file descriptor, and an environment variable describes the file descriptor and gives it a name, so you can figure out what each file descriptor is for. Another kind of convenience thing is DelegateSubgroup=. The kernel requires that if you have a cgroup hierarchy that is delegated to a unit, the process doing the management cannot live at the top of the sub-hierarchy, because the kernel does not allow processes to be in non-leaf cgroups. So the process would have to create a sub-cgroup, move itself there, and then do the management. This was a bit of extra work that is not necessary: with DelegateSubgroup=, systemd will do this initial setup and move the process into the right subgroup. And another thing, touching on the stuff that I wasn't supposed to talk about, is ExtensionDirectories=: you can give a list of directories that will be used as overlays on the host file system for the service. So just by giving the name of a directory with a bunch of files in it, you can populate the file system that is seen by the service. It's nice for, well, extending things. There are other ways to do it, but this one is very, very convenient.
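Rough sketches of these settings in one unit file — the daemon name and all paths are hypothetical:

```ini
[Service]
ExecStart=/usr/bin/mydaemon
# The manager opens the file and passes it in via the socket-activation
# fd-passing protocol ($LISTEN_FDS / $LISTEN_FDNAMES), so the service
# itself needs no permission to read the path.
OpenFile=/etc/mydaemon/cert.pem:cert:read-only
# With cgroup delegation, let systemd create the named sub-cgroup and
# place the main process there, instead of the process doing it itself.
Delegate=yes
DelegateSubgroup=supervisor
# Overlay this directory on top of the root file system as seen by
# the service.
ExtensionDirectories=/var/lib/mydaemon/extra
```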
And in general — not for the settings that I'm talking about here, but for security-related and sandboxing-related settings — it's always good to use systemd-analyze security, in case you haven't seen how this works. So: systemd-analyze security and the service name — sorry, is this clear, large enough? Maybe I should do it like this. This gives a list of setting names, and it adds apples, oranges, and cherries into some numerical score, and at the end it usually says that the service is UNSAFE. It shouldn't be taken too seriously — of course the service could be perfectly safe. UNSAFE just means that it's not using the systemd features that systemd, well, wants to advertise here. But this list is very useful for thinking about different ways to sandbox a service; it works as a checklist and a source of hints. This has been around for a few years, but still most units don't use it, and it would be really good if people were putting more sandboxing into the system. The kernel provides nice sandboxing features, and we should make use of them. Another feature that is new is soft-reboot, and I wanted to make a demo of this. Can I turn this off somehow? Does this work? Let's see. I have a virtual machine here, and I do a soft-reboot. So with soft-reboot, systemd now has three levels of restarting the machine. There's the traditional mode, where systemd shuts down all the services, and the reboot goes through the firmware, the bootloader, and the new kernel. There's kexec, where we load the new kernel into memory, systemd shuts down all the services and tells the kernel to execute into the new kernel, and the new kernel starts a new systemd and a new set of processes. And now there's soft-reboot, which works by skipping all those extra steps.
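For reference, these are the kinds of directives the checklist nudges you towards. A hypothetical hardened service might set something like this — which of them a given service can actually tolerate varies:

```ini
[Service]
ExecStart=/usr/bin/mydaemon
# A sample of the sandboxing options that systemd-analyze security
# scores.
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
SystemCallFilter=@system-service
CapabilityBoundingSet=
```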
systemd shuts down the whole user space, re-executes itself, and starts the user space up again. So if I do this — well, it restarted, and the main benefit is speed, essentially, right? This is for the cases where you just don't need a full reboot. And I wanted to show that if we look at the list of processes — I will do this in a not very nice way of filtering out everything with a bracket. Does this work? I tried this before; grep, please help me. I want everything that does not match a bracket: grep -v bracket. This should work, no? Ah, okay. So I have process 1 and a bunch of other processes. And if I do a soft-reboot — where did it go? — and then the same thing again, PID 1 is still there, but I get a whole new set. I wanted to show this; I thought it would be cool, because it also answers the question that people sometimes have: what gets to survive? And actually pretty much everything goes away. PID 1 itself is re-executed, so it's running new code — it is also essentially replaced, just the number stays. Okay, and there's also a bunch of helper thingies: systemctl list-paths does what the name suggests and lists path units, and we have the same for automounts. It's just a convenience thing; there's a bunch of those, they get added every year or two, and they list specific unit types in a nice way. And I was talking about reboots, right? No — reloads and restarts of services. The nicest way to do restarts is to not close the file descriptors, so that anything that connects to the sockets or pipes your service has opened doesn't get a refusal; it's just delayed a bit while the service is restarting. And we have the notification protocol for this: you call sd_notify(), attach a file descriptor to the notification, and tell systemd to keep the socket for you while you are restarting.
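Under the hood, handing a socket to the manager is the same sd_notify datagram, carrying "FDSTORE=1" plus the file descriptor itself as SCM_RIGHTS ancillary data. A self-contained Python sketch (3.9+ for socket.send_fds; the manager side is simulated with a local socket, so nothing survives a real restart here):

```python
import os
import socket
import tempfile

def notify_store_fd(state: bytes, fd: int) -> None:
    """Send a notification plus a file descriptor to $NOTIFY_SOCKET."""
    addr = os.environ["NOTIFY_SOCKET"]
    if addr.startswith("@"):  # "@" denotes an abstract-namespace socket
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        # send_fds attaches the fd as SCM_RIGHTS ancillary data
        socket.send_fds(sock, [state], [fd])

# Simulated manager side:
path = os.path.join(tempfile.mkdtemp(), "notify")
manager = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
manager.bind(path)
os.environ["NOTIFY_SOCKET"] = path

# Any fd can be stored; a pipe end stands in for a listening socket.
r, w = os.pipe()
notify_store_fd(b"FDSTORE=1\nFDNAME=mysock", r)

data, fds, _flags, _addr = socket.recv_fds(manager, 1024, 1)
print(data.decode().splitlines())  # ['FDSTORE=1', 'FDNAME=mysock']
print(len(fds))                    # 1 -- the manager now holds a copy
```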
And sometimes it can be a bit hard to figure out what is going on — which sockets are which — so there's a new command called systemd-analyze fdstore, and again I will do a demo, on the laptop, because I want a nice example. It requires privileges, because if you were able to look at a service and see what file descriptors are stored, that could give away too much information, so it's a privileged operation. This is an example for logind: you can see all the file descriptors, and they also have names, so that it's easier for the service to figure out what those file descriptors are for. And there's a bunch of settings related to this: FileDescriptorStorePreserve= allows file descriptors to not be cleaned up immediately by systemd when the service stops, so you can have a semi-permanent thing that is kept by systemd. This of course creates the problem that sometimes you do want to get rid of a file descriptor, and there is now a command, systemctl clean, to get rid of file descriptors that survived the service going away. And systemd-notify has additional switches to send file descriptors with a name — I messed up the rendering here, of course; this should be a double dash. Another debugging feature is systemd-analyze malloc. It also requires privileges. It works over D-Bus: it sends a request to a service to get malloc info, and what you get is just a dump from a function provided by glibc that reports information about allocations. The idea is that various services will implement this protocol and allow you to see what they're using memory for and so on. This was for the system manager; for the user manager you get a different answer. A bunch of systemd services implement the protocol now, not all.
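The related unit settings look roughly like this — the service name is hypothetical, and the man pages have the exact semantics of each value:

```ini
[Service]
Type=notify
ExecStart=/usr/bin/mydaemon
# Allow the service to store up to 16 fds with the manager.
FileDescriptorStoreMax=16
# Keep the stored fds even when the service is stopped, not only
# across restarts; they can be dropped later with "systemctl clean".
FileDescriptorStorePreserve=yes
```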
I know it can be useful. Another one is udevadm verify — again, a demo. This was implemented in this release, and when we added it, we found a bunch of bugs in our own rules — well, bugs; I don't know how serious to call them — and we fixed them all. So those that remain are the ones from the distro, and it's, you know, whitespace issues, tokens that are run together, and then there's NFS doing some very strange thing where it creates a packaged file that is not world-readable, which is a violation of the packaging guidelines. And then there are some other minor things. We should probably get some kind of rpmlint script for this; I think it would be nice. And of course, it also finds more serious issues: syntax that seems to be doing something but is actually not useful, for example invalid option names, and so on. Another thing to make using systemd easier is more edit verbs. We've had systemctl cat and systemctl edit for unit files for a few years, and now we have the same for machinectl and networkctl. For machinectl, the configuration files are for nspawn, so if you don't use nspawn, this is not useful for you. networkctl is for networkd's .netdev and .network files, but also for .link files, which are used by udev — so this is a bit confusing. And another thing that has been changed — let me try this; can I edit this, does it work? So now I'm editing a unit file, and this has been around for a while, but we keep improving it. Basically you get an editor that creates an override file. The file name is shown, well, without the temporary prefixes here. But to make it easier, you see a preview of the existing contents, and also the cursor is placed automatically at the right spot where you would edit things.
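What systemctl edit writes out is an override drop-in; for a hypothetical foo.service it ends up roughly like this:

```ini
# /etc/systemd/system/foo.service.d/override.conf
# Created by "systemctl edit foo.service"; settings here extend or
# override the packaged unit file, which stays untouched in /usr.
[Service]
Environment=LOG_LEVEL=debug
```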
So I, I don't know, add something, override something important, and, well. Editing requires privileges, but of course dumping the file does not. And relatedly — this is also a symlink, so I can click on it and get it opened in the editor. Okay, so cat and edit for more files. Another new feature, or group of features, is support for better booting — or better, not booting — of a machine when the power is low. Basically the idea is that while we are in the initrd, before we have mounted file systems and before there is a potential for file systems to be mounted and data lost, we do a check. systemd-ac-power implements a check of the battery: --low checks that it's below 5% — or more precisely, it checks that the system has at least one battery that is discharging and no batteries above 5%. Getting the check right took, I don't know, 25 iterations, but I think we're there. Because you have batteries, you have batteries for which the kernel doesn't know the charge level, you have batteries which are present but are not discharging, you have systems with no power supply, and so on and so on. And there is a new systemd-battery-check program that runs in the initrd. It checks the battery; if it's below 5%, it says your battery is below 5%, plug in a charger — and if you don't do it within some very small amount of time, it shuts down the machine. We will see how it works in practice, but the idea is that this is better than booting into the system and then dying very soon after. It can be disabled on the kernel command line. And there's also a new way that we handle hibernation, because the problem with hibernation is that it's easy to hibernate: you pick a swap partition, you write the memory contents to it, and the machine shuts off.
But figuring out at boot which of the swap partitions to read the state from can be complicated, and people also use swap files — with partitions it's easier; with swap files it's even more complicated. So the traditional way was to put this information on the kernel command line, but this can be out of date, and then it becomes messy. The new idea is to write the information about which swap file or swap partition was used into an EFI variable. systemd-sleep creates a HibernateLocation variable that describes the ID of the device of the partition, an offset, and some additional details, and then when systemd is booting up, it will look for this variable and use it to, well, resume from hibernation. Hopefully this will fix the problem for the people for whom the previous approach didn't work. And I think I have some more time, so the last group of things I want to talk about is systemd-repart. This is a screenshot of systemd-repart running on some machine. systemd-repart is a partitioning tool that works like this: there is a bunch of config files that specify what partitions are expected. It goes through the config files, takes a config file, and looks for the given partition; if it's there and has the right size and so on, then nothing happens. If it's not there, it will be created, or marked to be created, and if it's there but, for example, too small, it will be enlarged, and so on. Repart is pretty nice, and a lot of work has gone into it in this release, so it has the following features.
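The expectations are declared in small repart.d drop-ins. A hypothetical example that ensures a root partition exists, is formatted, and is at least 2G:

```ini
# /etc/repart.d/50-root.conf (hypothetical)
[Partition]
Type=root
Format=ext4
# Grow the partition if it exists but is smaller than this.
SizeMinBytes=2G
# Populate the freshly created file system from this directory.
CopyFiles=/usr/share/factory/root:/
```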
So the partitions are created in an atomic way, which means that first repart opens the block device, uses a loopback device to get access to where a partition will be, writes the contents of that partition (or more than one), and at the very end, after everything has been written, it creates or adjusts the partition table — so either you get a partition with the file system, or you don't. It also knows how to create partitions with a file system with contents — "knows" in the sense that it just invokes the file system creation tools in a mode where they write files into the file system that is being created. This needs to be supported by the various file system creation tools, but the major ones do it, and this is very nice because you get atomic creation of a file system that is already populated. And the new part is that systemd-repart now doesn't require privileges. Before, it used the kernel loopback device to get access to the right place in the file, but this has the problem that if you are fully unprivileged, or you are in a container, this is not available — and now it has been, well, fixed, changed to not use a loopback device. In principle it supports any file system, but the ones that allow writing contents when the file system is created work better. It also has support for verity and so on, and it's fast, which is important. Basically the idea is that it runs on every boot and normally doesn't do anything, but you can add in additional drop-ins and it will create partitions. It supports minimization of the file system that is being created, which is important if you're creating file system images.

Another thing that is new is that a lot of work has been put into systemd-repart and other systemd tools to support operating on a root directory from the outside. Some of them did support that already, but not all, and I think now pretty much all do. And this means that systemd-repart can be used for the next iteration of mkosi. mkosi is the systemd image creation tool; it started out as a tool to test systemd in VMs, but it has grown into quite a useful thing. It has a declarative configuration that lists a set of packages, and because we added complexity to other things — for example to repart — we were able to make mkosi simpler. Basically, mkosi had this whole understanding of partitions: which partitions to create, with which sizes, and so on. That has all been removed; it now just has a directory where you put in configuration for repart, and repart is called to create the partitions. So mkosi creates a temporary directory, uses DNF or some other package manager — it also supports DNF 5, in case you're wondering — to put files into this temporary directory, and then tells repart to take those contents and put them into partitions in an image file. This is a much different way of doing it than we did before, but I think it has nice advantages — in particular there's this separation of concerns, and mkosi itself is much simpler. So, okay: it has declarative configuration, like everything in systemd. It operates on package names, and this also means you can specify anything that DNF will understand — for other distributions you can use other specifications — so, I don't know, versions, package names with version bounds. If you want to add stuff to the image, generally the best way is to have a skeleton tree that will just be dropped into the right place. And, well, it uses other systemd tools, so it supports read-only images and signatures and verity and so on. For reproducibility — we are still not there with full reproducibility, but at least we're making logs and manifests of what is installed into the images. I mentioned DNF and DNF 5, but the nice thing is that mkosi can support pretty much any distribution that has, like, just a bunch of packages: it supports apt, pacman for Arch, DNF and zypper for RPM-based distros, and it also supports Gentoo. So basically, for example on Fedora, if you have the pacman and apt binaries installed, you can create images of any other distribution — it just feels like native support, and this is very nice. Unfortunately, this whole rework has required a huge compat break between the last release, which is, I don't know, a year and a half old at this point, and the upcoming version — we're doing so many changes that we broke compatibility, and the release is still delayed. One thing is that in the previous version there was a fixed set of stages — you build, and then you install, and then you test — and different people wanted different things, so this has been replaced by something called profiles, where there is a set of stages and the next profile can use previous profiles. The name fits some of the use cases but maybe not the others. And, like I mentioned, a lot of the heavy lifting has been moved to systemd-repart, and this means that mkosi can now create images without any privileges — so it works in a container — and it's also faster. Another thing that we are kind of getting rid of is this mismatch where some things that you want to do when you're creating an image, you want to do from the outside — because, for example, you want to copy configuration from the host into the image that is being created — but other things you want to do inside the image, because you want to use tools in the image. And then this also means that you have to install those tools into the image, and then maybe, if the image is supposed to be very small, remove them afterwards. So this was messy, and because we have been adding support for
operating on a root directory to all systemd tools, including repart, the idea is that everything you do in mkosi — all the build stages — is invoked on the host, and if a tool wants to, it will just use the chroot-like operation to switch into the temporary file system. But in general the idea is that tools implement support for operating on a root directory themselves, and then the whole thing can be simplified, because you don't need to have anything installed in the image that you are creating. And, well, that's what I have. As always — as with every project — we are looking for contributors. mkosi is in Python, and systemd is in C, so everybody can get involved if they want to. So, questions please.

Q: I have three questions, but I'll ask the first and then pass it on for folks who have more. With the introduction of the Upholds= dependency, there are certain services that need to point towards network.target. Would you recommend them to use Upholds= and add the network daemon to it — whichever they make use of, systemd-networkd or NetworkManager — or would you ask them to keep using network.target?

A: network.target generally does not mean that the network is functional — there's a whole wiki page that explains the difference between network.target and network-online.target and so on. I think the answer is that this is quite complicated, because for services, when you start a service and it has been started, then you know it's there; but with the network you depend on external state which you don't control, and no matter what you do on the machine, you might not get network. So it might make sense to use Upholds= to keep the network configuration daemon up, but I'm actually not sure how useful that would be, because let's say you're using NetworkManager and it crashes — you don't lose the network, right? You still have some DHCP lease, and it will probably continue for the next day or two. So the answer is: I don't know, it's complicated. Any more questions from anyone else?

Q: All right, then I'm going to go ahead with the second one. Like five years back, I wrote a systemd unit to actually mount a partition, and I didn't know that that was not possible — I had to use fstab for that. But I saw that you mentioned the list-automounts command, and I wondered if it's possible now to automatically mount partitions on boot using systemd?

A: Sorry, the last sentence? ... Whether it's possible to mount partitions automatically at boot with systemd — that has always been possible, why not?

Q: Okay. Any more questions from anyone else? Okay, well, the third question comes from me as well. With systemd doing bootloaders with systemd-boot, cron with systemd timers, network with systemd-networkd, virtualization with nspawn, and partition management as well now, as you mentioned — by when can we expect a complete world domination by systemd?

A: You know, next year. Thank you. About the mounts — maybe we should talk, maybe you should give an example later, because maybe I don't understand the question.

Q: I guess — I tried using the normal mount command, but I added it in a systemd unit, and that's how I tried doing it. But maybe that's not the proper way of doing it.

A: Oh, yes. Well, part of the answer is that systemd had this problem where it would have a vision of which mounts should be mounted, and then if reality disagreed with that, it would unmount things, making people very unhappy. We have mostly fixed that — I mean, it's much better than it used to be. So basically, a few years ago, if you did this mount command, chances were that systemd would actually unmount it soon after; but now it will probably just let it stay. But still, the recommended way is to use fstab. You can also create a mount unit, but there is really no benefit — it's just more lines. There's also a new setting, systemd.mount-extra=, I think, on the kernel command line, which allows you to specify an fstab-like line with source and destination separated by colons, and options and so on, and then this will get mounted as if it was specified in fstab.

Q: Understood, thank you so much.

Host: Well, let's have a big round of applause for the talk. Thank you.