I'm going to be talking about using systemd to secure daemons, which might sound a bit like systemd is doing some important work, but actually this is mostly us exposing functionality implemented by the kernel. Systemd just strives to make this easier. So if the kernel provides some functionality, for example mount namespaces, which we can use to limit the view of the file system for a given process, why do we want to involve systemd in this at all? We could just write a wrapper script for our daemon, or we could do it using some special commands or something like that. The answer is that by doing this in systemd we get a very complex implementation, but we only need to do it once. We have an upfront cost, but in the long run we come out better.
And systemd tries to make the whole thing much nicer than it would be otherwise. In particular, it abstracts the differences between kernel versions and architectures, and it tries to provide high-level knobs that make use of the kernel features in a way that is easy to consume for user space. It also gives us an easy way to have a mode where our service does not have any privileges from the very beginning, so it runs, for example, as a different user. Systemd does all the preparation and the operations that limit what the service can do in case it's broken, and there's no way in which we can mess up and do something wrong in the service, because it is never running with any privileges.
And finally, if we run a repoquery over Fedora (it's a bit complicated, because we need to look for the appropriate files, then filter out files which are not service units, and then filter out duplicates), we get 1740 unit files in Fedora. When I ran this about two years ago, it was less than 1,000. So the growth is pretty quick.
And in principle, those protections could be applied in any of those files, all using this one implementation in systemd, which just makes the whole thing scale better. So I will be talking about systemd security features. Is there anyone who doesn't know what unit files are, or doesn't know what systemd is?
So we have more or less three ways to start a service. We can use a unit file. We can start a service directly. And, something that is very useful but not as widely known as it should be, we can call systemd-run. systemd-run is a little tool that essentially constructs a unit file on the fly and runs it. The mode with -t attaches the unit to our current terminal, so we can do interactive input and output to our service. It's probably not a good way to run general services, but for testing this is very easy, because we can spawn things whenever we need them.
And now let's look at the protections. The basic Unix security feature is splitting into users, and we should use it much more often than we do. A surprising number of services run not as a special user but as root, and they shouldn't do that. At some point there was a tendency to run services as the nobody user, and then they all shared the same privilege set, which was, of course, also not good. So we should put everything that we can into separate users. Android does this for every application, and we should also do that on Linux as far as possible. So, to show the example with systemd-run: by default systemd runs everything as root.
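As a rough sketch of what such systemd-run invocations can look like, assuming a machine running systemd and a root shell: the helper names below (run_in_unit, run_as_uid, run_as_user) are made up for illustration and are not part of systemd. The -t, --uid= and -p switches are systemd-run options, and User= is the ordinary unit file setting.

```shell
# Hypothetical helpers; call them from a root shell on a systemd machine.

# Run a command in a transient unit attached to the current terminal.
# With no user given, the command runs as root.
run_in_unit() {
    systemd-run -t "$@"
}

# The same, but as a specific user, using the dedicated switch.
run_as_uid() {
    local user=$1
    shift
    systemd-run -t --uid="$user" "$@"
}

# The same again, using the general property syntax. User= is the
# setting that could also be written in the [Service] section of a unit file.
run_as_user() {
    local user=$1
    shift
    systemd-run -t -p User="$user" "$@"
}
```

For example, `run_in_unit whoami` would be expected to report root, while `run_as_user someuser whoami` would report someuser, where someuser is a placeholder for an account that already exists on the machine.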
And we can tell systemd to use a specific user, either through this special switch or through this general syntax, which just sets some specific property. This is essentially the same as if we set this property in the unit file, just on the command line. The syntax is the same, and the majority of properties which can be set in unit files can also be set on the systemd-run command line. Not all of them, because it requires a separate reimplementation of the parsing, so it's not fully automatic, unfortunately. So this is nice.
Then there's a bunch of options for limiting access to the file system, and the two most high-level ones are ProtectHome= and ProtectSystem=. Before I talk about what those specific options do: systemd has a generated man page which lists all the possible unit configuration options, and there's, I don't know, thousands of them. They vary from the very high-level and very useful ones to some extremely detailed options and knobs. In general it doesn't make sense to start with the low-level ones; the high-level ones are the ones that you probably want.
So ProtectHome= either makes /home fully invisible or makes /home mounted read-only for the service. Obviously, for the majority of system services this is a fully appropriate solution: they shouldn't be able to see or touch /home in any way.
And I wanted to make a short demo, why not? So, sudo systemd-run... I messed up. Okay, so systemd-run with ProtectHome=, and the last option is the shell.
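A minimal sketch of a demo along these lines, assuming a root shell on a systemd machine. The helper name shell_with_protect_home is hypothetical, and /bin/sh stands in for whatever shell the demo actually used; ProtectHome= and its yes and read-only values are the real setting.

```shell
# Hypothetical helper: open an interactive shell inside a transient unit
# with the given ProtectHome= mode (for example yes or read-only).
# Defaults to yes when no mode is given.
shell_with_protect_home() {
    local mode=${1:-yes}
    systemd-run -t -p ProtectHome="$mode" /bin/sh
}

# Things to try inside the resulting shell:
#   ls /home            # with mode "yes", /home should appear empty
#   touch /home/probe   # with mode "read-only", this should be refused
```

Calling `shell_with_protect_home` and then `shell_with_protect_home read-only` gives the two views of /home that the demo compares.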
So I'm starting a shell. What this really does is start a unit, and the input and output of this unit are connected to my terminal, but the process is not a child of the current shell. The shell starts systemd-run, and systemd-run communicates with systemd to start the unit and then attaches the input and output. So this shell is now a child of the unit manager. We specified ProtectHome=yes, so if we look at /home, it's empty. If I do the same thing with read-only, there is some garbage there; I use my laptop for testing systemd bugs of various sorts, so it accumulates. As you can see, I am root, and if I try to touch a file, I can't, because it's a read-only file system. Obviously, on its own this is not full protection, because as root I could break out of this without too much trouble, but it's an incremental safety feature, and Apache should probably be running with something like this set.
And there are those lower-level options which allow us to tweak the view of the system for the unit. We can mount things in various places, remount them read-only or not read-only; we can do quite complicated things there.
Oh, and I forgot: ProtectSystem= is similar to ProtectHome=, but it's for the non-home parts of the system like /usr, /etc, and so on. The details of those options are all explained in the man page. It's generally about making more and more parts of the system either read-only or invisible. In particular, the strict mode makes pretty much everything read-only, and then you are supposed to punch holes in the appropriate places so that the service can actually interact with the file system where it needs to.
So the question was: if I set ProtectSystem=yes, is the system protected?
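The strict mode described just before this question could be tried out roughly as follows. This is a sketch under assumptions: run_strict is a made-up helper name, and ReadWritePaths= is used here as one way to punch a hole, which the talk does not name explicitly. The writable directory has to exist already.

```shell
# Hypothetical helper: run a command with almost the whole file system
# read-only, except for the one directory given as the first argument.
run_strict() {
    local writable=$1
    shift
    systemd-run -t \
        -p ProtectSystem=strict \
        -p ReadWritePaths="$writable" \
        "$@"
}
```

With this, something like `run_strict /var/tmp touch /var/tmp/probe` should succeed, while `run_strict /var/tmp touch /etc/probe` should be refused because /etc stays read-only.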
So ProtectSystem=no is the default, and the service sees the normal file system, and with ProtectSystem=yes parts of it become read-only. Yeah, so sometimes the naming of the options is not so clear. Maybe this case is not so bad, but we also have options like Restrict-something, and there it's even less clear. Generally the idea is that we start as root with full privileges, for compatibility with how daemons were always started in the Unix world, and then we slowly take things away. It would be nice to make things safe and locked down by default, but those options have been developed over time, and for backwards-compatibility reasons we can't just suddenly turn them on, even if this would maybe be a good idea in general, because it would break things. So the default is to be fully open and privileged, and you have to take things away one by one.
All right, and PrivateTmp= is a very useful thing, because it gives the service a private mount of the /tmp and /var/tmp directories. It protects both the service from the other services and users running on the same machine, and those services and users from the service, in case the service tries to do something bad. So it's a bidirectional thing, very useful.
All those options are useful, but in fact they are so, you know, 2018 or something like that. We now have better options, and the idea is to not only mount the file systems in the right place and in the right mode, but also to create directories on disk before the service is started, and remove them, or not, depending on the type, when the service is stopped. So the lifetime of those directories is tied to the lifetime of the service: they are created just before they are needed and removed as soon as they are not needed. For example, the runtime directory /run/foo is of this kind. The logs directory, on the other hand, is not removed, because it wouldn't make sense to remove logs immediately after a service is stopped. So
each of those directories corresponds to some kind of disk space with a different lifetime for the service. The good thing is that we can define a service, including its file system setup, fully in the unit file. We just drop in the unit file; we don't need to do any scriptlets, preparation, or anything like that. We start the service and everything it needs is right there at the right time. So basically there are those five different settings, because there are five different types of lifetime and traditional locations for those directories. An example would be: if I run this command as my own user, the directory /run/foo would be created, the ownership would be changed to my user, and then when the command ends the directory is removed and everything is cleaned up.
So if we can create directories on demand, what about users? I said it would be nice to run everything as a separate user, but then we want to run, I don't know, some cron job as a separate user, and we have to create this user, and then our password database looks like my home directory: not very nice. So the answer is to also create users on demand, and the option is DynamicUser=yes. You might wonder what it means to create users on demand. It's not that systemd takes the /etc/passwd file and adds entries to it and then maybe at some point removes them, because this would be rather messy. The user exists only in the sense that we provide an NSS module that, when queried, will return information that this user exists, but the user doesn't really exist in any permanent sense at any time.
So, as usual, systemd-run -p DynamicUser=yes (or true), and I'll try to switch again. As you can see, every time we do this we get a user, and if I try to query this user afterwards, the user doesn't really exist. If I run a command that stays in the background, I can run status on it, and I'm turning off the logs
because the logs are not interesting, and I can see that there is a process running. Now if we query this process, there's some dummy username here for this particular user, and when this process exits, of course, the name is gone. We can take it to the extreme: if we are building some pipeline and we want to run it as separate users to protect things, you can do this kind of crazy thing where each part of the pipeline is running as a different user. We can create them on demand, and this works. In this toy example, of course, it's stupid, but if I was running a job that was transcoding some video file, it would be kind of nice to lock down each transcoding process in this pipeline.
So I was talking about limiting access to the file system through mount namespaces and about using users to separate privileges, and now let's talk about the network. There's this nice, very high-level option called PrivateNetwork=yes. It puts the service in a private network namespace, and then it can essentially only talk to itself: there's only a loopback device. And PrivateNetwork=yes is the recommended way to run network services. I don't know if this sounds right to you. So how do we do that? Through socket activation. The idea is again that we split out the preparation, in this case the opening of the sockets. We let systemd do it, and then when we start the service, it cannot even open a network connection; all the setup has been done, and the socket is passed in to the process when it starts.
It's a bit confusing with socket activation. There are two types: they were confusingly named wait and nowait in the inetd and xinetd times, and now they are confusingly named Accept=yes and Accept=no in systemd times. But generally the idea is that in one case we get a listening socket, and in the
other case we get a socket for a specific connection. This second type is rather inefficient, but the first type is as efficient as the daemon opening the socket itself. There's really not much difference in any regard, except that the process can run without any privileges. When combined with PrivateNetwork=, this is a very nice way to run services.
And that was what I had about the high-level stuff. There are plenty of low-level options, and I don't want to talk about each of them. These are maybe the mid-level ones. MemoryDenyWriteExecute= takes away the ability to create mappings that are both writable and executable. This is probably appropriate for 99 percent of the stuff that we run nowadays. PrivateDevices= takes away access to devices. So what does it mean, takes away access to devices? It means that even if the process was running with elevated privileges, or the permissions on the device file were set in a way that would allow the process to access the device, even with those things true, the process still cannot access the device. Similarly, in the case of ProtectHome=, even though we are running as root, we cannot modify the file system.
We can limit address families, we can restrict privilege elevation, and there's a bunch of Protect-something options that take away the ability to change some kernel settings. It's usually about taking access to /sys or /proc away from the process and forbidding some calls that the process should not make. SystemCallArchitectures= is actually an interesting one, because it's the reverse thing: we are not protecting the process, we are protecting the kernel. We limit the attack surface on the kernel from the process by simply disallowing the execution of system calls from a different architecture, which
has occasionally led to kernel bugs in the past. Then there are options to set capabilities; I don't want to talk about that. There's one option to limit access to specific syscalls, and this would be rather ugly to use, because there are hundreds of syscalls and new ones are added in every kernel release. But we provide high-level groupings, so you can say @ plus a group name, and then all the syscalls in this group are allowed or disallowed, depending on how SystemCallFilter= is constructed. For example, @basic-io is anything that the process needs to do disk I/O. We can either allow it or take it away, but we do it as a whole, and when new syscalls that belong to this group are added and systemd is updated, the new syscalls are added to the list. And, I don't know, as a demo: systemd-analyze. There are those things that are even hard to find documentation for, but the kernel allows those syscalls to be called for backwards compatibility, and in most cases we can just take away the ability to call them, and it makes things a bit safer.
Oh, and network filtering, right. If we don't set PrivateNetwork=, we can have network filtering at the individual service level. This is nicer than, I don't know, iptables, because it is not a global thing; it is attached to the specific service. It is implemented as eBPF filters which are attached to the control group, and they operate only on the processes in the control group. This would be great, and it can offer really nice capabilities. The problem is that the implementation has only been done for some very basic settings. You can filter by IP address, or you can currently attach a custom-built BPF program and then do whatever filtering you want, in principle, but then you have to build this program in
BPF yourself. Hopefully in the future somebody who likes assembly coding will implement more features, like port filtering for example; that would also be useful.
So there are plenty of options, and getting oriented in them is not so easy, and there's a very nice helper tool for this. systemd-analyze security is a tool that takes a unit file and tries to give feedback on what systemd thinks about the unit file. A big disclaimer: there will be words like safe and unsafe in the output. It does not really mean safe; it just means that systemd features are either being used or not. So I wanted to do systemd-analyze security systemd-resolved. Basically this goes option by option, and if it's used then you get a plus, and if it's not used then you get a minus. There's some scoring, some numbers are attached to this; they're probably not very important. And at the end you get some, you know, test score. Yes, we have emoji and Unicode and colors. One way to use this is basically to start it, look at the list, think about what would be appropriate for a service, and maybe turn things on. This mostly goes over the stuff that I was talking about before, and some additional things. I also wanted to show httpd, for example. So, you know, not good. The truth is that it is using almost none of the security features of systemd, and it probably should. When you multiply this by the 1700 service units in Fedora, there's a lot of potential for making use of this on a much larger scale.
And that's all I have. There's a link to the documentation and to the slides if you want to look them up. So, questions?
Yeah, so the answer is kind of complicated. Okay, so the question was: in RHEL 7 there's PAM namespace (yes, that's the name), which allows moving
a process, when the PAM authentication is being done, into a specific cgroup. Did I get this right? Yeah, so this has been solved by using pam_exec to run a script at the same point instead, and then this script can do the stuff that needs to be done. This solution is of course not very nice. So the answer is: right now, yes, you can kind of make it work, but there's no nice solution.
Okay, so the question was whether most of this is implemented using cgroups. Yeah, cgroups are involved in pretty much all of it.
So the question is how we can see what the service sees, right? Okay, so first of all, all of this is documented, in a level of detail that is probably too great. We try to document essentially everything that needs to be documented. But as far as debugging this goes, well, it is not easy. The first option would be to set the systemd log level to debug. For stuff like the setup of the mount namespaces, systemd will actually log everything that it does, so you can get quite a lot of information from there. Another option is to have strace running on PID 1. Also, in some ways you can just use systemd-run to simulate things, because it's using exactly the same implementation. So run stuff locally and debug it any way you can.
Okay, so thank you.