And our next speaker is Philipp Krenn, and he will talk about seccomp. Please welcome Philipp.

Thanks. Hi, everyone. Who has heard of seccomp before? Okay, about half the room. Good, so let's see where we can take this. Seccomp, the secure computing mode, has been around for quite a while, but it is still not as widely used as you would wish. Security-wise, this is often us: we say everything is fine until something, whatever that is, happens, and then nothing is fine anymore, everything is on fire and everything is terrible. And then you always wish you had had something else in your security model. Seccomp might be one piece in the big puzzle of security solutions that you could use to help fix that. Obviously there are no silver bullets and there is no one tool that solves everything; you want more layers and a defense-in-depth security model, and that is exactly where seccomp comes into play.

Generally, it's just one piece of least privilege. It tries to take away things that you don't actually need and that only expose you to more risk. It blocks certain system calls that you know your application does not need. If, for example, you know that your process never needs to fork another process, or never needs to call another binary, why not take that possibility away from your application? Then even if you have a remote code execution and somebody can run whatever they want in the context of your application, they still cannot do things that are simply not necessary for your application. That's really the idea of seccomp: it instruments the kernel to do that for you. It's like a sandbox that filters out what you don't want. But it's application-driven; it's not SELinux, for example, or AppArmor.
It's application-driven because the assumption is that application developers know what their application needs, so they can register the right profile for it and filter out everything unnecessary. It was introduced forever ago, basically, and it started with the so-called strict mode. The good thing was that strict mode was very strict, and the bad thing was also that strict mode was very strict. With these permissions you can barely do anything. Yes, you can read and write a file that is already open, but there is no network access at all. You cannot even change the system time. Nothing can be touched here except these four system calls. That was a good first step, but it was way too restrictive for the average application to be useful, so it had to develop over time. A couple of years later, the foundation was added for system call filtering, and in 2014 the seccomp() system call was added to make that configuration easier. I hope that everybody is on a kernel version that supports that, but that's a different story.

Anyway, I guess everybody knows man syscalls, where you can see which system calls your kernel version supports. By the way, new ones are added over time, and you can also see there which ones have been around for how long. With seccomp you can then filter those and not allow specific system calls. The way you write that is BPF, the Berkeley Packet Filter. Where do you normally use BPF? Probably tcpdump. When you're pretty desperate and trying to find something on a network interface, that's one of the places where you might use BPF. It is not the most human-friendly way to write anything, but maybe you get used to writing BPF at some point.
The minimal setup would be, for example, this: we need these two header files, and then at the end you have a structure where you register the system calls that you want to allow or deny. It could look something like this, and this is pretty small. The most important parts: first, you want to check the architecture, because different architectures have different system call numbers, which can cause confusion, and you might then allow the wrong ones. Here we check which system call is coming in, and then you decide whether to allow it. So here we have an allow list: everything we allow passes through, and everything not on the allow list runs into kill process, which just shoots down your process. Once you have seccomp enabled, every system call you make runs through that filter.

A question that sometimes comes up: is that expensive, since it applies to every system call? Not really, because the filter runs in the kernel, so you don't have to go to user space for each call. It is pretty cheap to run. The possible results are: a call can be allowed, which is the good case; a thread or a process can be killed; or you can return an error, and you can log any problems you run into.

Is anybody using seccomp? Are any of the applications you use on a daily basis using seccomp? Probably yes. Some of the more widely used ones are these here, so you probably touch seccomp somewhere every day. Docker especially has used seccomp quite a bit. If that hasn't changed in the meantime, 44 system calls out of 300-plus are filtered. It used to be quite a lot of manual work for them to figure out what you can block and what you cannot, but they found a more or less secure setup of what can be blocked by seccomp.
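The filter layout described earlier in this part of the talk (check the architecture, load the system call number, allow what is on the list, kill everything else) can be sketched in Python. This is not the code from the slides: it only builds the classic BPF instructions as tuples and runs them through a tiny interpreter, so nothing is installed in the kernel. The opcode and action constants are the values from the Linux headers; the contents of the allow list, the helper names and the interpreter are illustrative assumptions.

```python
import struct

# Classic BPF opcode pieces (values from <linux/bpf_common.h>).
BPF_LD, BPF_JMP, BPF_RET = 0x00, 0x05, 0x06
BPF_W, BPF_ABS, BPF_JEQ, BPF_K = 0x00, 0x20, 0x10, 0x00

# seccomp return actions and the x86-64 audit architecture constant.
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_RET_ALLOW = 0x7FFF0000
AUDIT_ARCH_X86_64 = 0xC000003E

# Offsets into struct seccomp_data, which starts with: int nr; __u32 arch;
OFF_NR, OFF_ARCH = 0, 4

# x86-64 syscall numbers; which calls to allow is an illustrative choice.
ALLOWED = {"read": 0, "write": 1, "exit": 60, "exit_group": 231}


def stmt(code, k):
    """A non-branching instruction as (code, jt, jf, k)."""
    return (code, 0, 0, k)


def jump(code, k, jt, jf):
    """A conditional jump; jt/jf are relative to the next instruction."""
    return (code, jt, jf, k)


def build_filter(allowed_numbers):
    """Allow-list filter: wrong architecture or unlisted syscall is killed."""
    prog = [
        stmt(BPF_LD | BPF_W | BPF_ABS, OFF_ARCH),
        jump(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
        stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        stmt(BPF_LD | BPF_W | BPF_ABS, OFF_NR),
    ]
    for nr in allowed_numbers:
        prog.append(jump(BPF_JMP | BPF_JEQ | BPF_K, nr, 0, 1))
        prog.append(stmt(BPF_RET | BPF_K, SECCOMP_RET_ALLOW))
    prog.append(stmt(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS))
    return prog


def pack_filter(prog):
    """Serialize to the 8-byte struct sock_filter layout (u16, u8, u8, u32)."""
    return b"".join(struct.pack("=HBBI", c, jt, jf, k) for c, jt, jf, k in prog)


def run_filter(prog, nr, arch=AUDIT_ARCH_X86_64):
    """Toy interpreter for the three instruction kinds used above."""
    data = struct.pack("=iI", nr, arch)  # first 8 bytes of seccomp_data
    acc, pc = 0, 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD | BPF_W | BPF_ABS:
            acc = struct.unpack_from("=I", data, k)[0]
        elif code == BPF_JMP | BPF_JEQ | BPF_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET | BPF_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

With `prog = build_filter(ALLOWED.values())`, calling `run_filter(prog, 1)` (write) yields the allow action, while `run_filter(prog, 59)` (execve on x86-64) yields the kill action. Checking the architecture first matters because the same number means a different system call on another architecture.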
For example, some of what Docker blocks: you cannot change the system time, because that is not namespaced. Time is global on the system, so you cannot change it from within a container. You cannot clone a namespace. You cannot reboot the host. You cannot unshare or copy a namespace either. Those are system calls that are blocked, and you can check that yourself. What you can do, which is obviously not recommended, is disable the seccomp filter in Docker; then you can just clone a namespace and see that you're running here as root. With the default settings this would not be allowed, because unshare is disabled. So this is where Docker uses seccomp, for example.

Are any of your applications using it? You can find that out if you look at /proc/<pid>/status. The number in the Seccomp field tells you: zero means no; one means strict mode, which is probably too strict, and nobody is really using it anymore, so you will probably not have any processes with a one; the ones using a seccomp filter will have a two there. That's how you identify which processes have actually registered a seccomp filter and which ones have not and probably should. Then you can look into the process from there and see what's being filtered.

So why am I talking about this? I work for this little company called Elastic. We have some products, and we also use seccomp heavily. If you've never seen any of our products, maybe you've seen this one, the good old ELK. We've added another component called Beats, and then we had to redo that elk, because there is no B in ELK, so we tried this, the elk bee. You can see it flies and everything. Anyway, in our stack we have two components that use seccomp heavily.
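The /proc status check mentioned above is easy to script. A minimal sketch in Python, assuming a Linux /proc filesystem; the function names are made up for illustration, and the mode labels follow the zero, one, two meaning given in the talk.

```python
from pathlib import Path

# Meaning of the "Seccomp:" field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text):
    """Return the seccomp mode named in the text of a status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" on newer kernels does not match this prefix.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])
            return SECCOMP_MODES.get(value, "unknown")
    return None  # field missing, e.g. kernel built without seccomp


def seccomp_mode_of(pid="self"):
    """Read /proc/<pid>/status for a running process (Linux only)."""
    return seccomp_mode(Path(f"/proc/{pid}/status").read_text())
```

Calling `seccomp_mode_of(1234)` for some process ID then answers the question from the talk: "filter" means the process has registered a seccomp filter, "disabled" means it has not.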
The first of those two components is Beats, which is a library and the agents or shippers written in Go; they run on pretty much all the systems where you want to collect information. The second is Elasticsearch, since that stores your data and you probably want some extra protection there. Logstash has an open issue, and Kibana as well; they don't use seccomp yet. But Elasticsearch and Beats do, and I'll give you an idea of how.

Elasticsearch is a Java process. How do you set seccomp filters from a Java process? JNA, Java Native Access, is what you use. This would be the relevant part of the source code. First, something that is not strictly related to seccomp but that every process should do: people mostly accept that you should not run processes as root, especially server processes, so we throw an exception and exit if you try to run Elasticsearch as root. Second, we try to register the system call filters. That depends on your operating system: seccomp is Linux-specific, but most other operating systems have a similar concept, even though it might be called something different. We try to register those on all the operating systems that we support, and otherwise throw an error saying we don't support that one. There are different ways to do this, but we'll stick to Linux, and there it's just seccomp.

This is what we do in Elasticsearch. So, the joy of BPF at work: the main things you're interested in are that you cannot fork a process and you cannot execute another binary. So even if we had a remote code execution in Elasticsearch, which hopefully we don't, you could not fork another process or execute any binary on the system, because we filter those calls out. That just wouldn't work.
Then we have Beats, these lightweight agents, which have their own syntax for this in Go. They wrote their own library because they didn't want to write BPF themselves; they wanted YAML instead. So they created something like this. Here you can see that it allows all calls by default and then filters out specific system calls, so this is a deny list. Is this what we actually use in Beats? No, because Beats ships data over the network and needs network access, and this doesn't allow any network access; it's just a sample. The real rules are many more, and we do use an allow list. We have allow-listed all the calls that we need, which is a very long list, and everything else is blocked. So every time a new system call comes out that we need, we have to update that list manually. But as always, allow lists are more secure than deny lists, because with a deny list a new system call might be allowed even though you never intended that.

We can try that out very quickly. For example, you run good old Netcat, then open a connection and say hello, and ideally my hello should arrive. Why is my hello not arriving right now? No, Netcat doesn't use seccomp, but we'll get to that in a moment. Let's see. Ah, this is the wrong host, my bad; I'm already using this domain for another demo. If you use the right one, this works. You can use it as well, it's online, so you can start chatting.

Now the main question is: which system calls are involved here? What do we use to figure out the system calls? Right, strace. So we can just run this with strace. Oops. strace. And that shows us a ton of system calls that are needed here.
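The allow list versus deny list point from the Beats discussion above can be made concrete with a toy model. This is a sketch in Python, not the Beats YAML syntax or its Go library: the `Policy` class, the action names and the syscall `brand_new_call` are all hypothetical and only model "default action plus per-call rules".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical model: a default action plus per-syscall overrides."""
    default_action: str                        # used when no rule names the call
    rules: dict = field(default_factory=dict)  # syscall name -> action

    def decide(self, syscall):
        return self.rules.get(syscall, self.default_action)


# Deny list: everything is allowed unless it is named explicitly.
deny_list = Policy("allow", {"execve": "errno", "fork": "errno"})

# Allow list: everything is refused unless it is named explicitly.
allow_list = Policy(
    "errno",
    {name: "allow" for name in ("read", "write", "socket", "connect", "sendto")},
)

# A syscall added by a future kernel (the name is made up).
NEW_SYSCALL = "brand_new_call"
```

Here `deny_list.decide(NEW_SYSCALL)` returns "allow" although nobody ever reviewed that call, whereas `allow_list.decide(NEW_SYSCALL)` returns "errno" until someone updates the list by hand. That is the maintenance cost and the security benefit described in the talk.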
In the strace output, the most important part is at the end, where, okay, I think somebody said yay here, and that then killed my process with a bad file descriptor. But you can see all the calls that we are making. One thing that is relevant here: if you run strace with, for example, -e bind, you filter down to the bind call, because that's the one I want to monitor. Then you see just the system call for the binding. If you leave that option out, you see a lot of system calls, and that is, by the way, a good way to figure out which system calls your binary even needs and what you have to allow-list.

There is another way to figure out which system calls you need to allow in order to write the right seccomp filter for a binary: under this address you can find a little C utility that you can add, which collects all the system calls you are making. Otherwise, you can always take the strace output and grab the system call name, which is the first thing on every line, and aggregate those into what you need to allow or what you can deny.

Then you can simulate what would happen if you denied a certain system call. For that there is a little tool called Firejail that can run a program and deny specific system calls. You run it like this; let me paste it in, I've copied it because nobody wants to see me type that. So we have Firejail without a profile, we drop the bind call, and then we run strace on my binary, and I try to bind to port 1025 again. It tells me my process was killed: you can see I open a socket, then I try the bind, and exactly then Firejail kills my process, just as expected. It's not allowed here. That's the output you get.
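The strace aggregation described above (take the first token of every line and count it) can be sketched in a few lines of Python. It assumes the default strace line format, optionally with the "[pid N]" prefix that appears when following child processes; the function name is made up for illustration.

```python
import re
from collections import Counter

# A syscall line starts with the name followed by "(", optionally after a
# "[pid 1234] " prefix. Signal and exit lines ("---", "+++") do not match.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(")


def count_syscalls(strace_output):
    """Count how often each system call name appears in strace output."""
    counts = Counter()
    for line in strace_output.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

`sorted(count_syscalls(text))` gives a candidate allow list for the traced run. It only covers code paths that were actually exercised, so rarely used paths such as error handling may need calls that never showed up in the trace.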
One thing that is sometimes interesting is what happens if somebody takes over your binary and then tries to change the filters. Any guesses what will happen? I'll give you three options and you can decide which you think is most likely. First, you can only set seccomp once. Second, you can drop the permission to change anything in seccomp. And third, YOLO. The third? No, it's not YOLO. What you need to do when you set up seccomp is to first set no new privileges. This means that over the life cycle of an application you can always tighten the rules, but you can never loosen them. Once you've taken away a permission, you cannot add it back. So even if somebody takes over your binary, they cannot allow a fork or an execute or whatever, because you can only tighten things down. This is the first thing you should set to make this secure.

Are we doing that? I was curious how we set that up, and yes, we are. For example, here you can see that the prctl option to set no new privileges, available since Linux 3.5, is 38. We try to set this one here, and afterwards, if setting the filter did not succeed, we error out of the process. So we ensure that this has worked and that you cannot change the permissions afterwards. Beats, since it has this YAML syntax, just has this no-new-privileges setting and you're done, which is maybe a bit more readable, but the library in the background that does all the YAML parsing does just the same with BPF to set the right filters.

Then you probably want to find out when somebody has caused a seccomp violation. auditd, the Linux auditing framework, can collect those. We have another component in our stack called Auditbeat, which can also collect any such violation, and then you can gather those and see what's going on.
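Setting no new privileges, as described above, is a single prctl call with option 38. A minimal Python sketch using ctypes, assuming Linux and glibc; this is not the Elasticsearch or Beats code, and the function name is made up. Note that the flag cannot be cleared again for the calling process or its children.

```python
import ctypes
import os

PR_SET_NO_NEW_PRIVS = 38  # available since Linux 3.5
PR_GET_NO_NEW_PRIVS = 39


def set_no_new_privs():
    """Set the no_new_privs flag and confirm that it took effect."""
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)
    # prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0): irreversible for this process.
    if libc.prctl(PR_SET_NO_NEW_PRIVS, ctypes.c_ulong(1), zero, zero, zero) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Read the flag back instead of trusting the call, then report it.
    return libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero) == 1
```

Erroring out when the call fails mirrors what the talk describes: the application should refuse to continue rather than run without the protection it expected.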
For example, in Kibana you could say: give me all the violated seccomp policy actions. Then you would see all the events where the policy was violated, and also which user, which binary and which system were involved, and you can try to find out: do I need to change my rules, or did somebody break in and try to run system calls that they were not supposed to? We also have this new SIEM component, which is more drag-and-drop oriented. You can drop that in here, and this shows when I was trying this with Netcat. You can see, for example, that a user on this host was trying to use Netcat and that this was actually a violation. I filtered for that here, and then you would know which binary and which user are affected, and you could filter those down.

To wrap up: it's always Lego security. There is not just one block you need to put in; there are many blocks that you need to combine to get better security, and seccomp is just one of these little building blocks. Sometimes people ask how it compares to SELinux or AppArmor. All of these approaches are kernel-level filters or interceptors. The main difference is that with seccomp the process actively sets the filter itself, and the application developer rolls it out with the application, whereas with SELinux and AppArmor the policy is set mandatorily before the application is even started. Seccomp is widely available; hopefully all your kernel versions support it, so you can use seccomp and hopefully have more secure applications. If you want to write filters more easily from your own application code, there is a library called libseccomp, and it has a lot of samples of how to write filters for various applications. That might make your life a bit easier when applying seccomp in your applications.
And then people always ask me about Windows. I haven't used Windows in years, but I googled, and this seems to be the most similar system call filter that Windows has. I guess everything about it is as terrible as the name, but this is what you get. If Windows is your thing, look into it; I have no idea how it works and I don't really intend to look into it too much, but this is the closest equivalent I could find. And that's pretty much it.

I think we're right on time for a couple of minutes of questions. If you want stickers, grab them afterwards. Please wait for the microphone so everybody can hear the questions. By the way, before we do the questions, I always try to take a picture with you so I can prove to my colleagues that I've been working, because we're a fully distributed company and nobody knows where I am today. Can you all wave? Very good, thank you. Question?

I was waiting and wondering why you didn't mention libseccomp, and it was just on the last slide. So the question is, why aren't you using it? As you mentioned, doing that work manually is quite hard: you have to have those defines for which kernel version, which architecture, all that stuff, and libseccomp does that for you and makes those ugly define macros unnecessary, right? It also gives you a much easier way to do something that I think you did not mention: you can not only filter which system calls can be used, you can also check the arguments. For example, you can allow certain syscalls only if the arguments are right, like allowing read or write or open only on certain file descriptors, things like that. So the main question is, why aren't you using it if it's there and you know about it?
To be honest, I don't know. I've only been with the company for four years and this was there before me, so I don't know the history of why we went down our own path without libseccomp. I will try to find out; this is actually a good point. Maybe we just didn't know about it, I don't know, but it was a long time ago.

Hi, great talk, thank you. In Docker, when you look at syscalls, there is seccomp involved and also capabilities, and I was wondering whether you could describe how the two fit together. Is Docker generating a seccomp policy based on which capabilities are enabled? I feel like there are two moving parts that I don't know how to fit together.

With capabilities, you mean like NET_ADMIN, for example?

There is CAP_SYS_PTRACE, which controls the ptrace syscall, but you can also block that in seccomp.

Yes. I don't know exactly how they are joined together, but you can basically override the seccomp filter with that setting. I know, for example, that if you want to run a container that monitors the network calls of other containers, you need to set NET_ADMIN, and that must override the filter, because otherwise you wouldn't get to those calls. But how that is implemented in the background, I haven't checked in the Docker source code.

Okay, thank you.