I'm safe for now anyway. I'm Alice Frosi. Right, so we are talking about seitan, but not the thing you eat. We are actually talking about privileged operations, mostly in the sense of system calls, and about seccomp, containers and virtual machines: some problems with them, or what we think is a problem in terms of security, and the solution we propose. After that there will be a demo and then questions. Don't overestimate us, but we hope to answer to your satisfaction.

So, a quick recap: what do we understand as a system call? In essentially every modern operating system you have several rings, right? You have a kernel ring and a user space ring. That might be ring one or ring three, but in any case you have something like a system call abstraction. In the case of Linux, it's simply that you have a process requesting resources or services from the kernel. And this is perhaps the main security model that the operating system implements. So depending on your user, or your capabilities on Linux, or your SELinux context and LSMs, the request is granted or denied. If you ever try to insert a kernel module called "evil things" as a user, the kernel will not let you do it. However, if you touch your own files, the system call will succeed. The difference is just that the syscall number is different, and root could do this and root could do that. OK, that's kind of obvious, but maybe useful to introduce what we want to improve.

Quite often we see this in container environments and virtualization engines, or mixes thereof, such as KubeVirt or Kata Containers, but even if you just stick to Podman and Docker. Let's say the container wants to create a network interface. That's usually a tap, the most basic tunnel interface on Linux. And you need to do it the old way, so without netlink: you have your ioctl.
And you want to tell the operating system: I want to create a network interface. On Linux this doesn't need root anymore, but it needs CAP_NET_ADMIN, which means you can do pretty much whatever you want: create as many interfaces as you want, spoof traffic, bring down the networking. There are a few other examples of this that are actually quite common. For example, setting the priority for a real-time virtual machine, where you just need to affect the priority of one process, not all of them. There are other problems too. For example, you want to create a device node as a user. You want maybe to connect to a specific daemon or open a specific file. For all of these there have been impressive improvements recently in Linux. But wouldn't it be nice if we could just say: OK, I want this process to be able to create this tap device? Sure, Linux security modules do something almost like that, but those are kind of fixed policies. It's not so easy to configure them dynamically per process.

So we started looking into it. Of course, BPF in seccomp is an important part of the story. You can do seccomp with a small BPF program where you say that you want to deny or accept a syscall based on its number. That's not good enough, because it's too generic. Then, again, a big improvement in Linux recently: you can tell another user space process details about your syscall. This is called seccomp unotify. Essentially, the kernel tells you a lot of things, such as the arguments.

Containers already make use of seccomp. Usually you have a JSON file that defines the syscalls that are allowed, denied, or notified for the container. The runtime takes the seccomp profile that is part of the OCI spec and uses the libseccomp library to generate the BPF filter, as we saw from what Stefano just described. If you have a filter, you need it to filter the syscalls.
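To make the profile idea concrete, here is a minimal sketch in Python of how such a JSON profile maps a syscall name to an action. The field names (`defaultAction`, `syscalls`, `names`, `action`) and the `SCMP_ACT_*` values follow the OCI runtime spec, but the profile contents and the `resolve_action` helper are assumptions made up for illustration. A real runtime does not interpret the profile like this: it compiles it into a BPF filter, and this sketch only models the decision.

```python
# Hypothetical, trimmed-down profile in the style of the OCI runtime spec.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write", "openat"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["mknod", "mknodat"], "action": "SCMP_ACT_NOTIFY"},
    ],
}


def resolve_action(profile, syscall_name):
    """Return the action of the first rule naming the syscall, else the default."""
    for rule in profile.get("syscalls", []):
        if syscall_name in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    for name in ("read", "mknod", "init_module"):
        print(name, "->", resolve_action(PROFILE, name))
```

Note that a filter of this kind can only decide on the syscall itself; it says nothing about what the arguments point to, which is the gap that user notification fills.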
In OCI we also have support for seccomp notifiers. Basically, the runtime needs to communicate with the monitoring process through a Unix socket. This is the OCI extension: through this Unix socket the runtime passes the file descriptor on which the monitor will receive the notification events. When this starting phase is done, the monitoring process is able to monitor the container and take action if one of the filtered syscalls is executed by the containerized workload.

There are already existing solutions that take advantage of seccomp notifiers, for example LXD, or the Kinvolk seccomp agent. However, those projects have in common that they implement a handler per syscall. So if you want to add a new syscall, or even change the behavior, you need to code it yourself. It's not very easy to reuse, and to do that you of course need a deep understanding of how the seccomp notifier works.

And this is where seitan comes into play. The idea is that, if you are an admin or you are developing a tool, you will be able to describe this in a recipe. Basically you have a match, which describes the syscall and the arguments you want to filter on, and you associate an action with it. We chose to use the JSON format. The seitan-cooker takes this as an input file and generates the BPF program, and then a bytecode representation of matches and actions. We need a kind of launcher that installs the BPF filter and then launches the real process that we want to monitor. This is the goal of seitan-eater. The actual monitor is what we call seitan. It takes the gluten, that's the bytecode representation of matches and actions, monitors the notifications, and then performs the action on behalf of your target.

Here you have a visual representation of the flow. We have two distinct phases.
We have the generation of the input, which can happen on a completely different build system. So it doesn't need to happen, like we saw with the container runtime, when we start the container. The cooker reads the recipe that you wrote and generates the gluten, which will be the input for seitan, and the BPF filter, which will be loaded by the eater. When the eater launches the target, seitan can start monitoring the target process.

So why seitan? We decided to choose a declarative approach rather than an imperative one. This gives you better visibility of your operations, the privileged operations. It's flexible: you don't need to code an extra handler if you want another behavior. The seitan setup takes care of that, and all you need to do is write the JSON recipe. It's generic: it's an independent and self-contained tool. We are not relying on libseccomp. We are going to see that in detail, but basically the setup generates the BPF program and the matches and actions that we saw.

Here you have a visual representation of some code. It's a snippet from KubeVirt, but you could replace it with the JSON recipe. So this is a nice representation of what we mean by declarative versus imperative.

Of course, security is one of the strongest and largest use cases that we have for seitan. Rootless containers: we want to target rootless containers by reducing the number of capabilities given to them and impersonating only the necessary syscalls. And a nice thing with seccomp is that you can have deep argument introspection. For example, you can also check complex objects such as strings, structs and buffers. A nice add-on that we think could be beneficial is counting the number of executions of a syscall. Again, this gives you more fine-grained control over what your process is doing. However, security is not the only use case. We think it could also be used in other contexts, for example testing.
For example, you could inject an error when a certain syscall is executed, say because you want to simulate how your application behaves on different errors. You can also mock a syscall, so that the particular syscall is not executed on your system. Another thing could be injecting a delay, something like a sleep, and then continuing the syscall. We have deep introspection of arguments, so that could be used for profiling your application. For example, it could be an alternative to tracing tools that use ptrace today.

As we already mentioned, it could also be used for managing resource allocation. For example, seccomp allows you to inject a file descriptor into the target process. So this could be an alternative to SCM_RIGHTS or to pidfd_getfd. Or a use case could be to connect two applications running in containers that don't share a bind mount, so the socket is not available to both containers. seitan could take the file descriptors of both applications and connect them. Those are just some examples.

Here, finally, you can see an example of the JSON that we were mentioning. You can see that there are two sections. The first is the match. In this case we filter mknod with major number one and a subset of minors. So basically we perform the privileged operation, the syscall, only for certain arguments. The action that will be performed is basically redoing the mknod in the context of the target. You can see the context: mount, caller.

The second example is what I was explaining about testing. In this case we match on connect with two different paths: you can see the test one socket and the test two socket. If your application tries to connect to test one, we basically simulate the syscall, because we return zero. In the second case we return an error, with this minus one.
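The connect example can be modelled in a few lines of Python. This is only a sketch of the match-then-act idea: the dictionary layout, the socket paths and the `decide` helper are hypothetical and do not reproduce the actual recipe syntax shown on the slide.

```python
# Hypothetical recipe: each entry pairs a match with an action.
RECIPE = [
    {"match": {"syscall": "connect", "path": "/tmp/test1.sock"},
     "action": {"return": 0}},    # pretend the call succeeded
    {"match": {"syscall": "connect", "path": "/tmp/test2.sock"},
     "action": {"return": -1}},   # pretend the call failed
]

# What to do when nothing matches: let the kernel run the real syscall.
CONTINUE = {"continue": True}


def decide(recipe, syscall, path):
    """Return the action of the first entry whose match fits the call."""
    for entry in recipe:
        match = entry["match"]
        if match["syscall"] == syscall and match["path"] == path:
            return entry["action"]
    return CONTINUE


if __name__ == "__main__":
    for path in ("/tmp/test1.sock", "/tmp/test2.sock", "/tmp/other.sock"):
        print(path, "->", decide(RECIPE, "connect", path))
```

The point of the declarative form is that changing the behavior means editing data like `RECIPE`, not writing another handler.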
So those are just some examples of the JSON that you can write with seitan. So now, technically... Sorry for this one. Right. The cooker generates two parts, right? We said there is a BPF program, because we need to tell the kernel: please tell us about a number of syscalls. Not all of them, otherwise we would have a few problems. I mean, it wouldn't be really useful if we just got all the calls, such as read, write, or sendmsg, or networking calls in general. So we need to be selective. We just want to get what we are interested in. And this is the role of BPF in the kernel.

So, great. This is nothing new: you see a binary search tree. That's what libseccomp implements. Why do I have a binary search tree? Because, well, I have a list of syscalls, and maybe there are a bit more than seven. Maybe there are 200. And every time BPF needs to check whether a number matches, that's a comparison. So it's actually quite relevant to keep the average complexity of the search operation reasonable. OK, great, for that we have big O again.

But there is something that makes libseccomp a bit simpler than what seitan does. libseccomp is typically used to just deny or accept syscalls. However, we need to be a bit more detailed: we need to accept, deny or notify. So we have two optimization goals. One thing that we found to be quite effective in our solution is to fill the bottommost level, the leaves, with some more intermediate jumps. These are redundant jumps to the possible actions that are shown here: send the user notification, let the process do it as if nothing happened, or block it.

And this brings me to the overhead, you might wonder. We haven't been really scientific yet, because that would need a bit more time.
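The binary search tree over syscall numbers mentioned above can be sketched as follows. This is a Python model of the search only, not of the generated BPF: the table contents, the action names and both helpers are assumptions for illustration, and the real filter encodes the tree as BPF jump instructions. The numbers 42 and 133 are connect and mknod on x86_64.

```python
# Hypothetical table: 200 syscall numbers, two of which trigger a notification.
TABLE = [(nr, "notify" if nr in (42, 133) else "block") for nr in range(200)]


def build_tree(entries):
    """Build a balanced tree from (number, action) pairs sorted by number."""
    if not entries:
        return None
    mid = len(entries) // 2
    nr, action = entries[mid]
    return (nr, action, build_tree(entries[:mid]), build_tree(entries[mid + 1:]))


def lookup(tree, nr, default="allow"):
    """Return (action, number of nodes visited) for a syscall number."""
    steps = 0
    node = tree
    while node is not None:
        key, action, left, right = node
        steps += 1
        if nr == key:
            return action, steps
        node = left if nr < key else right
    return default, steps


TREE = build_tree(TABLE)

if __name__ == "__main__":
    worst = max(lookup(TREE, nr)[1] for nr, _ in TABLE)
    print("mknod (133):", lookup(TREE, 133))
    print("worst case nodes visited for", len(TABLE), "entries:", worst)
```

With 200 entries a lookup visits at most 8 nodes, compared with up to 200 comparisons for a linear scan, which is why the shape of the filter matters when it runs on every syscall.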
But essentially what we did is run around 10 million syscalls on a post-modern, ten-ish-year-old laptop. And well, Linux got quite fast, or maybe CPUs got quite fast, or something got really fast, I don't know, but it just takes seven seconds. Then we tried a typical usage of BPF that we see with seitan, with filters that might make sense for typical Podman containers that just need to mount a volume. So it's 100 instructions that do essentially nothing, plus we have some comparisons, and then we jump across this filter. That takes a bit longer, 8.2 seconds. From there we estimated that every comparison costs somewhere between 20 and 40 clock cycles. Again, there might be something better on the market now. So I guess we don't care, or we do, but this should show that what we're doing is actually doable.

Now, we were talking about BPF, and this is the other part. This is the part that is digested... sorry, not intended actually. Consumed, let's say, by the user space monitor. The user space monitor gets notifications and now needs to decide what to do with them. This is pretty generic. I'm sorry for how obvious this is, but we have an area of instructions, an area which is read-only with the constants that you put in the JSON, and a temporaries area that is the only read-write part. And that's pretty much it. And we have a really simple structure, which is what seccomp gives us: a list of the arguments plus the PID of the target.

Looking at the instructions, we try to keep them to a minimum. We are of course concerned about feature creep, and we are quite committed not to add more than this, because otherwise we will not really be able to claim it's secure, and it will not actually be secure. So the operations are, well, the obvious one: check that the syscall number matches what I wrote in the configuration. A couple of them are specific to seccomp.
seccomp allows us to inject a file descriptor atomically, that is, atomically with the call, meaning that the task cannot do anything else in the meantime. This is useful if you, well, you will see that later in the demo, but you can connect to something, and the supervisor connects you to something else and replaces the file descriptor, and this is actually safe to do. We can return an error or a success. And then we need to shuffle the data around a bit, because the configuration comes from JSON and the process can pass whatever it wants.

A quick mention about the context. By context we generally mean namespaces on Linux. We also allow specifying the namespace, the several types of namespaces, where we want to execute a syscall. So when we impersonate a syscall, we want to be able, for example for a container, to do that in its mount namespace. Plus the obvious boring things such as the working directory, UID and GID. Tags: in this JSON recipe we of course need references between matches and actions, because we might want to reuse some data.

Security: what about security with this? It looks like we have kind of opened up a hole so far. But that's not our intention, and actually this comes from a bit of experience we have with several container engines and virtualization. Not virtualization in the sense of VMs, because in that case you have a much, much stronger isolation and you probably wouldn't need to use this at all. But let's say you have something in between, or a mix. And sometimes we look at it and we think: OK, actually I didn't want to tell a component, over an RPC, a path to a file that I want to open. What you need to do is open the file, and maybe there actually is a way.
So the obvious benefit of this is that, instead of implementing the several types of RPCs that we saw in several projects, you can have a unified mechanism. Unified both in the sense that it should be generic enough to be used by different projects, and in the sense that for the same container or the same engine you can have a single place where you just say: OK, this is my set of privileged operations and nothing else. We don't want to do the parsing in the supervisor. Right now it's 500 lines of code and we really, really, really hope to keep it that way. There is definitely a significant attack surface, and at that link we listed a few considerations about it. Overall, we think it's not perfect, it doesn't guarantee security by itself, and it's not magic, but we think there is some clear value in this solution.

OK, so now we have a live demo. We have a website where you can also find all those demos, so if something goes wrong, please go there.

OK, so we have seen some examples. Now we can see the seitan setup in action. First of all I would like to show you what we are going to execute. This is similar to the example I showed in the slides. There are different matches. In the first one, we are going to connect to a different path: you can see that in the match we have the cool socket, and we are going to modify the connect and connect to the demo socket instead. So a different path, a different server. In the second match we are going to inject an error, permission denied, when we execute the connect. And the third one is to let the rest of the connects execute.

First of all we need to generate the input files. This is done by the cooker, which takes the recipe as input and generates the gluten, that's the input for seitan, and the BPF filter. OK, some prints, but we have generated the files.
What we would like to do is read a file and print it on the server, so I'm just generating a file that will be read. Then we can start socat as a listening server, and this will be the path that we actually want to connect to. OK, and then we need seitan-eater in order to launch our application. It takes the BPF filter as input, and our application is again socat, which is going to open the file that I wrote previously and which we want to connect to the cool socket. We don't actually have this cool socket, but we try to connect to the server anyway. seitan-eater is blocking because it's waiting: there is some synchronization, because we need to start seitan first, otherwise we might lose some syscalls. seitan takes as input the other file that we created before, that's the gluten, and the PID of the eater.

OK, I hope I haven't mistyped. OK, of course I did something wrong. Let's see what I did wrong. OK, let's try again. Live debugging, as always. OK, we didn't start the other socat, so let's do it again. I mean, it's live. OK, so now you can see that this has finished and we have printed the string.

OK, the second part of the demo. We are going to execute the same command but on a different path, and on this path we are going to inject an error. seitan is always the same, it takes the same gluten file, and you can see that we got permission denied. So this can be a nice way to test different behaviors of your application. Of course, if you use yet another path, this connect is not filtered and it will simply be continued. In this case it fails, because we don't have that socket. So those are the three matches that we had in the JSON.

OK, so that was the first demo. In the second one we are going to use Podman and try to create a character device.
Here you can see that in the match we have mknod, major one and a subset of minors. And as the action we are going to replicate the mknod in the context of the caller. First of all I want to show you what happens if we don't use seitan. In this case I drop all the capabilities, so I'm not going to have CAP_MKNOD inside the container. It's a Fedora. I just try to run mknod with some... so yeah, I got permission denied, because the container doesn't have the capability.

So now we can try to start seitan. It's a slightly different flavor of... I haven't generated, sorry. First of all, of course, we need to generate the input files, otherwise it's not going to work. Again, this takes the JSON file I showed you previously, and again it creates the gluten and the BPF filter. OK, so now I'm going to start seitan. It needs root in order to be able to create the device node. It takes as input, again, the gluten that we just generated. But in this case we are not passing the PID but a path to a socket. This is the thing I mentioned about the OCI integration for the seccomp notifier.

OK. So now I will start the same container again, but I have added two annotations. I hope you can see them. The first annotation is the seccomp BPF data: basically we are reading the BPF filter that we generated with the cooker. The second annotation is the OCI support for the seccomp notifier, and it takes the path of the socket where the runtime will pass the file descriptor. And then, as a command, we again try to create the character device and to list it, so we will see whether it's successful. OK, so you can see that now we have been able to create the character device, because the call was actually performed by seitan.
So in this case we have been able to create a character device even without the CAP_MKNOD capability inside the container. OK, so, sorry.

So the takeaways of this presentation are that seitan is a tool for filtering and executing privileged syscalls. We aim to reduce the capabilities and the privileges given to containers. The important thing is the declarative approach versus the imperative way. And you can find more information on our website. Future plans are of course to finish the code, and a big cleanup, it's really needed. Right now we have very few syscalls, but we plan to add more. We would love to have feedback from you, because it's a very new idea. So if you have any questions, please speak up. And our goal is to try to integrate seitan with container engines and virtualization engines such as KubeVirt. Special thanks to Andrea Arcangeli, to Christian Brauner, whose blog has been very helpful, and to the KubeVirt developers who helped us shape our design.

Oops. Could you... sorry, I hadn't realized it. Any questions?

So we have one question from the audience. OK. It's about the demo with mknod. The question is: this was done in containers, so is it possible to intercept and allow this kind of call outside of containers?

Yeah, I mean, the first demo. So the question is whether it's possible to do mknod outside of a container? Yeah, sure. The first example was without a container, and you could do exactly the same. So yeah, the BPF program needs to be loaded. You have seen the two flavors of seitan. In the first example in the demo we used seitan-eater, because we need to launch the process after installing the filter. Yeah, thanks. Thanks for your time.