So, how many of you follow good security practices, especially container security, when you deploy your workloads? And now the tougher question: how many of you deploy custom security profiles, such as seccomp, AppArmor, or SELinux, with your workloads? So, very few people. Maybe you don't know that deploying custom profiles is not such a complicated task. This is the agenda for today. We will first look at some motivators for container isolation, and also at the state of security configuration in Kubernetes with respect to these technologies. Then we will look at the Security Profiles Operator, its main features and architecture, go through some demos, and finally discuss the challenges we faced when adopting it.

So what are some of the motivations for container isolation? As we all know, containers share the same kernel when they run, so in order to achieve good container isolation we need multiple layers of security. I hope every one of you follows basic container security practices, such as not running containers as root or not using excessive Linux capabilities; we will not go into those details in this talk. We will look at security profiles, which are essentially an additional layer of security. Also, when we want to achieve better isolation, we want to reduce the number of kernel features exposed to a container; it doesn't make sense, for instance, to expose file-access system calls to containers which don't do any file access.
Also, in some situations or use cases, we have requirements to run customer code, and this requires additional sandboxing or hardening of the container, so we need to go an extra mile. And maybe all of you have heard of quite a few kernel bugs over the years which got exploited to achieve escalated privileges and escape containers. And multi-tenancy is probably a reality of any medium or large Kubernetes cluster in today's organizations.

So, what are security profiles? Maybe you know; I will just have a short recap here. Seccomp is the most basic: it is a security mechanism which can restrict the set of system calls that can be performed by a container. We also have AppArmor and SELinux. These are Linux kernel security modules, and they allow much more fine-grained control over the capabilities which can be used by a container.

Now, what's the current state of these technologies in Kubernetes security configuration? Unfortunately, Kubernetes doesn't provide very strong security defaults. Especially for seccomp, we need to explicitly enable it in the kubelet arguments. And this is the runtime default seccomp profile, which gets shipped with the container runtime. It is a fairly large default profile, which includes probably over 400 system calls; it is crafted for generic workloads, and it will run most likely any workload. I would say that in most situations we don't need that many system calls, so it's relatively easy to reduce this number to a couple of hundred. In the security context we also have SELinux, but unfortunately AppArmor is not there yet; it has not yet graduated. Another challenge is that today we no longer have only Docker; Docker was deprecated. We have a number of runtimes in the cluster, and each runtime ships a slightly different default profile.
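As a minimal sketch of the runtime-default option mentioned above (plain Kubernetes pod API; the image is a placeholder):

```yaml
# Pod opting into the container runtime's default seccomp profile.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault  # the large generic profile shipped with the runtime
  containers:
    - name: app
      image: nginx:1.25  # placeholder image
```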
Some runtimes have additional system calls, and we want a consistent experience. So what's the state of seccomp, and how can we configure it in Kubernetes? Seccomp can be configured in the security context, as I said before. Either we use the runtime default, or we can create a custom profile. A custom profile needs to be deployed on each cluster node, and when we set up the security context of a container or a pod, we need to reference this profile. This is a challenge, especially in large Kubernetes clusters. AppArmor, on the other side, is not yet in the security context; it is annotation driven. You just need to apply an annotation to the pod in order to define a custom AppArmor profile. It's more or less the same as seccomp in that the profile needs to be available on the underlying node.

So, some of the challenges. As I said before, the default profiles are too permissive for most workloads: over 400 system calls, while we can see that even workloads like nginx can run with probably fewer than 200 system calls. Another challenge is that if you have ever created custom profiles, you know that it's really tedious to do manually, especially when you think of a cluster where you have hundreds of containers and you want to automate this to some extent. Also, distributing profiles in a cluster is not easy unless you control the entire Kubernetes distribution; there is no real standard mechanism, and most managed Kubernetes distributions don't have a way to install custom profiles. Another aspect is cluster autoscaling: when new nodes are created automatically in order to scale the cluster, we need to make sure that these profiles are already installed on them. So how can we achieve this? If we look at the life cycle of a security profile, we see that first of all we need to create or record the profile.
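A sketch of both mechanisms side by side, assuming the profiles are already present on the node (profile names and paths are placeholders; the AppArmor annotation key is the pre-graduation convention):

```yaml
# Custom seccomp profile via securityContext, AppArmor via annotation.
apiVersion: v1
kind: Pod
metadata:
  name: custom-profile-pod
  annotations:
    # AppArmor is annotation-driven: localhost/<profile loaded on the node>
    container.apparmor.security.beta.kubernetes.io/app: localhost/my-apparmor-profile
spec:
  containers:
    - name: app
      image: nginx:1.25  # placeholder image
      securityContext:
        seccompProfile:
          type: Localhost
          # path relative to the kubelet's seccomp directory on each node
          localhostProfile: profiles/my-seccomp-profile.json
```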
Then we need to distribute it, that is, install it onto the cluster nodes, and only afterwards can we use it in our workload. And let's not forget that this profile needs to be kept up to date. We all use continuous integration and continuous delivery now: a service evolves and introduces new functionality, which probably requires new system calls, or removes functionality, which can lead to fewer system calls. So this is also a challenge.

Now, in order to automate this, the Kubernetes SIGs community created a project called the Security Profiles Operator. The main goal of this operator is to automate the life cycle of a security profile. It runs inside the cluster as an operator. More recently, it also provides a CLI tool which can be used outside of Kubernetes use cases, for instance if you want to record a profile in a CI/CD system and then distribute it; we will see a demo in a minute.

One of the main features of the Security Profiles Operator is profile recording. It automates all the profile recording functionality. We have essentially two implementations for profile recording. One is eBPF instrumentation: essentially, we insert eBPF bytecode when a profile is recorded, trace all system calls, and then save them into a profile. Another way to record the system calls is to use audit logging. Maybe you know that seccomp can be switched into an audit-logging mode, and every time a system call is executed by the container, it gets logged into the audit log, so we can parse this and create a profile. This is probably less precise than eBPF. Profile recording can be started simply by creating a custom resource, as we will see in a demo, or, as I said before, there is a CLI command with which you can record a profile outside of the cluster. The second main piece of functionality is profile distribution.
The profile needs to be installed into the cluster, and it also needs to be kept in sync: every time we modify the content of a profile it needs to be reinstalled, and if we remove a profile it needs to be cleaned up. At the moment we support three types of profiles: seccomp, SELinux, and AppArmor. It's also possible to compose profiles: basically, you can define a base profile and then reference that profile inside another profile custom resource. Or you can use an OCI image instead of directly referencing a custom resource; we will have an example and see how we create an OCI image, and how we publish and sign it. Another feature is profile binding. Instead of setting the security context on each container, the operator is able to add it on the fly: we can say, I want to associate a security profile with a container image, and then the operator will watch the deployments and pods and automatically add the security context to those pods. We also expose some metrics which can be used for visibility.

So now let's have a look at the architecture. It looks a bit complicated, but essentially the operator has three main components. There is the manager, which you see in the bottom left corner. When we install the operator, we just install the manager. The manager is driven by a custom resource configuration where we can define the different configuration settings, and it is able to create a webhook on the fly. This webhook can also be statically deployed when we deploy the operator. The webhook is used just to mutate configuration when we want to record a profile or use profile bindings; in other scenarios, installing it is not required. Most importantly, the operator manages a daemon which runs on each node of the cluster via a DaemonSet.
As we see here in this yellow box, the daemon runs on the node and actually does the heavy lifting: it installs profiles into the underlying node file system and also performs the profile recording using the different recorders. We also have a number of custom resources, which we will look at in more detail: three types of profile custom resources, the profile recording custom resource, and the profile binding custom resource.

So now let's have a look at an example. Here we have a seccomp profile custom resource. If you are already familiar with seccomp, you see that the profile itself is inlined into the specification of this custom resource. In this case we have a default action, which is executed every time none of the actions below take place, and then we see a number of system calls which are allowed and a number of system calls which are denied. That's basically the content of a seccomp profile. Similarly for AppArmor, the AppArmor policy is inlined directly inside the custom resource; it's transparent. And we have an example here of how an SELinux profile looks as well.

So now let's have a demo. In this demo we will install a seccomp profile into a cluster. We already have the Security Profiles Operator installed in this cluster; we can have a look at it. If we retrieve the resources for the operator, we see the operator, the webhook, and the daemon pods which run on each node; this cluster has four nodes. Now we want to create a policy which denies all system calls, which looks like this. We create this policy, and as soon as the custom resource is created, the operator installs it on all underlying nodes of the cluster. Now we can check its state, and we can see that it was installed.
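The deny-all policy from the demo can be sketched as a SeccompProfile custom resource along these lines (API group as in the upstream Security Profiles Operator; treat the details as an approximation):

```yaml
# Deny-all seccomp profile: no syscalls are allowed, so every call
# hits the default action and the test pod cannot start.
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: deny-all
  namespace: my-namespace  # placeholder namespace
spec:
  defaultAction: SCMP_ACT_ERRNO
```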
This is the content of the policy after the installation. We also have a resource to check the state on each node: what's the state of this custom resource on each node? We see that it was already installed on every node. Now we want to test this, so let's create a test pod which references this seccomp policy. Because we deny all system calls, as soon as we create this pod it should fail, since we didn't allow any system call. And we see that the pod fails. So now let's remove this and let the operator do the cleanup. We want to remove the seccomp profile from the cluster, and the operator will completely clean up the profile from the underlying nodes and also remove the custom resource. If we check now, we see that the profile was removed and we don't have anything left on the cluster. Everything happens automatically: in case our cluster scales out and new nodes are created, the DaemonSet will automatically start a daemon on each new node, and the daemon will install the profiles on that node.

Okay, now let's have a look at the profile recording custom resource. As I said before, in the specification we can define what kind of profile we want to record; in this example we want to record a seccomp profile. We also define which type of recorder we want to use; here we use eBPF. At the moment we support seccomp and SELinux, and for seccomp we support both eBPF and log parsing, but for SELinux only log parsing is supported. Hopefully in the future we will also have AppArmor here. When we define a profile recording, we also need to define an object selector: here we want to select pods which have this app label. And let's say on this pod we have multiple containers, but we just want to record the seccomp profile for the nginx container; we can define for which containers we want to record the profile.
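A sketch of such a ProfileRecording resource (field names follow the upstream operator; the per-container filter is written as described in the talk and should be checked against your operator version):

```yaml
# Record a seccomp profile via eBPF for the nginx container of
# pods labeled app: my-app.
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileRecording
metadata:
  name: record-nginx
spec:
  kind: SeccompProfile  # type of profile to produce
  recorder: bpf         # eBPF instrumentation; "logs" would use audit logging
  podSelector:
    matchLabels:
      app: my-app       # only pods with this label are recorded
  containers:
    - nginx             # record only this container of the pod
```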
So now let's have a demo of how we record a profile inside the cluster. In this demo we will record a profile for nginx, then use it and see that it works. First, when we want to record something, we create a dedicated namespace, and this namespace needs to be labeled accordingly for recording; as we will see later, this is just to reduce the amount of work which needs to be done by the operator. So we create this namespace. Inside this namespace, let's say we want to record these containers, so first we create the ProfileRecording resource, which looks very similar to our example: we want to record the nginx container, and we have an object selector for the app label. Now the ProfileRecording resource is created and we want to record something. In order to start recording a profile, we just need to create a pod which has the matching app label and contains an nginx container. As soon as we create this pod, the Security Profiles Operator will start recording: it records all the system calls during startup and then keeps recording any system call executed afterwards. Now let's say we leave this container running for a while; it runs, we are happy, and we now want to save this profile. We just need to delete our pod, and then the operator will collect all these system calls and save the profile into a custom resource. We can have a look after the pod was deleted: we see that the profile was already installed, and we see the path on the local node where it was installed. Now let's run a pod again; it's the same pod which we used for recording, but this time we use our recorded profile in the security context of this pod. So basically we just reference the local profile. When we start this pod it should work, because we just recorded it, so we should expect the pod to start.
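That referencing step might look like this; the operator/&lt;namespace&gt;/&lt;profile&gt;.json layout under the kubelet seccomp directory is how the operator installs profiles, but treat the exact path as an assumption:

```yaml
# Same pod as recorded, now running under the recorded profile.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  containers:
    - name: nginx
      image: nginx:1.25  # placeholder image
      securityContext:
        seccompProfile:
          type: Localhost
          # profiles installed by the operator land under
          # <kubelet seccomp dir>/operator/<namespace>/<profile name>.json
          localhostProfile: operator/my-namespace/record-nginx.json
```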
So we create the pod and check that it starts, and we see that the pod started successfully. That was profile recording; everything is automated. For this use case, creating one profile for nginx took us less than two minutes, and I believe most workloads are not much more complex than nginx, so it's very easy; as you see, everything is automated.

Now, I mentioned that we can compose profiles; we have this profile stacking feature. Here we have an example with two profiles. The profile on the left references another profile, in this case called runc, and this profile is already installed in the cluster. So let's say we have a base profile that we want to use, where we say these are our minimum allowed system calls, and then we can additionally add new system calls on top. In the other example we have a base profile which was defined in an OCI image; we will see a demo in a minute. This image was created, signed, and published, and then we can reference it as a base profile; the operator will fetch the image and create the profile. So now let's have a demo where we show how we can record a profile, package it into an OCI image, sign it, and then publish it into a container registry which supports the OCI specifications.
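The stacking example just described can be sketched like this (the baseProfileName field exists upstream; the base profile name and syscall list here are placeholders):

```yaml
# Extend an already-installed base profile with extra syscalls.
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: my-app-profile
spec:
  # base profile installed in the cluster (an OCI-hosted base can be
  # referenced with an oci:// prefix instead)
  baseProfileName: runc-v1.0.0
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
    - action: SCMP_ACT_ALLOW
      names:          # extra syscalls allowed on top of the base profile
        - openat
        - read
        - write
```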
So, in this case we want to create a simple seccomp profile, let's say for the echo command. We simply run the CLI: we just say record, specify which process we want to record, and the profile is recorded using eBPF and saved into a YAML file. We take this YAML file and package it into an OCI image. This is the content which was created; we see all the system calls for this command execution. Now we have a command which can take this YAML: we just specify the path to the YAML, and it creates an OCI image, signs the image using Cosign, and publishes the signature to the Sigstore transparency log, which is widely available for everyone to verify. Later on, basically everyone can verify it, including the operator itself in another cluster when it uses this image. So now let's create a profile which just references this OCI image where the profile was defined. As soon as we create this profile, the operator fetches the OCI image from the registry, verifies the signature first, checks that everything is right, and then unpacks it and installs it into the cluster. In this way we can make pretty good security assumptions about the source of this image, so it's a good measure for supply chain security: we make sure that these profiles were published by us, and not by someone else who tries to inject some malicious system calls into our cluster.

That's about the demo. Let's have a brief look at how profile binding works. As I said before, it's relatively simple: in the profile binding we just define a profile reference and a container image with which we want to associate this profile. The operator will automatically verify the configuration of any pod being deployed, and if it doesn't have a security context, it will add the security context referencing this seccomp profile.

Okay, now let's conclude with some challenges we faced when we
first adopted the Security Profiles Operator. Our first main challenge was that our clusters run a containerized Linux distribution: we use Flatcar Container Linux, where everything is containerized and the file system is immutable, so basically you cannot modify anything. The operator was initially developed for standard Linux distributions, I think Fedora, Ubuntu, and so on, where you can modify the file system, so we had to implement some fixes, or some redesigns. Basically, when the operator starts, we do some setup in an init container to prepare the file system for the operator, and only this init container needs root privileges. As soon as the init container finishes its setup job, we start the actual daemon container, which does the installation, but this daemon container doesn't require any root privileges; in this way we can still install files on the underlying node.

Another important requirement was that, as I said, we have a webhook, and we want this webhook to be highly available. As you may know, if you have a webhook which is set up to watch for pods and this webhook goes down, then the entire cluster is effectively down, because you cannot create pods or anything. So this is a very important requirement, so that it doesn't affect cluster functionality, and we had to implement some specific selectors so that the webhook does not affect cluster operation, and also make sure it runs highly available with enough memory and CPU resources.

Another challenge: we have clusters with mixed runtimes. Each runtime is of course more or less similar, but there are slight differences, and the operator initially had some issues working in such a cluster. It was kind of designed for a cluster running, let's say, only CRI-O, or only containerd, or Docker, from the start, and not a cluster which runs a mixture of these runtimes. We have clusters which run a mixture of runtimes, so we had to
cope with this situation and make the operator able to install profiles on each node, whichever of these runtimes it runs.

Another big issue we had was memory usage. The first time we deployed on a medium-to-large cluster with hundreds of nodes and thousands of pods, the operator immediately ran out of memory when it started. It took us a while to investigate why, and it turns out that when you start watching for pods in a cluster, even if you use a selector to watch only specific pods, the Kubernetes controller loads these pods into memory, into the controller's cache. So we had to modify the operator's cache to load only the pods which match the selector. This is a pretty standard memory optimization for controllers in order to avoid memory issues in large clusters.

As I said, we also have multi-tenant clusters, with multiple tenants running in the same cluster, and in this case we of course have much stricter security requirements: we cannot risk one tenant affecting other tenants, because our initial intention was to allow tenants to record profiles for their own workloads. Imagine a scenario where one tenant introduces malicious syscalls into another tenant's profile, or into its own profile, in order to escape the container and end up on the node. In this case we had to define a sort of allow list in the operator configuration, predefined by the cluster operators, and this allow list prevents anyone from installing profiles with system calls which are not defined in the list. So let's say there are dangerous system calls which we want to completely exclude: this guarantees that nobody outside of an administrator is allowed to install those system calls.

Another challenge we faced when we started to record profiles was larger pods. We had a number of pods with multiple init containers, multiple sidecar containers, and essentially one big
application container running, for instance, a Java service. In that scenario we were just interested in recording a profile for this Java application, so we had to extend the operator to allow us to select just one container of that big pod.

We also faced issues where we had containers running a sequence of processes. Ideally we have just one process in a container, which starts when the container is started, but in real life we have bootstrap scripts: they run processes, do some setup, and then kick off another process. The original design of the eBPF instrumentation recorded the syscalls by process ID, but as you can imagine, in this scenario we have multiple process IDs during the lifecycle of a container. What happened was that we were losing system calls from the profile, and then we could not use that profile. So we had to redesign the eBPF instrumentation to record profiles not by process ID but by namespace ID, because the namespace ID is unique per container. In this way we made sure that we have all the system calls in our profile.

So, if you have any questions, I'm happy to answer; please take the microphone.

Q: For the challenges that you faced, I assume that you upstreamed the fixes to the operator?
A: All fixes we sent to the upstream project, and all are merged and released.
Q: Then I have a very simple practical question, which relates to some of the challenges that we are facing: how exactly do you know your profiling is done? Sometimes you have applications which do strange things only once in a while. How exactly do you know when you are done? It's like 15 to 20 years ago, when we had to build SELinux profiles; it's exactly the same problem.
A: We had exactly this problem. Basically, once would be enough, but of course it's not. You need to leave that process running, either expecting, let's say, that you have a running service and you say, I record for a
week, for instance, and then you create a profile, or you run a load test against that service: you try to exercise as much coverage as you can, and you hope for the best. Of course, I don't encourage you to do a recording and put that profile directly into production; just go through a normal release pipeline. First try it in dev, run some tests, check that everything is fine and no syscall is denied. If you see something, you can keep recording and update the profile, and then release it into production.
Q: And one more question, related to best practices: combining seccomp and SELinux profiles?
A: We started with seccomp first, and we want to add SELinux, of course. I definitely encourage using both: either SELinux or AppArmor together with seccomp, because seccomp is just limiting the number of system calls, but SELinux or AppArmor have more fine-grained control, for instance over networking-related operations or other operations which can be performed inside the operating system.
Q: And the recorder for the SELinux profiles is basically using the audit log and then the audit2allow functionality?
A: Yes, exactly. It just starts that, but it's all automated: it starts audit logging with SELinux, collects everything, and then creates the profile. And the same can be done for AppArmor; there is a tool, and that's the intention of the community, to create that.
Q: Thanks. Extending the question of how you know when you've profiled enough: have you considered automated ways, like letting it run for a week in soft-fail mode and then switching it over later?
A: That's basically what we tried, yes. We set up, let's say, a staging cluster and let it run.
Q: Specifically, though, the automated switchover: was that something that you did, or did you always have manual review?
A: We actually did not; I mean, it's automated in the sense that we ran the operator to do the recording, so we don't really watch the logs and things like that, but we try to exercise as much functionality as we can.

No other questions? Then thank you for joining, thank you.