So welcome to my talk, and welcome to the Security Village. Just a few words about me, because I don't have a lot of time and I like to talk, so I'm going to talk a lot. I'm Ben, co-founder of ARMO, a maintainer of Kubescape, also a contributor at Tech Security, and I was a white-hat hacker for a really long time. And I have a big family with four kids at home who are waiting for me. Usually, most of the cyber talks you hear have this effect on people: they hear all these keywords like Log4j and, I don't know, SolarWinds, and they walk out of the talk thinking we are doomed. Now, I really hope this is going to be a different kind of cyber talk. I'm going to share with you how, at ARMO, we leveraged cloud-native technologies to create what we call continuous security. How many of you are using CI/CD? Can I have hands up for Kubernetes? Yeah, so I think this is going to be relevant for you, a good match here. Today we're going to talk about three different aspects of your security. Two of them are Kubernetes-native policies; one is related to the vulnerability posture of your applications. We're going to talk about how you can leverage your CI/CD to make those things much better and, as we call it, decrease the blast radius in your systems. I'm going to show you how we generate runtime policies from application behavior during testing and how we add them to our CI/CD. And I'm going to discuss whether it's good or bad, though I guess you know where I'm going with this: it's going to be good, hopefully, for all of us. So, today's threat model is, I think, a very classical threat model for a Kubernetes cluster. Most of the time, our pods can be accessed from the direction of the public internet, either through a service, a load-balancer service, or an ingress.
This is how they get traffic from the outside world; alternatively, an attacker may try to penetrate your pods through the supply chain, through the image registry, and so on. But the first thing I'm going to show you is the application exploit side, the vulnerability posture of your workloads. I'm going to show you how you can decrease the noise around vulnerabilities and focus only on the ones that really matter. The second thing: if an attacker was able to penetrate your workload and achieve code execution within the confinement of your pod, the attacker has two directions for lateral movement or progress. One is to target your kernel, trying to escape the confinement of the containers that make up the pod. We call these kernel exploits, and we've seen a wave of them. As soon as an attacker is able to escape the confinement of the pods on a Kubernetes node, he or she will most likely be able to take over your cluster, because every Kubernetes node has its node secrets, and most of the time these are simply files on the file system of the node. If the attacker can take over these files and secrets, they will be able to talk to the Kube API server as a Kubernetes node, and that gives them a lot of leverage. The other way the attacker can progress from a pod is lateral movement in the network: going after other pods, going after unprotected parts of the cluster that should be internal. Once the attacker has entered one of the public-facing pods, they can move in that direction. So this is our threat model for today. I'm going to show you three things: first, how to reduce the noise around potential application exploits; second, how to decrease the attack surface of the node; and third, how to reduce the attack surface for lateral movement.
Together, these decrease the blast radius of an attack. The first topic is application vulnerabilities. Most of us use various security scanners to learn which vulnerabilities are inside the software packages in our container images. These are very good tools: Trivy, Grype, Anchore, Snyk, and so on. But there's a small problem. In general, at least for our production systems, we have open-source images with more than 100, even 200 vulnerabilities, which are really, really hard to handle. For us, as a relatively small company, it's overwhelming; and a bigger company with more resources will obviously also have more microservices and more images. This doesn't scale well. It's really hard to handle all these application vulnerabilities and always keep them updated, so managing them is very costly. The first thing we're going to use in our CI/CD is a new feature of Kubescape. We're actually releasing it this week, or maybe in the first days of next week; we call it Kubescape relevancy. Kubescape is a CNCF project, and I'm one of the maintainers, as I said before. What we do in Kubescape is install an eBPF agent on every node. This agent reports back all the file activity of every workload running inside the cluster. So we get a stream of the files that are touched inside a workload. This enables us to take the SBOM of the container image running inside the workload, cross-reference the list of files that were opened inside the workload with the list of files in the SBOM, and mark all the software packages in the SBOM that were really used, actually opened, inside the container runtime. This lets us remove all the packages that weren't touched during the runtime of the container.
We can remove everything that is not relevant, not really loaded, feed that back to the vulnerability scanner, and get a filtered list of vulnerability scan results. I'm going to show you this in a demo at the end of the presentation, but I can tell you that in our production systems, it lowered the number of vulnerabilities by 80%, which is a huge noise reduction for us. I really, really suggest that all of you who work with vulnerability scanners look into this. So that's our first component. The second component, as I told you, is how to protect the kernel, how to reduce the attack surface of the Linux kernel inside the Kubernetes node. In the pod security context, you have all the classical settings, like the user ID and group ID of the container, which you may or may not be able to change depending on the container image. Obviously, the right thing is not to run your containers as root, but sometimes the container image itself requires user ID 0, so you have to rebuild and change it, which is sometimes really hard. The next thing you need to handle here is the Linux kernel capabilities of the process inside the container: remove all the capabilities you don't need, and even if the container is a privileged container, check whether it really needs to be privileged. But I have to tell you that since the defaults here are off for nearly everything, most applications are OK from this perspective. The third thing, which is what we're really going to talk about, is seccomp policies. Seccomp policies enable you to control which system calls the containers inside your pod are allowed to make to the kernel. The reason we're talking about this is that I did some small research half a year ago: I looked over all the CVEs of the Linux kernel that enabled container escape, and as it turned out, most of them were exploited through system calls.
They needed to use very exotic system calls, ones that are usually made only by applications that manage the node. Our applications usually don't need them, so removing all the unneeded system calls is a very good way to protect our kernel. To apply seccomp profiles, one option is to use pre-made profiles, pre-made lists of the system calls our applications are allowed to make. Either such a profile breaks the application, or it allows the application to run; but usually, even when applications run, these profiles are over-permissive. The other option is to define these seccomp profiles manually and create a list of the system calls your workloads are allowed to make. But that's a very tedious job for each and every workload, and honestly, we usually don't have the professionals to do this work: if you ask an application developer today which system calls their application makes, they won't understand what you're talking about. So the best thing, we thought, is to generate the profile from actual application behavior: check which system calls the application makes, create a recording, and turn it into a policy. Katelyn was talking about this yesterday. There is the Kubernetes Security Profiles Operator, a Kubernetes SIG project, an awesome project that enables you to record all the system calls your application makes, turn that recording into an actual policy object, and apply the policy. Again, what you should take from this slide is that you can use a pre-made tool to generate the list of system calls your application makes, turn it into an actual profile object, and have Kubernetes enforce it.
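As a sketch of what this looks like in practice, a recording request to the Security Profiles Operator is itself a small Kubernetes object. The namespace and label selector below are illustrative, not from the talk:

```yaml
# Hypothetical example: ask the Security Profiles Operator to record
# the syscalls of pods labeled app=frontend and emit a SeccompProfile.
apiVersion: security-profiles-operator.x-k8s.io/v1alpha1
kind: ProfileRecording
metadata:
  name: frontend-recording
  namespace: test-apps        # illustrative test namespace
spec:
  kind: SeccompProfile        # record seccomp, not SELinux/AppArmor
  recorder: bpf               # eBPF-based syscall recorder
  podSelector:
    matchLabels:
      app: frontend           # which pods to record
```

While this object exists, the operator traces matching pods; when they terminate, it writes the observed syscalls into a SeccompProfile object.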
Network policies, on the other hand, are, I think, the least complex thing to explain here in the Security Village. I think everyone knows why network security is important and why locking down unneeded network paths in an application environment matters. There are things an attacker can do if there is no micro-segmentation in the network of a Kubernetes cluster: reconnaissance, lateral movement, and exfiltrating data from other pods through unprotected APIs. The major problem with network policies, not just in cloud native but in cloud environments in general, as opposed to old-style monolithic self-hosted environments, is that with a microservice architecture with multiple components, it's really hard to know who needs to talk to whom. Maybe you invest the time and define it once, working really, really hard; but at the end of the day, when you apply it, you might break some application features, so you have to go back, iterate, talk to the application developers, and go back and forth until you've created your network policy. And then it turns out the application changes again within two weeks, and the policy breaks again because the application uses a new network path that wasn't used before. So it's also very, very hard to maintain, because the knowledge is spread across multiple functions in the organization and there are a lot of things that can break. Here, too, we decided to go with an automated tool: Inspector Gadget, another CNCF project. I think it was accepted to the CNCF Sandbox a month or two ago. It's a great project. Again, it's the same idea: using eBPF to detect who is talking to whom inside the Kubernetes cluster. You can run Inspector Gadget in a record mode.
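For illustration, the kind of policy object such a recording produces looks roughly like this. All names and ports here are made up; the real output reflects the connections actually observed:

```yaml
# Illustrative NetworkPolicy: the frontend may receive traffic on
# port 8080 and talk only to the checkout service on port 5050.
# Any path not listed is denied for the selected pods.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-network-policy
  namespace: demo             # illustrative namespace
spec:
  podSelector:
    matchLabels:
      app: frontend           # pods this policy applies to
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - ports:
    - port: 8080
      protocol: TCP
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: checkoutservice
    ports:
    - port: 5050
      protocol: TCP
```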
It will create the list of who talks to whom, and then you can turn it into a Kubernetes NetworkPolicy object and apply it; the resulting network policy reflects the actual connections observed before. So think about the following: if we had a magic box where we throw in our application with all of its workloads, everything it needs, everything it runs, and out of the box comes a short list of the vulnerabilities your application really has, plus the network policies and seccomp policies you need to concentrate on, that would be really great. And why is it really, really great? Because this is what we're good at: we are really good at automating things, right? We really love it. If we have to work manually, we'll always break things and have all these problems. If we can automate all of these processes, we gain a lot, because we get automated security. It will make our security people, our DevOps people, and our DevSecOps people very, very happy. By the way, I generated all the images here with Midjourney, so it was a really nice thing to play around with while preparing this presentation. They're smiling here, slightly, by the way. So this brought us to the concept: if we have CI and CD, we should have continuous security, right? It's a really good approach to try to hook all these things into our existing processes. In our CI/CD we're already running tests, already exercising a lot of application behavior. If we could capture all this information during the testing phase, we could turn these behaviors into policies, apply those policies to our systems, and improve our security a lot. So let me go through a simple way of doing this. The first step is to create a namespace where we deploy our components and test them.
We start recording with all of these tools, or just one of them; we deploy the application, run the tests against it, stop the event recording when the testing finishes, and generate the policy objects. Then we redeploy the application with these policy objects, applying all of them to the Kubernetes cluster, and re-run the tests to make sure we haven't generated policies that actually break our application. After we've retested the application, we commit these objects into our Git repository, so they are picked up by Argo in production. So, I don't know, did anyone see my teaser on Twitter before coming here? I promised a joke about Inspector Gadget, the panda, and Captain Kube going into a coffee shop in Amsterdam. This is the picture Midjourney generated for it, but I couldn't come up with a joke. I'm sorry, but I think it's cute; at least people told me it's cute. So, what I'm going to show you is a demo of all three things we talked about. I'm going to do a small environment setup. We'll see how the seccomp profile is generated, then test and prove that the seccomp profiles are working and indeed protecting us. We'll do some network policy generation, again using the testing phase, and we'll test out the network policies. And at the end, I'm also going to show you the vulnerability reduction I told you about before. So, let's hope the streaming works. OK, the first thing is that I'm creating a minikube cluster. By the way, at the end you will see I'm publishing a GitHub repo with all these things as goodies and examples. So I'm installing cert-manager into the minikube cluster, installing the Kubernetes Security Profiles Operator, and installing Inspector Gadget and Kubescape, all three things during the installation phase.
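The record-test-generate-retest flow described above could be wired into CI roughly as follows. This is a hypothetical GitHub Actions-style sketch; all script names and the namespace are placeholders, not from the talk's repo:

```yaml
# Hypothetical CI sketch of the record -> test -> generate -> retest flow.
name: continuous-security
on: [push]
jobs:
  record-policies:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: kubectl create namespace sec-test            # fresh test namespace
      - run: ./scripts/start-recordings.sh sec-test       # seccomp + network recording
      - run: kubectl apply -k deploy/ -n sec-test         # deploy the application
      - run: ./scripts/run-tests.sh sec-test              # exercise the app's behavior
      - run: ./scripts/stop-recordings.sh sec-test        # triggers policy generation
      - run: ./scripts/apply-generated-policies.sh sec-test
      - run: ./scripts/run-tests.sh sec-test              # retest under the new policies
      - run: ./scripts/commit-policies.sh                 # commit so Argo picks them up
```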
By the way, I'm not sure you've ever seen minikube start up this fast; I have edited the video. Now, the first thing I'm doing is actually running the test for the creation of seccomp profiles. So I'm creating the recording objects for recording all the applications I have in my namespace, creating the applications, and then waiting for all the deployments to start up. After the deployments have started up, I run tests for 60 seconds. We're not going to wait 60 seconds here; I've edited the video, but believe me. After that, I scale down all the deployments, which actually triggers the generation of the seccomp profiles, which you can see on the screen. After the seccomp profiles have been generated, I reapply the newly created profiles to all of the deployments automatically and restart the whole application. You can see that during this phase, we've created all these YAML files with the profile objects themselves. I'm going to open one here just to show you what's inside the YAML files: these are the seccomp profiles. You can see a list of the different system calls the application makes. As you can see, it's not much fun to write this manually, but it's really great to generate it automatically. What I'm doing now is showing you how this protects the cluster. I'm trying to open a shell on one of the pods that is protected with the seccomp profile. The thing is, what you're going to see is that you don't see anything, which is kind of cool: I'm trying to open a shell here, and the shell does not open. It doesn't open because the shell itself uses system calls that the application hasn't used; therefore the shell is not allowed, and it is killed by the system.
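To give an idea of what such a generated file contains, here is a trimmed, hypothetical SeccompProfile plus the securityContext stanza that points a container at it. The syscall list is abbreviated; real recordings list far more:

```yaml
# Abbreviated, hypothetical example of a generated profile object.
apiVersion: security-profiles-operator.x-k8s.io/v1beta1
kind: SeccompProfile
metadata:
  name: frontend-profile
  namespace: test-apps
spec:
  defaultAction: SCMP_ACT_ERRNO    # deny every syscall not listed below
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:                         # trimmed; a real recording lists dozens
    - accept4
    - epoll_wait
    - futex
    - read
    - write
---
# The deployment then references the profile in its security context;
# the operator materializes the profile as a JSON file on each node:
#   securityContext:
#     seccompProfile:
#       type: Localhost
#       localhostProfile: operator/test-apps/frontend-profile.json
```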
Now, just to show you that I'm not cheating, I'm removing the seccomp profile from the actual deployment here. In the security context of the pod, you can see the seccomp profile with the JSON file that contains all the system calls. I delete it and restart the pods after editing the deployment. And what you'll see, now that I've removed the seccomp profile, is that I am able to open a shell. You can see here that the second pod is the one that was just created with the new revision. I try to open a shell using exec, and you can see that without the seccomp profile, I was able to open one. Now, just to explain briefly, let me stop it for a second. The way I presented it, I didn't attempt an actual attack on the Linux kernel; I just showed that the seccomp profile works through the shell I tried to open. But think about it: most of the value you get out of this is not just protection against processes that shouldn't be running inside your container, you're protecting your kernel as well. Now I'm showing the second process, where I create network policies. I've enabled the recording and restarted the Hipster Shop application; by the way, the application is called the Google microservices demo, and it's a great demo app for these things. I'm again recording all the network events. After recording all the network events and running the application for a minute, I've generated a network policy file, which I'm going to show you here. Again, I'm using Inspector Gadget. I open the network policy object, and you can see the network rules: which pod is allowed to talk to which pod inside the cluster. Now I will show you that it is indeed working and protecting your cluster. As you remember, I said the attacker would come through the first public-facing pod.
In our case, that's the front-end workload. I open a shell on it, and I'll try to access the Redis component of this demo application, which serves as the database in this demo app and would be a very logical target for an attacker to exploit. In general, these two shouldn't be talking to each other: the front end should get information from Redis through other microservices. So I open a shell and try to telnet to the redis-cart component on its port. If it succeeds, it will say that it's connected; if not, it won't say anything, and you can see it's stuck, because the network policy is actually protecting this path. Afterwards I'm going to delete the network policies, just to show you that in general they are able to talk to each other. And there's an interesting thing here: Kubernetes-native network policies have an ingress part and an egress part, so you're protecting from both sides. In our case, I needed to delete both network policies, the one for Redis and the one for the front end, because one of them denies egress to Redis and the other denies ingress from the front end. So now, after deleting the network policies, I try to connect to redis-cart again, and, very surprisingly, because there is no network policy, I can connect, which is obviously not good. And I forgot how to use telnet properly, sorry. Now for the third thing: vulnerability scanning. I've installed Kubescape in the cluster and deployed the application again, and Kubescape is scanning the image vulnerabilities in the background. After it has continuously scanned the vulnerabilities and has all the information, let me stop for just a minute to explain: it generated two vulnerability objects, both of them in SPDX format.
One of them is for the image itself; the second vulnerability list is based on the actual application behavior. So we can compare them to each other, and you'll see that in the case of the workload I chose, which is called recommendation service, I extract the number of vulnerabilities through kubectl to show you how many critical vulnerabilities this application has: seven. It also has 74 high and 58 medium vulnerabilities, which is kind of a lot. But now let's check: when we overlay the runtime information from eBPF that I told you about, how many actual vulnerabilities do we get after the filtering? You will see that it's much less, which is really the point of this demo. So I look for how many critical vulnerabilities I have: only three out of the seven are actually in memory, 19 out of the 74 high, and 17 out of the 58 medium. So we have a huge vulnerability reduction here, and I can say that these are very well-maintained images; we've seen images where we filtered out around 90%. So that was the demo. I hope you could follow, because it was a lot of information in a short time. OK, so this means our DevOps people are really, really happy, and not just DevOps but also the security folks and the developers, because they've saved a lot of time and now have time for cakes; I don't know what the drink is, but it looks good. Now let's talk about what's wrong with this, because it's easy for me to come up here and promote an idea, but let's discuss the downsides. What we noticed when we installed this in our systems and this process is that we had to allocate much more CPU for these eBPF-based tools during the testing phase, and it costs some money. I cannot deny it; it's not too much, but you have to be aware of it.
The second thing is that although these are great CNCF tools and I can say a lot of good things about them, they are not yet broadly adopted, so there can be some usability issues and so on; if you go down this road, you have to be a little patient. Also, a few of them are managed through CRDs, and as we discussed with Katelyn here, in an imperative process like a CI/CD pipeline it's sometimes really hard to follow what's happening inside the CRs. And obviously, when some test paths are missing from your CI/CD tests, the resulting policy objects might break your application in the end: if the real application uses different paths, you may end up with broken policies or broken applications. Likewise, if your system is incomplete while you're capturing this information, your policies will be broken. Think about the smallest thing: in our dev environment we didn't have our Prometheus, so no one was calling the exporter APIs. Therefore, when we promoted to production and started to use the exporter API, it wasn't answering, because the network policy locked it out. So you have to take these things into account, and run these information captures more on the CD side than the CI side. But what's good? The positive side is that you improve your security posture by far: adding network micro-segmentation, adding seccomp profiles, and limiting the number of vulnerabilities you have to track and update raises your security game by an order of magnitude. And I know it's cheating to say "set it up once and forget it"; that's not always true, and you always have to do some maintenance. But in general, it's like any other CI/CD process we have: you have to maintain it sometimes, but you earn a lot of value from it.
And you really do end up in a place where the least-privilege principle is applied for sure, because you only enable the things that are actually in use. So I've created this demo repo. You can scan the QR code; I'll give you a minute to jump in. Actually, I have to say it was working on my machine. But if there is a problem, I would really like to maintain this in a way that promotes the idea, so if you run into problems, just open issues and we'll sort it out. I'd be really happy to turn it into an educational repo for people who want to implement advanced network policies and seccomp policies inside their clusters. Also, about the Kubescape feature I added to this presentation: it's not released yet; it's going to be released next week. So if you're interested in the vulnerability part, you'll have to wait until next week, and then you'll be able to use it as well. That's about it. Rate my talk, send me messages on the CNCF Slack or on Twitter, or by pigeon or something; you're more than welcome. And if anyone has any questions, I'm open for questions, because I see I was able to make it on time, which is great.