Perfect. Thank you very much. Hi, everyone, and welcome to my session on Kubernetes risk assessment. I know you're coming in right after lunch, so I definitely appreciate it. Just a quick poll before we start. How many of you are assessing risk in Kubernetes? Great. How many of you have heard about the CIS benchmark? Nice. Okay, perfect. So we're in the right place. I'm going to talk a little bit about it, but that's not the main scope of the talk.
Great. So my name is Ariel. Today I'm part of Cisco, responsible for cloud application security in a business unit called ET&I, which is a new business unit for emerging technologies. I was at Portshift, which was acquired by Cisco. I was at Aqua Security, some of you know this company, and even at Check Point before that. I'm working on some open source contributions, whether it's KubeClarity, a nice open source tool for creating SBOMs and vulnerability scanning, a few initiatives in the CNCF, and also MITRE ATT&CK. Anybody familiar with MITRE ATT&CK? Great. Okay. So we're going to talk a lot about it.
So, this risk assessment, why do we need it? Where does it come into play in our life? And how many times a year are you doing it? Just for those of you who are involved with it, how many of you are doing it periodically, those risk assessments for clusters? CIS benchmark compliance, anything? Okay. I assume you don't do it for fun. You're doing it because you're trying to assess. Now, the challenge that, at least for us, forces us to go deeper into this risk assessment is, ideally, understanding where we are, what risks we have, and what we think we need to mitigate. The primary suspects are, of course, the application, the pods, the worker nodes, because that's where the attack usually happens.
But the master node and the control objects are not less important, maybe even more, because a small change in the master node, or even a small change in a worker, can create a much bigger impact. So small changes on any of those elements can create a big impact. And this is why you always want to monitor them more. Even more than that, Kubernetes clusters have dynamic parts. We keep changing. We're changing objects. We're changing roles. We're adding policies, or we don't have policies. With all those changes, you make a small change in role-based access control without understanding who has this role binding, and you open yourself up to many potential attacks. The same thing applies to policies. You enable something, you change something, and then all of a sudden your whole cluster is exposed. And there are objects like ingress and egress that make it even harder.
Now, I love this picture because I always find myself in a constant chase. Usually we have compliance, usually done once a year. A lot of times, we're required to do threat modeling, because we want to come to developers and tell them how they need to design and plan their services. So we want to better understand what the threat is. And most commonly, we want to avoid abuse of the cluster. A lot of crypto mining. You probably read a lot about it. It happens. And we want to make sure that we cover it and we're not exposed.
So I asked you about the CIS benchmark. The CIS benchmark is definitely a great start. For those who are familiar and working with it, it's very impressive. It's a really comprehensive set of security checks focusing on securing the configuration of Kubernetes elements. I think the CIS benchmark was the first that was there. It was the first framework that people started using, in the very beginning of Kubernetes. And the good thing is that they keep updating it as things move along, and they keep changing it for every version.
There is a new CIS benchmark. And indeed, it's a very comprehensive check. We're going to see it in a minute. It really touches everything in Kubernetes. Everything on the master node, and a lot of comprehensive checks on the worker nodes and admission controllers. And warning, these are tough pictures. But it's a really, really detailed list of many, many tests that you need to check. Now, again, as I said, they keep updating it, which is very good, and they keep doing a great job at that. Again, I think most of the attacks are usually because of those misconfigurations. They're also going into network policy, ensuring that you are setting up some policies, covering even role-based access control and secrets, making sure that any sensitive information is using Secrets. In the recent editions, they even added material about Pod Security Standards, which is a fairly new recommendation, and about isolation using seccomp, which is very new. And again, they're doing a great job in making sure you configure a cluster carefully. So this is really great.
Now, I think this is the primary topic of my talk: the CIS benchmark is a really great start, but it's definitely not where you need to stop. It is definitely not enough. Why is it not enough? Because the benchmark is looking at security misconfigurations. Now, security misconfigurations are indeed the primary cause of cyber attacks. There's no doubt about that. But security misconfigurations are not the only risk factor in clusters, right? When you look at all the attacks that are published or reported, many of them are coming from different vectors. A lot of them are coming from backdoors in open source images; you read a lot about new images which were discovered and new vectors which were added into them. A lot of them are coming from the compromise of insecure, but very popular, tools.
For those who remember, last year there was a big campaign on Kubeflow. People use Kubeflow for machine learning in Kubernetes clusters. And there was a campaign in Azure that really exploited some misconfiguration. Not even deliberate, but some mistakes that were made in the configuration of Kubeflow. Network manipulation: there was a recent CVE which showed how you can leverage ingress using NGINX in order to expose the secrets of the cluster. So the idea is that misconfigurations are important. It's definitely a good place to start, but definitely not where you need to end if you want to make sure that you are secure. You need to take into account the actual, I would say, dynamic content, the things that are being abused: images, networking, things which are usually exploited when someone wants to launch an attack.
Now, you may ask yourself, okay, we understand that misconfigurations are a good start. We understand where we can start. But the question is, where do we want to go? How do we make it even better? And here, I think there are three factors that we need to consider when we plan our ideal risk assessment. I think the first one is the attack context. One of the challenges that I personally find with the CIS benchmark is the fact that not all misconfigurations, or not all vulnerabilities, are born equal, right? Some are more important, some are less important, okay? Some have a bigger chance of success or a bigger impact, some have a smaller impact. And we need to factor in what the real risk is. Now, today, when you look at the CIS benchmark, usually you run the tests and you get to see how compliant you are, where you are covered, where you're not, how much you still need to do. There are no details on what the impact of each misconfiguration is.
Is this some untouched element that you never reach, where it's good if it's configured properly, but it's not such a big thing if it's not? Or is it something that opens your cluster for anyone to get in, like a dashboard or something like that, which is very powerful and has no login requirement? You really need to add some more risk context into those tests. This is, I think, an important aspect to consider.
Another thing is the security context. Sometimes you can find that by changing one setting or changing one security layer, you can eliminate many risks. So there is also a balance between how much effort you need, or what is required, versus the gain that you get. Again, one of the challenges that I've seen in the CIS benchmark is that it is just a plain list of tests. You can see where you're complying and where you're not, and what you need to do to comply. But you don't see that if you make one change, maybe it eliminates the need for all the other changes. So there needs to be a more holistic view, something that gives you the full context of the security, so it will be much easier for you to plan and to configure your security tools in the right way.
And the last item, which I think is critical, is the remediation context. A good risk assessment tool should lead to a good remediation plan, right? And if you can do everything with automation, that's the key. So on one side, run an automated process to assess all the risks, and then run an automated process to remediate those risks. I think those, in my opinion, of course, would be great tools to take our risk assessment one step further.
Okay. So let's try to examine what other options are on the table if we indeed want to go one step further in our assessment. So this is MITRE ATT&CK. I don't know if people are familiar with it.
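The attack-context point, that a plain pass/fail list hides the difference between a harmless gap and an open dashboard, can be illustrated with a small sketch. It is not part of the CIS benchmark or of any tool mentioned in the talk. The check names and the 1-to-5 impact scale are hypothetical, chosen only to show how an impact weight changes the picture.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    check_id: str   # hypothetical label, not a real CIS control number
    passed: bool
    impact: int     # assumed scale: 1 (minor) to 5 (cluster-wide exposure)


def pass_rate(findings):
    """Plain compliance view: fraction of checks that passed."""
    return sum(f.passed for f in findings) / len(findings)


def weighted_risk(findings):
    """Risk view: share of the total impact weight that is still failing."""
    total = sum(f.impact for f in findings)
    failing = sum(f.impact for f in findings if not f.passed)
    return failing / total


def worst_first(findings):
    """Failed checks ordered by assumed impact, highest first."""
    failed = [f for f in findings if not f.passed]
    return sorted(failed, key=lambda f: f.impact, reverse=True)


findings = [
    Finding("unused-component-setting", passed=False, impact=1),
    Finding("dashboard-without-login", passed=False, impact=5),
    Finding("audit-logging-enabled", passed=True, impact=3),
    Finding("rbac-least-privilege", passed=True, impact=4),
]

print(f"pass rate:     {pass_rate(findings):.2f}")
print(f"weighted risk: {weighted_risk(findings):.2f}")
print("fix first:    ", worst_first(findings)[0].check_id)
```

Both failed checks count the same in the pass rate, while the weighted view puts the unauthenticated dashboard ahead of the unused setting. The weights themselves would have to come from the assessor; nothing here derives them.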
MITRE ATT&CK, or the MITRE organization to be more precise, publishes an attack matrix for different environments. When they see that an environment is becoming popular for attacks, they publish an attack matrix, which is actually a very detailed list of all the tactics and the techniques that were used in the wild. So they are not doing just theoretical tests or theoretical explanations of what can go wrong, but what is really being used. Sometimes there are great theoretical risks, but the exploitation is so hard that it's almost impossible to use them. So the beauty of these matrices is that they're taking a list and doing a lot of work to validate that those techniques were actually in use in the specific environment that the matrix is trying to cover.
And what I like very much about those matrices is that they're organized according to the cyber attack kill chain. So you really get the security context. You see all the stages of the attack. You can see where the exposure is, or where the items sit along the attack. And then you can understand for yourself where you are and what you maybe need to do better. Or it could be that there is a risk, but it's far enough down the chain that if you cover everything before it, maybe it's not relevant for you. So I think that this cyber context and cyber kill chain is very important.
So happily, MITRE ATT&CK published a matrix on containers. It came out in 2021, after almost a year of research and validation. And it really documented all the real-life attacks on containerized environments. It was a very good cross-company collaboration, and I'm happy to say that we at Cisco contributed a lot to this work.
And the beauty of it is that, until then, when you looked for research on container attacks and things like that, you could find a lot of theoretical things. At every KubeCon, you can see many talks about how you can break out and what you can do. But in real life, most of them didn't happen. Maybe because they're too complex, or because, I would say, attackers are lazy. Probably you don't want to assume that. But one of the things that MITRE put as a starting point, which was very, very important, is that we really want to know real things, real things that happened, real attacks, not just theoretical ones. And I think this was, for me at least, a very interesting initiative, looking at things that are real and giving you better validation of what you think.
In the end, it reached 40-plus attack techniques, each of which is really documented. For every technique, there is detailed information on what the method is, how it was actually used, when and where it was discovered, how you can mitigate it, and how you can detect it, because sometimes you want to detect before you mitigate. You have very good references if you really want to read and get more information about it. So overall, I think it was a very interesting initiative.
We can look at it in this table. As I said before, for all those tactics that are listed here, every cell in this matrix contains an attack that happened in real life. And the beauty of it is that it's really set out according to the attack kill chain. So you can see on the left, this is the initial access: how do you access a cluster? How do you access a container?
So you can see, for example, "exploit public-facing application": there are references to all those dashboards which were exposed, whether the Kubernetes or the Kubeflow dashboards. External remote services: how can they be abused in order to get access to your clusters? And then there are the techniques for execution: how do you execute your code on those clusters? Persistence: how do you maintain persistence? Privilege escalation. And again, a lot of detailed techniques, good information, a lot of things which you can do. At least if you want to assess the risk of your cluster, those are items that you definitely want to check to make sure that you're not exposed.
And as I said, you can see the context, what stage of the attack it's happening in. For example, if you are covered and there is no way to do initial access or execution or persistence, and you really covered your privilege escalation in a way that no one can break out, then if maybe you have something wrong in discovery, you can understand that your risk is not so big. Because before an attacker reaches the discovery phase, they need to pass so many things which you covered and protected. So you understand that it might be an issue, but not a big one. On the other side, if you see that you have holes here, in public-facing applications, in what is documented under initial access, and that can lead to execution, then maybe there's a bigger impact, because all the rest will follow. So I think overall this context is really important when you want to assess what the actual risk of your cluster is.
Let's take a deep dive and examine some examples. One example is in execution. You can see, for example, malicious image. This is just one of the cells.
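The kill-chain reasoning above, where a gap late in the chain matters less when the earlier stages are covered, can be sketched as a simple ordering rule. This is an illustration only, not something MITRE ATT&CK provides. The stage list is a simplified subset of the stages named in the talk, and counting "covered earlier stages" as barriers is an assumption made for the sketch.

```python
# Simplified, assumed ordering of the stages mentioned in the talk.
STAGES = [
    "initial-access",
    "execution",
    "persistence",
    "privilege-escalation",
    "discovery",
]


def barriers_before(stage, exposed_stages):
    """Count earlier stages with no known exposure.

    Each such stage is treated as a barrier the attacker would still
    have to get past before the exposure at `stage` becomes usable.
    """
    idx = STAGES.index(stage)
    return sum(1 for s in STAGES[:idx] if s not in exposed_stages)


def rank_exposures(exposed_stages):
    """Order exposed stages so that the most reachable ones come first."""
    return sorted(
        exposed_stages,
        key=lambda s: (barriers_before(s, exposed_stages), STAGES.index(s)),
    )


# Only discovery is exposed: four covered stages stand in front of it.
print(barriers_before("discovery", {"discovery"}))

# Holes in initial access and execution: nothing stands in front of them.
print(rank_exposures({"discovery", "initial-access", "execution"}))
```

In the second call, the initial-access and execution gaps rank ahead of discovery, matching the point that once the front of the chain is open, "all the rest will follow".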
So you can read all the information about how this can be abused. You can see who is using it, for example, TeamTNT. It's a known attack group that is using it, I think in this case mainly for crypto mining. You can get the details on how you mitigate it and how you detect it, and overall it's a good way for you to test how to avoid it in your clusters. You can even see who the contributors are, which is, again, very nice information. Same thing for container APIs: how can they be abused? And I chose those examples not just to show that it's really working, but mainly because those are items which you cannot cover with the CIS benchmark. The CIS benchmark is a very good start on how you configure things properly, but if you don't look inside the active components, you will not be able to discover them. Those are items which you're not going to find. And I think that this is an area where MITRE ATT&CK can really provide a good benefit to the user.
So what are the benefits of the ATT&CK matrix? Obviously, it enlarges the coverage to techniques which go beyond security misconfiguration. You get to see the real attack context, so users can understand the impact and the risk and plan their mitigation accordingly. And it provides a good remediation context: how to mitigate it, how to minimize the exposure, what you need to do.
But there are also some challenges. Okay, nothing is perfect. One of the challenges with MITRE ATT&CK is the lack of automation. The good thing about the CIS benchmark is that you can really automate the process. There are many tools, and you can get it from cloud services that do it for you. And here, a lot of things you need to do manually.
You need to manually go and check. And the lack of automation, of course, makes it much more error-prone, makes it a much longer process, and puts some burden on the user. Now, another challenge that I see in the ATT&CK matrix is that it's a reactive approach. It's only documenting and listing attacks which took place. But unfortunately, in cyberspace, and also in Kubernetes, we're talking about very creative attackers. It's good to know what happened, but you also need to predict what will happen, right? If you only look at what took place, it doesn't mean that you're going to prevent new things that can happen to your cluster. So I think that MITRE ATT&CK is providing a really in-depth approach, a very good look, but it is still not perfect.
Okay, so that leads us to the question of what's needed, right? If we really want to create our ideal framework, what do we need? Here, I have more, I would say, wishes than really concrete work that I can share. But we definitely want to add an element which is, I think, missing in all the different frameworks, which is the security context. Today, every item is looked at independently. There's no overall security context. The desired security context would allow users to check the impact of each mitigation. If I do one mitigation and I cover many, many risks, of course, the cost-benefit for me is much bigger than if I do a lot of work and just cover one risk. So assessing the protection context versus the risk can help me prioritize where I can get the bigger value, right? For example, let's assume that I see many risks coming from potential network exposure. Say I just create policies on my ingress or my egress, right?
Or if I protect my network more, create maybe more restrictive network policies on whatever is coming into my cluster, I can really eliminate many of those risks, right? So this is the idea: if I have a broader context, and a mitigation plan which takes this context into consideration, I can optimize my steps and get better value for my efforts.
Another thing that is really missing is the proactive approach, right? The most challenging barrier is assessing the unknown. It's always hard to predict the unknown, right? In security, it's one of the big things. You never know what you don't know. But this is an area where the CIS benchmark is doing a good job, because, yes, you don't know everything that can be done, but you can minimize the attack surface by covering all the different elements and adding this security layer. And as I said, automation is definitely key, especially in Kubernetes environments. And this is a call to the open source community here in the room to create automation tools for MITRE ATT&CK, something that can run the tests and can also help you automate the remediation.
So if I ask myself what an ideal Kubernetes risk assessment framework would look like, and again, I'm saying it's ideal because there's a lot of work around it: something that starts by leveraging the whole CIS benchmark, which prevents the common and maybe the unknown risks by setting up a good security posture for the environment. Something that leverages the MITRE ATT&CK matrix to check where in the cluster you have exposure to exploitation that took place in the wild. So you can really know that this is something that can really happen to you.
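The security-context idea, picking the mitigation that removes the most risks first, can be sketched as a greedy selection. This is only an illustration of the prioritization described above, not a feature of the CIS benchmark or MITRE ATT&CK. The mitigation names, the risk names, and the mapping between them are hypothetical, and a real plan would also need to weigh the effort of each step.

```python
def plan_mitigations(coverage, risks):
    """Greedily pick the mitigation that removes the most open risks.

    coverage: dict mapping a mitigation name to the set of risks it removes
              (hypothetical data supplied by the assessor).
    risks:    iterable of open risks.
    Returns the ordered plan and any risks no mitigation covers.
    """
    remaining = set(risks)
    plan = []
    while remaining:
        best = max(coverage, key=lambda m: len(coverage[m] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # nothing left that any mitigation can remove
        plan.append((best, sorted(gained)))
        remaining -= gained
    return plan, remaining


coverage = {
    "restrictive-network-policy": {
        "exposed-dashboard", "open-ingress", "unrestricted-egress",
    },
    "rbac-review": {"over-broad-role-binding"},
    "image-scanning": {"malicious-image"},
}
risks = [
    "exposed-dashboard", "open-ingress", "unrestricted-egress",
    "over-broad-role-binding", "malicious-image",
]

plan, uncovered = plan_mitigations(coverage, risks)
for step, (mitigation, removed) in enumerate(plan, start=1):
    print(step, mitigation, "->", ", ".join(removed))
```

With this made-up data, the network policy comes first because it removes three risks in one step, which is the cost-benefit argument from the talk in its simplest form.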
We did it at Cisco. This is something that we also contributed back to MITRE ATT&CK, and again, we hope to create some more automation around it. And of course, something which is fully automated, not just in the inspection, but also in the remediation plan. And hopefully it's not done once a year, when the security team comes and asks you to do compliance, but something which you do constantly to make sure that your clusters are always protected and not at risk.
So if I want to summarize everything: Kubernetes clusters really require constant risk assessment. It's not something you do once in a lifetime and then go do other things. It's constant work. You always need to check yourself, because good things happen, but unfortunately sometimes bad things happen too. The common framework is the CIS benchmark. I think it's very good and comprehensive work, a really comprehensive set of configuration checks that can be automated. A really, really great start. But again, it's not enough, because it's only focusing on misconfiguration. And this is where I presented MITRE ATT&CK, which is very good work that really looks into the details and provides you, as I said, with the risk context, where you are in the kill chain of the attacks, and it provides good detection and mitigation options, so you can always know how to detect or how to mitigate an attack. But again, even together, we still need more steps to get to the ideal framework, because we also need something that is not just showing me the risk and what's going to go wrong, but also showing me how to fix it.
And where I get the bigger value in fixing, and what steps I can take, for example, fixing an infrastructure layer and then automatically eliminating risks that are coming from the application, because the infrastructure itself is secure, or the cluster is protected, or there's no way to do exploitation because of a more basic layer of protection. That's it. Thank you very much. I hope it was interesting, and I hope you find it useful and you're going to protect your clusters. Thank you. If there are no more questions, I'll give you back a few minutes.