Hi everyone. Welcome to this session on multi-layer hardening in Kubernetes at KCD Chennai 2022. My name is Kavya Rangaraj, and I'm a software engineer on the research and development team at Prodapt Solutions. My work includes exploring and developing applications using next-gen technologies like machine learning, natural language processing, DevSecOps, blockchain and so on.
The agenda for today's session covers an introduction to multi-layer hardening, how to improve the security posture of your Kubernetes cluster through multi-layer hardening by following a three-step approach, and a reference pipeline that incorporates the tool chains and guidelines you need to follow at each layer, with a unified view. The key takeaways of this session, I believe, are an understanding of multi-layer hardening and the key components to secure at each layer, a unified view of your Kubernetes cluster in terms of security, and how to adopt DevSecOps for your application.
Let's start with the question: what is multi-layer hardening? Multi-layer hardening is the concept of securing the multiple layers of an entity under consideration. An entity could be an application, a database, an infrastructure or a Kubernetes cluster. You take an entity, identify all the layers in it, and harden, that is secure, each of them. But what is the purpose of hardening? The purpose is to improve the overall security posture of the entity and to make it difficult for an attacker to get control of it. So if you're hardening a Kubernetes cluster, the purpose is to improve the security of that cluster against various attacks. Hardening doesn't guarantee 100% security; it just makes it difficult for an attacker to get control of the cluster. If you take an application, there are multiple layers such as the presentation layer, the business logic layer and the database layer.
Likewise, a Kubernetes cluster has multiple layers. There is the cloud layer, where you run your Kubernetes cluster; it may be an on-prem data center or a cloud environment from any cloud service provider. Next comes the cluster layer, where you have all the Kubernetes control plane components and worker node components. Then comes the container layer, where you run your application workloads. Finally there is the code layer, which is the innermost layer and forms the primary layer for an attacker to initiate an attack.
Now look at the pattern of adoption versus the pattern of security over the years. 96% of organizations are using or evaluating Kubernetes, and around 69% use Kubernetes in production. The adoption over the years is tremendous. On the security side, 55% of organizations have had to delay their application deployment because of container and Kubernetes security concerns, and 93% of organizations experienced at least one security incident related to Kubernetes and containers in the last 12 months. This means that as your cluster evolves, as the features of the orchestrator evolve and as adoption increases, the attack surface also expands, which opens room for more security incidents and new types of security incidents. This strongly establishes the need for multi-layer hardening in a Kubernetes cluster.
So let's talk about how to implement multi-layer hardening. What process and what steps do you need to follow in order to implement multi-layer hardening in a Kubernetes cluster?
The first step is understanding the attack surface. Be it the cluster layer, the code layer, the container layer or the cloud layer, you need to understand the attack surface, the components present in it and how they interact with each other. In the second step, you make each and every layer comply with industry standards and best practices. If you ask me, most of the hardening work is in this second step: you need to bring your entity, be it a Kubernetes cluster or an application, into compliance with industry standards so that you are safer from the various attacks. And in the third step, over a period of time, you evolve this into a DevSecOps for Kubernetes approach in your SDLC. These are the three steps we can follow in order to achieve multi-layer hardening.
Now let's talk about the cluster layer in the Kubernetes architecture. Why is security the biggest concern in the cluster layer? Here you're seeing the Kubernetes architecture containing the control plane and the worker nodes. The control plane has components like the API server, etcd, the scheduler and the controller manager, whereas the worker nodes run your application workloads as pods and have components like the kubelet and kube-proxy.
The most general reason why security is the biggest concern in a Kubernetes cluster is the complex architecture. You have various components, and you need to maintain the security of all of them. The second reason is supply chain dependencies. Say you have an etcd component and you're storing secrets in etcd unencrypted. If they are exposed, it may lead to a serious attack. So even one small, less secure component or a misconfiguration can lead to a serious attack. This supply chain dependency is one of the reasons why security is the biggest concern when it comes to Kubernetes. And Kubernetes is a promising target.
Once an attacker is able to get hold of the Kubernetes cluster and control the underlying resources, the attacker may run a mining workload inside the cluster. This is cryptojacking: they run mining algorithms alongside your application workloads, and that may go unnoticed. They may even give the mining workload a familiar-looking name, like coredns followed by some ID, so you don't even notice that there is a new, unauthorized workload running in your cluster. At a high level, these are the reasons why security is the biggest concern when it comes to a Kubernetes cluster.
To add more granularity, let's look at the major causes of security incidents in a cluster. The first is misconfiguration. Kubernetes components are, by default, open and insecure, so misconfigurations need to be handled at the initial stage. If they are not handled, they can lead to serious impact.
The second is failed audits. The cluster needs to be audited frequently against various standards and benchmarks in order to avoid supply chain attacks. When you develop an application, you scan all the code, you scan all the containers and images, and you deploy it in the Kubernetes cluster, but your security doesn't stop there. For Kubernetes, it actually starts there. You have to conduct frequent audits to make sure your cluster is in a secure state.
The third is runtime security incidents. If you have container images for your applications with vulnerabilities, or images that lack some best practices at the code stage, the image stage or the container stage, this may lead to runtime security incidents.
And the last one is failing compliance. Most of the Kubernetes clusters that are attacked are somewhere failing compliance with the industry standards.
They are not adhering to the industry standards or compliance frameworks. So let's see how we can overcome all these causes one by one.
Starting with misconfiguration: the National Security Agency, or NSA, has provided recommendations to harden a Kubernetes cluster against attackers. The NSA has released guidelines grouped into categories like pod security, network separation, upgrading and application security practices, and audit logging. Under each category there are specific guidelines. Take network separation: there is a guideline that access to the control plane components must be secured through a firewall, and that you have to enable role-based access control. Take audit logging: you have to enable audit logging for the cluster and also logging for your application, so that you have full observability of your application. And under upgrading and application security practices, you have to follow secure coding practices, check for unused packages, and do frequent vulnerability testing and security scanning of your code. These are the guidelines released by the NSA to harden your Kubernetes cluster against attackers, and they address the misconfiguration part. By following them, you can overcome certain misconfigurations and make your cluster more secure.
The next framework is MITRE ATT&CK, the Adversarial Tactics, Techniques, and Common Knowledge framework. It's a knowledge framework containing the tactics and techniques an attacker can use to attack your Kubernetes cluster and get control of it. It lists and elaborates on these techniques, so you can evaluate your cluster against them and find out where your cluster can be compromised.
And you can improve the security posture by enforcing adequate detection and mitigation mechanisms. Here is the threat matrix of the same framework.
It starts with initial access, that is, how the attacker gains initial access to a cluster. It may be through cloud credentials that you have exposed, or because you haven't enabled adequate authentication and authorization mechanisms and the cluster is simply exposed to the public, or through an application vulnerability, which may form the initial stage for an attacker to get into your cluster. Once attackers gain initial access, they can run their own scripts and their own containers in your cluster. They need to be persistent, that is, they need to stay in the cluster even if the initial foothold is lost. They can escalate privileges: for example, if they get access to a container and that container is privileged, they can use it to get access to the nodes or to other containers.
An attacker also follows defense evasion mechanisms. The attacker does not want to show you that some unauthorized workloads are running, so the attacker may delete the container logs of the attacker's script, or name the pod or container similarly to the Kubernetes control plane components. The attack can progress until they have discovered what types of workloads you run, what resources you use, how your cluster handles the workloads and what secrets you have stored inside etcd. All of this may lead to data destruction, resource hijacking or denial of service. In all these cases, your cluster is compromised.
Having said that, these guidelines and frameworks are used to either evaluate or harden your cluster against serious attacks. But how do we implement them in practice?
There are various tools available in the industry to scan your cluster against the NSA guidelines and the MITRE framework. One such tool, which I would recommend, is Kubescape by ARMO, which scans your Kubernetes cluster against the NSA guidelines and the threat matrix. Here you can see the results of Kubescape scanning a cluster against the NSA guidelines, and here you can see the results of Kubescape scanning a cluster against the ATT&CK framework.
Once you have secured your cluster by following all the guidelines and configured it correctly, the next issue, which we saw as a major cause of security incidents, is failing compliance. Your cluster has to adhere to industry standards. The most widely followed industry standard is the CIS Benchmark from the Center for Internet Security. It lists the secure configurations and best practices to secure your cluster. You have to bring your cluster into compliance with this benchmark so that you reduce the vulnerabilities and the attack surface of your cluster. Again, there are many tools to do this, and one widely used tool to check a cluster for compliance against these benchmarks is Kubescape. Here you can see the result of Kubescape scanning a Kubernetes cluster, component by component.
Next comes deployment. You can see we are backtracking from cluster to deployment to container, all the way through to code. We are hardening from the cluster level down to the code level. So, starting with the deployment: deployment files are the files you apply to your Kubernetes cluster in order to create a resource like a service or any other type of resource.
Here is a classic example of a deployment file that you apply to a Kubernetes cluster so that it creates a service and a container starts running. Do you see any issue with this file? Probably not. If you take this file and simply apply it to your cluster, it will run successfully and create a service. But if you look at it from a security point of view, it's lacking several specifications. Take container resources: the amount of resources that this container, named app container, can access is not specified. Neither the memory limit nor the CPU limit is specified. There is no network policy explicitly defined, which means the container can communicate with all other containers in the nodes. There are no privilege checks, such as whether this container will run as the root user or how it manages access to its file system, and there is no security context specified. Small aspects like these can cause a bigger issue and compromise your cluster.
To avoid this, you have to do static analysis of deployment configuration files before you apply them to your Kubernetes cluster. Again, there are various tools available for static analysis. Two recommended tools are KubeLinter and kube-score, which can scan your deployment configuration files against best practices, including security aspects. You can then modify your deployment files to include all the security aspects, so that the files are secure and don't cause any harm to the cluster.
Now we are going to see why image and container security is very important. We have seen how to handle misconfiguration and failing compliance in the cluster, followed by the deployment files, which can also become a serious cause of a security incident.
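The kind of static check described above for deployment files can be pictured with a minimal Python sketch. This is not how KubeLinter or kube-score are implemented; it is only an illustration under stated assumptions. The function name `audit_containers` and the sample manifest are hypothetical, and the manifest is written as a plain dictionary so that no YAML library is needed. The fields it inspects (`resources.limits`, `securityContext.privileged`, `runAsNonRoot`, `readOnlyRootFilesystem`) are standard Kubernetes container fields.

```python
def audit_containers(manifest):
    """Return human-readable findings for a Deployment-like dictionary.

    Hypothetical helper for illustration; real linters cover many more checks.
    """
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Missing CPU or memory limits let a container consume node resources freely.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: no {resource} limit set")

        # Without a securityContext, none of the privilege settings are explicit.
        ctx = container.get("securityContext")
        if ctx is None:
            findings.append(f"{name}: no securityContext specified")
            continue
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs as a privileged container")
        if not ctx.get("runAsNonRoot", False):
            findings.append(f"{name}: may run as root (runAsNonRoot not set)")
        if not ctx.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root filesystem is writable")
    return findings


# Hypothetical manifest resembling the "works, but insecure" example from the talk.
deployment = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "app-container", "image": "example/app:1.0"},
    ]}}},
}

findings = audit_containers(deployment)
for finding in findings:
    print(finding)
```

Running a check like this before applying a file mirrors the idea of catching missing limits and security context early, rather than after the workload is already in the cluster.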
Now we'll see why container security is a big concern in the Kubernetes cluster layers. As we can see here, 56% of developers are currently not even scanning their containers, and 98% of organizations say they need container security, with runtime security topping their list. So there is definitely a need for container security. Container security, as well as the security of the environment in which your cluster is running, comes under runtime security incidents.
There is definitely a need to scan the running containers once you deploy your application. You also need to scan the host on which the container is running, and if you are using, say, Docker containers, you need to scan the Docker daemon. You also need to scan the container images that your containers refer to. Then there is the environment: your cluster is running in an environment where you have certain sensitive files, and you need to manage access to those files. You need to watch the network connections of your environment, inbound and outbound, and see whether there are any privileged containers or any unexpected or unusual network connections or activity. All these things need to be monitored continuously.
So the solution is continuous monitoring of the environment as well as the containers. There is a tool called Falco which does this continuous monitoring. It's a widely used open source tool, and it's now a CNCF incubating project. It uses system calls, your Kubernetes audit logs and your container logs as input. All of these are emitted as events, and these events form the input to the Falco rule engine. Falco has a powerful rule engine with default rules as well as an option to apply custom rules.
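The idea of events flowing into a rule engine can be sketched in a few lines of Python. This is only a toy illustration of the concept, not Falco's engine or rule syntax (Falco rules are written in YAML with their own condition language). The `Rule` class, the `evaluate` function and every event field used here (`source`, `process`, `container`, `verb`, `privileged`) are assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """A hypothetical rule: a name, a severity and a predicate over one event."""
    name: str
    severity: str
    condition: Callable[[Dict], bool]


# Two illustrative rules; the event fields they inspect are invented for this sketch.
RULES = [
    Rule(
        "Shell spawned in container",
        "WARNING",
        lambda e: e.get("source") == "syscall"
        and e.get("process") in {"sh", "bash"}
        and e.get("container") is not None,
    ),
    Rule(
        "Privileged pod created",
        "CRITICAL",
        lambda e: e.get("source") == "k8s_audit"
        and e.get("verb") == "create"
        and e.get("privileged") is True,
    ),
]


def evaluate(events: List[Dict], rules: List[Rule] = RULES) -> List[Tuple[str, str, Dict]]:
    """Check every event against every rule and collect (severity, rule name, event)."""
    alerts = []
    for event in events:
        for rule in rules:
            if rule.condition(event):
                alerts.append((rule.severity, rule.name, event))
    return alerts


# Hypothetical events from two sources: system calls and Kubernetes audit logs.
events = [
    {"source": "syscall", "process": "bash", "container": "app-container"},
    {"source": "k8s_audit", "verb": "create", "privileged": True, "user": "dev"},
    {"source": "syscall", "process": "nginx", "container": "web"},
]

alerts = evaluate(events)
for severity, rule_name, event in alerts:
    print(f"[{severity}] {rule_name}: {event}")
```

The third event matches no rule and therefore produces no alert, which is the behaviour you want from continuous monitoring: only rule violations surface.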
If any of these events violates those rules, Falco raises alerts of varying severity. As you can see, you can pass system calls, Kubernetes audit logs and even cloud activity logs to Falco, and if any rule is violated by an event, it raises an alert.
Say you have a pod or a container running in user space, and you want to scan all its logs. You can deploy a Falco engine in user space, and this engine will take those logs as input, check them against the rules and give you the alerts. For system calls, Falco uses drivers, such as a kernel module or eBPF, to get access to the system calls and scan them.
Here you can see the varying severity of the alerts in the Falco output. It can be an emergency alert, or critical, or notice, or informational, or debug, and so on. Falco alerts can also be published to various platforms like Teams, Slack and Discord, and Falco comes with a UI called Falcosidekick. Here you're seeing the Falcosidekick dashboard, which shows which rules have been violated in the given time range and the events that caused the rule violations, such as who the user behind the event is and in which namespace the event occurred. All this information is logged here, and you can visualize it using the Falcosidekick UI.
So runtime security can be used for two things: to scan the running containers, and to scan the environment in which your cluster is running. We have gone all the way from cluster security and how to secure against misconfigurations, to deployments, and then to containers. Now let's go to image security.
Your containers will obviously be referring to some images, and image security is itself a multi-layered defense. When you create an image, you need to know what's inside your application. When you reference other images in your image, you need to know where they came from, the authenticity of those base images and who modified them. You need to ensure the images adhere to industry standards like the CIS Benchmarks. You need to scan your registry: whenever you push images to a registry, an automatic scan needs to happen so that you don't push any vulnerable image to it. And you need to sign your images. Here are some of the tools you can use to ensure your images are secure: Docker Bench, which scans against the CIS Benchmarks, and Trivy and Clair, which can scan images for various vulnerabilities.
Now we come to the code part. Here there are three components we can secure. The first is the open source libraries or third-party packages you use in your code, which may also cause serious issues. For example, you are aware of the recent Log4j attack. Apache Log4j is an open source package, and a small flaw in that package led to a serious issue. So frequent software composition analysis needs to be done to ensure the packages you use are secure and up to date. The second is the application code, which needs to be tested against various best practices depending on the language. This is called static application security testing. The third is endpoints: you also need to scan the endpoints against various attacks like denial of service. This comes under dynamic application security testing.
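The software composition analysis step mentioned above can be pictured as comparing the packages an application pins against a list of known-vulnerable versions. The sketch below is a toy illustration only, assuming simple dotted numeric versions. The package names, the advisory data and the helper names `parse_version` and `outdated_dependencies` are all hypothetical; real SCA tools consult maintained vulnerability databases.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '2.4.1' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical advisory data: package name -> first version that contains the fix.
ADVISORIES = {
    "example-logging": "2.5.0",
    "example-http": "1.2.3",
}


def outdated_dependencies(pinned, advisories=ADVISORIES):
    """Return (package, pinned version, fixed version) for every vulnerable pin."""
    flagged = []
    for package, version in pinned.items():
        fixed_in = advisories.get(package)
        if fixed_in and parse_version(version) < parse_version(fixed_in):
            flagged.append((package, version, fixed_in))
    return flagged


# Hypothetical dependency pins for an application.
pinned = {
    "example-logging": "2.4.1",  # older than the fixed version, so it is flagged
    "example-http": "1.2.3",     # already at the fixed version
    "example-utils": "0.9.0",    # no advisory known for this package
}

flagged = outdated_dependencies(pinned)
for package, version, fixed_in in flagged:
    print(f"{package} {version} is vulnerable; upgrade to {fixed_in} or later")
```

Running such a comparison frequently, rather than once at release time, is what keeps a newly disclosed flaw in a third-party package from going unnoticed.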
So multi-layer hardening as a whole is not only about the cluster layer. It's about the cluster layer, the cloud layer, the code layer and the container layer. Here you can find some of the tools and recommendations depending on the language of your choice. For software composition analysis in Python, you use tools like Safety, and for code security analysis in Python, you use tools like SonarQube and Bandit, and so on. For every programming language there is a different set of tools and recommendations.
Now, to summarize. To harden the cluster layer, we follow the NSA hardening guide, the MITRE framework and the CIS Benchmarks, do static analysis of deployment files, and apply runtime security to the cluster environment using a tool called Falco. These are the tools we used in those stages.
Coming to container and image security, we need to scan the image layers and the image registry, sign the images and check that they adhere to the CIS Benchmarks. Continuous monitoring of running containers also needs to be done. These are some of the tools that can make sure all these things are correct.
Coming to the code part, we have to check the code against best practices through a technique called SAST, check all the open source packages and libraries through SCA, and scan the endpoints through DAST. These are some of the tools we use to scan the code.
So we have all these layers, and we have seen how to harden each and every one. But how do we apply this as a whole? How do we consolidate it into a single view? That is through DevSecOps.
Before going into the reference pipeline that incorporates all the tools we have seen, let's start with a short introduction to DevSecOps. The main aim of DevSecOps is to give you a secure SDLC, a secure software development lifecycle, so that your application is secured. Earlier, we used to do security tests just before deploying to production. But that is not sufficient in today's ecosystem. We have to shift our security practices to the left, that is, incorporate them into the earliest stages of our SDLC, by following an approach called shift left.
But is shift left alone enough? Not really. You also need to follow the shift right approach, which is continuous monitoring of your application, the environment and the cluster. Only with continuous monitoring can you know about runtime security incidents that are about to happen and remediate proactively. You can identify patterns in your logs and get complete observability of your platform. So you definitely need to follow both the shift left and the shift right approach.
Here is the reference pipeline incorporating all the layers we have seen and how to harden them. It starts with the code layer, where you run SCA and SAST tests, followed by the image layer and the container layer, then the deployment files that you apply to Kubernetes, and also the runtime logs. All these layers have various tools, and all these tools output different security reports or results. It's very tedious to look at each and every report and then come to a conclusion about the security posture of your cluster.
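The consolidation problem described above, many tools each producing their own report, can be sketched as merging findings into a single summary. This is only an illustration of the idea under stated assumptions: the report format (a list of dictionaries with `severity` and `title` keys), the tool names used as keys, the sample findings and the function name `consolidate` are all hypothetical and do not correspond to the output format of any real scanner.

```python
from collections import Counter, defaultdict


def consolidate(reports):
    """Merge per-tool findings into overall and per-tool severity counts.

    `reports` maps a tool name to a list of finding dictionaries, each with
    a 'severity' and a 'title' key (a made-up common format for this sketch).
    """
    by_severity = Counter()
    by_tool = defaultdict(Counter)
    for tool, tool_findings in reports.items():
        for finding in tool_findings:
            severity = finding["severity"].lower()
            by_severity[severity] += 1
            by_tool[tool][severity] += 1
    return by_severity, by_tool


# Hypothetical findings from three pipeline stages.
reports = {
    "image-scan": [
        {"severity": "High", "title": "Outdated base image package"},
        {"severity": "Low", "title": "Image runs with default user"},
    ],
    "manifest-lint": [
        {"severity": "Medium", "title": "No memory limit set"},
    ],
    "runtime-monitor": [
        {"severity": "High", "title": "Shell spawned in container"},
    ],
}

by_severity, by_tool = consolidate(reports)

# One combined view instead of three separate reports.
for severity in ("critical", "high", "medium", "low"):
    print(f"{severity:>8}: {by_severity[severity]}")
for tool, counts in by_tool.items():
    print(f"{tool}: {dict(counts)}")
```

A single summary like this is what lets you judge the overall security posture at a glance instead of reading every report in turn.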
To make that easier, there are various vulnerability management tools that can incorporate results from various tools and provide you with a single view. One such tool, which I recommend, is DefectDojo. It's an open source tool that can take in reports from various security tools and give you a dashboard showing the vulnerabilities found by each tool and the security posture of your cluster, your containers or any other layer.
So you can run the reference pipeline I have shown, and while the pipeline runs, the security results get published to DefectDojo in parallel. You can view all the findings under the individual tools; for a kube-bench scan, for example, you can see which compliance checks failed and which passed. You can also handle risk acceptance, incorporate SLA management, and send alerts to various platforms like Teams, Jira, mail and so on.
The aim is to bring out this reference pipeline, which follows multi-layer hardening and provides you with a unified view. It shows exactly the state of your Kubernetes cluster in terms of security, and you can evolve it over time by adopting a DevSecOps for Kubernetes approach.
Thank you. If you have any questions, please feel free to post them. Thank you.