Hello, everyone, and thanks for joining us today to talk about policy as code and how you can use it to manage risk in your Kubernetes environments. Before we begin, let me introduce myself. My name is Cesar Rodriguez, and I'm the head of developer advocacy at Accurics. I've been working in the cloud security space for most of my career, helping secure public and private cloud environments in the military and financial industries. I want to begin by talking about some of the challenges I ran into before using policy as code. In one of my first jobs, I worked on a project to help write and revamp the security policies for our organization. This meant taking security policies and standards like those provided by NIST, SANS, and OWASP, determining which of them were applicable to the organization I was working for, and deciding how we should comply with those standards. Once the policies were written, we performed some of the traditional security activities, which had some challenges due to the approach we were taking. One of the challenges was that we assessed our environment for compliance with the policies through point-in-time assessments. These were things like verifying the security of our systems at deployment gates, or periodic security reviews that don't really take into consideration the evolving nature of the systems and applications you're trying to assess. The problem is that when you only check for security issues at certain points in time, there are going to be gaps in your understanding of the security posture of your environment. And what most organizations do is just focus on acquiring and running tools that check for security vulnerabilities at runtime, as opposed to trying to prevent issues as soon as the code or configuration for the system is written.
Another challenge in the traditional security world is that we have security appliances like firewalls, IDS, IPS, WAFs, and proxies securing our environment, but only a handful of people on the security team have visibility into how these are configured. Most developers and operators don't really know how these affect their applications and systems. This usually leads to unnecessary calls to the security team every time there's an issue that could possibly be related to one of the security systems, because no one really understands what's going on and how these systems work. Also, since these were primarily manually configured, no one knows whether these devices have been configured correctly to comply with the policies of the organization, or whether there are bugs that might expose the environment in an unintended way. The final challenge that I want to highlight with traditional security is that everything was manual, starting with the way you engage with the security team. For example, if you need a firewall rule updated, you open a ticket with security, wait for them to look at it, maybe schedule a meeting to explain what you're asking for, and they'll probably say no the first time. You go back and forth, and once everyone is on the same page and the change is scheduled, someone finally makes the change manually and we all hope it goes well. This approach is not only error prone, it's inefficient and sometimes very frustrating. Another example where things are done manually is when you want to verify whether the organization is in compliance with a particular policy. For example, if you have a security policy stating that all of your data must be encrypted at rest, then to enforce compliance with the policy you need to check that all of your storage systems are correctly configured, and that's usually done manually by going into each of those systems to verify its configuration.
This can be time consuming and doesn't guarantee that you'll be in compliance 100% of the time, because things can be missed when you're doing it manually. A lot of these challenges can be fixed once we move to cloud native architectures and systems and start using declarative languages and infrastructure as code to provision and manage our systems, whether they're security related or business centric. So what is policy as code? Policy as code means that you're codifying the policies that are important to your organization into your development process to allow consistent enforcement of those policies through automation. This allows you to move your security posture from reactive, where you're only checking things at runtime, to proactive, where you detect and remediate issues as soon as they're codified and not at runtime when they've already been made in the actual environment. Going back to our previous example, if you want to ensure that encryption at rest is applied to your environment, then instead of manually checking all of your storage systems, with policy as code you write a rule or a policy that can be used to programmatically check that the code that's going to provision and manage your storage system has encryption properly configured. So the first step toward implementing policy as code is to start defining your infrastructure and its configuration as code. Let me give you an example from my experience of how I've used declarative APIs and infrastructure as code to improve the security posture of an organization I worked for. A few years ago, I was working on a big cloud migration project for a financial firm. To kick off the cloud migration, we were tasked with developing and deploying a significant customer-facing application in three months as part of an MVP. We formed a cross-functional team with members from the business, developers, infrastructure engineers, operations, and obviously security.
And the idea was to build the initial core cloud infrastructure for the firm at the same time that the development team was working on the application. For some of the engineers this was their first cloud project, and they wanted to learn everything using the console and doing things manually. To them, this seemed like a good way to start learning, but once they tried to replicate what they had done manually across the different environments and into production, things were painful and delayed. I was tasked with security, and I had prior experience with infrastructure as code. I also knew that we had a short timeline to start delivering. So to get things up and running, I decided to use infrastructure as code from the beginning. Things like how we were going to manage security groups, network security controls, inbound firewalls, and identity and access management were all done using infrastructure as code. And this was delivered in repositories that everyone on the team had access to and visibility into, with the ability to create pull requests. I even had time to create some core opinionated modules with security controls built in, which the development team then used to provision the infrastructure and deploy their application. One of the great things about doing infrastructure as code was that the security controls of our environment were now codified and easy to govern in a way that was transparent to the development teams. For example, if anyone on the development team needed an update to a security group, it was just a matter of opening a pull request to the repository and having someone review and merge it.
Having this type of GitOps workflow also enabled scalability: once we started migrating and modernizing the rest of the application portfolio, the same repositories we set up for the initial project were able to support hundreds of applications and multiple environments and regions. In that original MVP, it was also important to iterate quickly on security policies to deliver our business objectives while keeping our environment secure. With infrastructure as code, we unlocked the ability to codify the security policies and programmatically apply them. Another approach is to have centralized modules and reference implementations with security opinions embedded in them, to ensure that your environment is secure by default. So in contrast to how we managed our traditional security infrastructure, with infrastructure as code we were able to expose our security controls for other technology teams to learn from and possibly contribute to. I've had many instances where development teams found misconfigurations in the security infrastructure while they were trying to learn how the environment worked, and they offered pull requests to increase the security and resiliency of the security infrastructure that everyone relies on. Shortly after working on the cloud migration project I talked about, I took a month off for paternity leave, and at that point developers were heavily relying on me for security assessments of our cloud environment. So during some sleepless nights with the baby, I was thinking about what would be a good way to analyze our infrastructure templates, similar to the way we did static code analysis of programming languages like Java, JavaScript, and Python, so that there wouldn't be so much reliance on the security experts to do this. I looked online, and at the time there weren't that many tools to do this. So I decided to build my own and release it as open source to help anyone looking to secure their cloud environments.
And that's how Terrascan was born. Terrascan is an open source tool that can help you define your security policies as code and enforce them on your infrastructure as code. The tool currently supports scanning of Terraform, Kubernetes JSON and YAML configuration files, Helm v3, and Kustomize v3. Here are some of Terrascan's features. It's packaged as an executable, so you can easily integrate it into your workflow by running it locally on your desktop or as part of your CI/CD pipeline. It can also be executed in server mode, which allows you to centrally govern IaC scanning by having a central hub with rules to help meet your organization's standards. The tool leverages the Rego language from the Open Policy Agent project as part of the policy engine, and this helps us standardize the way we implement policy as code. We have over 500 policies included as part of Terrascan, and we're frequently updating it with more. We also wrote the tool so that it's easy to add support for additional IaC tools. The idea here is that policies are decoupled from the IaC language, so that a policy written for a particular technology is applicable regardless of the IaC tool. For example, the Kubernetes policies are the same whether Terrascan is evaluating your Kubernetes YAML files directly, or you're using Kustomize, or you're using Terraform to manage your Kubernetes environment. Here's how we visualize integrating Terrascan into your workflow. You can run the tool natively in your local development environment so you can catch any security issues as early as possible. You can also have a pipeline that verifies compliance with your policies before pushing any changes into your cloud environment, and you can run Terrascan in server mode, where you feed it infrastructure configuration files before making any changes in your environment.
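As a rough illustration of the pipeline step just described, the sketch below blocks a change when a scan reports findings at or above a chosen severity. The finding format, the severity ordering, and the `gate` function are all hypothetical; this is not Terrascan's actual output schema, and a real pipeline would feed in whatever the scanner emits.

```python
from typing import Dict, List

# Severity ordering used by this sketch (an assumption for illustration).
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def gate(findings: List[Dict[str, str]], fail_on: str = "MEDIUM") -> int:
    """Return an exit code: 0 lets the change through, 1 blocks it."""
    threshold = SEVERITY_RANK[fail_on]
    # Keep only findings whose severity meets or exceeds the threshold;
    # unknown or missing severities rank as 0 and never block.
    blocking = [
        f for f in findings
        if SEVERITY_RANK.get(f.get("severity", "").upper(), 0) >= threshold
    ]
    for f in blocking:
        print(f"BLOCKED: {f.get('rule', 'unknown rule')} ({f['severity']})")
    return 1 if blocking else 0
```

A pipeline step could pass the returned value to `sys.exit` so that a non-zero result stops the deployment before anything reaches the cloud environment.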
If you're interested in configuring Terrascan for your particular pipeline technology, we have examples in our documentation. It's also integrated into Super-Linter if you're using GitHub Actions, for example, and we have more examples for different types of technologies. So now let's talk about how to use Terrascan to find Kubernetes vulnerabilities. Today I'm going to talk about two recent vulnerabilities for which we've published blog posts on our website, discussing in detail how to detect these vulnerabilities in your systems and how to use policy as code to detect and prevent these types of vulnerabilities. The first vulnerability I would like to discuss today is CVE-2020-8554. This one was announced at the end of last year, December of 2020, and it was rated as a medium severity vulnerability. The issue is a man-in-the-middle vulnerability: if a potential attacker has access to create or modify services and pods in a multi-tenant cluster, the attacker may be able to intercept traffic from pods or other nodes in the system. The initial reaction to a vulnerability like this might be that it's not that important, as there's a limited number of people or systems with access to create pods in your environment. But even if you've accepted insider threats as part of your threat model, if any of your internal systems with access to create pods or services, things like your pipelines, are compromised through other means, then you can be exposed to this type of vulnerability. So now let's take a look at how to use policy as code to defend against this. If an attacker has access to create a service such as the one depicted here, the attacker will be able to redirect any cluster traffic destined for TCP port 80 on the IP addresses 1.1.1.1 or 8.8.8.8 to my application on target port 8080. For this example I'm using TCP, but you could also use UDP and redirect DNS traffic.
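The slide is not reproduced in this transcript, so here is a rough sketch of the kind of service just described, together with a check that flags it. It is written in Python rather than Rego, the manifest values are illustrative rather than the slide's exact contents, and `find_external_ip_violations` is a hypothetical helper, not part of Terrascan.

```python
from typing import List

# Illustrative manifest for the kind of service described in the talk
# (names and values are assumptions, not the slide's exact contents).
malicious_service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "my-app"},
    "spec": {
        "type": "ClusterIP",
        "selector": {"app": "my-app"},
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 8080}],
        "externalIPs": ["1.1.1.1", "8.8.8.8"],
    },
}


def find_external_ip_violations(manifest: dict) -> List[str]:
    """Return the externalIPs set on a ClusterIP Service, or [] if none."""
    if manifest.get("kind") != "Service":
        return []
    spec = manifest.get("spec") or {}
    # Kubernetes treats a Service with no explicit type as ClusterIP.
    if spec.get("type", "ClusterIP") != "ClusterIP":
        return []
    return list(spec.get("externalIPs") or [])


if __name__ == "__main__":
    print(find_external_ip_violations(malicious_service))
```

Any non-empty result means the manifest should be rejected or at least reviewed before it is applied to the cluster.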
The other attack vector is if the attacker has access to patch the status of a LoadBalancer service and set the ingress IP to an IP address of the attacker's choice. Here's an example where we're setting it to 8.8.8.8. One way to prevent this issue is by making sure that there are no ClusterIP services with external IPs specified. The use of external IPs is discouraged in general, so having this type of policy would catch any instances of this potential security weakness without causing too much noise. On the right-hand side, we have an example Rego policy to help prevent this. In this policy, given a service, we do a type check to verify that it's a ClusterIP service type. Then we check whether the externalIPs attribute is defined, and we alert on that. So now let me show you the output you would see if you use Terrascan to detect this type of vulnerability. Here I have two YAML files for the vulnerabilities that we're discussing today. The one that we're currently discussing is 8554, so let me cat that. As you can see, the issue here is when you have a ClusterIP service type and you're using external IPs. If we use Terrascan to scan this, here's the output that we get. In the violations, it's saying that we're vulnerable to CVE-2020-8554. It's giving us the file, and it's saying that it's a medium severity vulnerability. The next vulnerability I would like to talk about is CVE-2020-8555, which is also rated as a medium severity vulnerability. The issue here is a potential server-side request forgery, or SSRF, on vulnerable versions of kube-controller-manager. You do have to be authorized to create pods to exploit this vulnerability, and you have to use one of the affected volume types, like GlusterFS. If you're not familiar with server-side request forgery, it's one of my favorite vulnerabilities, as it allows you to make requests from the vulnerable server into the network.
And it's a common way to exfiltrate things like credentials from metadata endpoints in cloud environments. At the bottom of this slide, I have the link to the blog post where you can learn more about this vulnerability in detail, including the affected versions. So now let's take a look at how to use policy as code to defend against this. Here's an example YAML file that can expose your environment if you're using one of the affected versions of kube-controller-manager. From the attacker's perspective, there are three steps. Number one, create a pod in your cluster. Number two, mount the GlusterFS volume, since that volume type is subject to the vulnerability. Number three, once the vulnerable volume is mounted, the kube-controller-manager can be made to send GET or POST requests from the master's host network to any URL that the attacker chooses, and this URL is not validated in any form. The best way to prevent this particular vulnerability is to upgrade to the latest version of Kubernetes, where this has been patched. If you can't upgrade, the next best solution is to use policy as code to detect and restrict usage of the affected volume types, or to restrict StorageClass write permissions through RBAC. In this Rego policy on the right, we're verifying the volumes for the given pod. So this is the pod, and we check the volume type, making sure that it's not one of the vulnerable volume types like GlusterFS, StorageOS, or ScaleIO. Let me show you how this looks when you're using Terrascan. Let me go back. These are the two YAML files that I have here, so let me cat the CVE-2020-8555 YAML file. This is where we're using the GlusterFS volume type, which is one of the affected volume types. Now if we use Terrascan to scan this, this one has a few issues. There's a low severity issue, which is that we're using the default namespace.
It's also showing us a couple of medium issues here. One is "minimize the admission of containers wishing to share the host process ID namespace", and another is that the containers do not have resource limits defined, which is more of an operational issue. And then it's also showing me that we're vulnerable to CVE-2020-8555. This is only true if you're using one of the affected versions of kube-controller-manager; this is the list of affected versions, and it's a medium severity vulnerability. I wanted to leave you today with the four security risk categories that Terrascan helps identify and that you should be mindful of when thinking about security in your environment. The first category is data protection. If you have any sensitive data or information traversing your systems, you want to make sure that the data is encrypted in transit, and encrypted at rest if you're persisting it in your system. The next one is access management. You want to make sure that you're only allowing access into your system with the minimal privileges possible. The next one is network security. This is where we want to make sure that we don't have any exposed ports or services that should not be exposed, and to prevent anything from being accidentally publicly exposed. The last one is visibility. You want to make sure that you have the proper information captured as part of monitoring and logs, so that in case of a security incident you have the ability to reconstruct how the incident happened and have a trail of what actually happened in your system. Outside of security, but related, Terrascan also has a category for operational efficiency policies that we think are important to consider from a best practices perspective. So that's the end of the presentation. I want to thank you for watching today.
And if you want to learn more about Terrascan or cloud vulnerabilities and policies, you can visit our blog at accurics.com/blog, where we're constantly publishing the latest findings from our research team.