Hello, everyone, and thanks for watching. Today we're going to talk about Helm, the package manager for Kubernetes, and the security considerations you need to have when incorporating Helm charts into your system. As engineers, we all strive to use the best technology to solve the problems we're facing, and cloud-native infrastructure and containerized workloads are fueling that innovation nowadays. I was reading a CNCF survey that said that 92% of the organizations surveyed are using containers in production, and 83% of those are using Kubernetes in production. So there is wide adoption of Kubernetes, and that use keeps increasing. At the same time, we're seeing an increase in cloud adoption. Through research, we're also seeing that over 30 billion records have been exposed due to cloud misconfigurations across 200 different breaches in the last two years. And these breaches have happened in all types of organizations, from small to large. The problem is that, while it's easy to get started with cloud-native technologies, it's also easy to make security mistakes that can put systems at risk. So having security in mind while using cloud-native technologies is important. Let's start discussing one of these technologies, Helm, and I think the best way to do that is to take a look at similar technologies. For example, if I'm trying to install a package on my Mac, I usually use Homebrew whenever I have the option. Here you see how I would install MySQL on my laptop. Homebrew takes care of figuring out all of the dependencies and any updates I need to perform to properly install MySQL, which makes it very easy to get a local database. Another way to get MySQL is to use Docker. The MySQL Docker image comes prepackaged with everything you need to get a MySQL database running as a Docker container, and here's an example of how you would run that.
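Since the slide itself isn't visible here, a minimal sketch of the two install paths just described (the container name and password are placeholders, not values from the talk):

```shell
# Homebrew resolves MySQL's dependencies and installs it locally
brew install mysql

# Or run MySQL as a disposable Docker container instead;
# MYSQL_ROOT_PASSWORD here is just a placeholder value
docker run --name some-mysql \
  -e MYSQL_ROOT_PASSWORD=my-secret-pw \
  -d mysql:8.0
```

Either way, the package manager or the image does the dependency work for you.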
If you already have Docker installed on your laptop, this is an easy way to get MySQL without having to pollute your local system. And it's a good option if you need an ephemeral database for local testing, where once you're done, you can just destroy your container. Similarly, Helm helps us install packages in our Kubernetes cluster. By executing helm install, Helm will download something called a Helm chart, which is basically a set of templated Kubernetes configuration files that have been prepackaged with all of the settings and dependencies necessary to run the target system in your Kubernetes cluster. You have the option to customize the templates with variables if needed. And the benefit is that you can get up and running easily without having to write all the configuration files yourself. So at first glance, using Helm sounds awesome, because now if you want to install any software that has been packaged into a Helm chart, you can just do a helm install without having to figure out yourself how to get the software up and running. And that's really great for experimenting, but you do have to be careful: charts written by third parties are built with the goal of getting the software up and running, and they may not necessarily have all of the security considerations for your production environment embedded by default. And that's what we'll discuss today: how to safely use Helm in your projects, making sure that your security and operational requirements are met while maintaining the ease of use of a package manager. Before we get started, let me talk a little bit about myself. My name is Cesar Rodriguez and I'm a developer advocate at Accurics. I've been working as a cloud security architect for most of my career in the military and financial industries, and I enjoy contributing to open source projects. So I like to break down the steps we need to follow to use Helm securely into three.
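As a sketch, the helm install flow just described is a couple of commands; "my-release" is an arbitrary release name, not one from the talk:

```shell
# Register the Bitnami chart repository, then install a chart
# from it; Helm pulls in the chart's dependencies for you
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-release bitnami/wordpress
```

The same two commands work for any published chart once its repository is added.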
The first step is to identify the security requirements and best practices applicable to the system you're trying to secure. This is going to be specific to your project and its business objectives. The next step is to use something called policy as code, and this is where we take the policies and requirements that we've identified and convert them into a declarative language that can be used to evaluate our infrastructure configuration code and the Helm charts we're using. Step number three is taking that policy as code one step further and implementing guardrails to make sure that the most important policies are always being followed and to give engineers early feedback on any code or Helm charts that may be in violation of our requirements. So let's start with step number one, where we identify the policies that are applicable to our system and capture our requirements. And like I said, different systems are going to have different requirements. For example, a banking system is going to have different requirements than a music streaming service or a commercial application. The important thing here is to identify those requirements early in the project so you can start evaluating your infrastructure and configuration against them as early as possible. Trying to retroactively apply security controls to a system once it's in production is always very painful and costly. So the first thing I always try to do when identifying policies is to capture our business objectives and what we're trying to accomplish with the system. And this triangle always comes to mind, where we're trying to balance the security of the system while making sure that usability and functionality are not affected. What we're trying to visualize with the triangle is that if we build the most secure system possible, that system is probably not going to be usable or functional.
For example, if we're trying to secure a network, the most secure network will be the one where I kill all the network traffic to it and no one can access it. At the same time, if we made the most user-friendly system, for example by removing the requirement to have passwords, it's not going to be the most secure. So we always want to strike a balance where our business objectives are accomplished without compromising on security. The next place to look for security controls applicable to your system is frameworks, compliance requirements, or industry standards. Depending on the data you're collecting or the type of system, you may be subject to the Payment Card Industry Data Security Standard (PCI DSS), or HIPAA for health systems, or if you're collecting personal information in Europe you may need to be GDPR compliant. There are also industry benchmarks you can use, like those put out by the Center for Internet Security, the CIS Benchmarks. And cloud providers also give guidance on best practices for systems using their technology, like the Well-Architected Framework from AWS. Once you have those requirements, it's helpful to think about security risk categories and how to group the requirements into them. Data protection deals with how we protect any sensitive data our system could be handling, so things like encryption at rest or encryption in transit are examples of security controls in the data protection category. Access management is how we make sure that any system, person, or process accessing our system is only authorized to access the functionality it needs. With network security and platform security controls, we make sure that only the necessary ports are exposed, and only for the processes, systems, or people that need them. And we also need to make sure that we have the visibility and observability necessary.
Not only to debug any issues in the system, but also to reconstruct any scenarios where we may have been compromised. One category that's not on the slide, but that we need to keep in mind alongside security, is operational efficiency: things like making sure you're only using the resources your system needs so you don't incur unnecessary expenses. So now let's look at an example architecture. Here I'm looking to install WordPress, which is an open source content management system that you can use to host websites and blogs. WordPress uses MariaDB as its database, and we'll be using Memcached in front of the database to cache queries and speed up the site. All of these are third-party open source technologies, so this is a really good use case for a Helm chart that's already published. Let's take a look at how we find that. We open artifacthub.io in our browser, which is the site where we can find Kubernetes packages, and we search for WordPress. The first result is a Helm chart published by a company called Bitnami, and looking at the TL;DR we can see how straightforward it is to install this chart. With a quick glance at the introduction, we can see that the chart includes the MariaDB chart as a dependency, and that the Memcached chart is also one of the dependencies. So by using this chart, we get all of the resources we need in our architecture. Now let's go back to our slides. Now that we have our architecture defined and have an idea of the Helm chart we're using, let's take a look at some of the security requirements we need to comply with when we provision the system. I've identified three requirements that we're going to look at, which I found in the CIS Benchmark for Kubernetes. The first one is to ensure that there are no secrets in environment variables.
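The dependency relationship just described is declared in the chart's Chart.yaml; a simplified sketch, with placeholder versions rather than the real ones, looks like:

```yaml
# Chart.yaml (abridged sketch) -- the WordPress chart pulls in
# its database and cache as subchart dependencies
apiVersion: v2
name: wordpress
dependencies:
  - name: mariadb
    version: "x.y.z"   # placeholder version
    repository: https://charts.bitnami.com/bitnami
  - name: memcached
    version: "x.y.z"   # placeholder version
    repository: https://charts.bitnami.com/bitnami
```

Helm resolves these entries at install time, which is why one install gives us all three components.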
The reason this is important when you're dealing with third-party software is that some applications dump the state of the environment as part of error logs. If the environment contains secrets, that means that anyone with access to the error logs would also get all the secrets in the environment variables. So we want to make sure that we avoid this whenever possible. The second security requirement is to avoid containers running as root. Although there are runtime security features in Kubernetes that protect us, there's still an increased likelihood of processes within root containers breaking out and escaping, which increases the risk to your host and to other containers running on that host. Since we're using third-party containers from public Helm charts, this is something to be mindful of. For the third requirement: there's an option within Kubernetes that allows a process to gain privileges its parent process did not have, for example through the setuid and setgid file mode bits, and you can control that with the allowPrivilegeEscalation parameter. We want to make sure this is set to false on the Helm charts we're using, to avoid privilege escalation in those containers. Okay, so we've identified our architecture and determined the security requirements for our system using industry standards and best practices. Now it's time to codify these requirements into policy as code. But what is policy as code? Policy as code means writing the security requirements that your system must comply with as code, in a declarative language that can be used to evaluate your infrastructure as code. So your Helm charts, your Kubernetes configuration YAML files, or any other declarative definitions can all be evaluated using policy as code.
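To make the three requirements concrete, here's a hedged sketch of a pod spec fragment; the env block shows the secretKeyRef pattern the first requirement flags, and the securityContext shows settings satisfying the other two (names like app and app-secret are made up for illustration):

```yaml
# Illustrative container spec fragment (names are made up)
containers:
  - name: app
    image: example/app:latest
    env:
      # Anti-pattern under requirement 1: a secret surfaced
      # as an environment variable via secretKeyRef
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: app-secret
            key: password
    securityContext:
      runAsNonRoot: true              # requirement 2: don't run as root
      allowPrivilegeEscalation: false # requirement 3
```

A safer alternative to the env block would be mounting the secret as a volume, which keeps it out of environment dumps.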
And this makes sure that you are consistently enforcing your requirements as part of your system's deployment and development lifecycle, and it allows you to move your security posture from reactive to proactive, since you can detect and remediate issues as soon as they're introduced in your code, not at runtime when they're already in the actual environment. For policy as code to work, you'll need to be using infrastructure as code to control your system. So no manual changes to your Kubernetes cluster: everything should be defined declaratively, not applied by hand. And this works well with Helm, as your Helm charts are already written for you. The more traditional approach to security is just handing over security requirements to engineers and then periodically checking, through scripts, tools, or manual review, that these have been implemented at runtime. There are multiple benefits to using policy as code instead of, or in combination with, traditional approaches, and I want to highlight a few of them. The first benefit is that you can implement your security requirements with low friction. When we deliver security requirements as code in a declarative language, there's no ambiguity about what needs to be accomplished in order to meet the requirements. You can reduce the number of gates and interactions between engineers and security, and the system can be tested throughout the SDLC to ensure that the requirements are met. The next benefit is that your system will be secure from the get-go. Implementing security up front reduces the number of issues that need to be addressed once you're in a runtime production environment, since we've identified the requirements from the beginning and have tested our system against them with automation. The last benefit I want to highlight is increased security visibility.
Since we can now store the applicable security requirements alongside our system code, we can easily identify what we're compliant with and any gaps that need to be addressed. So let's look at how we can implement policy as code using an open source tool called Open Policy Agent, or OPA. OPA is a CNCF graduated project, and it gives us a unified language and framework to do policy evaluations in our systems. We're going to take the three security requirements that we identified in step number one of our example architecture and convert them into policy as code using the Rego language from OPA. Our first Rego rule implements the requirement where we're trying to avoid secrets in environment variables. Just to refresh: this is important because we're trying to protect our secrets from being exposed in error logs if the containers we're running capture the environment as part of errors, and if we're using third-party Helm charts this could be an issue. So let me walk you through, line by line, what we're trying to detect in this Rego rule. If you look at line number one, we're extracting the API definition from our deployment. Then on lines two, three, and four we're getting the environment variables in our container specification. All these lines so far contain variables that we're using to walk through the Kubernetes YAML file being evaluated, and that will become a little clearer on the next slide. Line number six is the expression where we test whether there's a value named secretKeyRef, which would mean that a secret is being used in an environment variable, which is the issue we're trying to detect with this Rego rule. So here's the Rego next to a YAML file that violates the rule.
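The slide itself isn't reproduced here, but a hedged sketch of the rule just walked through might look like this in Rego; the package name, rule name, and input layout follow Terrascan's conventions as I understand them, and are assumptions rather than the exact slide contents:

```rego
package accurics

# Flag deployments that surface secrets as environment variables.
# Walk the deployment spec down to each container's env entries
# and match any entry that uses a secretKeyRef.
containerUsesSecretsInEnvVars[api.id] {
    api := input.kubernetes_deployment[_]
    container := api.config.spec.template.spec.containers[_]
    envVar := container.env[_]
    envVar.valueFrom.secretKeyRef
}
```

The chained variable assignments are the "walking through the YAML" the talk describes; the final expression only succeeds when a secretKeyRef is present, which makes the rule fire.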
Looking at line number six of the YAML on the right, we see where the containers are specified; then on line number nine we have our environment variables, and on lines 12 and 17 we see a couple of variables for a username and password being used, which would be in violation of our Rego rule and would trigger the statement we wrote on line number six of the Rego. So this was a simple example where we can see how Rego variables can be used to walk through the Kubernetes YAML file and how statements are used to trigger whenever issues are found. Let's look at a more complex example. In this second Rego rule, we're going to identify when the allowPrivilegeEscalation parameter is set to true, or when it's not defined, as we don't want the pods from our Helm chart to be able to escalate privileges. Our Rego rule is specified in lines one through five, and we have a couple of helper functions at the bottom. In the first couple of lines of the rule, we get the contents of the security context in our deployment specification; then on line number four we call our functions to do the evaluation. In the first function, we check whether the value of the allowPrivilegeEscalation parameter in the security context is set to true, which is the violation. And in the second function, we check whether the allowPrivilegeEscalation parameter has not been defined at all, since our requirement is to have this parameter explicitly set to false. To make this clearer, we can see a snippet of a YAML deployment file on the right side of the slide: looking at line number five, we see where allowPrivilegeEscalation has been set to true, which would violate our function on line number nine and trigger our Rego rule. So now let's look at the Rego rule for our third security requirement, which is about avoiding containers running as root.
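Again as a hedged sketch rather than the exact slide, the rule-plus-helper-functions structure just described could be written like this (all names are assumed):

```rego
package accurics

# Trigger when allowPrivilegeEscalation is true, or when it is
# missing entirely (we require it to be explicitly false).
containerAllowsPrivilegeEscalation[api.id] {
    api := input.kubernetes_deployment[_]
    container := api.config.spec.template.spec.containers[_]
    checkPrivilegeEscalation(container.securityContext)
}

# Violation case 1: the parameter is explicitly true
checkPrivilegeEscalation(sc) {
    sc.allowPrivilegeEscalation == true
}

# Violation case 2: the parameter is not defined at all
checkPrivilegeEscalation(sc) {
    not has_field(sc, "allowPrivilegeEscalation")
}

has_field(obj, key) {
    _ = obj[key]
}
```

Defining checkPrivilegeEscalation twice gives Rego's "or" semantics: the rule fires if either body succeeds.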
We want to protect our environment from the possibility of container escapes, and it's a best practice to avoid running containers as root. I cut the Rego rule from this snippet, as it was similar to the last one, and I wanted to focus on the functions we're using here. In the first and second functions highlighted, we identify whether the runAsNonRoot parameter in the security context is set to false, meaning that we're going to be running as root, or whether it's undefined, since in that case the container can end up running as root by default. In the next couple of functions, we identify whether the runAsUser parameter in the security context of the pod is set to zero, which is root, or whether it's undefined. I have a couple of YAML snippets here that are in violation, so we can visualize these issues. In the first one, if we look at line number two, runAsNonRoot is set to false, so this is in violation of our policy; and in the second YAML snippet, on line number two, we see that runAsUser is set to zero, which is the root user, and that would violate our Rego rule as well. So now that we have our policies defined as code, it's time to implement our guardrails to make sure that any Helm charts we use comply with our policies. There are two main types of security guardrails: proactive and reactive. Reactive guardrails are the more traditional security controls, where we update the runtime environment after we notice a violation has occurred. For example, when doing a vulnerability scan of a server and detecting a critical vulnerability that needs to be patched, we could have a reactive guardrail automatically patch the system. Proactive guardrails help us identify issues early in the development process, where they're easier and less risky to fix, and this is where policy as code fits.
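The four helper functions just described could be sketched like this (again, the names are my assumption, not the slide's exact code):

```rego
package accurics

# Violation: runAsNonRoot explicitly disabled
checkRunAsNonRoot(sc) {
    sc.runAsNonRoot == false
}

# Violation: runAsNonRoot not defined at all
checkRunAsNonRoot(sc) {
    not has_field(sc, "runAsNonRoot")
}

# Violation: running as UID 0, i.e. root
checkRunAsUser(sc) {
    sc.runAsUser == 0
}

# Violation: runAsUser not defined at all
checkRunAsUser(sc) {
    not has_field(sc, "runAsUser")
}

has_field(obj, key) {
    _ = obj[key]
}
```

Each pair covers the "explicitly bad value" and "left undefined" cases the talk calls out.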
So there's an open source tool that allows you to implement your policy as code guardrails, which we at Accurics maintain, called Terrascan. What I like about Terrascan is that it's packaged as an executable, so you can easily integrate it into your workflow by running it locally on your desktop or as part of your CI/CD pipeline. It can also be executed in server mode, which allows you to centrally govern your infrastructure as code scanning by having a central hub with rules to help meet your standards. And we recently added the ability to use Terrascan as an admission controller, where you can use it in server mode to listen to the admission webhook in your Kubernetes cluster and prevent any actions that would deviate from the standards and policies you define in Terrascan. The tool leverages the Rego language from the OPA project as part of its policy engine, so you have a standard way to implement policy as code, and it includes a lot of policies by default that you can use as a baseline to evaluate your system or to enhance your current policy set, with the ability to scan Terraform, Kubernetes YAML files, Helm, and Kustomize. So I'm going to walk you through three example ways to implement proactive guardrails, and then we'll see a quick demo of how Terrascan works. The first and most straightforward guardrail is to use a policy as code scanning tool in your engineers' local environment. Here's an example where we're using Terrascan to scan the example WordPress architecture we've been using throughout the presentation, finding the three issues we've discussed in the deployment YAML file. This is the best way to get early feedback on the Helm charts you're trying to use. In this example, I had the Helm chart code in my local environment, but you can also scan remote repositories.
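As a sketch, the local scan just described is a single command; the chart path is a placeholder:

```shell
# Scan a locally checked-out Helm chart with Terrascan;
# -i selects the IaC provider to scan
cd ./wordpress-chart   # placeholder path to the chart
terrascan scan -i helm
```

Because it's one self-contained binary invocation, engineers can run this before ever applying a chart to a cluster.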
And here's the Terrascan command you would use to do that, with the -r flag to scan the Git repository that has the WordPress Helm chart directly. You can further enhance your workflow by creating an alias that checks whether there are any high-severity findings in Terrascan's output and fails your Helm chart install if there are. The next place to apply guardrails is your CI/CD pipeline. This is where you can prevent any code that deviates from your standards from being introduced, by having a CI job fail if there are any issues you'd like to block on. We have to keep in mind that it's always easier to fix issues in your code rather than at runtime, so you want to make sure that changes to your runtime Kubernetes clusters are only made through code. This will help you use CI/CD as part of your security guardrails. The last guardrail I want to discuss is using an admission controller. Admission controllers hook into your Kubernetes admission webhook to make a policy evaluation on the API call being made, and they can be used to block any requests that deviate from your standards. If you have policies that must be enforced no matter what, this is a good place to make sure they're being followed. Also, we know that there might be break-glass situations where we might not be able to use declarative code to make changes to our clusters and need to make a quick change at runtime because of an emergency. Using this approach, we would have an additional control to prevent issues from being introduced through those break-glass changes. So now let's take a look at how Terrascan works locally by evaluating the WordPress Helm chart. For this example, I'm using the deprecated github.com/helm/charts, an old repository which is no longer maintained, but it's good for our purposes of illustrating policy as code. And the first thing I did was git clone it, so I have it stored locally.
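The remote scan and the fail-on-high alias just mentioned might look roughly like this; the jq filter over Terrascan's JSON output is my own sketch of the idea, not an official recipe, and the chart path is a placeholder:

```shell
# Scan a remote Git repository without cloning it yourself
terrascan scan -i helm -r git -u https://github.com/helm/charts

# Sketch of a pre-install gate: emit JSON, count HIGH-severity
# violations with jq, and only install when there are none
violations=$(terrascan scan -i helm -o json \
  | jq '[.results.violations[]? | select(.severity == "HIGH")] | length')
[ "$violations" -eq 0 ] && helm install my-release ./chart
```

The same two-step gate translates directly into a CI job: run the scan, and let a nonzero violation count fail the pipeline.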
But one of the things I wanted to show you is that if you want to scan a repository that isn't local, you can run terrascan with -i helm and -r git, put the GitHub repository URL in there, and under the hood Terrascan will download that repository and run the scan on it, without you having to clone it yourself, and give you the results, which is an easy way to scan remote repositories. So I'm just going to scan it real quick to show you what we get out of the box from Terrascan when scanning this repository. This might take a little bit, because it's downloading the repository from the internet and then running Terrascan. So here are some of the results we got from running Terrascan on the repository with its default policies. You'll get a lot of results, which may or may not be relevant for what you're trying to do. One thing I did with my local copy of the repository was to add, let me zoom in a little bit, a policies directory, and this policies directory has the three policies that we're evaluating today for our examples. Just to show you, the first one we're going to look at is the one about secrets in environment variables. As I said, for this one we're just looking for that particular value, secretKeyRef, and you can find it by looking locally at the deployment.yaml file. I had this open previously, so it's redirected me right there. Looking at this file, if you search for that value, that's all we would be doing if we did this manually; this time we're using a tool to look for that secretKeyRef. And here you see that they're storing the secret for the WordPress database password, and also the username, in environment variables.
And we'll get to Terrascan shortly, so we can see how these three policies evaluate the code. The next policy is the one where we look at the security context and check whether allowPrivilegeEscalation is set to true or is undefined. I'm just going to look real quick to see if they set this in the deployment.yaml. I don't see it set at all here, so it's going to be undefined, and it's probably going to trigger this rule. And the last one is running as non-root, which triggers if runAsNonRoot in the security context is set to false or left undefined. One thing here is that the way Terrascan works, it will automatically use the variables in the Helm chart and populate their values before doing a policy evaluation. So although I'm doing a quick find in this deployment.yaml file and not seeing the values that would trigger these policies, it doesn't mean they're not there; it might mean that they're set through variables. And this is an example of exactly that: runAsUser is set through a variable here. What we're looking for is runAsNonRoot being false or undefined, or runAsUser being set to zero. Right here we have runAsUser, but it's populated from a variable, so Terrascan will look for its value in the chart variables. Let's look at the values.yaml file; what was the name again? runAsUser. And runAsUser is set to a non-zero user, which means the rule is not going to trigger for this one. So now let's run Terrascan to see which of these policies are triggered by our local Helm chart. We can run terrascan with the -p flag, where you specify a directory containing your Rego policies. This overrides the default policies included in Terrascan, so we're only going to be using the three policies we've created here. And -i is our IaC provider, which in this case is helm.
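Put together, the command just described looks something like this; the two directory paths are placeholders for wherever your policies and chart live:

```shell
# Evaluate the chart against only our three custom Rego policies;
# -p points at the policy directory and overrides the built-in set,
# -d points at the chart directory to scan
terrascan scan -i helm -p ./policies -d ./charts/stable/wordpress
```

Keeping the custom policies in a directory next to the chart makes the same command easy to reuse from CI.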
And once I hit enter, this is going to run pretty fast, because we're running Terrascan against a local repository, as opposed to the previous example where we downloaded the remote repository from GitHub. As we see here, our scan showed two of the policies being violated. The first one is where the container is using secrets in environment variables; this is where, in the deployment.yaml file I showed, the database credentials were referenced. The second one being triggered is that the container should not run with allowPrivilegeEscalation, and this one triggers because, if you remember our policy, we said that if allowPrivilegeEscalation is not explicitly set to false, we trigger the policy. And our final policy, the runAsNonRoot one, is not triggering, because runAsUser is set to a non-zero value. So let's recap what we did today and how to implement Helm securely in our systems. The first step was to identify and define the requirements applicable to our system. In the example, we captured three requirements that came from the CIS Benchmark for Kubernetes that were important to us. In step number two, we took those requirements and wrote declarative Rego policies to help evaluate our infrastructure as code against them. And finally, in step number three, we discussed the different ways we can implement guardrails to make sure our policies are being met and to get feedback on policy violations as early as possible. So thanks for watching today. If you want to learn more about Terrascan, or about cloud vulnerabilities, policies, and best practices, you can visit our blog, where we're constantly publishing the latest findings from our research team, at accurics.com/blog. Thank you.