Hello, my name's Robert Clarke and I'm here today to talk to you about the Hitchhiker's Guide to Kubernetes vulnerabilities. I recently joined Amazon Elastic Kubernetes Service, EKS, where I lead security. I co-wrote this presentation with Micah Hausler, who's been working on EKS and contributing to the Kubernetes security community for several years. This presentation is really an exploration of what Kubernetes vulnerabilities are, how they are handled by the community, and the things that Kubernetes operators and users should take into consideration when deciding what to do about any given vulnerability. In some ways, this presentation is about how not to look at vulnerability data, and a warning about the dangers of taking the data and data sources at face value. Neither Micah nor I are data scientists. Most security architects and engineers, the people who are going to be looking at vulnerability data and trying to use it to help make more informed choices, aren't going to be data scientists either. So we've taken that approach: we've pulled the data together and tried to present it in a couple of different ways, to help us understand whether Kubernetes is doing better or worse with regards to software vulnerabilities, and what we expect the trends to be in the future. In the spring of 2021, Red Hat produced a Kubernetes security report. It was a short, useful survey of the industry, but it showed that vulnerabilities were only the second most concerning issue raised among the more than 500 DevOps, engineering, and security professionals that participated in the survey. There were some interesting call-outs in the survey. 94% of respondents experienced at least one security incident in their Kubernetes environments in just the last 12 months. That is a terrifying statistic. And 55% of respondents actually had to delay or slow down application deployments into production due to container or Kubernetes security concerns.
So while vulnerabilities might not be top of mind for survey respondents, I believe that market pressures over the next 12 months may change the results of a similar survey. And what do I mean by that? Well, there's lots of attention being paid to improving the secure defaults in Kubernetes, right? The far left of this graph, today, is all about misconfigurations, and there's an entire ecosystem of tools and support aimed at improving the full lifecycle of Kubernetes deployments. By contrast, not much has changed in the vulnerability management space in the last 10 years or so. It's important for us to understand what the vulnerability posture of Kubernetes is likely to be over the next few years, because not much is going to change in this space. So we should get into what a vulnerability is. For the purposes of this presentation, it's a design or implementation defect that negatively impacts Kubernetes in one of six dimensions. Confidentiality: something that should be secret is no longer secret. Integrity: something that should be trusted to stay in one state and not be written to can be written, and is therefore untrusted. Availability: a thing that should stay up goes down. Authentication: I know who you are, and I can be reasonably confident in that assertion. Authorization: I know what you are able to do, and a vulnerability that compromises that might mean you can do things you're not supposed to be able to do. And audit: audit is really important to us as security practitioners for understanding what's gone on in a Kubernetes system, so the ability to disrupt it, either by adding malicious things to an audit trail or by causing certain operations not to be logged, is super important for us to pay attention to. So within Kubernetes, the Product Security Committee, which was recently rebranded to the Security Response Committee, deals with vulnerabilities.
This rebranding better reflects the function of this wonderful group of volunteers, whose names are up on the slide right now. They've come from across the Kubernetes ecosystem and they provide time, support, and expertise to triaging, remediating, and coordinating the responsible disclosure of Kubernetes vulnerabilities. So what happens in the life cycle of a Kubernetes vulnerability? Well, researchers will discover an issue and hopefully report it responsibly to the Security Response Committee. At that point, their process takes over. A public disclosure date is negotiated between the SRC and the bug submitter. The SRC will always prefer to fully disclose the bug as soon as possible, but that must be after a user mitigation is available, apart from in situations where no mitigation can be made. The timeframe for disclosure ranges from immediate, when the issue is already known in public and there's little reason to go slowly or cautiously, to a few weeks. For a vulnerability that has a straightforward mitigation, the Security Response Committee expects the disclosure date to be on the order of seven days from when it's reported. You'll see from the chart that the SRC has very recently introduced process and rigor, and is really doing well at maintaining a steady flow of vulnerability response times. There are some obvious points in the past here where responses were more tricky, and that will always be the case. Sometimes something will get reported and it will take a long time to fix. It's a delicate trade-off, and we put a lot of trust in the Security Response Committee to make those trade-offs, and they do an excellent job. So the Kubernetes project is rapidly evolving, with new features, design updates, and bug fixes. Currently the Kubernetes community releases a new minor version approximately every four months and maintains three active versions at any given time.
Occasionally Kubernetes upgrades change APIs, and this can sometimes break existing deployments because API changes will be required. This happened from 1.15 to 1.16, and will happen again soon with 1.21 to 1.22. Now, past a certain point, usually around a year, the Kubernetes community stops releasing backported CVE patches. Additionally, the Kubernetes project discourages CVE submission for deprecated versions, and these are no longer eligible for bug bounty payouts. This means that vulnerabilities specific to an older, unsupported version of Kubernetes may not even get reported, leaving customers exposed with no notice and no remediation options in the case of a vulnerability. The key point here is you need to try and stay on a supported version of Kubernetes. So if we look quickly at the history of the Kubernetes project, we see that version 1.0 was released in July 2015, followed in November 2015 by the first Kubernetes CVE, which allowed remote attackers to read arbitrary pod logs via container name manipulation. In January 2016, the Product Security Committee was established; before that, vulnerabilities were reported directly to Google. So thank you to Google for performing that function before the PSC was created. In December of 2018, the Security Audit Working Group was founded to identify and select vendors who could provide a security audit for the Kubernetes project, and a number of companies have now taken a close look at parts of the Kubernetes source code to find problems. In January 2020, the bug bounty program started; it was funded by the CNCF to reward researchers for finding vulnerabilities in Kubernetes. And in August 2020, the Kubernetes SIG for security was created. So let's talk about where to get vulnerability data. One place you can go is CVE Details. It's an excellent resource, useful for pulling big batches of CVE information about many open source projects.
All the data is taken from data feeds that are provided by NVD, the National Vulnerability Database; I'll talk about that in a moment. I will caution people that you have to take care when looking for projects in CVE Details. If you go look for Kubernetes, the first set of results you're going to find are actually for vulnerabilities in the Jenkins Kubernetes plugin. So a little bit of caution there: make sure you're pulling information about the right set of resources. But this data can be easily extracted into a CSV for analysis. Beyond that, another good resource is the National Vulnerability Database, which is produced by the National Institute of Standards and Technology, which we all know and love as NIST. The NVD has been running in some form since 1999, when it was first known as ICAT, the Internet Categorization of Attacks Toolkit. It actually has an interesting history. Around 2001 it was defunded, and the SANS Institute stepped in, provided funding, and kept it running. It was then re-funded by the US Department of Homeland Security in 2004, and has been critical in the development and adoption of standards like SCAP, CVE, CVSS, and other related technologies. The NVD is basically the world's largest free, open database of known vulnerabilities. It ingests information from cve.mitre.org, it has a rich query interface, and if you want it, it will provide you with nice JSON-formatted documents to put in your charts. While those two sources are super useful, and they both feed in from MITRE's backbone of data, you're unlikely to build tools or scripts yourself that spend much time looking at that information. Generally, if you're running either your own clusters or using someone else's, there will be tooling available that makes vulnerability management and understanding the vulnerability lifecycle easier.
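To make that concrete, here is a minimal Python sketch of pulling CVE records out of an NVD-style JSON document. The field names follow my understanding of the NVD 2.0 API schema, and the record itself is made up for illustration; it is not a real Kubernetes CVE.

```python
import json

# A made-up record in the shape of an NVD 2.0 API response
# (field names per the public schema; the CVE itself is fictional).
sample = json.loads("""
{
  "vulnerabilities": [
    {"cve": {
      "id": "CVE-2019-0000",
      "published": "2019-04-01T17:29:00.000",
      "metrics": {"cvssMetricV31": [
        {"cvssData": {"baseScore": 8.2, "baseSeverity": "HIGH"}}
      ]}
    }}
  ]
}
""")

def summarize(feed):
    """Flatten an NVD-style feed into (id, year, score) tuples."""
    rows = []
    for item in feed["vulnerabilities"]:
        cve = item["cve"]
        year = int(cve["published"][:4])       # publication year
        metrics = cve["metrics"].get("cvssMetricV31", [])
        score = metrics[0]["cvssData"]["baseScore"] if metrics else None
        rows.append((cve["id"], year, score))
    return rows

print(summarize(sample))  # [('CVE-2019-0000', 2019, 8.2)]
```

From here it's a one-liner to dump the tuples into a CSV for the kind of analysis discussed above.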
With the exception of severity, which we'll talk about later, we're not going to go much further into vulnerability management specifics for Kubernetes. It's very use-case specific; it changes from operator to operator and vendor to vendor. But I would encourage everyone to look at what is available, what is native to your platform, and make sure you have a good understanding of the threats that your platform may be facing. After gathering together what I thought was all the vulnerability information about Kubernetes, I figured the best place to start understanding the current security posture was to look at how the vulnerabilities are distributed over the years. I believe that Kubernetes is more secure now than ever before. So, can I make this presentation really quick by showing a nice graph that slopes down to the right? No. Okay, well, this is a bit embarrassing. It looks like I was able to find 39 vulnerabilities, and the vast majority of those were published in 2019 and 2020. Okay, so what do we say about that? Was 2019 a terrible year for security? Why did I have this sense that things were getting better? Is the annualized view even a useful one? We probably need to add some granularity. Vulnerability distribution by year might be an unreasonable thing to use as an indicator; after all, there can be three or four releases in a year, so I decided to map the vulnerabilities by release. This paints a more interesting picture. We can see that there was a big spike in 2019 around the 1.14 release, but subsequently vulnerabilities have started to drop off. Maybe now we can start to classify periods in Kubernetes vulnerability management history and look for trends. Okay, so clearly security was great in the early days of Kubernetes. Maybe it actually stayed pretty great all the way up to the end of December 2018, you know, modulo a small hiccup around the 1.9 release.
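As a rough sketch of the tallying behind a distribution chart like this, here's how you might bucket CVE publication years in Python. The years below are illustrative, not the real Kubernetes dataset.

```python
from collections import Counter

# Publication years for a handful of made-up CVE records.
cve_years = [2016, 2017, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2021]

# Tally CVEs per year, then render a crude text histogram.
by_year = Counter(cve_years)
for year in sorted(by_year):
    print(f"{year}: {'#' * by_year[year]} ({by_year[year]})")
```

Swapping the year for the Kubernetes minor release each CVE was fixed in gives the by-release view discussed next.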
Apparently, things took a turn around version 1.14, if we judge based only on when fixes were released. 2019 through 2020 were really bad years for Kubernetes, and definitely busy years for the security response committee. But at least things seem to have gotten better recently, with only one issue fixed in the last couple of releases. This data is based on what is currently available from MITRE and the NVD, which are our canonical resources, but that doesn't seem right. I'm pretty sure there have been issues recently, and I checked CVE Details, I checked the NVD. So I did what everybody does when they're a bit confused: go to your favorite search engine of choice and have a look at the issue. Is 2021 the best year in history for Kube security? Can we conclude from the lack of CVEs that everything's been fixed? Well, Google says no. Google says there is more data that for some reason isn't readily available, or hasn't made its way through to the NVD database yet. So let's go ahead and add that data to our graph. One of the vulnerabilities that you saw in the mailing list screenshot there was a very Windows-specific client issue, so we've greyed that out here. But either way, it's looking a little more concerning now; we've got this big red block. I guess at this point we should pause to make three key observations. One, Kubernetes is clearly doomed. Two, there is sometimes a significant lag in Kubernetes vulnerability information making it into the MITRE CVE list, which feeds tools like the NVD and CVE Details. Three, you need to track, or make sure that the tools you are using are tracking, the Kubernetes security announcements mailing list, so that you're getting up-to-date information. Okay, so here's our updated chart. So what can we really say about this?
Well, you know, we can look at a trend line and say, well, maybe actually this looks a lot like the distribution graph we saw earlier in the presentation. It's going up and to the right. It's not looking the way that I was hoping it would look. And if you judge Kubernetes solely by the number of vulnerabilities published per release, it certainly seems that the situation is getting worse. The thing is, many things can influence the number of vulnerabilities that get published in each release. We see a modest increase in vulnerability publications, but we know that Kubernetes has grown massively in the same period in every dimension, whether it's user base, lines of code, contributors, operators, security researchers paying attention, or security companies oriented to it. All those spaces are growing massively, and by comparison we definitely see sublinear growth in vulnerability counts. So maybe this isn't a problem with our data; maybe it's just that we're not able to factor in so many of those environmental concerns. But maybe we can understand more about the Kubernetes vulnerability situation by looking at how long a vulnerability has existed for; perhaps that will help us make a better judgment. We combined the data from the PSC that shows how long vulnerabilities lived for, which is in orange. The PSC rarely, but sometimes, posts a from and a to date on a vulnerability; actually, they don't do that very much anymore. So we took that data and then added in the data here in purple, which comes from identifying the line of code that introduced a problem in a CVE and basically working back through git histories until we found roughly where we think it was introduced. We were able to put these two together into this little chart that shows the lifetime of vulnerabilities and their remediation windows.
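To illustrate the arithmetic behind that chart, here's a small Python sketch computing one vulnerability's lifetime and remediation window from introduced/reported/fixed dates. The dates are invented for illustration.

```python
from datetime import date

# Hypothetical lifecycle dates for one vulnerability: when the flawed
# commit landed, when it was reported, and when the fix shipped.
introduced = date(2017, 3, 1)
reported   = date(2019, 8, 5)
fixed      = date(2019, 8, 19)

lifetime = (fixed - introduced).days   # how long the bug existed in the code
remediation = (fixed - reported).days  # how long the fix took once reported

print(lifetime, remediation)  # 901 14
```

Doing this per CVE gives the orange lifetime bars and purple git-history-derived bars described above.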
Here we see that the increase in reporting around 1.15 didn't really correlate with long-lived vulnerabilities. We don't see a bunch of vulnerabilities snapping shut on that date. So it seems likely that the increased popularity of Kubernetes is resulting in more people examining the code and finding issues. So let's try to understand this situation and why it can trend up like this. We know there are environmental factors, and we actually do know something about some of them, so let's take a quick look. We know that the Kubernetes security audit happened; we talked about it earlier. We know that the bug bounty program came in around the start of 2020. And we know that these things have caused vulnerabilities to be found and caused improvements to be made. So maybe these things are actually tracking with us. We looked at the results of the audit, and actually very few CVEs came directly from it. It was focused more on secure design patterns than specific CVEs, which is great; it's just not super useful for a conversation about CVEs. What was in the data was a very interesting demonstration of the impact of bug bounties. Here you see that the overwhelming majority of vulnerabilities found more recently came through the bug bounty program. That's really interesting, and it leads us to ask some more interesting questions about those more recent vulnerabilities, specifically about the number of different types of vulnerability we see. To do that, we need some common way of enumerating weaknesses. So MITRE, the organization that brought you CVEs, has been working on the Common Weakness Enumeration, CWE, for a very long time. CWE is a large database of the pathologies of vulnerabilities: the different ways in which people can make mistakes and cause these problems to occur. It's a community-led effort to continuously categorize all these weaknesses.
And it provides us with a mechanism to understand what went wrong and track classes of issues over time. So again we'll go and have a look, and we'll chart the number of distinct weakness types across CVEs. We'll start with this big ugly graph, as, you've gathered, I like to do at this point. And we see again that there are lots of vulnerabilities here, lots of unique CWEs, and a lot of them happen around 2019. But we do see a good drop-off around 2020 and 2021. So let's have a look at what that looks like in a slightly different format. A number of the categories here are clipped, but the interesting point is that the distribution is fairly even. There were just a lot of vulnerabilities spread across all of Kubernetes, in a lot of different CWE categories, which means holes everywhere. Swiss cheese. Problematic from a security point of view. This can be common in early projects, and it can be common in startups: projects are trying to make a lot of progress, trying to get a lot of stuff out the door. But it's really concerning if we're seeing huge swaths of CWEs presenting in a project that's been running as long as Kube has, with as many people paying attention to it. And if we take a look at just the data from 2020 and 2021, we actually see many fewer CWEs. What's particularly interesting here is that the bug bounty program starts to have some interesting impacts. The graph is a little bit clipped, and we'll provide the data after this talk, but insertion of sensitive information into logs was this big light blue patch of 20-28% of all issues found. And that's interesting because it was one bad logging pattern that was identified in numerous parts of the source tree. So we think the frequency of reports increasing and the diversity of CWEs decreasing is an indicator that the security audit, followed by the bug bounty program, has pruned much of the low-hanging fruit of the Kubernetes code base.
Vulnerabilities are still being reported at a reasonable pace, but the diversity of them is reducing. We know that modern bug hunters use tools like Semgrep to find repeated instances of issues in code bases, and this pattern of identifying an issue and then all of its siblings in a code base is common. We're also reassured by the announcement of further code audits coming in the Kubernetes ecosystem soon. So I've talked to you a little about the life cycle of a vulnerability and presented a few different ways to look at the data that exists today, and we've talked about how a vulnerability goes from being discovered to a CVE being published. I want to spend a couple of minutes talking about how to manage vulnerabilities in your Kubernetes infrastructure. I'm not going to get into bits you need to go flip, but I want to make you aware of mechanisms that you should be considering when you're having to decide how important a vulnerability is, because we all have to. This is probably an email that many of you have received at some point, or some variation of it for sure. For some of us, emails like this are almost a daily occurrence, and with good reason: vulnerabilities in Kubernetes are serious business, and if not managed appropriately, severe vulnerabilities can put workloads or even entire clusters at risk. So I've talked to you a little bit about the life cycle already, and I'm going to talk to you now about CVSS v3 and re-scoring and why it's important, because very few organizations do this. I'll give you an example of one that does it really well in a minute, but I'm just going to walk you through it quickly because it's very important. The vulnerabilities we've talked about have all had a CVSS score. Take this mythical vulnerability. It's very red. The exploit is easily triggered remotely and can be used to change the system's integrity, break the system's availability, or steal data, i.e., compromise confidentiality.
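For the curious, the CVSS v3.1 base score is just published arithmetic over the vector string. Here's a minimal Python sketch of the base-score equations from the v3.1 specification, scoring a vector like the mythical remotely-triggerable one described above, plus a hypothetical "re-score" with more restrictive access as a rough stand-in for the environmental adjustment discussed next. Both vectors are illustrative, not taken from a real CVE.

```python
# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}          # Scope unchanged
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.50}  # Scope changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(x):
    """Spec-mandated round-up to one decimal place."""
    n = round(x * 100000)
    return n / 100000 if n % 10000 == 0 else (n // 10000 + 1) / 10

def base_score(vector):
    """CVSS v3.1 base score for a vector like
    'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H' (CVSS:3.1/ prefix omitted)."""
    m = dict(part.split(":") for part in vector.split("/"))
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))

# Network-reachable, no privileges or interaction needed, full C/I/A impact:
print(base_score("AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 9.8

# Hypothetical re-score: only reachable from an adjacent network by a
# highly privileged caller -- the score drops substantially:
print(base_score("AV:A/AC:L/PR:H/UI:N/S:U/C:H/I:H/A:H"))  # 6.8
```

The environmental score proper adds modified metrics and confidentiality/integrity/availability requirements on top of these equations; the second call above just shows the direction of travel when access is harder.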
This looks like a game-over, unplug-the-internet vulnerability. But only you know how your software is deployed inside your organization, and the more you know about the software and the purpose it serves, the better you can judge how to deal with a vulnerability. So I mention this here because I want to talk to you about the changes you can make. Look, it's not red anymore. It is now a lovely shade of light yellow, and our high, 9-plus vulnerability has dropped down into the threes. This was all to do with environmental scoring. The environmental score captures how the severity of a vulnerability can change depending on how the software is deployed. Not all Kubernetes vulnerabilities are reachable by all parties. Maybe your control plane is strongly segregated from your data plane. Maybe you run individual clusters for different websites. This vulnerability actually got upscored by Red Hat: Red Hat telling their customers, hey, this vulnerability, for the way that Red Hat believes their software is being deployed, is actually more severe than the NVD rating. So these scores can go up and they can go down; the context is really important. You really do need to spend the time doing the rescoring. There are only a little over 40 vulnerabilities in the last five years for Kubernetes, so rescoring is not super time-consuming, and it's absolutely worth doing to understand which things you need to escalate and take more seriously, and which things you can maybe go a little bit slower and be a little more relaxed about. So this is really bringing us towards the takeaways. This has been a whistle-stop tour of how Kubernetes vulnerabilities can present. Stats are hard. We think security is getting better, and hopefully you can see why. So I just want to leave you with some of these takeaways. 2019 saw the highest number of CVEs by distribution.
The number of vulnerabilities increased over the last several years, and we think it will continue to increase. But we think that this increase is sub-linear compared to the numerous different dimensions in which Kubernetes is just exploding in terms of size and growth. And the number of unique CWEs has consistently decreased over the last few years; in the last two years it really did decrease very significantly. We're seeing more issues reported of the same type because of tools like Semgrep and other things that are allowing people to say, hey, I found this bad pattern, and I'm going to report however many vulnerabilities match this pattern. Canonical sources for vulnerability information can sometimes not be fully up to date. So if you're relying on CVE Details, cve.mitre.org, or the NVD, you really do need to keep a close eye also on the Kubernetes security announcement mailing list. And, coming back to the first point, you or your provider need to make sure that you're staying on an up-to-date version of Kubernetes. Staying in that window of the last three or four releases is absolutely the thing that is going to keep you the most secure, and it means that you don't have to go and spend a lot of time worrying about individual CVEs, pulling this data together, and trying to make these decisions. So with that, I invite Micah to come on screen, and I encourage all of you to ask any questions you might have. Thank you.