All right, thanks, everyone, for coming. If you're here to talk or listen about the Kubernetes Security Response Committee, you're in the right place. I'm Micah Hausler. I'm a principal engineer at AWS. I focus on Kubernetes security, EKS security.

Hi, y'all. I'm at Microsoft. I'm Mo. Let's see, I guess I've been doing Kubernetes stuff since 2016 now, and all sorts of security stuff. Thank you for coming.

So before we start our talk, we wanted to say: if you think you found an issue, stop before you tell somebody. The first place you need to go is kubernetes.io/security. On that page, you'll see a bunch of helpful resources on what to do if you think you found one. The short version for this talk is: please don't post anything that you think is or might be a security issue on GitHub or Kubernetes Slack, and don't message it to someone you think is on the security committee on Slack, since usernames can be semi-faked. Please report it to our HackerOne bug bounty program or to our inbox, security@kubernetes.io.

Go ahead. So what does the SRC do? We run the HackerOne bug bounty. So if you're a security researcher and that's how you make your living, we're happy to give you money if you come tell us where we screwed up. We're more than happy to do that. But also, a large part of our work is just coordinating with the various code owners to get fixes out. While we have a lot of expertise within the group, we don't know everything about Kube. So a lot of times we're spending a decent chunk of our effort just saying, hey, SIG leads, is this an issue? Is this how it's supposed to work? Is this a doc fix? What is this? How severe is it? Who does it impact, those kinds of things. So we have a pretty broad representation within the community: myself and Rita are from Microsoft, Micah and Balaji are from Amazon, Joel's from Red Hat, CJ is from Google.
This diversity is really important to us, because what might matter to Microsoft might not matter to Amazon, depending on how you deploy and run Kubernetes. We want to make sure that when we do a severity rating, we're taking into account the various deployment models within the community, and we take great care to err on the side of caution. If we can think of a subset of users for whom something we might otherwise consider a medium is a high, we want to take that into account.

So I'll talk a little bit about our bug bounty program. This is funded by the CNCF, and as far as scope, it's basically anything in the Kubernetes project. You can go to our actual HackerOne page and see the specifics on which projects are in scope, but primarily it's project code and project infrastructure. The eligibility is basically almost anyone, with exceptions for people on the security committee and CNCF staff. If you're a project reviewer or a maintainer and you're reporting something in code that you maintain, you're not eligible for a bounty for that. But if it's another part of the project that you don't work on or aren't familiar with and you found something, that's great; please report that to us regardless, and you're eligible for a bounty.

We have several different tiers in our bug bounty. Tier one is the GA and beta features of core Kubernetes. So think about, whether you spin up Kubernetes yourself or you're using a managed provider, the things in kubernetes/kubernetes on GitHub: kube-apiserver, kube-proxy, all those kinds of things. The core dependencies, klog for example, are included as well. We also put in scope the ability to alter source code without owner approval; if you're able to bypass our community processes and protections for the source code, that's important to us. Same with the ability to modify release artifacts, since that affects our users.
If you're able to deny service to our container registry, that's in scope for tier one. Not DDoS, so please don't drain the project of funds for hosting artifacts. But that includes registry.k8s.io and dl.k8s.io. Our top tier payout, for a critical, is $10,000 USD. And often, even for lows in our other tiers, we'll do bonuses and such, because we really appreciate when people report this and we want to incentivize people to find and report stuff. Yeah, if you make it easy for us to reproduce and show it to the SIGs, it makes our lives a lot easier. So even if it's a low, we'll give out a bonus just for, that was really nicely documented and saved us a bunch of time. Thank you.

Yep. So for tier two, it's the GA and beta features of non-core Kubernetes. Think CSI drivers, Kubernetes Dashboard, kubeadm, that kind of thing. That's a little bit lower on the high end, with criticals being $5,000 USD. And then tier three is our infrastructure, like Prow, documentation, kubernetes.io, that kind of thing, as well as alpha features in core. So if it's not enabled by default and it's a feature we want to get eventually, but you found something, we still want to reward that; it's just not quite tier one.

So you might be wondering, if you're familiar with the various SIGs and pieces that make up the Kubernetes community, how does the Product Security Committee, or Security Response Committee (the Product Security Committee is the old name for the Security Response Committee, by the way, and sometimes we still mix them up in our heads), relate to SIG Auth and SIG Security? If you're thinking about code ownership and features, how some deep internal thing works, how encryption at rest works in Kubernetes, that's SIG Auth. So I'm going to leave it there.
So that's kind of what we do. But if you now think about what we spoke about, what the SRC does, we have fix coordination and CVE issuance, and we have core features that are security-enforcing within the project. What about all the other aspects of security, like, how do I run hardened Kubernetes, and all those types of things? That's where SIG Security comes in. They help organize the Kubernetes security audits, as well as things like the official CVE feed; they built the tooling around that. So if there's a Venn diagram, there's almost certainly an intersection between these three groups, but that's the high-level breakdown. But again, remember: if you think it's a security issue, come to the SRC first, and we'll help you find the right place.

Yeah, so to talk through our process a little bit: we do get a steady stream of issues reported to us. They come in through the two front doors of HackerOne and our email list. The first thing we do is a preliminary assessment. A lot of us on the security committee have a background in Kubernetes, multiple years involved in the project in different areas, so we generally have enough context to say whether this is a legitimate issue or not. If it's not, we can close it and redirect the reporter to the right place if it's still an issue that needs to be resolved but not a security issue, or if it's just actually not an issue or not something that we care about, like TXT records for email domain validation or something like that. If it is an issue, or we need more help, we'll typically work with the code owners of the affected component. Sometimes it's not clear who that is; usually it is. And they often have a lot more context and can help say whether this is an issue or not. When it is, the next step is really for us to issue a CVE.
So Kubernetes is a CNA, a CVE Numbering Authority, so we can issue CVEs for security issues. At that point, we have an identifier that we can hold on to and say, this represents this issue; we can refer to that before it's public. The next decision point for us is, does this affect core Kubernetes? Not just from the bug bounty tier one perspective, but from a mechanical perspective of, do we need to work with the release team? Because when you do a Kubernetes release, you can't just put a code fix in main or master and call it done. There's a release process; you want to get that out to users. If it's in core Kubernetes, we work with the release team to make sure the fix gets into the main branch and then back to the maintained release branches, so that we have cherry-picks for all of those. If it's not in core Kube, we still work with the code owners, whether it's in ingress-nginx or any other sub-project under Kubernetes.

The next decision is kind of the same for either Kubernetes core or non-core: does it affect distributors? Is this something that distributors might care about and that needs to be handled with more care? The rough estimate we use, and we have this codified in our public GitHub repo, is generally around a medium. If it's a high or critical, or some kinds of medium, we might do a coordinated disclosure and embargo process with distributors. Distributors are, think major clouds or vendors who sell or manage Kubernetes or Kubernetes artifacts, that kind of thing. So yeah, like I said, major clouds, Canonical, Red Hat, those kinds of folks. And they get a distributor notification to say, here's an issue; it might affect you and your users.
You have X amount of time before this goes public to resolve this, whether that's fixing it for your users in a managed environment, or, if you're not in a managed environment, testing these changes before they're public or released to make sure they won't affect your users. If it doesn't affect distributors, generally that's a low issue, or it could be a high issue, like one we'll talk about later, that just doesn't affect distributors; we'll work with code owners and then do a public disclosure. That public disclosure process generally involves a GitHub issue for tracking, an email blast to several different email lists, and publishing the CVE details to MITRE, so that all the public CVE feeds that you follow have the actual specific details, including versions affected and the classification of the issue.

So we'll talk through a few of those. Let's see. I think this bug, which was reported through the last security audit for Kubernetes, is possibly one of my most favorite bugs in a really long time, because it's super easy to explain and think about. I've been reading the same code since like 2016 and I've never seen it. So it happens. Let's walk through this. Say you're a bad client and you make a GET request saying, hey, I want to get all the pods in the ".." namespace. Anyone who's ever done anything with paths in Linux already knows where this is going. But if you're not familiar, the way the path routing works is that after the namespaces part, that little block is the name of the namespace you want, and right after that is the resource that you want. So conceptually, the API server will take this request and try to authorize it: hey, can the current user list all pods in the ".." namespace? Which, by the way, is not a valid namespace identifier.
So that means there's no way to grant this via a role binding, but you might have cluster-wide read. So if you have the ability to list all pods in the cluster, it'll go through. Then we have to turn this into the correct etcd query to figure out how to give you your data back. Well, if you're familiar with how kubectl works, you can ask for resources that are namespace-scoped but across the whole cluster. You can say, hey, show me all the pods in the cluster. The way that's efficiently implemented in the back end is that the resource comes first and then the namespace, so that you can list all of a resource without first enumerating all the namespaces. So the etcd key ends up being "pods" slash dot dot; it's reversed from the REST logic. And then, because we like producing bugs, we have a path prefix on this. I think the reason the path prefix exists is that technically you can use one etcd for many different Kubernetes instances if you so want, so you need a way of bucketizing this. And we like to use the fun bug-creating Go function called path.Join, where if you pass it anything with dots in it, it helpfully cleans it up for you, because that's what you asked for. So the key turns into slash registry, which is a fun way of saying, hey, Kubernetes, please dump out all of etcd for me. As you can tell, that's not what "list all pods" means, but that's what happens.

So how did this go for us? Well, it didn't go too badly. And you might be asking, why? It's called really good dumb luck. For nothing related to security, the API server code has a lot of type-safe logic, and by type I mean Kubernetes resource here. So if you tried that last request on an old API server, it would basically say, I don't know how to turn a secret into a pod, and it would fail the request. But custom resources don't have static types, because we don't have any static type to represent them.
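The path.Join behavior described above is easy to reproduce with Go's standard library. This is a minimal sketch, not the actual API server code; etcdKey and safeEtcdKey are hypothetical names standing in for the real key-construction and validation logic, but the dot-cleaning is exactly what path.Join does:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// etcdKey mimics the buggy construction: path.Join cleans the joined
// path, so a ".." namespace segment collapses the key upward.
func etcdKey(resource, namespace string) string {
	return path.Join("/registry", resource, namespace)
}

// safeEtcdKey sketches the hardened approach described in the talk:
// reject dots, slashes, and empty segments at a low level, then build
// the key with plain string concatenation so nothing is ever cleaned.
func safeEtcdKey(resource, namespace string) (string, error) {
	for _, seg := range []string{resource, namespace} {
		if seg == "" || strings.ContainsAny(seg, "./") {
			return "", fmt.Errorf("invalid key segment %q", seg)
		}
	}
	return "/registry/" + resource + "/" + namespace, nil
}

func main() {
	// Normal request: list pods in the "default" namespace.
	fmt.Println(etcdKey("pods", "default")) // /registry/pods/default

	// Malicious request: the ".." namespace cleans away the resource,
	// leaving the prefix for the entire etcd keyspace.
	fmt.Println(etcdKey("pods", "..")) // /registry

	// The hardened version refuses the crafted segment outright.
	if _, err := safeEtcdKey("pods", ".."); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Under the hood, path.Join runs path.Clean on the result, which is what resolves the ".." segment; plain concatenation with strict segment validation avoids that entire class of surprise.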
So in a certain set of use cases, you could get the API server to give you access to custom resources that you don't have read access to. So this was bad. We got lucky, but obviously we don't ever want this to happen again. So what did we do? We started checking for dots very explicitly at the lower levels of our code, just to make sure those are never valid in any sense of the word. We just don't let you have that, no matter how you made it through the millions of lines of Go code above those layers. If you somehow get to us with the empty string or just a slash, that's also not valid, because that should never happen. And then we just stopped using path.Join; we use good old string concatenation. And yeah, no more bugs there.

The other issue we wanted to talk about is with Minikube. This came out last week, so you may not have even seen the notice yet. It's actually a combination of two different issues, and the first one affects only Minikube on macOS. Backing up: Minikube is a development environment for Kubernetes. You can install it on Windows, Linux, or macOS to get a mini Kubernetes cluster that you can play around with for development. The first bug meant that specifying localhost as the listen address for the Kubernetes VM actually ended up listening on all interfaces on the host, forwarding any forwarded ports on all interfaces to the VM. That in itself is not great, but not necessarily awful or catastrophic. The real issue came with the combination of this other CVE, where the Minikube VM image had a hard-coded root password, had SSH turned on, and had root login enabled for SSH. So that means anyone running an affected version of Minikube had SSH open with a hard-coded password on the local network, and it could be accessed remotely.
So: remote code execution to the VM, and in some cases, depending on the driver, I think the home directory was also mounted into the VM, which is particularly bad. This was actually an example of a critical vulnerability that we handled but didn't have to do an embargo for. Because we can't? Yeah, because we can't; there's no way to do an embargo. But this was a 9.9. Yeah, this was a 9.9. This was probably, I think, maybe the worst one I've handled in my few years on the security committee. You can look at all the details in our public announcement and in the Minikube GitHub issue.

So, the key takeaways: you have heard what the security committee does and how we handle our issues. If you do find an issue, please report it. We really want to work with you, but please do it responsibly through HackerOne or our email list. That's all the slides we have for today, but we're happy to do Q&A for anyone who has questions.

If you report something through HackerOne, that makes it eligible for the bug bounty. If you don't want to make a HackerOne account and you report something through the mailing list that we would issue a bounty for, we would just ask you after the fact to go submit it to HackerOne. We can't give you money directly, but HackerOne can give you money on our behalf. So that's kind of how that works out. Yeah, any questions? Let's see, I'll walk around.

What about architectural flaws? For example, I had discussions with developers about the service account token before Kubernetes 1.24, which is definitely an architectural flaw, mounting this kind of token into every pod. And the developers needed it. After a while, I simply said, okay, here is a hack on GitHub for my trainings; you can use it to get access to the entire cluster.
And then I got a call from a vendor, and I don't know if I was really responsible for that, but in the next version, the service account token was not the default. And on Microsoft AKS, it's 1.23 now, and it's not fixed there by default. So how do you handle these kinds of things?

Yeah, I can take this one. There's a lot of ambiguity in all this. The flow chart that we showed is, I would say, a very rough approximation. A lot of these are one-off questions that we have to figure out the process for. I don't know if you want to add on to this. Yeah, I mean, that diagram is the special happy path, right? Where there's no, hmm, I'm not so sure about this one, right?

Yeah, just to answer your specific question about architectural flaws, those are definitely things that we want to know about. But oftentimes, like the specific one you mentioned, it's kind of a well-known one where the code owners generally know. If you do find something and you think it's a security issue, please report it; we'll help you. It's okay for us to say, no, it's not, and here's the right person to talk to, the right SIG lead. And oftentimes the SIG leads will already know and say, yeah, we want to fix that and we have a plan for it. So I would generally say report it rather than not.

Yeah, so as a lead for SIG Auth and a component owner for stuff like service accounts: we have to be very careful with these things now, because we have actual users and we can't just go ripping stuff out, even stuff like legacy service account tokens, which we know is a bad idea. We've known it was a bad idea for years now, but we're kind of stuck, right?
But that doesn't mean we can't write KEPs to give people the tools: hey, by default you still get this bad behavior, because we can't break people on upgrade, but here's a new knob that makes it safer, or here's a new style that we're migrating to. With the service account token stuff, I think we started rolling out the fixes in, I don't know, like 1.16. We were just going down a very slow path, because we know it's so critical to pods and we don't want to break people's stuff, but at the same time we want newer clusters, as well as upgraded clusters, to eventually get to a safe place. But yeah, certainly report it to us, even if it ends up just being a doc fix or a blog post to highlight the issue; that's fine. We're not going to get upset if people give us too many issues, because we have the HackerOne triage folks helping us too, so it's not just us. Any other questions?

Thank you, Micah. Do you have some best practices for how to do security in a Kubernetes cluster?

You want to take it? Okay, I'll take it. Yeah, I think there are a lot of best practice guides out there. That specifically would fall under SIG Security: best practices, how to run Kubernetes securely. They're often the folks who also work with the people who write the CIS benchmark, that kind of thing. We know them and work with them, tangentially a lot of the time and closely sometimes. But that's probably the best venue for those kinds of guides, because they're the people who own and write them. We'll often work with them and say, here's a thing. It might be a report, and then we start digging and go, okay, that's a valid report, and here are three other things that are affected that either need fixes or need doc updates.
But SIG Security would be the main owners of that. Yeah, Tabitha, a member of the SRC, is also one of the leads for SIG Security, just like I'm a lead for SIG Auth. We have a lot of cross-membership, so we'll find you the right people.

The question was, do we want to talk about older, sorry, I'm bouncing around on you, older Kubernetes versions. So part of our bug bounty program and disclosure program is that we scope which versions we take reports for. Generally that's just currently maintained versions; Kubernetes has about a one-year period of maintenance on minor releases. Those are the ones that we accept bug bounty reports and make monetary awards for. We don't make awards for unmaintained, end-of-life versions, but there can still be security issues in them, and those might not necessarily affect newer maintained versions. In those cases, I think we'd still ask that you report the issue, even if it's not eligible for a bounty, because it's a known thing that a lot of distributors also distribute versions beyond the upstream end of life, for some longer period. We'd want to work with the community and with our distributor community to notify them and say, this is not something we're going to patch or maintain, but you all are owning Kubernetes for your customers, and we want to help you out and say, here's an issue you might want to know about; you get to go figure this out.

And this is maybe a bit more vendor-related, but we are running clusters with Gloo and NGINX, and I wonder, if there's a security vulnerability in one of those frameworks or libraries, is there a central place to go to find out if something is going on? Or should I go to each vendor, or is there a central location?

I think I can take that. It's really going to be each vendor. If you're paying a vendor, paying F5 or whoever, for NGINX, yeah, work with them.
If you're using an open source project like NGINX Ingress, that's a Kubernetes-owned thing; you report that to the Kubernetes security committee. For anything else, it would be either the respective vendor, if you're using a paid vendor solution, or the respective open source project. So there are things we don't take reports for, even though they're commonly used with Kubernetes, like CoreDNS, etcd, Prometheus, OTel; all those kinds of things have their own process and their own people who maintain those projects and handle those reports. containerd is another one, right? Almost everyone uses that, or CRI-O, with Kubernetes, and they have their own security processes. We sometimes coordinate with them; sometimes there are weird coordination bugs that are specific to Kubernetes and CRI-O, or Kubernetes and containerd. But generally, if you think it's Kubernetes, please tell us, and if we find that it's actually containerd, we know those people too and can forward it to them. But if you know, okay, it's with this specific project, you'd follow up with either the project or the vendor.

Yeah, and maybe to add to that, just what we do internally: sometimes we'll have our own mailing list with our security folks, and we'll subscribe that mailing list to all the vendors that we interact with. That way we have one place; when we see an email notification, it goes to our security folks, and then we can figure out, oh, I see containerd had an issue, all right, I'm going to go find our containerd person and ask, what do we do? Does this impact AKS, yes or no? Do we have to notify customers, or is it something we can fix on the back end, and all that, right? Just as Micah and myself are on the SRC, we're also on the other end at our employers; we handle the inputs there. Any other questions?
So the HackerOne bug bounty has been active for about three years, and it looks like maybe $50,000 has been given out, which is pretty cool. But across three years, that maybe feels kind of small, having done a couple of security audits now on the project. Do you feel like the project is getting enough attention, both on the generating-issues side and on the response side? How are you all doing on sustainability within the SRC?

I'll start, yeah. I think we do get a steady stream of reports, and a lot of them are to the point of architectural weaknesses. Sometimes it's not an architectural weakness. There was recently one, and I don't know that it came through a report to us, but I know it went through us and a bunch of other vendors, on very powerful pods in a cluster: if you have a powerful DaemonSet, you can privilege-escalate to other nodes. That's not necessarily a Kubernetes vulnerability, but it's a pattern that, at the time, was copied and used a lot, and it was an insecure pattern. But to your question about engagement, I think we do get pretty good engagement. We do get some reports; a lot of them are just lows. And I don't know that I would say Kubernetes can't have a critical issue; I definitely think it could. I think it's just that, between our security audits and more people looking at the code, we've found some of these issues, but we just haven't seen more of the higher-end ones come through. A lot of them are on the lower end, a single-node DoS or something like that. And I think it's also just a matter of complexity: the amount of effort, knowledge, and skill that you would need to find a critical in Kube is kind of high. And I'll ignore nation-state actors; they won't report it to us, they'll just use it. So I think it is some of that.
We definitely want to encourage people to look for those, try to find them, and report them to us, but it hasn't really happened.

On the second part of your question, Tim, on sustainability: I've been thinking about this a lot recently, just because that's not a lot of people on that list, right? I have two high-level plans, and this is just in my head, not really formally discussed, but I'll say it here. We're thinking about whether we should have some folks with a different role within the committee, more of a product manager role, because we don't have explicit deadlines; we don't have releases and things to keep things moving. So it's very easy for us to get consumed with our downstream or even other upstream commitments. I personally struggle with all the paperwork involved with a bug. If there's an interesting bug, I will go fix it in a day, but then it'll take me three weeks before I can write the doc on it, because I just don't like doing that. But you know, those are human issues.

The other aspect is that list of eight or so people: we are all relatively senior, later in our careers, and what that also means is we have a lot of responsibilities already. So I've been hoping to come up with some plan to have more junior folks join, who obviously will have more to learn, but will also have more time and energy to put forth. And when we had, I'm blanking on the name, the yearly SIG health thing, what is it called? The annual report, right? The annual report was asked of us recently, and I was like, I don't think we have anything to put in this. And that doesn't sound like a good sign.
All right, one thing I would like to have is better tooling and process around private releases and such. If something was critical, how would we maybe issue, not a private image, but an image that has not-yet-open-source code, to our community? We can do this for distributors, but we do it by just handing them the patch and letting them do it, right? But if it was critical critical, we would want to protect the community better too.

So, one minute; anybody have a last, short question? I think we'll call it. Thank you, everyone. Thank you for coming. Thank you.