Hi, I'm Pushkar. And I'm Nadir. So I'm a staff engineer at VMware, so I work on all sorts of Kubernetes bits and pieces. I've also been a maintainer for Cluster API, which is sort of like a declarative way to create Kubernetes clusters using Kubernetes itself. Yeah, and we're here to talk about how some animals like to ride some bikes. Well, not really. But I've gotten a lot of questions about the title and people are like, what's up with the animals? So they actually refer to three projects. Each project's logo happens to be an animal, officially or unofficially. So the first group that we actually work with is CNCF TAG Security. I'm a tech lead for that TAG. You might have seen some of my friends on the keynote stage in the morning. Our logo is a raccoon. That's why the raccoon is in the title. We have another group that was involved in this whole process, which talks about how we secure things in Kubernetes, called SIG Security. And their favorite animal happens to be geese. And for the next one, I'll let Nadir go. Yeah, so for those of you who are fans of Terry Pratchett, this world is a big flat world on the back of a turtle. Terry Pratchett was famously asked, what's underneath the turtle? Somebody said it's turtles all the way down. So given that Cluster API uses Kubernetes to build Kubernetes clusters, we thought, well, what's underneath that? Well, it's also Kubernetes all the way down. Hence the turtles. Yeah, so how did we get started? So back in 2019, I was in a field engineer, sort of more customer-facing role, but I was a user of Cluster API. I worked on the Cluster API project. I'm a sort of nosy person, so I like to spy on the CNCF TOC. And I noticed they were doing security assessments for various projects. I was like, can we get one, please? Can I? And not much happened. And the reason for that is I wasn't really plugged into the CNCF that well. I didn't really know that many people.
So it kind of just stayed there for a while. And that's because, like, you know, I'm not chopping wood, carrying water, as was mentioned in the keynote. I'm not active in that space. I didn't really have the time for it at the time. So we just kind of let it slide and let other things happen. And then, April 2021, I met Pushkar, and finally we got an issue open. Maybe let's do one around a Kubernetes subproject. So for those who aren't familiar with Kubernetes, there's an overall Kubernetes project, a big code base, and there's lots of little subprojects for various things underneath there, organized through special interest groups. Cluster API is part of SIG Cluster Lifecycle. And we finally got... Yeah, exactly. And with security audits, one of the things was, do we want to increase the scope or limit the scope? So I'll talk more about that. The other thing was that Nadir, as you can guess from his accent, is not from the San Francisco Bay Area. And I'm from the San Francisco Bay Area. Most of our TAG was based in North America. So we really had a problem: if we had to do sync meetings and discussions, he would have to sacrifice his evening time with family, which wasn't great. Me being in North America, I was privileged enough to talk and meet them at 10 a.m. So eventually, what happened was we started to collect a group of people. Some of us were in the Bay Area, like Robert. And we had Nadir in the UK. We have Fabrizio and Lubomir. Yeah, so Fabrizio is one of the maintainers of the Cluster API project. Lubomir is one of the maintainers of kubeadm. kubeadm is heavily used by Cluster API. It has kind of already been through a security audit through the main Kubernetes project. But because it's a core dependency and we use it in interesting ways, we really wanted some subject matter expertise from Lubomir. We also had Ankita, who was my sort of co-conspirator on the Cluster API side. He was based in Bangalore. Yeah, so we had a group of people.
We had a project or an idea. But we had to get started somewhere. So as with everything in CNCF, it started with a GitHub issue. And the idea was, in CNCF TAG Security, we do security self-assessments for any graduated project of CNCF. So for folks who are not familiar, CNCF is the big umbrella under which Kubernetes and all the other graduated projects sit. And then within those graduated projects, there are multiple subprojects. So the subproject that Nadir was maintaining at the time was called Cluster API. And it was under Kubernetes, and the TAG works at the level of this umbrella. So there was a gap here. And when I went to TAG Security, they were like, well, it seems like a good idea, but we really haven't done it. And I know we have some friends in the Kubernetes community who do great work in security. They're called SIG Security. And I said, yes, I know, because I am part of them. So luckily, being in the right place at the right time, I could then go to the next SIG Security meeting and share with them: you know, I heard from CNCF TAG Security, they do something like a self-assessment. We don't have a process like that for a subproject like Cluster API in Kubernetes. With us being in the same Slack space as the Cluster API folks, being in the same Google groups, being in the same GitHub org, maybe it makes sense to adopt what CNCF TAG Security did and use it in SIG Security. So as you'd see and expect from the community, everybody welcomed that idea, including the chairs. They were like, yes, let's do it. And that was a great thing for me as a newish contributor. And then I could go to Nadir and say, well, we have a solution now. So we created this GitHub issue in TAG Security to keep track of where we are, but the real work actually started in Kubernetes SIG Security. So in all the screenshots, please note the org name and the repo wherever you see it. And here is where we actually started discussing things. And the first thing was, all of us are all over the place.
Let's go async. Who wants to meet at midnight? Who wants to wake up at 7 a.m. and have meetings without coffee? So we thought, let's create a Slack channel. So we created a Slack channel in May 2021, on the 22nd. But still, we didn't really get anywhere. Eventually, Robert, who like me is a member of both SIG Security and TAG Security, asked us: wouldn't it make sense to have maybe one meeting where we just talk, figure out what needs to be done, and then go async? Maybe that will trigger and kickstart things for us. So we did that. And our first meeting was actually in August, which is about, if you count, three or four months after we created the Slack channel. And again, with Kubernetes, just creating a Slack channel needed a GitHub issue, because you don't get to create your own Slack channel. Since obviously we have to follow the process that's established, you get to ask the wonderful people in SIG ContribEx to create those things. And then we had that Slack channel by just creating a GitHub issue, which was great. Eventually, though, we got to where we are right now: how do we get started? And that's where Nadir came in again. Yeah. So, what did we do? So we had our first meeting. We decided to do some stuff asynchronously. I took a look at what some of the other graduated projects who had gone through the process had done. One of the things I noticed is a lot of them had done this sort of self-review of their secure software development practices. So this was initially under the Core Infrastructure Initiative, but it's been moved under the Open Source Security Foundation. If you are a maintainer of a project, you can just go in. You just sign in with your GitHub ID, and you start filling in a questionnaire, and it sort of asks you: are you doing security? Are you doing vulnerability scanning? Are you doing static analysis? Are you doing secure coding? What are your processes for contributions? Security reporting?
So I just filled it out. Are we doing those things for Cluster API? You can see that the current status is not quite 100%. We're not passing, and we'll come back to that at the end. But those were just easy things you can do straight off. Next stage was, oh, that slide didn't work. Anyway. We have one after this. Yes. So that's the wrong way round. That's why. Right. Next thing is data flow diagrams. So I thought I would give this a go asynchronously. So data flow diagrams are a way of modeling the components within a system and where information is traveling between them, which can help a security analyst determine threats and find weak points in there. So I thought, oh, yeah, that's easy. I'll just do it. Ended up with basically a spaghetti meatball. And that's when I thought maybe we need some help here doing this properly. Yes. So the first thing we did is, let's think about scope. So to avoid the spaghetti meatball situation, maybe let's focus on some key areas. Now, Cluster API covers a lot of different infrastructure. You might have just seen an announcement about CloudStack, apparently CloudStack being added two days ago by Amazon. So we can't deal with all of those interactions. Let's keep it close. We could look at sort of hardware trust systems, because you're creating nodes, and how are they authenticating? How are they proving their identity? Tenant boundaries. So you've got different clusters in the same, like maybe the same Amazon account or the same Google project. Like maybe you can jump across different clusters. Core Kubernetes components should have already been covered in the main Kubernetes audit that was done a few years ago by Trail of Bits. So that could be excluded. And finally, certificates. Like, we do a lot of certificate stuff. So I work on Amazon in particular. Also, I work at VMware. I did not want to make this sort of a VMware fest. We could have said, oh, yeah, let's look at vSphere. That ought to be appropriate, right?
Like, we don't want to make it a sort of one-vendor show. We know Amazon is the most popular Cluster API provider. So that's the one that we went for. Yeah. So as mentioned, the Kubernetes security review has already been done. And one thing about scope: Amazon already has a shared responsibility model. So there's no point covering things which are really the responsibility of the cloud provider. Like, oh, what happens if someone steals a hard drive from Amazon's data center? We're not interested in that. There's nothing we can do about that as Cluster API, as a Kubernetes subproject. So keep the scope manageable and sustainable. Yeah. So we took a slice through the system. So if we're creating clusters, we're mostly creating machines, creating Kubernetes control planes out of them, and joining machines to that cluster. So that covers a large enough slice through different systems, like Kubernetes itself and the cloud provider, and then that sort of set the scope of what we're doing. So at that stage it's like, okay, I don't want to do this spaghetti meatball again. So let's start having some meetings on how to do security. Pushkar and Robert joined us. We set up some sync calls. They're all recorded. They're all on YouTube. They're still there if people want to view them. And I'll ask Pushkar to tell you how to do data flow diagrams properly. Yeah. So huge shout-out to Excalidraw, which is what I used to draw this diagram. It's still a complicated diagram. Like, I won't lie, but it felt better than what we had started with. And the idea was, if we have narrowed the scope, the complexity is going to kind of go away. And once it is simpler, then it is easier to poke holes into the flow. What if this particular component that's represented in this block diagram gets compromised? What do we do? What if there is a man-in-the-middle attack between two components interacting with each other? Is there sensitive data that's being transferred over TLS or without TLS?
What happens if I escalate my privileges to admin? Do I have more control? And can I do more things as a malicious insider? So all of those questions are what we ended up asking Nadir, Ankita, and so many others in Cluster API, and they were very patient. It's not easy when you're continuously being asked questions for multiple hours or multiple days, but they were very patient and it really helped. After the diagram comes the obvious thing about words, which is to write down what you thought and discussed. So what better way to write everything down and start the discussion than Google Docs, where you can come and add suggestions and have much more of an async discussion after multiple hours of meetings that had the data flow diagram as their output. So we shared it. We also started documenting different threats. We used one of the more popular threat categorization models, called STRIDE, which is spoofing, tampering, repudiation, information disclosure, elevation of privilege, and, I missed the other one, denial of service. So we had the threats. We had the initial data flow diagram. We had the assumptions clearly called out. We had mentioned the scope. And once we got more feedback from the community in Cluster API, we also started sharing it with SIG Security. As we came closer to a realistic PR, which is a pull request on GitHub, we thought it might be a good idea to now convert what we have in the Google Doc into Markdown, because everything on GitHub, at least in the Kubernetes and CNCF spaces, is in Markdown. So we converted the Google Doc into Markdown. It was on HackMD. Another set of reviews happened. Again, this took a few weeks while we were all doing our day jobs, all doing all the other existing roles that we are all playing. And then eventually we had a draft PR, which was almost unbelievable when it happened, because it had been so long since we had started this, and we could actually see things in progress and being maybe helpful for the future.
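As a rough illustration of the kind of walk-through described above, here is a minimal Python sketch that treats each arrow of a data flow diagram as a record and generates one open question per STRIDE category, plus an extra prompt when sensitive data travels without TLS. This is not the tooling or the artifact used in the assessment (that work was done in Excalidraw and Google Docs); the `Flow` class, the `review_questions` function, and the component names in `FLOWS` are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass
from typing import List

# The six STRIDE categories named in the talk.
STRIDE = (
    "Spoofing",
    "Tampering",
    "Repudiation",
    "Information disclosure",
    "Denial of service",
    "Elevation of privilege",
)


@dataclass(frozen=True)
class Flow:
    """One arrow in a data flow diagram (hypothetical model)."""
    source: str
    target: str
    data: str
    sensitive: bool
    uses_tls: bool


def review_questions(flow: Flow) -> List[str]:
    """Return one open-ended review question per STRIDE category for a flow."""
    edge = f"{flow.source} -> {flow.target}"
    questions = [
        f"[{category}] How could this apply to {edge} ({flow.data})?"
        for category in STRIDE
    ]
    # Extra prompt for the case raised in the talk: sensitive data without TLS.
    if flow.sensitive and not flow.uses_tls:
        questions.append(
            f"[Information disclosure] {edge} carries sensitive data "
            "without TLS; can it be intercepted?"
        )
    return questions


# Hypothetical flows, invented for illustration only.
FLOWS = [
    Flow("controller", "cloud API", "machine create request",
         sensitive=True, uses_tls=True),
    Flow("new node", "control plane", "join token",
         sensitive=True, uses_tls=False),
]

if __name__ == "__main__":
    for flow in FLOWS:
        for question in review_questions(flow):
            print(question)
```

The questions are deliberately phrased as open prompts rather than findings, since the point of the sessions was to ask the maintainers what would happen, not to assert that something is broken.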
One of the discussions we had in SIG Security is, this is a massive document. It was about 1,000-plus words or lines or something like that. And we wanted to see what is the best place to keep it. So there were a couple of ideas. Why not keep the security assessment doc where the code is? So that would be Kubernetes slash Cluster API. Another idea was, what if we do more self-assessments in future? Wouldn't it make sense to have all the self-assessments in one place? Which is why we ended up putting this self-assessment into Kubernetes slash SIG Security. And it had its own dedicated folder called SIG Security Assessments, and then every project would get its own dedicated folder. So if you want, you can take a look at previous assessments and learn from them when you're doing a future assessment. After that, the main goal was: now we have the PR, we have about 72-plus conversations on the PR. It's probably in a good enough state where we can start figuring out what to do next. And we were doing all of that. We were trying to figure out, is there a way to handle any kind of changes in roles, changes of personnel? And soon enough, we actually found out that that was the case. So good news for me, slightly bad news for Nadir. I was working on a 50-50 split between downstream and upstream when I started. And by the time we had the PR open, I had a 30-70 split in favor of upstream, which meant I could do more of these things, spend more time on upstream, and help not only the people who are in Cluster API, but all the end users of Cluster API who are going to benefit from any improvements we make. And for Nadir, I'll let him speak. Yeah, sadly, I moved completely downstream. So I mean, I'm still acting as a consumer of Cluster API, but I'm not a maintainer at this stage. Well, I'm not a maintainer with the Cluster API AWS project, which is what I was doing at the time. So there's no point doing that security audit if there's no handover at all.
So at that stage, that's where I pulled in. So when we started this, I was not a maintainer with Cluster API. I was just interested in sort of driving it. So at that stage, we pulled in the maintainers and said, look, we've done this security review. You're going to have to own it from now on. So let's make sure it's everything that you can agree on and carry forward, basically. Yeah, so one of the things that we did do for the Cluster API project is, because it's being used in quite a lot of products today, and it is being used to create Kubernetes clusters, we made the case that it's actually sort of a security-critical project at this stage. So we... CNCF, Robert very kindly helped us get funding from the CNCF to get Ada Logics, who do have a talk, not later today, but on Friday at 4 p.m. So I won't go into details of what we actually did, but they spent a month setting up fuzz testing. We're now running on Google's OSS-Fuzz infrastructure. So we have continuous fuzz testing. The maintainers get an email, probably like twice a week, saying, found a new edge case, and we make a determination of whether it's relevant or not. Yeah, so the final outcome is, well, we've got the merged PR, we've got the fuzz testing. One of the other things that we did: all of the core issues that were found, we filed as issues with labels. They're all labeled security self-assessment, and they're all on the Cluster API project. Some of those have been dealt with, some of them haven't. Yeah, so one of the issues that we've found is we don't have a good process for security vulnerability reporting for Kubernetes subprojects, because the maintainer base tends to be smaller. So what does it mean to have a 24-hour response time when there's only three or four people? So that is something we need to think about as the sort of CNCF community. It's all very well having Kubernetes core, which does tend to have like a larger maintainer base, with the security process.
But as we break Kubernetes up into lots of smaller projects, we need to do the same thing now for those subprojects, so we need to figure something out there. That's been one of the main stumbling blocks. And also, we probably need some help from security experts to prioritize what are the most important issues here. So there are a bunch of security issues that we've found. What are the most important attack vectors? So we still need some help with prioritization. Yeah. Yeah, so we finished the Cluster API assessment. We have a list of things to do. Does this mean it's over? Well, it turns out it was the start of something even greater. So after the relative success of the Cluster API self-assessment, we thought, why not open it up for everyone? And we actually are doing that now. So there is a talk tomorrow from SIG Security, where we will focus specifically on what the security self-assessment subproject does, how you can get involved, and how, if you are a Kubernetes subproject, you can do a similar self-assessment like we did for Cluster API. I'll pass it on to Nadir for the rest of it. So I'll just cover some... What did we learn in terms of doing this for the Cluster API project? So find a co-conspirator, a fellow goose, a fellow raccoon that can help you on your journey to getting this security review. Make sure to get maintainer buy-in. Don't just put yourself outside and expect anything to happen. You need to speak to the maintainers. If you're an end user and you're using it and you're worried about security, just make sure you get the maintainers' buy-in on that. Choose a manageable scope. You can't boil the ocean. You've got to start somewhere, and then maybe once you've dealt with those most critical things, you can start extending the scope to other areas of your project. You need to balance working asynchronously and synchronously, like, account for the subject matter expertise that you need to bring in, and time zones.
Yeah, so actually, one thing we didn't talk about much. One of the things we did during this... Thanks. One of the things we did during the drafting process is, Pushkar and Robert came up with a bunch of attack vectors. We came in and actually reviewed those as well and said, okay, this one doesn't really make sense in the context of how the project works, or this thing is out of scope, or we defer that element of security to an external project. There are things you can do with OPA or Kyverno to secure the Kubernetes cluster with something stronger than RBAC. There's limits to what you can do with Kubernetes CRDs. So, making it more meaningful: we didn't really want to do security theater. It would be really easy. We'd just file a bunch of CVE reports, whatever. That's not terribly useful. We wanted something meaningful that says, like, these are things that are in the architecture of your project, so we were involved during that whole report drafting process. It wasn't just simply external auditors finding a thing, and then we'd sort of just fix some CVEs afterwards. The process works best when there's vendor sponsorship. Now, VMware was mostly involved, but it wasn't a formal thing. It was mostly me driving it. I told my managers, I think this is important, we're going to do it. So even if it's informal, you need to dedicate some resource to this. And really, if the vendors are building products off the back of this, it should be them. Yeah, and finally, the OpenSSF Best Practices thing: any project can do that, and it is fairly straightforward. You can just run through the questionnaire and see where you are. And there's probably going to be some easy things you can do, like set up some static code analysis as a GitHub Action and make sure the developers are using MFA keys when they're doing releases. Yeah, and then finally, just something we need to figure out. Oh, no. Yeah, if you're a critical project, consider going to the OSS-Fuzz program.
It's probably worth following those instructions to implement the fuzzers anyway. But if you are security critical, get onto the program. That means your code is always being fuzz tested continuously on Google's infrastructure, which is great. And yeah, just a final point: we do need to figure out security reporting for the Kubernetes subprojects, and I'm hoping TAG Security and SIG Security will help us with that. So, open up for questions. Only one rule: you ask, we try to answer. Yeah, so I'll try to summarize the question in case it wasn't audible for the people watching the recording or watching the live stream. For folks who are starting new, obviously you seem technical. What would be your recommendations for folks starting new who want to do something like this, either as a security assessor or as a maintainer of a project? So, you want to go with the maintainer part first? Actually, I was hoping... Well, I think there's two parts to that. In terms of getting started with the project itself, I think that is probably up to the maintainers to produce relevant documentation. And then if you're interested in the security perspective, maintainers probably need to provide some basic architectural documentation about how their stuff actually works. So if you're... But I think there's also a slightly different question. Even maintainers themselves are not necessarily security focused. So I'm actually interested in your answer to that. Yeah, for sure. I would say threat modeling seemed, for me, when I started, like something I would never be able to do. It felt like, wow, only the people who are experts and have hacked multiple systems in the past would be able to threat model. Turned out that wasn't the case. I haven't hacked multiple systems, but I'm decent enough with threat modeling. So your journey would be different. I can share my journey quickly.
For me, it was reading up about what threat modeling is, watching some well-known, well-documented sessions that explain what threat modeling is and how you can get started. Talking to people who have done this in the past and asking them, can I shadow you when you're doing a threat model so I can see what you're doing? And then asking them, can you shadow me when I'm leading the threat model? So those things helped. Eventually, once the technical piece was done, the harder piece, I felt, was the unwritten rule where developers are not very happy with the way security people think they are, and security people are not happy with the way developers think they are. So in the beginning, one thing I promised myself is, I'm not going to be here to stop, block and shame the Cluster API team. The main idea was, how can I be friends with them? How can I ask them questions which are more molded in curiosity versus molded into, how could you do this, or why did you do this? It was more about, I'm curious, if this particular system worked this way with this system, what would happen? So those kinds of discussions, I think, brought them some level of comfort, where it felt like, okay, this is not a meeting where I have to fight and be defensive, but this is where something good is going to happen for my project. So that's my take, but your journey could be different. If there are no more questions, thank you so much. Thanks.