Thanks for coming to this talk this afternoon. Let me introduce ourselves. This is the SIG Architecture, Kubernetes, intro and update talk. I did want to ask you all: what do you think this talk is about? Anyone have an idea? Good. It's not architecture in the traditional sense as such. We are not going to go deep into the internals of Kubernetes, but we'll talk about how we make architectural decisions, how the community operates, and how we go about the things that are not specific to any one area or special interest group in the community. This is an intro and an update, so we do this kind of talk to welcome all of you into the community, to take part in the processes we have, and to help make Kubernetes better for everyone going forward. We'll talk about things like: how do you make enhancements in the community? How do you add new features? How do you make sure the old features keep working? Just installing Kubernetes is not the end of the story, right? You have upgrades and maintenance, everything comes into the picture, and then it becomes really hard, just like any other software. So SIG Architecture is the place where we talk about these things and work on practical solutions: how to make sure we don't break people, and how to make sure people are getting the things they need from Kubernetes. Hope that helps, just warning you ahead of time. My nickname is Dims; you can call me Dims. I'm on GitHub and Twitter, it's the same Dims everywhere. John? I'm John. Dims and I are two thirds of the SIG Architecture co-chairs. John is from Google; I currently work for AWS. So let's take a quick look at the goals of the Kubernetes project itself. You've seen this before. You've seen how Kubernetes is being used by everybody, everywhere.
People have called it, I think it was Priyanka in one of her keynotes, the "Linux moment" of Kubernetes. We try to be very flexible, extensible, automatable. We have ideas around how exactly people should design their APIs, declarative patterns, how to use YAML for all the things. You've seen all those things, and that's why you are here at this conference: you want to use Kubernetes and you see the value in it. These are all things we think about while we go about our tasks during the release cycle. We also have a few community values that we adhere to. If you look at the short URL at the bottom, you have the direct link to what our values are, but, summarizing: we want to be sure this works for everybody. If it just works for Google and AWS, that's not enough; it has to work for everybody in this room. We try to do as much automation as possible, not just for the community's work, but also to make sure you all can do your own automation in an easy fashion as well. We want to be inclusive, so we try really hard to listen to multiple voices. I'll give you a call-out at the end about some of the things we are looking for feedback on from you all, to shape our forward-looking work. For example, there was a keynote today on the working group for LTS. Where does that happen? It happens here. There was a survey called out this morning, and we'll talk about that a little bit again today. And we want to keep evolving. If we stagnate, there isn't much fun in it for anybody, right? Kubernetes has to evolve and keep up with the times. Just think about this week: what are the new things people are asking from Kubernetes?
ML and AI are new things that were not around as much a year or two ago, but now people do want to run ML and AI workloads on Kubernetes. So we have to figure out how to translate that into actual things we need to do as part of our community work. We have to keep evolving, basically; that's the point we are trying to make. So, it's important to understand the structure of how we do work in the community. I mentioned special interest groups briefly. You can think of special interest groups as domain experts: networking, storage, security, scalability, Windows, authentication, CLI. Those are all areas where people know what they're doing. The SIG CLI folks are writing the CLI. The authentication folks are working on authentication and authorization. The networking folks are deep in the weeds on how things need to work for all the network use cases you have. But then, how do you tie all of these together? That's where SIG Architecture comes in. If you look from the top, CNCF is the foundation we live in. Under CNCF, there is the Kubernetes project. The Kubernetes project is run by the Kubernetes Steering Committee. The Steering Committee essentially delegates responsibilities and roles to the different special interest groups, and the technical things they have delegated to SIG Architecture. Essentially, they are telling SIG Architecture: you are responsible for making sure all the SIGs are working well with each other, and if there are conflicts between multiple SIGs on something, you have to step in, help moderate, and come to a conclusion. So we have a charter for SIG Architecture, a charter the Kubernetes Steering Committee has approved, which gives us authority over the things we do in the project.
Does that help? Does that make sense? Okay. This is important for you all to know because, if you want to do some work in the community, you need to figure out: what kind of role am I looking for? What kind of area interests me? That helps you navigate, so you can say, yeah, I think I'll go spend some time in networking, or storage, or runtime, or node, things like that. Okay. So, like I said, we have a charter, and our charter has these things listed. I'll let you read it briefly and digest it a little, and I'll walk you through a few of the things here. Let's take the first one: conformance test definitions. What does that mean? All of you want to use Kubernetes. All of you have different vendors, or you are rolling your own Kubernetes. How do you make sure that two Kubernetes clusters, maybe from one vendor and another vendor, or a managed cloud provider and a self-deployed cluster, behave the same? You could be using kubeadm, minikube, kops, K3s, k0s, something else. How do you make sure your applications still work on all of them? You see the problem: if everybody starts making their own changes, your applications are not going to work. The way we enforce this is with conformance tests. In a conformance test, what we end up saying is: here is an API of Kubernetes, and here are the things that must work exactly the way we have defined them, across everyone. Those are the conformance tests we write, which we then use in a conformance program. All the vendors, if they want to use Kubernetes in their name, for example, I'm from AWS, so the AWS managed Kubernetes service is called EKS, and same thing for Google, there is GKE: if either of us wants to use Kubernetes in the name, we have to pass the conformance tests.
And we have to publish the results publicly, and they must be replicable by other people. That is the bar for using Kubernetes in your name and your product, so that any customer can be sure that the Kubernetes they get is the Kubernetes they expect. Does that make sense? Similar to that, we help out with design principles and the deprecation policy, and lately we've been spending a lot more time on production readiness reviews, which John will talk about in a little bit. And I did hint at how you make enhancements: we have an enhancement process. It's not enough to just throw code over the wall, right? Say you're interested in something, you have a 10x programmer who can write a bunch of code, and okay, "we want this new feature in our product," and they go and throw it over the wall. What do we do with it? We don't know what to do with it. We don't know you from anybody else. What's going to happen if there is a bug in the code you gave us? We have to get deep into the details: why are you doing this? How are you doing it? What kinds of things did you think about? Is this the only proposal, or is there an alternative proposal? What is the effect of this new feature in production? How do you track the logs, metrics, and all the other things you need for your production clusters? We ask those questions of the people who work with us. That is part of the KEP process, the Kubernetes Enhancement Proposal process. We talked briefly about some of these things. Test review and management is also under the purview of SIG Architecture. We have Jordan here, who helps with API reviews. API reviews are very important for us, because imagine the APIs keep changing and they break: you code to an API, the API changes under your feet, and then the thing you wrote is not going to work. So there is a natural progression for how we evolve the APIs we have over a period of time.
And from our experience, we have written down exactly how you need to think about this. I'll give you a simple example. Right at the beginning, Kubernetes had only one IP address per pod. Then people wanted two IP addresses, or an IPv6 address. So the API changes, the CLI changes, and how do you make sure the evolution of the API is backward compatible and future-proof? Those are the kinds of things we do in the API review process. We talked about the KEPs. This is a process built around a living document: we write it down and keep updating it as we go through the different stages. There is an alpha, beta, and GA progression that we go through to make sure existing code is not broken, so we add feature flags and things like that. We also make sure that when we are deprecating things, people know. One of the things you might have heard in the last year or so: in 1.24, we deprecated and removed Dockershim, so you can't use Dockershim anymore. Okay, I see one person smiling. We literally plastered that information across every channel we have, whether it's a Slack channel or the website; anywhere you went, you could not miss the fact that we were deprecating it. We don't do that for everything, but that was an extreme case where we had to, because we knew it would affect people upgrading from older versions, and we wanted to make sure they saw it and could plan ahead of time. So the deprecation process is very important. Version skew is very important too, because you have to upgrade, and if the upgrade doesn't work, you should have some choices about what you can roll back to. And during the upgrade process, if you have a lot of components, how do you handle that?
You might not be able to upgrade all of them at the same time, so what is the skew that will work between components within a version range? So, what kinds of debates do we end up having in SIG Architecture? When technical leads, chairs, and owners of different areas have disagreements, and they're not sure because there might be trade-offs, we end up moderating those discussions and making sure we take the right decisions. We refer back to our values, refer back to our charter, and at the end of the day we have to pick a way to do something. There are going to be consequences. How big are the consequences? Who pays the price? What do we need to do, and when? How much comms do we need to put out? All of those things we end up talking about in SIG Architecture. We have a mailing list, for sure. We try to write a lot of these things down, and when in doubt, we write a policy about something or other. But not everything is written down, and things keep changing in the world. A while ago, we didn't have a policy around when we update Go versions, or whether we change the Go version in older releases that we maintain. So we started writing one, because it started bothering us more, especially with the CVEs coming out in Go and elsewhere. We said, okay, fine, we need to do this, and we wrote it down. And one of these days, people will come and say, hey, I have this exciting new CPU architecture, can I add it to Kubernetes? Yeah, you're welcome to, but here are the steps. You have to start here, make sure it builds fine, then you have to go through this set of tests, and then that set of tests. Then we might think about making a release where your architecture is supported. Or take an operating system like illumos.
There are operating systems that we don't currently support, and we did have to make a call on Windows with ARM, sorry, not ARM in general, the ARM32 variant: we're not doing that anymore. We have to make those kinds of decisions because keeping these things up is hard too. If you don't have CI for it, what is the point of supporting an architecture or an operating system? That's a call we need to make, and that is the kind of decision and discussion that happens in SIG Architecture. Okay? Do you want to go from here? Yeah, I can go. So, to get all of this done, the general structure within the Kubernetes project overall is that you have the SIGs, and then underneath the SIGs you have sub-projects that are narrower in scope. Within SIG Architecture, each of the things Dims has been talking about falls into a different sub-project, as you see here. Some of these he's covered pretty well already, but we'll drill into a few of them as we go. API review, Dims mentioned. All of this comes down to: what are the processes we use to control the flow of features into Kubernetes? Because every feature adds risk and every feature adds capability, and we need to manage that risk-capability trade-off. API review is about consistency among the APIs, so that when you go to a storage API versus a network API versus a compute API, they have the same flavor, and about how those are maintained. If you're interested in this area, we'd love for you to get involved; jump into this sub-project. This is actually one of the coolest things you can do in Kubernetes. It takes quite a bit of depth and understanding of the system and of Go, but it's maybe a pinnacle of achievement for people in the community to get to API review, because you end up doing a lot of design review on new features as well. Another sub-project is code organization.
Here you're thinking about dependencies and dependency management: things like when Go is upgraded, but also, we have hundreds of dependencies, so how do we reduce them as much as possible, and how do we manage them as they change? The enhancements process is the overall organizational process to control that feature flow. Some contributors kind of hate us for it to some extent, but the reality is that without a controlled enhancements process, everything would go to hell pretty fast. Excuse my language. So we control things in the way Dims was talking about earlier. Conformance testing is another sub-project. What happened is that, as Kubernetes grew, conformance came in a little bit later, and of course that meant there was a bunch of technical debt and a bunch of missing conformance tests. Over the last several years, the CNCF has been funding a group of engineers to build out the conformance test suite, and we are at something like 99.6% coverage of the GA APIs, almost there. If we've got folks from IEI here, thank you very much, it's been awesome. We're super close, and that's really critical for user workload portability, which is of course key to the Kubernetes story. A little more on production readiness review: this is another part of that feature-flow process. We have how many contributors? 10,000? 15,000? An enormous number of contributors, and everybody's got ideas. So we have to have processes to manage that flow, but not only that: as those ideas mature, when they first come in, when you come up with this crazy idea, we need you to bring it in at an alpha state. So what does alpha mean? Production readiness reviews help define what the real hard constraints are, and what the real hard constraints of each of these stages mean.
Pretty much, at alpha, what we look for in a production readiness review is: you can turn it off or on with a feature gate. If you turn it on, use it, and then turn it off again, you'll be okay. And if you turn it on yet again, you'll still be okay, because there are lots of times when, say, you upgrade, turn it on again, and all of a sudden it's broken and causes some cataclysm. So that's the goal at alpha. At beta it ramps up: we want metrics, so people can actually monitor it and tell that it's working the way they expect. And of course at GA the bar goes even higher. So this is a process by which each KEP, each enhancement proposal, is reviewed at design time, to ensure people are thinking through these sorts of questions. Basically, the point of view is: I have to run 50,000 clusters; how can I tell whether something is working or not? All right, we have 10 minutes left and we want to have time for questions. Another big area of focus for SIG Architecture is helping to guide and define, well, really it's the other SIGs. SIG Architecture itself doesn't actually write a lot of code or own a lot of code, but what we do is help the other SIGs, coordinate them, and guide them on what principles we should use for extensibility within Kubernetes. That's part of that overall design-principles concept. Over the last several years, there's been a big focus on doing less in-tree, meaning as part of core Kubernetes, and doing more using the extension points the community has built. Do you have a comment on that?
This is a key thing that we do, because if you look at the CNCF landscape, you can see so many projects, and guess what: all of them have some sort of story with Kubernetes. They extend Kubernetes in some way, they live within Kubernetes or run on top of Kubernetes, they use all the extension mechanisms we provide, and that is because that is how we designed Kubernetes. Right, yeah. You can read this: lots and lots of different ways to extend Kubernetes, and you're probably familiar with many of them. So where is Kubernetes going in the future? I think we're going to continue with extension points. We've already built out most of conformance, and now there's a policy that a feature doesn't go to GA unless it has conformance tests, so we maintain conformance. Something that's not on here is what Dims mentioned earlier: work related to AI and ML workloads. I thought you said something on LTS? Yeah, yeah. How do you participate? Come to the meetings, we are easy to find. Join the Kubernetes Slack; we have channels out there where we really want your opinion. We want your use cases, your pain points, and what you wish for from Kubernetes. And one call-out we definitely have for you today, right now if you can: tell us where you are in your Kubernetes journey. What sorts of Kubernetes clusters are you running right now, and what kinds of pain points have you seen when upgrading Kubernetes? Right now our policy is that any release will be supported by the community for about a year, and if you have clusters older than a year, you're out of options, at least from us.
And then you depend on your vendor, or you depend on somebody else, especially with all the security stuff coming down the pipe. You've seen all those vulnerabilities; every week there is something new, which is scary. So don't run clusters with unsupported Kubernetes; you should update to newer versions, and you should tell us what pain points you see when you do that. Please go through the survey and tell us. We are collecting this information so we can figure out how to support you in one way or another. We can't promise exactly what we will do, but the data we collect from you is going to inform a lot of the decisions we make, and the more and better data we have, the better decisions we'll be able to make. Yeah, exactly. So this is your easiest opportunity to influence the long-term support initiative within Kubernetes. Of course it is an open source community, a very open, friendly community, so if you have a lot of concern in this area, please join the working group. Jordan, over there in the back, is a good person to talk to about that if you want. That's one of the major initiatives going on. That's what we have, but we have six minutes for questions. What has bothered you about Kubernetes? If you have a Kubernetes story, a Kubernetes failure story, go ahead, please. Yeah, I have a question. Yeah. You have the microphone. There is a mic you can walk up to; you can queue up too if you have an additional question. Check, check. I have a question about security. Practically nowhere throughout the presentation was there much about security, so is it something you actively think about as an architecture group? Yes. We have a Kubernetes Security Response Committee, and they handle the pre-disclosure embargo and those kinds of things. We have a vulnerability disclosure process.
We've written things down saying: when you see something you think is a security issue, here are the steps you go through to report it to us. And we have a program with a set of folks who look at the incoming queue, talk to the different vendors and the different people doing security, figure out a plan for how to address it, come up with an embargo date, and come up with a set of patches. We do this all the time; we are really good at the CVE process right now. Thank you. I was actually asking not about vulnerabilities per se, but more about the general architecture, like we have network policies. Yeah. But to do that, you need to collect them. Absolutely, and we've done several of those things in the past. We have a SIG Security, which helps with these things, and we have a security audit that CNCF pays for every year. We are slowly expanding what it covers, and every year we find fewer and fewer things that we need to fix. So yes, it is a constant, ongoing process. We have a SIG Auth, we have a SIG Security, and we have the Security Response Committee. We have the audit. So believe me, between all the vendors poking at things plus all the community infrastructure we have, I can say in good faith that we are trying to be extremely careful about what we are giving you, so that your stuff is better off. Yeah, I think the way I would put it is: the people who tend to be involved in SIG Architecture have been around a while, and so have a viewpoint that takes security into consideration, but the hard details of that fall to the other SIGs. You've got SIG Security, which has the real security expertise; you've got SIG Auth, which handles things around authentication and authorization; and even within SIG Node there are so many layers to security. It doesn't sit just at the architectural layer: you've got privileged mode on the pods versus not.
Those sorts of things fall into the individual SIGs. Right now we are running CTFs somewhere in this conference center. What's a CTF? Capture the flag. Essentially they give you a set of instructions to go look into a cluster, and you have to break into things and find things, and then basically we close the loopholes. Thank you. Any other questions? Any Kubernetes failure stories? We love those. I don't know if you heard the story from Datadog this morning in the keynote. Yeah, I see one nod at least on that side. That was a very interesting one. It was about unattended upgrades bringing down their entire production system. It didn't have anything to do with Kubernetes per se, but it was a really good story nevertheless. So we are constantly looking out for how you are using Kubernetes, what you're using it for, and what pain points you're seeing and facing. That's how we learn, and if we learn, we can apply it to the next set of things we ship. That's why we need your help, okay? All right. Thanks a lot, everyone.