Hi everyone. Welcome to the SIG Architecture introduction and update. You might be wondering what SIG Architecture is, what its role in the Kubernetes community is, what we've been doing lately, and what's coming next. If we get some time, we'll talk about those things too. So I am Dims. My nickname is Dims, so you can call me Dims. On Twitter, GitHub, and Slack, CNCF Slack, Kubernetes Slack, I'm easy to find: dims. So hit me up if you have a follow-up later or if you want to get engaged in the community. Happy to get you all on board. John? I'm John. Yep. I'm John Belamaric, and I'm also John Belamaric everywhere. So easy to find if you can remember my name. And so, yeah, happy to chat with any of you anytime. If you came to this talk, you have John's name. Just look up the schedule. That's right. Yeah, let's get started. Okay. So let me go over there, so it's a little bit easier to switch. So these are the goals of the Kubernetes project. Is anybody not using Kubernetes right now? So all of you are using Kubernetes, and you probably agree with the statements that we've made there. Those were the goals that we wrote down in our documents saying these are the things that we would like to see. This kind of helps us going forward, too. It's not just about the state of the project right now; when we design new features, when we think about things that we need to do and how to do them, these are the goals that literally help us figure it out. For example, number three: when we're designing something, if it's just too much for users, too complicated, we need to simplify. Can we go halfway to make it simpler for the users, even if it is harder for us to do on the back end? So we use this as a mechanism to say, okay, fine, you're writing a new enhancement proposal, please take a look at these goals as you do the design.
So we do have some values. These values are very important to us, and you will see them reflected when you come and talk to us on Slack or the mailing list or in PR reviews. This is what we believe in. Does anybody disagree with any of these things? Probably not, right? So this has served us very well. In fact, let me take one thing and poke at it: for example, automation over process. If you go look at how our testing infrastructure is set up, it's running really well, with a few bumps here and there, but mostly things are happening on their own. We write bots, bots talk to each other, and bots talk to people. So that is one example of how we are doing number three, okay? Next one. So Kubernetes is part of the CNCF. The overall structure is, well, if you want to start right at the top for technicality: there is the Linux Foundation, and there is the CNCF, and within the CNCF you have the CNCF TOC, and the CNCF TOC has a bunch of TAGs, technical advisory groups, and TAG Runtime is probably what we come under. And under the TOC are all the projects. Kubernetes is one of the projects; in fact, the CNCF was started with Kubernetes as the first project. So within Kubernetes, we have a whole bunch of special interest groups. Some of them go deep. Some of them go horizontal. I'll take some examples here. Take SIG Windows: they focus on how to run Windows-based workloads on Kubernetes better. Another example: if you take SIG Release, you can see that that SIG owns the release process and coordinates amongst all the SIGs. It fixes the calendar date when we need to ship, and then we work backwards to figure out when there's a feature freeze, a code freeze, and things like that, so that we can have a release together. And we end up defining policies in the SIG saying, hey, we're going to do three releases a year; we're calendar-based, for example.
So those are the kinds of things that we do in a specific SIG that is horizontal. Testing is the same way; it spans across all the SIGs that we work on. And we do have some special committees for code of conduct and security response, and steering oversees the whole thing. In fact, the way to think about it is: when somebody comes to us to do some work, we look at whether it fits in any of these buckets. And if it doesn't, we'll ask them to write a charter. And that charter essentially says: this SIG will own the code repositories that pertain to whatever it is, X, Y, Z, something new you want to bring into the community. Then steering will approve the charter and you begin your work. That gives you the validity in the community to go around asking for resources and talking to people. And there are technical leads and there are chairs, and then you start doing your work and you figure out how you want to do it. If you take SIG Node, for example, they cover how the kubelet works. Storage is similar: how does CSI fit into the picture, for example? So essentially there are committees, SIGs, working groups, and user groups. We are trying to get rid of the user groups and push those out into the CNCF layer, so to say. And this is constantly evolving. We added new things. We took some old things out that were no longer relevant or where no work was being done. For example, there was SIG Service Catalog, and they were not doing anything, so we bumped them out. There are other things which span multiple SIGs. We call them working groups. In working groups, people from multiple SIGs collaborate, and the owning SIGs pay attention to what they want to do and how they do it. John? Yeah, I mean, I think one of the main points with this is that distribution is better than centralization. And so each of these groups will own a part of the code.
The project-level things tend to set policies and processes across the project. So architecture fits within that area. We don't actually own much code, if any. We maybe have a little bit of tooling code, but no real functional code. And so that's kind of where we fit in this. The horizontal pieces are things that affect everyone, like the API server and the API machinery, and the vertical ones are things like resource management. So that's kind of how the project is organized. So specifically talking about SIG Architecture, our scope falls into, like I said, this cross-project area where we're sort of the monks who keep the design principles in line, and we have groups and processes that help other SIGs, as they come to design a given feature, make sure that they're in line with those design principles and keep Kubernetes the Kubernetes that you know. Next. Okay. So like I said, there are different cross-cutting processes around conformance, API review, and so on, and we'll talk about each of these in a bit. Other kinds of issues tend to come up for us too. Whenever there's a question between SIGs about how something should be done, what makes the most sense in the context of Kubernetes as a whole, they typically will bring that sort of thing to us. We're not really an escalation path, but a lot of the people that participate in the SIG tend to be people that have been with Kubernetes for a very long time and have very deep and broad knowledge. And so sometimes there's a conflict between, say, the two chairs in a given SIG, or a couple of TLs in a given SIG, and they're trying to figure out how to resolve it.
We're not going to make a decision for them, but we will have a conversation with them and try to negotiate, or broker (that's a good word), and come back to those values, come back to those principles, and say: okay, if we go back to those principles, if we go back to those values, let's think about each of these viewpoints in that context. So like other SIGs, SIG Architecture is arranged into sub-projects. Sub-projects have owners; those are the people responsible for that area. We saw a list of the different processes a little earlier, and each of those processes is effectively run by one of these sub-projects. We have five sub-projects in SIG Architecture, and we're actually going to drill into each one. Yeah, sure. So how many of you have used CRDs, for example? Right? So CRDs, and all built-in resources, have to go through an API review. And the API review essentially covers cases like: you're adding a field, or removing a field, or converting one. The classic example we have is that a resource, when originally defined, had a field, say an IP address, which was a string. And then you figure out, oh, there need to be many IP addresses, so you turn that string into an array. We've done this before, but there is a set of rules and regulations, because we have to think about version skew, how to support mixed versions of things, the API servers and the kubelets. We have to figure out if older kubectls can work with newer API servers, and so on and so forth. So through time we have accumulated a bunch of thinking around how we can make it easier. Say somebody from SIG Storage comes with an idea, and we say, oh, we've done something similar before, but here were the pros and the cons and the problems, so maybe you should think about doing it the other way, where we had fewer problems for end users, for example.
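The string-to-array evolution described above can be sketched in Go. This is a hypothetical status type, loosely modeled on how Kubernetes evolved the singular `PodIP` field into a `PodIPs` list; the names and the `normalize` helper are illustrative, not the real API machinery code:

```go
package main

import "fmt"

// StatusV1 is a hypothetical resource status. The original singular
// field is kept forever once it reaches GA; a plural field is added
// alongside it instead of replacing it.
type StatusV1 struct {
	// Original GA field: a single IP as a string. Removing it or
	// changing its type would break existing clients.
	IP string `json:"ip,omitempty"`

	// Newer field: a list of IPs. By convention the first entry
	// mirrors the legacy singular field, so old and new clients
	// see a consistent view of the same object.
	IPs []string `json:"ips,omitempty"`
}

// normalize keeps the two fields consistent, the way a server might
// default them during create or update.
func normalize(s *StatusV1) {
	if len(s.IPs) == 0 && s.IP != "" {
		// An old client set only the singular field.
		s.IPs = []string{s.IP}
	}
	if len(s.IPs) > 0 {
		// Keep the legacy field pointing at the first entry.
		s.IP = s.IPs[0]
	}
}

func main() {
	s := StatusV1{IP: "10.0.0.1"}
	normalize(&s)
	fmt.Println(s.IP, s.IPs)

	s.IPs = []string{"10.0.0.2", "fd00::2"}
	normalize(&s)
	fmt.Println(s.IP, s.IPs)
}
```

The point of the pattern is that neither an old client nor a new one ever sees the field it knows about disappear; the cost is that the server must keep both views in sync forever.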
So that is what the API review is, and we have a project board, and we have actually written down the changes and the conventions and things like that. So when any of the SIGs come to us for API review, we point them to that documentation so that they can go read what it says and then come back to us and say, hey, we looked at these things, some of those make sense, but we still want this unique thing that we want to do here; is it okay? And then we end up negotiating with them, talking to them, and coaching them through the situation so that it's good for the users, it's good for the community, and it's useful for the developers. Next one. If I can just say a couple of things about this. API review is one of the most critical aspects of the development process. For the in-tree APIs, the built-in APIs, everything must go through API review. And if you're looking to participate in the project and you really want to get deep into lots of different areas, or even if you just want to participate in, say, SIG Node, working towards being an API reviewer is a great way to get involved. We really need more people to do that, because it can become a bottleneck. We have many reviewers for APIs, but there's only a handful of people that can do the approvals. It requires a lot of knowledge, and you need to do a lot of API reviews under the guidance of those few current API approvers before you're sort of, like, okay, you're good to go. We've scraped our knees and bruised ourselves so many times; that's why we have this review, to make it easier for people who are doing the work. So reduce the burden on them and give them well-defined things that they can follow. Exactly. And for CRDs, if you want to use the k8s.io group, then you need to go through API review. Exactly. So code organization is a massive, massive effort.
So everybody comes to us with a feature and they say, hey, this is a new feature, this is what we want to do, this is going to be awesome, everybody's going to love it, and we vendor the code into the tree and pull things in from GitHub here and there, and guess what? After six months, those people are gone, right? Because they're off doing other things, and there is a set of people who have to maintain the dependencies, update versions, and make sure that things are not broken. And there is this constant thing about, hey, there is a CVE in this dependency and you're using this dependency, my scanner is screaming at me, can you go fix it, for example. So there's plenty of code maintenance we need to do. Another example is, hey, we need to switch Go versions, we need to go to Go 1.18. And in 1.18, can we use all the new fancy things that are coming in 1.18? Somebody has to say, hey, don't do it right now. Let us ship 1.24. When we go to 1.25, then we can start adding generics a little bit. And then we'll figure out how it works, and then we'll expand the usage of generics somewhere else. So maybe we should start with libraries first, because those are easier and smaller to tackle. And we'll figure out how we do the testing. Maybe there are some additional problems that we'll end up seeing with generics. One simple example is, hey, we need to patch something in master and we need to backport it to 1.24, and 1.24 doesn't do generics, right? So those kinds of issues come to us, and we work through those situations together, and there's a constant effort to make sure that the dependencies are up to date and they work, and the people who end up importing Kubernetes code into their repositories shouldn't feel that we're pulling the rug out from under them either, right? So how many of you have projects where in your go.mod files you import things from Kubernetes? Yes? So you are our audience, right?
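The generics backport problem mentioned above is easy to see in a tiny example. This is not code from the Kubernetes tree, just an illustration: the generic helper compiles only with Go 1.18 or newer, so a fix written this way on master could not be cherry-picked onto a release branch built with an older toolchain without rewriting it:

```go
package main

import "fmt"

// keys works for any map type, but requires Go 1.18+ (type parameters).
// A patch using it cannot be backported to a branch on an older Go.
func keys[K comparable, V any](m map[K]V) []K {
	out := make([]K, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

// On an older release branch, the same fix has to be written without
// generics, typically one copy per concrete type.
func stringKeys(m map[string]int) []string {
	out := make([]string, 0, len(m))
	for k := range m {
		out = append(out, k)
	}
	return out
}

func main() {
	m := map[string]int{"a": 1}
	fmt.Println(keys(m), stringKeys(m))
}
```

That duplication across master and release branches is exactly the kind of maintenance cost that makes the project adopt new language features cautiously, libraries first.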
Like, if we change stuff that is going to break you, come yell at us. If the tags that you want are not in the Kubernetes repositories, come and tell us; we'll help you fix those things. That's yours. You want to do that? Sure, I can do it either way. So one of the other sub-projects within SIG Architecture came out of SIG PM closing down, and so we adopted the enhancement process. So the enhancement process, for those of you... how many of you have contributed to Kubernetes? A few of you, right? So if you want to bring a new feature to Kubernetes, you don't just start writing it. You want to have some agreement amongst the community that the feature is appropriate for Kubernetes, and that the feature is designed properly, in alignment with the other features, so people know how to use it without having to read too much documentation, among other things. And so we have this process called the Kubernetes Enhancement Proposal, or KEP, process. This is probably the thing from SIG Architecture that contributors interact with the most, and it is a little painful, right? It's like writing a design document. You've got to write a design document, and there's a bunch of pieces of information you have to fill in, and you have to walk it through a bunch of approvals and gates. So we own that process. There's a sub-project that's trying to make it better. So if you're one of those few people that raised your hand and you don't like how the KEP process works, come talk to us at that sub-project. Come help us make it better. Some of us go through that process. Some of us don't. And feedback is what makes it better. Yeah, this was supposed to be my slide. But even if you wake us up in the middle of the night, either one of us can talk to either of those slides. Yes, good point. Another sub-project: conformance. So if you look at different Kubernetes distributions, like GKE or Tanzu or some of these other ones, right?
Every one will have a certified Kubernetes badge. And if they don't, they can't call it Kubernetes. So that's a program run by the CNCF that allows vendors to submit their test results, and then the CNCF gives them the badge or not. However, it's this group that decides what those tests are. Which tests actually define Kubernetes? Which functions are part of Kubernetes, and which are optional functions within Kubernetes? So where does the name apply? When do you get the rights to use that name? Which tests do you have to pass? We've been working on this for years, and we had a whole bunch of technical debt. There was a team from a company in New Zealand called ii that has done a tremendous amount of work for the CNCF to pay down that debt. And this is not just to bug the vendors into passing these tests. It is to make sure that you all can depend on the output of the project and on the managed cloud providers. You get a consistent experience across all the products you use that say they support Kubernetes, and that is the reason we do it. So it's not for us, it's for you. Absolutely, yes. All right, production readiness review. So this is one of those things that contributors don't necessarily love, but it's important. So two years ago, maybe, as the project matured and there were thousands and thousands of people depending on it to be stable, we introduced a process that a lot of big tech companies use internally, which is production readiness review. Essentially, when contributors write those KEPs, we added a big questionnaire in there, so that as a feature moves from alpha to beta to GA, they have to document how you turn the feature on and off, what the metrics are for the feature, what the known failure modes are and how you detect them. Basically, make it observable, make it supportable, and ensure that people who take the next version of Kubernetes that has this feature don't have to roll back.
That's the main goal. We don't want people to have to roll back their Kubernetes. And again, this is for you, not for us, because we are putting ourselves in your shoes and we are asking the question: hey, we turned on this feature, what metrics can I go look for that are new in this feature, for example? Right, exactly. How do I know the feature is working? Yeah. How do I know if people are using it in my cluster? If I'm a cluster administrator, how do I know my application developers that are using my cluster are using this feature? And you see the big QR code. Every year we do a survey (this is the third survey since we started) to see if the process is effective, right? Because this is a burden on our engineers and on the community. We want to make sure that we're not wasting their time. So those of you that operate clusters, please click on the link and go fill it out. It just asks some questions: have you had to roll back? How many clusters do you have? How many nodes do you have? Do you think Kubernetes is more reliable than it used to be? Yeah, we close this at the end of the month, so please do it today. Yeah. So if you go look at the CNCF landscape, I know it is huge, and how did we get to that spot, right? That is because we were thinking about these things. We were saying, okay, we need to distribute responsibilities. So there is CSI, CRI, CNI. You name it, we have an extension point. We have webhooks, mutating webhooks, validating webhooks; we have all kinds of places where people can integrate. It was not that way at the beginning. If you go look at Kubernetes versions from five years ago and check those things out, you won't see it that way, because everything was in a monorepo, and when any vendor wanted to get anything in, everything was going right into the same bucket.
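The on/off switch the production readiness questionnaire asks about is the feature gate. Here is a greatly simplified sketch of that pattern in Go; the gate names are hypothetical and this is not the actual Kubernetes feature-gate implementation, just the shape of the idea:

```go
package main

import "fmt"

// FeatureGate maps a feature name to whether it is enabled. In the real
// project, defaults follow the maturity level: alpha features are off by
// default, beta and GA features are typically on.
type FeatureGate map[string]bool

// Enabled reports whether a named feature is on; unknown names are off.
func (g FeatureGate) Enabled(name string) bool { return g[name] }

func main() {
	gates := FeatureGate{
		"MyAlphaFeature": false, // alpha: off unless an operator opts in
		"MyBetaFeature":  true,  // beta: on by default
	}

	// Feature-gated code paths check the gate before running new logic,
	// which is what makes a rollout reversible without a binary rollback.
	if gates.Enabled("MyBetaFeature") {
		fmt.Println("beta feature code path active")
	}
	if !gates.Enabled("MyAlphaFeature") {
		fmt.Println("alpha feature stays off unless explicitly enabled")
	}
}
```

Operators flip the real gates with a component flag such as `--feature-gates=SomeFeature=true`, which is exactly the "how do you turn it on and off" question the review forces contributors to answer.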
But over a period of time, we did all these things so that people can do things outside of us, without asking our permission, without needing to put things in our repository, for example. And again, it's exploded the ecosystem. There are literally hundreds of projects out there that extend Kubernetes in one way or another, and that's a conscious decision. There are some downsides to it. For example, the people who used to work on the code now go do something else, right? So we need to figure out how to attract them back too. And that is happening, because plenty of times what we have done is build an extension point. Scheduling, for example: the scheduler is extremely configurable. So people went to experiment outside, and now they're coming back and saying, hey, we did these experiments in like five projects, we were trying different things, and now we think we're at a point where there are some things we need to add to the Kubernetes code, and then we can go change everybody else and change how they use Kubernetes. Again, one funny example: CRDs were an iterative thing. It wasn't a CRD before; it was called a TPR, a third-party resource. That was one more experiment that we ran. We found some rough edges, we figured out how people were using it, and then, incorporating all that feedback, we designed the CRD, and now all of you are using CRDs. So, go ahead. Sure. So, where are we going? Where's the overall project architecture going? A couple of years ago, maybe at the contributor summit, we did a talk about whether there will be a Kubernetes v2, and we said no, there isn't, and we're not going to do one. And the reason we're not going to do it is because, instead, we focus on building extension points.
And by building extension points, we don't need to go to a v2, which of course, if you're familiar with semantic versioning, would mean we break backward compatibility. We don't want to do that. So we're going to continue on that path. There's a real effort around things like CRDs (this would be in API Machinery) to make them as fully functional as the built-in APIs. We're working towards it; it takes a long time. So, how can you participate? We have mailing lists. We have Slack channels. All we need you to do is come help us. Take the things that we are going through, for example the KEPs, the enhancement proposals: if you are interested in a feature that is going to come out in 1.25, we need your feedback. If you don't come and tell us how it is going to affect you, and whether it is going to work for you or not, you will only see it when 1.25 comes out, and by that time it's too late, and then you need to wait till 1.26. So, come talk to us. Like I said, we don't bite. Speak up, offer your thoughts, help wherever you can. And typically what we say is: come to whatever is closest to what you're already doing. For example, if you are interested in new features for your products, come to the enhancement process. If you are a maintainer who loves maintaining libraries and things like that, come help in code organization, for example. So there is a lot of work to do and too few hands, and we would like your help. Thank you. Thank you for the talk. I have a question about GA. As I understood, you folks are actually in charge of letting a feature happen in the first place, but then there's a life cycle, and the feature reaches GA. I've heard an opinion recently that if it reaches GA, it's there forever, and since then it will not ever move from... It's not true. Dockershim: gone. Dockershim is one example. Can you give a... Oh, there are lots of examples.
We are removing PSPs, too. We've defined the life cycle for PSPs to go out, and they will be replaced with something else, for example. All right. It gives us hope that at some point Ingress resources will go away and we'll proceed with that. Yeah, it is the Gateway API. The Gateway API is the new thing. We want you all to switch to the Gateway API. That's being defined, and people are working hard on it, and that's what we're going to go to. So, deprecation policies are sort of our purview, but we do really want to be careful. In an ideal world, specifically for APIs, once they hit a GA v1, you don't want them to go away, because people have built things on top of them, and when you take something away, the whole ecosystem built on top of it is either locked out or forced into a migration process. So we take a very, very long time to do it, but we do it. Yes. We've also said we have an escape clause in our documentation that says, hey, we reserve the right to do this to make things better. So, yes. You mentioned there was a list of values, and you said that it helps you navigate in the event of uncertainty or some conflict: you take the list of values and it helps you come to a conclusion. Correct. I'll give you an exact example, again with the PSPs. The people in the SIG, SIG Auth, said: hey, this is not working, let's just get rid of it. Let people use other ecosystem projects that do exactly the same thing; people can already do that. So, when all those things are there, why do we need to do it ourselves? That was the initial balloon that was floated from SIG Auth. Then SIG Security came to us and said, not cool. We need something basic which will cover 80% of the cases, and then maybe for the 20% of cases, the advanced cases, people can use other things.
So that was the negotiation, and it happened because we raised a KEP, there were follow-ups, and then we had meetings and figured it out. Again, it goes back to "meet the users halfway" as a value. That is exactly where it kicks in. This is where my question is: among those values, are all of them equal, or are some more equal than others? Is there a priority in the event of, like... I think it's a more human process than that. I do want to say one thing there, which is that the people in the room make the decision. Those are guidelines for the people in the room, because the same people are not going to be in the room all the time. Those are guidelines for the people in the room to say: hey, please think about these things when you are making a decision. And the order of priority might depend on the people in the room, but it's a general guideline. Thank you. Any other questions? Come on, there has to be one at least. Mohammad. I have a simple question, right? So, Kubernetes v2, I'm told that's not a thing. But how do you align that with the entire removal of v1 APIs, for example? As a contributor... there's no such plan. Again, we have a place where we write down some things, like hey, when we get to a v2 API, maybe we will do that kind of thing. But there is no proposal on the table for a v2 API, and there is no way the v1 API is going away, ever. Well, let's say that a little differently. Each API can evolve in its own time, right? Every API has its group, version, and kind. Within a group, you can get a v2; we have some v2s (autoscaling, for example, is at v2), we have a few v2s. But what we're talking about is Kubernetes v2, by which we mean fundamentally changing the API infrastructure such that it works differently and it breaks everything. That's not going to happen.
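The per-group versioning described in the answer above is visible in any cluster. Here is a sketch of the same HorizontalPodAutoscaler expressed under both `autoscaling/v1` and `autoscaling/v2` (the resource name is hypothetical); the autoscaling group moved from v1 to v2 on its own schedule, without any "Kubernetes v2":

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app            # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:                # v2 generalizes the single CPU target into a list
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Both versions can be served by the same API server at the same time, which is exactly why a breaking project-wide v2 is unnecessary.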
Instead, we've architected this in such a way that individual pieces of functionality can evolve over time. There are also thought processes in some people's minds asking: what is the core set of things in the Kubernetes core API that are not yet GA that we need to get to GA before we can say, hey, maybe we can do an LTS version now? That is an even longer discussion, but the idea is that we are trying to push APIs to GA so that then we can do an LTS, or somebody can do an LTS even if we don't. I think there's also another thought process (these are all just vague ideas people have): there's a part of Kubernetes that is container orchestration, and there's a piece of Kubernetes that is the API machinery, which is being used for many, many other things than container orchestration. So you could potentially parcel out that gem into a smaller core of Kubernetes, and then the other functionality is not necessarily needed everywhere. Namespaces are necessary, users are necessary, there are a bunch of things that we need, but there are vendors that don't ship kube-proxy because they do it differently; Cilium, for example, doesn't need kube-proxy. And there are projects like Virtual Kubelet that say: hey, if we replace the kubelet with Virtual Kubelet, we can address another set of use cases and scenarios. There is a version of the kubelet written in Rust that supports Wasm; that's another use case. So there are a lot of things happening in the community where people are trying different things out, and we are watching, listening, and learning from what they are trying to do, and at some point we'll evolve things over a period of time. We just want to do it in a fashion where we are not going to break you, and there are guarantees around stability, version skew, and things like that, which we have baked in, because we are trying to do the right thing for you, so it's easier on you. Thank you for the question. Any others? Yes.
How much of your work life does this role take? 100%? How much do you put into this SIG Architecture role? 100%? A personal question. I think it varies. We are co-chairs, and we have one other co-chair, Derek Carr. It's not just the two of us; we are just the chairs. All of these sub-projects have other team members, people working on them, and it varies with the time of year. I am one of the people that does production readiness reviews, and there are only two or three other people who do that for the whole project. So that means in the two or three weeks before enhancements freeze, I am spending a lot of time on reviews: hey, John, please take this one. There are other time periods where I only spend a few hours that week. It varies depending on the calendar. Same for me. In code organization, we try to do whatever we can early in the cycle, like before alpha 1 or something like that. We have milestones, so we try to land as many of the changes early so they bake in the CI systems. Over a period of time we watch and can say, hey, something is wrong. So you start making changes early, and the more invasive the change, the earlier you do it. For example, we are changing klog, so we try to do it right early in the cycle, so that later in the cycle it has smoothed out, nothing is bothering the CI systems, and users will be able to use it fine. Okay, I think we are out of time, but thank you all. Thank you.