Hi, everyone. Thank you for being here today. Let's get the talk started. First, a little bit about myself. My name is Cici Huang, and I'm currently working at Google as a software engineer. I have been contributing directly to Kubernetes upstream for a couple of years, across multiple SIGs. For those not familiar with the Kubernetes upstream community, SIG is short for Special Interest Group. I started out as a contributor and won a Contributor Award, and then I shipped Kubernetes 1.27, the previous release, as a release manager. I'm also a contributor in SIG API Machinery with a focus on the extensibility features, and recently I've been leading the CEL-related work there, which we are going to talk about a little bit in the next slides. So the topic is "Declarative Everything," and here is what's going to be covered today. We'll first begin with the declarative nature of Kubernetes, then talk about a notable missing piece in declarative APIs, then all the improvements we did and plan to do, and finally the future plan. So let's start with the declarative nature of Kubernetes. But first, let's begin with the very basic concept of declarative versus imperative. I know most of you are probably already familiar with the concept, but in short: declarative is when you say what you want, while imperative is when you say how to get what you want. Let's take a look at an example of a Costco sampling station; I'll just assume everyone loves the free food there. Say you'd love to make sure there are always six samples on the tray. To do it imperatively, we might periodically check the number of samples on the tray, and if the number doesn't match what we want, we make adjustments. For example, if it's empty, we prepare six samples and put them on the tray; if there are more, we take off the extras.
The declarative way would be to state the desired state, which is having six samples on the tray, and rely on the system to do the right thing for you. You don't need to worry about how the system ensures the desired state is maintained over time. So how is this achieved in Kubernetes? You just specify a desired state with a configuration file, a YAML manifest everyone is familiar with. After it's submitted to Kubernetes, Kubernetes performs all the monitoring and adjustment for you in a control loop, which we also call a reconciliation loop. What's the benefit of this? The biggest one is that Kubernetes' declarative API enforces a separation of responsibilities. You don't need to spend effort developing and maintaining the implementation, it's much more intuitive to understand what the end goal is, and it's much easier to change the end goal if you want to. As everyone knows, all the Kubernetes magic is done through declarative APIs, from things like resource allocation to basic stuff like object creation and authorization. We even have a declarative API for defining new declarative APIs: the Custom Resource Definition. Before we go any further, I'd love to set the stage. First, I will not get into the backstory of why we went declarative; that decision was made roughly 10 years ago, and there is a fantastic resource I've linked here from the Kubernetes early days, authored by Brian Grant, for those curious about the history. Second, we won't go deep into the inner workings of the control loop or all the details of how we do declarative management. And last, I'm not here to argue for one way over the other. Kubernetes offers both imperative and declarative abilities for different use cases, and it will continue to do so. So what are we going to talk about today?
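To make "specify a desired state with a configuration file" concrete, here is a minimal sketch of such a manifest. The names are illustrative, not from the talk's slides:

```yaml
# Desired state: three replicas of an nginx web server.
# Kubernetes controllers reconcile the cluster toward this state
# in a control loop; you never say *how* to create or replace pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-station        # illustrative name
spec:
  replicas: 3                 # the "six samples on the tray", but for pods
  selector:
    matchLabels:
      app: sample-station
  template:
    metadata:
      labels:
        app: sample-station
    spec:
      containers:
      - name: web
        image: nginx:1.25
```

If a pod crashes or is deleted, the reconciliation loop notices the observed state no longer matches the declared state and creates a replacement, with no operator intervention.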
Over time, there has been a lot of effort to make declarative APIs more powerful: versioning, structural schemas, defaulting, server-side apply, OpenAPI publication, and so on. But there are still a bunch of things we cannot do well with declarative APIs just yet, and today we're going to talk about one of the notable missing pieces: data validation. Everyone knows data validation is critical. You need validation whenever there are constraints on your data, and there are constraints almost everywhere. However, today, from the declarative APIs you can easily get information like, say, that your resource has to have a name and that the name is supposed to be a string. But that's it. It's much less obvious to find the constraints behind the fields. For example, even for the name constraint itself, Kubernetes applies many different constraints to different kinds of names, and if you are defining your own custom resources there might be further ones. Type string is widely used for many other fields, each of which may have different constraints associated with it, and it gets even more complicated when you define your own custom resources. We all know that validation is critical: if we don't validate, or don't do it properly, things break in hard-to-reason-about ways, and debugging afterwards becomes much more difficult. Here is an example of the current OpenAPI specification, which gives you exactly what I mentioned: you're supposed to have a name, the name is supposed to be a string, and that's it. People may argue, "OK, I can write a very detailed description there, and the user is supposed to read all of it and follow it." But everyone knows that's not what happens in the real world.
So let's talk about the improvements we did in this area and how they benefit users, beginning with the one people care about the most: Custom Resource Definitions. For quite some time we only had very limited support for specifying constraints declaratively. For example, we have structural schemas, OpenAPI v3 validations, and so on, as in the example here: we declare the type and value for the IP addresses, and we can express some constraints declaratively, but that's it. What if you have more advanced constraints? What if a check involves another field? What if you want to apply certain constraints only when the type field is set to a specific value? For quite some time, for everything not covered here, we had to use a thing called an admission webhook. Has anyone here ever developed their own webhook? Wow, not many. Is anyone familiar with the concept of a webhook? Cool. The webhook is basically another very powerful extension point Kubernetes upstream offers. It runs after the request is authenticated and authorized, and before the object is saved to storage. But it's a separate binary, a separate component added to your system, so introducing a production-grade webhook is not only substantial development work but also increases operational complexity dramatically. I'll only explain why briefly today, but you can get the idea: as a separate component added to your system, wherever you introduce a webhook, you have to think carefully about things like how to package it, how to release it, how to integrate it with your existing monitoring and alerting systems, and how to upgrade it or roll it back if needed.
What about the latency added, and how to scale it? To make it even worse, webhooks are very easy to misconfigure. An example I always like to give is the failure policy you have to set in your webhook. Webhooks allow you to choose either fail-open or fail-closed mode by setting a policy in your configuration. If you go with fail-open, that means if the webhook fails, either because the binary stops serving or because an error happens, you just let the request through anyway. If your webhook is doing some kind of security check, that is clearly a problem. On the other hand, if you choose fail-closed, which means that when the webhook has a problem all requests routed to it are rejected, and your webhook matches all pods or all deployments, you basically lose your control plane availability for those resources. That's why, even now, webhooks remain a leading cause of control plane outages. People have learned over time to be more cautious when configuring them, but they still cause a lot of issues. For a while, though, the webhook was the only solution for the functionality we want. As everyone knows, whenever you bring up a CRD, the custom resource is the one you define, so you have to be the one responsible for validating it. Webhooks are widely used for that purpose, but they cause all the trouble we just mentioned. So, as community maintainers, we began to think about what we could do to make things better. After some research, we found that the vast majority of the validation use cases are really simple: people want to make sure a field is immutable, or they want to do some basic cross-field check, or they want to apply a specific format to a field. So then the question becomes: can we use something simpler?
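To make the failure-policy trade-off concrete, here is a minimal sketch of a ValidatingWebhookConfiguration; the names, service, and rules are illustrative. `failurePolicy: Ignore` is fail-open; `failurePolicy: Fail` is fail-closed:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-policy-webhook        # illustrative
webhooks:
- name: validate.example.com          # illustrative
  # Ignore = fail-open: requests pass through if the webhook is down.
  # Fail   = fail-closed: requests are rejected if the webhook is down.
  failurePolicy: Fail
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
  clientConfig:
    service:
      namespace: default
      name: example-webhook-svc       # illustrative in-cluster service
      path: /validate
  admissionReviewVersions: ["v1"]
  sideEffects: None
```

With `Fail` and a rule matching all deployments, an outage of that one service blocks every deployment write in the cluster, which is exactly the availability trap described above.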
I'm really happy and excited to announce that all of this is possible through a magic extension field we added to CRDs called x-kubernetes-validations. The feature is called CRD Validation Rules, and I've linked the documentation here. I'm happy to announce that it just graduated to stable in the current Kubernetes 1.29 release, assuming of course that nobody reverts it before the formal release date, roughly a month from now. What it does is leverage the power of the Common Expression Language, CEL, which allows you to write very powerful expressions in your CRD to do your data validation. You don't need to know much about CEL; it's just the tool we use, and after you see the examples that follow, you can easily guess what they are doing. You might not even need to look at the documentation for the basic validations you're trying to write. All you need to know about CEL is that it's an open source project and we work really closely with the maintainers there. It's designed to be simple and efficient, and it's a typed language, so we do proper type checking for you, and we have already successfully embedded CEL into Kubernetes. CEL has gotten pretty solid adoption in the Kubernetes ecosystem and in other cloud provider offerings. I won't spend much time on it, just the basic stuff: all the magic is done through one single extension field called x-kubernetes-validations, and it can be put anywhere under the OpenAPI v3 schema, if you're familiar with CRDs. That gives you a lot of flexibility, and you can just start writing expressions there. In this example, we want to make sure that the replicas we set are always smaller than the maxReplicas being set. You can use self, a CEL variable that gives you access to the value scoped to the current schema. And, as I mentioned earlier, we have really nice type checking, which catches errors way ahead of time.
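A minimal sketch of what that replicas rule looks like inside a CRD schema; the field names are illustrative:

```yaml
# Fragment of a CRD's versions[].schema.openAPIV3Schema.
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      x-kubernetes-validations:
      # CEL rule: 'self' is the value at this schema node (the spec object),
      # so cross-field checks within spec are straightforward.
      - rule: "self.replicas < self.maxReplicas"
        message: "replicas must be smaller than maxReplicas"
      properties:
        replicas:
          type: integer
        maxReplicas:
          type: integer
```

Because CEL is typed against the schema, an expression referencing a misspelled field is rejected when the CRD itself is created or updated, not later when some user submits a bad object.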
If you mistype a field name or anything like that, it will be caught while you are creating or updating your CRD. And of course, we have another CEL variable called oldSelf, which is used in what we call transition rules, mainly to enforce immutability. In this example, we enforce that the field called foo is immutable through a very simple CEL expression. My colleague, Alexander Zielenski, wrote a very nice blog on the immutability use cases, so please feel free to take a look if you're interested. And here is a real-world example I borrowed from Gateway API. As shown here, previously we had all the basic validation done through the OpenAPI schema for types and values. But, as in the example I explained earlier, what if we want to apply a specific format to a value only when another field is set a certain way? Those kinds of advanced validations are all possible with the CRD validation rules I just mentioned. In this case, we want to apply a specific format when the type is set to Hostname. I will stop here, but I gave another talk earlier this year at another KubeCon covering all the details and the features offered here, so please feel free to take a look if you're interested, and there is great documentation on that. It's also worth mentioning, thanks to Mathos Maurice, who went ahead and built something called CEL Playground, which comes with a lot of examples and lets you play with CEL. It makes it easy to check that the expressions you wrote are right and valid before you actually put them into production. Another exciting piece of news I wanted to share is about adoption. It's from the Gateway API team, which just released their GA version. By using the feature we offer here, they successfully replaced more than 99% of the validation webhook they used before, which was a really big relief for them.
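A sketch of such a transition rule, enforcing immutability of a hypothetical string field named foo:

```yaml
# Fragment of a CRD schema: once 'foo' is set, it can never change.
properties:
  foo:
    type: string
    x-kubernetes-validations:
    # 'oldSelf' is the field's previous value. Rules referencing oldSelf
    # are transition rules: they are evaluated on updates, comparing the
    # incoming value against the stored one.
    - rule: "self == oldSelf"
      message: "foo is immutable"
```

Before CRD Validation Rules, this one-liner typically required a whole admission webhook: a served binary, TLS certificates, and a failure policy to get wrong.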
Thanks to Nick Young and Rob Scott, who shared the slides with us, and thanks to everyone who worked on Gateway API to make the adoption happen. So now, wait a second. If we take a step back and look at CRDs, that means CRDs are now even more declarative than native types, because, as many of you may already know, native types don't expose that much validation information. So how is validation done for native types? It's mostly hard-coded validation logic. You have no idea how many lines of validation code for the APIs are sitting inside the Kubernetes upstream repo; it's tens of thousands of lines of code just for API field validation. That's a lot of maintenance effort, and it also makes it very expensive to add validation later, because we're humans and we make mistakes. As maintainers of a project like Kubernetes, such a fundamental project with a whole ecosystem built on top of it, we really don't want to break anything. The first principle is that things which previously worked should keep working, which makes it very expensive to add any missing validations later. Hand-written code can also produce inconsistent error messages, since different humans write different messages in different places. And because the server uses Go structs while kubectl uses the OpenAPI schema, there can be inconsistency there as well. So what are we going to do here? There is a recent enhancement called Declarative Validation for Native Types. Thanks to my colleague Joe Betz, who initiated the effort, and thanks to Alexander Zielenski, who is leading it now. The main idea behind it is to catch native types up with what I just described for CRDs, so that later you will be able to use IDL tags to declare validation rules for native types. An example is shown here.
There are some IDL tags people might already be familiar with, like minimum and maximum. But the extension field we just mentioned for CRDs will also be adopted for native types, which lets us write more complicated validation constraints, like the cross-field check here. I have linked the KEP here if you're interested in taking a look. This enhancement is going to alpha in the current Kubernetes 1.29 release, but there are no notable user-facing changes yet, so we're looking forward to the future work. The end goal is to support all those IDL tags for native types as well, with all that information published to OpenAPI. And you can see x-kubernetes-validations there: we're using exactly the same OpenAPI validation field for all the complicated and advanced constraints. So what will this bring us? It will benefit both Kubernetes maintainers and Kubernetes users. As someone who has recently been trying to help with API reviews, I can tell you: you have no idea how complicated the code sitting there just for API validation is. After this, it will be much easier to develop, maintain, and review APIs. It will also enable improvements to the API machinery: for example, a future enhancement to validation can be implemented once in the declarative validation subsystem instead of being threaded through maybe 15K lines of hand-written validation logic. People may ask why they should care if they don't contribute to Kubernetes upstream. Here is the reason: it will allow direct access to the exact API validation rules, instead of only checking descriptions. End users will be able to see all the constraints behind the API. This effort will also make OpenAPI more valuable, because it will tell us much more about what's expected.
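As a sketch of the end state, and this is a hypothetical fragment, since the enhancement is still in alpha and the exact published shape may change, the published OpenAPI document for a native type could carry the same extension field CRDs use today:

```yaml
# Hypothetical fragment of the published OpenAPI v3 document for a native
# type once declarative validation lands: simple constraints as standard
# OpenAPI keywords, advanced ones in x-kubernetes-validations.
io.k8s.api.apps.v1.DeploymentSpec:
  type: object
  properties:
    replicas:
      type: integer
      minimum: 0                       # today this lives in hand-written Go
  x-kubernetes-validations:
  - rule: "self.replicas <= 1000"      # illustrative constraint, not a real one
    message: "replicas exceeds the allowed maximum"
```

The point is that clients, linters, and CRD authors embedding native types could all read the constraints from one published source instead of reverse-engineering them from server behavior.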
This effort will also make shift-left validation possible, such as client-side, command-line validation relying on the information provided by OpenAPI. In the link, there is already an initial effort by Alex, who is proposing a command-line validation tool for that. It will also greatly help API composition. For example, there is a very common use case where people want to embed native types into their CRDs, and previously you basically had to take responsibility for all the validation yourself, in a separate place. But once all the constraints are bundled together and published in OpenAPI, it becomes much easier to embed native types inside a CRD and gain the validation of the native types automatically. So now we have talked about specifying constraints for CRDs in a declarative way, and about specifying constraints for native types in a declarative way. But you may wonder: if I declare the constraints within the type definition, then the constraints are enforced wherever I use that type. What if that's not what I need? What if I have constraints that are specific to my own cluster? For example, for security reasons, I may only want to allow images to be pulled from my corporation's registry, or only allow authorized users to perform a certain operation. In other words: policy enforcement, as people more commonly call it. It's used generically for security, compliance, governance, and configuration management. Previously, to be able to do this, we only had very limited support in Kubernetes core upstream, and people had to use a third-party policy engine, for example OPA Gatekeeper or Kyverno, to achieve their goals, or just write their own webhook. So what we did is offer a feature called ValidatingAdmissionPolicy.
I'm happy to share that we just graduated this feature to beta in Kubernetes 1.28, the last release, and we are working on promoting it to stable very soon. I will not explain everything, that would be too much information, but we introduced new Kubernetes resources called ValidatingAdmissionPolicy and ValidatingAdmissionPolicyBinding. The main reason for having two resources here is that we would love to offer the possibility to properly separate the responsibilities of writing policies and enforcing policies. This way, a policy is reusable and sufficiently configurable to support more than just one company or one cluster, and the cluster admin, who will probably be the one enforcing those policies, has enough flexibility to configure the policy based on the goals of their own organization. Here is the new resource, ValidatingAdmissionPolicy. If you are familiar with webhooks, it should be easy to pick up. We have matchConstraints, which defines which resources this policy applies to; it works very similarly to the way current webhooks match, but with some more fantastic features as well. And we have a field called validations, where you can just start writing CEL rules expressing what the policy does. In this example, we do a very simple validation: we want the policy applied to all creation and update requests for Deployments, and we want to make sure the replica count is always smaller than a number referred to from a parameter resource. You can use basically anything as your parameter resource, which gives the cluster admin flexibility to configure the policy. Then the cluster admin can go ahead and create a ValidatingAdmissionPolicyBinding, which binds the policy to their own cluster.
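A minimal sketch of such a policy as of the 1.28 beta API; the names are illustrative, and a plain ConfigMap is used as the parameter resource here, though any resource can serve:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicy
metadata:
  name: replica-limit.example.com      # illustrative
spec:
  failurePolicy: Fail
  paramKind:
    apiVersion: v1
    kind: ConfigMap                    # illustrative choice of parameter resource
  matchConstraints:
    resourceRules:
    - apiGroups:   ["apps"]
      apiVersions: ["v1"]
      operations:  ["CREATE", "UPDATE"]
      resources:   ["deployments"]
  validations:
  # 'object' is the incoming object; 'params' is the bound parameter resource.
  - expression: "object.spec.replicas < int(params.data.maxReplicas)"
    message: "replica count must stay below the configured maximum"
```

Note that the policy on its own enforces nothing; it only becomes active once a binding supplies the parameters and the scope.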
In that case, the policy starts being enforced, and parameterization allows the cluster admin to bind the same policy multiple times in different ways. For example, here we might want to enforce that replicas stay smaller than three in a test namespace, while for the production namespace it might be a different number. There is so much awesome stuff going on in this feature, but unfortunately I don't have enough time to talk about it all. I covered a bit of it, together with best practices, in my earlier talk, and hopefully the documentation we provide will sufficiently answer all the possible questions. Also worth mentioning: if you only want to enforce something very simple, where you don't need parameterization at all, you can just drop the parameter resource and use only these two new resources to achieve your goal. Another thing worth mentioning is that the whole ecosystem has been aware of this effort, and the main policy engines, such as OPA Gatekeeper and Kyverno, have already adopted the feature. So that's it; feel free to ask me for more details afterwards, I'm happy to share. Then let's talk about the future plan and the key takeaways. The way I view this is that we have a bunch of use cases we are pretty familiar with, like Deployments, Jobs, RBAC, and so on, which are supported by declarative APIs. And we have some other, emerging use cases, like the ones we talked about today: data validation, both for native types and for advanced policy purposes, which were not supported by declarative APIs before. And we have some further ones, for example the mutating cases in the policy enforcement area, CRD version conversion, and other use cases not yet covered by declarative APIs. Now CEL gives us the power to expand the reach of declarative APIs, so that a lot of use cases which were not covered before can be covered now.
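A matching binding might look like the sketch below; the names, labels, and the referenced ConfigMap are illustrative. A second binding pointing at a different paramRef could apply a different limit to production namespaces:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: replica-limit-test.example.com
spec:
  policyName: replica-limit.example.com  # illustrative policy from earlier
  validationActions: [Deny]              # reject violating requests outright
  paramRef:
    name: replica-limit-test             # illustrative ConfigMap holding maxReplicas
    namespace: default
    parameterNotFoundAction: Deny
  matchResources:
    namespaceSelector:
      matchLabels:
        environment: test                # scope this limit to test namespaces
```

This is the separation of responsibilities in practice: one team authors the policy once, and each cluster admin decides where it applies and with what parameters.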
As I mentioned earlier, the mutating cases and CRD conversion are also critical, since CRDs remain one of the most important extension points Kubernetes offers. So our next step will be to expand the power of declarative APIs to include those cases as well. All the work I mentioned has been done under SIG API Machinery. Here is the contact information; we also have a dedicated working group, the CEL working group, with its own Slack channel and mailing list. So please feel free to talk to us if you find this useful. Thank you so much. I'm open for questions, and I guess we have a mic on each side, so whoever has questions, feel free to speak up.

Q: Thanks for the nice work. After validation, to accomplish your declarative goal, are you going to do remediation? I saw it on your roadmap as future work.

A: Oh yeah. Mutating would be our next target for sure, because there are a lot of use cases, especially in the policy enforcement area, where people just want to make sure certain labels are applied to or set on their resources, or whatever. So that will be our next step.

Q: So can I say that to make declarative APIs a full reality there are still gaps, right? Mutation is one of them. Can you elaborate on any other gaps?

A: Yeah. For example, controllers. Even though we give you the power to declare new declarative APIs through Custom Resource Definitions, you still probably have to write your own imperative controllers that say how to do things. So I will keep the door open to future possibilities of also supporting use cases like that in a more declarative way. Thanks.

Q: I've got a question on validating the CEL that you've written. Like in the replicas field where you had that typo: what are the tools you can use for catching errors like that?
I mean, you said it would have gotten caught, and it seems like there are tools for that, but you're essentially adding code to the YAML files. How do you validate that code?

A: Oh, that's a great question. As I mentioned, CEL is a typed language, so we do the type checking for you in all of the features I mentioned. The good thing about the type checking is that we do it at a really early stage of the lifecycle. Whenever there is a type-checking error, say you misspelled a field name, it will be caught immediately when you try to apply your Custom Resource Definition. You can also use the CEL Playground I shared to pre-verify, or you can build your own CEL tools to pre-verify before you actually apply your CRD. We do have a CEL linter tool available, but it currently lacks maintenance, so feel free to make your own or come contribute. Thanks.

Q: Yeah, hi. The current validating admission webhook ecosystem is pretty large; basically every project ships one, or multiple. Do you see this significantly lessening the number of charts that ship with full validating webhooks, or is it more of a different thing?

A: Yeah, we agree it's a huge area out there. As I mentioned, we do see a lot of the trouble webhooks bring us: not only the things I mentioned, but also much more complicated configuration, and it's a separate component, a separate binary, so it involves a lot of extra effort. What we did here is really focus on the Kubernetes upstream offering and think about what we should do to make this extension point better.
Once ValidatingAdmissionPolicy is widely used, it can also be adopted by the main policy engines I mentioned earlier, or by other third-party policy tools, so that they can leverage its power and reduce the trouble previously caused by webhooks. I hope that answers it.

Q: Yeah, it does. I have one more if no one else has one, and I might have missed this earlier: are there any plans to extend this to mutating policies?

A: Oh, yes. Our next plan is to expand this power to the mutating use cases for sure, and hopefully we can get something wrapped up for the next release. They're working on the design now, because mutating cases are obviously way more complicated than validating cases. We discussed this earlier inside SIG API Machinery, and all the key maintainers there agreed that we should wait until we finalized the validation cases before moving too soon to the mutating cases. Yeah.

Q: That seems like a pretty huge deal for versioning, primarily. If you have minor bumps within your versions, you no longer need any webhook at all. That's pretty cool.

A: Yeah, hopefully. That's our end goal: to save people from the webhook nightmare. Oh, sorry, we're out of time. Thank you so much for being here.