But today we want to talk about validation. Validation is something I think everybody has used in the last 10 years, especially when building things on Kubernetes and using CRDs. And this is basically the topic, but not only CRDs, also native types in the end. Validation, so shift left. What is shift left? Cryptic term, we will see in a second. And we'll talk about the past, so we will have a little history session: what we added over the 10 years, what we're doing at the moment, where we want to go, and why we do all of that. Validation has been in Kubernetes since day one, basically, but we are not finished, and we will see why and what the next big topics are.

So let's start in 2015. This is very much the beginning of Kubernetes. Shortly after, CRDs came up, but they were not there in 2015. When Kubernetes was built, especially the kubectl tool, kubectl create (everybody knows that), this was the model: you have a manifest file, some deployment YAML, you say kubectl create, and there is a mistake. You see it now: I typed secretNames, and secretNames cannot be right. The server notices that and tells you: okay, there's a mistake, fix it. Super fast, super nice, and kind of a nice user experience, which was also common at the time.

But this is 2015, and 2024 looks very, very different. The reality is this: most people probably have CI/CD in place. We have this long pipeline. You have your developer notebook, you write YAML manifests, and you commit to Git, to some branch. You approve it, pull request and everything. Your GitHub Actions start running and they check something; hopefully they find the mistake. But if they don't find it, Crossplane or Argo CD or some other GitOps tool will deploy your code into production, and the kube-apiserver will notice. Maybe. It will not notice if you mistyped a field name, because a typo, for the API server, is not a mistake, right? It could just be that you have a new version of the manifest and the server is old. So it might intentionally accept it. Even worse: your change has no effect. Basically, we have no kubectl anymore as the primary tool to apply manifests.

So shifting left is basically shifting this lightning bolt up to the top of the pipeline; that's where you want to notice the mistake. Shift it left, very visually. Feedback after 10 minutes or more is not acceptable. That's not an inner loop. That's like: take two coffees and then you can see the error. That's not what we want. We can build checks into GitHub Actions or any kind of CI tool, into the steps running in CI, and find mistakes there. Maybe you get down to a minute if you're good. But not 10 seconds, right? Impossible. So this is what we want: we want instant feedback again, in seconds. And that's the title of the talk, and the reason why we do all this work even 10 years later on Kubernetes: to come back to that experience.
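To make the motivating example concrete, here is a minimal sketch of the kind of manifest mistake we are talking about (the Deployment contents and the misspelled field are hypothetical):

```yaml
# deployment.yaml: syntactically valid YAML with correct types,
# but "replicaz" is a misspelling of "replicas".
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicaz: 3        # typo: not a real field of DeploymentSpec
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: demo
          image: nginx
```

This is exactly the class of error that purely client-side checks miss, as described next: the YAML parses and the types line up, so only something that knows the schema can tell you that replicaz is not a real field.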
So client-side validation has been a very long journey in Kubernetes. We first started looking at this around 1.8, when we added client-side dry run to kubectl to try to restore that really tight feedback loop we had with the server. But unfortunately, this has really limited validation support. You can only really see that your YAML parses and that your field types are correct (that a string is a string where it should be one, and an int where it should be an int), not whether you misspelled a field or left out an unconditionally required field. It also gives different error messages from the server, and it's pretty much a non-starter for offline validation in CI, because it needs to be connected to a running cluster or another OpenAPI endpoint.

So the other alternative we've seen people turn to is server-side dry run against a staging cluster. This has the benefit of much better error quality than the kubectl client-side validation, but there are a few drawbacks that make it impractical to use in a CI system as well. The first is that org policy or other constraints might prevent you from connecting your CI system to your staging cluster, or the cluster might be offline. Another is that this might require you to punch a security hole into your staging environment, because you need to elevate your CI to have the same permissions as your CD system: you can't dry-run the create or update of a resource unless you actually have the permissions to do that. And lastly, there's a functionality gap, in that it doesn't test against your current Git repository state. If you've updated a CRD in your Git repository, the next time your CI runs it's not going to validate against that schema; and if you've added new namespaces or new CRDs, your CI is going to trip once it tries to validate resources that use them.

And I'd be remiss if I didn't mention that you shouldn't hook your CI up to prod. Depending on how heavily your API server is getting hit, this will limit the availability of other critical workloads; and like the security implications for the staging environment, you might also have to give write access to your prod cluster to allow CI to dry-run against it. And if that wasn't enough: most of the audit logging in Kubernetes does not have the context to know whether a request is a dry run, or doesn't check. So your audit logs are going to be filled with all the requests made by dry running, and if you're ingesting those logs, that's going to cost you extra.

Since those two options don't work, we've seen most people fall back on a temporary test cluster. This is the idea that you spin up a local API server using something like kind or minikube, simply apply all your manifests, and see what errors fall out. Unfortunately, this is a little heavy-handed to use just for validation. It needs a lot of compute power and RAM, especially if you have a lot of CRDs, which we're seeing more and more with dependencies like Crossplane. And it's hard to get right. You might see errors in your test environment that you don't see in your real cluster, and vice versa. For instance, on the cluster, updates give different errors than creates, so you'd have to make sure you're applying things in the same order as they would be applied on the staging cluster. You'd also have to make sure you configure your namespaces and CRDs correctly before you try to apply anything. Eventually this just gets really unwieldy and not worth the effort. The last reason why we don't think this is a complete solution for client-side validation is that it doesn't scale down to the IDE and it's not portable: you can't use it in your controllers, and it takes minutes rather than seconds, plus a lot of compute power. So we need something better.
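For reference, the staging dry-run approach described above usually looks something like this as a CI step (the step name and paths are hypothetical; kubectl apply --dry-run=server is the real flag):

```yaml
# Hypothetical CI step: server-side dry run against a staging cluster.
# It needs network access to staging and credentials with the same
# permissions as your CD system, which is exactly the drawback described above.
- name: Validate manifests against staging
  run: kubectl apply --dry-run=server -f manifests/
  # assumes a kubeconfig for the staging cluster was set up in an earlier step
```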
Yeah, so a little history. What have we done in the last 10 years, basically, to make this better? Everybody will think: okay, schemas, OpenAPI, Swagger. But it's not that simple, and we will see that in a second.

So this is the first thing many people might know: verifying, validating a pod spec, a deployment, anything like that, is hard-coded in Go, in Kubernetes, in the source code. It's Go, right? It's nothing a third-party tool or your IDE can know about. But it's how Kubernetes started, and we will come back to how to fix that. Of course, historically we didn't tackle that first. There is OpenAPI validation, but the types, the spec we export for pods, for example, are incomplete. They don't say anything about which values can go in which field.

In custom resource definitions, though, from the very beginning basically, we had an OpenAPI schema, OpenAPI v3. Here's an example: it's a custom resource definition, and you have to define which fields exist and what their types are. And there are some more things you can do, like minimum values, regular expressions, these kinds of things. So for one field, you can specify what you want. This was a start, and you can verify a lot of things with it, but of course there's much more than that.

So here's the timeline of the last 10 years, and you can see when things were introduced. In this whole talk, we don't talk about admission. When people think about more advanced validation, they think about that; people misuse admission for validation, and we will come back to that. But we don't talk about admission here; we really want to talk about validation. So, as I've shown: OpenAPI, first v2 and luckily now v3. It's only as expressive as OpenAPI is, but it's a start. You can actually ask the API server for it: just go to the /openapi endpoint and you get it. It's a multi-megabyte document and it has everything for all the types. So it's lots of data, lots of JSON, pretty big, and it gives you what you just saw: the fields, the types of fields, regular expressions, and minimum values. This exists, and the tooling, the ecosystem, makes use of it.

There was a big switch, some of you might remember, in '17, '18, somewhere around that time: we made this mandatory. You have to write a schema; that was one big step. And it must be structural; that's a detail I won't talk about here. But basically we did that, and then we added something more, because we realized OpenAPI is too weak; we needed more. There are things you cannot express. List types are one of them. I'm not sure, who knows list types? Not many; that's the reason, so we want to promote them. You can specify what kind of list you have. If you have a pod spec, you have containers in there, and every container has a name. So it's actually not a list or an array; logically, it's a map from name to container. And you can specify that: there's an x-kubernetes extension for it, and you can define it for every list. There are maps, and there's something called atomic, for when you don't really care: the list is then treated as just one value, replaced as a whole. Anyway, so you can define that, you can define the key, the name, and if you do that and you apply a manifest with duplicate names in a pod spec, you will get the error you would expect: you cannot give two containers the same name.
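Put together, here is a sketch of what both of those look like in a CRD schema. The field names are hypothetical, but minimum, pattern, x-kubernetes-list-type and x-kubernetes-list-map-keys are the real schema keywords:

```yaml
openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        minReplicas:
          type: integer
          minimum: 0                        # per-field value constraint
        region:
          type: string
          pattern: "^[a-z]+-[a-z]+-[0-9]$"  # regular expression constraint
        containers:
          type: array
          x-kubernetes-list-type: map       # logically a map, not a plain list
          x-kubernetes-list-map-keys:       # keyed by "name"
            - name
          items:
            type: object
            required: ["name"]
            properties:
              name:
                type: string
```

With the list declared as a map, two entries with the same name are rejected as duplicate keys, which is exactly the error described above.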
Pod specs, and substructures like them, look a lot like a Kubernetes object, right? Labels, annotations; a pod spec defines labels. So it's nearly an object, and there's an extension for that too. You can say a substructure in a CRD is actually an embedded object. And if you do that, everything that looks like a metadata substructure, because it really is an object, will be validated like metadata. So it's automatic: the name, the labels, all of it gets checked automatically. And if, deeply embedded in your type, there's a string that should be a label, well, labels have a certain format, and that format is checked for you.

All right, so we've talked a lot about the power that we've been adding to the schemas using x-kubernetes extensions. They do a lot of powerful things, but for really specific use cases. So what about everything else? For everything else, either CRD authors don't include the validation anywhere in the schema, or they put it inside a validating admission webhook. This is problematic not only because admission webhooks cause problems in production, but because we can't validate them client-side. There's no information in the schema, or anywhere else, for us to validate against. So this is not a long-term solution.

In response, we've added CEL, the Common Expression Language, to Kubernetes. You can put it into your CRD schemas. It's a portable, type-safe and interpretable language with a syntax familiar to anybody who has ever used a programming language. It can express complicated conditional logic, you can call functions, and you can even have different rules for creates and updates.

So let's take a look at how CEL works in practice. Here we have an example CRD schema with a spec property that has three sub-properties: replicas, minReplicas and maxReplicas. They're all non-zero integers. And we can refer to all of these fields within an x-kubernetes-validations rule attached to the spec. When we add an x-kubernetes-validations rule to spec, we're able to refer to the spec as self, access any of its sub-properties from the schema, and relate them to each other. So in this example, replicas has the enforced constraint that it must be between minReplicas and maxReplicas. We can also add a custom error message to give the user more context, and specify a custom field path to attribute the error to, if it makes more sense to attribute it to a sub-field rather than the root spec object.

Another powerful use case of CEL is immutability. Here we are specifying an update-only CEL validation, called a transition rule. You have access to oldSelf in CEL, so you can simply specify that oldSelf is equal to self. And Kubernetes knows that since you mentioned oldSelf, it should only apply this rule on updates, so it won't allow you to change the value of replicas. You see in the example on the right: we successfully create this resource with a single replica, and then, immediately when we try to change replicas to two, we get the error message that we placed in our schema.

CEL is now GA in Kubernetes 1.29. Newer CRDs are actively making use of CEL, such as Gateway API, and over time, as Kubernetes 1.29 leaves the version skew of other projects, we anticipate that they will begin to adopt CEL as well. So we're excited that a lot of the validation logic that's currently stored in webhooks, inaccessible to our local validation systems and to CI, is finally going to be available in the schemas.
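As a sketch, the two examples just described would look roughly like this in the CRD schema (field names follow the talk; rule, message, fieldPath and oldSelf are the real mechanism):

```yaml
spec:
  type: object
  x-kubernetes-validations:
    - rule: "self.replicas >= self.minReplicas && self.replicas <= self.maxReplicas"
      message: "replicas must be between minReplicas and maxReplicas"
      fieldPath: ".replicas"        # attribute the error to the sub-field
  properties:
    replicas:
      type: integer
      x-kubernetes-validations:
        - rule: "self == oldSelf"   # transition rule: only enforced on updates
          message: "replicas is immutable"
    minReplicas:
      type: integer
    maxReplicas:
      type: integer
```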
So the CEL standard library is pretty extensive, and we're constantly finding new things to do with it. You have date and time manipulation. You can create strings from format strings and properties inside your object, then construct a custom regular expression and parse the string you created. You can do list comprehensions and aggregations, you can parse and manipulate URLs, and you can even interpret Kubernetes quantities: you can add 10 megabytes to two kilobytes and check things about the result. And in the last release we also added optionals to CEL, to make it more expressive, because we received feedback that it was a bit verbose. This list is always expanding as we find more and more validations that live in webhooks but not in schemas.

So the most recent feature we added to Kubernetes to improve the power and expressibility of CRDs, and make schemas more valuable, is automated validation ratcheting. Let's say you have a CRD with two properties: a replicas field and an ip field. Replicas is a number and ip is a string. You didn't specify any validations on these fields, so your users can go ahead and say they have two and a half replicas, and specify an IP with five components. Kubernetes will happily accept this. The CRD author sees that users are running into problems with invalid objects, and naturally moves to update the schema with stricter requirements: now replicas must be an integer that's at least zero, and ip must conform to an IPv4 format. As expected, if you apply a new object against this schema, the cluster responds with an error.

But what if we had existing resources, already created, and we updated to the new schema? To be frank, you're going to have a bad time. You can't edit this resource: can't add annotations, can't touch the labels. Neither can the Kubernetes system; it can't remove finalizers. So your resources are effectively immutable and undeletable until you manually go in and fix them. CRD authors who knew about this have long avoided strengthening the validations in their schemas, and those who didn't know maybe just broke their users' workloads. But now that we have this feature, we can maybe start to move some of those validations out of webhooks and into schemas.

So let's take a look at how this actually works in practice. We look at each validation individually and only enforce it if the value it pertains to in your object has changed. In this example, the minimum validation in the green box is only enforced if the green value has changed. The minimum validation on replicas only cares about the replicas value, so unless replicas changes, you're not going to get an error; you can still modify annotations and labels. CRD validation ratcheting is now in beta in the latest Kubernetes release that's being cut right now, and it's enabled by default. So we're hoping that existing CRDs can make use of it very soon, once it leaves their version skew, and that they won't need to hold back on strengthening their validations in the future.
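The strengthened schema from that example would look something like this (a sketch; minimum and format: ipv4 are real OpenAPI validations supported in CRD schemas):

```yaml
properties:
  replicas:
    type: integer
    minimum: 0      # stricter: previously an unconstrained number
  ip:
    type: string
    format: ipv4    # stricter: previously an unconstrained string
```

With ratcheting, an existing object that still carries an invalid value can keep receiving unrelated updates, such as label or annotation changes; the new constraints only fire once the offending field itself is changed.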
So what's next? We've already mentioned how schemas haven't been powerful enough to express our native Go validations for Kubernetes types. On the screen I have an example: the persistent volume spec's accessModes field, with the validations Kubernetes has handwritten in Go. And you can see there are two validations here: one simple and one more complicated.

The simple one: we just check that accessModes has at least one element, and if not, we throw a required error. The second, more complicated one: we loop through your list of access modes and check whether you used ReadWriteOncePod; if you use it together with any other access mode, we throw a forbidden error. Up until now, this hasn't been possible to express in OpenAPI, or declaratively in schemas at all. But now, with some of the new features we've been adding, like CEL, we're able to start annotating the native Kubernetes Go types to encode their validations into the schema. You can see here that we now annotate required and minItems, but we also have the CEL rule: either ReadWriteOncePod is not in the list, or it's the only item in the list. And it uses the exact same error message and error type, forbidden, as Kubernetes would have. Our build process then ingests this Go type definition and emits the OpenAPI schema that gets published on the endpoint, as Stefan mentioned earlier, where it can be consumed by tools. And we can see that it now includes all the validations we saw in our Go type and can be consumed on the client side: we have the x-kubernetes-validations and minItems ready to go.

So declarative validation for native types has been in the works behind the scenes for a while to enable this possibility. We're hopeful that we can start to publish the annotations for a few native types starting in Kubernetes 1.31, behind a feature flag. And once that happens, you won't really need to enable the feature gate to make use of these validations on the client side. We also hope to eventually remove some of the handwritten Go validations and use these on the server side as well.
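The published schema for that field would look roughly like this (a sketch of the idea, not necessarily the exact generated output; the CEL rule mirrors the Go logic described above):

```yaml
accessModes:
  type: array
  minItems: 1                 # the simple "at least one element" check
  items:
    type: string
  x-kubernetes-validations:
    - rule: "!('ReadWriteOncePod' in self) || self.size() == 1"
      message: "may not use ReadWriteOncePod with other access modes"
      reason: FieldValueForbidden   # same error type the Go code returns
```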
So we've talked about how we're making schemas more useful for validation. Let's check out some tools. Early tools that tried to solve this problem used standard open-source OpenAPI validation libraries. The most popular example I'm aware of is kubeconform; there was also kubeval, and there are some other JSON Schema tools. Because these use standard open-source OpenAPI libraries from the community, they give different error messages than Kubernetes does. And they don't support any of the extended OpenAPI features we've been adding to Kubernetes, the most important probably being CEL, which is going to be used to a great extent in the future. These tools do have the benefit of trying to supplement the schemas with handwritten schemas for native types, but oftentimes OpenAPI is simply not expressive enough, and those schemas are either wrong or incomplete.

So in response to this gap in functionality, CEL and the other things we mentioned, last year we released the first alpha of kubectl-validate. It's a first-party tool to validate your Kubernetes resources. It basically takes the same code that's in the API server and publishes it as an easy-to-use command-line tool and library. As a result, it has all the same error messages and all the same Kubernetes extension support. It also embeds the native schemas from Kubernetes, with the caveat that they're still not good yet; we're working on it. And you can use the CRD schemas directly from your repo, processed exactly how Kubernetes would process them.

We're still working on adding awareness of the old object, so you can enforce your custom CEL immutability rules, the transition rules, and also ratcheting, so the tool doesn't throw errors where the cluster would not have. And we're also hopeful to add object reference resolution and intelligence there.

So this is an example GitHub configuration for kubectl-validate, as a GitHub Action; another CI system would probably look very similar. It's pretty simple: get your manifests, install Go, get the tool, validate. A lot of people also use templating to define their Kubernetes resources, so in CI you would probably add another step to render your manifests before sending them into the tool.

Lots of things have happened in the last eight to ten years. Schemas are obviously at the center of everything. We are still not finished. We want to bring back this kubectl create experience: seconds until you see errors, improved velocity, making everything smoother when developing manifests and writing YAML. One big motivation here is also to get rid of webhooks. We added webhooks years ago, and everybody who runs them in production knows the pain, right? A webhook goes down, and suddenly resources don't validate, or at least it may seem like that. So one goal is to use CEL to get rid of webhooks, and to make CEL more expressive to enable that. Similarly, there is ValidatingAdmissionPolicy. We didn't talk about admission here, but there is this policy mechanism, another object, and CEL is going in there as well; CEL is the basis of that, and CEL is getting extended all the time, basically for the same reasons. And native types, as described, are work in progress; we will also get schemas for those.

kubectl-validate is under active development. We're working towards full support for Kubernetes schemas, including CEL, including all the native type schemas that are being published, and it will track Kubernetes as it develops. You can try it out with a simple, standard go install command. And if you'd like to contribute, help out, or learn more, you can join us in the Kubernetes Slack; we have a kubectl-validate channel.

Just to say thank you: of course, when we say "we", it's not just the two of us, it's the whole of SIG API Machinery that works on this. Many thanks to everybody. And yeah, this is the motivation. We talked about it: this is why we want to improve schemas and instant feedback, getting back to the experience we started with. Thank you.