Hello, I'm Kevin, and this talk is about building Kubernetes operators in Rust. In particular, it's about a crate I've been working on that lets you define object lifecycles using state machines. First, a little about myself: I've been a Rust developer for four years and have been working with Kubernetes for one year. I'm a maintainer on the Krustlet project, which we'll talk about a little later, and I'm the lead developer on the Krator crate, which is the focus of this talk. In this talk we'll discuss Krator's unique API for developing operators, I'll demo an operator built with Krator, and then I'll discuss the upcoming features on Krator's roadmap.

First, some background on Krustlet. Krustlet stands for Kubernetes Rust Kubelet, and the goal of the project is to re-implement the Kubelet in Rust, with a focus on enabling Kubelets that support alternative and experimental types of workloads. The project includes two Kubelets, one using the WASI runtime and one using the waSCC runtime, which let you deploy WebAssembly applications on Kubernetes.

So how did I get involved with developing operators? Last fall we wanted to improve the API used to define the lifecycle of pods within the Krustlet-based Kubelets (a Kubelet is essentially a controller for managing pods). We developed this state machine API, and I presented it both in a blog post and at KubeCon North America last fall. We got excellent feedback from the community, and there was a strong desire to split out the state machine API to support arbitrary controllers and operators. That's how Krator was started: I split it out in December and January, and Krator officially launched with a blog post on February 1st. Krator stands for Kubernetes Rust State Machine Operator. Next, a little bit of prior art.
There are lots of frameworks out there for developing operators. The biggest is the Operator SDK, a CNCF project that supports developing operators in Go, Ansible, and Helm. The Go variant of the Operator SDK builds on controller-runtime and Kubebuilder, pulls in the Kubernetes types from the main Kubernetes source code, and offers a lot of plumbing and templating around deploying operators. The Helm variant has a limited subset of that functionality, but it lets you get an operator deployed very quickly by simply supplying a Helm chart: the custom resource's spec contains Helm chart values, which are populated and applied, and the operator then ensures that the Helm chart stays consistent with cluster state. Other languages such as Python and Ruby also have controller frameworks, which tend to sit on top of the main Kubernetes client library for that language and offer more limited functionality for controllers. Rust has this as well: the kube and kube-runtime crates are both excellent, both Krator and Krustlet rely on them under the hood, and kube-runtime provides a controller API for Rust.

Some background on the basic frameworks mentioned above: they typically just watch for resources of a specific type (you have to explicitly define that watching logic), and then you populate methods for the create, update, and delete events for your managed resource. These methods can end up being quite large and monolithic, and this approach doesn't give you a great structure for defining your operator's logic. More sophisticated frameworks like the Go Operator SDK do a lot of code generation for you: they wrap your controller logic with the boilerplate that's required to build good, reliable controllers, and they offer a very sophisticated API for that.
They also have extended features, such as the ability to easily add validating or mutating admission webhooks, which we'll discuss later. A common trait of all of these frameworks is that there tends to be a single method you're expected to implement, in which you determine the difference between the desired state of the application defined in the CRD and the existing state of the cluster (or whatever resources your operator is managing). These functions tend to be very complex, and you're left to your own devices to split that logic into a reasonable, maintainable piece of code.

The state machine API was an attempt to increase the reliability of implementing this kind of application logic and to break up these monolithic functions. The general approach is that for each area of concern in your application's logic, pulling images for pods or mounting volumes would be examples of separate areas of concern, you define a node in your state machine graph, and you can define an arbitrary graph for your application. These nodes are intended to be infallible: if they do encounter an error, they should transition to an explicit error node in the graph, which allows better error handling, retrying, and things like that. I use the term "handler" for the actual body of code that's executed within each node in the state machine graph.

This offers some really great benefits. The first is that the Rust compiler can enforce a lot of nice safety guarantees on the behavior of your operator: you can ensure that the state of your object only ever transitions between states you've considered, and only to well-defined states.
And by using infallible state handlers, we avoid silently dropping the management of objects and can better encode the retry logic you may want in your application. You also get some other things out of the box: Krator automatically updates the Kubernetes API to reflect the state you're currently in, and it provides the latest version of the object to your state handler at all times. Finally, in Krustlet we're able to supply a number of common states you may wish to use if you're implementing a Kubelet specifically.

Digging a little deeper into the Krator API, there are really four major traits you need to be aware of. First is the Operator trait: you implement it on a type that is created as a singleton, and it has some associated types, namely the manifest, the initial state, and the deleted state; we'll discuss the roles of these types a little later in the demo. Next is the State trait, which is implemented for each state in your state machine graph, so you'll implement it a number of times. It has a method for defining the status that should be reported to the Kubernetes API when entering the state, and then the state handler itself, the next method, which returns an indication of which state should be transitioned to, or whether the state machine should exit. The remaining two traits define types that will be provided to your state handlers. ObjectStatus defines the schema of the status that's reported to the Kubernetes API, and ObjectState defines the data that's kept for a single state machine, that is, a single object. There's also an associated type, SharedState, which represents the data shared across all of the state machines, all of the objects, that your operator is managing.
Next I'd like to walk through the canonical demo I developed for the original Krator blog post, which exercises a lot of Krator's API. If you'd like to try this demo at home, all of the code is located in the examples directory of the krator crate within the main Krustlet repository. In that examples directory you'll find moose.rs, the implementation of the operator, and an assets directory with a number of useful scripts and manifests for playing around with the operator.

Quickly going through the code in moose.rs: I use the kube crate's CustomResource derive macro to specify the type for my custom resource. This is great because I only have to specify the contents of the spec field, and it derives the rest of the type. One thing I'd like to call out is that Krator requires that the status field is used; status is optional on custom resources, but it's required by Krator, so I name a status type that I've defined below. This status type gets the first trait implementation from Krator. There's a method for creating a failed status, for the event that Krator hits an issue within its own runtime and needs to report to Kubernetes that the object failed, and there's a method that takes whatever information is captured in your status and produces a JSON patch to be sent to the Kubernetes API.

Next there's MooseState. This is the data shared among the state handlers of a specific object's state machine; it is not shared between objects of a given resource type. Its trait implementation specifies the custom resource type we defined above, the status type for that resource, and a type for sharing state between objects, which is defined below. Finally, there's an async drop handler if you need to run asynchronous code when the object is deregistered.
Looking at the first state implementation, this is the registration state, entered when a moose is first created. You implement the State trait for each of your states. There's a method called status, which simply produces the status type we defined above for the given state; it's called when the state is entered and reports the status for this state to the Kubernetes API. The other method is next, which I also refer to as the state handler; this is the body of code executed for a given state. It has access to an Arc<RwLock<...>> around any shared state; to the state for this specific object, which it holds a mutable reference to because that state is owned by this object; and to the manifest type, which implements Stream so you can watch for changes to the object, and which also has a latest method that immediately supplies the latest copy of the object. You can see here that I run some code to register this object in the shared state, and then I return a transition telling Krator which state I'd like to move to next.

Now, in order to transition to a state, I have to implement TransitionTo and explicitly tag that transition as valid with the compiler; this improves the rigor of the state machine implementation and ensures that invalid state transitions are never taken. I'm not going to go into too much detail on all of these states, but I'd like to call out that there's a derive macro for defining valid transitions. I'll just comment on the behavior of this state machine: the next state is roaming around, which the moose does until it gets hungry, and with some probability it makes a friend and updates that shared-state map. When it gets hungry, it transitions to the eating state, which simply waits for some duration, replenishes the moose's food, and then transitions to the sleeping state.
From the sleeping state, we simply wait 20 seconds and then transition back to roaming. The final state is a deleted, or deregistration, state. On an operator like a Kubelet, this is a state you might transition to from within the handlers of other states in your graph: a pod may be running, then exit successfully, and you transition to the completed state, which exits the state machine. In the case of this operator, I never explicitly transition to it, and the state machine simply runs forever. However, when an object is deleted through the Kubernetes API, Krator needs to transition to a state that allows the object to clean up and then exit. So this state needs to exist so that Krator can transition to it: Krator will interrupt the execution of the state machine, transition to this state, and then execute from there until the state machine exits. You can see that this state returns a completing transition, which terminates the execution of the state machine, along with a Result indicating whether the state machine exited gracefully or not.

For our shared state, I have a simple hash map of relationships between mooses. Then I implement MooseTracker; this is the type that implements our Operator trait, and these are created essentially as singletons. Scrolling down, this is the implementation of the Operator trait. I again reference the status and resource type definitions from above. I indicate that the Tagged state is the first state to be entered when an object's state machine is created, and that the Released state is the state to transition to when an object is deleted. Finally, I reference the state type that is specific to each object. There are a couple of methods that need to be implemented; the first creates that initial state type for a given object when its state machine is starting.
This method can reference the manifest of the object or any shared state on the operator. Next, Krator needs to be able to fetch an Arc<RwLock<...>> reference to the shared state so that it can supply it to the state handlers, so there's a method you implement that simply clones that Arc and returns it to Krator.

Finally, I've been working on some extended functionality to support validating and mutating admission webhooks. Here it's hidden behind a compiler feature flag, but essentially you get a copy of the object that's being created, changed, or deleted, and of course you can reference any shared state on the operator if you need to validate within the context of all the objects the operator is managing. Here I'm simply validating that the moose's name starts with the letter M. I can then allow the object (or the change) and optionally mutate it, here I'm leaving it unmutated, or I can deny it and return a standard Kubernetes status indicating why it was denied, which the Kubernetes API will return to the client.

Scrolling down a little, this final piece of code is our main function, and it's all you need to actually start an operator with Krator. I create a kube Config, which I simply infer; I create my operator singleton; then I call OperatorRuntime::new, passing it that config and that operator singleton, and I can optionally supply list parameters to filter the objects I'm managing. Finally, I call runtime.start().await, which blocks forever and spawns asynchronous tasks to execute the state machines for each object that's created.
So if I exit out of this and rerun my operator, you can see that I'm setting the moose module, as well as all of Krator, to debug-level logging for the purposes of this demo, and I'm also activating the admission webhook so I can show you that. If I run this, it'll, I'm in the wrong directory, there we go, it'll print out the CRD here, but it doesn't actually apply it. The CRD is located in the assets directory within the examples folder, so you can apply it yourself if you want to play around. You can see here that there was already a moose registered with the API, so we got a re-sync event when we started watching for mooses, and an event handler here is creating a state machine for that moose. Let me scroll up; there are a number of pieces of logging here, and I'll talk about that a little later. Initially the moose enters the Tagged state, and a status update is patched to Kubernetes when it enters this state. Next it transitions to the roaming state, and there's a status update for that, and the moose continues to move through its states as the operator progresses.

Here in k9s you can see I can list mooses, and I've set certain fields of the status to be printed when you use kubectl, so we can watch the mooses transitioning through their states. For the purposes of this demo I can generate a large number of mooses; here I generate 25 new ones. The amount of time a moose spends in the roaming state is determined by its body weight, which determines how long it takes to get hungry, and the body weight is sampled from a normal distribution, so the mooses go through the state machine at different rates, which makes it a little more interesting. And then finally I can demonstrate the webhook capability: I have a moose here named George, which is an invalid name according to my rule.
So I can apply this manifest, and you can see in the log that the request was denied, that went by quickly, but you can also see that the response returned to the client says this is an invalid moose name. Hopefully this has been a nice overview of the API and of the functionality you can implement fairly concisely with Krator.

Next I'd like to talk a little about some new functionality I've been working on in Krustlet and Krator. The first is that both crates have transitioned to using tracing for all of the logging events we emit. This offers full compatibility with the original log crate we were using, so it shouldn't require any changes for consumers of our crates. Additionally, I've been working on some extended functionality for Krator: wherever log events interpolated the values of variables into strings, those values have been pulled out into structured fields, which should allow you to dissect Krator's log output with a tool like jq, using a JSON parser. I've also added several key spans to the tracing instrumentation in Krator: whenever an admission webhook request is processed, whenever a single state node is executed for an object, and whenever an update received from the Kubernetes API is processed. If you'd like to read more about tracing, Luca Palmieri has an excellent blog-post-series-turned-book on deploying Rust in cloud-native environments, with an entire section on tracing that's really excellent.

Finally, I'd like to talk about Krator's roadmap a little. In the near term, I'd like to expand the ability to easily subscribe to watches on different types of Kubernetes resources, and I have an RFC out on GitHub with the API proposal for that.
In addition to that, I'd like to think about how the state machine API can be improved to identify where cluster state may be deviating from desired state, and how to trigger reconciliation within the context of a state machine. I'd also like to expand the templating and boilerplate generation surrounding creating and deploying operators; this is something the Go Operator SDK offers, and I think there's a lot of room for improvement here with Krator. And I'd like to look at the Operator SDK's capability-levels roadmap for ways to offer very simple APIs for some of the more advanced features, such as auto scaling and abnormal state detection.

So thank you for coming to my talk. I hope this has been interesting, and I hope you'll check out Krator. If you'd like to get in contact with me or the Krustlet project, you can find us on Twitter and GitHub, and we both have websites. Finally, for the Krustlet project there's a #krustlet channel in the Kubernetes Slack, which is probably the easiest way to reach us. Thank you.