Hello, I'm Kevin Flansburg, and welcome to Kubelet Deep Dive: Writing a Kubelet in Rust. This talk covers Kubelet architecture and behavior in detail, which I learned while rewriting the Kubelet in Rust. It also covers the benefits of Rust for writing distributed applications such as Kubernetes components.

First, a little bit about myself. I have been a Rust developer for around three years. During this time, I've been responsible for developing and deploying globally distributed microservices. As such, I'm pretty familiar with the Rust software development lifecycle, from running git init to CI/CD and finally monitoring. My DevOps journey began with Docker Swarm, but around a year ago I migrated to Kubernetes. I have found it to be a very powerful tool even in a single-developer shop. I most recently joined the Krustlet project, which nicely combines my interests in Kubernetes and Rust and was the inspiration for this talk.

Krustlet is an open source project run by Deis Labs. It stands for Kubernetes Rust Kubelet and involves a number of interesting components. The first is the kubelet crate, which seeks to implement common Kubelet functionality and expose a flexible API with which developers can build custom Kubelets. This allows Kubelets to be developed for new architectures and types of workloads. In fact, the second major component of the project is a pair of Kubelet implementations for running Wasm workloads, one using the waSCC runtime and the other using Wasmtime. These are exciting because you can compile Rust to Wasm, push an OCI-compliant image to a registry such as Azure Container Registry, and then deploy these workloads in Kubernetes using Krustlet. These images are very small compared to Linux container images, and isolation is provided by the Wasm runtimes themselves, resulting in a very performant and efficient way of running microservices. I have also begun work on a Kubelet implementation which targets traditional Linux containers and makes use of the container runtime interface. In the rest of the talk, I will focus primarily on the Kubelet in the abstract or as it pertains to Linux containers, but I think the use of Wasm within Kubernetes is very exciting and I encourage you to check out these projects.

I'd like to begin my tour of Kubelet architecture with a high-level overview of Kubernetes architecture for those who are unfamiliar. On the left side, you have a number of components which make up the control plane of the cluster. These can run directly on the host on control plane nodes, in pods in the cluster itself in a self-hosted cluster, or, in the case of many cloud offerings, they are managed by the vendor and only the API server endpoint is made available to the customer. Within the control plane, you have etcd, a distributed key-value store that Kubernetes uses to persist cluster state. The Kubernetes API server then exposes the data in etcd via an HTTP endpoint and additionally performs authentication and authorization on these requests. Users and Kubelets connect to Kubernetes via the API server endpoint: Kubelets register themselves and monitor for pods to run, and users can modify cluster state, such as submitting a new deployment to run. Note that this slide isn't exhaustive and leaves out components like kube-proxy, the cloud controller manager, and cluster DNS.
The scheduler is responsible for assigning pods to nodes, and the controller manager runs cluster control loops such as adding endpoints to services, provisioning service accounts for new namespaces, and pruning dead nodes. These services leverage a concept that is central to Kubernetes: the controller pattern. A large number of Kubernetes components use this pattern, and I include a short sketch of it below. It begins with declarative manifests which represent instances of the various API resources that the cluster supports. Most users will be familiar with these manifests in the form of YAML files, which are submitted to the cluster to create resources such as deployments and services. These manifests are mostly immutable, though you can find some exceptions. Controllers within the cluster watch for changes to the types of resources they manage. These changes are typically the creation of a new instance of the resource, modifications, or deletion. Watching a resource for these types of changes is typically referred to as the informer pattern. When changes are detected, the controller drives cluster state to match the desired state specified in the manifest. In this way, different components with different responsibilities ensure that the cluster is eventually consistent with its desired state.

As an aside, there is also the operator pattern, which you will see pretty often when working with Kubernetes. The distinction between an operator and a controller is fairly subtle, but an operator is essentially a controller that defines a custom API resource and is application- or domain-specific. I believe that the use of two different names here is a little confusing and unnecessary.

If we take a closer look at the Kubelet, we can see that it is a controller as well, with its resource type being the pod. It is a little special, though, because pods represent units of work and are the original and most fundamental resource type in Kubernetes. Primarily, the Kubelet watches for pod changes and then configures the container runtime to pull images, create namespaces, and run containers. There is a gRPC endpoint for this communication called the container runtime interface (CRI), which was introduced in 2016. It is not 100% adopted, however, and you can still find some Kubelets configured to use Docker via its daemon socket. Additionally, the Kubelet exposes an HTTP endpoint for streaming logs and exec sessions to clients. First-class support for this via the API server and kubectl is one of the special things about the Kubelet. For configuring storage such as block volumes, there is the container storage interface (CSI), another gRPC endpoint which the Kubelet interacts with. Legacy storage drivers were originally included in the Kubelet source code, so you may find some in the wild that do not make use of CSI. Interestingly, the container network interface (CNI) is not gRPC, and it is the responsibility of the container runtime to configure it. This makes sense because the runtime is directly responsible for configuring the network namespace that the pod runs in, but it is a little unintuitive.

The fact is, pods are very complex. If the ops community has learned one thing over the last few decades, it is that hosting applications involves a lot of moving parts. The Kubelet must interact with many other components and does not just act as a shim to the container runtime. For instance, at some point it must fetch and configure secrets and config maps, respect the image pull policy, and mount service account tokens. The pod spec allows for a lot of customization of behavior, which requires a lot of Kubernetes-specific decisions to be made by the Kubelet before the runtime can start the container.
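Before turning to Rust itself, here is the small sketch of the controller pattern I mentioned a moment ago. To be clear, this is purely illustrative: PodEvent, PodSpec, and reconcile are hypothetical stand-ins I am inventing for this talk, not the real kube or k8s-openapi types, and a real controller would receive its events from a watch on the API server rather than constructing them by hand.

    // A minimal, self-contained sketch of the controller pattern.
    use std::collections::HashMap;

    // The kinds of change a controller observes for the resource it manages.
    enum PodEvent {
        Added(PodSpec),
        Modified(PodSpec),
        Deleted(String), // pod name
    }

    struct PodSpec {
        name: String,
        image: String,
    }

    // The reconcile step: drive actual state toward the desired state
    // described by the manifest.
    fn reconcile(actual: &mut HashMap<String, PodSpec>, event: PodEvent) {
        match event {
            PodEvent::Added(spec) | PodEvent::Modified(spec) => {
                // A real Kubelet would pull the image and (re)start containers here.
                println!("ensuring {} is running image {}", spec.name, spec.image);
                actual.insert(spec.name.clone(), spec);
            }
            PodEvent::Deleted(name) => {
                // A real Kubelet would stop containers and clean up here.
                actual.remove(&name);
            }
        }
    }

    fn main() {
        // In a real controller these events would come from a watch on the
        // API server (the informer side); here we feed them in by hand.
        let mut actual = HashMap::new();
        reconcile(&mut actual, PodEvent::Added(PodSpec {
            name: "web".to_string(),
            image: "nginx:1.25".to_string(),
        }));
        reconcile(&mut actual, PodEvent::Modified(PodSpec {
            name: "web".to_string(),
            image: "nginx:1.26".to_string(),
        }));
        reconcile(&mut actual, PodEvent::Deleted("web".to_string()));
        assert!(actual.is_empty());
    }

The shape is what matters here: observe a change to a resource you manage, compare it with the actual state of the world, and take whatever action closes the gap.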
I'd like to spend some time discussing why I think Rust is a great language for developing distributed applications like the Kubelet. First, Rust can produce very high-performance software and frequently matches C++ in performance benchmarks. In part, this is achieved by a policy of zero-cost abstractions, wherein abstractions like generic types incur no runtime cost. The result of this focus on performance is not only the ability to scale, but also efficiency, which can have a big impact in the data center and at the edge. Next, Rust employs a strong type system, as well as a borrow checker, which enforces memory safety at compile time. Many first-time users of Rust quickly grow irritated with the compiler and borrow checker. However, I have found that nearly everything it catches is an actual bug that would have become a runtime error if it had not been caught. Once you are familiar with the error types that are specific to Rust, the compiler can feel a lot like pair programming and can be a helpful guide when conducting large refactors or prototyping new features. As a result, I can't help but be nervous and code very defensively when I return to a language like Python. Rust's strong concept of memory safety contributes to easier concurrent and parallel programming as well. It includes strong primitives for coordinating and communicating between threads. This leads to significantly reduced cognitive overhead when developing concurrent software and frequently catches memory safety or race-condition bugs. Additionally, async/await was stabilized about a year ago, and although there are some friction points surrounding the many runtimes there are to choose from, it is something that I basically default to now unless I am seeking to minimize latency.

Another great feature of Rust is error handling. This can be a hot topic, as many find it somewhat cumbersome. I find it to be easy to understand, albeit verbose at times. Rust's error handling gives me the confidence that I'm actually handling all of the error types that my code can produce. This example shows obtaining a Result from a function that can fail; a Result is an enum of either Ok or Err. Rust allows you to exhaustively match on this enum, ensuring at compile time that you handle all possible variants. The second case shows a terser form: the question mark will either produce the value if the result is Ok, or it will exit the current function early with the error, allowing errors to bubble up (a minimal sketch of this appears just after this paragraph). The Rust community is constantly working to make error handling even better, and there are many great crates out there that improve the ergonomics and behavior of error handling. The developers of Rust have also done a lot of work, and continue to do so, on the error messages produced by the compiler. While not all error messages are the most informative, there is a great framework in place for extremely descriptive messages which underline the exact code the error refers to and suggest exactly what needs to be changed.
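The slide with this example isn't reproduced in the transcript, so here is a minimal sketch of the kind of code I mean, assuming reading a configuration file is the operation that can fail; the file name and fallback behavior are purely illustrative.

    use std::fs;
    use std::io;

    // Exhaustive match on a Result: the compiler makes sure both the Ok and
    // Err variants are handled.
    fn read_config_verbose(path: &str) -> String {
        match fs::read_to_string(path) {
            Ok(contents) => contents,
            Err(err) => {
                eprintln!("could not read {path}, falling back to defaults: {err}");
                String::new()
            }
        }
    }

    // The terser form: `?` unwraps the Ok value, or returns early from this
    // function with the error so it bubbles up to the caller.
    fn read_config(path: &str) -> Result<String, io::Error> {
        let contents = fs::read_to_string(path)?;
        Ok(contents)
    }

    fn main() {
        let _ = read_config_verbose("kubelet-config.toml");
        if let Err(err) = read_config("kubelet-config.toml") {
            eprintln!("error bubbled up to main: {err}");
        }
    }

The first function handles both variants exhaustively, while the second leans on the question mark to hand the error back to its caller, which is what lets errors bubble up.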
Rust also has an excellent ecosystem, and I'd like to mention some crates which are extremely high quality. The first is serde, which implements serialization and deserialization. I genuinely miss serde in every other language I use. Many languages offer similar options, but they are simply not as useful or feature-complete. Serde automatically derives the ability to convert Rust types to serialized formats such as JSON, YAML, or Avro, and vice versa, and the amount of control it offers over dealing with little serialization quirks is really phenomenal. Next is tracing. Tracing is being developed by the Tokio project, which maintains one of the leading async runtimes; however, tracing itself does not require async. I think Rust has a great story around logging in general, but tracing really steps it up. It makes it very easy to introduce structured logging to your application, which I consider necessary for application monitoring. It also has the ability to instrument your functions automatically, and you can configure it to output OpenTracing data, which makes it very simple to slot your Rust application into your distributed tracing architecture. Last are prost, which can generate Rust types from proto definitions, and tonic, which can generate RPC clients that use those prost types. Together they make it incredibly easy to write Rust code that interacts with gRPC endpoints like CRI and CSI.

Rust has also made great documentation a major focus. It has first-class language support for doc comments, and rustdoc, which ships with Rust, can be used to generate great documentation pages, including runnable examples. Rust also has strong support for dependency management through its package manager, Cargo. When working with Krustlet, we are balancing a number of rapidly evolving dependencies, changing Kubernetes API versions, and complex combinations of libraries and standalone binaries in a single project. Cargo makes it a breeze to manage all of this and is a very well-thought-out tool. Finally, the Rust community is very welcoming and helpful. I have found that their approach and attitude is what makes the language and its crates so high quality and what makes programming in Rust a real pleasure.

I would also like to mention some crates that are useful specifically for Kubernetes development in Rust. k8s-openapi contains automatically generated types for the Kubernetes OpenAPI spec. It is very useful for manipulating Kubernetes manifests, and its documentation is actually the main reference I use for Kubernetes API resources. Second is kube, which is the primary Kubernetes client for Rust. It is what Krustlet uses to parse kubeconfigs, connect to the API server, patch resources, and watch for pod changes. Finally, k8s-cri and k8s-csi are crates that I have published which provide automatically generated gRPC clients for CRI and CSI respectively.

Let's take a closer look at the control loop used by the Kubelet to run a pod. This is the loop that was developed for the Krustlet project based on observed behavior in Kubernetes. When a pod is added, some validation happens and then the image pull policy is evaluated to determine if images need to be pulled. If an issue arises when pulling an image, an exponential backoff is used to retry. After images are pulled, we provision storage volumes, which can also include collecting config maps and secrets for the pod. Next, containers are started and we begin monitoring for exits. If an error occurs, the restart policy is evaluated and the pod either retries with backoff or enters a failed status. If no errors occur, the pod is considered to have succeeded, which is useful for jobs.
Finally, if the pod is marked for deletion by the API, the Kubelet sends signals to stop the running containers, cleans up, and exits. This is a brief overview, and there are a lot of details surrounding the behavior of the Kubelet, but I think this provides a good working outline for debugging pods. Note that this is inherently a control loop: while a pod exists, the graph tries to get us into the running state. The only way to reach states that actually exit the loop is under specific conditions, such as pod deletion or a restart policy of Never.

For Krustlet, we spent quite a bit of time exploring how we could best implement this control loop while still allowing downstream developers to write highly specialized Kubelets. What we developed is thoroughly documented in a blog post I wrote earlier this year. To summarize, we released a Rust API for building a state machine which captures the logic of the control loop and leverages Rust's type system to ensure correctness. This state machine is fully customizable by the developer. The state machine has a number of constraints which we believe improve the reliability of the application, and these are enforced at compile time. We ensure that only valid states are used and only valid transitions between states are taken (I include a small illustrative sketch of this idea at the end of this section). We also believe that the result of this pattern is code that is much easier to interpret and reason about. The kubelet crate is responsible for driving the state machine and automatically handles updating the pod status with the control plane on state changes. The pattern also encourages error handling to be done in the context of the control loop. In other words, rather than an exception that prevents the pod from continuing, we explicitly transition to CrashLoopBackOff and try again, matching the expected behavior of Kubernetes.

To finish, I would like to share an overview of how the Krustlet application is architected. Green boxes represent individual async tasks, so this gives some idea of the concurrency going on here. The yellow box represents the scope of the kubelet crate, while the blue box is what is implemented by downstream developers when building a new Kubelet. The kubelet crate handles all communication with the Kubernetes control plane, including updating the node lease, serving logs, monitoring for pod changes, and updating pod statuses. Downstream developers implement a provider trait, which is a set of methods that are needed by the Kubelet, including those for reading pod logs as well as initializing state for a pod to run. When a new pod is created, the pod event dispatcher will spawn a new driver for that pod. In step one, this driver will call the provider to initialize the pod state and then create the initial state machine state, which is also specified by the provider. In step two, the pod driver will run the handler associated with this initial state, which will return the next state. The pod driver will iteratively execute these state handlers until either a state returns saying that it is an end state, such as failed or succeeded, or the pod is deleted. When a pod is deleted or modified, the pod event dispatcher will notify the appropriate pod driver. In step three, after a pod is deleted, the pod driver will interrupt the execution of the state machine and jump to the terminated state, which is specified by the provider and handles shutdown and cleanup. I think the takeaway from this slide is that Krustlet is a fairly complex and highly concurrent application.
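Earlier I said the state machine API ensures at compile time that only valid transitions are taken. Here is the small illustrative sketch of that idea I promised. This is not the actual Krustlet API (for that, see the blog post and the kubelet crate documentation); the state names and the TransitionTo marker trait below are simplified stand-ins for the general technique.

    // An illustrative sketch of compile-time-checked state transitions.

    // Implementing TransitionTo<Next> for a state declares that moving to
    // Next is a valid edge in the state graph.
    trait TransitionTo<Next> {}

    struct ImagePull;
    struct Starting;
    struct Running;
    struct Succeeded;

    impl TransitionTo<Starting> for ImagePull {}
    impl TransitionTo<Running> for Starting {}
    impl TransitionTo<Succeeded> for Running {}

    // A transition only compiles if the corresponding edge was declared above.
    fn transition<S, Next>(_from: S, to: Next) -> Next
    where
        S: TransitionTo<Next>,
    {
        to
    }

    fn main() {
        let state = ImagePull;
        let state = transition(state, Starting);
        let state = transition(state, Running);
        let _done = transition(state, Succeeded);
        // let _bad = transition(ImagePull, Succeeded);
        // ^ does not compile: `ImagePull: TransitionTo<Succeeded>` is not satisfied.
    }

Because the legal edges are encoded as trait implementations, an invalid transition is rejected by the compiler rather than discovered at runtime.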
Rust gives our small team the confidence and ability to iterate on APIs and rapidly add features while avoiding entire classes of errors and maintaining high code quality. Rust's ecosystem provides many high-quality crates, a helpful community, straightforward dependency management, and automatic, high-quality documentation. Many languages offer similar capabilities, but in my opinion, Rust offers the fewest compromises. I feel that complex and highly concurrent distributed applications like this are especially suited to Rust, and I hope you will check it out if you haven't already.

I'd like to wrap up with some key takeaways from this talk. First, we covered Kubelet architecture and communication patterns, including the components that it interacts with and how it fits into a Kubernetes cluster. I hope this information will be useful to you in the future for debugging and administration tasks related to the Kubelet. Next, we covered pod behavior and the pod lifecycle. I have found a working understanding of this to be extremely useful for debugging pods that are failing to run and for understanding how deployments, stateful sets, and jobs leverage this behavior. Finally, we looked at Rust and how it can be an extremely strong language for developing distributed applications, and in particular, how you can develop for Kubernetes using Rust.

I'd also like to give a shout-out to the core maintainers of the Krustlet project, Taylor, Matt, and Ivan, who have been great to work with as we try to roll out some very ambitious features. If you'd like to contribute to Krustlet, you can find us on GitHub, and it would be a great way to start working with Rust. Thanks for coming to my talk, and we now have a few minutes for questions.