My name is Maté. I am a Linkerd maintainer. I work for a company called Buoyant where I'm a software engineer. You can reach out to me on social media. I'm on Twitter. Probably should have changed the logo. You can find me on GitHub and a couple of Slack channels. So yeah, welcome to another talk on why you should rewrite your software in Rust. The hype train is very, very real and it's not stopping. It's going strong. And I've got a lot of things that I actually want to cover in this session.

So, you know, kind of a housekeeping thing. Who here has written a controller before? It doesn't matter what language it was in. All right. All right. That helps me out. What about Rust? Any experience with Rust? All right. And service meshes? Okay. Well, that makes my job much, much easier. So we have a couple of Rustaceans in the room. But I still need to go through the slide. You know, like why Rust? It's kind of the topic of the talk.

One, because it's fast, right? You can do your own memory allocations. That's very important when you're working with software where latency is important, where it's important that you manage data in a very efficient way. And Rust lets you do that in a very nice way. It has a very nice concurrency API. You can do concurrency with a bunch of tools that were written with performance in mind. Backpressure is applied in a nice and elegant way. And overall, it's just a very nice language to work with if you want to get stuff done fast.

But more so than that, it's safe. Right? Like, you basically don't have the same issues that you have with a language like C++, because the compiler really helps you out. You have a bunch of invariants that it upholds to ensure that your code is safe, whether that's in concurrent execution environments or in a single-threaded application. You have ownership semantics. You have marker traits to ensure that you don't share data across threads when it's not meant to be shared. And that makes the language really, really safe to program in, even if you don't have experience with managing your own memory.

And last but not least, it's sane. All right? So you might have noticed this is exactly the title of my talk. But it's sane because the compiler really helps you out. If you have to work with Rust and you have an issue where your code doesn't compile, the compiler is going to give you some very helpful output that's going to help you resolve your issue. So all of that makes it just an overall nice language to program in.

But for us with Linkerd, so Linkerd is a service mesh. If you didn't raise your hand when I asked about service meshes: I got really good at explaining this over the years, by the way, because I've been doing it for a while. But service meshes are basically platform-level tools. You just use them in your Kubernetes environment to get features out of the box without really making any changes to your application stack. So the way we kind of do this is that if you have two services that talk to each other, let's say two pods, like in this very nice and elaborate diagram, we're going to introduce a sidecar proxy that's going to take over your traffic. And it's going to do a bunch of cool stuff with it. It's going to provide mTLS out of the box. It's going to provide a bunch of reliability mechanisms, like retries and timeouts and metrics. It's going to make everything a little bit more performant, because we do some very elaborate load balancing techniques.
And all of that without actually requiring you to make any changes to your code. So it's a pretty, pretty sweet deal if you ask me. But if you ask me, I'm pretty biased. So, you know, there's that.

But for Linkerd, we need to ensure that safety and performance are there, right? Like we're in the application's data path with our sidecar proxies. We're going to take over your bytes and we're going to send them to their destination. And if you work with stuff that you don't want to leak out, like credit card information, then ideally, you want it to be very safe, right? And in practice, this means no CVEs, right? We want to ensure that you don't have any buffer overflow attacks. We want to make sure that we don't have any double frees in our code. We want to make sure that everything is performant and up to par.

But it also means that we want to be performant, right? Because we add at least two additional hops in your network traffic. So we don't want to have any GC pauses. Go, for example, is the language of choice for network services in cloud native, because it's very simple to work with. It also gives you some nice mechanisms: you can put stuff on the stack, you can put stuff on the heap, you can work with pointers, but you have a garbage collector. And that kind of gets in the way of things, especially when you need to deal with a lot of traffic. Ideally, for our proxy, we're not going to have any point in time where we just stop the world to collect all of these values that are no longer used within the application. With Rust, because everything gets dropped when it goes out of scope, we avoid that, right? Like, you use some memory for a request. As soon as you're done with the request, you just discard everything, you invalidate it and, you know, everyone's happy and the code stays performant.

And finally, we want to have the ability to scale. So with Linkerd 1, it kind of predates me, but you know, I've heard the stories. With Linkerd 1, we used a JVM proxy. We programmed it in Scala. And we had a bit of an issue with scaling down. So the JVM was good for scaling up. But when you had an application that didn't really send a bunch of traffic, then we were using a lot of memory just by virtue of running a sidecar container. And as soon as we switched to Rust and, you know, the proxy became machine code, so to speak, everything just became much, much better in terms of scaling up and scaling down.

But we're here to talk about controllers, not about the proxy or sidecars for that matter. And Linkerd also has a control plane. So for us, the control plane is just a set of controllers that we run in Kubernetes that basically tell the proxies what they need to do. And originally, they were written in Go. Because at the time, the Rust ecosystem hadn't really picked up too much, actually, just kind of in general, not even in the cloud native sense. We had to build a bunch of the Rust libraries that we're currently using from the ground up. We had a bunch of contributions in Tokio and Hyper and a bunch of other libraries that people use in production now. But with the control plane, we kind of resorted to Go because we didn't have the same requirements that we had with the proxy. We weren't in the application's data path. We weren't dealing with all of these very stringent requirements. And we just opted to use something that was easy and that had a very mature ecosystem.
And a quick thing to note about our controllers is that they're not very traditional, in the sense that we don't really leverage any big frameworks. Our controllers are very read-heavy. We basically do no writes. And their purpose is just to replicate a bunch of data.

In 2.11, though, we made a choice. So Linkerd just reached 2.15; 2.11 was about two years ago. We made the choice of introducing Rust to our control plane. And it was mostly an experiment, because we just like rewriting stuff in Rust. But we also kind of had enough of nil pointer exceptions. And we had enough of bugs that were very, very hard to actually track down because they were happening at runtime. And to be honest, we also had enough of putting mutexes over all of our maps, because it was very hard to reason about concurrent and shared data.

So with that being said, I kind of mentioned controllers a couple of times. And for people in the room that are not really familiar, controllers are how you implement the operator pattern. And when I was researching material for the talk, I came across this white paper from the CNCF about the operator design pattern. And I pulled out this quote. So the operator design pattern defines how to manage application and infrastructure resources using domain-specific knowledge and declarative state. That's a bunch of buzzwords all put together in there. But I really like the quote because it kind of highlights my own understanding of operators and controllers.

With our applications, we're used to configuring them in two ways, right? You configure them statically through a configuration file or through environment variables, or you configure them dynamically, which is probably what we kind of do in a cloud native environment, right? You have a bunch of dependencies, you have a bunch of microservices, and they all need to talk to each other. And you need to discover configuration dynamically at runtime. You need to make a bunch of API calls. And operators in Kubernetes, at least in my own understanding of them, allow you to automate a bunch of this dynamic configuration. They allow you to extend the Kubernetes API server, because the Kubernetes API server is very generic. It only supports, you know, the core resources that make it tick. And it basically offers you an interface to deal with configuration management and to automate all of your stack to support your applications. And if you want to read more about it, I put the link in here and I'm going to put the slides up too.

And of course, if you want to write a controller, and that's what this talk is about, writing a controller in Rust, you need to talk to the Kubernetes API server, right? What is the Kubernetes API server? It's just an HTTP server. It just has a bunch of paths that it manages. You can get resources. All of the resources are versioned. But at the end of the day, it's just an HTTP server, right? So in order to talk to an HTTP server, you need an HTTP client. The only problem is you need to decorate that client with a bunch of config, right? You need TLS config. You need to somehow discover the actual API address. You need to deal with service account tokens because you probably use RBAC. There's a bunch of complicated stuff that you need to do to talk to the API server in order to build a controller. And most operators, or at least the operators that I've kind of seen out in the wild, are written in Go, primarily because the library that you get to interact with the API server comes directly from upstream.
It's really well optimized and it just makes writing everything a breeze, right? You don't need to complicate yourself with writing everything from scratch. You have a bunch of nice abstractions. In other words, the ecosystem is very mature. And I think one of the things that people hold against Rust is that the ecosystem is not as mature, and I'm kind of here to convince you that that's not the case.

But first, let's see how you actually use client-go, the upstream library, to build a controller. You basically create a client. You just have some environment config. You read that in. You create a generic client, and now you can use this generic client to access a bunch of resources. You list the resources, and all is well. But even though the API server is just an HTTP server in a trench coat, it takes a lot more to actually run a production-grade controller, right? You need backoff policies so you don't overload the API server. You need to have some queuing strategy in there. You need to subscribe to resource changes. There's a bunch of stuff that's involved in actually writing something that's production-grade. So that's why the Go ecosystem actually has some runtimes that it offers. You have controller-runtime, you have Kubebuilder. All of them do kind of the same thing. They let you use these frameworks that take ownership of your data, and, you know, they take a function that reconciles stuff, and they abstract all of that complexity away from you so you don't have to deal with it. Which is nice.

But that doesn't mean that we have to use Go, right? Again, going back to my Rust Evangelism Strike Force meme, we can just rewrite the entire ecosystem in Rust. And this is what people have been doing for the past two years, three years, I don't know. Timeline's a little bit hazy. But the point is that the CNCF landscape for Rust is thriving. I only have a couple of logos so, you know, the slide's not really a mirror of what's actually happening in the landscape. But there's a bunch of stuff out there already that's using Rust in production.

And here's where, I guess, the talk gets interesting. Rust's answer to client-go is the kube crate. Has anyone here used the kube crate before? All right, not a lot of hands. That's good, because I'm going to talk a lot about it. It's like half of this presentation, I guess. So the kube crate is basically Rust's answer to client-go, right? It's a library, basically a crate that contains four other libraries that give you all of the abstractions you need to actually interact with the API server, because, again, fundamentally everything that you do when you write a controller is interact with the API server. So you have a client crate that basically abstracts away all of this client creation stuff. You can read config from the environment, you can create HTTP clients. You have a core crate that provides a bunch of interfaces that you can use to build better abstractions or to work with all of the types. It has a derive crate that allows you to create CRDs. And it has a runtime crate in case you don't want to worry about all of the glue code that you'd otherwise have to write.

So this is how it works. If we kind of compare this to the initial Go example that we had, it's essentially the same thing. It's just a different library and a different language.
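In Rust, that flow looks roughly like the following. This is a minimal sketch rather than the exact slide code, assuming kube, k8s-openapi, tokio, and anyhow as dependencies:

```rust
// A minimal sketch of listing pods with the kube crate; crate choices and
// error handling here are illustrative, not the exact code from the slide.
use k8s_openapi::api::core::v1::Pod;
use kube::{api::ListParams, Api, Client};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Build a client from the environment (kubeconfig or in-cluster config).
    let client = Client::try_default().await?;

    // An Api handle for pods, scoped to a namespace and generic over the type.
    let pods: Api<Pod> = Api::namespaced(client, "default");

    // List the pods, much like client-go's List call.
    for pod in pods.list(&ListParams::default()).await? {
        println!("{}", pod.metadata.name.unwrap_or_default());
    }
    Ok(())
}
```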
So instead of having Go's usual main entry point, we have an asynchronous function, because most of the stuff that we're going to build in Rust, especially in a cloud native ecosystem, probably needs to be concurrent. So we need an asynchronous runtime; we're going to use Tokio because it's what people generally tend to use and it's battle-tested. And then the flow is basically the same as the Go counterpart. You create a client, you create an API type, and I'm going to talk about this in a second. And then you use this API type to list all of the pods.

So what's all of this code that I just put on the screen? Well, obviously the client is pretty self-explanatory. You know, you just have a function that creates a client. But the API type is looking a little bit weird, especially if you're not used to Rust syntax. It's basically a wrapper type that's generic over a type parameter. This type parameter is just a resource that you have in your Kubernetes cluster. It can be a pod, it can be a CRD, it can be a namespace, whatever. And it allows you to actually interact with the API server. The reason why it's generic over a type parameter is because different resources require different interfaces, right? If you're dealing with cluster-scoped resources, like cluster roles or namespaces, then you're going to want a different interface than if you're dealing with a pod. Like, for one, the pod takes in a namespace. So the generics look a little bit scary, but they're actually pretty intuitive once you get to work with them a little bit.

But aside from all of the simple stuff, because we're not just building CLI applications, right? We're building controllers. kube-rs has a runtime that has all of the batteries included. So you can create the same style of controllers that you would usually create with the upstream client-go library or controller-runtime. And this is an example that I took from the kube-rs codebase. You don't really need to understand it too much, because I'm going to go into more examples soon. But it basically lets you create this controller structure that takes ownership of everything that you pass inside. You pass it a reconciler function and it just does its thing. So, like, really easy. You can get a controller up and running in no time. And this is just kind of explaining the same thing. But kube-rs is really more than just a runtime, right? Like, it contains a bunch of glue code. It's controller-runtime put together with Kubebuilder and client-go. It's everything packaged in a nice, neat box.

And to kind of illustrate how easy it is to create a controller using kube-rs, I actually set out to create a controller for a resource that we recently introduced in Linkerd. So let's say you have a controller, right? It needs to do some sort of dynamic configuration. It needs to reconcile some resources. That's exactly the kind of example that I wanted to highlight here. We have this external workload resource that you don't really need to worry about, right? It's a CRD. We want to register some bindings for the CRD and then we want to do something with this CRD. Every single time we see an instance, we just want to reconcile that instance. So on the left, is it on the left? Yeah, I guess. On the left, we have the actual YAML definition. So this is what the resource actually looks like in your cluster. Once you define it, you have a bunch of metadata in there that, again, you don't really need to be worried about. You have some IPs, some ports, whatever.
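Bindings for a resource like this can be sketched roughly as follows. The API group, version, and spec fields are made up for illustration and are not Linkerd's actual ExternalWorkload definition:

```rust
// A sketch of CRD bindings using kube's CustomResource derive macro.
// Group, version, and fields are illustrative placeholders.
use kube::CustomResource;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

#[derive(CustomResource, Serialize, Deserialize, Clone, Debug, JsonSchema)]
#[kube(
    group = "workload.example.com", // assumed group, for illustration only
    version = "v1alpha1",
    kind = "ExternalWorkload",
    namespaced,
    status = "ExternalWorkloadStatus"
)]
pub struct ExternalWorkloadSpec {
    /// Addresses the workload is reachable at.
    pub workload_ips: Vec<String>,
    /// Ports it serves traffic on.
    pub ports: Vec<u16>,
}

/// The status the controller will manage: just a list of conditions.
#[derive(Serialize, Deserialize, Clone, Debug, Default, JsonSchema)]
pub struct ExternalWorkloadStatus {
    pub conditions: Vec<Condition>,
}

#[derive(Serialize, Deserialize, Clone, Debug, JsonSchema)]
pub struct Condition {
    #[serde(rename = "type")]
    pub type_: String,
    pub status: String,
}
```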
But what this controller that I kind of built just for this presentation does is it puts a status on it. So statuses are generally the things that controllers modify, right? Because you're not really supposed to modify the spec of an object once you persist it into etcd. So this controller will always slap a ready status onto an external workload resource whenever it sees one. And just to show how easy it is to create the bindings with kube-rs, we have the bindings on the right. So you use a procedural macro. You have a struct that represents your spec. You fill this struct out with all of the data that you actually have in your YAML. You slap on some macros, and at compile time, all of this gets generated for you and you can use it.

And this is what a controller actually looks like. So we have a reconcile function. That's the bread and butter, so to speak, of our controller. It takes in our actual object, and our actual object is behind an atomically reference-counted pointer (an Arc) because we want to share it between threads and because we probably want to store it. We have a context. It's just a shared context of stuff that you might want all of your reconciliation loops to keep track of. In this case, the shared data is just a client. We want to reuse the same client instead of building it every single time. And then inside the body of this reconcile function, you just implement your reconciliation logic. So in this case, we just want to work with a status. On line 14, you can see that we get all of the conditions. If we don't have any conditions, we default to an empty vector, an empty list. And then for every condition, we just want to check: do we actually have this condition already put in our status? If we do, then there's nothing for us to do. Just wait for the resource to change. Otherwise, we just create this boilerplate condition. We change the status, we write it to the API server, we patch it, and then again, we say: call me when something changes with our resource.

And to run it, and to be honest, this is probably a little bit overkill, but I just wanted to prove a point. To run it, we just have a wrapper function around it where we take all of this out-of-band configuration that our controller might need. We pipe it into Controller::new. We have some knobs that we can twist, but other than that, we don't really need to worry about what happens inside, right? We just pass it a reconcile function, we pass it an error policy and the shared data, and then we just say, okay, do your thing. I'm going to go and focus on my next ticket.

But the problem is, or at least the problem that we noticed in Linkerd, is that sometimes controllers need to hold a lot of state. Sometimes controllers aren't simply just reconciliation loops, right? So Linkerd's control plane indexes a bunch of state, and then it exposes that state to proxies over gRPC APIs. So proxies can always dial in, they can get data, but we don't actually want to write any data. We just want to do a bunch of reads. And there are a couple of reasons for this. The first and foremost is that we kind of took this from first principles, and we thought about Linkerd as a distributed system. So what do we need in a distributed system? Well, we need to replicate the state, because we do not want to have a single point of failure, right? We want to have a separate failure domain. So if the API server goes down, your traffic's not going to go down, because we index all of the state and we keep driving discovery for all of the proxies.
We also wanted to ensure that there's a separation of concerns, right? We wanted to ensure that we have clean interfaces around all of the controllers that we build and all of the proxy code that we build. So we wanted Kubernetes to be completely opaque to the proxies. And it turned out to be a good thing, because we just introduced mesh expansion, and we really didn't want the proxies to be aware of Kubernetes in that specific scenario. And, you know, the last point is that we can always optimize better in the face of overload, because we own the code. There's a bunch of cool stuff that we can do inside of our controllers, because essentially everything that we read gets translated into domain types.

So frameworks generally work well in most cases, but not for us, right? We can't use a framework, because we don't have a reconciliation loop and we just want to read stuff. And this applies to using kube-rs as much as the Go controller-runtime. So we don't use any of this stuff in our own code. This is exactly why we built kubert. It's a toolbox with common patterns that we use all throughout our controllers. So in Go we didn't really go down this road. In Go we just leverage everything that client-go gives us, have a bunch of glue code, and then we just, you know, kind of duplicate all of that code all throughout our controllers. But in Rust we thought that we could do better.

So yeah, we built kubert, which is kind of a general-purpose library that offers a bunch of glue code that helps you write controllers that are very heavy on reads in a much nicer way, without you actually having to dig into all of the abstractions that kube-rs builds. So kubert is built on kube-rs. It uses the same kind of runtime concepts to talk to the API server, but it includes a bunch of things like CLI helpers, an admin server, some drain and lifecycle helpers to help you manage your controller better, some Prometheus metrics adapters, and stuff that's, you know, a little bit more opinionated and wouldn't work in a generic fashion. So it's not upstreamed into kube-rs. But the most important thing is that in kubert the application controls the storage, not the framework. And I think this is a very important thing to note when building controllers in general. You don't want the framework to own all of your data. You want your application to own all of your data. And there are some really good advantages to that. For one, you don't have to hold all of the Kubernetes state inside your actual controller, right? You can just prune it as much as you want. You just hold the data that you actually need to implement your business logic.

So this is what kubert looks like. It might look a little bit confusing. But on the right, this is kind of our entry point. We have this namespaced function. We also have an analogous function that basically does the same thing for cluster-scoped resources. But it basically takes a store. This is an index. It's a store that's protected by a read-write lock, because we want to share it between threads. And then it takes a stream of events. So these are all of the events that come from the Kubernetes API server. And what this does is it calls this IndexNamespacedResource trait, it calls the functions in there, after it flattens the stream of events. So if you have an event where a pod was created, it will call the apply function. If a pod was deleted, it will call the delete function. So basically, this namespaced function just makes sure your state machine is running correctly.
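For the external workload readiness index described here, the shape is roughly the following. The trait is written out as a stand-in, since kubert's exact signatures may differ, and the ExternalWorkload and Condition types are the illustrative ones from the earlier CRD sketch:

```rust
// A rough sketch of the index pattern described in the talk, not kubert's
// exact API. Assumes the illustrative ExternalWorkload type from the earlier
// CustomResource sketch is in scope.
use std::collections::HashMap;

/// All we keep per external workload is whether it is ready, keyed by name,
/// rather than the whole Kubernetes object.
#[derive(Default)]
pub struct ReadinessIndex {
    ready: HashMap<String, bool>,
}

/// Stand-in for the trait the runtime drives as it flattens the watch stream.
pub trait IndexNamespacedResource<R> {
    fn apply(&mut self, resource: R);
    fn delete(&mut self, namespace: String, name: String);
}

impl IndexNamespacedResource<ExternalWorkload> for ReadinessIndex {
    fn apply(&mut self, workload: ExternalWorkload) {
        let name = workload.metadata.name.clone().unwrap_or_default();
        // Look for a "Ready" condition on the status, if there is one.
        let ready = workload.status.as_ref().and_then(|s| {
            s.conditions
                .iter()
                .find(|c| c.type_ == "Ready")
                .map(|c| c.status == "True")
        });
        match ready {
            // A condition exists: record its value, keyed by name.
            Some(is_ready) => {
                self.ready.insert(name, is_ready);
            }
            // No condition yet: drop the entry from the store.
            None => {
                self.ready.remove(&name);
            }
        }
    }

    fn delete(&mut self, _namespace: String, name: String) {
        self.ready.remove(&name);
    }
}
```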
And then the IndexNamespacedResource trait simply applies all of these to your index. So it's a very simple kind of way of structuring your controller. And just to show you how easy it is to implement, this is exactly what an index can look like. So in our external workload example, we key everything off of the name of the external workload. And then we don't need to hold the status. We don't need to hold the spec. We don't need to hold any other information about this resource, because it's very esoteric, very Kubernetes-specific, right? And we don't need that. We just want to know if this workload is ready. That's it. So we just hold a boolean. And that actually saves a lot of bytes. You wouldn't believe it.

And this is how you just implement the trait. You have an apply function. The trait is generic, again, over the actual resource that you want to use. And we say that, okay, whenever you see this resource, whenever you see this external workload and you want to apply it, just look at its status. Find a condition. If you found a condition, then just insert it. If you didn't, then just remove it from the store. Easy peasy. You don't have to actually deal with any reconciliation loop or any backoff strategy or anything like that. And then in the main function, you just start the runtime. You create an index. You create a future to watch all of these resources from the API server. And then you just tell the asynchronous runtime, like, hey, I've got a task for you to run. Just keep running this until we're done with all of our work. That's it.

Now, when you have to choose between kubert and kube-rs, I kind of wanted to highlight that there isn't really a strong preference for one over the other. It really depends on what you're trying to build. Are you trying to build something that's write-heavy? Are you trying to build, like, more of a traditional controller? Then probably, you know, kube-rs is going to be a better choice, because it allows you to just leverage all of these controller-like frameworks that abstract all of the Kubernetes internals from you. But if you want to build something that's purposely built for your application domain, where you need to work with your own domain types, then using something like kubert is probably a better idea, because it allows you to structure your application differently and allows you to have control over your storage, which is really important when you operate at a high scale. So in the end, I guess, it's just a matter of which flavor you like most, and it really depends on what your controllers are doing.

And then if you want to learn more about mesh expansion, my colleague Zahari is going to have a talk about it, just because I mentioned this external workload resource. But that about sums it up. And in my trial runs, this actually went 10 minutes over. But I guess you're all experienced with Rust, so I didn't have to explain a bunch of things. Do we have any questions? There's someone over there. I don't know who's got the mic, so.

Hello. Thank you for your talk. I would like to ask if there are any disadvantages that you have noticed with Rust? Because most of the people, they mention the advantages, but you don't usually hear about the disadvantages. Yeah, if there are any.

That's a really good question. I think the disadvantage is that Rust has a bit of a steeper learning curve compared to languages like Go. So I think Go is a very elegant language, for example, for writing network services.
It gives you a very nice and streamlined concurrency model. You don't really need to think about what your data is doing. You don't really need to hold a bunch of context in your mind. You can just be very productive in a matter of hours or days. But with Rust, the concurrency model is a bit trickier to reason about. First and foremost, you have this async/await kind of syntax, but futures are lazy. And it's a bit of a leaky abstraction, right? You need to know how futures work, and you need to know how the asynchronous runtime works in order for you to use it. So it's a really nice abstraction once you learn how it works. But usually with abstractions, you shouldn't have to know how they work. And I think that's Rust's biggest disadvantage that I've noticed personally.

Perfect. Thank you very much.

Any other questions? Yeah. Yes.

Hello. Thanks for the talk. You talked a lot about performance and memory in the talk. Did you have the occasion to compare memory-wise if there is an advantage to using Rust instead of Go?

Yes, we did. I don't have any numbers. Maybe I should have put some in, had some slides with some benchmarks. So maybe that can go in the feedback. But we did notice it, because we run two controllers that index very similar resources. One is in Go, and we basically just pull a bunch of Kubernetes objects and we cache them. And one is in Rust. The one in Rust always has less of a memory overhead. So it's something that we always notice, even in the resource requirements that we give it. We always just go a little bit lower in both limits and requests, just because it doesn't consume as much and we manage to prune all of these resources.

Thank you.

I've got another question, and then I'll let it go. Is there any issue on the updating side of the Rust library? Because the Go client, client-go, kind of moves fast. There is a lot of work being done very, very, very often. And I wanted to know if you ever had any issues catching up.

That's a very good question. So just kind of as a disclaimer, I'm a contributor to kube-rs. It's true that in terms of features, it still hasn't reached feature parity. So for example, if you want to use leader election, that's something that's not natively offered by upstream kube-rs. That's something that we implemented in kubert in our own opinionated way. So from that perspective, there is a little bit of an issue here, because we're still catching up. And of course, you don't have the same amount of people that contribute. kube-rs is a set of two maintainers and a bunch of ad hoc contributors. client-go has an entire special interest group behind it that continuously drives innovation and updates. So that's one side of it. The second side of it is that kube-rs, being relatively newer, still sometimes ships changes that might completely re-architect the entire thing. And as much as we try to not really touch the public interfaces so that we don't introduce backwards-incompatible changes, it inevitably happens when you're still working on something that's relatively new. So that's a very good question, though. Hope I answered it.

Yeah, I just wanted to ask, do you have any experience with, like, caching layers, or how would you approach that in a Rust world? Is it built already into the library? Because sometimes we need to cross-reference multiple objects, which is a bit tedious because we don't want to hammer the API server for every request.

Yes, that's a very, very good question. And that's what I wanted to illustrate with kubert, the library.
So that's how we manage our caching layer in Linkerd. We basically just have a bunch of these global indices that we run for various parts of our work. So we collect a bunch of resources. Most of the indices that you see in our code are not as trivial as the one that I showed in the slides, right? It's not just a hash map. We have lots of maps, trees, vectors, things that we use to hold all of the state that we transform after reading from the Kubernetes API server. But I guess that's an opinionated way of doing it. In kube-rs, you can also do this through an object store. So every controller can also support taking an arbitrary object store that you basically provide out of band. So you can use the same store to share the same object across multiple parts of your code base. And something that I'm working on is bringing that to the controller framework, so you can actually share watch streams. So if you have three controllers, for example, that work with pods, we want to be able to let you pass in a single stream that lets all of the controllers run reconciliation loops on the same object, so you don't have to continuously poll the API server for the same resource three different times.

Great, exciting. Yeah.

Hi, so if I'm not mistaken, kube-rs supports defining finalizers for controlled resources. Does kubert support that too?

No, but kubert basically uses kube-rs as a dependency. So you can use kube-rs, and then you can probably figure out how to rip some of that out of kube-rs. kubert was built more as an opinionated way for us to run controllers and to just aggregate all of our glue code in a place that makes sense. But if you do have a feature request, I guess, always happy to hear it.

Cool, thanks.

Hi, just a question on tooling. So you mentioned Kubebuilder. One of the great things about the tooling there is you write your code and it generates a bunch of stuff on your behalf, such as RBAC and whatnot. Can you get that with kube-rs today?

I don't think for RBAC. So I'm not super knowledgeable about Kubebuilder. Unfortunately, we wrangle all of our YAML by hand. I know. But yeah, I don't think you can generate RBAC or any of that kind of stuff. You can generate schemas for your CRDs, but that's basically it. It's just CRD generation at this point.

Okay, thanks.

Any other questions? Going once, twice, three times. All right, well, thank you all for coming. Thank you.