All right. Hello, everyone. Welcome. I'm John Howard. I'm a software engineer at Google, and I'm really excited to be here talking with you about some ideas for how we can build Kubernetes controllers with better patterns. I'm especially excited to be here on a Tuesday talking about something other than service mesh, which always seems to land on a Friday night.

Just a quick overview of what we're going to be talking about. First, I'm going to go over what controllers are. How many people here know what a Kubernetes controller is? All right, quite a few; that's kind of what I expected. How many of you are involved in the development of controllers in some way — you write the code, or even just read the code? Wow, that's actually more than I expected, so that's good. I'm glad everyone knows what controllers are. You might have some different ideas of what a controller is than I do, so I'm still going to go over it. Next, I'm going to try to convince you that writing controllers is pretty hard to do correctly. You might already hold this opinion, in which case it won't be a hard argument, but if not, we're going to go over it. Specifically, I'm going to talk about a lot of the challenges we've faced in Istio. I've been working on Istio for about five years now, mostly on the control plane, which is basically a big controller, and we've had a lot of challenges that are somewhat unique to Istio compared to other controller projects, so we'll take a deep dive into that. Then we'll talk about some ways we can solve these problems and introduce a different approach to writing controllers.

So, what are controllers? Kubernetes has this big blurb about it. It's very useful, but quite long. I like to think of a controller as something that takes in some input and produces some output. It's a very broad, generic thing. For example, in Kubernetes, a user creates a deployment; there is a controller that reads the deployment and outputs a replica set. That is a controller: some input and some output. And Kubernetes is not just one controller. It consists of a ton of different controllers that all operate together to give us the Kubernetes experience that we know and maybe love. There's another controller that takes a replica set and outputs pods. There's another one that takes pods and services and outputs endpoints. And Kubernetes is extremely extensible, as I'm sure you've noticed: you can have custom, third-party controllers that maybe operate on third-party custom resource definitions. So maybe I have a certificate issuer and a pod, and a controller creates a certificate for me. That's the general idea of what a controller is. It's really what drives this whole declarative API that Kubernetes provides us.

The problem, though, is that it's really hard to write these controllers. More importantly, it's hard to write them correctly, so that there are no bugs for your users to find when they do some obscure thing you didn't think of. I find that writing controllers is very low-level, and that's really the root cause of this. If we look at the primitives Kubernetes gives us for writing controllers — you could operate a bit lower than this, but generally we're looking at an informer, which essentially allows you to subscribe to resources. You can say: when pods are added, updated, or deleted, call my function, and I'll kind of figure it out.
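To make that primitive concrete, here is a minimal sketch of subscribing to pod events with client-go's shared informers. This is an illustrative example rather than code from the talk or from Istio, and error handling is deliberately elided.

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (error handling elided for brevity).
	config, _ := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	client := kubernetes.NewForConfigOrDie(config)

	// A shared informer factory gives a cached, watch-based view of resources.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	// "When pods are added, updated, or deleted, call my function."
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			fmt.Println("pod added:", obj.(*corev1.Pod).Name)
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			fmt.Println("pod updated:", newObj.(*corev1.Pod).Name)
		},
		DeleteFunc: func(obj interface{}) {
			// Deletes may arrive as tombstones; real controllers must handle that too.
			fmt.Println("pod deleted")
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	// Wait for the initial cache sync before trusting the data.
	cache.WaitForCacheSync(stop, podInformer.HasSynced)
	select {}
}
```

Everything beyond this — queues, resyncs, ordering, readiness — is left to the controller author, which is where the problems below come from.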
This probably sounds simple, but it ends up being very, very complicated in practice to handle all the cases that are required. If you look at a real-world controller that is fairly robust, tested, and handles all these edge cases, even for the simplest business logic it's about 400 lines of code. I can't really show a whole 400 lines of code, so I showed an extremely simplified view of what this looks like, but the simplified view doesn't really showcase how complex it is. The tricky thing, too, is that it's not just that there's so much code — that's annoying, but it's not the end of the world. It's that you have to be so precise in following it in exactly the right order, in exactly the right way. I've written, I don't know, maybe 50 to 100 controllers of various types over the years, and every time I don't have another one that's written properly sitting side by side, where I'm copy-pasting the exact order, I'm introducing some type of bug, and it's usually a slightly different one than last time.

You can find these types of bugs all over the ecosystem. Here's one: we didn't properly use a queue for some instance handler, we handled events out of order, and this could easily cause an outage for a user. Here's another one: for some edge case, I forget why, we missed events, and now your data is incorrect. That could cause really bad things for users. Kubernetes core is not immune to this either. It's not that developers are bad at writing Kubernetes controllers because they're inexperienced — the people who made Kubernetes are also introducing bugs in Kubernetes controllers. It's not a developer problem; it's a framework and ecosystem problem. Here was an example where we were starting up too soon: we were saying that we were ready and had synchronized everything, but we weren't, so we could be serving stale data, which could lead to unexpected results. And here's another example of the same kind of case, in the scheduler.

Now, specifically about Istio. You don't need to know what Istio really is, but if you're not familiar, it is a service mesh. All you need to know for this talk is that we put an Envoy proxy on each pod, and that proxy needs to be dynamically configured, so we have a control plane that sends the proxy configuration to it dynamically. That's all you need to know about service mesh.

So in Istio — the service mesh I work on — what this looks like is basically one giant controller. I showed the other examples: ah, one input, one output, quite simple, right? In Istio, we read 45 different resource types as inputs. And instead of outputting Kubernetes resources, we're outputting to Envoy. Envoy has a slightly different API, called xDS; the details don't really matter. The key point is that it's not just one Kubernetes resource in and one Kubernetes resource out. We have tons of Kubernetes inputs, and we have an output that's not even in Kubernetes. What's tricky about this is that the scope is huge: we have all this information to manage, and the size of this state is also huge. The green stuff at the end, the xDS, if you sum it up across a large-scale cluster, can be terabytes of data. We can't store this in the cluster the way the standard Kubernetes controllers we showed earlier do, going from, you know, deployment to pod. We can't even keep ten terabytes of data in memory.
We need to be computing this on demand, and so it's a very different problem than the one a standard controller is operating on. This problem isn't unique to Istio either; there are a lot of other controllers that are not just Kubernetes in, Kubernetes out that have these problems. In core Kubernetes, one example would be kube-proxy: it's reading Kubernetes configuration, but its output is some iptables rules in the kernel.

Of course, no one can write a function that takes in 45 inputs, mashes them all together, and produces ten different outputs. That would be quite crazy. So we do split things up into many different smaller controllers, just like Kubernetes does, but we have to do it all internal to Istio, in memory. What we have is an architecture that looks somewhat like this: a bunch of smaller controllers, where each controller does a somewhat small amount of work, builds up some intermediate state, and then some other controller builds up intermediate state from those, and so on and so on, until finally we can produce our outputs. Now, the challenge is that in the standard Kubernetes model, Kubernetes is the intermediate state for these, and it's giving us all of this for free. I say free because we don't have to go implement it in our code, but it's absolutely not free: the cost of reading and writing from etcd is quite large, especially at scale. In Istio we simply can't do that — it's too much data — so we do it in our own application.

What this looks like is: we have some controller with some inputs and some outputs. We get an event from an input, say a pod is changing. Generally the event just says there's an update; it doesn't say what the actual information is, so we'll go read that back. We'll do a bunch of business logic to recompute some internal state, notify the dependencies that something changed, and they in turn will most likely go read from our internal state. We build this over and over again, all over the project, and we have to maintain this state management perfectly. The thing that's tricky about this is that it's very imperative. I got an update; now I have to go find what changed and go update it in all these different maps or indexes or whatnot. It's not like the experience you have with Kubernetes, where everything's reconciled for you and it's simple. There are a lot of bugs introduced in this process.

Another thing that's super important in Istio, and I think in most controllers that run at scale, is event detection. A very naive controller would say: anytime any input changes, I'm going to recompute the entire world and persist that to wherever my outputs are. That would work for a demo, but it's not really scalable. We want that, whenever inputs change, we output the minimal amount of information, so we do the least amount of work. In Istio, this is absolutely critical. The output is extremely expensive from a compute and network bandwidth perspective, so anywhere we can stop an event from trickling down the chain is a huge win for our users. The challenge here, again, is that because we're re-implementing this internally without Kubernetes as an intermediary, we have to do it all ourselves. Kubernetes, to some extent, has this with server-side apply.
Server-side apply works in a lot of cases, but it can still be deceptively hard to use correctly, and you're still sending these write requests to Kubernetes every time you want to make a change. Maybe on the other side nothing actually changed, so it does nothing with the request, but you still made that API call: you still sent the full object to the API server, and it had to process it. So it can still be quite hard. And of course, in Istio we have to implement this all ourselves. A lot of the Istio codebase is just these random optimizations for some specific thing. One example: we have some annotation that I think the autoscaler or something is adding to services. It was adding it a lot, so it was pushing a lot of information down the entire chain. We add an optimization that says: ignore this annotation. That's really dangerous; it requires knowledge of the full system. Did I skip the annotation way up the stack, but later on I needed to read it, and now I'm serving stale data? Those types of issues are an enormous source of bugs, but the opposite is also just as bad: if we don't skip those events, Istio is unusable. Even a single optimization like that, for users at large scale, can be the difference between Istio using half a CPU or 100 CPUs, or just not functioning at all. Getting this right is absolutely critical.

I want to briefly go over some actual examples of what this looks like in practice. I won't go over every line, just broad strokes. This is an example of one of the smaller controllers in Istio. You can see we basically just have a bunch of different maps of things. Every time we get some event — some pod changed, some service changed, some node, whatever — we have to go recompute all these different maps. I need to find this pod that changed, figure out what all the addresses were and what index they were in, and go update all these things. Was it a delete? Was it an add? I'm doing all this manual work. It can be even worse, too, because a lot of the objects are related: maybe a service changed, and I need to go find all the pods in that service and then go recompute all of those pods. These are dramatically simplified examples, because I can only put so much code on slides. In reality, like I said, we have 45 object types and lots of relationships between these objects, and it gets simply unsustainable to manage these relationships independently.

So, quick overview: Istio is one giant controller, and it's not sustainable to implement one giant controller this way without proper building blocks. If we take a step back, how can we improve this? I think some of the goals we should be targeting are that a controller is easy to write correctly and efficiently. We want someone to go in and write the obvious thing, and it should be both bug-free and efficient. That's not the case today, I think. You can write a correct controller, and you can do it in an efficient manner, but it's extremely hard to do so. Part of this is being high-level: we want controller authors to be writing business logic. They shouldn't be concerned about all this low-level state management — was it a delete, was it an update, do I need to update this index or that index? And controllers should be composable, so that we can take all these building blocks and put them together to build a full system. Istio is not going to be one giant function. It's going to continue to be many different components that together build up Istio.
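To make the manual state management described above concrete, here is a rough, hypothetical sketch of the pattern — not Istio's actual controller code — showing a controller keeping a few hand-maintained maps consistent on every pod event.

```go
package example

import (
	"sync"

	corev1 "k8s.io/api/core/v1"
)

// podIndex is the kind of hand-maintained state a low-level controller ends up with:
// several maps that must all be kept consistent on every add, update, and delete.
type podIndex struct {
	mu        sync.Mutex
	byName    map[string]*corev1.Pod // "namespace/name" -> pod
	byIP      map[string]string      // pod IP -> "namespace/name"
	byService map[string][]string    // service key -> pod keys (recomputed on service changes too)
}

func newPodIndex() *podIndex {
	return &podIndex{
		byName:    map[string]*corev1.Pod{},
		byIP:      map[string]string{},
		byService: map[string][]string{},
	}
}

// onPodEvent is called for every add/update/delete. The controller author has to
// reason about ordering, deletions, and IP changes manually; forgetting any one
// of these branches is exactly the class of bug described in the talk.
func (i *podIndex) onPodEvent(old, cur *corev1.Pod) {
	i.mu.Lock()
	defer i.mu.Unlock()

	if cur == nil { // delete
		key := old.Namespace + "/" + old.Name
		delete(i.byName, key)
		delete(i.byIP, old.Status.PodIP)
		return
	}

	key := cur.Namespace + "/" + cur.Name
	// If the pod's IP changed, the stale index entry must be removed first.
	if old != nil && old.Status.PodIP != cur.Status.PodIP {
		delete(i.byIP, old.Status.PodIP)
	}
	i.byName[key] = cur
	if cur.Status.PodIP != "" {
		i.byIP[cur.Status.PodIP] = key
	}
	// ...and the same dance repeats for byService and every other derived map.
}
```

Every derived map multiplies the number of branches the author has to get exactly right, which is the precision problem described earlier.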
Now, after writing all this up, I realized one simple meme could probably explain it better. Kubernetes gives users a declarative API. You may not say it's easy or great to use, but I think it's at least better than the alternative of manually saying: I want a cluster IP, I want a DNS name. You declare your intent: I want a service. But this is all possible because, under the hood, we have these imperative controllers doing all the dirty work for us. If you're just a Kubernetes user and the controllers you rely on are extremely well maintained and bug-free, maybe that's fine. But that's simply not the case; there's no real large-scale controller that is bug-free, I would imagine. So we need to push this further down. We need to make it so that controllers are in this declarative model as well, and push the imperativeness, which is probably required somewhere, down to an even lower level, so that fewer people need to worry about it. Probably half the people here raised their hands saying they were involved in writing controllers. You all shouldn't need to be writing all this imperative code; a small library or something could manage that for you.

I've been thinking about this problem a lot over the past few years of working on Istio, and I've done some prototypes and investigated how we can solve it. I think, ultimately, it is possible. Consider a simple interface like this: we have a collection of resources. You can get them, you can access them, and you can watch them. It sounds quite trivial — it's almost identical to an informer — but we can build an ecosystem around this that makes writing controllers much more efficient.

So what does this look like? First, we need a bunch of ways to get data into this interface. The obvious one is informers; we already have informers, and the API is very similar. In Istio, we also have a mode where we read from files and don't read from Kubernetes at all — for whatever reason you want to read from files, maybe testing. You could read from in-memory objects; this could be useful for tests or other things. Maybe you read from some external state — maybe you want to store objects in a SQL database or S3 or something. That's totally fine; the interface is agnostic to where we're getting the data from. In Kubernetes today, that's not really the case: anything that's not reading from Kubernetes doesn't have an informer, and you have to come up with your own abstractions, your own implementation for how to manage those inputs.

Now, the most important part is what we do with those inputs. We don't just take the inputs, read them, and call it a day; we have to actually produce some output with them. So we want to have transformations, like indexes: can we look up things efficiently by some value? We want to transform them — we talked about deployments turning into replica sets, and we need some way to model that transformation. And they need to allow complex compositions. The service case I gave was fairly simple: we read a service and pods, we produce endpoints. In Istio, we're doing much more complex compositions, reading five or ten different resources and merging them all together into one state, and we need to be able to model that efficiently. Then finally, what do we do with these once we have them? We need some outputs. Writing back to Kubernetes is probably the most common case.
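As a rough sketch of what such a collection interface might look like — hypothetical Go, loosely modeled on the idea described here, not the actual API of any existing library — it could be as small as this:

```go
package collections

// Event describes a change to an object in a collection.
type Event[T any] struct {
	Old *T // nil on add
	New *T // nil on delete
}

// Collection is the core abstraction: a set of objects you can get,
// list, and subscribe to. Informers, files, in-memory fixtures, or
// external stores can all implement it.
type Collection[T any] interface {
	// GetKey returns the object with the given key, if present.
	GetKey(key string) *T
	// List returns the current contents of the collection.
	List() []T
	// Register subscribes to changes; the handler is called for every
	// add, update, and delete, including the initial state.
	Register(handler func(Event[T]))
}
```

The important part isn't the three methods themselves but that informers, files, in-memory fixtures, and external stores can all sit behind the same interface, so everything built on top is agnostic to the source.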
In Istio, sending over xDS is probably just as common. Or maybe you want to write to some cloud API, like a load balancer controller would. So we have all these different inputs, outputs, and transformation steps that are the building blocks for building controllers.

What this could look like in practice — well, let me just go through an example. Say we want to build a config object in memory that manages various config state for our service. In this example, I'm going to read from two config maps, merge them together, and I want that to be dynamically updated. First, we build a collection from a config map informer. Now what we want is to turn these config maps into this config object, so we're going to build a new collection. Let's go through some of this. First we fetch some config maps — to make this a bit more complex, I'm saying we have two config maps that we're going to merge together — and then eventually we merge them. This code is very simple. It's not dealing with any state in Kubernetes at all. What's happening under the hood is, like I said, we have the Kubernetes informer, and we are building a function: we just tell the library how to build a config. We don't tell it to go fetch the data — well, we do tell it to fetch, but it's a declarative intent. We're not saying "I'm reading this object right now"; we're saying "when you want to build a config, you should do it by fetching from these config maps." What that allows us to do is push all the state management into the library. The library can detect when the config maps — the inputs — change and knows to recompute the output, the config object. And it knows that anyone who later subscribed to config updates should also be notified if it changed. Say a config map just added some random annotation or something we don't care about, and the config output doesn't change; at that point the library can automatically know not to notify the things that depend on config.

So, yeah, I actually just covered this, but we can do the event detection I talked about automatically at the framework level, so that each controller isn't needing to do it in its own controller code. Going back to this diagram: if a deployment we're reading from didn't change, we could short-circuit right away, and all of this is done out of the box in the framework for you.

Some more examples of things we could do with a framework like this: something like an index. It's pretty common that you want to look up things efficiently — say, find a pod object by its IP address. You can actually do this with Kubernetes informers too, so this one's perhaps less interesting, but it shows the flexibility we get once we have a common interface for things. Even though Kubernetes informers have indexes, you can only have informers on Kubernetes objects, so it's not universal. Here's a simple example: I'm saying, for pods, index on a key that's based on the pod IP, and then later we can look up a pod by that IP.

Now, most things want to write to Kubernetes eventually. That's the end goal: we're reading some inputs and then writing back to Kubernetes. That can also be modeled by a framework like this. We may want to build up a collection of some desired state.
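Here is a rough sketch of what the config-map merge example might look like on top of the hypothetical Collection interface from the previous sketch. The NewCollection and Fetch helpers, the Context type, and the config map names are all placeholders invented for illustration, not a real library's API:

```go
package collections

import corev1 "k8s.io/api/core/v1"

// Context carries per-rebuild dependency-tracking state (placeholder for the sketch).
type Context struct{}

// NewCollection and Fetch are placeholder signatures standing in for the kind of
// framework described in the talk; they are not a real library's API.
func NewCollection[T any](build func(ctx Context) *T) Collection[T] { panic("sketch only") }
func Fetch[T any](ctx Context, c Collection[T], key string) *T      { return c.GetKey(key) }

// MeshConfig is the derived, in-memory config object we want to keep up to date.
type MeshConfig struct {
	Settings map[string]string
}

// BuildMeshConfig declares how to derive a MeshConfig from two ConfigMaps. The
// framework is expected to call this whenever either input changes, and to skip
// notifying dependents when the computed result is unchanged.
func BuildMeshConfig(configMaps Collection[corev1.ConfigMap]) Collection[MeshConfig] {
	return NewCollection(func(ctx Context) *MeshConfig {
		// Declarative "fetches": the framework records these as dependencies.
		base := Fetch(ctx, configMaps, "istio-system/mesh-defaults")  // hypothetical name
		user := Fetch(ctx, configMaps, "istio-system/mesh-overrides") // hypothetical name

		merged := map[string]string{}
		if base != nil {
			for k, v := range base.Data {
				merged[k] = v
			}
		}
		if user != nil {
			for k, v := range user.Data { // overrides win on conflict
				merged[k] = v
			}
		}
		return &MeshConfig{Settings: merged}
	})
}
```

The point is that the function only expresses business logic — merge two config maps — while dependency tracking, recomputation, and change detection are the framework's job.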
I omitted the details of how we're doing that, but you can imagine you've defined a collection of "here are the pods that I want," and then we can actually watch the pods that exist in the cluster and simply tell the framework to reconcile them — make them the same. It knows how to delete a Kubernetes object, it knows what it wants and what it has, and it can do that diff for you. It's very similar to using Kubernetes as an end user, but as a controller writer.

Now, having this higher-level library also unlocks a lot of potential deeper integrations and tooling for developing controllers. This is an actual diagram that was made by code, not by me — auto-generated from a collection that I wrote. So we have this automatic architecture diagram derived straight from the code, which can easily give an overview of how the system works and where the dependencies are. I didn't go this far, but one could even imagine a cool animated diagram as a change flows through the system. Or, perhaps more useful and realistic, we can add tracing. This is probably way too small to see, but I'm sure you've all seen tracing spans. This shows a single event propagating through a big tree of collections that are building on top of each other. We can see that this event impacted this other controller, and that one, and in turn that one, and how long it took at each spot. You get strong visibility into what's going on in the system, and even into its performance. I don't think that for most Kubernetes controllers today there's anything out there that gives you this amount of visibility into what a controller is doing.

Additionally, testing is one of the biggest parts of writing a controller today, because you have to deal with all this weird state. In Istio, most of our controller tests are not actually about the business logic at all. They're more about: if I took this object and updated it, is the result the same as if I had just created it to begin with? It's all state management; it's not actually the business logic. But because we put all of our business logic in these pure functions that are just some input to some output, you can just go test that. You don't need to test all the state. There are also more potential future possibilities once we have this kind of collection abstraction. Perhaps we could add some sort of automated fuzz testing, because we have this higher-level thing. When you're working with lower-level primitives like informers, it's harder to build these nice integrations around testing, visibility, and whatnot, because it's less opinionated.

So where is this now, and where is it going? Everything I talked about does exist in some form — I would say it's basically demoware at this point. It's not just an idea in my head; I've actually gone and implemented prototypes and whatnot. In fact, probably about 50% of Istio, over time as I was working on this, integrating it and thinking up ideas, I've replaced with this model, and it seems to work fairly well. That being said, deploying something like this to production is a far different matter than doing so in my own branch. So where is it going? It could be a new library. It could be something that we start seeing others adopt. It could be something that quietly disappears, or quietly merges into Istio. It really depends.
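To illustrate the testing point: when the business logic is a pure input-to-output function, the test needs no fake informers, no event ordering, and no cache syncing. A minimal sketch, reusing the hypothetical MeshConfig type from the earlier sketch (the mergeConfig helper here is likewise invented for illustration):

```go
package collections

import (
	"testing"

	corev1 "k8s.io/api/core/v1"
)

// mergeConfig is the pure business logic: two ConfigMaps in, one MeshConfig out.
// (Hypothetical helper; in the earlier sketch this was the body of BuildMeshConfig.)
func mergeConfig(base, user *corev1.ConfigMap) MeshConfig {
	merged := map[string]string{}
	for _, cm := range []*corev1.ConfigMap{base, user} {
		if cm == nil {
			continue
		}
		for k, v := range cm.Data {
			merged[k] = v
		}
	}
	return MeshConfig{Settings: merged}
}

// The test exercises only inputs and expected output: no state management at all.
func TestMergeConfig(t *testing.T) {
	base := &corev1.ConfigMap{Data: map[string]string{"mode": "default", "trace": "off"}}
	user := &corev1.ConfigMap{Data: map[string]string{"trace": "on"}}

	got := mergeConfig(base, user)

	if got.Settings["mode"] != "default" || got.Settings["trace"] != "on" {
		t.Fatalf("unexpected merge result: %+v", got.Settings)
	}
}
```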
I heard from a lot of people that there was interest in this, so if there's more interest, I'd be happy to discuss it and bring it to a broader audience. If not, I think the ideas are useful, even if the library isn't. For me personally — I think a lot of you may be looking at this and thinking, wow, this is way too much magic. I feel the same way, which is why I haven't been pushing to actually merge this into Istio for production yet. I think dialing back a little bit of the magic would go a long way toward making this something that's reasonable to use in production while still giving a lot of the benefits.

So, yeah, that's it. I think we have some time for questions. I'll be around afterwards as well, but I do have to run to a panel right after this, so if you have more questions, you can find me just downstairs at Service Mesh Battle Scars. Thank you, everyone. And yeah, come up to the sides if you have questions — there are microphones.

Hello. I'm Xu Dong from the University of Illinois. Great talk, and I especially liked the part about testing. I'm asking because we are actually building some automated testing tools for controllers, especially fuzz testing. Fuzz testing is a very broad topic and there are many ways you can test your controllers, so I want to know what kind of specific testing scenarios or testing inputs you would want to try on, for example, Istio or some other controllers — what you think would be super useful.

Yeah, in the current state of Istio, one thing I've been thinking a lot about is actually that as well: fuzzing the controllers. Like I said, most of the Istio code and complexity and bugs come from all this manual state management. So testing things like: if I add and remove something multiple times, is it the same as if I never did anything? Or just updating things — things like that seem like they could be covered very well. In some ways, moving to a framework like this almost takes those test cases away from the application or controller developer; it would of course push them into the controller library, where they would still be useful. But I still think there are possibilities: if you have invariants, even if it's just a function from pods to some other set of objects, or the other way around, there may be invariants you want to test, so it still makes sense there, I think. But in many ways, I think a lot of the bugs that you would want to test for today are around all this manual state management.

Great, thanks. Maybe we can talk more after. Yeah, sounds good.

Hello, thank you for the talk. You talked about there being something like 45 different controllers talking to the central controller. Yep. There has been a lot of work on network-layer verification. It seems like that technique could be useful to catch undesirable behavior when there are multiple controllers interacting with each other for service-layer networking. Do you think it would be useful? If it is, do you have any concrete example?

It sounds like it might be useful, but I don't know enough about what you're describing to give a useful answer, I think.

Like, for example, this controller wants to do A and another controller wants to do B, but A and B conflict with each other in some way.
If we can verify this conflicting behavior with some model or program, would that be useful?

Maybe. I'm not sure — in Istio it's more that all the controllers funnel down to one output, really. It's not like we have multiple things writing to the same thing and those kinds of conflicts, so that's not really the problem space that I've spent a lot of time in. It sounds like it could be potentially useful in other areas, but I haven't thought too much about those spaces. Thank you.

I have a question over here about what you just said, about everything going to one single source, for logging. How do you track that? Because with a lot of small controllers it can get really messy to keep the data separated. Do you have any advice on that?

That's something we've struggled with a bit. For logging — is that the main concern? For each of our controllers, in our logging library we have a scope label, so each thing has a scope, and we're trying to move towards more consistent structured logging, like Kubernetes has upstream, so that you have a consistent way to say "X object updated" in a structured manner that you can query over. But we're not actually super mature in that area either; that's something we're working to improve as well.

Is each controller its own thing, or is it a lot of controllers within one operator?

We have one binary, but it has a bunch of subcontrollers and so on, just organizationally, and to make sure that we don't have one giant super-function. No problem.

Thank you very much for this; this was really informative. I was wondering if you think this would be useful to upstream to something like the controller-runtime library, which I feel has sort of a similar goal of simplifying the controller loop?

It's possible. I didn't really want to go out and push this and say, ah, you all should take this. If people are interested, it sounds like a possible path forward, but I haven't really talked much with them directly yet. Definitely interested if others are.

Yeah, I definitely would be interested in something like this in that library. Thank you.

How do you go about tracing controllers and linked resources?

Yeah — I may have been slightly misleading on the tracing slide, overselling it a bit. The thing that enables it to be done in this manner is that our controller is one binary; that's how we propagate the context. If you're actually writing to Kubernetes and then a different process is reading it, I haven't seen any solution for propagating context through that. In Istio it's all one process, so we propagate it in memory, and that's a bit easier.

Okay, that makes sense. So, speaking of interest — it's entirely possible I missed a link — you mentioned having a branch where you're experimenting with this in the Istio codebase. Is there somewhere we can go to get the code and play around with it?

That would have been a very good thing to add a link for here. Yes, I'll try to update it on Sched, I guess, if I can add my slides and update them; if not, you can ask me on Slack and I'll post something. It's just in my fork of Istio, really, but under probably some obscure branch name.

Okay, thank you. Yeah.

Yeah, so I have a question on testing.
We found envtest to be pretty subpar for what we wanted, so we went with more end-to-end, kind-cluster-style testing. But as I was looking at your subcontrollers with the data flow: do you find that writing tests for the subcontrollers, as a more unit-based approach, negates the need for a complete integration test or end-to-end test?

I think the generic answer is that both are useful. We definitely don't exclusively use end-to-end tests in Istio, because there's so much surface area to cover, and end-to-end tests of course are quite expensive, so we tackle it at both layers. In Istio we use the fake informer library quite a bit; I think others tend to prefer things like the envtest stuff. We never found much need for or success with that, and we were perfectly fine with the fake client — but we're also using a lot less functionality, like server-side apply or other things that aren't in the fake library. With the fake library it's been fairly seamless to write a unit test. Now, the problem we've had is that those unit tests are not testing the useful thing we actually want: they're not testing the business logic, they're testing all these informer issues.

Thank you.

All right, looks like that's everyone. Thanks, everyone.