OK. Thank you, everyone, and welcome to our talk on API server-only clusters, for fun and hopefully for some profit. So, to introduce us: my name's Matt Turner. I'm a software engineer at Tetrate, and we have my esteemed, inestimable colleague, head of platform, former Istio maintainer, software engineer, very embarrassed person, Liam White. We both work at Tetrate, which is a company that uses service mesh to do identity-based application micro-segmentation at enterprise scale, for high-assurance environments, dealing with the organisational and technical scaling problems that you'll have trying to do hybrid cloud, multi-region, and zero trust at large scale. So we know a fair amount about Istio and Kubernetes, and we've hopefully got an interesting talk for you today based on those technologies. Let's remind ourselves first how a Kubernetes cluster hangs together. Apologies to the Kubernetes experts in the room, and I see some literal upstream contributors in the audience, but I need to do this to get us all speaking the same language, and this is maybe a mental model that not everybody has. This is a diagram of a classic Kubernetes cluster, exploded, with all the components of the control plane, for folks who've not seen them before. Let's just run through an example. If I submit a Deployment, I write some YAML, because it's Kubernetes and that's what we do, and I submit that YAML file to the cluster via the API. First of all we hit the API server, which is just the thing that serves the API, and our request is authenticated: are you a user of this cluster? Can you prove it? Then it's validated in some other ways, and some authorization happens: I know who you are now; do you have permission to create new Deployments? And then, eventually, this Deployment is serialized and written to the database. At this point there's kind of a pause. There are no more synchronous actions that take place.
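That synchronous request path, authenticate, then authorize, then persist and stop, can be sketched as a toy model. Everything here is illustrative: the function names, the token scheme, and the permission strings are all made up, and bear no relation to the real API server internals.

```python
# Toy model of the API server's synchronous request path (names are
# invented, not real Kubernetes internals): authenticate the caller,
# authorize the action, then serialize the object into the store.
# Note nothing else happens synchronously after persistence.

def handle_create(request, store, users, permissions):
    user = users.get(request["token"])            # authentication: who are you?
    if user is None:
        return 401
    if "create-deployments" not in permissions.get(user, set()):
        return 403                                # authorization: may you do this?
    key = "/deployments/" + request["object"]["name"]
    store[key] = request["object"]                # persisted; then the pause
    return 201

store = {}
users = {"tok-matt": "matt"}
permissions = {"matt": {"create-deployments"}}
req = {"token": "tok-matt", "object": {"name": "web", "replicas": 2}}
print(handle_create(req, store, users, permissions))  # 201
print(store["/deployments/web"]["replicas"])          # 2
```

An unknown token would get a 401 back from the same path, and a known user without the permission a 403, before anything touches the store.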
This resource is at rest in the database. It's stored persistently, actually encoded as protobufs, and it just sits there. Now, at this point the deployment controller, which is a piece of code that has a watch on that type of resource in the database, wakes up and processes it. I'm using "controller" here as a kind of abstract term: I mean a particular loop of code that runs and processes this kind of resource. Physically, this loop of code lives in the controller manager component, alongside a lot of other loops of code. Liam's going to talk a little more later about what we actually mean by "controller". Like all controllers, the deployment controller reconciles its input. It takes its input, which is this Deployment resource, and reconciles it with the state of the world. In this case, what it's going to do is make a new ReplicaSet resource and add that back into the API server. That goes in the front door, just like your kubectl command or anything else would. So it goes through the API server, through the same set of validations and authentication and authorization that anything else does. It's the same path, so once it's traversed the API server, if it's OK, it'll again go and sit in the database. Then we have another controller, the ReplicaSet controller, physically in the controller manager but a separate loop of code, which wakes up and reconciles the ReplicaSet into a Pod, which is again fed back into the system. Now, this Pod, and that's the nodeName field I've shown there, has its nodeName field set to nil, because the ReplicaSet controller knows it wants a Pod running, but it doesn't care where. That's not actually its job. So it just says: hey, I've got no opinion on that, you go do it. So, on to the scheduler. Now, the scheduler is another controller.
It actually runs as a separate process, for reasons, and it has a watch on Pods, specifically Pods with a nil nodeName field. So the Pod sits in the database with its nil nodeName field, and the scheduler wakes up and processes it. It does some logic to work out where to place the Pod, what that nodeName should be, and its reconciliation is to put the Pod back into the system with that node chosen, with the nodeName field filled out. This isn't a new resource in this case; it's a patch, an update to an existing one, but it's still reconciliation. When this is back in the database, at rest again, yet another controller kicks in, and this one is the kubelet. The kubelet wakes up and picks this thing up, and, if you look at it that way, the kubelet is a controller too. It reconciles scheduled Pods, you know, those with a non-nil nodeName field, into containers running on the kubelet's worker node. In this case, the kubelet on this node has a watch for Pods with the nodeName field set to node1, picks the Pod up, and reconciles it into actual running containers. So, just to get that Pod running, we saw four controllers reconciling things one after another, actually feeding into each other. And there are more: the endpoints controller reconciles Services into Endpoints resources, and kube-proxy can be thought of as a controller that reconciles those Endpoints resources into iptables rules. That's another two controllers just to enable the basic "deploy a workload and make it accessible" functionality. This is how I personally like to reason about Kubernetes, and if this is new to you and you take nothing else away from this talk, it's hopefully a useful model. It really helps when things get stuck and you're trying to debug the system. Apart from the database and the API server that's brokering access to it, acting as a front end to it, I would say that everything is a controller.
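The whole cascade above can be sketched as a toy simulation. Each "controller" here is just a function over a shared store, with all names invented for illustration; the point is the data flow, not the real Kubernetes code.

```python
# Toy simulation of the four-controller cascade: Deployment ->
# ReplicaSet -> unscheduled Pod -> scheduled Pod -> running container.
# Each controller reads resources from a shared store and writes its
# reconciliation result back.

store = {"deployments": [{"name": "web", "replicas": 1}],
         "replicasets": [], "pods": [], "containers": []}

def deployment_controller():
    for d in store["deployments"]:
        if not any(rs["owner"] == d["name"] for rs in store["replicasets"]):
            store["replicasets"].append({"owner": d["name"], "replicas": d["replicas"]})

def replicaset_controller():
    for rs in store["replicasets"]:
        while sum(p["owner"] == rs["owner"] for p in store["pods"]) < rs["replicas"]:
            # nodeName deliberately unset: "I want a pod, I don't care where"
            store["pods"].append({"owner": rs["owner"], "nodeName": None})

def scheduler():
    for p in store["pods"]:
        if p["nodeName"] is None:        # watches specifically for unscheduled pods
            p["nodeName"] = "node1"      # placement logic elided; this is a patch

def kubelet(node):
    for p in store["pods"]:
        if p["nodeName"] == node:        # watches pods bound to *this* node
            store["containers"].append({"pod": p["owner"], "node": node})

for controller in (deployment_controller, replicaset_controller, scheduler):
    controller()
kubelet("node1")
print(len(store["containers"]))  # 1
```

Note that every step goes back through the same store, which is exactly the "in the front door" property of the real system.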
So, there's one controller per type of resource, and they reconcile those resources into other types of things: maybe other resources, or maybe external things, maybe containers on nodes, maybe iptables chains. This is the beauty of Kubernetes. This is what makes it asynchronous and declarative. There's actually nothing to stop you interacting with any of these resources at any level, by the way. You can deploy a pre-scheduled Pod with the nodeName field set. You can deploy your own Endpoints, and it'll just work; it's one of the things the CKA actually tests you on. But you'd normally use the higher levels of abstraction. Controllers are very powerful here, but they certainly weren't invented by Kubernetes. They're not just a Kubernetes thing. So Liam's going to talk in more general terms about controllers as we look to stretch this pattern a little. Some of you may be aware that Kubernetes borrowed the concept of controllers, or more specifically control loops, from industrial control systems. Unless you live in a cabin in the woods, you come into contact with several control loops every day. There's one controlling the temperature in this room, most likely, and one is actually keeping me alive. This is an insulin pump. For those unfamiliar with the endocrine system, which is the system this manages, our bodies have to maintain a certain range of glucose in the bloodstream. If that goes too low, the brain doesn't have enough energy or fuel to actually do anything, and you can probably guess what happens then. And if it goes too high, it basically damages your internal organs, and that's kind of what type 2 diabetes is; that's why you get some of those side effects. The side effects are things like blindness and a weakened immune system, and eventually it leads to the same outcome as going too low.
I have type 1 diabetes, which basically means my immune system attacked my pancreas, and so my body is unable to produce insulin to lower my blood sugar. Instead I have this insulin pump, which is a control loop. It has a pumping mechanism, which is the thing actually doing the work on this device, the core process: it doses the insulin. That's the system element in this diagram. It has the controller running on the pump, which is the algorithm that identifies when my blood sugar is out of range and then adjusts my insulin dose based on that. And finally I have a sensor on the back of my arm here, which communicates with the pump and acts as the sensor in this control loop. This pattern of sensor, controller, and system is so common that the different parts of this particular control loop are actually regulated based on this model. The FDA, which is the medical regulatory body in the US, explicitly requires companies to get each of these boxes approved individually. So, in my loop, the sensor is developed by a company called Dexcom, and the pump, and actually the algorithm too, are developed by a company called Tandem. But when they go through FDA regulation, they have to go through it separately for each of those different parts. And we can go even further and say that this pattern occurs in nature, right? Because all of this external stuff I have keeping me alive, everyone in this room has internally, thanks to millions of years of evolution: you've ended up with a control loop that manages this automatically for you. All of this is to make the point that these patterns are well known. They're widely used, and they're not unique to Kubernetes. And given that they occur in nature via evolution, you could maybe even argue that it's actually a universal pattern.
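A minimal closed loop in the spirit of the pump might look like this. The numbers and the proportional dosing rule are entirely invented, purely to show feedback driving a system toward a set point; they have nothing to do with real insulin dosing.

```python
# Minimal closed control loop: a sensor reads the system, the controller
# computes a correction, and the correction feeds back into the very
# state the sensor reads. All constants are illustrative only.

glucose = 240.0          # system state (mg/dL), starts too high
TARGET = 110.0

def sensor():
    return glucose                               # reads the system's state

def controller(reading):
    return max(0.0, (reading - TARGET) * 0.1)    # dose proportional to error

def system(dose):
    global glucose
    glucose -= dose * 2.0                        # insulin lowers glucose

for _ in range(30):      # the loop is closed: each output changes the next input
    system(controller(sensor()))

print(round(glucose))    # settles near the 110 target
```

The defining feature is that `system` mutates the exact state `sensor` reads, which is what makes the loop closed rather than open.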
It's not just an engineering thing. So, if we go back to our Deployment example: in our case, the sensor is the API server (you can kind of argue semantics there), the controller is obviously the deployment controller (there's a reason they're named controllers), and the system is the combination of the ReplicaSet controller, the kubelet, the nodes, basically everything that's responsible for making sure that container ends up actually getting run. And the nice thing about Kubernetes is that early on the community realized that if they could make this pattern extensible, if they could make it so that anyone can use it, then basically anyone can write code that leverages this framework. That's kind of the origin story of CRDs. One example of a project that leverages this extensibility in particular is Istio. When the idea that would eventually become Istio was being formed, there was some discussion about whether it should be part of the core Kubernetes project or a separate project. But because Kubernetes had already done the work to externalize this control loop framework, it was much easier for it to exist as a standalone project. Istio is built using the same controller pattern as my insulin pump, but Istio actually isn't a closed loop like my insulin pump is. In a closed loop, feedback is, for lack of a better term, fed back as an input into the system. Whereas in Istio, input comes solely from the API server. It watches the API server for Endpoints and Services, and obviously all the Istio CRs, so VirtualService, DestinationRule, that type of stuff, and then it configures a fleet of Envoys based on that information. So it's still that same control loop; it's just open, not closed. And the reason it's open is that those Envoys never feed back into the state of the system. They don't impact Endpoints, they don't impact Services.
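That open-loop shape can be sketched like this; the service name, addresses, and proxy names are all hypothetical, and the "push" is just a dictionary write standing in for xDS configuration.

```python
# Sketch of the open-loop shape described for Istio: the controller reads
# desired state from the API server and pushes derived config out to a
# fleet of proxies, but the proxies never write anything back.

api_server = {"services": {"reviews": ["10.0.0.1", "10.0.0.2"]}}
envoys = {"sidecar-a": {}, "sidecar-b": {}}     # the "world" being affected

def istiod_like_controller():
    for name, addrs in api_server["services"].items():
        for proxy in envoys.values():
            proxy[name] = {"endpoints": addrs}  # push-only: no feedback path

istiod_like_controller()
print(envoys["sidecar-a"]["reviews"]["endpoints"])
```

Contrast this with the closed loop earlier: here nothing ever mutates `api_server`, so the input to the next iteration is unchanged by the output.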
They respond on those addresses, but they don't actually... there's no full-circle, snake-eating-its-own-tail type thing. But Istio is just one example of a controller. There are smaller, more scoped custom controllers, like cert-manager and ExternalDNS. Or there are larger ones like Crossplane, which lets you provision all of your cloud infrastructure using that same control loop framework. Which brings us back to Matt, who is going to discuss how far we can push this pattern in Kube. Thanks. So, this was the original idea. I was using Crossplane; you know, I'm a big fan. And I was just using it to spin up a cloud cluster, quite a simple use case. But for folks who haven't used it: unlike Terraform or Pulumi, you actually need a cluster to run Crossplane in. So I had a bootstrap problem, a cluster-zero problem. Like all good engineers, I wrote a bash script, right? And this thing spun up kind, and it installed Crossplane into my kind cluster. Then it deployed the Crossplane CRs, Custom Resources, to specify the cloud cluster that I actually wanted and to bring it up. It waited for the cloud cluster to come up, and then it tore down kind with Crossplane inside it. That's quite heavy. It takes quite a long time, it uses quite a lot of RAM, and, you know, cloud clusters can take a while to come up, mentioning no provider names. Obviously, I wanted to play Stardew Valley, but it just kept paging Steam out to disk because this was using all the RAM. It was very annoying. And I thought: this probably isn't the best way to do this. I was looking at the kind cluster I was having to use for this and thinking, I don't actually need all of these components, do I? Can I just go and build myself an API server, and then, you know, run ./apiserver locally and then ./controlplane and connect it to that? Do I really need any of these other components?
Admittedly, that experiment kind of stopped there. This doesn't really work for Crossplane; you've got some fairly big issues, like the fact that Crossplane doesn't do everything in the Crossplane pod that you initially deploy. It actually spawns worker pods to do the actual reconciliations: it'll spawn one to talk to AWS, one to talk to GCP. So, in order to deploy those pods, you basically do need a whole Kubernetes system; that's sort of what it's for. But at Tetrate, you know, we do a lot of work with Istio and Envoy. So I was talking to my co-workers about this idea and we wondered if we could apply it to the Istio control plane instead. Because, importantly, Istio doesn't spawn its own child workloads; it manages Envoys. So the idea goes like this, now we've got our mental model. In this diagram I've got another pod running down here, and the only difference is that this one is an operator. You can see it talking to the API server, so this is a controller too; that's what the operator pattern is. What are the controllers in the control plane that we discussed before, in the controller manager and in the scheduler? They handle built-in resources, built-in types like Deployment and ReplicaSet. These are like first-party operators, if you like. So this thing is just a third-party operator that handles third-party resources. Anybody old enough to remember the TPR, the ThirdPartyResource type? These things are now defined by CRDs, which are just a replacement for that. So it's not magic. It just makes calls to the API server, which is always accessible to all pods via the Service called kubernetes. client-go is actually going to do most of the work, and you just write the reconciliation business logic. Now, this one reads from the API server, and it does actually feed back into it again. So, like Liam says, this is sort of a closed-loop controller. It makes more resources in the same system.
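The shape of such an operator can be sketched in miniature. In a real Go operator, client-go supplies the watch and work-queue machinery; here both are faked with plain data structures, and the resource kinds ("Widget", "Gadget") are hypothetical.

```python
# The operator pattern in miniature: watch a custom resource type, run
# your reconcile function, and (for a closed-loop controller) write the
# result back through the same API. Only `reconcile` is "your" code;
# everything else stands in for the client library's machinery.

import queue

events = queue.Queue()                   # fake watch stream
store = {"widgets": {}, "gadgets": {}}   # fake API server storage

def reconcile(widget):
    # the only part you actually write: desired state -> derived resource
    return {"name": widget["name"] + "-gadget", "owner": widget["name"]}

def operator():
    while not events.empty():
        widget = events.get()
        gadget = reconcile(widget)
        store["gadgets"][gadget["name"]] = gadget   # fed back in the front door

store["widgets"]["w1"] = {"name": "w1"}
events.put(store["widgets"]["w1"])
operator()
print(sorted(store["gadgets"]))  # ['w1-gadget']
```

Because the reconcile output lands back in the same store that other controllers watch, this is the closed-loop case; an istiod-style operator would instead push its output somewhere external.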
So, this one is maybe cert-manager or something. But this one, for example, is Istio. This is open loop. It reads from the API server and processes those Istio CRs, but its output, the world that it affects, is the sidecar proxies. Those could even be proxies on VMs, through Istio's mesh expansion feature, so they could be completely outside of the cluster. So, if this is what I'm doing, submitting CRs through the API server into the database, then reading through the API server out of the database into this operator pod and affecting Envoys, well, do I need a controller manager? I'm not reconciling resources back into the API server, I've not got that closed loop, because I'm not ultimately trying to make containers on a node. So I don't really need the controller manager, because I don't need Deployments into ReplicaSets into Pods. If I've got no Pods, I don't need to schedule them; I don't need the scheduler, because I'm not going to have any Pod resources that I want to assign to nodes. And if I've not got Pod resources being assigned to nodes, do I actually need the kubelet? That's the container controller that reconciles Pods into containers, so maybe I don't need that either. And then, if I don't have any running Pods, I'm not going to need kube-proxy either, because that reconciles into, you know, iptables rules to do the in-cluster networking VIPs. Technically, if you want to be technical, it only does the load-balancing VIPs; the flat overlay network for the whole cluster is made by the CNI components, which are a lower level and aren't shown, but they're probably not needed here either. So if we look at this picture, it now looks pretty simple, but we do still have a bunch of features here. The API server is very feature-rich. We've still got a production-grade HTTP server, right?
We've still got the authentication layer, and we've still got the authorization; this is all in the API server, by the way. We've still got built-in admission control, which you can program to guarantee certain properties of the resources you apply. We've got the hookable admission control, your mutating and validating webhooks. We've got the API server's versioning support, where it can convert versions, upgrading and downgrading resources based on rules you give it. We've got its defaulting support, so you can apply partial resources, and you can do three-way merges and server-side apply. We've got a serialization layer and then a database driver for the persistent storage in etcd. So we've got a lot of stuff. If you look at these pictures, if you think about the model, we've still got a system. So, is this sufficient to do what I was talking about? The first big question is: how do we run this operator? Because without the kubelet or kube-proxy, this blue box is just a computer. It's not a Node with a capital N, so I can't submit a Pod resource and have this thing run it. I need to find some other way to run it. I could use a kubelet with static manifests, but without kube-proxy it can't just get to the API server by asking for kubernetes; it's not going to be that convenient. We don't really have a cluster here. So are you better off just using systemd or Docker or whatever, and doing your own load balancing and IPs? As I say, this is a theoretical minimum. This isn't easy, but it's an interesting thing to reason about. There are a few other things that won't work. If you do want a mutating or validating webhook, which is something this design leans on quite a lot in theory, you would need certs for that, so that the API server can call your webhook in a secure fashion. You'd normally use cert-manager for that. But where are you going to run cert-manager, and how is cert-manager going to do its thing? And the other big issue is service accounts.
While a ServiceAccount is a resource, it's just an API server and etcd thing; the tokens for them are reconciled, made by a loop in the controller manager, and they're made available to operators by being mounted into pods. We don't have any pods, and this operator isn't really a pod, so none of that is going to happen. It's not so easy. But what you could maybe do for the token problem is turn off security: you don't need a service account token if you don't have RBAC or ABAC, if you've disabled all of the authorization layers in the API server. That might sound crazy, and if your API server is exposed to the internet, it is. But for a small, embedded system with maybe no network access at all, maybe it's not that crazy. And that, as I say, was the original idea that prompted this: bundling an operator, something like a Crossplane or an istiod, with just the API server and the database, almost acting as libraries, in a very small, self-contained unit. Maybe I run them all in one systemd slice or something, just to absolutely pare down the resource usage and the complexity of what I need to do my bootstrap. So, as I say: a theoretical minimum, hopefully an interesting thing for folks to go and think about. I think it is possible to build it; I didn't, but I'd be interested to hear about it if anyone does. But Liam is now going to talk about a slightly less extreme version that he actually did build, that does make a lot of sense, and that offers us a kind of serverless Istio. Okay. So, like Matt said, one place we can use this minimal API server control loop is what's known as the remote Istio control plane. This architecture diagram is just from the istio.io docs, but basically, in the remote architecture model, Istio can manage the networking of multiple clusters and VMs, either on premise or in the cloud, it doesn't matter, from a central location.
Now, to run the control plane like this requires an API server, to hold the desired state, and something to run the istiod controller process itself. Thanks to cloud providers, getting an API server nowadays is a relatively painless experience, although some are less painless than others. In our prototype we used GKE, but any Kubernetes works; I mostly used it because a GKE cluster comes up pretty quickly, so the feedback loop is much better. Next we need to run istiod. We don't want to deal with provisioning nodes, so let's go with serverless workloads, where all we have to do is ask for pods. On GCP this is Autopilot, on AWS it's Fargate, on Azure it's Virtual Nodes, which uses the Virtual Kubelet project under the covers, I believe. Whichever cloud provider you use here, the key is that we've gone up a level in the abstraction and reduced our maintenance burden, because we don't actually have to manage nodes in this scenario. We have to manage container image vulnerabilities, sure, but we don't have to upgrade nodes; we don't have to worry about that side of things. By the way, there's actually nothing that requires us to run istiod inside Kubernetes in this scenario. We could run it on a VM somewhere and have it read remotely from our API server using that same service account token. Because, remember, our API server is just operating as a database and a control loop framework; it just happens to have a side gig as a container orchestrator. So the next problem we have to solve is: how do our workloads reach istiod to receive their networking configuration? We need to run a gateway pod, and because it's a gateway, not a sidecar, we don't need the iptables redirect that lets a sidecar transparently connect to istiod.
So it doesn't matter if a workload is on a different VM somewhere else or in another Kube cluster; all that matters is that you have that L3 reachability, and then istiod is responsible for managing the security of the connection once you have that reachability. And since we don't need to do the iptables redirect, we can still do this on serverless; we don't have to worry about nodes. What we've effectively built in this scenario is a very rudimentary Istio-as-a-service offering, right? Yes, we have to make sure Istio is deployed, that it's scaled correctly, that upgrades are handled correctly, and high availability, but we've reduced our maintenance burden to only include things at the abstraction level we actually care about in this scenario. This was definitely a win, as anyone who's had to upgrade nodes recently will attest. This remote architecture in Istio is actually being used by several users in production, so I highly recommend you go to the link on the left: Airbnb did a talk about the remote istiod architecture. They use this architecture to bridge VMs and Kube. It's a different kind of environment, but again, it's semantics; where it's run doesn't really matter to Istio, although I don't think they're using the serverless Kube compute stuff. In the talk, they discussed how they migrated from their existing ZooKeeper-based service discovery system to an Istio-as-a-service that their platform team runs in their own central cluster. And on the right, you'll find the codebase of the very hacky proof of concept I put together that demonstrates this serverless Istio-as-a-service on GKE. It's built using Pulumi, so it's TypeScript, but anyone who's familiar with Terraform will see what we're doing there. This pattern also applies to something like Crossplane; it doesn't have to be Istio.
As in Matt's example, a platform team can basically build an entire platform-as-a-service using this same pattern. There's nothing within Crossplane's architecture, that I know of, that needs it to actually have a pod on a physical node; you could run this serverless as well. And GitOps leverages the same model with tools like Argo and Flux, right? It's that same control loop pattern, and both of those tools can again be used to build platform-as-a-service things, but scoped to deploying and upgrading workloads in Kubernetes, instead of something like Crossplane that does your entire cloud infrastructure. I'll hand you back over to Matt. Thanks. So, yeah, we recapped how the control plane works, or maybe we learned it for the first time. We then introduced the idea of using the minimum possible parts of it, which is, as I say, maybe a bit of a thought experiment at the moment. I personally haven't built it, but I think it's possible; it just didn't fit my use case at the time. But Liam showed something slightly less extreme that did come off the back of that, that we have built, and that does work for us. You know, when you're using that managed cloud provider control plane, you will have a scheduler and a controller manager sitting there. Their CPUs are basically idle, because we're not calling on them, but they do exist. But like Liam says, we don't have a maintenance burden beyond the level of abstraction that we actually care about. So hopefully with this talk we've taught you something and inspired you to take away that pattern, and maybe even build the totally extreme version; if you do, let us know. So, yeah, thank you all for coming. We'll be happy to take questions.