Next up, we have Josh Wood, who started with rkt containers as soon as they were a thing, which led him to a job at CoreOS, where he manages documentation; he also likes taking pictures of cats with extra toes. So, he's going to tell us about operators. Welcome, Josh Wood.

Bonjour, FOSDEM, how's it going? So, we all love Kubernetes, right? Obviously it's a hot topic in this room and across the conference this year; it was last year, too, and probably will be next year, if we're all lucky. Today we're going to talk about what's really cool about Kubernetes, what makes us love it so much. Then, in what I hope is a direct response to a question asked a talk or two back, we're going to talk about some of the places where that model begins to fall down, where the things we love don't carry through to managing complex applications on top of this cluster orchestration system, and about the steps we're taking at CoreOS to improve and resolve that, so that Kubernetes can be a tool not only for deploying, scaling, and monitoring the health and availability of simple, stateless applications, but also for deploying and scaling complex, stateful applications in terms of the Kubernetes API. To that end, we're going to talk about a concept, and a couple of pieces of software, that we call operators.

Now, scaling stateless apps on Kubernetes is really pretty easy. If anything made me really excited about Kubernetes when I first saw demonstrations of it and started playing with it myself, it was how easy it is to take simple applications, web servers, anything that doesn't carry a lot of state around with it and doesn't need a lot of storage, and scale them up and down. We have the idea of a ReplicaSet. We even have a scale command: we can point it at an object in the Kubernetes API by name and tell it how many copies we want of that service, that pod, whatever that thing is. At its heart, in the control plane, Kubernetes implements a reconciliation loop that asks: what is our desired state? Does that state exist in the cluster? It then takes actions to make the desired state we've requested, through kubectl or some other control mechanism, the actual state in the cluster. In really simple terms, that looks like this: we had one copy running, we wanted three, and in our most recent reconciliation loop we made sure there were three. In fact, it's so easy to interact with the API in this way that we can write really nifty control systems to give us an interface for adding a few extra replicas of any given object in the Kubernetes cluster.

But what about apps that actually do things? What about apps that aren't just static websites or NGINX demos or all the other really cool things we show when we're trying to get you excited about Kubernetes? What about Postgres databases? What about distributed data stores that require persistent storage, and that require explicit steps for deployment, upgrade, and lifecycle management? It's very, very easy to stick complex applications in containers and instantiate them with the classic ideas from the Kubernetes API. You can run a database; it's not hard to do. But managing a database through the Kubernetes API is considerably harder, particularly if you lose your internet connection. I actually need my slides for this talk. Here we go.
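To make the scale interaction described above concrete, here is a minimal sketch; the names and image are placeholders rather than anything from the talk's demo, and the manifest uses the present-day apps/v1 shape for legibility:

```yaml
# A minimal stateless Deployment; name and image are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 1              # desired state: one copy
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web
        image: nginx:1.11
# Scaling is one command against the API object, by name:
#   kubectl scale deployment web-demo --replicas=3
# The control plane's reconciliation loop then makes the actual
# state in the cluster match the requested three replicas.
```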
For some of the reasons that finally popped up on the screen to remind me what they are: it is a little more difficult to manage apps with persistent storage, or with any kind of real state, because Kubernetes would like to treat every app as a discrete entity with no dependencies on other applications, on other microservices. When we ask it to scale with the simple scale command we showed earlier, all we get is a fresh new copy of whatever application we were running. We just get another pod running somewhere else in the cluster. We don't know that the pod was properly prepared with the persistent volumes where data is actually written and retrieved. We don't know whether the pod matches the version specification in the midst of rolling upgrades and application scaling. We don't know how to back up the data in the existing pods before adding pods to the collection. These are all things human administrators are still forced to handle, even if they run their databases on top of Kubernetes. If only Kubernetes knew about the characteristics of complex applications.

Fortunately, there are mechanisms for extending Kubernetes designed around providing exactly this kind of automation, and operators are the steps we have taken to try to make that work. The idea is to build on our current concept of a Kubernetes object, where everything has a manifest. The goal is to be able to look at the YAML of a manifest and see some application-specific things in it. In a simple illustrative case for the purposes of this talk, we've got a database cluster. There are three machines in it; a couple of them are just read replicas and one is the master, and they're all on one version of the data store and of the API spoken between the master and its read replicas. We want a way to express all of that and store it in the Kubernetes API.

That's about half of what we need to address complex, stateful applications. The other half is a representation of all the special steps an administrator may need to take to deploy, back up, or upgrade complex applications. We call what we've built to encode that "operators", and what they really are is custom controllers, in the Kubernetes sense, paired with third party resources that store the metadata describing the special characteristics of complex applications. That definition is on a slide in a simple, clear sentence if you want to write it down or take a snapshot.

The easiest way to illustrate operators is probably with a distributed database we created at CoreOS that lies at the heart and soul of Kubernetes itself: etcd. It's a fairly simple distributed key-value store. Kubernetes ships it and runs it at the center of the control plane binaries to maintain cluster state and do other things. Many other applications are also built on top of etcd. etcd needs a cluster of machines that can maintain quorum. It uses the Raft protocol to keep the data being stored in the database consistent, and it can do neat things like hold leader elections in the event of node failures, and generally do a lot to maintain its own consistency. What it can't do is upgrade itself or back itself up, and Kubernetes doesn't really know anything about the data stores etcd is using, or about the other nodes in an etcd cluster. What we've done is taken third party resources... let me back up a little bit.
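For reference, a hypothetical manifest of the kind described above might look like the following sketch; the kind and every field name here are invented for illustration, not a real schema:

```yaml
# Invented example: a three-member database cluster expressed as a
# single object stored in the Kubernetes API.
apiVersion: "example.com/v1"
kind: DatabaseCluster
metadata:
  name: demo-db
spec:
  size: 3            # one master plus two read replicas
  version: "9.6.1"   # the data store version every member should run
```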
I apologize for my disorganization here. Let's talk about operators in the general sense for a little bit. We have these complex applications, and the first one we're going to look at as an example is etcd. What we want to do is figure out how to manage those complex applications in terms of the Kubernetes API, instead of needing a secondary API, or always devolving to the application's own management tools, as with the larger DBMS systems. How can we do that? How can we encode the knowledge of the human operators who run these complex systems?

It's done by building on two basic Kubernetes concepts and extending them for our purposes. The first is resources: who, what, and where in the cluster, and what is our desired state. The third party resources mechanism gives us a way to extend that for non-native members of the API, so that we can add our own types to the Kubernetes API. Second, we need custom controllers that understand the meaning of those types once we've added them to the API, and that know how to use the information contained in those custom types, much as standard controllers use a pod, job, or service specification in its manifest, to implement an observe, analyze, act reconciliation loop, in the same way that controllers in the standard Kubernetes control plane do this for simple applications. So, instead of just asking "do I have the number of copies I expect?", operators can look at additional third party resource values: am I running the version I expect? Am I running the data store that's been agreed among the group of nodes in this cluster?

So, a little bit about third party resources. As I've mentioned, they're a way to extend the Kubernetes API with new object types. They're a little like a database schema: they tell an operator what the data model is for the class of application that the operator will be responsible for deploying and managing through its lifecycle. Third party resources were designed into the Kubernetes API for extending it in exactly this way, and at this URL you can dig into a little more of the specification.

Now, in the abstract for my talk, I promised I would teach you how to deploy the etcd operator. Here you go. Ideally, this summarizes how, by taking advantage of the Kubernetes API, we can make some of these things very, very simple. We're going to dig into this a little more, but this, the manifest sketched below, is actually how you deploy an etcd cluster on top of Kubernetes. And once you have this, you can automate upgrades; this operator knows how to do a lot of neat things with that etcd cluster that you would formerly have been doing by hand. How does that happen? We've added third party resource values that let the custom controller we've written for this management know the things it needs to know about the state of an etcd cluster running on Kubernetes. For purposes of illustration, we can talk about how many nodes are desired in the cluster, and what version of the etcd software each of those nodes should be running.

Now, let's walk through an etcd cluster that's running on top of Kubernetes, as the custom controller we call an operator goes through its reconciliation loop. The first thing it does is look at what's going on in the cluster.
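The slide itself isn't reproduced in this transcript, but the manifest looked roughly like the early etcd operator examples; the API group and version changed over the project's history, so treat this as an approximation rather than the exact file:

```yaml
# Approximate shape of an early EtcdCluster third party resource object.
apiVersion: "coreos.com/v1"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3            # desired number of etcd members
  version: "3.1.0"   # desired etcd version on every member
```

Creating it is a single kubectl create -f against a file with that content; the walkthrough that follows compares this desired state against what is actually running.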
So we have this etcd cluster with two nodes in it that we know, from our manifest, is supposed to have three. The first node is running a slightly older version of the software than the manifest specifies. The second node is running the version we've specified, so that one is okay. But we're missing a third node that we need for quorum in this cluster. Based on the information in the third party resources, the controller can analyze the situation in the cluster right now and compare it to the desired state, just as you would for a normal service but with this extended data, and then take actions to bring that desired state about in the etcd cluster running on top of Kubernetes.

So what are we going to do in that scenario, and what's special about it? Well, if this were just a standard app, all we could really do is say: there are supposed to be three of them, there are only two, start another pod running. But in an etcd cluster, that would cause more trouble than it would get you out of. You would have inconsistency between the two versions. The new node you just added would not know where the persistent volumes where etcd actually stores its data are located. And no recovery step would have been taken to find out what happened to the missing member, restore its data, and bring it back online before trying to perform the upgrade. All of those things, exactly the type of operations that are currently human administrator workloads, are encoded into the etcd operator for Kubernetes. The etcd operator knows how to recover our missing member. It knows how to back up the cluster before attempting the upgrade to the version requested in our manifest. And it knows how to perform a rolling upgrade on our now three running cluster nodes and bring them all to etcd version 3.1.0. The etcd operator is open source; you can dig into the code on GitHub at this URL, and I'll review these URLs toward the end of the talk in a kind of blanket slide you can pull out.

So what other neat things can we do on the groundwork of operators? What are our future plans, and what are we going to continue to do with etcd, given that we've already talked about it being a key part of Kubernetes itself? One thing we're going to do, given this mechanism for managing, easily upgrading, and automating the lifecycle of the etcd key-value store, is move it outside of the binaries that are linked and running in the control plane, and have a self-hosted etcd cluster for Kubernetes to use: one that runs independently of Kubernetes, but like a Kubernetes application, and that through these operators can be scheduled, scaled, and automatically healed in the same way simple, stateless applications are today on Kubernetes. Other goals include making high availability setups really, really easy for this kind of software. What we can offer today cannot fairly be called full high availability, because it doesn't take network segmentation into account; but if you just run the YAML file I showed you a little earlier, you will have a three-node etcd cluster with at least subnet failover, and that shows you the direction toward easy HA setups. We're also working on automated backups to the different popular cloud object stores.
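As a sketch of where that is headed, the field names below follow the shape of the early etcd operator backup configuration, but the details changed between releases, so treat them as illustrative:

```yaml
apiVersion: "coreos.com/v1"
kind: EtcdCluster
metadata:
  name: example-etcd-cluster
spec:
  size: 3
  version: "3.1.0"
  backup:
    backupIntervalInSecond: 1800    # snapshot every 30 minutes
    maxBackups: 5                   # retain the five newest snapshots
    storageType: PersistentVolume   # S3 and Google Cloud Storage planned
```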
Right now, automated backups use Kubernetes persistent volumes, which is the immediate API mechanism for doing this. But we also want to connect that to S3 and Google Cloud Storage, and make it a very easy choice in the manifest file. One of the other things the team is working on, as we move toward a 1.0 version of the etcd operator, is chaos monkey testing: really stressing the edges of this operator's management, to make sure it can actually perform the functions of a human administrator in repairing and recovering from the edge-case failure scenarios that can happen in etcd clusters.

So that's a look at the etcd operator, as one glance at how to introduce the concept of operators on top of Kubernetes. My goal here really is to get you to write operators, based on this open source code, to manage your own complex applications. Operators give you a way forward that's harder to reach with StatefulSets, an API object that exists in Kubernetes 1.5 and up, formerly known as PetSets and by several other names over the course of the project. StatefulSets are similar and lie along the same lines as this work; however, because of their generality, they cannot express the specific conditions of individual applications in the way operators are designed to do. They can provide part of the groundwork, but they can't encode the specific administrator knowledge that's encoded into the custom controllers we call operators.

So what's another operator? We've written a set of open source operators at CoreOS while exercising and exploring this idea. Today we want to look at two of them that are really important for Kubernetes, because they both have a kind of native place in Kubernetes, and both are essential for building modern applications at scale. We often say monitoring is the heart of production. Prometheus is a well-known, and I think fairly well-liked, monitoring system; certainly we hope so. We've constructed an operator for Prometheus, which is itself a fairly complex app with stateful storage of machine state, application state, memory thresholds, and all the kinds of statistics it gathers. Much like etcd, it needs a certain arrangement for deployment. It needs to know where data is stored. If you want consistent long-term collection of statistics in Prometheus, you need to be able to maintain persistent volumes between upgrades and rollouts of different versions of it. More than that, you want to be able to configure targets for monitoring very, very easily. These are all items we're trying to move into the logic of the Prometheus operator for Kubernetes.

So, Prometheus is a monitoring system, and I stole this nifty animated slide from our CTO, Brandon Philips; nobody tell him, because he flew out this morning and I know he'll never see this video. Anyway, this shows more about what Prometheus does than about the operator specifics. What we have here is a Prometheus deployment. One of the things Prometheus has is a host information collector, so we can drill down to the node level, which will cycle around here in just a second, and do monitoring both in terms of machines and in terms of services. What this is actually showing is a little web service, and we're driving load with rakyll's hey, a kind of HTTP stress tester.
Basically we're just hitting this web service really hard and showing off a few counters in Prometheus as we put this Kubernetes service under load. It seemed to run a little faster in my testing. So that's an idea of Prometheus. The idea behind the operator is making the deployment of that kind of monitoring as simple as possible. And again, this is open source code, and much like with the etcd operator, there's a YAML manifest file; you can do a kubectl create -f directly from the GitHub repos and actually deploy Prometheus on your Kubernetes cluster.

So what are the next steps in terms of operators? We've worked on these two very closely because these two projects are so key to Kubernetes and the ecosystem surrounding it, and also key to our own deployments of Kubernetes. But what we would like to see is that this becomes a kind of framework for constructing administration controllers, or as we like to call them, operators: custom controllers for the various kinds of complex, stateful applications that you currently deploy by hand and manage through upgrade and failure cycles by hand, because the default Kubernetes API and the regular Kubernetes control plane don't have enough information about those applications to manage and scale them in a complete way.

So, as promised, here's a bucket list of URLs about this stuff that you can check out. I am intentionally running a little bit short, because I'm hoping we can get into some questions, and maybe I can draw out what you'd really like to know about this, rather than taking a rapid guess at it. These slides will also be available, for folks trying to take snapshots of the URLs, so don't worry too much about that. A few notes before I ask for questions: we would love for you to come join us in San Francisco for our own conference, CoreOS Fest, at the end of May. We'll have some detailed talks about the open source projects we produce that fit into the platform, and a number of great speakers from our partners in the ecosystem. I was just trying to get Josh Berkus to promise me that he would come and talk at CoreOS Fest. I appreciate your time; thank you very much. Does anybody have any questions? How about you, since I'll be able to hear you?

[Audience member:] If a system administrator deploys a certain operator, do certain system administration actions become off limits, or strongly discouraged, because they would break what the operator expects, leaving you with an inconsistent environment?

Actually, yeah, that's a good question; I may have run right past it on the slide. The question really was: if I switch to using operators, will that limit the actions I can then take as an administrator? Will I need to be hands-off with that software, because the operator now expects to be managing it, and administrators coming in after the fact to administrate it would break those expectations? The answer, at least for the two I've described here today, and etcd is the one I'm most familiar with, so I'll talk about it, is yes. Once you've deployed etcd with the operator, the expectation is, well, number one, we're not at 1.0 yet, so I can't honor this guarantee yet. But 1.0 will honor this contract: the operator will be backwards compatible so that it won't break your old stuff.
But it will expect to control upgrades, failover, redeployment, recovery, backups; all of those things will then be expected to happen in terms of the operator, instead of in terms of etcd's own tooling, or your own approaches, editing config files behind the scenes or anything like that. So currently, at least with these two as implemented, the idea is very much for the admin to hand that control over to the operator. Now, you could write an operator that gave you different guarantees, or that expected administrative interference, or that did checks for different kinds of expected administrative interference. But at least with the etcd and Prometheus operators, and a few of the closed source ones I know of as they exist today, they're all pretty much like a branch: we're now on an operator branch, and we really don't want to do manual human administration. That's sort of the point, right? Good answer? Wait, let's bounce around a little. How about here?

[Audience member:] Is there ever going to be a Kubernetes cluster federation operator?

Please repeat the question, right: the question is, will there be an operator for Kubernetes federation? Probably first, a little bit of background. Kubernetes federation is, you might say, how you deal with having too much success: your one cluster in one region is no longer enough to provide availability, even though it has 100,000 nodes in it, so you want to start having clusters in more than one region, clusters which have hierarchical relationships to one another. That's what federation is all about. Certainly it's part of the idea behind some of this work; that's a direction we would like to go. Is there code I could show you today that I know all about inside and out? No. But as federation matures, yes, I expect so. And here's why: we already deploy Kubernetes in terms of itself. This is not what my talk is about, but one of the things operators underlie is a larger concept we call self-driving infrastructure. We think the whole stack ought to do this. The operating system ought to update itself when there are security updates; CoreOS Container Linux does that, that's kind of its thing. We think the cluster orchestration system ought to manage itself in terms of itself, to be self-hosted, if that makes any sense. By that I mean that when you upgrade Kubernetes, we think you should issue a kubectl rolling upgrade command to upgrade the Kubernetes control plane components; it should be self-hosting. And when you deploy Kubernetes federated clusters, we think that eventually, in the long term, when this is mature, those federated clusters should also be deployed in terms of the Kubernetes API and managed in terms of the Kubernetes API verbs. That would be the ideal world. The Kubernetes API sort of becomes what POSIX has been for 30 years: a basic set of services and a basic architecture that applications can expect to be in place, and can extend on top of to provide the services they want to provide. So the idea is that we want to make the Kubernetes API the way to manage any application that's going to scale across machines or clusters. Fair enough answer? It's not a promise, but yeah, probably. And if we don't, you should write it; you could write it before we do. Okay, there are more questions; you've had your hand up for a while.

[Audience member asks, partially audible: is the operator for managing a separate etcd cluster running on top of Kubernetes, rather than the etcd that Kubernetes itself runs on?]
Currently, absolutely, yes. Right now, that is what you'd use the operator for. First of all, let me repeat your question, which I keep forgetting to do.

[Audience member:] And the operator is a centralized thing, right? Say you have an etcd cluster on top of Kubernetes, which you have for some reason, with five machines or pods in there running etcd. If your operator is on one of those machines, and that machine ends up on one side of a network split from the other four that formed the cluster before, how does this even work?

Okay, first of all, let me repeat his questions. What was the first one? He asked two good questions. One is: is the operator about managing a separate etcd cluster, or is it about managing the etcd that ships with every Kubernetes cluster in the world? The answer to that, right now, is that it's about managing a separate etcd cluster on top of Kubernetes. In the future plans part, this is part of where we're headed: we want to get the Kubernetes etcd up out of the heart of that thing too, and manage it much more like a regular application, so that it isn't baked in, if you will.

The second part of your question is: if the operator and your etcd cluster all live in the same subnet, what happens when the operator gets split from the rest of the etcd cluster? The operator is not, what's the best way to put this, the operator is not essential to the functioning of the etcd database. The operator is a Kubernetes controller that runs a reconciliation loop. So what would break in the situation you described, if we did no other engineering to try to rectify it, is the operator's ability to know the condition of the etcd cluster and do all the things we talked about it doing. What wouldn't break is the actual operation of the etcd cluster. Now, I don't want to pretend there's no tie there. [A colleague comments from the audience.] My colleague was just saying that the situation we described is indeed a problem; but it is no different from the problem with the standard Kubernetes scheduler and control plane in a non-federated, single-subnet Kubernetes cluster sitting in one network space somewhere.

[Audience member:] I like the idea of a system that actually has a true overview of everything, whereas the operator can't have that by definition, because it exists inside the system.

True, and a fair point; at least a fair point about our choice of metaphor, if nothing else. I would give you that. Thank you for the question. How about you?

[Audience member:] Can a single operator be used if you want, say, multiple etcd clusters?

Oh, yeah. Actually, you can deploy multiple instances of an etcd cluster with one operator, by giving them different names, much like any other Kubernetes object. I'm so sorry that I cannot seem to remember to repeat folks' questions: he asked whether you could run more than one copy of your complex application under one operator. The answer is absolutely yes, and more than likely, that operator would be constructed as the etcd operator is, to run many instances of whatever application it's designed to manage, just distinguished by name, like any other Kubernetes object or service or pod.
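As a concrete illustration of that answer, here are two cluster objects distinguished only by name, both managed by the same operator; this reuses the approximate manifest shape sketched earlier:

```yaml
apiVersion: "coreos.com/v1"
kind: EtcdCluster
metadata:
  name: etcd-cluster-a
spec:
  size: 3
  version: "3.1.0"
---
apiVersion: "coreos.com/v1"
kind: EtcdCluster
metadata:
  name: etcd-cluster-b    # a second, independent cluster
spec:
  size: 5
  version: "3.1.0"
```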
How about you?

[Audience member:] Is there anything special you need to do to handle a really slow reconciliation, say for a database cluster?

OK, so the question is: is there something special you have to do to handle really slow reconciliations? In the event of, I presume, connectivity issues, which are probably going to be the prime cause of that. So in a way, it relates to the question we were just asked about net splits and such. [The audience member clarifies that the question is about long-running operations such as backups.] Oh, OK, actually, yeah, you just clarified your question. Thank you. So the question is: can you encode specific knowledge in operators to deal with particularly long-running recovery, deployment, scaling, restoration, or backup tasks? The answer is absolutely yes. The whole point of these custom controllers is to be able to encode knowledge of exactly that kind. So forgive me for engaging in design-by-tech-talk, but you could have a third party resource that knew something about expected timings, perhaps. Off the top of my head, I'm not coming up with a dynamic way to do this so that you would really know, on a callback, that a job is done; but that would probably be the direction I would look if I were actually trying to solve this problem. The thing is, you would still use these two basic mechanisms: the idea of a custom controller, some little bit of code that one way or another knows to wait, and a representation of some kind in a TPR, a third party resource, of what we are waiting on, or how long we wait on it, or how we know that it's done, or when we give up, or whatever it is. To answer it for Postgres would be a little over-specific, but in general, for long-running operations, that's exactly the kind of thing we expect to see encoded into people's custom operators for their own complex applications. I wish I could think of an example in the etcd operator that matches that scenario, because it is actually a really good question. I think I've really shorted the left side of the room, so: all the way in the back, with the sort of denim-looking shirt.

[Audience member:] How much are you working with the team working on Helm charts?

Okay, the question is: how much are we working with the folks working on Helm and Helm charts? The answer is, we use a little bit of that stuff internally, we like those folks an awful lot, and we are all part of the wider Kubernetes community; we interact at the CNCF, which is the overall copyright holder for the Kubernetes code. We are somewhat in pursuit of not contradictory, but divergent goals: we're working on packaging for production, probably a little bit more than for development. I don't want to get ahead of myself and try to write policy for CoreOS ex cathedra, but certainly I've spoken at Deis's offices in Boulder a couple of times, so they like us well enough to have us over. Does that answer it? How about, unfortunately not you, but right behind you; and let me repeat this one, partly to make sure I even heard you this time. The question here is: let's say I deploy a database on Kubernetes by using an operator. How do I then interact with it for standard database operations, just querying it? And the answer is: exactly as you would with that database's normal tools, like the psql or MySQL client. Oh, no, no, no.
No, actually, one of the things that operators would do, and this is something I probably did not do a very good job of bringing out in my talk, is try to do everything in terms of standard Kubernetes concepts where possible. So my imaginary operator for a PostgreSQL database would create a service and a load balancer pointing to the Postgres API, so that clients could access the database; that's how you would get access to the database, just as with any other service within Kubernetes, by mapping it to a Kubernetes service.

[Audience member:] Can I add one comment, if you'll repeat it? I work on a tool called Patroni, and we're looking at implementing operators on top of Patroni, which would handle things like telling the operator to scale.

Cool, but does that relate to his question? Yeah, because his question was: do you need something running inside the container? And the answer is yes. I don't think I ever heard your question very well, because I was trying to answer a question about how to use the database once it's deployed, and that's not your question. I think Josh has a much better answer; you actually want to give him your mic? There we go.

[Josh Berkus:] Yeah, so the answer is yes, you do need something running inside the container, because normal databases, at least, are not designed to be self-managing. So, for example, I work on a project called Patroni which supplies that thing inside the container, and we've actually been working on implementing the operator pattern for it, because it's exactly what we want, in terms of scaling, doing backups, and repeating other operations.

Yeah, and something with an API too, which is important. And I'm super sorry, I'm deaf and didn't hear your question at first, and tried to answer a totally different one. We have about five minutes, and you've been very patient, so.

[Audience member, inaudible.] I really can't hear you. The question is: do we see the operator concept being upstreamed? That is maybe a question above my pay grade, to some extent. I would say overall, if you look at the history of our work on Kubernetes, almost everything we can figure out how to abstract in a way that we can get it upstream, we try to do that with. If you look at our work on RBAC: we did a whole bunch of work on RBAC that we could have incorporated into our open source project called Dex, but instead we did the supporting parts of it upstream, so that Dex could stay abstract and minimal apart from it and just work with it, loosely coupled in the best kind of sense. So I would expect we would try. But with projects of this age and this amount of dynamism, and with so many people involved, I think there's a lot of room for exploring solutions with code. So we have this idea of an operator pattern; there's the StatefulSets, PetSets thing that's in Kubernetes. I would expect, and this is purely me, Josh Wood, not CoreOS or anything official, I would expect those efforts to learn from one another, to merge toward something similar, toward a kind of unified view of the universe over time. So that's a really long way of saying yes: we would like the key parts of the framework of operators to be something Kubernetes makes easy to write, so that you can treat it like an API for writing management applications for these kinds of complex apps.
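To make the access-pattern answer earlier in this exchange concrete, here is a hypothetical Service of the kind such an imaginary Postgres operator might create; all the names and labels are invented for this sketch:

```yaml
# Invented example: expose the current Postgres master to clients.
apiVersion: v1
kind: Service
metadata:
  name: demo-postgres
spec:
  type: LoadBalancer      # give clients a stable, routable endpoint
  selector:
    app: demo-postgres
    role: master          # route connections to the master pod only
  ports:
  - port: 5432            # standard Postgres port
    targetPort: 5432
```

A client would then connect with the database's normal tools, for example psql pointed at the load balancer's address on port 5432.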
[Audience member:] How difficult is it to implement an operator?

The question is: how difficult is it to implement an operator? And the answer is, I am singularly ill-qualified to tell you; I'm responsible for documentation. The thing is, if you look at the open source code for the two, well, I shouldn't call them demo operators, they're very much headed toward production versions for us, but they're good examples for this talk: neither one of those two is a very large code base. And especially in the etcd operator, you can see it's not yet as abstracted as we eventually want it to be, where there's sort of an operator framework and you build operators on top of that; maybe that's a vision of the future. It's all one big piece right now, but you can see fairly clearly which pieces are the generic support pieces that deal with the API, and which are the etcd-specific pieces. If I were going to try to write an operator tomorrow, I would probably begin with that code and mentally scrape away all the etcd-specific stuff, looking at what is basically the generic basis of what operators do when they talk to the API. And as much functionality as operators potentially enable, their interaction with the API is fairly simple: it revolves around third party resources, and around registering that custom controller, which is itself what's called an operator, with the cluster's name services, basically just getting it running in the control plane. So I would expect that to be a fairly narrow bit of code. So, is it easy to write operators? Well, is it easy to encode everything an experienced DBA knows about running Postgres? No, it's probably not. But my suggestion would be that the design is much tougher than the coding, as in so much software: it's much harder to find out all the things that person knows than it is to write them down once you know them. Does that make any sense? I can maybe take one more question if anybody has one. Oh, all right.

[Audience member asks how the operator handles failures during its operations.]

Well, ideally you have some recourse to the idea of a reconciliation loop in the first place. If you have some kind of temporary failure on update, like you can't retrieve a container from some registry, simple things like that, all that's going to happen is that you're going to go around the loop again: you're going to find that you didn't recover your node three, that there are still only two running, and you're just going to try the recovery operation again. That really isn't even special code within the operator; that's just the standard idea of an OODA-style reconciliation loop. Obviously, for more complex failures, and I think this relates both to the question about whether my admins should be hands-off once I start using an operator, and to the question about encoding knowledge of exceptionally long-lived operations during recovery, backup, or deployment scenarios: yes, for complex things, that knowledge would probably need to be represented within the operator, or within TPRs, or some combination of those two things.

[Audience member:] Fundamentally, if this keeps failing, I just want to know about it.

Yeah, obviously. And if I were architecting the deployment, what am I trying to say here? Monitoring is always going to be a piece on top of this.
And I sort of refuse to be one of these people pitching serverless, admin-less everything. There will probably be somebody there monitoring even a fully automated, all-operator-driven Kubernetes stack for the failures that aren't yet encoded in your operator, right? So I think that would be good best practice. And there's some more detail on that kind of thing in the etcd operator repo on GitHub: there is a best practices document for the etcd operator that will give you some idea of directions there. Again, I would say that with the caveat that the etcd operator itself is not at 1.0, so things will change; but that gives you some idea of what we think that picture looks like for an application that we ourselves run in production all the time at CoreOS. Because when I answer questions about Postgres, I am speculating at the very best; I don't run Postgres every day in production. Okay, well, thank you. Thank you.