All right, welcome to my session on building a managed database service using Kubernetes operators. Before getting into the weeds, I want to start off explaining who I am and why you should listen to me. My name is Jimmy Zelinskie. I'm the founder of a company called AuthZed, and at AuthZed we build a database called SpiceDB. What SpiceDB does is store your authorization data. When you build a permission system for your applications, eventually you hit a couple of different types of problems: maybe you want to make dynamic changes to the system without changing code in various places, or maybe you have multiple applications that need to share the same data in order to determine an access decision in your code. That's the type of scenario in which you would reach for an authorization-specific database.

My background is in product, engineering, and operations. I've worked as a product manager in the past, I've worked as a software engineer building distributed systems projects, and I've also worked in operations running those distributed systems in production. So despite my background being in product, I still write code every day and carry a pager for the services that I build. Prior to founding AuthZed, I worked at a company called CoreOS, which got acquired by Red Hat, and at CoreOS I actually co-created a CNCF project called the Operator Framework, alongside some folks who are now members of the AuthZed team. What the Operator Framework lets folks do is more easily build operators, so that they can customize Kubernetes and extend it in ways that make sense for running their domains. As part of CoreOS I was also a maintainer of OCI, which is the container specification, and I've done a bunch of work in the container registry space over the years. So that's a bit about my background.

All right, to level set for the talk: before anything, I always like to level the playing field and make sure everyone understands the same terminology before diving right in. To be able to even discuss this topic, we have to cover two major subjects: the first is what's a managed database service, and the second is what are Kubernetes operators?
So a managed database service is pretty much you outsourcing the operational side of a database to a particular provider. Instead of you spinning up a database and managing it on top of your own hardware, or even cloud hardware, someone else does that for you and purely gives you the details you need for your application to connect to that database. Then you're basically out of the way: you don't have to carry a pager or anything like that to make sure the database is operational and able to serve traffic.

There are two different types of providers you can outsource to. There are cloud providers, which obviously have the expertise in running software on top of cloud environments; examples would be Amazon RDS and Google Cloud Platform's Cloud SQL. These providers offer the typical relational databases, but they also have individual services for more specialized databases. The other type of expert you can outsource to are the actual database providers themselves: folks like Cockroach Labs selling CockroachDB Dedicated, and my own company AuthZed selling SpiceDB Dedicated. There are plenty of other database providers in the space that do something similar; Elasticsearch and Redis also come to mind as examples of these database-provider experts that offer these types of services.

All right, so then what are Kubernetes operators? Operators are custom controllers for Kubernetes that encode application-specific logic. That basically means extending the Kubernetes API and teaching it new concepts that are specific to your domain. The point of all of this is to improve how Kubernetes is able to handle running the application, but the even greater concept is encoding your domain into Kubernetes so that the Kubernetes control plane becomes the central interface for everything. It becomes the source of truth, and you can always use the standard tools, like your dashboard or kubectl, to query it and understand what is running in production.
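To make that concrete, here's a minimal sketch of the shape of a custom controller, written against controller-runtime, the library underneath most operators. The `Database` kind and `example.com` group are made-up placeholders; the reconcile body is where the domain-specific know-how would actually live:

```go
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// DatabaseReconciler reacts to changes to a hypothetical "Database"
// custom resource and drives the cluster toward its desired state.
type DatabaseReconciler struct {
	client.Client
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := log.FromContext(ctx)

	// Fetch the custom resource that triggered this reconcile.
	db := &unstructured.Unstructured{}
	db.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "example.com", Version: "v1alpha1", Kind: "Database",
	})
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		// The resource may have been deleted; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// This is where domain knowledge lives: create StatefulSets and
	// Services, run migrations, update status conditions, and so on.
	logger.Info("reconciling", "database", req.NamespacedName)
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	db := &unstructured.Unstructured{}
	db.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "example.com", Version: "v1alpha1", Kind: "Database",
	})
	if err := ctrl.NewControllerManagedBy(mgr).
		For(db).
		Complete(&DatabaseReconciler{Client: mgr.GetClient()}); err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

The important part is the shape: Kubernetes tells you *that* something changed, and the reconcile loop's only job is to converge the world toward the declared spec.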
All right, so without further ado, I'm going to talk about my anecdotal experience building SpiceDB Dedicated. The reason I'm going to use this is not only familiarity, but also because we actually built the service semi-recently. There are a lot of other managed database services that are probably built on top of Kubernetes, but because of the recency of this one, I think it's more applicable to someone looking to build a similar service today, whether they're building their own product or building out a platform engineering team internally at their business. The rest of the talk is going to be me describing the system we've built, the decision-making process we went through, the way we've divided things up, and how we think about the different problems we had to solve.

At a 10,000-foot view, we can break this problem down into three major phases: provisioning, runtime, and then day-two operations.

The provisioning side is how we create the customer environments. That means everything related to creating and updating clusters and what lives on those clusters. The finer detail there is how you decide to split up and differentiate between what defines the cluster and what defines the configuration that lives on top of the cluster. This is actually pretty subtle when you're trying to understand which things need to be updated with the lifecycle of Kubernetes itself versus things that can be iterated on with changes to the application. Another big one is how you're going to promote changes to these different customer environments: how are you going to roll out Kubernetes updates, or any changes to the aforementioned cluster configuration, and how are you going to do that in a way that can be progressive, so that your customers — whether they have maintenance windows or are very sensitive to updates — get the updates at the regular cadence they're expecting?

Then we move on to the runtime phase. The runtime phase is about what has to be running live while customers are using these systems. This is where a managed database as a service differentiates itself from a lot of other workloads you might be running on Kubernetes, because the customers are actually going to be modifying the cluster itself in real time. That means we need to be able to not only manage our own configuration, but also respond to end users deciding they want to take actions like scaling their database cluster up or down. The other unique problem in this space is handling the availability and performance requirements of running a database. Databases are typically very performance- and latency-sensitive workloads, and they're also stateful workloads, so all of these things complicate the actual production runtime of the system. Being able to run the service such that when events like scaling up, scaling down, or losing a Kubernetes node happen, you don't lose any performance or drop any requests, is very tricky, and it's something that every database-as-a-service is going to need to manage. You can't necessarily make changes in the application code that is talking to your database; instead, you need to make the actual runtime as robust as possible, because you don't have any control over the application code connecting to the databases you're managing.

Finally, we have day-two operations, which are the actual operations that our SREs are going to be managing. This has to do with handling backups: specifically, as I said, customers can modify these environments, so we not only have to be able to reproduce our clusters, but also reproduce the state the customers changed. We also have to power our own operational workload, so we need to be able to aggregate metrics across customers, understand the health and state of the customer environments, and page our engineers when something is going wrong in a customer environment.

So I'm going to dive deeper into provisioning now, and I'm going to list out some of the technologies and some of the core concepts we've chosen to go with. I would say a lot of these technology choices are personal preference; I'm not saying you should choose one over the other, but I'm going to include why we ended up with the ones we have. These reasons are organization-specific: if you have a company with a ton of Terraform expertise, for example, go ahead and use Terraform. I think that's going to be a better choice for you if that's where your company's expertise is. But for us, we picked Pulumi. We're very comfortable writing Go code, and we actually wanted to build, ultimately, one binary that is our infra program and can manage all kinds of different things; Pulumi just gets embedded into that process. So we have commands for provisioning things, but we have other commands for accessing different systems that are part of our operations team's everyday work. That's why we ended up picking Pulumi.
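As a rough sketch of what that embedding can look like: Pulumi's Automation API lets you drive deployments from inside your own Go program rather than shelling out to the CLI. The stack and project names here are invented for the example, and the inline program is a stand-in for real cloud resources:

```go
package main

import (
	"context"
	"fmt"

	"github.com/pulumi/pulumi/sdk/v3/go/auto"
	"github.com/pulumi/pulumi/sdk/v3/go/pulumi"
)

func main() {
	ctx := context.Background()

	// An inline Pulumi program: this is where the cloud resources for
	// a customer environment (VPCs, clusters, etc.) would be declared.
	program := func(pctx *pulumi.Context) error {
		pctx.Export("hello", pulumi.String("world"))
		return nil
	}

	// Each customer environment could map to its own stack.
	stack, err := auto.UpsertStackInlineSource(ctx, "acme-prod", "infra", program)
	if err != nil {
		panic(err)
	}

	// The equivalent of `pulumi up`, driven from our own binary.
	res, err := stack.Up(ctx)
	if err != nil {
		panic(err)
	}
	fmt.Println("update result:", res.Summary.Result)
}
```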
For actually reconciling configuration in a cluster, we use Argo CD. Flux is another example of a CNCF project that does this kind of continuous deployment. We ultimately aligned on Argo specifically because it has a nice web UI for checking the health of all the environments, but it also has nice functionality for actually applying the changes, like dry runs and pruning, and you can write Lua to extend Argo in some scenarios. Specifically, when you're creating operators, you're going to create custom definitions of healthiness in the status fields, and Argo can be extended with Lua to understand those, so it knows whether a custom resource you've created for your operator is healthy or not. That's super useful functionality.

For the actual configuration we use on the cluster itself, we use Kustomize. We previously used CUE a lot, but we ultimately migrated to Kustomize because it was really easy to structure and it integrates directly with kubectl, so our engineers don't have to install any additional tooling. It's way easier to onboard engineers, because if you understand Kubernetes, or at least Kubernetes YAML manifests, you're going to understand using Kustomize to some degree. It also lets us really reuse a lot of tools off the shelf, because you can point at any manifest in a git repository, use it as a reference, and extend it with Kustomize. So as we adopt more and more of the standard community tools, we can just point Kustomize at those tools and get them vended almost for free, or with very little modification. If you're using CUE, you have to do all the legwork of importing and transpiling YAML into CUE, and you're kind of on your own for a lot of the tooling and structure. But I imagine some of that will change over time, so it's not necessarily cut and dried if you're watching this video six months from now; maybe the state of the world for CUE has improved dramatically.

Finally, we also use GitHub Actions, mostly because we can automate a bunch of the GitHub APIs for opening and merging pull requests. And that ties very much into the concepts I want to talk about next.
The high-level concepts we have are largely around our promotion process, which we call the ring model. The ring model is about bucketing customers into groups by stability, so that we can slowly roll out changes one phase at a time to each bucket of customers. For example, what we actually do is have a staging instance, and the staging instance gets every change pushed to it as part of continuous deployment. When things look good, we promote to what we call ring zero, and ring zero is other testing environments, whether that's performance testing or other staging environments at AuthZed. Once that passes QA, we promote to ring one, which is our rapid release phase: customers that have opted into getting updates sooner, but potentially less stable releases. And so on and so forth: we promote to ring two, which is more stable, then ring three, which is more stable still, etc. That's how we structure how we roll things out. This was inspired by the internal model used at GitHub — we have some ex-Hubbers at AuthZed, and that inspired us greatly to solve the problem this way. So we know it scales, because it's being used by big companies like Microsoft.

And finally, we have GitOps, but "GitOps by bots" is how I want to talk about it, because while GitOps is great, making changes in some of these repositories can be very verbose and error-prone, and it can take a really long time. So what we actually do is have automations all around it. You can manually pick from a dropdown to say "I want to promote this ring to this ring," and then bots handle the rest. You get the benefits of having everything checked into git, and if you had to manually override anything, you could, but the error-prone side of copying and pasting specific versions into specific places is all automated away. In the general case, you pretty much don't have to open your editor to make the changes you want to see propagated to the system.
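To give a feel for what "GitOps by bots" can mean in practice, here's a hedged sketch of one piece of that automation, written against the go-github library. Everything specific in it — the org, repo, and branch names, and the assumption that earlier automation already pushed a branch bumping the ring's pinned version — is invented for the example:

```go
package main

import (
	"context"
	"fmt"
	"os"

	"github.com/google/go-github/v60/github"
)

func main() {
	ctx := context.Background()
	client := github.NewClient(nil).WithAuthToken(os.Getenv("GITHUB_TOKEN"))

	// Assume automation has already pushed a branch ("promote-ring-1")
	// that bumps ring one's pinned commit SHA. The bot's remaining job
	// is just to open the pull request for review or auto-merge.
	pr, _, err := client.PullRequests.Create(ctx, "example-org", "infra", &github.NewPullRequest{
		Title: github.String("Promote ring 1 to latest validated config"),
		Head:  github.String("promote-ring-1"),
		Base:  github.String("main"),
		Body:  github.String("Automated ring promotion; see linked CI run."),
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("opened", pr.GetHTMLURL())
}
```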
So here is a drawing of our Kustomize configuration. We split it into three topical folders: the bases, the features, and the overlays. If you're familiar with Kustomize, overlays are typically used for the end result — the renderable thing you can actually apply to a cluster. We have a dev overlay, or actually variations of dev overlays, and then we have customer-specific ones. The customer-specific ones we keep in a separate repository, the infra repository, that tracks all the customer environments, and the dev overlays live in our monorepo alongside the configuration itself. Overlays are composed of at least one base plus a set of features. Examples of features are a Postgres database, or ECR for getting your images on that cloud provider, or GCR if you're using Google Cloud. We break everything down into these different features that you can compose together to actually build a working system. The bases are the base layout for a cluster: they install the things we want to assume are always going to be there. So in the regular cluster base we have the monitoring stack that we want deployed to absolutely every cluster, to make sure we have a baseline understanding of the health of every cluster that's not specific to any workload we deploy to it. This gets used both on an infra cluster that we run centrally for our infra and operations team, and also on all the customer clusters. But then we also have this dev base, and the dev base fills the gap between something like Docker Desktop Kubernetes or kind and what we get when we run Pulumi to generate a cluster on a cloud provider for an actual production environment. That fills in the gaps so the clusters look exactly the same: they have the same starting base, then we apply the base, and then we apply whatever features are specific to that environment.

So here is the architecture of the GitOps pipeline. In our monorepo, as I said, we have the configuration, which makes it so developers can iterate on the configuration and also the code for the different projects, spin that stack up locally on their machine, and test everything out. When that looks good, it gets committed to the monorepo. Then we have this other repo, the infra repo, which tracks customer environments. Customer environments are organized into rings, and those rings reference a specific commit SHA of the monorepo, so you can point at a particular snapshot of the monorepo's configuration at a point in time. That's how we get all the version tracking and the ability to promote different versions of the configuration to different customer environments. Inside of that infra repo we also have the binary that manages Pulumi, which is what provisions the individual clusters, and we have configuration files for each customer environment in there as well. So that's the central source of what is represented in production.

Every cluster is also deployed into its own cloud provider account. So if you're running on Amazon, each customer runs in an AWS account that's individual to that particular customer. That's just the level of isolation we've chosen for the system; it's not necessarily a hard requirement for every managed database as a service. We're just a security product, so we take isolation a bit more seriously than a lot of other people.
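One nice property of Kustomize is that it's a Go library as well as a kubectl feature, which plays well with the "one binary" approach. As a hedged sketch — the overlay path is made up — rendering an overlay programmatically can look something like this:

```go
package main

import (
	"fmt"

	"sigs.k8s.io/kustomize/api/krusty"
	"sigs.k8s.io/kustomize/kyaml/filesys"
)

func main() {
	// Render an overlay the same way `kubectl kustomize` would.
	// The path would point at an overlay composed from bases
	// and features.
	k := krusty.MakeKustomizer(krusty.MakeDefaultOptions())
	resMap, err := k.Run(filesys.MakeFsOnDisk(), "config/overlays/dev")
	if err != nil {
		panic(err)
	}
	yml, err := resMap.AsYaml()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(yml))
}
```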
Then, finally, we have our centralized infra Kubernetes cluster. This is what runs Argo, and it runs Thanos so we can collect metrics and query and understand the runtime of our customer environments. What Argo does is pull the infra repo and assert that each of the customer environments is synchronized to the state it's configured for. This makes sure that if anyone logs into a machine while debugging something and skews the configuration, it's going to be restored eventually by Argo. That way, even if a machine gets compromised, we have something that's going to reset the cluster and make sure that nothing is the way it shouldn't be. So that's the high level of the GitOps workflow we have.

Time to move on to the runtime environment. For the runtime we have built two custom operators, so this is the "with Kubernetes operators" portion of the talk, which is the meat and potatoes. We decided to split our system into two different operators.

The first operator is open source, and it is basically all of the configuration and operational know-how to automate running SpiceDB, the database itself. We made it open source because we want our customers, or any open source users, to be able to operationalize and run SpiceDB just as well as we can. This includes scaling SpiceDB, making sure it doesn't drop traffic, and making sure SpiceDB knows how to self-cluster. It handles running migrations as the data changes across versions. It maintains an update graph to make sure you go from a supported version to a supported version, and it basically assures you zero downtime as you go through the upgrade process. So that kind of logic all lives inside the SpiceDB operator.

Then we have the AuthZed operator, which is our proprietary operator. This includes automations that are largely reliant on assumptions about how we've laid out our clusters. If the functionality is tightly coupled to opinions and decisions about how to run a Kubernetes cluster, we keep it in the proprietary one, purely because it's not applicable to anyone else's deployment — it's only applicable to ours. So that's the decision-making framework for where we cut things off between open source and proprietary.

At the end of the day, users get this Next.js front end we've built, and that's the customer-facing interface, but it's actually an interface to Kubernetes. What we're doing is making it so that when a user logs into the dashboard for SpiceDB Dedicated, they're actually seeing a view of Kubernetes and the resources that live on the cluster. When they choose, for example, to create a new SpiceDB cluster, they're actually talking to a JavaScript application that is going to talk to the Kubernetes API and create custom resources. That is how the core of everything functions: it's all using Kubernetes as the source of truth. And then of course we have all the additional tooling that composes our opinions for how to run Kubernetes, using things like Contour and cert-manager, and the Prometheus Operator, things like these.
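That flow is easy to picture: "create a cluster" in the dashboard ultimately becomes an API call that writes a custom resource. Here's a sketch of the equivalent from Go using the dynamic client; the spec is abbreviated from the open source spicedb-operator's examples, and the `tenant` namespace and kubeconfig loading are just illustrative:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	ctx := context.Background()

	// Load kubeconfig the usual way; a dashboard backend would use
	// in-cluster credentials instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The SpiceDBCluster resource served by the spicedb-operator.
	gvr := schema.GroupVersionResource{
		Group: "authzed.com", Version: "v1alpha1", Resource: "spicedbclusters",
	}
	cluster := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "authzed.com/v1alpha1",
		"kind":       "SpiceDBCluster",
		"metadata":   map[string]interface{}{"name": "dev"},
		"spec": map[string]interface{}{
			"config":     map[string]interface{}{"datastoreEngine": "memory"},
			"secretName": "dev-spicedb-config",
		},
	}}
	if _, err := dyn.Resource(gvr).Namespace("tenant").Create(ctx, cluster, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```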
So at the core, the concepts of our runtime include centralizing everything into the Kubernetes control plane: you want to use that as your source of truth. It makes for a convenient API for managing all of these things. For us, the control plane being used by customers to make changes is one and the same with the control plane our operations team is managing, and that gives us a convenient way to interact with the system. We don't have to build some kind of admin interface into the dashboards to give our operations team access to the customer control plane — it's just one and the same control plane for us. That's where a lot of the benefits come from. But the power of the operators is also that the customer-driven changes live in the cluster too. This is what enables a customer to log in, start making changes to the infrastructure, and have those apply immediately: those automations are not a human operator who has to get paged, go to the cluster, and make a change. Instead, it's a Kubernetes operator running in the cluster that can manipulate the desired state of the deployment and just run with it.
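For instance, handling a customer-driven scale-up inside a reconcile loop might look something like the sketch below. The `spec.replicas` field on the custom resource and the one-Deployment-per-cluster layout are assumptions for illustration, not our actual schema:

```go
package operator

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// syncReplicas pushes the replica count a customer set on a custom
// resource down to the Deployment that actually runs the database.
func syncReplicas(ctx context.Context, c client.Client, cr *unstructured.Unstructured) (ctrl.Result, error) {
	replicas, found, err := unstructured.NestedInt64(cr.Object, "spec", "replicas")
	if err != nil || !found {
		return ctrl.Result{}, err
	}

	var deploy appsv1.Deployment
	key := types.NamespacedName{Namespace: cr.GetNamespace(), Name: cr.GetName()}
	if err := c.Get(ctx, key, &deploy); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Only write when the desired state has actually diverged.
	desired := int32(replicas)
	if deploy.Spec.Replicas == nil || *deploy.Spec.Replicas != desired {
		deploy.Spec.Replicas = &desired
		if err := c.Update(ctx, &deploy); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```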
With that, here is the namespace layout of one of our clusters. Some of these namespaces get applied to absolutely every cluster, and some are exclusive to a particular cluster. The authzed-monitoring namespace, for example, gets deployed to absolutely every cluster we run. It includes all the infrastructure we need for paging, alerting, and collecting metrics and traces from applications running on the cluster, and this extends even to clusters that aren't running customer SpiceDB workloads: it also runs, for example, in our infra cluster, so we can make sure the infra cluster is running and healthy even though it's only running our internal tooling. So this part is fully generic and can be reused across the company, but it gets specialized by the resources created in the other namespaces.

Then we have authzed-system and authzed-region. The difference between the two is that the system namespace is what I would call the customer-facing control plane. For customer environments that run in multiple regions — say you have a Europe and a North America Kubernetes cluster deployment, so two individual clusters — what ends up happening is you pick one as your control plane, and that's where the AuthZed operator runs and where the dashboard runs. Anything driving the information on the dashboard lives there, and when you choose to provision something, the AuthZed operator understands the configuration for the other regions that make up the customer environment and will create resources in the appropriate cluster.

authzed-region is the thing that standardizes a cluster to be able to run SpiceDB. Primarily it has the SpiceDB operator in it, and that's going to sit there and watch for requests to create clusters or make changes to clusters — requests that the AuthZed operator creates as a reaction to a customer making a change in the dashboard. It's going to create those clusters inside of the tenant namespace.

The tenant namespace is where all the runtime customer data is. This is where the systems they provision live. It's the one the operations team is mostly going to be inspecting, because it's where the customers are actually live making changes, and it's what we typically focus on for backing up data like customer-specific configuration — the things they have actually changed on the system.

Every other, smaller namespace in here is a cluster dependency. We use the Prometheus Operator and kube-state-metrics to make sure we've got the standard operational deployment for collecting metrics and observability from the cluster. I mentioned earlier that we use cert-manager and Contour as our ingress and PKI infrastructure, and we actually create two deployments of Contour, in the internal and external namespaces. These are for internal and external traffic: because customer environments are often in VPCs — virtual networks — that traffic goes through a specific load balancer, while internet-facing traffic goes through the external load balancer. That's how we differentiate the two and do peering to internal networks at our customers' companies. And then finally we have Velero, which does backups, and all the kube-system-y namespaces you get from the different cloud providers.

Cool, so transitioning now to the final topic, the final phase: day-two operations. The technologies here are the standard ones, and the reason you pick the standard ones is the high-level concept I want to mention: the observability data isn't just for you. Because you are building customer-facing infrastructure, some of this data you're going to pass on to your customers. They want to know what the latencies of the database are. They want to know how much CPU they're using. They want to know how much capacity they're using, whether they're going to have to scale up, and if they do scale up, whether that's going to affect their bill.
So it's not purely your decision what technologies you choose for these stacks, because they're potentially going to integrate with customer systems. Customers might want to ingest logs or traces or metrics from their database into their own systems as it runs, so they can also page their engineers if something is going wrong inside the managed database service. For that we're using the standard Prometheus ecosystem for observability: the Prometheus Operator, kube-state-metrics, Thanos, Grafana — the works. And then we use Jaeger for traces, but generically just OpenTelemetry.

And then, as I described before, backups need to cover more than just data. We're using the stock cloud-provider datastore backups — the things that come with the datastores themselves — but we're also building APIs so our customers can export data out of live systems, or stream that data to a replica they run themselves, maybe on-prem or in a completely separate backup environment. So we're tackling this on both fronts. But the unique thing is actually not the backup of the data; it's the fact that you also have to back up the configuration. If you restore the cluster and replay all your Pulumi and configuration changes, that's not going to include any of the changes the customers have made to the control plane themselves. That's where Velero comes in: we're continuously backing up the changes that customers are making to the clusters, so that if we have to restore a cluster, we can restore absolutely everything. The nice thing is that it's all decoupled, so we can restore just the customer data if we needed to, onto an older version of the cluster or an older version of the configuration and the namespaces that run in the cluster. Because everything is broken out into these three categories, we can mix and match versions to produce stable or unstable versions of the environments for our users.

With that, I'd like to conclude. You can find me in these three places: on Twitter, on Bluesky, or you can always email me if you're interested in any of the projects I talked about. We have a link to SpiceDB Dedicated, and the open source SpiceDB operator is available for exploring and learning how we went about automating the actual operational side of our database. It's actually built on another library we have open sourced called controller-idioms, and what controller-idioms does is wrap up high-level behaviors you're going to need for idiomatic custom controllers and Kubernetes operators into a library you can just reuse. Examples of this are custom informers; setting statuses according to properties of other resources you're managing; and things like being able to pause your operator so that it stops reconciling and a human can come in and debug. These higher-level patterns that you'd always need to implement, but that aren't the core logic of the operator, we've abstracted in a way that you can import.
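As one example of those idioms, the "pause" pattern is simple to sketch generically. This is not controller-idioms' actual API — just the shape of the behavior, with a made-up annotation key:

```go
package operator

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	ctrl "sigs.k8s.io/controller-runtime"
)

// The annotation key is invented for this sketch; controller-idioms
// has its own conventions.
const pausedAnnotation = "example.com/paused"

// reconcilePausable shows the shape of the pause idiom: when a human
// sets the annotation while debugging, the controller stops fighting
// their manual changes until the annotation is removed.
func reconcilePausable(ctx context.Context, cr *unstructured.Unstructured) (ctrl.Result, error) {
	if _, paused := cr.GetAnnotations()[pausedAnnotation]; paused {
		// Skip all reconciliation; ideally also surface a status
		// condition so dashboards show the resource as paused.
		return ctrl.Result{}, nil
	}
	// ... normal reconciliation would continue here ...
	return ctrl.Result{}, nil
}
```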
And if you're more interested in SpiceDB itself, you can always join the SpiceDB Discord or look at our GitHub organization. We have plenty of other open source projects all around the cloud native ecosystem, covering basically all parts of the stack: operators, gRPC, the database itself, clients for the database, things like that. So, thanks for your time.