I would like to start, since we don't have much time. This is just a 30-minute talk, so hopefully we can get through it on time. Hello, everyone. My name is Sachin. I work for a company called Platform9, and I'm hoping to tell you our story of a DIY implementation of MySQL as a service. Before we dive into too much detail: do all of you know or have experience with Kubernetes operators? Somewhat? Okay, I'll start from the basics, then.

I work as an engineering leader at Platform9, and I've been looking into the cloud-native space for a while now. I'm particularly interested in how you run cloud-native applications on a Kubernetes platform, and how you run them in production. Those are the kinds of problems I'm currently interested in solving, and this is just one of the side projects we did in that effort.

Just a mandatory slide about my company. Platform9 is a hybrid cloud company. We offer cloud management for customers who are interested in running workloads on-prem as well as in public clouds. It's a single management control plane that provides SLA guarantees for your cloud environment. We manage around 300 cloud regions for our customers and around half a million virtual CPUs. We offer cloud services that are both virtual-machine-based and container-based; for containers, we use Kubernetes as the platform.

So here's the agenda and a few things I would like to go over. First, I'd like to describe the problems that led us to build this DIY MySQL solution, the features we were looking for in the MySQL offering we wanted to create in-house, and why we chose Kubernetes in particular, because Kubernetes is not really meant for, or at least the perception is that it's not really meant for, running stateful applications like MySQL. Then I'd like to introduce the architecture and dive a little deeper into it so you can understand the different pieces involved, how we handle high availability, and so on. We have been running this for about a year and a half now for our own consumption, so we learned quite a bit on that journey, and I'd like to share some stories and pitfalls you can watch out for if you're thinking of doing something similar, plus some of the things we look forward to adding to this project.

So let's dive right into it. We are about a five-and-a-half-year-old company, and when we started, we just used a public cloud service for everything. Pretty much everything: our business software, our databases, long-term storage like object store, and so on. But as we grew, and our customer base grew, the demands on the public cloud increased, and correspondingly our public cloud spend increased manifold. That becomes a bottleneck in any kind of company, and it's especially true for startups, because it becomes one of your top spends, something you want to avoid. So about three years ago we made the strategic decision to move completely on-prem and run our own software stack on everything that we offer to our customers. That effort was quite successful when it came to the compute, storage, and networking components we were using.
But for higher-level services like the database, there was quite some period where we couldn't figure out a good solution that could just replace a managed SQL service and run on-prem, without us having much expertise in MySQL or knowing much about MySQL internals. And that's how this project came into being. So we started to look at the possible on-prem solutions we could use for running MySQL, and we stumbled on Kubernetes.

But before that, what did we look for? When we used managed MySQL, our whole software stack assumed MySQL primitives, so we didn't want the solution to break our existing software or introduce new bugs into it; we wanted a drop-in replacement for MySQL. We wanted the solution to have self-service and automation capabilities, because with the managed SQL solution all our deployments were already automated and assumed an API-centric way of configuring and scaling databases, making changes to them, taking them down, and so on. Availability was important because ultimately we wanted to run our production deployments on it, so we were looking for a solution where a single node failure doesn't impact the availability of our databases. We also wanted something with inbuilt disaster recovery, so backups built in.

Some of the things we also desired: portability, because from experience we learned that once you start using public clouds, it is easy to move your compute workloads, but if you use higher-level services it's very difficult to get out once you have a production environment running, automation already set up, and a bunch of customers depending on it. So we wanted the solution to be portable, so that we could switch platforms without impacting our entire software stack. We were looking for an open source alternative, and whatever we learned from this experiment, we wanted to contribute back to the community. We were looking for a cloud-native solution, because our software stack, like many other companies', is evolving from a VM-based world to a cloud-native microservices world; more and more of our services run on Kubernetes, and we wanted something that integrates well with those services. And last but not least, we wanted full-fledged monitoring and alerting, because once you run it in production, you'd better have that.

With that, we looked at a bunch of alternatives. The first thing we looked at was actually VMs: we tried to write automation to bring up a SQL cluster using a virtual-machine-based model. That didn't pan out well, mainly because it's easy to provision MySQL using some sort of automation like Ansible, but once that thing runs in production and fails, it is very difficult to handle the failures in an automated manner. And that's why we liked Kubernetes so much: its main advantage is that once you run anything on Kubernetes, it makes sure that thing keeps running. It health-checks workloads so that failures don't go unnoticed, and if they do happen, it tries to correct them. We also liked Kubernetes because of the agility offered by the container model. Arguably, running MySQL in a container is a fairly new and challenging idea, but it worked out in the end.
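To make that "keeps it running" point concrete, here is a minimal sketch, in YAML, of how a liveness probe on a MySQL container works. This is not our actual manifest; the pod name, image tag, and secret are hypothetical.

```yaml
# Minimal sketch (not our actual manifest): the kubelet probes the MySQL port,
# and if the probe keeps failing it restarts the container automatically.
apiVersion: v1
kind: Pod
metadata:
  name: mysql-demo                     # hypothetical name, for illustration only
spec:
  containers:
    - name: mysql
      image: percona:5.7               # official Percona Server image
      env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-demo-secret  # hypothetical Secret holding the root password
              key: ROOT_PASSWORD
      ports:
        - containerPort: 3306
      livenessProbe:
        tcpSocket:
          port: 3306                   # "is MySQL still accepting connections?"
        initialDelaySeconds: 30
        periodSeconds: 10
        failureThreshold: 3
```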
And running these databases on Kubernetes reduced our deployment time dramatically, because containers come up in a few seconds versus a virtual-machine-based workload, which can take ten or more minutes. Kubernetes also offered us a declarative API: every database object is essentially a JSON manifest, and all the configuration and all the allocated resources are available as one document that we can check into our GitHub repo and track changes to. That's very attractive. I already talked about reconciliation a little bit. Running MySQL on Kubernetes used to be a novelty, but it isn't anymore, especially this year, since Kubernetes has adopted StatefulSets as a first-class citizen, and most of the database companies, Oracle, MariaDB, Percona, support running their database on the Kubernetes platform and publish official container images that you can use. We also like the portability aspect of Kubernetes, because Kubernetes can run on many public clouds today as well as in private cloud environments. Essentially, when you run software on Kubernetes, you are dealing with the higher-level constructs Kubernetes offers instead of worrying about the underlying machines, networking, and security groups, so we can take the same software and run it on VMware, OpenStack, AWS, or any number of public clouds out there.

But there is a flip side to it. There is obviously an argument for not using Kubernetes to run MySQL. In our experience, one of the major issues was the learning curve, because once you go from a server-based, virtual-machine-based, physical world to something like Kubernetes, with its own constructs like containers, pods, service IPs, config maps, and so on, there is a lot to learn, and those things don't map one-to-one from the server model to this new cloud-native model. So you really need a team that is sold on Kubernetes and wants to adopt it to run all sorts of workloads. It is definitely not as easy as just consuming a managed MySQL service, because you have to deal with failures and understand why they happen. There are a few things you can automate, but there are definitely corner cases you need to watch out for and alert on so that you can correct them while these databases run in production. Another aspect is that when you run something in production, especially stateful services, you need to be really aware of the high-availability aspects of Kubernetes, because if you don't plan your infrastructure properly and failures cascade, or affect things outside your fault domain, there is always a danger that everything just falls apart. I will give you an example of how we ran into this. Another interesting aspect is that Kubernetes is an orchestrator that runs cloud-native workloads really well, but it confines itself to running the compute side; for things like storage and secrets, and for exposing services to the outside world, you have to use components outside Kubernetes. Depending on which alternatives you choose, some of them are pretty mature and work very well, but many others don't work well with Kubernetes: they have bugs, and you need to be prepared to deal with all sorts of those issues when running MySQL this way.
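Coming back to the declarative API for a second, here is a hedged sketch of what one of these database manifests can look like, written as YAML (Kubernetes treats YAML and JSON interchangeably). The names are hypothetical and the field names are only approximate, so don't read this as the operator's exact schema.

```yaml
# Illustrative sketch of a MySQL cluster object; field names roughly follow the
# open-source operator we used but may differ between versions.
apiVersion: mysql.presslabs.org/v1alpha1
kind: MysqlCluster
metadata:
  name: accounts-db                 # hypothetical database name
spec:
  replicas: 3                       # one master plus two read replicas
  secretName: accounts-db-secret    # Kubernetes Secret holding the root credentials
  mysqlVersion: "5.7"
```

This one document is what gets checked into the repo; changing it, say bumping the replica count, and re-applying it is how the database gets reconfigured.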
Another interesting aspect of Kubernetes is that it famously follows a cattle model: it doesn't care, it just spins up a bunch of pods, and the pods can die over time or move around between nodes as the Kubernetes scheduler realigns itself. This becomes problematic when you run a stateful application like MySQL on Kubernetes, because if a MySQL replica dies, there is a cost to bringing up the new replica: you have to start from scratch, or resume from some point, to make sure it's up to date. So this is definitely not for the faint of heart. But overall, comparing the costs and benefits of this approach, we as a team decided that Kubernetes had far more advantages that we wanted to use and really looked forward to, so we decided to run MySQL on Kubernetes.

The solution we came up with is basically an application controller: a service that extends the Kubernetes API and adds a new type of object to it. In this case, when you run this service on Kubernetes, it defines an object of type MySQL and adds it to the Kubernetes API. Using the Kubernetes CLI, just like with other Kubernetes objects such as pods and services, you can now see all the MySQL instances running on Kubernetes, understand what configuration they have, what their state is, and whether they are healthy. In addition to introducing this new resource, the controller keeps watching all the resources it has defined, which means that if a MySQL instance goes into a bad state, the controller can detect that condition and take corrective action. You can think of it as a database admin programmed into a service: it knows how to take backups, how to recover when a database fails, and how to scale out a database, or in the case of MySQL, how to scale up a database when the performance is not adequate.

So it offers highly available, self-healing clusters. It offers highly available reads: you can scale reads using replicas. It follows a single-master architecture, so writes cannot be scaled out, only scaled vertically. It keeps track of how data is being replicated in the cluster and ensures that the replicas are not too far behind the master, because if they are, you have a problem. It uses Kubernetes resource controls to limit the resources given to a MySQL instance, and it also offers automated backups and restore to object storage like Amazon S3 or Google's object storage.

This is the high-level architecture of the solution. Everything runs on Kubernetes, and the components are divided into three parts. There is a control plane, which manages the MySQL clusters running on Kubernetes; the data plane, which is the clusters themselves, the MySQL services; and a monitoring component, which is done using Prometheus. On the control plane side, I already talked about the MySQL operator, the application controller, which is basically an automated database admin for MySQL.
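As an illustration of the resource controls and automated backups just mentioned, here is a hedged version of the same kind of manifest extended with those settings. The field names are approximate and the bucket and secret names are placeholders, so treat it as a sketch rather than the operator's exact spec.

```yaml
# Illustrative only: exact field names depend on the operator version, and the
# bucket/secret names are placeholders.
apiVersion: mysql.presslabs.org/v1alpha1
kind: MysqlCluster
metadata:
  name: accounts-db
spec:
  replicas: 3
  secretName: accounts-db-secret
  podSpec:
    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 4Gi
  backupSchedule: "0 2 * * *"                        # nightly backup, cron syntax
  backupURL: s3://example-backup-bucket/accounts-db/ # placeholder object-storage bucket
  backupSecretName: backup-credentials               # S3 access keys, stored as a Secret
```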
In addition to the operator, the control plane uses a service called Orchestrator, written by engineers at GitHub. What it does is manage the cluster state of a MySQL cluster: it understands which node is the master and which are the replicas, it can handle failures, and so on. On the data plane front, we have the MySQL services themselves: the instances of MySQL running as Kubernetes pods, the volumes attached to them where they write data, and the Kubernetes services that abstract the IPs using the inbuilt Kubernetes DNS. And I already talked about monitoring, which is done with Prometheus.

If we go one level deeper into this architecture, this is what a MySQL cluster running on Kubernetes looks like. Essentially, you have a master node and a bunch of replicas, and each one of these runs as a Kubernetes pod. The whole thing runs as a Kubernetes StatefulSet, which is the Kubernetes object for running persistent applications that have state. It just means that each of these nodes has a persistent volume attached to it, and each of them can write data to it. The databases are the yellow boxes on the slide, and in this setup you basically have four replicas of the same data.

Now, in order to access this database (actually, can we hold off the questions until the end?), you have two choices. When you want to write to the database, you go to the master, and when you want to read from the database, you can go to any node in the cluster. To make that work, the operator creates two types of services. The master service can be used for writes; a service is just a DNS implementation inside Kubernetes, so think of it like a load balancer sitting in front of these instances. For reads, your application goes to the healthy-nodes service, and you can scale out reads by adding more and more replicas so that the node service can load-balance between them.

If we go one step deeper, what are these individual things here? Each individual thing, the master and each replica, is basically a Kubernetes pod. A Kubernetes pod, as you may know, is just a bunch of containers running together that can all talk to each other as if they were running on the same machine; think of it like a bunch of Docker containers on a single machine talking to each other over localhost. So there is an init container that initializes the database: it configures passwords, configures replication, and so on. There's the Percona MySQL container, which runs the actual MySQL database. There are a couple of pt (Percona Toolkit) containers that manage the cluster state: they ensure that the replicas are healthy and communicate these heartbeats to the Orchestrator piece I talked about earlier. The Prometheus exporter exports all the performance data, which can be scraped by an external Prometheus instance so that alerts can be generated. And there is a sidecar container; the sidecar is a pattern in Kubernetes for doing things your application is not designed to do, and in this case the sidecar container is in charge of doing backups for MySQL.
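To make the read/write split concrete, here is a simplified sketch of the two services described above. The names, labels, and selectors are illustrative, not the operator's exact objects.

```yaml
# Simplified sketch of the two services in front of the cluster. Writes go to
# the master service; reads go to the service covering all healthy nodes.
# Names and labels are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: accounts-db-mysql-master     # write endpoint, e.g. accounts-db-mysql-master.default.svc
spec:
  ports:
    - port: 3306
  selector:
    app: mysql
    cluster: accounts-db
    role: master                     # only the current master carries this label
---
apiVersion: v1
kind: Service
metadata:
  name: accounts-db-mysql            # read endpoint, load-balances across healthy nodes
spec:
  ports:
    - port: 3306
  selector:
    app: mysql
    cluster: accounts-db
```

An application then points its write connection string at the master service's DNS name and its read-only connections at the cluster-wide service; adding replicas scales the read side without any application changes.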
Now I'd like to talk a little bit about Orchestrator and how it handles failures, because one of the key design considerations for us was that we didn't want node failures in our data center to page our database admins or ops so that they have to manually jump in, fix the issue, and bring the database back up. That's where Orchestrator helps. Orchestrator is a service that has direct access to all the database clusters running on Kubernetes; it's a single service per cluster. For each MySQL cluster, it ensures that all the nodes are healthy and that the MySQL topology, one master with replica followers, is intact. Because of this, when the master fails, it can detect the condition and trigger a workflow that takes out the failed master and promotes one of the replicas to master. Effectively, there is a configurable time gap when the failure happens: the database becomes unavailable for about one to three minutes, then a new master comes up and everything is back again. This is a key component, because without it we couldn't do automated failover in this solution.

We have been running this for a while, and interestingly, we have been running it on-prem. We don't use public clouds, which means we have to run this either on bare metal nodes or on something like OpenStack or VMware, and we tried both ways, so I can tell you about the problems with both approaches. Running on a private cloud, it seems obvious, but it's something to remember: you're taking on the problems of that cloud as well as of the layer on top of it, which is Kubernetes. In our practice, we found that software networking problems in the underlying layer impact Kubernetes quite a bit; if the software networking has a bug, typically the load balancing in Kubernetes doesn't work well, and if you consume the Kubernetes service from outside Kubernetes, that causes issues. Storage in Kubernetes, in the case of private clouds, is not very well developed in our experience, and different storage providers have different limitations you need to deal with. There are some companies that now develop Kubernetes-native storage, and in hindsight, if that had been available to us, it would have made our implementation much smoother.

Running on bare metal comes with its own challenges, and one of the key ones is the load balancer. If you run everything inside Kubernetes, you don't need a load balancer, but when you consume the databases from outside, you need to expose the services outside the Kubernetes networking namespace, and that's where you need something like an external load balancer to reach the services. If you run this on bare metal, the load balancer options are fairly limited: in public clouds you can use the cloud's load balancer to access your Kubernetes services, but on-prem there are pretty much only two choices, a project called MetalLB (bare-metal load balancing) and node-port-based load balancing.

Some of the must-haves if you go for this model of running Kubernetes on-prem, or even in a public cloud where you run your own Kubernetes: we learned that a multi-master Kubernetes control plane is a must-have, because failures can happen anytime, and if your single master goes down, that brings down the availability of the entire cluster. It is also a very good idea to have a backup and restore workflow in place for the Kubernetes clusters themselves, to account for failures of the architecture or a disaster. Backups are of paramount importance.
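Picking up the load balancer point for a second, here is roughly what exposing the write endpoint outside the cluster looks like. With MetalLB installed, a LoadBalancer service gets an external IP; without one, you fall back to a NodePort. The names are illustrative.

```yaml
# Exposing the write endpoint outside the cluster. With MetalLB (or a cloud load
# balancer) the service below gets an external IP; without one, switch the type
# to NodePort and clients connect to a high port on any node. Names are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: accounts-db-mysql-external
spec:
  type: LoadBalancer       # use NodePort if no load balancer implementation is available
  ports:
    - port: 3306
      targetPort: 3306
      # nodePort: 30306    # only for type NodePort; allocated from the 30000-32767 range
  selector:
    app: mysql
    cluster: accounts-db
    role: master
```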
On the backup front, I'm kind of glad that when we were looking at alternatives, we only considered solutions that could provide a complete backup and restore workflow for MySQL, because there were a couple of instances over this period where some unexpected failure made all the nodes in a cluster unavailable and we had to restore from a backup. And monitoring is very important: you need to monitor Kubernetes as well as the MySQL layer.

In the software stack itself, we found a few issues. When you run a StatefulSet on Kubernetes and then delete it, the compute portion, the pods, goes away, but the volumes remain; the state of the database stays in the cluster. When you run this on-prem, especially on bare metal, we ran into cases where older volumes got attached to newer databases, which we didn't want. We fixed this by assigning ownership so that everything gets cleaned up together, and for data availability, say if somebody accidentally deletes a database, we rely on backups. MySQL upgrades are still disruptive with this operator implementation, and we would like to get to a mode where they are seamless. As I mentioned, this solution has a single master, so if the master goes down there's a blip in availability until the new master becomes available, and we want to get to a model where the database is always available using multi-master replication. Another problem is that our backups are currently based on the mysqldump tool, and we would like to graduate to a snapshot-based model, so that we can take frequent snapshots and there is not much difference between the current state of the database and the backup we would restore from. On the Orchestrator side, we also ran into a few bugs that we fixed in the operator code.

So some of the future plans for this project: as I mentioned, we want a multi-master implementation, and we plan to use ProxySQL, the load balancer for MySQL, in front, so that individual master failures don't impact the availability of the database. There's the snapshot support I just mentioned. We would like to extend the backup options: currently it can back up to public cloud object storage, and we want to extend it to use a file system like NFS for backups. Seamless upgrades are another objective: when we go from one minor version of MySQL to another, we currently take the database down, upgrade it, and bring it back, and we would like to make that an online process handled in the Kubernetes layer itself.

With that, I'm almost at the end of my talk, so here are some links. This operator, I should mention, was developed by a company called PressLabs. It's an open source operator, and we help them take the effort further by using it on-prem, providing feedback, fixing bugs, and so on. The Orchestrator project is also an open source project that you can check out. And if you like this model of running self-healing applications on Kubernetes, I would encourage you to look at a project called Kubebuilder. It's basically a tool that generates the scaffolding for these Kubernetes controllers: a lot of the underlying machinery you would need to talk to Kubernetes and bring the service up is spit out as scaffold code, so that you can just focus on writing the business logic for your application.
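Going back to the volume-ownership fix mentioned above, the idea is essentially to put an ownerReference on each volume claim so that deleting the database object garbage-collects its volumes too. Here is a rough sketch with placeholder names and uid, not the operator's exact objects.

```yaml
# Sketch of the ownership idea: the volume claim points at the MySQL cluster
# object as its owner, so Kubernetes garbage-collects it when the cluster is
# deleted instead of leaving stale data behind. Names and uid are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-accounts-db-mysql-0
  ownerReferences:
    - apiVersion: mysql.presslabs.org/v1alpha1
      kind: MysqlCluster
      name: accounts-db
      uid: 00000000-0000-0000-0000-000000000000   # filled in by the controller at runtime
      controller: true
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi
```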
So with that, I'm at the end of the talk, and I'm happy to take any questions.

Yeah, so we tried both approaches. In our initial attempt we used the storage provisioner that came with the backend storage array in our data center, but we ran into a lot of limitations. For example, that provisioner didn't give us the flexibility to define volume sizes; it would just create a stock volume of a fixed size regardless. We also had availability issues that caused a lot of failures in our environment, because all the storage connections went through a single virtual IP, and in some cases when that virtual IP became unavailable, the entire MySQL cluster became compromised even though it had all its replicas. Then, on the OpenStack side, we tried Cinder, with LVM as the storage provider, and the Cinder box provided volumes to all these nodes. That worked out better, because we created a Cinder node with its own storage in each AZ and then created MySQL clusters where each replica was confined to one AZ; even when an availability zone had issues, they never failed together, and that worked out pretty well for us. Local volumes are something we looked at, but given their limitations you have to statically carve them out and plan them: if you have 50 volumes, you can only create 50 databases, and we didn't want that constraint. Yeah, Portworx is definitely something to look at, okay.

Yeah, so the monitoring happens at two levels. Kubernetes monitors the objects it knows about; the MySQL cluster itself is an object that only the MySQL controller is aware of, and what the controller does is create a Kubernetes StatefulSet, which in turn has pods, services, and things like that. So Kubernetes can run health checks against the pods to make sure the MySQL process is running and responding, and it can move them around if a node becomes unavailable. In addition, you want monitoring at the application level, at the MySQL level: things like how many concurrent connections the database has, at what rate you are doing log writes, and for a replica, what its log position is versus the master and how far behind it is. Those kinds of things we monitor using the Prometheus component I talked about.

Yeah, so initially we used NodePorts because that was the simplest option we had. But as you know, NodePorts expose services at fairly random high ports, 30,000 and up. We have a bunch of services that still run in VMs, and the only way they can talk to this MySQL is through the NodePorts, so in case of a disaster, if our database goes down and we bring it up in some other Kubernetes cluster, or in the same cluster as a different MySQL object, the NodePort changes, and that is quite visible to our applications. If you use a load balancer, this problem isn't there, but each option comes with its own bells and whistles: we used an HAProxy-based load balancer, but when it runs on top of a software networking layer like OpenStack and that layer has issues, the load balancer becomes unavailable. So each of them comes with its own advantages and disadvantages.

Thank you. Thanks for staying up late. And yeah, if you have more questions, you can just talk to me.