I would like to thank you all for coming. Amazing how many people just showed up. I guess everybody's interested in running MySQL on Kubernetes. That is unfortunate. Does this help? I can try a bit louder, but I'm not a very loud person. Good. Or does this thing? The microphone is just for the recording. Okay. So I just have to try yelling at you? Yes. Great. Where was I? I want to talk about running MySQL on Kubernetes. Just before we get to that: my name is Sami Alros. I work at Percona as a support engineer; I've been doing that since last April. Would you mind not leaning on the light switch? It's kind of confusing me. I have been working with all sorts of open source technologies for almost all of my career, since 1995 or so. I got in touch with MySQL for the first time pretty much when it became available, around 1996 or '97. Yeah, I think that's enough about me; we want to talk about something more interesting. What we are going to cover is: what would be the point of putting MySQL on Kubernetes? How would we get started doing that? As a support engineer, I find backups and restores super interesting, so I want to talk to you about that. Scaling a database up and down while it's running: how would that work when running MySQL on Kubernetes? I forgot to yell; is this better? To yell even more? Yeah, it's possible. Oh, that's bad. I think it's better over there. Yeah, but I am, I'm just a quiet person, sorry about that. I may come closer if you can't hear me. Finally, I want to talk a bit about monitoring MySQL in Kubernetes. Actually, before explaining what I think is the point, I would like to ask: who of you is using Kubernetes? I don't mean for running databases, but for running anything at all, like applications or other stuff. Yeah, maybe roughly half-ish. And a follow-up question: who is actually using Kubernetes for running databases? I see about five-ish hands. Good. To be honest, it's what I expected.
So the point, in my opinion, is like we just saw: a lot of you are using Kubernetes for running your applications, but you're not putting your databases in there. But it could make sense to run your database on the same infrastructure, on the same stack as the application. Deploy both together, keep it simple, and it can be simple. I guess the ones who did not raise their hand a moment ago have their databases on dedicated machines or on dedicated virtual machines, and those need their special care and feeding, and maybe we can make it simpler by putting both in one place. Kubernetes is really good at resource allocation. You can configure limits: this database may use 1.3 CPUs and one gigabyte of RAM, or whatever. You can also configure guarantees: this database will always get at least four CPUs. And you can configure burst limits. I don't want to get too much into the details of what Kubernetes can do, but those would be good things to have; they could make management more interesting. Another cool thing about Kubernetes is anti-affinity. Say you have a database cluster: you don't ever want two database cluster nodes on one piece of physical hardware. You can tell Kubernetes: make sure these two pods never run on the same server. Or, since we will be using Galera replication here, I'm spoiling that already, you could configure Kubernetes to keep all nodes of a cluster in one availability zone, to keep Galera happy, but still on separate hardware. You can do all that in Kubernetes. So yeah, lots of possibilities there. Kubernetes is really good at scaling things up and down; I will show you in a bit. There is even auto-scaling in Kubernetes. Try it out before using it on your database, but it is possible. Finally, automation. That's what Kubernetes was made for: orchestrating and automating your things, and we can actually use that for our databases. Actually, one more question: who thinks Kubernetes is complicated?
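As an aside, a resource and anti-affinity setup like the one just described might look roughly like this in a pod template. This is a sketch, not the operator's actual configuration; the label `app: my-database` and the container name are made up for illustration:

```yaml
# Hypothetical pod template fragment: resource requests/limits plus a
# hard anti-affinity rule keeping two database pods off the same node.
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-database          # assumed label, adjust to your setup
          # hostname spreads pods across nodes; use the zone topology key
          # instead if you want to spread across availability zones
          topologyKey: kubernetes.io/hostname
  containers:
    - name: database
      resources:
        requests:
          cpu: "1"                      # scheduling guarantee
          memory: 1Gi
        limits:
          cpu: 1300m                    # the "1.3 CPUs" example from above
          memory: 1Gi
```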
Again, maybe half or almost half. There is a learning curve, I admit. I'm far from an expert, but you can actually get used to it pretty quickly, and it starts to make sense at some point, I promise. Actually, I'm going to say running MySQL on Kubernetes can be surprisingly easy. You will see. So, getting started. How would we go about getting started? Well, as I spoke with one of you earlier, you can start some pods in Kubernetes and write some scripts for setting up replication, and it may work, but it can also be pretty complex and it's not fun. I have done this and you have done this, and yeah. But there is a piece of software that can make it a lot more fun. It is called the Kubernetes Operator for Percona XtraDB Cluster. Now try saying that quickly. Since I absolutely cannot say that quickly, I'm just going to call it the operator, and if I'm talking about the Percona XtraDB Cluster, then I'm just going to say PXC. Anything else is taking too long; we only have 20 minutes after all. So yeah, the operator is free, open source, and you can clone it from GitHub and get going. Easy. What is an operator? Very shortly: with an operator, you can basically extend Kubernetes. You can define so-called custom resources, and in our example a database, or a cluster of databases, is a custom resource. So the operator deploys Percona XtraDB Clusters, like I said. PXC, I think most of you have heard of PXC, so I'm not actually getting into too much detail there, but it's also free and open source. I obviously would not talk about anything else at FOSDEM. With the Percona operator, you get Percona Server for MySQL 5.7. There is no 8.0 yet. We are working on it; there will be one at some point, but I cannot tell you when. Requirements for running MySQL on Kubernetes: a Kubernetes cluster is required. It can be Google Kubernetes Engine; OpenShift works fine too.
There are some version constraints, which I did not write down, but you will find them in the documentation. Or if you just want to try this out, you can get it up and running in Minikube. If you have Minikube set up, I promise you can have your database running in Kubernetes in less than half an hour. So yeah, I think I mentioned that: you get the operator from GitHub, you clone it, you deploy a couple of files, and you will have your first database running in Kubernetes. Is this readable at all? In the front row, maybe, I hope. I was going to do this live, but somebody told me not to rely on the Internet here, and it turns out my old laptop doesn't handle lots of Kubernetes clusters very well. So we get screenshots today. I guess I could have selected better colors; sorry about that, it looked fine on my screen. The thing with Kubernetes is kubectl, the Kubernetes control tool; it is what you use to control things. We apply the so-called bundle.yaml, which came from the GitHub repository we cloned, and then the second of the two files I mentioned, cr.yaml. CR stands for custom resource, and that file looks terrible on this screen. I promise next time I will make it black and white. The thing is, you can define the name for your cluster, the size of your cluster, and so on. Which reminds me: the slides are, of course, in the FOSDEM program, so you can download them there and look at them on your phone. That might make this more readable. You define the size of your cluster, how much memory your nodes are getting, limits for those, what kind of disk, an SSD here with a size of six gigabytes. You can define a lot of things in the cr.yaml; this is just a small sample of it. You apply that after modifying it, you wait a bit, and you do kubectl get pods to see what's running. So the operator went and started three ProxySQL pods for us, because that's what I configured, and three PXC pods, and the cluster operator itself is running.
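The whole deployment described above boils down to applying two files. Here is a rough sketch; the exact field names and apiVersion may differ between operator releases, and the storage class name is an assumption, so treat the cr.yaml excerpt as illustrative rather than complete:

```yaml
# Deployment sketch, assuming a working cluster and the cloned repository:
#   kubectl apply -f deploy/bundle.yaml
#   ... edit deploy/cr.yaml ...
#   kubectl apply -f deploy/cr.yaml
#   kubectl get pods
#
# Illustrative excerpt of cr.yaml (not the complete file):
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
spec:
  pxc:
    size: 3                           # three PXC nodes
    volumeSpec:
      persistentVolumeClaim:
        storageClassName: ssd         # assumed storage class name
        resources:
          requests:
            storage: 6Gi              # the six-gigabyte SSD from the talk
  proxysql:
    enabled: true
    size: 3                           # three ProxySQL nodes
```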
Actually, I see the last ProxySQL pod is still pending, but it will be started at some point. So that's how easy it was to have ProxySQL and Percona Server with PXC running in Kubernetes: deploy two files, assuming you have a working cluster. The operator also sets up a few Kubernetes services. I guess for people who have not had exposure to Kubernetes, "service" doesn't mean a lot, but what you need to know is that we have our cluster1-proxysql service, cluster1 being what I named my cluster. It has an IP; I can point my applications at that service and my application will actually work, assuming it's running in the same Kubernetes cluster and can reach the service. Just to demonstrate that it works, which I guess you cannot see very well: I started a MySQL client inside the cluster, pointed it at cluster1-proxysql, created a database and a table, and it just works. ProxySQL does its magic, sends my connection where it belongs, and I have MySQL in Kubernetes. Looks simple, doesn't it? Or does somebody not agree? Everybody agrees, fantastic. So, I told you. I promised to talk about backups, because I find backups interesting. The operator actually offers us a way to take backups. You get two backup destinations out of the box: you can send your backups to S3-compatible storage, or you can use a Kubernetes persistent volume. In the cr.yaml, the custom resource file, you just define a schedule like a cron job, tell it what should be backed up and where the backups should go. Easy. And if you happen to think of taking a backup on demand, you just apply a file like this to the operator: go and create a backup called backup1 from cluster1, store it on my Kubernetes persistent volume. Apply that, wait a bit, run kubectl get pxc-backup, and we will see my backup has been created, took 13 minutes, is six hours old, whatever. Backups happen, yeah, super easily. So how do we restore a backup?
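Before getting to restores, here is roughly what that on-demand backup object might look like. The field names are approximate and the storage name is an assumption; check the operator documentation for the exact schema:

```yaml
# Hypothetical on-demand backup custom resource; apply with
#   kubectl apply -f backup1.yaml
# then watch progress with
#   kubectl get pxc-backup
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBClusterBackup
metadata:
  name: backup1
spec:
  pxcCluster: cluster1          # the cluster to back up
  storageName: pvc-backup       # assumed storage name defined in cr.yaml
```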
Almost as easy. When we cloned the repository, we got a script called copy-backup.sh. What it does is pretty obvious: we tell it which backup we want, backup1 from that get pxc-backup output, we tell it where to put the backup, we wait a bit, and we have a local copy of the backup. The script actually tells me which commands to run to start a local instance with that backup, so I can do whatever I want with it: verify it's healthy, use it for whatever you do with backups. Yeah, scaling up and scaling down is what I promised to tell you about. Scaling up a database cluster is traditionally not always trivial. I mean, it can be if you have proper tooling, and with Kubernetes you basically have the proper tooling right there in your hands. So like we started earlier: our cluster with three ProxySQL nodes and three PXC nodes; my backups are there as well, let's ignore those. I go and edit the custom resource YAML file, set the cluster size from three to five, apply the file, wait a moment, and I have my PXC nodes zero through four up and running. Scaling a database cluster up in Kubernetes is that difficult, okay? Behind the scenes there may be a few more details, but in a simple case, that is how it works. And, well, scaling down, I think, is no big surprise: go back to the YAML file, change the cluster size, apply the file, and we see Kubernetes has already terminated my pxc-4 node and is terminating pxc-3 now, so the cluster will be three nodes again. And we have five minutes left, which is perfect, because I only have one more topic left to briefly explain: how would we go and monitor our database, our PXC cluster, running in Kubernetes?
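Before the monitoring part: the scaling change just described really is only a one-line edit to the custom resource, re-applied with kubectl. A sketch, with the field name assumed to match the cr.yaml excerpt from earlier:

```yaml
# Scale the PXC cluster from 3 to 5 nodes:
# edit cr.yaml, change the size, and re-apply:
#   kubectl apply -f deploy/cr.yaml
spec:
  pxc:
    size: 5        # was 3; set it back to 3 to scale down again
```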
We could use Percona Monitoring and Management, or PMM for short, another free, open source piece of software. It's built on Grafana, Prometheus, and ClickHouse, and you can use it to monitor not only MySQL, Percona Server, and MariaDB, but also MongoDB and Postgres. It's a pretty flexible piece of software, and you can even install it inside your Kubernetes cluster; I mean, like we just established, running everything inside Kubernetes could make sense. You get a so-called Helm chart, I will have a link for you. Helm is a package manager for Kubernetes, for those who don't know, and it's literally two commands to install the package inside your Kubernetes. You do just, yeah, helm install pmm-server with a couple of parameters, and that will actually work most of the time. To enable monitoring on the database nodes, you go back to the custom resource YAML, set PMM to enabled, and you will have the PMM client running in a sidecar on all your PXC and ProxySQL nodes. You will have monitoring set up that easily. And here is just a screenshot of how PMM could look after it's been running for a while on my PXC cluster. PMM is a very powerful piece of monitoring software, but we are not going to get into that; that would be a talk or three of its own, and I think there might have been one today. But that is actually all I wanted to tell you about running MySQL on Kubernetes. How are we time-wise? We still have two minutes, and then five more for questions. Perfect. So like I said, the slides have all the interesting links; well, I find them interesting, and I hope you find them interesting. And yeah, with that, any questions? Sir, in the front? "Do you have any problems with preemption of your MySQL pods? You're gonna have problems with that." Do I have any problems with preemption of my MySQL pods? That can turn into a problem indeed, yes. It's again about resource allocation. "Do you have any tips on how you solve it, or what's your way to solve it?"
I do not have a way for you to solve that off the top of my head. I'll be happy to talk to you about this later on, because I would need to think on it a bit, sorry. So, next question; I think I saw a gentleman over there. So, basically two questions. Would I use persistent volumes for datasets? Generally, yes. And the next question: how do I deal with large datasets? Well, when running your stuff in Kubernetes, a large database can get clumsy. Say it's Galera, so a new node joins and it does a full SST, and if you have terabytes or petabytes, that can take pretty long. Bring patience, or bring smaller databases. So yeah, sharding, for example, will work. Yeah, back there. I'm sorry, I'm having trouble hearing. Okay, so how would the operator handle running a database cluster spread across multiple Kubernetes clusters, possibly geographically spread? The operator does not handle that for you at all, not off the top of my head, nope. Yeah? "Vitess, I don't know if you are aware of it; what are the benefits of using this one versus the other one?" So what would be the benefit of running the Percona operator versus Vitess? I feel like I'm saying "I don't know" a lot, but I don't know enough about Vitess to compare, yeah. "There is a Helm chart for MySQL high availability, so how is this different, how is this better?" I haven't compared the two. "Have you reinvented the wheel if it's already there?" I have only used the Percona operator, and I know it works pretty nicely. I haven't had a chance to work on the other one. So, how does running on Kubernetes affect the latency of the database? It depends, it depends a lot. I mean, if you have nothing else running on your Kubernetes nodes, Kubernetes itself doesn't do much there, right? It's Docker containers, they are pretty lightweight, so a few percent maybe, but try it out with your workload to know for sure. And tell me if you test it. I think I saw a question here.
Yeah, did I have a chance to run some benchmarks of MySQL running on Kubernetes in this mode versus a normal VM or even bare metal? So, did I do benchmarking of MySQL on Kubernetes versus bare metal? I have not done such benchmarking. I mean, there will be an overhead. I don't think I have heard of such benchmarking either, and it also depends a lot on what's going on in your cluster, so it's a very open-ended question, really. Try it out. Sorry? Well, Kubernetes itself, that's barely any load; Docker does add some. So I'm gonna go with five to ten percent, but please don't take that to the bank. We are done. Thank you, everyone.