Good morning. Hello. Shiver me timbers. Or as they say here, souquez les artimuses. So we are the pirates. And we're going to talk about database operators. Hi, everyone. Can you applaud, please, Captain Jerome for this talk? Thank you. So here is Jerome. I'm Alex. We actually gave this talk one year ago in France at KCD France 2023. Omelette du fromage. Yes, baguette style, camembert. Actually, we had quite a lot of fun with David, who is missing today. Hopefully, we will present what was presented in French last year, and we will add a few updates. For instance, some operators went through new releases. And we also got some experience in production over the past 12 months. So maybe, Jerome, you can tell us about the operators in particular. Sure. So a quick show of hands. Who knows what an operator is, like a Kubernetes operator? Yeah, lots of hands. Cool. Now, who knows what a database operator is? A few fewer hands, but still a lot. Cool. So maybe half of the folks raised their hand. So a database operator is something that lets us deploy a database automatically on Kubernetes and manage the whole life cycle. And behind these fancy words, we mean that it's going to take care of deploying, but not just deploying the database server: also having a primary and replicas, and having backups. And backups are pointless if you don't have restores, so it also manages restoring. For instance, point-in-time restore, which is something really cool. We're going to talk more about this later. And in this talk, we're going to concentrate on six operators, only SQL operators, and only for Postgres and MySQL. It's not because we don't like the other ones. It's just because we only have like 30-something minutes. So we had to constrain the scope a little bit. And this is by no means an exhaustive review of all the operators out there. Again, time constraints and so on and so on.
Another thing that is pretty cool with operators, even if you are a seasoned DBA, et cetera, something pretty cool that you can do with database operators on Kubernetes is leveraging local storage safely. Because whether it's in the cloud or on-prem, there is often this kind of dilemma. Am I going to use my local disks, which are super fast, but if I lose the machine, I lose the data? Or am I going to use remote disks, like EBS, or Ceph, or something like that, which feels safer, because it's on something that is replicated and so on, but is also much, much, much slower? So if you have something that deals with replication and backups and so on, then it feels safe to use these local disks. And sometimes that results in absolutely tremendous performance improvements. Yay. Can you name the operators we are going through during this presentation? Good idea. So there's going to be, on the Postgres side, CNPG, StackGres, and the Zalando Postgres operator. And then on the MySQL and MariaDB side, there's going to be MOCO, the MariaDB operator, and the Percona operator for MySQL. And, well, the reason why we talk about this: I'm not going to say we're in a unique position, but we're in a kind of rare position in the sense that we, and that means Enix, our company, are helping folks manage their cloud native infrastructure. And some folks ask us, hey, can you do that with Postgres? Can you do that with MySQL? Hey, there is that cool feature in that operator. Can we use that? So we end up with a bunch of these different operators in production. So it's a little bit different from being this giant billion-dollar unicorn who bet the whole farm on Postgres or MySQL, and who every six months publishes a big blog post being like, hey, this is how we moved from something SQL to something else SQL. No, we have a lot of these operators in production. And we thought it would be really useful to share what we learned.
So, Jerome, can you guide us to our destination? What's the plan to get there? Sure. So we're going to go on a journey. We're going to plot a course and set sail. We're going to manage the permissions of the crew. We're going to promote somebody to be a new admiral in the fleet. We're going to go up in the crow's nest, recover a sunken fleet, outfit vessels with cannons, upgrade to bigger boats, and finally divide up the booty. Yeah. Do you think it's going to be fruitful? Probably. Probably. And just in case, sorry about all these words here, maybe you feel like, what is this about? This doesn't make much sense. Yeah, we just put a bunch of pirate words here. But each of these funny expressions is going to correspond to some actual work that DBAs have to do. All right. So let's plot our course and set sail. Actually, how are you going to discover your operator and start to work with it? We are all pretty much used to Helm charts. So all operators can be installed through Helm charts. What will differ depending on the operator is the kind of interfaces you will have to play with. So the first kind of interface is going to be custom resources. Everyone in the room knows about custom resource definitions? Yes? OK. Actually, some operators have one single custom resource and some others have a bunch of custom resources. Like, yeah, sorry, I was distracted. So a lot of custom resources for some of the operators, in a very cloud native way, I would say, like cert-manager, like Flux, this kind of typical operator. Also, one of them provides a graphical user interface. It's StackGres, as far as I can remember, which is pretty convenient when you want to play with your database cluster and basically add plugins or monitor it or play with it. And finally, one of the best interfaces from my standpoint is kubectl plugins.
So playing with kubectl is going to be very fun and you're going to enjoy it, meaning playing with your database through such plugins makes it very easy to upgrade, to back up, to restore, to switch over. We're going to talk about that in the following slides. And quick show of hands. Who prefers to work with the GUI? Almost no one, just a few people. And who prefers to work with the CLI? OK, cool. We're among friends here. I mean, it's cool for the folks who said they prefer the GUI, but then it means we're kind of on the same wavelength, I think. Can you show us the YAML that we use when we want to create databases with these operators? Sure. So some real-life examples. On the left side, you've got CNPG. On the right side, I don't remember. I think it's MOCO. Yes. You've got a bunch of YAML, pretty usual for Kubernetes. What we wanted to show here are the different colors. You can see red, yellow, and orange. The red is the minimal set of information you need to provide in order to launch your database cluster when you start to play with the operator. So in development, for instance, you only need like six, seven lines on the left side. And it's pretty similar on the right side, except you have to provide the storage information. Optionally, you can add some storage class settings and some backup configuration. And by the way, are there non-French speakers here? Non-French people? OK. I'm sure you're going to appreciate my very, very French accent. I just forgot to talk about that. All right, so a quick example to show you that it's very easy to launch a database cluster with the operators. That's the point of operators. And on the next slides, you will basically discover some emoji, like the banana. And the banana, we do like bananas at Enix. Who doesn't like bananas? OK, a couple of hands. That's OK. You're fine. But basically, everyone else loves bananas. So we put bananas for the ones where we were like, OK, this is really cool. We love the way they do it.
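As an illustration of the "six, seven lines" minimal manifest mentioned above, here is a small sketch that builds a minimal CNPG Cluster resource and prints it as JSON (which kubectl accepts just like YAML). The cluster name `demo-db`, the instance count, and the storage size are made-up values for the example, not the ones shown on the slide.

```python
import json


def minimal_cnpg_cluster(name: str, instances: int = 3, storage_size: str = "1Gi") -> dict:
    """Build a minimal CloudNativePG Cluster manifest as a plain dict.

    The name and sizes are illustrative defaults, not values from the talk.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name},
        "spec": {
            # One primary plus (instances - 1) replicas.
            "instances": instances,
            # Storage class and backup settings are the optional extras
            # that would be added on top of this minimal set.
            "storage": {"size": storage_size},
        },
    }


if __name__ == "__main__":
    # The output can be piped to `kubectl apply -f -`.
    print(json.dumps(minimal_cnpg_cluster("demo-db"), indent=2))
```

Everything beyond these fields (storage class, backups, users) is optional, which is the point being made about getting started quickly in development.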
Now, what about the skull and bones? The skull actually means it's like a minus from our standpoint. It doesn't mean that the operator is not good at that. It basically means that at Enix, we don't really like the way it was handled by the operator. So on most of the slides, you're going to see pluses and minuses. It also means that some operators are not displayed here, which means no particular comment. It doesn't mean that they are good, and it doesn't mean that they are bad. But you will see our feedback here at the bottom of the slide. So for this example, we really like playing with CNPG and MOCO, because they're basically very easy to install and to interface with. And what we didn't like is Percona. Now, quick information: My for MySQL, PG for Postgres, and Ma for MariaDB. So Percona, what was problematic from our standpoint is that the configuration is, not messy, but the documentation is not very clear. It's complicated to install. Basically not a very pleasant moment to install. A little less pleasant than the other ones. It's not bad, but not as bananas as the other ones. So Jerome, Captain Jerome, let's imagine we've got a kind of mutiny on the boat. How do we save the day with operators? What can we do? Well, with operators, we're going to set permissions on the databases. So obviously, when you connect to a database, you need credentials. And so how does that work with our operators? We have different paradigms. In some cases, it's kind of completely do-it-yourself. Like you have one admin account, and then you need to run SQL statements, manual commands, to create your users as you would do in the old world. Sometimes you can indicate in your manifest, in the YAML, I want this, this, this, and this user. And sometimes you can even use custom resources to define the users. We're going to see a couple of examples here. On the left, Zalando. And I don't know if you can see, but on the left, there is the users section around here.
And on the right, this is MariaDB. So you have a CRD for users. So that's pretty cool because you can define the users and the permissions as part of your GitOps deployment pipeline, et cetera, et cetera. One thing that is worth noting: in the old world, so to speak, you have one big database server or cluster and maybe hundreds of databases on it. And some operators basically tell us, well, if you want, you can deploy one cluster per application because it's so easy to do with operators. And the overhead is really low. And that helps you to achieve a better separation. So some operators are like, yeah, if you want, you could define multiple users. But what we encourage you to do is actually to have one cluster per application because it's easy, because it's cheap, and because you have better isolation, same thing for performance and query caches, et cetera, et cetera. So maybe we move to the next slide. Jerome, how do we promote you from captain to admiral on the cluster? OK, so we're going to talk about automated failover and manual switchover. So it's about high availability because, in fact, you have one admiral in a fleet. I think it's the same with the pirates and in the Navy, sorry. And the flagship is just the ship where you have the admiral. So the flagship is basically our primary database server. So it might be a bit confusing that we have automated and manual. Out of curiosity, who knows the difference between a failover and a switchover? A few hands, I see we have more hands. We have some seasoned DBAs here, awesome. Basically, the automated failover is the first thing we think about. It's like, oh, we just lost the primary, and we need to fail over to the replica so that operations can continue. And the switchover is manual, so it means that now and then I'm going to decide that I'm switching over to this replica. And when would we want to do that?
Well, a little example. Let's say I have a big Kubernetes cluster, and on it I have like 100 database clusters. And I have a node that I need to decommission. Maybe there is a hardware problem. Maybe my cloud provider just sent me a notification like, hey, we're going to shut down that node in like 30 minutes. I don't know, but basically I need to get rid of that node, which means moving all the stuff that's on the node. And that's when you want to do a switchover, so that you don't have any primary on that node, basically. And so here we can see some pretty significant differences between the different operators. So, for example, if I manage to click on that thing, yes. So this is on StackGres. Some Postgres operators are based on Patroni. And I think if some of you have already set up replication on Postgres, there is a good chance that you have seen or used Patroni. So here, if you're one of these folks who know Patroni pretty well, when you see this, you're probably going to be like, yeah, this feels familiar. I've done this before. And it's relatively straightforward. You know, just a kubectl exec, a few commands, and boom, we just did a switchover. Yay, awesome. Now let's compare with what we can have with another one, for instance. So this is going to be with MOCO. And with MOCO, that's going to be extremely complicated. There is a plugin. One line. Exactly. And in one line, that's it. We did the switchover. So if you just need to do a switchover once in a blue moon, that's fine. I can run these few manual commands. It's OK. But if I happen to have hundreds of clusters for which I need to do a switchover, that's going to be a little bit more annoying. Imagine if I just realize, oops, I have this notification about this machine going down, and I have 20 minutes to switch over the 30 clusters. That's going to be easier if I can just do a little for loop and grep, et cetera, to do all those switchovers.
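The "little for loop" idea can be sketched as follows. This is only an illustration under assumptions: the inventory of clusters is hypothetical (in practice it would come from something like `kubectl get pods -o wide`), and the `kubectl moco switchover` command line is assumed from the MOCO plugin, so check `kubectl moco --help` for the exact syntax of your version. The function only builds the commands; it does not run them.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    namespace: str
    name: str
    primary_node: str  # node currently hosting the primary pod (hypothetical inventory field)


def switchover_commands(clusters: List[Cluster], draining_node: str) -> List[List[str]]:
    """Return one command (as an argv list) per cluster whose primary runs on the draining node."""
    return [
        # Assumed plugin syntax: kubectl moco -n <namespace> switchover <cluster>
        ["kubectl", "moco", "-n", c.namespace, "switchover", c.name]
        for c in clusters
        if c.primary_node == draining_node
    ]


if __name__ == "__main__":
    # Made-up inventory for the example.
    inventory = [
        Cluster("shop", "orders-db", "node-7"),
        Cluster("shop", "catalog-db", "node-2"),
        Cluster("blog", "posts-db", "node-7"),
    ]
    for argv in switchover_commands(inventory, "node-7"):
        # Print instead of executing, so the list can be reviewed first.
        print(" ".join(argv))
```

With a one-line plugin command per cluster, 30 switchovers in 20 minutes become one reviewed loop rather than 30 interactive kubectl exec sessions.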
So regarding these operators, I guess we had some problems at some point with some of them. And we've got a new emoji here, which is the banana skull face. Yeah. OK, so the skull morphing into a banana means that one year ago, when we did the first iteration of that talk, it was not great, but now it's bananas. Can you tell us a little bit more about the MariaDB operator here? So the MariaDB operator is actually the only operator implementing MariaDB. It's a pretty recent operator, which was quite young one year ago. And actually, they made a ton of new releases and solved tons of the various problems we faced before. So we actually think that the MariaDB operator got really better and is like a banana today, whereas one year ago, it was not. For instance, the MariaDB operator didn't manage replication of the SQL engine. So we basically had just a primary one year ago. Now we can manage a complete cluster, which is necessary, I would say, when it comes to promoting. Also, on CNPG, you see a little boom here. I think we're going to be short on time. I don't know where we are right now. Like 17 minutes. But maybe we can keep that story for the end if we have time. OK, so let's move to my joke, my joke. Yeah, wait. Oi. Hey, Jerome. Can we kind of monitor something from far away? I think we're going to talk about observability. You're welcome to applaud this perfect joke. Thank you. So basically, in terms of monitoring, everyone uses Prometheus, I guess, I think. Yeah, who uses Prometheus? OK, maybe I don't know if it's everyone, but many, many, many hands. OK, cool. So you get metrics by default with operators, which is kind of expected. What you will see in terms of differences is the level of metrics you get from one operator to the other. The first layer of metrics you're going to get is the typical SQL exporters that you can also use on virtual machines.
So you will have database-level metrics like performance, possibly some information about CPU consumption, memory usage, et cetera. Pretty low-level information, actually. Some of the operators do provide additional information about the cluster health. Meaning, is my cluster replicating, in good health? Were my backups correctly handled? Was I able to, I don't know, restore? Something about certificate expiration as well. And finally, something we're going to present on the next slide. Oops. You did it at the same time. Sorry. So the right side is actually a typical Kubernetes export, a YAML export, of the status of your cluster. You can see over here some kind of health information about the cluster. It's not very convenient to parse, actually, from a human standpoint. And some operators do provide, as you may remember, kubectl plugins. Actually, here it's CNPG providing the status command. And there you can see a complete overview of your cluster status right now. As I said earlier, you can have certificate expiration information, cluster health, the last backups that went through correctly, et cetera, et cetera. And I think the folks who are used to working with replication can recognize stuff about the WAL state, the write-ahead log. And so that's the kind of information that you want to have real quick when you want to check that everything is fine. And speaking of WAL logs and backups and everything. Yeah, actually, very important stuff that you can monitor here as well. If I get back to the pluses and minuses, the bananas and the skulls, we do find that from a monitoring standpoint, the operators that provide a kubectl plugin or provide cluster health metrics are really a plus in that regard. So CNPG and MOCO, again, that's a plus. On the other hand, Percona is again on the bad side. What's wrong with it? OK, so first of all, you've got to install a kind of agent in order to do the monitoring.
It's called PMM. Percona Monitoring something, and Management. And by default, with the normal configuration, it's going to send information to a central server. So this is really, from an Enix standpoint, a boo. Who likes to send data anonymously to a central server with a default configuration? Yeah, do you like it when you install something open source and it sends random data to some central place? No one? Yeah, I mean, honestly, there is nothing wrong with the idea. Actually, sometimes it's really cool, like Firefox does it to have crash info and that's super useful. But the problem might be, from a security standpoint, you have this network flow going out, and you're like, well, what is this exactly? Is there some important data in there? And of course, you can disable it, but that was a little bit of a surprise, basically. Yes. So hopefully at some point, we can get some pluses for Percona. OK. So let's imagine, Jerome, that we lost our clusters. Right. So big catastrophe, we lost the whole fleet, primary, replica, everything is gone. Maybe it's 11:21, and I do a DROP TABLE on the production database. What can we do? Well, first, most operators let you define backups either to an object store like S3 or to a PVC, so persistent volumes on Kubernetes. And well, sometimes both. There is also a special mention for CNPG, which just recently added the option to do backups with CSI snapshots. I don't know much about it, but I just learned about that last week. And I was like, this is cool. And that enables you to do point-in-time recovery, which is the thing where basically you can say, well, Captain Jerome the pirate accidentally dropped production at 11:21. Can we restore the database as it was at 11:20, you know, one minute before? And that's point-in-time recovery. And out of curiosity, who has implemented point-in-time recovery before? OK, a few hands. Who found it easy and enjoyable?
OK, there are some hardcore DBAs here, mad props to you. But well, I like managing databases. That's maybe a little weird side of me. But PITR is not really easy. So it's really great that the operator lets you do that. Sometimes as easily as taking your YAML and adding literally a couple of lines to say, restore from the backups at this timestamp. And it's going to create a brand new cluster with a restore of the data. Here, there is a little bit of a difference between the old world and the new cloud native world. In the old world, often we would restore kind of in place, on top of the existing machines, because we wouldn't have tons and tons of machines lying around. And in cloud native, it's like, no, no, no, I'm going to leave the current database running. And I'm just going to create a new cluster because it's just a bunch of pods. So I don't care if I just add a couple of extra pods. It's fine. And that way, I can have my old broken database and the new one side by side if I want to more easily do some reconciliation, merges, et cetera, et cetera. Wait, Jerome. You mean that in the cloud native way, rather than replacing the production servers that are faulty, you create a new instance, and then once it's up and running and you have verified the data, you redirect the app. For instance, yeah. OK. Yeah. And then you know, like, blue-green deployment, et cetera, I'm sure I don't need to tell you how great these things are. OK. A quick word about MariaDB again. What changed during the last year? So first, they implemented the S3 endpoint, which is kind of our preferred way to do backups. You usually want to send your backups far away from your production, in a different data center, with a different provider. And what they also implemented is a kind of point-in-time recovery. It's not exactly point-in-time recovery. Basically, it's going to find the nearest backup in time when you ask for a specific timestamp.
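To make the difference concrete, here is a small sketch contrasting the two ideas. True point-in-time recovery starts from the latest backup taken at or before the target time and then replays logs up to the target. A "nearest backup" restore just picks the closest backup. The timestamps are made up, and the sketch does not claim to reproduce how any particular operator selects its backup.

```python
from datetime import datetime
from typing import List, Optional


def base_backup_for_pitr(backups: List[datetime], target: datetime) -> Optional[datetime]:
    """Latest backup taken at or before the target; log replay would cover the gap up to the target."""
    candidates = [b for b in backups if b <= target]
    return max(candidates) if candidates else None


def nearest_backup(backups: List[datetime], target: datetime) -> Optional[datetime]:
    """Backup closest in time to the target, whether it was taken before or after it."""
    return min(backups, key=lambda b: abs(b - target)) if backups else None


if __name__ == "__main__":
    # Hypothetical day: nightly backup at 06:00, another backup at 11:30,
    # and an accidental DROP TABLE at 11:21.
    backups = [datetime(2024, 3, 20, 6, 0), datetime(2024, 3, 20, 11, 30)]
    target = datetime(2024, 3, 20, 11, 20)  # one minute before the drop

    print("PITR base backup:", base_backup_for_pitr(backups, target))
    print("Nearest backup:  ", nearest_backup(backups, target))
```

In this made-up scenario the nearest backup is the 11:30 one, taken after the drop, while PITR would start from the 06:00 backup and replay up to 11:20. That gap is why "nearest backup" is described as only a kind of point-in-time recovery.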
One last thing before we move away from the backups. A quick story about CNPG as well. If you look in the documentation, at first it can be a little bit overwhelming, because it's going to tell you, hey, you can do full backups using SQL dumps. Or you could also do PG base backups. And if you're not exactly super deep in the trenches, no, pirates don't go in trenches. In the bilge, I think it's called, below the ship. So if you're not super deep into Postgres, you're like, what's the difference? Why should I bother? And you might be tempted to think, oh, this other operator is better, because it only gives me one option, and I don't have to think too much. Well, having this option is actually super important. Because, for instance, full SQL dumps will let you do a backup and restore across different versions of Postgres, or sometimes across very different services. And a base backup will be much, much, much faster. To give you an idea, a recent example on a production database of about 100 gigs, so modest size, but still a little chunky: with a SQL dump, the restore took multiple hours, like two, three hours. With a base backup, it just took a few minutes, because it's just a file-system-level backup. So that's pretty important when we start thinking about time to recovery, and blah, blah, blah. In one case, it's hours. In the other case, it's minutes. So I'm going to pick minutes, personally. And so it's great to have both options, and it's worth reading the documentation. So if I want to pimp up my boat, actually add some plugins to my SQL engine, I would basically have two options. Either the operator manages such operations, meaning that in the custom resource, you're going to add a simple setting to say, I want this plugin. As an example, in Postgres, it's pretty usual to add PostGIS, which is for geographical databases. So it's pretty convenient to do it on some, let's say, cloud native operators.
I mean, the more modern ones, like StackGres and CNPG. But sometimes you will have to build your own custom container image, like on MOCO, the skull face here. Why? Because they basically do not manage plugins. You can still add your plugin by building a new container image with the plugin included and enabled, and it works. But it's less convenient and it's not managed by the operator. All right. Okay, so. Well, I think you're going to need a bigger boat. Yeah. We need a bigger boat. What does it mean? It means that we need to discuss upgrades. And upgrades are actually quite a subject when you deal with operators. I won't ask who in the room had failures while upgrading operators, but I can tell you there's a big difference between upgrading your SQL engine, which is a normal operation from an operator standpoint, and on the other side, upgrading your operator, which can be pretty dangerous, okay? So on the SQL engine side, upgrading a database is pretty easy. Sometimes you even have a custom resource in order to do the upgrades, and you get events, and the operator basically will roll out your different replicas, pick a new primary, do a switchover, et cetera, and you don't have to do anything. And it's very, very convenient, okay? On the other side, if on one single Kubernetes cluster you've got one hundred database clusters, times three pods, one primary, two replicas, and your operator starts to roll out everything at the same time, it basically means that you've got 300 pods exploding and restarting, et cetera. Hopefully you will have extra resources in order to handle this, but most of the time it can get quite ugly. So beware of the operator upgrades, and check the configuration. That was actually the problem for MOCO. One year ago, when you were upgrading the MOCO operator, by default it was basically restarting and rolling out all clusters. Fortunately, that has changed, and that's why they've got this kind of yellowish something.
It's because they changed the config: you can now configure the behavior, so you can disable the rollout of the SQL engine at operator upgrade time. Be careful, test your upgrades in dev, in staging, whatever. Do it a couple of times, because it can get quite messy. And the nice thing, again, with cloud native: when we tell people, oh, always test in staging or whatever, it feels like, well, you mean I have to do the job twice? But the point here is that it's just applying a bunch of YAML, so it's not going to be a lot of extra effort. All right. So, we're gonna find some treasure now. So, yeah, that's the end of our journey, but I think our treasure looks like shit, basically. Well, I mean, we got some bananas, so. Okay, great. The bottom line is basically that SQL operators on Kubernetes are bananas. Like, it's seriously really awesome, in the sense that if you're just, I would say, a casual DBA, like, you know, you're the full stack engineer who goes all the way from CSS and React to the back end, and accidentally now you also need to manage databases, so that's a lot. This will let you have replication and backups and so on and so on without having to go through the long and sometimes painful process of learning everything about the MySQL and Postgres configuration and so on and so on. Like, honestly, if we had more time or if this were a workshop, we could take like five minutes to use, I don't know, kind or Rancher Desktop, install one of these operators, have a replicated database, and show some failover. So, that's how easy and fast it is, and it's going to implement a lot of best practices for us. And if you are a seasoned DBA with lots of, you know, battle scars and war stories and data loss and so on, that's going to help you manage more databases, more easily, faster, with less overhead and so on and so on.
So, I would say whatever your level of database expertise is, in my opinion, database operators are really great. It also gives us a way to kind of self-serve databases, as a developer with my Kubernetes cluster. If I need a new database cluster, I can just apply some YAML and one minute later, boom, I have my Postgres or MySQL cluster, instead of opening a ticket to the database team, who's going to get back to me as soon as they're back from vacation, and then they have other things to do, et cetera, et cetera. Yeah, in that regard, actually, just as you can put an ingress in your application's Helm chart, you can put an operator custom resource in your Helm chart, and this makes it very convenient, at least from a dev standpoint, when you want to launch multiple environments. So, how are we on... Two minutes left, I think. Two minutes left, okay, so, that's the final slide, actually. We wanted to give you an overview of what is going on from our standpoint at Enix, okay? It's an opinion. It doesn't mean that some operators are worse than others, but it's our view on how we want to run them in production. So, the three on the left are in production at Enix and they have been for quite a long time now. And it's running well. It's running especially well. On the StackGres line, you can see, you've got all bananas and there's a bizarre emoji. Crystal ball. Okay, crystal ball. So, why isn't it in production at Enix? Actually, we don't know, because it's a very good operator. We want to use it, but CNPG does the job today, so we never switched to or added StackGres. We really think it's one of the most modern kinds of operator, with multiple custom resources and various things that are quite cool, but it's still not in production at Enix. Let's see in the following months. And then we've got the two remaining, but we don't have time, maybe, I don't know.
So, MariaDB very quickly: they did a great job these past 12 months. It's the MariaDB operator, it's the only option, but it's starting to be prod-ready-ish right now. And finally, Percona: complicated for us to use. It makes some sense in particular scenarios, not ours. Well, I think that's about it. If you want the slides, top right QR code. If you want, oh, so we had these cool like clown t-shirts and so we have some at the back if some folks are interested. Also, please, if you liked this talk, please rate the talk. If you didn't like the talk, don't rate it. No, kidding, but then come to us and tell us what you didn't like so we can do a better job next time, because as pirates, we strive to continuously improve. Thank you, Jerome, for this talk. Let's applaud the... Thanks, Alex. Okay, thanks.