Okay, hello everyone. The mic seems to be working, and you can hear me well, right? Okay, did you have a good lunch? So, who came here to take a nap? Anybody brave enough to raise a hand? No one? Well, anyway, then we'll get started with the presentation, and you know what? I want you to make me one promise. Can you do that? If you spot a mistake in my slides, please, please just stand up and say: Peter, this is the biggest bullshit I ever heard. Okay? Promise to remember that. Well, I will start with the big picture for context of this presentation, which is about running databases on Kubernetes. Now, if you think about the big hyperscale cloud providers, how do they want us to run databases? Well, typically that is not on Kubernetes, right? They would like us to use their very much proprietary database-as-a-service solutions, be that something like DynamoDB, which is fully proprietary, or maybe something like Amazon Aurora, which is based on open source but with a lot of bells and whistles, so it is proprietary too. And what comes with that, of course, is great usability. It is very easy to deploy and manage a database in many cases, right? But of course that comes at a great cost. And I don't just mean that these are more expensive in terms of dollars; you also have to give up some of your freedoms, right? You are now locked into that particular cloud. Now, what is interesting for me, as somebody who has been in the open source space for more than 20 years, is that this reminds me of the times when I was starting with open source development, sometime in the early 2000s. We had some Windows developers with ASP.NET or whatever Windows stuff there was at that time, and they would tell us what amazing development tools they had, right?
You know, compared to us: the open source guys would be using vi, or maybe the smarter ones would be using Emacs, right? Well, it was not a particularly great experience, but guess what? Right now, if you are using open source programming languages, frameworks, and so on, which are probably most frameworks out there these days, you have fantastic development tools, right? And I think we are in a similar situation right now when it comes to the cloud, where you can be looking at these very well integrated solutions from Amazon and the other clouds, and the open source solutions still have a lot of catching up to do. Another example from my career is Linux versus Solaris. Around the same time, maybe the late 90s, I was starting to work with Linux, and a bunch of my friends were working with "real" operating systems: HP-UX, Solaris, AIX, right? And Linux, of course, was a joke. It was 32 bits. It didn't scale across multiple CPUs. If some of you remember, it couldn't even create a file more than two gigabytes in size. Remember those times? Oh my gosh, that was a joke. Well, but guess what? It took some time, of course, but, anybody still running Solaris here? No? Well, oh, I'm sorry. Oh, I see, I see. I never would have guessed, but forgive me, you're working at the museum, right? I thought that's what you were going to say. Well, anyway, the fact is that Linux became absolutely ubiquitous and overtook the proprietary operating systems across many dimensions; they pretty much died off on the infrastructure side, right? And I think that is something we are also going to see happening more and more on the cloud infrastructure side as well, because, well, guess what? Lock-in sucks. Now, in my presentation, I focus a lot on Kubernetes, right?
And I wanted to share an observation which was very interesting for me attending this conference: how interestingly Kubernetes and OpenStack intertwine, right? I knew for years that some people use OpenStack as the deployment platform for Kubernetes. If you have your own cloud, which may be open source based, and you are focusing on Kubernetes applications, containerized applications, you can use OpenStack as a base, right? But it was interesting to see that there are also solutions which allow you to get an OpenStack deployment on Kubernetes. And I didn't quite understand: if you deploy OpenStack on Kubernetes, can you then deploy Kubernetes inside that OpenStack environment? And how many times can you do that? I think that would be an interesting thing to experiment with. But anyway, whatever choice you make in this case, whether it's OpenStack or Kubernetes as the foundation of your cloud, I think what is fantastic is that it really allows us to look at the cloud as a commodity, right? Really, as Amazon used to talk about it. For many organizations, it is indeed not very practical to build and operate your own data centers, right? In effect, many of those cloud vendors provide us with commodity infrastructure, where you can get VMs or bare metal through an API for us to use; that is fantastic, right? But I think it is a smarter choice, a better choice, to get the majority of the value from open source software, whatever your choice of that infrastructure is. And that really matches the comparison to electricity those guys talk about here, right? Because if you think about electricity, it is indeed a commodity. You can get it from any vendor, you can buy your own generator, and what do you know, your TV and your fridge still work. Compare that to the Amazon experience, right?
Or GCP, Azure, you name it, where they essentially say: well, you know what, if you want to have your fridge, your TV, your microwave, you have to buy electricity only from us. That doesn't make much sense, right? Well, now let's talk about Kubernetes and why I believe it is so helpful and why it is getting a lot of traction. The thing about Kubernetes is that it is easily available everywhere. You can run Kubernetes on the edge, right? You can deploy it on your own laptop. It is also available in a managed form from the majority of the clouds. It was interesting to see, for example, that Amazon initially had their own container engine, but that has all but died off, with the majority of use coming from their managed Kubernetes service. Now, Kubernetes and databases, you may think, would not quite mix, because Kubernetes was initially designed for stateless applications. And that is indeed true, right? And because of that, the reputation that Kubernetes cannot do state followed it for a number of years. But as I will show you with some stats, I think the situation is changing significantly right now, and we can see a lot of people running data-intensive applications on Kubernetes quite successfully. To the point that we even have a special community for folks who are running data-intensive applications on Kubernetes: DoK, the Data on Kubernetes community. Now, I wanted to share some of the stats from that community, which I think are quite interesting. Here are the stats from their community members, where you can see a very significant number of them are running their data-intensive workloads on Kubernetes, but, what is more important, they are quite happy about that and expect to increase their use.
But of course, when you are polling your own community, it's not very representative, right? It's like asking vegetarians whether they eat meat: the response would be pretty obvious, right? But what is interesting in this case is that most of the folks running those systems are quite satisfied with how things are working out for them on Kubernetes. Here is another survey which I find interesting. This one comes from the CNCF, the foundation behind Kubernetes, so they don't have that specific bias toward data-intensive applications. And what you can see here is that databases and message queues, which are also data-intensive applications, are some of the fastest growing workloads being deployed on Kubernetes, right? So that is quite cool. Here are the specific data-intensive applications which people deploy on Kubernetes, and you can see databases, analytics, AI and machine learning stuff: the big foundational data applications in the top three, followed by others like persistent storage. Now, how do you run a database on Kubernetes? I think the foundational change which came about over the last few years is the concept of the operator in Kubernetes. What is an operator, if you don't know? It is a piece of software which acts much as a human operator would in managing that system, just in an automated way. Why is this important for databases? Well, because databases typically are quite complicated, right? If you want, for example, to do something like a rolling restart of a database, killing pods one by one and restarting them may not be the best way to keep your database available, right?
Or you may have a special process which is needed to upgrade the database to a new version, as well as to troubleshoot if something goes wrong with the update. Because with a database, we can't just say: oh, you know what, if it didn't work out, we can just rebuild it from scratch. Well, you can, but you lose all your data, and you know what? Data is what the database is all about, right? What I think is wonderful about Kubernetes operators is that they focus on both day-one and day-two automation. One thing, of course, is that with operators you can deploy your database clusters, or even sharded clusters of many tens, maybe hundreds of instances, very easily. It really is much, much faster compared to the legacy old ways of installing things from packages, right? But what I think is much more wonderful and important is the day-two automation. In the early days of Kubernetes, there were many people using, let's say, Helm for database automation: oh, look at that, I got a database deployed. Well, so what? Maybe if you're using that for CI/CD, that is good enough, because CI/CD databases can always be recreated from scratch, or in many cases they can. Now, production databases spend probably 99.9% of their life in day two, right? You deploy a production database cluster, and then it's going to live out there for years, even decades, and it had better not be going down, right? And many modern applications operate on the 24/7 internet; we don't really like maintenance windows that much. With that, I think it's very important that your day-two processes are solid. And that means it's better if they're automated, because, well, hopefully, if the code is right, machines make fewer mistakes than people.
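To make this concrete, here is a minimal sketch of what the declarative, operator-driven approach looks like: you describe the desired cluster in a custom resource, and the operator reconciles toward that spec, handling both day-one deployment and day-two changes such as scaling and upgrades. The apiVersion, kind, and field names below are illustrative, not taken from any specific operator.

```yaml
# Hypothetical custom resource for a database cluster.
# The operator watches resources of this kind and performs the
# day-one (deploy) and day-two (scale, upgrade) steps needed
# to make reality match this spec.
apiVersion: example.com/v1        # illustrative group/version
kind: DatabaseCluster             # illustrative kind
metadata:
  name: my-cluster
spec:
  replicas: 3                     # bump to scale out; the operator adds nodes safely
  version: "8.0.34"               # change to trigger an orderly rolling upgrade
  storage:
    size: 100Gi
```

Scaling or upgrading then becomes a `kubectl apply` of an edited spec rather than a hand-run procedure.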
Now, I mentioned that a lot of things are quite easy with operators, and I wrote a little blog post which shows how easy it is to do certain things with a cluster with the help of minikube. If you have been using MySQL or any other database, you can get a good feel for it and maybe become a Kubernetes convert. Another interesting thing with Kubernetes is that if you look at a lot of modern database-as-a-service solutions, they tend to be based on Kubernetes behind the scenes, because that makes a lot of things easy in building a database as a service. And what that means for us is that this is actually very well tested in practice. I don't have access to the internal numbers, but we are speaking about probably millions of database instances running on Kubernetes right now, and we don't hear that it's all going to hell every day, right? So if you know how to do it, you can probably be quite successful with it. Okay, that was a very long intro, but hopefully helpful for some folks. So let's talk about some of the best practices we can see. They may be pretty basic, but you know what? In many cases the 80/20 rule applies: just following the basic best practices is the most important part. It's like you often hear about security: why do people get hacked over and over again? Well, because they have passwords on Post-it notes. Basic things, but they happen so frequently. So, number one: as we discussed, operators are wonderful. If you're deploying a database on Kubernetes, you want to deploy it using an operator which can manage it for you, not just something which simplifies installation and then says: hey, you know what, you are on your own.
Unless, of course, you don't really care about the database's data and can just throw it away and reinstall; maybe for testing that works for you. The second one is you want to make sure you have a high availability setup for your database. Because the reality of how Kubernetes works, and the general stability of that environment, is that you do not want to rely on a single database instance never going down, right? There should not be pets in Kubernetes, and that applies to the databases as well. Number three: you want to keep persistent data persistent. Obviously, right? But the thing is, in Kubernetes, as in the Docker world before it, a very common issue was people deploying a container without an external volume and having that wonderful database which works well until the pod or Docker container is destroyed along with all the data, right? Make sure you use proper storage: local disks can be wonderful, especially if the database handles replication on its own, or some fast remote storage. The next rule is you want to keep data per pod small. If you have a PostgreSQL database which is 50 terabytes in size, one massive instance, well, maybe you don't want to move that to Kubernetes just yet, because having a pod with 50 terabytes of data and many, many terabytes of RAM is not a very good pattern for Kubernetes. You want to move out there the data which fits well. And let's face it, while significant progress has been made over the last years, not every database for every use case is a best fit for Kubernetes. The next one is to use appropriate node sizes for Kubernetes. One thing I have seen with Kubernetes deployments is people using relatively small node-size VMs, because they're saying, hey, you know what?
We are deploying relatively small web apps, so we don't need big instances. But for a database, you may want the nodes to be larger. While I don't recommend deploying a five-terabyte database, deploying, say, a one-terabyte database with 256 gigs of memory and 32 cores is quite reasonable, and that's what your database instance wants to consume, right? But it obviously needs a node type which can support that. The next one: in production, you want to configure resource requests and limits. I think this is an important one, and many people moving from VMs to containers don't really understand it. When you deploy a VM, you typically have a fixed amount of memory and a fixed number of CPU cores, or you get a fixed-size VM in the cloud. A container deployed without limits will use up to all the resources available on the node. And that can be problematic, with different applications and different database nodes competing for resources, right? In development, that may be fine. In production, you often want predictability, so you want to say: my database container is fixed at this amount of resources, so I get uniform performance, and if somebody else runs their workloads on the same Kubernetes cluster, hopefully there is minimal impact on my database performance. The next one is to use proper anti-affinity. That is another interesting thing, because if you don't, you may end up with all three nodes of a database cluster placed on VMs on the same physical server. Not a good idea, right? Or, if you have an environment with multiple racks, you may set anti-affinity across different racks, right?
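The last few points (persistent volumes, fixed resource requests and limits, and anti-affinity) can be sketched together in one StatefulSet. The names, image, and sizes here are illustrative, and in practice a good operator would generate something like this for you.

```yaml
# Illustrative StatefulSet for a three-member database cluster.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      affinity:
        podAntiAffinity:
          # Never place two members of this cluster on the same node;
          # a rack or zone label as topologyKey would spread across racks instead.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: db
              topologyKey: kubernetes.io/hostname
      containers:
        - name: db
          image: postgres:16            # illustrative database image
          resources:
            requests:                   # requests == limits pins the pod to a
              cpu: "8"                  # fixed, VM-like resource envelope
              memory: 64Gi
            limits:
              cpu: "8"
              memory: 64Gi
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:                 # one PersistentVolumeClaim per pod,
    - metadata:                         # so data survives the pod being destroyed
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```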
So if, let's say, a whole rack fails because of power or something, you still have an available setup, right? And I think that is an important thing for databases. The next thing is tuning your database. Kubernetes may be magical, but it's not going to tune the database for you. All the good database practices still apply: that goes for the database configuration, following good schema design practices, tuning your queries, and so on. Now, here I will mention a little plug for a Percona tool, an open source tool called Percona Monitoring and Management, which helps, among other things, to understand which queries are the most important, what their performance is, and how to fix them. It works whether your database is on Kubernetes or anywhere else, and it is open source, as I mentioned, so you can use it to tune your database. The next one is scaling. Of course, as your workload grows, you may not want to rely on the performance of a single node, so how do you scale? That really depends on the database. Some databases let you do a read-write split to scale reads; others support sharding or some sort of active-active clustering. You want to understand that, because it also defines how large a database you can put on Kubernetes and how much traffic it can really support. The next one is to control eviction and pod priorities. For any database, even in the best case, it takes some time to recreate a pod; a database often needs at the very least to warm up, to get the data into its cache, to achieve optimal performance and so on. So you want Kubernetes to go easy on killing the database pods and rescheduling them, right?
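The eviction point can be sketched as a PriorityClass plus a PodDisruptionBudget: the former makes the kubelet and scheduler take other pods down before the database, the latter caps how many members can be taken out by voluntary disruptions such as node drains. Names and values here are illustrative.

```yaml
# Give database pods high priority so they are among the last evicted.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: database-critical
value: 1000000            # higher value = considered for eviction later
preemptionPolicy: Never   # don't preempt others; just resist eviction
---
# Allow at most one database pod to be voluntarily disrupted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: db-pdb
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: db
```

The database pods would then reference the class via `priorityClassName: database-critical` in their pod spec.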
Of course, evictions need to happen sometimes, and the operator should take care of that while limiting the performance impact, but at the same time you want to minimize them. The next one is making sure you don't expose your database outside of the Kubernetes cluster unless you have to, and if you do expose it to other applications running in your network outside of that cluster, make sure at least that it is not exposed to the public, because guess what? Databases accidentally exposed to the internet are a very common cause of data breaches. The next one is, of course, to enable encryption: it is a good idea to encrypt the data both at rest and in transit. TLS is quite inexpensive these days if you set it up right, so it's better to have it on, even inside a network you control. Also, for passing access credentials to the application, you can use Kubernetes secrets, which are a fantastic solution that helps reduce credential exposure, and typically good operators are going to integrate with Kubernetes secrets and map them into the database. Backups: of course, don't forget your backups, because especially with Kubernetes you have a somewhat higher risk of losing your data through some sort of operational mistake. So make sure you have them, and remember: even a five-node cluster in multiple geographical locations is not a replacement for backups, because somebody may still just log in and drop a table, intentionally or unintentionally, or trash your data in a way that requires you to restore from backups. The last one I would mention is considering new-generation databases. There are databases which are designed to run on cloud native infrastructure, exactly on Kubernetes.
They are typically designed with sharding and horizontal scaling across many nodes in mind, and especially for large-scale environments they can be fantastic. I think what is interesting here is that there are a bunch of such solutions which exist right now and are open source. They are not as popular yet as Postgres or MySQL, but they are growing rapidly. Okay, now let's cover in a few words what is going on beyond Kubernetes operators. As we said, Kubernetes operators are wonderful in terms of reducing toil, similar to database as a service, but for many people the UX is different. Somebody may say: well, you know what, I want to deploy the database and manage it with a couple of clicks. Kubernetes operators require you to do things a little bit differently, right? And that is where we are working on an open source replacement for database-as-a-service functionality in PMM. And besides us, more people are probably going to work on that as well; as often happens with open source, there are many people working on high-value solutions. At this point that is preview functionality, it's not GA yet, but, well, check it out, provide feedback, or maybe even some code. And you can even check this out, where you can get all the things deployed to play with, without needing to mess with your own Kubernetes operator installation and so on. Well, with that, that's all I had. If you have any questions, I would be happy to answer them, but I think we'll need to start letting the next speaker set up. Is the next speaker here? Not yet? Well, his loss, right. So you can start asking questions. Yes: what's the state of Galera? Well, what is the state of Galera on Kubernetes, right? We at Percona have Percona XtraDB Cluster, which is Galera-based, right?
That is what we use for replication, because Galera is indeed designed very well to work on Kubernetes. Yeah, any other questions? Okay, well, if not, then thank you, I appreciate it. And you know, run open source databases, right?