So good morning, good evening, good afternoon, depending on where you are in the world. Thank you for joining today's CNCF webinar. Today's webinar is how to migrate databases into Kubernetes. I'm Christy Tan, and I'll be moderating today's webinar. We would like to welcome our presenters today, Alex Chircop, CEO and founder of StorageOS, and Ferran Castell, Product Reliability Engineer at StorageOS.

A few housekeeping items before we get started. During the webinar you are not able to talk, but as an attendee there is a Q&A box at the bottom of your screen. Please feel free to drop in your questions and we'll get to as many as we can at the end of the presentation. This is an official webinar of the CNCF and as such is subject to the CNCF Code of Conduct. Please do not add anything to the chat or questions that would be in violation of that Code of Conduct. Basically, please be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinars page at cncf.io/webinars. With that, I'll hand it over to Alex and Ferran to kick off today's presentation. Take it away.

Thank you, Christy, and good morning, good afternoon, and good evening, depending where you're dialing in from. A little bit about myself: my name is Alex Chircop, I'm the founder and CEO of StorageOS. I also wear two hats, as I'm the co-chair of the CNCF Storage SIG. I am very focused on building cloud-native storage solutions at StorageOS, but before that I spent 25 years engineering a number of different infrastructure platforms, primarily for financial services. Always happy to hear feedback from the community, so feel free to DM me or join our Slack and interact. Ferran, do you want to quickly introduce yourself?

Sure, Alex. Hello, everyone. My name is Ferran. I'm based in London, as our company StorageOS is. I'm a platform reliability engineer with a background in infrastructure and platform engineering. We usually run a lot of infrastructure on clouds and bare metal, so my background is mainly on systems and a bit of DevOps as well.

Brilliant. Thanks, Ferran. Before we start, I just want to give you a bit of background on StorageOS and where we come from. At StorageOS, we are building cloud-native storage for customers and users who are running platforms in the cloud, on-prem or in hybrid environments. One of the things we come across very often is customers wanting to run databases in Kubernetes, so we're going to share some of those experiences. As I mentioned, I've also got my other hat on, so I'm going to do a little bit of SIG advertising here. The CNCF Storage SIG is a public group. It works with the CNCF TOC, and we help create content for end users as well as review projects and provide technical storage expertise to the TOC. The calls are public and are held every second and fourth Wednesday of the month, so we'd love to see you there.

I talked about what we're seeing in our end-user communities. Of course, everybody begins their cloud-native journey, typically, with containers. Containers have changed the application landscape because they've broken the dependencies between an application and an individual, specific server, hence this concept of cattle versus pets, which we hear about quite a bit. Containers allow applications to be portable, and of course, the minute applications are portable, they can be orchestrated.
This is where Kubernetes comes in. Kubernetes, being that container orchestrator, allows developers to compose and declare what they need out of their applications by defining, for example, compute requirements or network requirements. Kubernetes plays an expert game of Tetris with your applications to abstract away your infrastructure and fit the applications onto the available resources in the most efficient way. And because it has that automation, it can provide lots of advanced services like automated scaling, healing, and connectivity.

But of course, all applications store state somewhere, and one of the amazing things with Kubernetes is how, with the power of new standards like CSI, Kubernetes can interface with different storage systems. Now developers can use cloud-native storage functionality that integrates with Kubernetes to not only automate their compute and networking, for example, but also to automate storage provisioning, availability, and scaling. What does that actually mean for an end user? Effectively, now that you can specify storage requirements, it becomes very easy to move stateful applications into Kubernetes. And because those storage requirements and those applications can be defined with a standard set of YAML, we move to a scenario where it's possible to build anything as a service. One of the things that we see happening very regularly is the concept of building databases as a service.

So why would you want to automate the deployment of databases in Kubernetes, and what are some of the advantages? One of the key things with Kubernetes and cloud-native storage relates to this key concept that the containers and the applications are portable. In this case, when I refer to an application, I'm talking about your database, because every type of database is now mostly available as a container too. Popular databases like Postgres or MySQL, for example, as well as databases like Mongo or Cassandra, and more distributed systems like Vitess, are all deployed as containers within Kubernetes environments. One of the key things here, though, is that those applications, those databases, are inherently portable, and they benefit from the ability of Kubernetes to dynamically scale a cluster and to upgrade nodes on the fly, whether it's because you want to upgrade the resources of nodes or to make sure that software is current and security patches are applied, for example. So one of the things to look out for here is storage that is portable and supports the cloud-native attributes of that type of application and database running in Kubernetes. Effectively, we're moving away from an environment where servers and storage are tightly coupled, where the storage is locked into individual server nodes, and moving to an environment where the data and the storage follow the database and the application, and because Kubernetes has the ability to move your application, the storage is portable too.

We talked a lot about the fact that in Kubernetes the environments are declarative and reproducible. So what do we mean by that? In much the same way that you can specify the containers that are needed, in this case the database and the compute, memory, and network requirements, you can also declare the storage requirements and items like scaling or availability, for example.
And the reason why this is so incredibly powerful is because it makes your environment recoverable and reproducible. You are no longer nurturing individual servers; Kubernetes has the power of taking that definition of the database and recreating that environment wherever you are, making it very easy to have pre-production or development and production environments replicated very easily. One of the other things which applies to databases in Kubernetes is the way you scale the deployments of those databases. I will talk about that in detail in a little bit, but effectively what we're recommending is that you adopt the same sort of concept as microservices, where a database is no longer deployed in large, gigantic instances, but different databases can be deployed in separate, smaller instances and can be scaled out horizontally with highly available, dynamically addressable database endpoint abstractions, allowing you to tune the workloads and to right-size the environment appropriately.

We talked a bit about how that dynamic provisioning works, so I'll give you a little example of what we're referring to, specifically for a database. For a database deployment we start off with a storage class. A storage class is a way to abstract away the definition for accessing a storage provider that Kubernetes is going to talk to. Now, we mentioned CSI. CSI is the Container Storage Interface, and this is the standardized API which is used by a number of different storage providers to dynamically provision volumes, but also to dynamically attach, detach, and mount volumes on different nodes within your environment. Effectively, using a storage class, which is an abstraction point for your storage provider, you can now define a persistent volume claim. That persistent volume claim uses the storage provider to dynamically provision a volume and is identified using a friendly name which you can then refer to in the application itself, typically defined in a pod or a stateful set. In this case, for example, we're showing the Crunchy Data example to define a Postgres container that will use the database volume that we defined in the persistent volume claim. The idea behind this is that we now have a very simple way to define the way the databases run and the volume requirements that are being used. And if you are using a software-defined storage system, you also have the benefit of being able to use the same YAML configuration whether you're deploying, say, in a cloud instance, on-prem, or perhaps in different VMs or developer environments, for example.

I'm going to caveat this and say that many databases actually have a superset of automation nowadays, called an operator, or some sort of capability to provide additional functionality around the life cycle of the database. For the purposes of this particular webinar and this example, we're restricting ourselves to examples that are related to starting a database using a simple pod or a stateful set. That's mostly for simplicity, but for completeness' sake I'm mentioning the fact that operators are available for most types of databases to provide that database life cycle management. So what does it look like, in terms of what we've seen happen time and time again in different enterprise and end-user environments?
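Before we get to that, just to make that provisioning chain concrete: a minimal sketch of the YAML might look something like the following. The storage class name, provisioner, and image here are illustrative rather than the exact manifests from the slides.

```yaml
# Storage class: the abstraction point for the storage provider
# (the provisioner is the CSI driver; the name here is illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: csi.example.com
---
# Persistent volume claim: dynamically provisions a volume through that
# storage class and gives it a friendly name to refer to later.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-vol
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 25Gi
---
# Pod (in practice more often a stateful set) running Postgres and
# mounting the claimed volume where Postgres writes its data.
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
    - name: postgres
      image: postgres:12
      env:
        - name: POSTGRES_PASSWORD   # demo only; use a Secret in practice
          value: example
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-vol
```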
So typically, we start off with a large server, or a service, that has potentially multiple databases hosted in a single database instance, typically on a large node, which may or may not have dedicated storage or may have some other existing storage provider. The point I want to emphasize here is that most of this definition is very, very server-centric. It's based on a scale-up architecture, which obviously gets more expensive with the capacity of those individual servers. But more than that, the setup becomes operationally more complex and prone to failure, simply because we're talking about running multiple databases within that one large instance. Every time a new database is required, it takes a fair amount of manual effort; perhaps it involves database administrators or DevOps teams, and, of course, potentially operational change windows for your database environment.

So what does this look like in the new world? Starting off with that large database instance, we then look at how we deploy this within our Kubernetes cluster. For the sake of this particular example, we're using StorageOS as the underlying storage layer, but of course other storage providers are available. We start off by moving a single database instance into a container that is now runnable and accessible on a node in the Kubernetes cluster, typically as a pod or a stateful set that might be managed by an operator. We can then continue to break out the additional Postgres databases into their own mini instances, effectively making them portable, standalone products that can be distributed across the different nodes in the cluster. So effectively, we're using the functionality of Kubernetes to turn each database instance into a declarative, self-contained container.

One of the aspects of using cloud-native storage provisioning within Kubernetes is that a lot of the storage providers provide the capability of having primary volumes and replica copies of that data, in some form, distributed between the different nodes within the environment. What this means is that, again using the automation that's available through CSI, which allows Kubernetes to talk to the storage provider, when a database is working with a copy of the data, the data can actually be replicated and data protection can be applied, such that transactions are being replicated to more than one node within the cluster. There are a number of different software options that allow you to do this, and the CNCF Storage SIG has created a white paper to describe the storage landscape in the CNCF, which is a good reference point that I'd recommend you read. This type of data protection means that if a database commits a transaction to the storage environment, that data is now available on multiple nodes within the cluster, which means that if a copy of that data fails, say due to a disk failure, resource failure, or a node failure, the database can continue to run transparently because the storage system is handling this in the background. So this gives additional options to some of the database deployments by providing storage-level replication, managed by Kubernetes, to protect the databases within that Kubernetes environment.
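Just to give a flavour of how that is expressed: with most cloud-native storage providers, the number of replicas is simply another declarative attribute on the volume claim or the storage class. The sketch below is only an illustration; the label shown is StorageOS-style, and other providers will use different parameters.

```yaml
# Persistent volume claim asking the storage provider to keep one extra
# replica of the volume on another node. The label name is
# provider-specific and is shown here purely as an illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-vol
  labels:
    storageos.com/replicas: "1"
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 25Gi
```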
Of course, there are other projects, like Vitess, for example, that provide other methods of doing database-level replication, which can also apply in these environments. But suffice to say that allowing the database within the Kubernetes environment to benefit from that transparent replication means that you get automated failover within your environment, especially because the database remains accessible via the service names and the service IPs that are defined as part of the pod or the stateful set.

So, just summarizing: databases are, of course, stateful workloads, and in order to take real advantage of Kubernetes environments, it's important that you are able to apply the same declarative and composable constructs that you apply to your applications to stateful workloads like databases. Containers are, of course, ephemeral, and nodes can be ephemeral, but using the capability of software-defined storage within Kubernetes, we can ensure that there are multiple copies of your data and that data is accessible anywhere. And just re-emphasizing that CSI, the standard interface that Kubernetes uses to talk to storage providers, has now been around for over two years as a GA feature. This provides flexible storage for these environments and allows the databases to be portable, with dynamic provisioning. A couple of other aspects that I'd like to cover: by splitting each database out into its own separate containerized database instance, we get better performance and better throughput, typically because Kubernetes can now balance the load more effectively across multiple instances. Each one of those database instances can be tuned and have resources allocated individually per database, so you're not having to worry about one large instance for multiple databases. And of course, by scaling out horizontally rather than focusing on vertical scale-up, with Kubernetes you reduce the failure domains, and you reduce the blast radius and the impact of any one component within the cluster failing.

All right. So with that, I'm going to pass the baton on to Ferran, who's going to give us a live demo to show us how to actually move a database from a standard server into Kubernetes. This is probably the exciting part of the session, so I'll pass over to Ferran now and stop sharing my screen.

Thank you, Alex. I'll start sharing mine. One second... great. Cool. So, as Alex mentioned, we're going to go into a live demo. First of all, let me give you a bit of an introduction to what we have. We have a Kubernetes cluster: four nodes in Google Cloud. It could be somewhere else, but Google Cloud is as good as any other. And we're going to do what Alex explained about splitting the main database into multiple ones in Kubernetes. So we have one main server with Postgres. Let's actually have a look. It's a standard Ubuntu box with Postgres running. You can see it's just a simple process; it's just a standalone machine with multiple Postgres databases. So let's have a look. We can see that we have multiple databases. Those databases are just simple schemas; I created them with ordinal numbers for simplicity so we can actually see what we are migrating.
But keep in mind, or try to understand, that the idea is that we are creating a database for each microservice, or for each component of our application, or for each part of our holding, our company, et cetera. So there are different ways of distributing our data across stores. I'm going to move all these databases, treating each database as belonging to a microservice, into Kubernetes. I'm going to use an alias, k, instead of kubectl, to save everyone's time, so just bear with me.

So I created the namespaces, one namespace per microservice. Very simple. We want to keep them isolated: even if one component can reach another over the network, we keep them separate. By the way, everything I'm going to use today is available on GitHub, and we're going to share the links for the YAMLs and so on. And I want to stress that everything you see today is something I highly recommend running from a CI/CD environment. We're going to do it by hand so you can actually see what is under the hood and what the Kubernetes constructs are, but you would use a CI/CD pipeline, a Jenkins server, whatever tool you prefer, to actually run those automations. It will make your life way, way easier.

First of all, I'm going to deploy a Postgres server in every single one of these namespaces, so we're going to have 10 databases, or 10 instances of Postgres itself. To do so, I'm going to use these YAMLs that, as I said, are in a completely public GitHub repository. Then we're going to migrate the data. So I'm going to iterate over the namespaces and just create the resources; in a second I'll show what we are actually doing. To provision the databases, as Alex mentioned before, we're going to use stateful sets. A stateful set is a Kubernetes controller that will enforce certain requirements, for instance the number of instances of the application. So let's have a look at the stateful set itself. In our case it will enforce having one instance of the application always running. We are using a fairly standard, fairly common Postgres image. And, very importantly, we're going to mount a volume into PGDATA, into /pgdata; that is where Postgres will write the data. If you want to split out the write-ahead log and the other components that Postgres allows, for speed or whatever, you can do that here and create different volumes. Because this is declarative, it's very easy to just provision your volumes.

Where are the volumes coming from? These volumes are coming from a template, a persistent volume claim template. So for every pod that this controller starts, there will always be one persistent volume claim associated with it. In this case, we're just referencing the storage class that we want to use for our software-defined storage backend. There are actually different ones in this cluster, so we have multiple storage classes: two of them are software-based, and there is one that uses the Google Cloud volumes. And you can add more and create different storage classes that will have different capabilities or options for your specific use cases.

Now, let's actually have a look at the pods that we have created. I'm going to use k9s, which is a very cool tool for live information. I'm searching for 'app'. We can see that for every namespace there is one pod, one Postgres pod, for every single one of those microservices. So now every microservice can have its own database. Cool.
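To give you a rough idea of its shape, the stateful set looks something like the sketch below. This is a simplified version, not the exact manifest from the repo; the image, names, and sizes are illustrative.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: app1                   # one of these per microservice namespace
spec:
  serviceName: postgres             # headless service, gives the pod a stable DNS name
  replicas: 1                       # the controller enforces one running instance
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:12        # illustrative; the demo uses a common Postgres image
          env:
            - name: PGDATA
              value: /pgdata/data   # data directory inside the mounted volume
            - name: POSTGRES_PASSWORD
              value: example        # demo only; use a Secret in practice
          volumeMounts:
            - name: pgdata
              mountPath: /pgdata
  volumeClaimTemplates:             # one PVC per pod, provisioned dynamically
    - metadata:
        name: pgdata
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast      # the software-defined storage backend
        resources:
          requests:
            storage: 25Gi
```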
Every one of those databases has a volume associated with it. Let's take app three, to pick one, and get the PVC. We can see that there is a PVC associated with it. So now we have 10 instances of Postgres running and they have 10 volumes associated with them. Cool. Now we have Postgres running, and we have the main server running, so let's run the migration. First, let me show you, in the main instance itself, before we do the migration, some data that is in one of the tables, so we can refer back to it later. Let's connect to one of the applications, let's say app two. We can see there is one table. It's just random data, just blocks of files and block information; there is no real use for it, but we will be able to see that this table is migrated along with the database itself.

As I said, it's very important that you use CI/CD for these kinds of executions I'm going to run now. There is one thing that we need to have before we do the import itself, and that is a way to connect our Postgres servers, our many instances in Kubernetes, to the main server. We do that using an external service. It's a standard YAML. If you're not familiar with external services in Kubernetes, don't worry too much. Just for context, what we are doing is telling Kubernetes that there is an IP that we want to access from inside the cluster, but we want to give it a name. Instead of accessing the IP of the Postgres server, the main server, directly, we are creating an endpoint and a special service that allows us to access that main server via a DNS name inside the cluster. We keep good practices and an elegant separation of concerns. I have a namespace that holds that service. We can see a service called pg, so we're going to access pg.<namespace>, so pg.postgres-external, and that will forward to that specific Postgres server. If we look at the endpoints, it's mapped to the specific IP where my Postgres server is running. Cool. This is important because, when we want to do things from an automated point of view, we want to change configuration without having to change anything else.

How are we going to run the migration? I'm going to use a job. A job is a Kubernetes construct, a Kubernetes controller, that will execute my container, or in this case my task, once. What this task does is a simple pg_dump piped into pg_restore. Of course, if you have big databases (I have a small database today), you can add the size of the migration here, or the size of every chunk of data that is migrated, so you can control the flow of the data a bit and be more sophisticated about it. Essentially, it's very simple: there is a source and a destination. Where do these source and destination variables come from? They come from configuration, which is quite common and quite obvious, but the point is that we want to keep these jobs as agnostic as possible; through configuration, we will be able to run this anywhere. We will create one of these jobs for every namespace, so we will do an independent import for every database that we have.
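Roughly, those two pieces, the external service and the migration job, look something like the sketch below. It's simplified, and the names, namespace, image, and IP address are illustrative rather than the exact manifests from the repo.

```yaml
# A service with no pod selector, giving the legacy server a DNS name
# inside the cluster (pg.postgres-external).
apiVersion: v1
kind: Service
metadata:
  name: pg
  namespace: postgres-external
spec:
  ports:
    - port: 5432
---
# Manually managed endpoints pointing that name at the server's IP.
apiVersion: v1
kind: Endpoints
metadata:
  name: pg
  namespace: postgres-external
subsets:
  - addresses:
      - ip: 10.0.0.5                # illustrative IP of the standalone Postgres box
    ports:
      - port: 5432
---
# A job, created once per namespace, that streams pg_dump from the
# source straight into pg_restore on the destination.
apiVersion: batch/v1
kind: Job
metadata:
  name: pg-migration
  namespace: app1
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: postgres:12
          envFrom:
            - configMapRef:
                name: pg-migration  # provides SOURCE, DESTINATION, DATABASE
          command: ["/bin/sh", "-c"]
          # credentials omitted; in a real setup PGPASSWORD would come from a Secret
          args:
            - >
              pg_dump -h "$SOURCE" -U postgres -Fc "$DATABASE" |
              pg_restore -h "$DESTINATION" -U postgres --create -d postgres
```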
Let's have a look at the config map. The config map itself is again quite simple. We have a database; this is only one database, so we will have to use some templating and some patching to be able to specify the source database that we want to migrate. There are a user and a password; of course, use secrets for these, this is just a demo. And then there are the source and the destination, and this is very important: the source and destination are those DNS names I was mentioning before. The source is the external name that points to the main Postgres server, while the destination is the name of the pod dot the service. Because I'm using a stateful set along with a headless service, which is the kind of service that goes along with stateful sets, I can always access my application with the same name, podname.servicename. And because this job pod will run in the same namespace as each database, it will always connect to the right one, and I don't have to specify the namespace itself.

With this, the only thing we need to do is a patch. Let me create the config map, and then I'll show you how to patch it. Let's iterate once again over all the namespaces, so one config map per namespace, 10 config maps in the end, and now let's apply a patch. A patch is a simple Kubernetes construct, let's say; it's a way that we can alter the YAMLs themselves. Of course, you can use, for instance, Kustomize or any templating tool that you prefer; I just want to keep it simple. And, on the other hand, you can do this from a CI/CD point of view very easily: you have your applications or microservices listed, you simply iterate over them, and you tell every application which configuration to apply. So we applied the patch; let's have a look at one of them. We have the main Postgres config and the Postgres migration config, and we can see that in namespace app one we have the database app one. Quite simple, but this allows us to be agnostic and run this in as many places as we want.

Now we just have to create the job for every namespace. Let's iterate over the namespaces once again and create the jobs. While they are being created, let's keep looking at the real-time information. We can see the jobs appearing; there are different ones, one for every namespace. And because the data is obviously quite small, they're finishing fairly fast. Let's have a look at the logs of one of them, let's say app six's logs, and we can see a simple pg_restore restoring our data. That is my data. So now let's go into one of the applications themselves and get a shell. We can see that in app two we have only one database: instead of having 10, we have one. Let's connect to it. We have the same data as before. A bit of trivia: it's just a pg_restore, and in our case pg_restore is not blocking, so for production it's pretty much all right. For other databases like MySQL, if you use MySQL, do be careful, because the equivalent is blocking; there are different tools, so adapt to your specific use case. As always in engineering, we have to find the best solution for our use case, not for someone else's use case. So pretty much we now have 10 databases with their own applications, and they are segmented. We just migrated, in a matter of 14 minutes while I was talking, databases from one big instance that can be difficult to maintain and difficult to upgrade into multiple databases where the concerns are split.
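For reference, the per-namespace config map and the patch look roughly like this. Again, it's a simplified sketch with illustrative names and values, not the exact files from the repo.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-migration
  namespace: app1
data:
  DATABASE: placeholder             # patched per namespace: app1, app2, ...
  SOURCE: pg.postgres-external      # the external service for the main server
  DESTINATION: postgres-0.postgres  # podname.servicename via the headless service
  # user and password are omitted here; in a real setup they come from a Secret

# The per-namespace patch is then just a small override, for example:
#   kubectl -n app1 patch configmap pg-migration -p '{"data":{"DATABASE":"app1"}}'
```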
Let's say that app one, for instance, is very read-intensive. That application, that database, can be scaled for reads, can be improved to scale better for reads: we can add read replicas, or we can scale the resources of that stateful set or the pod itself to make it bigger, or we can specify different configurations. While maybe app two has a different concern; for instance, it's the microservice for user logins, where latency is not as important but consistency is really, really important. Backups don't have to happen at the same time. We have different concerns and different tuning options. We can improve our availability and actually reduce the blast radius of our databases. If our main big instance of Postgres goes down, even if it's a cluster itself, if we have a problem with that cluster we end up in a situation where all our information, all our services, are down. This model, of course, has caveats: there is an overhead to running a database per service, and how you split things depends on your use case. But you get better scalability and a better microservice architecture, in my opinion at least. Cool. The demo is pretty much over. We could keep going and start doing things to the databases, but I think it's a good time to move to the Q&A. If you have any questions about the demo, feel free to ask; I can reshare my screen and answer them live on the command line.

That's fantastic. Thanks so much, Ferran. We've had a few questions come in, so we can go through them. One attendee, Olivier, asked if we can share the repo that we're using for the use case. Perhaps you can send that to the chat window, and I'll mention it as well. We also had a question around what our opinion is about local PVs for Postgres with replicas, such as with Crunchy or with a Stolon implementation. This is a question that comes up fairly often, and we often have a discussion about the pros and cons of the different options. Effectively, if you are thinking of a database like Postgres, availability can be done at the Postgres level, with replication at the database level; it can also be done at the storage level; or perhaps it can be done with both. Each of those options has different use cases. Using a local PV, which is effectively a disk that's only available on an individual node, means that the Postgres instance is very tightly coupled to that node. It does make operations like failover a bit more complicated: if the disk on that particular node fails, it's not straightforward to move the database to another node. Of course, you can do database-level replication in that case and then recover to another database instance. That said, you are now managing two sets of databases and two sets of compute resources, because both databases have to be up to do the replication. On the flip side, if you're using a storage system that provides storage-level replication, and again there are a number of different options for how to do that, you tend to have better portability, so database instances can move around across the different nodes within the cluster. Additionally, you have the benefit of an extremely low recovery time if a database instance actually fails, say because a node or a dataset fails. One of the things that we do come across quite a lot is: how does a failure affect the deterministic performance or the recovery process of a database? For example, if you have a very small database and you're doing database-level replication, then if a database fails, it might be quite quick to sync up another replica of the database. If the database is large, then creating another replica instance of the database could take a long time, and that potentially impacts performance from a disk and a network resource point of view while that recovery is happening.
That's also another reason why we've seen end users deploy both database-level replication and/or disk-level replication in those instances. We have had another question. Sorry, go on. I was about to ask one of the questions for you, Alex, because you hear quite a bit about this: what is the preferred type of storage to deploy for stateful databases? What's the best recommendation? That's actually a really good question as well. The answer is: it depends, and I'll expand on that slightly. In the past, we've often thought of the attributes of a storage system in terms of how the access method for that storage system is defined: so a SAN being block-based devices, NAS being, for example, file-system-based devices, or DAS being storage directly attached to an individual node. The reality is, though, that in a Kubernetes-type environment we have a lot more layers to a storage system. I guess the important thing to understand first is: what are the attributes of your database or your application? Often databases may not necessarily be optimized for throughput, but they probably do require lower latency if they are to handle transactional workloads, for example, and databases also typically have storage requirements for strong consistency, for data integrity, and for strong data durability. All of these different attributes, by the way, are well defined in the CNCF storage landscape white paper, which we publish on the CNCF Storage SIG site; we define these different attributes and explain the different layers within the storage system that contribute to them. One of the things that changes quite dramatically in the cloud-native world is that, because of those layers, it's no longer safe to assume that the way you consume the storage, say as a distributed block system or as a file system, defines attributes such as performance or latency. For example, we see file systems that are sometimes built on top of object stores: such a system might have the file-sharing attributes of a file system but the latency attributes of an object store. Therefore, what you really need to do is understand the attributes of the storage system in relation to what you need out of your databases. Typically, databases, as I mentioned, require low latency and high consistency, so you're looking at a storage system that can provide strong consistency and deterministic latency within these environments. That would tend to naturally select distributed block systems, but that doesn't always have to be the case.

May I answer one of the questions, Alex, while you go through the others, if that's okay? Of course, go for it. There is a question about the size of the persistent volume in the context of databases in Kubernetes. That is definitely a situation where it's not easy to know what size to pick. Obviously, persistent volume claims are declarative, and because they are declarative you have to set them at bootstrap: when you create the PVC, you set the size, and of course you don't know how the data will grow. In fact, some of the storage providers for Kubernetes implement resizing of PVCs. If your database runs out of space because you put 25 gigs and it's not enough, some of the solutions, mostly the software-defined storage solutions, allow resizing of PVCs, so it shouldn't really be a problem.
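As a rough illustration of what that looks like in practice, assuming a storage class that allows expansion; the names and sizes below are examples rather than anything from the demo.

```yaml
# The storage class has to allow expansion for resizing to work.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: csi.example.com        # illustrative CSI driver name
allowVolumeExpansion: true
---
# Growing a PVC is then just a declarative change: bump the requested
# size (for example from 25Gi to 50Gi) and re-apply the claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-vol
  namespace: app1
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 50Gi                 # previously 25Gi
```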
Of course, the answer is that it depends on the size of your database and your data for each dataset, but in the end you can just resize, depending on the storage provider. Yeah, that's brilliant. It's fantastic to see the continued development of CSI and how this enables day-two operations like resize; it's key to the day-to-day operations. Hey, Ferran, one of these questions is for you. A user has asked: is it possible to replicate data from an on-prem database to Kubernetes databases in order to require minimum downtime? Yeah, 100%. What we did today was no more than connecting one application to another; we didn't care about the infrastructure, we just needed the plumbing, the network in between. As soon as your Kubernetes cluster can access the source, whether you have VPN access between instances, they are in the same subnets, or they're otherwise visible through the network, from your bare metal through a VPN to a cloud environment, for instance, or wherever it is, you can just configure one of these databases to pull that data and replicate from that master. From a Postgres point of view, that's not problematic; actually, it's fairly easy. It's asynchronous: when it's ready, it's ready. The only thing you have to do is set up the configuration. When we created the stateful set for the application itself, the Postgres application, that came with a config map (I didn't show that config map today) in which you can specify all the configuration, and there you can configure a standby, more commonly called a passive instance, of a master. I would highly recommend, though, if you do that and you have higher latencies, more than five or six milliseconds, to do it asynchronously; transaction-log replication through Postgres, for instance, is good enough for that. If you set up synchronous replication between a data center and a cloud provider, or anything that is far apart, you will struggle a bit with performance, because the latency affects the input-output operations in synchronous replication.

Brilliant, thanks for that. We've also had a follow-up question on database sizing. The question is about what the recommended size of database storage is to be effective for a database in Kubernetes. I think that's a fairly open-ended question; I don't think there is a specific size that we would necessarily recommend. It can be as small as a few gigs all the way up to several terabytes, based on the actual size of your database. That said, one of the things worth understanding is, again, how size translates to the different attributes of your storage system. For example, depending on the storage provider, you may find that there are IOPS thresholds (input-output operations per second) or megabyte-per-second throughput thresholds that are correlated to the size of the volume. Sometimes you may find yourself having to over-provision the size of the volume to ensure that you have the correct number of IOPS available to run your database. That is definitely a factor which is very specific to the particular storage provider.

Alex, I'm going to shoot a question to you: what is the overhead in terms of memory and CPU to deploy StorageOS on all the nodes of the cluster, specifically in Kubernetes? Specifically with StorageOS: StorageOS is built to be extremely low overhead. It can typically run with a single core and maybe a gig or two of RAM, depending on the amount of activity.
StorageOS will largely coexist with the additional workloads which are running on the cluster. Typically, it's deployed in a hyper-converged type of topology, where nodes are used to provide storage to the pool but those same nodes also run the applications that consume storage from the pool. Of course, the amount of CPU that any storage system utilizes will be tied to the actual amount of activity: if there are lots of IOPS, a high number like hundreds of thousands of IOPS, you'll see higher CPU consumption by the storage system.

We had another question around container-attached storage. Container-attached storage is a term that's used by a number of software-defined storage providers, where those storage providers are actually deployed as containers and provide storage to the cluster that way. In terms of whether container-attached storage lends itself to databases, the answer is obviously yes. There are a number of CNCF projects, like Longhorn, which fall into this category, as well as a number of vendor-supported projects like StorageOS that work in this way and effectively create volumes out of the available storage on the individual nodes within the cluster. I think it is a particularly good fit for databases; it's a very common use case. Things like databases and maybe message queues are among the most common stateful workloads that move into Kubernetes first. A lot of these systems offer more advanced functionality, which we haven't talked about today, things like affinity and locality, which provide additional benefits to databases within the Kubernetes environment.

We also have another question: is it possible to run a Postgres cluster, like a primary-replica setup, in Kubernetes, and also how would you load balance to it and shard it? Is that something you can cover? Yeah, one second, I was answering one of the questions in the chat. You mentioned the primary-replica setup, right? Yeah, cool. Let me re-read the question out loud: is it possible to run a PostgreSQL cluster, like a primary-replica setup, in Kubernetes, and also how would load balancing and sharding work in Kubernetes? In Kubernetes, I would say, that is no different from running it on an ordinary server. What happens is that you have an orchestrator that will handle the startup of applications anywhere in the cluster really fast, which is fantastic. You no longer have applications tied to nodes; you have applications tied to resources. Storage is a resource, no different from CPU or memory. When you're asking for storage, if you have a system that serves that storage, as I was actually doing today with the PVCs, you're just asking for capacity and the storage system is giving you that capacity. Running the application on top of this is no different from running it somewhere else. In fact, because Kubernetes gives you the networking interface and the DNS, it is actually very easy to configure primary-replica, read-replica, or active-passive models. Actually, I would say read replicas are easy; for active-passive, use the storage system, because doing it yourself will be way more difficult, and there it's done for you, so you don't actually need to do anything. When we're thinking about a primary and replicas, you can very, very easily configure a service for the primary, say a stateful set with a headless service, and then it's very easy to configure another deployment or a stateful set that holds the replicas.
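Purely as an illustration (the names and labels below are made up rather than taken from the demo), those two services might look something like this:

```yaml
# Headless service for the replica stateful set: gives each replica a
# stable DNS name (podname.postgres-replicas) and one name that resolves
# to all of the replicas for reads.
apiVersion: v1
kind: Service
metadata:
  name: postgres-replicas
spec:
  clusterIP: None
  selector:
    app: postgres
    role: replica
  ports:
    - port: 5432
---
# Regular service for writes, pointing only at the primary.
apiVersion: v1
kind: Service
metadata:
  name: postgres-primary
spec:
  selector:
    app: postgres
    role: primary
  ports:
    - port: 5432
```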
As I said, I would use a stateful set: you can have a stateful set and scale it to 10 instances if you want, so you would have the primary stateful set and the replica stateful set. Then it's a tiny bit of configuration to sync all the data, using DNS names that never change: even though the applications and the pods restart here and there, those names don't change. Kubernetes handles that front-end DNS for you, and then you can just access the replicas through one service called, say, read-replicas.postgres, and the main one through main.postgres. Then, when it comes to sharding and things like that, you can do exactly the same. The load balancing that runs in Kubernetes is layer-4 load balancing, so it's based on networking. You can configure a load balancer yourself if you want, but you don't need to, because everything is handled for you, whether it is with IPVS or iptables or other components of the CNI, and the access to the network you can do by name. When you hit a service, you will hit any of the pods that serve that service, so when you have to shard, you have to keep the sharding logic in your own application, or Postgres tooling can do that for you, because the routers sit in front and everything is fronted by them. If you take, for instance, Elasticsearch: Elasticsearch does something similar, where you hit some front-end part of the application that knows where the back-end data is, and it does all the aggregation when it comes to sharding, et cetera. It's fairly easy. In fact, I would say that it makes your life easier to run these infrastructure components in Kubernetes, at least in my opinion.

Fantastic. And I think that covers all of the open questions and puts us neatly just a little bit over time. Yeah, perfect. Thank you again, both, for the great presentation today. Are there maybe Twitter handles or a Slack link that you want to share with folks in case they have lingering questions after the webinar? Yes, indeed, drop those in the chat. Oh, or your slides work too. Yeah, by the way, I shared the GitHub repo as well in the chat and in the questions. All of our engineers, Ferran and myself included, are available on Slack, and at StorageOS.com all the demos and use cases are available in the documentation on our website. I would happily discuss use cases and different options for moving databases into Kubernetes. Thank you very much. Great. Well, thanks again for attending. We hope to see you at a future CNCF webinar. A reminder that the slides and recording will be posted later today to the CNCF webinars page. Take care, everyone, and stay safe. Thanks. Bye. Thank you very much.