All right. Hi everyone. I'm Lucas, and today we'll talk about running Postgres as a service in Kubernetes. First, a little bit of background about myself. I'm a member of the Azure Database for Postgres team, where we run Postgres in the cloud on Azure. I'm also a member of the Citus data team, where we distribute Postgres as a database across many shards. And last but not least, I also run a product called pganalyze, which does Postgres monitoring. The reason we're talking about Kubernetes today is that in my work I've built Kubernetes-based products, especially in the Azure Database for Postgres context, and I've also had a personal interest in understanding better how we actually run Postgres in a Kubernetes-like environment. All right. Maybe the most important thing to start with is: why should or shouldn't we run Postgres in Kubernetes? Over the years, a lot of people have tried to run it in Kubernetes and have run into various issues. I would say these days it's a much more normal choice than it was, say, five years ago. The main benefit you get from Kubernetes is consistency. If you run your application and your database in Kubernetes, you have the same mechanisms to manage, control, and roll out new releases for both the application and the database. Next, I would say it's really about portability: having a consistent deployment experience across different clouds instead of relying on cloud-specific provider APIs. Azure, for example, offers a managed Postgres service, but if you look at Amazon, there's a different managed Postgres service, and if you run in AWS, Azure, and Google, it becomes very difficult to have a consistent experience. That's really where Kubernetes, and running the database in Kubernetes, can make a difference, because it is consistent.
Now, the other benefit here is lower latency. We can co-locate the compute and the database in the same Kubernetes cluster, even on the same Kubernetes host, so that if the application does a lot of fast queries, the network latency is minimal. Then, last but not least, I also want to mention why it might be a bad idea to run Postgres in Kubernetes. Today we'll talk about how to do it, but I want to call out that you maybe shouldn't be doing this, especially if you're just starting out with a very small-scale setup; usually a managed database-as-a-service offering, or even just running Postgres in a virtual machine, is a better approach. Running anything in Kubernetes is complicated and complex, and databases certainly don't make that any easier. Let's take a look at how to actually deploy Postgres in Kubernetes and see how that works. The main starting point is usually your Postgres container, which you could deploy in a regular Kubernetes pod with no additional work necessary. But that only gets us so far. You could do this today: just download the Postgres Docker image, set some environment variables, and get going. What I want to talk about today are the things that are missing in that situation; the things you need to add to your setup to really make this work well, and make it work as a service, not just as a one-off Postgres process. There are six things we'll cover today. We'll talk about the Kubernetes integration: if we go beyond that single pod, how can the Postgres database service integrate with the rest of the Kubernetes environment? We'll talk about high availability: specifically, what to think about when you want a highly available service running on Kubernetes.
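As a point of reference for the bare single-pod approach mentioned above, a minimal manifest might look roughly like this sketch. The `POSTGRES_PASSWORD` variable is the one the official Docker image documents; the image tag and password handling here are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres
spec:
  containers:
  - name: postgres
    image: postgres:13            # official image from Docker Hub
    env:
    - name: POSTGRES_PASSWORD     # the image's documented bootstrap variable
      value: "change-me"          # use a Secret for anything beyond a demo
    ports:
    - containerPort: 5432
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    emptyDir: {}                  # data is lost when the pod goes away
```

This gets you a running Postgres process, but as the talk goes on to explain, none of the "as a service" concerns (HA, backups, monitoring, pooling) are addressed yet.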
Then we'll talk about backups. Obviously, closely related to high availability: if your whole cluster goes down, you want to have a backup somewhere. We'll look at monitoring, and at the different solutions that are out there today. We'll talk about connection pooling, where you essentially want to be able to scale your connections when running Postgres in a Kubernetes environment. And last but not least, we'll look at scale-out capabilities: what exists today, plus some details on what our team here has actually built to scale out Postgres on Kubernetes. There are essentially three operators I want to talk about today. An operator, in the Kubernetes world, makes it easier to provision multiple resources and acts as a scheduler as well as an entry point for operating more complicated workloads. The first one we'll highlight is the Zalando Postgres operator. Zalando is a Germany-based business that, I believe, mostly sells shoes online, but they have a very skilled engineering department that runs a lot of Postgres, and they run Postgres in Kubernetes. They built the Zalando Postgres operator, which builds on an HA Docker container solution called Spilo, which in turn builds on an HA solution called Patroni; Patroni is probably the most well-known project they've done in the Postgres community, and the Postgres operator builds on top of that. Second, we'll also look at the CrunchyData Postgres operator. CrunchyData has been an active contributor in the Postgres space for quite a while now. They employ a number of Postgres committers, and one of the ways they provide their Crunchy Certified Postgres is through the CrunchyData Postgres operator for Kubernetes. This one has a lot of advanced capabilities, so we'll look at some of those.
And then last but not least, I also want to tell you a bit about what we've built here in the Azure world: an offering called Azure Arc, which brings Azure services, including the Azure data services, to other environments. This means we actually run a Kubernetes-based service in Amazon or in GCP using Azure technology. I'll highlight a few things there, and specifically we'll talk about Postgres Hyperscale, which is the scale-out version of Postgres that we offer. All right, so looking back at this: what are the important things to consider when you're doing Postgres as a service in Kubernetes? On the Kubernetes integration side, there are three things to look at. First, how do we interact with the Postgres service; not just the process, but the whole management and deployment mechanism? Second, how do these operators integrate with namespaces in Kubernetes? And last but not least, how does it all integrate with storage? Now, looking at our three operators again: the very first is the Zalando Postgres operator. The Zalando Postgres operator is very simple, which I think is both one of its strengths and one of its weaknesses. It doesn't actually have a CLI; when you interact with the Zalando Postgres operator, you interact directly with the operator, or rather directly with the CRDs, and you create everything through kubectl commands. That means it's very straightforward: you have the operator, you have the postgresql CRD (custom resource definition), and then you have the Postgres container or pod that gets deployed. There are many sidecars you can also deploy, but really it's pretty straightforward. Second, the Crunchy Postgres operator. Now, this is where it gets interesting.
And I think a pattern that I've seen work well is where, in addition to kubectl, you introduce an additional CLI. The Crunchy operator calls this the PGO CLI, for Postgres Operator. Using that PGO CLI, you provision and manage things, and you do that through the API server: the PGO CLI essentially contacts the API server and performs various operations that ultimately end up as custom resource definitions being updated, or other operations being done on the Kubernetes cluster, with Postgres again running in a container. And last but not least (apologies, the slide is a bit cut off there), for Postgres Hyperscale on Azure Arc, we've taken a similar approach. We also introduced a CLI, called azdata; this is similar to the Azure CLI, which is just called az. The azdata CLI interacts with the Kubernetes cluster: it again integrates with an API server, that API server talks with the operator and the CRDs, and then, because this is Postgres Hyperscale, there's actually more than a single Postgres container running the workload. One thing we'll see here is that these are all very similar. I would say there isn't anything unique here; the operator pattern has been around for a while in Kubernetes and has really proven to work well. The question you should ask yourself when you evaluate these solutions is: are you okay with going directly to the Kubernetes resource definitions? Then something like the Zalando Postgres operator might be more straightforward or quicker to use, because it's very simple, versus the CrunchyData Postgres operator and the Azure Arc system, which are both much more featureful and much more complex. The other thing we can look at here, and apologies for the font there, is the resource definitions.
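To make the "go directly to the resource definitions" workflow concrete, here's roughly what a minimal Zalando `postgresql` manifest looks like, applied with plain `kubectl apply -f`. The field names follow the operator's documented CRD at the time of writing; the team, user, and database names are placeholders:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  numberOfInstances: 2        # primary plus one replica, managed via Patroni
  volume:
    size: 1Gi
  postgresql:
    version: "13"
  users:
    app_user: []              # roles the operator should create
  databases:
    app: app_user             # database name -> owning role
```

Everything (users, databases, replica count, storage) is declared in the CRD, which is exactly the trade-off discussed here: simple and kubectl-native, but with no higher-level CLI on top.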
So here we've essentially compared the custom resource definitions for each of the operators. You'll see that pretty much all of them have a way of defining the version numbers, and in the resource definition you can also specify which users exist. Both the Crunchy operator and the Azure Arc operator have a way to specify resource limits, and they set a default memory limit of either 28 megabytes or 256 megabytes, which also helps you understand how this gets scheduled. If you just look at this visually, the Crunchy operator clearly has the most options here, for better or worse; it has a lot of customizability and has certainly grown over time. One thing to look at with the Crunchy Postgres operator in particular is the different namespace modes. The way the Crunchy operator defines it (and this applies to all the operators, really) is that there are three ways the operator can interact with namespaces. There's the dynamic mode, which means the operator has the most control and the most permissions: in this mode, the operator can actually create new Kubernetes namespaces, and it can also manage all the RBAC, essentially granting and setting all the permissions. Because of that, it requires a lot of privileges, among other things the ClusterRole privilege, which you may not want to give away, or, depending on where you're at in a team, you may not even have on your Kubernetes cluster. Second, it defines a read-only mode. This is very similar to the dynamic mode, but the difference is that it doesn't automatically create new namespaces or RBAC; it lets you configure that yourself.
From a security perspective, that is a bit easier to audit, but it still has roughly the same permissions and still requires the ClusterRole privilege. And then the most restricted mode is the disabled mode, which deploys to a single namespace and doesn't require any ClusterRole privilege. That one is really for when you're in a very specialized environment where you want to make sure the operator only works within a single namespace with very limited permissions. You'll see that this same pattern applies to the other operators as well; it's just something to think about as you roll this out: which mode are you okay with? Generally, I would start with dynamic, because it's just so much easier to use. Next, let's talk about storage. The thing to mention on storage is that these days, storage is roughly a solved problem: stateful workloads are no longer an issue with Kubernetes. That said, this assumes you use network storage. I would say it's generally a bad idea to use anything other than network-based storage, especially in an environment where you're doing high availability and failing over between servers; it gets very messy once you use local storage. So I would generally recommend something like a premium managed disk if you're running in the cloud, or, if you're running on-prem, some kind of SAN- or NFS-based file share that you can use for storage. Here, essentially, you have the persistent volume that gets associated with a persistent volume claim, and that's really the most straightforward approach. Additionally, if you're using the Crunchy operator, it actually supports tablespaces, which are, I would say, the most advanced mechanism in Postgres for utilizing different storage types.
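For reference, requesting network-backed storage in the cloud case typically comes down to a persistent volume claim along these lines. The `managed-premium` storage class is an AKS-style example; your cluster's class name may well differ:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data
spec:
  accessModes: ["ReadWriteOnce"]      # one node mounts the disk at a time
  storageClassName: managed-premium   # premium managed disks on AKS (example)
  resources:
    requests:
      storage: 100Gi
```

Because the volume is network-attached, a replacement Postgres pod on a different node can re-attach the same claim after a failure, which is exactly what makes failover much less messy than with local disks.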
So that can be helpful if you want to use a really fast provisioned-IOPS type disk for most of your workload, but you have some archive content that you keep on a slower drive. All right, let's look at high availability. For high availability, let's take a step back: what are the scenarios we're trying to solve? We'll start simple: we have a single Kubernetes cluster with an operator running in it, and when we talk about high availability, what we mean is more than one Postgres server running. You could always get some form of reliability or availability (not really high availability) by using shared storage: if your compute crashes, your storage is still there, so you can come up on a different Kubernetes node. In practice, though, that usually doesn't fulfill the requirements on RTO and RPO. RTO especially is often a problem; it's just not fast enough for a large database to bring up a new server from cold storage, especially if you take into account Postgres recovery times and so on. That's why I would always recommend a highly available setup where you actually have a primary and a secondary running, both in the same Kubernetes cluster. And, to call it out explicitly here, use synchronous replication: if you are looking for an RPO of zero, you really need synchronous replication in Postgres, because only that way, when you write to the primary, does the secondary also confirm that the data is actually there. Now, let's take a look at what happens in this scenario when a node fails.
The key capability the operator needs here (and all the operators we talk about today, by the way, have this) is to detect the failure: if the primary fails, there needs to be a mechanism that actually notices that and responds to it. Second, it needs to promote the secondary to be the new primary. And then eventually it needs to be able to recover: after failing over to that secondary, it needs to be able to bring up a new node, maybe on the same Kubernetes node or maybe not, but you need some mechanism to bring that node back up. Now, the other case to look at is: what happens if we have one Kubernetes cluster per region, but two geographically close regions? Say Seattle and Portland, with a data center in each. We don't really want them to be the same Kubernetes cluster, but we do want some kind of high availability between them. I've actually seen this many times: especially for mission-critical workloads, this is an important requirement. It's not really solved well today, but I'll still walk you through it. First of all, in this scenario we need to look at async replication: if there's enough geographical distance between your two Kubernetes clusters, you can't really use synchronous replication, because it would just be too slow to commit anything in Postgres. Second, while the diagram shows the operators talking to each other, in practice no solution today solves this problem. So this is something to call out and ask: are you okay with this being a manual process?
So in the case that the data center fails, what you would ideally want is for the operator to be able to promote the standby in the other cluster to primary. The difficulty here, of course, is: how do you handle a network partition? How does the operator get enough information to safely promote that node in the other cluster to be the primary? If you compare the different operators (the Zalando operator, the Crunchy operator, and the Azure Arc operator), they all have HA within the same Kubernetes cluster built in, so this is something you generally don't need to worry about. There are various configuration settings, especially around whether you want to use synchronous replication or not; again, I would say you generally should use synchronous replication, and I think it's a bad idea if you don't. But they all come with this built in. On the other side, though, HA across different Kubernetes clusters is usually left as an exercise to the reader, or to the person setting up the Kubernetes environment. The thing to do here is to streamline this process as much as you can. Assume that you will have a region-wide failure and thus a cluster-wide failure; the most important thing is how quickly you can bring up the other cluster, or, if it's already up and running with a read replica, how you can promote that replica to primary. With all of these operators, it's important that you actually exercise this and make sure it works well. Last but not least on high availability, also think about how you use Kubernetes labels in your environment. Kubernetes labels allow you to differentiate nodes that might otherwise look the same.
Same compute, same instance size, but, for example, these are the labels from an Azure Kubernetes Service cluster. You'll see that the whole cluster gets the region-wide label applied: this cluster is in westus2, so the whole cluster has a westus2 label. In addition to that, we see the different zones: there were three nodes deployed here, each in one zone. Depending on how your operator works, it may deploy Postgres in different zones or in the same zone, and some of them also let you configure that. It's something to think about, because maybe it is good that you're all in one zone: what you're trying to do is co-locate your application and your database for performance reasons. In that situation, you probably want pod affinity; you actually want things close together. But there are also situations where you might want pod anti-affinity, where you want Postgres Kubernetes pods placed in different zones. With the Crunchy operator, the Zalando operator, or Azure Arc, for example, you can specify these and ensure that pods get spread into different zones. All right, let's look at backups. Backups are pretty straightforward, and we'll actually get back to this at the end with a little demo. The different operators take different approaches to backups, though. The Zalando operator takes, I would say, the simplest approach, which is that it utilizes WAL-E, a third-party Postgres tool that has been around for many years. Now, WAL-E can only back up to object storage, so S3 or Azure Blob Storage.
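As a sketch of what pointing the Zalando operator's backups at object storage involves: its operator ConfigMap has keys that Spilo passes through to WAL-E. Key names are taken from the operator's documentation at the time of writing, and the bucket name and region below are placeholders:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-operator
data:
  # Spilo hands these through to WAL-E for base backups and WAL archiving
  wal_s3_bucket: my-walle-backups   # placeholder bucket name
  aws_region: us-west-2             # placeholder region
```

Once this is in place, the same archived WAL in object storage is what powers the point-in-time restore capability discussed next.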
So if you use the Zalando Postgres operator with backups, you really need something configured outside of the cluster that it can back up into, and the same goes for powering the point-in-time restore capabilities: you really need that object storage you can utilize. Next, the Crunchy operator: it actually has backups to local volumes built in. It utilizes pgBackRest, which is a much more flexible solution than WAL-E, also built by Crunchy themselves, so of course they know how to use it well. And for off-site backups, they again utilize Amazon S3. So if you're trying to get a backup outside of your cluster, it will ultimately end up in S3 again. All right, last but not least, the Azure Arc operator. That one also has a way to back up to local volumes. It's a built-in mechanism, so it doesn't utilize any existing open-source tools; it's just built right in. It's also able, same as the others, to do point-in-time restore. The off-site backups are done in what I would say is a simpler but flexible way, using Kubernetes volume mounts: with volume mounts, you can mount not just local storage but also remote storage, and there's a mechanism to synchronize between the local storage and that remote storage on a scheduled basis, so that you have both local and off-site backups. All right, let's look at monitoring. Monitoring is, by the way, something I've spent a lot of time on, given that I run pganalyze, which is a monitoring product. Today, however, I want to focus on what's built in.
So what are the capabilities you get just from using these operators? You could, of course, still use other third-party tools: pganalyze, Datadog, Cortex; there are many Postgres monitoring tools out there, and usually they all work pretty well with Kubernetes. But here we're just looking at what's built directly into the operators. The Zalando operator is, I would say, the weakest in this sense: it doesn't really have anything built in. You could still graph the Kubernetes pods and so on, but there's nothing built in that specifically looks at Postgres metrics or captures the Postgres log output. The Crunchy Postgres operator utilizes Grafana. Grafana is the standard these days, I would say, for visualizing metrics, and it uses Prometheus underneath. For logs, it utilizes pgBadger, a standard Postgres log analysis tool. You could always download logs from a Kubernetes cluster and run it yourself, but what they've done in the Crunchy operator is provide a way to provision a sidecar that runs pgBadger on a scheduled basis to analyze the logs for you, which is a neat addition, I would say. And last but not least, with Azure Arc, we've also utilized Grafana (again, I'm a big fan of Grafana), and we've utilized Kibana for logs. Now, the other interesting thing here, which speaks a little to my personal perspective on this, is that we're not just trying to run something locally in the cluster. We're also building a connection to Azure, where we're essentially saying: everything that runs in a Kubernetes cluster can also be linked to Azure.
It doesn't have to be linked to Azure, but you can link it to Azure if you see a benefit in that. One benefit is that we can actually provide the Azure monitoring capabilities: you can treat the Postgres that runs inside that Kubernetes environment, which could be running anywhere, the same way as a managed Postgres offering in Azure. You can use Azure Monitor or Azure Log Analytics, pretty straightforwardly. From an experience perspective, that is very consistent: here's a quick screenshot, and this is the exact same experience you have for any other kind of resource in Azure. I know the team has invested a lot of time making this work; it was not straightforward to implement, but I think it's really nice to have a single-pane-of-glass experience across your different services, running both inside and outside Kubernetes, across different clusters. All right, let's look at connection pooling. I assume people are familiar with PgBouncer, but I'll repeat it for those who are not. PgBouncer is, I would say, essential for running anything at scale with Postgres. PgBouncer really helps you reduce the cost of an idle connection: if you run PgBouncer in transaction pooling mode, it really reduces the overhead of keeping connections open. As an example calculation, let's say we have 20 application servers, with 50 connections for each of those application servers. Generally, if you use a framework like an ORM, it allocates a fixed number of connections when it starts. Even if the application doesn't actively need them, it pre-allocates them so that there's no connection initialization latency.
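A hedged sketch of what the transaction pooling setup for this example might look like in `pgbouncer.ini`; the host name and pool sizes are illustrative, but the option names are PgBouncer's documented ones:

```ini
[databases]
; forward the "app" database to the real Postgres service (names illustrative)
app = host=postgres-primary port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; transaction pooling: a server connection is held only while a transaction runs
pool_mode = transaction
; many mostly-idle clients are cheap on the PgBouncer side...
max_client_conn = 5000
; ...while Postgres only ever sees a small, fixed pool per database/user pair
default_pool_size = 50
```

The gap between `max_client_conn` and `default_pool_size` is exactly the order-of-magnitude difference between client connections and actual Postgres connections that comes up next.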
Now, what that leads to is that there are essentially 1,000 or 2,000 or 5,000 connections to Postgres that need to be kept open, and Postgres is actually not good at this. This is a known issue; folks are working on improving this in Postgres itself, but it will take time to really get better. So what people usually do is put PgBouncer in front of Postgres, in transaction pooling mode. That means a client connection only needs to be mapped to an actual Postgres connection while a transaction is open; before a transaction is open, it just sits there as an open TCP session on PgBouncer. That's where you get essentially an order of magnitude of difference between the number of connections you can handle in PgBouncer and the number of connections Postgres itself could handle. In practice, with PgBouncer you could have thousands of connections without much overhead. Looking at our different operators again: I think this is something where the team working on Azure Arc, for example, knows it's a gap, and they're working on adding it. Both the Crunchy and Zalando operators have been around for a while and have already built this in. It's literally like flipping a switch and using a different port, and, boom, you have your PgBouncer. And I think that's really the experience you want: having it built in so you can just easily use it from your application. All right, last but not least, let's talk about scaling capabilities. I want to start by mentioning the obvious (apologies for the font being cut off there): the most straightforward way to scale Postgres is to use read replicas.
Well, first of all, the most straightforward way to scale Postgres would be to just make the database bigger. But especially in a Kubernetes environment, you may not want to introduce one node that's suddenly much larger than the others. Let's say all your Kubernetes nodes are 16-core instances with 32 or 64 gigabytes of memory. The traditional way of scaling up Postgres is to just get the largest instance, and you can still do that, of course, but oftentimes it's not desirable to introduce an odd node into the setup. So read replicas help you scale the read workload. That means your application essentially has to decide, when it makes a query, whether it's a query that only does reads (essentially a select statement against the database) or something that needs to write to the database and actually change the data. The other thing to call out with read replicas is that you have the exact same workload everywhere: across all the different servers, the primary and the secondaries, you have the exact same data size, which means the maximum storage size you can handle per server is the same as the maximum data size across everything. This can be very limiting: as you grow larger and larger, to terabytes of data, at some point you may want to use different storage systems and split that workload, and read replicas don't really support or allow that. All right, now here's where the Citus extension, or Hyperscale (Citus) as a service, comes in.
First of all, I want to mention that Citus is open source, so you can just go to GitHub and look at the source code. It's an extension for Postgres; nothing hidden there, and it's pretty straightforward in terms of how it works. A lot of the magic is in how queries get routed. The way this works is that you have an application that talks to the database, and here it talks first to what's called a coordinator node. The coordinator node is the entry point into the system: that's where you run, say, a select statement against a certain table. Ultimately that query ends up going to different data nodes, depending on which part of the data you're accessing; or, if you're accessing all the data, like doing a count(*), it actually summarizes the data in parallel across all the nodes. The benefit is that, first of all, you can scale both read and write performance; there's also a mode where you go directly to the data nodes instead of going through the coordinator. And the main benefit, I think, is that if you have a lot of data, it allows you to go beyond the limit you often see around a couple of terabytes, where Postgres starts being difficult and frustrating to use. That's really where something like Citus shines, because it allows you to have smaller nodes again and smaller tables. The other benefit is that it's more gradual. Just to call this out: this is really the difference from scaling up your database by continually increasing its size, which, again, in a Kubernetes environment is often not that easy, because you have same-size nodes.
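To illustrate the routing idea described above, here's a toy stand-in in Python. This is not Citus's actual hash function or shard placement logic (Citus uses PostgreSQL's internal hash functions, and shard placement is configurable); the worker names and shard count are just illustrative, with 32 matching Citus's default shard count:

```python
import zlib

# Hypothetical names: three data nodes, and Citus's default of 32 shards.
WORKERS = ["worker-0", "worker-1", "worker-2"]
SHARD_COUNT = 32

def shard_for(distribution_value: str) -> int:
    """Map a distribution-column value to a shard.

    crc32 is just a deterministic stand-in for Postgres's hash functions.
    """
    return zlib.crc32(distribution_value.encode()) % SHARD_COUNT

def worker_for(distribution_value: str) -> str:
    """Round-robin shard placement across workers (a toy placement policy)."""
    return WORKERS[shard_for(distribution_value) % len(WORKERS)]

# A query filtered on the distribution column routes to a single worker;
# a query over all data (like count(*)) would fan out to every worker.
print(worker_for("customer-42"))
```

The point of the sketch: once data is split by a distribution column, adding workers adds both storage and compute capacity, which is what makes the scaling gradual.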
And so really the benefit of a sharding-based architecture, like you can do with Citus, is that you can essentially just add more nodes. Here I illustrate that with a slider where you can just increase the node count, or pod count in Kubernetes, and that allows you to increase how much capacity you have. All right, with this I'll now switch to showing you a little bit of what the team has worked on here. Let me see that this works. I will start a screen share. All right, so here we see a deployment that I've done earlier; but maybe let's start from the beginning. What we're looking at is a virtual machine running a Kubernetes environment; for ease of demoing, it's all in a single VM. Now, you'll notice that things are not always fast, and that's because I'm running in a single VM, but we are running a Kubernetes cluster here. To illustrate that, if I do kubectl get pods, we can see there's a bunch of pods already running. We have our control system portion, which is all in the arc namespace. Then there's a Postgres deployment I provisioned earlier called pg01, which I'll show you more about in a second, and some other internal Kubernetes system aspects running. So this is pretty straightforward Kubernetes. Now, coming back to this popular CLI: I could, of course, go in here and manage things through the kubectl CLI, but it gets a bit cumbersome, especially if you're thinking not just about single pods but about server groups, with HA and backups coming in; it gets very complicated to go through kubectl.
And so that's why we have the azdata CLI here. The azdata CLI has different namespaces; it can, by the way, also support SQL Server deployments, not just Postgres. So now we'll do azdata postgres, and let's just start by listing our Postgres instances. What this does is go to the API server, and through the API server it queries which Kubernetes resources exist through this database service CRD. Here we can see we have this one pg01 resource. Now, let me illustrate how you provision a new one. This is the same command I ran earlier; I'll just run it again with a different name. The things to mention here: we have the name, pg02 in this case, and the namespace it gets deployed into. As we talked about earlier, the namespace model does make a difference. By default this also operates with the more privileged model, because it's just much easier to use, but that does mean it requires the cluster role privilege and so on. Next, you specify your worker count. Here we say w=2, which means we want one coordinator node, in Postgres Hyperscale terminology, and two worker data nodes. And for each of those data nodes, we want a data volume size of one gigabyte. We also provision a backup volume; we'll use that backup volume in a moment to store our backups. And last but not least, we also link this to Azure, because what we're doing here is really building an Azure data service on top of Kubernetes. So you specify an Azure subscription and an Azure resource group, which then allows you to surface the data into Azure, into the Azure Monitor system, for example.
All right, so we got our new server group here. I'll just connect to it to illustrate. First of all, maybe to call out the extension here: again, we're using the Citus extension for distributing data. It's open source on GitHub; you can take a look at it and run it yourself easily, in a Docker container, in Kubernetes, and other places. So pretty straightforward to use. What we'll do now is create a distributed table. We'll call it test and give it, let's see, an id column. And I typed this the wrong way. All right, there we go. Then I'll insert something into this table, and if I now do a select, this works, a standard select. Now, the important thing to mention: so far I've just created a table. This is a regular Postgres table, and if you do an EXPLAIN on it, you'll see that this just goes directly to the table and does a sequential scan on the local table. The next thing I'll do is create a distributed table. What this did, and this works for tables both large and small, is it first created the different shards for this table. A shard, in Hyperscale terminology, is a portion of the data; each of the data nodes holds multiple shards. By default we create 32 shards, so we created 32 shards for this test table, and then we copied the data from the local table, which in this case was just a single row, onto these shards. And now if we run the EXPLAIN again, we can see that we're actually going through the Citus executor, through a custom scan in Postgres, and we're reaching the distributed nodes running on our Kubernetes pods. Here, for example, we can see it references PG02 S000.
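The 32 shards mentioned above come from hash-range sharding: the 32-bit hash space is divided into a fixed number of equal ranges, and a row's distribution-column hash picks its shard. Here is a rough sketch of that mapping, assuming 32 shards as in the demo; the hash function used here is a stand-in, not the one Citus uses internally.

```python
# Sketch of hash-range sharding: split the signed 32-bit hash space into
# SHARD_COUNT equal ranges and map each row's distribution-column hash to
# a range. Python's hash() is a stand-in for the real hash function.
SHARD_COUNT = 32
HASH_MIN, HASH_MAX = -2**31, 2**31 - 1
RANGE_SIZE = (HASH_MAX - HASH_MIN + 1) // SHARD_COUNT  # 2**27 per shard

def shard_for(value):
    """Map a distribution-column value to a shard index in [0, SHARD_COUNT)."""
    h = hash(value) % 2**32 + HASH_MIN  # fold into the int32 range (stand-in)
    return (h - HASH_MIN) // RANGE_SIZE

# Every value lands deterministically in exactly one of the 32 shards.
for v in (1, 42, "some-key"):
    assert 0 <= shard_for(v) < SHARD_COUNT
print(shard_for(1), shard_for(42))
```

Because the mapping is deterministic, the coordinator can route a single-key query straight to the one shard (and thus the one pod) that holds it, which is what the EXPLAIN output above shows.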
And so if I go out again now and list my pods, in the default namespace, you can see that this host name maps to one of the pods here. So it really has a pretty straightforward mapping into the Kubernetes environment, but it's still easy to manage, because it's still a single Postgres and you still query it through a single connection. Now, the other cool thing I want to show you: right now we have two data nodes, two data pods of sorts, powering these queries. What we can do is actually scale up. I'll use azdata postgres server update, specify pg02, and also specify the worker count. Here I'm essentially increasing capacity from the initial two workers I specified to now four workers. This lets me essentially double the amount of Kubernetes resources I'm using, and it doesn't have to be on the same Kubernetes node; this could be many, many nodes. I could specify w=100 here, for example, if I wanted to scale up that much, and it would allow me to really have limitless scale beyond that single Kubernetes node. And now if I go back here while this is scaling out and keep running my queries, you can actually see in an internal table that the two nodes we added are not yet online; they're still being added, still initializing Postgres. But even while this is happening, my queries keep running, my queries keep functioning. All right. With that, I just want to give a brief shout out to backups, and then we'll end. We also have a backup mechanism here. Just briefly: you can set up backup schedules.
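The reason adding workers increases capacity without touching the table definition is that the 32 existing shards simply get spread across more nodes. The round-robin placement below is a simplification of what a shard rebalancer does, and the worker names are illustrative.

```python
# Sketch of shard placement across workers: adding workers redistributes
# the fixed set of 32 shards over more nodes. Round-robin placement is a
# simplification of a real shard rebalancer; names are illustrative.
def place_shards(shard_count, workers):
    placement = {w: [] for w in workers}
    for shard_id in range(shard_count):
        # Assign shards to workers in round-robin order.
        placement[workers[shard_id % len(workers)]].append(shard_id)
    return placement

before = place_shards(32, ["worker-1", "worker-2"])
after = place_shards(32, ["worker-1", "worker-2", "worker-3", "worker-4"])
print([len(v) for v in before.values()])  # [16, 16]
print([len(v) for v in after.values()])   # [8, 8, 8, 8]
```

Doubling the workers halves the number of shards, and therefore the data and query load, that each pod has to carry, which is the "scale out by adding pods" model the slider in the earlier slide illustrated.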
And this again works with all the operators, except maybe the Zalando one, where it's a bit more complicated. The key thing here is that I can take a backup, I can go back to that backup, and I can do a point-in-time restore. To briefly illustrate the restore option we have here: we have the ability to not just say, go back to this specific backup, but really go back to a particular time. I can say, let me use the Postgres WAL to go back to this particular date. So if, for example, something happened yesterday at 9 p.m., I can go back to 8:55 to recover some lost data. And I would say this is a key capability that you really need in any kind of solution you're looking at. All right, so with this, let me go back here. And that's everything, really. So today we looked at the different operators that assist with running Postgres, and we also briefly discussed the option of just running it yourself. Again, I would really encourage you to look at one of these options and try them out; Azure Arc is actually still in preview, so if you're interested in testing Azure Arc, please send me an email. I'd be happy to share access and get your feedback on it. And if there are any other questions, I'm happy to take some questions now. All right. Hi, folks. Thanks for listening to the talk. I don't think we have any questions just yet, but I'll wait around for a minute or so to see if somebody has questions. There was also quite a lively Hacker News discussion yesterday, because somebody had posted the slides online before the session. I'll post that into Slack later if you're curious why different folks on the internet think it's a good or bad idea to run Postgres in Kubernetes.
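The point-in-time restore described above works in two steps: pick the newest full backup taken at or before the requested target time, then replay WAL from that backup up to the target. Here is a minimal sketch of the backup-selection step; the backup timestamps are made up for illustration.

```python
# Sketch of point-in-time restore target selection: choose the newest full
# backup at or before the restore target, then (in a real system) replay
# WAL from there up to the target. Timestamps are illustrative assumptions.
from datetime import datetime

backups = [
    datetime(2021, 5, 3, 2, 0),
    datetime(2021, 5, 4, 2, 0),
    datetime(2021, 5, 5, 2, 0),
]

def base_backup_for(target):
    """Return the newest backup taken at or before the restore target."""
    candidates = [b for b in backups if b <= target]
    if not candidates:
        raise ValueError("no backup old enough for this restore target")
    return max(candidates)

# Something went wrong at 9 p.m. on May 4th; restore to just before it.
target = datetime(2021, 5, 4, 20, 55)
print(base_backup_for(target))  # 2021-05-04 02:00:00
```

The WAL replay is what makes the "8:55 instead of 9 p.m." granularity possible: the full backup only has to be somewhere before the target, and the WAL carries the database the rest of the way.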
I think there are a lot of strong opinions, but also some good experiences in there that you can use. Perfect. And somebody says thank you. Thanks! All right, let's see, there's a question from Jonathan. Jonathan is asking: what's under the hood to take backups? Yeah, that's a great question. Ultimately, in Postgres, I would say there are commonly two components to a backup. There is the full backup that you need to take, which ultimately comes down to calling pg_start_backup in Postgres, copying out the files, and then calling pg_stop_backup. That's something that, for example, the Azure Arc operator does for you, versus you could also do it yourself, of course. In the case of the Crunchy operator, for example, pgBackRest does that for you. So that's the core capability on the full backup side. In addition to that, there's the WAL stream that needs to be backed up. On the Azure Arc side, for example, the WAL files just get copied to a volume, pretty much the same as the regular full snapshot of the disk; versus what Crunchy, for example, does, or the Zalando operator does, is work against object storage: you specify an object store that things get copied into. Let me see. Do all the operators use stateful sets under the hood? Yeah, that's a great question. I believe they don't. I would have to double check this myself, I don't actually know right off, but I don't think they do these days. I think because you're going through an operator, you don't really need the concept of a stateful set, in my understanding. And it was Ryan asking that question. Any other questions from folks?
I also realized Mark shared that there was an issue seeing the screen share. I apologize if that didn't work; I think it did come through here successfully, so I could see that it was working. I'll share the slides later; I don't have the screen share in them, but I think the recording should have it. Great. All right, let's see. If there are no more questions, then I'll be on Slack as well, in the open source databases track. Perfect, one more question from Ryan. Ryan is asking: do you have thoughts on how Citus compares with Vitess? That's a great question. So Vitess, for those who don't know it, is a MySQL scale-out solution; it's essentially trying to do for MySQL what Citus does for Postgres. I think the way Vitess approaches it is a bit different. Citus tries to do everything inside the Postgres extension that Citus is, so I would say it's more self-contained inside that one extension component to Postgres, which also makes it very easy to run locally. Vitess, I think, is actually very good at separating out different components. What Vitess is good at is, if you have a scale-out architecture, especially in Kubernetes, instead of having everything inside something like an extension, it has different components: one server for accepting queries, another server for managing a particular data aspect. So Vitess is essentially a bit more split up into different components. But ultimately, I think the benefits are the same: you're moving away from the model where you have a single server, which, again, is a really bad fit in a Kubernetes world, because you can't just make your node bigger. I mean, you could, but it's very limiting.
And so I think both Vitess and Hyperscale (Citus) have the benefit that, instead of scaling up, you're changing the methodology to scaling out and adding more resources. I would say both solve the same problem there. So if you want to use MySQL, then use Vitess; if you want to use Postgres, then Hyperscale (Citus) is one of the ways to go. Cool. All right, well, thanks everybody for your questions. I appreciate it. I will be around in Slack, and I will also share the Hacker News thread that I talked about earlier. Thanks everybody.