Well, welcome everybody. I'm Josh Berkus. I work for Red Hat in OSAS, also known as the Community Department, where I get to work on Kubernetes, which is truly awesome. And if you know me from earlier times, or looked at my CV, you know the way I got into Kubernetes was actually moving Postgres to Kubernetes. So this has been my sort of personal project for a while. Of course, what you're about to see is not just my personal project, or even primarily my personal project. The lion's share of the work is done by the engineering team at Zalando, specifically the database team at Zalando. We started collaborating two years ago on the Patroni project. Nobody from that database team could be here at KubeCon, which is why I'm on this podium by myself. But I'll tell you all about it, and I'm sure they will be at KubeCon Europe talking about things.

So my goal, and really the goal of the Patroni project from the beginning, is to make deploying PostgreSQL on Kubernetes, or frankly, deploying PostgreSQL in any form, boring. That is, you shouldn't be boasting about deploying PostgreSQL on your internal cloud, or on your public cloud, because it should be something that everyone does with a single command. That's our goal. Now, here's where "boring" comes from: Matt Asay posted an article a couple of days ago talking about how, despite being something like 27 years old at this point, PostgreSQL is one of the hot, hip databases to use with all of your new technology tools. And one of the reasons people like using PostgreSQL even with new languages and new servers and new technologies is that it is boring. What he means by boring is that it is not something you have to think about. As in, you deploy Postgres, and then you save your data to Postgres, and if Postgres says your data is saved, it's saved, and if Postgres says your data is replicated, it's replicated, and you don't really have to think about whether that's true, or about sharding strategies, or about all kinds of other stuff that you have to think about with some of the more interesting scale-out databases. Now, it's a fair statement: Postgres is not hip, and it takes a bit to set up, but after you're done, you have a reliable workhorse. What I want to do is take out one part of that statement, "it takes a bit to set up", and that's what this project is about, and Kubernetes is a big help with it.

So I'm going to explain a few things here. I'm going to explain the Patroni project and the tool in general, for anybody who hasn't seen one of the earlier presentations on earlier versions of the tool; explain what we changed recently, for anybody who has seen my previous presentations; then actually show you the Kubernetes-native version of it and demo that, along with a brief demo of the Zalando Postgres operator; and then talk a little bit about another Postgres operator. Now, for anybody who has seen this before, or read any of my stuff about Patroni, et cetera: what's changed since the last KubeCon is, first, a lot tighter Kubernetes integration, in order to rely more on Kubernetes for all of the infrastructure pieces, and I'll show you that. The second thing is that we have an operator now, which is fairly alpha, but there's an operator out there for anybody who's familiar with the operator pattern, and again, I will show you that.
And then the third thing, of course, is that Patroni on Kubernetes, et cetera, is now being used in production in a couple of different places. As a result, we've learned a lot by having it deployed in production and relied on, and those improvements have gone into Patroni.

So, some terminology. Postgres, PostgreSQL: that's the database we're talking about. The one that's been around, again, since the 80s, and yet continues to add new things. Patroni is a high-availability daemon that runs inside a container; I'll explain more about that in a moment. And Spilo is a distribution that wraps Patroni and PostgreSQL and some other common utilities in a container, to be your container of Postgres goodness. You'll see those words again in this presentation, and that's what the three different things are. In particular, I'm going to talk a lot about Patroni, the high-availability daemon. It does a few things. You run it in your container as PID 1. It controls whether or not Postgres is running in that container. It automates failover and replication for your cluster of whatever defined size. And it supplies a management API for the things that you can't do through the normal Postgres interface.

Now, the question is: what's the idea? What's the design behind this? Specifically, why are we doing Patroni rather than using some other system, and what are our goals? There are three basic ones: simplicity, usability, and availability. What I mean by simplicity is this: the majority of us Postgres geeks think of Postgres as this huge constellation of services that does all these different things, where you've got foreign data wrappers and you've got monitoring tools and you've got all these other things. For us, having Postgres be big and complex is actually a virtue. For everyone else, Postgres is one thing. It's an endpoint, and they want it to be one endpoint and not 16 different things. So our goal, partly with Patroni and Spilo, is to make Postgres one endpoint and not 16 different things. To rely on Kubernetes as much as possible, so that the Patroni install does not have to be complicated, and instead we're simply relying on the tools that ship with, or that we can enable with, Kubernetes. And to supply a set of defaults that we believe just work for most users, instead of requiring anybody to go through any kind of complicated configuration for a simple database that's going to support, say, a web application.

For usability, we've got a couple of things. One is that there are some things, in terms of controlling PostgreSQL, that you can't do through the port 5432 SQL interface. That includes things like changing security settings, starting and stopping, initializing replication, et cetera. So we supply access to those on a separate port with a RESTful interface. There's also some configuration and plugins to support a variety of cloud environments; for example, there's a whole set of plugins for different backup tools, for continuous backup to cold storage, so that you still have your data even if you lose your entire cluster. And then the other thing is that, unlike a lot of public cloud providers, we want you to be able to run something that places no limitation on what Postgres plugins or configuration options you're going to use. Maybe you want some obscure plugin for doing, I don't know, biological data on Postgres that you can't get on Amazon RDS; we want you to be able to install that.
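As a side note on what that looks like concretely: Patroni itself is configured with a small YAML file, and you can see the management API getting its own port alongside Postgres's 5432. This is a minimal sketch, assuming the option names in the current Patroni documentation, so treat the details as illustrative rather than verbatim from the demo:

    scope: patronidemo              # name of the HA cluster this member belongs to
    name: patronidemo-0             # this member's own name
    restapi:
      listen: 0.0.0.0:8008          # the management API: health checks, restarts, reinit
    postgresql:
      listen: 0.0.0.0:5432          # the normal Postgres SQL interface
      data_dir: /home/postgres/pgdata
    bootstrap:
      dcs:
        ttl: 30                     # how long the leader key lives before failover starts
        loop_wait: 10               # how often each member checks in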
And then, of course, availability: we're talking about fully automated replication and failover. That's it. It should just work. You've got three Postgres nodes, one of them is the master at any given time, and if the master goes away, you get a new master.

So now let's talk a little bit about what's changed recently. And by "changed recently", I mean this is still a feature branch. Our Patroni design started out before Kubernetes was really usable as anything, but we had etcd. So we designed it where you spin up an etcd cluster to provide a single source of truth. I think everybody here knows what etcd is and what it does. Yes? Yeah, at least by this point in the conference you ought to. So, to supply that single source of truth, you run a separate etcd cluster on your Kubernetes, and the Postgres nodes communicate with and update that etcd cluster. At the same time, the Kubernetes cluster maintains things like "I should have three nodes in this particular Postgres set", or five nodes or whatever, and "they have this storage", et cetera. And that works pretty well; that's actually what's in production in several places. However, we looked at this and said: you know what, that's still more complicated than it needs to be. Why do I need this separate etcd when I already have a single source of truth, which is called the Kubernetes API? So what we've done recently is get rid of that separate etcd by default. There are always going to be reasons why you'd want to run a separate etcd for some clusters, so it remains an option. But again, simple defaults: as the default, instead of having a separate DCS, everything goes through the Kubernetes API. Leader elections go through the Kubernetes API, as do the other structures and everything else. This is as simple as possible, because the idea is that I can now deploy a high-availability Postgres cluster that consists of three and only three pods, instead of needing several different services.

So now, because most people in the room have not seen any Patroni presentations before, let me go over how this works in terms of high availability with an animation, and then I'll actually demo it in action. The basic idea is that we have our container with Postgres inside, and the Patroni daemon running in the container as PID 1. We tell Kubernetes to create a stateful set, and it creates a whole bunch of these and distributes them. So we've got a bunch up and running that are currently empty, because we're deploying a new cluster. What happens is they all communicate with the Kubernetes API and say "we want to have a leader election", by updating a config map and an endpoint. They have that leader election, and one of the nodes wins. That node becomes the master, at which point the other two nodes start replicating from it. Now, things can happen to the master, particularly in a container cloud environment. If nothing else, we can run out of resources on that node, and Kubernetes can decide that it needs to migrate that container, right? So if something happens to the master, the master key in the config map times out, the remaining replicas hold another leader election, one of them wins and becomes the new master, and the other one starts replicating from that new master.
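Concretely, the leader lock they're competing for is just a regular Kubernetes object. As a rough sketch, assuming the annotation names the current Patroni Kubernetes code uses (illustrative, not captured from the demo), the leader config map looks something like this:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: patronidemo-leader
      labels:
        application: patroni
        cluster-name: patronidemo
      annotations:
        leader: patronidemo-0                     # the pod currently holding the lock
        acquireTime: "2017-12-07T10:00:00+00:00"
        renewTime: "2017-12-07T10:02:30+00:00"    # the master's last check-in
        ttl: "30"                                 # past this staleness, the replicas elect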
This all happens in the background. And then, of course, Kubernetes will notice that it does not have enough nodes in its stateful set and bring up another Postgres pod, which will then replicate from the existing master, making sure that you constantly have a full set. And then, via Kubernetes services and endpoints, we make sure that there is always a connection to the master available.

Let me actually go into a little more detail on that failover, because it is a little more complicated than just making a call to the API. What happens is the master vanishes, and its key has a timeout: we've got a timestamp on it, so we know the master hasn't checked in in a while. The replicas, who are checking in, realize the master is not there, and they grab a lock. Currently, the way we do that is by having them redefine the master endpoint, because in the way Kubernetes currently works, that's an operation that only one of them can win. The winner then checks its current replication status, to make sure that it didn't already have broken replication before the master went down. If its replication status is okay, it updates the config map indicating that it is the master, and then the other replicas, who were not able to grab the endpoint, will remaster off of that new master, and you have a new cluster. So it's a two-stage failover, which can affect you if you have a really crappy network: if your network is really crappy, it's going to be a little slow.

So let's actually do that. Actually, let me show you what's in this before I run it, because it creates very fast. We've got here a sample manifest. This is just a slight modification of the one given as an example in the Kubernetes branch of the Patroni repo, so you can take a look at it yourself later. Basically, we define a cluster name in there, we have application labels, and we're telling it to do three replicas. For high availability purposes, I recommend always having at least three, because there are a lot of things that can cause two to fail. Then we have our container here. I have a slightly modified image, because this is a feature branch and I've been debugging, but it will be merged before the end of the year. Then we just have the definition of the container and everything else, and your usual volumes. Then we define a whole bunch of environment variables that get passed through to the Patroni daemon in order to configure the behavior of Postgres. The usual things: password, ports; and if we were going to put in any kind of performance modifications from the Postgres defaults, like how much memory it uses, et cetera, we would also pass them here. I've got the volumes commented out because I've got this running in Minikube and there are no actual volumes. Then we have to define a couple of other things. We actually want to define a separate endpoint. In case people are not familiar with endpoints, because you might not be: normally, when you create a Kubernetes service and you give that service a selector, in the background it creates something called an endpoint that actually allows things to connect. However, you can create a service without a selector, in which case there is no endpoint, and you can define your own endpoint. Most of the time people use this to connect to services outside of Kubernetes.
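Condensed way down, the pieces of that manifest look something like the following sketch. This is based on the example in the Patroni repo's Kubernetes branch rather than the verbatim demo file, so treat the names, API versions, and values as illustrative:

    apiVersion: apps/v1beta1
    kind: StatefulSet
    metadata:
      name: patronidemo
    spec:
      serviceName: patronidemo
      replicas: 3                        # at least three for real HA
      template:
        metadata:
          labels:
            application: patroni
            cluster-name: patronidemo
        spec:
          containers:
          - name: patronidemo
            image: patroni               # the demo used a feature-branch image
            ports:
            - containerPort: 8008        # Patroni management API
            - containerPort: 5432        # Postgres
            env:
            - name: PATRONI_SCOPE        # env vars pass through to the Patroni daemon
              value: patronidemo
            - name: PATRONI_SUPERUSER_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: patronidemo
                  key: superuser-password
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: patronidemo
    spec:
      clusterIP: None                    # headless, and note: no selector
      ports:
      - port: 5432
    ---
    apiVersion: v1
    kind: Endpoints
    metadata:
      name: patronidemo
    subsets: []                          # Patroni itself points this at the current master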
But in our case, we're using this to redefine the connection specifically to the master node, so that we update only the endpoint. And then we create this sort of headless service with no selector. Now, I also want a separate service that's going to load-balance across all nodes for read-only connections, so I create that service as well. And then, of course, we have to have passwords, so there are secrets.

So, you can see this here. We've got this extra label called "role". I'm going to shrink this just a little bit. And you can see, first a master comes up, and then replicas come up. That's very interesting; sorry about that, that is a remnant of running through this demo earlier. So we've got a master and two replicas right there. I'll show you this in the console as well, because it's actually an easier way to see it. We've got our stateful set, patronidemo, right here. And then, again, here are all of the nodes. These are actually sort of fun: we send all of the activity to the log output, so you get the leader talking about being the leader, and you get the replicas talking about being replicas. That's your main log output; of course, we also funnel the Postgres log output to it. In the process of this, we create a couple of config maps. (And those should also not be there. Okay.) One of these actually has the configuration of the cluster, which would hold any non-default items. I don't have a lot of non-default settings in here. One of the non-default settings I do have in there: there's a tool called pg_rewind that can bring an individual Postgres back into sync if it got a little ahead during a failover, and that's enabled, along with some replication details. The more important thing is this leader config map, which gives you one of the several ways we offer to get information about who the current leader is.

So, this is a lot more interesting: let me show you the failover. I've killed the current master. And as you can see, when network latency is negligible, as it is when you're running this on Minikube, it picks up the fact that the master has gone really fast. When it's slower than that, it's slower specifically because of the network round-trip time needed to detect that things are gone. And obviously, when you do things like losing physical nodes from your network, it can take a little bit longer for the network to time out. And then Kubernetes will spin up a new pod, and we will have a full set of replicas again. And let me show you those endpoints here. We've got two endpoints: one is the master-only endpoint, which is pointed at whichever node is currently the master, and then we have the load-balancing endpoint that gets created.

Now, one of the things you could actually say here is: wow, that's nifty and really super useful, but, among other things, doing all this passing of configuration via environment variables in Kubernetes manifests is really irritating, and it really doesn't help me when I need to manage dozens or hundreds of Postgres clusters. It would be nice if there was another way. Well, there it is, and it's called operators. How many people here are familiar with the operator pattern and the whole operator idea? Yeah. So, CoreOS created the operator idea around a database, etcd. And it turns out that it's actually useful for databases in general, and probably for other things, though where I've seen it used is databases.
So there are a few reasons why you want to use an operator. One is that it makes it easier to keep the clusters you're supposed to have documented and in a git repo, because you reduce the information down to what's distinct about each cluster, rather than having a lot of scaffolding mixed in with your cluster configuration files. For that matter, if you have divided-up teams, you can have people pass around a much simpler set of information to define a cluster, so they don't have to understand Kubernetes YAML structure and how to define Kubernetes objects, and don't have the opportunity to screw up your Kubernetes cluster in the process. The other thing is that an operator is an active thing; it is a thing that runs on your Kubernetes cluster. That's important, because databases require scheduled work. For example, if you have continuous backup running, it has to take a snapshot periodically; sometimes you want to run other forms of maintenance, and that sort of thing. So with simply a stateful set, you're not done. And the big thing is: hey, I'm effectively creating a manifest anyway. So if I'm doing that, why not have the manifest drive everything? Why don't I just create a manifest and have everything else be automatic?

So I'm going to show you one operator. Like I said, Spilo is the full packaging of Patroni plus the other stuff, and there's an operator for Spilo, which is what Zalando currently uses in their staging cluster; it hasn't gone to the production cluster yet. What you do is: you install the operator, you create a manifest for each cluster, and if you want to modify clusters, you do that by modifying the manifest. So let's actually do that, although it'll take me a minute, because apparently I have a lot of remnants of the previous install. So pardon me while I delete stuff left over from the previous run of this, because otherwise it will refuse to build. Okay. So, I need to install a config map for the operator itself, because the operator has a configuration. Then I need to create a service account for it, because that's required for creating operators. And then I create the operator. And you'll see the operator come up and register itself; that's the Spilo operator there. Once I've created the operator, I can, simply by defining a manifest and passing that manifest to the operator, tell it to create a cluster. So this is what a manifest looks like for the operator. It's still YAML, but you don't have to understand the Kubernetes infrastructure to be able to write one, which is really important if you're going to have, say, development teams creating databases for their own applications. Instead, you define some users, you define some Postgres settings. I don't have a lot in this particular one, because I'm trying to show you a simple one, but you can define a whole bunch of Postgres settings within that manifest. So let's go down here. And now... no, hold on a moment, I still think I have remnants left over from other stuff. So let me spin up a new cluster. Hey, there we go. Yeah, one of the outstanding bugs: if you go to the Postgres operator repo, you will see that there's an outstanding issue about failing to clean up when we delete clusters. So there is an issue there; it's being worked on. But we go ahead and install that. Why are we not seeing the Spilo role? Oh, there we are.
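While that spins up: to make the manifest part concrete, a cluster manifest for the operator looks something like this sketch. The field names here are based on the current examples in Zalando's postgres-operator repo, so treat them as illustrative rather than exactly what was on screen:

    apiVersion: "acid.zalan.do/v1"
    kind: postgresql
    metadata:
      name: acid-minimal-cluster
    spec:
      teamId: "acid"                 # Zalando clusters belong to teams
      numberOfInstances: 3
      volume:
        size: 5Gi                    # storage request, satisfied by your storage class
      users:
        zalando:                     # database users to create, with role flags
        - superuser
        - createdb
      postgresql:
        version: "10"
        parameters:                  # any non-default postgresql.conf settings
          shared_buffers: 1GB

Notice there's no stateful set, no services, and no secrets in there; the operator generates all of that scaffolding from these few lines.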
So again, just like with bare Patroni, one of them gets selected (whoops, no, that's not what I wanted), one of them gets selected as the master, and then the other two should replicate from it in just a second. If we refresh, we've now got our stateful set running our three nodes, and some of the same config maps that you would see from running Patroni directly. It's the same thing, but again, you've simplified it even further. And this has a tremendous advantage: you can now hand this off to development or testing teams, who can deploy Postgres clusters as they need them, without needing to understand how Kubernetes works.

Now, the Spilo operator is not the only operator out there. There's another whole project for Postgres on Kubernetes that comes from Crunchy Data. (Hi there. Sorry, we have one of the Crunchy Data staff here.) They have their own operator that's been out for a little bit longer, so it's a little bit more mature. Their architecture is very different from the Patroni architecture. They were very much coming from the enterprise side of things, so they were really looking at having a very full-service Postgres offering with a lot of tooling, which is what their Crunchy Container Suite is. Also, the two operators are a little bit different. The design of our operator, the Spilo operator, is extremely manifest-driven, and there really aren't other ways to interact with it. The Crunchy Postgres operator is very command-and-control driven. What I mean by command-and-control is that there's a client tool called pgo that talks to the operator, and so if you're using the Crunchy operator, the main way to deploy things is to use this client to communicate with the operator. They have a whole demo and everything else around that. And again, it's one of those trade-offs in terms of how you want to do things; there are advantages both ways. Ooh, that image did not pop in. Sorry. I had the whole architecture diagram of the Container Suite there; the Container Suite has a lot of things in it. And, like I said (I am a little bit too far forward here), it's also a very different setup for high-availability Postgres than the Patroni setup.

Quick comparison; I was actually going through and trying out both operators for this. The biggest thing is that the Spilo operator is still kind of alpha, because it was created recently. (Things move fast; we'll be saying that forever.) One of the main reasons it's still kind of alpha is that we haven't really written any documentation for it, which will change. There's not a specific CLI for the Spilo operator; there very much is one for the Crunchy operator, a very sophisticated CLI, and which you prefer depends on whether you like CLIs or not. Crunchy is OpenShift-compatible now; Spilo is not really. Both support rolling upgrades of Postgres, which is another important thing an operator does for you: you could do that yourself by specifying a lot of things in your upgrade strategy for new containers in a Kubernetes deployment, but you'd rather not have to write that, since the operator already knows what to do. So: some future work plans, because this is an active project.
Number one is obviously to merge the Kubernetes-native branch into mainline Patroni. The second thing is to integrate the operator with the Kubernetes-native branch; right now, the operator uses the older version of Patroni. I'm working on OpenShift compatibility, because I work for Red Hat and I have to work on OpenShift compatibility; besides, there are lots of people who use OpenShift and run Postgres on OpenShift. Prometheus integration is a big interest of mine, because I like metrics. I also want to package up some other things for it, like PoWA, a Postgres performance analysis tool. And one of the other things that's come up immediately is support for more complex replication topologies. Some other, much more hypothetical, things have come up, partly from being here at the conference. Obviously Istio integration, in order to have more sophisticated routing of traffic to the Postgres nodes. There actually is a command-line tool for Patroni that's used for a few things, and right now it's completely separate from the operator, and maybe it shouldn't be; we're discussing that. Also, I was interested in Brendan's Metaparticle lock CRD, so we're going to look at that, and potentially have a defined lock object rather than overloading endpoints to perform that function for us. Support for the new logical replication is obviously a big hypothetical, and also whether we could customize the workloads API and be less reliant on operator and custom-controller behavior.

Now, there are a few things we run into in Kubernetes that are still limitations that make life hard for us. There really aren't very many; mind you, I've been working with the team that does stateful sets for quite a while, so a lot of the things that were on my issue list got done. But a multi-data-center, multi-availability-zone cluster is hard to do with stateful sets right now. Kubernetes should have a built-in leader election primitive; that's been discussed before, and I'll bring it up again. Upgrading stateful sets pretty much requires you to have an operator right now, because the stateful set has no intelligence about what order it should upgrade things in, and it could. So again, that's up for discussion. And then it would be really nice to be able to do more extending of kubectl, so that we could stop having all of these separate CLIs, one per operator, each completely separate from kubectl.

So, if you're interested in this, or it's something you're going to use, we really would be happy to have more contributors; it's an open project. I'll admit that we do most of our communication via GitHub issues on the Patroni repo; there's no mailing list or anything, so if you have a question or something like that, file an issue, or find one of us on the Postgres IRC, and let's make Postgres boring. I'll take questions in just a second. Resource links, and now questions. Go ahead. [Inaudible question.] Okay, we're not going to pass this around; the question was, what's it written in? So: Patroni is in Python, and the operator is in Go, because operators are in Go. More questions? Okay, so the question was about replication lag. Yeah, the default is asynchronous replication, which means there is indeed replication lag, and if you have an unexpected master failure, then you will lose some transactions if they're in flight.
However, Patroni is completely compatible with turning synchronous replication on. There are examples in the documentation of what that looks like, so if you want to run with synchronous replication, you can turn that on. And particularly if you're going to do that, I would recommend using Postgres 10 and running it with synchronous quorum replication, so that you don't get a replica failure blocking your write transactions. There's another question up here. Okay, so the question was that what you missed most was user management, and creating users via config maps. The operator's manifest has a section devoted to users, and if you modify the manifest, the operator will create the users. If you're using Patroni without the operator, then you can either create the users when you create the cluster, because that goes into the Patroni config, or, if you want to create a user later on, you obviously log in as a superuser and create the other user. The former happens at deployment time, but if you're trying to do it in a sort of git-repo-managed way, then the operator is the way to go. Let me just see if there's another question. Yeah. What about storage, how do you specify storage? Well, honestly, the same way you'd specify it anywhere: for Patroni in general, you're specifying storage through the Kubernetes API, right? There's a section in the operator example where you tell it the storage selector, and that's the limit of the discrimination it has. Okay, we're over time, so further questions will have to be taken out into the hallway. Thank you very much.
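For reference, turning on that synchronous mode is a small Patroni configuration change. This is a minimal sketch, assuming the option names in the current Patroni documentation; check the examples mentioned above for the authoritative version:

    bootstrap:
      dcs:
        synchronous_mode: true    # Patroni manages synchronous_standby_names itself
        # With Postgres 10 you can get quorum commit semantics, along the lines of
        #   synchronous_standby_names = 'ANY 1 (node2, node3)'
        # so a single slow or dead replica doesn't block your write transactions.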