Welcome everybody to this talk. My name is Julian Fischer. I'm CEO of anynines. We are a Cloud Foundry consultancy, currently focused on Cloud Foundry data services, located in Germany. So the talk is going to be about building a production-grade Postgres Cloud Foundry service, which leads us to the question: what does production-grade mean? With the variety of customers we've been working with in the past, we've seen various definitions of production-grade. For this talk, we'll go with the definition we at anynines have been using, because for historical reasons we run a public Cloud Foundry platform, also called anynines. Back in 2013, we started running Cloud Foundry as a public offering. When offering Cloud Foundry to a public audience, you onboard all kinds of users: people you don't know, people with poorly designed apps, people with vicious apps, bad people trying to break your platform. So operating a public PaaS turned into a learning environment. The production-readiness litmus test is offering a public platform, basically; I think it can't get any worse. In case you withstand a public offering, you will also be fine in a more private environment. With the experience we've made, we started to investigate how to create data services, because back then there weren't any. In 2013, nobody had service brokers. There were some shipping with Cloud Foundry, but they've since been deprecated. We used them, and they were, well, not really production-ready. So we started investigating what it actually means to have a database deployed on a public Cloud Foundry. We've been through a journey that took us about two years and six people, working on a service framework to take care of data services. This talk will focus on Postgres, but most of it can be transferred to other data services as well.
So the rest of the conversation will lead through the design decisions that are necessary when building a data service. Sadly, 25 or 30 minutes are not enough to cover this topic; you can come to our booth and push play, and I will talk two days straight about it. So feel free to do so. One of the most important things when building a Cloud Foundry service is how you actually implement the service broker API. There are only a few methods to be implemented; it looks simple, but it isn't. One of the most important decisions is what your service instance is actually going to be. For Postgres, there are a number of decisions you have to make, one of which is what you actually want to offer as a service instance. We'll come back to this question, as it requires going through some other Postgres decisions first, one of which is whether to use a single Postgres server or a cluster. Well, offering a platform, you'll have users playing around with a small toy app; they want a Postgres server that's cheap, like five euros a month. And some customers with a production-grade app, maybe moving away from physical servers, want to come to your platform and want a production-grade database that's clustered in order to stay stable even when infrastructure incidents happen. So the question of whether to deploy a single server or a cluster cannot be answered with a general "this is how to do it right" answer; it heavily depends on the context of the customer. Let's go into the cluster topic for a few minutes, because once you have the ability to deploy a cluster, you can deploy a single instance anyway. Clustering Postgres and making Postgres highly available is a little cumbersome, because it was never inherently designed to do that, and with a transaction-based SQL database, keeping the CAP theorem in mind, it's not a simple task.
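To make the "only a few methods" concrete: a minimal sketch of the service broker surface the talk refers to, with an in-memory stand-in rather than the speaker's actual broker. The class and field names here are illustrative assumptions, not the anynines implementation.

```python
# Hedged sketch of the Cloud Foundry service broker API surface: catalog,
# provision, bind, unbind, deprovision. In-memory only; a real broker would
# trigger deployments and create database users behind these calls.
class ServiceBroker:
    def __init__(self, catalog):
        self.catalog = catalog          # service plans offered (e.g. single vs. cluster)
        self.instances = {}             # instance_id -> plan_id
        self.bindings = {}              # binding_id -> credentials

    def get_catalog(self):
        return self.catalog

    def provision(self, instance_id, plan_id):
        # Here the real broker would kick off the creation of a server/cluster.
        if instance_id in self.instances:
            raise ValueError("instance already exists")
        self.instances[instance_id] = plan_id
        return {"operation": "provisioning"}

    def bind(self, binding_id, instance_id):
        # A real broker would create a database user and hand out real credentials.
        creds = {"uri": f"postgres://user:pw@{instance_id}.example.internal/db"}
        self.bindings[binding_id] = creds
        return creds

    def unbind(self, binding_id):
        self.bindings.pop(binding_id, None)

    def deprovision(self, instance_id):
        self.instances.pop(instance_id, None)
```

The hard part, as the talk goes on to show, is not this API shape but deciding what a provisioned instance actually is underneath.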
So one of the possibilities to make a database highly available is to introduce replication. When thinking of replication, you may wonder whether to use synchronous or asynchronous replication. Synchronous replication means that a transaction has to be confirmed by a majority of the cluster nodes, whereas with asynchronous replication you write against the master and those changes are replicated to a slave, accepting that there's a certain difference between the master and the slave, the so-called replication lag. The further your slaves fall behind, the greater the replication lag, and in case something goes wrong, the greater your data loss is, because the potential data loss is the size of the replication lag. In our case, we'd been looking at MySQL and Postgres, and we decided to go with Postgres as the first relational database to implement using the service framework. We had in mind that there's MySQL with Galera, a very nice database cluster based on synchronous replication. Looking at Postgres, which has had built-in asynchronous replication since Postgres 9, we thought: why not have one RDBMS with synchronous replication and one with asynchronous replication? So we stuck with asynchronous replication for Postgres. That decision made, we found out the hard way that the replication facilities built into Postgres are fairly limited. There's a variety of tools you can use to overcome that problem, and it's not really easy to find out which one to use, but we wanted to build a Postgres offering for our customers that is easy to automate, because we have to take care of the cluster management. Looking at the replication built into Postgres, we saw that there is actually no cluster management: there's replication, but no cluster management.
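As a small illustration of what replication lag means in Postgres terms: positions in the write-ahead log are reported as LSNs (log sequence numbers) in the `X/Y` hex format you see in `pg_stat_replication`, and the lag in bytes is simply the distance between the master's and the replica's position. A minimal sketch, not tied to the speaker's tooling:

```python
# Sketch: estimating replication lag from Postgres LSNs (format "X/Y", hex),
# as exposed in pg_stat_replication. The lag in bytes is roughly the amount
# of data you would lose if the master died right now under async replication.
def lsn_to_bytes(lsn):
    """Convert a Postgres LSN like '0/3000060' to an absolute byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replication_lag_bytes(master_lsn, replica_lsn):
    """Bytes of WAL the replica has not yet received/replayed."""
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(replica_lsn)
```

For example, `replication_lag_bytes("0/3000060", "0/3000000")` is 96 bytes of WAL still in flight.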
So what does cluster management mean? Say you have three databases, a master and two slaves. What happens if your master database server goes away? You need something that actually recognizes that your master is gone. That seems easy, but it is not, because sometimes your network has trouble, so in case of a network partition it could be that the communication between the servers is disturbed but the master is actually still there. You have to find a way to ensure that you're not performing a failover although you still have a master. Those are the tasks usually solved by cluster managers. It's not worth talking about that too much here; there are solutions to it, but we need a component taking care of this. A small summary of the cluster thing is that you want to have three nodes instead of two, so that you always have a majority of servers that can clearly decide whether there's going to be a new master and who it's going to be. So, after looking at several solutions: for historical reasons, we've been operating Postgres databases for seven years now. We started with physical databases, clustered with master and slave, using Pacemaker to do the cluster management. So the first approach was: can we take the automation we have around Pacemaker and put it into a Cloud Foundry environment? And clearly the answer to this is no, you can't. Well, you can, but it's not really meaningful, because Pacemaker is a beast. It depends on every single Linux library ever written, so in order to BOSH-ify that, you would have to BOSH-ify the universe. That's a bad idea, and it's not really nice to automate either; the way Pacemaker is built is not something you want to put into a BOSH release. What we found is that repmgr is a good solution.
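The three-nodes-instead-of-two point can be reduced to one rule: only fail over when a strict majority agrees the master is gone, so a network partition can never produce two masters. A tiny sketch of that quorum rule, independent of any particular cluster manager:

```python
# Sketch of the quorum rule behind "use three nodes, not two": a failover is
# only safe when a strict majority of the cluster has lost contact with the
# master; otherwise a network partition could lead to split-brain.
def may_promote(nodes_reporting_master_down, cluster_size):
    """True only when a strict majority agrees the master is unreachable."""
    return nodes_reporting_master_down > cluster_size // 2
```

With two nodes, a single node's view is never a majority, so a two-node cluster can't safely decide anything; with three, two agreeing nodes can.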
So it's simple, it does the job, and it does the job fairly well. It also does monitoring of your replication, but most importantly it does failure detection and helps perform automated failovers. There's a little research to that, and there are a lot of, let's say, edge cases when it comes to Cloud Foundry, because you can't just take away one server and promote another server in a Cloud Foundry environment: there might be IP address changes. So how do you actually tell your application that it should now write to a different database server? That's one of the problems to be solved. In our case, we added a Consul cluster to our service framework. To be a little ahead of the talk: we use BOSH underneath to deploy database clusters. Whenever there's a change in the cluster, we tell Consul, and we use a DNS alias in the credentials, so that your application always has a DNS entry resolving to the right master. One of the purposes of the cluster manager, repmgr in this case, is that when it promotes the new master, it talks to Consul and updates the alias pointing to the master. So the trigger comes from repmgr, but the execution of the actual failover is done using Consul. All right. Once we made the decision that we want to support a clustered Postgres, we also had to decide what the service instance is going to be. Is it going to be a single cluster that gets sliced up, or is it going to be a cluster per service instance? So two different strategies come to mind: a shared or a dedicated approach. With a shared approach, what you do is basically create a single Postgres server or a single Postgres cluster and slice it up into different databases, and each database represents a service instance.
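The failover pattern described here, a stable DNS alias in the credentials that gets repointed on promotion, can be sketched as follows. `ConsulStub` and the alias naming are illustrative stand-ins for the real Consul API, not the anynines code:

```python
# Sketch of the DNS-alias failover pattern from the talk: applications
# connect through a stable alias baked into their binding credentials; a
# promotion hook (triggered by repmgr in the talk) repoints the alias at the
# new master. ConsulStub is a toy stand-in for the real Consul API.
class ConsulStub:
    def __init__(self):
        self.entries = {}

    def put(self, name, address):
        self.entries[name] = address

    def resolve(self, name):
        return self.entries[name]

def on_promote(consul, instance_id, new_master_ip):
    """Called when a new master is promoted; repoints the stable alias."""
    alias = f"master.{instance_id}.postgres.service.consul"  # illustrative name
    consul.put(alias, new_master_ip)
    return alias
```

The key property: the credentials handed out at bind time never change, only what the alias resolves to does, so applications survive the IP change without a rebind.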
So this is very easy, because you need to create one BOSH release, deploy one BOSH deployment creating your Postgres cluster, and your service broker will then access this cluster and return appropriate credentials. The drawback of this solution, however, is that the isolation between the service instances is pretty weak. Postgres has isolation built in, multi-tenancy capabilities, but they're fairly restricted. When accessing a database server, one app can drag down the performance of the entire cluster by hitting the cache or creating disk and CPU utilization. So the contract towards the customer will be fairly fuzzy, because you never know how much of a database you actually get. Another major problem with that approach is that even with a cluster, your cluster represents a single point of failure within your entire architecture. In a production Cloud Foundry system, you have a runtime, and the runtime is just an awesome piece of technology; Cloud Foundry is awesome, right? So you'll deploy a tremendous amount of apps against the runtime, and now you have a load of apps and one database cluster. I mean, that doesn't sound right, that doesn't feel right, for the obvious reason that whenever your database cluster goes down, and I can tell you: if you have a component in your system, it's going to fail. Whatever it is, it's going to break, right? We recently melted down our OpenStack because of a kernel driver issue, and because all the hosts were the same, we had the same problem on all the hosts. So 20 out of 24 hosts died within one hour, right? So fuck availability zones; sorry, didn't say that. It was just all gone, all right.
So let's say we really want to ensure that whenever a cluster goes down for some reason, the situation won't look like this, because a lot of applications will rely on your Postgres. And I can tell you how this feels: exactly like that, because your phone will keep on ringing and customers will give you a bad time, because they are so disappointed, because they expected the platform to work. So the problem with the shared cluster in general, and that's true for every data service, is that once this cluster goes down, all your instances are gone and you'll have a lot of trouble. So what's the counter-strategy to that? Obviously, it's going to be dedicated clusters for everybody. Instead of having a single database cluster or a single database instance, you'll have multiple of them, maybe even both. In the ideal world, and we've made that happen, you can create a small single instance or a large single instance, or a small cluster or a large cluster, and you can also migrate between them. With that, you have a big advantage when creating a contract towards your customer, because now, when you create a dedicated cluster, you use infrastructure resources and the infrastructure isolation to create this multi-tenancy behavior. Which means: when I provision a Postgres with four gigs of RAM, you're going to get four gigs of RAM, CPU, and a certain amount of disk, and in case your application needs more resources, we'll just scale to a larger database. But it's never going to be your neighbor dragging down your cluster because his app goes crazy, unless you have, let's say, an unfair amount of over-commitment in your infrastructure, which is totally up to you to decide; the solution itself is safe. Looking at the same scenario, you now have a different ratio between applications and service instances. So if one of these service instances goes down, the problem is pretty much contained, and you have only one angry call to answer, right?
Saying: well, excuse me, things went wrong and we're going to fix it. So your problems with Postgres failures are going to be contained, right? We've been through the question of having a single server or a cluster, and through the question of having a shared or a dedicated approach. So ideally you have a choice between single server and cluster, and it's going to be dedicated. The drawback, obviously, is that it uses more infrastructure resources, but then you have a stable contract with the customer. This leads to the question: when do I actually provision those virtual machines? Two strategies, again, come to mind. We could pre-provision those virtual machines so that they can be handed over immediately, or we create them on demand; we'll come to that in a minute. With a pre-provisioning strategy, you have a service broker and a pool of service instances, several of each plan you offer, and whenever somebody performs a create-service command, you just assign one of the service instances out of your virtual machine pool. Same for a cluster, just that you assign a cluster instead. The problem with that approach is obvious: you'll have ten of those things on hand, and then there's a hackathon going on and people start creating service instances like crazy, and you run out of pre-provisioned instances. Also, these pre-provisioned instances consume infrastructure resources even if nobody uses the database. So again, a counter-strategy comes to mind, which is: why don't we provision these service instances once somebody creates a service? In order to do that, you have to provide some automation, and whenever you do a cf create-service, this will then actually create a Postgres server or a Postgres cluster.
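The hybrid that the talk settles on later, a small warm pool plus on-demand deployment as the fallback, can be sketched in a few lines. `deploy_fn` stands in for triggering a BOSH deployment; the names are illustrative:

```python
# Sketch of the mixed provisioning strategy: serve instances from a small
# pre-provisioned warm pool when possible (instant hand-over), and fall back
# to on-demand provisioning when the pool for that plan is empty.
class InstancePool:
    def __init__(self, deploy_fn, warm=None):
        self.deploy_fn = deploy_fn      # stand-in for "trigger a BOSH deployment"
        self.warm = warm or {}          # plan -> list of ready instance ids

    def acquire(self, plan):
        pool = self.warm.get(plan, [])
        if pool:
            return pool.pop()           # instant: hand over a warm instance
        return self.deploy_fn(plan)     # slower: provision on demand
```

This captures the trade-off: the pool bounds your idle resource cost, while on-demand provisioning means you never run out as long as the infrastructure has capacity.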
So with pre-provisioning, the benefit is that you get your service instance right away, and with on-demand, you have the advantage that you don't waste resources, and once you go down that path, you'll be able to serve as many instances as your infrastructure has resources for. We actually started with the pre-provisioning approach, because then we could fill the pool with manually deployed instances, already giving the customer the appearance of having dedicated instances, and we did the automation afterwards, filling up the pool automatically once the automation was ready. And then, of course, we could shrink the pool size, because otherwise you have to keep instances of each service plan on hand, and turn our framework to deploying those things entirely on demand. A mixture is interesting in cases where you have a CI pipeline creating certain service instances at a high pace, where the provisioning time does matter to you. So it's a trade-off and a balance you have to strike; you have to make those design decisions, and you can configure it. We like to have a pool of small servers for testing purposes on hand, and the rest is provisioned on demand. With that being said, you can't on-demand provision anything without automation, so one of the key questions is how you actually do this automation. And while containers are very modern and fancy, and maybe they're going to be the future, we had the impression that a database should be close to the metal, as close as possible, because often performance is an issue, and we would also like to have an automation technology we can really, really rely on. After operating Cloud Foundry with BOSH for years, we really fell in love with BOSH, and that's not very obvious, because our team had been using Chef for six or seven years, so for them BOSH was really a challenge to what they had already been using, but they learned to love BOSH.
One of the reasons is that BOSH gives you infrastructure independence, and we moved infrastructure twice: we started on VMware, moved to OpenStack for cost reasons, and recently moved to Amazon for stability reasons, but that's just because we can't run OpenStack; we are a platform company, not an infrastructure company. Also, I've not seen many solutions that inherently orchestrate entire distributed systems as well as BOSH does, including virtual machines and persistent disks, while being entirely decoupled from the operating system. Back in the Chef days, you'd have a cookbook with if-else clauses for different operating systems; also, using different package managers gives you a very heterogeneous system in the end. This whole operating-system support goes right through the cookbook, and it's not very nice; with BOSH you have a clear contract here. Also, the separation between the blueprint of a distributed system, let's say the blueprint in a BOSH release, and, in contrast to that, the specific construction in a deployment manifest, is a very interesting approach in BOSH. When it comes to deploying data services, the advantage you get, looking at the Postgres cluster example, is that we have a BOSH release that deploys a Postgres cluster, but the same BOSH release with a different manifest can deploy just a single machine. So you cover a variety of data service plans with just a single piece of automation. Also interesting: once you use BOSH to deploy your Postgres cluster, you get the monitoring and self-healing capabilities of BOSH for free.
So, as I said, repmgr takes care of your instance, so whenever a database server goes down, repmgr talks to Consul and your application continues to write, now to the new database master. But BOSH will recognize that there's a missing virtual machine and will just resurrect it, and when the virtual machine comes up, it will recognize that it's not a master anymore; it's a new virtual machine. So it recognizes that it's now a slave and integrates into the cluster as a new slave. With BOSH, we have an integrated way of recovering from a degraded mode after an incident. And that very nice thing is topped by the scalability scenario, where we also want to be able to take a single service instance: let's say I've created a small app and deployed it on the platform, but now my app needs to grow. What I can do is a cf update-service and turn this into a large cluster. How is that possible? It's possible because the service framework creates a new BOSH deployment, hands it over to BOSH, and BOSH will create new virtual machines, scale the existing ones, and copy over the data. So you get that behavior at fairly low cost. It's not for free, you have to do some management around it, but so much has already been done by BOSH that it's fantastic. Of course, the same strategy applies when you scale a small cluster, like this fellow on the right side, to a large cluster. It's just taking down the virtual machines one after the other, so your service keeps running while you scale the virtual machines, and so you've scaled your cluster. All right. So with all that being said, how does it actually look in the resulting system? The architectural overview looks like this. We found out that the service broker basically does nothing really data-service specific.
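The rejoin behavior of a resurrected VM comes down to one decision: never assume your old role, ask the cluster who the current master is. A minimal sketch, where `resolve_master` stands in for the Consul DNS lookup described above:

```python
# Sketch of the rejoin logic after BOSH resurrects a VM: the node must not
# assume its pre-crash role. It looks up the current master (via the Consul
# DNS alias in the talk) and joins as a slave unless that lookup points at
# itself. resolve_master is a stand-in for the real DNS/Consul query.
def role_after_restart(my_ip, resolve_master):
    current_master = resolve_master()
    if current_master == my_ip:
        return "master"   # we are (still) the master the cluster agreed on
    return "slave"        # re-sync from the current master as a new slave
```

This is what turns a crash from a permanent degradation into a temporary one: the resurrected node always converges back into the cluster in a consistent role.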
So we outsourced everything that's specific to a data service into a small separate microservice called the Postgres SPI, the Service Provider Interface, comparable to the Cloud Provider Interface of BOSH. What this fellow does is offer the metadata, telling which service plans are offered, and also, when creating a service binding, it issues the credentials; this includes the initial credentials. So credential management and everything service-specific is going to be in the SPI. The service broker then triggers the creation of a BOSH deployment, which will then talk to BOSH, and subsequently, of course, there will be virtual machines deployed by BOSH. The service broker, as I said, implements the Cloud Foundry service broker API; it is generic for all the services we have (we have Redis, RabbitMQ, MongoDB and Postgres), and it can be configured to use the SPI as a remote service. The SPI itself, as I said, encapsulates the data-service-specific logic, among that the service catalog and the credential management. The deployer is a small abstraction over BOSH deployments. It actually does two things: first, it manages deployments, of course, and second, it manages templates, which can be seen as BOSH manifests with placeholders in them. So how does it look, and how do these components interact? Can you see that? Yeah. Whenever you call a create-service, you will hit the service broker, who will then talk to the SPI, because what the service broker has to do in the next step is trigger a deployment using the deployer. In order to do that, it has to hand over the name of the template to deploy as well as some deployment attributes. One of the pieces of information required to do that is the service plan the customer has chosen, which maps to a deployment template.
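The deployer's template step, a BOSH manifest with placeholders filled in per service instance, can be sketched like this. The real framework's templating scheme isn't described in detail, so `string.Template` and the manifest fields are illustrative assumptions:

```python
# Sketch of the deployer's templating step: a "deployment template" is a
# BOSH-manifest-like document with placeholders, filled in from the service
# plan the customer chose. string.Template is a minimal stand-in for
# whatever templating the real framework uses.
from string import Template

MANIFEST_TEMPLATE = Template("""\
name: postgres-$instance_id
instance_groups:
- name: postgres
  instances: $node_count
  vm_type: $vm_type
""")

def render_manifest(instance_id, plan):
    """Render a per-instance manifest from plan attributes (illustrative)."""
    return MANIFEST_TEMPLATE.substitute(instance_id=instance_id, **plan)
```

So a "small single server" plan and a "large cluster" plan differ only in the attributes substituted into the same release, which is exactly the one-release-many-plans point made earlier.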
So what the system actually does is give you a service broker that lets you trigger BOSH deployments. Actually, you could also deploy Cloud Foundries with that solution, if you create a BOSH release for Cloud Foundry. So yeah, after you've got that information, you can then trigger a deployment against the deployer by handing over the template and the attributes; the deployer will then generate the deployment manifest and trigger the deployment. The Cloud Controller then keeps on polling whether the deployment is already done, and once it's done, the service broker will store some metadata about the deployment, because if you later want to create a service binding, you'll have to know that there is a dedicated instance running somewhere, so that the SPI is able to connect to this database server and create a new database user. So you have to store some metadata, which again is kept out of the generic broker, because the service-specific part is handled by the SPI in the end. This works like a charm. We've been developing the solution for roughly two years and using it for more than a year on our platform, and yeah, it's proven and it works. So what can we actually learn from that? First of all, when designing a data service, go with dedicated service instances. Anything based on a shared cluster is dangerous. It might work if your company is small, but at scale I wouldn't be using it. Every shared data service we've been offering exploded at some point. With that, on-demand provisioning is essential. You have to pick an automation tool you're familiar with, and preferably find something that really takes care of the life cycle of a distributed system such as a database, because you will have to solve the problem of how to update it in the end as well. The biggest challenge was not the framework, not BOSH, not any of that; it actually was Postgres.
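The polling step mentioned above is the asynchronous part of the broker API: the Cloud Controller repeatedly asks the broker for the state of the last operation until the underlying deployment finishes. A hedged sketch of that state mapping, with the deployment states as assumed illustrative values:

```python
# Sketch of the asynchronous provisioning flow: the Cloud Controller polls
# the broker, which maps the state of the underlying deployment task (as it
# might be reported by a BOSH director) to an operation state. The state
# names on the left are illustrative, not an exact BOSH task vocabulary.
def last_operation(deployment_state):
    if deployment_state == "done":
        return {"state": "succeeded"}
    if deployment_state in ("queued", "processing"):
        return {"state": "in progress"}
    return {"state": "failed"}
```

Only once this reports success does the broker persist the instance metadata that later bindings rely on.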
So finding a Postgres replication and clustering tool set was what we really had to investigate, and how to make this failover happen on the infrastructure while still not being infrastructure-specific. We can take the service framework, and we've deployed it on VMware, on OpenStack, and on Amazon, and we didn't have to change a thing apart from the cloud configuration of the BOSH releases, obviously; so let's consider that configuration rather than a change of code. We also had to learn a lot about Postgres and iteratively shape the thing; edge cases have been found, so you have to do some automation around that. But yeah, that's about it. It works; the strategy works. And we're also open to conversations on how to share that with you. So just approach me and ask if you're interested in something like that; we'll help you build data services if you want us to. So feel free to ask any question about this. Questions? The handsome fellow in the blue shirt? Yes. All right. Whoa. Are you all awake now? All right, so basically, overall, very good thought fodder there, good approach. Thank you, Wayne. I have to respectfully disagree a bit about the dedicated versus shared. There are times when you want to go shared, especially if you're like a service provider and you've got just massive amounts. So the key there is actually investigating whether you can have a plan to start somebody on shared, like the free tier, and then migrate them very easily to dedicated. So that's something that you should work on. That could be a way to go, yeah. As far as Postgres is concerned, oftentimes you can get a much better performance scenario if you have separate disks. So BOSH currently has, and I'm saying this for the community as a whole, BOSH has a really nasty limitation of a one-disk policy. I'm hoping it's on the roadmap to fix that.
I'd like everybody to apply pressure for that, because that can really help the services story when deploying with BOSH. And we just announced at PostgresConf in New York City the open-sourcing of a similar project called RDPG. We did it for GE; they allowed us to open-source it. A lot of the same concepts and approaches were used in it. So now that that's open source, what I would really like to see is, you know, is this open source, and can we merge efforts instead of having two efforts? What are your thoughts on that? Well, my thoughts on open-sourcing this are that we are currently investigating open-sourcing it. This solution has been developed over time; we've been using it on the platform. So it could be open source soon, but this is a discussion that's currently ongoing, so I can't answer that to a final degree yet. We're currently talking with partners about how open-sourcing could look, because we have a development team to fund here, and if there's no license money coming in, we have to replace that. So... That is very fair. So everybody hire them so that they can open-source it? Yeah, one of the models we could apply is that we'll have sponsorships, so that people can influence the backlog of such a solution, maybe telling us which data services to make next. So we're open to suggestions here. Sounds good, thanks. You're welcome. Any other questions? Well then, yeah. There's a plug-in system in the framework that allows you to create streamed backups, so you can actually, well, let's say write the write-ahead logs and then stream them in chunks to, let's say, OpenStack Swift or Amazon S3. We currently have only basic strategies implemented, like creating a dump instead of write-ahead log shipping. I think Stark & Wayne have something interesting; it could be integrated as well. So yes, there is something foreseen in the framework, but the plug-ins actually need to mature a little more.
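The streamed-backup idea mentioned here, reading a backup source in chunks and shipping each chunk to object storage so the whole backup never has to fit on local disk, can be sketched generically. `upload` stands in for an S3 or Swift client call; none of this is the framework's actual plug-in API:

```python
# Sketch of a streamed backup: read a backup stream (e.g. WAL segments or a
# pg_dump pipe) in fixed-size chunks and hand each chunk to an object-storage
# uploader (S3/Swift client call in practice; a plain callable here).
import io

def stream_backup(source, upload, chunk_size=4 * 1024 * 1024):
    """Ship `source` to object storage chunk by chunk; returns chunk count."""
    n = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        upload(n, chunk)    # e.g. PUT to "<backup-id>/part-<n>"
        n += 1
    return n
```

Restoring is the reverse: fetch the parts in order and concatenate, which is why the chunk index is passed to the uploader.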
That's the thing currently under development. All right, thanks. So I'll be around.