All right, let's give it a try. So welcome to this talk. Today we're going to talk about data service design and how to scale data services.

Why this talk? We've been automating data services for Cloud Foundry for, I think, two or three years now, and we've learned that it is a challenging task because there are so many possibilities. Something else we've observed is that early-stage development is often very driven by feature demand. It is very important to ship certain features to the customer, and of course that matters. But we should also think of software development in general as comparable to riding a motorcycle. Riding a motorcycle, if you're not familiar with it, is challenging because you're not just sitting in a car driving around; you have to know what's directly in front of you. There could be a stone in the road, and it could be the end of your journey. So you have to keep looking into the next turn, then directly in front of your wheel again, and you keep switching between these two perspectives. This talk is about how to do that for software development: we would like to find architectural bottlenecks, learn from them, and see how we can improve.

So the mission is to find a way to develop Cloud Foundry data services that work at scale. What do I mean by scale? We have people here in the audience representing big IoT platforms, for example, and they are very likely to scale beyond a few service instances. So how are we going to do that?

When looking at data service design, there is a vast space of possibilities; there are many designs you can come up with. Design decisions include attributes such as: What are you using as a service instance? Are you doing shared instances served from a cluster, or dedicated instances? What kind of resource types are you applying, VMs versus containers? Are you provisioning virtual machines on demand, or are you pre-provisioning them? What about your failover strategy: do you want one replica of the data, or several? Depending on your particular requirements, there are a lot of design decisions to be made, and they have an impact on the quality your service is going to offer, for example on your desired time to repair. I've seen people talk about keeping it simple, saying data services should be no more complex than building a cluster is complex. So maybe we should keep it simple. But on the other hand, we have to deal with hardware outages on a regular basis, and we want to handle them with minimum service impact. So you always have to make trade-offs between those quality attributes when designing a data service.

In order to think about scaling data services in general, we have to narrow the discussion down and be a little more concrete, because otherwise we'll just be lost in a vast space of possibilities. I think when coming to that particular design question and asking about scale, the definition of what a service instance is is the most important question. Are you going to serve a database carved out of a shared Postgres cluster as a service instance? Or is a service instance going to be an entire Postgres cluster? That makes up the nature of your service. Also interesting: how many service instances are you going to operate in the long term? If it's only 10 or 100 or 200, then basically the challenge isn't very big.
But if you look at what Heroku achieved with some of their services, Postgres for example, I think they have a million service instances on the non-production Postgres plans. So how do you do that? Also a challenging question: how many concurrent users are using your platform? If you have a fairly slow development pace, where you ship new apps rather infrequently, then the number of service instances you're creating at any given time isn't very high. Whereas when everything is automated in your company and you're creating and destroying service instances automatically, you might have quite different requirements for parallel service broker operations.

So looking at scale, I think it makes sense to look at common data service design patterns and their architectural scalability. Within a 20-minute talk, sadly, it's not possible to cover all the ifs and thens. I see Dr Nic here; we won't cover everything you do with the Dingo tiles, so feel free to interrupt me afterwards. I see it coming.

The three major design patterns we've recognized so far are: the shared cluster, which is basically a virtual machine or a cluster of virtual machines split up into data-service-specific instances such as databases; dedicated containers; and dedicated virtual machines.

Looking at the scalability of a shared cluster: you have a set of virtual machines, most likely distributed across several availability zones. Why are there three of them? In order to cover the outage of an availability zone, you still want to be able to provide a quorum, with two nodes left in the cluster that can agree upon a new master, for example. So you have those three virtual machines, and you slice them up into different service instances, serving databases as service instances. This approach is fairly simple to do: all you have to do is wrap your MongoDB cluster in a BOSH release and write a service broker that creates databases. It's also very cost-effective, because what's the overhead of a service instance? Barely anything. But at some point your cluster will be full, and setting a threshold for when a cluster is full is hard, because it's not the number of service instances that makes up the utilization of the cluster, it's what you do with those service instances.

So while this gives you a very low cost per service instance and apparently simple service broker logic (all you have to do is create a database when a new service instance is requested, or a database user when a new service binding is requested; I'll sketch this below), you get pretty weak isolation, because most data services, if they come with multi-tenancy capabilities at all, are limited there. Sometimes you are able to see the database names of other customers, or even the database schemas of other customers, and sometimes there is no multi-tenancy capability at all. Besides the weak isolation that comes with the shared cluster approach, you also have the problem that this basic architecture cannot be taken as a common way to approach every data service. It's not a generic thing, and especially when designing a framework for integrating a large number of data services, that is a particularly bad property. It also comes with the structural limitation that once your cluster is full, you somehow have to answer the question of how to scale out.
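To make the broker side concrete, here is a minimal sketch of that "database per service instance, user per binding" logic, assuming a shared Postgres cluster and the psycopg2 driver. The DSN, the naming scheme, and the blanket GRANT are illustrative assumptions, not a real broker:

```python
import secrets
import psycopg2

# Admin connection to the shared Postgres cluster (illustrative DSN).
ADMIN_DSN = "host=shared-cluster.example.com user=admin password=secret dbname=postgres"

def provision(instance_id: str) -> None:
    """Create service instance = create a database on the shared cluster.
    instance_id is assumed to be broker-generated and already sanitized."""
    conn = psycopg2.connect(ADMIN_DSN)
    conn.autocommit = True  # CREATE DATABASE cannot run inside a transaction
    with conn.cursor() as cur:
        cur.execute(f'CREATE DATABASE "si_{instance_id}"')
    conn.close()

def bind(instance_id: str, binding_id: str) -> dict:
    """Create service binding = create a database user with a fresh password."""
    password = secrets.token_urlsafe(16)
    conn = psycopg2.connect(ADMIN_DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute(f'CREATE ROLE "u_{binding_id}" LOGIN PASSWORD %s', (password,))
        cur.execute(f'GRANT ALL PRIVILEGES ON DATABASE "si_{instance_id}" TO "u_{binding_id}"')
    conn.close()
    return {"username": f"u_{binding_id}", "password": password}
```

Even this toy version shows where the weak isolation comes from: every tenant's database sits side by side on the same cluster, and the isolation is only as strong as the database's own role and permission model.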
So, back to the scale-out question: one of the obvious options would be to just add a second cluster and fill that with instances. While this surely can be done, it also causes your simple service broker logic to become more complex, because you have to deal with challenges such as fragmentation. After a while, database instances have been created and deleted, and you end up with a fragmented cluster. In this case, you can see that cluster two is not really utilized, but it still consumes all the infrastructure resources taken by its virtual machines. So you end up with a placement problem, where you have to decide where to put a new service instance, and that creates the desire for a cluster rebalancing strategy. If that scenario gets worse, the second cluster is basically not needed at all, so you want something that rebalances those service instances transparently, letting you tear down the cluster and free the infrastructure resources. (There's a sketch of this below.) This is all doable, and maybe in the future that's the way to go, combined with the proper context. However, it feels somewhat like what Cloud Foundry already does for application instances: you have this placement problem and so on.

So the conclusions from this: you end up with two major challenges. First, you have a scalability issue; with a little more complexity in the service broker, you can address that. But the isolation issues you end up with don't allow for a generic solution.

One way to get around that, especially if you want cheap service instances, is using containers, as they come with very little overhead compared to para-virtualized or fully virtualized machines. You get better startup times and still better isolation. So what you can do, for example, is again provision a set of hosts, this time with the virtual machines equipped as Docker hosts, and whenever you create a service instance, you create either a single container or a pair of containers. In this scenario we are using a clustered container solution, so you have to introduce the complexity of replication: you create the containers and equip them with appropriately configured replication. Two processes are actually running and replication is going on, so that's a good thing. When a new service instance is created, you just create a new pair of containers.

This is a good approach, much better than a shared cluster, because it has isolation and it's also pretty generic: all you have to provide is the automation to spin up a data service instance, and once you've done that, you can basically translate it to a lot of different data service types. So it's already a smarter solution. But you also run into the same problem, because at some point the cluster is full and you need more of those Docker hosts. The same structural limitation applies: you have to answer the question of what to do when the cluster is full, and you have the same service broker challenge. Of course, you can just add new virtual machines and use them to create more service instances. But the problem is: what happens if your cluster fills up in the middle of the night, and you have customers in a different time zone who want to create new service instances but can't? That's something I, as a platform user, wouldn't like to happen. So I think that in the long run, on-demand provisioning is somehow unavoidable.
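Here's the promised sketch of the placement and rebalancing logic: a naive least-utilized placement strategy plus a check for clusters that could be drained and torn down. Modeling capacity as a simple instance count is a deliberate simplification (as noted above, real utilization depends on what tenants do with their instances, not how many there are):

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    name: str
    capacity: int  # max service instances; a simplification of real utilization
    instances: set = field(default_factory=set)

    def utilization(self) -> float:
        return len(self.instances) / self.capacity

    def free(self) -> int:
        return self.capacity - len(self.instances)

def place(clusters: list, instance_id: str) -> Cluster:
    """Naive placement: put the new instance on the least-utilized cluster
    that still has room."""
    candidates = [c for c in clusters if c.free() > 0]
    if not candidates:
        raise RuntimeError("all clusters full; time to provision a new one")
    target = min(candidates, key=Cluster.utilization)
    target.instances.add(instance_id)
    return target

def drainable(clusters: list) -> list:
    """Clusters whose instances would fit into the free capacity of the
    remaining clusters; those could be rebalanced away and torn down."""
    total_free = sum(c.free() for c in clusters)
    return [c for c in clusters if len(c.instances) <= total_free - c.free()]
```

The point of the sketch is how quickly the "simple" broker stops being simple: placement, rebalancing, and tear-down are exactly the scheduling problems a platform normally solves for you.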
So if we go down that road and ask how we can delegate all this complexity (the placement problem, the isolation, and the challenge of orchestrating the creation of several virtual machines), we can look at the third common data service design pattern: the on-demand creation of dedicated service instances, which could be single virtual machines, but also clusters of varying sizes.

In that scenario, comparable to the shared cluster, what we have is a set of three virtual machines, in this example comprising a MongoDB replica set spread across three availability zones of your infrastructure. But you're not carving that database up into different service instances; you take it as a service instance. Facing an organization with production requirements towards data services, most likely they already have DBAs running around in the company, applying some automation magic with Chef or Puppet or whatever automation technology they use, so our solution needs to compete with them. The positioning would be: let's automate the 80 to 90% average use case of the company by allowing you to create, on demand, different clusters of MongoDB here, or Postgres, or whatever data service you prefer, either as single virtual machines, as you can see in the lower right, or as clusters of virtual machines, and of course, according to the chosen service plan, with different sizes of virtual machines. A very good thing you get is the strong isolation that comes with para- or full virtualization, which is a little more resilient than container-based isolation. It comes with more overhead, but it also has certain advantages.

The question could arise: how can this be a simple solution? The answer is that we delegate the heavy lifting, because there's a lot of complexity in orchestrating the creation of a large number of clusters where each cluster has a certain complexity of its own. A possible architecture for such an approach has already been built; Pivotal is working on something like that, and we did it two years ago. In that case, you have a pretty generic service broker with a data-service-specific plugin called the SPI, whose responsibility is mainly managing the creation of credentials and everything else that's service-specific, talking to a deployment component that manages BOSH deployments. What that component does is pick a certain BOSH release based on the given service plan and create a manifest file from it (sketched below). Basically you say: I want a single server, small; or I want a cluster, large. The BOSH release has all the magic in it that covers the replication of a MongoDB cluster (Postgres is more of a challenge because you have to add a cluster manager), but that will all be in the BOSH release. With that approach, all you have to do to create a new data service is write an SPI, covering for example how to create a credential for a newly created cluster during a service binding, as well as a proper BOSH release. All the complexity is then handled by BOSH. I mean, BOSH was built to create, run, and maintain the life cycle of large, complex distributed systems. So let BOSH do the dirty work. And with BOSH taking care of virtual machine orchestration, we can actually use the infrastructure to solve problems such as placement and fragmentation, because whether a virtual machine host is fully utilized or has spare capacity is a problem the infrastructure solved many years ago.
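As an illustration of the deployer's job, here is a minimal sketch that derives a heavily simplified BOSH v2-style manifest from a service plan. The plan catalog, release, stemcell, and network names are assumptions made up for the example, not our actual product:

```python
import yaml  # PyYAML; a real deployer would submit the result to the BOSH Director

# Hypothetical plan catalog: plan name -> topology and VM sizing.
PLANS = {
    "single-small":  {"instances": 1, "vm_type": "small"},
    "cluster-large": {"instances": 3, "vm_type": "large"},
}

def build_manifest(deployment_name: str, plan: str) -> str:
    """Map 'single server, small' or 'cluster, large' onto a BOSH manifest."""
    p = PLANS[plan]
    manifest = {
        "name": deployment_name,
        "releases": [{"name": "mongodb", "version": "latest"}],
        "stemcells": [{"alias": "default", "os": "ubuntu-trusty", "version": "latest"}],
        "instance_groups": [{
            "name": "mongodb",
            "instances": p["instances"],  # 1 = single server, 3 = replica set
            "vm_type": p["vm_type"],      # sizing according to the service plan
            "azs": ["z1", "z2", "z3"][:p["instances"]],  # spread across AZs
            "stemcell": "default",
            "networks": [{"name": "default"}],
            "jobs": [{"name": "mongod", "release": "mongodb"}],
        }],
        "update": {"canaries": 1, "max_in_flight": 1,
                   "canary_watch_time": "30000-120000",
                   "update_watch_time": "30000-120000"},
    }
    return yaml.safe_dump(manifest, sort_keys=False)

print(build_manifest("si-1234", "cluster-large"))
```

All the data-service magic (replication setup, cluster managers, and so on) lives in the BOSH release the manifest references; the deployer only translates plans into deployments and lets BOSH do the dirty work.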
So what are the remaining challenges? We wanted to create 1,000 service instances, and if you think of 1,000 service instances being clustered, you end up with 3,000 virtual machines. So there are some challenges here. How do we get there? The most naive approach would be to spin up 1,000 service instances and just see what happens. But that's way too expensive for a small company like us. We're just 50 people thinking about Cloud Foundry stuff; we are not a Pivotal. It simply takes too many resources. And I don't think it's meaningful, because before you actually run that experiment, you have to prepare yourself for how to do it.

Personally, I'm a fan of lean management and also a fan of the Lean Startup paradigm. So what we actually do is create a build-measure-learn loop to enter a process of learning as an organization, because there are certain things we don't know about the environment we're operating in. There are many small questions that tease us all the time, because we're unsure how the system might behave in one scenario or the other. The goal is to find a product-market fit, and this translates to our scenario as follows: we've been using our existing MongoDB service, which by the way was released for PCF today, and what we want to end up with is a MongoDB service that supports more than a thousand service instances.

So how do we get there? The next questions to answer are: what do we actually need to measure, and what can we learn? Applying this paradigm, answering those questions is key. We ended up searching for hypotheses about potential bottlenecks, which you can derive by just looking at the dependency tree of the architecture. Each of our components could be a bottleneck, or BOSH itself could be a bottleneck, as the creation of virtual machines is handled by BOSH. If you know BOSH in a little more detail, you know that it runs on one or several virtual machines, so the Director could be overloaded in CPU or I/O wait or whatever. There could be too few workers, so that BOSH tasks queue up. Or maybe, at some point, with thousands of virtual machines, the heartbeats coming from those machines could outgrow the capacity of your NATS message bus. (A quick back-of-the-envelope check on that follows below.) And the creation of virtual machines is also a heavy task for your infrastructure to execute, so depending on the size of your infrastructure, you might end up in trouble there as well.

Now, there's a customer, and the customer needs to implement a certain solution they promised to their own customers. They need a proper runtime and a proper data service solution on their infrastructure. Do we already know when those things cause trouble? No, we don't. And therefore we define experiments to find out how the system behaves when we scale out. In order to do that, we have to think about the quality attributes we would expect. Somewhat arbitrarily, just from observing our own patience, we came up with something around three minutes: I would like a new service instance to be created on demand in around three minutes. Fair enough, if there are a lot of concurrent users, I might wait up to, let's say, nine minutes, but after nine minutes it should be done, on average.
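Here is that back-of-the-envelope check on the NATS hypothesis. The heartbeat interval is an assumption made purely for the arithmetic; verify what your BOSH version actually uses:

```python
# 1,000 clustered service instances = 3,000 VMs, as above.
vms = 1000 * 3

# Assumed BOSH agent heartbeat interval in seconds; an assumption for this
# estimate, not a documented constant.
heartbeat_interval_s = 30

print(vms / heartbeat_interval_s, "heartbeats/second on NATS")  # 100.0
```

A hundred small messages per second sounds harmless on paper, but the whole point of the hypothesis list is that we don't know where our concrete deployment tops out, and that's what the experiments are for.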
So it's obvious that the question of how many data service instances there are going to be in total somehow influences, or is influenced by, the scale of my data service architecture. And the number of concurrent creations, updates, or deletions of service instances is determined by the system's scale: not by the architecture, but simply by how many BOSH workers I have, how big my BOSH is.

In order to run an experiment and look into that systematically, we have to identify the relevant metrics and logs. One of the most important, of course, because we rely on BOSH, is the execution time of BOSH tasks in all its variations. Similar to a Kanban board, you can differentiate between lead time and cycle time. The lead time: I want to create a service instance, let's assume a single-server instance for the sake of simplicity, so it ends up as a BOSH task creating this virtual machine. How long does it take from the create-service request until it's done? Because when doing multiple of those create operations at the same time, you might have queuing in BOSH. So the cycle time, the time a BOSH task actually executes, will differ from the overall time of creating the service instance. And the difference between the two is the wait time: the task just sits in the queue and waits until some worker has time to execute it. In order to learn about potential bottlenecks around that, and maybe also to learn why a certain BOSH task took so long, we also need system metrics from all VMs and components participating here. Gathering them required us to co-locate our Logstash agents on all the VMs and to script the BOSH CLI to take the relevant timestamps. (There's a sketch of that below.) So, a feature request to the BOSH team: it would be very nice to have an easier way to collect this data. Scripting BOSH is a big thing, I guess, and it would be nice to have metrics from BOSH itself. Maybe there already is a way to do that which we missed while preparing this talk; I would love to talk about that later on, it's an interesting topic. However it's done, we definitely have to gather this information.

So now our experiment looks like this: we have the product, we have the metrics, and we have theories, hypotheses about what bottlenecks may occur. And what are the influencing factors? Because this doesn't exist in a vacuum; it is executed against a certain deployment of your data service on a certain infrastructure. Looking at the architecture again, you see there's a service broker, there's a deployer, there's a BOSH. It's a little simplified, there are a few more components, but for creating a service instance, those are mainly the components that are hit. So the scale of these components is important, as is the scale of your BOSH and the scale of your infrastructure. Obviously it makes a difference whether you have three workers or ten workers or more. What we are aiming at is a scale-out formula, a rule of thumb. It's not precise science, but I would like to be able to tell a client: this is the scale, it's suitable for that many service instances, you can create ten of them at the same time, and they will all be done in that amount of time. In other words, find a system scale that matches a certain SLA. So we did some experiments, and I'll go through three of them before coming to a conclusion.
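Before the results, here's roughly what scripting the BOSH CLI for timestamps can look like: a minimal sketch assuming the BOSH v2 CLI and one deployment per service instance. It only captures the lead time; the cycle time would have to be parsed from the task's own started/finished timestamps (for example from `bosh task <id>` output), which is omitted here:

```python
import subprocess
import time

def timed_create(deployment: str, manifest_path: str) -> dict:
    """Run a BOSH deploy and record the lead time (request until done).
    wait time = lead time - cycle time, with the cycle time taken from
    the BOSH task log."""
    t_request = time.monotonic()
    # BOSH v2 CLI style; adjust flags and environment alias to your setup.
    subprocess.run(
        ["bosh", "-n", "-d", deployment, "deploy", manifest_path],
        check=True,
    )
    lead = time.monotonic() - t_request
    return {"deployment": deployment, "lead_seconds": round(lead, 1)}

# Launching several of these in parallel (threads or separate processes)
# makes the queuing visible: lead time grows while cycle time stays flat,
# until the infrastructure itself becomes the bottleneck.
```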
The first experiment ran on our in-house vSphere. It's a fairly small one, around 700 gigs of RAM, which is just half a dozen physical machines, I guess. We started with a very small deployment, so a small BOSH with three workers. What we can see is that for the creation of a single virtual machine, with one create-service operation happening at a time, we are done in two and a half minutes, which passes our quality attribute. With 10 being created in parallel, the actual execution time of the BOSH task already increases to three minutes and 20 seconds, and it takes six minutes in total. With 25, you wait up to 17 minutes for the creation of a single service instance on average, which is way beyond what we desire.

The outcome of that iteration was that we have the ability to create 10 service instances at the same time if you're willing to wait up to seven minutes on average; on average here means we had values between five and nine minutes. We also confirmed the assumption that the utilization of our core components is barely anything, so they are not going to be a bottleneck in the near future, and that scaling BOSH is the dominant factor. Obviously, we started with a very small BOSH deployment, so that was bound to happen. The conclusion was to adjust the scale of BOSH for the next iteration.

Still on vSphere, we scaled just the number of BOSH workers, assuming that the queuing time would be reduced. But what we actually saw was that the execution time varied heavily. You can see that the actual execution of a BOSH task, the creation of a VM, suddenly took something like eight minutes, and that was unexpected, because it clearly tells you that something is happening at the infrastructure level that delays the VM creation like that. And as you can see, and as expected, the overall creation time goes down a little, so you are faster in creating those 25 instances on average. The outcome was that there's a large deviation in the completion times of the tasks, caused by the infrastructure. We learned that even a small infrastructure can become an influencing factor pretty early when creating service instances in parallel. So the conclusion was: to see where the limits of the architecture are, we'll move this to Amazon and see how that changes things, because we were sure it was really the infrastructure causing those delays.

So, the same experiment as in iteration two, performed on AWS. We saw that the creation of a virtual machine, the execution time of a BOSH task, is pretty constant; they are somehow equipped to do that. And you can also see that the average time for the entire creation of a service instance only goes up to five minutes when creating 25 in parallel, which means there's barely any queuing time in BOSH. So that scale would be okay if you'd like to create 25 service instances in parallel and wait around, whatever, seven minutes per service instance. From what we observed, AWS is more stable when it comes to dealing with more load, obviously, and small infrastructures may become a limiting factor. So if you expect to create a lot of service instances, for example as part of your CI pipeline, deploying them on demand, you definitely have to take care of your infrastructure as well. So yes, that is the SLA we came up with.
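To close the loop on those numbers, two quick back-of-the-envelope calculations: the first derives the queuing time implied by the iteration-one measurements, and the second shows the kind of rule-of-thumb scale-out formula we're after. The batching model in `workers_needed` is an illustrative assumption of ours, not a result of the experiments:

```python
import math

def wait_time(lead_min: float, cycle_min: float) -> float:
    """Queuing time = lead time (request until done) minus cycle time (task execution)."""
    return lead_min - cycle_min

# Iteration one (vSphere, 3 BOSH workers), numbers as reported above:
# a single create finishes in ~2.5 min, so lead and cycle coincide and
# there is no queuing; with 10 in parallel: 6 min lead, 3 min 20 s cycle.
print(wait_time(6.0, 3 + 20 / 60))  # ~2.7 minutes spent waiting for a worker

def workers_needed(parallel_creates: int, cycle_min: float, target_lead_min: float) -> int:
    """Illustrative rule of thumb: with w workers, tasks drain in batches of w,
    each batch taking roughly one cycle time. Find the smallest w that keeps the
    last batch inside the target lead time. This ignores infrastructure limits,
    which iteration two showed can dominate."""
    batches_allowed = max(1, int(target_lead_min // cycle_min))
    return math.ceil(parallel_creates / batches_allowed)

# Example: 25 parallel creates, ~3.5 min per VM creation, 9-minute SLA:
print(workers_needed(25, 3.5, 9))  # -> 13 workers (two batches, done in ~7 min)
```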
And the next experiments we're going to run: we'll stepwise increase the scale of BOSH and the number of parallel instance creations, and see what the relation between the two is, because we want to come up with this formula for scaling out BOSH depending on the number of service instances you want to be able to run. The conclusion from our experiments so far: sadly, it took longer than expected, because I actually planned to show you how to create a thousand service instances. But we are pretty positive that the architecture we've chosen is solid enough to do that, because we haven't hit any architectural bottlenecks so far. We also learned that the number of simultaneous service broker operations is the first thing to look at. It's not the total number, it's the number of service broker operations happening in parallel that should concern us more, and this is basically determined by the scale of your BOSH and the scale of your infrastructure. So this scale-out formula should somehow balance your infrastructure resources, your runtime, your BOSH, and your data services. We think the groundwork here is done, and we will keep on doing these experiments, so stay tuned. We are going to blog about it, because I think that's a valuable contribution to the entire Cloud Foundry community. So thank you for your attention, and feel free to ask questions.