Okay, let's get started. Thanks for joining us today. We're here to talk to you about on-demand service provisioning with BOSH 2.0. I'm Alex Ley, I work for Pivotal in London. And I'm Craig, I also work for Pivotal in London, on Cloud Foundry services. We've spent the last few months building Cloud Foundry service brokers that provision infrastructure resources on demand, so we're going to tell you about some new features in BOSH that enable us to do this, and also the features in Cloud Foundry that tie it all together. This isn't a talk about BOSH internals; I think there was a good talk about that earlier today. We're mainly talking to people who develop services for Cloud Foundry, but this should also be interesting for application developers who want to know a bit more about what happens when you run cf create-service. I'm sure this email will look familiar to any app developers in the audience, especially those at typical large enterprises. It's an example of an app developer asking for a Redis service to be provisioned for them by ops. They probably can't walk over to the desk where the ops team sits; the ops team might be in a completely different office. So they've got to specify exactly what they want in that first email. And often you get an email like this coming back, asking for your cost centre or your business approvers. No one wants to worry about that. We want databases in three to five minutes, not three to five days. If you get something wrong, you have to go through the cycle over and over again, and it's really painful. So then along comes Cloud Foundry and improves everyone's lives immeasurably, right? Who's run this command before? Who's run cf create-service? Okay, that's most people. That's good. I'll go over it quickly just to break it down.
The first argument to create-service is redis: Cloud Foundry looks up the service broker URL registered for the redis service offering and forwards the request to it. The last argument is just a name that you make up so that you can reference the service instance later. It's the middle one, shared-vm, which is interesting for this example. Implicit in this plan name is that we're looking at a multi-tenant Redis service plan. Redis is single-user; it's not natively multi-tenant. To achieve multi-tenancy, we have to start up multiple Redis processes on the same virtual machine. So Cloud Foundry really helps us with our applications, but behind the scenes, anything to do with managing services, it doesn't help you with, and this is quite a hard problem to solve, especially when you've got stateful services and you start to manage disks and persistence. Let's walk through a scenario where we want to build this multi-tenant Redis service. First, you need somewhere to run your service, so you go and get a VM from, let's say, AWS. You've got to copy over the Redis software. You've got to lay down some config on the VM, including things such as the port. You've got to start the Redis process, and then, importantly, you've got to monitor it so that it stays up. You're going to need a persistent disk if you want to use Redis's persistence modes to store state. But wait, someone else wants another Redis on the same multi-tenant service. So we need to start up another Redis process on a different port, possibly with different configuration, and it also needs its own place on the disk to store data. And if you want to spin up another one after that, you probably need some kind of orchestration agent on that VM to handle bringing Redis processes up and taking them down. What happens when someone wants to delete their service instance? You have to handle that in this agent too. And you still need monitoring.
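To make the multi-tenant bookkeeping concrete: a broker backing a shared-VM plan has to hand each service instance its own port and data directory on the shared machine, and reclaim them on delete. This is a minimal sketch of that idea; the class, field names, and limits are ours for illustration, not from any real broker.

```python
# Minimal sketch of per-instance bookkeeping for a shared-VM Redis plan.
# All names and limits here are illustrative; a real broker would have to
# persist this state and actually start/stop the Redis processes.

class SharedVMPlan:
    def __init__(self, base_port=6379, max_tenants=10):
        self.base_port = base_port
        self.max_tenants = max_tenants
        self.instances = {}  # instance_id -> (port, data_dir)

    def provision(self, instance_id):
        if len(self.instances) >= self.max_tenants:
            raise RuntimeError("shared VM is full")
        # Pick the first free port at or above the base port.
        used = {port for port, _ in self.instances.values()}
        port = next(p for p in range(self.base_port,
                                     self.base_port + self.max_tenants)
                    if p not in used)
        data_dir = f"/var/vcap/store/redis/{instance_id}"
        self.instances[instance_id] = (port, data_dir)
        return port, data_dir

    def deprovision(self, instance_id):
        # Free the slot; a real agent would also stop the process
        # and scrub the data directory.
        self.instances.pop(instance_id, None)
```

Deleting an instance frees its port for reuse, which is exactly the orchestration burden the talk is describing.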
If you have too many Redises, you get the noisy-neighbour problem, where Redis one is taking all of Redis two's memory. How do you handle that? To solve some of these problems, we could just give everybody their own Redis. This is referred to as a single-tenant service plan, where each Redis process runs on its own dedicated VM. You still need the virtual machine. You still need to run and, of course, monitor the Redis process. And you still need the persistent disk. The configuration gets a little simpler: you no longer have to make the port, or the location on the persistent disk where data is stored, anything other than static, because you're only going to have one Redis on this machine. You lose the noisy-neighbour problem too. But now you've got to automate provisioning these single-tenant virtual machines. So how do we do this? We use BOSH. It seems like most of the people in the room know this already, but BOSH is the preferred way in Cloud Foundry of managing deployments. It manages the full lifecycle of software: it concerns itself with packaging your software, deploying it, running it, and upgrading it. It supports multi-cloud, so you've got vSphere, AWS, Azure, OpenStack, CloudStack, Google Compute Engine, pretty much all of the main infrastructure providers. Really, this means you can focus on your code and not the infrastructure, which is why we love BOSH in Cloud Foundry. This slide shows one pattern you could use to deliver single-tenant Redis using BOSH. We're going to have four virtual machines: three Redis VMs and one service broker. When the broker receives a create-service request, it very simply allocates one of the pre-provisioned Redis virtual machines from the pool and marks it as taken. De-provisioning is a bit more complicated. When someone gives up their instance, you're going to want to recycle it, because you're not going to get any more.
You've got to scrub the state on the persistent disk, rotate the password, and probably bounce the process as well, and you'll probably need an agent co-located on the VM to do all of this. The upshot of an approach like this is that your operational story is a lot simpler: the entire service offering, every instance, is described by one BOSH deployment. But the big problem, of course, is what happens when someone tries to create more than three instances. Instance creation will fail: the service broker knows it only has three to give out, and when you ask for a fourth, there are none left. So we're right back to where we started. We have to send another email to the Cloud Foundry ops team and say, can we increase the number of Redises we have available on the platform? Why should developers have to do this? How about we deploy resources on demand instead? We use BOSH to do this, and effectively it means cf create-service becomes a BOSH deploy. Let's take a look at what that looks like. A BOSH deployment is described by a BOSH manifest; every deployment has exactly one manifest. So the first job of a service broker that wants to deploy things on demand using BOSH, when it receives a create-service request, is to generate a BOSH manifest. Then it creates the deployment by simply sending that manifest to BOSH. After that, we don't care. We love BOSH, we love what it does, but we love that we don't have to think about it: BOSH will just converge the state of the world into what's described in your manifest. There are a few problems with generating BOSH manifests in code, though, which we're going to go through now. In a BOSH 1.0 manifest, some of the IaaS abstractions leak into the manifest, which results in IaaS-specific manifests. You also have a lot of stateful IP bookkeeping.
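The on-demand flow boils down to: create-service request in, manifest out, manifest handed to the BOSH director. A rough sketch of that shape is below; the manifest fields are real BOSH 2.0-style keys, but the plan structure and the `post_deployment` director client call are illustrative, not a real broker or director API.

```python
# Sketch of a broker's provision path: turn a create-service request
# into a BOSH deployment manifest and hand it to the director.
# The plan dict and the director.post_deployment() call are illustrative.

def generate_manifest(instance_id, plan):
    # One deployment per service instance.
    return {
        "name": f"redis-{instance_id}",
        "releases": [{"name": "redis", "version": "latest"}],
        "instance_groups": [{
            "name": "redis",
            "instances": 1,
            "jobs": [{"name": "redis-server", "release": "redis"}],
            "vm_type": plan["vm_type"],            # abstract name, resolved by cloud config
            "persistent_disk_type": plan["disk_type"],
            "networks": [{"name": "default"}],     # no static IPs to book-keep
        }],
    }

def provision(instance_id, plan, director):
    manifest = generate_manifest(instance_id, plan)
    # The director converges the world to match the manifest; the broker
    # just tracks the resulting task until it completes.
    return director.post_deployment(manifest)
```

The key point is that the broker holds no infrastructure logic of its own: it is a manifest generator, and BOSH does the rest.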
The service broker has to handle giving out IP addresses and putting them into particular manifests, and it also has to worry about running over multiple availability zones: in a BOSH 1.0 manifest, you are hand-crafting your availability zone structure. This is a lot of work and results in quite complex service broker architectures. Here's a snippet from a BOSH 1.0 manifest. Another show of hands: how many people have written BOSH manifests? Okay, that's almost everyone again. That's good. This is just a snippet, not the whole thing, obviously; we're going to highlight some of the issues with generating these. First of all, look at that static IPs block. You've got to keep a stateful representation of which IPs you have and haven't already assigned. It forces the service developer to perform part of the role of a network administrator. Here's another snippet from the same manifest. This is the concrete network definition, specific to the CPI we're on. Looking at the cloud properties block at the bottom, you can see that it's AWS. Now, it's not a huge deal, but you're going to need a code path per CPI you want to support. Someone brings out a new CPI, and let's say you have five service brokers all doing this: you've got to push an update to all five brokers. It can become a pain. Similarly, you've got cloud-specific stemcell definitions, stemcell names even, and cloud-specific virtual machine definitions. So what comes along? BOSH 2.0, right? It solves all of our problems. It makes our lives a lot easier as service developers, and also for anyone running BOSH releases. It's great. Really, it's not a big-bang 2.0 release. As you might have heard earlier, it's a set of incremental features that have been added to BOSH over the last six months or so, aiming to keep backwards compatibility. Let's go through some of the new features in BOSH 2.0.
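For readers without the slides, the kind of BOSH 1.0 snippet being described looks roughly like this. The values are made up, but the shape is the point: static IPs the broker must track, and AWS-specific details baked into the manifest itself.

```yaml
# Illustrative BOSH 1.0-style snippet (values made up).
networks:
- name: redis-network
  subnets:
  - range: 10.0.16.0/24
    gateway: 10.0.16.1
    static:
    - 10.0.16.10 - 10.0.16.20    # broker must track which of these are taken
    cloud_properties:
      subnet: subnet-0a1b2c3d     # AWS-specific: ties the manifest to one CPI

resource_pools:
- name: redis-vm
  stemcell:
    name: bosh-aws-xen-ubuntu-trusty-go_agent   # cloud-specific stemcell name
    version: latest
  cloud_properties:
    instance_type: t2.micro       # AWS-specific VM definition
```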
Some of them are a lot older than others because, like Alex said, they've been trickling in over a long time. Static IPs have been optional for some time now. If you omit them, BOSH will dynamically assign an IP per job from whatever range you've configured for the network, so no more bookkeeping. There was an old trick people used for one job to discover another. Let's say you've got an errand that registers a service broker with Cloud Foundry and needs to know the service broker's IP address. In the past, you would give the service broker a static IP and copy that IP into the properties block of the errand that needs it. We've now got this feature, job links, that allows one job to discover facts about another, such as its IP, at templating time. You should never need to use static IPs in any circumstance that I can think of, certainly. CPI-specific resource definitions, those subnets and VM types we saw earlier in the snippets, have all been moved up into a global cloud configuration. Manifests now only reference resources by an abstract name, and those names are then concretely defined in the cloud config. The power of this is that manifests are now completely portable across different clouds. No more code changes when someone wants to support a new CPI in their broker; it's all just done for you. Similarly, like Alex already said, there's now first-class AZ support. You don't need to do the trick any more where you stripe jobs across AZs manually. You define which AZs you'd like to stripe across in your manifest, and BOSH handles everything for you. This is an entire BOSH 2.0 manifest; it's short enough to fit on one slide. Down at the bottom, you can see an example of resources being referenced but not defined in the same manifest. This manifest is portable to any BOSH director on any CPI, and much easier to generate in code for us lazy developers. This is all documented on the bosh.io website. These are some of the main features.
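A rough illustration of that split (names made up): the deployment manifest references only abstract names, and the cloud config, uploaded once per director, defines what those names mean on the underlying IaaS.

```yaml
# Illustrative BOSH 2.0-style deployment manifest - portable across CPIs.
instance_groups:
- name: redis
  instances: 1
  azs: [z1]                   # first-class AZ support
  vm_type: small              # abstract name only
  persistent_disk_type: 10GB
  networks:
  - name: default             # no static IPs, no cloud_properties here

# --- cloud config (uploaded separately, once per director) ---
azs:
- name: z1
  cloud_properties: {availability_zone: eu-west-1a}   # IaaS detail lives here
vm_types:
- name: small
  cloud_properties: {instance_type: t2.micro}
```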
You can go to these links to check it out; we'll probably upload the slides afterwards so you can get them, and it's all quite easy to find on the site. What we're going to do now: we went away, took these concepts, and built an example of how you'd build such a service broker. We're going to take a look at the cf CLI lifecycle and how it orchestrates BOSH tasks and IaaS resources. Bear with me a second while I get out of this and into a demo video. We're probably going to have to flick through every slide again when this is done. I'll just play it here. What we're doing is using Redis as an example again. The first thing we want to do is see what's available to us in the marketplace. I'm just going to pause this and see if I can make it a little bit bigger. We can see that we've got an on-demand Redis broker with a dedicated Redis plan available. Next, we create a service. This uses the Cloud Foundry asynchronous provisioning feature: the Cloud Controller sends a request to the service broker, the service broker accepts it, and then the Cloud Controller keeps polling the service broker, asking, is it done yet? You can see what's happened here is that the service broker has created a BOSH task. This is a new BOSH deployment for Redis, and it's going to provision a VM. You can see here that BOSH is talking to the IaaS to provision the VM, and what this results in, in AWS, is a new VM coming up, as you'd expect with BOSH. Here you can see that we've got a t2.micro; it's a really small Redis instance suitable for a development use case. Now BOSH will lay down the Redis software like we described earlier, and it's going to monitor and make sure all those processes stay healthy. Once it's done that, it will come back and report success. In the meantime, the Cloud Controller is still polling the service broker, asking, is my service instance ready yet?
Here you can see that it's still in progress, and in a minute we should have it created successfully. So we've easily gone from cf create-service, through the service broker, to a new Redis deployment. Let's have a look at how we could reconfigure this. You've got your own dedicated Redis, and that was really easy, but you might want to fine-tune some of the configuration. We're going to push an example app that lets us read from and write to Redis, and also check the Redis configuration. You can see here that we're just pushing it and binding it to the service. Once we have the application up and running, we make sure that Redis is healthy and try to write some data to it. Sorry, the video might be cut off at the bottom. We've now got the application running and bound, and we issue a command to write some data, which was successful. Next, we want to try to change the configuration. We've chosen the maxclients setting for Redis, which you'll see in a second. In our deployment this defaults to 100, and we want to limit how many clients can connect to our Redis, so we're going to set maxclients to 10. This is the kind of constraint that, as an app developer, you might want to put on your service. What we do is run update-service and pass in an arbitrary parameter. Again, you can see the common pattern here: the service broker receives the request, generates a new manifest, sends it to BOSH, and BOSH starts a deployment to update the existing one and lay down the new configuration. Let's run the maxclients command again and see if it's changed. BOSH has done the deployment, and we've changed maxclients to 10. This gives the CLI user a lot more power over how they manage their service. Finally, we're going to look at switching Redis over onto a high-memory VM.
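The arbitrary-parameters flow just described can be sketched as a manifest transformation: the broker takes its current manifest, overlays whatever the user passed with `-c`, and redeploys. A minimal sketch follows; the property paths and the whitelist are our illustration of one sensible design, not a real broker's API.

```python
# Sketch: apply arbitrary parameters from `cf update-service ... -c '{...}'`
# to the job properties of an existing manifest, then redeploy the result.
# The property layout and ALLOWED_PARAMS whitelist are illustrative.

ALLOWED_PARAMS = {"maxclients"}  # a real broker should validate what users may set

def apply_params(manifest, params):
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Build an updated copy rather than mutating the stored manifest.
    updated = dict(manifest)
    props = dict(updated.get("properties", {}))
    redis_props = dict(props.get("redis", {}))
    redis_props.update(params)           # e.g. {"maxclients": 10}
    props["redis"] = redis_props
    updated["properties"] = props
    return updated
```

Whitelisting parameters matters here: the broker decides which knobs app developers may turn, which becomes relevant to the operational concerns later in the talk.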
A use case for this is that your app is running out of memory in Redis and you want a new, bigger Redis; it's quite common. You do this through a plan migration: we're changing over to a high-memory plan. You might be shocked at this, but we're going to use BOSH again to upgrade the deployment. What this does behind the scenes is detach the persistent disk from the old Redis VM and bring up a new high-memory VM; you can see we're going for an r3.large. Then BOSH reattaches the disk, which means we've pretty easily switched over the VM type, all using the power of BOSH. The service broker effectively just becomes a manifest generator that instructs BOSH. That's the end of the demo. You can see this is quite a powerful experience for CLI users now. Let me try to get back into the slides. That demo was made entirely using shell scripts that just look like they're creating BOSH deployments, but really they're not. No, I'm kidding. It was built entirely on features that are available in open-source Cloud Foundry and open-source BOSH, so you could make something like this today. Sorry, technical problems; back on track. Great. This is fantastic for app developers. It gives them more control over provisioning and managing their stateful services than ever before. Crucially, it even gives them the ability to provision and manage IaaS resources, which quite possibly they won't have done before, and this can move organisations towards a more DevOps-oriented culture. You can see that we can offer these cool experiences around managing services from the CLI. Today you'd probably use arbitrary parameters to configure this. Let's look at an example where we're scaling a cluster. You might have a Cassandra service, and all of a sudden you want to scale the number of seed nodes in your cluster. You could pass an arbitrary parameter that gets through to the service broker; here we're saying we want five seed nodes.
The service broker would then receive this, generate a new manifest that just increases the instance count, and deploy it. This is pretty awesome, because now the CLI user, the app developer, is in control of scaling their cluster dynamically as they see fit. Currently we often do this through plans, so you change your plan, which isn't great. We also currently give app developers no control over when their service gets upgraded. If you're changing major versions of your service, say your Redis version, it might have breaking changes. At the moment, it's an awkward cycle between operators and app developers: letting them know there's new software available and that they're going to be upgraded on such-and-such a date. You could change this to allow the application developer some control over when their service gets upgraded. You can see here that we would use arbitrary parameters to pass in an upgrade command, and we could then generate a manifest for the new version of Redis, all controlled by the app developer. Now we're going to ask: is it time for a richer Cloud Foundry service experience, and what might that experience look like? These next few slides are not real cf CLI commands; they're potential cf CLI commands that achieve the same things as the previous slides did with arbitrary parameters. What are the advantages of making cf upgrade-service a first-class citizen? For one, it unifies the UX around upgrades. If you had lots of services, with lots of service brokers offering upgrades via differently named arbitrary parameters, that can get confusing and frustrating for the CLI user. If upgrade-service were a first-class citizen, it's implied that CLI users, app developers, would be able to discover their upgrade path using the CLI as well, rather than poking around in BOSH and asking service developers. That's something that could be explored in the future. Similarly, there's the same UX argument for scaling.
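Scaling via an arbitrary parameter is the same shape as the reconfiguration case: the broker rewrites the instance count for one instance group and redeploys. A sketch under our own assumptions; the quota constant and names are illustrative, and the quota itself anticipates the operational concerns discussed next.

```python
# Sketch: scale a clustered service by rewriting the instance count
# for one instance group and redeploying. Names and limits are illustrative.

MAX_NODES = 7  # an operator-set quota; unbounded scaling is an operational risk

def scale_instance_group(manifest, group_name, count):
    if not 1 <= count <= MAX_NODES:
        raise ValueError(f"instance count must be between 1 and {MAX_NODES}")
    # Return an updated copy; only the named group's count changes.
    updated = dict(manifest)
    groups = []
    for group in manifest["instance_groups"]:
        if group["name"] == group_name:
            group = dict(group, instances=count)
        groups.append(group)
    updated["instance_groups"] = groups
    return updated
```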
Scaling the instance count of clustered services is extremely common. Currently, it's often accomplished by having lots of plans. The advantage of doing it through an arbitrary parameter is that you declutter the services marketplace: you don't need so many plans if the service developer wants to offer a large range of instance counts. They can instead pass that responsibility to the app developer. While this all sounds pretty awesome, it does raise some huge operational challenges. Imagine you've got an organisation that's creating services: you create 50 Redises, you create 50 Cassandra clusters. You've got a ton of BOSH deployments, and BOSH deployments mean resource usage. Those deployments will consume IaaS resources, which you get charged for, or they might use your internal vSphere. You're going to need some way of monitoring who's using the resources, how much they're using, whether they're using them efficiently; does this development team really need the biggest Cassandra cluster? Another example is quota management and charge-backs, which is a common request around services. How do you manage this? As you see your AWS bill ticking up and up and up, you might want some quota management around that. So when I mentioned earlier that app developers having control of IaaS resource provisioning was a benefit for them, it could also be considered a risk from the point of view of operators. It's possible, using all of the features available today, to build a very flexible service broker that puts almost full control of IaaS resources, instance counts, even the upgrade lifecycle, into the hands of app developers. Do operators want to permit a service broker on their platform that allows app developers to stay on an arbitrarily old version of Redis forever and then upgrade to a newer one via an unpredictable path? Do they want to permit service brokers that allow indefinite scaling of nodes? Probably not.
There's a balance to be struck here, and this is a new challenge for operators. So with that, we hope we've given you something to think about, especially with the last few slides, and I believe we've still got a bit of time for questions if anyone has any. [Question from the back left.] No, the example service broker you saw in this demo is not yet open source. [Another question.] So the question was about BOSH failure, which is quite an interesting scenario: the BOSH deployment might not succeed, and you might have leftover resources that need to be cleaned up. If you're building a service broker like this, you'll need lots of logging and metrics, and a feedback cycle to operators if something goes wrong. There's also a UX element to this. When an asynchronous service provisioning, update, or deletion fails, the service broker can send back a message that gets displayed on the CLI. So again, it's in the hands of the service developer: do they download all the output from the BOSH task and expose it to the app developer? Maybe. That's for the people who productise these things to decide. Cool. Anyone else? Okay. Great. Well, thanks for listening, and cheers.