Hello, everyone. Welcome. Thank you very much for coming. This talk is about multi-data centre Cloud Foundry. If you're in the wrong room, leave now, because I'm going to lock the doors. We won't let you out after now. Thank you very much again for coming. Quick introductions. My name is Colin Humphreys. I'm CTO for Cloud at Pivotal. Will Daniel introduce himself? Hello. I'm Daniel Jones, CTO of EngineerBetter. We're a UK Cloud Foundry consultancy. This is the maximum CTO quota per stage that we've hit. If you get more CTOs than this, we explode. Right, we're going to get started with a story. I'm going to tell a story. I'm going to wander around. Is this being recorded? Oh, it is. Let's see if she can keep up with me. I'm going to get started with a story about my youth. Way, way back in time, in my younger days, I got involved with a big technology project for the UK's largest cinema chain. I like a bit of audience interaction here. Who's been to a cinema? Most of you, and the rest of you can't be bothered to put your hands up. Now I know who's involved and who isn't. I was helping out the UK's largest cinema chain with their requirements and their technology. So what used to happen back in the day is that people used to go to the cinema and they would buy tickets there and then they would watch a film. Now I know nowadays you have Netflix and no one goes to the cinema anymore, but this used to happen. This was a thing. So people would go to the cinema, they would buy a ticket, that ticket would be for a seat, they would sit in that seat and then they would watch a film. Are we all okay with this? Is this too technical? No, good, great. So they'd get a seat. Now what happened was the cinemas, they had a database and that database would store the times, the films that were on, which seats had been booked for which people, all that kind of thing. But the cinema chains noticed that there was this thing called the internet.
What they wanted to do was allow people to book cinema seats online. Wouldn't that be amazing? Go online and book a cinema seat. So we're going to become part of the cool new world here. We're going to have nuclear powered houses. It's going to be amazing. We're going to be able to book cinema seats online. So I was part of the project to do that. So we installed some web servers and we built an amazing website and it was fantastic. What we did was we connected the cinemas to the internet. We connected them with ADSL, which I'm going to depict as a piece of string, because that's about how reliable it was. So this was great and everything was going along, but ADSL, as you know, isn't the most reliable connection, so this would happen quite often. And the cinemas would get disconnected. And then you couldn't buy seats online. And then I got shouted at. Thank you very much to PHP CEO for his avatar here. So I got shouted at by the people with all the money and this was bad. So we had this problem in which you could go to the cinema and buy seats and that would carry on working, but online would stop working. So, you know, I was young and I knew everything. So I thought I can fix this. I'm super clever. So what I did was I put a second database near the web servers and this was awesome. So granted the second database, we now have high availability. Isn't this amazing? I'm sure you will try to do this. We had zero single points of failure, something I'm sure you will hear a lot about. This was truly amazing. Nothing could break it, not even a thermonuclear war. That's a reference for some of you. So this is great and everything's amazing. And then obviously, you know, the thing that I wanted to have happened happened. So our string, our ADSL, that would get cut. So now we had a solution that looked like this and everything's great. This is so, so amazing. People can still book seats. They can go to the cinema and buy a ticket and get a seat. 
They can go online, they can buy a ticket and they can get a seat. The world is fantastic. However, this didn't really work out quite as I planned. Can anyone guess what happened here? It's exactly as you thought, yes. So what happened was, looking at it optimistically, I think I doubled the profits for that cinema chain for a short period of time. I increased the density of the seating, is another way of looking at this. Yes, I did. Well, for a short period of time, we did sell two tickets for every single seat in a certain cinema. And guess what happened? Yes, I got shouted at again. So I had to learn very quickly, I had to learn something called CAP theorem. So who has heard of CAP theorem? Okay, it's all the same people that went to the cinema before. Okay, we know you're still engaged, you're still listening. That's good. So I learned this lesson very quickly. So what we're really saying here is you can choose consistency, which we had initially when we had the outage and we snapped the wire. You know, we couldn't buy seats in one of the two sites. Or we can have availability where we can carry on buying seats, but we're going to sell them twice. We cannot choose both. It is not possible to choose both. So that's what CAP theorem really says. It's more complex than that, but I wanted to distil it down to that for you. So when you have this network partition, as we call it, when you chop the string, which is our lovely ADSL, you create a partition and you have to choose between availability and consistency. So when people ask you about CAP theorem, are you a wizard of computer science? Do you understand CAP theorem? Well, now you do. You can say it's got something to do with selling too many cinema seats and being shouted at. So a lot of you are now thinking, why is Colin waffling on about cinemas? Because we all know that CAP theorem only applies to stateful systems and Cloud Foundry is a stateless system. So why are you going on about cinema seats?
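The cinema story can be put into a few lines of Python. This is only a minimal sketch under stated assumptions: `SeatDatabase`, `BookingSystem`, `link_up` and `prefer_availability` are hypothetical names invented for illustration, and the "consistent" choice is modelled as "only the cinema's own database accepts bookings while the link is down", which is roughly what happened when the string snapped the first time.

```python
from dataclasses import dataclass, field


@dataclass
class SeatDatabase:
    """One copy of the booking database (hypothetical model)."""
    name: str
    booked: dict = field(default_factory=dict)  # seat -> customer


class BookingSystem:
    """Two databases joined by a link that can be cut (the ADSL 'string')."""

    def __init__(self, prefer_availability: bool):
        self.cinema = SeatDatabase("cinema")
        self.online = SeatDatabase("online")
        self.link_up = True
        self.prefer_availability = prefer_availability

    def book(self, replica: SeatDatabase, seat: str, customer: str) -> bool:
        if self.link_up:
            # No partition: check both copies and write to both.
            if seat in self.cinema.booked or seat in self.online.booked:
                return False
            self.cinema.booked[seat] = customer
            self.online.booked[seat] = customer
            return True
        if replica is self.online and not self.prefer_availability:
            # Consistent choice: the online side refuses to sell during a partition.
            return False
        # Otherwise accept the booking using only what this side knows.
        if seat in replica.booked:
            return False
        replica.booked[seat] = customer
        return True

    def double_sold(self) -> set:
        """Seats sold to different customers on each side of the partition."""
        shared = self.cinema.booked.keys() & self.online.booked.keys()
        return {s for s in shared if self.cinema.booked[s] != self.online.booked[s]}
```

Cutting the link (`link_up = False`) and booking the same seat on both sides gives a non-empty `double_sold()` when `prefer_availability` is true, and a refused online booking when it is false.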
Allow me to explain. Cloud Foundry runs stateless apps, but is a stateful system. That seems to be the part a lot of people are missing out on. It has state. Cloud Foundry has a database behind it. If you think about what you're doing when you say to Cloud Foundry, cf push an app, Cloud Foundry now has some state. It knows you want to run your app and it has your app code and it compiles it into a droplet and that's some state. And when you scale it up, you're saying to Cloud Foundry, run 10 of this app, and that is a declaration of state that you want to run 10. So it has state and thus CAP theorem applies. Examples of state within Cloud Foundry: the Cloud Controller database, that has the state of what you want to run. Your UAA database knows about your users, the blobstore has your source, your droplets, your buildpacks. This is all state. So we have the state management problem. Even the routing table is state: which application instances are running which code, so we can route to them. And when you have state, you have problems. So what can we do about this? What does CAP mean for Cloud Foundry? Well, the reason you're here today, the thing I hope we can help you with, is if you understand CAP theorem and you understand Cloud Foundry's stateful and CAP theorem applies, we can understand the constraints that we're operating under and then we can make some choices so we can feel empowered by this knowledge. So I think you have three options with Cloud Foundry about how you deploy to multiple data centres. And I'm going to run through these options very briefly before Dan throws me off the stage. So the first option you have is this option, no partition tolerance. So what you're saying here is you have Cloud Foundry deployed, you don't have it in a second site, you just deploy it to one site, perhaps multiple availability zones, but single site. What you get here is Cloud Foundry and it's great. And if your site goes offline, you've lost Cloud Foundry.
But for most people, this is the simplest way of doing it. And I gave a lightning talk on Monday night about how if you're preparing to go multi-site because of the possibility of a thermonuclear war, you may be over-engineering. There haven't been that many thermonuclear wars. So for most people and most availability requirements, just running Cloud Foundry single site will be the simplest, easiest way to do it. But some people do have higher availability requirements. So if you do this, single site, you don't have a second site. If this site goes offline, you get shouted at. So what do we do about that? So your second option is to do this. Now this diagram may not make much sense. I now realise that. What I'm depicting here is that in your first site you have the whole of Cloud Foundry as you know it, all of the components. And in the second site, represented by a small rabbit juggling gears and wearing goggles, that's just the runtime components. So if you took just the runtime components over to one side and then had the management plane on the first site, everything that's stateful, what you're doing is you're saying if the left-hand side goes offline, the right-hand side, the runtime for your apps, will just carry on at altitude running all your applications. You can't push new apps, you can't scale things up, scale things down. It just carries on as it was. So we link those two together. All your control plane sits on the left-hand side, your running apps sit on both sides. Now what this gives you is consistency because if your primary site goes offline, you can't make any changes to your running applications. So you can't push new apps, you can't scale them up, scale them down, but your apps carry on running. So you've lost the availability of your control plane, but your apps carry on running. If your secondary site, the rabbit, goes offline, everything's fine because you can still make changes on this side.
And when the rabbit comes back online, it will converge. So this looks a little like the isolation zones that were spoken about earlier on today. Maybe this is an advanced feature, a later feature of isolation segments, sorry, I should call them. So as I say, if we cut this, we still have a good situation, but we potentially lose the ability, if one site goes offline, to push new apps. So I recommend for a lot of you, this is a good way of doing multi-site Cloud Foundry because you have that consistency. You don't have to worry about the applications being inconsistent versions across the multiple Cloud Foundries. But as I said, we can't push new applications, so we're going to get shouted at for that. Same as usual. So the third option that you have is to just run multiple Cloud Foundries. Now we see this a lot. So you run two completely separate Cloud Foundries in different regions. You push your apps to both of them, and this looks like the dream. But the problem is then management. You can push different versions of your apps to both of them. If you're pushing a new version of your app, what happens if one of them is offline when you push it, and then comes back online later with an older version of your app? What happens if your users exist in one, not the other? All these kind of things, you have a big issue around consistency. How do you keep multiple Cloud Foundries consistent? Very challenging. So you push this out. And I've done this, and I've had a Cloud Foundry that was unavailable while we were pushing out the app. And then it kind of self-healed shortly afterwards, and then you have different versions of the app running in different places. So one third of your users across three Cloud Foundries are getting a wrong version of the app. And then we know what happens: we get shouted at. And this problem just multiplies itself as you go for "let's have lots of Cloud Foundries, let's maximise availability."
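The "one of them was offline when we pushed" failure described above can be sketched as follows. This is a toy model only: `Foundation`, `push_everywhere` and `drift_report` are hypothetical names, and a real push is of course far more than setting a version string.

```python
class Foundation:
    """Stand-in for one Cloud Foundry installation (hypothetical model)."""

    def __init__(self, name: str, app_version: str):
        self.name = name
        self.app_version = app_version
        self.online = True

    def push(self, version: str) -> bool:
        if not self.online:
            # The push is lost; the old version keeps running when it comes back.
            return False
        self.app_version = version
        return True


def push_everywhere(foundations, version):
    """Push to every foundation; return the names of those that missed it."""
    return [f.name for f in foundations if not f.push(version)]


def drift_report(foundations):
    """Map each running app version to the foundations serving it."""
    report = {}
    for f in foundations:
        report.setdefault(f.app_version, []).append(f.name)
    return report
```

If `drift_report` returns more than one key, some share of users is being served a different version, which is the situation in the story with three Cloud Foundries.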
Let's have four Cloud Foundries, that's going to be great. Let's have ten Cloud Foundries. Well, what happens when we start to go really big? Will we get shouted at a lot by lots of people because it's gone really, really big? Big problems. And I have to say, everything that I'm saying now, all these problems also exist for the data behind your Cloud Foundry applications. Please be aware this is not just for Cloud Foundry; for your stateful services behind Cloud Foundry, all these problems exist. And when they go wrong, you're going to get shouted at again. So there's a lot of shouting going on. So the question is, if we are going to run multi-site Cloud Foundry, how can we make it more manageable? How can we make AP Cloud Foundry, available and partition-tolerant Cloud Foundry, more manageable? How do we keep some notion of consistency across multiple Cloud Foundries? And this is where things get really difficult. So I'm going to hand over to Dan. Thank you, Colin. Shall I try this one? I'll go with a handheld mic. So the funny part of this presentation is now over. There will be no more jokes. There will be no more levity. We're going to talk about technology and CLI tools. So you may want to leave at this point. So I'm going to take you through a couple of tools that either we've written or other people have contributed to the ecosystem that make this more manageable. Because it's a problem that lots of people seem to have: against Colin's advice, lots of people want to run multiple Cloud Foundries. So the first tool was one that we created to help some friends of ours who were running multiple Cloud Foundries. And this is a tool called CFPlex. It's basically a shim for the cf CLI that allows you to run one command against named groups of Cloud Foundries. So maybe you've got your prod set of Cloud Foundries. This will just run that one command against each one.
If it fails, it tells you that it's failed, doesn't do anything clever, doesn't try to achieve some kind of convergence state. But if you are operating multiple Cloud Foundries and you're typing commands in by hand and then changing your target, typing the same commands in by hand, and then changing your target and typing the same commands in by hand, at least this will reduce the number of errors that you make. Or if you make an error, it's going to apply it to all of your Cloud Foundries at once. It does everything the cf CLI does. It works with plugins, it isolates CF_HOME, so all the behaviour that you normally expect is there. And it works in an interactive or a batch mode. So you can add APIs to it, so you can add a list of things, a list of different Cloud Controllers to hit, and then run a command, which is handy when you're maybe doing ops-type stuff. You can name those groups as well, so this is the non-group version, but you can do add API and then say, like, prod, and then one of the Cloud Controller URLs. There's also batch mode, which perhaps is a bit of an anti-pattern, but if you need to do this kind of thing in your CI jobs and you're not using something cool like Concourse, this will help in that you can specify all the Cloud Foundries you want to hit and the usernames and passwords as an environment variable. So you just set it off running, so it's not interactive. It's worth me pointing out, actually, that in the last one it will do login interactively, so you have to be there to type in your username and password. We don't do anything clever with credentials management because that would be hard work, right? So in this version you can specify an environment variable to do that. And it will run off and do whatever you tell it to do against all of the Cloud Foundries you specified in your env vars.
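The general idea of "run the same command against a group, one at a time, and stop on the first failure" can be sketched like this. It is not how CFPlex itself is implemented, just an illustration under stated assumptions: `run_cf` and `run_against_group` are hypothetical helpers, the `cf api` and `cf auth` commands and the `CF_HOME` variable belong to the cf CLI, and the `runner` parameter exists only so the logic can be exercised without a real CLI.

```python
import os
import subprocess
import tempfile


def run_cf(args, cf_home):
    """Run the cf CLI with an isolated CF_HOME; return its exit code."""
    env = dict(os.environ, CF_HOME=cf_home)
    return subprocess.run(["cf", *args], env=env).returncode


def run_against_group(targets, command, runner=run_cf):
    """Run `command` against each target in order, stopping at the first failure.

    `targets` is a list of (api_url, username, password) tuples and `command`
    is a list of cf CLI arguments such as ["apps"].
    Returns the API URLs that completed successfully.
    """
    done = []
    for api_url, username, password in targets:
        # A fresh CF_HOME per target keeps logins from leaking between them.
        with tempfile.TemporaryDirectory() as cf_home:
            steps = [["api", api_url], ["auth", username, password], command]
            for step in steps:
                if runner(step, cf_home) != 0:
                    # Report the subcommand only, never the credentials.
                    raise RuntimeError(
                        f"'cf {step[0]}' failed against {api_url}; completed: {done}"
                    )
        done.append(api_url)
    return done
```

Running sequentially and failing fast means an error leaves a clear record of which Cloud Foundries were already changed, at the cost of being slower than a parallel fan-out.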
The separators are really weird. If you've ever tried guessing which symbols people are not going to use in their passwords, those were the ones that seemed least likely and weren't going to screw up Bash. But that's a tool for performing the same operation against multiple Cloud Foundries. It's more of a helper for ops people to make their lives easier, to reduce the likelihood of typos and mistakes and that sort of thing. The next approach is specific to apps. So Concourse is lovely. If you're not using Concourse for your CI, you should be. If you're doing Cloud Foundry, you are probably already using it. The thing is, it's got a Cloud Foundry resource so you can use it to push apps. And really, if you've got a value stream that involves applications being on some Cloud Foundry somewhere, you should be mapping that out in Concourse. There should be little boxes on a UI somewhere that say we're going to push this app to that Cloud Foundry. But apps is all it does. So this is great for making sure that apps get there and I highly recommend you use it, but it's not the whole story. What about user-provided services? What happens if you've got a new version of your app and you've got a version of code in Git and you need to make sure that when that version gets pushed a user-provided service is available? How do you keep those in lockstep? At the moment there's nothing in Concourse that does that other than if you script it yourself. Also you end up with one resource per Cloud Foundry that you're hitting, so if you've got 10 Cloud Foundries then you're going to have 10 boxes and there's going to be lots of copying and pasting in your Concourse. So the next thing I wrote was something that I was hacking on at the last Cloud Foundry Summit, and if you ever talk to Colin about contributing something to the Cloud Foundry ecosystem, there are only two answers. If you say, I've got this great idea for a bit of software, what do you think?
The answer is either CloudCredo already did it for a customer and they can't tell you about it because it's proprietary, or Pivotal are working on it so don't bother. So I ignored that. You say that every time. So I was hacking on something called the Converger, and the idea is this: it's a Java application that is a bit like BOSH but for Cloud Foundry. So you specify a declaration of how you want your Cloud Foundry to look in YAML. You specify the orgs you want, the spaces you want, the user-provided services, the services, the users, the user roles, and it will go away and make several Cloud Foundries all look identical. But it doesn't do apps, and that's on purpose, because my personal belief is that apps should be done through CI. This is for the setup around these apps. However, I did have one idea and that was making it a Concourse resource. So you could have Concourse pushing apps to Cloud Foundry, and then, because it's open source and it's community, we could all work together to make the Converger a Concourse resource. So you can say, make sure that this space looks like this and it has these user-provided services with it. So this is an example of the kind of YAML snippet that you would configure it with. Some friends of mine that I worked with when we originally first had this idea were using a software-defined network. So we've already thought ahead about being able to define security groups and then have a plugin architecture. So you could have something that would run against, I shouldn't name particular SDNs, but run against your SDN and create firewall rules as well, as an effort saver. And originally the use case was to converge an org at a time. So you'd have one org and make it look exactly like this. The example at the bottom, about having a REST API where you specify the group of Cloud Foundries you're after, and then an org and then a space, would allow you to integrate this as part of your CI pipeline.
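The declarative idea, describe the orgs and spaces you want and let a tool work out what is missing on each Cloud Foundry, can be illustrated with a small sketch. This is not the Converger's code or its YAML schema, neither of which is reproduced in this talk. `plan`, `apply_actions` and `converge` are hypothetical names, the desired state is a plain Python dict rather than YAML, and each Cloud Foundry is an in-memory dict rather than a call to a real API.

```python
def plan(desired, actual):
    """Return the actions needed so `actual` contains everything in `desired`.

    Both arguments map an org name to a set of space names.
    """
    actions = []
    for org in sorted(desired):
        if org not in actual:
            actions.append(("create-org", org))
        for space in sorted(desired[org] - actual.get(org, set())):
            actions.append(("create-space", org, space))
    return actions


def apply_actions(actions, actual):
    """Apply planned actions to an in-memory model of one Cloud Foundry."""
    for action in actions:
        if action[0] == "create-org":
            actual.setdefault(action[1], set())
        elif action[0] == "create-space":
            actual[action[1]].add(action[2])
    return actual


def converge(desired, foundations):
    """Converge every foundation; return the actions taken for each one."""
    taken = {}
    for name, actual in foundations.items():
        actions = plan(desired, actual)
        apply_actions(actions, actual)
        taken[name] = actions
    return taken
```

Because the plan is computed from the difference between desired and actual state, running `converge` a second time yields no actions, which is the idempotent behaviour a declarative tool is after.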
So set up my space, make sure my space is in exactly the state I expect when I push this out. You could also then use it for setting up test environments as well. So if you're a Cloud Foundry developer, if you're one of the folks at Pivotal that end up standing up Cloud Foundries 15 times a day to test things against, it will be quicker and more declarative than Bash scripting the setup of a load of spaces. So the Converger is a work in progress. Everything you see there does work, I believe, at the moment, but it doesn't do apps and that's on purpose. So we still don't have a complete tool. The next tool, is that my way of, did somebody lean on the thing? I can stop if you want me to, just say. The next tool has the best logo, therefore it wins the competition. So, Deployadactyl. It's not mine, I can't take credit for it. It deploys an app to many Cloud Foundry instances, and will cope with things like rollback. But again, it's only doing apps. It's got a plugin architecture and I believe the people that made it have got some really cool plugins. I'm not sure how many of those are open source. But in terms of deploying apps across multiple Cloud Foundries and coping with the failures that might happen, what happens if one's down, then Deployadactyl is the tool that will manage that best for you. The next tool also has a logo, unlike mine. I don't think it's as good as the Deployadactyl one, I'm afraid to say. But Push2Cloud, there was a really interesting talk given at Santa Clara about this. And I can't work out whether I really like this or whether I think it's a bit of an anti-pattern. But if you're deploying large microservice architectures, then this might be of interest to you. So you can describe your entire architecture and all the apps that make it up and deploy those all in one go. It's a declarative format for whole microservice architectures. There's a lot of JavaScript and JSON in it, so that might not be to everyone's taste.
The thing that I kind of have queries about is that we went to microservice architectures so we could deploy things on their own cadence, with well-defined boundaries, bounded contexts, and so we wouldn't have to worry about pushing them all at the same time. So is it actually an anti-pattern to describe your microservice architecture all in one go and say, deploy this whole lot of things, these versions all work together? And if that is an anti-pattern, why do we think it's an okay thing to do when we deploy stuff with BOSH, because it's the same kind of principle with many components of the same system working together? So there are no tools currently in existence that solve all the problems of having multiple Cloud Foundries running together. It's a hard thing. You will have to worry about the consistency problem. CFPlex will make it easier if you're going to do stuff by hand. Concourse you should be using to push your apps into Cloud Foundry. The Converger isn't finished yet, but can do a hell of a lot of stuff and has all the qualities of BOSH in terms of being a declarative format and making the world like you want it to be in an idempotent fashion. Deployadactyl, great for rolling forwards, deploying apps across multiple Cloud Foundries and coping with their errors. And Push2Cloud, if you've got a complicated microservice architecture, then this could be the tool for you. And that's a wrap-up of tools that may help you if you decide to go down the many Cloud Foundries route. And with that, I think we are done. Does anyone have any questions? Yes, Mr Pinto. Is there a reason you didn't mention Spinnaker as a deployment tool to many Cloud Foundries? Not really, other than I didn't think of it, but now you mention it. That's how exhaustive my research was. So yes, Spinnaker was, the question was, why didn't you mention Spinnaker? Have you used Spinnaker? And has it been successful for you? It's okay at the basic level, but it's maturing as we go on.
So it's one to watch. And is Spinnaker predominantly for deploying apps? Does it do any of the configuration of the Cloud Foundries? Awesome. So you can do one of those things. Cool. So there you go. Recommendation for you, check out Spinnaker. Any other questions? Mr Matthews. Was CFPlex doing its operation in parallel or sequentially? CFPlex does them sequentially so that when stuff goes wrong it fails straight away and you know about it, rather than trying to do anything too clever. Any questions about cinemas? For scenario two, where you separated out the runtime piece, what kind of infrastructure is required to run that? So the question was, my second scenario, which is the consistent scenario where we separated out the runtime from the complete Cloud Foundry components, which infrastructure do you need to do that? We're exploring that at the moment. So there's a proposal for isolation zones and that's kind of an evolution of elastic clusters. So we're looking potentially, I think you definitely need, bare minimum, routers and cells. And we probably actually need, the feature we need really is around the container networking inside of Cloud Foundry. At the moment your routers will, by default, prune their routes if they are detached from the management plane after 120 seconds. So it would kind of sit there, everything being fine for two minutes, and then you'd have no apps left. So we're looking at how we can deal with that in a consistent fashion because unfortunately the routers have state, as I mentioned, and that makes things a little tricky. But we're getting there, it's in progress. Any more questions? I have more. For those of us that weren't around at the talk about isolation segments earlier, when is it going to be ready? You know I work for Pivotal, we're an agile company. I don't know, we predict timings, we just don't give commitments, so I can look in the backlogs for you and see what's going to be happening. Is it more up or down the backlog, or is it?
So we turned the lights out earlier on and then disappeared. I think when you started asking that question I personally, the honest answer is I don't know. I can look in the backlogs and we can give you a prediction based upon our velocity, so I can get back to you on that, but I just don't actually know. It's currently in proposal stage at the moment so we probably don't have an accurate prediction right now. I think you could experiment with this yourself if you wanted to, we just wouldn't support it right now. Any more questions? Thank you all very much for coming.
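On the route pruning mentioned in the answer about scenario two: the behaviour "routes disappear if nothing refreshes them for 120 seconds" can be illustrated with a small time-to-live table. This is a sketch of the general technique only, not the router's actual implementation. `RouteTable` and its methods are hypothetical names, and the 120-second default is simply the figure quoted in the answer.

```python
class RouteTable:
    """Routes that expire unless they are refreshed within a time-to-live."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl_seconds = ttl_seconds
        self._last_seen = {}  # (hostname, backend) -> time of last refresh

    def register(self, hostname: str, backend: str, now: float) -> None:
        """Add a route or refresh it; in this model the management plane does this."""
        self._last_seen[(hostname, backend)] = now

    def prune(self, now: float) -> list:
        """Drop routes not refreshed within the TTL; return what was dropped."""
        stale = [
            key for key, seen in self._last_seen.items()
            if now - seen > self.ttl_seconds
        ]
        for key in stale:
            del self._last_seen[key]
        return stale

    def backends(self, hostname: str) -> list:
        """Backends currently routable for a hostname."""
        return sorted(b for (h, b) in self._last_seen if h == hostname)
```

In this model, once refreshes stop (the detached site), every route survives until the TTL passes and then the table empties, which matches the "fine for two minutes and then you'd have no apps left" description.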