So, welcome to my talk. That's quite a long title; I'm kind of bad at finding the right title when submitting to the summit at the last minute, so I'm going to read it out. Who was in the talk from Colin and Jonas about multi-data-center deployments? Okay. This is one of the implementations of it; it picks up some parts. You heard a lot of theory in that talk; here we're talking a little bit more about how we are tackling the approach. In Colin's and Daniel's talks you saw a certain meme; maybe you don't see that one here. Here I'm an old man yelling at clouds. I'm not the old man, but my name is Marcel Härri. I'm leading the architecture within Swisscom's Application Cloud and the elastic stack, I'm also a member of Cloud Foundry's Technical Advisory Board, and I've been working with Cloud Foundry for more than three years now.

What is Swisscom's Application Cloud? It's a public offering at developer.swisscom.com. We're one of the certified Cloud Foundry providers: you can go there, sign up, push your applications immediately, and see how we are provisioning Cloud Foundry for our users. As a service provider, Swisscom is offering Cloud Foundry to numerous customers, some of whom are powered by the Application Cloud. Just before this, you heard a talk about how dormakaba leverages their microservices architecture on top of our Application Cloud. We are also leveraging Cloud Foundry heavily internally for all our newer applications. Swisscom has a wide range of applications, whether for supporting people in the field, in the stores, or just internally, and we also have a much bigger project running on top of one of our virtual private instances.
This is myCloud. myCloud is the Dropbox-style offering of Swisscom; it runs purely on top of our Application Cloud and offers you cloud storage. Anybody can go there, sign up, and get a bunch of gigabytes to store their stuff. All of these customers have customers as well, and what you usually expect as a customer when using a cloud service is that the cloud is always on: the service is just there, it works, no matter what might happen. The important thing is that whenever you want to use the service, if you have internet connectivity, it should work.

As a developer, there is a very nice haiku for this: as a developer you just want to say, "Here is my source code, run it on the cloud for me, I do not care how." And this is where we, as a Cloud Foundry platform operator, come into the game. Our customers are the developers who are trusting us with their code, and all we should do is provide a platform that keeps everybody asleep: no outages, it just works.

But clouds do fail. Connections get lost, there are hardware failures, there are logical failures within the infrastructure underneath, data centers might have problems, whatever. And since we're getting more and more customers and more and more applications, for example also on our internal Application Cloud, we're seeing requirements pop up. The business is actually requiring that the hosted application runs in multiple data centers. We keep Cloud Foundry highly available, we keep up your application, but there may be regulations or policies that simply say you need to run in multiple data centers. And also, once you get more and more customers on top of a platform and you need to do maintenance on it, finding a maintenance window that fits all of the customers? Forget about that.
That's not something that will work out. And as a developer, you still want to stick with that user experience of "I just want to push to the cloud": you're the platform operator, you keep things up and running. I should not have to think about HA, because I already built my application based on the twelve-factor manifesto; my application is ready to scale horizontally, but being highly available is on you as the platform operator.

Then, since we're also running our own infrastructure, the infrastructure side wants to keep things as simple and stupid as possible. But you also need to lifecycle infrastructure: adding new hardware, exchanging things that fail, and over time bigger upgrades to leverage new functionality, and those bigger upgrades might not always work in an interruption-free manner. Given that we can forget about finding a maintenance window that fits everyone, and given the feeling of the people that things should always be on, what we actually want is simple: we can cf push into multiple data centers, and the infrastructure underneath can be lifecycled without interrupting the services running on top.

So we started thinking about this. The requirement was getting more and more important for us, so how did we want to approach it?
One of the classical approaches so far is: okay, let's just make the infrastructure highly available across multiple data centers. But we would really like to keep the infrastructure as simple and as stupid as possible. For example, we would really like not to stretch storage over multiple data centers. In the end, we would also like to be able to just tear down a whole stack or a data center without losing anything, and the only thing we should actually need to do, with the application pushed into multiple data centers, is direct or shift traffic to the right endpoints. Given that we're a telecommunications company and network operator, that's the kind of thing we can easily do: we can easily manage the traffic to the right endpoint.

There was recently an xkcd, I think the second-to-last one. I'm not going to read the whole thing; the funny part is: let's just burn down data centers in the end. And there was also Colin on Monday saying: if you're engineering for a thermonuclear war, you're maybe over-engineering. So let's find the balance somewhere in between. We went off with that and noted a few important points we wanted to consider.

For us, the developer experience is really important: we would like to keep one push. At this summit a lot of people talked about multi-cloud, deploying to multiple clouds; folks have presented tooling that supports you in pushing towards multiple Cloud Foundry installations, which then mimics that simple one-cf-push experience. We would like to keep that without addressing multiple Cloud Foundries at once. One of the things, and this was also in Colin and Daniel's talk, is the CAP theorem.
So this is certainly something we should keep in mind. People who were listening to Colin's talk now have all the theoretical details and know how cinema seats can get mixed up if you don't adhere to the CAP theorem. Another thing is that we also need to keep network traffic in mind. We're building on top of a distributed system that itself provides a platform for microservice architectures, so network traffic is something that happens a lot, and especially when it comes together with synchronizing state or keeping things efficient, latency is something we should really care about. The good thing is that as Swisscom we have certain advantages in that regard: we own and control our stacks, the data centers we run them in, the networks, and also the pipes all these things are pushed through, and everything is within a metropolitan area. Switzerland is small enough that we don't need to talk about going from east coast to west coast, where you certainly hit various physical limits of just being fast enough.

When we look at how we're deploying our Application Cloud, or Cloud Foundry, it more or less looks like this. At the top we have an access layer. This is where traffic from other networks flows into our stack, this is where, for example, we terminate SSL, where the first phase of load balancing happens, and this is also the north-south bridge between the physical network and the virtualized network, based on an SDN on top of an OpenStack. Then we have Cloud Foundry, which is already quite a nicely built distributed system: it has various ephemeral or easily horizontally scalable components, like the Cloud Controller API or even the Diego cells themselves.
They're usually just running off ephemeral storage. But as we also learned in other talks, Cloud Foundry is itself a stateful system. It has certain state, and this is mainly held in Consul and etcd, as well as the Cloud Controller database, which we run on top of a Galera cluster that already spans three nodes. And we have our persistent services. Persistent services are always the thing that gives you a bit of a headache once you start thinking about making things horizontally scalable. It's also the point where Cloud Foundry says: we're the platform for twelve-factor apps, and twelve-factor apps just consume services from outside. But as we're offering a platform to run applications, people are also asking for these services, and there we're moving more and more towards highly available deployments of enterprise-grade service offerings, like MongoDB Enterprise, either sharded or partitioned and with replica sets, or the Galera cluster, which is already highly available.

The other place where Cloud Foundry stores state is the blobstore, and within Swisscom
we have another platform that already offers S3 as a service out of four locations, so we already have an object store that is highly available and geographically distributed. Then underneath you have the BOSH director and various supporting services, but none of them is that important in case of a failure, because they're there to manage things: you can resurrect them first and carry on later, since running the workload is not driven by them. Another important thing: all of the components within the orange box talk heavily to each other. Usually you deploy them into one or multiple networks, but more or less they just must be routable on layer 3, so they can communicate directly with each other.

When we looked at that picture and moved on, we said: okay, we're now going to take this thing and span it over multiple data centers which, given the size of the country we offer the services in, are very close to each other, so we're able to get very good bandwidth and latency between those data centers. So we can address various of the problems easily just by having that advantage of the infrastructure underneath.

This picture shows you conceptually which are the important things you need to stretch out. I wrote at the bottom that the picture only shows it for two sites. If you know a little bit about quorum and similar things, you should always go with three sites, but the third site is left as an exercise for the reader. Basically, as I explained before, we have all these ephemeral, already horizontally scalable parts of Cloud Foundry.
We can just deploy them; we just need to replicate them as well. The trickiest thing is the CF state, as well as the services themselves. But we're already running those in a highly available fashion within one stack, and given that our stacks are very close to each other, we're just going to span them across multiple stacks. That mainly means we also need to make the network within the orange box routable to the networks on the other sites. The access layer, the one above, is where we span the internal connectivity between the different sites, and it's also where we then direct traffic to the various stacks, for example routing outside traffic to the right data center and the right stack. This is something we're already doing within Swisscom for other projects, so it's a standardized service that we're able to set up, and then we can just leverage the things underneath. As I said before, DNS and NTP are things you can usually just use locally anyway: if they're highly available within a stack, that's fine; if one stack fails and another one isn't depending on it, you don't need to span them.

But as you might see, we really think of keeping, for example, the stacks very individual.
So we don't want to stretch OpenStack over multiple sites, not even as a multi-region approach with Keystone, where suddenly you have RabbitMQ stretched as well, and Galera, and then come storage and network. We just keep the stacks autonomous, so they can also fail on a per-location basis. If you look at that a little, people would usually say: well, this is just an availability zone like on AWS. Usually people deploy Cloud Foundry over multiple availability zones, and yes, it's more or less that. I'll pick that up again when I show you the other things we developed to be able to do it.

If we're deploying and lifecycling Cloud Foundry, BOSH is the tool to do it. So we asked ourselves: are we now going to split up our BOSH manifest into two different manifests that we deploy to each stack, one per site? But then we'd need to run updates simultaneously and orchestrate them together. Well, let's build something like a meta-BOSH... no, this just gets too complicated. In the end, and this is what I mentioned before, BOSH is already aware of availability zones and is able to deploy into multiple availability zones. The only difference in our case is that our availability zones are two totally different stacks, two different clouds. But in the end we're just deploying to multiple availability zones. So we looked at whether we could do that kind of deployment with a single BOSH, deploying against multiple availability zones that are backed by different clouds, and this is where we started working together with the BOSH team on what is called multi-CPI support for BOSH.
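As context, BOSH's existing availability-zone support is expressed in the cloud config, roughly like this. This is a sketch: the `cloud_properties` keys are IaaS-specific, and the zone names, network IDs, and IP ranges here are made up.

```yaml
# Sketch of a standard BOSH v2 cloud config spanning two availability zones.
# cloud_properties are interpreted by the CPI; these OpenStack-style values
# (availability_zone, net_id) are placeholders.
azs:
- name: z1
  cloud_properties:
    availability_zone: dc1-az
- name: z2
  cloud_properties:
    availability_zone: dc2-az

networks:
- name: default
  type: manual
  subnets:
  - range: 10.0.1.0/24
    gateway: 10.0.1.1
    azs: [z1]                      # this subnet only exists in zone 1
    cloud_properties: {net_id: dc1-net}
  - range: 10.0.2.0/24
    gateway: 10.0.2.1
    azs: [z2]                      # this subnet only exists in zone 2
    cloud_properties: {net_id: dc2-net}
```

An instance group in the deployment manifest then lists `azs: [z1, z2]`, and BOSH spreads its instances across both zones.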
It means BOSH needs to talk to different clouds: it needs to upload the stemcells maybe twice, and it needs to create resources based on their availability zone. Essentially, we're mapping the various resources we deploy with BOSH to different clouds via the availability zones. Within BOSH itself, all the availability-zone handling already exists, so it will just handle the deployment as we know it from single-cloud, multi-AZ setups, given that the one BOSH is able to reach all the components in all the different clouds, because we interconnect them with a direct layer-3 link.

The current status: it's ready to be merged, and we're discussing the last cleanup items. There's a branch up in our GitHub fork of BOSH, and that is what is currently running on our premises. It's not yet officially merged, but it's being heavily discussed with the BOSH team. They already had plans for this, so it's not something where we had the idea and are now just proposing it; it was really developed together with the BOSH team. As we present here this week it's still being discussed with them, and it will hopefully be merged at some point. You then also need some changes to the CPIs: the CPIs now need to know a little bit more, but we were able to do that with backward-compatible changes.
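Conceptually, multi-CPI adds one more layer of configuration: a list of CPIs, each with its own endpoint and credentials, with each availability zone referencing one of them. In the form that was being discussed (still a branch at this point, so the eventually merged syntax may differ; all names and URLs below are made up), it looks roughly like this:

```yaml
# Sketch of a CPI config: two OpenStack CPIs, one per data center.
# Endpoints and credential names are placeholders.
cpis:
- name: openstack-dc1
  type: openstack
  properties:
    auth_url: https://keystone.dc1.example.com:5000/v3
    username: bosh
    api_key: ((dc1_openstack_password))
- name: openstack-dc2
  type: openstack
  properties:
    auth_url: https://keystone.dc2.example.com:5000/v3
    username: bosh
    api_key: ((dc2_openstack_password))

# ...and in the cloud config, each AZ references one CPI by name, so BOSH
# uploads stemcells to, and creates VMs on, the right cloud:
#
# azs:
# - name: z1
#   cpi: openstack-dc1
# - name: z2
#   cpi: openstack-dc2
```

From the deployment manifest's point of view nothing changes: instance groups still just list their AZs, and the director resolves each AZ to the right cloud.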
Since we're talking only to OpenStack, we implemented it in the OpenStack CPI, so we're showing how it works talking against multiple OpenStacks. But once people implement the necessary changes in the other CPIs as well, you'll be able to talk with one BOSH to OpenStack and AWS at the same time. That, in my opinion, is going to be the real multi-cloud experience for deploying things. These URLs are up for you to look at and give feedback, and also to tell people: if you want to try it out with your CPI, you need to adapt your CPI. The OpenStack CPI shows what you need to change; we will propose those changes to be merged once things are merged within BOSH.

One of the nice things, for example, would be if the BOSH Warden CPI were next. At the moment, if you're running errands with BOSH, BOSH spins up a VM, which might take some minutes, and then it runs a job for maybe 20 seconds. If you could instead say "run all my errands in another availability zone" pointing at the Warden CPI, then all these errands could just run locally on the BOSH director, or wherever Warden is running, and it would be very quick, because it's just a matter of spawning a container and destroying it. Errands could become very fast to execute.

These are the changes underneath that we did. We're now rolling things out using that, and we're quite sure there will still be dragons.
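To make the errand idea above concrete: if a Warden CPI were wired in as just another CPI, an errand's instance group could be pinned to a container-backed zone. This is the wish expressed in the talk, not an implemented feature; the `warden-local` CPI entry and the wiring below are hypothetical, purely to illustrate the idea.

```yaml
# Hypothetical sketch: an errand pinned to a Warden-backed AZ.
# Neither this CPI registration nor this placement existed at the time;
# the point is that the errand would spawn as a container on or near the
# director instead of waiting minutes for an IaaS VM.
azs:
- name: local
  cpi: warden-local        # hypothetical Warden CPI entry in the CPI config

instance_groups:
- name: smoke-tests
  lifecycle: errand
  azs: [local]             # runs as a container, not an OpenStack VM
  instances: 1
  jobs:
  - name: smoke-tests
    release: cf-smoke-tests
  vm_type: minimal
  stemcell: default
  networks: [{name: default}]
```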
We don't think this will just work, but we said: we're going this way because we think this is how we want to stretch Cloud Foundry across multiple data centers. Certainly the CAP theorem will hit us at some point and cause us trouble, but we're confident we'll find solutions there. One of the things we also see is that we do not yet have intelligent traffic routing. Isolation segments might be something that comes in the future, but at the moment we will still have lots of inter-DC traffic, because the Gorouter is not availability-zone aware. Services are also not intelligently placeable the way apps are: you cannot say "rather place it there or here". App placement, too; these are things we still want to figure out, whether they make sense or not. And then, yes, we have some experience with parts of Cloud Foundry going down within a single stack; let's see how robust it will be when it's two or three totally different stacks. That is certainly also one of the dragons we might encounter.

To end: we really think we should build the platforms for cloud-native applications cloud-natively themselves, because this will make operating and running them as easy as cloud-native applications. With that I'm at the end, more or less right on time. I don't know how much time we have for questions; if we don't have any, I think we're more or less done with all the talks. So, questions? Otherwise, please catch me later.
I'm still here until somewhere later in the evening. Thanks a lot for listening. Any quick questions?

[Audience question, partly inaudible: how far apart can the data centers be before latency gets in your way?]

I would say it really starts depending on the services: for example, how efficient will etcd or Consul still be with the Raft protocol, how efficient will Galera synchronization still be, and so on. It really depends. We know that we're able to keep the latency between our data centers well below what should become a problem. But I know, for example, that people have built Galera clusters stretched over Europe, and Switzerland is still smaller than that, so it shouldn't be a problem. We'll see.

Okay, thanks.