Well, good morning everyone. My name is Alan, Alan Moran. I work for a company named Altoros and I've been working with Cloud Foundry for the past two years. Today I will be delivering a talk that was prepared by my friend Alexandre Lomov. Unfortunately, he couldn't travel today. So today we will talk about managing multiple clouds with only one BOSH installation. We will particularly tackle two questions here: how we can manage BOSH deployments in multiple regions from a single BOSH, and how we can manage BOSH deployments on multiple clouds.

So let me say a few words about Altoros. Altoros is a consulting company. We provide solutions from the Cloud Foundry ecosystem to our clients. We are in various locations around the world.

So how did we start working on this? Well, a client came to us, a health service provider that we've been working for, and they wanted Cloud Foundry. But they wanted to have Cloud Foundry all around the States, and they wanted to be able to spin up these clusters on demand. And that was not the only concern. They also wanted us to be able to deploy those Cloud Foundries on multiple infrastructures. So we said, okay, we'll look into this problem. The community has been working on similar problems. We'll start working on a proof of concept in our own data centers. We have offices in Minsk, a cloud provider in Minsk, and a cloud provider in the States. So we said, okay, let's start working on a proof of concept.

But why? Why would we want, in the first place, to have multiple cloud deployments? What would be the reasons? Well, to start with, we want to be as close as possible to our clients, right? We want to be as close as possible to the user; we want to have our cluster physically close to them. This way we will have better response times, less latency, less probability of packet loss. And having multiple clusters also gives us a fault-tolerant system.

But let's think for a minute. What about Cloud Foundry? How does Cloud Foundry work over a WAN? There are several issues; Cloud Foundry is not prepared to work over a WAN out of the box. I will just mention a couple of them. For example, the UAA and the Cloud Controller. For those who don't know what I'm talking about, the UAA and the Cloud Controller are two components of Cloud Foundry. The UAA manages users, authentication and roles. The Cloud Controller basically manages all the rest: apps, services, spaces, everything. And these two components persist data in SQL solutions such as MySQL and Postgres. Now, if we wanted Cloud Foundry deployed on multiple clouds, we would have to make this component, this SQL solution, work over a WAN. So that's the first problem that we would have.

And another problem that I can mention is NATS. For those who haven't heard about this component, this is the component that handles all the internal communication that happens within Cloud Foundry. And the problem is that NATS is not optimized to work over a WAN. Each NATS node knows about a group of other nodes, and it is not prepared to work over a WAN. So if we want to take the risk, and we find a SQL solution such as RDS that actually works across different regions, in a distributed way over a WAN, we can say, okay, let's try to do the deployment.
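As a rough illustration of what sharing the SQL layer would mean in practice, here is a sketch of manifest properties pointing both the Cloud Controller and the UAA at one external database such as RDS. The property names follow older cf-release conventions and change between release versions, and the endpoint and credentials are placeholders, so treat this as an assumption rather than an exact recipe.

```bash
# Illustrative only: pointing the Cloud Controller and UAA at one shared
# external database (for example RDS) in a cf-release style manifest snippet.
# Property names follow older cf-release conventions; endpoint and credentials
# are made up.
cat <<'EOF' > cf-db-properties-snippet.yml
properties:
  ccdb:
    db_scheme: mysql
    address: shared-db.example.us-east-1.rds.amazonaws.com   # placeholder endpoint
    port: 3306
    databases: [{name: ccdb, tag: cc}]
    roles: [{name: ccadmin, password: REPLACE_ME, tag: admin}]
  uaadb:
    db_scheme: mysql
    address: shared-db.example.us-east-1.rds.amazonaws.com
    port: 3306
    databases: [{name: uaadb, tag: uaa}]
    roles: [{name: uaaadmin, password: REPLACE_ME, tag: admin}]
EOF
```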
The first problems that we will find will be in the internal communication between the components. We will run into timeouts, right? For example, we will have unpredictable things happening. We could lose packets, such as heartbeats, and an app could be relaunched, or the app staging process could have problems too.

So we went forward and said, okay, the solution would be to replicate the same Cloud Foundry installation in each of the regions or clouds that we will be in. Together with this, we would use a GeoDNS. This way, we would distribute the load between the different clusters and make sure that the closest cluster responds to the closest client. And then we would leave the app deployment process to be distributed across all our clusters. The app developers will then need to find a way for their apps to actually work over a WAN if they need any storage solution at all. But that is a decision we will not tackle here.

And to apply this solution, whenever we have multiple clusters (and this is what the community, and we personally at Altoros, have been doing so far), we always need at least one BOSH installation in each of the regions or each of the clouds that we are in. So we started thinking about possibilities. We started thinking how hard it would be. We know how BOSH works behind the scenes. How hard would it be to make some changes to BOSH and be able to spin up multiple clusters from only one BOSH installation? And why would we want to do this? Well, this client in particular wanted different clusters to spin up on demand. And it would be easier for us to have all the synchronization in one cluster. And it would also save us DevOps time and resources, in particular because we would have only one BOSH. Personally, as an engineer, I would prefer to have one really robust BOSH installation instead of having multiple installations in each of the regions. So we went ahead and said, okay, let's first start this proof of concept internally and see if we manage to do it.

So how did we do it? Well, first, to understand how we did it, I will explain briefly how BOSH interacts with the different cloud providers and show the different changes that we had to perform for BOSH to be able to do this. To simplify the scenario: whenever we do a deployment or update a deployment, we have three main inputs. One is the release, which is basically the recipe; it's everything that we need to run our software. We have a manifest, which is a descriptive file that describes the properties and the desired state of our cluster. And we have a stemcell, which is an image that is prepared to run on the particular provider we are working with, and which also holds a piece of software that will let BOSH later communicate with the VMs that get created in our cloud. These inputs arrive at the director. The director is responsible for orchestrating all deployments and updates in BOSH, the whole life cycle of our deployments. The director talks to the CPI, and the CPI is a component that is developed one per cloud; it lets BOSH be agnostic to the cloud that we are on. And finally, we have the cloud API, which would be, for example, the AWS API or the OpenStack API that we're actually talking to.

Now, for BOSH to be able to work asynchronously with deployments on multiple clouds, we needed to make some changes to the BOSH director so that one instance of the director can work with different CPIs at the same time. That was the first change that we needed to do. Then we needed to address the inputs.
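Before going into the inputs, here is a minimal sketch of what this baseline, single-cloud flow looks like with the BOSH CLI of that time (the v1, Ruby CLI). The director address, file names and release are placeholders, and exact command syntax varies between CLI versions.

```bash
# Minimal single-cloud workflow sketch with the v1 BOSH CLI; values are placeholders.
bosh target https://10.10.0.6:25555          # point the CLI at the director
bosh upload stemcell bosh-stemcell-openstack-kvm-ubuntu-trusty.tgz
bosh upload release cf-release.tgz           # the release: the cloud-agnostic "recipe"
bosh deployment cf-manifest.yml              # the manifest: desired state of the cluster
bosh deploy                                  # director, CPI and cloud API do the rest
```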
We have three different inputs here, the release, the manifest and the stemcell, but unfortunately only one of them is cloud agnostic. The release is the only input in our system that is cloud agnostic. The manifest holds information about the cloud, and the stemcell is also tied to a particular cloud provider.

So let's start by addressing the manifest. Luckily, we didn't have to do much work here, because in a recent release of BOSH there is a cloud config feature, where we can extract all the cloud-related information and put it in a separate file. This is an example of how the cloud config file would look. And together with this feature, the CLI got options to set and upload that file. So we have our first problem addressed: we separate all the cloud-related information out of our manifest into different cloud configs, one per each of our infrastructures.

Now we still need to address the problem with the stemcell. This is not actually a problem if we do a serial deployment, where we would first work with one cloud and then with the other, but working on multiple clouds at once, we need to keep a reference to it. To understand how the stemcell gets to the cloud API, I will describe the process briefly. The BOSH director sends the image file to the CPI. The CPI is responsible for talking to the cloud provider. The CPI sends the image file to the cloud API and, in response, gets back a reference, a reference in the domain of the cloud provider, that BOSH will later use to spin up machines. This reference ID then comes back to BOSH and is stored in the BOSH director database. To support this for multiple clouds, we made some changes: we made it possible to tag the different stemcells that we upload to the system. In this example, we used a cloud tag to mark the different clouds that we're going to work with. So in the current situation, we can just upload the different cloud configs and upload the different stemcells, and BOSH can dispatch the work between the different CPIs that need to talk to each of the cloud providers.

So let's make a quick recap. We have everything ready, right? We have the manifest ready, we have the different cloud configs, we have the stemcells, and we have the release probably already uploaded to BOSH. We have everything ready to do a deployment or an update with BOSH. So we will go briefly through the process of how this works and what modifications we needed to make to support these multiple clouds.

The first two processes that happen are the deployment binding process and the plan creation process. During this process, BOSH receives a manifest and links it with what it knows it has in the database. It finds the deltas between the desired state and the actual state (for example, the number of machines that actually exist versus the number of jobs I'm planning to deploy) and, while doing this, it creates a deployment plan. So we have one problem here: how can we do the binding for multiple clouds? For this, we had to make some modifications in BOSH and support a new entity, which is the cloud. This is the database model of BOSH, and we added the cloud entity here.
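Pulling the manifest and stemcell pieces together, here is a sketch of what preparing one infrastructure might look like. The YAML keys are the standard cloud-config sections, but the concrete values (networks, flavors, subnets) are placeholders. Note also that upstream BOSH at the time kept a single cloud config per director; keeping one per infrastructure, and the cloud tag on the stemcell upload, are part of our proof-of-concept changes, so those commands are hypothetical and the exact flag names may differ in the final patches.

```bash
# Sketch of a per-infrastructure cloud config; values are placeholders.
cat <<'EOF' > cloud-config-aws.yml
azs: [{name: z1, cloud_properties: {availability_zone: us-east-1a}}]
vm_types: [{name: default, cloud_properties: {instance_type: m3.medium}}]
networks:
- name: default
  type: manual
  subnets:
  - range: 10.20.0.0/24
    gateway: 10.20.0.1
    az: z1
    cloud_properties: {subnet: subnet-xxxxxx}
compilation: {workers: 3, network: default, az: z1, vm_type: default}
EOF

bosh update cloud-config cloud-config-aws.yml

# Hypothetical form of the modified upload command from our proof of concept:
# the --cloud tag is not part of the upstream CLI.
bosh upload stemcell light-bosh-stemcell-aws-xen-ubuntu-trusty.tgz --cloud aws-us
bosh upload stemcell bosh-stemcell-openstack-kvm-ubuntu-trusty.tgz --cloud openstack-minsk
```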
Next part: package compilation. For those who don't know what happens here: it basically grabs a package that hasn't yet been compiled for a particular stemcell on a particular cloud provider, and it compiles that package. Let's go through the process briefly to see what happens behind the scenes. The BOSH director talks to the CPI, the CPI talks to the cloud API, and they create a VM. This VM uses the stemcell that was previously uploaded, and it compiles the packages from our release. The packages get downloaded from the blobstore uncompiled, compiled on that VM, and then uploaded back to the blobstore, now compiled. So the next problem: how do we make this blobstore accessible from multiple providers, deployed in many regions around the world? Well, there are a couple of options that we analyzed. We could have separate blobstores and sync them. We could have separate blobstores and compile in each blobstore only what is related to its own cloud. We could use an external, public blobstore that everyone points to. Or we could use what we actually chose as our solution, which is to keep the current blobstore and establish VPN connections from where we have our main BOSH server to all the other clouds.

So, the last problem: creating the job VMs. Let's see what happens here. Well, when we create a VM, we first send some agent settings, which is what we will put inside the VM, and the stemcell ID. The CPI then talks to the cloud API and creates this VM. In return, from the cloud API, we get some information about this VM, for example its IP, so we know where we actually need to look for it. Now, the CPI sends some initial configuration, some initial settings, to that VM through the cloud API (the exact mechanism depends on the cloud). The VM, and the BOSH agent, which is a piece of software that lets that VM talk with the BOSH installation, receive these initial settings and from them are able to talk back to BOSH. And they do this through NATS. NATS is back in the picture here, but now it's in the context of BOSH. So we will have only one NATS, or maybe a highly available deployment, but it will live in our own BOSH installation. We need to make NATS accessible somehow, or at least be able to perform these processes. Then, through NATS, the VM receives back the information about what it needs to run. The VM is identified and assigned to a resource pool, and it pulls in all the packages and jobs it needs to run to start the actual system, the actual deployment or update that we're performing.

So, again, we have this problem: what do we do with NATS, how do we establish this communication while being on multiple clouds? There are two options here. There's the HTTPS message bus, which is one option that doesn't use NATS, the one used when you deploy a MicroBOSH, for instance with bosh-init. Or the second option is establishing a VPN connection. This is the one we took, similar to what we did with the blobstore solution: we established a VPN connection from each of the cloud providers back to our main BOSH installation. This is a picture of how it works: we have a VPN server in the data center that connects to our main BOSH.

Now, to recap: we proved it, and we finally got it working. Our two data centers, one OpenStack deployment of Cloud Foundry in Minsk and one AWS deployment in the States, working with one single BOSH that was deployed in Minsk. We got this working.
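To make concrete why those VPN connections matter, here is a rough sketch of the kind of initial settings the CPI hands to the BOSH agent when a VM is created. This is illustrative only: the exact schema differs per CPI and BOSH version, and the addresses and credentials are made up. The point is that both the NATS endpoint and the blobstore endpoint refer back to the network where the director lives, so they must be routable from every cloud, which is what the VPN gives us.

```bash
# Illustrative example of agent settings passed to a new VM; schema and values
# vary per CPI and BOSH version, and everything here is a placeholder.
cat <<'EOF' > example-agent-settings.json
{
  "agent_id": "agent-xxxxxxxx",
  "vm": { "name": "vm-xxxxxxxx" },
  "mbus": "nats://nats:SOME_PASSWORD@10.10.0.6:4222",
  "blobstore": {
    "provider": "dav",
    "options": { "endpoint": "http://10.10.0.6:25250", "user": "agent", "password": "SOME_PASSWORD" }
  },
  "networks": { "default": { "type": "dynamic" } }
}
EOF
```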
Unfortunately, I don't have time for a demo, because this process takes a long time. But this week we'll be releasing a blog post. We are looking forward to working with the BOSH team and seeing whether this is of any interest to introduce into the main branch. For the blog post, please check this website during the week; there will be a video of how this is actually working on our clusters. Furthermore, we will work on applying this solution for the client that we've been working for.

Finally, let me say some words about BOSH. We proved that BOSH is extensible. We proved that BOSH is a powerful tool and that it's easy to work with. We are really thankful to the BOSH team for making such a great tool. Again, thanks to Lomov. Are there any questions you might have about this? Yes?

Right. No. What do you mean, what do you ask in particular? How do we keep the apps in sync? Well, we haven't implemented a solution for that, but we could work on a plug-in and just make sure that it manages the references to the different Cloud Foundries. In my experience, the problem is really how the apps themselves would work on multiple clouds, right, if they need to share any type of data. Yes?

How do we manage it? Yeah. Well, that's more on the developer side, how you will develop your app. First of all, you will have to find solutions that work over a WAN if you want to sync your data layer, or at that level. Then it depends which type of apps we are working with, right?

Oh, you mean how this actually works in regards to latency, these VPN connections that we establish? Well, we are still running tests on how it works in regards to the lifecycle of the deployment and to features such as the Resurrector. We haven't worked on those things yet, and we are still testing this. Again, this is a proof of concept, but at least the latency is low enough for us to be able to perform deployments, let's say. Yes?

Yeah, actually, we did consider using it when we went down this path, but because we had already implemented the VPN solution for NATS, we said, okay, we have it there, and we just applied it. Yes?

So you are saying the question is how do we manage data isolation from the app side in different clusters, is that correct? Well, that depends on whether you really want data isolation. That depends on how you want each of the clusters to work. If you want the same clients to always be directed through GeoDNS to the same cluster, that works if they stay in the same place, but if they move around, they might be directed to a different cluster. So, again, that depends on the use case that you're working with, whether you need data isolation at all. Yeah, yes?

Well, I actually haven't worked much on the implementation, unfortunately, but what we did is add to the domain model of the BOSH database a new entity, which is the cloud, so we can reference it.

Do we have a way to speed up our orchestration process? Well, no, but by having only one BOSH, we already speed up the feature that our client wanted in the very beginning, which is having clouds on demand. With one BOSH, we don't need to go there, spin up a MicroBOSH, spin up a BOSH, and then do our deployment. And no, there are no constraints in regards to scaling, as long as your BOSH still has a connection and you still have resources in your cloud provider. We have only deployed two clusters simultaneously and asynchronously, on OpenStack and AWS. That's what we have done so far. We haven't done further tests, yeah.
Well, yeah, no, we haven't worked on any of those, but the GeoDNS will redirect to the other IPs if one is down. I understand what you mean. If any of the clusters fails, then the others will keep serving at that point. We haven't built a solution for that yet. Yeah.

Yeah, the cloud config. Yeah, we talked about that. It's actually a feature that we make use of, and it helped with one of our issues. It's great. Are there any other questions? Yes? Sorry, I didn't hear what you said. Not yet. We are planning on doing it in the next week or two. We want to make this public, and it will go out together with the blog post that Lomov is finishing. There will also be a video demo that shows all the different customizations that we did to the BOSH CLI to interact with the clouds, and all the information will be there. Okay. Thank you very much.