All right, we're going to get started. Thank you for coming, everybody. It's great to be in Tokyo. My name is Sunny Rajgopalan. I work for Plumgrid. And today, I'm going to be talking about building a scalable, federated hybrid cloud. The way I'm going to structure this presentation is that I'm going to start off by talking about a few use cases for what I'm calling the multi-cloud, and I'm going to talk about what a multi-cloud is. Then I'm going to move on to what I believe is a scalable way of making a multi-cloud manager that can manage this kind of multi-cloud. And I'm going to end with a demo to prove that the stuff I've been talking about is not just theoretical, but actually works in practice. So let's say that you're in a situation where you have to deal with the complexity of managing more than one cloud. Now, this could be a mix of public and private clouds. Maybe you've got more than one private cloud. Maybe it's a mix of a proprietary CMS (cloud management system) and OpenStack. So you're trying to deal with this whole mess that you've got. Let me see if I can figure out this remote. I'm just going to go with this. All right. You got the joke there. So if you're in this situation, you might find yourself asking, how did I get here? Or, if you're not here already, why would I want to get here? Now, if you were to ask your friends and family about this, they'd probably give you an answer that sounded like this: maybe it's because you just don't like having a social life. Or maybe it's because you love complexity. Or this is your idea of fun. Now, let's leave the jokes aside and look at what the real use cases for the multi-cloud are. There's that word again. I said multi-cloud without explaining what it is. I'm using the word multi-cloud to just refer to a situation where you are responsible for managing more than one cloud. That's it. So, on to the use cases for the multi-cloud, or, as I call it, nobody builds a multi-cloud just for fun.
So let's say that you've got an app that only runs on a certain kind of CMS. I've given a few examples here. This is all hearsay. Maybe you're running SAP, or Microsoft Exchange, or SharePoint, or maybe you're trying to run Halo in multiplayer mode, and it's only certified to run on Xen. Maybe you need more than one cloud for disaster recovery. And why did you set up two clouds like this? Because you, sir, are smart. You know that one day, this is going to happen to your data center. And then everybody is going to be talking about how smart you were and how much foresight you had to have created a backup cloud. Maybe you're trying to reduce costs. And this is actually something that we hear often from customers as well: they're running a proprietary CMS, it's really expensive, and when the CFO thinks of you, this is the image he has in mind, which is not that great for job security. So you've been hearing about OpenStack, and you've been trying to figure out if you can move some of your workloads onto your OpenStack cluster, and maybe do it slowly over time in such a way that people don't really notice. So you want to see if you can pull this off: being able to migrate to a lower-cost CMS. Maybe you're trying to scale. Now, by scale, I mean at least 200 compute nodes. Maybe 1,000, maybe 2,000, maybe even 5,000 compute nodes. Now you may ask, why on earth would anybody need a cloud that's that big? And I don't know. Maybe you're trying to achieve world domination. Now the question here is, even if you were to believe the published numbers of your CMS vendor, even if your CMS vendor promised that they can scale to whatever number of nodes you're trying to get to, do you really want to put all of your eggs in one basket? Because when the controller for that one cluster goes down, you're basically taking down your whole 1,000 or 2,000 node cluster along with it.
So just from the perspective of having a smaller blast radius, what you should probably do is combine a bunch of smaller clouds together to make a larger cloud. That's probably a better way to structure your cloud needs. Yet another use case we hear about all the time is that you're a company which is geographically distributed, and what you want to do is connect your clients with the servers that are geographically closest to them. Now, this is usually achieved using something called a GSLB, which, for those of you who are not familiar with it, is just a fancy DNS with some load balancing characteristics and some health checks built in. The way a GSLB works is that when it receives a DNS request, it checks where the client is located, and then it responds with the IP address of the server that's geographically proximate to the client. What this means is that you could set up your two clouds in different parts of the world, and you can connect the cowboys with the cowboys and the Indians with the Indians. So let's say that you also want to do non-disruptive upgrades. And again, because you, sir, are smart, instead of making this one big cloud, you made your deployment as two smaller clouds which are connected to each other. Now time passes, and then your favorite CMS releases the blue software. And you really love the color blue, and you really must upgrade your cloud to the blue software. So how do you do this? It's pretty easy if you set up your clouds this way. All you have to do is migrate your workloads from one cloud to the other cloud. You upgrade the cloud which doesn't have any workload right now to the blue software, and then you migrate your VMs back. And there you go. You've got a cloud running blue software, and you're happy. Maybe you've got many things in your IoT deployment. IoT is also a buzzword that you hear a lot these days. IoT just means Internet of Things.
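To make the GSLB idea concrete, here's a toy sketch of that pick-the-nearest-healthy-server logic. Everything in it is made up for illustration: the server names, coordinates, and the straight-line distance metric. A real GSLB product uses richer policies, actual health probes, and real DNS responses.

```python
# Toy GSLB: answer a lookup with the healthy server closest to the client.
# Server list, coordinates, and the health table are illustrative only.
import math

SERVERS = {
    "tokyo":  {"ip": "203.0.113.10", "lat": 35.7, "lon": 139.7},
    "austin": {"ip": "198.51.100.20", "lat": 30.3, "lon": -97.7},
}
HEALTH = {"tokyo": True, "austin": True}  # kept fresh by periodic health checks

def resolve(client_lat, client_lon):
    """Return the IP of the nearest server that passes its health check."""
    healthy = {name: s for name, s in SERVERS.items() if HEALTH[name]}
    nearest = min(
        healthy,
        key=lambda name: math.hypot(healthy[name]["lat"] - client_lat,
                                    healthy[name]["lon"] - client_lon),
    )
    return healthy[nearest]["ip"]

# A client near Osaka resolves to the Tokyo server...
print(resolve(34.7, 135.5))   # -> 203.0.113.10
# ...but if Tokyo fails its health check, traffic fails over to Austin.
HEALTH["tokyo"] = False
print(resolve(34.7, 135.5))   # -> 198.51.100.20
```

The second call is the "both of them have health checks built in" point from the DR discussion: failover falls out of the same lookup once the health table flips.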
And what we're really talking about here is that, until now, things were hard enough when you had a physical data center, and all you had to deal with was pesky employees with laptops who tend to VPN into your physical data center. But now we're moving to a world where all of your stuff is in the cloud. And it's not just employees with laptops that you need to be concerned about. You have to be concerned about things like temperature sensors, or power meters, or rainfall sensors. All of these need to be put together to form one giant cloud. So how do you manage this? Now, before I get to that, there are a few more use cases that I haven't talked about, because really, there are many use cases for having a multi-cloud. People talk about cloud bursting, for example. That's where you treat the public cloud as an extension of your private cloud, so that when the load given to your private cloud exceeds its capacity, you can extend into the public cloud. And then there are also some companies which like to do what they call a follow-the-sun strategy. So if you've got customers spread all over the world, you can turn clusters around the world on and off depending upon the usage and the time of day. So if you're still asking yourself the question, how did I get here? Why do I have a multi-cloud? Or if you're trying to figure out why you're trying to get here, and you don't know the answer yet, it means you haven't been paying attention. All right, so let's talk about managing your cloud, or, as I call it, how to keep sane at scale. So this is where I admit to something. This is where I admit to having a little bit of a bureaucrat in me. I love documentation, for example. And that's a very strange admission coming from someone who works for a startup. So when I look at a multi-cloud, the things that I worry about are: how do I know what's going on in the cloud? How do I figure out what the status is? How do I monitor it for defects?
And if something goes wrong, how do I troubleshoot it? These are things that you have to worry about if you've got more than one cloud. How do I do inventory management? Now, you've all heard of this thing called NFV. And NFV has what they call VNFs (virtual network functions), which are basically specialized VMs; they needed a term for them. And if you've got all of these clouds, you're going to have VNFs with different software versions, and you need to keep track of what's running where and what versions you've got, and maybe trigger upgrades for them when the time comes. Maybe you're running OpenStack Kilo in one cloud, and you're running OpenStack Liberty in another cloud, and you want to be able to keep track of this in a single pane. Then you've got to worry about global policy and configuration. So for example, if you set up a policy that the engineers should never ever be able to talk to the marketing people, which is maybe a good idea, you want to be able to apply that across all of your clouds. Then metering and billing. This is something that keeps the bean counters of your company very happy. They obsess about things like your link utilization and your CPU utilization and how much bandwidth you're consuming. You need to be able to meter all that and bill it, so that you can do capacity planning for the future as well. Now, you also need to be able to do things like event-based cloud migration. Maybe it's based on time of day. Maybe it's based on a catastrophic event. Maybe you want to switch between the clouds. And you really don't want to be the guy who gets woken up at 4 AM to do this manually. So you want to automate as much of this as possible. So one way, in fact one of the only ways, that you can keep sane at the kind of scale that we're talking about is by the use of templates. And the idea here is that you make patterns out of common deployments and common applications.
And once you define the pattern for what your application is going to look like, you then instantiate that pattern many, many times. So the idea is: define once, then instantiate many, many times. For example, maybe you built an application for, yet again, world domination. You could instantiate that into all the clouds you built all over the world to achieve world domination. OK, now let's talk about an architecture for the multi-cloud manager. There's another buzzword out there which I just slipped in. What is the multi-cloud manager? I'm assuming that if you've got a multi-cloud, you'll need to have some kind of software to manage it. That's what I'm calling a multi-cloud manager. That's all there is to it. How do you do the magic? So now the question is, OK, there are so many controllers out there, Sunny. Why don't you just use an existing controller? That's a good question. Nobody's asked it yet. It's because this is what's coming. These are all the things in the Internet of Things. I don't know if you would want to make your bike internet-accessible, but I'm assuming at some point it will happen regardless of whether you want it or not. So you need to be able to scale to millions, maybe even billions of endpoints. And you need to be able to manage not just hybrid clouds, but even things that don't look like clouds. So if you want to add your electricity meter into your private cloud, and that doesn't look or feel like a cloud, how are you going to do that? Controllers are not really built for that kind of thing. So here are some thoughts on how not to screw this up if you want to try to make this kind of multi-cloud manager. Number one is to be a manager, not a micromanager. What that means is: make the clouds do the heavy lifting. So if you've got a multi-cloud manager that's managing a set of clouds, you want all the heavy work to be done by the clouds themselves.
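The define-once, instantiate-many idea from a moment ago can be sketched in miniature like this. The template fields and zone names here are invented for illustration; they are not the actual MCM template schema.

```python
# Define-once / instantiate-many: a deployment pattern captured once as a
# template, then stamped out per cloud. Fields and zone names are made up.

APP_TEMPLATE = {
    "servers":  [{"role": "web", "count": 2}, {"role": "db", "count": 1}],
    "networks": ["frontend", "backend"],
}

def instantiate(template, zone):
    """Bind one generic template to one concrete zone (cloud)."""
    return {"zone": zone, **template}

# The same pattern, stamped into every cloud you operate.
deployments = [instantiate(APP_TEMPLATE, z)
               for z in ("openstack-tokyo", "aws-us-east-1", "openstack-austin")]
print(len(deployments))  # -> 3
```

The point is that the pattern is authored exactly once; adding a fourth cloud is one more entry in the zone list, not a new hand-built configuration.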
You don't want to be in the position where the multi-cloud manager is doing anything heavy. As an example of that, I would say: don't go to the multi-cloud manager to validate Keystone tokens. So let's put that down and say that you're not going to go to the multi-cloud manager to validate Keystone tokens, or to handle ARP responses, or to do DHCP allocation, none of that stuff. All of that needs to be done inside your respective clouds. You do, however, use it for configuration management. You need to be able to support multiple backends. And this is an OpenStack conference, of course, but for a multi-cloud manager, OpenStack should just be another back-end. So it needs to have a plug-in-based, pluggable architecture. Now, if you've been listening to me so far, you probably think that I'm now going to urge all of us to go make a controller. Well, here's what the timeline of making a controller looks like. We would spend the next two years writing the platform for the controller. We'd spend another two years making it highly available, by which I mean making sure that when portions of it crash, that doesn't take down everything. We would take another two years making it scale. So this is what you'll look like today. It's not a very flattering picture, but that's what I have. Six years of working on a controller will do this to you. So let's not make a controller. OK, now let's talk about whether there is a solution. So, OK, Sunny, you said that you don't want to use an existing controller, and we shouldn't be making another controller. Can we solve this problem? Well, yes, there is a solution to this problem. I'm coming to that. And you use them every day. They scale to millions of users and billions of endpoints. And maybe some of you already guessed it. Yes, we're talking about web applications. They load-balance, they autoscale, they can be distributed geographically, and they still play nice.
Plus, the good thing about web applications is that you can build one in just a few weeks. Imagine if you were trying to make a website to sell Star Wars memorabilia and you told your boss that it would be ready in six years; you'd be laughed out of the room. And the good thing about building the multi-cloud manager as a web application is that anybody with web application development experience can work on it. You know how hard it is to find somebody with experience in working on any one of your favorite controllers? It's very hard. It's because they're very proprietary, and they've all got these very unique designs. Each of them is a special snowflake. So you need somebody who understands that particular snowflake. Whereas for a web application, there are armies of people you can find who know how to modify it and add features to it. So this is what I thought I would do. I thought I would do an experiment and write the multi-cloud manager as a web application. And I don't want to worry about the platform. Why? Because worrying about the platform is a trap. It's a very costly trap. Think about this. When you want to write an app for a mobile phone, the first thing you think about is not, let me write an operating system for the mobile phone. You just pick a platform that you like and you go with it. The same thing goes with, let's say, Linux. If you want to write an application that runs on top of Linux, you don't build your own Linux distribution. You just pick Ubuntu or whatever best suits your needs and write an application on top of it. But we are still stuck in this world where people constantly want to write their own platform for solving these needs. There's no need to re-solve every distributed computing problem that has already been solved since 1980. I know it's a lot of fun, but you don't have to solve it. There's no money in that. So I decided to just use a PaaS, which is a platform as a service, for those of you unfamiliar with the acronym.
And the question is, which PaaS to use? There are a whole bunch of PaaSes that you could pick from. You could have gone with Cloud Foundry or OpenShift; they're both very good options. There are also commercial PaaSes like Google App Engine and Amazon's Elastic Beanstalk. For the Multicloud Manager, I actually went with a non-intuitive choice. I decided to do it using the Google App Engine APIs. The reason is that, again, I could have picked any one of the PaaSes. Any one of them could have done the job. Google App Engine is one of the oldest PaaSes. It's been around since, I think, 2009. That's the first reason. And the other reason is that there's an open source implementation of the Google App Engine APIs called AppScale. And that's also been around since, I think, 2010 or so, which is pretty mature. What this lets you do is take your application, which has been written using the Google App Engine APIs, and run it on any cloud, private or public. So you can run it inside your OpenStack cloud, for example, or you can run it as a hosted service in any of the public clouds. You can even run it inside EC2. So this is the architecture of what I'm calling the Multicloud Manager. There are a lot of confusing-looking rectangles in here, but remember, this is just a web application. It's written using the webapp2 framework, but that's just because that's the default framework that App Engine comes with. You could have picked Django or Node.js or whatever your favorite framework is. It doesn't make a difference. It's logically split into a top half and a bottom half. The top half offers a RESTful interface to the rest of the world. What it does is give a generic object model of all of the objects in your universe. So it gives an archetypal server, an archetypal storage unit. It gives an object model for your switch, or your router, or your device.
And when you want to configure any of these, you talk to the top half of the Multicloud Manager, which receives these as REST requests. The Multicloud Manager then looks up the flavor of the zone that the target device is on and schedules the right bottom half. What that means is that it schedules the right bottom-half plug-in, and that could be an OpenStack plug-in, or an IoT plug-in, or a physical router plug-in, or an AWS plug-in. Today, we already have an OpenStack plug-in and an AWS plug-in. What this does is take your generic object model and talk to the target using APIs that the target understands. And this whole thing is encapsulated inside the PaaS. So the platform APIs are just whatever App Engine provides. Now, because this was done using a PaaS, from day one, the Multicloud Manager supports load-based autoscaling. It has a distributed database back-end. It's got memcache. It's got a web-based interface for viewing and monitoring database contents. It's got channels to send real-time messages. And I got all of this, in a sense, almost for free. I didn't have to write a single line of the platform. It just came, because these are features that App Engine provides. So this is what the interaction model of MCM looks like. The basic idea is that you go to MCM and tell it, please apply this configuration. Now, the top half of the Multicloud Manager takes that and then passes it on to the right bottom-half handler. The bottom-half handler speaks the right set of APIs to the targets and makes it happen. And if you look at the examples here, there's a reason I have these specific examples. You could use this, for example, if, let's say, your OpenStack cloud has OpenDaylight as the networking plugin. And OpenDaylight has this ability to connect to another OpenDaylight instance in another OpenStack cloud over an MPLS backbone.
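The top-half/bottom-half split described above can be sketched in a few lines. To be clear, this is not MCM's actual code: the class names, the `create_server` action, and the stubbed plugin bodies are all hypothetical stand-ins for real Nova and EC2 API calls.

```python
# Sketch of the MCM split: a generic top half that accepts REST-style
# requests, and pluggable bottom halves that speak each target's native API.

class OpenStackPlugin:
    def create_server(self, spec):
        # A real bottom half would call Nova here; stubbed for illustration.
        return f"nova boot {spec['name']}"

class AWSPlugin:
    def create_server(self, spec):
        # A real bottom half would call EC2 RunInstances here; stubbed.
        return f"ec2 run-instances {spec['name']}"

# One plugin instance registered per zone flavor.
PLUGINS = {"openstack": OpenStackPlugin(), "aws": AWSPlugin()}

def handle_request(zone_flavor, action, spec):
    """Top half: look up the zone's flavor, schedule the right bottom half."""
    plugin = PLUGINS[zone_flavor]
    return getattr(plugin, action)(spec)

# The same generic request, translated per target by the bottom half.
print(handle_request("openstack", "create_server", {"name": "web-1"}))
print(handle_request("aws", "create_server", {"name": "web-2"}))
```

Note that the top half never talks a target's dialect itself; adding a new kind of cloud is just one more entry in the plugin table, which is the "OpenStack is just another back-end" point.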
Now, as long as this feature is exposed through APIs, this is something that the Multicloud Manager could configure for you. And the same goes for an OpenStack cloud running Plumgrid as the networking plugin. So if you're running Plumgrid as the networking plugin, we have the ability to connect to other Plumgrid plugins on other OpenStack clouds using VXLAN, or going over the internet, or even to peer with AWS using IPsec and BGP. So all this can be set up using MCM. So let's go back to the question that we brought up earlier: how do you protect your cluster from Godzilla? The first thing is, of course, you need to have two or more clusters. If you just have one, then there's nothing you can do. Now, remember, I spoke about MCM templates. What you need to do is templatize all of your configuration and then apply your templates to the two or more clusters that you've got set up. When you've done this, it means that your configuration is basically identical across all of your different clusters. The only thing that MCM doesn't concern itself with is the replication of application data. And that's because most databases can be set up to do remote replication, so that's not something that MCM needs to be actively involved in. And it's probably not a good idea for it to be involved either, because this is high-volume stuff, and you really don't want it transiting MCM at all. And really, why would you do this? The databases do an amazing job of remote replication anyway. So again, we're still following the less-is-more approach, where we try to do as little work in the multi-cloud manager as possible. So this is an illustration of what I just spoke about, where I've packaged all of my configuration as templates and I'm passing it on to the top half of MCM. I've labeled this as an Active-Active or Active-Standby cloud.
And I've got two clusters out there which have been synchronized using the template instantiation mechanism of the multi-cloud manager. And the databases have been set up to do remote replication. Now, whether this is Active-Active or Active-Standby just depends upon how you're steering your client traffic. If your client traffic goes to both of these clusters, then you would call this Active-Active. If your client traffic is only going to one of these clusters until that one goes down, and then you switch it over to the other cluster, then you would call it Active-Standby. But really, that's just how you configure your load balancer. So this is what I already spoke about: the apps are responsible for synchronizing their own databases. And then, once you've got all of this set up, the day arrives when your cluster gets caught in the war between Godzilla and Mechagodzilla, and the cluster on the left goes down. You're still good, because your configuration was already persisted by the multi-cloud manager, and your application database was already synchronized. And how do you do the switch? That can be done using either the GSLB or the load balancer. Both of them have health checks built in, so that if one cluster goes down, usually it'll know, and then it's able to switch over to the other cluster. All right, so let's talk about identity management, or how to do authentication and authorization in this big new world. Now, before I launch into this, I have to give this warning: this stuff is very boring. There are a lot of details, and I'm going to try to distill all the details down into a couple of slides. But it's probably still too much information. Let's see how we do. So, that's a lot of words. That itself means that there's some complicated stuff there. Basically, the authentication and authorization is done at the periphery of the cloud. So when you talk to the multi-cloud manager, at that point, you authenticate the user making the request.
You check to see if he's allowed to do what he's trying to do. And then, within the cloud itself, all the different elements of the cloud talk as privileged users. So basically, the idea here is that you secure a perimeter, and inside, they talk to each other as privileged users. Now, the MCM can use an external IDP. Today, we've got support for all of these, actually: OAuth, SAML, LDAP. So it can interface with an external IDP, which is a good way to do this. You really don't want your multi-cloud manager to be in the business of cycling user passwords and checking for password strength and things like that. You've already set those policies for your organization. You should just leverage them out here. So I'm going to walk through the steps needed in server creation, just to illustrate some of the points we talked about earlier. This is probably where some of you are going to start to zone off, but I'm going to do this anyway, since it looks like I might have a little bit of time. So the call to create a server comes into the top half of MCM. MCM then redirects the request to the IDP, from which MCM gets the user and the group associated with that user. The IDP, for those of you who are unaware, is the identity provider. That's the module that you go to for authenticating an end user. So you come back from the IDP with the user and the group. Then there's an assignment module, in which you assign the user and the group to a role in a tenant or a domain. And then you finally check the authorization policy, to see if that role is allowed to have access permissions on the object that it's trying to access or modify. Once all that is done, and this is what I was talking about with authentication being at the periphery, the bottom half talks to the respective targets as a privileged user, using tokens. And then the rest of it is just the usual OpenStack stuff. That probably wasn't really very clear, but just come talk to me later.
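For those zoning off, the server-creation flow just described, authenticate at the periphery via the IDP, map user and group to a role, check policy, then act inside the cloud as a privileged user, fits in a short sketch. The users, groups, and policy tables here are invented examples, not MCM's real schema.

```python
# Sketch of periphery auth: IDP lookup -> assignment -> policy check -> act.
# All identities, assignments, and policies below are made-up examples.

IDP = {"alice": {"group": "engineering"}}            # external identity provider
ASSIGNMENTS = {("alice", "engineering"): "member"}   # user+group -> role in tenant
POLICY = {"create_server": {"member", "admin"}}      # action -> roles allowed

def create_server(user, spec):
    identity = IDP.get(user)                         # 1. authenticate via the IDP
    if identity is None:
        raise PermissionError("unknown user")
    role = ASSIGNMENTS[(user, identity["group"])]    # 2. assignment module
    if role not in POLICY["create_server"]:          # 3. authorization policy
        raise PermissionError("not authorized")
    # 4. past the perimeter: the bottom half would now act as a privileged
    #    user with a token against the target cloud (stubbed here).
    return {"server": spec["name"], "created_by": user}

print(create_server("alice", {"name": "web-1"}))
```

Everything after step 3 runs as a privileged user inside the secured perimeter, which is exactly why the heavy per-request token validation stays out of MCM itself.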
OK, that's just in time for the demo. Let's now prove that what I just sold you wasn't a bag of manure. So, a bit of introduction to what I'm showing out here. I've got two clusters out here. One is an OpenStack cluster, which is running Plumgrid as the networking plugin. But again, going back to what I said earlier, that is not important. It could have been ODL. In fact, one of the clusters didn't even have to be OpenStack. But for the purpose of this demo, there's one OpenStack cluster running Plumgrid networking. The other cluster is AWS. And what I'm going to do here, using the multi-cloud manager (the Federation Manager is just the name we use internally for it), is connect these two. I'm going to get these two to peer. Not just that, I'm also going to set up servers on both sides. I'm going to set up the networking on both sides. And I'm going to do all of this using templates, so you can see how templates are a really powerful mechanism for bringing up your applications. So this is a recording. My apologies, I couldn't do this live because I wasn't sure if we would have connectivity to AWS. So what we're doing out here is telling the multi-cloud manager about the zones: what the flavor is, how to connect to them, et cetera. And what you're observing out here is a Swagger interface. This is not the UI for the multi-cloud manager. We are still working on the UI for it. But Swagger is a great way to exercise the APIs in a UI-like fashion without actually resorting to scripts. So there you go. First we set up the OpenStack cluster with MCM. Now we are setting up the AWS cluster, which you can see. We just called it AWS Center Cloud. That's the name that we gave it. And this is the Plumgrid UI for the OpenStack cluster. It tells you that there are no networks. There's nothing configured here at this point. It's empty. We've told MCM about it, but we haven't yet configured anything.
So let's go ahead and do that now. And these are the other templates. I'm just giving an overview of some templates which were already set up. And we are going to now instantiate these templates. By template instantiation, again, what I mean is that you take a template that already exists and then you publish it to a target. This is similar to Heat, but also different, in the sense that Heat requires you to make a back end in Heat for every target that you want to use. The templating mechanism in MCM is more general than that. It doesn't require you to create a target back end for everything that you want to create a template for. Oh, there you go. We just instantiated the networking template. And now you see that over on the Plumgrid side, this is a pretty complex topology that we've made. We are not showing the OpenStack view of this, but that's a mixture of switches and routers that you see out there. All right, so now we're going to go and instantiate another template. If I'm not mistaken, this is the AWS networking template. Let's see. All right, that is probably the server template. So we just launched a VM using yet another template. You see the image name out there? So we just launched a server using the template. All right, now we just created a VPC, again using a template. And the VPC with the long name that you see out there, there's a certain syntax to the name. It has the operator name and the tenant name and the region name, et cetera, in the path. But that one with the long name out there is the VPC that we just created, again using a template. All right, proceeding along. We just launched an instance using a template. Let's see if it's getting spawned out here. There you go, there's an EC2 instance that just got spawned, again using an MCM template.
Now, this is the dynamic router out on the networking side, and these are the routes that it has. Now, all the IP addresses you see here, there's a 1.0 and a 2.0 address. These are all IP addresses which are on your OpenStack side. So at this point, this is just to illustrate that you now have your two clusters, and that's the IP address of your VPC: it's 10.30. So you now have two clusters which are isolated from each other. They're not connected to each other yet. You just created the networking side and the compute side using templates. Now, to prove that the two of them aren't connected, we of course have to do a ping test, because how else will you believe me? There you go, it doesn't work. All right. So now, let's try to connect these two clusters together. And we'll do that, again, by coming here. We're going to create what we call attachment points. Before we do that, let's look at the route tables on the VPC side. Yeah, those are the routes. That's just the 10.30 route that you saw earlier. There's no route for the OpenStack side. So the OpenStack IP addresses, the 1.0 IP addresses and the 2.0 IP addresses, they're not there on the VPC side yet, of course, because we haven't connected the two of them yet. So now what we're going to do is create this object called an attachment point. On both clusters: we're going to create an attachment point on the OpenStack cluster, and then we're going to create an attachment point on the VPC cluster. And then we're going to create a link between the two of them. The idea, obviously, is that once you create a link between them, the two of them are going to be able to talk to each other. So that's creating the attachment point. And an attachment point, if you will, is like a peering ID. It's a place where others can come and connect. You saw some BGP AS information out there. That's because to peer with Amazon, you need to use BGP and IPsec.
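The attachment-point-and-link workflow from the demo looks roughly like this. To be explicit: the endpoint paths, payload fields, and AS numbers here are hypothetical; in the demo the real MCM API was driven through Swagger, and the actual resource names may differ.

```python
# Sketch of the peering workflow: one attachment point per cloud, then a
# link between them. Endpoints and payloads are hypothetical stand-ins.
import json

def rest_call(method, path, body):
    """Stand-in for an HTTP client hitting the MCM REST interface."""
    return {"method": method, "path": path, "body": body}

ap_openstack = rest_call("POST", "/v1/attachment-points", {
    "zone": "openstack-tokyo",
    "bgp": {"asn": 65001, "peer_ip": "192.0.2.1"},   # BGP/IPsec parameters
})                                                    # must match on both sides
ap_aws = rest_call("POST", "/v1/attachment-points", {
    "zone": "aws-us-east-1",
    "bgp": {"asn": 65002, "peer_ip": "192.0.2.2"},
})

# The link ties the two attachment points together; once it's up, routes
# get exchanged over BGP and the clusters can reach each other.
link = rest_call("POST", "/v1/links", {
    "name": "os-to-aws",
    "endpoints": [ap_openstack["body"]["zone"], ap_aws["body"]["zone"]],
})
print(json.dumps(link, indent=2))
```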
So when you set up this connection, you need to make sure that the information is consistent on both sides. So now we're creating the attachment point again on the Amazon side. All right, that's done. Now we need to create a link between these two attachment points. So remember, we've gone through the whole gamut of bringing up your whole application, setting up the servers and the networking, and even connecting two clouds together, in the course of this short 10-minute presentation. So it's really not that involved. You just give a name to the link and tell it what's on both sides of the link. OK, so now let's see if we made any magic. Coming back in here, you'll now see that we've got a couple of tunnels set up. These tunnels are necessary for your AWS cluster to peer with OpenStack. And all right, what are those IP addresses there? 1.0, 2.0, those look very familiar. Those are from your OpenStack side. And now let's look at the routing table on the OpenStack side. Again, we're looking at the routing table of the dynamic router here. There's a 10.30 IP address which wasn't there before. And that is the IP address of your VPC cluster. So now that both sides have exchanged routes and you've got connectivity between them, naturally, let's check if it works. What is that IP address again? 2.30, yeah. All right, so we've now connected the two clusters together using MCM. All right, so before I came here, my five-year-old son asked me to show him the slides of this presentation. And I quickly went over them, and then he asked me, Papa, why do you have Godzilla in your slides? What do you want people to learn? So, all right, I have to answer that question. What should the key takeaways from this be? One: you don't necessarily need a controller for solving all of your problems, you know? Try to use a web application if you can. That's it. All right, thank you, folks.