Hi everyone, thank you for joining the multi-site session; we're going to talk about Tricircle today. If you see me going into palpitations on stage, don't worry, I have stage fright. With me today is my friend Pino, the CTO of Midokura, and what we're going to cover is a deep dive into Tricircle.

Before I dive into the technical details, some motivation: why do we care about multi-site, why do we want to manage multiple OpenStack clouds at all? The first thing that pops into mind is geo-redundancy; you all just thought about geo-redundancy, no, I'm kidding, it's just a fancy way of saying disaster recovery. Normally, when we're talking about multiple clouds, that's the first thing that comes up. The second is service locality: we want to run virtual machines close to where they're needed, for low latency, or sometimes for legal reasons, where we must keep data in a certain region and not migrate it to places that do not comply with the law. Of course there's cost optimization: at certain times it can cost less to run on one site rather than another. We also get use cases from the OPNFV working group, which is basically about how to manage virtual network functions; I will not cover that at all. Then there's the classic cloud-bursting scenario: my resources are dwindling on one site and I want to start spreading out to other sites. This becomes even more interesting in the hybrid cloud scenario, where I have my local OpenStack and I want to burst out into Amazon, but using OpenStack APIs. This is not just theoretical; we have a project about this as well, and my colleague Eshed Gal-Or will be giving a session about it today in this room at 5:30.

So those are the use cases; what are the requirements? We want global resource management: I want to provision virtual machines across sites. We want a single resource-utilization dashboard: I want to see the statistics of all my sites in a single place, not jump around between dashboards. Perhaps more interesting is cross-site virtual networks: the usual L3 peering scenarios, and in some use cases also cross-site stretched Layer 2 networks. This does not mean we want to start propagating broadcasts across the WAN, but it does mean that if I can take a virtual machine from one site and copy it to the other without having to modify it internally, changing the IP or the MAC addresses, that would be really nice. A prerequisite for all of this is identity management: single identity management, be it a federated Keystone or a single stretched Keystone; either way it is required. More technically, we want to use the OpenStack APIs. Two years ago maybe everyone was talking about the AWS APIs; today we actually want to use the OpenStack APIs to manage all clouds. We want to manage everything as an aggregated resource pool: a single pool from which I can carve out resources regardless of their physical location. And we need to minimize traffic over the WAN, for cost reduction, better performance, things like that.
Of course, our global management system cannot be a single point of failure, so we need service continuity, and we want the user experience to be comparable with single-site management.

To solve all of this we've introduced Tricircle. What is Tricircle? Tricircle is a project for the management and orchestration of multiple sites. We have the OpenStack APIs at the top (you can't quite see it, never mind): users interact with a single OpenStack API and a single dashboard. Requests go to a top site, a top management layer, which then distributes them down to the separate sites. Down below we have disparate resources in the sites, as well as cross-site resources like the one you see in purple, which is a single network spanning sites.

From the user's perspective, the experience should be pretty much the same. The only difference is that when you're launching an instance you get a data-center drop-down; you can see here that Launch Instance has a Data Center field, which is new. Other than that, everything should be the same. If we look at all our instances, under Availability Zone you also see the name of the site in which that availability zone is located.

Okay, take a deep breath, we're going to dive into the technical details. We've been running this project for nearly two years now. Today it is running in production in Huawei Web Services, the Huawei public cloud, and in other places as well, and we've really learned a lot in these two years. We started with the concept of using OpenStack itself to manage OpenStacks. This is a nifty idea, though not revolutionary; it's similar to the way Nova used to manage vSphere as a single compute node. So an entire site is aggregated and abstracted away behind a single compute node. This works, but here is what we've learned: OpenStack is really single-minded. It was built, designed, and implemented to manage a single site, and introducing multi-site concepts into a piece of code designed and developed for a single site requires a lot of creativity. You need to take care of things like consistency and atomicity. Your bottom sites are not homogeneous: I can have different versions of OpenStack while I'm doing upgrades, one version here and one version there, or I can use the reference implementation for Neutron in one site and ODL in another; add sizing considerations, and so on. All of these things really require some kind of a leap.

So in the past few months we've been working on an experimental architecture. What we want is to learn how to improve the current solution, which is already running in production, to be even better. We said: if we had a clean slate, what would our architecture look like? And then, how do we evolve our solution toward that architecture? Of course, we're still providing OpenStack APIs, so we have an unmodified API management layer at the top. But from there, instead of reusing Nova and the other components and intercepting the requests down at the compute node, we immediately intercept the requests in the adapter layer, which I will explain now. What does this layer do? It intercepts the requests, and it is a stateless layer that abstracts away the difference between what I present to the user or the administrator and the way I manage things at the bottom.
Today, if you go to the dashboard, you see multiple compute nodes, and in the existing architecture each compute node represents an entire site. Now, we may not want that. We may want to manage availability zones at the top, or even drill down to the level of a single compute node on each site. This flexibility is important because it also determines the granularity of our dashboard reports. So the adapter is really an abstraction layer between what we present to the administrator or the user and how we manage things underneath.

The next part is the Workload Distributor, this piece here. Once we intercept a request, we rebuild all the information we need, add the site information, and so on, and pass it to the Workload Distributor. This is not a full-blown service; all it does is push the requests into different queues. How we build the queues is, again, a deployment issue, not a design issue. Why is this important? Because it determines how many top services we have. In a very simple deployment we don't want to install ten nodes just to manage one or two sites, so we can start with a single global queue and a single top service that manages all the bottom sites. Some nomenclature: we call the top service the top and the bottom sites bottoms; we thought it's pretty simple to understand this way. I could just as easily go back to the view we have today, with a single service managing a single site. But because this is a deployment choice, I can also split my requests across per-tenant queues and scale that way. This layer really gives us flexibility in how many top services we run and how we scale them; it becomes a deployment question.

So what does the top service do? Something I have not told you until now: when we run create-network at the top, we usually don't pass it down to the bottom, because I don't want to create a lot of resources at the bottom until I really need them. So when a user comes to run a virtual machine on a certain site, that site may not yet have the networks, the virtual router, and the other things required to fulfill the request. A simple create-VM command thus becomes a transaction that needs to be managed. The first thing we do is build the dependency list, to understand which operations we need to run; that's the dependency builder. Then the job builder takes all these operations, builds a job description, and passes it down to the bottom site to be executed as an entire transaction. We don't want to go back and forth over the WAN with create network, wait, create virtual router, create port, and so on; we want to pass a bunch of operations once, manage the execution locally, where failure is less likely because there's no WAN in the way, and then just be told the final result: did it succeed or not? If I'm really interested, I can poll in the middle for status updates. Due to lack of time I will not go into detail about the consistency monitor and the data access layer, but they are there as well.

That leads us to the bottom service. Once it gets our request, the request needs to be executed. We did not want to reinvent the wheel and start managing transactions and job execution ourselves, so we introduced a pluggable workflow engine.
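To make the Workload Distributor and dependency builder concrete, here is a minimal Python sketch of the flow just described. All the names (WorkloadDistributor, build_dependency_list, the request fields) are illustrative assumptions, not the actual Tricircle code.

```python
from collections import deque

class WorkloadDistributor:
    """Pushes intercepted API requests into queues.

    How many queues exist is a deployment choice, not a design choice:
    one global queue means one top service manages every bottom site;
    per-tenant queues shard the load across several top services.
    """
    def __init__(self, sharding="global"):
        self.sharding = sharding
        self.queues = {}

    def dispatch(self, request):
        key = "global" if self.sharding == "global" else request["tenant_id"]
        self.queues.setdefault(key, deque()).append(request)

def build_dependency_list(req, site_state):
    """Expand 'create VM on site X' into the ordered operations the
    bottom site must run as one transaction (the dependency builder)."""
    ops = []
    if req["network_id"] not in site_state["networks"]:
        ops.append(("create_network", req["network_id"]))
        ops.append(("create_subnet", req["subnet_id"]))
    if req["router_id"] not in site_state["routers"]:
        ops.append(("create_router", req["router_id"]))
    ops.append(("create_port", req["port_id"]))
    ops.append(("boot_vm", req["vm_id"]))
    return ops  # the job builder ships this once, over the WAN, as one job
```

The point of returning the whole list is the one made above: the bottom site executes it as a single transaction, so only one round trip crosses the WAN.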
This can be Mistral or TaskFlow or any workflow engine of your liking; we don't care. We pass it the job to be executed and it manages the execution. If you think about it, most of the load resides here: in this architecture, most of the burden of the work sits at the bottom. Once the job is finished, the engine notifies us of the result.

Another thing the bottom service does is monitor changes on the local site. We don't want to go polling across the WAN for changes, so we have this local service, standing alone alongside OpenStack, which checks locally what changes have happened since the last time, creates a diff, and then sends only the changes across the WAN up to the center. Locally it can poll very frequently, or use a publish-subscribe model where it gets notified of events, caches them, and passes them on to the top. So if I have a network failure, I can send the changes to the top later.

That basically covers my part. Again, as I mentioned before, at 5:30 today in this room my friend Eshed Gal-Or will be presenting the hybrid cloud solution, which is based on this work. When I'm nervous I speak quickly, so I'm actually a bit ahead of time. Any questions about this part before I pass the mic to Pino? Perfect, yes.

On the hybrid cloud solution: recall that we want to manage everything like OpenStack, so we want something at the bottom that speaks the OpenStack APIs. We have adapters for that; Eshed will cover this today at 5:30. Other questions? Yes, you?

Okay, this is a very good question. Consider this: I want to provision a network on one site and the same network on another site. The network ID given on one site is not the same as the ID given on the other. So first of all I need mapping information at the top; basically I need a global ID that says, this is the network's ID here and this is its ID there. And if I then want to do L3 or L2 peering, my border gateways, which Pino will touch on a little, need to know which global ID to map to: I get something on tunnel X, change it to tunnel Y, and the other side takes that and changes it to its local tunnel ID. Another use for state is IPAM, IP address management. In the L3 use case I have two subnets, and I need to make sure I don't end up with the same subnets in different clouds. The same goes for IP addresses: if I have a stretched Layer 2 network, I need to make sure not to allocate the same IP across clouds. So there is some state; we're trying to minimize it as much as possible, because state is evil. Other questions? Yes, there.

I cannot give you numbers about network delay, but what we do for network failures is very similar to what OpenStack does today. First of all, no, I don't mean we ignore them; that was a wrong example. What we do is persist, first of all, the operation that is about to happen in the top database. We are actually reusing the OpenStack API data model: we let Nova and the other API services update the database by themselves, reusing the code in this way. So we know that an operation needs to happen, but in cases of failure we need a healing mechanism.
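A hedged sketch of the ID-mapping state mentioned in that answer: the top keeps a global ID for each logical resource and maps it to the per-site ID each bottom OpenStack actually returned. The class and method names are assumptions for illustration, not Tricircle's real schema.

```python
import uuid

class ResourceRouter:
    """Maps one global resource ID to its per-site local IDs."""
    def __init__(self):
        self._map = {}  # global_id -> {site_name: local_id}

    def register(self, global_id, site, local_id):
        self._map.setdefault(global_id, {})[site] = local_id

    def local_id(self, global_id, site):
        # Translate the global ID into the ID the bottom site knows.
        return self._map[global_id][site]

# One logical network provisioned on two sites gets two local IDs.
router = ResourceRouter()
net = str(uuid.uuid4())                            # global network ID at the top
router.register(net, "site-1", str(uuid.uuid4()))  # ID site-1's Neutron returned
router.register(net, "site-2", str(uuid.uuid4()))  # ID site-2's Neutron returned
```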
We need something at the bottom, which I don't have on this slide: basically we monitor, and because we perform complete transactions at the bottom, we can roll back or roll forward to a consistent state there, then notify the top of the change and let it synchronize to the new state. Other questions? Yes, fantastic question.

The question was: how do I handle cells, regions, things like that? First of all, regions are an unimplemented feature in OpenStack; we need to populate region information in Keystone, and that is part of the requirements. With regard to cells: cells were built with different assumptions than multi-site. The cells assumptions are: I'm on the LAN and I'm completely sharded. There are no cross-cell resources; the cells have different networks, different network topologies, because there are two Neutrons, and so on. So cells made some simplifying assumptions, which is good for them, and they can just run as separate things. For us, a cell could be an availability zone or anything you want. Remember that at the top we can present things however we want: instead of sites, compute nodes, and availability zones, you could have cells there if you'd like. Other questions?

Pino is the expert on all things networking, so I will let him reply to that after he goes through his part. Let me pass the mic, and if you have more questions later, I'll be happy to answer them. Thank you.

I actually don't touch on Layer 2 gateways in this talk, but let's talk about them once we've gone through this. It's great to be here; thanks everyone for coming. I want to talk about networking from the point of view of the applications. We'll touch a little on the implementation in Tricircle, but right now I really want to take the application's point of view and ask: what do we do with a cross-site network, and what does it mean to have one?

The ideal situation would be that network objects are cross-site by default. That means that, starting with your Layer 2 network, it's extended: a stretched Layer 2 network across two sites. I'm showing here that we have subnet 192.168.10.x, with a VM on the left and some VMs on the right. You can see right away, and this was mentioned before, that you're going to need IPAM here, because you don't want to allocate the same IP address on the left as on the right, and IP address management today is single-site. Although we can pull that into the orchestration layer, and then you have IPAM anyway. So that's the first thing. Secondly, if you send a packet across, just by having stretched Layer 2, you hit the port-level firewall, the Neutron security group. Now, are the rules in that security group configured the same way as the rules on the VMs in site 2? Only if you have cross-site security groups. So right off the bat, if you want to do cross-site application deployment, we need these features in the cross-site implementation: IPAM and security groups.

Before I go on, I want to point out why you might want cross-site L2, because some people say, I want cross-site L2, and other people say, I can live without it. Cross-site L2 gives you, perhaps, VM migration, and you might or might not want to use that; it depends on your bandwidth. And then you want symmetry in your deployments.
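Since both speakers raised the cross-site IPAM requirement, here is a small sketch of one way to keep allocations disjoint, assuming the top simply carves a stretched subnet into non-overlapping per-site pools; this is one possible policy, not the project's actual allocator.

```python
import ipaddress

def split_allocation_pools(cidr, sites):
    """Carve one stretched subnet into a disjoint slice per site, so two
    sites can never hand out the same IP address independently."""
    net = ipaddress.ip_network(cidr)
    # Enough extra prefix bits to produce at least len(sites) slices.
    extra_bits = (len(sites) - 1).bit_length()
    slices = list(net.subnets(prefixlen_diff=extra_bits))
    return {site: str(slices[i]) for i, site in enumerate(sites)}

print(split_allocation_pools("192.168.10.0/24", ["site-1", "site-2"]))
# {'site-1': '192.168.10.0/25', 'site-2': '192.168.10.128/25'}
```

As mentioned later in the Q&A, the top would only need to get involved again to rebalance slices when one site runs low on addresses.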
You want to use the same IP addresses on the left and on the right, the same configuration, the same Heat templates, and so on. Especially if you already have a whole bunch of orchestration and you just want to use it on both sides, that kind of symmetry that stretched L2 gives you is very nice. And then, of course, someone mentioned partitions. In the data plane, you absolutely have to be aware that it's not a single site. We have partitions within a data center as well, but you especially have to be aware of them in multi-site. So if you design for multi-site, even though you have stretched L2, you're still going to be aware of what you run where, and you're going to make sure you're somewhat symmetric and not using the WAN too much. Of course, if we're talking about a multi-site deployment in the same data center, where for example you have a Kilo deployment and a Juno deployment and you want them somehow talking to each other, then you might have easier WAN-utilization requirements to meet. But you still usually have gateways between the clouds, depending on the implementation, so throughput and latency still aren't as good as within a single site.

Now, if you're doing cross-site stretched L2, the next question is: should all my networks be cross-site? And the answer is yes, because you want that kind of symmetry in deploying your applications; you don't want to manage different configurations on each side. That's the whole premise of doing stretched L2. So how do you route between two stretched Layer 2 networks? You put a router somewhere. Here I put the router in site 1. But if the router is in site 1, VMs in site 2 that want to talk to the other subnet end up doing a traffic trombone. That's obviously not desired, so what we end up doing is using a second router in the other site. Problem solved; the traffic trombone is gone.

But now we ask: how do you implement this router? Consider that the routes have to be the same in the two routers. What about the IP addresses? If we use different IP addresses, then when a VM migrates, its default route must change. So what do you do? Maybe you wait for the DHCP lease to expire, or you proactively install a new route in the VM. So the IP addresses have to be the same. What about the MAC addresses? They also have to be the same, because otherwise we have to wait for the ARP table entry to expire when a VM migrates. For all these reasons, you start to see that the cross-site router, meaning a router that is mirrored on both sides and where the VMs don't have to be aware of which site they're in, becomes a requirement for deploying your applications across sites.

So what about load balancers and firewalls? Firewalls are a little easier. Of course, you want the same rules on each site, and the classic use case we're talking about is DNS load balancing across the two sites with a multi-tier application in each site. But you can manage your firewall-as-a-service with Heat templates, so it's easier; you can even manage it by hand, since you don't change these things a lot. Load balancers are more complicated, because with a load balancer we have to decide what the policy is for going across the WAN.
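A tiny sketch of the mirrored-router requirement just argued for: the router instance on each site must present the same gateway IP and the same MAC, so a migrating VM needs neither a new default route nor an ARP-cache timeout. The values and field names here are invented for illustration.

```python
# The one tenant-facing identity both site-local routers must share.
GATEWAY = {"ip": "192.168.10.1", "mac": "fa:16:3e:00:00:01"}

def make_site_router(site):
    # Each site runs its own local router instance, but the gateway port
    # looks identical on both, so VMs cannot tell which site they are in.
    return {"site": site, "gw_ip": GATEWAY["ip"], "gw_mac": GATEWAY["mac"]}

routers = [make_site_router(s) for s in ("site-1", "site-2")]
# The mirror invariant: exactly one (IP, MAC) identity across all copies.
assert len({(r["gw_ip"], r["gw_mac"]) for r in routers}) == 1
```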
You may have a policy where you can use the WAN quite a bit, or one where you only use it during upgrades, maintenance, or VM migration. This is why load balancers end up more complicated. And it's not even really a cross-site load balancer, because many people don't want that: you may not want the load balancer in site 1 to know about all the back ends in site 2, otherwise you'd be crossing the WAN a lot. I'll stop here in terms of developing this story, but the message I'm trying to give is that if we do cross-site L2, even if we use the network sparingly, the model itself forces us to offer, from the very start, a very complete solution. So there is no nice, easy path to cross-site deployment of applications that way.

So what's the alternative? Router peering, which is what AWS coined as VPC peering. It's the idea that you just peer two routers, and we can extend this concept to peering N routers on a stretched L2 segment. The key point here is that you have a choice about what you put on your stretched L2: you can put VMs directly on the segment, or you can put routers on it. I'm going to argue that this is the first step, an easier step to take, and that it has some advantages.

Now, I want to point out that in this diagram you have different subnets, different subnet addresses. So you're doing some subnet management, and the orchestration layer can help you keep your subnet prefixes different on each side. I've designed this so that I'm using a different /16 on each side, which keeps my routing pretty simple. And this is just what AWS does: AWS doesn't allow peering between two VPCs whose address scopes overlap. I don't have a pretty animation for this, but you're still going to need security-group synchronization across the two sites, a cross-site concept of security groups, because these two blue networks are blue on purpose: I'm trying to represent that we've split an application tier into two different subnets, but it's still the same tier. And they may sometimes want to talk to each other: for certain purposes a load balancer on the right could talk to a VM on the left, or you might be migrating a MySQL deployment, or you have some storage, database, or caching layer synchronizing across the sites. So security groups are still a requirement. But with router peering, which I'm arguing is easier to implement, plus security groups, we can start.

Now, you will have to redesign your deployment templates and your orchestration, because now you have to deal with tiers that have different prefixes. However, the advantage of this model is compatibility with the hybrid cloud use case, or Direct Connect; we might have advantages there. Direct Connect is another AWS term; it's ExpressRoute in Azure. It's the idea that at the peering facility the customer has a hardware router, and they want that hardware router to have direct access to their virtual data center. So it's a Layer 3 model.
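The non-overlap rule can be stated in a few lines; this sketch mirrors the AWS behavior described above (refusing to peer VPCs whose address ranges overlap), with hypothetical function names.

```python
import ipaddress

def can_peer(cidrs_a, cidrs_b):
    """Allow peering only if no prefix on side A overlaps any on side B,
    since routing between overlapping prefixes would be ambiguous."""
    return not any(
        ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))
        for a in cidrs_a for b in cidrs_b
    )

print(can_peer(["10.1.0.0/16"], ["10.2.0.0/16"]))    # True: distinct /16s
print(can_peer(["10.1.0.0/16"], ["10.1.128.0/17"]))  # False: overlap
```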
I'm not showing here that there is a physical piece of equipment in the peering facility that provides this Layer 2 connection between the routers, but that doesn't matter, because we're interested now in the application-layer virtual network. What I'm trying to show is that I expect the public clouds to offer this model, and I claim it's unlikely that they will support a stretched Layer 2 model. You can implement it using L2 VPNs, that's fine, but I think the cloud providers won't do that, because it is more complex. Instead, they basically open up a tunnel and then offer BGP, dynamic routing, over the tunnel. It's pretty simple from their end, and they can even give you the configuration to install on your customer router.

A very similar use case: if you are dealing with three public clouds, you peer them all. I'm showing here that every pair is peered, so it's a mesh. This is what we would strive for, and we've built a prototype of it where the three routers sit on a single VXLAN segment, to really drive home the point that you can put routers on a stretched L2 segment. You can build this today with public clouds and your own OpenStack; the private cloud is on the left. And again I claim, and I'd like to hear opinions if they differ, that we're going to be able to do this in the industry at Layer 3, but we will not be able to stretch L2 into the public clouds. Conversation point.

Okay, now, do we have to choose? I believe the Neutron community should build these APIs so that we can support both models. After all, we're just stretching Layer 2; we're using some tunneling, VXLAN to start with maybe, and the components are very similar in terms of implementation. What I think will change is that the vendor implementations will zero in on one or the other. For example, we're very focused on Layer 3; we're not going to chase scale, for now, on stretched Layer 2 networks that you put VMs on. Every vendor will have a point of view on that. And my question is: what's your use case? What do you need? Do you need stretched Layer 2, and why? Let's talk about it in a minute.

Now let's spend a few minutes on implementation. How am I doing on time? You left me a ton. Okay, five minutes. So, our colleagues at Huawei have come up with a great design whose goal was to be as flexible as possible about how you can get cross-site connectivity across clouds. They started with the goal of supporting any kind of OpenStack that has existed in the past, and they had to compromise on that a little. But the goal today is: if the sites support ML2 and have some sort of tunneling, then with those two requirements alone we can drop in a border gateway on each side that connects the tunnels on the left to the tunnels on the right, and we can implement stretched Layer 2. So that's great.

Now, what is this border gateway? I don't want to go into the implementation of the border gateway so much as its interfaces, which allow anyone to create one. We at Midokura will be developing a border gateway in MidoNet, and other vendors will have their own.
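Since the emphasis is on the border gateway's interfaces rather than its implementation, here is a sketch of what such an interface might look like; the method names are assumptions, not an agreed API, but they capture the two capabilities named in the discussion that follows: tunnel creation and local L2 population.

```python
import abc

class BorderGateway(abc.ABC):
    """What the orchestration layer needs from any border gateway,
    whatever the vendor behind it (MidoNet, a hardware box, a native
    Neutron implementation)."""

    @abc.abstractmethod
    def create_tunnel(self, remote_vtep_ip, vni):
        """Open a tunnel (e.g. VXLAN with this VNI) to the peer gateway."""

    @abc.abstractmethod
    def populate_mac(self, mac, vni, remote_vtep_ip):
        """L2-population entry: this MAC lives behind that remote VTEP."""

    @abc.abstractmethod
    def remove_mac(self, mac, vni):
        """Withdraw a MAC entry, e.g. when a VM migrates away."""
```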
A Neutron implementation may have one natively, but if it doesn't, you can always drop one in. So you need an interface here between the border gateway and the rest of the cloud, and that interface, we hope, is the Layer 2 gateway API. It's not exactly suited for this today, but with a few tweaks we believe it's possible, so we'll be trying to discuss this with the community over this week. What other interfaces? On the north side, we want the orchestration layer to be able to talk to both border gateways, and we don't want the border gateways to have to be the same implementation or the same vendor. So we need a standard API on top of the border gateway; today that could just be OVSDB on its own, OVSDB to manage tunnels, with some minor tweaks, yes, but that could be OVSDB. And between the two border gateways we already have tunneling protocols that are standardized, so that's already taken care of. Does that sum it up? All right, great.

So, questions. We could start with this gentleman's question about Layer 2 gateways, or as you wish. Is it an appropriate time? It's partly answered. Partly answered, yeah, okay. And the remaining question is: is there some controller doing the magic? This depends a little on the vendor and the implementation, so maybe this is what Eyal was answering before. Do you want to take the mic? It also goes back to Eshed's talk, which again is at 5:30 today, right in this room. Perfect, just get some coffee and stay here.

Okay, back to your original question: do we have a control protocol between the border gateways? The simple answer is that we do not want to mandate one, because once we mandate a control protocol, both sides need to support the same standard, and that kills the opportunity of having different vendor boxes. If I have a Cisco box or a Huawei box on the left and I'm going to my public cloud, I would need the same implementation, or a virtual one, on the right. What we did was go to the L2 Gateway community, and we're trying to push these concepts and the deltas into the community so that this will be standalone: you don't have to use Tricircle to use this. What we need is the ability to create tunnels, for which you don't need a control protocol, plus L2 population locally. So we're defining these APIs on both sides, and then, as long as you're using the same tunneling protocol, you don't need to run control messages back and forth between the border gateways. Does this make sense?

That makes sense, but the question was more specific to the previous use case, where you're migrating from one site to another with L2 extension. In that case they're using the same subnet on both sides; how will the routing work, given that you're effectively routing within the same subnet? So what happens is that the orchestration layer needs to remove that MAC from one border gateway, go to the other border gateway, and populate it there. There are two options: either you come from outside and manage this with an external management layer like Tricircle, or your border gateways negotiate between themselves and pass this information, via gratuitous ARP or some other way. If I'm migrating a virtual machine, this is not something that happens automatically: the orchestration layer knows that it's moving the virtual machine from one side to the other.
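To illustrate the answer just given, here is a sketch of the migration step on top of the hypothetical BorderGateway interface above: the orchestration layer, which knows it is moving the VM, moves the L2 entry with it, so no control protocol needs to run between the gateways.

```python
def migrate_vm_mac(mac, vni, gw_old_site, gw_new_site, new_site_vtep_ip):
    """Move a VM's MAC entry when the VM migrates between sites."""
    # The MAC is now local to the new site, so its gateway drops the
    # remote entry it held while the VM lived on the old site.
    gw_new_site.remove_mac(mac, vni)
    # Traffic originating on the old site must now cross the tunnel.
    gw_old_site.populate_mac(mac, vni, new_site_vtep_ip)
```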
It's then a very simple matter of removing an L2 entry or an ARP entry on one side and introducing it on the other, and the problem is solved.

You're asking about north-south traffic from outside the cloud? Yep, okay. Actually, we don't want to stop those applications from running. We want the ability to run local operations and have the top synchronize later. This is a short talk and I have not been able to cover everything, but really we want to give as much independence as we can. If we're talking about IPAM, we want some individual capability at the bottom, so maybe we split the subnet up across sites, allocate according to demand, and then the top only needs to do IP management when one site is running low on addresses. Then you can keep running local applications and have the top learn about them once connectivity comes back.

Maybe I'll chime in for a second. Our colleagues at Huawei have thought through a lot of this; they've been working on it for a long time. They're bringing people in and inviting them into the project very openly, and they have a great way of working together, but it's a conversation. My answer to your question is that it's a matter of policy: what do you want your site to do when it's partitioned away from the others? What I hope will develop over time are alternative policies: one where a site simply stops doing things because it's no longer consistent with the majority, the quorum, and another where it continues to work because you want high availability over consistency.

Other questions? How are we on time? We still have two more minutes. Last question, maybe? Yes?

EVPN solves pretty much all of them? Sorry? The claim is that EVPN solves all these issues. Well, EVPN is a standard technology, but what we're talking about here is deploying different technologies, different vendors, different clouds; it's not just about the connectivity. Does that make sense? I mean, yes, we could all agree to deploy EVPN. I actually don't know much about EVPN, so maybe we should talk. But I think the problem is more complex, because there's the integration of the Layer 2 gateways between the border gateways, and then there are the models you want to use. And if you do this, I'm arguing that you don't want every single tenant and every single network doing its own VPN; that's not efficient. Okay, so thank you very much.