All right, shall we start? Yeah, I think we'll go ahead and get started here. OK, so my name is Nolan Leake. I'm the co-founder and CTO of Cumulus Networks, which is a network operating system company that makes a distribution of Linux that runs on a wide variety of hardware platforms. This one right here is from our friends at Quanta. And it is very much Linux-centric, as you'll see soon enough.

And my name is Chet Burgess. I'm the senior director of engineering at MetaCloud. MetaCloud is an OpenStack solutions company. We design and build OpenStack clouds for our clients, and then we do 24x7 maintenance, administration, monitoring, and bug fixing for them. Today, we're going to talk to you about using switches, such as the one we have in front of us here, that are basically running Linux, and how, if we have Linux on the switch, we can do some interesting things with Neutron on the switching platform itself.

OK, so we're going to talk a lot about Linux switches. But first, why would you want Linux-based switches? So raise your hand if you know how to configure an IP address on an interface in Linux. OK, now raise your hand if you know how to do that on Arista EOS, or NX-OS, or any other kind of CLI. Yeah, thanks, Carl. I don't think he counts, yeah. He's a ringer. So the advantage is very much familiarity and a common tool set, and that has a lot of benefits. You can run them in VMs. We did most of the development on this in VMs, so you have the exact same environment in a network that is contained entirely in one laptop as you do in networks with physical switches in the data center. And the other is that there exist operating systems like Cumulus Linux that actually let you do this on real hardware that can switch at very high rates.

So how do we do things today? If you have an unmanaged switch, or a switch you've configured as just an L2 bridge, what you end up doing is trunking all of the VLANs through the switch. So every single hypervisor in the entire OpenStack deployment can see every single VLAN, even if it's not running any VMs that are associated with that VLAN. And so people have tried various things. Various vendors have done things to try to fix the situation. We won't mention any specific names, but you have your ML2 mechanism driver, which talks to some special agent that was written by the vendor, which does some kind of very vendor-specific magic to the switch to trunk those VLANs, in this case 101 and 102, only to the hypervisors that actually need them. And any time you bring up a VM, it has to go fiddle with the configuration of these switches.

So what did we do? We replaced that proprietary switch with just a Linux switch. And as everyone knows from experience with virtual switches, Linux itself can do pretty much everything you need in a networking device. So in this case, you can see in this configuration that since we have two VMs, two VLANs I mean, running on the hypervisor down there, those two are actually trunked through the switch. If someone sends some traffic to the switch on VLAN 105, it's not going to go down to that hypervisor. Only the VLANs that are associated with VMs running on that hypervisor will actually be trunked down. And so that gives you isolation, broadcasts are limited to only where they need to go, and all the usual benefits.

OK, so how do we do this? We have a prototype agent that we've written that will run on, well, any flavor of Linux, actually.
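To make the familiarity point concrete, here's a minimal sketch of that "configure an IP address on an interface" workflow on a Linux switch. The address is arbitrary, and swp1 is the front-panel port naming you'll see again in the demo:

```bash
# On a Linux-based switch, front-panel ports are just ordinary interfaces,
# so the usual iproute2 commands apply (192.0.2.1/24 is an example address).
ip addr add 192.0.2.1/24 dev swp1
ip link set swp1 up
ip addr show swp1
```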
That agent happens to be optimized to run on Linux distributions that are designed to run on switches, like the Quanta one here. This agent is based upon the existing Linux bridge agent, so we had a great reference and map we could use in the existing agent. We had to make a few changes to it, because there are obviously a few things different between a switch and a traditional hypervisor. We don't have tap devices on our switches, so all the management of the tap devices and that sort of stuff was no longer necessary. But we do have the basic ports and VLANs, and then we want to build bridges to span them.

We have leveraged the existing ML2 notification framework. If you're not familiar with the ML2 driver, it's a new feature in Havana in Neutron, and it's a pretty cool implementation. It's designed to replace the old multi-driver mode that Quantum had, where you could chain together multiple Quantum plugins, if it worked right and they behaved well. ML2 is a whole framework that's designed to enable that, and it works pretty well. One of the cool things about it is it supports two things. You can have a mechanism driver, so if you need to do something really advanced, your mechanism driver gets the message and gets told what to do with a port or a network or something like that. But it also has a lightweight notification framework that can be used just to send a notification to any agent that wants to register for it. And those are generic notifications, such as a port got created, a port got deleted, and then it gives you information about that port.

So what we've done is we've actually written an agent that just leverages that notification framework. The agent gets notified of these events; in the case of what we've prototyped, it's port create, port update, port delete, very basic. And the agent will examine the event and determine: is this port that's being created, where the port in this case is a port for a VM, something that's actually connected to me? Do I need to take action on it?

So like I said, it's a prototype, so there's still more work to be done. We want to add support for trunking between switches. The whole goal of this is basically that you should be able to take one of these switches, and since it's basically running Linux, you can use something like Puppet to configure it, drop the agent on there, start the agent, and then Neutron should give you all the config you need. So if you have trunk ports on the switch, where switches are connected to each other, we kind of want to be able to discover that and trunk the VLANs that need to go between the switches. Topology mapping: we do have to know what hosts are plugged in where in order to make this work. Today we just use a very basic YAML mapping syntax to do that (a rough sketch of what such a mapping could look like is below), but we're working on a model where we're going to use LLDP between the switches and the hosts to discover who's plugged in where, and that allows us to get very dynamic, rich support for just being able to move switches and hosts around without needing to reconfigure the network, because the network will do it for you. There's still some state synchronization we want to do, specifically around detecting topology changes. Like I mentioned, we want to leverage LLDP so that if you do decide to move a host to another switch, or you need to replace a switch or change your trunk ports, you can just do that, start the agent, and it'll detect that and reconfigure the network for you.
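For illustration, here's a hypothetical example of that static host-to-port topology mapping. The file path and the YAML keys are our guesses, not the prototype's actual schema; demo1 and demo2 are the hypervisor names used in the demo later:

```bash
# Hypothetical static topology mapping for the prototype agent:
# switch port -> hypervisor plugged into it. Path and keys are illustrative
# only; the real prototype defines its own schema.
cat > /etc/neutron/switch_topology.yaml <<'EOF'
swp1: demo1
swp2: demo2
EOF
```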
The other big thing we want to do is refactor the agent to share code with the Linux bridge agent. There was a talk earlier today in the Neutron developer track about this, which is: we have a couple of agents now that leverage this ML2 mechanism. Can we start to build a common framework for agents and a common set of libraries so that they can all kind of leverage the same stuff? And especially with the Linux switch agent, probably 75% of its code is from the Linux bridge agent. So if that can be abstracted into a shared library, we both win. And then finally, since it's a prototype, we want to finish it up and get it upstreamed into the Icehouse release so it'll be an official part of OpenStack.

So now we're going to attempt to do a demo. 5 PM on Friday, the last day of the conference, so we hope this comes off and actually works. It worked an hour ago. It worked an hour ago, trust us. Yeah, so I'm going to drive the demo and Nolan's going to explain what we're doing here. So don't let the Apple logo fool you, these two laptops are actually running Linux, and they're standing in as our hypervisor hosts. One of them is running all of the OpenStack server code as well. And then they're connected through these front panel ports over here to this Quanta switch, which is running the agent, so it's also under OpenStack control. Chet is going to log in and create some VMs on these two hypervisors, and we're going to see that since they share a logical network, the VLAN that they're using will be connected to these two switch ports and then bridged together.

So here's the configuration before we start. You can see swp1 and swp2. Those are the front panel ports. They appear just like any other Ethernet interface on a server would, and they have all the properties you'd expect right now. You can see they're up, they're running, and they have no IP address, which is exactly what we need.

All right, so we'll come over here and go into our demo project. And since we have two compute nodes, we'll just launch two instances. And we've built a really, really basic CirrOS image. Nothing fancy here. So here we go. The VMs are building. And so this is the typical process we're all familiar with. But what's also happening on the back end is that as Nova compute asks Neutron and says "I need a port for this VM," the ML2 framework is getting told about this. We're running Neutron with the ML2 plugin, in this case using the Linux bridge mechanism driver to control the hypervisors. But in addition, our agent that's running on the switch also receives a notification, a notification that says this port for this VM was created on this hypervisor with the following network information. So we can see here we've got one VM running on demo one and one on demo two down on the floor there.

So if everything works. All right. Let's see. All right, so you can see it all worked. We've got the two new interfaces. The VLANs are represented by virtual interfaces, in this case swp1.1000, so we have a VLAN ID of 1000. And now show... there we go. And so you can see the bridge has been created and those two VLAN interfaces are in it. So now only traffic going between these two hypervisors for VLAN 1000 will actually traverse this bridge. And you'll notice too that the bridge name and everything follows the standard way bridges are built with Neutron. Again, we're leveraging that shared code that exists in the Linux bridge agent.
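For reference, the end state on the switch after those notifications is roughly equivalent to running the following by hand. The bridge name here is made up; the real agent derives a Neutron-style name from the network UUID, the same way the Linux bridge agent does:

```bash
# Roughly what the agent configures on the switch for VLAN 1000
# (brq-demo-net is an illustrative bridge name).
ip link add link swp1 name swp1.1000 type vlan id 1000
ip link add link swp2 name swp2.1000 type vlan id 1000
ip link set swp1.1000 up
ip link set swp2.1000 up
brctl addbr brq-demo-net
brctl addif brq-demo-net swp1.1000
brctl addif brq-demo-net swp2.1000
ip link set brq-demo-net up
brctl show
```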
So this should all look very familiar if you're familiar with that agent. And one point that may not be obvious: this isn't just a Linux server running with Ethernet interfaces. It can actually switch 1.28 terabits per second through all of its 10-gig ports on the front.

So now, if we've done everything right and this is really going to work, I should be able to log into this VM and ping the other one. So let's see if I can get the console here. Console. I don't know who builds CirrOS, but apparently they have quite the fascination with the Cubs. Oh, I entered the password wrong. And so we have an IP address of 10.0.0.3, and if everything works, we should be able to ping the other node. All right. There we go. I think that's the most excitement I've ever seen for a networking demo. Well, to reward everyone for sitting through a demo that was mostly just text scrolling by on a screen, we've prepared something special. Yeah, these switches, it turns out, are pretty cool. They can do a couple of things. You should cat the script first. Oh, okay. This is a short shell script. It's two for loops and a little command. But when we run it... Give it a little shot here. Oh yeah. All right, so we managed to pull off the demo. I think you're up next.

All right. So, you know, if you want to do everything at L2, this is kind of the state of the art: VLANs, trunking VLANs around. But is that how we want to do things? I mean, I know we all love IPX and NetBEUI, but it may be time to just let them go and move to an all IPv4 and v6 world. So in a traditional data center network design, your top-of-rack switches are often L2-only devices, and they trunk VLANs up to a big chassis switch above, and that's where all the L3 functionality happens. And this has some implications, right? You need two of those big chassis switches. You need to use proprietary protocols, like various vendor-specific MLAG implementations, to allow them to appear as one switch to all the top-of-rack switches. You have to run protocols like VRRP for router redundancy. And it's all a fairly complicated setup, and it's got some flaws even when you get it all working. You have ARP spoofing problems that you have to figure out how to deal with. You can have rogue DHCP servers, which can cause all sorts of havoc. And increasingly, especially with things like distributed storage and Ceph and tools of that sort, you have a huge amount of east-west traffic between individual nodes, and all of that then has to bottleneck through that chassis switch at the top of the network.

So, you know what scales great and is really fast? The internet. And it's all done at layer three using routing protocols, and they're extremely well tested. They're proven at a scale that would be difficult to replicate inside a single data center. And you have great things like ECMP. At layer two, we use technologies like STP to protect against loops in the network. But what that means is, even if you build a densely connected mesh, you're still limited to only one of those links being active at a time, because all of the other ones have been shut down by STP. With ECMP and IP, you can keep all of those links active all the time, so you can get a huge amount of bandwidth through the core of the network. And the other thing you can do is push the L2/L3 boundary down to the top-of-rack switch.
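As a concrete illustration of that ECMP point with plain Linux routing (the prefix, next hops, and port names here are all made up), a single route can spread traffic across multiple equal-cost uplinks instead of leaving all but one blocked as with STP:

```bash
# Hypothetical ECMP route: one destination prefix, two equal-cost next hops,
# so both uplinks carry traffic at the same time.
ip route add 10.1.0.0/24 \
    nexthop via 192.168.1.1 dev swp49 weight 1 \
    nexthop via 192.168.2.1 dev swp50 weight 1
ip route show 10.1.0.0/24
```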
And so, with the L2/L3 boundary pushed down to the top of rack, your L2 domain gets smaller and your broadcast domains get smaller. So what if we could go further and push that L3 all the way down into the hypervisor?

So, how would this work? Let's say we wanted to build an OpenStack-based cloud and we wanted it to use pure L3 connectivity, or as much L3 connectivity as we can get. But we know there are certain things we don't want to sacrifice, right? We still want to support floating IPs. We still want to have strong layer three security for our VMs, and we still want a certain amount of layer two isolation for our projects, because that's something we need for compliance reasons. So how might we start to build this, or how might this look if we were to build it?

Well, for the VM connectivity piece, with something like a layer three agent now running on the switches, we could actually have the top-of-rack switches announcing /32s for each VM and flooding those out throughout the network, so that all the switches know pretty much where the VMs live. We could then push an L3 agent, something similar to the existing L3 agent, down onto the hypervisors to provide our existing layer three functionality. Now, this isn't something that Neutron can do today. It would be something very similar to the old multi-host mode that Nova Network supports, but it is something that's being worked on in Neutron today. We could also then have the hypervisor set a /32 route pointing to the tap device for each IP that's on that VM. What this allows us to do is remove the bridges. There'll be no more bridges. There'll just be routes.

For floating IPs, it's a similar thing. In this case, we're going to push that down to the hypervisor. It turns out that for NAT and that sort of stuff, hypervisors are still a bit more performant and better suited. So we can have the hypervisors announce a /32 for the floating IPs and run the one-to-one NAT for us. (A rough sketch of the per-VM route and NAT plumbing is below.)

VM mobility? Well, this is actually really easy. In fact, it's even a little more mobile than what you have now. Today, if you want to move a VM from hypervisor one to hypervisor two, in addition to actually moving the VM using your underlying virtualization, you have to inform the network, make sure the network controller does the right thing and reconfigures the new switch to know about it. In this model, all we have to do is withdraw the /32 route from the switch it was on and announce it on the new one.

Scalability: so how would this scale? Well, first off, it's going to depend on the number of IP addresses. We're talking about /32 routes here, so every IP we have in our cloud is going to count as a route. An average VM is going to have probably one or two IP addresses on it: a single fixed IP address and a single floating IP address for external access. Now, this isn't a rule. You can have advanced network configurations, but the average in really large clouds is going to be one to two. With the current generation of switching hardware, such as the switch we have up here right now, each switch can handle about 20,000 routes internally, and the next-generation hardware is specced to handle 100,000 routes.
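Here's that per-VM route and floating-IP NAT plumbing as a minimal sketch on a hypervisor. The tap device name, the fixed IP 10.0.0.3 (borrowed from the demo), and the floating IP 203.0.113.10 are all illustrative values, and this is our reading of the design, not code from the prototype:

```bash
# Host route for the VM's fixed IP straight to its tap device -- no bridge.
ip route add 10.0.0.3/32 dev tap0

# One-to-one NAT for the VM's floating IP, handled on the hypervisor.
# Inbound: traffic to the floating IP is rewritten to the fixed IP.
iptables -t nat -A PREROUTING  -d 203.0.113.10 -j DNAT --to-destination 10.0.0.3
# Outbound: traffic from the VM is sourced from its floating IP.
iptables -t nat -A POSTROUTING -s 10.0.0.3 -j SNAT --to-source 203.0.113.10
```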
So, doing some fuzzy math on those numbers, depending upon the VM density you can get on your hypervisors and how many IP addresses on average each VM is going to have, you should easily be able to support about 500 hypervisors before you start running into a problem with not having enough route entries on the switches. And even then, if we're using something like OSPF, we can create OSPF areas, and we can use something like area border routers, or inter-area routers, to handle moving the traffic east-west between the different areas.

And like I said, we don't want to give up our security or our isolation. So the VMs are still L2-isolated. They're not L2-isolated using VLANs now, but they're L2-isolated because there are no bridges. There's no layer two that the VM can actually see, because everything is now routed and forwarded through the hypervisor as opposed to bridged. Additionally, for layer three isolation, with this model our existing security group model with iptables on the hypervisors continues to work as it does today. So we still have that strong L3 security for your VMs.

Okay, so it's a very short presentation, being at the very end of the conference. We do have about 20 shirts, though no promises, we haven't counted exactly. So anyone who asks a question will get a Cumulus Networks T-shirt. So I see one person's already got their hand up.

What do you mean in terms of having aggregation? Well, that's what we were talking about with the scalability limits: because you're announcing full /32s, there's no route aggregation happening at all. So the trade-off is that you don't benefit from summarization, and so you end up filling up your routing table. But unless you're trying to build an enormous cloud, well, a large cloud, it's not a... Yeah, you can use things like cells and multiple availability zones within Nova today to further segregate that. So we have those constructs in OpenStack to be able to take a group of hypervisors, segregate them, and then tie multiple such groups together and kind of address them as one cloud, if you will. So... What size would you like? Yeah, if you ask a question, come up and get a T-shirt afterwards, I guess. In the back, yeah.

Oh, so you're saying if I've got a VM in one project and I have a VM in another project, how do I protect them from talking to each other? So yes, well, those L3 agents that we talked about are going to be based on the existing L3 agents. And the L3 agents today in Neutron are what control the virtual routers. So you're still going to have something like a virtual router in there. I say "like" because this isn't implemented today. So that virtual router, that L3 agent that is running a gateway directly on the hypervisor, would still be controlling that and controlling whether it's going to allow you to egress through that gateway port or not. So today, if you had a Neutron deployment using just Linux bridge, you'd have a network node somewhere running your L3 agent, and when you create a virtual router, it's going to run on that node. And so that node is what's now enforcing whether this virtual router and this virtual router will be connected or not. And so you'll have the same construct; it's just that instead of being on a singular network node somewhere, it's moved down directly onto the hypervisor. So that rule would still be there. You'd still have to connect your virtual routers and build networks to connect them and rules to control them like you do today. There's no change in that directly.
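On the OSPF point, here's a sketch of how those /32 host routes could get announced, assuming a Quagga/ospfd-style routing daemon on the box; the daemon choice and the exact config are our assumption, not part of the prototype:

```bash
# Hypothetical Quagga/ospfd setup via vtysh: redistribute the kernel's /32
# host routes (one per VM IP) into OSPF so the rest of the fabric learns
# where each VM lives. The prefix and area are illustrative.
vtysh <<'EOF'
configure terminal
router ospf
 redistribute kernel
 network 10.0.0.0/8 area 0.0.0.0
end
write memory
EOF
```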
Well, yeah, in the back there.

Well, the first part, you were talking about broadcast traffic. I mean, that is the limitation of this implementation. As we said, because we're basically removing layer two, no bridges, you do lose broadcast capability. So any broadcast-based protocol wouldn't work, and something like multicast would require multicast routers. That is a limitation of this model, for sure. So if Pacemaker's running on top of Corosync, you can actually configure it for unicast mode, and that would continue to work in this model. But yes, you're correct, the standard default configuration for Pacemaker with Corosync is multicast, and unless you had a multicast router, it would not work. And that is a limitation of the model we're proposing.

And to your scalability point, the way IPv6 is implemented in the hardware is that the table entries are four bytes wide, so a v4 entry takes up one and a v6 entry takes up two. So if you're running a dual-stack network, you basically have one third of the scale that we talked about. On current-generation hardware, that's pretty limiting, but on next-generation hardware it's still fairly large.

I think, directly behind. Ah, the person in front of you had the question first, and then you. You mean like VXLAN-style networks? Yeah, I mean, there's a different set of trade-offs, in that often you won't set up all of the VXLAN tunnels immediately, because the cross product blows up. So every time you learn a MAC, you may set up the tunnel to the hypervisor that has it. And so depending on your workload, if you're learning a lot of different addresses, that'll be slower, because you're setting up and tearing down all these VXLAN tunnels. Whereas in this model, it's only VM creation and deletion that causes data to be sent through the network and learned by all of the other networking elements. And that one's really going to depend on your workload, obviously.

So, striped shirt. Yeah, you want to handle that one? Yeah, so that's your department. There's a forwarding ASIC in this box. That's a Broadcom Trident+, and it has 64 10-gig ports on it. The way the Linux OS is working here is, when you add a route, or when you add a forwarding database entry into the bridge, there's a piece of software on there that mirrors that entry into the hardware. So anything that is mirrored into the hardware will actually be switched by the hardware, and Linux will never see it. Now, the stats are read back so that the interface stats and all of that are accurate, as if it were software-switching everything, but it's actually happening at hardware line rates.

Yeah. Again, Nolan, that's your department. Well, yeah, so if you go to our website, we have a hardware compatibility list, and we have four different vendors on there. And if you have difficulty buying from any of them, please email me directly and I will make sure that works out. Yeah, it's... No, no, if you place an order, it'll ship to you within a week or so. And it's production hardware that's been used at a very large scale. Yeah, just to clarify, when I said prototype, the prototype is simply the Python agent running on this switch. These switches, and Cumulus Linux which runs on top of them, are production ready. They're in deployment in a number of places today. Yeah. How much time do we have left? Yeah.
So, it's an orthogonal solution. Yeah, this is kind of a different approach. As you saw, we're using brctl, and you can use ip route add, iptables; it's all very traditional Linux networking. So you could write an OpenFlow agent that would run on this switch, speak the OpenFlow protocol, and program the hardware from it. We haven't done that yet. This one was basically designed to integrate as quickly as possible with some existing Neutron concepts.

A couple of people have mentioned a couple of different competing ways of doing this. I mean, this is just one of the great things about OpenStack, right? You can configure it 600 different ways. I think Vish Ishaya, the former PTL of Nova, gave a talk once where he was literally like, there are 600 options you can configure in Nova, and that's just one of the projects. So there's a ton of different ways to build these types of networks and clouds, and this is just one take on solving that problem, based upon this layer three technology and this switching technology that we like.

Anyone else? Yeah. Yeah, I mean, so we obviously don't have the L3 model implemented. This was kind of future thinking for once we get done with all the L2 work. And that is potentially one of the issues that's there. A bound segment, as it's called in Neutron, has a segmentation ID, but that's actually controlled primarily, if I remember correctly, and I am not a Neutron core, so if there are any Neutron core developers in the room and I butcher this, please feel free to say I'm wrong. But if I remember correctly, the definition of the network and the way the segmentation IDs work is controlled by your type drivers. So you have VLAN, VXLAN, GRE, and I think there's one or two others. So it is potentially a new kind of type driver that would have to be introduced to support this, where the segmentation ID becomes less relevant, or the segmentation ID might need to be some type of unique ID or a copy of the UUID or something. But I suspect, at least from looking at this and our thinking on it, that this would potentially need a new type driver. And like I said, if there's anyone in the room who can correct me on that, please feel free. I am not a Neutron core.

It gets the Carl seal of approval. What was that? Okay, all right, there we go. Any other questions? No? Okay, well, thank you very much for coming. We're going to end it a little early, since it's the last session of the last day. If you want a Cumulus T-shirt, come on down. If you have additional questions about Cumulus or MetaCloud, we'll be around for a couple of minutes. Please feel free.