Okay, so this talk is proof that you can just about fill a room even in the second half of the afternoon of the final day of the OpenStack Summit, which I guess is a testament to Dan and this talk's popularity. So by all means, please give Dan a warm welcome for his session on lessons in network design: networking is not free. Please welcome Dan.

Good afternoon. Thank you everyone for coming. My name is Dan Snedden. I'm a member of technical staff at Cloud Scaling. A little bit about my background: I've been working in network and systems design for 20 years. I was the lead network engineer at Apple and worked on, among other things, the iTunes Store network, the MobileMe network, and the data center network design when we redesigned their data center. I was the network security architect at SLAC National Accelerator Laboratory, one of the Department of Energy's high energy physics laboratories, where the quark was discovered; they've been doing high performance computing since the 60s and have a large supercomputing cluster there. I was an IT architect for a division of Schneider Electric, a Global 100 company and one of the best-known providers of data center infrastructure, equipment, and services. And I've worked in the financial sector and at startups as well. So it's a fairly broad background, and I'm going to be presenting some strong opinions today. These come from my experience; I'm happy to have some of the things I say challenged during the questions, but let me try to bring my perspective to things.

So here's what we'll be talking about today. I want to talk about a little bit of history: how data center networking came to be done the way it is today. I want to talk about how VLANs came to be ubiquitous and then fell by the wayside a little bit as scale increased. I want to talk about the challenges of networking that really only apply at cloud scale, and how we're learning that we need to build networks differently for clouds than we did when we were building just for the enterprise. And I want to talk about the OpenStack networking model formerly known as Quantum: how it works, how it fits into things, and some of the gaps and room for improvement that we have.

So first, a little bit of history. Here's a timeline of networking, at a very, very high level. In the 80s, networks were essentially shared media: token ring, or ethernet using hubs or repeaters, where traffic went to every port on the segment. Those segments were separated by routers, packet repeaters, or just dual-homed systems, and users typically used serial connections back then. So the network was fairly small in comparison to the behemoth networks we have today.

In the 90s, things got faster and better: switched ethernet at 100 megabit came out, and there was the gray box revolution. Suddenly everyone had a computer at their desk rather than a terminal, rather than sharing time on a central computer, and everything became networked. The serial connections started to go away, and we found that as you added people and computers, the size of the network had to be managed.

So really in the 2000s, that's when VLANs took over. They became necessary to handle the large number of systems: to segregate traffic, reduce the size of the broadcast domain, and provide manageability and security isolation between VLANs. And then what we've seen most recently, just in the last few years, is that things have been changing a lot. The pace of change is amazing.
SDN and network virtualization have made it very hard to understand some of the bits and pieces, especially for non-network engineers. And what we're seeing is a huge increase in the speed required: you've got 10 gigabit connections with 40 gig, 100 gig, or larger uplinks between your access layer and everything above it.

So just very quickly, let me go through what those various historical designs look like. In the 80s, like I said, shared media. Those were typically a star topology, or a daisy chain of star-topology hubs, sometimes token ring, and serial connections. Once switched ethernet came along, it became possible to do a lot more things wrong and still have them work. The reason is that switching hides some mistakes, like putting too many systems on the same segment, and it takes care of bad actors on the network and helps reduce the impact they have on everybody else. If you've ever used a token ring network, you know that one bad actor actually can take the whole network out; ethernet is very resilient in that respect.

In the 2000s, people started actually using security zones, separating and segmenting their networks according to access rules. Instead of having everybody on the same network with the same access, in many cases you could now make resources available only to those who needed them, and also separate your servers from the people accessing them: put your databases behind firewalls, that kind of thing. And then in 2010-ish, it all got easy, right? Not exactly. It got very complicated, but there's good news in that as well: there are a lot of options for us today.

So I want to talk about VLANs and why they became ubiquitous. Like I said, you need separation. You need to be able to reduce the size of your broadcast domain, and you need some control over isolation and segmentation. Within the data center, no matter how many systems are on one particular VLAN and how many are on another, when any system on one talks to a system on the other, the traffic has to go through a router. That gives you a point at which you can apply access control lists. It gives you some control over choke points, in case you need to apply quality of service limitations, that kind of thing. And it gives you the ability to better understand where your traffic flows are going. If we all just bridged VLANs together, frankly, it would be a little hard to tell where things are going; tools like traceroute are nice additions to our ability to see the path.

VLANs were at one time often used for physical separation. One part of the data center would have one VLAN, another part would have another VLAN or another set of VLANs. When you wanted to move a server from one VLAN to another, you would literally unscrew it, move it across the data center, and screw it in again in a new location. That was not very manageable in the long run. And most of us have probably worked in data centers, or are aware of the concept, where every VLAN goes everywhere: you don't move a server when you want to move it to a new VLAN, you change the port assignment, or you even trunk multiple VLANs to the server and let the server decide. The downside is that as data centers got bigger, some of the negative aspects of VLANs came out. So the good side of VLANs is that you get the isolation.
And it's generally considered good enough isolation, even for financial transactions and systems that really need to keep one segment separate from another for security reasons. VLANs will usually pass an audit these days; it wasn't always true, but you can depend on them. There's, of course, the reduction in broadcast domain size, and they do add to your manageability. It's nice to be able to assign a range of ports to a VLAN and tell the installers that anything with these colors is going to be used for this purpose. But that manageability becomes difficult when you're using many, many VLANs. Thousands of VLANs is a challenge no matter how you divvy them up.

On the negative side, first of all, VLANs really require spanning tree. If you're going to use VLANs that span multiple switches, then you need spanning tree to make sure you're not creating a loop anywhere. Spanning tree also provides a measure of resiliency; you can use it as an active-standby tool for failover of links. But spanning tree can really be the bane of a network engineer's life. When it goes wrong, it can take down the entire network. Spanning tree is a topological representation of the entire VLAN structure, and since it has to span all the switches and has to be working properly for the network to operate, it can become not only a single point of failure when something goes wrong, but also very hard to troubleshoot. When you have a spanning tree loop and broadcast storms happening, you can't use tools like traceroute because there's so much packet loss involved. Sometimes you can't even keep your laptop connected to a misbehaving network, because if it's getting a full gigabit per second of looped traffic, your laptop itself is going to be churning just to throw the packets away. Anyone who's ever seen a spanning tree breakdown knows you're going to have a bad day. I've seen major organizations brought to their knees, their entire network disabled for the better part of a day, more than once, while network engineers went from point to point tracking down the configuration problems or finding the buggy equipment. There's also a limit on the number of VLANs, 4,096 minus a few, and since it's easiest to assign them in groups, people tend to run out faster than you'd think. So VLANs are great, but really it would be nice if there were a better way to do things.

So in the mid to late 2000s, the scale of some data centers grew beyond the limitations of VLANs. I saw this myself at Apple when we were expanding to build out the iTunes Store. The sheer number of servers we were adding to the data center made it impossible to continue using our architecture, which spanned VLANs across switches. When you're adding servers 5,000 at a time, you're going to exceed the limits of a spanned VLAN fairly quickly. So we went to a different model, which is layer three everywhere with small layer two segments. If you imagine looking down from the top onto your data center, you've got routers, or switch-routers, instead of layer-two-only devices, as close to the actual servers as possible. Maybe a VLAN lives only within a rack or within a row, and then it's layer three back to the core, and layer three at all levels. This is a very nice scale-out architecture. And here I've shown the hierarchical networking model that Cisco developed along with some other researchers and vendors,
where the access, distribution, and core layers allow you to have a network with the same design at small scale as at very large scale, and a predictable number of hops between any point of the network and anywhere else. The access layer is the last mile, typically the copper networking that connects to the server. Between the access layer and the distribution layer you're aggregating traffic, and the distribution layer also gives you the opportunity to provide services: VPNs would typically tie in there, and access control lists, filtering, or quality of service might also be applied there. In the core, most people choose to keep things very clean: layer three only, with no filtering or only very coarse filtering. Firewalls and things like that, although every network is different, tend to sit between the core and the distribution layer, and often again between the distribution and access layers.

So let's talk about some of the architecture decisions that really only affect clouds. First, I want to describe something we call a tale of two clouds. We present this idea because we think clouds are changing. The legacy clouds, the enterprise virtualization style clouds, certainly aren't going away, but they support a different kind of application than what we're seeing today. They support the application where you have multiple systems in a cluster, so they need a shared layer two segment to communicate with one another. They support the VLAN assignments that people are already used to using for security isolation, and they replicate that within your virtualization environment. Contrast that with the Amazon Web Services model. Amazon had some very distinct limitations on how you could do the networking and how you could build your application, but there was enough motivation for people to do it that a whole new class of application was built. Dynamic cloud applications need to provision very quickly, they need to deprovision as well to prevent idle resources, and they need to interoperate with the internet at large. We think a lot of the OpenStack clouds being deployed today are going to more closely resemble that model, although, depending on your needs, you can build an OpenStack cloud to do either.

There are also some advantages to these dynamic, elastic clouds: the cost can be lower and the performance can be higher if you operate your network in such a way that you don't have passive links. If you're using VLANs and spanning tree, typically half of your uplinks are not being used at any given time; they're blocking, waiting for the other uplink to fail, at which point they come in and start operating. If you've got all of your links operating all the time, using ECMP and similar technologies to multipath, you can use a higher percentage of your network at any given time. I'll show a tiny sketch of that flow-hashing idea in a moment.

Before we get into OpenStack Networking and the Quantum features, I want to talk about the first style of OpenStack networking, which was Nova Network. It's layer two only, and there are four modes. There's flat networking and flat DHCP; the only difference there is whether you're statically assigning IP addresses or assigning them through DHCP. There's VLAN Manager, which maps the network resources, tenants, and security groups to the VLANs that exist on your network. And then there's also a version of flat DHCP which is multi-host.
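To make the ECMP point concrete, here is a minimal sketch of hash-based path selection across a set of active uplinks. This is an illustration only, not any vendor's implementation; the uplink names and the fields in the flow tuple are assumed for the example.

```python
import hashlib

# Active uplinks from an access switch. With ECMP all of them carry traffic,
# unlike spanning tree, which leaves roughly half of them blocking as standbys.
UPLINKS = ["uplink-a", "uplink-b", "uplink-c", "uplink-d"]

def pick_uplink(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Hash the flow's 5-tuple so every packet of a given flow takes the same
    path (no reordering), while different flows spread over all active links."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha1(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(UPLINKS)
    return UPLINKS[index]

if __name__ == "__main__":
    print(pick_uplink("10.1.0.5", "10.2.0.9", 49152, 443))
    print(pick_uplink("10.1.0.6", "10.2.0.9", 49153, 443))
```

The key property is that a single flow always hashes to the same uplink, so packets stay in order, while the population of flows spreads across every link instead of leaving half of them idle.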
In that multi-host mode, an agent runs on every node and provides DHCP services locally, DNS, things like that. So within each compute host, all of the network resources required to run those VMs are also local and tightly coupled, and you get a lot of resiliency in that model.

What we did when we were first building out layer three networking at Cloud Scaling was to add on to Nova Network, because Quantum wasn't an option yet. We built a plug-in that allows us to do layer three networking all the way down to the VM, but using the same API calls, through a modified version of the scheduler that's included in OpenStack. We also decided to use a different message queue carrier: we use 0MQ instead of RabbitMQ. RabbitMQ works very well, but once you scale up to larger sizes it does bring up scalability issues, and it can be a single point of failure as well. So that's something to consider when you're deploying a large cloud, or a cloud that needs to be highly available: do you want to be building an HA message queue, or do you want a message queue where everything is peer to peer and doesn't rely on any single broker?

So OpenStack Networking, which was formerly known as Quantum, is new, shiny, and has a lot of promise. It is also new, shiny, and has a lot of gaps in functionality that we're hoping to close soon; a lot of people are talking about them. But if you're putting production loads on your network, you may want to consider which aspects of it are more mature and which are not.

The core of Quantum is that everything is an API. There are abstractions for everything, and it allows you to have common models that represent your network regardless of how the network is actually implemented on the back end. You may have different networking vendors; it may be virtual or it may be physical; you may be using load balancers; you may be rolling your own services. But everything is plugin based and it's all an API.

So here's a diagram of Quantum itself, fairly simplified; I wanted to bring something that conveys the ideas in a way everybody can grasp. This is the core of Quantum: the Quantum APIs accept REST calls, or calls from the CLI, requesting that network ports be initiated, virtual interfaces created, that kind of thing. The heavy lifting is actually done by the plugins. So there would be a provider network plugin that provides the mapping to your actual physical back-end network, or to the VLANs that you have on your network. A virtual network plugin is going to provide the tunneling over GRE, or over your commercial vendor's tunneling protocol: Nicira has its network virtualization platform, there's Midokura, there's Big Switch, and there are other vendor-specific ways of implementing things that are done as plugins. The Quantum API itself then utilizes these plugins, which in turn connect to your physical network, or to an SDN, or both. One limitation is that you can't run multiple plugins at this time. There has been some work on a meta plugin to allow you to run multiple plugins, but it needs more work, so we'll talk about that in a little bit.

As far as the modes of operation for Quantum: VLANs are supported using a provider network plugin, so if you've got VLANs and you want to continue using them, that's great, it will support that. There's also a layer three plugin, so if you want to get beyond VLANs and actually start routing layer three all the way to the VM, it will support that too. What it won't do, though, is configure the routes on your routers for you.
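To give a feel for what that layer three approach involves, here is a conceptual sketch of the kind of host routes a per-host agent installs, or that you would otherwise have to configure on your routers yourself: one /32 route per VM, pointing at the compute host that runs it. This is an illustration only, not the actual Quantum agent code, and the addresses are made up.

```python
import subprocess

# VM IP -> compute host IP (assumed values for illustration).
VM_LOCATIONS = {
    "10.2.0.5": "192.168.10.21",
    "10.2.0.6": "192.168.10.22",
}

def install_routes(vm_locations):
    """Install one host route per VM so this node can reach it at layer three.
    Needs root; 'ip route replace' is idempotent (adds or updates the route)."""
    for vm_ip, compute_host in vm_locations.items():
        subprocess.check_call(
            ["ip", "route", "replace", f"{vm_ip}/32", "via", compute_host])

if __name__ == "__main__":
    install_routes(VM_LOCATIONS)
```

With routes like these present everywhere they are needed, compute hosts and controllers can reach any VM without sharing a layer two segment with it.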
If you want to have one shared layer two VLAN and then do layer three on top of that, Quantum can take care of that. The Quantum agent runs on each host, and it installs the routes that get all the way back to the VM, so that compute hosts know how to find VMs on other compute hosts, and your network controllers and API servers also know where to go to get to a VM. There is a GRE plugin as well, if you want to do layer two over layer three tunneling. That's a little more complex, but it provides some nice advantages. If you don't need virtual networking, you may not want that complexity; if you do use it, you're going to want to pay very close attention to how it works. It is possible to get yourself into a situation where you have another single point of failure, except now it's your GRE tunnel endpoint, instead of spanning tree, that can take your network down. And then of course there are lots of commercial vendors supplying plugins for Quantum. I'm not going to recommend any particular one; this is only a short list, and there are actually quite a few more, but there's some information here about the ones that are perhaps the best known.

So I want to talk about where there's some room for improvement in Quantum. Some of this, by the way, is changing, is in flux, really day by day; even as this conference is going on, some of these things are being addressed. First of all, here is a diagram that I took right out of the OpenStack networking documentation that shows Quantum. One thing you'll see is that although all the bits and pieces are noted, along with where they sit, the colored lines here, and it may not be apparent at first, are VLANs. Unless you're doing your own hierarchical networking and setting up your own routing, this reference default architecture expects that everything shares common VLANs. So at a certain scale you've then got all the problems of VLANs, even if you're using the layer three plugin in Quantum. You're going to have to mind your spanning tree and have some links sitting in a passive state. You're going to have to keep in mind that traffic from one point of the network to another may have to traverse it in a less than optimal way. It is possible to set up a fully hierarchical network design, like I said, but you're on your own for doing it at this time; Quantum won't do it for you.

So, like I said, Quantum doesn't fix the inherent limitations of VLANs. There is a layer three network plugin, but in the default reference architecture it gets installed over layer two, and all of your VLAN limitations still apply. If you want to build it bigger, you're on your own for configuring your routers. I'd like to see dynamic routing added to Quantum in a way that could support standards like OSPF and BGP, to do some of that routing configuration for you within the network, rather than on the host, rather than using an agent. That would also allow devices that don't have Quantum agents: for instance, a load balancer appliance could learn through the network how to get to the VMs, rather than requiring an agent.

Overlay networks are great, they're awesome, in particular if you have multiple physical locales and you want to bring everything together in a virtual way, so that server A and server B are on the same virtual network and can share messages at layer two, maybe even if you're doing clustering.
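As a rough illustration of that layer-two-over-layer-three idea, the sketch below brings up a GRE tap tunnel between two sites and attaches it to a local bridge, so machines on the bridge at both ends see one shared virtual segment. The interface name, bridge name, and endpoint addresses are assumed for the example; this is a plain-Linux version of the mechanism, not how the Quantum GRE plugin is actually configured.

```python
import subprocess

LOCAL_ENDPOINT = "192.0.2.1"      # this site's tunnel endpoint (assumed)
REMOTE_ENDPOINT = "198.51.100.1"  # the other site's endpoint (assumed)

commands = [
    # gretap carries ethernet frames inside GRE over IP between the two sites.
    ["ip", "link", "add", "gretap1", "type", "gretap",
     "local", LOCAL_ENDPOINT, "remote", REMOTE_ENDPOINT],
    ["ip", "link", "set", "gretap1", "up"],
    # Attach the tunnel to the bridge the local VMs use, so both sites appear
    # to be on one layer two segment.
    ["brctl", "addif", "br-tenant", "gretap1"],
]

for cmd in commands:
    subprocess.check_call(cmd)
```

The failure mode mentioned above is visible here too: if that tunnel endpoint goes away, so does the shared segment.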
Overlays are a great approach to that, and they use layer three GRE tunnels. But they can be a little complex to set up, and they're definitely under heavy development; I don't know that I'd be ready to put production loads on them today.

Another thing that's coming in Quantum, and I'm sure some of you have heard about this: you've got load balancer as a service, VPN as a service, firewall as a service, and other things as a service on the way. There's some great design work going on right now, and some commercial vendors are working together with the community; Nicira, F5, Cisco, and others have really been putting their weight behind this, and more as well, I don't want to leave anyone out, but it's a long list. These are great, but we were talking in San Diego about how to model the load balancers and firewalls and so on for a potential Grizzly release, and here at this summit we're still talking about how to design some of these things for the Havana release. Depending on what it is, and depending on your particular needs, this may be ready for you in Havana, or it may be ready for production loads after that; it might be the I or J release before some of these things are ready to go. And there are commercial alternatives today, some of which I think are very well baked and very resilient, but they cost money.

So how can we as a community make things better? Where should we focus? Again, these are my particular opinions about where we can spend our effort well. I really think we need to work on the meta plugin that will allow multiple plugins to be used. Once you get beyond a certain size, it's actually fairly common to have part of the network use one design and another part use another. Not everyone has the luxury of using the same network architecture for their storage that they do for their databases, that they do for their front-end web servers. So having multiple plugins, and devising the API abstractions that allow them to be used at the same time, in the same region, in the same OpenStack cloud, I think is fairly important.

I'd like to see 0MQ make it into everything. Today we've provided code back, and other people have as well, to let you use 0MQ to replace whatever other queue you might be using for most APIs, but there's still some work to be done with Quantum; there are some known issues.

I'd like to see better high availability. There was actually a bit of a regression between Folsom and Grizzly, where the high availability options in OpenStack Networking became more limited. Today you are limited to one DHCP server on your network. There are some ways to get it working with DHCP servers on every compute host, but you just can't have a resilient pool of DHCP servers that are all active at the same time, and that is how most production networks would actually run their DHCP services. So right now there is a limitation there, a gap in functionality, and I think that's a really important thing for us as a community to work toward.

And finally, I think we need better tenant support, better multi-tenant support. There's a limitation where, depending on your network mode, you may not be able to reuse IP addresses between multiple tenants and multiple tenant networks. That's a real challenge once you get up to scale; a public cloud especially is going to have problems there.

So I'd like to open up the floor to questions. I'm going to repeat any questions for everyone to hear.
So fire away. Come on, nobody? I see one in the back.

Who is doing work with BGP and OSPF? So, apart from commercial vendors, I don't know. We at Cloud Scaling use BGP and OSPF, but we haven't integrated that work with Quantum yet. I know there's been talk about it, but I don't know that anyone is spearheading it today. Do you know? Yeah, unknown. Any other questions? Question? Sure, sure.

I'll elaborate on what I mean by more support for overlapping IPs. In the OpenStack networking model, and I'm going to go back to the slide I had here, there's the concept of your external network, which in a public cloud is going to use public IPs, but in a private cloud might just use a different set of IPs. Your data network, which is typically private IPs and typically used for intra-cloud communication, may also be used for communication between, say, the load balancer and a node, but it's not typically used for external access; usually there is NAT running, with floating IPs, to allow external access. Within those internal networks you can have networks of your own, and depending on your mode, you may even be allowing tenants to create their own networks. So if they want to create a back-end network that lets their application servers talk to their database servers, they can do that. The problem is that there is a mode in Quantum that allows overlapping IPs, but in that mode every tenant has to have their own router, and that becomes a scalability problem. What would be nice is if every tenant could use their own assigned IP address space, which might conflict with somebody else's, but with some method to give them access to a router that gets them external, or to a NAT gateway that does source NAT on their behalf, without requiring an explosion in the number of routers. Some things happening in that space may help: Open vSwitch can help with this. Open vSwitch has a concept of virtual networks, so you can actually carve out isolated networks from within one subnet, and that may help. But really, it's something that has not been addressed today in what I consider a way that's ready for public clouds.

Let me repeat the question: what's my take on running the Quantum agent on every compute host, and why would you do that? The reason you would do it is that if you don't want to have to configure everything in the network, pre-configure all your routes, and you want compute hosts to be able to find which other compute hosts are hosting particular IP addresses, then each node has to run the Quantum agent, which installs the routes locally that tell the local host where to go to get to a particular endpoint network. And each VM in that model gets its own small network: it's got an IP address and it's got a gateway. In order to know where those are on the network and which compute hosts are hosting which of them, you can either run the Quantum agent, or you can carve up IP address space and forward everything to the router responsible for the address space behind it. But it's an either-or: you either have to do all the configuration on the network and set up all the routes internally on your routers, or you have to run the agent everywhere.
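Going back to the overlapping-IP answer for a moment, here is a minimal sketch of the shared source NAT idea: one gateway rewrites a tenant's private source addresses to a routable address, instead of every tenant getting a dedicated router. The subnet, interface, and external address are assumed for the example, and a real deployment with genuinely overlapping ranges would also need per-tenant namespaces or connection marks; this shows only the SNAT half.

```python
import subprocess

TENANT_SUBNET = "10.0.0.0/24"   # a tenant's private (possibly reused) range
EXTERNAL_IP = "203.0.113.10"    # routable address owned by the NAT gateway
EXTERNAL_IF = "eth0"            # gateway's external-facing interface

# Rewrite the source address of tenant traffic leaving via the external
# interface, so the outside world only ever sees the gateway's address.
subprocess.check_call([
    "iptables", "-t", "nat", "-A", "POSTROUTING",
    "-s", TENANT_SUBNET,
    "-o", EXTERNAL_IF,
    "-j", "SNAT", "--to-source", EXTERNAL_IP,
])
```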
On the agent question, what would be nice is if you could run the agent in just a few places to provide some resiliency, and then have those communicate with your routers via OSPF or BGP, or even just have those be the network distribution locations for particular subnets. That's right.

You know, I don't know, so let me repeat the question. The question is, as far as single plug-ins go: if you have one plug-in for load balancer as a service, does that mean you can't have another plug-in for load balancer as a service? Let me rephrase that a little, because with load balancer as a service and firewall as a service, I don't have a lot of personal experience and they're really bleeding edge. But the provider network plug-ins and the virtual network plug-ins, the ones that actually map your cloud network logically to either the physical network or the virtual network underneath, can only be run one at a time.

Well, it depends on what kind. So, is it practical to deploy a network at scale with only one plug-in? I think the answer is that it depends on what kind of cloud you want to deploy. If your goal is to deploy a cloud where you have standardized networking that applies to all of the compute hosts and storage nodes, and you've got the same vendor requirements everywhere, then yes, that may be practical at scale for certain applications. But like I said, when you get to a certain scale you tend to want to run your storage or your database networks differently, whatever way works best, than your front end and other networks. So I think the answer for a lot of people is that it is not practical today to deploy Quantum, at least not without a commercial plug-in. You do lose a lot of flexibility, that's right. That's right.

Okay, the clarification here is that the provider network is not a plug-in, it's an extension, so I apologize for misspeaking about the terminology. And just to repeat this for everyone, the additional clarification is that it is possible today to run an extension and also run a services plug-in, and it will be possible further down the line to run more plug-ins at the same time, if I understand what you said correctly: more services plug-ins at the same time.

Question here. So you're only allowed one plug-in; if you want to migrate from one vendor plug-in to another, it's a forklift upgrade, as far as I understand. You can migrate, you can change over, but you can't run both at the same time, and there's no database migration between plug-ins.

And one more question; I think this will be the last one. The question is, some commercial vendors have network controller architectures; do I have any thoughts about the scalability of the network controller, as well as the cloud controller, in OpenStack? I think the way to answer that is that it depends on what kind of cloud you're building. I never like to have a single point of failure, so certainly I never like to have a single controller that manages everything. In some cases the controller is going to be a bottleneck; it depends on your architecture and your design. In other cases the controller is actually going to help free things up, for instance by enforcing multipathing and load balancing. So it's not a simple binary question. I think the answer is: try to avoid any architecture in which you have a single controller, or even a limitation of, say, an active-passive pair of controllers,
and demand from your network that you have more than one active controller at any given time.

All right, time's up, I'm sorry. If anybody has any questions and wants to come up to me directly, I'll be hanging out for a few minutes. Thank you all for coming.