Well, good afternoon. My name is Alan Swanson and I'm the lead engineer for cloud networking within Wells Fargo's private cloud engineering team. And my name is Marcos Hernandez. I'm a staff engineer with the Networking and Security Business Unit at VMware, the business unit responsible for the NSX platform.

Over the course of the last year, Marcos and I have had quite a few conversations about Neutron network deployment strategies in the enterprise private cloud context, and this session is really the culmination of a lot of those conversations. But before we dive into the presentation, we'd like to get a sense of the audience we have today. The first question we'd like to ask is: how many of you are running your OpenStack clouds with the Neutron networking model? Okay, a fairly sizable number. Another question we'd be interested in: how many of you, with your private clouds, are allowing only cloud-native workloads on those clouds? Nobody? Looks like nobody. Okay, so let me ask the corresponding question: how many of you are running cloud-native workloads and legacy workloads on your cloud environments? Okay, a fairly large number. That's good to hear, given some of the things we're going to say.

So let's dive into the first slide. As I said, we're specifically going to focus on enterprise private clouds in this presentation, and we're going to assume a basic understanding of Neutron networking concepts like ports, subnets, networks, and how all those things interrelate. As I've read and looked at a lot of the OpenStack networking information out there, it's really hard not to come away with the impression that tenant networks are a foregone conclusion, that you should be using tenant networks and that's really the way to go. So what we're going to present here is maybe a little bit of a contrarian view on that. As I've dug into tenant networks and what they mean in an enterprise context, it's evident that there are some challenges associated with deploying them, and we're going to touch on a few of those.

And lastly, I've got this slide up here of, you're probably familiar with this, the Batman versus Superman movie that recently came out. I actually haven't seen the movie; Marcos has, and he tells me that one of these guys dies. I don't know who, but we won't spoil it for anybody. But that's not the analogy we're trying to draw here, and I want to be clear about that. Tenant networks and provider networks are both fundamentally good, but they're different. Right. They each have strengths and weaknesses and optimal use cases, so one doesn't have to lose for the other to win. It's not a win-lose proposition. That's basically what we're trying to say.

So I think it's helpful to first talk about the difference between provider networks and tenant networks. The difference that is most readily evident is built into the names: it revolves around who provisions the networks. With provider networks, the OpenStack administrator provisions those on behalf of the tenants, for use by the tenants.
And the OpenStack administrator can decide whether to dedicate the provider network to a single tenant, to all tenants, or to a subset of tenants. I want to call out a feature that was delivered in Liberty that I'm actually very excited about: role-based access control for networks. That's very useful in the enterprise context if you're leveraging provider networks. Before RBAC for networks, sharing was an all-or-nothing approach: you either shared with all your tenants or with a single tenant. Now we can share a network with a subset of tenants, and that's highly useful; I'll show a quick sketch of what that looks like in a moment. Tenant networks, on the other hand, are provisioned by tenants for their own purposes, and under the default policy settings those tenant networks cannot be shared with other tenants.

Another difference is that, generally speaking, provider networks rely on the physical network infrastructure for default gateway, or first-hop, routing services, whereas with tenant networks you're going to rely on a Neutron router to provide those capabilities. And the Neutron router has to be attached to a provider network that's marked as external in order to talk to the external physical network. So what I'm depicting here on the left is an example of a tenant network topology with two different tenants, and on the right-hand side I'm depicting a shared provider network: three tenants sharing a common Layer 2 network with multiple instances deployed. We're going to focus specifically on the value proposition of shared provider networks as a potentially viable replacement for tenant networks.

Lastly, I think it's important to point out what's not different between provider networks and tenant networks, because I think this is a common misunderstanding, and it relates to how you instantiate the network. The point I'm trying to make is that you're not forced into an underlay technology or an overlay technology based on whether you use a provider network or a tenant network; those choices are decoupled from each other. Traditionally, provider networks have been VLAN-backed, and your tenant networks could be VLAN-backed or overlay-backed. But you could certainly provision an overlay network as a provider network and attach it to the physical network through a software or hardware Layer 2 gateway. So the choice of technology used to instantiate the network has, in general, no bearing on whether you can use a provider network or a tenant network, though specific implementations might constrain your choices. For example, with NSX, tenant networks are instantiated with VXLAN, and you don't have the option of using VLANs for that.

Okay, so the title of this slide could be controversial, or it should be controversial. What we're trying to communicate here is that network security is decoupled from network topology by some of the constructs OpenStack provides us. And that statement needs to be true for shared provider networks to actually be a viable alternative to tenant networks, so let me explain it.
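Before I do, let me make that RBAC point concrete. Here's a minimal sketch, assuming the openstacksdk Python client, of an administrator sharing one pre-created provider network with just a subset of projects; the cloud name, network name, and project IDs are placeholders, not anything from our environment.

```python
import openstack

# Connect with admin credentials ("admin" is a placeholder cloud name
# from clouds.yaml).
conn = openstack.connect(cloud="admin")

# A pre-created provider network that is NOT flagged as shared with everyone.
net = conn.network.find_network("prod-provider-net")

# Grant access to a chosen subset of projects instead of all-or-nothing.
for project_id in ("PROJECT_A_ID", "PROJECT_B_ID"):  # placeholder IDs
    conn.network.create_rbac_policy(
        object_type="network",
        object_id=net.id,
        action="access_as_shared",
        target_project_id=project_id,
    )
```

Each policy makes the network visible and usable to exactly one extra project, which is the subset-sharing behavior we just described. Okay, back to the claim that security is decoupled from topology.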
For decades, up to the present, the way you implemented network security was to force traffic through Layer 3 boundaries, because the only place you could implement filtering was at a Layer 3 boundary, using a router or a firewall. More recently, some advanced techniques have been developed, such as VLAN stitching, which is basically a bump in the wire that lets you filter without forcing traffic through a Layer 3 boundary, but there are some flexibility challenges with that approach. Now, with Neutron, we have the constructs of security groups, logical ports, and port security, and together these enable us to implement network security without being forced into a particular network topology.

As you're probably aware, when instances attach to the network, they attach to a logical port, and there are properties on the logical port that let us enable filtering on a per-port basis. Port security prevents IP and MAC spoofing, which are common Layer 2 attack techniques. These two capabilities together allow us to move from the model on the left, where your application tiers use dedicated Layer 2 networks and filtering is implemented at a Layer 3 boundary, whether that's a Neutron router or a traditional physical router or firewall, to a paradigm where we have a common Layer 2 network; we'll show a small sketch of this in a moment. In this case we have two different tenants, each with multiple application tiers, protected from each other by security groups while sharing a common Layer 2 network. In my view this is a paradigm-shifting capability, and I wonder if many of us haven't really come to grips with its implications for our network designs.

And if I may add, the term the industry uses for bringing security in like this, closer to the workload, is microsegmentation. Neutron security groups enable microsegmentation by attaching security closer to the application. That doesn't necessarily mean you won't need a perimeter router or perimeter firewall anymore; there are always going to be flows and threats that are better stopped at the perimeter of your data center. But the idea is that Neutron security groups enable the microsegmentation use cases by attaching security closer to the workload.

Okay, so let's dive into some of the challenges we see with tenant networks. The one that immediately rears its head has to do with address space selection. I'm sure many of you are familiar with Horizon and the dialog box where you have to specify the CIDR you want to associate with the tenant network you're creating. I would argue that many end users are, frankly, ill-prepared to fill out this field, and that selecting an appropriate address space is actually a much more complicated process than you might first believe it to be. Let me elaborate on that a little bit. Even if you're using OpenStack source NAT or destination NAT, the reality is that that does not mean you can pick any address space you want and go on your merry way.
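Let me pause for one second, though, and make that security group model concrete before we go deeper on address space. This is a hedged sketch, again with openstacksdk, of two application tiers isolated on one shared network: the app tier accepts its service port only from members of the web tier's group. The names, ports, and cloud name are illustrative assumptions, not anything from our slides.

```python
import openstack

conn = openstack.connect(cloud="demo")  # placeholder cloud name

# One security group per application tier.
web_sg = conn.network.create_security_group(name="web-tier")
app_sg = conn.network.create_security_group(name="app-tier")

# Web tier: allow HTTPS from anywhere.
conn.network.create_security_group_rule(
    security_group_id=web_sg.id, direction="ingress",
    ethertype="IPv4", protocol="tcp",
    port_range_min=443, port_range_max=443,
    remote_ip_prefix="0.0.0.0/0",
)

# App tier: allow its service port only from members of the web tier's
# group, even though both tiers sit on the same shared Layer 2 network.
conn.network.create_security_group_rule(
    security_group_id=app_sg.id, direction="ingress",
    ethertype="IPv4", protocol="tcp",
    port_range_min=8080, port_range_max=8080,
    remote_group_id=web_sg.id,
)
```

Port security, the anti-spoofing piece, is on by default for every Neutron port (assuming the port security extension is enabled), so instances can't evade these rules by forging addresses. Okay, back to address space selection.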
In the public cloud, which is a little bit of a different scenario, RFC 1918 space, as you know, is not routable on the internet, so if tenants pick RFC 1918 space there's no chance of a collision with consumers of the cloud coming from the internet. But in the enterprise context, RFC 1918 space has a very high likelihood of already being in use, so it's not possible to just pick some prefix out of RFC 1918 space, throw it in there, and be on your merry way. In this diagram, the tenant created a tenant network called WebNet, 10.0.0.0/24, but unbeknownst to the tenant, that network is actually in use somewhere in the enterprise. Effectively, there's now no way for that enterprise network to talk to that tenant network or vice versa; it's an address space collision, and it's just not going to work. So the fundamental point I'm trying to make is that even if you're using NAT, you cannot just pick any address space you want in the enterprise context.

All is not lost, however. Subnet pools were introduced in Kilo to address this exact problem. A subnet pool is basically a large block of address space that you allocate to the cloud; then tenants, when they want to create a network, can simply request a prefix from the subnet pool within parameters, in terms of maximum or minimum prefix length. So that's a mechanism that guarantees tenants address space that won't collide with pre-existing address space. But obviously, one of the requirements is that you be able to assign a large block of address space to your cloud, and that may not be an easy thing to accomplish. As many of you know, registered IPv4 space is very scarce; it's practically been completely consumed. And the reality is that tenant networks actually exacerbate this problem if you choose to use registered address space, because you're siloing away address space that is much less likely to be completely used than, say, shared provider network address space. In large organizations like Wells Fargo, even RFC 1918 space is in very short supply, and it's not easy to allocate a large block even out of RFC 1918 space. So the point I'm trying to make is that tenant networks put greater pressure on your address space than a shared provider network does. Obviously, overlapping IPs are one way to combat this problem, but overlapping IPs present other concerns.

So, NAT in the enterprise. Obviously OpenStack was conceived primarily as a platform to deliver cloud-native applications, but I asked at the beginning of the session how many of you are building clouds solely for consumption by cloud-native applications, and nobody raised their hand. So it's obvious that there are going to be mixed workloads on these clouds, and NAT definitely has the potential to create issues if you're using it to provide general compute services to your organization. I find it interesting that there's actually an RFC, a bit dated now (RFC 3027), that was written specifically to list some of the protocols that break with NAT.
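Since subnet pools just came up, here's roughly what that workflow looks like with openstacksdk: the admin seeds a pool with one large block, and a tenant later asks for a prefix of a given length without ever choosing a CIDR. The block, names, and prefix lengths are made-up values for illustration.

```python
import openstack

admin = openstack.connect(cloud="admin")    # placeholder cloud names
tenant = openstack.connect(cloud="tenant")

# Admin: reserve one large, non-colliding block for the whole cloud.
pool = admin.network.create_subnet_pool(
    name="enterprise-pool",
    prefixes=["10.128.0.0/12"],   # assumed block carved out for the cloud
    minimum_prefix_length=24,
    default_prefix_length=26,
    is_shared=True,
)

# Tenant: ask for a prefix; Neutron picks a CIDR that cannot collide.
net = tenant.network.create_network(name="web-net")
subnet = tenant.network.create_subnet(
    network_id=net.id,
    ip_version=4,
    subnet_pool_id=pool.id,
    prefix_length=26,             # within the pool's configured bounds
)
print(subnet.cidr)                # allocated by Neutron, not by the user
```

The same mechanism works for IPv6 pools, which matters later in this talk. But back to NAT.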
And the other point related to NAT that I'd like to make is that one of the classic design principles in computer networking is the end-to-end principle, which says that state should, as much as possible, live in the end systems and not in the network. NAT violates that principle. You probably all have your own opinions on NAT, but in my view, if you can avoid NAT, that's a good thing. And lastly, Wells Fargo falls into this category: certain organizations require that if you have a system on the network, that system needs to be reachable for audit and compliance reasons. So if you're using NAT, you're forced into a situation where you have to do one-to-one NAT, and if you're going to do that, what's the point of NAT? Just get rid of NAT, be done with it, and have address space that's actually reachable from the rest of the enterprise. Not to mention that one-to-one NAT alongside the Neutron allocation is effectively yet another IPAM system you have to manage, so it just adds management complexity.

Okay, so if we take a step back and talk about the way tenant networks have traditionally been connected to the enterprise: the initial use cases for OpenStack Neutron that Alan mentioned, which called for overlapping IP address space, required that source NAT be implemented in the Neutron router. And that is the default workflow in Neutron when you create a router from Horizon, for example, and attach a network to that router: a global NAT policy is instantiated on that router that translates all the IP subnets on the tenant side into a global address space assigned to the uplink of that router.

More recently, no-NAT topologies have also been introduced in Neutron. You can create a Neutron router attached to a network but disable NAT on that router, meaning your tenant network must be reachable by the rest of the enterprise. The issue with that is that as of today, and there have been talks, and we'll say a little more about that, static routing is the only option to create adjacencies and reachability between the Neutron router and the physical router that connects to the rest of the enterprise. And in a very large tenant environment, maintaining those static routes can be very, very cumbersome. There are ways to create summary routes and specify a default gateway for that large address space, but as Alan mentioned, in large organizations where you don't have the ability to aggregate all the CIDRs, that may not even be possible. But we just wanted to mention that these are the two traditional ways of enabling tenant reachability by means of Neutron routers.

Yeah, and one point I wanted to make: even if you're using tenant networking with source NAT or destination NAT, notice that you're still dependent on a provider network that's been marked external, one that was previously made routable, in order to have connectivity to the physical network. So even in tenant networking there's a dependency on provider networks to actually reach the rest of the environment. Provider networks to the rescue.

So, static routing is by far the way, like I said, in no-NAT topologies, that Neutron routers are being connected to the rest of the enterprise network. But there are conversations in the OpenStack community about enabling dynamic routing support.
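Before we get to dynamic routing, here's a rough sketch of that no-NAT workflow with openstacksdk: create the router with SNAT disabled on its external gateway, then attach the tenant subnet. The network and subnet names are assumptions for illustration.

```python
import openstack

conn = openstack.connect(cloud="demo")  # placeholder cloud name

ext = conn.network.find_network("ext-provider-net")  # provider net marked external
tenant_subnet = conn.network.find_subnet("tenant-subnet")

# Router with SNAT disabled: tenant addresses stay visible, and routable,
# to the rest of the enterprise instead of being translated at the uplink.
router = conn.network.create_router(
    name="no-nat-router",
    external_gateway_info={"network_id": ext.id, "enable_snat": False},
)
conn.network.add_interface_to_router(router, subnet_id=tenant_subnet.id)

# The physical side still needs a matching static route, something like
#   ip route <tenant-cidr> via <router's address on ext-provider-net>
# on the enterprise router, and that per-tenant bookkeeping is exactly
# what becomes cumbersome at scale.
```

That last comment is the pain point dynamic routing is meant to remove.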
The expectation was that Liberty was going to include an initial implementation of a BGP speaker, BGP, the Border Gateway Protocol, being the dynamic routing protocol of choice, and that will mitigate the management of all these static routes. However, we're not quite there yet. And at VMware we're experimenting with ways in which, without changing the way Neutron works, or if you're not yet on an OpenStack version that enables dynamic routing support, you can still get some dynamic routing that is synchronized with the IP address space of your tenant networks.

I'm showing here an example that leverages Heat orchestration to create a VM and give it a personality, using the Nova metadata service and the user data that can be configured within Heat. In this case that personality is an open-source implementation of dynamic routing with BGP, Quagga; you could pick any other one, this is just an example. Let me emphasize that this is not the way you do dynamic routing today; it's just a way to start some initial, and hopefully productive, conversations with your network team. The idea is that the OpenStack networking implementation doesn't actually know there's a dynamic routing relationship happening between the Quagga VM and the rest of the enterprise infrastructure, but Heat provides a level of abstraction that can create that illusion of synchronization. So again, it's just one example. We added a link to the VMware GitHub where you can download, explore, and experiment with this Heat template. It's an interesting implementation, but again, a stopgap approach until we get full, formal dynamic routing support in Neutron.

And if you use Neutron routers, and I like this quote, we put it here in the first bullet, this is Alan's quote, you're depending on the Neutron router. What that means, frankly speaking, is that a Neutron router does not have, and we don't think it will ever have, the capabilities and features that an enterprise routing platform has incorporated and supported for many, many years. And I don't think that's the goal of Neutron routers in any case. The idea of the Neutron router is to connect tenant environments to the rest of the enterprise network, not to get into a feature race with established routing platforms, a feature race that Neutron is never going to win. We're not going to be able to catch up to 30 years of networking, nor do we have to.

Having said that, there will be features that you need as you use, incorporate, and develop OpenStack in your private cloud environment, and these features can be table stakes, driven by the application profile. If we're talking, for example, about an application that requires multicast crossing routing boundaries, that means your routers need to support multicast routing, Protocol Independent Multicast for example, and that is not something Neutron gives you. So you will find situations where Neutron, as a routing solution, doesn't give you the capability to enable the right connectivity; multicast is one example and there could be others. And there can also be limitations that are specific to any one vendor's implementation. So, just a reminder that there are dependencies, and you need to go into this selection of your Neutron topologies with your eyes wide open.
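By the way, to give you a feel for that Quagga stopgap: the Heat template essentially boots a VM whose user data installs and configures a BGP daemon. Below is a compressed openstacksdk equivalent of the idea, purely illustrative; the image, flavor, AS numbers, and addresses are all placeholder assumptions, and the real template on the VMware GitHub is the thing to study.

```python
import openstack

conn = openstack.connect(cloud="demo")  # placeholder cloud name

# cloud-init script that gives the VM its "personality": a Quagga BGP
# speaker advertising an assumed tenant prefix to an assumed peer.
user_data = """#!/bin/bash
apt-get update && apt-get install -y quagga
sed -i 's/bgpd=no/bgpd=yes/' /etc/quagga/daemons
cat > /etc/quagga/bgpd.conf <<EOF
router bgp 65001
 neighbor 192.0.2.1 remote-as 65000
 network 10.128.4.0/26
EOF
systemctl restart quagga
"""

# Image, flavor, and network names are assumptions; the SDK base64-encodes
# the user data before handing it to Nova.
server = conn.create_server(
    name="bgp-speaker",
    image="ubuntu-16.04",
    flavor="m1.small",
    network="tenant-net",
    userdata=user_data,
    wait=True,
)
```

Heat's added value over this raw sketch is keeping the advertised prefixes in sync with the tenant's actual subnets as the stack changes. Anyway, back to those Neutron router feature gaps.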
It's a problem that you don't have if you leverage provider networks and rely on a physical router, sitting in the physical infrastructure, to provide all these routing services. Yeah, and to what Marcos just mentioned: if you're only deploying cloud-native workloads, this might be a non-issue for you. But again, if you're deploying legacy applications, where you don't necessarily know what those tenants require from a connectivity and communication standpoint, this could be a very real issue for you.

So what's the impact of IPv6 on everything we've just said? Obviously, address space scarcity, and I worded it this way on purpose, is not a current concern with IPv6. Now, I know it's a massively huge address space, but it's not a current concern, so there's no reason we can't allocate large blocks of IPv6 address space to subnet pools, which effectively eliminates the address space selection problem for tenants. Alan, if I may real quick: can you raise your hand if there's a mandate within your organization to explore, migrate to, or use IPv6? Okay, a good number of people, thank you. So with IPv6, there's no reason we can't allocate unique, routable address space to everyone, and there really shouldn't be a reason to be forced into NAT. And just as a side note, the most recent information I've read indicates that there's no intention to implement NAT for IPv6 in Neutron. I did run across a snippet indicating that there is work being done on what's being called floating-IP-like support, but the idea there is to avoid NAT and, because IPv6 address space is so plentiful, simply use multiple IPv6 addresses to provide that capability. So to the best of my knowledge, there are no plans to support NAT with IPv6 in OpenStack. But even with IPv6, not all of our problems are solved, because tenant network reachability is still an ongoing challenge until we get formal dynamic routing protocol support in Neutron. Hopefully that will be addressed relatively soon.

I think the fundamental question still remains: now that we have security groups and port security, do we still need to provision dedicated Layer 2 broadcast domains for every tenant, or can we effectively replace that with a shared Layer 2 environment and layer the security services on top? I think it's an important question for us to ask ourselves.

So what are the benefits of provider networks, now that we've listed some of the pain points of tenant networks? Well, the benefits line up quite well against the drawbacks of tenant networks. Provider networks are pre-created; they're already there, and what's quicker than a network that's already been created for your use? There's nothing faster than that. And frankly, I would argue that for most tenants, and definitely in Wells Fargo's case, the process of having to provision a network, select an address space, and make sure it's routable is a burden that most end users would really prefer not to deal with. And most of them probably don't feel competent doing it, actually.
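So what does the pre-creation side look like for the admin? Here's a hedged sketch of a VLAN-backed shared provider network whose default gateway lives on the physical infrastructure; the VLAN ID, physnet label, and addressing are placeholder assumptions.

```python
import openstack

admin = openstack.connect(cloud="admin")  # placeholder cloud name

# Provider network: the admin pins the segmentation details explicitly.
net = admin.network.create_network(
    name="shared-provider-100",
    is_shared=True,                        # or scope it to projects with RBAC
    provider_network_type="vlan",
    provider_physical_network="physnet1",  # assumed physnet label
    provider_segmentation_id=100,          # assumed VLAN ID
)

# The gateway is the physical router's address; Neutron is only doing
# DHCP and address management on this network, not routing.
admin.network.create_subnet(
    network_id=net.id,
    ip_version=4,
    cidr="10.128.4.0/24",                  # assumed routable enterprise block
    gateway_ip="10.128.4.1",
)
```

Once this network exists, tenants never touch a CIDR field at all.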
And as I stated previously, shared provider networks have a high likelihood of making very efficient use of your address space, because you can consume the entire address space and then just allocate another provider network when you've exhausted the existing one. And I'm beating a dead horse here, but if we can effectively implement application tiering and security zones using security groups and port security, why not seriously consider that as a viable alternative to the proliferation of tenant networks and dedicated Layer 2 environments? And certainly, if there's a mandate to provision a dedicated Layer 2 environment for a specific use case, you can provision a provider network dedicated to a particular tenant. So that capability is not off the table; it's still available for your use.

And one additional comment: hopefully it's clear that these options don't change the primary tenet of OpenStack of keeping the APIs open and exposed to the tenant. What we're really proposing here is a hybrid consumption model, in which the network admin responsible for the OpenStack networking services does the networking work on behalf of the tenant, and the tenant uses the other OpenStack services and APIs to consume capacity in the cloud. Everything still happens at the API layer in OpenStack; it's just that we have two different personas doing different things when it comes to networking, which we strongly believe is the right model for Platform 2 applications and the enterprise private cloud use case.

So, in conclusion, what we want to leave you with is a more balanced view of provider networks versus tenant networks. I think there's been a lot of focus and emphasis on tenant networks as the go-to model for providing connectivity to your customers, but provider networks have a lot to offer in terms of reduced complexity and simplification. I'll use this statement: provider networks are not the poor cousin of tenant networks. There's a lot going for provider networks; you shouldn't discount them, and as you work out your own network deployment strategies, you should definitely give provider networks due consideration. Do you have anything you want to add in terms of what you've seen personally?

Yes. In my capacity as a pre-sales engineer at VMware, we talk to a lot of customers, and one common theme, and Alan actually told me the other day it's incredible that this is not more broadly discussed in online forums or with vendors, is that we're seeing enterprise networking as an impediment to broader OpenStack adoption. So let me explain what that means. Enterprises that are running applications that are long-lived, that require routable IP address space, and that are faced with this model where the tenant is in charge of network allocation and network creation, see a problem there. So what we want to do is really educate. I hear this from a lot of customers: "I can't really give the tenant the option to create all these routers and all these load balancers and all these IPs, because this is infrastructure with specific connectivity needs." So we see this across the enterprise, and as we said earlier, it's typically the application profile driving that need.
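And just to show how simple the tenant experience becomes in that hybrid model, here's a minimal sketch, reusing the same placeholder names as the earlier examples: the tenant just finds the pre-created network, attaches a security group, and boots.

```python
import openstack

tenant = openstack.connect(cloud="tenant")  # placeholder cloud name

# The admin-provisioned shared provider network simply shows up in the
# tenant's network list: no CIDR selection, no router, no NAT.
server = tenant.create_server(
    name="app-01",
    image="ubuntu-16.04",           # assumed image name
    flavor="m1.small",              # assumed flavor name
    network="shared-provider-100",  # the pre-created provider network
    security_groups=["app-tier"],   # group from the earlier sketch
    wait=True,
)
```

The networking persona did the hard part once; the tenant persona consumes it through the same open APIs.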
So what we want with this session, and with a Superuser article that Alan authored, is to educate the community and provide the information, so you can see the choices you have and see that OpenStack networking is not incompatible with the legacy apps that, based on the survey we did here, you will likely run in your private cloud implementation. So hopefully that's clear, and like I said, it's confirmed by multiple discussions we've had with other enterprise customers. So, we have a few minutes, and I'm glad we do, because we'd definitely like to entertain any questions or comments about what we've had to say today.

Thank you for the presentation. Just my two cents: I think we can talk a little bit about the model of the virtual router. We have been using both provider and tenant networks hosting legacy applications for almost two years now, at two Tier 1s in the U.S.: tenant networks mainly for VNF back-end communication, provider networks for the front-end communication. A similar statement applies hypervisor-wise, at least for Type 1. And now we have Ironic. Great, thank you. That's running network services directly on bare metal, and we do see that use case. In fact, we had to delete a slide, in the interest of time, that showed exactly that. It's a perfect example: you have a front end that connects to a tenant network, for example, and carries all the application data, and a back end that connects, for example, to a backup network for application monitoring, management, or patching. That is absolutely true; I agree with your statement. Anybody else?

We're still using Nova Network. I'm not really familiar with Nova Network. If you do have a shared provider network and you're relying on Neutron security groups for tenant isolation, would that mean that if I'm a tenant and I decide to leave everything open in my security groups, any other tenant in my broadcast domain could actually reach me at a Layer 2 level? That's true. One option, which we didn't have time to cover, is to take security groups out of the tenant's hands and put them in a security administrator's hands, so the tenant can't shoot themselves in the foot. There's some tension between self-service provisioning and separation of duty, which is a basic tenet of network security, especially in a larger organization, so there are definitely some tensions and things to be worked out there. But even with a tenant networking model, there's no filtering that naturally takes place between those different broadcast domains; there's no security inherent in that. You don't have Layer 2 adjacency, but you still have to put firewall services in there to actually separate them. I would argue the problem is no different.

We were very careful not to get implementation-specific, but given the question, I'm going to volunteer some information that is specific to the way we operate. We're working right now on this notion of provider rules for security groups, where the admin of the Neutron piece of OpenStack pushes configuration that gets installed at the top of the rule table of our distributed firewall, and anything below that is in the hands of the tenant. So you can create rules that enforce isolation on a shared Layer 2 and then have the tenant secure their applications beneath that.
Just to be clear, that is specific to the Neutron NSX implementation. Any other questions or comments? Look at that, right on time. Thank you for your time. Thank you very much.