Hey guys. All right, so I'm Martin. I was one of the co-founders of Nicira, and I'm now the CTO of networking for VMware. The focus of this talk is this: we've been doing network virtualization as a community now for five, six, seven years, so I want to talk about not just what network virtualization is, but how, now that people have adopted it, they're pushing it to do new things.

I'd like to step back and talk a little bit about disruptive technologies in general. As technologists, we like to look at the full extent of new technologies, right? With something like virtualization, I think we immediately go to things like cloud. But our ability as technologists to see how powerful disruptive technologies are and the market's ability to consume them are entirely different things. Even in the case of compute virtualization, when VMware, for example, first came out, they would take compute virtualization and sell it on very, very simple principles. The early sales pitch for compute virtualization was server consolidation.
It's about as simple as you can get, right? You'd walk into a customer and say, listen, if you run VMware, or if you run virtualization, then instead of buying two physical servers you buy one physical server, and you save money. And that was it. When you have very simple pitches, it's much easier for technologies to get adopted. It's just very difficult to go against the conventional practices of a company, and it all has to do with sales and go-to-market.

However, once technologies get adopted, they capture the imagination. Once they're in place, and especially with virtualization, you have the proverbial level of indirection in computer science, and you can take advantage of that position to do really cool stuff. For example, in the case of server virtualization, we started out with this really simple value proposition of server consolidation, and then you end up with cloud and all of the benefits that we see now. What I would like to talk about is that I believe network virtualization is following the same type of trend: originally, adoption was around very simple use cases, but now that it's been adopted, put into practice, and used at scale, we're starting to see use cases that I certainly never imagined early on when we were developing this stuff.

The way I have split up this talk is into two pieces. In the first piece I just want to focus on background. If you've heard me talk before, or if you've heard talks about network virtualization, a lot of this will be known to you, but I think it's good to get on the same page, so you'll just have to indulge me for the first part of the talk. Then I want to talk about how this is evolving as a technology, in a purely technical sense, toward use cases that, again, we hadn't thought about at the outset.

Okay, so let me start. What is network virtualization?
Many of you know this, but just to get everybody on the same page I want to go through it again. I'm going to step through an animation really quickly. In a data center you have physical switching gear, and in the case of network virtualization from the edge it doesn't matter what switching gear you have: whether you have standard traditional vendor gear, a simple L3 ECMP fabric on white box, or IP over InfiniBand, whatever physical network you have in place, I'll assume only that it has IP connectivity. Of course, connected to the switches you've got servers, and in a virtualized data center the servers run a hypervisor, and that hypervisor has a vSwitch. vSwitches in a virtualized data center today already handle every packet, or at least every packet between one VM and another VM, and between the VM and the server, right? This is a software layer that handles every packet.

For the idea of network virtualization, I'm going to use an analogy to server virtualization. From the vSwitches, network virtualization creates what you can think of as a network hypervisor. Let me draw on the analogy with compute virtualization. What is compute virtualization? It's a thin software layer that runs on the hardware and exposes a virtual machine abstraction, and that becomes the focal point of operations. Now, instead of operators dealing with sheet metal and wires, they're dealing with a software abstraction that has an API. It's full soft state. I mean, all the standard stuff that you know. Virtual networking does basically the same thing, independent of who builds the system: it creates this layer that exposes virtual networks. And virtual networks, I mean, these are implemented in software at the edge.
They look like physical networks, right? They have all of the standard interfaces of physical networks. They'll support L2 and L3, ingress ACLs and egress ACLs, and L4 through L7 services like firewalling or whatever. They look like physical networks, so they can support existing workloads, but they have the operational model of a VM, which means I can create them dynamically, and I can grow them, shrink them, move them, clone them, and snapshot them. And of course this works without modifying or touching the hardware. That's the whole point: you already have servers, you already have these switches, and if you can leverage this position you can create another operational abstraction, so that you can automate the entire data center, including the networking portion.

Generally these are implemented with some sort of overlay. That's an underlying mechanism. Whether it's VXLAN or NVGRE or STT, normally the way you decouple the address space of the virtual view from the physical view is to use some sort of tunneling. But from the perspective of a user, or the perspective of a CMS like OpenStack, what's exported is a virtual network abstraction. That's the net new, right? Just like a hypervisor exposes a VM, this exposes a virtual network abstraction. And like I mentioned before, you want this to be complete, because if you're going to create an operational abstraction in software, you want to be able to support all existing workloads, and hopefully existing tool sets as well when it comes to things like provisioning and management.

So, for example, in a standard virtual network you'll have L2, like I said, which is basic L2 switching. You'll support L3 routing. Often a virtual networking solution will support dynamic route advertisements to the physical network, so it just looks like an extension of the physical network. For
example, if things move around in the virtual L3 domain, it'll advertise and update the physical network to say things have moved. It'll support things like load balancing, firewalls, gateways, and so forth. So that's it. That's the focal point of the abstraction.

Like I mentioned before, when it came to compute virtualization we had this idea early on of, okay, what's the simple use case? That was server consolidation. For network virtualization we had to come up with the same type of thing, and I think we tried everything early on. We said, oh, opex, and that was very difficult to argue because there's an educational hurdle. Or maybe capex, but that's actually very difficult to argue too. Now, five years later, having seen this, I can say that the primary driver for the adoption of network virtualization in the last five years is agility, which is the ability to provision things really quickly.

Let me give you a really high-level view of how to think about network virtualization from the techno-conceptual perspective. If you zoom back as far as possible and take an abstracted view of a data center, here's the best way to think about network virtualization. If you look at modern SaaS data centers, think of the Googles or any kind of online site, there's a common characteristic in many of the modern ones: the physical network has become very, very simple, like a very simple L3 ECMP fabric. And things that have typically been in the network, things like ACLs, security, and fault isolation, have actually migrated into the application, and they're being rebuilt as part of the application, right?
This is something that's happened over the last 10 years. It has nothing to do with VMware, and nothing to do with OpenFlow or Nicira or SDN. This is just Darwin speaking on how you build good data centers. And what Darwin has said is that if you build very simple physical fabrics and put the functionality in software, you get a lot of benefits, which are really obvious to people who do software, right? Instead of having to manage CLIs through scripts, I now have objects that I'm programming and controlling directly. Because I've decoupled features from hardware, there's clearly a capex play here. And, more importantly, the closer you are to the application, the more semantics you have. You're not guessing anymore. If I'm re-implementing a piece of security as part of my application, I know something that's much higher level. I'm doing it at the right layer. It's just a basic use of the end-to-end principle.

I've worked with a few hundred of these new data centers now, and I would say some large percentage of them, say 60 or 70 percent, are actually built this way, which is very different from what we think of for traditional data centers. But the problem is that the only way to build this is to control the app, and sometimes you have to control the platform. If I control my application and I control my platform, I can do all of these neat tricks, right? I can implement security. I can implement my own types of discovery.
I can implement my own types of failover and load balancing. All of this stuff I can implement as part of the application, and it's great. But if I don't own the app, and it hasn't been written to support that, this no longer applies.

Having worked with a lot of these large SaaS data centers, what I like to say is that it's actually interesting to compare the op models and the cost models between their web presence and their internal IT. If you look at what they've built for the website, the software that is the service, the primary function of the data center, it's awesome, right? Everything's in software. It's decoupled from the hardware. It's got all the benefits I talked about. But then if you look at the IT, it's often exactly the opposite. You still have all of the problems that you have in normal IT. It takes a long time to provision things. It takes a long time to configure things. So just because you know how to build this for a new type of application doesn't mean that you can take that knowledge and somehow move it into the problem of enterprise provisioning.

Okay, so compare this modern SaaS data center to the problem of traditional enterprise provisioning, a traditional data center. Especially in the enterprise, the network is where you stick a lot of stuff, and for a good reason: it's a central point for having that stuff. Things like segmentation, things like security, things like billing, things like basic virtualization primitives like VLANs and VRFs.
This is stuck in the physical network, as part of the physical network, and then you can run applications unmodified. If you do a comparison on the opex and capex side, yes, it's not as nice a model, but it's great from the perspective that I've already got trained guys working on this, and it works for every workload.

So here's the high-level idea, if you want a high-level mental model for network virtualization: can we somehow get the best of both worlds? You can use any application, any OS, any hypervisor, but then we'll have a thin layer that actually reproduces networking in software, and then you can have any type of hardware that you want. In other words, can you provide the opex and capex profile of a modern data center for traditional IT? That's the high-level idea. And what's nice is that if you ever get to a point where you're having performance or scaling questions and you're not sure whether the architecture works, you can always go back to the SaaS model, where it does work. It's very clear that you can do this, because it's done all the time by the most successful data centers on the planet. So from an architectural standpoint, in the high-level world of trapezoids and circles, you can definitely do this. Now the question is, can you do this in a way that's consumable by IT?

Okay. There have been various production deployments of network virtualization for quite a while, whether it's open-source plugins or products, and I've been tracking this for a while. Even though we've had a few strong production deployments for some time, this is the year where everybody seems to be going into production. In 2010, I think people thought we were totally nutty. We were, a little. In 2011,
I think there was general consensus that you could probably do this, but between 2011 and 2012 a lot of people didn't want to be the first one to stand in front of the bullet. Over the last couple of years, though, we've actually seen a lot of production deployments, which is really cool from a technologist's perspective, because now we can say, okay, finally we've got something in. It's a core platform. It's a core technology. Now what can we do to extend the state of the art? Virtualization is an indirection point. It's not just about making things fast. It's about using that as leverage to change the laws of physics. That's what virtualization does, right? It allows you to stand outside of something. If you have network virtualization in, then we can actually push use cases that you couldn't do otherwise, if you didn't have that level of indirection. That's what I want to talk about for the last half of my talk.

I already mentioned that the primary use case is agility. So let's talk about going forward. The first one is visibility and debugging. In my opinion, virtualization kind of broke visibility and debugging, which weren't really good to begin with. With virtualization you've got two problems. The vSwitch sucks in a bunch of the network, which traditionally you didn't have visibility into. But more so, because virtualization decouples compute from location, if VM A can't talk to VM B you have to find out where they are. Maybe it's not that hard; clearly we've got tools that allow you to do this. But in my opinion, network virtualization actually provides the right abstracted model for visibility, so let me try to argue that. Network virtualization is not new as a concept. We've had virtualization primitives in networking forever. We've had VLANs. We've had VRFs. We've had MPLS.
I mean, networking is a virtualization substrate. But what we've never had is a single abstraction, right? What we have is a collection of abstractions. If I give you a collection of mechanisms, a VLAN and a VRF and policy routing and an MPLS LSP, I've given you a bunch of primitives, but I haven't given you a virtual network that you can point an SNMP sniffer at and that will show you the entire network. This is the difference between virtualizing memory, CPU, and storage independently and having a virtual machine, right? Networking has always been this mismatched collection of virtualization primitives. So if you look at a modern data center and you try to do visibility and debugging, you've got one very complex network made of all of these virtualization primitives, mixed and matched.

If you use something like network virtualization, the way I like to think of it is that instead of having one very complex network with no clear abstractions, you've got n networks. You've got the physical network, we'll call it network zero, and then you've got networks one through n, which are the virtual networks, and all of them support standard tools and standard interfaces. So if A can't talk to B, you take your SNMP sniffer and point it at the virtual network they're connected to, and then, if it's a physical problem, you take the same tools and apply them to the physical network.

Okay, so that gets us back to where we were. But I actually believe that with network virtualization, because you're at the edge, you can go way beyond what we can do today. This is one of the pet projects that I'm working on. Actually, all of the ones I'm going to talk about are pet projects I'm working on, so they're near and dear to my heart. But let me start on this one.
So this is actually a dashboard for our internal OpenStack cloud, which we use for development and for labs and all sorts of stuff. This is vC Ops, which is a VMware tool. I'm just using vC Ops to show you what the virtual networking system I work on exposes as an API. Nothing I'm talking about is specific to a product. I just want to talk about network virtualization in general, and these are just slides to provide a visual demonstration of what I'm talking about.

On the left, those are virtual networks. We've got, I don't know, 10,000 of them or something like that, and every one of these has a different topology. It could be L2, it could be L3, you could have load balancers or firewalls. It's all sharing the same physical infrastructure. If the dot is green, things seem to be pretty good. If the dot is red, there's a connectivity issue. And if the dot is yellow, there's some threshold that was crossed, and I'll talk about that in a second. On the right, you probably can't see it, but one of these is actually showing the virtual topology, and right below that is all of the things we track: latency, RX counters, TX counters, and so forth. And this is network-wide.

For a lot of the clouds that I've worked with, the way to determine if there's a problem is to wait for the phone to ring, right? It's actually very difficult to determine whether there is an issue. But the idea is that you own the edge, you see all traffic going over it, you have to monitor this anyway, and you have an overlay.
There's no reason why you can't proactively check all of this stuff and monitor it in real time. There's no reason why I can't give you a full dashboard of the entire system, and this is independent of which virtual network system you have. Not only that, you should be able to map from this virtual realm down to the physical realm. Let's say you're having a latency issue, where latency has gone up and you don't know why. There's no reason why you shouldn't be able to map down to the physical world and ask questions about the physical world, like which paths the traffic is taking. For example, this picture on the bottom is a heat map of the physical fabric.

Again, we have to maintain all this state anyway. That's what virtualization is. Virtualization is actually always an array, right? I've got a physical address space, and I've got virtual address spaces that are mapped onto it in real time, and I have to maintain that mapping, so I already have it. Whether this is virtual memory or a file system, I have all of the translations that I need, and I'm monitoring all of them, so I can provide you a global view of them, whether it's for the physical address space or the virtual address space. This picture is to demonstrate that you should be able to drill from the virtual view down to the physical view, and in the opposite direction from a physical view up to a virtual view, and you should have full fidelity: the ability to see all of the counters and all of the bytes at the right level of abstraction.

The ability to piece together a global view is very difficult in networking, because you have difficulties with global consistency. Networking is built around eventual consistency.
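
The drill-down and drill-up idea described above can be illustrated with a small sketch. This is only a toy model under stated assumptions, not how any particular product stores its state: the class name `VirtualToPhysicalMap`, its methods, and the host and network names are all hypothetical. It shows that once the virtual-to-physical mapping is kept in one place, queries in both directions fall out of the same table.

```python
from collections import defaultdict


class VirtualToPhysicalMap:
    """Toy bidirectional map between virtual ports and physical hosts.

    Hypothetical structure for illustration only.
    """

    def __init__(self):
        self._host_of = {}                  # (vnet, vport) -> host
        self._ports_on = defaultdict(set)   # host -> {(vnet, vport)}

    def place(self, vnet, vport, host):
        """Record (or move) a virtual port onto a physical host."""
        key = (vnet, vport)
        old = self._host_of.get(key)
        if old is not None:
            self._ports_on[old].discard(key)
        self._host_of[key] = host
        self._ports_on[host].add(key)

    def drill_down(self, vnet):
        """Virtual -> physical: which hosts carry this virtual network?"""
        return sorted({h for (n, _), h in self._host_of.items() if n == vnet})

    def drill_up(self, host):
        """Physical -> virtual: which virtual networks touch this host?"""
        return sorted({n for (n, _) in self._ports_on[host]})


m = VirtualToPhysicalMap()
m.place("vnet-1", "port-a", "host-10")
m.place("vnet-1", "port-b", "host-11")
m.place("vnet-2", "port-c", "host-11")
m.place("vnet-1", "port-a", "host-12")   # e.g. a VM migration moves port-a

print(m.drill_down("vnet-1"))   # ['host-11', 'host-12']
print(m.drill_up("host-11"))    # ['vnet-1', 'vnet-2']
```

Because every move goes through `place`, the two views cannot drift apart, which is the consistency property the talk relies on for visibility.
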
It's very difficult to say, here's my network state at time x. That's not something networking has traditionally had in its distributed consistency model. It's always been, well, if something changes, then after a period of time I'll give you an answer, but while it's changing you never have consistent views. With virtual networking you have to maintain this array. You have to maintain the virtual-to-physical mapping, and it has to be consistent, so that can become the basis of any sort of visibility.

Okay, here's another one of my pet projects: performance optimization via elephant detection. Very quick data center theory. In a data center, most of the time when people measure it, the vast, vast, vast majority of flows, if you're looking at flows, are small, and we call them mice. The vast majority of packets are actually in large flows, and those are called elephants. The classic example is that mice are often these really latency-sensitive, bursty things, and then someone will migrate a VM or do a backup or transfer a file. This dichotomy is actually one of the biggest performance issues in data centers, so let me explain why. There are actually two problems.

The first one is that TCP is really good at filling buffers. That's what it was designed to do. If you have a very long-lived flow, it likes to fill buffers, so end to end you'll have all of these nice buffers filled. Now if you've got very latency-sensitive traffic going through the same path, it's going to take all of the queuing delay. So anything interactive or latency sensitive that shares the same buffer is going to become a performance problem. It's a very classic problem in data centers. It's called elephants trampling on mice.

The second problem is that mice are bursty. They're very bursty. They're so bursty.
You can't do anything smart with them. You can't adaptively route mice. If you try to do something smart with them, like adaptive routing, by the time you've made a decision it's probably no longer relevant, which is why almost everybody relies on ECMP multipathing. Hash-based multipathing is stateless, it uses randomization, and you're always within a factor of two of optimal. So for mice, hash-based multipathing is fantastic. For elephants, it's horrible, because since you don't have any knowledge about the elephants and there aren't very many of them, the algorithm can map two onto the same link, and now you've underutilized your fabric. So those are the two problems with elephants and mice: you introduce latency, and you can use the fabric suboptimally.

There are all sorts of proposed solutions for this, and they're all really simple. If I can identify elephants and mice, I throw them in separate queues, and I've solved the latency issue. I'll just use DSCP, the diffserv code point. That's easy. I'm going to continue to use hash-based multipathing on mice, because I can guarantee that's near optimal, and for the elephants I'll actually do something smart: I'll route them per flow and make sure that they don't share the same link. Good. I can... oh, oops, this is wrong.
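
As an aside on the hash-based multipathing point made a few sentences back, here is a minimal sketch of why a handful of elephants can end up on the same link. It rests on assumptions that are not in the talk: `ecmp_link` uses SHA-256 purely as a stand-in for whatever hash a real switch computes in hardware, and `collision_probability` assumes the hash spreads flows uniformly and independently across links.

```python
import hashlib


def ecmp_link(five_tuple, n_links):
    """Pick an uplink by hashing the flow's five-tuple (stateless, per flow).

    SHA-256 is an illustrative stand-in, not what switches actually use.
    """
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def collision_probability(n_flows, n_links):
    """Chance that at least two of n_flows share a link, assuming the
    hash places each flow uniformly and independently."""
    p_all_distinct = 1.0
    for i in range(n_flows):
        p_all_distinct *= max(n_links - i, 0) / n_links
    return 1.0 - p_all_distinct


# Hypothetical flow: (src IP, dst IP, protocol, src port, dst port).
flow = ("10.0.0.1", "10.0.0.2", 6, 49152, 443)
print(ecmp_link(flow, 4))             # same flow always maps to the same link

print(collision_probability(2, 4))    # 0.25: two elephants, four links
print(collision_probability(4, 8))    # roughly 0.59: four elephants, eight links
```

With thousands of short mice the per-link load averages out, but with only a few long-lived elephants a single collision leaves one link congested and another idle for the lifetime of those flows.
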
So I should be able to turn elephants into mice, not mice into elephants. One thing I could do is take the elephants and split them up by modulating the ephemeral port, and turn them into mice. You have reordering issues, but modern TCP stacks are good with reordering, and modern SACK is actually pretty good with reordering, so maybe you don't care. Another one that comes up a lot is: I'm going to use a leaf-spine architecture, and in my spine I'm going to have an optical spine that I send all my elephants to. The mice stay on my normal network, and the elephants go over the optical spine.

These are all good suggestions, but the problem is that traditionally in networking it's very difficult to detect elephants. Fortunately, this is something that's very simple in virtualized network environments. Open vSwitch is a project near and dear to my heart, and it has actually done per-flow tracking forever, since the very beginning. So there's no reason you can't use the vSwitch to track all of the flows and then, based on either operator input or throughput tracking, detect whether something is an elephant and signal that to the fabric. Either you mark it, say we decide this DSCP bit means elephant and you mark it on the packet, or you signal the fabric and say, I've identified an elephant and here's the five-tuple. This is very difficult to do in hardware because of SRAM density issues. It's very easy to do in software at the edge. And so we're working on a project right now.
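
The throughput-based detection described above can be sketched in a few lines. This is a toy model, not Open vSwitch code and not its API: the `FlowTracker` class, the `ELEPHANT_DSCP` value, and the 10 Mbit/s cutoff are all hypothetical choices made for illustration. The sketch counts bytes per five-tuple at the edge, flags flows whose average rate over an interval exceeds the cutoff, and returns the code point that would be marked on their packets.

```python
from collections import defaultdict

ELEPHANT_DSCP = 10            # hypothetical code point chosen to mean "elephant"
THRESHOLD_BPS = 10_000_000    # hypothetical cutoff: 10 Mbit/s average rate


class FlowTracker:
    """Per-flow byte counting at the edge, keyed by five-tuple (toy model)."""

    def __init__(self, threshold_bps=THRESHOLD_BPS):
        self.threshold_bps = threshold_bps
        self.bytes_seen = defaultdict(int)

    def observe(self, five_tuple, n_bytes):
        """Account for n_bytes of traffic belonging to one flow."""
        self.bytes_seen[five_tuple] += n_bytes

    def elephants(self, interval_s):
        """Flows whose average rate over interval_s exceeds the cutoff."""
        return [
            flow
            for flow, n_bytes in self.bytes_seen.items()
            if n_bytes * 8 / interval_s > self.threshold_bps
        ]

    def dscp_for(self, five_tuple, interval_s):
        """Code point to mark on this flow's packets (0 means default)."""
        return ELEPHANT_DSCP if five_tuple in self.elephants(interval_s) else 0


tracker = FlowTracker()
backup = ("10.0.0.5", "10.0.0.9", 6, 50000, 873)    # hypothetical bulk transfer
lookup = ("10.0.0.5", "10.0.0.7", 17, 50001, 53)    # hypothetical short query
tracker.observe(backup, 50_000_000)   # 50 MB in one second, about 400 Mbit/s
tracker.observe(lookup, 2_000)        # 2 kB in one second

print(tracker.elephants(1.0))         # only the backup flow
print(tracker.dscp_for(backup, 1.0))  # 10
print(tracker.dscp_for(lookup, 1.0))  # 0
```

The same list of five-tuples could instead be sent to the fabric out of band, which is the second signaling option mentioned in the talk.
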
It's a total skunkworks fun thing that I'm just interested in doing, which is, within Open vSwitch, exposing a column in OVSDB which is all the heavy hitters that you have network-wide: for every server, here are all your heavy hitters, and we'll just use throughput as the metric to determine them. Then you can have something like a centralized SDN controller of your favorite choice that goes and, network-wide, globally lists, say, the top 10 or the top five or whatever you want. And that can signal to the physical network: here's an elephant, do something smart with it, or just throw it in a separate queue, or whatever you decide to do. Make sense? The bottom line is that this has been one of these sticky problems in networking for a really long time. Either you know what your elephants are or you're kind of screwed. And I think there's a chance here, because of the semantics we have at the edge, and because we're in software at the edge and don't have SRAM density issues, that we can actually solve it.

All right, so finally I'm going to talk a little bit about policy. This one's going to be a little more philosophical in nature. I think we can extend the state of the art in visibility and management. I think we can really extend the state of the art in performance in networking. And I think we can actually change the way you do networking. Policy is a lot broader than networking, but it's a great place to have the discussion, so let me just describe it. What is policy? Policy is just business logic applied to systems, things like: we never put apps from different BUs on the same network. These are things that people come up with, that are written in documents, that come from human beings, and that are applied to systems.
That's what policy is. If you look at most policy systems, they're generally a subset of Datalog, meaning you've got condition and action, just like SQL. Most policy engines look like this, right? You say, if the user is Martin and it's before 10 a.m., then go ahead and log the traffic, because he never gets up that early. This is basically what policy systems do, and they're almost always declared over site-specific namespaces and logic. It depends on the site.

I actually got into this whole area from the policy side. Policy is what I used to do. And the problem in networking is the way you write policies. You've got some policy that goes into a policy compiler, and policy compilers kick out a connectivity matrix. But mapping from that output to a physical network reduces to the network virtualization problem. This was totally not obvious to me early on, but it actually reduces to the network virtualization problem. If I have a connectivity matrix, it effectively assumes a flat network, and if I'm mapping it down to a topology with enforcement points, I have to solve the network virtualization problem. That's why most policy compilers either suck, or constrain your topology heavily, or expose the topology up into the policy space, which is not what you want. If I'm writing a document with a policy, I don't want to declare it over a topology.
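
To make the "policy compiler kicks out a connectivity matrix" step concrete, here is a minimal sketch. It is not Datalog and not any real policy engine: the endpoint names, the `RULES` list, and `compile_policy` are hypothetical, and the rule encodes the talk's example of never mixing business units. The point it illustrates is that the output is a flat matrix with no notion of switches, links, or enforcement points.

```python
from itertools import permutations

# Hypothetical endpoints, tagged with the business unit (BU) that owns them.
ENDPOINTS = {
    "web-1": {"bu": "retail"},
    "db-1": {"bu": "retail"},
    "hr-1": {"bu": "hr"},
}

# Ordered condition/action rules; the first matching rule wins.
RULES = [
    (lambda src, dst: src["bu"] != dst["bu"], "deny"),   # never mix BUs
    (lambda src, dst: True, "allow"),                    # default action
]


def compile_policy(endpoints, rules):
    """Compile rules into a flat connectivity matrix: (src, dst) -> allowed?"""
    matrix = {}
    for src, dst in permutations(endpoints, 2):
        for condition, action in rules:
            if condition(endpoints[src], endpoints[dst]):
                matrix[(src, dst)] = (action == "allow")
                break
    return matrix


matrix = compile_policy(ENDPOINTS, RULES)
print(matrix[("web-1", "db-1")])   # True: same BU
print(matrix[("web-1", "hr-1")])   # False: different BUs
```

Nothing in `matrix` says where `web-1` or `hr-1` physically live. Turning each entry into configuration at the right enforcement point is the mapping step that, per the talk, reduces to the network virtualization problem.
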
I want it to be totally independent, because it's a business thing. Now, this has been the state of policy for a very long time. If you have something like network virtualization, you expose this idealized view for the policy compiler to write to. So I have my high-level policy compiler, I can write any sort of policy I want, and I compile it down to this platonic view, this virtualized view, and then the virtualization layer maps it down to the physical topology. If any change happens in the physical network or whatever, that's handled by the virtualization layer. So now we can build really robust policy compilers, high level, integrated with AD and LDAP, actual Datalog language compilers that will manage an entire network. I think I can manage all connectivity in an entire network from one language spec. But I think you have to solve the network virtualization problem to do that. Otherwise you end up trying to solve it in the compiler, which is too difficult.

I actually think this is very important, because policy is something you can't really automate away. It's something that all users must use. So I think what's really important from a policy standpoint is that you've got a broad ecosystem. Say we go through all of this effort to do OpenStack. We automate away the provisioning interfaces, you've disaggregated software from hardware, and everybody's happy. If you then slap a policy language on top of that from, say, a hardware vendor, or even a software vendor that likes vertical integration, then since that's the top part of the stack, you can now vertically integrate the entire stack through it. I actually think the next big strategic battleground is around policy, for exactly this reason. So I think it's very important, when we talk policy, that governance should be open, totally open. I think the ecosystem should be totally huge. And there's no IP in policy. We all know
the technology here. There's nothing new or interesting. So the most important thing is that we have an open governance model, which I think is why OpenStack is such a great forum for these policy discussions. It's nice, because there's a whole bunch of them happening. There's one that we're going to have on Wednesday at 4:40. Unfortunately, I'm leaving for Tokyo in a couple of hours, so I can't be there, but many people will be there and I really encourage you to go. And I'm basically out of time. I don't think I have time for questions, so I think I'll end it there. Thanks so much, guys.