Hi. So we're here to talk about OVN. In the past, we've given some presentations about the architecture and what we're planning, but now it's looking a little more real, so we wanted to talk about how to deploy it and what our scale testing is showing.

My name is Justin Pettit. I'm one of the core Open vSwitch and OVN developers, and I work at VMware.

And I'm Ben Pfaff. I'm also one of the Open vSwitch developers at VMware, and I've been working on the project since it was started.

I'm Han from eBay, and I work on SDN solutions.

And I'm not Russell. For those of you who don't know, his wife gave birth to their second child last week. Congratulations to them; they're all doing well. My name is Ryan Moats. I'm at IBM, and we're really interested in this stuff, so we've actually been putting it in place and doing a lot of testing with it, which we'll talk about later.

So my guess is that a lot of you already have an idea of what network virtualization is, but I'm going to give a little bit of a review for anybody who might be new to it. The way I like to explain network virtualization is to go back and look at the structure of a physical network in your data center or your office. If you look at the left side of my diagram, you can see a bunch of hosts, and they're connected by switches and routers. The physical topology of the system determines what can reach what and what it has to go through. Now, when you take those hosts and you move them onto hypervisors, usually your physical network doesn't let you reproduce that topology. So if you want a network that behaves the way your physical network did, you have to introduce new concepts, and ideally you want to implement those switches and routers in a distributed way, in software. Network virtualization is how you build things to let you do that.

And then beyond that, when we were at Nicira, before it was acquired by VMware, and we were talking to a lot of our customers, the big thing for them was not only that they needed those features, but that they needed them to be self-service. At the time, when you looked at clouds, what you saw was that you could deploy your own virtual machines in terms of, say, virtual CPUs, and you had self-service access to storage. But if you needed anything more than the most trivial virtual networking, you had to go to your networking department and get them to set up VLANs and so on. Network virtualization aims to avoid that problem by making it possible to do everything yourself without having to involve the networking team.

So I wanted to quickly go over what OVN is and what it provides. Most of you are probably familiar with network virtualization, so it does the things you'd expect: you can create logical switches and logical routers, and create security groups and ACLs, L2 through L4. We support multiple different tunnel overlays, because they all have various trade-offs, so we like to give people options there. We've also provided the ability to have OVN control top-of-rack switches so that you can more easily integrate your physical workloads with your logical workloads. OVN works with a number of different hardware providers to do that, but that won't be the focus of this talk; if you have any questions, feel free to ask me afterwards.
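To make those building blocks a bit more concrete, here is a minimal sketch of the kind of logical configuration OVN exposes, using the ovn-nbctl command-line utility that comes up later in the talk. This is not something shown in the talk; the switch, port, and address names are made up, and the exact syntax can vary by release.

    $ ovn-nbctl ls-add web                        # create a logical switch
    $ ovn-nbctl lsp-add web web-vm1               # add a logical port for a VM's VIF
    $ ovn-nbctl lsp-set-addresses web-vm1 "00:00:00:00:00:01 10.0.0.11"
    $ ovn-nbctl acl-add web to-lport 1000 "ip4 && tcp.dst == 22" allow-related
    $ ovn-nbctl acl-add web to-lport 900 "ip4" drop

The two acl-add lines behave like a simple security group: a stateful allow for SSH, and a drop for everything else arriving at ports on that switch.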
OVN is really aimed at running on OVS, so it works on the same platforms as OVS. We do most of our development on Linux, on KVM and Xen. We've also been looking quite a bit at how to make it work with containers, so there's some work there that's continuing to evolve as container networking itself evolves, but we're very involved in those discussions. We also have work with DPDK and Hyper-V that's underway.

The best integration that we have right now is actually with OpenStack Neutron, but we're being very careful that the things we put into OVN are not just OpenStack-specific, and we plan to have it work with other CMSs. We also have a number of command-line utilities; that's actually how Ben and I usually test it. So everything that you can do through Neutron you can also do through the command line or through database calls.

OVN is being developed by the same community as Open vSwitch, and it's being developed in the same way, too. All of the development is happening out in the open, on the OVS mailing lists; anyone can jump in and watch it evolve, and all the design decisions are being discussed there as well. It's vendor-neutral: it started with a small number of vendors who got behind it and wanted to start working on it, but it's grown as eBay and IBM and Red Hat have joined in. It's under the Apache license. And as I mentioned, there are a lot of vendors involved: if you look at the Mitaka release of the Neutron plug-in, the top five reviewers are from five companies, and the four committers are from four different companies. So there's a pretty diverse set of affiliations associated with OVN.

Our goals are to make it production quality. OVS, I think, has a pretty good reputation in general, not necessarily the plug-in so much, but the OVS core itself, and we want to continue that; that's really important to us. The design is pretty straightforward, as you'll see in some of the decisions that we've made in the architecture. It scales to thousands of hypervisors; we've already tested it in the low thousands, but we expect to see that grow in the next couple of months, and our goal is to have 10,000 hypervisors supported before long. We hope that it'll have improved performance and stability over the existing OVS plug-in, and we hope that it's embraced by the Neutron community and becomes the preferred method for most people who want to use OVS for their networking.

So why should OpenStack care? The primary job of Neutron is to provide a cloud networking API, and we want to clean that up and make it work a little better than some of the existing solutions. We're adding this into OVS; it's actually a bit of a separate project, but it's co-located. We're not putting anything into OVS that's OVN-specific, but we are extending OVS in ways that make this easier. There are other projects, like Dragonflow, that are also making use of some of these features, and the existing OVS plug-in has also been making use of some of the features that we built to improve OVN. All of these things are available, and they're not OVN-specific. And the design we've chosen seems to have better performance and scale than some of the other solutions. So I'll spend a few minutes expanding on what's different about OVN compared to the existing Open vSwitch plug-in and the other things that are out there.

So let's start with the OVN architecture. Starting from the top of the diagram, what you see there is the cloud management system; in the case of OpenStack, that would be Neutron and the OVN plug-in for Neutron.
The primary way the CMS communicates with OVN is through a database, labeled there as the northbound database. That database contains the logical configuration of the system. For example, it has tables that represent logical switches, logical routers, ports on virtual machines, ACLs, and other concepts that your network administrator is likely to be familiar with. It doesn't contain anything physical; for example, there's nothing there about hypervisors or where things are located physically in the system.

The northbound database has only one other client, and that is a daemon called northd, or ovn-northd. The goal of ovn-northd is to take this set of concepts that are familiar to the administrator and to Neutron and translate them into lower-level concepts that are easier for the hypervisors to implement fairly directly. So it takes things like logical routers and translates them into what we call logical flows. If you have any familiarity with OpenFlow, a logical flow is a lot like an OpenFlow flow; the major difference is that the concepts it uses are logical ones. For example, instead of speaking of physical ports, it speaks about logical ports, and an example of a logical port might be a VIF on a VM. ovn-northd takes this lower-level representation and pushes it to a second database that we call the southbound database.

Besides ovn-northd, the southbound database has every hypervisor as a client, and the client there is a daemon called ovn-controller. ovn-controller has a bunch of responsibilities. It pushes information northbound into that southbound database, for example the set of ports that are bound on its own hypervisor. It also pulls down all the logical flows and translates them into flows that are physical for its hypervisor. For example, a logical port that is actually a VIF on that hypervisor gets translated into a reference to that VIF. If the logical port refers to a VIF that's on a different hypervisor, then instead it's translated into a reference to a tunnel to that hypervisor, so that packets destined to it can be sent across the tunnel. And then on the other side, southbound, ovn-controller talks to the local Open vSwitch instance over the same channels that any OpenFlow controller would use: it speaks OVSDB and OpenFlow to the local Open vSwitch instance.

So here's a little more information on that architecture. We use these central databases to coordinate and configure the whole system. Currently we're using OVSDB for those databases; it's possible that if we run into scale limitations there, we'll switch to a different database, since the particular database is not essential to the system. The other point is that most of the work in the system happens at the hypervisors: it's a distributed controller, not primarily a centralized one, with a centralized database. And the architecture is one that we came up with after seeing a couple of generations of controllers at Nicira and VMware; we think we've learned from the good parts and the bad parts of the designs there.
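If you want to see that pipeline on a running system, each stage can be inspected with the standard utilities. This is a rough sketch rather than something shown in the talk; br-int is the integration bridge that ovn-controller manages on each hypervisor.

    $ ovn-nbctl show                 # logical configuration, as the CMS wrote it
    $ ovn-sbctl lflow-list           # logical flows that ovn-northd derived from it
    $ ovs-ofctl dump-flows br-int    # physical OpenFlow flows programmed by ovn-controller

Comparing the second and third outputs is a good way to get a feel for how a logical port turns into either a reference to a local VIF or a tunnel to another hypervisor.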
All right, so this slide should be familiar to anybody who's done Neutron up through Mitaka, as far as how security groups are handled. This has been one of the pain points for OVS: A, it's fairly painful to put this together in the control plane; and B, if you're running an OVS bridge, hey, you've got Linux bridges there as well, so you've doubled what you've got. And if I count the path through, I think I've got something like six stacks that I have to run through to get a packet from the VM on the left to the VM on the right. That's not an optimal way to do things. So, passing it back to Justin.

With OVN, we're using a new approach. This is something we talked about last year: making use of a connection tracker so that stateful connections can actually be managed and maintained by OVS itself. By doing that, we don't have to go out to iptables in order to do the stateful firewalling, so it's much more streamlined. As I mentioned, this is something we introduced a while ago, and we've been talking with other people about how to do this as well; there was a presentation yesterday about the existing OVS plugin that makes use of this too, and you just get much better performance. This is a slide from a presentation we did last year in Vancouver, before it was integrated into any project, when we were still working on getting it upstreamed in the kernel, but you can see that the throughput improves significantly by not going through those multiple stages.

The other painful point for folks doing Neutron with the reference implementation is L3. It's agent-based, which has implications on the RabbitMQ plane. Again, you're back to using the Linux IP stack and iptables in different namespaces for forwarding and so on. You can do overlapping IP addresses with it, but you end up with a bunch of complex machinery to actually handle what you're trying to do, and from an operational point of view it's often difficult to figure out where things went wrong. If folks made it to the notorious MTU discussion this morning, the slides on what could possibly go wrong were just astounding. And the picture that this turns into is that one. I'm not going to read through it, because it's an eye chart; if you want more information, the networking guide is perfectly good. I am on the hook to put something similar to this together for the networking-ovn project, so that's going to get documented so that operators can actually set this up out of the box, try it out, and kick the tires. Something simpler rather than similar, at home. Simpler, yeah. So yeah, something simpler.

So for the OVN L3 design, this is something that I'm not aware of other open source projects doing. It's a similar approach to what we used in NSX, the commercial product that VMware sells, but we've actually already implemented it in OVN. We're doing distributed L3 in a slightly better, or more efficient, way: instead of going to something like a network namespace and then configuring that, we do all of the L3 processing in OVS, which is much faster. Without OVN, normally what you would do is attach a veth, send the packet into a network namespace that holds the networking configuration, have it pop back out, and then route it someplace else. With OVN, we actually program that as flows, so rather than making all of those jumps, we've cached what the eventual destination is. Even if there are multiple hops, we decrement the TTL by that number of hops, set the correct destination MAC address, and send the packet on its way, which is much more efficient: everything can stay in the kernel and be a single lookup there.
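For reference, wiring up that kind of distributed logical router with ovn-nbctl looks roughly like the sketch below, attaching an existing logical switch sw0 to a new router. The names and addresses are made up, and the exact commands depend on the OVN release, so treat this as illustrative rather than authoritative.

    $ ovn-nbctl lr-add lr0                                          # create a logical router
    $ ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 192.168.1.1/24   # add a router port
    $ ovn-nbctl lsp-add sw0 sw0-lr0                                 # add a switch port...
    $ ovn-nbctl lsp-set-type sw0-lr0 router                         # ...of type router
    $ ovn-nbctl lsp-set-addresses sw0-lr0 router
    $ ovn-nbctl lsp-set-options sw0-lr0 router-port=lrp0            # and patch it to lrp0

Because the routing is expressed as logical flows, every hypervisor effectively has the same router locally; there is no namespace or central router node for the traffic to detour through.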
One of the big factors in how large you can span an L2 network is broadcasts, and the biggest broadcaster in most networks is usually ARP. So we've built ARP suppression into OVN: rather than doing that broadcast and sending it to all the different nodes, ovn-controller locally traps the ARP request, generates an ARP reply, and sends it back, so the ARP never goes out over the network and never crosses a tunnel. And the other thing is that we make no use of the Neutron L3 agent. One of our goals for OVN is that we don't require any additional daemons to be run; the only thing running there is ovn-controller, and it does everything else you need to provide networking.

So, control plane scalability is one of the most important things for an SDN solution. To achieve the target, the first thing we need is an environment to validate scale and to verify the improvements we make, and it's unrealistic to have a big data center just for testing purposes. So we built this testing simulation framework just for the control plane, and with it we can simulate 2,000 hypervisors with just 20 bare metal machines. We use Rally for the deployment, which people in the OpenStack community are familiar with. The structure is that we have a central node hosting the central part of OVN, which is made up of the ovn-northd process and the ovsdb-server processes hosting the northbound and southbound databases. This central node is connected to the test farm, which is made up of tens of bare metal machines. On each bare metal machine we run many, many sandboxes; this uses the OVS sandbox, which is a very handy feature, and each sandbox actually simulates a hypervisor. On each sandbox there are three major processes running: one is ovn-controller, which is the distributed part of the OVN architecture, and the others are OVS processes, namely ovs-vswitchd and ovsdb-server. This project is hosted on GitHub, so you're very welcome to contribute.
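Whether it's a real hypervisor or one of these sandboxes, each node gets wired to the central databases in essentially the same way: the local Open vSwitch is told where the southbound database lives and how to build tunnels, and then ovn-controller is started. The sketch below is an illustration with placeholder addresses, not a command sequence from the talk; the southbound database commonly listens on TCP port 6642.

    $ ovs-vsctl set open . \
        external-ids:ovn-remote="tcp:CENTRAL_NODE_IP:6642" \
        external-ids:ovn-encap-type=geneve \
        external-ids:ovn-encap-ip=LOCAL_TUNNEL_IP
    $ /usr/share/openvswitch/scripts/ovn-ctl start_controller   # start the ovn-controller daemon

The scale test just repeats that pattern inside a hundred or so sandboxes per machine, which is why 20 bare metal machines can stand in for 2,000 hypervisors from the control plane's point of view.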
So I think most people are very interested in the current scale achieved by OVN. Based on the testing framework, we did some testing with a pure OVN setup, which means there's no Neutron plugin involved yet, and this is the current data. For the L2 part, we have 2,000 hypervisors and 20,000 VIF ports bound in the system, and there are 200 logical switches. The figure on the right shows port creation and binding speed as the number of ports in the system increases. At the right end, which is 20,000 ports, you can see the speed getting slow at that scale, but that is not the focus here, since this is still a simulated system and we're sort of over-committing CPU: on each bare metal machine we have 40 cores, but we are running 100 sandboxes. So in a production environment the numbers could be even better. The focus here is that we can reach 2,000 hypervisors without any issues and OVN still works pretty well. We also tried 3,000 hypervisors; there are some issues, not issues exactly, but the speed gets very slow, so we don't want to claim we've achieved that scale. For L2, if you're running provider networks, that's all you need. For L3, it means you need to get the logical routers involved, connecting different logical switches; that is still to be tested, so expectations for that part may be a little lower because of the complexity there, and the work is ongoing.

To reach the scale achieved here, there were efforts and improvements already made. This is a list of what we did based on the scalability test environment: with the data it provides we did profiling, so the results we get are very concrete, and for each change we can catch regressions and find hotspots. One example here is a local datapath optimization; the figure shows the improvement, and it's very significant. What we've done so far is just the low-hanging fruit. There is even more important work ongoing in the community, and better results can be expected. For ovn-controller, there are two efforts already under review: one is incremental computation, which Ryan is doing, and the other is conditional monitoring, which is being done by Liran, who is also here and shared very good results with us yesterday. The incremental computation patch has also been verified in this testing environment; it's not yet merged, but it's very promising. In ovn-northd, which is the central part, incremental computation is also very critical for scalability, and we just got very good news from Ben that the nlog system from VMware will be open sourced, which will be very helpful for incremental computation of the logical flows. ovsdb-server is one of our hot topics, but so far what we've found is that even with a single thread it's still able to handle 2,000 hypervisors. There's ongoing work on multi-threading, I think it's Andy who is doing that job, so we expect even better results. And for ACLs, if you're familiar with security groups: if you use remote groups, scalability can degrade easily, and the address set effort is meant to improve that situation. That work is being done by Russell, and the code is almost ready to be merged, I think. Thank you.

All right, so the Neutron plugin lives in the networking-ovn project. As we've said, it speaks OVSDB to configure OVN via the northbound database. The point here is that there's a lot of effort to get rid of all of the existing agents, the DHCP agent, the metadata agent, and get those into the ovn-controller process, so that we can keep the RabbitMQ bus from melting down when we scale up, because it's a bit of a pain.
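For orientation, configuration on the Neutron side boils down to selecting the OVN plugin and telling it where the northbound and southbound databases live, rather than configuring a fleet of agents. The snippet below is only a shape, not literal configuration: the host name is a placeholder, and the exact file and option names have changed across networking-ovn releases, so check the project's documentation for the authoritative settings.

    # Neutron configuration for networking-ovn (illustrative shape only)
    [ovn]
    ovn_nb_connection = tcp:CENTRAL_NODE_IP:6641
    ovn_sb_connection = tcp:CENTRAL_NODE_IP:6642

Everything else that the agents used to do is meant to migrate into ovn-controller on each hypervisor, which is what takes the load off RabbitMQ.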
One of the reasons why we at IBM are interested in this is that we have doubled down: we will run this in our public cloud, this is what we're going to do. We currently have it in test with OpenStack. We've done a 15-hypervisor deployment, and you can see what we've achieved; we've done a 90-hypervisor deployment, and you can see what we've achieved there. Those are the targets we had for those deployments; that isn't to say that's where they broke, that was just the targets we had for them, and we met them. And we're in the process of putting a 300-hypervisor and a 700-hypervisor deployment out there to test the control plane improvements, because simulation is wonderful, but we all know that simulation and reality can sometimes be different.

One of the other advantages we have comes in deployment, because deployment is simply easy: there are no additional daemons, the host level is very simple, and you can do rolling upgrades; as a matter of fact, we'll get to that a little bit more. Puppet OpenStack now supports it, and for TripleO there's a posting out there for review; I don't think it's merged quite yet. Now, as far as the rolling upgrades: the OVSDB schema is versioned, so when you put it in there it's carefully managed, and this allows rolling upgrades. We've actually done this; we've set up instances running data connections, pushed the big red redeploy button for the OVN control plane and the data plane, and watched it work without disturbing the user connections. We have not done kernel upgrades, because we need to get live migration first. So, truth in advertising, this is the same strategy that OVS itself has been using, and we can certainly say that it works.
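As a small illustration of what a versioned schema buys you (a sketch, not something demonstrated in the talk), any running ovsdb-server will report the version of the schema it is serving, so tooling can check compatibility before and during a rolling upgrade; the addresses below are placeholders.

    $ ovsdb-client get-schema-version tcp:CENTRAL_NODE_IP:6641 OVN_Northbound
    $ ovsdb-client get-schema-version tcp:CENTRAL_NODE_IP:6642 OVN_Southbound

Because schema changes are generally additive, clients built against an older schema can keep talking to a newer server while the rest of the system catches up.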
Let's go to the next slide, and Ben, I'll turn it over to you.

Thanks. So I'm going to talk about where we are and where we're going, and tell you a little bit about what we're targeting for the upcoming release. We're working on a bunch of features that aren't quite there yet. We already mentioned multi-threading of ovsdb-server for performance. We're also working on high availability using the Raft distributed log algorithm; that should be available pretty soon. We're working on support for an L3 gateway with NAT, which is a feature a lot of people are looking for. We have IPv4 logical routing in our logical routers; for IPv6, I believe patches have been posted once or twice, and they've been under development for a while, and now that Justin is no longer spending all of his time playing manager, my guess is that he'll get a chance to actually do some development; he's been itching to get back to that, so I think we'll see that soon. We have patches that have been posted and are under review for native DHCP, so that there's no need for individual DHCP daemons for each logical network, and in fact no need for extra daemons at all. A metadata proxy is in the works; we've had some proposals there, and I think we're going to figure out the best strategy for that. My favorite so far is to do it using general-purpose service function chaining, but I don't know whether that will be the approach we end up using. Address sets were already mentioned; those should allow for some extra efficiency when there are large groups of IP addresses, or other kinds of addresses, that lots of ACLs reference. And finally, we're working on support for routed networks. You can see the logo for our milestone there, the microwave release we're calling it; before that we had the toaster oven and the easy-bake oven, so as you can see, we're moving to higher technologies all the time.

So here's a list of resources you can go look at if you're interested in learning more about OVN. I'm not going to read all of these to you, so you can look at them now, or you can refer to our slides that we'll post after the talk, or you can come talk to us. And finally, if you want to help, one of the greatest ways to help would be to try it, test it, and report bugs, and in addition, report successes, because I know as a developer one of my frustrations is that I tend to only get bug reports; nobody tells me if it works okay for them. So here are the different places you can go to get involved. That's the end of our talk, and we can take questions now. I think I will flip it back to the previous slide, because I think that's more informative. So we'll take questions.

Thanks, this was very interesting, very informative. I've got a couple of questions, one from the NSX side to the VMware guys: is this what's going to be under the hood of the merge of the multi-hypervisor and vSphere NSX products?

So the OVN project is completely independent of NSX. We've learned a lot from our experiences with NSX, but we're not using any of the code from it. I know that somebody earlier said something about open sourcing nlog, which is a big component of NSX, but in fact that's unrelated code; it just implements the same thing.

Okay, that kind of segues into the second part, which is: how are you going to approach layer 4 through 7 services, and do you have a strategy, do you have partners for that?

So what do you mean, which part of layer 4 through layer 7 do you mean?

Say, for example, load balancing.

Okay, yeah. So there are patches for ACLs that are, sorry, not ACLs, for NAT; we just got the NAT component upstreamed in the kernel for OVS, and the load balancing is going to leverage that as well. There are patches, I think they've been sent out, on load balancing; if they haven't, I would expect them in the next few weeks, but they're actively being worked on, and so we'll be making use of it. There's a draft out. Okay, yeah, so there's a draft out, and that's leveraging the connection tracker, sort of like the ACLs are. If you want to talk DPI or something, then I think we're looking more at, we have some ideas about how we can send traffic from the kernel up into user space to a DPI engine, but we may also be looking at service function chaining, that sort of thing. Thanks.

Since there are four of you, I almost feel like this is a panel, and since this session was actually marked as beginning level, I feel like it's okay to ask this question: do we really need so many open source networking alternatives in OpenStack, and will we ever see them merging back together so there are fewer?

Yeah, that's a good question, an excellent question. It's been one of my personal frustrations; I liken it, unfortunately, to a little bit of the history of Neutron that we are in the situation we are in now. The honest truth is that networking means different things to different people. What you might have as networking requirements as an operator might be different from your requirements as an operator, might be different from my requirements as an operator, and if there's one thing that I think we found from all the different solutions, it's that trying to build a one-size-fits-all solution just makes a whole lot of people unhappy. It's not a great answer, but I'd rather be in the IETF model of letting a thousand flowers grow and letting people pick the thing they want, even though for the poor folks
who are doing the docs, and they know who they are, it makes their life heck. I'd rather be doing that than come up with a single solution and have everybody go, well, that doesn't work, and wander away.

Well, what really complicates things is when we try to drive new features into the upper-level API. Like, I'm pushing hard to get a security group logging capability into the API, and then all of a sudden all these networking alternatives underneath don't support it, or they break.

Well, so in that case, what you're worried about is specifically the reference implementation, or reference implementations, since there's Linux bridge and OVS, so let's not forget Linux bridge; you're concerned with getting those to work. Something like OVN, because it's in the networking-ovn project, it's up to the networking-ovn people to make it work once it appears in the core Neutron project. So it's not really your responsibility to worry about all these other things; I would say just worry about the reference implementations, and they've all got to figure out their own business.

Okay, so my question is about northd, which does the translation between the northbound database and the southbound database. What was the design consideration for having that hosted in a central location versus pushing that as a direct translation down onto ovn-controller?

So the design there, well, that's one of those places where I really had to defer to the experience of some people at Nicira and VMware who had built controllers before. My first design for OVN, my first proposal, and Justin and I worked on this together, essentially had what's in that northbound database with all the hypervisors pulling from it directly. Then I showed that to some of the people who had worked on NVP and NSX, and they said, what are you doing, that isn't going to work, and they explained it to me, and it took a long time for me to get convinced. So I can understand why you'd have skepticism about it. Justin, do you have a better technical view on that?

Not really. I mean, I think it's probably worth it; we documented some of the reasoning, and I would have to go back and look at everything we went through. But in the end, that translation, the bits that northd writes into the southbound database, those are a deterministic function of what's in the northbound database. So if that turns out to be a bottleneck, then I think there are several ways we could distribute or shard the work that northd does, or possibly we could even distribute it down to the hypervisors. So far it seems pretty far from being an important bottleneck in the system, but I certainly understand the viewpoint that it looks like one. I agree.

So when it comes to that level of scaling limitation, will you be open to such modifications or enhancements to the current architecture? Right, I think the big thing we're planning there is that currently northd is written in a very simple way, so that whenever the northbound database changes, it processes everything and then sends the differences to the southbound database. The obvious thing to do there would be what we were calling incremental computation before, so that if you make a small change to the northbound database, we do a small computation in northd and then send just those changes. One thing I would say is that
we didn't spend a lot of time early on in the architecture trying to optimize everything. We tried to do everything in a way that was obviously correct, and now we're going through and doing the evaluation to see where the hot spots are, because oftentimes, in our experience, if you try to guess where the problems are, you end up in the wrong spot. So everything right now is very straightforward. Now we're seeing that, you know, it's pretty obvious that reprocessing everything is going to be expensive, but let's just make things correct, and then later on we can add incremental updates; that's going to be much more complicated, but at least it works up until then.

Just let me add to that: in fact, I had a bunch of guesses about where we would find the first bottlenecks when we scaled it, and I was wrong, so I'm glad that we didn't pre-optimize those.

Okay, thank you. Yeah, you said OVN is going to be production ready by Newton, but is it still going to have feature parity with the existing agent-based implementation in Neutron? I'm especially concerned about HA routers, DVR, and things like that.

So, independent of the Newton release or not, we are going to run our public cloud with OVN, and at the point we run it, it will have feature parity with DVR and HA, what's there today. So it will be there by the Newton release; it will hopefully be there before the Newton release, because we plan on being out there before Newton is done.

Yeah, so you guys talked a bit about scale, but I wanted to touch more on the HA side. You said OVSDB is going to evolve into some sort of sequentially consistent store with a Raft implementation, but at the same time there are a lot of other moving pieces, like your northd process and the whole publish-subscribe network that you have set up between the southbound DB and the ovn-controllers, and those networks can get partitioned, and you have to do a lot of state synchronization between northbound, southbound, and the ovn-controllers. Is there some testing and some sort of analysis being done, given that any of these components can fail and networks can get partitioned, that you can still maintain the invariants you want to maintain?

Of course testing will be important, and we have a plan to do thorough testing. But I think part of what you're worried about isn't necessary to worry about to the same extent. You're talking about that publish-subscribe mechanism, for example; that is in fact a core component of OVSDB, not a separate service. It's basically how all the clients that I know about use OVSDB, so it's a fundamental feature; if that were broken, the system wouldn't work.

So if you get a network partition and a lot of these publish-subscribe connections break, when they come back, is the full state re-synchronized?

It won't be fully re-synchronized; we have mechanisms to basically pick up where it left off in the common cases, where that's cheaper than sending the entire state.

Okay, thanks. And it looks like that's the end of our time, but I think all of us are available to talk to people after the session. Thank you.