Good morning. I guess, yep, I can still get away with saying good morning. So thank you, everybody, for coming. This is the fourth session in the Cisco-sponsored track room. My name is Gary; I'm the host and emcee for the day. We do have one more session after lunch today, so I hope you can come back and join us then, with Lew Tucker, Steven, and some folks from SAP. But we're going to get going right now with what you're all here for: to hear from Jerome and Ian, two of our most senior engineers at Cisco, on superfast network performance with VPP. It's been a very long week, so with that, I'll turn it over to them. Thank you.

Oh, one quick thing: as you all came in this morning, or today, you should have gotten a little card. We are doing a drawing at the end of the session for a very, very nice Philips Bluetooth speaker, so fill out the cards; we'll collect them at the end and do the drawing.

Okay. So, thank you for coming. It is Thursday, just before lunch, so I'm not sure this is the best time slot, but we'll try to be as quick as possible, and if you have any questions, we will have a Q&A session afterwards. My name is Jerome, and I'm managing this networking-vpp project. I'm working with Ian, as well as other engineers, including Naveen Joy, who is in the room here today.

Before we start: who in the room is already familiar with VPP and FD.io? Okay, so that's a good 40%. What I'll do, then, is tell you a few words about what VPP is and why we are trying to integrate it into OpenStack, and then I'll hand over to Ian, who will explain the main design principles and the architecture of the software.

VPP, to make it simple, is an extremely fast virtual switch: a data plane that can process a huge number of packets per second (you will get more details afterwards). It can be used for east-west communication between different VMs, and for north-south communication out to the physical world. It's based on DPDK, and the reason it's called VPP, vector packet processing, is that instead of working packet by packet, it works vector of packets by vector of packets; again, you will get more detail afterwards.

What we are trying to do in the frame of this project, called networking-vpp, is to make sure that this virtual switch is properly integrated into OpenStack, so that anyone using VMs or VNFs can take advantage of it. We want it to be simple, robust, and production grade. We are not looking at fancy features to start; we really want something production grade, to be used in the cloud or for VNFs in production. It's not a toy: it's something designed to go into real life, into production. And in order to do that, we want to comply and adhere with all the regular OpenStack rules: blueprints, code reviews, mailing lists, and so on.
So we have an open community working on this, and if some of you are interested in working with us, you are of course more than welcome. Ian will go into more detail on this topic, but just to give you a rough idea of what we are trying to do: we want to build software which is highly scalable, not only in terms of packets per second but also in terms of the number of nodes we can deploy. We want something really simple to operate and to debug, because some of you might have some experience with a virtual switch, and that's not always super easy to operate and debug. And of course, because it has to be deployed in real-life environments, high availability is not an option: this is something you need to get by design, by default, and we will explain how we did that.

So, what we did to reach these goals: all the communications are REST-based and JSON-based; you'll see that etcd is a piece of software used as a cornerstone of this project; and everything is based on asynchronous communication. When you create a port, you give the order to create a port; this is a desired state. Then later on, when the port is actually created, you receive that information. These are the things you do when you want asynchronous communication, which is really useful for distributed systems. In terms of high availability, we use a Neutron journaling system as well as a key-value store based on etcd. etcd is currently not being used in cluster mode, but eventually it will be, to provide the best availability you can think of. And last but not least, the code is small and easy to understand and debug. We are speaking today about a new and very interesting project, but at the end of the day, if you have a look at the number of lines of code, we are speaking about something between 2,000 and 3,000 lines. That's extremely compact, and the reason for that is this architecture and the fact that we started from scratch. So that's another advantage.

Okay, so for those who are not familiar with VPP: VPP, vector packet processing, is one of the projects of FD.io. FD.io is a project under the Linux Foundation, and it's an umbrella of multiple projects. You have VPP, which is this fast data plane, a virtual switch and virtual router, but then you have other projects such as CSIT, which does continuous integration and testing, and TRex, which is a packet generator.
So we have many, many projects. FD.io is a project driven by Intel, Cisco, Red Hat, Ericsson: different companies, with different levels of involvement. But the VPP software itself has been in development since 2002, so it's not something new; it was open sourced last February. The fact that it's open source is really new, but the software itself is rock solid by now.

So again, what makes it different from other vSwitches? I think there are several things. One is that it's extremely fast, and it's really important to be extremely fast, because we now want to deal with multiple 10-gigabit links, and multiple 10-gigabit links means you have something like 150 clock cycles per packet, which is not a lot. You cannot achieve that if, for each and every packet, you need to fetch state from memory. So what we do here is process packets vector by vector: we get a bunch of packets to which we apply similar processing, which lets us warm the cache with the first packet and then process all the remaining packets belonging to the same vector without the cache penalty. That is a very significant speedup in terms of performance.

It is based on DPDK, but nothing forbids us from using other interfaces in the future; today it's fully user space, DPDK only. That gives us very, very high performance for physical interfaces, but also very, very high performance for the vhost-user connections between the virtual machines and the vSwitch. So it makes perfect sense both for cloud and for NFV kinds of environments.

Some recent performance results, based on release 16.09, which, as you can guess, was released in September: typical processing on two hyperthreads, one core, is something like 10 gigabits per second. In this setup, as you can see on the slide, we inject five million packets per second, and those five million packets per second go to a VNF, a VM, over vhost-user interfaces, and are sent back. So all in all it's five plus five, which is ten million packets per second, and that is on only one core; it scales really linearly with the number of cores. This is something you can reproduce: you just have to download the code and recreate this setup; it's really easy to reproduce. And we are improving on that with multi-queue support in the next releases. Anything to add on these topics? No, I think that's good.

What are the main components? The design is actually very simple. We have developed a mechanism driver which sits on the Neutron server, and this mechanism driver populates state into an etcd key-value store; to do that, we use JSON REST APIs. Then all the compute nodes are watching state changes on this etcd key-value store, so as soon as a new desired state is available for a given node, that node wakes up, retrieves the state, and populates it into VPP. VPP is a data plane only; it's not a control plane.
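To make that concrete, here is a minimal sketch of such a watch-and-apply loop, assuming the python-etcd client and the etcd v2 API; the key prefix and the program_port() helper are hypothetical stand-ins for illustration, not networking-vpp's actual code.

```python
# Sketch of an agent-side watch loop (hypothetical keys and helpers;
# port deletion handling is omitted for brevity).
import json
import etcd  # python-etcd client, etcd v2 API

PORT_PREFIX = '/networking-vpp/nodes/compute1/ports'  # hypothetical layout

def program_port(port):
    """Stand-in for pushing the desired port state into VPP."""
    print('programming port %s' % port['id'])

client = etcd.Client(host='127.0.0.1', port=2379)

def full_resync():
    """Fetch the complete desired state and reapply all of it."""
    result = client.read(PORT_PREFIX, recursive=True)
    for leaf in result.leaves:
        if leaf.value is not None:
            program_port(json.loads(leaf.value))
    return result.etcd_index

next_index = full_resync() + 1
while True:
    try:
        # Long-poll until any key under the prefix changes.
        event = client.read(PORT_PREFIX, recursive=True,
                            wait=True, waitIndex=next_index)
        if event.action in ('set', 'create', 'update'):
            program_port(json.loads(event.value))
        next_index = event.modifiedIndex + 1
    except etcd.EtcdEventIndexCleared:
        # We outran etcd's limited change history: rather than replay a
        # backlog of commands, just catch up with the current state.
        next_index = full_resync() + 1
```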
On the compute node side, we have a Python API that uses shared memory to talk to VPP; it sits with the agent. So we retrieve all this state and populate it into VPP, and once the port is created, we give that information back to etcd, and it is given back to Neutron. That's the way it's done. And one of the benefits of this architecture is that it's extremely simple to debug: if you are familiar with etcd, you can run a simple etcdctl watch CLI command and you will see all the ports created, all the ports deleted, the value states. That's extremely easy to analyze. So this is the architecture. Ian, perhaps you can comment on it?

So, what we're trying to do here (and another reason for using etcd) is to implement a model of distributed state, not distributed commands. Instead of telling each of the agents what it must do next, which means you've got a stream of commands running out to the agents, and if an agent gets behind then the backlog increases phenomenally, etcd stores the state the system should be in. If an agent ever loses track, or if it gets behind, it can always catch up to the current state.

So our aim here is this: when you run a Neutron command, as you normally would as you're bringing a virtual machine up, or even when Nova does it for you as it's starting a virtual machine, we store that command in the Neutron database, the same as usual, so that you can retrieve it and Neutron has consistent state. And we put it in a journal within the database at the same time, which gives us the ability to do it synchronously, within a single commit: we're guaranteed that the operation succeeds or fails as a whole, which means you never get into awkward states if there's any kind of system failure. A background thread then takes that journal entry and pushes it into etcd. We use a background thread so that the user-facing calls are always very fast: when you tell Neutron to do something, "I would like you to create a port," Neutron is always effectively saying, at that point, "I'll see about that at some point in the future." The background thread is part of actually getting the job done, but it doesn't make the user wait to hear that the operation has succeeded.

Then, on the back end, the agents watch etcd for state changes, for efficiency's sake, and every time they see something new that they've got to do, they go and tell VPP to get it sorted. If, as I say, they run out of history (etcd has a limited record of changes), they don't get into the backlog state; instead they say, "Well, I'll just catch up with what's current": they go and fetch the whole state and resynchronize everything they're doing. And if the agents die, or if VPP dies, they go to the desired state and say, "I know what I'm supposed to be doing right now; I will go and make that happen."
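The server-side journal-then-forward pattern described here might look roughly like the sketch below. The names are hypothetical, and a Python queue merely stands in for the journal table that, in the real driver, is written in the same Neutron database transaction.

```python
# Sketch of the journal-and-forward pattern; names are hypothetical, and a
# queue stands in for the journal table that lives in the Neutron database.
import json
import queue
import threading

journal = queue.Queue()

def create_port_precommit(port, host):
    # Called inside the same DB transaction that records the port in Neutron,
    # so the port and its journal entry succeed or fail as a whole; the
    # user-facing call returns without waiting for etcd.
    key = '/networking-vpp/nodes/%s/ports/%s' % (host, port['id'])
    journal.put((key, json.dumps(port)))

def start_forwarder(etcd_client):
    # Background thread: drain the journal and push each entry into etcd.
    # If etcd is briefly unreachable, entries simply wait their turn.
    def forward():
        while True:
            key, value = journal.get()
            etcd_client.write(key, value)
    threading.Thread(target=forward, daemon=True).start()
```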
So the architecture is designed so that failures in any of these systems, or intermittent network connectivity problems, don't cause you issues. We can all wish that failures don't happen; but if you're running a system that's got tens or hundreds of servers, and you're running it for weeks and months on end, and if, in the case of NFV, you're a service provider whose main source of income is absolutely dependent on this system working flawlessly, then you have to tolerate failures, and this is what we've designed it to do.

There's also a little bit to talk about here in terms of the port binding workflow. Port binding is when Nova and Neutron together try to come to an arrangement about where traffic is dropped off from a network, so that the virtual machine can then go and pick it up; there's a whole bunch of slightly asynchronous calls that go on between Nova and Neutron to negotiate that settlement between the two of them. What we do is, again, slightly different from what other mechanism drivers within Neutron do: we propagate that information through etcd. Whenever Nova says, "I would like to bind this port to that host over there," we put something into the etcd state saying that host is now the place where this traffic needs to drop. The agent on that host will then go and land the information in a common point: basically a vhost-user interface, the shared-memory fast interface that virtual machines use to talk to user-space switches. Ultimately, when it's done that, it will notify the Neutron server, again through etcd, of what it's done, and Neutron will tell Nova asynchronously. Nova gets the virtual machine all prepared and paused, but it doesn't actually start the virtual machine until that binding is completed, so that when the virtual machine actually starts to run, it has working networking and can get its DHCP addresses, because that's pretty much the first thing a virtual machine will do.

Having done it this way, the change is basically that we are completely certain, from top to bottom (Neutron the server, but also Neutron the agent), that the whole system is set up, and the message that's given to Nova says: we are absolutely ready, and we're totally certain of it at this point. It's also fault tolerant: Nova, in this particular instance, will time out if Neutron doesn't get its job done in time, so the fault tolerance is the same as you usually find with mechanism drivers.

For our key-value store we've used etcd, which is a fairly common key-value store. It's a synchronous key-value store, and that's kind of important: once you've written something, any reader from that point on will get the information you've written. And, as I say, it keeps notifications: if you're watching a subset of the keys within etcd, it will tell you when any of those keys have changed, so that you can do updates on demand if you want to. But if you fail to get the update, if you find the connection has dropped, or if you think you've dropped out of sync, you can always go and get the entire state from etcd.
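The agent's report-back leg of that binding flow might look like the following sketch: after creating the vhost-user interface, the agent records what it did on a key that only it ever writes. The paths and helper names are again assumptions for illustration.

```python
# Sketch of the agent's notify-back step after binding (hypothetical layout).
import json
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)
HOST = 'compute1'

def on_port_bound(port):
    # 1. Create the vhost-user socket that qemu will attach to; in reality
    #    this is done by asking VPP, over its Python API, to create the
    #    vhost-user interface.
    socket_path = '/tmp/vhost-user-%s.sock' % port['id']

    # 2. Report operational state on a key written only by this agent; the
    #    server watches it and tells Nova the port is ready, so the VM is
    #    only started once its networking actually works.
    client.write('/networking-vpp/state/%s/ports/%s' % (HOST, port['id']),
                 json.dumps({'bound': True, 'vhostuser_socket': socket_path}))
```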
As Jerome was saying, the debugging story was a side effect; we didn't expect this, and we're so happy that it works like this. If you've ever tried to work out what OVS is thinking when you're debugging Neutron, and why it's configured the way it's configured: horrible, don't go there. It's a miserable experience; there are about four different ways that OVS gets its properties, plus all the iptables rules. Not pleasant. In this instance, etcd contains a breakdown of what you're trying to configure, what you want VPP to be doing, and VPP has a nice little CLI, so you can always look at the VPP state and simply compare it with the state you're seeing in etcd, and you can see whether it's sensible or whether you've got a bug, which is great. In fact, one of the things we're looking at doing there is adding a few pieces of debugging and checking code at some point in the future: given that the information is there for the administrator to check, we'll just write a script to pull both and compare them, to see whether or not everything is working properly, or whether there's been some sort of fault.

Within the etcd data structure, we've set things up in a slightly interesting way. There's an outbound set of keys that are always written by the Neutron server, at the center of this, and there's an inbound set of keys that are only ever written by one individual agent. So you haven't got agents trampling over each other, changing things behind each other's backs; no information being put into a key by six agents at the same time; and you haven't got the server and the agent fighting. It's always totally clear where any bit of information has come from when you look into that data store.

This is, admittedly (and I should blame Jerome for this particular slide), a fairly horrible example of what comes back from etcd; it looks pretty nasty. But the key bit in the middle is the highlighted section, and that should look fairly familiar if you've ever seen what a port looks like in Neutron: it's the whole bunch of settings you find on a port, describing its MAC address, how it's bound, and various bits and pieces. And that's basically our message of communication: we just trimmed the port down to the bare essentials, we stick it into etcd, and that allows the distribution to go on. And here's a little bit more, with perhaps slightly nicer formatting; you can see there's nothing terribly surprising in it. It's just boring old Neutron information that describes how the network is working and where the packets need to be landed.

So, we have a list of supported features, and as Jerome was saying, the aim here, and perhaps I keep saying this to people: people will always talk about building in security from the ground up; you must remember that security is the first thing you build into an application. I wish they would remember that when we talk about robustness as well. If you want something to be robust, highly available, and fault tolerant, it has to be the first thing you design into the application. That's what we've done here.
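A small sketch to pin down that one-writer-per-key idea; the concrete paths are assumptions, but the direction of the writes matches what's described.

```python
# Sketch of the two one-way sets of keys (paths are illustrative only).

def desired_port_key(host, port_id):
    """Written only by the Neutron server: the state agents should apply."""
    return '/networking-vpp/nodes/%s/ports/%s' % (host, port_id)

def oper_port_key(host, port_id):
    """Written only by the agent on `host`: what it has actually done."""
    return '/networking-vpp/state/%s/ports/%s' % (host, port_id)
```

With a single writer per key, anything you find under either prefix has an unambiguous origin, which is part of what makes watching etcd such a readable debugging tool.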
So we set ourselves a sort of timeline, in stages: first a framework that would do what we wanted, and then we brought features in one by one. So far, what we've got works with flat networks and with VLAN networks. It works with vhost-user ports for virtual machines, but it's also capable of connecting to the namespaces that Neutron already provides for DHCP, for metadata, and for routing; so all of the DHCP, metadata, and routing features exist today, and you get them just as you would expect. The DB journaling bit was added a little after we first got the framework in; that's the synchronization that makes sure etcd is always in sync with the database and never loses a change. We added that just a step or two down the road, and it was actually surprisingly simple to do, I have to say. We support the HA scenarios: we can restart the agents, we can restart the drivers, we can restart the server, and we can restart etcd, very carefully, because it's redundant, so you tend to want to restart it one node at a time. But we can.

We've also managed to get this deployable. Obviously the first thing we did, for our own benefit honestly, was to get a devstack plugin working. You can certainly go and download that today, and it will deploy a devstack with VPP running and configured in it. We've also got it working in the OPNFV Apex installer, which is a TripleO-based installer that Cisco and Red Hat collaborate on; that's working today as well. And we've looked at integrating it a little with Red Hat's packaging: most of what we're using here is either in Red Hat's package repositories (so etcd is easy to get hold of) or available as pre-built packages, like RPMs for the VPP components. So again, it's usable today in reasonable systems, for test; I would suggest not production just yet, but you can certainly get a handle on it and see what you think.

The next bits and pieces we're planning to deal with: it might seem odd that we didn't implement security groups up front. There are two reasons for that. One is that, as it happens, if you're doing NFV, you often find that security groups are the first thing you turn off, because they slow everything down, and if you're looking for absolute flat-out, straight-line performance, then, for NFV, they're not what you want. So we put that on one side for the time being and said we'd do it second. But also, security groups and anti-spoofing require a little bit more work on the firewalling functionality of VPP. We've talked to the VPP team; they're actually implementing some APIs for us right now, and we hope to have that implementation released with our second round of releases, which should come with the next release of VPP in January next year. And we're also looking at, oddly, a strange service to think of, the tap-as-a-service functionality. Because again, if you're working with NFV applications, with high-performance networking applications, being able to take a copy of the traffic that's coming in is absolutely critical to debugging what's just gone on with your application. Is it the infrastructure that's working but the application that's not, or the other way round? How can you tell? The only way to tell is to have decent debugging functionality.
And obviously, I'd be the first to admit that testing came slightly late in the day with what we're doing, but we're working on our enhanced test plan: more in the way of unit tests for the code, which fortunately is very small, so the unit tests should equally be very small; but also, more importantly, integration and feature tests to demonstrate that in a deployed system it's actually doing its job.

Then there are a couple of things we've been talking about for the longer term. One is telemetry: it's really useful to be able to get statistics out of your virtual switches, to see what's happening with your network and where the network load is, sometimes even going down from plain port load to the individual load on certain kinds of flows, or on certain ports of a virtual machine. There is quite a lot of that already in VPP, so we're looking at how we can pull that data out and make it consumable by OpenStack users. And also (this was actually a discussion yesterday morning) how we can make VXLAN support work. VXLAN is supported within VPP, but configuring VXLAN is very different from configuring VLANs, so that's something we're planning down the road.

And oh yes, I should mention Colorado. Colorado is the release of OPNFV's integrated system: they have a number of combinations of installers and components that together deliver an NFV platform, an NFVI as it's known, or a VIM, in the NFV world. We have this in the Colorado automated testing, through that Apex deployment system, now, and we're seeing whether we can get it into the next release of the Colorado version of the OPNFV platform. So hopefully in the next few days you'll hear how that's going, and I'll be able to give you good news on that subject.

But yes, if you're interested in looking at this: we're now an OpenStack project, so you can find our git repository under the OpenStack umbrella, and obviously it's on review.openstack.org. You're welcome to review our patches and tell us that we can't write code (which people do to me a little more often than I would like, but such is life), and obviously you can submit patches as well, and we'll give them a good hearing; we're always pleased to see people using this and trying to make it better. And there's a Launchpad: if you use this and you find that something's not working, we would be happy for you to file it as a bug. Bugs are good. We like bugs. Well, we don't like bugs, but we like having bug reports, at least. And even blueprints as well. So again, please come and join us and participate in the project.

Okay, so that's it. We've got some time for questions.

Q: Yeah, I just had a question: there's also an effort to build an OpenDaylight-based control plane for VPP, currently using GBP, and I just wondered how you compare and contrast the two?

A: It's a good question, and I've had that question before. I totally understand that it's confusing that we would do this twice over, but there's a logic to this. This model here is very simple; as I said, two and a half thousand lines of code, and it gets your functionality working: it will get you virtual machines connected to each other and out into the world. But there are a lot of things you can do with networking.
There are a lot of APIs we could potentially create to do things like service chaining, connectivity to MPLS, connectivity to L2TPv3, or whatever other tunneling solutions you find in service providers. And if you focus on the service provider world, no two service provider networks use the same technologies or are configured in quite the same way, so we know full well there's a lot of variety out there. This is the simple version; this is the one that's intended to be useful to most people. OpenDaylight is the complex version, which lets us add more shiny features. Now, it may be that at some point in the future you'll want one of those shiny features, and you will go and use OpenDaylight. But at least in the meantime you can use this and see how it works for you, and it may be that this will serve all of your needs as well; it's actually perfectly fine for typical enterprise and similar workloads.

Q: Okay, well, two for the price of one, then. Have you done any kind of benchmarking of ML2/OVS versus networking-vpp?

A: We've done some. I'm going to look pointedly in that direction, but I don't think Alex has given me test results on this yet. We've certainly done tests of OVS alone against VPP alone, and (this is a point Jerome made earlier) we always talk about packets per second when we're talking about this, because packets per second are interesting; simple overall bandwidth is kind of boring, honestly. I can fill a 10-gig link, and I can get OVS to forward the traffic of a 10-gig link, and that will work just fine. But when I do that, I'll be using something like iperf, and iperf will be sending 1500-byte packets down the link, so the packet rate will be right down: it's a few hundred thousand packets per second that you're looking at. VPP is good for 14.8 million packets per second on that same link, which is 64-byte packets filling a 10-gig link. So the difference here is not "can I fill 10 gigs" but "how many packets per second can I get down that link". Now, when we've tested the two things alone (and I believe we've published this information; you probably want to check the OPNFV website), we found that we can fill a 10-gig link with 64-byte packets using VPP, though you need to be very careful about how you configure things, and I'd like to know how close we get to that with networking-vpp's config. Whereas if you start shrinking the packet size with OVS, you find that at about 1200 bytes you're already starting to see packet losses, and when you get to 64 bytes there's a dribble of traffic going down that link. So there is a distinct performance difference between OVS and VPP. A slightly more comparable test is OVS-DPDK against VPP, and I know we have some results on that as well: it's certainly better than stock OVS in the kernel, but it's still not terribly high performance by comparison.
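For context, the arithmetic behind those figures is standard Ethernet line-rate math rather than anything specific to VPP: each frame also costs 8 bytes of preamble and 12 bytes of inter-frame gap on the wire, which is where the 14.8 million packets per second figure for 10GbE comes from.

```python
# Line-rate packet arithmetic for a 10 Gb/s Ethernet link.
link_bps = 10e9

# 64-byte minimum frame + 8B preamble/SFD + 12B inter-frame gap = 84B on wire.
small_bits = (64 + 8 + 12) * 8
print('64B frames:   %.2f Mpps' % (link_bps / small_bits / 1e6))   # ~14.88

# A 1500-byte payload rides in a 1518-byte frame (header + FCS) plus the same
# 20 bytes of wire overhead, so the link is full at well under 1 Mpps.
big_bits = (1500 + 18 + 8 + 12) * 8
print('1500B frames: %.2f Mpps' % (link_bps / big_bits / 1e6))     # ~0.81
```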
So that's something which is Available and then we are also publishing regular results because Under fidot you have this cc project I was speaking about, you know, and this project does for every commit, you know, you do Performance testing benchmarking blah blah blah, so you can get all these results And last point on this topic we had In Seattle there were during last odl summit There was a presentation about what is the typical throughput you can get at zero packet loss That's very important because you can have different Numbers in terms of throughput If you accept like a dot two percent packet loss, that's very very different from zero packet loss, right? So you have full Benchmark with this level of details. It's not a comparison. It's absolute result in that case. Thanks guys Any We're there. Okay anybody else any other questions? That means you guys did a really good job. Thank you, and you are thanks again to ian and jerome