Okay. Good afternoon. Welcome to this joint presentation with Hewlett Packard Enterprise and Wind River. Just by way of background, we're going to talk about a specific component of Helion Carrier Grade, which is based on OpenStack with carrier-grade features and uses some technologies from Wind River as well. My name is Madhu Kashyap. I work for Hewlett Packard Enterprise in the NFV business unit. I'm responsible for all of the open-source strategy and direction, including OpenStack, OpenDaylight, and OPNFV. With me is Glenn Seiler from Wind River. Glenn, you want to take it away?

I'll kick it off. All right. Good afternoon, everyone. It's my pleasure to talk to you this afternoon about some interesting performance benchmarks that Wind River and Hewlett Packard Enterprise have run together with Spirent, and we want to show how these benchmarks map to solving industry business problems, if I can just figure out how to move this pointer here. What we want to do first of all is look very quickly at some of the software components. As Madhu mentioned, we're going to be talking about HPE Helion Carrier Grade. Some of the technologies in Helion Carrier Grade were designed by Wind River, and we have a partner relationship between Wind River and HPE. We want to identify some of the performance bottlenecks that we see commonly in the industry, provide some details on how we've run these characterization test cases, and then, of course, present the results. The focus is going to be on data plane performance, the V-switch, the packet processing engine, and it's going to pinpoint areas where we have compared against standard open-source components. You'll see some pretty dramatic, we think, performance improvements. But a couple of things I want to emphasize. First of all, this was done by a third-party testing lab. We used Spirent to actually run the tests and provide the packet throughput engine, and we also used an independent analyst, the Tolly Group, to verify the results, and we'll have a published report on this coming out in the next week or two.

Like I said, rather than just talking about benchmarks and performance and my dog's faster than your dog or my server's faster than yours, I want to tie it back to why performance actually solves a business problem. There are a lot of reasons why service providers and equipment manufacturers are looking to NFV to solve certain problems. I like to characterize them in three categories. First of all, there's increasing top-line revenue, getting new revenues, and that's gaining market share by providing new services very quickly. So that's top-line revenue. Then there's lowering costs, whether you're lowering CAPEX or lowering OPEX. That can be done in a number of ways. Obviously, one of the main principles of NFV is using COTS-based hardware, or what I like to call industry-standard high-volume hardware, but also using common de facto open-source components. That's lowering costs. Where increasing revenue and lowering costs meet in the middle is agility. That agility, the ability to roll out services quickly, is really the intersection and one of the main benefits of NFV that service providers are looking to gain.
But what we're going to talk about here, I want to be very clear, is really number three, cost optimization. We're going to talk about how you lower OPEX by increasing VM density through very, very fast packet throughput. So I want to tie it from just running benchmarks all the way back up to the real business problem, why this even matters.

First, a little bit about the architecture, so you understand the tests that we've been running. This is the HPE Helion Carrier Grade architecture. What we're really going to be focusing on is the middle node there, the compute node. That's the data plane node. It's running standard Linux and a very high-performance KVM using some of the real-time patches that are available in the community, and then the Wind River accelerated V-switch, which uses Intel DPDK. On top of that, we've added an accelerated virtual port, which is essentially a kernel loadable module that sits in the guest OS of the virtual machine and provides very, very fast packet throughput between the V-switch running in user space and the application up in the virtual machine. They talk directly; the accelerated virtual port essentially emulates an Intel Niantic type of Ethernet port and, as I said, provides very, very fast throughput. So we're going to be focusing on the performance of the compute node, with data traffic going in and up to the virtual machine. The overall system, as you can see, uses OpenStack as our framework for the control plane. The V-switch has a Neutron plug-in that talks back to OpenStack and can perform various functions, live migration and things of that nature. But what we're really going to focus on here is the performance from the physical devices up through the V-switch and into the virtual machine.

Just a little more about service provider OPEX and CAPEX. This chart is from CIMI. If you're not familiar with them, some of you might know an industry pundit and analyst named Tom Nolle. He writes an almost daily blog on NFV and telecom, and he created this chart, which is actually a little unnerving if you think about it, because it essentially shows that towards the end of this year or next year, the cost per bit will start to exceed the revenue per bit, which essentially means that CSPs and telcos are losing money. Now, obviously, we can't let that happen. That's one of the reasons why NFV and SDN are so important today. We know why it's happening, because of the exploding growth in video, but what's really critical is that there are only two ways to fix it: increase the revenue per bit or decrease the cost per bit. So critical to solving that problem is increasing capacity and lowering the cost per subscriber. And that's what we're doing by driving this increased V-switch performance: we're going to show how it lowers the cost per subscriber.

So let's take a look at what we're doing. This diagram from the previous slide focuses in on the V-switch architecture. You can see the virtual machine traffic going from the VM to the V-switch and then from the V-switch out to the network. This is the V-switch that's used in the Helion Carrier Grade product.
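To make the user-space data path idea concrete, the snippet below is a toy Python sketch of the general poll-mode pattern that DPDK-style user-space switches rely on: a dedicated core polls an RX ring in bursts instead of taking an interrupt per packet. It is illustrative only; the names and the queue model are invented here and say nothing about Wind River's actual implementation.

```python
# Toy illustration of the poll-mode idea behind DPDK-style vSwitches:
# instead of taking an interrupt per packet, a dedicated core spins,
# draining packets from an RX ring in batches and pushing them to a
# TX ring. All names here are illustrative, not Wind River APIs.
from collections import deque

BURST_SIZE = 32  # DPDK-style burst: amortize per-poll overhead over many packets

def poll_mode_forward(rx_ring, tx_ring, max_polls):
    """Busy-poll rx_ring and forward packets to tx_ring in bursts."""
    forwarded = 0
    for _ in range(max_polls):
        # Drain up to BURST_SIZE packets per poll; an empty poll costs
        # almost nothing, which is why polling wins at high packet rates.
        burst = [rx_ring.popleft() for _ in range(min(BURST_SIZE, len(rx_ring)))]
        tx_ring.extend(burst)  # zero-copy in a real implementation
        forwarded += len(burst)
    return forwarded

rx = deque(f"pkt{i}" for i in range(100))
tx = deque()
print(poll_mode_forward(rx, tx, max_polls=10))  # -> 100
```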
And what we've seen is orders of magnitude. Depending on the use case, and on whether or not you're using the accelerated virtual port, the kernel loadable module I mentioned, we've seen anywhere from 10 to 40 times better performance than the standard OVS that's in the open-source community today. So what we're going to show is that increased switching performance equals greater VM density. The reason that's important is that the payload, the service you're running, runs in a virtual machine, and you want more cores available to run that service and fewer cores running the V-switch. Nobody's making money by selling V-switches, right? People make money by running services on top of the servers, with as many cores and as much consolidation as possible. So the more cores available for the VMs, the services, to run, the more subscribers you can get per server. That results in significantly lower OPEX. It results in lower CAPEX as well, but over the long term, the lower OPEX of supporting, maintaining, and managing fewer servers will yield significantly more savings than the CAPEX alone.

So let's look at a specific use case. This is a real use case that we've tested: a virtualized media gateway. Obviously it's pushing a lot of data, very high data content. The bandwidth required for an instance of this particular application is about three and a half gigabits per core. The system configuration we tested was a standard two-socket Xeon server, 14 cores per socket, so 28 cores total. In our testing, with the standard V-switch, the most efficient implementation required about 23 cores to get that much bandwidth through the system. That really only left one core, one VM, to run the payload. So you're essentially running one instance of the payload per server. That's not very efficient. On the other hand, when we used AVS, the Accelerated Virtual Switch, we were able to run that same amount of traffic on 10 cores rather than 23. That left 17 cores for running the actual payload, which is where you're going to drive subscriber data. That results in about a 17-to-1 difference in capacity, significantly greater VM density, which results in significantly reduced OPEX: more subscribers per server. That's why running, testing, and validating these benchmarks on packet performance is so critical.

I'll talk just a little bit about the tests, then I'll hand it over to Madhu to actually run through them. As I've said, performance is really the key enabler for deploying large-scale NFV. And the way you can measure that, and get a sense for how a system is going to operate in these large-scale deployments, is by running benchmarks. But it's very important to run industry-standard, accepted benchmarks. So one of the things we'll show you is that the benchmarks we're running here are defined by ETSI. They have been run and validated by an independent third party, and both Wind River and HPE are working with organizations such as ETSI and OPNFV to further define these tests and make them available so that anyone can run the same tests. That way, as suppliers and service providers start to look for solutions, there'll be a common industry-standard way to benchmark and identify performance.
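To make the density arithmetic easy to check, here is a small Python sketch using only the figures quoted above (28 cores; 23 versus 10 vSwitch cores; 1 versus 17 payload cores). The assumption that the remaining cores go to host/OS overhead is mine, not from the talk.

```python
# Back-of-the-envelope version of the media-gateway numbers quoted in the
# talk. "Payload cores" are what is left for revenue-generating VMs.
TOTAL_CORES = 2 * 14  # two-socket Xeon, 14 cores per socket

cases = {
    "standard vSwitch":    {"vswitch_cores": 23, "payload_cores": 1},   # as quoted
    "accelerated vSwitch": {"vswitch_cores": 10, "payload_cores": 17},  # as quoted
}

for name, c in cases.items():
    # Cores not accounted for presumably go to host/OS duties (my assumption).
    other = TOTAL_CORES - c["vswitch_cores"] - c["payload_cores"]
    print(f"{name}: {c['vswitch_cores']} switch cores, "
          f"{c['payload_cores']} payload cores, {other} other")

ratio = (cases["accelerated vSwitch"]["payload_cores"]
         / cases["standard vSwitch"]["payload_cores"])
print(f"VM density advantage: {ratio:.0f}:1")  # the 17-to-1 figure in the talk
```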
So hopefully that gave you some idea of why this is important and how it ties back to ultimately lowering OPEX for the service providers. I'm going to hand it over to Madhu now, and he's going to run through the tests and the results.

Thank you, Glenn. All right, so let's get to the guts of this presentation, the tests that we ran and the results. Just to give you an overview, like Glenn mentioned, we based the tests on the ETSI test specs. This particular round was run only for Layer 2 switching, the V-switch, and we used different frame sizes for the different tests. And this is hot off the press; it was literally done last week. We did phase one of the tests, which are the benchmarking tests I'll walk you through. Unfortunately, phase two, the availability and resiliency tests, had to be postponed because the floods at our Houston lab disrupted the testing there. But we did run phase one, which is the results you'll see here.

So these are the four tests that we ran. The first one is an actual packet forwarding benchmark test. The next one is a consistency test, where you run multiple runs of the same test, 10 trial runs, to see if you get the same consistent performance from the network and from the V-switch. The third one is the latency distribution, where we measure minimum, maximum, and average latency of the packets going through the network. And finally there's the scale testing for the number of flows on the V-switch, where we test at different scale numbers. That is the fourth and final one.

So let's get into each of the tests. Before that, just a quick note on the hardware that was used. We used both the ProLiant DL360 Gen9, which is the rack-mount server, and we also tested on the BL460c Gen9 blade server. It's dual CPUs, eight cores per socket, and the network card, which is HP-branded, is the Intel Niantic NIC.

Okay, so this is the first test here, the packet forwarding benchmark test. We used three topologies. The first one, from left to right, is physical to physical. The idea is to just run the accelerated V-switch on a bare-metal server and run the packets through the V-switch and back out again. This simulates the case where you might have services running on bare metal with the accelerated V-switch there; it's running DPDK, and the packets actually go into user space and back out again. The second test is a single-VM test, with the packet going through the guest VM and back out again. And finally, you can think of the last test as a service-function-chaining test, where you have two VMs and packets traversing both VMs and out again. We used the Spirent packet generator to generate the traffic.

All right, so on the last test, VM to VM, you basically have three networks here: one going into the first VM, then VM to VM, and then back out again. So we're talking about six ports, the ingress and egress ports of the accelerated V-switch going into the first VM and then back out from the second VM. In all, we're seeing about 39 gigabits per second of traffic going through the system with absolutely no data loss that we can tell from the test that we ran. That's running nearly at line speed, 99% of line rate. So let me walk you through some of the numbers as well. These are the tests for the different frame sizes.
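A quick note on how percent-of-line-rate figures like these are computed: every Ethernet frame also pays 20 bytes of preamble and inter-frame gap on the wire, so the smaller the frame, the higher the frame rate needed to fill the link. The sketch below shows the standard arithmetic for a 10 GbE port; it is generic RFC 2544-style math, not taken from the Spirent configuration used in these tests.

```python
# Theoretical maximum frame rate on a 10 GbE link, counting the 20 bytes
# of preamble + start-of-frame delimiter + inter-frame gap that every
# frame pays on the wire (frame sizes per RFC 2544 include the CRC).
LINK_BPS = 10e9
WIRE_OVERHEAD = 20  # 7 B preamble + 1 B SFD + 12 B inter-frame gap

def max_frames_per_second(frame_bytes):
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD) * 8)

def percent_of_line_rate(measured_fps, frame_bytes):
    return 100.0 * measured_fps / max_frames_per_second(frame_bytes)

for size in (64, 128, 256, 512, 1024, 1518):
    print(f"{size:>5} B -> {max_frames_per_second(size) / 1e6:6.2f} Mfps at line rate")
# 64-byte frames demand ~14.88 M frames/s per 10 GbE port, which is why
# the smallest frame sizes sit lowest on the percent-of-line-rate charts.
```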
What you see on the X-axis is the frame sizes; on the Y-axis is the line rate percentage, going from the smallest frame sizes to the largest for all three topologies: physical to physical, single VM, and VM to VM. For the smallest frame sizes you have quite a bit of per-packet overhead, for encapsulating, decapsulating, and all of that, so you'll see VM to VM running at about 18% of line rate, going up to the physical-to-physical case, bare metal with just the V-switch installed, at roughly the low 60% range. And as you go across the X-axis and the frame sizes increase, you'll see that past 256 bytes, for physical-to-physical and single VM, you start hitting the line rate ceiling. So it's running at 100% line rate for the larger frame sizes.

Okay, so this is test number two. This is the consistency test, where we run 10 trials to make sure we get consistent performance of the packets running through these different topologies. Let me explain the chart a little. For the 64-byte frame size, again with percentage of line rate on the Y-axis, the dot on the 64-byte column says there is zero variance after running 10 trials: it consistently runs at about 30% of line rate. For the 128-byte and 256-byte sizes, you see the line that goes across the diamond; that shows you the variance, and in the next chart I'll show you that variance in detail. It's consistently running at 55% of line rate for 128-byte frames, and at about 97, 98% for 256-byte frames, with a slight variation on the consistency test. If I take you to this next chart, you'll see, for example, that on the 256-byte size some of the runs don't hit 100%. That is the variation I'm talking about, which you saw on the previous slide with the bar. So that's the variation that showed up in our tests across the different runs for all of the frame sizes. Again, this is showing you consistency after running 10 trials, with the line rate percentage for the different frame sizes. The consistency test was done on the single-VM topology.

This is test number three, the latency test. What you see on the X-axis is the frame sizes, and on the Y-axis is the latency in microseconds. Running these tests for different frame sizes, you see the chart for minimum latency, which is roughly around 15 microseconds; the average is roughly 18 to 20, somewhere in that range; and the maximum latency on some of the runs was in the 220 to 240 microsecond range. So this shows you the latency distribution when you run these tests over a few trial runs. The next chart shows you the breakdown as well. We broke this down into different buckets: less than 50 microseconds, 50 to 75, 75 to 100, and so on. You can see the percentage decreases steadily as the latency buckets go to the right. For the 64-byte frame size, for example, 99.42% of packets land in the under-50-microseconds bucket, but there's a 0.18% that lands in the 150-to-200-microsecond bucket. That small tail reflects the maximum latency seen in some of these tests. So again, these buckets show you how many microseconds of latency we found running these tests multiple times.
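For readers who want to reproduce this kind of breakdown, here is a minimal Python sketch that bins per-packet latencies into the same buckets used in the charts. The sample data is invented for illustration and is not from the test runs.

```python
# Bin per-packet latencies (microseconds) into the report's buckets and
# print min/avg/max plus the percentage landing in each bucket.
from bisect import bisect_right

BUCKET_EDGES = [50, 75, 100, 150, 200]  # bucket boundaries in microseconds
BUCKET_LABELS = ["<50", "50-75", "75-100", "100-150", "150-200", ">=200"]

def bucket_percentages(latencies_us):
    counts = [0] * len(BUCKET_LABELS)
    for lat in latencies_us:
        counts[bisect_right(BUCKET_EDGES, lat)] += 1
    return [100.0 * c / len(latencies_us) for c in counts]

samples = [15.2, 18.7, 21.0, 19.3, 47.9, 62.5, 181.4]  # invented sample data
print(f"min={min(samples)}  avg={sum(samples)/len(samples):.1f}  max={max(samples)}")
for label, pct in zip(BUCKET_LABELS, bucket_percentages(samples)):
    print(f"{label:>8} us: {pct:5.1f}%")
```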
Okay, this is the fourth and final test that we ran: Layer 2 V-switch performance based on RFC 2544, the standard benchmarking methodology run against most L2 switches for scale, including the flow-scaling throughput tests. This shows where we ran 20 flows through the switch, then 2,000, and then 20,000 flows. And you can see, again, that is the line rate percentage: no matter whether it scales from 20 flows to 20,000 flows, you still get line rate performance on the V-switch.
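For context, RFC 2544 defines throughput as the highest offered load at which no frames are lost, typically found by a binary search. The sketch below is a generic illustration of that search, assuming a hypothetical trial() callback standing in for a traffic generator; it is not the Spirent test logic used here.

```python
# Generic RFC 2544-style throughput search: binary-search the highest
# offered load (frames per second) that shows zero frame loss.
def rfc2544_throughput(trial, line_rate_fps, resolution_fps=1000.0):
    """trial(rate) must return the loss fraction at the offered rate."""
    lo, hi = 0.0, line_rate_fps
    while hi - lo > resolution_fps:
        mid = (lo + hi) / 2
        if trial(mid) == 0.0:   # no frames lost at this offered load
            lo = mid            # push higher
        else:
            hi = mid            # back off
    return lo

# Toy device under test that starts dropping above 97% of line rate:
line_rate = 14_880_952.0  # 64-byte frames on 10 GbE
fake_dut = lambda rate: 0.0 if rate <= 0.97 * line_rate else 0.01
print(f"zero-loss throughput: {rfc2544_throughput(fake_dut, line_rate):,.0f} fps")
```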
So that was the testing run last week. Like I said, we will continue to add tests to what you saw here, and we will put out an official report based on these tests as well. Okay, with that, I just want to summarize. Glenn, do you want to say a few words? So like I mentioned, this was run by a third party: the Tolly Group verified it, and Spirent actually conducted the tests. We will have the test results published soon. Some of the next phases of the tests are based on resiliency: whether a VM goes down or a host goes down, how quickly we can do live migration, the rapid detection of failure. Those are some of the tests we'll be running next week, and we will share those results with the community as well.

Unfortunately, as Madhu mentioned, Mother Nature intervened a little in our best-laid plans to present the resiliency tests, but the full disclosure document will be available on both the HPE website and the Wind River website; we're working to put it together now. In the meantime, we encourage you to come visit the HPE booth and take a look at the demos; you'll be able to see some of the performance and how the Helion Carrier Grade product is working. If you have any questions, please use the mic so we can record them. Go ahead.

Hi, thank you for the presentation. Pretty much every hardware vendor, Dell, HP, is presenting their test results. Whenever I, as a solution architect, go about designing and putting a bill of materials together, I look at spec.org to see independent people verifying the hardware in terms of CPU and vCPU ratios and so on, the virtualization side of things. And then I'm stuck there, because after that there are no more KPIs to look at. When I look at your presentation, yes, you're using the Wind River DPDK-enabled V-switch, and most likely or not using Mellanox NICs to offload the traffic with SR-IOV, which improves performance, yes. But when I look at what HP or Dell does in the sense of adding value to this hardware, I see pretty much nothing.

So yes, we are producing the hardware, but one of the things we're looking to do with Helion Carrier Grade is take advantage of that hardware: how we can do CPU pinning and process and thread pinning to exploit the hardware's capabilities. That is something in the works, where the software can take advantage of some of the hardware capabilities.

I'm glad you mentioned CPU pinning and thread pinning, because we have been testing this as well. When you look at CPU pinning, the NUMA architecture, and so on, the limiting factor is the L3 cache versus the QPI link between the sockets. Is it really going to affect the NFV VMs or not? Or should we just skip pinning entirely, so that it's much easier for a VM to recover or migrate to another host? If you have any test results regarding CPU pinning, thread pinning, and the NUMA architecture, that would be awesome. Thank you.

Okay, thank you for the input, by the way. That's interesting, and of course one of the things we're trying to do is demonstrate an industry-standard benchmark; not all hardware is created equal, as you've mentioned, so you want to provide something that can be run on a wide range of hardware. And we didn't use Mellanox; we used Intel Niantic NICs going straight up to the V-switch, and there was no SR-IOV involved. Happy to disclose more details on that, absolutely. The question over there?

Yeah, you've mentioned industry-standard benchmarking a couple of times, but the only results I saw were your results. You claimed orders-of-magnitude improvement, but you didn't demonstrate that to us. You showed us your results, which are great, I'm not disputing that, but what were the results for stock Open vSwitch?

Okay, great question, and you're right, that wasn't part of this benchmark. In that comparison I showed you, with 17 cores versus the one core, that was a separate test that we ran against a commercial version of Open vSwitch. We weren't using the latest, greatest bleeding edge, because we wanted to show our Wind River commercial product versus other commercial products. In that model, I want to say it was running at about three to four million packets per second, something like that, compared to our V-switch, which was running at about 20 million packets per second. Significant; it was about 10X altogether.

So I guess that's my concern: you put up your results, but you don't put up what you compared against, and you say it's 17 to one. So would I expect that at a 256-byte packet size I'm running at 5% of line rate? Is that what you're claiming? We were actually running at about 60% of line rate. Oh no, I'm sorry, that was with 64-byte packets. That was with 64. At 256, you're running 100%. Yeah, we were running 100%. You're saying that with OVS they're going to run at 5% of line rate? Is that the claim? Ish? Ish. Yeah, so again, what we were showing was just ours. We didn't want to show another competitor's product in this particular session, so we were just showing what ours measures. But the claim you made is against the competitor's product. That's what I don't understand: you're going to make a claim, which is cool. We're making a claim of what our product performs at. But you said you have orders of magnitude better. So you've made a claim against your competitors, which is fine, which is fine, but at least show that to us so we can understand what it looks like. No, that's good feedback, I appreciate that. Any other questions?

This is great, by the way, I appreciate it. I think I have one that might be a follow-on to that. Okay. In regard to packet forwarding performance, it looks like you ran tests at the various interesting packet sizes: 64, 128, 256. Did you run a test with typical traffic, to understand what the average packet size would be, so we can see what performance would look like in a nominal case of actually using the infrastructure rather than in a controlled test? Okay.
This was a more controlled test, with the Spirent just sending out fixed-size frames; it was not a typical test where you're running arbitrary packets or traffic through the V-switch. But that is one of the tests that we will conduct. Have you tried that at all, just to see what happens in a normal case? I am not aware of it, but I will find out; I'm not sure. That's really interesting, right? Because although it might perform at 100% of line rate at 256-byte packets, if your average packet size is smaller than that, you're going to drop packets and you have no idea which packets you dropped. Right. And that's a bad deal, right? Right, right.

Okay. Any other questions? We definitely appreciate all of you spending some time here to listen about the HPE solution. Again, I encourage you to go down and see the demonstration at the HPE booth. And please do look in the next week or two for a much more detailed published report of these benchmarks, as well as the resiliency and availability benchmarks; hopefully it'll help clarify some of the questions some of you may have had. Thank you very much. Thank you.