Thank you. I thought somebody asked me, is this an Australian company? Vyatta is in California, but it's Australia Day, so I put it on the slide. Thank you very much.

I got into virtualized networking by accident. Vyatta makes a Linux distribution that is used for software routing, and our customers decided it was wonderful to put it on a VM. So everybody started putting Vyatta on VMware, Xen, KVM, and then one of the people in our community said, I want to put it on Microsoft Hyper-V. I said, hmm, do you support the drivers? I went looking around, downloaded the drivers, took a look and said, there's one problem here: a big binary blob of x86 code in there. We can't put that in our distribution. So I went to Greg, who works for Novell — and Novell has the back door into Microsoft — and said, Greg, could you go fix this? And I went away and expected nothing to happen. Then about a year later, all of a sudden the news hits: Microsoft open-sources the Hyper-V drivers. And my name's out there, so, okay, I guess I've got to do something with these now. So I helped the guys get the stuff into the staging tree and worked on it. Then we got to the point of saying, well, I think we should release with the Hyper-V drivers, and I said, okay, I really haven't actually looked at the performance of this stuff; I think I ought to do a little investigation. So I started one.

To set expectations: I'm not doing enterprise-level benchmarks. I don't have IBM-class machines; I've got a single workstation with eight logical CPUs (four cores, two threads each) and gigabit networking. Just to give you a baseline for what you see on this kind of machine: over loopback I get about 12 gigabits per second, over my real network about one gigabit, and in forwarding tests about one million packets per second per core. That's the rule-of-thumb baseline for this talk.

Before looking at the virtualized network interfaces proper, I started with the legacy interfaces that all of these virtualization environments provide. A legacy network interface pretends to be some existing piece of hardware. In the case of Hyper-V it pretends to be an old DEC Tulip card, for VMware it pretends to be a PCnet32 card, and with KVM you can choose whatever you like, say an e1000. Now, this is not a benchmark. You know why this isn't a benchmark? It's against the rules to do benchmarks. But the real reason is that these are three different machines, so I don't want people going away from here saying one is faster than the other. The left one is my old Windows box, a Pentium; the middle one is my laptop; the right one is my workstation. Three different sets of hardware. Oh, and by the way, since we're in the Southern Hemisphere I decided the graph's upside down, so this is the "suck" bar. This one is getting 100 megabits out of a gigabit network, and KVM is getting about half of wire speed when it runs the legacy NIC. What the test does is run a guest and measure how fast I can transmit data from the guest and how fast I can receive data in the guest — the simplest kind of guest workload on a hypervisor.

The obvious next step is: how well does this run with the virtual NIC? The virtual NIC is the recommended way — I don't think anybody should run the legacy NIC unless they have to — and it does much better. For KVM there's no bar in the receive column because I'm getting 100% of the network.
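To make the shape of that test concrete, here is a minimal Python sketch of a one-way throughput measurement — how fast can one end transmit and the other receive. It is not the tool used for the graphs above; the port number, 30-second duration and 64 KB write size are arbitrary illustrative choices.

```python
# Minimal one-way TCP throughput sketch: run the receiver in the guest (or on
# the far host), run the sender on the other side, and read off Gbit/s.
import socket
import sys
import time

PORT = 5001          # arbitrary test port (not from the talk)
DURATION = 30        # seconds to transmit
CHUNK = 64 * 1024    # 64 KB writes

def receiver():
    """Sink data and report the receive rate."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    total, start = 0, time.time()
    while True:
        data = conn.recv(CHUNK)
        if not data:
            break
        total += len(data)
    secs = time.time() - start
    print(f"received {total * 8 / secs / 1e9:.2f} Gbit/s")

def sender(host):
    """Blast zero-filled buffers at the receiver for DURATION seconds."""
    sock = socket.create_connection((host, PORT))
    buf = b"\0" * CHUNK
    deadline = time.time() + DURATION
    while time.time() < deadline:
        sock.sendall(buf)
    sock.close()

if __name__ == "__main__":
    # usage: throughput.py recv    |    throughput.py send <receiver-address>
    sender(sys.argv[2]) if sys.argv[1] == "send" else receiver()
```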
Getting 100% of the wire means the hypervisor is not the bottleneck; the network is. The same goes for VMware Workstation and Hyper-V here. I should mention that on VMware I'm running VMware Workstation; there's also VMware Server, and the two have different virtual NICs, so the performance is very different on the server version.

Optimizing networks is a well-known problem. There's a very good book — I recommend students take the classes and study it — and right up front it's got 15 rules on how to optimize your network. Every one of the things I talk about later in this talk has some corresponding rule in there, and if people want to improve things, just go to the book and choose the next rule that hasn't been applied yet.

Each virtual NIC has a different set of characteristics. Look at the Hyper-V one: it's in the staging tree and has no real offload. It basically copies every packet and checksums it before sending it off to the network, and likewise on the way back. The VMware one — the version in VMware Server, vmxnet3 — has a full set of offload features, pretty much like any 10-gigabit card has now. Xen and KVM both support transmit offload, but they don't do any fancy VLAN handling or multiqueue. You'll also notice that each one has a different maximum packet size you can use: Hyper-V is basically back where you were with plain Ethernet, the server version of VMware supports jumbo frames, and the other two support packets as big as you can possibly get with IP networking.

So does this offload make any difference? If you remember the earlier slide, in the KVM tests I was pretty much saturating the net — I was going as fast as my physical network could go. So for this test I measured between two guests. If I was doing well, I would expect to get something like 10 gigabits per second; if not, it would be worse. With everything turned on, which is the default, I get about 4.5 gigabits per second — less than half of what I can do over loopback. And if I turn off the offload features, I drop to just a gigabit. What that says to me is that the bigger packets I'm able to get with TSO are what give that big performance benefit.

VMware implements another thing called large receive offload. What large receive offload does is aggregate multiple received packets together and pass them up as one unit. That works really well for an application; it does not work so well in our environment, where we're trying to act as a router. The packets come in, they get aggregated into one big packet, then our router VM runs and has to break them all up again into separate packets. So one of the first things we discovered was customer complaints on VMware saying, why am I only getting 10 megabits per second through your router? The answer was that large receive offload was on, and it was basically jamming everything together and then splitting it back apart. Luckily in our case it turned out to be an option in the driver, so we just turned the option off and the bug was fixed. But the point is that not all offload is always a good idea.
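To make the "turn the offload off" step concrete: on a Linux guest whose virtual NIC exposes its offloads through ethtool, a sketch like the one below would show the current settings and flip large receive offload. This is a generic illustration assuming ethtool is installed and the interface is called eth0; in the VMware case described above the knob was actually a driver option rather than an ethtool flag.

```python
# Inspect and toggle large receive offload on a Linux interface via ethtool.
import subprocess

def show_offloads(dev: str) -> str:
    """Return the offload settings reported by `ethtool -k <dev>`."""
    result = subprocess.run(["ethtool", "-k", dev],
                            capture_output=True, text=True, check=True)
    return result.stdout

def set_lro(dev: str, enabled: bool) -> None:
    """Enable or disable large receive offload with `ethtool -K <dev> lro ...`."""
    subprocess.run(["ethtool", "-K", dev, "lro", "on" if enabled else "off"],
                   check=True)

if __name__ == "__main__":
    print(show_offloads("eth0"))   # "eth0" is a placeholder interface name
    set_lro("eth0", False)         # LRO hurts when the guest forwards packets
```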
The next possible way to jack up the performance is to just use bigger packets. If you come from the research-network world, research networks have used bigger packets for years. So I did some tests with different packet sizes between guests. The reference bar here is 1500 bytes, because that's what everybody runs day to day. You start out at about five gigabits and, as you go up to full-size packets, you end up getting pretty close to the maximum I was getting over loopback. This is a log scale, because it made the graph come out better and shows you more information. So I was feeling this looked pretty good.

The problem is that with virtual networks you eventually want to get outside the box; you want to get past your machine. The way you connect virtual networks together is with bridging, and bridging is a layer-two protocol. Layer-two protocols do not fragment and do not look inside packets: if you try to put a ten-pound packet into a five-pound pipe, it gets dropped on the floor. So you're basically stuck. The world is stuck at a 1500-byte MTU unless you're doing some bulk backup between VMs or you have a private research network that supports bigger packet sizes.

There are multiple ways to hook up virtual networks. Up to now I've been using the default: if you use virt-manager with KVM, or the GUIs in VMware, they give you a NAT network — a private network where the addresses are translated on the way out. With a few more command-line options you can set up a bridged network instead. I tested the two to see whether it made much difference, going to and from the hypervisor and between guests. The bridge gives you a significant boost. The other thing KVM offers now is that, instead of just being a device, you can use a tap arrangement that basically passes pages between the VMs, and that gave a massive performance boost. The problem is that you give up the whole GUI environment: you no longer have the ability to set up and manage networks easily, and you're basically building shell scripts that set up this network with this device and these options. This is a temporary situation — I'm sure the virt-manager guys will get to it — but it's one of those things where, as developers, we lead the charge and the users get stuck way farther behind.

The next thing I tried was SMP guests. Single-processor guests are interesting, but what happens if I have an SMP guest, and what if I try multiple queues and receive scaling? These are new options we've put into Linux: some hardware has multiple receive queues with a hardware flow classifier, so an incoming packet gets spread out to one queue per core; and we also have code from Google that takes a hash of the packet and spreads the work across the cores in software. So we have ways to use the cores we have to execute network code in parallel, and hopefully we'd be able to have SMP VMs where the packets just pass right through. But it turns out that all the virtual NICs we have today — the KVM one especially — are single queue.

So I ran the same kind of test as before, only this time, instead of looking at raw throughput, I switched to something that stresses the cores more: a transaction test. On my eight-core machine I ran an eight-way SMP guest with eight processes all doing a transaction test, and took the sum. You'd think the blue bars would be the big SMP numbers; actually those are the uniprocessor guests. The UP guest was doing fine, getting about 1,600 transactions a second.
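As an illustration of what a "transaction test" means here, the sketch below ping-pongs one byte per transaction and sums the rate over several worker processes, in the spirit of a netperf TCP_RR run. The guest address, port, worker count and duration are placeholder values, not the ones behind the numbers above.

```python
# Request/response ("transaction") rate test summed over several processes.
# Run echo_server() inside the guest first, then run this script on the host.
import multiprocessing
import socket
import time

HOST, PORT = "192.168.122.10", 5002   # hypothetical guest address and port
WORKERS, DURATION = 8, 10             # eight workers, ten-second run

def echo_server():
    """Guest side: echo each received byte straight back."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(WORKERS)
    while True:
        conn, _ = srv.accept()
        multiprocessing.Process(target=_echo, args=(conn,), daemon=True).start()

def _echo(conn):
    data = conn.recv(1)
    while data:
        conn.sendall(data)
        data = conn.recv(1)

def client(count):
    """Host side, one worker: one-byte request, one-byte reply, repeat."""
    sock = socket.create_connection((HOST, PORT))
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    deadline, n = time.time() + DURATION, 0
    while time.time() < deadline:
        sock.sendall(b"x")
        sock.recv(1)
        n += 1
    count.value = n

if __name__ == "__main__":
    counts = [multiprocessing.Value("i", 0) for _ in range(WORKERS)]
    procs = [multiprocessing.Process(target=client, args=(c,)) for c in counts]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(f"{sum(c.value for c in counts) / DURATION:.0f} transactions/sec total")
```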
Then I turned on packet steering — the software version of flow steering — or used my Intel multi-queue NIC, and we weren't getting any real benefit out of it. What's really more depressing is that if you use an SMP guest, we get hammered. The virtual NIC having no multiqueue support, basically being a single-queue bottleneck, makes the SMP performance terrible. There are patches for both Xen and KVM to do a multi-queue virtual NIC; they aren't in yet, they're still under discussion, but this is the point — this is why it matters.

So that's my summary of what I've seen in virtual networking. I'd say the glass is half full in that we've got the offloads and we're doing pretty well, but half empty in that we really haven't addressed the multiqueue side or hit the levels we need, especially once we get to 10-gigabit networks. So I have plenty of time for questions; I ran a little quick. Thank you very much.

Stephen, can you go back to the page with the numbers for the tap, the bridge and the NAT?

Yes.

Okay, notice that your bridge numbers are pretty much identical to the NAT numbers, which makes me suspect that you haven't disabled bridge netfilter.

I don't run bridge netfilter, so...

You do know that if it happens to be enabled, even if you don't have any rules, it still runs conntrack on every single packet, and that totally kills performance.

That would explain it.

From my numbers, what I've observed is that bridge numbers are typically the same as tap numbers; there's essentially no overhead unless you turn on conntrack — which is a big security hole, by the way. If you're running virtualization, you must disable bridge netfilter, otherwise your different guests' packets will intermingle in there, because bridge netfilter has only one namespace for packet reassembly and connection tracking. So it's not a good thing. One other thing I want to mention is that we should try to make the defaults do the right thing — you brought that up indirectly.

Things should default to the good behavior for people? Yeah, well... Jens.

Hi. The reason I'm asking this is that at work we run large-scale voice servers and we want to push packets as fast as possible, and in the past we have never been successful using VMs: with physical servers we can get up to 120,000 or 150,000 packets per second of really tiny packets, but in a VM we get stuck at 10,000 or 20,000 packets per second. Do you think it's possible, and how can we get there with VMs? How can we achieve similar numbers?

The problem is that a VM is basically an application to the hypervisor, so you have that many more context switches; it's basically a context-switch test. Go back to the rules on how to optimize: some of the things about aggregating and doing multiple things at once will help. I wouldn't say they'll get you an order of magnitude, but you'll incrementally get closer.

More questions? We've got plenty of time.

Has there been any testing with the very high-end NICs where the vendors are now claiming full offload capabilities, largely for VMware — say Chelsio or Solarflare?

I don't have those, so I haven't tested them. I'm sure they've tested them. Talk to me later.

With Intel's new virtualization technology you can actually hand a NIC directly to a virtual machine and have it bypass the hypervisor.
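As an aside on that direct-assignment path: below is a rough sketch of how a PCI NIC might be handed to a guest through the libvirt Python bindings. The guest name and the PCI address are made-up placeholders, and it assumes an IOMMU-capable host with libvirt installed; it only illustrates the idea, it is not a recipe from the talk.

```python
# Hand a physical NIC straight to a guest (VT-d / PCI passthrough), bypassing
# the hypervisor's virtual networking. Guest name and PCI address are hypothetical.
import libvirt

HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")   # connect to the local QEMU/KVM hypervisor
dom = conn.lookupByName("guest-vm")     # hypothetical guest name
dom.attachDevice(HOSTDEV_XML)           # the NIC leaves the host and shows up
                                        # as real hardware inside the guest
conn.close()
```

As the comments that follow point out, the price of doing this is flexibility: a guest holding real hardware is much harder to live-migrate.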
And with Flow Director you can even send just part of the traffic to a virtual machine, so that kind of flow classifier in hardware can get the host queueing out of the way and deliver right to the virtual machine.

Just on that topic of these hardware devices: SR-IOV and similar technologies, yes, they will improve performance, but they do come with a cost. Part of the advantage of virtualization is that it gives you features such as live migration and real flexibility in your system setup, and having a piece of hardware injected directly into a guest takes away some of that flexibility. So there is a cost. The other thing is that even with a software-only solution we can recapture most of the performance gained by technologies like SR-IOV, and if we can do that, it's usually more attractive to do it in software where possible. But if you're in an environment where every last percentage of your hardware performance matters, then yes, SR-IOV will make sense for you.

Any other questions? If not, we may finish early. Please put your hands together for our speaker.