All right, everybody, welcome. My name is Tapio Tangren, I work at Nokia, and the topic of my presentation is performance testing OpenStack networking with DPDK; I'm going to focus on the DPDK side.

A little bit about my background. Most of my time now goes into the OPNFV project, where I'm on the technical steering committee. I'm going to use examples from OPNFV because it's open source: you can go there and take a look at how people do network performance testing. Before I started doing OPNFV full time, I was doing performance testing at Nokia on our internal real-time OpenStack platform for radio applications, so very strict performance requirements on that side. We had a small team of people testing and benchmarking the platform, and we wanted to make sure there were no surprises and that the performance was good. We were getting new releases of the underlying platform every now and then, and we were supposed to make sure things didn't break and that performance stayed at the same level. So what I'm going to talk about are our experiences testing that platform, and also a little bit about how things are done in the OPNFV project.

The obvious thing first: performance testing is non-functional testing, and its purpose is to test how well a system performs. In our case the main thing we are interested in is packet performance: how fast packets move from place to place. We are interested in throughput, but the really interesting question is always latency, the packet processing latency from point A to point B, because that can really mess up the performance of a VNF. As I said, OPNFV does a lot of testing and has a very lively testing community. A couple of projects I should mention here are Yardstick, which I have actually tried myself, installed, run and tested a little, and QTIP, which is in the same area. There are many others; just take a look.

Before performance testing can start, it's good to know what the hardware is, because there are limitations in what capabilities the hardware has before you can even consider certain kinds of testing. A few things matter here, like the BIOS settings: there are options you can set to maximum performance, turning off power saving and CPU frequency scaling and things like that, which can otherwise interfere with the test results. We also recommend disabling hyper-threading. Hyper-threading is great for general-purpose computing, but with it enabled you have two hardware threads sharing some of the same resources, and if you don't have perfect control over where things run, that can also affect performance. Then there are settings like VT-d and SR-IOV, especially if you are going to use SR-IOV; it's good to change those in the BIOS before doing any installations. And then there are the NIC capabilities to consider: hopefully you can run DPDK on your NIC.
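These host settings are easy to forget between reinstalls, so it pays to script a quick sanity check before a run. Below is a minimal sketch of the kind of check we might use, assuming a Linux host that exposes the standard sysfs files; the firmware-level settings themselves can only be verified in the BIOS, so this only catches what the kernel reports.

```python
#!/usr/bin/env python3
"""Sanity-check host settings before a performance run (Linux sysfs)."""
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")

def check_governors() -> None:
    # CPU frequency scaling interferes with latency results; we expect "performance".
    for gov_file in sorted(CPU_ROOT.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        gov = gov_file.read_text().strip()
        if gov != "performance":
            print(f"WARNING: {gov_file.parent.parent.name} governor is '{gov}'")

def check_smt() -> None:
    # With hyper-threading on, two hardware threads share core resources.
    # If any core lists more than one sibling, SMT is still enabled.
    for sib_file in sorted(CPU_ROOT.glob("cpu[0-9]*/topology/thread_siblings_list")):
        siblings = sib_file.read_text().strip()
        if "," in siblings or "-" in siblings:
            print(f"WARNING: SMT appears enabled ({sib_file}: {siblings})")
            break

if __name__ == "__main__":
    check_governors()
    check_smt()
```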
There are basically two setups that we use for testing. This is a little bit obvious, but there is a load generator, which is an outside machine. We try to use ordinary commodity servers as much as possible rather than specialized hardware, because that gives more flexibility. The basic model is that traffic goes to the vSwitch in the compute node, then to the virtual machine, and comes back through the vSwitch. The way I have done this with OpenStack is to launch the virtual machine, either on a provider network or not. Let's say I'm using VLANs; VLAN tagging is the easiest. I just check which VLAN that virtual machine is using, configure the same VLAN on the load generator, and then I know it is in the same L2 segment as the virtual machine and the packets will actually reach the right destination.

Another way of doing measurements, which is actually easier to set up but comes with different trade-offs, is to have two compute nodes with two virtual machines sending packets back and forth. This is easy to set up because you can do everything through the OpenStack interfaces; you can use a Heat template for the whole test, and there is no special setup, no configuring VLAN tags or the switch or anything on the hardware. But then some things are not possible when you do it like this.

Since I had DPDK in the title, I added here where DPDK is used. In the load generator I prefer tools built on DPDK. Why do I have DPDK everywhere? In these setups we have 10-gig links, maybe 25-gig links, and in our case we are mostly interested in small packets. We want tests that saturate a 10-gig link with 64-byte packets, and DPDK is the only practical way of doing that. If you have big packets, it is possible to use something running on the standard Linux network interfaces. So I can have DPDK in the load generator, in the vSwitch, and in the virtual machine. I put the vSwitch DPDK in parentheses because that is of course what I am testing: the vSwitch is the system under test, and the virtual machine on the right-hand side is part of the testing infrastructure. In the other setup, with two virtual machines, I run DPDK in both of them.

Then a couple of considerations on how to do the testing. As I said, we are mostly interested in network throughput and network latency with different packet sizes. If you are interested in something else, your set of tools is probably going to be different, but these were the main metrics we cared about, and the tools I'm going to show were chosen because they can measure these things. I also prefer open source; that's obvious in OPNFV, where we try to use open source as much as possible. It's also nice to have software-based tools, because that gives you a lot of flexibility.
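To put a number on the 64-byte, 10-gig case mentioned above: on the wire each Ethernet frame also carries a preamble and an inter-frame gap, so the theoretical maximum frame rate is easy to work out. A small illustrative calculation, nothing tool-specific:

```python
#!/usr/bin/env python3
"""Theoretical maximum frame rate for a given link speed and frame size."""

def max_pps(link_bps: float, frame_bytes: int) -> float:
    # On the wire each Ethernet frame also carries 8 bytes of preamble/SFD
    # and a 12-byte inter-frame gap, i.e. 20 bytes of overhead per frame.
    wire_bytes = frame_bytes + 20
    return link_bps / (wire_bytes * 8)

if __name__ == "__main__":
    for size in (64, 128, 512, 1518):
        print(f"{size:>5} B frames on 10 GbE: {max_pps(10e9, size) / 1e6:6.2f} Mpps")
    # 64 B frames -> roughly 14.88 Mpps per direction; kernel-stack tools stop
    # far below this, which is why a DPDK-based generator is needed to saturate
    # the link with small packets.
```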
That flexibility is closely related to automation. It's worth investing effort in automating the testing so that a single script runs the whole thing, because in my experience performance testing is something you have to do many times: repeat it, make small changes, and run it again. The easier you make it, the better.

Another consideration in choosing tools is where you put the results and how you show them. It's good to have a lot of information stored somewhere, but it's even better if you can somehow visualize the results, look at the data, and see if something is going wrong. As I said, we were doing regression testing: one version of the platform, then the next, then the next. What we wanted was to look at the numbers from each release, and it's very helpful to have a curve you can inspect graphically, because then you can easily spot with your eye if something is going wrong. Otherwise it can be quite difficult to notice that just by staring at the numbers.

So here are some of our favorite tools for this testing. Starting with DPDK: if you download DPDK, as I usually do, and compile it, one of the example applications is testpmd. Its main purpose is to test the DPDK installation and verify that it works, but it also has a couple of nice features. One is that you can configure it so that it just takes a packet in, changes the MAC addresses, and sends it out. Like I said, one way of doing the measurements is to set up the L2 segment, set up the VLAN if I'm using a VLAN overlay, and then I don't care about IP addresses at all. I just set the destination MAC address, the packet goes to the VM, and testpmd takes the destination MAC address, makes it the source address, sets the new destination, and sends the packet back out the other port. This makes the job of testpmd very simple, which makes it very fast: even running in a virtual machine it needs very little resources, because the only thing it does is swap two fields in the packet. testpmd is also nice because it has a text console, so when you are setting things up and they are not working, you can open a console to the VM and look at the numbers; it has counters for how many packets are going in and out, which I have found convenient.

When setting it up, there are a few things you always have to do with DPDK, like configuring huge pages; 2 MB huge pages should be enough for this. And there are a couple of ways to do user-space networking in Linux: one is VFIO and the other is UIO. My experience is that UIO works better here, and DPDK ships the igb_uio driver, so my recommendation is to use that. Maybe I'll get VFIO working in a virtual machine one day.
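As a concrete illustration of what the in-VM side can look like once it is scripted, here is a minimal sketch that launches testpmd in MAC-swap forwarding mode. It assumes huge pages are already mounted and the VM's ports are bound to a DPDK-capable driver such as igb_uio; the binary name and option names follow recent DPDK releases, so check them against the version you actually build.

```python
#!/usr/bin/env python3
"""Launch testpmd inside the VM in MAC-swap forwarding mode.

Assumes hugepages are mounted and the ports are bound to a DPDK driver
(e.g. igb_uio). Binary and option names follow recent DPDK releases and
may differ in yours.
"""
import subprocess

TESTPMD = "dpdk-testpmd"   # older releases ship it as plain "testpmd"

cmd = [
    TESTPMD,
    "-l", "0-1",               # EAL: CPU cores to use
    "-n", "4",                 # EAL: memory channels
    "--socket-mem", "512",     # EAL: hugepage memory in MB
    "--",
    "--forward-mode=macswap",  # swap source/destination MAC and forward to the paired port
    "--auto-start",            # start forwarding without using the interactive prompt
]

subprocess.run(cmd, check=True)
```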
An alternative would be SR-IOV. SR-IOV in my mind is not that cloud-like, but there are certain use cases where it can be used, because it has the benefit of almost zero overhead: you get the same performance from an SR-IOV interface as from the native NIC, so you know there is no extra overhead when using it.

So that was one tool. The other tool is MoonGen, which is a very nice idea: a load generator written on top of DPDK. The trick with MoonGen is that many NICs have a feature called PTP, Precision Time Protocol. When a certain kind of timing packet goes out, the NIC itself puts a timestamp on it at the moment it leaves, and it works the other way around as well: when a packet comes in, the NIC records a receive timestamp, so the two can be compared. This is really great, because if you take the timestamp on the host when you send the packet, you still get some delay while the packet travels down to the NIC and goes out. MoonGen uses this PTP feature, so the timestamp is put on the packet exactly at the moment it leaves. The great thing about MoonGen is that if you want to use standard PC hardware, it really has the best possible accuracy for latency measurements; if you want better, you have to use dedicated hardware test devices. The downside is that all kinds of things can go wrong and debugging may not be so easy, but in our experience MoonGen has worked quite well. It also has a scripting interface. One fascinating thing about MoonGen is that it uses LuaJIT, a just-in-time compiled Lua. You can write a MoonGen script in Lua that generates and sends the packets, which is really amazing because you can get extremely good performance from just a script: you can saturate a 10-gig link with it. And as I said, it's open source and all done in software, but it doesn't support every NIC. It's a fairly small open source project, so you have to check that you have hardware that MoonGen supports.

If I put these two projects together, I get this kind of picture: MoonGen on a physical server somewhere, and a virtual machine with DPDK testpmd. Like I said, I launch the VM with testpmd pre-packaged inside, all set up so that it starts forwarding packets. I configure the VLAN on MoonGen; you can define it so that the VLAN tag is put on the packet when it goes out. Then it goes through the switch. Now, the funny thing is that I've tried this many times with a switch between the MoonGen machine and the system under test, and somehow it doesn't seem to work with the switch in between.
That's why I wrote "direct connection" here rather than a switch in between: you really have to take the Ethernet cables and plug them directly into the machines, and then it works best. I've been debugging it for a long time and couldn't figure out why it doesn't work through the switch.

Then I had this other setup with another DPDK-based tool, Pktgen. It's not compiled from the same source code as DPDK, it's a separate project, but if you go to dpdk.org and look at the downloads you can find Pktgen in the same place. It's also scriptable with Lua. I haven't used the Lua bindings myself; I've used the command line, so when launching Pktgen you can tell it how to run, what kind of traffic to start generating, and so on. It's scriptable in that sense. And like I said about the benefit of DPDK, it can generate a huge amount of traffic with small packets, enough to saturate at least a 10-gig link.

Sorry, went the wrong way in the slides. The way I would actually use this is in the VM-to-VM setup. Of course Pktgen also works on a physical machine, but this is my preferred way: one VM acts as the load generator running Pktgen, testpmd runs on the other end, and then you just send traffic as fast as possible and take the measurements.

Now to the question I mentioned: you can use MoonGen or Pktgen on a physical machine as well, so why do we have both? A couple of years ago when I started this, things were easy. MoonGen did not support running in a virtual machine: in a VM you have virtual interfaces, and those were not supported by MoonGen. That support has been added since, but I haven't tested it. On the other hand, Pktgen at that time did not have latency measurements. So I needed two different tools: MoonGen for latency measurements, and Pktgen for running between two VMs and generating a lot of traffic. It's a bit more nuanced now, because MoonGen does support virtual interfaces these days and Pktgen does support latency measurements. However, MoonGen still has that fancy feature of using PTP packets to do extremely precise latency measurements, so my recommendation is still to use MoonGen in bare-metal deployments, because you get the best features if you have supported hardware. And Pktgen seems to be very popular; I've seen it used in a lot of different projects, for example in OPNFV projects inside virtual machines, so I would recommend using it in a VM. Things don't always work the way you expect, so you want to choose the setup that most people are using; then you can go to the internet, ask questions, and find answers.

Okay, then there was the point about reporting: how do you show the results? This is something I was playing with Yardstick in our lab, setting it up and running some of the tests, and I took this screenshot from the web browser. So these are just some of the tools that can be used.
The OPNFV Yardstick project uses InfluxDB, which was very easy for me to set up: there was a Docker container, I just started it and pointed Yardstick at it so that it would start sending the measurement results to InfluxDB, I think as simple HTTP messages. Something very simple to set up. And this is a screenshot of the Grafana tool. Grafana is an open source tool that gives you a web page for configuring and displaying information. You just install Grafana and then click through the web UI: you set up the data source, configure the password and such things. What it shows is a timeline: timestamps on the x-axis and some kind of measurement on the y-axis. In my experience this is the best use for Grafana, showing this kind of timeline. It's not perfect for our results, though, because it is organized around timestamps, and what you would really like to show is release one, release two, release three, release four, and so on. But as I said, it's very easy to set up and it seems to work quite robustly.

There are other alternatives people have been using, so it's worth exploring the options a little. I mentioned InfluxDB. Some people use Elasticsearch, which has very convenient search: if you have a big database and want to look for measurement results by keyword, you can do that with Elasticsearch. And MongoDB is what the OPNFV testing community ended up using, so that's also a very popular choice. On the visualization side, I already showed Grafana; my picture was not that beautiful, and in my mind the set of ways Grafana can display information is a little limited. There's another tool called Kibana, which is used in the OPNFV test community, and they have very beautiful diagrams: all kinds of gauges and meters and colors to visualize the test results. The trade-off is roughly that Grafana is easy to set up with different data sources, just click, click, click, while Kibana has really nice graph types, so a lot more capability in that sense, and the opposite is also true in each case. The OPNFV testing community has been working on this, and what they have found out is all available on the OPNFV wiki pages if you want to read more.
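Yardstick handled the InfluxDB writes for me, but pushing a result by hand is only a few lines. Below is a minimal sketch using the InfluxDB 1.x HTTP line protocol, which is the kind of simple HTTP write mentioned above; the host name, database, tags and field names are made up for illustration, and this is not meant to mirror what Yardstick does internally.

```python
#!/usr/bin/env python3
"""Push one latency result to InfluxDB over HTTP (1.x line protocol).

The host, database and field names below are illustrative; adjust them
to your own InfluxDB instance and schema.
"""
import time
import requests

INFLUX_URL = "http://influxdb.example.local:8086/write"   # hypothetical host

def push_result(database: str, release: str, packet_size: int, latency_us: float) -> None:
    # Line protocol: measurement,tag=value field=value timestamp(ns)
    line = (
        f"packet_latency,release={release},pkt_size={packet_size} "
        f"latency_us={latency_us} {time.time_ns()}"
    )
    resp = requests.post(INFLUX_URL, params={"db": database}, data=line, timeout=5)
    resp.raise_for_status()

if __name__ == "__main__":
    push_result("perf_results", "release-4", 64, 23.7)   # illustrative values only
```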
Now a small disclaimer about what you can and cannot do with the kind of open source tools I have been describing. You can do latency measurements. You can easily set up more or less constant bit rates; you can't choose the rate completely freely, but you have a lot of freedom in setting different traffic rates. You can also test with different packet sizes quite easily. And if you really want to get fancy, you can use a pcap file: tools like Pktgen and MoonGen can read pcap files and just send the same packets over and over again, if you want a mixture of different packet sizes and traffic types.

You can also find ready-made scripts for doing RFC 2544 measurements. RFC 2544 is a way of measuring throughput for different packet sizes: you send packets at some rate, and if there is packet loss the rate is decreased, and if there is no loss it is increased, so you converge on the highest rate where the packet loss is still acceptable, and that is the result. It's very nice because you get a single number: if you fix the packet loss rate you are willing to accept, you get one throughput figure per packet size. It's a bit tedious to do by hand, so you really want a script that does it. So all of that can be done.

What cannot be done as easily is testing more exotic traffic patterns, for example sending a burst of packets at a high rate, then a delay, then another burst. For that you should really look at writing your own test tool on top of DPDK. We have a tool like that internally, and it's not a huge amount of code on top of DPDK, so it's doable. Another thing that I think cannot be done as easily is testing with broken packets, for example with a wrong checksum, if you want to check what the performance is when that happens.

Okay, and then finally, to make it a little more complicated: it may now look like you can just run a single test and get a number out, but there are all kinds of things that can affect the performance besides the settings you made in the beginning. One of them is CPU pinning: where is the VM running? If you don't pin the virtual machine to certain CPUs, the Linux scheduler can migrate it to different cores, and you get latency spikes while the migration happens. Then there is the NUMA topology: on a dual-socket machine it makes a big difference whether two virtual machines are running on the same socket or on different sockets, because crossing between the two sockets takes a lot more time. This slide is about the NUMA topology. Another thing that makes a big difference, and is always worth checking even though it's sometimes a lot of work, is the PCI topology. If you have a dual-socket machine with a NIC card and a packet comes in on that NIC, the NIC is usually connected to only one of the sockets, so again, depending on where your virtual machine is running, you can get different results.
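On Linux, the NUMA node a NIC is attached to is visible in sysfs, so checking it before deciding where to run the VM is easy to script; on the OpenStack side the placement can then be controlled with flavor extra specs such as hw:cpu_policy=dedicated and hw:numa_nodes. A minimal sketch follows; the interface name is a placeholder.

```python
#!/usr/bin/env python3
"""Find out which NUMA node a NIC is attached to (Linux sysfs).

The interface name is a placeholder; -1 means the platform does not
report a NUMA node for the device.
"""
from pathlib import Path

def nic_numa_node(ifname: str) -> int:
    return int(Path(f"/sys/class/net/{ifname}/device/numa_node").read_text().strip())

if __name__ == "__main__":
    iface = "ens1f0"   # hypothetical interface name
    print(f"{iface} is attached to NUMA node {nic_numa_node(iface)}")
```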
The picture shows, starting from the bottom, the switch, then two NIC cards in two different machines, then two sockets in each machine, and a virtual machine running on a socket. It is meant to illustrate the worst-case scenario, where the virtual machine is on the wrong NUMA node in both machines. The packet originates in the virtual machine on the first socket, crosses over to the other socket, goes to the NIC card, over the switch, to the other NIC, to the other socket, and then to the VM. There are actually big differences when you measure the performance between the best case and the worst case, so if you don't control these factors you don't really know what kind of numbers you are getting; they can vary widely. So what I do is take a look at the PCI topology, figure out which socket the NIC is connected to, and then just tell OpenStack to run the virtual machine on that socket. There are features in OpenStack to do this automatically and optimize the placement, but I like to do things by hand and keep control over where the virtual machine is running: socket zero or socket one.

And then of course one thing that always really messes things up is other load on the system. I've done these measurements and seen it many times: you can have very good latencies without any system load, and then when you add some load, the performance gets much worse. This is one of the things that makes performance measurement so difficult: you don't actually know what other processes or background work is running in the Linux operating system at the same time, so the results can vary a little. Remember that the original purpose was to verify that we can run VNFs on this OpenStack platform as well as possible in all possible conditions, so what we do is run some stress VMs on the same system while doing the measurements, and that way try to verify that even in the worst case you get reasonable performance.

I spent some time earlier looking at different tools for this and identifying the good ones, so I put links to some useful ones on the slide. I've been using three kinds of load: storage load, CPU load, and network load. For storage there are a couple of tools; I don't know which one is better, I think we have used both, and they do pretty much the same thing: you define what kind of storage test you want, and they write random data to a file and read random data from a file, repeatedly, continuously, all the time. For CPU load there is a really great tool called CPU Burn, which can put a really high load on the CPU. That's great, because otherwise it's difficult; I can't think of another easy way of generating 100% CPU load. And for network load, I didn't mention iperf or iperf3 earlier. That's a very easy-to-set-up, standard Linux tool for network measurements, throughput and so forth. The limitation of iperf is that it uses the standard Linux network interfaces and kernel drivers, and the performance is quite limited.
You don't get anywhere near 10 gig. The last time we tested, it was something like one or two gigabits on a 10-gig interface. Of course it all depends on the packet size, but with small packet sizes you get one or two gigs, nowhere near the link capacity. But if you just want background noise, background load, something IP-based that is very easy to set up, iperf is great. Just make a little script that launches iperf and run it.

Another alternative is VM-to-VM load. The idea is that when you are doing the performance testing, you launch a bunch of virtual machines, and inside them you have a storage test, a network test, and CPU load running at full speed. So you have VMs that generate noise and traffic on the storage system and the network, and load the CPU as much as possible, and then you do the measurements. Almost all of the tools I'm listing here are really meant for measuring something, but in this case I don't care what the result is, so I just throw it away. I launch a VM, it generates some load, and that's it; it's only there for background noise.

Okay, that was actually it. I hope it was useful. I think we have a minute or two for questions.

Hi, you mentioned MoonGen for measuring latency, and it's using PTP, I believe. Do you know what the level of accuracy or resolution is for the timing measurements, for latency?

I don't remember, but there is an academic paper; it's a university project and they wrote papers about it. I don't remember the numbers off the top of my head.

Okay, that's fine. I have another question. Do you know of any other tools, or any other way, to measure the VM-to-VM latency or timing if, let's say, two or more VMs are deployed on the same compute node? How do we measure the latency of packets between those two VMs?

You mean besides Pktgen?

Yes. Pktgen would sit at the end, at the NIC cards, pumping in traffic, or at the ends of the service chain. But how do we find the latency between the two VMs? Say there are three VMs and I want the latency between VM 2 and VM 3, not the latency of VM 1 and VM 2 individually.

We've also done measurements like that; we do it a lot. It actually makes sense to set up a chain: you just chain them. You have Pktgen here, then testpmd, which just takes packets in and sends them out, then the next VM takes packets in and sends them out, and the next, and so on, and at the end you go back to the beginning. The latency is then measured at the Pktgen.

So you can move the Pktgen or the MoonGen point of connection?

Yes, you can use either one of those load generators and chain it like that. It's a very nice way, because in a single test you can cover all of the cases: the packet first goes inside the same socket, then to a different socket, then to a different machine and a different socket, and then it comes back, and you combine all of those latencies into one number.
Yes, it makes sense to do it like that. Thank you, good suggestion.

Great talk on the tools for testing. A quick question: what is your general opinion of the performance results of DPDK itself?

Sorry, I can't answer that right now, because this was more about verifying the platform itself; we were not comparing DPDK implementations, we were just using DPDK as a tool for doing other measurements. I think the DPDK people have done a lot of testing on latencies, but my general opinion is that DPDK is good enough for doing this kind of measurement: it can saturate the link and generate enough packets for throughput testing. For the latencies themselves, I can't say.

So you mainly used it for your other testing, as opposed to testing DPDK itself?

Yes, precisely. Thanks.

Did you ever come up with results for DPDK itself? Basically the same question.

No, that's a topic for another talk. The Intel people have done that; there is some information and some measurement results on dpdk.org, and I think a couple of papers have been published on it, which I have used myself.

Oh, by the way, if you are interested in getting the slides: I'm using Google to publish them, and I put the link on Twitter. My Twitter handle is my first name, TAPIO, underscore TA. Search for that; the latest post is a link to these slides, so you can download them and take a look at what I have done. Okay, thank you.