So ladies and gentlemen, next up we have Xiaofeng Wang, senior QE engineer at Red Hat, and he's going to talk about performance test methodology. So let's welcome him on stage.

Good afternoon, everyone. I'm very excited to be here at DevConf this year. My name is Xiaofeng Wang, and I work in the virtualization QE team at Red Hat. Today I'm going to introduce a new network performance test methodology.

Earlier this year I was running a performance test with the widely used test methodology. My job was to do a network performance comparison among three different hypervisors: KVM, ESXi from VMware, and Hyper-V from Microsoft. During my testing, some issues blocked my test, and I had to fix them to finish the comparison. When all the issues were resolved, I felt the solutions to those problems could be summarized into a new performance test methodology. I will give you a detailed introduction to these issues and the corresponding solutions.

But first — sorry, the slide isn't working — first, I will show you how I ran performance tests before, with the old methodology. The first step is always to run the test tools. In my case I chose iperf, but you can use netperf or other packet generators. I ran iperf -s on the server side and iperf -c on the client side with some iperf options. When the test is finished, iperf shows you your performance result like this one; the final result is in the last two lines. In this case, the performance is 21 gigabits. The third step is to run more tests and take an average value as the final result; I guess we always do that. The last step is to generate the performance test report based on the average result. So this is how I ran performance tests before. With the old methodology, what's the problem here? Maybe you cannot find any problems.
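Before getting to the problems, the averaging step of that old workflow can be illustrated with a small sketch. The summary lines below imitate iperf3-style output but are invented numbers, and the exact format varies by iperf version:

```python
# Sketch: pull the Gbits/sec figure out of iperf3-style summary lines
# and average it over several runs.  The sample lines are made up.
import re

summaries = [
    "[  5]   0.00-10.00  sec  24.4 GBytes  21.0 Gbits/sec  receiver",
    "[  5]   0.00-10.00  sec  24.9 GBytes  21.4 Gbits/sec  receiver",
    "[  5]   0.00-10.00  sec  23.9 GBytes  20.6 Gbits/sec  receiver",
]

def bitrate_gbits(line):
    """Extract the numeric bitrate from one summary line."""
    m = re.search(r"([\d.]+)\s+Gbits/sec", line)
    return float(m.group(1))

rates = [bitrate_gbits(s) for s in summaries]
print(f"average: {sum(rates) / len(rates):.1f} Gbits/sec")  # -> average: 21.0 Gbits/sec
```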
I will show you the problems I found. The first one: there's no step-by-step traffic; the tool just sends as much traffic as possible. What does step-by-step traffic mean? Like this chart: step, step, step. For example, if I want to test a 10-gigabit NIC, the traffic should start from 1 gigabit and go up to 10 gigabits in 10 steps, each step increasing by 1 gigabit. That's what I call step-by-step.

So what's the problem without step-by-step traffic? iperf or netperf just sends as much traffic as possible against your DUT, the device under test; in my case, the network adapter. Sometimes the system breaks instantly because it hits the maximum traffic right at the beginning of the test, and then I cannot get any performance result for the system at all. I have to do some setup or other work just to keep the test running, because sometimes it won't run when the system is broken. That's the first problem.

The second problem is about the max value. Whether it's iperf, netperf, or something else, they just give you the max value: the maximum performance that, in my case, the network adapter can support. But it is just a max value; the network adapter cannot work well at that maximum performance for a long time. It's the peak, not the working value. Have you heard that Intel has a technology called Turbo Boost? It's a bit like the concept I just described: you don't use Turbo Boost for a long time, only in some very special scenarios, for example one minute or some short period. That's the maximum performance of the CPU; in normal scenarios, the CPU just runs at a lower frequency. The concept here is the same. All the performance test tools like iperf or netperf just give you a max value.
So the max value shows the performance you are able to reach, but stability cannot be guaranteed at the max value. That's the second issue.

The third issue: test tools like iperf3 use the mean to calculate the final result. Why not the median? There can be a difference between the results of different algorithms like mean and median. So which is best? I will give you an example and more explanation in the solution section. Mean and median are different algorithms to get an average value, and choosing a different algorithm can give you a different result. That's the third one.

The fourth one: there's no way to find a spike in the current test result log. Have you heard about spikes in performance? In some scenarios the performance suddenly drops very deep or rises very high. If you just use iperf or netperf, you cannot find this; you just get a final result, right?

The last issue: the report is not a story. The report is not as user friendly as I expected. It just gives you a number; no charts, no tables, just a number. So those are all the problems.

What's my solution? For the first issue, we need a step load, a step-by-step load, instead of letting iperf or netperf send as much traffic as possible. We give it a limit: we tell iperf or netperf how much traffic to send this time and how much to send next time. So in my solution, I run iperf multiple times, each time with a bandwidth limit, so iperf sends exactly the traffic I expect. That's the step-by-step load. I use it to increase the load until the network adapter reaches a point where the performance diminishes significantly. As the load increases, the network will be able to keep up until it runs out of resources.
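A rough sketch of that step-load loop (not the speaker's actual script; the server address, test duration, and step sizes are made-up placeholders, and iperf3 is assumed):

```python
# Sketch: build one iperf3 invocation per bandwidth step (1..10 Gbit/s).
# The commands are only printed here; in a real run you would execute
# each of them with subprocess.run() against a reachable iperf3 server.

def step_load_commands(server, max_gbit=10, steps=10):
    """Return an iperf3 command line for each bandwidth step."""
    cmds = []
    for step in range(1, steps + 1):
        gbit = max_gbit * step / steps           # 1G, 2G, ..., 10G
        cmds.append(f"iperf3 -c {server} -b {gbit:g}G -t 30")
    return cmds

for cmd in step_load_commands("192.0.2.10"):     # placeholder address
    print(cmd)
```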
That means if your network adapter cannot support that traffic, the line in the chart will drop or stay flat instead of increasing. Does that make sense? Yeah. How do you do that? It's very simple, because iperf has a -b option to set the bandwidth. You just need to run iperf multiple times, and that's easy because you can write a loop over the different bandwidths. So this is the solution to my first problem. Makes sense? Yeah.

For the second issue, I use some mathematics to resolve it: correlation and R squared, two statistical tools. I will explain this chart later; first, a brief introduction to what correlation and R squared are. Correlation is a statistical measurement of the relationship between two variables. Possible correlations range from 1 to -1. A correlation of zero indicates there's no relationship between the variables. A correlation of -1 indicates a perfect negative correlation, like this one. A correlation of 1 indicates a perfect positive correlation, meaning both variables move in the same direction together.

I have an example to explain this: a table of two variables, the temperature and ice cream sales. From this table you can find that when the temperature goes up, the ice cream sales go up as well. So these two variables have a relationship. But what kind of relationship? You have to do more work to find out, which I'll get to later. We can easily see that warmer weather and higher sales go together. The correlation in this case is about 95%, so the relationship is good but not perfect: a high positive correlation. (Yes, because I did some modification to the data.)
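The speaker's actual table isn't reproduced here, but the correlation calculation can be sketched with invented temperature and sales numbers:

```python
# Sketch with invented data: Pearson correlation between temperature
# and ice cream sales, computed straight from the definition.
import math

temps = [14, 16, 18, 21, 23, 25, 28, 30]   # degrees C, hypothetical
sales = [21, 25, 24, 35, 40, 42, 50, 55]   # units sold, hypothetical

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(temps, sales)
print(f"correlation = {r:.2f}")   # close to +1: strong positive relationship
```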
So what is R squared? R squared is a statistical measure of how closely a trend line matches the data. In this example, I have the points in the chart, and I can draw a trend line across the data. R squared tells you how close the data is to the trend line; with it, I can find how well the trend line represents this data. If they're very close, the R squared will be 100%. The higher the R squared, the better the model fits your data.

Now let me go back to this chart and explain it. The x-axis represents the load, just like the steps I described. The y-axis is the PPS, packets per second: how many packets the network adapter can handle in one second. When the load increases, the PPS should increase as well, right? But when the network adapter cannot handle that amount of traffic, the line flattens out. So the load and the PPS are two variables, and they have a relationship, so I can calculate the correlation between them. The blue dotted line indicates the correlation. You can find the correlation is almost 100% at first, but then it drops, because the PPS cannot increase anymore. I set 90% as my accepted correlation value. In this case the two lines cross here, and I can find the x value, the load, at that crossing. And based on the blue line I can fit a linear equation, y = ax + b, like this one. So if I find the x value, I can get the y value, right?

In my case, which is the max value? Maybe this one. But I don't want that value; I want a value that can work for a long time, so the value will be around here. It's a bit difficult to explain clearly, so let me explain it again. We have two variables, the load and the PPS. When the load increases,
the PPS surely increases as well. This blue line indicates the relationship between these two variables: they always increase together, but at this point the PPS cannot increase anymore, so the correlation drops here. I set an acceptable correlation value of 90%, so there is a cross point between the two lines; the dotted line indicates the correlation, and I get the crossing at 90% here. From that I get the x value. Then, based on the blue line, I can get a linear equation, y = ax + b. You can get a and b from somewhere; in my case I use Google Sheets, which generates them automatically. Now that you have the x value, you can get the y value. That means you get what you want: not the max value, maybe lower than the max value, but a value that can work for a long time.

For the third issue, it's about the mean and the median. The mean is the average you are used to: you add up all the numbers and then divide by the count of numbers. In this example, the mean is 9.73. Google Sheets has a function called AVERAGE to calculate the mean, and another function called MEDIAN to calculate the median: the middle value in the list of numbers, which is 9 here. The difference between the mean and the median is that the mean is calculated; you may not find that value anywhere in the list. That means the mean will not appear among the performance numbers generated by iperf; it's just computed. But the median is a value that exists in the array of your performance numbers. That's the difference. Which one is best? It totally depends on your requirements, your test results, and your test purpose.
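Tying the earlier pieces together — step loads, a 90% correlation threshold, and the load/PPS relationship — a simplified sketch with invented numbers (not the speaker's data) might look like this. The talk reads the y value off the fitted trend line y = ax + b; for simplicity this sketch just reports the measured PPS at the last load that still passes the threshold:

```python
# Sketch of the operating-point idea: as the step load rises, PPS rises
# linearly, then saturates.  Keep extending the prefix of (load, PPS)
# points while their correlation stays >= 90%; the last load that still
# passes is taken as the sustainable operating point.  Data is made up.
import math

loads = list(range(1, 11))                        # step load, Gbit/s
pps   = [10, 20, 30, 40, 50, 52, 53, 53, 53, 53]  # measured kpps

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def operating_point(xs, ys, threshold=0.90):
    """Largest prefix whose correlation stays above the threshold."""
    k = 2
    while k < len(xs) and pearson(xs[:k + 1], ys[:k + 1]) >= threshold:
        k += 1
    return xs[k - 1], ys[k - 1]

x_op, y_op = operating_point(loads, pps)
print(f"sustainable point: {x_op} Gbit/s load -> {y_op} kpps")
```

With this data the full ten points correlate at under 90% (the curve has flattened), while the first nine still pass, so the sketch settles on the 9 Gbit/s step rather than the absolute peak.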
I prefer the median in my case, because the median result always exists in the results, but the mean result does not. In this case the difference may not be much, but in some cases it can be very large.

The next one is about how to find spikes in my performance test. I use another piece of mathematics called standard deviation. The standard deviation is a measure of the dispersion of a set of data from its mean, which is perfect for finding spikes. The deviation tells you how far the actual value is from the mean value. If there's a spike, the deviation will be very high or very low, so you can find it. From your results you can get the maximum deviation, the minimum deviation, and the median deviation; if these three values are totally different, there must be a spike in your test. Does that make sense?

For the last one, it's about demonstrating the results. From my perspective, I like a test report with a table: a table report with some charts. The table report should be clean, simple, and easy to understand, and it should include the most important performance indicators. Alongside that, there should be some charts; the charts are intended to supplement the table report. I'll give you an example. This is an example where I ran the performance test on ESXi, and this is the report table. I have the RX PPS and the TX PPS, some throughput and loss rate numbers, average latency, and max latency. This is not the final one; I will add more to it. You can also find the standard deviation values here; if there is a very high or very low value, there must be a spike in my result. I also include some CPU and memory usage in this report. So this is the table report.
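A toy version of that spike check, with made-up per-second throughput samples (one run contains an obvious deep drop) and an arbitrarily chosen threshold:

```python
# Sketch: flag a run as "spiky" when any sample sits far from the run's
# mean relative to its standard deviation.  Numbers are invented.
import statistics

steady = [21.0, 21.2, 20.9, 21.1, 21.0, 20.8, 21.1]   # Gbit/s, no spike
spiky  = [21.0, 21.2, 20.9,  3.5, 21.0, 20.8, 21.1]   # sudden deep drop

def has_spike(samples, factor=2.0):
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    # A point more than `factor` standard deviations from the mean is
    # treated as a spike; the factor here is an arbitrary choice.
    return any(abs(s - mean) > factor * sd for s in samples)

print(has_spike(steady), has_spike(spiky))
```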
For the charts, there are some charts for you to reference. For example, in this chart, do you see the step-by-step shape? Yes, this is the result from iperf: I parsed the iperf result and put it into Google Sheets, and Google Sheets gives you a line indicating your performance, an overview of your performance result. And the spikes? Yes, a lot of spikes there.

[Audience] Could you please repeat the statement you made a minute ago about the standard deviation, if the number was too low or too high? In that table? You were mentioning you didn't use it, or you threw it out?

Yes, the standard deviation indicates a spike. But you mean you cannot find a very low or very high value here, right? The data behind this table is not the same as the chart; they are different. Sorry about that. If I drew a chart of the data behind this table, it would be very flat, no spikes. Let me go back.

So, that was the last problem and my resolution. That's all for my presentation. If you are interested in my topic, you can find it on the conference page. But that page is just for Red Hat internal use, so maybe you cannot get it. If you are interested, find me and give me your email address, and I will turn the slides into a PDF and send it to you. That's all. Okay, questions?

[Audience] Is the Expert tool available to people outside of Red Hat, or only to Red Hat employees at this time?

Sorry, "Expert" is just the name of my project. I call this methodology that; I just gave it a name called Expert. It's just a name.
No other meaning; sorry for the confusion. Okay, next question?

[Audience] When you're doing your testing, have you looked at percentiles instead of mean and median?

To be honest, I didn't find too much difference between the mean and the median. But my colleagues gave me their test results, and there I did find a difference, though I don't know what scenarios they used. I think the median is more reasonable from my perspective, but it's not a rule; it just depends on your requirements and your test purpose.

[Audience] But in terms of knowing the throughput you want to strive for — say 80% or 90% of the time you're going to get a certain throughput — that's where percentiles come in handy versus just an average or a median.

So, all of the interval values in my test come from the median. I use the median all through my test, not partly mean and partly median; it's always median. Thank you.

Thank you very much. Just an announcement: we'll be having a party in the evening in the lounge. If you haven't collected your tickets, you can collect them at the registration desk. We'll also have a keynote speech tomorrow; if you check the schedule, you should be there at Metcalfe Large. Thank you for being here.