Hello everybody! My name is Lukas Doktor, I'm from the Virt team at Red Hat Czech, and today I'd like to talk about QEMU performance regression CI. I sent an email proposal for regular upstream performance testing more than half a year ago. I explained the motivation there, which in short is that although developers occasionally check the commits they consider important for performance regressions or changes, and we run some regular performance CI, we still hit some of the issues only during the pre-release performance testing, and by then it's already too late to really discover what caused the change. We would like to change that, so we would like to take inspiration from what CI did some 5-10 years ago, when it moved testing closer to upstream development, ideally covering every build. That is still not our goal, because it's not that easily done with performance testing due to hardware limitations, but as described in that email, we want to move closer to it and ideally pinpoint the individual changes, to be able to decide whether they are important enough to justify the performance difference. To limit the scope: I'm not talking about TCG performance, I'm not talking about low-level stuff. I'm really interested in system-wide, full-blown OS testing like fio or uperf running inside the guest.

There was a short discussion and then it died out. The important outcome was that, first, nobody was aware of anybody doing this, and second, nobody was aware of any tool that would simplify it or allow us to do it. So I went ahead and created a set of tools around this that can be run in CI as well as locally. The workflow could look like this: everything starts with a developer who sends a patch; eventually it gets merged somewhere in a branch, we detect that, run our pipelines and hopefully eventually detect a change. Then we kick off a bisection job between the two points, maybe limiting the set of tests to only those affected, and hopefully, if it's reliable enough, the bisection finds the regression, or in this case the improvement. Then we can create a report. The minimum I can imagine should be in there is some detailed information about the change: the nbd.fio file to be able to run the same fio test easily, the libvirt XML files if they are used, and the bisection report with all that information.

Don't be scared, you don't usually use the first table, because it's just the metadata section. You usually don't need the grouped results either, because those are mainly for managers, but you should pay attention to the list of failures. There you can see basically one build per commit, the QEMU hash, and the relative percentage of the throughput compared to the baseline, which is the first commit. Sometimes this can be enough to say: yes, this is caused by this change, it cannot be anything else. Sometimes it may not be that easy and you may want to reproduce things, so you can either just use the files I mentioned to create your guest based on the libvirt XML, hopefully set up the dependencies correctly, and run fio manually. That is good if you know what you're doing, if you already have a setup in place and you're prepared for it. If not and you just want to give it a try, you can use run-perf: either just install it and use the same command that was used in the CI, or modify it a bit to simplify the setup for you.
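For the manual route, a minimal sketch of what the reproduction might look like, assuming the files attached to the report are used; the XML file name and domain name here are placeholders:

    # define and start a guest from the attached libvirt XML (file and domain names are placeholders)
    virsh define guest.xml
    virsh start guest-name
    # then, inside the guest (or wherever the original job ran), replay the attached fio job file
    fio nbd.fio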
Where do you get the command? It's in the metadata section: you can see there are all "A"s, which means all of the builds used the same run-perf command, and on hover you can see the full command, which is kind of long. Don't worry, you can remove most of it, just cross it out, but you should pay attention to the red parts. What do they mean? First, the hosts are the most important part: as I mentioned, performance testing is hard and we cannot just give those machines to everybody, so if you want to rerun it, you need to rerun it on your own machines, which means you need to specify the hosts it will run on. It does not run on the machine where run-perf is executed; it could, but not by default. Then you may want to tweak the distribution: if you don't use the same one, for whatever reason, you may want to use your own distro. It's only used if you use a provisioner to provision the hosts, or if you're using VMs for the testing, because run-perf can fetch the cloud images and prepare them for execution, the same way in CI as well as on your laptop or your own machines. The default password is only used if you don't have SSH keys exchanged, and it's used as the default password for the guests that are created. And --paths we are going to talk about a little later.

So now you modify the run-perf command. Let's say here I'm using Fedora with some password, I remove most of the profiles, use only Localhost, and run a limited subset of the fio-nbd tests, because I don't need the full long command and I want it to finish quickly. I run this once, then I deploy a different QEMU version on the target host (not on my localhost, but on that target host) and run the same command again. Those commands are going to create result_* directories. I can do that multiple times, and then I can use compare-perf to compare them. With -vv it shows some information about the individual failures and individual results in the console, but I would still suggest using --html, which creates a nice HTML page with all of the information. Hopefully you are able to reproduce everything, you can start coding, and afterwards confirm that your fix actually fixes the issue.

Now, I mentioned --paths: the first-time setup may require some metadata. Installing is as simple as pip install, although I would suggest using master. And again, run-perf usually does not run the tests on the machine where you execute it; it acts as a controller, and you usually specify one or multiple hosts using --hosts. It should not modify your laptop unless you specify it as a host, obviously, but it will definitely modify the --hosts machines. Most of the changes are reverted, and the profiles that are applied are always reverted, but it may not remove all the dependencies and so on, so do that on your testing machines. And although I mentioned that your local machine, your laptop, won't be modified, you can still use the container that we ship as well. What you do need to prepare before the first execution is the host metadata, basically a database of the hosts, the machines that are used during the testing. The information we are looking for is, for example, the architecture, the huge page size, how many host and guest CPUs you want to use, the memory. Some of those things could probably be obtained from the machine automatically; we are not doing that at this point, but if you would be interested in that, just let me know and I can add some sensible defaults.
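Putting the first-time setup and the reproduction loop together, here is a rough sketch; the YAML keys and the exact option spellings are from memory and should be checked against the run-perf documentation and --help output, and the distro, password, test list and result directory names are placeholders:

    # host metadata: hosts/<hostname>.yaml inside the directory passed via --paths
    # (key names below are illustrative; see the run-perf docs for the exact schema)
    #   arch: x86_64
    #   hugepage_kb: 2048
    #   guest_cpus: 4
    #   guest_mem_m: 8192
    #
    # baseline run: keep the red parts of the CI command, trim the rest
    run-perf --hosts localhost --distro <your-distro> --default-password <password> \
             --paths <dir-containing-hosts/> <trimmed-down list of fio-nbd tests>
    # deploy the other QEMU build on the target host and run the same command again;
    # each run creates a result_* directory
    #
    # finally compare the runs (-vv prints details to the console, --html writes the full report)
    compare-perf -vv --html report.html result_old result_new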
But at this point you need to have a hosts/<hostname>.yaml file with this data prepared before the first execution. That would be it for the local reproduction; now let's take a look at the CI. Normally I use Jenkins for everything, but as I mentioned it's behind the firewall, so you cannot actually access it and you cannot download the artifacts from it. What you can do is take a look at the documentation, where I have all of the pipelines as well as the JJB (Jenkins Job Builder) configuration available, and you can just deploy them on your own Jenkins instance, tweak the hosts section and the set of tests, and start testing yourself. Since you cannot access the artifacts there, I wanted to make something available for you, so I made this little dashboard, really just a simple table, where each build, when it finishes, pushes its status and attaches the HTML report as well as more detailed machine-readable results. It's publicly available, and in there you can find the HTML file which you saw earlier in the report. This one should usually be very boring, because all of the tables should be empty except for the metadata, unless you change the filters, since by default only errors are shown. And you have the .tar.xz file which contains the machine-readable results. You can see the structure: there is the profile, in there a folder per test, and in there one per serial number of the test.

The majority of the tests use Pbench to drive the test execution. For those who don't know, Pbench is a test suite that allows you to run performance testing. What I like about it the most is that it generates a result.json file, which is machine-readable and uses the same format for all tests. Second, it gathers the background system information. And maybe third, quite a big portion of Red Hat is using it, so when I find a regression, or if I want them to investigate it, I just send them those results and they can use their own tools to review them. Next to it I have the run-perf metadata, because Pbench is not aware of VMs, or at least not to the extent you would like. The run-perf metadata contains, for example, the set of packages that were installed, the mitigations that are in place and whatever configuration was changed, in order to be able to reproduce it easily, or just to see the differences in the report. Those two files per test are obviously not everything. Each build is about two gigs of data, which we store for up to 14 days, containing the iostat, mpstat, uperf and perf reports and so on. We may be able to send these to you if necessary for comparison, although I'm not sure whether an NDA wouldn't be necessary. If you just need some serial console output or journals, maybe libvirt logs, we can send those easily via email on demand, especially for regressions and things like that.

Now, the current coverage. What do we test currently? Only x86_64. We have four profiles: Localhost, just to make sure that the hardware is not changing; DefaultLibvirt, which is basically virt-install, give me a machine and use it, so the out-of-the-box experience; the same with multiple machines, where we use the same amount of CPUs but split into multiple VMs (everything is of course customizable); and TunedLibvirt, which doesn't mean it's tuned to be fast. It just allows you to supply an XML file, and in that XML I'm focusing on different features like NUMA pinning, strict cgroups and huge pages; it's just a different configuration, not a faster one.
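Before getting to the test selection, here is a quick sketch of how one might poke at that machine-readable archive once downloaded; the archive and metadata file names are assumptions on my part, only result.json is the standard Pbench output:

    # extract the downloaded archive (the name is illustrative)
    tar xf results.tar.xz
    # the layout follows profile / test / serial number, as described above
    find . -name result.json                # Pbench's machine-readable per-test results
    find . -iname '*runperf*metadata*'      # run-perf's extra metadata (assumed naming)
    # pretty-print one of the result files
    python3 -m json.tool "$(find . -name result.json | head -n 1)"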
As for the test selection: uperf and fio with a limited set of configurations, and fio-nbd, which is a special case of fio run with the NBD backend. Even this limited set of profiles and tests takes quite a long time to execute in order to get stable results; I am, for example, using three samples per test. As for the plans for the future, I would like to integrate it better upstream. What that means is that I would like to have the perf CI reports available, ideally in GitLab, just a cross or a checkmark saying that this build was tested. More importantly for me, I would like to get feedback on what is important to you: whether this makes sense to you at all, which tests and profiles I should use, whether they are sufficient or whether I should improve them or maybe remove unnecessary stuff. What background information should I be collecting? Every tool that I run in the background is useful later if a regression is found, especially if it's not 100% reproducible, but on the other hand it influences the execution, so advice on that would be welcome as well. One more thing is adding other architectures. I successfully tested ARM and it works basically out of the box, so it's a matter of getting the hardware and starting to use it regularly.

One more thing I played with is a latest-kernel pipeline. At this point I'm running a stable version of RHEL and only replacing QEMU with the current master, which is fine for testing the QEMU userspace part, but we're not really testing the KVM part. So I think it would make sense to add a pipeline that runs with the latest kernel as well as the latest QEMU; do let me know how important that is in your eyes. And if anybody is interested in trying this, I'm here to help, and I'm focusing on the documentation to make it as safe as possible. The last thing, which came up very recently, is that I learned about an initiative at Red Hat where we want to make bigger machines, something like a bigger lab, available to communities. If you think that you would benefit from having this perf CI in there, I can pursue that and maybe get a machine from them, in order to let developers trigger builds, like saying "I want to check this build", without the need to do any setup. That's it from me. I'm really looking for feedback, and ideally I would like the feedback to be on the QEMU development list so more people can be involved. So, see you!