All right, we're going to start. Good afternoon everyone. The EuroBSDCon conference is thrilled to have Mr. Alexander Bluhm today, talking about performance measurement on OpenBSD; so performance and OpenBSD can mix in the same sentence, which is good. Thank you very much for being here, Alexander, and I'll leave it up to you.

Thanks. Thank you. So I'll start with motivation. I work for the company genua, we build firewalls based on OpenBSD, and we want to know how fast they are. We have several requirements: we have customers who are asking for numbers, we have hardware people who have to choose the next hardware, we have developers who want to see regressions, and we have marketing who wants to publish numbers. So we started to build a high-performance firewall testbed, which can produce graphs like this one, and as you can see here, you can't see much. The problem is that it's not written anywhere what is measured, it's going up and down, and you don't have a reason why that happens. So I felt we needed something better. Before, we were just wondering: we upgraded the OpenBSD version, then we got some other jumping numbers, and I wanted to make a more detailed analysis of what's going on with OpenBSD performance.

The reason why this graph looked that bad is that we have a very complex setup. We have a lot of requirements, too many requirements, so we have multiple users using those targets in parallel. We have some machines that generate traffic; they are quite old, so each is only capable of producing one gigabit of traffic. We aggregate that to 10 gigabit, send it through a target, collect it here again, do that on several drain machines, and measure the throughput. By doing this over a switch, we get a lot of disturbance, you don't get reliable results, you have other people using it, and that is the reason for the numbers jumping up and down. So what I want to have is a design like this, because I have to reduce complexity to get reliable numbers.

The other thing I have done before is regression tests, and there I was focusing on visualizing what happens. I run those tests every day, so one column is a day of testing, I run them on several architectures, and you have a lot of links. This is published on my web page, it's public. If you click on a fail, you see the test output and can figure out what's going on.
If you click here, you see the dmesg of the snapshot that is installed, and here you can get every result of that day: you can figure out which version was installed and when it was measured; the first line is the date when the measurement was running. And when you click here, you get to the OpenBSD CVS web repository and can see what was changed in this regression test, whether a failure was related to a change in the test itself. When you sort it, so now we look only at the results for i386, you get a much clearer impression of what you have: it's sorted with the failing tests at the top, so you see the things you should care about, and it's sorted along the history, so you can figure out, oh, here between the 13th and the 15th something happened and this test kept breaking, so you know where you have to search. So we have history and we have severity. For regression testing that's quite easy and quite common, and I wanted to have the same thing for performance testing.

So what did I do there? Here are my goals for the system I built, based on the two things I've presented before. I want to see the history. I want to have reproducible numbers, so that you can rely on them. You want to be able to click on something and see the details: what has been tested, when it was tested, what the log outputs are. You want to have drill-down, because we have a lot of commits in the tree and you don't know which one is really the relevant one for a change in performance, so you want to say, okay, was it here or was it there, and drill down to that. And you want automatic testing: those regression tests run every night, so you can wake up in the morning and see what broke that day, and I want something similar for the performance tests. There I created a cron job that collects the data of the last week, every week.

The principal idea of the design is that you start by installing an OpenBSD release; we release every half a year, and that's the base we start from. Then we check out the source at a certain date; we have a CVS system, and you can say, check out at this date. Basically I want to measure the performance of the kernel. I could do a full build of the whole system, but that takes hours, one and a half hours for example, while compiling the kernel only takes two minutes, so I just test the kernel. I compile the kernel, I run the tests, then I advance by one step; the step is configurable to allow the drill-down feature. Check out the next date, compile the kernel, run the tests, until the whole series is done. Then I collect the results and draw those nice graphs.

So the first thing you get is a web page where you can see: somebody did a measurement at this date, he installed version 6.4, and then he checked out from this date to this date, and the step was one day, so every day is measured. That's one of those columns; one column is one test run, and it's running in one-day steps between this date and that date. When a test is currently running, you can click here and see what this machine is doing, and if the kernel panics in between, you can see everything there.
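To make this checkout-compile-test cycle concrete, here is a minimal sketch of what one series could look like on OpenBSD, assuming a local CVS mirror in /cvs and an amd64 GENERIC.MP kernel; the dates, paths and the benchmark step are placeholders, not the real framework code.

    #!/bin/sh
    # Minimal sketch of one measurement series: check out the kernel at a
    # date, build and install it, then let the master reboot and benchmark.
    CVSROOT=/cvs

    for DATE in 2018-04-01 2018-04-08 2018-04-15 2018-04-22; do   # weekly steps
        # check out only the kernel sources as of this date
        cd /usr && cvs -qR -d "$CVSROOT" checkout -P -D "$DATE" src/sys

        # build and install the new kernel
        cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
        make obj && make config && make && make install

        # the master machine then reboots the target over the serial console
        # and runs the benchmark, storing the output per date
    done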
Here we have some details shown, and here is exactly the test that we run: here I run a tcpbench with some parameters, so you know what is running. When you drill down, when you click on some of these things, you even get the whole log output of the command. So now we click here and get one of those columns; what happens in one test run? Here we have the checkout date, so we check out here, and here; I said before those were one-day steps, but no, in this run it's a one-week step, it's written here, so that's a week, and a week, and a week, and we run those tests.

To figure out whether the numbers are reliable or not, I have to run the tests multiple times; in this case I run them five times, and here you see the maximum throughput. Here you see the average of the five measurements for this run, and here again you see some logs and can figure out what happened. Here you see the number of kernel commits between this measurement and that one; you can even click on the CVS log, where all the commits are collected, so you can ask what happened between here and here. I also have to apply some build quirks; I'll explain later what that means.

So here we have the average numbers of all the tests, and as I said before, they are jumping up and down, and I mark them red: if the numbers are not consistent, if they change a lot between the runs within one test, I mark it here. What I also found is that I get very unstable results; in this column everything is bad for some reason. So I can click through: when I click here, it goes to the next level, and I see everything that happened in those five measurements, one, two, three, four, five. And what you see here is that we have an outlier: this measurement for some reason went pretty wrong. Here we have 3.9 gigabit per second, 4.0, 3.8, and here we have only 1.6. Actually I don't know what happened there; you see it in the next graph as a picture, something really went bad there. So we see that this is an outlier, it's marked red, and it's shown as red in the upper graph, so you see, okay, here we have a problem. One dot in this graph is indeed a single result.

From those numbers I generate gnuplot graphs like this; that's the thing you have seen before. We are running a whole release, from 6.2 to 6.3, we do it per week, and we run five measurements each, and the outlier you just saw in the table is this one. On some measurements we get only 1.6 gigabit, and what I see is that it only happens on the two-socket machine: this machine has two sockets with four cores each, and with this hardware configuration it's very unreliable, I get these outliers. I have not figured out why they happen. And there's one more thing I want to show you: here we have an outlier that is faster, all measurements on this day were faster, and I'll develop later how we can figure out what this is.

Okay, now let me explain those quirks. You see those vertical lines here, with letters on top. The thing is that the OpenBSD kernel is not self-contained; you cannot compile it on just any random system, you have to compile the OpenBSD kernel with a base system that matches its checkout date. When we change compilers, we have to adapt the kernel sources to make them compile again, and so we have to make sure that we can still build it.
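As a rough sketch, repeating one benchmark five times and keeping the raw numbers, so that averages and outliers can be computed afterwards, could look like this; the peer address and the output handling are assumptions, not the real framework.

    #!/bin/sh
    # Run the same TCP throughput test five times and keep each result.
    PEER=10.3.45.35                 # example address of the other test machine

    for i in 1 2 3 4 5; do
        # 10 second TCP test against a "tcpbench -s" server running on $PEER
        tcpbench -t 10 $PEER > run-$i.log
    done

    # the summary lines contain the measured rates (exact format depends on
    # the tcpbench version); one run far away from the other four is an outlier
    grep -h Mbps run-*.log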
On the other hand, when I want to talk about OpenBSD performance and there is a change in the toolchain, for example the compiler changes, then I also want to see the effects of that. In that case I recompile parts of the build system. I don't do it all the time because that would be much too slow, but at such a point I can say, okay, we got a new compiler, I have to rebuild it, and then I draw a vertical line in the graph. There are also other places where I have to do that, when the system becomes incompatible, for example between kernel and userland when the interface changes.

Here are all the things I figured out manually that you have to do to compile an OpenBSD kernel from just before 6.3 when you have 6.2 installed; you have to adapt the build toolchain and the userland in this way. The first thing I had to do is fix CVS, because I could not check out the compiler: there was a bug in CVS and it picked some wrong files. So the first thing my build system does is patch CVS. The next step is that there was a compiler update to Clang 5.0, and once we can fetch the right compiler sources, I recompile the compiler here. Then there is pf and pfctl: pfctl is the userland program, and they share a common header file, pfvar.h. Sometimes the interface in pfvar.h changes, and then the old pfctl cannot load the rules into the kernel, so we have no rules and the performance changes. What I did is simply tell the system: recompile pfctl at this point. The next one, quirk E: somebody made a mistake when committing to the kernel, he forgot a file and the kernel didn't compile anymore. Normally such things are fixed within hours in the OpenBSD tree, but when I do the CVS checkout exactly at the moment when it was broken, the kernel doesn't compile; then I just apply a patch and fix it. Here the pfctl header files changed again, so I have to recompile the userland; here the same for sysctl; then Clang was updated again, pfctl was changed, and then we are at OpenBSD 6.3.

So the flow from before changes a little bit because of these quirks. We do the kernel checkout, we recompile the kernel, we run the tests, we want to advance, and now we check: is there some incompatible change in userland? I have a list of these, and if there is a change between the last date and the next date, we check out the userland, rebuild our toolchain, and then go on to check out the kernel. So we can either go the short way, to be fast, or the long way when it's necessary.

That's the hardware setup I actually have. I have two machines: one is a two-socket machine where I get these unreliable results; then we set up the same machine again and just removed one of the CPUs, and that machine is much more suitable for performance testing, you have less jumping around. My future plan is to add a Linux machine here to get some comparison numbers, but I have not implemented that yet; it's just a plan, the machine is already there.

So how is the hardware controlled? I have a master machine, this one; all the scripts are started and run there, and it's also running a web server where I publish my results. I have a console server; both target machines are attached to a serial console, and I have reboot and install scripts and automatic reboot. I use the OpenBSD autoinstaller to install those machines with a release, so you say, okay, install this machine with 6.3.
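Going back to the quirks for a moment, here is a sketch of what one such step might look like: when the checkout date crosses a known pfvar.h interface change, rebuild pfctl from sources of the same date so it can still load rules into the freshly built kernel. The quirk date, the paths and the header handling are illustrative assumptions, not the actual quirk table.

    #!/bin/sh
    # Example of one userland quirk: rebuild pfctl after a pfvar.h ABI change.
    CVSROOT=/cvs
    DATE=$1                         # current checkout date of the series
    QUIRK="2018-01-15"              # example date of a pfvar.h interface change

    if [ "$DATE" \> "$QUIRK" ]; then
        # fetch pfctl sources and the shared header as of the same date
        cd /usr && cvs -qR -d "$CVSROOT" checkout -P -D "$DATE" \
            src/sbin/pfctl src/sys/net/pfvar.h
        # make the new header visible to the userland build ...
        cp /usr/src/sys/net/pfvar.h /usr/include/net/pfvar.h
        # ... and rebuild and install pfctl against it
        cd /usr/src/sbin/pfctl && make obj && make && make install
    fi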
After the install, the master logs in to these machines and compiles the kernel, runs the tests, extracts the results, and then checks out, compiles and runs again; here we generate new measurements and we reboot all the machines over time.

Okay, what have I found out? That's the graph from before, and now we switched from week steps to day steps. Before, I had one outlier here on the top, remember; now I have two of them, because there is a day step in between. And now I can find which commits are here and which commits are there and what happened. In this case dlg had committed some tx mitigation. That means that when you have network packets that should go out, you don't hand every single packet to the network card and say, do something, do something, do something, but you collect them and then hand a whole bunch of packets to the network card at once, and that makes it faster. That was here. Two days afterwards Claudio, another OpenBSD developer, complained on ICB that suspend on his laptop failed and the laptop crashed, and then it was backed out again. So we dropped performance again, but Claudio's laptop was stable again.

So back to the outliers, these here. I wanted to figure out why it is jumping so much. We have those outliers, and also here we have several measurements and they are not one dot. I run some tcpbench, I run some iperf, but even iperf on the same setup is moving. So I asked: is there some difference in what I can do? I told you already that I run five tests in a row; I can also run ten tests in a row, it's just a parameter. And I can repeat the whole test in different ways: I compile the kernel, run the tests, and then I can either just run them again, or I can reboot the machine and run them again, or I can relink the kernel. In OpenBSD we relink the kernel at every reboot to produce a randomized kernel image, to make it harder to attack the OpenBSD kernel; it's a kind of address space randomization, and it's done automatically at every reboot. Before starting all this, I had disabled it, because when I reboot and it starts relinking, it affects my measurements. So what I did first was this: just run the tests again without relinking anything. Then I said, okay, let's reboot and see if something changes. And then I said, okay, let's turn relinking the kernel back on and see what happens then.

So these are the findings with those options. For this slide I keep the machine running, and I took one day, from the first of August to the second of August, and split it into six parts, so we have four hours between each of those. And what I inserted here are the commits. This one is dlg committing a pseudo driver that isn't used yet; it's something like trunk, I don't use it, it's just compiled into the kernel. Here visa committed something for the Octane platform, but I'm using amd64, so it's not relevant. And here I fixed an unveil bug in the file system; it's also not relevant. And now we look at the numbers: here we have 3 gigabit, and here we have 3.5 gigabit. And here we have a driver in the kernel that we don't use, here a commit to a platform we don't have, here nothing happened, no commit, no commit, a bug fixed somewhere else, no commit. So why is it going up and down? What I tried next is that I rebooted between each of those dots.
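A minimal sketch of these three repetition modes, as driven from the master machine, might look like this; the mode names, the target host, the wait and the benchmark invocation are assumptions, not the framework's real interface.

    #!/bin/sh
    # Repeat a measurement in one of three modes: reuse the running kernel,
    # reboot the same kernel image, or let KARL relink a newly randomized one.
    TARGET=root@target-machine
    MODE=${1:-keep}                 # keep | reboot | relink

    case $MODE in
    keep)   ;;                                                  # same running kernel
    reboot) ssh $TARGET reboot ;;                               # same kernel image
    relink) ssh $TARGET '/usr/libexec/reorder_kernel && reboot' ;;   # new KARL layout
    esac

    [ "$MODE" != keep ] && sleep 120                            # crude wait for the reboot
    ssh $TARGET 'tcpbench -t 10 10.3.45.35' > run.log           # example benchmark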
So here we have those measurements that happen with the same kernel, and I rebooted between each of those dots. Here I have to reboot anyway, because I recompiled the kernel, but now I also reboot here. And we see it's still jumping; oh, it's jumping down, and it's basically the same code. What I did next is relink the kernel. When I recompile it from here to here it gets relinked anyway, but I can also relink it here and shuffle the object files around, as the security measure does. And then it looks like this. So by relinking, by just moving the objects around, the performance changes, and now you see that those commits are completely irrelevant; it's just the layout of the object files that changes it. And we are between 3.1 gigabit here and 3.45 here, so it's quite a lot.

Let me explain again how this security mechanism of OpenBSD works. When the boot loader loads the kernel, it looks like this: here's the boot loader, and here's locore0; that's the part of the kernel that sets up the page tables in a very basic way, and the boot loader has to know where to jump to, the start function, so this one is at a fixed address. By relinking the kernel before booting it, we add a random gap, so that the object files move up and down, and we have a linker option that shuffles them randomly. So when an attacker wants to know where some address in object 4 is, it's completely different at every boot. locore0 then jumps to the main function, which is in one of these object files, and after starting up, locore0 is unmapped from the address space. The thing is that it contains the pointers into the rest of the kernel, so if an attacker could read it, he could figure out where those objects are; but not in OpenBSD, because it is unmapped and you can't see it in the address space anymore. So it's a nice thing for security, but bad for reproducible performance.

So what I did then: I sort those object files and I set the gap to zero, and now everything is good, see here? But then, what's that? Here we have a commit that adds a driver that isn't used, but it is an additional object file in the kernel, and of course that moves all the other object files that come after it; they are sorted alphabetically, so everything after T is moved. This means we still have some moving effects on performance. And I made a test: I keep them sorted alphabetically, but I add a random gap again, so the whole kernel just moves up and down, and it starts jumping again.

So my idea was, okay, it has to be something with alignment, cache lines, page alignment, that was my guess. What I do now is align each object file at a page boundary. That means the kernel gets twice as big, but for the test it's irrelevant. And what I get are very precise values here, but still this one is jumping. The thing is that this commit adds quite a lot of code, and the objects after it move by whole pages, and that is still relevant for performance. So my previous theory, that it has to be something with page alignment or cache alignment, is wrong, and I don't know why it jumps: if you execute the object file at a higher address, even with the same page alignment, the performance changes. I did some measurements here: I took nm to get all the symbols and addresses, sorted them, diffed them, and sent that through diffstat.
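For reference, comparing the symbol layout of two kernels in that way could look like the following sketch; bsd.old and bsd.new are example file names for the two kernel images, and diffstat is available from packages.

    #!/bin/sh
    # Compare symbol addresses of two kernel images to see how far a change
    # moves the rest of the kernel.
    nm bsd.old | sort > old.syms      # all symbols with their addresses
    nm bsd.new | sort > new.syms

    # a unified diff piped through diffstat shows how many symbol addresses
    # changed; with page-aligned objects the changes stay local to one file,
    # without alignment everything after the new object moves
    diff -u old.syms new.syms | diffstat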
And here, with the unveil change, the diff is very local. When I only sort the object files, the change affects much more code above it; it fades out after a while, once the alignment happens to match again, and then the addresses in the kernel are the same as before. But if I use the aligned kernel it gets much better: all the changed symbol addresses are within the source file of that unveil change, somewhere in the file system code. So here it works perfectly, the changes stay local. But with the new driver added it doesn't help: everything after it gets new symbol addresses and ends up somewhere completely different.

So what did we do next? Now we have 6.4, and I ran it for 15 days; let me see the next slide. Ah, now I know what I want to explain. I took the two-CPU-socket machine, the one with the outliers. Here you see we have this line, that's where most of the results are, and there's a shadow line where sometimes only a single result sits. So I checked the pages where you have all the results, and you see: that's the first try, next, next, next, and it's always the second cycle. When you keep the machine running, I don't reboot here, and run the tests cycle after cycle, then the second cycle has different results, and they are much slower, 10% or 20% slower, and I don't know why. What happens there is that by recompiling the kernel, the accounting of how much CPU time was used changes, and then the scheduler decides to run the threads on other processors, which might be on the other socket, and that affects performance; that part was relatively easy to figure out. But then, if you recompile the kernel again and go to the next column, the performance goes up again, and that happens reliably on the two-socket machine but not on the one-socket machine. This going up again I can't explain. So what I did is move to the one-socket machine, because the other one is just crap.

The colors change now: each measurement gets a color in this new gnuplot graph, depending on whether it uses the parameters for the four-CPU or the eight-CPU machine; that's why it's green and red here. And here we see again this tx mitigation, the performance fix that lasted two days. What I also do is compile the kernel and measure how long that takes. Here we upgraded to Clang 5.0 and it got a little bit faster; here we upgraded to 4.10 and it also got slightly faster; and here we added the Meltdown fixes, which made it slower. We don't see that on the TCP graph, it only shows up in the kernel compile time. By the way, this is the system time, the time spent in the kernel; that is the user time, where we are actually in the compiler doing real work; and that's the total time, how long it takes to compile the kernel. It has to be said that this is not a clean measurement, because we run on the kernel under test and we compile the kernel sources, so we depend on the source files we compile and we could depend on the compiler. It's not made to figure out exactly what's going on; it exists more because a lot of developers say, oh, my machine got slower when compiling the kernel, what happened? I just want to track how long the build takes. And there's one other thing: with the network performance graphs, the fast numbers are the higher ones.
But with the time to compile the kernel, the good numbers are the lower ones, so here it's going up, which is bad. So now we have TCP performance from 6.3 to 6.4, and there are multiple things. First of all, here we committed retpoline. That's a Spectre mitigation where we make indirect function calls more expensive, because we trick the branch predictor so that it doesn't remember where we jumped to. Spectre attacks get more difficult, but jumping through function pointers also gets slower, and the network stack does that a lot, so here we lose performance. Then we added witness. Witness is a tool that we copied from FreeBSD that helps you find mismanagement of locks, for deadlock prevention; it's more or less a debugging tool, and we were running it in our kernel for a while just to see the reports and fix them. It was never planned for release, we knew it would be slower, and here we enabled it. Here we got another security feature, RETGUARD. RETGUARD makes it harder to exploit return-oriented programming: for those ROP gadgets that end just before a return, we add an additional check before the return instruction to figure out whether the stack was mangled and some attacker is doing evil stuff. So we have more jumps and more compares, and it gets slower; that's here. And here we disabled witness again; it was a debugging experiment for just one and a half months or two. So we got performance back, but it was much less than before, and from here it's more or less constant. You still see we have a broad range of results, because I didn't yet get the precise numbers I get now.

When we look at the compile time: here we updated Clang to 6.0, and since then updating the compiler has made compiling slower. Then we added a new DRM, that's the graphics driver, and it added new files, so compiling the kernel gets slower because we have more source files; that's expected. Here we see again what we had before, and you especially see it in the system time, because system calls get more expensive. Here we have witness, that's a kernel feature, and RETGUARD, that's also in the kernel, so the system time goes up; and we have witness being turned off, that's also kernel, so it happens here. And here the system time went up, but here it went down. When I started this project I was around here, and I saw, oh, what happened, the system time went down? It was a bug in the kernel: we changed the way we measure time, and some times were forgotten; in particular the spinning time on mutexes was no longer added to the system time. So this drop is just a measurement error, and here the bug was fixed.

Now we go from 6.4 to 6.5. Nothing really important changed here, except this: we added saving of function arguments. That is a feature where, when we call a function, we copy the contents of the argument registers to the stack, because when the kernel crashes you get a backtrace, and then you see all the arguments that were given to the functions, which makes debugging much easier. But it costs time to copy from the registers to the stack, and you can actually see it; I did some drill-down and it's exactly this commit. Now the same for kernel compiling: here we changed the linker. Before we had the GNU linker, now we have the LLVM linker, lld. What you see here is the link time, and it's pretty constant here; but now ld is multi-threaded.
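As a side note, a minimal sketch of timing a kernel build and recording real, user and system time separately, as used for those compile-time graphs, could look like this; the -j4 value and the file names are arbitrary examples.

    #!/bin/sh
    # Time a kernel build and keep the real/user/system split.
    cd /usr/src/sys/arch/amd64/compile/GENERIC.MP
    make obj && make config

    # the subshell redirection captures the output of ksh's time keyword
    ( time make -j4 > build.log 2>&1 ) 2> time.log

    # time.log then contains a line roughly like:
    #     1m58.33s real     5m40.12s user     1m02.45s system
    cat time.log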
And that multi-threading is the reason why the link time is not stable anymore: depending on how the threading works out, it's faster or slower. Here we changed to Clang 7.0, and as usual the compile time goes up. Then we created a library out of the compiler, with the idea that the compiler would use that library; I talked to Patrick yesterday and he said, oh, the compiler doesn't actually use it. So this change is irrelevant, but you still see the build time going up here. The reason is that it's a quirk: I recompiled the compiler here because I saw this change, but in between there were also other changes to the compiler, and one of those changes between the full compiler build here and the full compiler build there resulted in it being a little bit slower. What we did here is recompile the compiler again, because we had the stack protector in there as well as RETGUARD, and when you have RETGUARD, which is the stricter mechanism, you disable the general stack protector. So compile time went up a little bit, but again I rebuilt the whole compiler, so something else could be hidden in there. And the system time went down a little bit: it's only the kernel stack protector that was removed, so the kernel works a bit better.

So that's the newest release, 6.5 to now. We told dlg, not about the problem, but about the fact that his tx mitigation had made things faster. He said, okay, I'll try to redo the diff without breaking Claudio's laptop, and that was committed here, so performance went up. Then there is this one, which I couldn't figure out. Another thing: I changed to the aligned kernels now. Before I had the randomized kernels; now I have the aligned kernels. I couldn't do that earlier because I need the new linker, lld, and we didn't have it before, but I can do it now. And you see that the width of the lines went down, because the numbers are more exact. But every time we have a change that affects the layout of the kernel, even though we align it, like adding a driver, we get something like this, or something like this: you see it goes down and then up, and I couldn't figure out what it is. And here it also goes down, down, down. If it's a single step going up or down, it's quite easy to ask what that was, but if it's going down slowly, it's really hard to say what it is. Working on this, we figured out that the kernel did not use the checksum offloading feature of the network card, because it was simply disabled in the driver, the ix driver. We enabled it, and performance went up. I haven't figured out what this, this and this are, because I finished this measurement last week and didn't do the drill-down yet.

To figure out those things, I showed you the click path before: you can click on the column and then you find the CVS log. I take everything from CVS and put it into a nice HTML page, so you can just scroll through it, and here you can click; that's the link into the OpenBSD CVS web, so you can figure out what happened between here and there.

So now I have something about UDP; these are the slides I could skip if I'm too slow. We have five minutes? Okay, we do it. I started doing UDP performance tests, basically from 6.5 on. I had done it with iperf before, and it was just a constant line, not very interesting, and I wanted to figure out why. All the releases before had only this one measurement.
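As a quick aside, one way to check whether a NIC driver advertises checksum offload on OpenBSD is shown below; ix0 is an example interface name for the Intel 10G card mentioned above.

    #!/bin/sh
    # List the hardware capabilities of the interface; look for entries such
    # as CSUM_TCPv4 or CSUM_UDPv4 in the hwfeatures output.
    ifconfig ix0 hwfeatures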
That constant line is the iperf measurement. I saw it, and then here we added the MDS mitigation. That's a machine with an old CPU without new firmware, so the MDS mitigation without the Intel microcode drops the performance; that's expected. Then, I think, kettenis committed a timecounter change, meaning we changed the way we measure time in the kernel, and the numbers dropped; it has more or less nothing to do with UDP. And here the time counting in the kernel was changed again, and it went up again. Here we have the checksum fix; that it gets faster is also expected.

So I wanted to know how this time counting affects our graphs, and I ran those iperf tests with different timecounters. That's the TSC, the CPU's hardware time counter, the fastest one; here are some ACPI counters, which are slower; and a very old Intel counter, with which we get very poor performance. But it has nothing to do with UDP. So I checked what iperf actually does when it does UDP. For every UDP packet it is sending, there is one write, as for TCP, then two gettimeofday calls, one select system call, and another two gettimeofday system calls. So basically what iperf -u measures is the speed of your gettimeofday system call. The Linux people don't notice this, because they have it mapped into memory and it's very fast for them.

Okay. So I went back and said, let's write the tool myself, and I wrote a tool that doesn't call gettimeofday while it's measuring. It only sets an alarm timer to get informed when the measurement is over, so it's all done in a signal handler, without system calls during the measurement. And then I get this line for large packets and that line for small packets; they behave more or less the same, except that you can't see the changes in the small packets because of the scaling. Here we have the tx mitigation again: for TCP it gets faster, but for UDP it gets slower, because packets are delayed, which means you get bursts, and bursts are especially bad when you try to measure UDP. Here you have MDS; yes, of course, that also affects the write and send and receive system calls. This one I don't know, it's going up and down. Here we have the checksum fix, performance goes up, perfect. And those timecounters don't affect our performance anymore.

So what's the conclusion? Measuring sucks: it's really complicated and you see things you don't understand. Multi-socket boards suck: you see effects that you understand even less. Reproducing your numbers is hard; it's not an easy job to get the same number when you do the same measurement twice. Don't trust your numbers: look at this iperf tool, which told me for two or three releases that the numbers were perfectly constant at one gigabit, and I never questioned how it measured that one gigabit. And keep it simple and stupid: the more complexity you add, the worse it gets. I have no switch in there, it's just a cable between two machines, two machines with the same software, sending packets from one to the other through a single cable.

What can be done additionally? I have not done forwarding; right now I am just measuring the stack, sending from one machine and receiving on the other. I could add a Linux client and server; I have the hardware standing there, but I have to figure out how to do a sensible measurement, so that I have a constant source or drain of packets. And I can't test patches; the framework doesn't provide that.
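Coming back to the timecounter and iperf -u observations above, here is a minimal sketch of how one could verify both on OpenBSD: which timecounter is in use, and how many gettimeofday and select calls iperf issues while sending UDP. The iperf flags shown are the classic iperf2 ones, and the peer address is an example.

    #!/bin/sh
    # Check and switch the kernel timecounter (the names depend on the hardware).
    sysctl kern.timecounter.choice            # e.g. tsc acpihpet0 acpitimer0 i8254
    sysctl kern.timecounter.hardware=tsc      # use the fast TSC counter

    # Trace the system calls iperf makes while sending UDP for 5 seconds.
    ktrace -i -f iperf.ktrace iperf -c 10.3.45.35 -u -t 5
    kdump -f iperf.ktrace | grep -c 'CALL.*gettimeofday'   # calls per test run
    kdump -f iperf.ktrace | grep -c 'CALL.*select'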
More things that could be done: I only measure committed things, and if somebody sends a patch, I would like to test that too, somehow, automatically. I can measure historic releases, how the performance was in OpenBSD 4.0, for example. And file system performance: I have a test there, but it doesn't work very well, because the controller I have doesn't cope well with the hardware and there are some problems.

So I have to say thank you to Jan Klemkow, who is administrating all the machines and keeps them running. And to Moritz Buhl; he's not sitting in my talk, he went to the other talk, but I say thank you anyway. He did all the gnuplot visualization and also helped keep the machines running. My employer is genua; they pay for the rack space and the work time. I have put everything online, here are the links: you can get the results and these slides, and here is all the test data in a text file suitable for gnuplot. I've put all my performance and regression testing into a GitHub project, it's here. Here's the code from Jan Klemkow to set up the autoinstall and the automatic configuration of the machines, and this here is my talk on GitHub. That's how it looks. Thank you.

We have time for a couple of fast questions.

Question: Two questions about the tools. Why do you use iperf and not tcpbench, which is in base and supports UDP as well?

Answer: I do use tcpbench for TCP; those are the differently colored numbers, the red and the yellow and the green ones, iperf and tcpbench are intermixed and they give the same results for TCP. For tcpbench -u I have the problem that I cannot measure in an automatic way how many packets I received: I have two tools running, and one tells me, okay, I'm sending this much, and the other prints, now I receive zero, then I get some packets, and then zero again. So I decided to write my own tool, called udpbench, which sets up the receiving side over an SSH connection, sends the packets here, receives them there, collects both results and publishes both. So tcpbench for UDP is not very suitable for automatic testing.

Question: The other question is: I once did performance testing where one machine was the device under test, the OpenBSD box, and on a second machine I used a bootable netmap image, the FreeBSD netmap thing. You can just put it on a USB stick, boot it, and it works by putting packets directly onto the network interface ring. I had a dual-port card, so one port was set up to send, and it can also count the packets it receives. That basically allows you to use a dual-port ix in a server with this netmap image; it's FreeBSD, but it's just for testing. Then you can generate wire-speed traffic and measure what comes back, and the OpenBSD box in the middle just has to forward it, because I always have the problem that the tools which generate that much UDP traffic are not really fast enough to send or receive it. For a forwarding situation you get wire speed from the outside.

Answer: What you describe is the correct setup for doing forwarding tests. But what I wanted to test is the TCP stack, so I have to have a real TCP connection, and I just wanted to see how sending and receiving on the machine itself behaves. For UDP, for example, I think the problem is that the system call we have can only send one packet per call; that's why these UDP numbers are so slow. So it depends on what you want to measure; for forwarding, a setup like you describe would be better. It can be done.
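The udpbench options are not shown here; the following is only a sketch of the coordination pattern described above, starting the receiver over ssh, running the sender locally, and keeping both outputs so that sent and received packet counts can be compared. The receiver and sender invocations are placeholders, not udpbench's real command line.

    #!/bin/sh
    # Two-sided UDP measurement pattern: receiver started remotely over ssh,
    # sender run locally, both outputs collected.
    PEER=root@10.3.45.35                     # example peer machine
    RECV_CMD="udpbench recv"                 # hypothetical receiver invocation
    SEND_CMD="udpbench send 10.3.45.35"      # hypothetical sender invocation

    ssh $PEER "$RECV_CMD" > recv.log &       # receiver reports what actually arrived
    RECVPID=$!
    sleep 1                                  # crude wait until the receiver listens
    $SEND_CMD > send.log                     # sender reports what was put on the wire
    wait $RECVPID

    # both sides end up in the results, so packet loss becomes visible
    cat send.log recv.log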
Coming back to the forwarding question: that's the reason why there's this Linux box in the picture, but I haven't set it up yet. For measuring the stack you have to go through your stack anyway, and if you run OpenBSD on both sides, the only problem is that you can't see whether it's the sending or the receiving side that is causing the problems. Okay, thank you. But of course there's a lot of room for improvement, and I can do historical things: I can say, okay, run from 6.2 to 6.3 again with a new test. That's also an advantage; I can think of a test afterwards and say, okay, now let's apply this test to the past and see what happened there. Thank you, Alexander. Thank you very much.