Good day, everyone. My name is Marek Szyprowski, and I will tell you my story of day-to-day testing of the linux-next kernel branch. Let me introduce myself. I have worked for the Samsung R&D Institute in Warsaw, Poland since 2008. A year later I became a Linux kernel developer, and two years later a Linux kernel maintainer. I have been doing day-to-day testing of linux-next kernel releases since 2018.

Let me first say a few words about the Linux kernel development model. A new Linux kernel release comes roughly every three months. The code is maintained hierarchically: there are maintainers responsible for the various subsystems and parts of the code. Each maintainer manages a fixes branch and a next branch. The fixes branch contains fixes for the current release; the next branch usually contains new features for the upcoming release. New code must first be staged in the next branch, which is then merged during the so-called two-week merge window and stabilized during the following release candidate period.

This is how it looks on the graph. We have a base release, 5.18. The fixes based on that release are merged during the 5.18 stabilization time, and the new features in the next branch, also based on that previous release, are merged during the merge window for the next release, which would be 5.19; they first appear in 5.19-rc1. Of course, this shows just two branches. Each maintainer has such a pair, so there are lots of fixes and next branches. If one wants to check that new features don't break the kernel, one has to either check the next branch of each maintainer or wait for a linux-next release. Linux-next is a project that merges the next branches from all maintainers, and it makes a release almost every working day. The goal is simple: to find regressions before they reach the mainline kernel.

There are various levels of testing. The first kind is compile-time testing: checking that the code is correct and the build doesn't break. If the code compiles, we can run it on emulators like QEMU; this is what is done while preparing the linux-next releases. Then we can try to run it on real hardware. Once it runs on real hardware, we can use specific user space tools to check that all the features work properly. We can even prepare advanced test scenarios that include various interactions between user space tools.

If you want to test on real hardware, you first have to have such hardware, and the easiest way to do that is to have a test farm. There are various approaches to building a test farm, and there are separate talks about that. I will just quickly show what my test farm looks like. I have about 30 single board computers connected to a standard PC. All are based on 32 or 64-bit ARM CPUs. For historical reasons I have a lot of Exynos-based boards, because I did a lot of Exynos-related development. Then I collected various other boards that were available in the office, like Raspberry Pi boards or boards from the Odroid family; I even have an ARM Juno R1. The PC that manages my test farm has over 50 USB devices connected. There are two large Ethernet switches and a number of USB hubs. All this occupies four storage shelves in the test room, and there are a lot of cables. Here are two pictures to give you an impression of what it looks like. For me the important thing is that it simply lets me do tests on the real hardware. A few more words about the hardware configuration of my test farm.
Each board is configured to output kernel logs and the user console to the UART. Boards use Ethernet for data connectivity: it might be built-in Ethernet, a USB dongle, or a USB CDC Ethernet gadget if neither of those is available. I control power with a USB FTDI adapter and a set of relays; the adapter is configured in GPIO mode, so I can turn each relay on and off independently. I also have some USB cameras there for monitoring board displays, because some of the boards have a display.

A few important things about the software configuration of my test farm. Boards are configured to load the kernel and modules via TFTP from the PC, and they have a Debian root file system stored on a persistent medium like eMMC or an SD card. I identify USB UART adapters by their serial ID. This gives me some independence from the USB topology, because I can find a given USB device by its serial ID regardless of which hub it is connected to. I access the boards via SSH to the control PC. I have a single script to control the power (turn it on or off), to reset a board, and to get access to the console of a given board. I have no board reservation, sharing or any other kind of management. My main goal was to allow quick access to all the boards, as if they were on my desk.

How did I do the boot test on my test farm? This is the first test done on the hardware. I prepared a shell script that configures the kernel, compiles it, deploys it on the PC that manages the test farm, turns the power on for the given board and then waits for the login prompt on that board's console. The last part is done with the expect tool, which is very convenient if you want to interact with the UART console, because expect can send some characters and wait for a response within a given timeout. If the login prompt appears within the given time, I assume that the boot test has succeeded. Even such a simple approach allowed me to find and report a few issues back then.

Then I decided to do some more tests, because in manual testing one usually runs tools like modetest, ifconfig, ping and so on to see whether the display or networking is working on the tested board. One also checks whether some files exist in sysfs, that is, whether the devices have been properly initialized. I also added support for different kernel architectures and different kernel configurations: some boards were tested with exynos_defconfig, some with multi_v7_defconfig, and I added ARM 64-bit tests as well. At that point a single shell script doing all this became a problem, so I had to make it more generic and extract all the data from it into separate files. I decided to extract two kinds of files: configs and tests. A config includes the list of boards to test, the kernel image, the architecture of that kernel, the configuration file and the cross compiler used for it. The tests are sets of rules for the expect tool: the characters to send and the phrase the tool has to wait for after sending them, within a given timeout. To quickly check whether the tests succeeded, I added coloring of the output, so if everything is green I am happy, because all the tests have passed.
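To give an idea, a minimal sketch of such an expect-driven boot check could look roughly like this; the power-control helper, the serial device path, the prompt string and the timeout are illustrative placeholders, not the actual script:

#!/bin/sh
# Power the board on, then wait for a login prompt on its UART console.
./power.sh myboard on                 # hypothetical power-control helper
expect <<'EOF'
set timeout 180
spawn picocom -b 115200 /dev/serial/by-id/usb-FTDI_TTL232R_AB1234-if00-port0
expect {
    "login:" { puts "BOOT OK";     exit 0 }
    timeout  { puts "BOOT FAILED"; exit 1 }
}
EOF

The real script additionally configures, compiles and deploys the kernel before powering the board on, as described above.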
Let's see how it looks in practice. We take the linux-next release from 8th March this year and run the tests. Well, my script produced a rather red summary. Some lines are completely red, which means that for those boards all the tests have failed: the board doesn't even boot.

Let's see what is in the logs for one of the boards marked completely red. We see kernel logs up to the display panel initialization, and then for three minutes, as we can see from the timestamps, nothing happens. Then we see the message from the test script that the test has failed. The board didn't show any activity, so this is a regression: the board stopped booting once we switched to this linux-next release. I've checked that the base release for that linux-next is 5.17-rc1, and the board works fine on it. So we have to somehow find the commit, somewhere between that base release and the top of linux-next, that introduced the regression.

Git has a nice subcommand for that, called bisect, which is intended exactly for finding such offending revisions. We call it with git bisect start and two parameters: the ID of the tree state that fails and the last known working state; in this case I've used v5.17-rc1. Git then tells us that in roughly 13 steps we will find the commit that causes the regression. It checks out one of the commits that needs to be tested and waits for the user to do the test. The user has to tell git whether the test was good or bad by calling git bisect good or git bisect bad. Testing each commit is quite easy, especially since we already have a script that runs the test on the real hardware. However, compiling the kernel and booting the boards is still time consuming, and we can easily miss something or forget what the right result was. Git can also run the script for us and get the result of the test from its return code: zero means the commit is good, while other values have special meanings such as bad, skip or abort the process. So I extended my test script to return a proper value depending on the test result. Here is the command that I run to find the regression: git bisect run testbot, where testbot is the name of my script, and its two arguments tell the script which board to test and which configuration to use. This way I found which commit is the first bad commit, and the bisection succeeded.

I decided to double-check whether that commit is really responsible for the regression. I did that by going back to the top of linux-next, the main linux-next release from that day, reverting the commit found during the bisection and running the test again manually. The test succeeded, so I have found the commit that introduced the regression. The commit doesn't change anything in the frame buffer subsystem, and there is nothing really suspicious in it.
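For reference, the bisection flow just described boils down to commands along these lines; the release tags and the testbot arguments are illustrative placeholders:

# Start bisecting between the failing linux-next release and the last known good base
git bisect start next-20220308 v5.17-rc1

# Manual flow: build and boot the checked-out commit, then tell git the verdict
git bisect good      # or: git bisect bad

# Automated flow: let git drive the test script and read its exit code:
# 0 = good, 125 = skip this commit, any other value up to 127 = bad,
# 128 or higher aborts the whole bisection
git bisect run ./testbot myboard exynos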
What should we do to report the regression? We have to make sure that we notify everyone who has been involved in developing that change. This is quite easy if the commit has a Link: tag, which might point to patchwork or to the lore.kernel.org service, from which we can simply download the message that contains the original patch. Otherwise I recommend searching lore.kernel.org for the mail with the same subject as the patch that causes the regression. Once we have the mail with the original patch, we can also see from the discussion whether the issue has already been reported. If not, we want to report it. We have to describe what the regression is, which source tree has been tested and what hardware platform was used for the test. Really important information is also which kernel architecture and configuration have been used. If we managed to get a stack trace of the crash, it can be attached as well. We can also add information on whether reverting the commit on top of linux-next helps, and everything else we have already spotted. This is an example report of such a regression sent to the mailing lists, my mail from 8th March this year. It describes that I found that a commit in linux-next causes a freeze after the DRM-related frame buffer initialization on Samsung Exynos-based boards, and that this happens only if the kernel is compiled from exynos_defconfig. Then the discussion begins and we can help fix the issue.

Another example: linux-next from 18th May this year. Let's run the tests. We see that most tests succeeded, however the last column is red and there is information that we got some warnings while booting the boards. Let's check the logs. Indeed there is a warning, there is a stack trace that caused that warning, and in that stack trace there are names and offsets of the various kernel functions that were called when the warning happened. How can we use this information to find the regression? If we want to do this automatically, we can take a function name from the stack trace and simply search the logs for it. We have to drop the offset, because the offset may depend on the actual commit from which the kernel has been compiled, while the function name is rather stable. Typically such names don't appear in the logs unless something goes wrong. So I've added an option to my script to search the logs for a given string and report the test as bad if that string is found (a small sketch of this check is shown a bit further below). Here we see that I've added a 'bad' parameter with the offending function name as its value, and git's automated bisection found the first bad commit. I've double-checked it again by going back to the top of linux-next, reverting the commit and running the test manually; then the test succeeded. So this is another example of my report; I've included the stack trace to help with fixing this issue.

What problems might we run into while hunting for regressions? The first one is that reverting the faulty commit on top of linux-next fails. We might try git mergetool to resolve some simple conflicts, however this doesn't help in all cases. The other approach is to find all the commits that modify the files touched by the faulty commit and revert them too. This usually means that all the patches from the patch series have to be found and reverted. An example is linux-next from 31st March, where automated bisection pointed to a commit in the block subsystem. We try to revert it, but this fails. git mergetool also doesn't help much, however it shows that there is a problem with a given file in the drivers/block directory. We go back to the main linux-next release from that day and check which other commits changed that file. We revert those first and then revert the faulty commit, and after running git mergetool we finally manage to complete the revert. Then we run the test and the test succeeds, so we have confirmed that the given commit really introduces the regression. Now that we have found it, let's again describe everything we need and report the issue. Here we also see the stack trace, and there is information that I've tested it on top of linux-next by reverting this commit together with the other one that touched the same file.
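A rough sketch of such a revert of a faulty commit together with a later commit that touched the same file; the release tags, file path and hashes are placeholders:

# Find the other commits that changed the problematic file
git log --oneline v5.17-rc1..next-20220331 -- drivers/block/somefile.c

# Revert the newer commit first, then the faulty one found by bisection
git revert --no-edit 1111111
git revert --no-edit 2222222
git mergetool            # if a revert stops with conflicts, resolve them here...
git revert --continue    # ...and then let the revert finish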
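Going back to the log-search option mentioned above, a minimal sketch of that check inside the test script might look like this, assuming the boot output is captured to a log file; the variable and file names are illustrative:

# Mark the test as bad for 'git bisect run' if the given pattern, for example a
# function name taken from the warning's stack trace, shows up in the boot log.
BAD_PATTERN="$2"                    # value of the script's 'bad' parameter
if grep -q "$BAD_PATTERN" "$BOARD.log"; then
    echo "pattern '$BAD_PATTERN' found in the boot log, reporting this commit as bad"
    exit 1                          # any code 1..127 except 125 means bad
fi
exit 0                              # pattern not found, commit is good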
What else might be hard while looking for regressions? A new linux-next release might contain more than one regression, and if we run a bisection we usually find only one of them. The second one can be found if we carefully ensure that the first found regression is always reverted. Here is an example: in the linux-next release from 13th April there are two commits that cause boot regressions. We can do the second bisection manually by always reverting the first found commit in every tested revision. However, I found that the git stash feature helps a little here (a short sketch of these commands is shown later in this section). First we check out the top of linux-next and revert the first regression. Then I do git reset --mixed back to the top of linux-next. This leaves all the changes from reverting the first regression in the working tree, and I store them in the git stash. Then, during the second bisection, I only need to run git stash apply to make sure the first regression is removed. This is a nice property of git: if the commit with the first regression is present in the tested tree, git stash apply will remove its changes from the working tree. If it is not yet there in the tested commit, git stash apply will notice that the state of the modified files in the working tree and their state in the stash are the same, and will simply report that no file has been changed. So in both cases the command succeeds, and we can also add it to our test script to be prepared for finding more than one regression in a single release.

There might be even more complex issues than two regressions in a single release. Bisection might point us to a merge commit. This is a rather rare case, and it means that there are some non-trivial dependencies between the merged branches. An example of such an issue is the linux-next release from 30th June this year, where bisection points to a merge commit. This is how it looks on the graph: we have the merge commit reported by git bisect, and of course the linux-next release from that day is also bad, but both parents of that merge commit have been tested and they are good. How do we find which commit caused the regression? My approach in such a case is to rebase the topic branch, that is the branch that has been merged into the main tree, onto the last working commit from the main tree. This creates a few new commits; in this case there were 25 commits in the topic branch, and after rebasing they sit on top of the last good commit. This gives a linear history that can easily be bisected again. So I ran the bisection again and found the commit that is responsible for the regression. The hash ID reported by git bisect is not meaningful, because we rebased that commit, but we can easily find a commit with the same subject in the log of the original branch and get the real hash ID from there. I've double-checked that this commit is really responsible for the regression by reverting it on top of the mentioned merge commit, running the test, and then doing the same on top of linux-next. Then I reported that it is responsible for the regression.
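A sketch of that rebase trick; the hash and the testbot arguments are placeholders:

# The bad merge commit joins the main branch (first parent) and a topic branch
# (second parent), and both parents test as good on their own.
MERGE=abc1234                             # placeholder: merge commit reported by git bisect
git checkout -b topic-rebased $MERGE^2    # tip of the topic branch that was merged
git rebase --onto $MERGE^1 $(git merge-base $MERGE^1 $MERGE^2)
# The rebased history is linear: its tip is bad, the main-tree parent is good
git bisect start topic-rebased $MERGE^1
git bisect run ./testbot myboard exynos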
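And going back to the earlier two-regression case, the stash trick boils down to something like this; the release tag and hash are placeholders:

# Prepare the revert of the first found regression once and store it in the stash
git checkout next-20220413
git revert --no-edit def5678              # the first bad commit found earlier
git reset --mixed next-20220413           # keep the revert only as working-tree changes
git stash                                 # save those changes for reuse
# Then, in the test script driven by 'git bisect run', before each build:
git stash apply                           # removes the first regression if present,
                                          # otherwise changes nothing

Depending on how the script is written, the working tree probably also has to be cleaned up after each build (for example with git checkout -- .) so that the next bisection step can check out the following commit; that detail is an assumption here, not something covered in the talk.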
Another problem that might appear while looking for regressions is code compilation. Sometimes we find code that doesn't even compile. Even though maintainers try to keep the code building, there are a lot of cases where it's simply not possible, because no one compiles every commit for all possible architectures and configuration variants. To make sure that automated bisection works fine, I recommend making the test script abort if the code doesn't compile, to avoid reporting broken code as the cause of the tested regression. In such a case I try to manually see what the reason for the build break was and whether it can be easily fixed, or I use the git bisect skip approach, which instructs git to skip that commit without judging whether it is good or bad.

Now a little summary of my presentation. I've shown a few of my solutions for finding kernel regressions. I was really surprised how many issues can be found in linux-next releases, even though everyone tries to keep the code in good shape and avoid introducing regressions. It should be noted that a script doing a simple pattern search in the boot logs covers the vast majority of the issues I've reported. Here are some numerical results: in the 5.19 Linux release there are over 100 commits with my Reported-by tag and over 300 commits with my Tested-by tag. This means that sometimes developers don't add the Reported-by tag, or the fix came independently of my report. On the other hand, I try to test all the fixes posted by the authors or maintainers and give them my Tested-by tag; sometimes more than one patch gets the Tested-by tag if there have been a few versions of the fix.

With my approach I've also observed false positives, so we really should take care when reporting issues, especially when looking at the results of an automated bisection. The shared console for kernel logs and user space is a real source of false positives, because mixed output from the kernel and user space easily confuses the expect tool. My testing also relies heavily on USB devices: I use USB UART adapters, I use USB for controlling the relays, and I also use USB CDC Ethernet gadgets on some boards for data connectivity. This sometimes fails randomly, so if you want to make your farm bulletproof, don't use USB at all. This is known to everyone who has built their own farm, and they will tell you the same. Another important thing is that those four kernel configuration options listed on the slide allowed me to find a lot of issues. They are enabled almost only in exynos_defconfig; other kernel configuration files don't include them. So if you want to test, please enable them in your configurations.

To wrap up: there is still a bit of manual work in testing a given linux-next release. However, analyzing such findings and bisecting regressions is a nice hobby and a really nice mental exercise, especially as it happens in the background, besides my day-to-day tasks. What is also important is that I'm not alone in testing the Linux kernel; there are others. The best known one is the KernelCI project. It is used during the linux-next preparation, and on the linux-next mailing list we see reports from that project. The other very well known one is the kernel test robot, which is part of Intel's 0-day kernel test service; there are lots of reports from that system. From my point of view, what is really important to know is that you will never be faster than any of those robots. They are very, very good at finding compile-time regressions or simple regressions observed on QEMU, and they usually act a few seconds after the code has been pushed to the public git servers. They are very, very fast. There are others as well, so you can look them up. If there are any questions, let me know by email. Thank you for your attention.