How to report and handle Linux kernel regressions: that's what this talk is about. In other words, it's a talk for people who report regressions, and also for developers who need to handle them. Normally it's better to avoid this kind of mixing, but it's a narrow topic, mixing is better than nothing, and there are many things both sides need to understand anyway. In the end, things boil down to one important, common aspect: regressions shall be fixed. That's because this man, you might have heard of him, is like most of us: he hates it when he installs updates and afterwards something that used to work works worse, and "works worse" includes "doesn't work at all". He hates it because people then stay on outdated versions that have known problems and are less secure or even vulnerable. And, as it happens, that man created a well-known piece of software; I don't know if you have heard of this Linux thing. He coined a rule for its development, the "no regressions" rule, also known as "we don't break user space". It's a really hard rule, and Linux is pretty rigid in that respect: even important fixes are reverted if it turns out they cause regressions. To explain it a bit more, assume some totally buggy behavior in the Linux kernel gets fixed, and assume further that afterwards Firefox, for example, starts to misbehave for one or multiple users. If the kernel fix can't be adjusted to avoid that, it's reverted, even if Firefox gets fixed, because people might use older Firefox versions for quite a few years. The buggy behavior then stays in the kernel, maybe for a year, maybe for five years, until all the affected Firefox versions are really out of use, as long as no other fix can be found in the kernel.
So in the card game, a regression is basically one of the highest cards, the second highest to be precise: it beats nearly everything else. Nevertheless, if you look around, it might look like quite a few reported regressions are never addressed. There are various reasons for that, which you see if you look closer: some aren't addressed simply because the report was bad, some because it in reality was a bug and not a regression, and some because nobody located the change causing it. And sometimes it also happens that developers don't handle things appropriately, which they really want to avoid, because otherwise Linus might send them angry emails if he notices. I'm here to help avoid these situations, and to help users get their regressions fixed. So I'll show you how to get a regression fixed if you face one and are not a developer, and how to handle one appropriately as a developer if you caused it. Both obviously require you to know what exactly a regression is, so let's get down to this. A regression is when a kernel update breaks something, or, to quote the Linux kernel docs more precisely: it's a regression if something running fine with one Linux kernel works worse or not at all with a newer version, and now comes a small but important aspect, compiled using a similar configuration. To understand that fully, let's look at an example. Say your distro updated from 6.1 to 6.2, like Arch, Fedora, or openSUSE Tumbleweed regularly do every nine or ten weeks. After that jump, your beloved software from maybe 20 years ago stops working, and it starts to work again if you switch back to Linux 6.1. So yes, that's a regression; the age of the software doesn't matter. But as I said, there's something else that matters.
It's only a regression as long as it's not caused by an optional new feature; the crucial word here is "optional". That complicates things a little, but it's not a loophole. It's important to allow progress, for example to allow introducing new security hardening techniques that break ancient apps. It's a really good and important thing, because without it Linux would become stuck and obsolete, as it couldn't introduce new features that break things. In the end, the users and distributors have to choose what's most important to them, compatibility with old apps or the new feature, as they have to explicitly enable these features. Sometimes you do that at runtime, via sysfs or procfs or something like that, but that doesn't happen that often. More often you have to enable these features at build time, when defining the build configuration for your kernel, and new features that might cause regressions are disabled by default. So if you take the configuration from an old kernel, say 6.1, and use make olddefconfig to create a configuration for 6.2, those features should be disabled. That's how you build a kernel with a similar configuration. Luckily, the features known to cause regressions are few; it's not something you should lose your mind over. But it's something to keep in mind, especially for users of distribution kernels, because a deliberate config change by your distro might have broken your beloved app from 20 years ago. That's why you might need to report the regression to your distributor. Sadly, that often isn't really fruitful, because they get lots of bug reports and can't look into each of them. So you might want to report it upstream.
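The definition above can be condensed into a few yes/no questions. The following Python sketch is only a mnemonic under simplifying assumptions: the Observation fields and the classify function are hypothetical names, and real cases (like the exceptions discussed later in this talk) are judged by humans on a case-by-case basis.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # All fields are hypothetical, chosen to mirror the definition in the talk.
    worked_with_older: bool         # something ran fine with the older kernel
    fine_with_newer: bool           # it still works just as well with the newer kernel
    similar_config: bool            # newer kernel configured from the old config (e.g. make olddefconfig)
    optional_feature_enabled: bool  # a new opt-in feature was enabled at build time or via sysfs/procfs


def classify(obs: Observation) -> str:
    """Rough mnemonic for the 'no regressions' definition; not an official rule."""
    if not obs.worked_with_older:
        return "bug"  # it never worked, so nothing regressed
    if obs.fine_with_newer:
        return "no problem"
    if obs.optional_feature_enabled:
        return "not a regression: caused by an opt-in feature"
    if not obs.similar_config:
        return "unclear: recheck with a similar configuration"
    return "regression"


print(classify(Observation(True, False, True, False)))  # regression
```

The order of the checks mirrors the talk: first establish that something actually broke, then rule out opt-in features and configuration differences before calling it a regression.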
If you want to do that, you basically should try to compile your own kernel: build it yourself and check if the problem happens there. So boot the working kernel, download the latest kernel sources, ideally mainline, to have something really fresh, as that's where all regressions are fixed first, and configure it using make olddefconfig, which will normally pick up your distribution kernel's config. Then build the kernel and test it; if it's broken, it's really a regression. In that case, you should also compile the working kernel yourself, because something in that kernel might be what makes it work: if you use a patched distribution kernel, working might be due to a patch the distro applied, and the Linux kernel developers only care about their kernel. That's why you should rebuild the working kernel to ensure it really works in a vanilla fashion, unless your Linux distribution uses a vanilla or close-to-vanilla kernel anyway. Yes, building a kernel takes time, but no, it's not that hard, and there's a document that might soon get into the kernel to make it easier for everyone; it's currently in the submission phase. Building can also save you some time: if, for example, you build the latest mainline kernel and everything works there, you notice the problem was already fixed, and you won't write a lengthy report and bother all the developers with a bug that was already fixed. And when it comes to regressions, you often need to compile kernels in the end anyway. But I'm getting ahead of myself; that's something for the next topic, which is that somebody must locate which change causes the problem. With the Linux kernel, that's basically like upgrades in real life.
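The build steps just described could look roughly like the following. This is a minimal Python sketch that only assembles the shell commands as strings and runs nothing; the source directory, the job count, and the explicit copy from /boot/config-$(uname -r) are assumptions, as not every distribution stores its kernel config there (some provide /proc/config.gz instead). The make targets olddefconfig, localmodconfig, modules_install, and install are standard kernel build targets; the optional localmodconfig step is one way to get a configuration tailored to your system, which shortens builds.

```python
from typing import List

MAINLINE_REPO = "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git"


def build_plan(src_dir: str, jobs: int = 8, tailored: bool = False) -> List[str]:
    """Return shell commands for building a fresh mainline kernel.

    Assumes you are currently running the *working* distro kernel, so that
    /boot/config-$(uname -r) holds its configuration (paths vary by distro).
    """
    plan = [
        f"git clone {MAINLINE_REPO} {src_dir}",
        f"cd {src_dir}",
        # Start from the running kernel's config to get a "similar configuration".
        "cp /boot/config-$(uname -r) .config",
    ]
    if tailored:
        # Optional: disable modules that are not currently loaded to speed up builds.
        plan.append("yes '' | make localmodconfig")
    plan += [
        # Keep existing choices; new options (including opt-in features) get their defaults.
        "make olddefconfig",
        f"make -j{jobs}",
        "sudo make modules_install",
        "sudo make install",
    ]
    return plan


for cmd in build_plan("linux", jobs=4):
    print(cmd)
```

To rebuild the working kernel in a vanilla fashion, the same steps would apply after checking out the corresponding release tag in the clone (for example git checkout v6.1).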
Say you pay somebody to upgrade your laptop, with a bigger hard drive or something like that, and afterwards something unrelated, maybe USB, is broken. What do you do? The person or company you paid has to fix things; that's how it is in life. The basic idea is similar with Linux: the developer who caused the regression needs to fix it, or, if the developer is not available, their superior. The thing is, you didn't pay anyone, and you don't know which developer caused it. That complicates things a little, but it's also something you shouldn't lose your mind over initially. It's fine to report a regression without knowing which change caused it. Just report it, and with a bit of luck somebody will know which change is causing it, maybe because the root cause is already known or somebody is even already working on a fix, or because somebody can point you in the direction of a likely culprit. Those somebodies are the maintainers and mailing list subscribers, so other developers, as maintainers and developers are expected to assist with regression reports where somebody says "hey, this stopped working", at least by thinking about whether it's something they might have caused or whether they know where it comes from. This often works out. But at the same time, it often doesn't. Sometimes nobody really has an idea what's wrong, especially with complicated problems where it's not even known which subsystem is causing them. Then somebody else has to locate the culprit. But who? Linus can't take care of each and every bug report; there are way too many of them. The developers and maintainers, as I said, are expected to help somewhat, but in the end they are just volunteers, doing most of the work in their spare time or in the time slot their employer gave them.
And there's always way more work than they can do in that time. They can't spend hours on each and every bug report, because they wouldn't get any other work done. And you can't tell companies to do the work: do you want to tell AMD or Intel they have to look after every bug in the arch directory, or everything that has to do with ACPI, when in the end it might turn out it was caused by a change from an ARM or IBM developer? Those companies can't do that. So in the end, you get what you paid for, and you didn't pay the developers anything. That's why it's in the end your job as a reporter to find the culprit. That's often needed anyway, because many regressions only trigger in a certain environment: with specific hardware, sometimes only with a specific firmware, sometimes with a combination of hardware and distribution; sometimes the configuration is involved, or an application that might be proprietary, so other developers might not have it at hand or can't even install it. That's why you as a reporter really have to find the change that's causing it. The good news is that you don't need to be a developer to find the culprit, which brings us to the second reason why you often need to compile a kernel when it comes to regressions, or in fact a few kernels, as a culprit often can be found by compiling around 15 kernels. Don't be scared: that sounds like a lot, but it's a lot easier and often works a lot quicker than it sounds, thanks to git bisect, a feature integrated in Git. Say you have kernel 6.1.13 and update to 6.1.14, which turns out to be broken somehow. Now you go back to the older one and recheck that going back makes things work again.
Yes, that one is good. Now you look closer with Git or something else, and you notice there are just eight changes in between. What a bisection does is basically this: you jump into the middle of that range, build that kernel, and check if it's broken or not. Let's assume it's also bad. Then you go further down in the history, in this case to the left, and repeat the exercise. Let's assume that one is good. Then you do the same in the remaining span, to the right, and test again. Let's assume that one is also good. Then you know the fifth commit is the culprit causing the regression. It works like that on a much larger scale as well: as I said, it's typically around 15 steps between two kernel versions; if the span is a bit bigger, like two or three versions, it might get up to 20 or 25. It always depends a bit on how many changes there are in the range. And once you know the culprit, it's clear who's responsible: either the author of the change or the committer who applied it. Normally it's the author; the committer or maintainer who applied it is just a backup. The even better news is that once you know the culprit, it's often possible to fix things relatively quickly, sometimes within a day or two, as the authors and maintainers can often quickly spot and fix the problem. And if that's not possible, which is often the case with more complicated problems, it's often possible to fix it by simply reverting the change, taking it out again. The change can then be analyzed further offline, when nobody is bothered by the regression, and once a fix for the root cause is found, it can be reapplied later. Then things move on as planned.
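In practice, git bisect does this bookkeeping for you: you run git bisect start, mark the known versions with git bisect good and git bisect bad, then keep building, testing, and answering git bisect good or git bisect bad until Git names the first bad commit, and finish with git bisect reset. To show why so few builds are needed, here is a minimal Python simulation of the halving; the commit names and the is_bad test are hypothetical stand-ins for building and booting a kernel.

```python
from typing import Callable, List, Tuple


def bisect_first_bad(commits: List[str], is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Find the first bad commit by repeatedly halving the range.

    `commits` is ordered oldest to newest and contains only the changes after the
    last known-good kernel; the final entry is known to be bad. Returns the first
    bad commit and how many kernels had to be built and tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad, everything before lo is good
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1  # one probe = build, boot, and test one kernel
        if is_bad(commits[mid]):
            hi = mid  # culprit is mid or an earlier commit
        else:
            lo = mid + 1  # culprit comes after mid
    return commits[lo], builds


# Eight changes between the good and the bad kernel; pretend c5 broke things.
changes = [f"c{i}" for i in range(1, 9)]
culprit, builds = bisect_first_bad(changes, lambda c: int(c[1:]) >= 5)
print(culprit, builds)  # c5 3
```

The probe order differs slightly from the walk-through above because of how the midpoint is rounded, but the outcome is the same: three test builds suffice for eight changes.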
And this reverting and reapplying is something that, in my humble opinion, should happen more often. So developers, take note: if you have a regression, always consider a revert, because it's often a quick way to fix a regression, unless of course the revert causes another regression or is really complicated. As I said, if there's no way to quickly fix an issue, it's often a good idea to revert. That's also why Linus sometimes quickly reverts culprits himself if no fix is forthcoming, especially if he's running into a regression himself. And that's why bisection is so important: you know who's responsible, and the code change causing the problem is identified and small in scale, which makes it much easier to find a fix. That's why, as a reporter, you really want to perform a bisection if the initial report didn't get anyone to say "hey, this problem is already fixed": the bisection hands you the lever to get most regressions fixed quickly, and it gives developers what they need to help you. Yes, it's some effort, but it's well spent, and that's also why developers will ask you to do it: as a reporter, that's expected from you. In the end, bisection is the second most important aspect when it comes to regressions, likely the most complicated one, and maybe the one that takes the most time. But as I said, it doesn't take that long: on a relatively modern machine, building these 15 kernels can sometimes be done in two or three hours if you keep an eye on things and use a configuration tailored to your system, because then building a kernel maybe takes only 10 or 15 minutes, and rebooting and testing is quick. But the most important aspect when it comes to regressions is the next one.
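The "around 15 kernels in two or three hours" estimate is simple arithmetic: each bisection step halves the range, so about log2 of the number of changes steps are needed, each costing one build plus one reboot-and-test cycle. A small Python sketch, where the commit count (assumed to be roughly the size of one mainline development cycle) and the per-step minutes are assumptions to replace with your own numbers:

```python
import math
from typing import Tuple


def estimate_bisection(commit_count: int, build_min: float, test_min: float) -> Tuple[int, float]:
    """Return (worst-case bisection steps, total hours) for a range of changes."""
    steps = math.ceil(math.log2(commit_count)) if commit_count > 1 else 0
    hours = steps * (build_min + test_min) / 60
    return steps, hours


# Assumptions: ~15,000 changes between two releases, a 10-minute build with a
# tailored config, and 2 minutes to reboot and test.
steps, hours = estimate_bisection(15_000, build_min=10, test_min=2)
print(f"{steps} steps, about {hours:.1f} hours")  # 14 steps, about 2.8 hours
```

Real bisections can need more steps than this worst case suggests, for example when some commits cannot be built or tested and have to be skipped.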
It's relevant for all bugs and actually quite simple: you want to ensure the problem is reported appropriately. There's a simple reason for that: an improperly reported regression might not be fixed. That doesn't happen on purpose; it's simply what happens if a report is not seen by the right people. That's how it is in life: you can't expect developers to look everywhere, so you have to send reports their way. That's why you really have to ensure a regression is reported appropriately, and also in the right tone. It doesn't happen often that regressions remain unfixed because they were improperly reported, but it happens. And in the end, following the procedures properly when reporting a regression is also in your own interest, because it prevents you from wasting your own time, not only the kernel developers' time: as I said, you for example want to test the latest mainline in case a fix was already found and applied, just maybe not backported yet to the latest stable version. The proper instructions take care of that, and they are the regular bug reporting guide in the Linux kernel, the reporting-issues document. I wrote it about two years ago; it looks really long and a bit scary, but there's a step-by-step guide that helps you get through it. You're free to ignore some of the steps if you think you don't need them, but it's a good idea to at least think about them, because the guide tries to catch a few local problems early, so you can fix them before you spend time on a report that in the end gets ignored. There's also a reference section that explains details if you need them. We actually had a dedicated mentorship session about that in December, and the link, yeah, there it is.
Thanks, Shuah. The video and slides are available there; look into it. It will tell you what you need to do when reporting an issue and will help you in a lot of cases. But to be sure, let me briefly mention a few crucial points here. First, as I already mentioned, ensure your kernel is vanilla or at least close to it. In the case of regressions, it's especially important that both the working and the broken kernel are vanilla, because a distro kernel patch might be the reason why something worked or broke. Say, for example, Ubuntu, which carries a lot of external patches and drivers in its kernel, applies a patch the upstream kernel developers refused, and maybe drops it again sooner or later. That patch is missing in the upstream kernel you get from kernel.org, so if you switch from the Ubuntu kernel to an upstream kernel, the functionality provided by that patch will obviously be missing. That's not a regression the kernel developers can do anything about, because it only worked because Ubuntu did something, and there's likely a reason why this change is only in Ubuntu and not upstream; maybe the code quality is not good enough. Ubuntu was just an example here; many other distributions patch their kernels heavily, especially the enterprise Linux distributions, SUSE's for example. Most community distros are not that bad: Arch Linux, openSUSE Tumbleweed, Fedora, and Debian, for example, often use kernels that are quite close to upstream, and you can report bugs with those. But if in doubt, better check that it's not something their patches caused, because, as I said, working or broken might depend on distro modifications. That's also something for the developers among you.
And that's also why it's okay to ask reporters to recheck with a vanilla kernel if they reported a problem with a distro kernel, because the kernel developers only care about the upstream kernel. If there's a report with a downstream kernel, either the reporter has to check with a vanilla kernel or report the problem to the distro so its developers look into it. Which brings me to the second point: to ensure your report is appropriate, test with a really fresh kernel. That's in your own interest, as the regression might be fixed already, hence ideally test mainline, or at least the latest stable. That's especially true for stable and longterm series, because some regression fixes are not even backported there. Normally they should be, but sometimes it's hard to backport regression fixes. So you really should test with a fresh kernel, and developers, it's okay to ask reporters to do that; it can be expected of them, as long as the kernel in question is relatively fresh. As a reporter, you also should do some basic checks to ensure the kernel and system integrity is fine, because the regression might be caused by a local problem or something you did locally. One thing you really should check is the taint flag. For example, if you loaded the NVIDIA proprietary drivers, they can change the kernel in many, many ways, and that can cause regressions that look totally unrelated to the driver but in the end are caused by it. That's why you really shouldn't have those drivers loaded, and that's also why it's okay for developers to tell you "please recheck with an untainted kernel, otherwise I'll ignore the regression". Also, if you have a file system issue, run a file system check and things like that, ensure you're not overclocking, and so on.
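The taint status is exposed as a bitmask in /proc/sys/kernel/tainted, where 0 means the kernel is not tainted. The following Python sketch decodes that value; the letter table for bits 0 to 18 is assumed to match the kernel's "Tainted kernels" admin-guide page, so double-check it against the documentation of the kernel you run, as newer kernels may add bits.

```python
import os
from typing import List, Tuple

TAINTED_FILE = "/proc/sys/kernel/tainted"

# Bit number -> (letter, meaning); assumed to match the "Tainted kernels" documentation.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    1: ("F", "module was force loaded"),
    2: ("S", "kernel running on an out of specification system"),
    3: ("R", "module was force unloaded"),
    4: ("M", "processor reported a Machine Check Exception"),
    5: ("B", "bad page referenced or unexpected page flags"),
    6: ("U", "taint requested by userspace application"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    8: ("A", "ACPI table overridden by user"),
    9: ("W", "kernel issued warning"),
    10: ("C", "staging driver was loaded"),
    11: ("I", "workaround for bug in platform firmware applied"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
    14: ("L", "soft lockup occurred"),
    15: ("K", "kernel has been live patched"),
    16: ("X", "auxiliary taint, defined for and used by distros"),
    17: ("T", "kernel was built with the struct randomization plugin"),
    18: ("N", "an in-kernel test has been run"),
}


def decode_taint(value: int) -> List[Tuple[int, str, str]]:
    """Split a taint bitmask into (bit, letter, meaning) entries."""
    found = []
    bit = 0
    while value >> bit:
        if (value >> bit) & 1:
            letter, meaning = TAINT_FLAGS.get(bit, ("?", "unknown to this table"))
            found.append((bit, letter, meaning))
        bit += 1
    return found


def read_taint(path: str = TAINTED_FILE) -> int:
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    if os.path.exists(TAINTED_FILE):
        value = read_taint()
        if value == 0:
            print("Kernel is not tainted.")
        for bit, letter, meaning in decode_taint(value):
            print(f"bit {bit:2} ({letter}): {meaning}")
    else:
        print(f"{TAINTED_FILE} not found; is this a Linux system?")
```

A value of 4097, for example, decodes to P and O: a proprietary module and an out-of-tree module were loaded, which is typical when the NVIDIA proprietary driver is in use.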
As I already mentioned, you need to ensure the report is submitted to the right place, because otherwise the responsible people might not see it. And a big fat warning here: the kernel's bugzilla is the wrong place most of the time, because it has a kind of semi-official status. A few subsystems look after bugzilla and handle bug and regression reports there, but most kernel developers don't look there. These days I look at bugzilla a little, at the new reports coming in every day, and if one of them is a regression, I often forward it to the developers to make sure they see it. But you shouldn't rely on that; better report the problem directly where the developers expect it. That's explained in more detail in the other mentorship session I already mentioned, and in the reporting-issues document. The fifth aspect that's important when reporting a regression: make sure to depict the problem adequately, because otherwise developers might miss that it's a regression. That sounds simple, but as you can imagine, as a kind of regression tracker I get a lot of regression reports before my eyes, and I often look at them wondering: is this really a regression or not? Some reports are so complicated that it's hard to categorize them and make sure they are really handled.
Depicting the problem properly involves various things. One is to ideally tag the report's subject to make it really obvious that it's a regression, for example with a tag like [REGRESSION] in the subject line. Another: if you have a regression, CC the regressions mailing list. It's relatively new, we created it one or two years ago, and everybody who considers regressions important will notice reports there; I, for example, monitor this list quite closely and sometimes handle things there. The next thing you really should do in the report is to mention the last working version and the first broken one, and ideally also other things, like whether the kernel is really vanilla and what its taint status is, because that avoids doubts from developers that the report might not be valid. For example, if NVIDIA is mentioned somewhere, some developers might assume you're running the NVIDIA proprietary drivers, so really make sure to say the taint status is fine. And, as I already mentioned, either bisect the problem, or, if you haven't done so, simply offer to do a bisection, because then the developers notice this person means it seriously and isn't somebody who just makes a lot of noise and doesn't want to help. Those are the important things with regard to regressions. They are all covered in the reporting-issues document I already mentioned. Nevertheless, there's a dedicated document that goes into more detail, called reporting-regressions.rst, and there's also a web view of it. Shuah, could you please share the URL? It explains a few of the things I already mentioned in more detail and mentions a few others. As I said, tagging your report to make it obvious is one of those things.
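Put together, the essentials of a regression report mentioned so far could be assembled like this. The Python sketch is an illustration only: the function name, the wording of each line, and the example component and versions are hypothetical. The [REGRESSION] subject tag follows the advice above, regressions@lists.linux.dev is the regressions mailing list, and the "#regzbot introduced:" line is the regzbot command described in reporting-regressions.rst, whose exact syntax is worth checking there. The main recipients still depend on the subsystem, as the reporting-issues document explains.

```python
from typing import Optional, Tuple

REGRESSIONS_LIST = "regressions@lists.linux.dev"  # CC this in addition to the subsystem's lists


def regression_report(component: str, summary: str, last_good: str, first_bad: str,
                      taint: int, bisected_to: Optional[str] = None) -> Tuple[str, str]:
    """Return (subject, body) for a hypothetical regression report."""
    subject = f"[REGRESSION] {component}: {summary}"
    body = [
        f"Last working version: {last_good}",
        f"First broken version: {first_bad}",
        "Both kernels were built from vanilla kernel.org sources.",
        f"Taint status: {taint} ({'not tainted' if taint == 0 else 'tainted'})",
    ]
    if bisected_to:
        body.append(f"Bisected to: {bisected_to}")
    else:
        body.append("I have not bisected this yet, but I'm happy to do so.")
    # Lets regzbot track the report; the first version worked, the second was broken.
    body += ["", f"#regzbot introduced: {last_good}..{first_bad}"]
    return subject, "\n".join(body)


subject, body = regression_report("usb", "hypothetical example: devices no longer detected",
                                  "v6.1", "v6.2", taint=0)
print("Cc:", REGRESSIONS_LIST)
print("Subject:", subject)
print(body)
```

Keeping the facts developers look for first (versions, vanilla, taint, bisection status) at the top of the body makes the report easy to categorize.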
CCing the regressions mailing list is another, and there's a third thing that's important for my work as Linux kernel regression tracker: include a paragraph like this with "#regzbot introduced" and the version range where the regression started. The first version is the last working one, the second the first broken one. This line makes regzbot, the Linux kernel regression tracking bot, track the regression. We'll get into my work with this regression tracker, and regression tracking in general, a bit more later. But as I said, this is optional. It's in your interest to do this to ensure the regression is really tracked, because then I'll help to ensure it's fixed. But if you don't, I'll take care of it, as long as you CC the regressions mailing list, because that ensures I'm aware of the report. The document also explains a lot of other things you might be interested in. For example: is it a regression if a new kernel works slower or consumes more energy? The short answer is yes, that's a regression, at least if it's significant. If something complex that normally takes two hours suddenly takes 30 seconds longer, that's not significant. But if something that used to take one minute suddenly takes one and a half, that's definitely a regression. Another example explained there: is it a regression if an external kernel module breaks when updating the Linux kernel? No, it's not, because the kernel developers don't care about external code. The short version is that those modules dock onto interfaces that are considered internal. If you want to avoid that, bring these kernel modules upstream into the Linux kernel; then it won't happen. And if a proprietary driver like NVIDIA's breaks, the kernel developers normally won't do anything about it, and NVIDIA has to fix it. There are exceptions, but they are rare.
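Whether a slowdown is "significant" is a judgment call, and the talk gives no numeric threshold, so none is assumed here. What the two examples above do show is that the relative change matters more than the absolute one; a tiny Python helper makes that arithmetic explicit:

```python
def relative_slowdown_percent(old_seconds: float, new_seconds: float) -> float:
    """How much longer the new kernel takes, in percent of the old duration."""
    if old_seconds <= 0:
        raise ValueError("old duration must be positive")
    return (new_seconds - old_seconds) / old_seconds * 100


# Two hours plus 30 seconds vs. one minute becoming one and a half minutes.
print(f"{relative_slowdown_percent(2 * 60 * 60, 2 * 60 * 60 + 30):.2f}%")  # 0.42%
print(f"{relative_slowdown_percent(60, 90):.2f}%")  # 50.00%
```

The same 30 seconds is well under half a percent in the first case and half of the original runtime in the second.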
Another thing covered there: is it a regression if some test script or CI finds an API or ABI change? A lot of people will likely assume yes, but in the end it's often not a regression if it's really just a script that checks that the interface remains unchanged. If a real application, some practical use case, breaks due to the change, then of course it's a regression. So theoretical changes, small changes in the userland interface that no app cares about, are not regressions. Another question: does the no-regressions rule apply if I seem to be the only person affected? Yes, almost always. There are exceptions, as things in life are often a bit more complicated: for example, with 10- or 15-year-old hardware that nobody really uses anymore, you might be asked to let it slip, but that's discussed on a case-by-case basis. Normally, it's a regression that should be fixed. Also: does the no-regressions rule apply to code in the staging area as well? There's no easy answer to that. Officially, staging is not covered by the no-regressions rule; at least the Kconfig help text says so. On the other hand, the kernel developers try to ensure no regressions happen there. Things get a bit complicated if a proper driver is developed and introduced and the staging driver dropped: if the new proper driver doesn't support all the features the staging driver handled, you might be in a problematic situation that's not considered a regression, and then you're out of luck. But developers normally try to avoid that, so don't lose sleep over it. What happens if fixing a regression is impossible without causing another? That's also a complicated situation, and in the end it's decided on a case-by-case basis how to handle it.
In the end, it's often the lesser evil, or the most recent change, that's reverted, but it has to be dealt with case by case; there are no universal answers. Also: is it a regression if some feature I relied on was removed months ago? It kind of is, but you might nevertheless be unlucky, because it might simply be too late to fix. That's especially the case for code dealing with old and outdated hardware, where a driver or some other code was removed months ago to clean up the kernel. That doesn't happen often, but it does happen. If the removal happened a year ago, there were likely cleanups or new features afterwards that built upon the cleaned-up version, so it might be extremely complicated to bring the removed code back, especially as that might then cause other regressions. In the end, that's again a complicated situation that needs to be dealt with case by case. If you want to ensure this doesn't happen to anything crucial for you, you really need to test the latest mainline kernel regularly, so basically move to each new release every nine or ten weeks and test that everything's still there, to ensure you notice quickly if something breaks or gets removed. If you only notice it after half a year, a year, or two, it might simply be impossible to fix, and you're out of luck. But it's also something you normally shouldn't worry about unless you're using something very odd, for example hardware from 10 or 15 years ago. Everything we covered so far is relevant for developers as well, but there's something more for them: a dedicated document I wrote for them, which some users will find interesting as well. It's called handling-regressions, it's in the process directory, and it looks like this. It also covers a few things we already discussed.
For example, as already mentioned, developers should also CC the regressions mailing list to make sure everybody subscribed there sees the report, and ideally also tell regzbot about it if the reporter didn't, so the regression is tracked. If you then forget to fix the regression, you will get a friendly reminder from me, which is likely a lot friendlier than the mails Linus might send if you fail to fix a regression properly. But the most important thing for developers is this: when fixing a regression, point to the report using a Link: tag that looks like this. If you have a regression, you normally have a reporter, whom you credit with a Reported-by: tag, and directly below that you put a link to the report, in the archives on lore.kernel.org or in bugzilla, because that makes things easy for future code archaeologists. For example, if the fix causes another regression later, which might happen half a year or two years later, people can go back and check what the fix was supposed to fix, and the link makes it easy to look into the backstory. That's why Linus wants these links and tells developers about them every now and then, why checkpatch.pl these days tells you to add such a link if you use the Reported-by: tag, and why the kernel docs have asked for these links for a while already. But not everybody does it. This connection to the report is also what the regression tracking I'm doing relies on to connect fixes with reports, but we'll get to that later. Another thing: as a developer, you should fix regressions as quickly as possible.
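For the Link: tag described above, the trailer lines of a regression fix could look like the output of this Python sketch. The reporter's name, address, and message ID are hypothetical placeholders; https://lore.kernel.org/r/ followed by a message ID is the short form for pointing at a message in the lore archives. Recent kernel documentation and checkpatch.pl may prefer a different tag for links to the report a fix resolves (Closes:), so check the current submitting-patches documentation before relying on this exact form.

```python
def lore_link(message_id: str) -> str:
    """Turn a Message-ID like <abc@example.org> into a lore.kernel.org short link."""
    return f"https://lore.kernel.org/r/{message_id.strip().strip('<>')}/"


def fix_trailers(reporter: str, reporter_email: str, report_url: str) -> str:
    """Reported-by: credits the reporter; the Link: directly below points to the report."""
    if not report_url.startswith("https://"):
        raise ValueError("expected a URL to the report, e.g. on lore.kernel.org or bugzilla")
    return "\n".join([
        f"Reported-by: {reporter} <{reporter_email}>",
        f"Link: {report_url}",
    ])


# Hypothetical reporter and message ID, for illustration only.
url = lore_link("<hypothetical-message-id@example.org>")
print(fix_trailers("Jane Doe", "jane@example.org", url))
```

Generating the link from the report's Message-ID rather than typing it by hand avoids broken links, which would defeat the purpose for future code archaeologists and for regression tracking.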
Obviously, "quickly" is hard to quantify. For most bisected regressions the goal is to fix them within two weeks, and that includes the time needed to get the change into mainline. So that's already quite an ambitious target. And obviously this two-week target, and all the other time frames I mention, are only relevant if the reporter bisected the regression. If that didn't happen, well, then you're out of luck: ideally developers should fix those within that time as well, but without a bisection there's no guarantee that somebody takes responsibility. Many regressions should actually be fixed within one week, for example if the bisected culprit was in the latest proper mainline release, because otherwise people might be stuck on outdated releases or unable to go back. And if it's a regression that affects many users or is critical for some reason, then you should try to fix it within two or three days. Yes, as I mentioned, those are ambitious targets, and as they are only targets, nobody will come after you with a stick. But you really should try to meet them, because there are reasons why these targets were written down, and those are explained there as well. For example, think of users whose distro switched from 6.1 to 6.2, where the distro, like Fedora, doesn't continue supporting 6.1 afterwards. Then people have no option to go back to a working kernel, because it's either not available as a prepackaged kernel they can easily install, or the upstream kernel developers have already declared it end of life and no longer provide stable updates or security fixes for it. Then the users are out of luck and can only use a broken kernel or none at all, and that's something that should be avoided. See "Handling regressions" for details.
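The time frames above form a simple decision rule. The following Python sketch encodes them as stated in this talk; the `Regression` fields and the `target_fix_time` helper are hypothetical, and the "Handling regressions" document remains the authoritative source for the exact wording and nuances.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Regression:
    # Hypothetical fields, chosen to mirror the conditions mentioned in the talk.
    bisected: bool
    culprit_in_latest_release: bool = False
    critical_or_widespread: bool = False


def target_fix_time(reg: Regression) -> Optional[timedelta]:
    """Return the rough target time for getting a fix into mainline, per the talk."""
    if not reg.bisected:
        # Without a bisection nobody is clearly responsible, so there is no firm target.
        return None
    if reg.critical_or_widespread:
        # "Two or three days"; the upper end of that range is used here.
        return timedelta(days=3)
    if reg.culprit_in_latest_release:
        # Culprit in the latest mainline release: roughly one week.
        return timedelta(weeks=1)
    # Default target for bisected regressions, including time to reach mainline.
    return timedelta(weeks=2)


if __name__ == "__main__":
    print(target_fix_time(Regression(bisected=True)))
    print(target_fix_time(Regression(bisected=True, critical_or_widespread=True)))
```

The checks run from most to least urgent, so a regression that is both critical and from the latest release gets the shorter target.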
That document has all the details and mentions, for example, that you should always consider reverting, and that to reach those time frames you sometimes need to poke the maintainers to make sure they review patches quickly. Sometimes maintainers also need to send their changes to Linus more often, and sometimes it's even the right thing to send regression fixes directly to Linus; I can help there. Obviously Linus prefers things to go through the proper channels, but sometimes it's really best if the regression fix goes straight to him. For example, if the subsystem maintainer is on holiday, or just at a conference for two or three days, then it might simply take too long to get the fix into mainline. Then ask me to ask Linus, because that way you won't get into any trouble with your top-level maintainer, as it's me who asks Linus to pull that patch. Just like "Reporting regressions", that document also contains a few other things you developers might find interesting, like lots of quotes from Linus on handling regressions, many of them from occasions when he told developers how they are supposed to handle regressions, because that's something a lot of people got wrong. Reading those quotes should help you understand things properly. If you take what is written there to heart, you will likely never get an angry mail from him, which might happen if you don't handle a regression properly.
It also describes how to deal with changes that carry a known regression risk: it's totally fine to get them into the kernel and just warn everybody, so that if the change actually causes a regression, I'm aware of it and can point people in your direction. It also covers whether the regression tracker should be involved in each and every regression: ideally yes, but it doesn't always have to be the case, because for some regressions that would just be overhead that's not worth it, for example if it's a simple regression that gets fixed quickly. That page also tells you how to interact with the regression tracking bot, regzbot, which brings us to the last section: regression tracking itself. That's what I'm doing these days with the help of this regression tracking bot. So why am I doing it? I'm trying to help Linus do a better job by making him aware of regressions, so he can decide: do I want to release the final version, or does it need another week of development? Otherwise he has no insight into how many regressions are still open. That's why I write weekly regression reports, typically shortly before he releases a new RC or final release, because that way he can still pick up patches that are floating around and pull them in directly to make sure an issue is fixed before the next RC or final release gets out. That's one of the reasons why I'm doing regression tracking. The other is that I'm trying to ensure that the no regressions rule is not a hollow promise: we are all just human, things can easily fall through the cracks, most of us are stressed and overloaded with work these days, and something like regzbot can help make sure nothing gets forgotten. Also, reporters often fear asking for status updates, and I can help them deal with the developers, who might look scary to people that are
not used to dealing with Linux kernel developers directly. So I'm a kind of middleman there who helps ensure their regressions really get fixed. If you have any problems with a regression, you can always send me a mail, either on list or off list, and ask for help. As I said, I do this regression tracking work with the help of the regression tracking bot regzbot, which has a really ugly web interface. It's also a static web interface: basically just a page that's compiled and lists all the latest regressions, with more details you can look up if you push a small button there. It lists everything that's currently tracked, and the weekly reports I mentioned are generated from the same data. Now you might wonder: why don't the kernel developers use bugzilla or something similar, like every other project does? The thing is, classic bug trackers don't really fit the Linux kernel developers' mail-based workflow, because kernel developers do basically everything by mail, and going to some web interface is a lot of overhead that many developers hate. That's also why, when the kernel's bugzilla was set up, many developers said right from the start: no, I don't want to deal with it. And that was okay, because they are volunteers and couldn't be forced to. That's how bugzilla ended up in its semi-official state. A middleman was supposed to help there, but that never really worked out. To prevent the same fate for regression tracking, I designed it to be really low overhead. In the ideal case there's only one additional task somebody has to do, and it's not even something the developers have to do, it's something the reporter has to do: when reporting a regression, add a paragraph that tells regzbot "hey, track this report". As I said, this paragraph looks like this: "#regzbot introduced:" followed by the version that's working, two dots, and the
version that's broken. If you bisected the problem, you can also use a git commit id to specify the commit that caused it. This small line makes regzbot track the issue, and everything else happens automatically. Regzbot, for example, will watch out for replies to the report and consider them activity, and if there's activity, I'll assume everything is fine. So you can trick me there if you want, or developers can, by simply providing some activity, but I'll notice sooner or later. Regzbot will also watch out for patches that are posted to fix tracked regressions. Those are often posted in a new thread, but thanks to the Link: tag it will notice when a patch gets posted that refers to the regression, associate it automatically, and also count any activity on that patch, like replies, as activity. It will then signal to me in the web interface: hey, there's still something happening here, developers are working on fixing this. And that actually works quite well. Once the fix with the proper Link: tag lands, regzbot will automatically consider the regression resolved. But as I said, that's only possible if the connection can be made through the Link: tag pointing to the report, and that's why those tags are so important for my work: it makes my life a lot easier if the fix is linked to the report, and, as I said, Linus likes them as well. Obviously, as life is, we are all just human, and sometimes developers forget to add those Link: tags. Then you can reply to the report with "#regzbot fix:" followed by the commit id, or alternatively the commit's subject, and that will also make regzbot consider the regression resolved. The other thing that will likely be forgotten: many reporters will forget to involve regzbot. In that case somebody else, like me, normally can
reply to the report with a small caret before the "introduced", which tells regzbot: not this mail is the report, but its parent, the mail I'm replying to. Regzbot will then start tracking that, and everything works just as if the reporter had involved regzbot from the start. That's basically regzbot's core functionality, together with the web interface and the reports already mentioned. Obviously there's more: there's a getting started guide that explains these things in more detail, and a reference documentation. You can, for example, set a title to make it more obvious what the regression is about, mark duplicates, and a few things like that. It's not too much; it's designed to be simple and not too complicated, because things might change over time and I don't want to end up creating yet another bug tracker that only works by mail, as that's likely not a good idea. But who knows, maybe in ten years that's what regzbot will have become; I don't know yet. While at it, let me briefly thank NGI Pointer for sponsoring the realization of the regzbot idea, with funding from the European Union, which ran for one year; these days Meta is sponsoring my current efforts. In the end, the thing with regression tracking is: I keep an eye on things with regzbot, and it makes regression tracking feasible because it watches bugzilla and the mailing lists for me, and there's a lot of work computers are good for. I manually add many reports I find and prod things when needed; I sometimes have the feeling I need to prod at least every second issue to make sure the developers are looking at it. Sometimes the users also need a reminder. But in the end it helps kernel development quite a bit, I'd say, and yeah, things are not perfect
and there's so much more I'd like to do, but it's a lot better than nothing, and I guess that's why Linus is so happy with it. But in the end it's cat herding, and it's really demanding sometimes, so it's definitely not perfect. So you might now ask: is regression tracking actually worth it? That's a good question. As I said, Linus seems to like it, and many others said it's great that I do it, so the feedback I get from developers is often positive as well. It has already helped get quite a few fixes in at the last minute that Linus otherwise would have missed, and it regularly brings unfixed regressions back to the attention of the developers, or sometimes of the users, because they sometimes forget that the developer asked them a question or asked them to provide data. Developers often thank me for that. That's why I say regression tracking is definitely worth it, but I'm obviously biased. And obviously, in some cases regression tracking is just overhead, though I wouldn't call it much, because it's so low overhead; everybody has to decide that for him- or herself. In my humble opinion, the benefits outweigh the downsides. The problem is there's always something to improve. I'm well aware that a lot of things are not ideal and could be better, and I'm working on it, but I'm just one person and there's only so much I can do. Which concludes the main part of the talk and brings us to the summary. The most important thing you should keep in mind when leaving your computer soon is: regressions shall be fixed. That's always the case. Obviously, sometimes things are a bit complicated, but you should really try to help with that, and my regression tracking really tries to ensure it happens. To make sure a regression is tracked, let regzbot or the regressions mailing list know about it, because then it will be tracked and I will be there to help you. And if you
need any help with anything regarding regressions, even before reporting, feel free to mail me. For the developers among you, the important thing is: take regression reports seriously, otherwise you'll get mails in your inbox from me, your top-level maintainer, or Linus, and some will be more friendly than others. But the good thing is that most developers are not a problem there. They're human and sometimes forget things, but they handle most regression reports just fine. If you have never looked at the "Handling regressions" document, take a look at it; it has quite a few things you might be interested in. In short: if you break something, you need to fix it, and that should have priority over all other kernel work. An open regression should, most of the time, be fixed within a few days. And because it should be fixed within a few days, users basically have a pretty strong, pretty long lever here, and you can kind of demand a fix if you have a regression. But as always in life, things are a bit more complicated, so better make sure you use this lever properly. When reporting regressions, the "Reporting issues" document helps you do it properly, which includes doing your part of the job when it comes to fixing regressions. You need, for example, to really ensure it's a regression, and you need to find out who caused it, so basically you have to bisect the regression with vanilla kernels that are compiled using a similar configuration. That being said, as I mentioned, in the initial report it's totally fine to just offer a bisection, as the regression might already be known; then somebody can tell you what to do, and you don't spend hours on a bisection that was unneeded in the end. And as long as you offer to do the bisection, the developers will know that you understand the bigger picture and are likely more
willing to help, because you show that you are willing to do your part of the job. Which brings us to the most important thing to remember, relevant for both users and developers: there is no "us versus them" here, no users or reporters versus developers or maintainers. We are all in this together; we all want regressions to be fixed. So let's work together as friends to get all regressions fixed, because then the kernel will be better, it will make more people happy, and people will be less annoyed if they don't run into problems when updating the kernel. Which concludes my talk and brings us to the question part, I'd say, if you're still awake and I didn't talk you all into the ground with 172 slides. In case anyone wonders, there's a chat you can use to ask questions. I actually said something that, if you listened closely, should have raised a question by now, but let's first check what questions you have come up with. Oh, there is one question in the Q&A from Even: "Maybe it was mentioned, but I am going to ask anyway: if I find a solution, can I fix a regression that was caused by a commit made by someone else?" Of course. I mean, you're doing the work somebody else would have to do; why should that person be unhappy about this? Just be sure to CC everybody that was involved with the original change when you post your patch, so those people are aware of it, because things are complicated sometimes and maybe there's a better solution to the problem than yours. But obviously, yes, you can fix other people's problems if you want to. Right, to add to that: that's how the open source development model works, really. If you have a fix, that's one way to start contributing. If you find a fix, you send it to the maintainers, the developers, and everybody that get_maintainer.pl tells you to send it to; it's a script that helps
you determine who should get the patch. Then it gets reviewed, and that's how open source development and Linux development work. Yes, and the change that caused the regression also has Signed-off-by: tags where you can see who handled that patch; you should CC all of them, because then you already have the key people. Yeah, it's a good question. Thorsten, for your information, I just started a mentorship session with 26 developers, so some of the ones participating today want to understand how regressions are reported and get more information on how they can go about fixing them. That's why this question is being asked. If you have any tips for new developers beyond what you have already shared, you're welcome to share them. For new developers, maybe: if you want to get a good overview of the kernel as a whole and not just one subsystem, it could be nice to watch the regressions mailing list or the reports from regzbot about open regressions and simply help fix them. Then you look into various subsystems, how they are developed, and how the people interact there, and you see how the kernel is sometimes different in different places, because the subsystems work differently, the code style is different, and all these things. That can be a little bit hard for really new developers, but if you want to get to the next level, that is likely something a lot of developers would appreciate, getting help, and it also takes your knowledge to the next level as well. Thank you. Any other questions? Feel free to ask them: put them in the chat or the Q&A box, or just raise your hand. If somebody is around and wants to write a script to help users write good bug and regression reports, get in contact with me. I'd really like it if there were a script and a web interface to report
regressions, because it's so complicated. I'd really like to do that myself, but there are not enough hours in a day. So basically, turning "Reporting issues" into a script or a piece of software that helps you report issues might be something really cool. Thank you, Kandace, for those links. I will dismiss this question; it was a request to repost the links you shared. So, no other questions at this time. Yeah, that would be great, right? So if you could get help reporting regressions, that's kind of what you're asking for; maybe somebody would be interested in helping to come up with the scripts and so on. Yeah, I mean, such a script to report issues is not really kernel development, but it's in the area of the kernel, so maybe it's of interest to somebody. So, another question from Even: which functionality are you planning to add to the regression bot in the future? What's really missing right now? There are lots of small details that are not working really well. The user interface is sometimes a little bit too complicated: I mentioned you can add reports from somebody else with this caret thing before the "introduced", and that's something a lot of people get wrong; regzbot then needs another command for that, and I wanted to implement it but didn't get around to it. Some people asked for improvements to the web interface, for example to also show the subject of the culprit commit. But the most important thing that's missing right now is something the subsystem maintainers can use to keep track of regressions in their area. For example, if the networking maintainers want to check which regressions are open in the networking subsystem, that's not easy with regzbot right now, and that's something I need to work on. But in the last one or two months I didn't really get around to working on regzbot much. That's basically the most important feature missing right now, and maybe making it a little bit more beautiful, but on the other hand the
functionality is what counts, and the web interface is basically for me anyway. But there are lots of details that don't work quite like I had expected them to, so they need some adjustments. And all the watching of mailing lists and prodding developers and users to get regressions fixed takes a lot of time every day, so there's not much time left right now to work on regzbot. Any other questions? Any features you want to see in regzbot? No questions at this time in the chat or Q&A, it looks like. In case anybody wonders: I mentioned quite early in the talk that a regression is the second highest card, if Linux was a card game; it beats nearly everything else, but sometimes there are situations where regressions are accepted. Those are pretty rare. I think Linus once said that Linux must be useful, and if it's not useful, then we didn't do what we are supposed to do. That's why some regressions are accepted in rare cases. One example was the mitigations for the Meltdown and Spectre vulnerabilities in many modern CPUs: they obviously introduced performance regressions, but not fixing the vulnerabilities would have been worse for users, so those performance regressions were something users had to live with. That's why I said it's only the second highest card: security sometimes trumps the no regressions rule. But that doesn't happen often, and even if a security change causes a regression, you really should report it; maybe a way can be found to fix the security issue and at the same time avoid causing a regression. That's often possible, if the regression becomes known. That is good to know, definitely. So, any other questions on anything related to regressions? Okay, there is one in the chat: maybe adding some automation functionality like syzkaller has would be good for regzbot, like the automation of patch testing.
Yeah, like the patch testing. I mean, regzbot normally tries not to get into the middle and tries to stay at the side, as subsystems work differently, and normally the subsystems already have mechanisms to test and review their patches. Getting in between there and forcing everybody to do things differently is unlikely to fly. But there's one thing I once considered that might be good to have in regzbot sooner or later: regzbot could check whether reverting the culprit is actually possible, and maybe even hand users something to test that. Because if it's checked and tracked whether reverting helps, then I or Linus could simply decide: okay, let's revert this quickly, because the developer isn't getting closer to a fix. That could speed some things up and actually make developers act more quickly, and it shouldn't be too much work. But as I said, most of the time regzbot tries not to get into the middle of the developers' workflows and stays a little bit at the side. If you have concrete ideas about what regzbot could do there that would benefit developers or users, tell me; I don't see much that would help. Apart from reverts, I don't think automated patch testing is useful. I agree: compiling the bugs, reporting them, and keeping an eye on them is a different goal for regzbot than the one syzkaller has, for example. Yeah, exactly, syzkaller does fuzz testing, with very different goals and very different outcomes. And we also have lots of test runs that do a lot of testing already, so it's easier to let things keep working the way they do. Yeah, and maybe solve the unsolved problems, and not try to solve the problem
that was already solved ten times for an eleventh time. Right, I mean, this is white space, right? Before you started, we did not have a good way to track all the regressions, compile them, and have somebody look at them; we were all doing piecemeal work. This is one concentrated effort that is very focused on this, and this was a white space that you took on. Yeah, there was some regression tracking from 2008 to 2011 or 2012 or so by Rafael, but he did it with bugzilla, and that's also kind of double bookkeeping; not everybody participated, and it was quite exhausting in the end. Regzbot is designed to be more like self-service, so if it gets a bit better and more useful for subsystem developers, maybe it will simply work on its own in the future and won't need that much from me, because subsystem developers use it directly. But that's a dream right now; there are a lot of things it needs before it gets to that stage. Right, all good questions. Does anybody else have a question? So I have a question of my own: how many regressions do you keep track of? How many regressions get fixed in each of the RCs, or overall in a release cycle? No, I don't try to do any such stats, because some regressions are really not worth tracking. For example, sometimes I only see regressions because I can see a patch being posted to fix them, and adding those regressions to the tracking would take my time and is likely just overhead, so as of now I don't think it's worth it. I'm keeping more of an eye on those that are more crucial. And how many depends: the last cycle there were quite a few regressions, and they kept me quite busy. That was likely because of the festive season and the new year: a lot of regressions piled up, and then we had a week where like 10 or 15 regressions
were suddenly fixed, when everybody came back. And then there are weeks, especially when a new cycle opens, when all the problems of the older one are already solved and the problems of the next one are not found yet, and maybe only one or two regressions get fixed in those weeks. Okay, that's good. Yeah, I mean, counting these things without adding any value is not useful, so that's great. Maybe sooner or later regzbot could do that, but before we do, things need to be more streamlined, so that developers immediately tell regzbot about it when they post patches; regzbot picks everything up from the Fixes: tag, the subject, and the Reported-by: tag. We could do that if we wanted to, but right now I don't see much added value apart from stats. But yeah, we are almost out of time, about five minutes left, I think. Any other questions? Last chance. Yeah, looks like there are no more questions. Kandace, I think I'll throw it back to you. Perfect, thank you Thorsten, thank you Shuah for your time today, and thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today, and a copy of the presentation slides will be added to the Linux Foundation website. We hope you are able to join us for future mentorship sessions. Have a wonderful day, bye.