Hi everyone, my clock says it's now ten past twelve. Welcome, I'm Thorsten Leemhuis, and I want to tell you how to report a Linux kernel bug in case you ever encounter one.

The thing is, reporting a Linux kernel bug is kind of easy and kind of hard at the same time. If you look at the Linux kernel's documentation on how to report issues, there's a short guide that's just three paragraphs, plus three or four more that explain some general stuff, and it all fits on one page. But if you look closely, you'll notice that if you print the whole document, it's 34 pages or so, depending on your layout. In the end, though, most of that is a reference section that explains in detail what you need to do, so you don't need to read all of it.

To get a feel for it: reporting a Linux kernel bug can be easy. This report, with just a few paragraphs and a few lines from the log, is actually totally fine. This one is as well. I'll explain later why they are totally fine, and you can look at them in more detail in the slides. The thing is, they are brief, but nevertheless cover all the aspects needed to get developers interested. And that's the important part here, because quite a few reports fail to do so and, due to that, don't get even a single reply, and obviously the problem remains unfixed as well.

That can easily happen, as developers are not obliged to fix each and every issue. They're only obliged to fix certain kinds of problems. It's a bit like a community-built playground that's completely maintained by volunteers. The community as a whole has an interest in ensuring that everything is and stays safe. It also doesn't want the kids to cry because something they liked or loved breaks when somebody changes something. And the Linux kernel maintainers make sure of both things.
But just because you helped build this playground, maybe ten years ago or last month, or improved something, you're not obliged to improve it further, because you were just volunteering to help. Fortunately, the good thing is that most developers are really committed to helping with all sorts of issues you have with the Linux kernel. But life is sometimes complicated, and they can only do so if their time permits. Unfortunately, most kernel developers have a lot on their plate already and are overloaded with work or issue reports. And then the not-that-good reports are often the first things that fall through the cracks and get ignored. That's basically why you really want to submit a decent report: to make sure it gets developers interested and makes someone look into the issue. And that's why I'll show you how to do that.

The first section is mainly about preparations on your side to make sure everything is fine. I broke this down into 11 steps.

The first one is to kick out everybody who is not really up to the task. It sounds a bit harsh, but it's in the interest of the reporter: you have to ensure you have a kernel that's actually suitable for reporting bugs upstream. Linux kernel developers don't care about most kernels used in the wild and the bugs in them, because many of those kernels contain enhancements that might contain the bug you face. They are added by your Linux distributor, or maybe by your admin, or you did it yourself. That's why they are considered modified, built from modified sources; for example, those in Ubuntu. And that makes them unsuitable. Loading externally developed modules is the same problem, because in both cases it's simply not the kernel that the Linux developers build. It's not "vanilla", as the term for that goes.
So when you're using such a kernel, you have to choose: report the issue to your vendor, or be willing to install a vanilla kernel yourself later. "Later" is important here, because you don't need to do that right at the beginning. The thing is, reporting the issue to your vendor is often a dead end, because they are overwhelmed with reports as well, and there are only so many people working on the distributions; they can't take care of everything. That's why you often will have to be willing to install a vanilla kernel yourself later. And that's not as hard as it sounds. There are pre-built ones that are easy to install, and compiling is also not that hard. We'll get into the details later.

The second thing you should do is search for existing reports to join. Just like with every other open source project, you check if somebody else reported the problem already, because it can save you and other people a lot of time and trouble if you don't report in detail a bug that's already reported and tracked and maybe even worked on already. Then you can immediately join the discussion. So how to do that? You just take your preferred internet search engine and search; for example, in this case, "Linux iwlwifi connection fails WPA6" (I made that up). That often will lead you to some results already. There are also many mailing lists where the Linux kernel developers discuss things and where bugs are reported. They are all archived on lore.kernel.org, and there's a search function there at lore.kernel.org/all. You should look for existing reports there, too. And when doing so with the search engine, remember to vary your search terms a few times, because sometimes people use different words or a different approach to things. That's really important, because it often helps you find bugs you otherwise would have missed and can save you a lot of time.
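
As a small illustration of varying the search terms, the following Python sketch builds one archive search URL per set of terms. It assumes the search at lore.kernel.org/all accepts the query in a `q` URL parameter; the term lists are made-up variations of the made-up example above, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Combined mailing list archive mentioned in the talk.
LORE_ALL = "https://lore.kernel.org/all/"


def lore_search_urls(term_variants):
    """Return one search URL per list of search terms.

    Assumption: the archive search takes its query in the "q" parameter.
    """
    urls = []
    for terms in term_variants:
        query = urlencode({"q": " ".join(terms)})
        urls.append(f"{LORE_ALL}?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical wordings for the same (made-up) problem.
    variants = [
        ["iwlwifi", "connection", "fails"],
        ["iwlwifi", "disconnect"],
        ["wifi", "authentication", "timeout"],
    ]
    for url in lore_search_urls(variants):
        print(url)
```

Opening each printed URL in a browser is a quick way to try several wordings for the same symptom before deciding that nobody has reported it yet.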
So don't neglect this and invest a few minutes, or 10 or 20, in it.

The next thing you should do is classify: is it a severe problem or a regression? Those are the bugs that the Linux kernel developers are obliged to fix, which is why they sometimes require special handling; that will become relevant later. A severe bug is something really, really bad. It's not just that something doesn't work; it's a security vulnerability, or something causing data loss or damaging hardware, crashes, something like that. And a regression is when something breaks when you update your kernel version. As I said earlier, you don't want things to break that people love, and that's the same for the Linux kernel, which is why there's special handling for that. For something to count as a regression, the kernels have to be compiled with a similar configuration. There's a special configuration target for that; it's described in more detail in the Linux kernel documentation about regressions, if you have such a problem.

The next thing you should do is what you always should do with open source projects: check whether your setup is actually causing your problem. That can happen all the time, and checking can save you and others a lot of time and trouble again, because you won't report a bug that actually isn't one. A few things you should always do: always give the dmesg output a look and watch out for red and bold things; they often tell you if something's wrong, for example if a firmware file is missing. You should also consider whether the hardware might be breaking (that happens), or whether some other update, like a BIOS or GRUB update, or some change in the BIOS setup is the reason why the kernel suddenly misbehaves. And obviously you shouldn't overclock, and you should use proven tools. Apart from that, what you need to do depends a lot on the problem you face and its context.
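
The dmesg check above can also be done programmatically. The sketch below is only an illustration under a few assumptions: it expects the raw log format in which every line starts with a priority such as `<3>` (as printed by `dmesg --raw`), it treats warning level and worse as roughly the messages that get highlighted, and the sample log lines are invented.

```python
import re

# Syslog severity levels used by the kernel log (0 is the most severe).
LEVEL_NAMES = ["emerg", "alert", "crit", "err", "warn", "notice", "info", "debug"]

# Raw kernel log lines start with a priority such as "<3>".
_PRIO = re.compile(r"^<(\d+)>(.*)$")


def suspicious_lines(raw_log, max_level=4):
    """Return (level_name, message) pairs for lines at warning level or worse."""
    hits = []
    for line in raw_log.splitlines():
        match = _PRIO.match(line)
        if not match:
            continue  # continuation line or unexpected format
        level = int(match.group(1)) & 7  # the low three bits hold the severity
        if level <= max_level:
            hits.append((LEVEL_NAMES[level], match.group(2).strip()))
    return hits


# Invented sample in the assumed raw format; not real kernel output.
SAMPLE = """<6>[    0.000000] Linux version 5.17.0-rc7
<3>[   12.345678] exampledrv: firmware file missing
<4>[   13.000000] exampledrv: something looks odd
"""

if __name__ == "__main__":
    for level, message in suspicious_lines(SAMPLE):
        print(level, message)
```

To run it against a live system, the text could come from something like `subprocess.run(["dmesg", "--raw"], capture_output=True, text=True).stdout`, which may require elevated privileges depending on the system's settings.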
For example, if you have a problem with a file system driver in the kernel, then you should really check the file system with the fsck tools to see if something is wrong there that confuses the kernel. Often those tools can repair things, and then you can reboot and things work. But as I said, what you need to check depends on the issue. Searching helps to find similar issues and what people suggest you should check.

The fifth thing is: create a fresh backup and put system repair tools at hand. That's not strictly needed for reporting a kernel bug, but when you're looking into the system and trying to figure out what's wrong or what kind of bug it is, it's better to be prepared for emergencies during further tests.

The sixth thing you should do is ensure you're not using any externally developed modules. We already covered this with the vanilla aspect: such modules modify the kernel, and they can cause errors in totally different areas of the kernel. That's why your NVIDIA graphics driver, for example, might break your Wi-Fi or your kernel's memory management, and it's anything but obvious that it's the NVIDIA driver that's causing this. It can happen, and that's why these drivers need to be out of the picture, ideally removed from the system, when you're reporting a Linux kernel bug. Importantly, it's irrelevant whether the driver is open source or proprietary, because it changes the kernel in ways that can cause all sorts of bugs. It's not vanilla anymore, so it's not the thing the Linux kernel developers build, and then you are on your own. Hence, disable solutions that build modules on the fly, like DKMS, and remove all modules they might have installed. That might mean you have to uninstall the NVIDIA driver.
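
One way to spot externally developed modules that are currently loaded is to look at the module list in the proc file system. The sketch below assumes the usual `/proc/modules` layout, in which a module that taints the kernel carries its flags in parentheses at the end of its line, with `O` marking an out-of-tree module and `P` a proprietary one. The sample text and the module name `examplemod` are invented.

```python
def externally_built_modules(proc_modules_text):
    """Return {module_name: flags} for modules flagged O (out-of-tree) or P (proprietary)."""
    found = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        last = fields[-1]
        if not (last.startswith("(") and last.endswith(")")):
            continue  # no taint flags on this module
        flags = last.strip("()")
        if "O" in flags or "P" in flags:
            found[fields[0]] = flags
    return found


# Invented sample in the assumed /proc/modules format.
SAMPLE = """\
ext4 1000000 2 - Live 0x0000000000000000
examplemod 50000 0 - Live 0x0000000000000000 (OE)
"""

if __name__ == "__main__":
    try:
        with open("/proc/modules") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on systems without /proc/modules
    for name, flags in externally_built_modules(text).items():
        print(f"{name}: flags {flags}")
```

An empty result does not prove the kernel itself is vanilla; it only says that no loaded module is flagged this way right now.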
The next thing you should do is ensure that the kernel is not tainted before the issue occurs. The tainted flag is set when something unwanted or really bad happens and the kernel notices. When such a thing happens, it can cause all sorts of other issues that might be follow-up errors, and that's why the kernel marks itself as tainted. To check, you can simply look it up in the proc file system; it should be a zero. If it isn't, I'll refer to the documentation at this point, because it gets too complicated here. It explains what you then need to do and what this number means, because the number is actually a bit field that tells you what kind of events made the kernel tainted. In the end, most of the time it's a bad idea to report problems that occur with a tainted kernel, so you shouldn't do that. But don't lose your mind over it now, because you might need to install a different kernel later, and maybe that one is not tainted. Still, it's good to check during preparation whether such a problem is there, because it can save you time. One of the few exceptions is the first oops, that is, a problem the kernel detected and caught and after which it could carry on, though it might not be running perfectly well anymore; that's why it taints itself then. If that's the problem, it's okay to report that bug, but not any follow-up oopses, because they might be caused by the first one.

The next thing is also in your own interest: write down briefly how to reproduce the issue. That's the basis for your report later, and it's also useful for your further experiments to know how you can reproduce it. When you're dealing with bugs, things can quickly get confusing, and if you're testing with a different kernel, it really helps to have the steps stripped down and on paper.
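
The taint check described above can be sketched in a few lines of Python. This is an illustration, not an authoritative decoder: it assumes the value is read from `/proc/sys/kernel/tainted`, and the bit table follows the kernel's documentation on tainted kernels as best recalled here, so it should be verified against the documentation of the kernel version in use.

```python
# (bit, letter, meaning); assumed to match the kernel's tainted-kernels documentation.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (2, "S", "kernel running on an out-of-specification system"),
    (3, "R", "module was force unloaded"),
    (4, "M", "processor reported a Machine Check Exception"),
    (5, "B", "bad page referenced or unexpected page flags"),
    (6, "U", "taint requested by a userspace application"),
    (7, "D", "kernel died recently (oops or BUG)"),
    (8, "A", "ACPI table overridden by user"),
    (9, "W", "kernel issued a warning"),
    (10, "C", "staging driver was loaded"),
    (11, "I", "workaround for a platform firmware bug applied"),
    (12, "O", "externally built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
    (14, "L", "soft lockup occurred"),
    (15, "K", "kernel has been live patched"),
    (16, "X", "auxiliary taint, defined by and used for distros"),
    (17, "T", "kernel was built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return the (letter, meaning) pairs for every bit set in the taint value."""
    return [(letter, meaning) for bit, letter, meaning in TAINT_FLAGS if value & (1 << bit)]


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/tainted") as handle:
            taint = int(handle.read().strip())
    except (OSError, ValueError):
        taint = 0  # assumption for systems where the file cannot be read
    if taint == 0:
        print("kernel is not tainted")
    for letter, meaning in decode_taint(taint):
        print(f"{letter}: {meaning}")
```

For example, a value of 4097 has bits 0 and 12 set, which this table decodes as a proprietary and an out-of-tree module having been loaded.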
The next step is also in your own interest. In case it's a regression within a stable or longterm kernel series, say, for example, from 5.15.4 to 5.15.5, that's a special case, and you can take a shortcut, because those bugs are a little bit special. The Linux kernel documentation that explains how to report bugs describes the four steps you need to perform then. It's basically installing the latest kernel from that series and checking if the problem is still there; then you can simply send a quick mail to check whether the problem is known already, because that's often the case, and it saves you some trouble.

The tenth step is something that's a little bit annoying with the Linux kernel: you need to locate the driver or subsystem that seems to cause the issue. You have to apply your best guess, because there's no central place where Linux kernel bugs can be reported. With your best guess, you find the place where the bug needs to be reported later, and also the place where you can look for existing reports. The thing is, most people assume that bugzilla.kernel.org is the place where every kernel bug can be reported. It also looks a bit like that, but there's a hint there if you look closer: it actually tells you that most of the time Bugzilla is a bad idea to use, because most Linux kernel developers don't use it. There's a big risk that nobody will look at the reports you file there. A few subsystems do use it, though. So where do you need to look instead?
Most of the time you need to submit your report by email, to the maintainers with a few public mailing lists in CC. You find those addresses in the MAINTAINERS file. For example, this is the entry for the Btrfs file system, where you see: hey, I need to report this bug by mail to these three maintainers, with this mailing list in CC. Ideally you also CC the Linux kernel mailing list, but that's not that crucial. A few subsystems prefer to get reports via some bug tracker; sometimes that's bugzilla.kernel.org, rarely some Git forge. For example, the ACPI and power management people prefer bugzilla.kernel.org, and the developers of various graphics drivers in the Linux kernel, like those for AMD, Nvidia and Intel graphics, have their own GitLab instance on freedesktop.org, and that's where you need to report the bug. Once you know where the bug needs to be reported later, you should check the archives of that place for existing reports again. That can save you a lot of trouble, because you don't need to analyze the problem further if you can just join an existing discussion.

So that's it for the preparation, which ends the first section of this talk and brings us to the second: now you're going to test and report the problem.

To do that, you need to check if the issue still happens with the latest kernel, and for that you need to install a really fresh kernel, ideally mainline, so the main development branch, to check whether somebody has fixed the problem already. You go to kernel.org and check what the current mainline version is; here, in this case, it was 5.17-rc7. You really should use that version. One that's a week older, say rc6, might be okay, but it's better to try the freshest version. And remember to ignore the big yellow field, because that is often a bad
idea, because it points to the latest stable kernel. You want mainline, because every bug is fixed there first. You don't need to fear those kernels too much; they are pretty stable, and you created that backup I told you about earlier, just in case something goes wrong. That rarely happens, especially that late in the cycle. But you really want to test the main development branch, because each and every bug is fixed there first, and some bugs are fixed only there, because sometimes it's simply too dangerous to backport a bug fix to older kernel series. That's why older kernels, say 4.19, 5.4 or 5.10 right now, contain many known bugs that nobody will ever fix, because it's simply too dangerous to fix them there. If you have such a bug, you should just use the latest version.

There's an exception from the mainline rule if kernel.org looks like this: if the stable version has a higher number, then you should use that. That happens right after a release, for maybe one or two weeks. At other times you'd better avoid stable. But it's okay if you use the latest stable kernel; most kernel developers will look into the issue, so you can, for example, use it if mainline doesn't work for some reason. The longterm kernels you see in the bottom left are really a bad thing to use for reporting bugs. As I said, they contain many bugs that are never fixed, so it doesn't help anyone. The kernel developers who receive your report will think, "Hey, maybe that was already fixed half a year ago, I don't have time to look into this right now," and then ignore your report. That's something you don't want, and that's why you really want to test with the latest kernel.

So how do you install a really fresh kernel? There are pre-built vanilla kernels available for many distributions; you just have to add a PPA or a Copr or something from which you can install the vanilla kernel. Actually, there are also some big
distributions that have them in their repositories or even use them by default. Sometimes those are only slightly patched; then it might be okay to use them, but it's always a little bit dangerous, because some maintainers still ignore reports from them. So ideally you compile the kernel yourself. That might become necessary for debugging and testing later anyway. It might sound complicated, but it's not that complicated: with the make targets oldconfig and localmodconfig, you can easily create a config that doesn't take that long to compile and is relatively easy to handle. Then, with the usual make targets like bzImage, install and modules_install, you can install that kernel, and you quickly have a fresh kernel on your system to test. When you're doing that, remember to ideally activate these two config options, at least if you're dealing with a stack trace, because then you can later check in which line of code the problem actually happens.

The next step is to ensure the kernel doesn't taint itself. Because you now have a different kernel, you need to check this again. Yes, we already covered this, but it's good to check it earlier already, because then you know that something is wrong on your system and the problem is not worth reporting. That's why this is here a second time.

Now you want to check if the problem actually still happens with the fresh kernel. In case it was already fixed, that was quite easy: simply stick to this kernel, or install the latest stable or longterm kernel to check if it's fixed there, in case you want to use them. If it's not fixed there, look up the reporting issues document I mentioned a few times already, because it explains what you can do if you find a bug that's fixed in mainline but not in an older series. Then you can check if you can maybe motivate developers to backport the bug fix. But whether that works out always depends on the problem, because, as I mentioned,
some fixes are simply too risky to backport.

Now we're getting closer to reporting the issue. We're at the point where you should optimize the description of how to reproduce the issue, to make it really easy to grasp for others. I'm seeing a lot of bug reports that are really complicated, where I have to read the text three times to actually understand the problem, or read five paragraphs that are completely strange to read. That always bears the risk that developers will ignore your report in the end. So really try to find a more straightforward way to describe and reproduce the issue, and also make sure the description is easy to understand for someone hearing about the issue for the first time, because those are the people you want to address. If you're in doubt, ask somebody else to read your text before you submit it; that often helps. And if you learn something in this process, consider looking again for existing reports; that can often help.

I mentioned stack traces already. If you have one, consider decoding it. Those are the strange-looking lines and numbers that sometimes show up when you have a panic or a warning or something, so this stuff here. As you can see, most of the lines end with some random-looking numbers, and by decoding them you can check the line where the error actually occurred and how this code was called. That often helps the developers find out what's wrong. But that's a little bit hard; if you don't want to do it, you don't have to. You can submit the bug first and then see if it's really needed. But you should maybe prepare for it, and that's why I mentioned those config options earlier, because they make sure you can do it if you really have to. Check the reporting issues documentation; it explains how to decode a stack trace, both with a self-compiled kernel and with distribution kernels that come with debuginfo packages. For example, for RHEL, Fedora and SUSE that's possible. So if
you're dealing with a regression, there's some extra work here: you need to narrow down when it actually started, as much as possible. When done properly, that identifies the exact change that's causing the problem and nearly guarantees a timely fix, because then you know who has to take care of it. "Properly" actually means bisection. That's where you take the distance between two kernels and compile a version in the middle. So if a problem shows up in 5.16 and wasn't present in 5.15, you compile these two first and check that the problem doesn't show up with the earlier one. Then you jump to the middle between those two versions and check if it's happening there. If it's not happening there, you know the change is in the later half; if it is, you get closer to 5.15. This way you can narrow down the exact change that's causing the problem. Normally, with current kernels, it takes about 15 or 16 steps to find the change that's causing the problem. That sounds like it's taking days, but with a modern system and a proper configuration it can sometimes be done in an hour or so, and it's even possible to automate this with a script. But if you don't want to do all that work, feel free to just send a preliminary report that only mentions the rough version range, because somebody might have looked into the issue and reported it already, and it might be bisected already, so you can save yourself quite some work. But be prepared that a bisection might be needed, because regressions that are not bisected might remain unfixed if the cause can't be found otherwise. That does happen sometimes.

Now you're getting to compile and submit the report. You should have everything prepared for that and at hand. So take the description of how to reproduce the issue and add a paragraph describing the environment: the kernel version, the taint status, the Linux distribution used, the hardware used if relevant, and, if there's
a regression, when it started. Then add or link all relevant details, like logs or the decoded stack trace. What's needed depends on the type of issue. For example, the dmesg output and the kernel's config are often relevant, and lspci output often helps, too. The slides are online, so if you ever have to go through this, you can check it out in detail. Whether you add or link those logs is not really important. Linking is often the better thing, because it keeps the mail relatively short, and because mails with big attachments sometimes take longer, get ignored or end up in the spam folder. So simply uploading them somewhere and linking is often the best choice.

On top of what you have now, create a normal-length paragraph that outlines the issue and its impact quickly, to make sure everybody who reads your report can immediately get what it's about. Even more important, work out a descriptive title that says the important thing in just a few words, because that's all that most of the people who come across your report later will read.

If you face one of those severe problems or regressions I mentioned, take out the reporting issues document again, because it has a section on what you need to do in that case to make sure it gets fixed. It's not that much, but I won't go into the details.

Make sure the end result has all the important details, but is easy to understand for outsiders and as short as possible, and submit it to the place the MAINTAINERS file told you. As I said, this is perfectly fine. This is report number one from earlier: just one paragraph that basically describes the problem, the crucial line from the logs that shows where things went sideways, the question whether the problem is known, and some logs. So that's totally fine, and, especially, it's sent to the right people in the To and CC fields. This one is the same, just with a regression; the regressions mailing list is in CC there, and it looks a little bit different. Look at the
slides later if you are interested in the details. In the end, the important thing is that the reports came through the appropriate channel, that the description is easy to grasp and makes it obvious whether it's a bug or a regression, and that the kernel that was used was fresh, vanilla and untainted.

Now that the report is out, your work is not done. You need to keep the ball rolling in one way or another, because developers might ask questions or might want you to run some tests to narrow down the issue. Try to answer such inquiries quickly, and don't wait a week or two, because then the developers might forget about it or go on holiday, and then the report gets ignored in the end. So really react publicly and in a timely manner. And keep in mind that most developers are overloaded with work, and they go on vacation too. So if your report doesn't get a reply within two or three weeks, it's likely dead and forgotten. In that case, it's totally fine to ask again: "Hey, did I do something wrong?" or something like that. Sometimes that helps to get things going, because the developer was just stressed or on vacation. But if you do that, also look at your own report again and check whether it really was decent; maybe that was the reason why it was ignored. Also, new kernel versions were likely released in the meantime, so it might be a good idea to recheck with them, especially when an rc1 of a new release gets out, because lots of changes and lots of fixes go in there. And if you don't get any help, or if it's unsatisfying, which sometimes happens, because, as I said, Linux is done by volunteers, then you need to try to help yourself: maybe find others who have the same problem and team up with them. The reporting issues document has something about that.

That's basically it. Truth be told, those were only the most important aspects. If you really want more details, you want to read the reporting issues document, which is
quite long, a small book actually, that tries to cover all the questions you might have. The different steps I mentioned are in the document in similar form, and there's a reference section which explains each of those steps in more detail.

19 steps. So you might ask: is that crazy? You can argue about that: yes and no. It's a little bit crazy, but on the other hand it's not, because most of those steps are quite normal for bug reporting, and some of the steps are just there in the interest of the reporter. For example, the first step, to ensure you have a suitable kernel or are willing to install one, makes sure people are aware of what's coming up for them, so they can decide at that point: am I willing to do that? If not, they can stop before investing an hour or two, or maybe even more, in this bug reporting process. The backup and repair tools step is similarly in their own interest. Searching for existing bug reports is also quite normal for software projects. Installing the latest mainline kernel is something most other projects would like you to do with their software as well; for the kernel that's a little bit more crucial, because it changes so fast. So yes, that's a little bit special, but only a little. Sadly, it makes bug reporting a little bit hard; it would be easier if more distributions shipped vanilla kernels, so people could easily test with them on their regular distro. Some steps are also in there twice, like the taint step and the regression step.

So in the end there are just three things left that are really special. One is locating the driver or subsystem that's causing the issue: like I said, Bugzilla is not really supported, and you need to look it up in the MAINTAINERS file. That's a bit odd, but that's how it is these days. I guess a lot of people would like some improvements on that front, but nobody sat down to do that. Ensuring the kernel doesn't taint itself is also something special for the kernel, but it's special software that's
close to the hardware, so that helps. And checking the regression is also in your own interest, because that gives you the guarantee that it's fixed. So somewhat special, but not that special after all.

That brings me to the end. Let's sum things up. Kernel developers are only obliged to fix some issues; that's really something you should keep in mind. Most developers actually want to fix all sorts of issues, but they don't have to, and they only can if their time permits. So it sometimes can happen that your report gets ignored, because sadly most of the developers are short on time, and poorly reported issues are then the first to be ignored. That's why you want to submit a decent report and invest a few more minutes in it, like I showed in this talk, because it's in your interest to grab the developers' attention and interest in the problem and to make it easy for them to help you. The kernel documentation has you covered there.

The most important things are: submit the report to the appropriate place; as I said, the MAINTAINERS file explains that. Ensure the report covers all the aspects that are relevant for the developers, but at the same time is still easy and quick to grasp for everyone who reads it; if in doubt, better omit a detail, that's often easier. Also make it obvious if an issue is a regression or a severe issue, to make sure developers handle it with the appropriate priority; it's in your own interest to make sure the bug is fixed quickly if it's such a problem. But in the end, the most important thing is that you really should test and report with a kernel that's really fresh, untainted and vanilla. That's something most people do wrong. And you should also mention in your report that you did so.

With that, I'm finally at the end. Are there any questions? There are mics here in the front if you have any questions. Are there any questions from online? Don't be shy, it's just lunch that's following,
we have time. There's somebody coming.

Audience member: I'm just curious, how many bugs did you already report, and how many of them got fixed?

I reported a few bugs, but it's not like I'm running into that many bugs. The thing I do is track regressions for the Linux kernel, so basically, as I said, those problems that need to be fixed. That's why I read a lot of bug reports, and that's why I see how many people get it wrong, and why I started writing this documentation. But to get back to the real question: I normally run the mainline RCs, and every third or fourth release there is one bug I report or want to report; often somebody else did it already. So not that many, but as I said, those are development kernels.

No? Okay. If you're online, you can still write questions; somebody here is looking out for them. So how about you, do you have kernel bugs often? A few people shake their heads. If nothing came in online, I'd say that's it. Thanks for listening, enjoy your lunch. And hey, I'm really on time, I still have 20 seconds left.