Hi everyone, my name is Asha Levin and I'm one of the maintainers of the stable kernel tree. I want to talk about the various safeguards we have in the process to make sure that we do the right thing with regard to the stable kernel tree.

Let's start with the purpose behind the stable kernel tree. We have two main goals. We want our customers to be happy, and we do that by making sure that, first, we do not introduce regressions into the kernel tree and, just as important, that we do not miss any fixes that should be in the kernel tree. These two goals compete a bit: we want to take as many fixes as we can, but we have to be very careful that we're not taking regressions, and that's very tricky. We'll go over the stages a patch goes through, from being written all the way to being released in a stable kernel tree, and for each of them we'll talk a little about the safeguards we have to make sure that we're doing the right thing with those patches.

The first one is the rules we have for the stable kernel tree. This is the most basic way we have to guarantee both that we're not taking any potentially buggy patches and that we're taking all the fixes. The first rule is that patches must be small, straightforward, and correct.
It means no complex new mechanisms, no new features, no big new piles of code. Fixes are usually pretty small, straightforward, and correct, right? Something that fixes, say, a use-after-free or a double free tends to be a one- or two-liner. Some fixes might be a bit more complex than that, but really fixes should be simple, just because they are fixes. What this rule helps us do is prevent commits from being a mix of things: we make sure that the fix is only a fix, and not also something that optimizes a different part of the code.

We also want our fixes to already be upstream. It's important for us that the fixes already exist in Linus's tree and that those patches have passed the bar of being accepted there. We do not want to fork the stable kernel tree, and we also want to make sure that anything we take has passed that minimal bar of being accepted into Linus's tree.

When we backport commits we very rarely modify them. Part of that is the rule we mentioned above, which requires patches to already be upstream: if you modify a patch it's not upstream anymore, it's just our version of the patch. But it's also important for us not to do that because we want to rely on the testing being done upstream, and if we modify a patch we don't really have a good way to test it again. So we would often rather take a series of patches that are already upstream than modify a patch just to make sure it applies. It also means that in the future it's easier for us to keep taking patches, because we won't be seeing conflicts as a result of our modified backport; patches will just apply as they would on Linus's tree.
If we do end up with a case where a patch has to be modified, or a patch is really big and scary, sort of like what we saw with all those speculative execution vulnerabilities last year and two years ago, we require really good documentation and really solid testing as to why the patch is needed and how the backport to older stable branches differs from upstream. We usually want the maintainer of the subsystem to sign off, and we want to see exactly which tests were done. Because those patches haven't passed upstream testing, we want to make sure that we're taking something good into our stable kernel trees.

So the first stage begins while you are still writing the patch. You are writing your patch and you might send the first or second version of it to the mailing list. When that happens with patches that have a stable tag in them, maintainers and reviewers will usually pay more attention to them. They know that those patches are more important because they'll quickly end up in users' hands, in their environments and running their workloads, and it's important for maintainers to make sure that those patches indeed do the right thing: that they fix the bug in question and do not introduce any new regressions.

Similarly, if you send a patch that might fix something but doesn't have a stable tag or a Fixes tag, we or the maintainer might ask you to add one, to indicate that this patch should indeed go to stable trees. It helps us make sure both that it's backported into the stable branches where it's relevant and that we're not trying to backport it into stable branches where it shouldn't exist. This is where a stable tag with a version number, or a Fixes tag that indicates which commit broke what you're trying to fix, is very important.
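
As a rough illustration of the two tags just described, here is a minimal sketch that pulls them out of a commit message. The tag formats (`Cc: stable@vger.kernel.org # 4.14+` and `Fixes: <sha> ("subject")`) follow the kernel's documented conventions; the function name `parse_stable_tags`, the `StableTags` class, and the example commit (subsystem `foo`, its hash, its author) are made up for illustration and are not part of any stable tooling.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

STABLE_ADDRESS = "stable@vger.kernel.org"
# Fixes: <8 to 40 hex digits> ("<subject of the commit that introduced the bug>")
FIXES_RE = re.compile(r'^Fixes:\s*([0-9a-f]{8,40})\s*\("(.*)"\)\s*$', re.IGNORECASE)
VERSION_RE = re.compile(r"\d+\.\d+")


@dataclass
class StableTags:
    """Hypothetical container for what the tags tell a stable maintainer."""
    wants_stable: bool = False
    since_version: Optional[str] = None  # e.g. "4.14" from "# 4.14+"
    fixes: List[Tuple[str, str]] = field(default_factory=list)  # (sha, subject)


def parse_stable_tags(message: str) -> StableTags:
    """Extract the stable tag and any Fixes tags from a commit message."""
    tags = StableTags()
    for raw_line in message.splitlines():
        line = raw_line.strip()
        lowered = line.lower()
        if lowered.startswith("cc:") and STABLE_ADDRESS in lowered:
            tags.wants_stable = True
            # Anything after '#' is a free-form note, often a version floor.
            _, _, note = line.partition("#")
            match = VERSION_RE.search(note)
            if match:
                tags.since_version = match.group(0)
            continue
        match = FIXES_RE.match(line)
        if match:
            tags.fixes.append((match.group(1), match.group(2)))
    return tags


# Entirely made-up commit message, for illustration only.
EXAMPLE = """\
foo: fix double free in foo_release()

The buffer was freed both here and in the caller.

Fixes: 0123456789ab ("foo: add foo_release()")
Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: A Developer <dev@example.com>
"""

if __name__ == "__main__":
    print(parse_stable_tags(EXAMPLE))
```

The version note says how far back the author believes the fix is needed, and the Fixes tag names the commit that introduced the bug, which is what lets a backport be limited to branches that actually contain that commit.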
We also have a simple bot that fetches stable-tagged patches from mailing lists and attempts to apply them on the stable branches that the stable tag and the Fixes tag indicate they should go to, and that bot will send you an alert if something goes wrong. For example, if you indicated that the patch should be applied to the 4.14 stable branch and the bot couldn't apply it there, you will receive a mail indicating what happened, and it might offer suggestions as to which patches are missing to allow it to apply or build cleanly.

The purpose of that is to make sure that we get responses. What we found is that if we send these mails only once we actually fail to apply the patches, it's usually two or three weeks after folks have written the patch; they've kind of moved on to working on something else, and it's hard to drag them back into looking at the patch and trying to apply it on stable branches. But if we send this alert early on, when the patch is still in development, we find that we get more responses, because people are still working on the patch and are encouraged to go back and figure out what's going on with the stable branches.
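
To make the bot's job a little more concrete, here is a minimal sketch of the branch-selection and alerting logic, under several assumptions. The list of maintained branches is illustrative, all function names are hypothetical, and the actual "does it apply" check is passed in as a callable rather than implemented, since the talk does not describe how the real bot does it. The sketch also only uses the version floor from the stable tag, whereas the bot described above consults the Fixes tag as well.

```python
from typing import Callable, List, Sequence, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn "4.14" into (4, 14) so that 4.9 sorts before 4.14."""
    return tuple(int(part) for part in version.split("."))


def target_branches(since: str, maintained: Sequence[str]) -> List[str]:
    """Branches at or above the version floor given in the stable tag."""
    floor = version_key(since)
    return [branch for branch in maintained if version_key(branch) >= floor]


def failed_branches(
    patch: str,
    since: str,
    maintained: Sequence[str],
    applies: Callable[[str, str], bool],
) -> List[str]:
    """Try the patch on every target branch and collect the failures."""
    return [
        branch
        for branch in target_branches(since, maintained)
        if not applies(patch, branch)
    ]


def alert_text(subject: str, failures: Sequence[str]) -> str:
    """Body of the early warning mail; empty when everything applied."""
    if not failures:
        return ""
    return f'The patch "{subject}" failed to apply to: {", ".join(failures)}'


# Illustrative branch list and a stand-in for the real apply/build check.
MAINTAINED = ["4.4", "4.9", "4.14", "4.19", "5.4"]


def fake_applies(patch: str, branch: str) -> bool:
    return branch != "4.14"  # pretend only the 4.14 backport conflicts


if __name__ == "__main__":
    failures = failed_branches("foo.patch", "4.14", MAINTAINED, fake_applies)
    print(alert_text("foo: fix double free in foo_release()", failures))
```

Comparing versions as integer tuples rather than strings matters here: as plain strings, "4.9" would sort after "4.14" and the wrong branches would be selected.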
The last point is really a property of Linus's tree: there's never a bad time to send a fix. You don't have to wait for the merge window, you don't have to wait for release candidate cycles. You really shouldn't sit on a fix: if you think it's good to go, there's no reason to wait for anything. If you have a fix, send it upstream and we'll be more than happy to review it.

Once your patch has been reviewed and accepted into a maintainer's tree, it will usually also end up in the linux-next tree, where it will be hit by a battery of bots that test those trees. Other systems, such as KernelCI, also build individual maintainers' trees and run tests on them. So once the patch is in a maintainer's tree it's exposed to a battery of bots testing it, which is a good first bar to make sure that we clear all the stuff that automated testing can find.

Once the patch makes it into linux-next, we also have a derivative tree named stable-next, which is just the stable-tagged commits that live in linux-next but don't yet live in Linus's tree. The idea there is that we want to make failures in the testing of stable-tagged commits very obvious; we don't want those failures to be swallowed by other failures in linux-next. So the stable-next tree is a very good way for us to see very clearly the failures caused by stable-tagged commits, so that we can address them properly before they're pulled into the stable trees, into the stable queue, or into an actual release of the stable tree.

At that point, if the patch has a Fixes tag that points to a different patch, which is itself a fix, we might avoid taking that other patch into the stable tree, because we know that it's buggy and there's a fix waiting for it in the maintainer's tree. We can't take both patches at this point because this new
patch still isn't upstream, and that's one of the rules of the stable kernel tree. What we can do is delay the inclusion of the other patch until this patch is ready to go and any discussion around it has been settled.

Once the patch has passed through the maintainer's tree and is exposed in Linus's tree, it gets even more testing, and now there are more humans running those tests. It's actually a lot of developers who are taking this kernel tree, building it, and running it on their laptops and on their servers, so there's a lot more exposure to different workloads and to different hardware. This is sort of the bar we're trying to achieve when we're taking patches for stable: we want the patch to have been tested and reviewed at least enough to get into Linus's tree. We really appreciate the work people are doing testing that tree, and it's important to us that anything that goes into stable was tested by those people.

If the patch doesn't have a stable tag but we suspect it might be a fix, we will run it through our AUTOSEL bot. AUTOSEL is a neural network that identifies patches that might be fixes. It looks at the commit message, at the author of the commit and the people who signed off on it, at which files were changed, and at the various code constructs, and it tries to guess whether a certain patch is a fix even though it doesn't have a stable tag in it. I will talk more about how it works on the next slide.

At this point also, if the commit has a Fixes tag that points to a different commit that hasn't made it into the stable trees yet, we will work with those two or three patches, or as many as it takes, as one unit. Usually we will either merge all of them together or we won't merge any of them, so that whatever we merge is completely
fixed and tested. It's not always the case: sometimes we have scenarios such as a fix that only fixes whitespace or indentation, which isn't interesting for stable trees. But usually we take all the commits linked by their Fixes tags together, to make sure that we never introduce regressions into the stable kernel tree.

So now the patch is upstream and in Linus's tree. Most authors are happy at this point; it's mission accomplished for them, they did their work. But for us this is where the real work starts. For every patch that we consider for stable, the first thing that happens is that it's carefully reviewed by one of the stable maintainers. We look at every patch manually, no bots, and we try to make sure that the right thing is being done and that the patch is really appropriate for stable. Patches that went through the AUTOSEL process, where the neural network suggested that they're relevant for stable, get kicked off for another round of review, where we allow at least another week for folks to object and to comment on them, just to make sure that we're doing the right thing with AUTOSEL patches.

When the patch has been queued you also get an explicit mail. We want to make sure that people are aware that their patches are going through the stable kernel workflow, and we want folks to comment on whether we might have missed a patch or whether the patch isn't appropriate for the branch it was queued on. We're basically trying to make this whole process very explicit for users.

Another thing that happens is the dependency chain analysis process. When we try to apply a patch to an older kernel version, it might not always apply cleanly, and we do not want to modify the patch. So instead we look at the dependency chain of patches that are required for that patch to apply. You can take a look at the data tree, basically a library of
dependency chains, and see how it works. The idea is that we would rather take a few more patches to make a certain patch apply, build, and test cleanly than modify the patch. Another benefit of this process is that it helps us find other fixes that were lost: looking at the dependency chain, we can often see that we missed a different patch that fixes an issue in that area, and that we should take it as well. So it's also a way for us to make sure that we're missing fewer fixes on other kernels.

So now your patch has made it all the way into the stable queue. This place is very similar to linux-next: it is tested by a lot of bots that try to build this queue, and it receives the same quality of testing that linux-next receives. It's a tree that gets regenerated very frequently and that bots run, and it's a way for us to detect issues in patches as soon as they're introduced into the queue rather than after they've been there for too long. The scale of this queue is also very small compared to linux-next, so it's much easier for users to see whether a patch of theirs was missed or whether a patch wasn't backported correctly. It's just an easy way for folks to do a sanity check on our work and make sure that we're doing the right thing.

Next we have the release candidates. This happens about once or twice a week, when we try to do a stable release. The most important thing that happens with release candidates is that you receive yet another mail, saying that the patch made it into a stable release candidate. This is again to make sure that folks are aware of what's happening and that they can object to or comment on their patch going into the stable tree. Another important thing that happens here is that these stable release candidates are tested on real workloads. So rather than just developers or bots, it's actually going to real
users of the stable tree. It's going to run real workloads, it's going to run in real data centers, and it's going to make sure that we don't regress real users with real workloads. This is a very important step, and it's very different from what we saw in Linus's tree or in linux-next. Here the patch is exposed to its actual end users, so the tests end up being much more comprehensive. They're very different from the tests done on developer laptops during Linus's release candidate cycles; here it's real life, real workloads.

A question that keeps coming up is: how can I make sure that my workload isn't regressed by new releases of the stable kernel? The easy answer is to just reply to the release candidate mail that we send out. Feel free to test the stable kernel with your workload when you receive the stable -rc email, and if you see any issues just report them back to us and we will make sure that we address them. We will never release a kernel when we know it has a regression. So you have enough time to test the new stable kernel with your workload and come back to us, reporting either that everything is okay or that you're seeing an issue. There's no standard template for this and it's very easy to do: you can reply to the mail, or you can reply to us privately if there's an issue with talking publicly about your workload, and we will make sure to address it.

It's a very important goal for us not to regress the stable kernel tree, and this is sort of the last thing we can do to make sure that it doesn't happen before we release the kernel. So please, if you have a workload that's sensitive to changes in the stable kernel tree, test it during the release candidate cycles and report back whether we broke something or not. It will help us make sure that we're releasing a kernel that doesn't have bugs or new regressions in it.

So now, after we've
released this stable kernel, it doesn't mean that our work is done. We keep monitoring both upstream and the mailing lists for new patches that might fix a commit that we have in a stable kernel tree. We look at bug reports that show up on the stable mailing list, both for issues with the patch itself and for issues with the backport of the patch, and we try to address them very quickly so that regressions don't live for long in a stable kernel. We really keep monitoring the work we did in the past, so that any new issues that come out of it are addressed very quickly.

Beyond the work that we're doing directly with these commits and this workflow, an important goal for us is to improve the kernel's testing and validation story. There are a lot of arguments that the stable tree should be seeing very minimal changes, because it's stable, and that it shouldn't be receiving a large number of patches. But that would also mean missing important fixes that should go into the stable tree. I think the way to address that is not by taking fewer fixes but by beefing up the kernel's testing story. We do that by working on projects such as KernelCI and 0-day, to make sure that they're healthy and that the work being done there is solid and good enough to help the kernel's validation story.

We also review downstream trees. We look at Ubuntu's kernel tree, at Fedora's kernel tree, and at other kernel vendors, because if they take fixes that don't exist in the stable kernel tree, we should probably consider taking those fixes as well. So we often review downstream kernel trees to make sure that we haven't missed any fixes that those vendors took in. Similarly for bug trackers: we follow both the kernel's Bugzilla and vendors' bug trackers to see if users report real issues
with the stable kernel tree. If those vendors take fixes for those issues, we want to make sure that those fixes are taken into the upstream stable kernel tree as well.

Another thing we've been doing more recently is reviewing all the commit ranges for older stable branches, to make sure that we haven't missed any fixes going into those older kernel branches. This is ongoing work, so if you see a lot of patches for older kernels showing up, this is why. Please comment on them if you think that we either missed any older fixes or took a patch into those older branches that we shouldn't have.

To summarize: the stable kernel process has a lot of safeguards in place to make sure that we don't mess anything up. We're really careful not to introduce any regressions, and I think the safeguards I've listed are a really good measure to make sure that we don't have any regressions in the stable kernel tree.

The end result is that the stable branches are way better tested than Linus's tree, because they're seeing actual real workloads rather than just developers trying things on their laptops. Based on historical data on regressions in the stable kernel tree, we can see that the regression rate is very low compared to Linus's tree. So there really shouldn't be a reason to be afraid of upgrading to a newer version of the stable kernel tree; users should be upgrading very frequently and not fearing that we'll introduce new regressions.

It's also the case that when moving forward between major versions of the stable kernel tree, for example from 4.19 to 5.4, we guarantee that there won't be any new regressions introduced by us, because we never backport a patch to an older version if it doesn't exist in the newer version. For example, if we have a fix, we will never apply it to 4.19 but not to 5.4. For this
reason, if users upgrade from 4.19 to 5.4, we make sure that they won't see any regressions that were already fixed in the older kernel. So upgrading to a newer stable kernel tree should be a very easy process, and users shouldn't be worried about doing it. If you do encounter regressions, please report them to us and we'll be more than happy to take a look and try to address them.

With regard to missing fixes, we also have a lot of mechanisms to make sure that we're not missing important fixes that should go into the stable tree. We have a few safeguards in place to audit all the commits that go into Linus's tree and to make sure that anything that might be a fix gets reviewed by us; if it's indeed a fix that's relevant to the stable branches, we'll take it. The AUTOSEL process, which I've mentioned, is an important one: it has found a lot of fixes that don't have a stable tag, and it has really decreased the number of fixes that go into Linus's tree but that we miss in the stable tree.

Patches that have a Fixes tag but no stable tag still get reviewed, just because of that Fixes tag. If the Fixes tag points to a commit that's in a stable branch, we will actually look at that other commit to validate whether or not the patch needs to be in the stable tree as well. It's a similar story for downstream vendor trees: we audit those trees very often, and we really do our best to make sure that there aren't any outstanding fixes in those vendor trees that don't exist in our stable branches.

And that's it. Thank you everyone for listening; I'll happily answer any questions or comments that folks have.

Self Sarang is asking: with regard to fixes and testing, do the stable trees run
specific test cases to confirm (a) that a patch correctly fixes the problem and (b) that there is no other regression introduced? So, there are no test cases on a patch-by-patch basis, but we do run systematic tests on the stable trees before they are released, and users are more than welcome to plug into those tests and basically gate releases of stable kernel trees if we cause regressions. So if you fixed an issue that was a regression, you provided us the fix, and we took it into the stable trees, you can easily add a test case, either in an existing system such as KernelCI or LTP, or test it yourself if it's a use case that matters only or mostly to you. We will make sure to look at those results before we do stable kernel releases, to make sure that we haven't regressed the fix.

If there are no other questions... I'm sorry, there we go. Florian is asking: where is the whole process documented? The stable tree process has some docs in the Documentation folder in the kernel tree. It's not fully complete with regard to the process; it's more about the rules we have for the stable kernel tree. I don't think there is full documentation of the general flow of everything.

All right, PH is asking: I'm new to kernel development; when I create a patch and send it out, who gets the patch first, the maintainer or the developer? So, maintainers are also developers. Usually what happens is that when you send a patch out, it goes to a few recipients who are listed in the MAINTAINERS file, and those folks are both the maintainers of the subsystem and reviewers who help the maintainers. Ideally your patch goes out to all the people listed in the MAINTAINERS file for that subsystem; it doesn't matter whether they're maintainers or reviewers. So before you send the patch out, make sure you're sending it to the right people.

Rob is asking: there seem to be a lot of LF projects like CIP that are trying to use LTS kernels; is there any
coordination between teams? Yeah, definitely. Projects like CIP contribute back, and they base their work on the LTS kernel trees. There's a lot of cooperation: we work together both on making sure that we have all the patches we need integrated into the stable kernel trees and on initiatives to help the testing of these kernels. One example is the KernelCI initiative, where CIP and the upstream process come together to work on improving the testing of the stable kernel trees, as well as improving testing for scenarios that are of interest to CIP and other members of KernelCI.

I'm sorry, my questions are a bit off. There you go. Alexander is asking: how about using syzbot repros as regression tests? That's something we're looking at. I think the plan was to integrate them into the kselftests, right? At the very least we want to be running them, maybe in the context of a different test framework, on 0-day or KernelCI. We've talked about this for a few years. I think there are some snags there; I don't remember what happened, but it's definitely a test we want to have, and it's worth following up on why it's not there yet. I think Dmitry will know better; Dmitry Vyukov is behind all the syzbot work. We definitely want to have that as a regression test.

Now Adam Ford is asking: when patches have a Fixes tag added, I've seen them applied to stable kernels without CC'ing stable. What is the rule for when to and when not to CC stable? So the rule is really: if you want your patch in the stable kernel, please CC stable explicitly. We try to review patches that only have a Fixes tag, as well as patches that don't have any tag at all, to make sure that they're not missing the stable tag, so we may take them in manually. But the process you should be following to have patches included in the stable kernel trees is to CC stable explicitly. And you can have both, right? You can have a stable tag and a Fixes tag in the same patch
and that's sort of the preferred way to do it; that way we know how far back we should backport your patch, based on the Fixes tag. But the CC stable tag is really important, and you really should use it to make sure that your patch ends up in the stable trees.

Rand is asking: could you share a link to learn more about stable tree testing? I'm not aware of a single resource for that, but let's chat on Slack after this and I'll try to collect a few links that can help.

Pavel is asking: is there a way to mark patches as not for stable? That's a great question. I don't think there's a standard way right now, but we can definitely add one. Usually we find that a comment in the commit message is enough, if you just write "don't include this patch in stable"; it's nice if it explains why it's not stable material. Otherwise I don't think there's a standard tag for that.

All right, I'm not seeing any more questions, so I guess I'll close it up. Thanks everyone for coming and listening to the talk. I'm available on Slack or email if you have any more questions or comments, or if there's anything I can help with, please let me know. Thank you very much.