All right. This is "The end of the paved road: maintaining Linux 4.4 beyond LTS". I'm somebody who has been paid to do Linux stuff for a very long time.

A few acronyms need to be defined so you all understand what I'm talking about. CIP is the Civil Infrastructure Platform, a project that provides a kernel and a user space for civil infrastructure and industrial applications. The Linux stable kernel maintainers once in a while pick a branch of the Linux kernel and maintain it as an LTS, a long-term support kernel. An LTS kernel is maintained for a couple of years and is used most prominently in Android and Debian, and in some other places. Once in a while the CIP project picks one of these LTS kernels and designates it the SLTS, the super-long-term support kernel. The way that works is that while the LTS is maintained, the SLTS kernel runs in parallel with it; once the LTS goes out of support, the CIP project takes over and maintains that kernel for as long as it takes, but for a minimum of 10 years. And this is what I'm going to talk about today: the SLTS kernel, how and why.

When you talk about old kernels, people have opinions. When the CIP project was looking for an additional kernel maintainer, who eventually became me, Chris Paterson was scouring the darkest corners of the internet for a suitable candidate. One day he dropped into a private IRC channel where I and a bunch of other embedded kernel developers hang out and asked if somebody wanted to help with that. This is the first reaction he got: "Maintaining a gerontic vendor kernel with a billion backported patches until the end of times sounds exciting. Not really." A lot of people don't want to do that kind of stuff. I don't know why; I'm quite happy with it. But that's a matter of personal preference: people just viscerally don't like the idea of maintaining old stuff.

Another kind of feedback came after Pavel Machek announced that we were going to keep maintaining the 4.4 kernel as an SLTS kernel. A prominent kernel maintainer wrote that it costs more money and time to keep a kernel alive the older it gets, and that it's cheaper and easier to use more modern kernels. How valid that argument is, and how relevant, is something I'm going to talk about today.

To understand that, we need to talk a bit about time scales. How you perceive time depends on what hat you're wearing. If you're a chip designer, you think in nanoseconds; if you're a geologist, you think in millions of years. Linux kernel developers think sort of like this: now is where we're at; the last release was complete bogus done by imbeciles, and we don't talk about it anymore; and 12 years ago is antiquity, that was kernel 2.6. I'm poking fun at this, but if you say "kernel 2.6" to me, I think: Jesus, did that even have networking already? It feels incredibly old, even though it was only 12 years ago. And 32 years ago is the beginning of time, before which nothing existed. Thinking in these scales is fine if you're developing products with a similar lifespan, mobile phones being the canonical example.

With civil infrastructure, things are a bit different. The first thing you can look at is who makes civil infrastructure, and for that, look no further than the members of the CIP project, which are obviously engaged in that kind of business.
There are nine of them. All of them are corporations, and four have been around for more than 100 years: Siemens, Toshiba, Bosch and Hitachi. That's generations. I think Siemens predates Germany, so that's quite a bit. These are superhuman time scales; nobody lives that long. That gives you a hint: we've left the frame of "back when we were young" and are now at "back when my great-great-grandfather wasn't born yet".

But these are just names, of course. No person working at these companies is the same anymore, and they're also largely not using the same technology anymore. So what about the actual stuff? I call it "stuff" because, honestly, I could not come up with a term that covers everything from elevator controllers to fusion reactors, so I just call it the stuff.

Here's a bit of stuff. This is Jaruga 2, a hydroelectric power station in Croatia; that's the Krka river in the front. It's number two: it has an older sibling, Jaruga 1, built in 1895, whose distinction is that it is the second oldest of its kind, the second oldest multi-phase alternating current power station. The first one went up two days earlier at Niagara Falls in the US, and if you didn't know anything about the state of telecommunications technology at the time, you would think they did that on purpose. It was probably just a coincidence. A couple of years later, Jaruga 2 was built because new industry had popped up in the area and more electricity was needed. It was completed in 1903 and went online in 1904. And that's the end of the story, because in this year of our Lord 2023, it's still online. It still does the same thing it did 120 years ago, without any fundamental changes; it has been refurbished about four times per century (I think in 1936 or 1937 they installed an additional generator), but it's fundamentally the same plant. And it's a production power station, not something that technology enthusiasts keep alive for fun or as a tourist attraction. This is production; it supplies energy to the grid. And it's not an outlier: plenty of power stations of similar vintage are still running, and if you look at slightly younger ones, younger as in only 70 or 80 years old, there are at least dozens if not hundreds. I didn't count them all, but I would say it's in the hundreds. So the actual stuff also lasts for generations.

If you have something like that and you want to use Linux in running it, and you probably do, what kind of kernel would you want? You don't want any regressions, ever. If something works, it should keep working, no exceptions. That's an understandable wish, and there's an easy way to achieve it: just don't update your kernel, which for some applications would actually work. But in many cases you want stuff fixed. You want bad bugs fixed, things that might break stuff in the future even if they haven't been triggered yet, and of course exploitable vulnerabilities, because you don't want anybody else to break your stuff. So the kernel does need updates; it needs to be maintained. You also want to keep the same code base as long as possible, and it's maybe not immediately clear what the benefit of that is. One reason is that with major kernel updates, regressions are inevitable. And I'd like to open a parenthesis here.
I wanted to back that up with numbers, because as it stands, that claim is kind of pulled out of my ass. I wanted to show a nice graph that says: you change your kernel, you die. Unfortunately, we haven't been doing this for long enough, so I don't actually have meaningful numbers. I can crunch the numbers, but to identify regressions you need to look into the future: if we had known a patch was bad, we wouldn't have put it in the kernel in the first place, so you only find out later. And we've only done a few releases so far, so I might be undercounting regressions, especially in the super-long-term support kernel. That's why I didn't feel comfortable giving you numbers. But unofficially, and I am of course incentivized to think this, it's bad. Very bad, actually.

The other thing, which is less disputable, is that if you change your entire operating system kernel, you might have to recertify. I don't know anything about the actual practice of that; I'm an open source developer, I throw code over a fence, and what people do with it is their business. But I've yet to meet anybody who says, "we had a great time doing recertification."

So how do you provide something like that? We thought about it, and we have an approach for providing such a kernel in the face of limited resources.

Step one is to limit the scope. We are not maintaining a kernel that you can drop into everything from IoT devices to supercomputers. It's intended for specific use cases, and what those cases are is informed by our members' needs. If you're an embedded Linux developer, or probably an embedded operating system developer in general, you know that you're not necessarily told what your stuff is being used for. But at least in the CIP project, we know our members' platforms. We have reference platforms and we know their kernel configurations. So we have an idea what our stuff is being used for, and we can focus on those things. I was telling somebody this over a beer recently, and the first thing they said was, "oh, so it's a proprietary thing." I'd like to point out: no, it's not. We are not exclusively doing our members' stuff. We don't intentionally leave things by the wayside just because they're not among our members' needs; we do the other stuff as well, on a best-effort basis. And we accept outside contributions. So if we say, "we didn't put that in because it's too much work," you can do the work for us, tell us "I did that, I tested it here, have some", and we will look upon that favorably and probably include it.

The other thing is that within that scope, we want to minimize changes as much as possible. If it's not trivial or important, it's out, and that's an inclusive or. If it's trivial (you can see it's a one-liner patch and it's correct) it's in, no question about it. If it's important (it can break something, it's exploitable, you really want it fixed) it's in, no matter how complicated, because what else are you going to do? We just have to invest that effort. If it's neither, which in other words means it's a complicated, useless mess, it's not going in, because backporting may introduce bugs and there has to be some benefit to justify that. Why would you do it if there's no benefit? And the more complicated a patch is, the more likely it is that something will go wrong, so we only take that risk when it's actually important.
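To make that inclusion rule concrete, here is a toy sketch in Python. The two predicates are hypothetical stand-ins for the maintainer's judgment, not actual CIP tooling:

```python
def is_trivial(patch_text: str) -> bool:
    """Hypothetical stand-in for 'obviously correct one-liner'."""
    changed = [l for l in patch_text.splitlines()
               if l.startswith(("+", "-"))
               and not l.startswith(("+++", "---"))]
    return len(changed) <= 2

def is_important(patch_text: str) -> bool:
    """Hypothetical stand-in for 'can break things or is exploitable'."""
    return "CVE-" in patch_text

def should_include(patch_text: str) -> bool:
    # The rule from the talk, as an inclusive or:
    #   trivial   -> in, no questions asked
    #   important -> in, however complicated the backport
    #   neither   -> out, because a non-trivial backport can itself
    #                introduce bugs and needs a benefit to justify it
    return is_trivial(patch_text) or is_important(patch_text)
```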
The good news is that most patches for stable kernels are small, as in really small, one-liners, and they typically deal with stuff where you say, "well, of course": they didn't check for out of memory, didn't bring the clocks back down when the device fails to probe, checked for the wrong return value (checked for zero where it should be negative one, for equality where it should be inequality), things like that. That's the vast majority of patches in stable, and those are of course very easy to backport, if they need backporting at all.

So this is how we do it, or more specifically, how I do it. I maintain the 4.4 kernel; part of this is policy, part is best practice and part is my own judgment, but this is how it actually happens.

Step one is to ingest all the patches. We originally base our stuff on the LTS. Once the LTS is no longer maintained, we no longer have patches we can simply apply, so we do the next best thing: we look at the next supported LTS kernel, at the moment that's 4.14, and hoover up every patch there is. We have a tool for that; it throws all these patches at the 4.4 kernel and sees what sticks. If a patch applies, fine, we leave it in for now. If it doesn't apply, it gets put aside. The tool produces a log file, and that is, in my opinion, the most important thing in the entire project: the log file details what we put in, what we didn't put in, what applied automatically, what had to be manually backported, what was ignored and why. If you look at that file, you know exactly why something went into the SLTS kernel or didn't.

Once that's done, you have to look at the stuff that didn't work: the patches that didn't apply cleanly. Most of them are not applicable at all; that is to say, they don't make sense in the context of the 4.4 kernel. The most obvious case is a fix for a driver that doesn't exist yet in 4.4; you can obviously ignore that. Another very common case is a fix for a problem that was introduced after 4.4; you don't need that either. If not the majority, then at least the plurality of patches fall into this category: most of this stuff you can just ignore.

The next step is to backport the easy ones. Remember, when it's trivial, it goes in. If it's a one-liner and it's correct, you just backport it. I don't generally spend time wondering how important it is or whether we actually need it, because it's faster to just backport the thing if it's obviously correct.

The other things that need to be backported are the important patches, the stuff that is basically the point of why we're all doing this: really bad things that need to be fixed. These are the patches the kernel maintainer's comment from earlier applies to: it does get increasingly difficult to maintain an old code base the older it gets, but that really only applies to this specific case, the important patches that have to be backported even though they're not trivial. In practice, that happens extremely rarely. There's maybe one per release; I think the worst was two, and sometimes there are none. So yes, that part is difficult, but it's a very rare case.

Once all the patches that didn't apply have been reviewed, you have to look at the ones that did apply, because just because something applies doesn't mean it actually works.
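To make step one concrete: the homegrown tool itself isn't shown in the talk, so what follows is only a minimal sketch of the hoover-up-and-log idea, assuming it runs inside a git checkout of the 4.4 tree. Branch names, tags, file names and the log format are all made up:

```python
#!/usr/bin/env python3
"""Sketch of the ingest step: try every patch from the next supported
LTS on top of 4.4, keep what applies, set aside what doesn't, and log
every decision so the reasoning stays auditable. This is NOT the CIP
project's actual tool, just an illustration of the idea."""
import subprocess

def try_apply(commit: str) -> str:
    """Cherry-pick one stable commit onto the checked-out 4.4 branch."""
    r = subprocess.run(["git", "cherry-pick", "-x", commit],
                       capture_output=True, text=True)
    if r.returncode == 0:
        return "applied"                # leave it in for now, review later
    subprocess.run(["git", "cherry-pick", "--abort"],
                   capture_output=True, text=True)
    return "needs-backport-or-skip"     # put aside for manual triage

def ingest(since: str, lts_branch: str = "stable/linux-4.14.y") -> None:
    """Walk all stable commits since `since` and log each outcome."""
    commits = subprocess.run(
        ["git", "rev-list", "--reverse", f"{since}..{lts_branch}"],
        capture_output=True, text=True, check=True).stdout.split()
    with open("ingest.log", "a") as log:   # the all-important audit trail
        for c in commits:
            log.write(f"{c} {try_apply(c)}\n")

if __name__ == "__main__":
    ingest("v4.14.300")                 # hypothetical starting point
```

The point is less the cherry-picking than the log: every commit gets a recorded outcome that can be refined during review, which is what makes the real log file such an important artifact.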
To give an example of a patch that applies but is wrong: in some cases, somewhere down the line, a bit of code was removed, and then somebody thought about it, decided it maybe shouldn't have been removed, and added it back. It was never removed from the 4.4 kernel in the first place, but because a patch only sees a small snippet of context, the re-add might still apply, and you end up with the same piece of code in there twice. So you actually have to look at all the patches that do apply and ask: do they make sense?

After that, we basically have everything we want in a release on top of the previous release, and we need to run a few tests. The local compile tests usually go together with reviewing the patches that applied automatically, because that's actually the easiest way to find the broken ones: they use APIs that don't exist yet in 4.4, so stuff that doesn't make sense easily falls out of those compile tests. The other thing is the CIP CI, which we as project members can use to push a kernel and have it run through the entire test suite, which all of our stuff goes through once it's released anyway.

After that, the next step is to push out a release candidate tree for review. I send a request to cip-dev, our developer mailing list, and ask for reviews of the backports.

The final step is the release. There are four flavors of the SLTS kernel. Once reviews are done and tests pass, I release the 4.4-st kernel; that's the vanilla SLTS kernel, containing all the stable patches that are not specific to vendors or to our members' stuff. The next step is to merge this new -st kernel into the 4.4-cip kernel. That's a kernel that contains a lot, and I mean a lot with capital letters, of member patches: patches submitted by our members that they would like to see in their 4.4 kernel, even though they weren't in it at the time. And that is what was mentioned in the first quote about the billion backported patches. And yes, there are a billion of them, or something like that, but they're completely hassle-free; I've never had an issue with them. Even though there are lots of them, it just works. So that might sound like a big effort, but in practice it isn't. Then there are two more flavors, which I'm not maintaining myself: the real-time branches, 4.4-st-rt, the vanilla SLTS kernel with the real-time patches, and 4.4-cip-rt. The process is similar, so I'm not going into it here.

So, what's the point of all this? Why am I telling you in detail how I handle every patch, et cetera? The thing is: "cheap and easy", if you remember the quote from before, is not what we're going for. We're going for long-term stability until the end of times. And the point is not to encourage you to go out now and start a new project on 4.4 SLTS because it's such a great kernel, maintained by me, and everybody wants it. You are not supposed to do that. The point is to show you that we are willing and able to do this. It's a proof of concept: if you start your project today on 6.1 SLTS, we are going to be there, we are going to maintain that thing, and we'll keep it running for as long as it takes.

That's it already. Thank you for listening, on behalf of the team as well. I was much faster than I thought I would be, so there is time for questions. Yes?

Thank you. Of the patches that you apply, what is the percentage of security fixes versus general bug fixes?
I can't give you an exact number, but I think it's pretty low. We also publish an announcement for every new release that details which security issues have been fixed. It's a small number compared to the overall number of patches.

So now there's 4.4, and I just saw that 6.1 is the next one. Have you thought about scaling? Does every new SLTS you decide on get a separate maintainer? This will stack up over time; you will end up with a lot of SLTS kernels.

I think we don't want to go beyond what we currently have, which is 4.4, 4.19, 5.10 and 6.1. We already have four.

And you're maintaining them all?

No, we have three maintainers. As you can see, there are a lot of other people working on this, but we have three people who are specifically maintaining the SLTS kernels.

Thanks for the talk. How are you making sure that no new regressions, or maybe new security issues, are introduced into the old kernels you're maintaining? Is there some testing process going on at the vendors or at CIP?

There is, yeah. Chris, do you want to say anything about that?

We're collaborating with other testing projects, like KernelCI, to reuse their stuff as well. So yes, we're testing. We don't have many tests for specific CVEs, checking whether they're still exploitable; that would be nice, but as with all these projects, it's a question of how much time and how many resources you have. We're slowly getting there. Hopefully that helps.

Anyone running such an old kernel would probably do so on old hardware, so what do you plan to test on in the long run?

I'm sorry, I didn't get that. I also can't see you; where are you? Ah, yes.

What hardware do you plan to test on in the long run?

We have reference platforms and we use those. That's the best answer I can give you.

Maybe to follow up on the question of how we want to scale: the current promise is 10 years, so "as long as it takes" means, we think, 10 years right now to keep the kernels alive. That's the promise, and if you do the math, if we continue to commit to one kernel every two years, we will end up with at most five kernels in flight that have to be kept alive this way. Obviously, scaling out depends on the capacity we have as a project, and that's also a limiting factor. If there's larger interest, it might be possible to extend that.

If it really has to be extended beyond 10 years, what's your plan for that? I'm interested in whether you have to respect some kind of upgrade path between the kernels, or is that only a question on the 25-year scale?

That is a good question. I don't actually know. An upgrade path?

I mean upgrades of the kernel itself. If the super-long-term support kernel gets fixed somehow, there needs to be an upgrade of the facility, of the stuff itself. So is there some agreed-upon boundary for what kind of upgrade to the stuff is considered harmless, so there's no need for recertification, and beyond which recertification is needed?

I guess that depends on your regulatory body and when they require recertification. We try to make the kernel in such a way that you can really just drop it in and not have to be afraid that something goes wrong. Of course, do test it, because you know something always goes wrong, but the aim is that it's a drop-in and there is no issue with it.

Thanks for the talk.
Do you have a list or a collection of the fixes you're not going to backport because they're too complex, like the microarchitectural ones or io_uring or something like that?

Yes, we do. Like I said, we have that file, 4.4.org, that details everything, and specifically what isn't included. It keeps me honest: it would be easier to just throw something out, because then I don't have to do any work, but there has to be a reason for it, and that reason is detailed there. And if very prominent components are left out, we also have a known-bugs file inside the kernel tree that explains those things.

Apart from announcing the CVEs that got fixed, do you also have a more elaborate process for handling CVEs? I would imagine that over the years they accumulate; you probably have fixes for 400 CVEs in there, or even thousands.

There is a repository, I unfortunately forgot the name, that tracks these things; I'm actually using it to make those announcements. So the CIP project has its own repository that tracks such issues.

Now the question got me curious: did you backport the Spectre mitigation fixes?

Nope, and the reason is that they're very intrusive, very tricky and very hard to test. So we asked ourselves: is it actually necessary? The conclusion was that the use cases run a specific set of user space binaries and nothing else, so barring some other vulnerability that allows you to get in, these speculation bugs are not exploitable. So we left it out. That's a very prominent example, and it's also noted in the aforementioned known-bugs file that we didn't do that.

Do you use the Fixes: tag in your backport process? You didn't mention it, so do you make use of it?

Yes, we do. I use it to semi-automate finding out whether stuff actually applies or is necessary. If something fixes an issue that is not in 4.4, I can find that out via the Fixes: tag, and that makes it a bit easier. I still look over every single patch, but it already gives you an idea which way to look. So yes, we make use of that.

A follow-up question, then: what else from the upstream maintainers could make your life easier? Is there something you want to ask of upstream?

Correct Fixes: tags would be really very helpful, because many aren't correct. People just come up with "OK, yeah, Linux 2.6 is what's being fixed", that's the favorite. And some are simply wrong because the bug actually originates somewhere else and is merely triggered by the commit that gets named as the thing being fixed. So correct Fixes: tags would be really, really helpful.

Recently we found a bug that had originated in the 3.7 kernel, and we noticed that you picked that patch up; thank you. In reverse: when you find bug fixes in, say, the 4.4 kernel, what is your process for making sure that the fix goes upstream into the latest kernel?

Basically, we find a bug in 4.4 that also exists upstream, and how do we make sure it gets fixed there?
We get our fixes from LTS, so typically those fixes were already sent for inclusion in mainline and merely got siphoned off into stable, and that's how they end up in our kernel. So that usually doesn't come up; I don't think it has happened so far. If we have something that is specific to the SLTS kernel, then it's a regression in the SLTS kernel itself that exists nowhere else. Typically we don't have anything that actually needs to go to mainline.

For the backported fixes, the rule in stable is that you should never backport something to an older LTS that isn't already in the newer LTS, so it's a waterfall model. But what do you do with all the extra code that's in the CIP kernel? Some of that is backports from a very recent kernel; some of it might be code that's not upstream yet.

Only backports, really.

But that means everything should get fixed: if you backport a fix and there's a similar issue in one of the CIP extras, it should already have been fixed upstream as well, because it's upstream-only code.

Maybe to clarify that: we really do have an upstream-first policy for the feature and board-support backports. It's a clear rule; I'm not aware of any exception to it. And it follows that, obviously, you cannot look only at 4.14 to identify all the fixes relevant to the 4.4 CIP kernel, because some may only affect, say, 5.10, because that's where the story started, so to speak, for the stuff that wasn't in 4.4 back then. That's why the review currently also starts from the 6.1 baseline, and will end up on even newer ones, to become aware of fixes that basically have to jump from the newest LTS, or whichever it is, into our CIP kernel. But really, this is not a vendor kernel in the sense that there is stuff in it that hasn't been accepted by upstream yet.

Hi. How do you track your patches? Do you have a tool for that, or do you do it manually?

How do we track the patches?

For example, you backport patches from your LTS kernel to yours, and then you obviously drop some, some get applied automatically, some you need to rebase. That requires a lot of manual work, or maybe there's a tool you use. I'm not talking about the apply tool; I mean something that tracks your patches.

That tool also does the tracking, or at least it creates the template for the tracking. We have a log of these patches, which gets appended to a file, and as the process goes along, as backporting happens, as review happens, this template gets filled in with the relevant information: has it been put in or not, how did it go in, why did it go in, why did it not go in. All of that is tracked. So yes, we do have that.

Thanks for doing this. I'm wondering, how much time do you spend on this each week, on average?

Personally, I spend two days per week.

Do you use any of the tooling Greg and Sasha use to assemble the LTS kernels? Apparently they use some automated scripting as well.

I think ours is completely homegrown.

Well, theirs probably is as well, but is there some overlap, functionality-wise?

There probably is, but I have to admit I haven't looked into it, because I basically joined the project and the tooling was already there, so I'm just using it.

Thanks. I think you had one sentence on the slides where you talked about not only the kernel but also other parts of the system. Is there other stuff, or did I misunderstand?
The CIP, the Civil Infrastructure Platform, includes a user space; that's the only thing I can think of right now. So it's a combined user space and kernel.

So do you also do patch tracking there, or...?

That I don't know; I'm only a kernel maintainer. Jan can say something about that.

The strategy for user space is that we align with Debian. Initially we thought we would have to do the same thing there that Ulrich and co. are now doing with 4.4, but fortunately the Debian LTS and ELTS projects stepped forward, and we have been supporting them for a while now. So the strategy is basically to bring in our requirements and support these projects, together with others from other domains.

So what would be the next SLTS version? The latest and greatest is 6.1; what's the next one?

After 6.1? I don't think we know that yet; it depends on how things go.

Have you got an estimate?

I cannot really estimate it.

Will it be 6.10 or 6.15?

6.10 is a good guess, just from the spacing of the version numbers: what we have is 4.4, 4.19, 5.10 and 6.1, so you can extrapolate from that if you want an estimate.

So that's not a real prediction?

Yeah, that sounds about right. Thank you.
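As an aside on the Fixes: tag semi-automation mentioned in the Q&A, here is a minimal sketch of the kind of check involved, assuming it runs inside a git checkout of the kernel tree. The function name and the example commit id are made up, and this only narrows the triage; per the talk, every patch still gets a human look:

```python
#!/usr/bin/env python3
"""If a patch carries 'Fixes: <commit>' and that commit is not an
ancestor of v4.4, the bug it fixes was introduced later, so the patch
is most likely not needed in the SLTS kernel. This is only as good as
the Fixes: tags themselves, which is why correct tags matter."""
import re
import subprocess

FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{8,40})", re.M | re.I)

def bug_present_in(commit: str, base: str = "v4.4") -> bool:
    """True unless the Fixes: tag shows the bug postdates `base`."""
    msg = subprocess.run(["git", "log", "-1", "--format=%B", commit],
                         capture_output=True, text=True, check=True).stdout
    m = FIXES_RE.search(msg)
    if not m:
        return True        # no tag: assume relevant, inspect manually
    # exit code 0 <=> the offending commit is contained in `base`,
    # i.e. the bug exists in 4.4 and the fix is worth considering
    r = subprocess.run(["git", "merge-base", "--is-ancestor",
                        m.group(1), base])
    return r.returncode == 0

if __name__ == "__main__":
    print(bug_present_in("deadbeefcafef00d"))   # hypothetical commit id
```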