We've got a couple more people coming in, all right. My name is Justin Forbes. I'm the Fedora kernel maintainer and have been for a disturbingly long time. This is the first time we've done the State of the Fedora Kernel talk since Nest 2020, I believe, so there's been a whole lot of change, and anybody who's been working with the kernel package knows... well, it's ugly. It's ugly for a reason, though. In this talk I'll cover a little bit of the end user experience, stable kernels versus Rawhide kernels; a little bit of the kernel contributor and developer experience; and then how people who are debugging and testing patches upstream deal with that from a kernel package perspective now, because if you look at the spec, it's messy, and it's not as clear how you do some things. I'm going to try to do all of this in 25 minutes, so instead of having a nice flow, it's going to be a little bit disjointed from section to section. The first thing is stable Fedora kernels. If you're an end user and not a developer, you've probably noticed there are a whole lot of kernel updates. Why are we getting so many kernel updates? One reason is that Fedora tries to stay very close to upstream; we push almost every single stable release as an update. If you want to know why we do that: there have been 117 kernel CVEs so far this year. Now, a lot of those CVEs don't affect everybody. Some only affect very specific hardware; some affect older kernels where we've already fixed them. But in 2023 alone, there have been 117 CVEs so far. The other thing that happens is we get regressions as these updates go through, which is unfortunate, so we have to get bug fixes in and try to get those regressions resolved.
One thing I want to stress here about karma: you are going to see, sometimes often, that we push a kernel I know breaks specific users. If somebody says, "I have problems with my AMD graphics card," and it's one specific card or something, I'm not going to hold back fixes for a bunch of other users just because of that one specific new regression. The idea is that because we release so many kernels so quickly, hopefully that regression will be fixed soon as well. Now, there are exceptions. There was an XFS issue that could corrupt data; we did not push that kernel. We held on to it until we could get that issue fixed and pushed. Data corruption and things like that are not regressions I'm willing to let through. But if a specific piece of hardware doesn't work, or sound isn't working right, a lot of times other users still need the fixes in those kernels. Another thing about the stable Fedora kernels: if you look at the Rawhide kernel, you'll see there are actually a few extra patches in it. There are some RHEL-specific things, and the reason those exist is because of ELN. We build kernels with most things in Rawhide, and it gets rebuilt for ELN from the same package. It's the same dist-git, but the ELN kernel build is essentially a RHEL kernel: it has a RHEL config, and it does have a couple of patches that we don't carry in Fedora. One of the things I do is make sure that when these patches go into Rawhide, anything that's RHEL-specific, anything that is not wanted or needed for Fedora, has to be behind ifdefs so that the code path is not executed by Fedora users. There is a CONFIG_RHEL_DIFFERENCES config item that basically says we're just going to bypass all of these chunks of code that RHEL has added, and in Rawhide that option is not set.
Running that kernel should be no different than it would be if those patches weren't there. But it does make things a little more difficult if you're dealing with upstream. Say I've got a bug, I'm talking to upstream, they want me to apply a patch, and the patch doesn't apply cleanly because this weird RHEL code is there. So in stable Fedora releases those patches are removed, and part of the reason is to make sure we stay close to upstream. The other thing about karma: if you've already reported a bug, say it happened in 6.4.3 and it's not fixed in 6.4.4, negative karma is really not helpful there, because what that says is "my bug is more important than any other bugs that might be fixed by this update." I do want the negative karma if it's a new regression, because I read all of those; it's good to know we've got a new regression and what it is. But four kernels in a row of negative karma for a regression that happened five kernels ago isn't helpful in that regard. So that's the end user experience part. Now we'll talk a little bit about kernel maintenance, and the reason kernel maintenance is important is that if you want to build your own kernels, look at our code, or do any of those things, it's much easier if you know how it works. Everything is now maintained in a source git repository on GitLab: the kernel-ark repository. In that repository we have branches. There's an os-build branch, which is the Rawhide branch, and then a branch for each stable Fedora kernel series. And when I say stable release, I don't mean Fedora 37 or 38; I mean fedora-6.4 is the 6.4 kernel series, and the kernel is the same for all stable versions of Fedora that support it. So 37 and 38 are both on the same 6.4 series right now.
It's actually one script that updates the dist-git for both of them when we do a stable build. That sometimes changes when a release is going end of life; for example, I didn't rebase Fedora 36 when we had less than a month left, so it stayed on the previous version. That doesn't mean we won't do updates; it just means I'm not going to do one unless it's a security fix or something else of importance. It is end of life from a new-code standpoint at that point. Another interesting thing about maintaining everything in source git is that it's an exploded kernel tree. We have scripts that generate the dist-git, and they do it by creating a source RPM, exploding it over the dist-git, and adding everything again. That means anything done directly in dist-git would be overwritten. That's a weird workflow, and I know a lot of people aren't used to it. Luckily we don't have people frequently committing to the kernel dist-git, but I do look at the diff every time. When I commit, I actually have a script that doesn't commit right away but does a diff first, so I'm looking at the diff before I push, and hopefully if anybody did change anything in dist-git, it would be noticed and brought into source git correctly. It also means the kernel doesn't really take pull requests against dist-git on Pagure. We do have merge requests on GitLab, and we highly encourage them. Several people in the community have contributed, and it's been a great experience, at least from my end, and hopefully once people have figured out the process, it's good on their end too. One of the downsides to doing things this way: when you look at the kernel spec in dist-git now, it's gotten pretty nasty, and I hate to say it, but it's going to get a lot nastier. The reason is that we've introduced variants. There are variants that exist for RHEL now.
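To make the repository layout concrete, here is a minimal sketch of the branch structure described above. The real repository lives on GitLab (the cki-project kernel-ark repository); the clone and checkout commands are shown as comments since they need network access, and the runnable part below only mocks the branch naming scheme locally.

```shell
# Real-world usage would look roughly like:
#   git clone https://gitlab.com/cki-project/kernel-ark.git
#   cd kernel-ark
#   git checkout os-build      # the Rawhide branch
#   git checkout fedora-6.4    # the 6.4 stable series (shared by F37 and F38)
# Below we just mock that branch structure in a throwaway local repo.
set -e
git init -q ark-demo
git -C ark-demo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m 'initial commit'
git -C ark-demo branch os-build      # Rawhide branch
git -C ark-demo branch fedora-6.4    # kernel-series branch, not a Fedora-release branch
git -C ark-demo branch --list 'os-build' 'fedora-*'
```

The key point is that stable branches track a kernel series (fedora-6.4), not a Fedora release number.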
There are variants coming for Fedora too. The real-time kernel is supposed to be merged upstream at some point; I'll say any day now. Clark would really appreciate that, but we don't know when it's going to be. It's been "any day now" for a year or two. When that happens, there's a good chance we're going to do a Fedora real-time kernel. There's a good chance we'll do a 16K page size kernel for ARM devices. Those things are all variants, and while it's very easy to manage them from the build side, it gets pretty ugly in the spec. Eventually I hope to break those things up so that you don't edit the spec or the spec template directly at all; you just edit the chunks that relate to what you need to do, which would make it all a little more manageable. But we're not there yet. Now, the configs portion of all this. Say you want to turn on a driver that we haven't turned on, or you need to turn something off; any time you need to change the configs, they're terrifying. There are thousands and thousands of files in the redhat/configs directory. Inside it you'll see a fedora directory, a common directory, and a rhel directory. The idea is that anything unique to Fedora is set in the fedora directory, where RHEL might set it differently; anything unique to RHEL is set in the rhel directory; and anything set the same on both ends up in the common directory. That tries to make it a little more manageable, but it's still pretty scary. And it gets more interesting when you want to turn on a config option and find no file there. If there's no file there, simply adding one and turning the option on will do nothing.
The reason is that when you build the kernel, or when you run the check (we have a script, make dist-configs-check, so you don't even have to build the kernel to get there), it goes through, builds your configs, and makes sure they're valid. If something is unset, it's going to say: hey, there's nothing set here, and you have to set it one way or the other. Now, the reason you don't have a file there, when something is unset and you're trying to turn on what you know is a valid config item, is that a dependency is missing somewhere. If I don't turn on CONFIG_NET_VENDOR_QUALCOMM, for instance, the four drivers underneath it don't even get asked about; the check isn't going to consider them because I haven't turned on that dependency. This is important, because even internal kernel developers stumble on it; unless you're dealing with Kconfig frequently, it gets somewhat confusing. If there's not a file there, you need to find out what that config option depends on and perhaps go change that as well. Once you've set your configs the way you want, make dist-configs-check builds all configs for all arches and checks that they're valid. It will tell you if something is missing, meaning it hasn't been set either way. It will also tell you if there's a mismatch. Say I set a config item as a module, and for some reason, due to dependencies, it's being forced to built-in and can't be a module. It'll say: you set this as a module, the generated configs show it as built-in, go fix that. And it won't let you continue without that.
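The per-option config file layout described above can be sketched like this. The directory names follow the talk (fedora/, common/, rhel/ under redhat/configs, with per-flavor subdirectories such as generic/); the specific option values here are just illustrative examples, and the runnable part only mocks the tree rather than operating on a real checkout.

```shell
# Mock of the redhat/configs layout: one file per option, named after the
# option, containing its setting. (Values here are examples, not policy.)
set -e
mkdir -p redhat/configs/fedora/generic \
         redhat/configs/common/generic \
         redhat/configs/rhel/generic
# A dependency shared by Fedora and RHEL goes in common/:
echo 'CONFIG_NET_VENDOR_QUALCOMM=y' \
    > redhat/configs/common/generic/CONFIG_NET_VENDOR_QUALCOMM
# A driver underneath that dependency; if the vendor option above were off,
# a file like this would be ignored because its Kconfig dependency is unmet:
echo 'CONFIG_QCOM_EMAC=m' \
    > redhat/configs/fedora/generic/CONFIG_QCOM_EMAC
cat redhat/configs/fedora/generic/CONFIG_QCOM_EMAC
# In a real kernel-ark checkout you would then validate everything with:
#   make dist-configs-check
```

This is why a missing file usually means a missing dependency: until CONFIG_NET_VENDOR_QUALCOMM is enabled, Kconfig never even asks about the drivers beneath it.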
Please, please, if you're doing a merge request for the kernel that changes any config items, run this command first, because if you don't, CI will fail and it's going to have to change anyway. The other thing I'll say is that help is on the way. I know how messy this is; others inside Red Hat know how bad this is. There have been several discussions, and there's even code being written now by someone within Red Hat, which will be made public as part of this kernel repository, where you'll be able to manage configs in a much easier way. You'll be able to say, "I want to turn on this driver," and it will come back and say, "Okay, as a result we also have to turn on these things. Is this okay?" And that's it. It'll be a lot easier to manage, but it's going to take a little time. I'm hoping by the end of the year, but we'll see. For the last bit: historically, people dealing with Fedora kernels who get a patch from upstream want to test that patch. It used to be a lot clearer how you add patches within the spec file. It's still not horribly difficult; there are really two ways you can do things. If you're working in the source git, you can just add that patch to a branch and run make dist-srpm, and it will generate a source RPM for you to test. But if you don't want to work in the source git, or you're working from a source RPM itself, we have a blank patch called linux-kernel-test.patch, and you can cat a patch or a series of patches onto that file. It's already there and already applied, so you don't have to go through and mess with the spec. We also have an option for config overrides. If you're running your own branch and want to change a config, you don't want to mess with our config files and worry about conflicts as you update over time. There is a directory for that: redhat/configs/custom-overrides.
It follows the same arch layout. When all the configs are generated, the last thing that happens is that anything with an override gets applied last. So the order of operations is common, then fedora, then custom-overrides, and whatever you set there will be the final setting. You still have to worry about dependencies and mismatches, but at least you can maintain your branch with custom-overrides and not have to worry about us changing something and the merge not working correctly. That went quicker than I thought; I tried to rush through some of it, and I'm happy to discuss anything you'd like as far as questions on any of the topics. Anyone? Yes?

[Audience] The RHEL patches are applied. So if we're using Rawhide, we're kind of using the RHEL kernel?

No. The Rawhide dist-git has the RHEL patches applied, but like I said, anything specific there is guarded. For example, there's some code to support a different type of multipathing that RHEL used to support; upstream has rejected that, so it's a RHEL patch they're carrying for compatibility reasons. But the code that enables all of that is ifdef'd in the kernel itself under CONFIG_RHEL_DIFFERENCES. RHEL, and thus ELN, has CONFIG_RHEL_DIFFERENCES turned on, meaning that code path exists, but Rawhide does not. So functionally, the kernel is the same as it would be on a stable Fedora branch. Anyone else?

[Audience] I've looked at this whole workflow a few times in the past, and it always struck me as very, very complex. I know there are reasons for this, but long-term, do you think there's a chance it will become simpler, or do we really need this complexity?

Some of the complexity I think is required, and some of it we can simplify. Like I said, the config stuff I know is a nightmare; we're trying to simplify that.
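The two testing workflows described above (catting a patch onto the always-applied blank patch, and dropping a config override into custom-overrides) can be sketched as follows. This mocks the tree locally; the patch content and the CONFIG_MY_DRIVER option are hypothetical, and redhat/linux-kernel-test.patch as the path of the blank test patch is an assumption based on the talk.

```shell
# Mocked kernel-ark-style tree for illustration only.
set -e
mkdir -p redhat/configs/custom-overrides/generic
touch redhat/linux-kernel-test.patch   # blank, always-applied test patch
# 1) Append the patch you want to test onto the blank test patch:
printf 'hypothetical upstream fix\n' > fix-my-driver.patch
cat fix-my-driver.patch >> redhat/linux-kernel-test.patch
# 2) Override a config without touching the shipped fedora/common files;
#    custom-overrides wins because it is applied last:
echo 'CONFIG_MY_DRIVER=m' \
    > redhat/configs/custom-overrides/generic/CONFIG_MY_DRIVER
# In a real checkout you would then build a testable source RPM:
#   make dist-srpm
grep . redhat/linux-kernel-test.patch
```

Because the overrides live in their own directory, rebasing your branch onto a newer os-build never conflicts with the shipped config files.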
We're trying to break out the spec in ways that make it easier to do a custom variant or custom kernels, things of that nature. But as far as doing a merge request through GitLab, that whole workflow is kind of what it has to be right now. Part of that is that RHEL requires controls on everything; there's an acks system and all of these things that have to happen for RHEL patches. What you'll find is that if you submit a merge request to fedora-6.4, or any of the stable Fedora branches, it doesn't go through any of that process. If you do a code change in Rawhide, it does have to go through those processes, because it requires acks. And if you do config changes that only touch Fedora, again, it bypasses all of that process. I also have a bypass lever, because I don't actually build Rawhide kernels out of os-build; I build them out of a branch called ark-latest. ark-latest is regenerated every night, and it's the contents of os-build plus any merge requests that are not merged yet but are labeled "include in release." The reason that exists is so I can make sure Fedora is doing what's best for Fedora without having to wait for RHEL acks or the RHEL process for those bits. I still have the ability to react and do everything I need to do for the Fedora community. Now, everybody on the teams I work with internally, most of them, actually all of them, are running Fedora, so they think about Fedora and they're happy to have things going into Fedora. But they're focused on getting code in and doing things through that RHEL process. My only job is the Fedora kernel, so I'm the lookout to make sure, hey, we can't overstep boundaries; we have to make sure we have ways around all of this to do what's best for Fedora at all times. Is it on? Yes.
[Audience] When can we expect your new book to be published then, "The Art of Kernel Maintenance"?

I don't know that it would be quite a book, but let's say when I started kernel maintenance, I had a lot of hair.

[Audience] About upstream or non-upstream bug reports: what is useful to have in a Bugzilla report and what is not? Assume you have a wireless card that doesn't quite work as an access point. Is it useful to file a Bugzilla report about it? I assume it would be an upstream issue, but I don't really know.

So we try to stick as close as possible to upstream, and yes, most of the time it is going to be an upstream issue. But it's good for us to know that there are several people with these types of issues, because one of the things I have to do is watch for patches upstream. I don't want to bring in patches to the Fedora kernel that have been rejected upstream or haven't gotten any comments. But I am happy to apply patches that fix bugs before they've made it to a stable release upstream. I'll backport patches from Linus's tree to a stable release, or even from linux-next to a stable release, to make sure we're fixing things for users. So yes, ideally, if you can deal with upstream directly, it's going to make things better, but I still need to know that it's happening. Anyone else?

[Audience] I've recently started using Rawhide more, and I've noticed when I do a system upgrade there will be what looks like two different versions of a kernel in a day. For example, there was a 6.5.0-rc3 with a long number afterward, and then later in the day I was checking for other updates and it was the same 6.5.0-rc3 but with another long number. Could you explain that?

So the long number is the git commit ID that kernel was based on from Linus's tree. It doesn't matter what we've changed on our side; that's Linus's tree merged in. The reason you might see two Rawhide kernel builds in a day just depends on what time of day the kernel was built.
I try to do one every single day, but it might be the first thing I do in the morning or something I do later in the afternoon. So if you update in the morning you might get yesterday's kernel, and then if you update in the afternoon you might get another kernel that way, but usually it's just one a day. The only time I would ever do a second is if there's a major, major security update, or if there are reports of something serious and we've got a fix for it; then we might push that quickly. It's usually built so that it lands after about 9 a.m. Central time in the U.S. We have a script that runs every day that merges Linus's tree into os-build, and that script runs at 9 a.m. my time, and then a script an hour after that builds ark-latest. So even if I'm traveling, the scripts run when they do. I can manually trigger them if I need to, but I usually don't; I just wait till they run.
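The "long number" in a Rawhide kernel version is the git commit ID the snapshot was based on. A rough sketch of producing such a version string is below; the exact Fedora naming scheme differs (it also embeds a date and release fields), so this is only an illustration of the idea, using a throwaway local repo.

```shell
# Illustrative only: derive a snapshot version string containing the short
# commit hash, the way the Rawhide kernel NVR embeds the merged commit ID.
set -e
git init -q nvr-demo
git -C nvr-demo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m 'daily snapshot'
sha=$(git -C nvr-demo rev-parse --short=12 HEAD)   # the "long number"
echo "kernel-6.5.0-0.rc3.${sha}.fc39"              # hypothetical NVR shape
```

Two builds in one day therefore show different hashes simply because a newer upstream commit was merged between them.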
[Audience] Hello, I work in QA, and when I look at kernels in Bodhi, in updates-testing, they usually get a lot of karma quickly, a lot of feedback. Do you have an approach for how long you keep a new kernel in updates-testing before you push it?

Yes and no. One of the things we do is turn off autokarma; we don't want the automatic push. We realized years and years ago that it would just skip updates-testing, because we usually get enough karma to push immediately. The typical wait now, since we're doing one to two kernels a week, is that it sits in updates-testing no less than one full day, and we push after that. But a lot of times we just wait until the next kernel comes out, push the current one to stable, and then the new one goes to updates-testing. Obviously, if there's a critical security bug or anything like that, we push it through as quickly as we can. Sometimes, with Kevin and QA helping out, we end up with heroics, people going above and beyond to make sure a serious security fix gets tested, pushed to testing, and then pushed to stable within a number of hours.

[Audience] I'm asking because sometimes, and I think it happened recently, part of the user base is affected, for example certain AMD cards no longer boot, and you usually see the negative feedback only after two or three days, because it doesn't affect everyone, so it takes time until the affected people update and report feedback. So do you, for example, wait at least three days for new releases, or what's your approach to minimize the impact?

So that was an interesting one; I didn't want to push that kernel when I did. The idea is, yes, you give it at least a day in updates-testing, and usually you'll get feedback by then. With the AMD issue, it looked like fewer cards were affected, but it turned out it impacted more. So there was a second kernel that was never pushed, and then we had another update that came in that fixed some of the cards but left some of them still
broken, unfortunately. There was also a critical security update in there, and I mentioned that in the Bodhi update when I pushed it: we had to push it because of this, I'm sorry, and we'll try to get this regression fixed as quickly as we can.

[Audience] And I have one related question extra. If I see people complaining on discussion forums or somewhere that they have a hardware-related problem, and it seems really specific, not tens of people complaining about the same thing, what's the best advice I can give them? Report it in Red Hat Bugzilla, or reach out upstream if possible? It's much harder to talk to upstream, especially for the kernel; it's somewhat easier with Mesa and such because they have GitLab and everything, but for the kernel, with mailing lists and everything, it's very difficult for regular contributors. So what's the best approach?

Red Hat Bugzilla is probably the best approach there. We wish people could work with upstream, but I understand it's difficult. It's hard even for me to remember: oh, this upstream, like ext4, great, you want to file in the upstream Bugzilla; but you go to another subsystem and there's a kernel Bugzilla where you can enter a bug, but they'll never look at it. Every subsystem has a different way of working, so that's one of the ways we have to help. I'm out of time. I'll be around all week, happy to answer any questions and take any contributions. Do we have Miroslav here already?