For those who don't know me, I'm Laura Abbott. I'm one of two Fedora kernel maintainers, the other being Justin Forbes right there. We are two full-time engineers at Red Hat whose job is to maintain the Fedora kernel. This is a session I called Fedora Kernel Process Review, but I don't have a whole lot of content to actually present, because the purpose here is to get feedback and discussion from people who have topics they want to discuss. So the way I want to run this session is to collect a list of topics and then get an idea of where we want to start for the two hours. So who has a topic they would like to discuss? OK. What do you want me to call your topic? Don't make me say "nine USB webcams." I mean, the purpose here is also to review the process, so if there is a way the process can improve: nine USB webcams. What else? More about non-Red Hat community engagement: what's there now versus what's not, and how you can help today. Anything else? For anybody who's new, a quick primer on what it is that's happening right now. That's a good one; I'll probably start with that before we get to any of the other topics. And then I'm curious if the kernel is going to make use of any of this arbitrary branching stuff. I think you're actually going to get multiple streams whether anybody wants to think about that or not. This looks like a good set of topics.

So I will start with an overview of how the kernel process works, and then we can go on to these. If there are no objections, I will just go in order. I will probably try to limit discussion on each of these to, say, 20 minutes at first, and then if there's still engagement we can continue, just so we don't get rat-holed and can keep a smooth flow. For the people who are watching this recording, I'm sorry, this may not be a great session to watch remotely, because it's intended to be a discussion session. People who want to talk can come up and use the microphone, but I don't think this is going to be recorded very effectively, so I may just give up on that. Could everyone hear me when I was talking without the microphone? Yeah? OK.

So for those who are not familiar with the kernel process right now: the Fedora kernel is essentially like any other Fedora package, in that we use the dist-git model, where the package has a list of sources and then a series of patches we apply on top of them. That's the model we use, versus, say, an exploded tree which has all the source files available to build. A lot of what Fedora does for the kernel is dictated by the way the upstream kernel process works. Fedora's goal is to stay as close to the upstream kernel version as possible. Rawhide is a snapshot of Linus Torvalds' master branch: basically, as soon as he pushes anything out publicly, Rawhide will be picking it up within the next day or so. So if you're running Rawhide, you're running the very latest kernel. Fedora stable releases tend to get the latest stable kernel. For example, the kernel currently in the development cycle is 4.13; we are on RC7, and that's in Rawhide. Fedora 26 and 25 have 4.12, which is considered stable. And once 4.13 is released and has gone through a couple of stable cycles, we will rebase the Fedora stable releases to it.
That generally tends to happen after a couple of stable releases. That's the gist of it. Justin, is there anything you want to add about the process? A couple of things. One is for people wondering when that rebase happens: a lot of people have heard that we rebase on .2. It is true that we will probably never rebase before .2, but sometimes it's well after. We don't have a firm policy there, other than wanting to make sure the kernel is fairly sane, and a lot of times you'll see things discussed upstream as being not sane yet. So we'll wait until .3 or .4. We waited until .4 with 4.12, and it still had a few pretty major issues that I guess nobody noticed in Rawhide testing. So there's no firm rule there.

The other thing is, while we are like most packages in that we use sources and patches and apply them, the internal mechanism for building the RPM actually uses Git: the kernel tree is exploded into a new Git tree, and then the patches are applied on top of that. It's helpful for a number of things, but the reason it's important for everybody to know is that if you submit a patch, that patch needs to be something that will actually apply with git am. If it doesn't apply with git am, because it's missing the standard headers (date, author, and subject), then we have to go back and fix it.
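[Editor's note: a sketch of what "applies with git am" means in practice. A patch generated by git format-patch carries the From/Date/Subject headers the maintainers mention; a minimal workflow for a single fix commit might look like:

    # contributor's side: turn the top commit into a mail-formatted patch
    git format-patch -1 HEAD        # writes 0001-<subject>.patch with From:, Date:, Subject:

    # maintainers' side: apply it to the exploded kernel tree
    git am 0001-*.patch             # fails if the author/date/subject headers are missing

A patch produced with plain `diff -u` or `git diff` lacks those mail headers, which is what forces the maintainers to "go back and fix it."]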
What happens when I actually open a Fedora bug? If I open a bug, take me through the process of how that bug actually gets fixed and what you two do. OK, I'll start with this. It really depends on what kind of bug it is. Generally, if it's something that I know how to fix, or think I have a reasonable shot at knowing how to fix, I will look at the bug and see if it's something that has already been fixed upstream. If it has, great, that makes my day a lot easier, and I'll be able to pull it in. But a lot of what we do is trying to figure out: is this what I like to call a hardware-independent problem, or a hardware-specific one? Hardware-specific problems tend to be much more difficult to debug unless I either have knowledge about what might be going wrong with the hardware or have the hardware itself. Hardware-independent bugs are much easier to deal with if I can just get a specific test case. It also depends on what kind of test case or reproducer there actually is. If there is a nice, concrete script or program to run that demonstrates the problem, that makes it a lot easier to work on. So that's roughly what I try to figure out. Suspend-resume bugs, unfortunately, are pretty common and tend to be hardware-specific, so those tend to get left to the side. I always try to tell people: if you have a working and a non-working kernel and you can run git bisect to find the commit, that is incredibly helpful. Being able to point to a specific commit, and either report that upstream or look at it yourself, is a great starting point for getting things fixed.

Do you two report that upstream, or would you direct the original reporter to report it upstream instead? Either. I encourage any Fedora community member who wants to report it upstream directly to go ahead and interact with the kernel community. The kernel community has not always been known as the most welcoming, but I still encourage people who choose to interact with it to do so; it may in fact go better for them than they expect. So either that, or if for whatever reason someone just doesn't have the time or the experience and doesn't want to do that, we can report the bug. The problem is that then you're the person in the middle; what you end up doing is relaying information back and forth. Yes, and that's a good point. One reason I don't always like reporting bugs upstream myself is that it does feel like I'm just playing man-in-the-middle with the reports. And we have hundreds, so there is a process of prioritizing: this is something that really needs to be addressed quickly, so try to get people on it; this is affecting one user and isn't critical, sound not working or something, and we'll get to it if we can.

I'm also curious, and this is going to sound like a loaded or angry question, but I don't mean it that way at all, I mean it very introspectively: since it's just the two of you on a very fast-moving distro, what do you two consider doing a good job? What does that mean to you? It's probably different than for the rest of the packages. Well, I'd probably say getting out a release that boots on everything is at least a good starting point. That seems like a low bar, but there are some bugs that will keep machines from booting at all, so I'd like to see that. Beyond that, my hope would be that success means working well on the common classes of hardware: the most popular laptops, the most popular servers, and things like that. If those are working effectively, then I think that's a good notion of success, because the reality is that there are two of us and we can't support every weird quirky setup. Do you have an impossible job? Yeah. So there's been some focus on trying to make laptops work well, and I think that's really a good measure of success: for the common hardware out there that people actually want to run, is Fedora running well? And I would add to that: making sure CVEs are covered, and making sure that as we push out updates we are not pushing out major regressions; we do push out bugs, because that happens on a regular basis, but not major regressions. But that's the racket of software development: we create the bugs and then charge people to have them fixed. Yeah, except we don't get to charge people.

So I think that's a general overview of the kernel process, and if there's nothing else major, I'd like to move on to the topics. Is there one more question? Yes. How different is the Fedora kernel from the upstream kernel? That's a good question. The Fedora kernel tries to minimize the number of patches we carry on top of upstream. Usually the patches we carry are what I think of as quirks, to make things a little bit better in ways that maybe upstream doesn't care about. We do carry some large out-of-tree patch sets; currently the biggest one is Secure Boot, although we are making progress at getting parts of that in. When ARM64 did not yet have ACPI support, we were carrying that out of tree. So at the core level, Fedora is similar enough to upstream that we can almost always report bugs upstream and they are relevant to exactly what is up there. And the other thing is that we hope the patches we do carry tend to be short-lived.
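[Editor's note: one rough way to see how far the Fedora kernel diverges from upstream, assuming the dist-git layout described above. The exact spec-file conventions vary between releases, so treat this as a sketch:

    fedpkg clone -a kernel && cd kernel   # anonymous clone of the kernel dist-git
    git checkout f26                      # or master for Rawhide
    grep -c '^Patch' kernel.spec          # approximate count of carried patches

The counts quoted below (84 on Fedora 26, 56 in Rawhide) would come from something like this.]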
At least we get a lot of churn, where say someone points out a bug and says, there's been a fix upstream, can you bring this in? We bring it in, and then maybe it drops out when it comes in with the next stable release. I see 84 patches on Fedora 26 and 56 in Rawhide. Yeah, that's not a lot for the kernel. No, and really, the ARM64 stuff is moving really quickly for hardware right now, so a lot of times there are several patches grouped together into one patch there, but pretty much every rebase has some of those dropping out and new ones coming in. So it's a pretty steady churn, but we're not carrying things that aren't destined for upstream. Anything else?

How heavily used is the i686 architecture? It's used by, I'd say, a handful of people who use it steadily. We are not seeing a lot of new uses of it. As far as actual numbers go, it's hard to say. From what we've seen of the check-ins, the largest number of check-ins on i686 is actually from very old, unsupported releases at this point; people are still running them, and they're still hitting the repos because the automated process checks for updates, but there aren't any. There are users, and there are some very vocal users, but I would say it's a very small number, certainly not on the scale of x86_64. Do you think ARM is more? It's much bigger. ARM is harder to measure, though, because of the types of devices those are. You get all these little ARM boards, like the Raspberry Pi or whatever. I've got three of them in my house running music servers. They don't check in, they don't update, because they're only connected internally and there's no reason to do anything until I'm ready to re-image them with something else.

How do you pick config options for the kernel when new ones show up? We're actually going to spend some time talking about that, but the short answer is that we make a best-guess effort when new options come in and we set them. A lot of the options that are set have basically been set that way since they first came in, many kernels ago, and have not really been reviewed since. One thing Don's going to talk about is what we're looking to do to improve that cycle and hopefully get a little more review from people who care. If you have an interest in any kernel config being turned on, off, or adjusted, feel free to file a Bugzilla. Chances are we may not know why it's set the way it is, but if you can make a good argument for why it should be set one way versus the other, we're happy to change it. We do have people occasionally say, can you enable this? And there was a valid reason for not enabling it when it came in: it was something that came in through staging, or came in marked unstable or experimental, and it wasn't turned on; and then when it became stable and supported, we never revisited it, because it doesn't show up as a new option and nobody asked for it.

You want to go ahead and introduce yourself? Sure. Can we stay up here? Yeah. My name is Don Zickus, from Red Hat. I'm a kernel engineer, for those who don't know me; I recognize about half the people in the room. One of the topics I want to talk about today is kernel testing.
One of the new initiatives we're doing at Red Hat is around Fedora Atomic Host, that fast-moving host. With that initiative, the kernel is going to be moving a lot faster, and we want to make sure things stay stable. So one of the things we want to do at Red Hat is provide a lot of our kernel testing for Fedora, the idea being to help stabilize Fedora to support this project. As you probably know, Red Hat has had, for years, a lot of internal testing that we put in the way of the RHEL kernels we produce. We want to start moving some of that into the public arena, start running it on the Fedora kernels, and provide some value and community service back to Fedora on that front.

To go even further, we're looking to mimic what the Intel 0-day build bot does. For those who aren't familiar with it: on the Linux kernel mailing lists, if you post a patch, within about four hours Intel has a machine that takes your patch, builds it, and tells you whether it builds properly under a variety of configuration options. We're looking to do something similar: we'll test patches internally on our machines and tell you whether they at least satisfy how we configure our kernel, and let the community hopefully fix those bugs early rather than have them trickle down to Fedora and sit filed in Bugzilla for a while. I think this will help stabilize the Fedora kernels a lot more than they are today, and provide some value-add and community service. So one of the projects we did kick off, we put on Pagure; it's called skt. I don't have a way to write down that link, but it's on pagure.io. Jeff, how do we find it? It's the project name: skt, for "sonic kernel testing," as we're calling it. It's very early in the prototype stage, but the idea is that it's going to watch a mailing list, build kernels, test them, and then report the results back. So that's one of the things we're hoping to do, and we're hoping the community sees some value there; if these testing efforts combine with Justin's work, maybe we can get more engagement from the community to run some of these tests on their machines and report the feedback. Yeah, it's just pagure.io/skt, if you want to write that down. Oh yeah, okay.

And I think this gentleman over here asked a question about configs. One of the things is that Justin and Laura are kind of overwhelmed with some of these configs sometimes; a lot of times it's a best effort on the initial config, and later on a community member comes along saying we need this driver or this feature enabled, and they'll ask for it. So one of the things we're trying to work on is having Red Hat engineers get engaged in some of these configs and say: hey, from our experience, if you tweak these config options to this setting, you get better value, better support; this feature is interesting, and the community probably wants it. So we're going to start hopefully providing value there, doing our own internal review and making suggestions to the Fedora community. Hopefully they see value there and accept it; otherwise we can have a conversation on technical grounds. So we're hoping to offload some of that burden from Justin and Laura and get some Red Hat internal folks involved. I think that covers my part. Any thoughts or questions?
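[Editor's note: a minimal way to check how an option is currently set before filing that Bugzilla, assuming a running Fedora system. CONFIG_BTRFS_FS is just an example option:

    # how is an option set in the running kernel's config?
    grep CONFIG_BTRFS_FS /boot/config-$(uname -r)

    # many kernels also expose the config at runtime, if built with IKCONFIG
    zcat /proc/config.gz 2>/dev/null | grep CONFIG_BTRFS_FS

Fedora ships the full config alongside each installed kernel in /boot.]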
What's the next phase of skt with bisect? Okay, so our skt project right now is pretty simple: it takes kernels, builds them internally, and runs a test suite on our machines. If a test fails, the value-add is emailing feedback to kernel.org: here's the test we ran, here's the failure, can you help fix it? The idea is that if you catch it early enough, the people who posted the patch can respond appropriately. This failure feedback works great at the per-patch level, but if you're doing it on Linus's git tree, sometimes you need to do a bisect. You have to start bisecting automatically, so we're looking to build that logic into skt, to help with the failure case so we can narrow it down to a commit and respond appropriately. That really helps engage the kernel.org folks: hey look, this is the exact commit, here's the test that's failing. Then they've got something to reason about what went wrong and produce the correct fix.
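[Editor's note: the "bisect logic" described here builds on what git already provides. A hand-run sketch, with hypothetical good/bad points and a hypothetical test script:

    git bisect start
    git bisect bad  v4.13-rc7            # first version known to fail
    git bisect good v4.12                # last version known to work
    git bisect run ./build-and-test.sh   # script exits 0 on pass, non-zero on fail

git bisect run automates the repeated checkout/build/test loop and stops on the first bad commit; an automated system like skt would essentially wrap this with its own build and test harness.]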
I think you have a question back there. So, Don, is this separate from Adam Williamson's openQA efforts to automate all of that, or is it going to be built into that framework? Or is it a matter of hardware? I think it's much the same thing, except we're using Red Hat resources and he's using Fedora resources. We still have to have that conversation with him, I think. Two different efforts, but they should be complementary. Yes, absolutely. I was just talking to Pierre before I came in here; he's been working on some of this stuff too, and it sounds like we can integrate all of this, so they're complementary to each other. I'm sure we can. So that was kind of a lead-in question, and my next one is: is there a named person I need to talk to to make sure they have the hardware that I'm actually interested in? Well, you obviously understand what hardware Red Hat has, so if you want folks like Adam to participate and run on public hardware, that's probably a good question to ask him. Will Adam be running skt, then? As of today, I haven't had that conversation with him; we haven't gotten to that part. I'm always trying to stay in contact with both sides.

So let's take a step back. What skt is going to do is capture upstream kernel trees, like net-next or maybe Linus's tree, so we're catching different git trees and running tests on them; whereas openQA may be taking a standard Fedora RPM and running tests on that. At the end of the day, we're building something and running a test on it, and Adam might be running a similar test on a standard Fedora RPM, so he may never come around to running skt, but maybe if we have a conversation, it might make sense in his workflow. One of the Fedora requirements for CI/CD says that the tests have to be curated in Fedora, so that will make it a slightly different subset of testing. Also, the hardware will be slightly different. Well, and there's a different angle. The way I see it is also that if we do gate, for example, every time we rebase, and we take a large test suite and say, no, you can't push this rebase because it's gated and it has all these problems, that may be right, but it's not going to be productive. By going all the way upstream, we actually give the feedback to the people who can take action.

That's almost a prerequisite to then maybe using exactly the same test suite to gate ourselves further downstream, where by that point we'll have less to deal with, less actual gating going on. Yeah, and we do end up with cases where sometimes we have to push something anyway. Like the 4.12.5 update: when we first built 4.12.4, apparently nobody had been testing it, because there were some pretty major issues that nobody noticed until then. But by the time I got the first round of those fixed and was putting out 4.12.5, we had to push it, because there were security fixes in there that were actually critical. So QXL is still broken; there's a workaround for it, so it's not that critical, but we had to push it, sorry.

Well, actually, that goes into one of the questions I have. So skt is available on Pagure now, but it's something that's going to be running internally, with the results eventually made public. Are those results being posted somewhere right now? Right now it's run internally, just to get the pipeline established, but the goal is to run public tests and find a way to publish the results publicly; Jeff can figure that out. For example, one of the problems we run into is when we take the net-next branch and it fails, but it fails to boot because of SCSI. That's not a net-next problem, so we don't want to spam net-next with SCSI failures; we want to send them to the people who need them. We do have a front end for the current test system that we run in Fedora: it takes log files from kernel test runs, and it's got a status display. The status on the front page is only updated from tests submitted by official test systems, not from anybody else, but you can click on a kernel version and see everybody who's run that test and submitted results; you can see all the results. It won't tell you who they are; it just shows you a bunch of results. There is a way we could take some sort of summary log file output from this system and feed it directly into there. So you could look at a kernel, and especially if we find a way to mark results as coming from skt, you could just look and see. So that web page would be a very crowdsourced way of getting kernel test results, even if some of them come from people and some from bots. Right, okay.

So then my other question goes back a bit. Let's say we get those results in there. One of the things that still confounds me a little on the Fedora side, and this is probably for Laura and Justin: when we find out that something is broken, how do we go about fixing those things? Especially if there's gating in place, because there are only two of you; gating can become problematic there, for reasons like I just mentioned, right? QXL: I didn't want to push out broken QXL, but at that point we had enough built up that there's a bit of a loop there. What we did initially, as soon as that showed up, was post it upstream. We asked for help upstream. We got crickets. So now I'm at the point where I said I'd see if I can do something about it myself.
Yeah, that's the big problem: we really rely on upstream kernel developers, who have much more in-depth knowledge of all the subsystems. The Fedora kernel maintainer role is kind of a generalist role, and Justin and I can only do so much, so we really do rely on that. But again, the problem is that if upstream doesn't respond, then we're kind of stuck, not knowing exactly what to do.

And I know upstream has no sort of SLA, but what is the common expectation for a bug? Is it usually a month turnaround, six months, a week? A lot of times it depends on the severity of the bug, and on who it's irritating. Some bugs get less than a 24-hour turnaround. Sometimes you post something to the maintainer and they see it and say, oh yeah, I know exactly where that's coming from, and they give us a fix; or, oh yeah, that's really serious, let me see what I can do. How many times is the person doing the fix on that 24-hour turnaround one of our own people? Because I don't know how many upstream maintainers we have with an @redhat.com address at this company; it's a bunch. I would say probably less than half. Really? Okay. It really comes down to who's maintaining the code. If it's driver stuff, it's whoever's working on that driver. Red Hat's got a lot of networking folks, so networking fixes often come from Red Hat. And there are some things that we honestly just don't triage, like Btrfs. Right. Do we still have that enabled? Oh yeah, absolutely. Fedora has a very different approach than RHEL, in that we're going to enable as much as we can, because people want to use it, and because we don't have support SLAs, if it breaks, you can just ignore it. Btrfs for a while was, hey, this is great, and then it's kind of gone in a different direction. So it's enabled, and I don't see us turning it off for any particular reason, but we don't have the resources to look at the bugs, and the resources upstream are frequently interested in other things, not the types of use-case bugs that desktop users are seeing right now. Other questions?

So, in terms of getting results into, what was the thing you were talking about, the results? The kernel test app. There is an app called kerneltest on Fedora Apps. Right now we've got this test suite, a quick set of tests that users can run and that we run on everything. It checks a few common cases. Unfortunately, every time we ask for community help to submit tests, we usually get a couple of people who say, oh yeah, we'd love to do that, we'll write some tests, and that's the last we hear of them. It's hard, and it's not a high priority for a lot of people, but hopefully this will work better with all of that as well. Right now, users have run it and submitted many results, and we even hand out badges for people who run kernel tests for us, which is more of a motivator than you would think. We have a lot of people running tests just to get the badges. And it's cool that they run it on various hardware, and because you get increments in those badges, there are people who test every kernel. And that's fantastic.
So, if we can get something better out there, all we'd have to do, if there's some sort of summary log file that comes out of these tests, is change a header and make a slight change to the app and the database to take those results. Does that sound like a reasonable endpoint? Yeah. One thing I want to mention that people may not know: the Red Hat kernel team is contributing a lot of tests to this project called LTP. If you look, you'll see a lot of brand-new tests being added by Red Hat associates. LTP is specifically one of the test suites we're looking at using. It may not go into a Fedora dist-git or an upstream dist-git repo, but we are adding tests directly to the LTP project, and that is one of the test suites we will run. So, indirectly, Fedora gets the test suite, and these are the kinds of ways we can help. I'm sure there's a wiki page, or some page upstream, that says how to get involved in Fedora kernel testing; we can add: this is how we run tests, here is how to run LTP tests and contribute results, with a wrapper around LTP to collect the results.

Yeah, really, all our test suite is is a number of subdirectories to go through, and there's a runtest.sh in each subdirectory. There's no reason it can't pull down things like even the NVIDIA driver test we have: it checks to make sure you have the latest package downloaded, and if it's not downloaded, it goes and downloads it, and then just extracts the kernel pieces to make sure they build against that kernel. It fails most of the time, because we're testing on Rawhide and NVIDIA does a once-a-month release and only supports release kernels. So it fails a lot, but at least we know it's failing and see it coming; before a rebase happens, we can say, this is a concern. Obviously we don't support that driver at all, but we know that people use it. It would be very easy to say: right now we have some subset of LTP, and for people who want to run the more extensive test suite, grab all of LTP and run it. The good thing about LTP is you can create command files, and the command files are what pan executes. So we could provide a command file that is what gets executed, and people can grab it: this is just what we use. We could throw that file into the test suite. Yep.

I've never seen this. Is it difficult, as a user, to just run these tests? You check out the tests and run runtest.sh. Now, if you want to actually submit results, there's a config file where you'll have to say, yes, I want to submit as authenticated. And then there are a couple of other options, like whether you want to run the third-party module tests, which right now is just the NVIDIA test; it's off by default, but if you change the config, it runs. So let me ask you this: obviously there's a lot of information here you probably didn't realize existed. Where would you think to look for it? Where would be an intuitive place? I'd go to the kernel channel on IRC and ask how I run tests.
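[Editor's note: the suite being described is the Fedora kernel-tests repository; a sketch of the workflow discussed above, assuming the Pagure location current at the time. The exact script and config names may differ:

    git clone https://pagure.io/kernel-tests.git
    cd kernel-tests
    ./runtests.sh                # run the default (non-destructive) test set

    # to submit results, edit the config file first, e.g. set:
    #   submit=authenticated

Per the discussion, each subdirectory carries its own runtest.sh and a top-level runner walks them.]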
I don't, you know, I mean, obviously there is a kernel wiki page; I use it when I need to remind myself how to properly build a test kernel without having to think about it. But if it had a link for how to test kernels, that would be useful. It might already; I never ran to the bottom of it. There's also the badge: the actual badge itself links back to it. No, I hadn't heard of it either. I knew there was something in the back of my head relating to testing, but... It's something maybe you guys should announce once every six months: hey, this is how you test. Yeah, I guess we probably need to re-announce it. We did sessions at Flock about it, and there have been posts on the Planet. I think it should go out with the beta and GA announcements; it's not going to take more than a sentence or two pointing people at the tests. Yeah, we can push all the tests we want out there, but if people aren't aware of how easy this is, or how easy we're going to make it... I mean, I think anyone could do it; I just didn't know it was this easy to do. Right, it's a communication problem. We can fix this; we should go and fix that.

Are the tests in there known to be destructive? No. Well, all right, a mix of answers would be terrible. The ones that are in there are not destructive. We have a directory for destructive tests, and there's a disclaimer. The one exception is the stress test: you have to pass a flag to run it, but there aren't any actually destructive tests; the stress test is different from an actual destructive test. We're asking for obvious reasons; I don't want to get that far.

Let's make this the last question for now, because Laura's got a couple of other topics, and if we get through those, we can come back to it. Yeah, this has been a really good discussion, but I do want to make sure we get a chance to get to other things. I work for Fedora QA, and I wanted to ask: is there a way we could have a kernel test day per release? Just after the mass rebuild and the branch date, could we have a kernel test day specifically? Yeah, we can do them. We would want to wait: F27 is going to be based on 4.13, and 4.14 won't release until after, so we'd want to wait until 4.13 is in; RC7 should be there by then. I think in general the answer is yes, this sounds like a good idea; we'll just have to find a good time to do it. That's a good suggestion. Sure, I'll set one up, because, yeah, we get that question all the time.

Maybe a suggestion: it sounds like Rawhide is the expected staging area, where a kernel goes once it's minimally bootable, and then it's expected that people test it and find the problems before it becomes stable. As Rawhide gets into better shape for all the other packages, would that become more of a thing? Right now people just avoid Rawhide. We don't see that now, but I heard someone's working on it.
Well, actually, we test Rawhide every night with openQA, and I'm sitting here thinking we could just add these skt tests or LTP into the nightly testing. Good. So the suggestion is that, wherever we send people to engage with kernel testing, we have a static spot that says: here's how much of this testing passes so far. Is there some way of measuring progress? Because if it's green, we can gate on it at that point. Is there a subset of the tests that people run and then report bugs on their hardware, some way of measuring progress? Sort of like how the translators have: yay, we're at 90% or whatever. So as a kernel sits in staging, people can feel a sense of, oh, I've helped move this all the way over here. It depends on how the test suite works, the architectures, the configurations, and so on, but if there were some way of measuring: okay, we're here, we're getting all the way there, and we're ready to go. Even if it's just like the charity thermometer, we're going to get 1,000 test runs, even that would be something.

I think that would actually be better, because one of the problems we have is that our automated testing is in VMs. We don't have a lot of hardware that we have access to, and that's why we want users running the tests. Now, it's been a failure in that regard, because we don't have any hardware-specific tests at this point. It's very easy to write a test that checks for specific hardware and then does specific things: if this module exists and is loaded, then run this test; if it doesn't, don't. That way you can test all sorts of various hardware. The idea is that if we get people running it, and we also end up getting tests for specific hardware, especially problematic hardware, then you get people with all these varying devices running the tests and giving feedback.

All right, I know Laura's got some more topics, so let me cut this off here and we can revisit it. This is a two-hour session, too. Yeah, it's two hours. I think it's one hour. No, it is a two-hour session, but I wasn't sure how long. I mean, we can definitely come back; I don't want people falling asleep over this topic. How much more do people have that they want to discuss here? We could probably go another 10 or 15 minutes before I'd like to cut it off, and maybe save the last hour for the remaining topics. Mine is actually very analogous to this. We wanted to hit the non-Red Hat engagement; we touched on it a little bit already. Yeah. I can just go into what I wanted to know there. Yeah, sure, that sounds fine.

We've got these tests, supposedly, in some future where we're now getting results and seeing things failing. It's an impossible task for you two to fix them all. So I'm trying to figure out: do we have any sort of kernel community in Fedora that we can engage to get these things fixed, or is that the wrong place entirely? Would we just go straight to upstream somehow and hope they fix it? I've thought about this before, and for various reasons I think it's difficult to engage the Fedora community to actually fix the bugs, for most generic bugs. I think this is because the Fedora community is best at reporting and testing on a wide variety of hardware, while kernel development in particular covers such a wide variety of areas.
It's hard to know where to begin, even for Justin and me, who do this as a full-time job, sometimes. It's not an easy skill set to acquire. It certainly is a skill set people can acquire if they try, but again, like you said, it's not easy, and what tends to happen is that you learn how to do things in one particular area. So we're not short of bug reports, certainly, but I don't think that's where the Fedora community can best start. It's safe to say there are a handful of people in the community who aren't really fixing arbitrary Fedora bugs; it's more like they find a piece of hardware they want to enable or fix or get working, and then they spend all the time on it, push a bunch of patches upstream, and want the Fedora kernel to backport them. I'm thinking of, like, Hans de Goede. We do see that, and we do see non-Red Hat people in the community who are testing things all the time and do interact with upstream. It's a handful of people, but there are people who are actively making the Fedora kernel better on the community side. We'd like to see more of that.

One of the biggest issues is triage. Several years ago, Josh wrote up a whole "this is the triage process" document, and every once in a while somebody will come in, spend a couple of weeks, do a fantastic job, and then disappear. Because it is the least sexy thing possible. I mean, we don't have a sexy job. I remember you said exactly the same thing two years ago in Rochester, and I went and tried to triage, like, five bugs. It's the hardest thing. It really is. Yeah, it's just not easy.

So, I wanted to double-check an assumption on this: even if someone did come in, find a bug that was open in Fedora, and submit a patch, your charter is pretty much that the patch is going to have to be accepted upstream at some point. So at that point, either you have to do it, or the patch just sits there unmerged, because you're not going to merge it if it's not going upstream as well, right? We're not going to merge something that is not going to make it upstream. But bug fixes typically do get submitted upstream, and upstream will do one of two things. Either they accept it and fix the bug, and it shows up in a tree; unfortunately, because of some quirks in the way upstream maintenance works, it might not show up in Linus's tree until the next merge window and then get backported to stable, but we know where it's going at that point and we're happy to take the patch. The other thing that happens is you submit a fix, and yes, it fixes an actual problem, but the maintainer says: I don't like the way this is fixed, I would rather do it this way, or I had something else in mind. In that case, a lot of times we'll still take the patch, because what they're doing is a rework, something that's not going to be backported to stable; we take the patch knowing this is going to be fixed a different way next merge window. As long as it's something that's going to be going upstream, that's fine.
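[Editor's note: a sketch of the backport flow just described, once a fix has landed (or been queued) upstream. The commit ID is hypothetical, and the dist-git specifics are simplified:

    # in a clone of the upstream kernel tree: export the accepted fix
    git format-patch -1 <upstream-commit-id> -o /tmp/

    # in the Fedora kernel dist-git: add the .patch file as one of the
    # carried patches referenced from kernel.spec; it drops out again
    # once a rebase picks up a stable release that contains the fix

Carrying a patch with a known destination upstream is what keeps these short-lived, as described earlier.]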
I mean, that's been my experience too. I have worked through, relatively recently, a couple of annoying bugs: one in NFS, and one with a weird SCSI control issue where tape libraries got broken. And I had never really done this stuff before, and I learned how to do it. I mean, I didn't fix the bugs myself, but I learned to gather enough information that I could go to upstream with it. In the NFS case, it was already fixed in one of the trees destined for the next merge window, and that just had to filter its way back down. In the other case, it took me three days to bisect it; server hardware is terrible to reboot constantly, but eventually I found the commit that broke it and traced that back upstream. Actually, once you have all the git information, it's not hard to find these things; it's just that there's so much procedure and so many kernel recompiles that it takes a long time to do.

And I think it would be fun to have a tool to help you with that, right? Yeah, well, perhaps, I don't know. I mean, there was that continuous-integration thing somebody was talking about yesterday, the DeLorean thing, that basically gives you a downloadable package at every commit point that you can then test. As a user just wanting to test these things out, you can bisect by RPM. For what we do in the kernel, that would be impossible at every commit, but I pull the daily git builds out of Rawhide, and that gives me enough information to get two bisect points, so that I only had to compile my own kernels in between.
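[Editor's note: "bisecting by RPM" as described can be done with the daily Rawhide kernel builds from Koji; the build NVRs below are made-up examples:

    # fetch two candidate builds to narrow the good/bad window
    koji download-build --arch=x86_64 kernel-4.13.0-0.rc5.git1.1.fc28
    koji download-build --arch=x86_64 kernel-4.13.0-0.rc5.git3.1.fc28
    sudo dnf install ./kernel-core-*.rpm ./kernel-modules-*.rpm

Booting each build narrows the regression to the git snapshot range between them, so only the commits in between need to be compiled by hand for the final git bisect.]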
And your SCSI issue with the tape libraries: because of the work you did with the maintainers upstream getting that fixed, we had to fix that before we even tried to push the rebase. Right. So you saved anybody else from ever seeing that problem. Not that anybody would have. They may or may not have; it could have been. We don't know what hardware people have. Unfortunately, there used to be a tool that would collect that information, which you could opt into, and that went away, and there was supposed to be a replacement. Someone talked about it at FUDCon seven years ago. That was Smolt. I maintained Smolt; I'll take some blame for that, but then I left to do OpenShift, so Smolt went away. Sorry about that. And there's been no replacement, so for a lot of these things it is hard for us to say what everybody's running, because unless someone files a bug or yells, we don't really know, other than through the little check-in counts per architecture, which are rough at best. lshw, we've been using that for Beaker now: we use lshw to scan our machines to figure out what hardware is in there, and we upload it to our database, so as a kernel developer, if you want to test on something, you can find it. We'd have to figure out how to do that here, though, in a way that complies with our privacy guidelines. Yes. Well, if it's opt-in, somebody will run it. Especially if I've got the weird hardware that I want to make sure doesn't disappear, then I have an incentive to run it, as long as I know it exists. I did run Smolt, but...

We do get the ABRT reports, the tracebacks, and can say, all right, a lot of people are running this hardware because it's got bugs, but... I've found ABRT to be noisy for the kernel in particular. This is not to say ABRT is not a useful tool, but for the way the kernel moves, by the time we actually collect enough data points to figure out whether one of these reports is something relevant we should fix, we're mostly on to a new kernel, or it may have already been fixed in the latest version.

ABRT is actually a very interesting question, because the thing I was wondering about: up until you mentioned ABRT, it felt like a lot of the data we got back was very artisanal. It's somebody with the weird hardware saying, I'm going to run my kernel test suite if I happen to know about it, or I'll log in and open a bug. Whereas if we're going to start generating a bunch of tests internally, and other CI tests externally, my concern was, I assume the impact of that is that many more bugs are going to be opened upstream as well. So if I'm a kernel maintainer, whether or not I'm involved with Red Hat, presumably the odds of my seeing some sort of bug from Fedora are increased; we're going to find more bugs. But then you mentioned ABRT. Is ABRT just generating noise at this point, or do you actually follow those and file bugs upstream? Would you even know what to file? There are some filters, some things we can filter out. A lot of the ones you'll see, though, are things like out-of-tree drivers, things that are useless. Some of that filtering is automatic. I was trying to follow it for a while, but I gave up, because I was not finding bugs I could actually take action on. Sometimes what happens is that, because the kernel moves so quickly, we'd find that people were still running versions that were out of date, but ABRT was still reporting them. A common one was graphics backtraces. Graphics is handled by a dedicated Red Hat team that does a very good job, but they also have their own priorities, so sometimes those bugs can stick around a little longer. So I was not finding it a good use of my time to follow it. Now, I haven't actually checked on it in a while, so maybe things have changed, but... No, they haven't changed that much. Okay.

Do you think the process Don is proposing is going to generate a similar amount of noise to ABRT, or do you think it's going to be a higher-quality set of information? I think it's going to be higher quality. ABRT tends to grab any sort of warning that comes up, which is okay, but on the other hand it means we get warnings for things like: your Wi-Fi driver timed out because it's got mismatched firmware, your graphics driver timed out because of something, whatever. Whereas the tests that have been talked about here have been vetted, so I think they're likely to generate something we can actually take action on. We're even carrying a couple of patches in the Fedora kernel, and it doesn't cover all of them, for essentially silencing warnings where upstream has said: we want to keep this warning on because we want to know it's happening, but the end user doesn't need to know and doesn't care, because it doesn't actually impact their day-to-day. But it does show up in ABRT. Yeah, it shows up in ABRT. So we silenced a few of those, but that's only a couple of them. It's expected behavior that the developers are aware of; they don't want the warning to go away, but the user wouldn't even know about it if it weren't for ABRT, right?
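[Editor's note: a sketch of the kind of warning-demotion patch being described, in the form such patches usually take. The file path and message are invented for illustration; the real patches differ:

    --- a/drivers/example/quirky.c
    +++ b/drivers/example/quirky.c
    @@
    -	WARN_ONCE(fw_mismatch, "firmware version mismatch\n");
    +	pr_info_once("firmware version mismatch\n");

WARN_ONCE produces a full backtrace that ABRT picks up and reports as if it were a crash; pr_info_once logs a single line that developers can still find, but that no longer alarms the user.]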
I think you have hit on a problem, though, in terms of things we see in the kernel: typically, if anyone sees anything that says "error" in the kernel log, they want to report it. The problem is that sometimes these are innocuous errors. A good example: upstream did an upgrade of the ACPI table parsing code, and that's been generating some warnings, I think because there's a slight mismatch between the parsing code and whatever ACPI tables are out there. It's not something that actually causes a problem, but because users see a warning, they want to report it. And that has a real cost, because we see it on the RHEL side as well: we get hardware partners with actual money being spent dealing with customers who report that as a bug, thinking it's causing their problem somewhere, and it's not. But because they think it is, it costs money to deal with.

What's the number of kernel-related ABRT bugs per month coming into Bugzilla? Across all releases, probably just under 100 in Bugzilla. Now, the thing is, ABRT is smart enough not to open a new bug for something that already has a bug open, so the ABRT reports going up to the server are a lot more than that. So I was just thinking, I've looked at it, and it does have some kind of machine-learning techniques in there already, but this is an area where we could apply some unsupervised classification. You get a highlight, like a big thing that people are running into, and if you dismiss it, it doesn't matter in the future, it's a warning, it's a known thing; we start actually teaching machines to go through the ABRT data and make it valuable. There's a problem with that, though, because you're assuming a certain distribution: I've been in situations where people are running labs with 100 of the same machine, and they all generate that same warning, and that's going to skew your data. So one of the interesting things is that you have to train it: you teach it, I don't care about these kinds of warnings, the classifier starts to learn that, and then if you see a big jump all of a sudden, 100 instances of something, you can say, okay, maybe that's worth investigating. You get some actual feedback; you can actually use that data. Right now it just seems useless: a warning is treated the same as a crash or an oops and so on, whereas if we actually put it into a structure like that... Now, the amount of data we're talking about, say 100 bugs, up to 100 different unique things per month, and maybe an order of magnitude more if you count the number of instances: that's low for machine learning, I would say. So there's a challenge there. But yeah, if we're able to take advantage of more data like that, we can definitely make something happen.

Well, that's the difference between Bugzilla and the retrace server, the actual server that collects the reports: I can see quantities there. Okay. So something Bugzilla has once may actually have happened 7,000 times in the past week and only been reported once in a month. What percentage of the ABRT reports are actual real bugs versus what I'd consider bogus warnings? Do you even know, just off the top of your head?
I'm not holding you to a real number, just eyeball it. I honestly have no idea, because I haven't looked at it in a while. You may have looked at it more recently. I look at them. It's very hard to quantify, though, for two reasons. One is that you'll have a user who files the bug because ABRT prompts them to do so, and then they disappear, so we don't know if they ever saw it again. Let me ask you this in a different way. We have a lot of warning splats that come out of the kernel because of bad firmware; it will say right in there, bad BIOS. I'm asking about those. Are those even a consideration for you anymore? You see it and say, oh, this is because you're sitting on a bad BIOS, so you should just... Yeah, we tend to ignore those. Okay, we have to filter those out. I didn't know if that was causing you pain. There is some of that. There was a while when we were getting a lot of one warning, where the warning itself even said: your BIOS is broken, ask your vendor to fix it. I know the exact warning you're talking about. And then I submitted a patch upstream, more or less saying this is annoying me, let's just turn it into a printk once, and the maintainer accepted it right away. I've done that more than a few times now.

The other side of that, though, is that there's a bug right now in the way we were parsing, well, I think the bug is actually in the firmware. It's firmware from AMI that's running on laptops with hybrid graphics, and what happens, with the way power states work, is you can do an lspci and crash the machine. Remind me about this bug. Actually, Peter, do you know about this bug? Yeah, I mean, you told me about it yesterday. Yeah, right. Well, I found out about it yesterday, because we had a laptop that was locking up and the guy didn't know why, and he said it went away when he switched to the NVIDIA binary drivers. And what do you do? He had his laptop forced to use NVIDIA only at that point; he had disabled i915, so he didn't have hybrid graphics anymore. So he spent a little bit of time debugging and started looking, and there's a bug on kernel.org for it. How do we handle this? The problem is it's not just this one HP laptop; it's HP, ASUS, Dell, MSI, and a few other models. And the workaround that works differs between people: the ACPI override to 2009 was the right fix for this particular laptop; for others, it's disabling i915. Gotcha, I get it. But it's all based on the same bug, and it just depends on how it was implemented inside that particular firmware.

So I kind of want to draw a picture real quick. Go ahead. I just realized we're dealing with a funnel, so I'm going to draw a funnel. As it is, we've got something that looks roughly like this coming in, and it narrows down at the bottom. Somewhere down here is where upstream happens; somewhere in here is where the Fedora kernel is; somewhere up here is where the reported bugs come in. And up here we've got a few different classes of things going on. One is the artisanal reports: a human being reports a bespoke bug. Then we have ABRT, which is sort of a little bot that reports bugs here. This is largely being ignored right now, it sounds like, because it's just very low-quality bugs.
There's not much you can do. I mean, sometimes, if it's something a lot of people are hitting and we ask for more information and they respond, then it will get looked at, but a lot of times they don't even respond. It may be useful for statistical analysis or something; it's just a fire hose. If one month you suddenly got an order-of-magnitude increase in ABRT reports, then that's kind of an all-hands-on-deck thing. So then we have two others in here. One is this new pipeline we're talking about; I don't really know what to call it, so I'll say skt, which includes the LTP tests and other stuff too, and will grow over the next few years. And then there's the existing kernel test suite, and that's going in here too. So at the very top of the funnel we have a lot of stuff coming in, and depending on what happens here, and here, and here even: this one probably seems like a much smaller number to me, an actual physical person reporting, but it's also the highest quality.

Actually, I think you're going to see a much larger number there still, ignoring ABRT, compared to the automated test suites. The reason is that the automated test suites are not running on anywhere near the variety of hardware that end users are. Right? There are thousands of models of laptops out there, different models of computers, that can have anything from weird BIOS configurations to some sort of flaky hardware. We realized at one point a few years ago that there was a whole class of laptops, made to hit a certain price point, that after about six months of use couldn't keep up with the thermal demands anymore. That could be a field in Bugzilla: how much did you spend on your laptop? But seriously, we finally realized it came down to just those few models, and it was thermal issues. And how much human time was spent figuring that out? A lot. I'm sure, yeah.

And that's kind of where I'm going with this, because this is a mix: this is human input, but it's not necessarily coming from us. So we've got skt, which in theory grows a little over time as we add more tests, and I assume we're going to find more bugs. This one here also grows over time as we get more users, as does this. So the funnel gets bigger, and my big concern is what happens basically in between here, because we've got two people right now who are going to triage all of this stuff. This really goes back to the triage question, because if very, very few bugs are going to be fixed here, most of them have to make it all the way to upstream, hopefully get fixed by somebody else in a reasonable amount of time, and then get sent back out to the world. And so my question is: if we greatly increase automation here, that seems like it's also going to greatly increase the number of bugs that get triaged here. So it seems to me we're going to have a triage problem, and we may already have one today. So I think one of the things is, when you have upstream down at the bottom: skt is going to feed in directly there. skt will bypass the whole funnel and go straight to upstream.
Yes. And the closer we get to the change, when the change happens, the more likely we are to get the developer who introduced that change to fix the problem. By the time he commits it upstream, a month goes by, it's merged in, and all of a sudden there's an issue and he's on to something else; it's a context problem. The sooner we can give him that feedback, the more willing he is to fix the problem. So it has to solve the funnel problem, right? Well, it solves part of it, because in theory, I mean, we haven't fully connected the dots between SKT and an actual Bugzilla problem. We've got SKT going into some sort of result set that will be reviewed. And it feels like, at a minimum, if that result set is also going here, it's mostly a matter of waiting for him to fix it. Right. So the goal is, when SKT is run, because it's run on Linus's tree and it'll also be run on the -next trees before things get to Linus's tree, the idea is, yes, SKT might be generating more data, but the numbers from people, your artisanal bugs and your ABRT bugs, should actually go down considerably, because those bugs never make it into a shipping kernel. And the other thing is, I think the bugs we will get will be much more actionable and concrete, and those will be able to get more attention from us if for some reason they do make it further and we don't catch them earlier. Like I said, if someone manages to do a bisect, or has a commit in hand, that makes it a lot easier to look at things and fix them (a minimal sketch of that workflow follows below). I mean, I don't know all the problems, but sometimes I can make a best-guess effort at what might be going wrong.
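For anyone unfamiliar with the bisect workflow being referred to, a minimal sketch; the version numbers are illustrative, not taken from this discussion:

```bash
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git bisect start
git bisect bad v4.13-rc7      # first kernel where the problem shows up (illustrative)
git bisect good v4.12         # last kernel known to work (illustrative)
# Build, install, and boot the commit git checks out for you:
cp "/boot/config-$(uname -r)" .config
make olddefconfig && make -j"$(nproc)"
sudo make modules_install install
# After booting and testing each kernel, tell git the result and repeat:
git bisect good               # or: git bisect bad
# When it converges, git prints the first bad commit to report upstream.
```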
Okay, so then maybe that's less of an issue. And then just another question; this sounds like garbage collection to me, in terms of how we talk about it. Is there any sort of time limit? Do these ABRT bugs auto-close at some point? When the release goes end-of-life? Well, no, actually before that. When we do a rebase, we usually put out a mass bug filing to everybody: we rebased the kernel, is this issue still a problem for you? And if there's no response at all within two or three weeks, the bugs get closed. Okay. Because we do get a ton of people who file ABRT bugs because they were prompted to do so, but then I don't even know that they read the emails asking for further information. And the reason I ask is that it feels like you two get involved basically right here. So the smaller we can make this, obviously, the better; but bugs are bugs, so the next best thing is that the higher the quality of information we can feed you, the better. And my big question was: if you ignore ABRT, just because there are only two of you, do those bugs go away at some point? Or do they just sit there, and every time you poll the bug list you have to ignore them again? Because that's work too. No, they go away. But I also handle most of my bugs through email, not through the web interface. I look at bugs that have been updated, in my email, filtered for whatever release; right now, because I'm doing the 4.12 work, that's F25 and F26. So I'm looking at F25 and F26, and then security, and whatever's unread that I need to look at. Very rarely do I have time to go back and just crawl Bugzilla to see what's out there, see what needs looking at, because I've hopefully taken a look at it when it came in. Yeah, okay, that makes sense. Doesn't that also mean you can't go on vacation? Yeah, I went on vacation twice, and then you spend three days catching up. Okay, that counts. So then I guess my final question before I sit down, because I'm confused: does SKT submit bug reports directly upstream, or is there some intermediary there too that reviews the results? Bug reports per se? I think it's more like patch feedback. Like: hey, we noticed your patch failed this test; here's the test and the failure we got. Whether they do something about it, I think that's up to the maintainer who wants to accept the patch. We're hoping that if we get it to the subsystem maintainers early enough, they'll say, "I'm rejecting this patch until it's fixed." Yep, okay. And that would provide incentive to fix it. Okay, that makes sense. I mean, further on, with humans entering bugs: the theory with CI/CD, or QE in general, has always been that when a human files a bug, you look at it and say, wow, we should have a test case for this in our test suite, and the test suite grows over time. You catch these issues early on, so there are fewer human-reported bugs. Yeah, but a lot of those human-reported bugs are "hey, I built this computer out of spare parts and something doesn't work." These are the slow-churned, artisanal butter bugs. That's what I got. Yeah, I love that. Although I'd say it's less "spare parts" these days; it's more "I bought this laptop and the manufacturer chose to use this really weird RAID setup paired with a very questionable BIOS." Right. I mean, sometimes there are legitimate bugs there; they're just only triggered by that hardware. You can't say that the BIOS is to blame for every bug that originates in BIOS; maybe we should be handling some of those, right? Actually, I can, right now: every bug originates in BIOS. I just said so, you know. But yeah, we should be handling some of those things. No, no, I completely agree. But you know, I always say that part of this bug reporting and triaging is teaching people to report better bugs. How often do we do that? We do that all the time. The problem with teaching, though, is that teaching itself is a little hard to do, and it doesn't scale when there are only two maintainers. Yeah, I understand that. We should do that; I mean, it's not that it's not valuable, but I'm saying it takes up time and energy. Well, we have community members who are willing to put in work. And we are happy to help you do that; it got two bugs fixed in 4.12, right? So yeah, if community people are showing interest in putting in effort, we're happy to take some time to help, but we don't have time to go through every bug. I said this years ago to the ABRT guys: one thing I'd love to see attached to those ABRT reports for the kernel is some list of the hardware, so we can compare it to what we have and figure out what the hell's going on. I would love to get that. You could just have it run sosreport. I'm not sure we need sosreport; I'm wondering if we just need lshw or something like that. Did you get pushback? Yeah, but it was years ago.
I remember when ABRT first came in; we actually thought it was sosreport. Jeff, do you remember that? We were trying to figure out where the hell these things were coming from, and we went to the sosreport guys, and they said, "we have nothing to do with this," and then we discovered it was this whole new project doing it. We at least got them to pin down the warning, because at one point they were just printing out one line of the panic. It was pretty abysmal. Alright. So, that was again years ago; maybe I should ask again and say, would you guys consider making this part of ABRT? Do it; CC me on it. I mean, the worst thing they're going to say is no, we think that's too invasive or something. It seems pretty reasonable and engagement-forcing. Yeah, that sounds good. Again, this was five, seven years ago. "We just need a quick hardware list, and we need you to turn on the camera and take a quick picture." Nothing major. Well, really easy: you could also, in the kernel test suite, ask, would you like to report your hardware? The problem is we do have users who are using pre-release hardware, because they're doing development work. How do you handle that? Well, ABRT still prompts; it's still their call, and those guys ought to know better. And if they don't, that's on them. Here's one thing, and I've only done a tiny amount of kernel bug triage: there's no standard format for people to use to report the hardware they have. I think Ubuntu has something like that: if you're going to report a kernel bug, please attach the output of this thing. We don't have anything like that that we can even ask somebody to run. I mean, I'm sure it exists as a tool. We have the tools; we have tools that will collect the information. But is there a particular thing someone is expected to run, and attach the output of, when they submit a kernel bug? No. Maybe there should be (a sketch of what that could look like follows below). The problem is, with a single default command we'd be collecting way more information than we would typically need in a typical case. It's better to have it and not need it than to need it and not have it, especially if it's not pasted into the body but attached. I disagree with that. No, but users will complain that we're asking for too much. They don't have to provide it. Yeah, that's true. One thing I would actually use it for is being able to tell how broadly a certain kernel was tested, on what hardware. So you can tell: did this get tested on the same few machines, or on a whole bunch of stuff? If we had that as a reusable tool, not just for bug reports but for the test suite to also include in its results, and it was a consistent identifier, at least for a given release, you could tell, okay, this kernel is pretty well tested, across 1,500 different configurations, and so on. And if I'm going to the trouble of dealing with Bugzilla, waiting for it to come up, typing everything in, and going through the six steps it takes to get to the point where I'm submitting the bug, then, surely, two things. One, if it's asking me up front for all the information someone's going to ask me for anyway, there are no more round trips: I gave you everything. Right. But also, I've shown that I'm willing to give you information. Obviously, I've given you my name; what PCI cards I have plugged in is probably not going to bother anybody.
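Since no standard incantation exists, here is one plausible sketch of what "attach the output of this" could look like; the particular tool choices are an assumption, not anything the room agreed on:

```bash
# All standard Fedora tools (lshw is in the "lshw" package); which of these a
# reporting policy would actually mandate is exactly the open question above.
uname -r                           > hw-report.txt   # exact kernel version
lspci -nn                         >> hw-report.txt   # PCI devices with vendor:device IDs
lsusb                             >> hw-report.txt   # USB devices
sudo dmidecode -t system -t bios  >> hw-report.txt   # machine model and BIOS version
sudo lshw -short                  >> hw-report.txt   # one-page hardware summary
journalctl -k -b                   > boot-log.txt    # kernel messages from this boot
```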
The only thing that sucks about it is that there might not be a worse communication medium than Bugzilla. That's the other part: Bugzilla isn't a communication mechanism at all. It's a store of data. Yes, it is storage. It's just wrong for this. This has been a good discussion, but I think there was one more topic: arbitrary branching. So I'm going to say arbitrary branching, and maybe come back and have the last ten or fifteen minutes be a wrap-up, so we can get a summary of what we've talked about for the readout tomorrow. I figured the answer to that is simply "no," and then you move on. Can you give us some background? I want the background. I can give a small background on it, just because I was in FESCo when we reviewed the ticket and approved it. As things have now moved away from PackageDB, one of the drivers for that is the ability to create an arbitrary branch, meaning a branch that is not f26, f27, f28; it can be anything. The benefit of that has entirely to do with modularity and how they do their builds. I do not see any benefit on the kernel side of things at all. We do have an arbitrary branch called "stabilization" that we use, but you can't do official builds off of it, and you wouldn't build off of it. You could generate an SRPM and do scratch builds, or you could build it in COPR, but I don't see any reason to be doing official builds from it (a sketch of the scratch-build flow is below). Part of the reason you don't need to is that I can take a Rawhide kernel, put it on an F23 machine, and it works. Yes and no. You can, but if you have to build an out-of-tree module or something, you probably can't. Things like that.
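As a sketch of that scratch-build route: "stabilization" is the branch named above, the build target is illustrative, and this assumes you have Fedora packager credentials:

```bash
fedpkg clone kernel && cd kernel   # anonymous clone also works: fedpkg clone -a kernel
git checkout stabilization         # the side branch mentioned above
fedpkg srpm                        # roll the branch into an SRPM
# Scratch build: produces installable RPMs for testing, but nothing is shipped.
koji build --scratch rawhide kernel-*.src.rpm
```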
When Laura talks about BZs, you generally hear "such-and-such laptop did this weird thing," but when other people talk about BZs, they're probably thinking about some enormous server with a pile of hardware hanging off of it. Is there room for a server kernel versus a desktop kernel? Is that a use case? So there are two big components, I think, that would differentiate a server kernel. One is that a desktop kernel supports a much wider variety of hardware, so we carry a ton of modules. That only takes up a small amount of disk space, and servers don't care all that much. The ones who do care are the people doing things like virt, and for that we have kernel-core, which should boot on any virt system; then we have kernel-modules, which is basically the drivers for common hardware, and some not-so-common hardware; and then we have kernel-modules-extra, which is the stuff we're not sure anybody actually uses, other than the one or two people who ask if you turn it off. So really, one kernel can serve those needs. The other difference you see is that a server is always going to prefer performance over power management, while a workstation tends to prefer power management. And one of the things we're planning to do is go in and audit all of those; I think a lot of them can be changed at runtime. Yes, there are kernel config options where we choose a default, but there are things that can be set at runtime, and maybe one of the things we take to the server folks is: consider making these changes for the server spin so that it works well. Also, I would strongly caution against assuming that servers are going to prefer performance over power management; that stopped a while ago. These days pretty much everything is going to prefer power management, unless you just have an unlimited power budget, and historically, at this point, you're building out full data centers with that in mind. Even now, it just isn't what it used to be; everything pretty much wants the same thing. And weird hardware is not exclusive to desktops or servers. I mean, once you get into the exotic Ethernet cards or something, that's still pretty weird. Either way, I don't see much of a driver to build a separate kernel, but I think we can review the config options and figure out what is suitable to change at runtime. I have a scenario; let's see if we can fit it into the model of no arbitrary branching. Fair. Let's just see if it works. It seems like there's a stabilization model for the kernel, whereas for everything else we're targeting a gating model. We may, after the fact, send a bug saying "you really broke something": a performance regression, or systemd totally ruined a system that had been up for a thousand hours, whatever. That's hard to gate, but most of that stuff we're going to gate. It looks like the kernel is going to be hard to gate. So what if the scenario is this: the Atomic Host folks say, we know the new kernel is not fit for purpose in the cloud, on DigitalOcean or AWS or another cloud, and we're not going to ship that kernel until this changes. So we're at a point where they're going to essentially gate it. They're likely going to use a module to gate it. They don't need an arbitrary branch to do that; they can refer to an older dist-git commit. That's what modules do. They use some mechanism to ship the older kernel. Modularity is who uses arbitrary branches. Right, but in addition, modules can refer to an older dist-git commit; the kernel doesn't have to have arbitrary branching for modules to do that bit of their magic. So now you have a situation where, for some unknown period of time, hopefully not that long, it has essentially branched: we're shipping, in a release of Fedora Atomic Host, a kernel that isn't the one you're actually maintaining. It has essentially branched in some way. And one could imagine that a security update comes along, and you're asking: where do we fix this? What do we do? Do we fix that security issue for this kernel that's still going out on Atomic Host, because they haven't jumped to the new one; the other one's not stable enough for them? I'm not saying all of these are givens. Let's use it as an example to play with and make sure we have a good answer. A couple of things to bear in mind. When you say splice, are you referring to Ksplice, the live-patching technology, or are you just using that as an example? Just as an example; the idea is that you would have to build and reboot. Mostly, I'm having a hard time imagining this as branching, because you're making it sound like "we're just going to pick up an older kernel," and that's not really branching; that's using an older kernel and then somehow expecting it to be supported. So the other side of that is: there needs to be a CVE fix on top of it. I mean, this is the whole worst-case scenario. This is your Fedora support model versus your RHEL support model; that's kind of the way it works, because otherwise, if we start maintaining arbitrary branches for all of this, there are two of us. Yeah, it's not going to happen. I guess the question is: would it ever be possible to set it up so that other people somehow support their own branches? That was the question I asked: how many people do you need to support multiple branches? Well, for this purpose, ideally it would be a person per branch.
The problem is, we'd want three people; having a three-person rotation, just so that you get even a small window to work on the testing initiative, really helps for sanity. That's the second most disturbing thing I've heard today. For another branch, I would probably want another person per branch. My problem is I still don't know, from a user perspective, how this arbitrary branching is going to work. But you mentioned earlier that obviously nobody tested the Rawhide kernel, and, well, there are a lot of times when I would test the Rawhide kernel, but I don't want to run Rawhide. Now, I know that I can pull the Rawhide kernel manually. But you're one of the people who does test the Rawhide kernel frequently, and these problems didn't show up until the rebase hit F26. Sure. So it was that kernel on F26 where they showed up. What I'm saying is we don't have the breadth. I mean, people who are building their own modules, securely signing them, and enrolling those keys in UEFI: that's probably a very small number. Yes, but it would be nice to at least test it. If there were an easy way for me, on my Fedora 26, otherwise stock, machine, to follow the kernel that you'd like me to be testing, I would do that. I have no problem with it; I can always just reboot. There is a way to do that. You can go into your repo config and enable the Rawhide repo, or, if you really want, the rawhide-nodebug repo; rawhide-nodebug is kernel-only anyway, so that's all you get (a sketch of that setup follows below). Even with the full Rawhide repo, though, you can make it kernel-only, and it will exclude every other package. You do a dnf update, and it will grab the kernel from Rawhide provided it's newer; or, if for some reason it were older, it would still grab it from Fedora 26, but it won't look for any other package there. But didn't you say earlier that there might be problems with toolchain mismatches? There can be problems with toolchain mismatches if you want to build out-of-tree modules, and then whether those Rawhide kernels are signed and all of that mess... So the rawhide-nodebug kernels are not signed? Because that comes from a separate repository. They're not signed; they're Koji scratch builds. The actual Rawhide kernels are signed and will Secure Boot just fine. But those have debug enabled? They have the debugging options and all that? They do have the debugging options, but about a year and a half ago we figured out what was making the debug kernels so slow, and we turned off a couple of options and sped them up. So they are slower, but it's not like it used to be, when it was almost unusable.
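A minimal sketch of that "follow only the kernel" setup; the repo id `rawhide` is an assumption, so check the actual names under /etc/yum.repos.d/ on your machine:

```bash
# Enable the Rawhide repo but restrict it to kernel packages, so a normal
# update pulls the Rawhide kernel and nothing else.
sudo dnf config-manager --set-enabled rawhide
sudo dnf config-manager --save --setopt='rawhide.includepkgs=kernel*' rawhide
sudo dnf update 'kernel*'
```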
I'm sort of seeing that we have a lot of problems with our testing, and it's not just the kernel. It would be really nice if there were one place someone could go to deal with testing of their release. And if there were one place, there should be a check box: give me the weirdest kernel possible and let me see how it works. We did just get a suggestion asking us to do a Fedora test day for the kernel, just like they do test days for several other things, so I'm going to do that. I'll see if there's time to schedule it for F27; if not for F27, we might even be able to do it on rebases. If we do it the week before a rebase, I can create images for people, or you can download the kernel from this repository if you want to run it on your own. So that's something we could do to maybe get more people actively testing every major release, because those come every three months. Would it be helpful to use COPR for the kernel testing stuff, without having to have Rawhide and the whole setup? Well, we've got rawhide-nodebug, and COPR won't sign anyway, so there's no difference between rawhide-nodebug and COPR. The other thing is, COPR doesn't build as many architectures as doing it in Koji, and COPR has, I don't know why they will not change it for us, an exceptionally low timeout value; i686 won't finish, because it times out after six hours. In Koji, there's just no timeout problem. But in response to what you were saying: to me, the most logical thing is, if we have an issue that is preventing a kernel from working somewhere, that shouldn't have to sit; that should be something that gets addressed before it ships. I understand when we've got an issue that's hitting one or two users somewhere because they have arbitrary hardware, but some things are affecting a lot of users. Even this QXL bug has gone on forever, and part of that is because we've had so many other things at the same time; I've walked it upstream, so let me see if they're going to do anything. So for the mismatch, we'll just have to find a solution, a pragmatic one. For this mismatch, it's very likely that some of those delivery mechanisms will gate at that point. For example, if it breaks OpenShift, and it's a container-oriented host and it breaks OpenShift, then they'll gate and keep the old one. So then there's suddenly a window, and you're right, it's important to fix it; there's a window where the security question comes up where we don't really have an answer, where it's scrambled. The other thing is, the number of CVEs filed against the kernel is very large, multiple every week actually, but the number of those CVEs that actually have impact is small. Usually it's "with X weird driver": there was one like that I patched and committed yesterday, but it's a "you can crash the system by doing this thing to the driver that you shouldn't be able to do," and you have to be root to do it. If I'm root, I can crash the system no matter what, so it doesn't matter. I understand the way the security research community has gotten, but really, that's not a critical fix, right?
Right. The root hole that caused me to push 4.12.5, what I consider a fairly major hole, was a local root. That's something that, yes, we needed to push, and it had been months since we'd seen one of those. Something on that scale, well, usually those get their own website, marketing, t-shirts, and everything. Those are more likely the remote roots, but yeah; these days a local root still matters. Either way, those are treated with absolute priority, right? Right, but those are few and far between. When those happen, and what I would like to know is, sometimes they're embargoed, which means somebody knows about them before we find out about them, I'm wondering if there's a way that we could find out about them too, to make sure that we're ready on day one. And if we know one is coming, maybe we don't push a stable rebase a couple of days before, because we know this is coming and we're going to want to do one more build, because our process is long. Well, the process around some of that security stuff: we're on some very closed lists, and when I was at Red Hat I was actually on most of those, so you should discuss it; we have security people on those lists. It's just a matter of making sure we get a plan there. There's no problem telling you. And in general, just pragmatically speaking, there will be a question of whether we fix it before RHEL does; well, a lot of the time you could have fixed it before RHEL does anyway, just because of how quickly we move through stable updates. However, that fix might also include some regressions. Yeah, the Fedora fix is there, but it's also part of a stable update that's got 130-some other patches that haven't had any more vetting. This has been a good discussion, and I think we'll open up the floor at the end, but I just want to clarify: did you have further questions about the config process? I think it got touched on a little, but we did kind of gloss over it. Oh, no, just that it's best-effort. I did want to clarify: it sounds like there's no explicit relationship between the Fedora config and the RHEL config, and that's intentional, but at the same time there's nothing preventing changes from happening to the Fedora config if we talk to you guys first? No; in fact, that's one of the things we've been talking about: how can we leverage some of that? Because there are two of us, and there are a whole lot more RHEL kernel engineers, with specializations in all of these different areas. When we make a best-guess effort, we would love it if somebody said, "actually, if you set this, here's what's going to happen." I know for sure there are a couple of configs where RHEL engineers ask, "why do they have this stupid setting?" The problem is that, even for me, there's no path for them to tell you, "maybe you should set this this way." What would be the best way for you to get that feedback; is it a Bugzilla for those types of things, or email on the list? Either would be okay; absolutely, we would like to see it. Those are the kinds of things that should usually be applied to Rawhide first, and then they trickle down with the rebases; there are certain things the stable releases can pick up directly, and some that will not trickle down. This is kind of a perverse thing: RHEL kernel engineers don't have much incentive to send you their thoughts, because they're running RHEL kernels.
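For a sense of the config-review burden being described here and just below, the kernel's own build system can at least enumerate what is new; a minimal sketch, assuming you start from the previous release's shipped config:

```bash
# From an up-to-date kernel source tree:
cp "/boot/config-$(uname -r)" .config   # start from the config you ship today
make listnewconfig                      # list every CONFIG_ option the new tree adds
make olddefconfig                       # or: accept upstream defaults for all of them
```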
Well, that's why I mentioned that if you just send an email, if that gets us feedback, then yeah, we'll take the feedback, because we want to do what's best. But during the merge window there are sometimes dozens of new config options every day, for like two weeks; I'm talking about every config option that's new. So that's the problem: we take them as they come in, but then things go in, and maybe one should change, and only every once in a while does that stuff get reviewed again. So we've got drivers we're supporting now where the hardware barely exists anymore. Well, it exists, but nobody will appear on the mailing list and tell you it exists. One way to find out whether it exists or not, and we have done this in the past, is passes of "hey, I'm turning off a whole bunch of stuff; let's hear your complaints." Yeah, that happens in Rawhide, and it happens too when a rebase gets pushed out to updates-testing. For those types of things, what we typically do is just turn them off in Rawhide; when the other kernels get rebased to that version, they don't carry those config changes. But when we do F28, all of the testing is against a kernel without those modules, so you get the "hey, this worked in 27 but it doesn't work in 28" reports, and that one's not going out, because somebody noticed. But then you have a whole bunch that got through where nobody noticed and nobody brought it up. We still have another half hour, so if people want to go back to any other topic, about the testing or whatever, the floor is open. We've got to report something back tomorrow, right? Do you have what you need for that? I mean, if people want to work on that now, sure. Friday is supposed to be a readout of today's sessions. As far as kernel testing goes, I think it was mostly feedback: what I heard was that people were generally happy with the proposed approach of bringing in the new kernel test suite, hopefully getting it running on more hardware and in front of more people to get that feedback. That ties into the non-Red-Hat community engagement, in terms of the Bugzilla feedback loop; I don't think we necessarily came away with a lot of actions there, but it sounds like the hope is that the testing will give us higher-quality Bugzillas. And there's also the really good idea of arranging a Fedora kernel test day that came out of today. For arbitrary branching, I think the conclusion is no, the kernel is not going to do it; it's just too much work for us, unless somebody provides a very solid argument for why we should. It's just like out-of-tree branches or out-of-tree patch sets: our default is no; if it's not going upstream very soon, we're not going to carry it. We really ship the latest kernel, and that's the artifact. I have one place that's special, which is AWS, and kernel-core already gets rid of most of the unnecessary stuff. This has been a good discussion; I just want to clarify, is there anything else anyone would like me to highlight in the summary? I would like it highlighted that we have good resources for testing the Fedora kernel. Yes, that's a very good point; I'll make sure I include it. I will say, the one thing I did not know coming in is that, and they're magnificent people, there are only two of them. I will stop trying to run the show, then, and let y'all talk.
My only question was probably for Dom; I think he stepped out. The SKT stuff: it sounds like we've got internal results. Do you know when they'll be external? The thing we don't want to do is spam people. We're still in the testing phase, figuring out what the emails need to look like and whether they have all the information, so we're going to keep them internal for a while. But just as a ballpark, should I expect weeks, months, a year? I'll give you status in the roll-ups, but it should be weeks, not months. A lot of the detail work around this is scrubbing the logs; you don't want to send someone an entire console log. I did have another question for Justin and Laura; Laura stepped out. Going back to the funnel and the triage: if you had a very junior, almost intern-level person, would that help you, or would they get in your way? That would probably help; for triage-type stuff, it would help immensely. I'm not saying triage is trivial, but it doesn't require a senior person; we don't need a consulting engineer. No, we don't, but it is very time-consuming and very thankless. So the problem is, if we had an intern do it, we're going to go through interns pretty quickly. You're basically asking people to give more information, and then going through: okay, well, this is an ATI bug, so it gets assigned over there, making sure the right people get CC'd on things. He's getting coffee, though. I would never turn that down. In terms of triage: part of my first job right out of college, when I was learning to do kernel development, was spending a lot of time learning how to do bug triage, and it's very valuable; it's a good skill to have. But you also have to make sure there's mentoring and feedback, so you're not just throwing them into the weeds, and so they're actually learning something. That person becomes the first person anybody talks to after they submit a bug, unless they submit a really good bug, and those people are generally already upset, because they had to submit a kernel bug in the first place. Well, a lot of what we're thinking of is at a higher level than that, right? It's just making sure bugs are assigned to the right people, who will actually see them, things like that. Sometimes you're not even asking for more information; there's a decent amount of information there, but "this is an i915 bug, so I need to reassign it," because they filed it against the Xorg driver component instead of the kernel component, so even though it's a kernel problem, it got assigned there. Here's a question: there are really only two of you, and say I found a bug in a sound driver; can we CC the sound people? Well, do we really employ sound people? I mean, there is a separate alias to CC, but I don't know who's actually behind it. That really needs to go upstream, and that's the worst part: unless it's, say, sound in a laptop that a whole bunch of people have, it probably needs to go upstream, because there's nobody else. So I have one question, coming back to the earlier point: is there any level of automated testing that you would accept as gating before a kernel goes into a non-Rawhide stream? So yes: for going into a non-Rawhide stream, I would personally consider a lot more testing than for going into Rawhide, and even Rawhide should be gated by what we're running right now as the quick test.
Which is really: does it boot, and is it signed correctly for Secure Boot? Because unfortunately, sometimes pesign flips out and we get a kernel signed with the Red Hat test key, which won't boot on people's systems. So we want to make sure that it boots, at least in a VM, and we want to make sure it's signed with the real key. A couple of phases: probably start there, also plug it into the pipelines, and then we can work up from that. That, to me, is acceptable gating even for Rawhide. If we want to talk about maybe a step above that, I would say: can you do basic networking, maybe basic network virtualization? Does the ip command work at all? Can I even reach the machine once it boots? (A sketch of what that quick gate could look like follows below.)
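As a sketch of what that quick gate could look like in script form; the pesign invocation reflects my best understanding of the tool, and the whole thing assumes it runs on the freshly booted kernel under test:

```bash
#!/usr/bin/bash
# Hedged sketch of a minimal boot/signing/networking smoke test.
set -e
echo "Booted kernel: $(uname -r)"          # did the new kernel come up at all?
# Show the signature so we can check it's the real key, not the test key:
sudo pesign --show-signature --in="/boot/vmlinuz-$(uname -r)"
ip link show                               # does basic networking tooling work?
ping -c 3 -W 5 8.8.8.8                     # can we reach anything off the box?
```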
Let me ask you: I'm sure you've wanted bigger testing for the gating. Is it lack of tests, or lack of time to write the tests? If the RHEL kernel side provided more public tests, would that help provide bigger test coverage? Absolutely, I would love to see them. And yes, I would take a large number of those and say these are critical enough that we would gate a rebase for the stable releases on them. Okay, then we can actually start to connect the dots so we can see how the gating works, and then run the tests from what you have. I guess my one concern would be: if we find a gating problem, how do we make sure it actually gets fixed? I mean, the hope is that the things we'd find would be fundamental enough that upstream kernel maintainers would in fact care, but who knows what their practice will be. It almost requires control over the test suite that gates; it's curated. That's the model that fits best. I guess the more tests you add, the more bugs potentially show up, which means you need more people to fix them. It's also hard, in and of itself, to say "this bug is critical," right? I don't know that there's a good answer where we say, "oh, we run LTP and assorted tests, and we're gating on that." At this point it's going to take a human eye on all of it to say what actually happened there and whether it should gate. Yeah, I think that's a good point. Right now my opinion is that more tests are great, but I think you're right: we need human gating, not auto-gating that says, "sorry, you don't get to release." So, the way the gating works: there is some Fedora-wide gating, which Adam was talking about, where, for example, if an upgrade doesn't work, Fedora doesn't even want to look at it; that's enforced by policy and the project. At the package level, all the gating is essentially under your control. If you feel a test is bullshit, or the kernel has changed sufficiently, or really all the odds are stacked against us and we have to get this out, a security vulnerability despite the failure, you get to change the tests: disable one, boom, and it goes. You change the gating to match what you think the problem is. Like I said, I wouldn't even mind seeing some of that on Rawhide, to the point of: it has to at least boot, and it has to be signed. Yeah, we already have those pieces, but it would be nice if we could gate on just that for Rawhide, because that way you're not sending out a completely broken kernel. And we have the ability now; we could enable that within days, right now, if the test suite is there. We can actually do that. Okay, great. With much more testing, the next merge window is going to be opening up soon, so I'll be curious to see how effective the testing is during the merge window versus outside it. It's going to open next week, though? Yeah, next week or the week after; I can't see him opening it at Plumbers. I mean, this is also Labor Day weekend; sometimes he gets lazy and decides he's not going to open it yet. That's right, he might push it, because this weekend is Labor Day and the next week is Plumbers. The first week of the merge window during Plumbers is going to suck. Yeah, well, it'll suck for us, and it sucks for them: a lot of the maintainers are at Plumbers, a lot of the people trying to push things in. We have that problem right now, with people blocked on components that won't get there. Well, I mean, it does help that all the people are in a room together, which is something that doesn't normally happen. We lock them in a data center together; it's loud and cold. A lot of work gets committed to; not a lot of work gets done.