Good morning, everybody. Can you hear me? Morning, good afternoon, good evening, welcome. I'm going to start, since the talk is a little long, so I'll try to get it going. For those that don't know me, my name is Don Zickus. I'm a senior kernel engineer at Red Hat; I've been there a long time. Today I'm going to be talking about how we plug into the kernel CI ecosystem at Red Hat, what we're doing in this space, and how we're interacting with the upstream community. So, enjoy.

We're all here, so we all know about the Linux kernel; we're familiar with it. It's been around for a while, and it's one of the biggest open-source projects out there. I think a lot of you know it gets released every eight to ten weeks, which is pretty fast, and on top of that it's got 14,000 commits per release. That's a pretty amazing stat if you think about it: that's a lot of change going on, and it's community tested. Despite the rapid release cadence and the volume of commits, it still delivers a pretty high-quality product. Think about it: you can take a release kernel, put it on your laptop, boot it, and it pretty much works out of the box with all the features you need. That's impressive given the volume of change going on.

However, it's not quite good enough. If you talk to the downstream distros that consume the Linux kernel, there are still a bunch of quirks: little missing features here and there, some features not quite stabilized. So they have to spend time debugging, fixing, and trying to push changes upstream, to the point where the community created this project called linux-stable to aggregate all the distros' efforts and put the changes in one spot, so everyone reaps some benefit. It's a good step forward, and it helps the community. But if you look at the stats, they're still doing 50 to 200 commits per release, and those linux-stable releases are happening every few days. Which is great, that's what you want, but at the same time that's still a high volume of change, to the point where you start to wonder: is this enough? With that much change going on, things aren't quite as stable as they should be. And that's what you hear a lot downstream: the linux-stable tree isn't quite there yet; we need to do more, we need to do better.

All this change and stabilization work exposes a bunch of problems, and I've got three right here that I want to focus on today. The first one is that fixing an issue after the change is committed is expensive. What I mean by that is: you make a change, you commit it, then the downstream distros pick it up and find out it didn't quite work. Now they have to spend time debugging and fixing it, and not all these downstream distros, like Fedora, SUSE, Ubuntu, Yocto, RHEL, etc., have the manpower, the time, or the resources to fix this stuff. That's a challenge. And on top of that, when they do find it and narrow it down to a change, it's two to three months after the change was released, and by then it's hard to engage that developer.
So that delay right there is a problem. The way you solve it is to shorten the feedback loop: you really want to attack the change before it gets committed, to reduce the burden on the downstream distros.

The second issue I want to talk about is that verifying a change doesn't regress is hard without a test. What do I mean by that? When you push a change up, it fixes a problem in the moment, and then 14,000 commits later all of a sudden it's subtly broken. Without a test to verify that it hasn't broken in a small way, it's difficult: the change breaks two or three releases later, and no one notices until it propagates down to the downstream distros and everyone complains. What a lot of communities do is provide tests when you incorporate a change, but in the kernel community it's a bit challenging to incorporate a test for every change you send upstream. A lot of that isn't the community's fault: there's no easy avenue to provide a test, whether that's contributing the test itself, figuring out which test suites to run, or even having the hardware to run those suites on. So we need to build an ecosystem that makes it easier to contribute tests along with the change, so we can make sure we don't regress under this high volume of change.

The third thing I want to talk about today is that running community tests on new hardware in a private lab is challenging. One of the big things the kernel does is light up hardware; that's kind of its job, so a lot of new hardware enablement happens there. The companies that develop software to light up their hardware push the changes upstream, and they do the best they can with the test suites they have, but they don't have all the test suites the downstream distributions have, or the labs that the other companies have, so it's not quite as stable. Usually the hardware enablement goes upstream, propagates downstream, the downstream distros work on their stabilization and push the fixes back upstream, and it takes about two to three releases for the hardware to stabilize. It gets there, and it's good, but do you really want to wait two to three releases? Could we do better and focus on the first release? To solve that problem we need to figure out a way to engage these hardware labs: give them the test suites everyone else is using, and get them more involved in how the downstream distributions are testing and using their hardware.

So those are the challenges I want to talk about today. Let's dive in. Actually, first, here's a high-level overview of the flow of the ecosystem we're looking to build. Let me take a step back: today I'm going to talk about an ecosystem, a CI workflow, that Red Hat wants to incorporate. That's where I'm driving, and the ecosystem looks like this: you have a maintainer pushing out a change to a CI system, which detects which tests to run, runs them in a whole bunch of labs, gets the feedback, and if the change passes, it moves on upstream to Linus's mainline tree. So that's the model.
We're working on an ecosystem there, but the question really is: how do you build that, or how do you enhance what's already there in the community? There are a bunch of pieces to this ecosystem, and we're going to start at the bottom, with hardware. The first thing you need to do in the ecosystem is test the kernel on hardware, right? That means you need a lab full of machines. But how do you run tests on all those machines? You don't want to do it manually, so you need a service that manages them: a service that can tell you which machines to use, which kernel can go where, and how to run automated tests. What Red Hat has been using for over ten years is a service called Beaker. It's open source and it's been out there for a while. It's an amazing piece of software: it provides remote console, remote power control, a database of all the machines, and custom kickstarts, and it lets you reserve systems. It really allows a developer to dive in, grab a machine, run tests, and debug if need be. It's the basic building block of kernel testing, for us at least.

I've got a couple of slides showing the power of Beaker; these are web pages from Beaker. It's a public project, but not everyone's seen it, so just to give people an idea: internally at Red Hat we see pages like this. You've got a bunch of machine names, the architecture, who's loaned each machine out, who's reserved it, and so on. You can just click on a machine, reserve it, provision whatever install you want, and away you go; it's fantastic for debugging a kernel. You run a bunch of jobs and you get a screen like this: it tells you which tests you ran and whether they passed, here are the logs, there's the console output, everything you need to help you debug or figure out what's going on with the kernel. It's got APIs too, so you can write automation scripts and use the CLI and bots. And the final page I want to show you: when you click on a machine, it has all these details. One of the powers of Beaker is its inventory database. You plug your machine into Beaker, it scans it, and it tells you everything about it: the CPU, how much memory is in there, how much disk, all the PCI cards. So when you want to run networking tests, you can get the exact networking card you want. You just send Beaker a query saying, for example, "I need a 40 gig NIC", and boom, it queries its database, figures out which machines have that, gives you one, and runs the test. It's amazing. We've been using it for a long time, and we're using it for testing Fedora stuff too now.
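Since Beaker has a command-line client and APIs, you can script all of this. Purely as a sketch, and noting that the exact bkr subcommands and flags vary by Beaker version (so treat these as illustrative rather than canonical), querying for free machines and submitting a job from Python might look something like this:

```python
# A minimal sketch of driving Beaker from a script, assuming the
# beaker-client package is installed and configured. The exact
# subcommands and flags vary by Beaker version; illustrative only.
import subprocess

def free_x86_systems():
    # 'bkr system-list' queries Beaker's inventory database;
    # here we ask for currently free x86_64 machines.
    out = subprocess.run(
        ["bkr", "system-list", "--free", "--arch", "x86_64"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def submit_job(job_xml_path):
    # Beaker jobs are XML descriptions of distro, tasks, and host
    # requirements; job-submit hands one to the scheduler.
    out = subprocess.run(
        ["bkr", "job-submit", job_xml_path],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    print(free_x86_systems()[:5])
    print(submit_job("my-network-test-job.xml"))
```

The job XML is where the inventory database pays off: the host requirements in the job can ask for something like that 40 gig NIC, and Beaker picks a matching machine for you.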
So that's where we start: we've got the hardware. The next thing you need is to put tests on top of it. Red Hat's bread and butter for RHEL over the last 15 years has been the testing aspect; we have a lot of in-house tests. One of the big pushes, if we want to stabilize the upstream kernel and build a public ecosystem, is to start moving our tests into the public space. We've been doing that over the last couple of years, and we've pushed out the majority of our bigger kernel tests, about 90 public tests. And these aren't just small tests; we're talking about things like LTP, xfstests, and blktests, test suites that have hundreds if not thousands of tests inside them already. So we've been pushing that upstream, or I guess pushing a directory that has the glue code into Beaker that can execute these tests for you.

Another thing we're doing with testing: for every change that gets committed upstream, you can't just run all the tests; it would take multiple days to do all that. So we have to frame it. One of the things we did early on was frame the run to about two to four hours: we find a set of tests that makes sense for a particular change to the kernel and reduce it down to two to four hours. This is in order to handle the volume of change; with 14,000 commits you'd need a huge lab to keep all that parallel testing going, so framing lets us keep up with the volume. That required us to create a new piece of technology called KPET, which does targeted patch testing: it parses patches and git trees and detects which tests map to the patch or the source tree we want to test. We use trigger patterns, based on the code a change covers, to determine exactly which files get touched and which tests they need to trigger.
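To make that idea concrete, here's a toy version of the file-to-test mapping. The patterns and suite names below are invented for illustration; the real KPET lives in the CKI project's repos and has its own pattern database:

```python
# Toy illustration of the KPET idea: map the files a patch touches to
# the test suites worth running. Patterns and suite names here are
# invented; the real KPET maintains its own pattern database.
import fnmatch

TRIGGER_PATTERNS = {
    "fs/xfs/*": ["xfstests"],
    "block/*": ["blktests"],
    "net/*": ["ltp-net", "kselftest-net"],
    "mm/*": ["ltp-mm"],
}

def tests_for_patch(touched_files):
    suites = set()
    for path in touched_files:
        for pattern, tests in TRIGGER_PATTERNS.items():
            if fnmatch.fnmatch(path, pattern):
                suites.update(tests)
    # Fall back to a small generic smoke set when nothing matches,
    # keeping the run inside the two-to-four-hour budget.
    return sorted(suites) or ["boot-smoke"]

print(tests_for_patch(["fs/xfs/xfs_inode.c", "net/core/dev.c"]))
```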
We're also working with the community on these various tests to reduce false positives. For a proper CI system, you can't have a test that passes 90% of the time; it just doesn't make sense, because 10% of the time you've got a test failure and you're telling the developer "hey, your change failed" when it was really the test. It leads to frustrating situations. So we're focusing on making sure these tests run reliably, and a lot of our work over the last year on these various test suites has been false-positive work. The one thing we haven't done in the public space is workload testing. That's kind of a separate area, and a lot of it involves proprietary tests; we do workload testing internally and provide some of the results upstream, but it hasn't really been the focus of our CI ecosystem just yet.

One of the questions I get about testing is: "I have a new test, where do I contribute it?" At Plumbers last year we talked to a bunch of other companies and developers, and the answer was that LTP is the place to go if you have a test you want to contribute to this ecosystem and help stabilize the upstream kernel. If LTP doesn't make sense for you, there's kselftest. Those are the two biggest. Now, there are plenty of people who have their own standalone test suites, xfstests being the popular one, blktests, whatever; that's fine, we can incorporate those too. In fact, we have this CKI repository of tests for Beaker, which is basically a collection of all these test suites with our little Beaker wrapper, the glue code that connects a test suite into Beaker so it executes the jobs correctly. So if anyone wants to get involved, or if you have a test you want us to run in our ecosystem, I encourage you to email us at cki-project@redhat.com and get involved. We'll work with you, we'll enable your test, we'll get it running, and we'll start providing feedback on that particular subsystem of the kernel.

The next thing is: OK, now we've got hardware and we've got tests; how do we put that together with an upstream kernel or a maintainer's tree that we want to test? That's where CKI comes in. We have this CI service we call CKI, Continuous Kernel Integration. This is the glue that puts everything together. It runs in GitLab using pipelines; it's a service that connects all the pieces, runs the automated tests, parses the results, and emails the results back to the mailing list or the developer who's interested in them. It's a lot more complicated than I'm making it sound here, and there's a nice diagram; there was a CKI talk yesterday, it's probably recorded, so you can look it up. It does a lot of stuff: there are a lot of triggers, and it pulls in stuff from everywhere, patches, git trees, Koji and COPR builds, and everything. But really, this is the engine of our CI service, the part of the ecosystem that drives all the changes into the testing.

Some stats on what we've got going on: we've been plugging away with CKI on linux-stable, ARM, and RDMA trees; we find about four to six issues a week, and we're running up to 90 tests on various subsets of those trees, on four different architectures. So we've been finding good stuff, and we're looking to expand: more trees involved, more test suites. We're making good progress and impacting the community. And again, that's cki-project@redhat.com. If you want to get involved, if you're a subsystem maintainer and want a tree of your own hooked up, or you have more tests, let us know and they'll hook you up.

All right, so ARK. ARK is another technology we've been working on at Red Hat. What is ARK? It's the glue between CKI and the kernel developers. What does that mean? You can take an upstream tree, but there are various ways to configure it and package it, and CKI doesn't necessarily know that. So we need a tree, a code base, that tells it how. We have the Fedora kernel and we have ARK, the Always Ready Kernel, and these provide the configuration, the kernel spec file, the packaging rules, and so on to CKI, saying: based on any git tree or any patches we have here, this is how we're going to connect it and package it so we can run it through our CI system. It provides all that distro magic. On top of that, there's a side benefit. Fedora is a big fan of dist-git, but that's not a natural way for kernel developers to work, so we've been trying to turn it into a source-git model and embed this distro magic into the source tree. Developers can just hack away like they normally do, then run a command like "make rh-rpm" or something similar, and it takes your source tree, the way you hacked it up, converts it to a source RPM, and builds it locally, or you can send it to our CI service. So that's the glue that takes you, as a developer with source code, and plugs you into the CI system. It's based on the technology we've been using to develop RHEL for the last ten years, so it's stable; we're just pushing it into the Fedora space. A lot of this I've already talked about. There's a beta link right there; we're slowly rolling it out. It's based on GitLab, so feel free to check it out, and you can file tickets and issues there with any requests or problems.
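Since all of this lives on GitLab, you can also poke at the pipelines programmatically. Here's a small sketch using the python-gitlab library; the project path and token below are placeholders, not real CKI endpoints, so check the CKI docs for the actual projects and permissions:

```python
# A sketch of driving a GitLab-hosted CI pipeline with python-gitlab.
# The project path and token are placeholders for this example.
import gitlab

gl = gitlab.Gitlab("https://gitlab.com", private_token="YOUR_TOKEN")
project = gl.projects.get("your-group/your-kernel-tree")  # hypothetical path

# Kick off a pipeline against a branch and watch its status.
pipeline = project.pipelines.create({"ref": "main"})
print(pipeline.id, pipeline.status)

for job in pipeline.jobs.list():
    print(job.name, job.status)
```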
OK, so that's the ecosystem: we've got hardware, tests, the CI service, and the glue code back to developers. That works great within Red Hat. But one of the problems I want to solve is all the partner labs, all the hardware enablement happening in different labs. We need to expand to those different labs, and that's where a technology called DCI, Distributed CI, gets involved. This is basically taking all these different labs, the hardware OEMs, the hardware enablement teams, anyone who makes hardware, and connecting their labs over to the Red Hat labs, to the Red Hat ecosystem. It uses Beaker underneath, so you've got Beaker in both places and it's easy for our CI system to interact with their lab. Now, you can't just talk to their Beaker lab directly, because there are these things called firewalls. What DCI does is provide a public cloud instance where the magic happens: you push to the cloud, and the lab pulls it back down. It's an easy way to get through the firewall without setting up some unique network situation for every lab, and it allows us to engage these partner labs quickly. We're now working on technology where we can take any change upstream, push it into the CKI service, build it in our local Beaker lab, push it out to all our partner labs, get their test results, and bring them back to us. And on top of that, we can engage with those partner labs and say: look, you can run your own tests using this system, on your own terms. If you do a BIOS update, you can take our test suite, run it on your systems independently of us, and verify your BIOS or firmware update isn't breaking anything. So it's a really cool system that we've been promoting and pushing. There's a website, distributed-ci.io; reach out and get involved if you're interested in this kind of thing. We have a whole bunch of labs already set up, so you won't be in a unique situation.
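The DCI agents are their own code base, but just to illustrate the pull-through-the-firewall pattern: an agent inside the lab polls a public relay for work over outbound HTTPS and pushes results back out, so no inbound firewall hole is needed. The URL and payload shape here are invented for the sketch:

```python
# Toy illustration of the DCI pattern: the lab sits behind a firewall,
# so an agent inside the lab polls a public relay for jobs and pushes
# results back. The URL and JSON shape are invented for this sketch;
# the real DCI agent has its own protocol.
import time
import requests

RELAY = "https://relay.example.com"  # hypothetical public relay

def run_job(job):
    # In a real lab this would hand the job to Beaker and collect the
    # test results; here we just pretend everything passed.
    return {"job_id": job["id"], "status": "pass"}

while True:
    resp = requests.get(f"{RELAY}/jobs/next", timeout=30)
    if resp.status_code == 200:
        result = run_job(resp.json())
        requests.post(f"{RELAY}/results", json=result, timeout=30)
    time.sleep(60)  # outbound-only polling; no inbound firewall holes
```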
All right, so we did it: we built an ecosystem, right? We can manage hardware, we have tests, we have a CI service, we have glue code for the developers, and we can incorporate remote labs. We can put it all together, start stabilizing the upstream kernel, and get people to interact and participate. That's great. However, we're not the first ones to this party; we're not even the second. In fact, a handful of others have already been down this road: there's Intel's 0-day, there's Linaro's LKFT, Google's syzbot, the KernelCI.org effort. So we've got a lot of people developing their own ecosystems and contributing to stabilizing the upstream kernel. On the surface, that's fantastic: the more CI systems attack the problem, the faster we stabilize the kernel, and everyone's happy. It's a big win, right?

No, it's not. And the reason why is that developers push back and say: look, I don't want five different reports from each CI system, I don't want five different ways of plugging into your systems, I don't want five different ways to write tests. So we failed; well, not failed, but we became victims of our own success. So what do we do? The KernelCI.org folks decided: you know what, let's unify this stuff, let's unite the clans, let's figure this out. They reached out to the Linux Foundation and said, let's start a project. We'll call it the KernelCI project, bring in a bunch of founding members, and give it a mission: enhance the quality, stability, and long-term maintenance of the Linux kernel, and create an ecosystem that everyone can participate in. They reached out to a bunch of companies and said, hey, let's all get together and try to unify the CI systems, make it an easier experience for developers, testers, maintainers, hardware manufacturers, and so on. They reached out to us, and Red Hat said: absolutely, we'll be founding members. Us along with Google, Microsoft, CIP, BayLibre, Collabora, and the other founders all jumped in, like, yeah, this is fantastic. That kicked off in November, so we've spent the last couple of months just getting the project going: the business side of things, getting the mission statement straightened out, and everything.

Now the business side has stabilized, and we've gotten to the point of asking: what are our short-term goals? One of the things we're trying to do is unify reporting methods, encourage kernel maintainers to utilize CI services, and document how to plug in. We've been focusing on a central database where we aggregate all the data into KernelCI.org, with a front end that can pump all that data back to the maintainers so they can analyze it. That's being rolled out and will hopefully go live soon; it's one of the early deliverables of this effort. And on top of that, we're still seeking membership. If any company out there wants to get involved, define the standards for the KernelCI project, or drive its direction, I encourage you to get involved: you can reach out to KernelCI.org, or reach out to me and I'll hook you up with the right people so we can have those conversations.
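To give a feel for what unified reporting means in practice, here's an illustrative record of the kind a shared results database needs so that five CI systems can all report in one format. The field names are invented for this sketch; see the KernelCI project for the actual schema and submission tooling:

```python
# Illustrative only: a normalized test-result record of the sort a
# shared database needs so every CI system reports in one format.
# Field names here are invented; the real schema lives in KernelCI.
import json

report = {
    "origin": "cki",                      # which CI system produced this
    "checkout": {"git_commit": "deadbeef", "tree": "linux-stable"},
    "build": {"arch": "x86_64", "config": "fedora", "status": "pass"},
    "tests": [
        {"name": "ltp.syscalls", "status": "pass"},
        {"name": "blktests.block/001", "status": "fail",
         "log_url": "https://example.com/logs/123"},  # placeholder URL
    ],
}

print(json.dumps(report, indent=2))
```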
A couple more slides. This is the curve: again, we're all trying to drive towards stabilizing the upstream kernel, using an ecosystem sort of like this, where you put CI in place, encourage more testing across various hardware, more tests and everything, and if a change passes those tests in this ecosystem, it gets pushed upstream. Because fixing a change before it's committed is far easier than after it's released. So this is the effort we're doing, and I encourage everyone who wants to participate to reach out. We're building this, we're driving it, we're trying to grow it. I've given you all these links; if you're not sure what to do, just reach out, and we're happy to guide you, steer you, do anything to get you going, whether you're a developer, maintainer, hardware manufacturer, tester, or whatever. We're happy to get involved with you. So that's all I have; I know I'm really short on time. Here's a summary of all the links I showed you before. Last thing is questions.

I know it was a little fast there, but we're kind of short on time. Come on, someone's got to give me one question. I've got Neil.

Just spitballing here, but in several of your slides you noted that we've got this distributed CI system whereby we incorporate labs from multiple vendors. One of the interesting things I was thinking about while you were presenting is that a lot of these labs were established to test a specific distribution's code. You know, Beaker was set up to test RHEL; we can test ARK, of course, or anything we want, but we invested in it to test RHEL. SUSE did the same thing, Canonical the same thing. Have you considered the possibility of a correlating results database that allowed individual distributions to test their own distributions and submit results, with information about the tests they submitted, the errors they may have found, and what code they incorporated from upstream, so that they'd be more likely to use it?

Yes. So the question, to summarize a little bit: obviously this is from a Red Hat perspective, this ecosystem using Beaker and testing in partner labs, but if other distributions want to participate and run their tests, how do we aggregate their data, right? I think that's what the KernelCI project is trying to do. We're trying to centralize that data: get all the different distributions to run their tests, put the results into a central database, and get someone to parse and analyze that data and see if there are any discrepancies. Hopefully all the testing is passing, but if one distribution is failing where another is passing, how do we dissect it, bisect it, and figure out what's going on there? We're also trying to figure out ways to encourage Beaker and other systems like it, like LKFT, to expand across different distributions; that way more people can contribute, and it's easier for hardware manufacturers to get different distributions into their labs.

I am out of time; maybe I could sneak one more question in? No? OK, I'm getting kicked off. I did that on purpose, right? Thank you, everybody. I hope you learned something.