Hello everyone. I'm Khouloud Touil, and today I'm here to present, together with my colleague Kevin Hilman, how a patch grows up and goes to KernelCI school. Let's start with a small introduction of our company, BayLibre. It's an embedded Linux consultancy based in Nice, in the south of France. We are almost 40 engineers working mostly on open source: we are among the top 20 Linux kernel contributors, we work on AGL, the Automotive Grade Linux project from the Linux Foundation, we contribute to U-Boot, and we do a lot of automated testing. I'm Khouloud, a junior software engineer based in France, and I'm a KernelCI contributor. As for Kevin, he's a co-founder of BayLibre, a Linux kernel developer and maintainer, and a co-founder of the KernelCI project, although unfortunately he stays in Seattle and not in Nice. Over to you, Kevin.

So let's start with a little background on KernelCI. KernelCI is really three things. First of all, it's a free service for testing the Linux kernel; this is the part you may have heard of before, since kernelci.org has been around for several years now. Behind the service is all the software that runs it, so KernelCI is also an open source project: all the software we use to run kernelci.org is publicly available, and it's a project we can collaborate on. And KernelCI is also a collaboration hub where we can focus on what we're doing next, how we can improve testing, especially hardware-focused testing for the Linux kernel, and the various tooling around open source test automation. KernelCI is all of those things, and we became our own project under the Linux Foundation last October. There's also a little summary of our mission statement, which is focused on improving Linux kernel quality. As a Linux Foundation project, we launched with a group of founding member companies, and these companies are now working together under the Linux Foundation to advance the state of KernelCI and work on what's next. That's a bit of what we're going to talk about today.

So our main topic is how a patch grows up and goes to school. We start with a newborn baby, which is just an idea in the mind of a developer or a maintainer, who develops that idea into a patch. Then it goes through its toddler years when it's sent to the mailing list, and there it will fall down a lot. While growing up, it needs to go to school, which is when it lands in a maintainer's tree; there it learns how to play with others and picks up what it needs to survive. When it's a bit bigger, it's time for high school, which is linux-next. Those years can be a little difficult: it will fight with its friends and find a way to get integrated. After high school it goes to university, which is mainline, and there it makes new friends. Finally, it graduates into a long-term stable kernel, and maybe it gets a commercial job. So this is a high-level flow of how KernelCI works, all the way from our student developer all the way through our pipeline.
We're going to use this slide throughout the talk and highlight each of these steps along the path: how the source code flows in, how we build it, how we run tests, how we get results, where we store results, and how we analyze them. We'll go through each of these step by step. For our cheesy little metaphor, these will be our school classrooms, and we'll go through the various classrooms as we go.

So let's start at the beginning, with the source code. Where does all the source code come from? We have multiple ways of getting source code into the KernelCI process: we can get patches right off the mailing list, and we can get commits from multiple maintainer trees, so we track quite a few different kernel maintainer trees. We're also getting patches from linux-next, from mainline, and from the stable trees. There are multiple places where source code enters the KernelCI pipeline, and at each phase of a patch's lifecycle it goes through KernelCI. By the time it gets from the mailing list all the way up to mainline and the stable trees, it has actually gone through KernelCI multiple times.

One of the things we're looking at doing next: I mentioned patches from the mailing list and from trees, but right now we're not actually testing directly from the mailing list; we're testing from the maintainer trees, linux-next, and mainline. One of the things we'd like to do as a next step is test patch series directly from the mailing list. There are some really cool tools coming out of the kernel maintainer community right now, like b4, that can easily grab patches right from the mailing list archives, and we can start using those tools to make it easier to test patches straight from the list. Also, the more compute capacity we add, the more kernel builds we can do, so we could start building more trees.

Moving on to the build classroom. As you saw, we do quite a few different kernel builds. We're building several different kernel trees, and for each of those trees we build many different configurations: multiple architectures, multiple compilers, multiple defconfigs. Inside our source code we have a YAML config file that people can change as they want, so we can customize the builds for each tree; for some trees we might not build all architectures, for others we might. The kernel build process includes not just building the kernel, but also building the modules and the selftests that come as part of the kernel. To orchestrate all of this, today we use Jenkins. We have a primary Jenkins instance that manages a lot of our build resources: we have static VMs and cloud VMs, and we're starting to use Kubernetes to manage the cloud VMs. On average we're doing around 2,000 builds a day. Some days it's a lot more, some days less, but that's quite a few builds, so we need quite a bit of compute capacity to manage them all. And as I mentioned, now that we've become a Linux Foundation project, thanks to both Google and Microsoft we have quite a bit more cloud compute capacity.
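Going back to that per-tree YAML build configuration for a moment, here is a rough, hypothetical sketch of what such an entry can look like; the tree name, URL and field names here are illustrative only, not the exact schema used in the kernelci-core repository.

```yaml
# Hypothetical sketch of a per-tree build configuration entry.
# Names and fields are illustrative only; the actual schema lives in the
# YAML files of the kernelci-core repository.
trees:
  next:
    url: "https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git"

build_configs:
  next_master:
    tree: next
    branch: "master"
    variants:
      gcc-8:
        build_environment: gcc-8
        architectures:
          arm64:
            base_defconfig: "defconfig"
            extra_configs: ["allmodconfig"]
          x86_64:
            base_defconfig: "x86_64_defconfig"
```

The point of a file like this is that each tree can opt in or out of architectures, compilers and defconfig variants without touching any of the build tooling itself.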
So we're scaling up our compute capacity using Kubernetes to do a lot more builds. A little bit about the build tools and data that we use. We've created base Docker images for each architecture and toolchain combination, and that allows us to build kernels in a consistent environment for each architecture and each toolchain. We have a tool called kci_build that wraps up a lot of the build logic into a Python library and a command-line tool. It manages cloning the repos, building the kernel, the modules and the selftests, the install step, and then publishing the kernel to our storage area, which we'll talk about in a little bit. Down at the bottom there's a URL for the kernelci-core repo, where all these scripts are published and maintained, so you can use them, test them, and send pull requests against those tools as you want.

As for the data that comes out of the kernel build process, we generate all the obvious things: the kernel images that are built, the symbol maps, and so on. For some architectures these builds also generate quite a few device tree binaries, plus the kernel modules, the selftests, and all the build logs; those are all artifacts of the kernel build that we save and store. On top of that, we track a bunch of metadata with each kernel build: the image size per section, the build time (so we know how long each kernel takes to build), some information about the host platform so we know how long builds take on each of the different VMs we're building on, the compiler error and warning counts, and so on. We carry all this metadata along with the builds and store it in the database to analyze later.

This is just an example of some of the build metadata that we keep for a given build. Here, for example, we have a bit of information about the build machine itself, the number of parallel build threads (essentially the make -j level we can use), a little bit of information about the compiler, and then the tree, the kernel version, the git commit, all these things. All of this we track per build and store in our database so that we can analyze it later.

Some of the things we'd like to work on next in the build step: like I mentioned, we're working with Kubernetes now. We've just begun migrating some of our infrastructure to Kubernetes, and this is definitely an area where we could use some help; we're learning Kubernetes as we go, but it's not something we're deeply familiar with as a team. So if anybody with good Kubernetes and cloud DevOps experience wants to help KernelCI, we'd be grateful. We're also reworking a lot of our Python tooling to make it more reusable both in library form and in command-line form, and migrating it to Python 3. And I mentioned we're using Jenkins, but we're also kind of tired of Jenkins in many ways, and having a hard time keeping it maintained and upgraded, so we've started to look at some other CI/CD pipelines as well.
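Going back to the build metadata example for a moment, here is a rough, hypothetical sketch of that kind of per-build record. The field names and values are illustrative, based on the items mentioned above (build machine, threads, compiler, tree, commit, sizes, warnings), not the exact bmeta.json schema.

```json
{
  "_note": "illustrative sketch only, not the exact bmeta.json schema",
  "job": "next",
  "git_branch": "master",
  "git_commit": "1a2b3c4d5e6f7890abcdef1234567890abcdef12",
  "kernel": "v5.8-rc1",
  "arch": "riscv",
  "defconfig": "defconfig",
  "build_environment": "gcc-8",
  "compiler_version_full": "riscv64-linux-gnu-gcc (GCC) 8.4.0",
  "build_threads": 32,
  "build_time": 412.7,
  "build_platform": "x86_64 cloud VM, 16 vCPUs",
  "text_size": 10563072,
  "warnings": 2,
  "errors": 0,
  "status": "PASS"
}
```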
So if you want to help KernelCI, we could use help in any of these areas; please let us know. Okay, moving on to the next classroom: let's talk about how we run the tests.

The third classroom for us is the run classroom, and for that we need a lab. For the KernelCI project we have almost ten distributed labs; at BayLibre we maintain two of them. Let's define what a lab means for KernelCI. It means there is a test framework that manages several different types of hardware, and the framework's job is to schedule jobs on that hardware, allocate resources, and check dependencies. As an example, we use LAVA, which is an open source framework developed by Linaro; there is also Beaker, which is used by one of our members, Red Hat. The purpose of KernelCI is to test on many different architectures, so for now we support ARM, ARM64, x86, MIPS and RISC-V, and here is an example board for each of these architectures. And we don't use only hardware: we also use virtual machines with QEMU. We also support different types of bootloaders; as you can see here, we test with U-Boot, Fastboot, GRUB and UEFI, and new boot methods have been added to our labs lately, coreboot and Depthcharge.

Next, we'll talk about how we actually run these tests. After the build, some build artifacts are generated. What we need from those artifacts is of course the kernel images to be tested, but also, for example, the dtbs.json file. Here you can see an example of the dtbs.json file, which contains the names of the exact DTBs; in this example there is just one board, but in other cases you can find a very long list with different names, depending on the architecture. We also need the bmeta.json file, which contains the metadata of the build. As you can see, the architecture in this example is RISC-V, the build environment is GCC 8, then the git branch and the git commit, which is the SHA of the patch to be tested, and the status of the build, which is pass.

To generate a test job we use the kci_test tool, a Python tool developed by the KernelCI community. To generate a job we need the test plan (which type of test we want to run in the lab, because each lab has different types of boards), the storage URL to get the artifacts from, and the user and token. After generating the test job, which is a YAML file, it is submitted with kci_test submit; for that we need a lab, a user token, and the job definition that was generated previously. At the bottom of the slide you can find the link for the kci_test tool.

Next, here are examples of the jobs we generate. In the first part we have the metadata we spoke about previously; it's the same as in the bmeta.json file. Then there is the deploy part, with the URLs of the kernel, the ramdisk, the modules and the DTB, and the boot section, which defines which boot method we're using; in this example it's U-Boot. Then the last part is the test itself: in this example you can see it's the sleep test plan. For now we don't have a lot of different tests; we run four kinds of tests. There's the suspend/resume (sleep) plan, which I'm presenting in this example, then media, which is v4l2-compliance, graphics with IGT, and a new USB test, just a smoke test.
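To make that job definition a bit more concrete, here is a simplified, hypothetical sketch of a LAVA-style job with the deploy, boot and test sections described above. The device type, storage URLs, test repository and paths are all made up for illustration; the jobs actually generated by kci_test carry more metadata and options.

```yaml
# Simplified, hypothetical LAVA-style job definition.
# Device type, URLs, repository and paths are made up for illustration.
device_type: example-arm64-board
job_name: kernelci-sleep-example
timeouts:
  job:
    minutes: 30

actions:
  - deploy:
      to: tftp
      kernel:
        url: "https://storage.example.org/next/master/arm64/Image"
      ramdisk:
        url: "https://storage.example.org/rootfs/rootfs.cpio.gz"
      modules:
        url: "https://storage.example.org/next/master/arm64/modules.tar.xz"
      dtb:
        url: "https://storage.example.org/next/master/arm64/dtbs/example-board.dtb"

  - boot:
      # the boot method depends on the platform: u-boot, grub, depthcharge...
      method: u-boot
      commands: ramdisk
      prompts: ["/ #"]

  - test:
      definitions:
        - repository: "https://example.org/test-definitions.git"
          from: git
          path: "automated/linux/sleep/sleep.yaml"
          name: sleep
```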
In the run classroom, what's next is that we need help to create new tests, so new variants of tests. What's in progress is kselftest and KUnit. We also want LTP, syzkaller reproducers, xfstests, and you can contribute with your favorite test suite.

Let's move on to results. After running these tests, we get results. Here you can see we have a number of different labs, distributed labs, and each lab has its own way to send result data: either via a REST API, or structured data, or even command-line tools. For structured data, the usual example is the LAVA callback. Where does this data go? It goes either to the KernelCI backend, which is a tool to store and analyze this data and stores it locally using MongoDB, or to KCIDB, which is still a work in progress. The difference between KCIDB and the KernelCI backend is that KCIDB stores the data in cloud databases. At the bottom you can find the API link for the KernelCI backend; please have a look.

Okay, let's see an example of a test result. This is from the LAVA callback, an example for a Meson-based board. First, this is a generic format: we have the actual device, the lab name (so we know which lab we are receiving this data from), the results, and the logs. Next we have an example of the results expanded; this is a test suite for the Meson board, and you can see the name of the test case and the result, which is pass. What's next in results? We are receiving lots of data, so we need to scale, and we need to develop new tools to have something common and generalized across the different types of labs.

After receiving those results, we move to the next class, which is store. In the store class we receive things from both sides: results and builds. The build artifacts can be build logs, the kernel, the ramdisk, the rootfs and test images, the DTBs and the metadata, as we previously saw with the bmeta.json file. As for the test results, the run artifacts, we have the test results themselves, which are stored in MongoDB, and the boot logs. We have two types of tests: boot tests, where we just test whether the device boots, and then, as I mentioned, the different test plans. In this example you can see a boot log for a Meson board, in plain text format. But not only that, we can also have another format, HTML; in the next example you can see a test log, not a boot log, in HTML format. If you visit storage.kernelci.org, you'll find a lot of different build artifacts and run artifacts. What's next in store? As we receive a lot more data, we need bigger storage to handle that amount of data.

Of course, we don't only store the data, we also analyze it, which is the purpose of KernelCI. Next we'll see the KernelCI backend, which receives the data and then analyzes it. If it's a build result, it's going to be pass or fail. If it's a test result, we'll have pass, skip, or fail. When it fails, it could be something that always fails, because that platform cannot support that type of test, or it could be a new failure. When it's a new failure, it causes a regression; we need to track that regression, and to do that we run a bisection, and we get to the bad patch, or the bad commit. And of course, we have to report the bad behavior.
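Just to illustrate that bisection step: KernelCI has its own automated bisection tooling, but conceptually it works like a scripted git bisect between a known-good and a known-bad revision. Here is a generic sketch of the idea, with hypothetical commit IDs and a hypothetical test script.

```sh
# Generic illustration of the bisection idea; the commit IDs and the test
# script are hypothetical, and KernelCI's real tooling automates this.
git bisect start
git bisect bad  1a2b3c4          # kernel revision where the test plan fails
git bisect good v5.7             # last known-good revision

# run-lab-test.sh stands in for "build this revision and run the failing
# test plan on the target board", returning 0 on pass and non-zero on fail.
git bisect run ./run-lab-test.sh

# When it finishes, git reports the first bad commit; the step-by-step
# history (the log shown in the report emails) comes from:
git bisect log
```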
So, reporting. First, what's next in analyze: as we get more data, we want deeper analytics, for example which platforms fail the most, which maintainer trees have a lot more problems than others, or which platforms keep failing to boot. We can build different types of analysis and statistics on top of this data.

Okay, the next and last classroom in our KernelCI school is reporting, so let's talk about that a little bit. What we have today for reporting is primarily email based. We send email reports for builds and for test results, as well as bisection reports. The next few slides show a couple of examples so you can get a flavor of what this looks like, but you can also use our web interface to find these and look at them in more detail on your own.

The example here is a test report on mainline master. It tells you a little bit about what it is, so you can see it's 5.8-rc1, and it tells you the test plan; our baseline test plan is a minimal set of tests run after a platform boots, to make sure we can actually run other tests. From this regression summary you can see that three platforms actually have new regressions on this kernel release: you can see that zero out of one tests passed here, and you see the platforms, the architectures, the lab, the defconfigs and so on for each of the platforms that failed. For more details you can click on the links and go into the web interface, and we'll show some examples of the web interface in a little bit. At the top you see a summary of the platforms that failed; farther down in the email it gives more details for each of the regressions, and you can even see small fragments of the logs showing what happened and why it failed.

That's one example. Another email example is when we find a new regression and are able to bisect it down to a specific commit; then we send out a bisection report, an email that isolates the commit that caused the problem. This example is from the same tree; it says this particular test plan had a problem on this particular platform, it goes on to give a little more of a summary, and in the detail it shows you the commit that actually caused the break. I removed the author to avoid public shaming. It also gives you the git bisect log, so you can see where it started, where it found a good commit, where it found a bad commit, and how it bisected all the way through. Those emails go out whenever KernelCI can successfully bisect down to a specific commit.

Another form of reporting we have is our web dashboard. This dashboard has been around for a while, so you may be familiar with it already; this is just a recent snapshot. It lists a bunch of the most recently built trees, the branch for each of those trees, and the build status, so it shows how many builds we did for each tree, with pass, fail, and skipped or unknown counts. Then there's the test status, so for each of these trees you see how many test plans were run, a failure count, and so on.
Clicking on each of these lines takes you into more detailed views. This is the existing KernelCI view that's been around for a while. On the test results side, we've been running more and more tests, so you may be used to looking at builds and boot reports, but now we're doing a lot of tests as well, and you can look at those too. Just browse over to kernelci.org on your own and see what you can find. We'd love feedback and opinions on how this works; we're well aware of its limitations, but we'd love some opinions on how to do it better.

Here's another example: this is the link from the regressions email report I mentioned before, the one you would click on. This is the web view of the same thing, showing the same three platforms that failed and the same type of information. From the web interface you can click on each of these sections and it takes you to more details about the platform, including a way to download the whole boot and test log, so you can look for yourself if you really want to see what happened. Here we can also see the total number of tests that were run; out of the 400 or so tests, three of them have new regressions. Again, this is all browsable from kernelci.org, or the links that come in the email reports will take you to views like this where you can dig into more detail, get access to the logs, and so on.

Khouloud mentioned KCIDB. KCIDB is kind of a parallel universe for storage; it's been a work in progress that we're now bringing into production soon. It's our new way of storing results, with a new schema and new cloud-based storage. We've also been experimenting with new web dashboards built on top of KCIDB; this is a quick view of the Grafana-based UI that we've built on top of it. At the top of kernelci.org you'll see a beta section for KCIDB, which takes you to this Grafana interface. It's a new work-in-progress proof of concept that we're exploring to build a better, a little more usable web interface. Again, we'd love some feedback and opinions; if you're a Grafana developer or a cloud database person, we'd love some help on these as well.

Right, that leads me into what's next on the reporting side. We're well aware that our UIs and web dashboards are pretty primitive, which is what you might expect from web dashboards built by kernel developers. So we're looking for, and frankly begging for, help from people who want to help us modernize our UI and user experience. As we've been begging for help all through this presentation: if you have an interest in helping our project and you have experience in doing new web designs, especially analytics-type web dashboards and UIs, we'd love some help. I mentioned that we're migrating towards using KCIDB for the rest of our backend and storage as well, and another thing we're working on is email reports that are a lot more customizable, so people can select just the types of things they're interested in and get more specific email reports out of that.
Again, these are all things on our wish list rather than things we actually have the resources to work on at the moment, but I wanted to highlight areas where we know we can improve and where we would love some help.

So, to summarize: our kernel patch has gone to school and grown up through all the phases of its life, from the mailing list through all the different trees, and hopefully it makes it through its rough times in linux-next and mainline and then graduates. After many passes through KernelCI school, we hope it learns enough lessons and gets a good enough education that it can graduate into LTS someday, but it takes many trips through KernelCI school to get to that point.

Here are a few ways you can catch up with our project, see what we're doing, and get involved: you can follow our blog, you can follow us on Twitter, and if you want to get involved technically we have the mailing list and IRC, plus a weekly technical call that's open to the public, which you're welcome to join. You can also follow things on LinkedIn, or reach out to us personally if you'd like. And a few credits for the photos. With that, we can move on to the question and answer time.

Okay, let's see if we have some questions in the chat. There's a question about buildbot: many open source projects, such as OpenWrt, use buildbot. Yeah, in the very early days of the project we explored buildbot, and I actually forget why we decided against it. These days, as we look at things besides Jenkins, we're looking at other tools that are a bit more cloud native, things like Tekton, which are built around Kubernetes, but we haven't really made any specific decisions there yet.

Here's another: is Debian the only distro supported on the KernelCI system? It's the only distro actively used. We've built things to be flexible for many distros, but what we actually use is buildroot for small, ramdisk-like, minimal boot-type testing, and then we use Debian for building root filesystems that are full of test suites and ready to go, just because it's really easy to bootstrap a Debian rootfs. We can do other distros if there's interest; there are no assumptions about distro specifics in the way we've built things.

Let's see, next question: do you run a syscall fuzzer? We're not running syzbot or syzkaller directly, but we're starting to look at syscall fuzzers. The syzkaller project has found a lot of bugs, and the reproducers for those bugs are finding their way into other test suites like the Linux Test Project, so as we run LTP we're actually running some of the fuzzer reproducers. But right now we're not actively running fuzzers ourselves. Again, this is one of the areas where we'd love to do more, so if you're working on fuzzing and want to get involved with KernelCI, we could definitely use some help there.

Okay, how much do you think the proliferation of kernel test projects affects the complexity of projects like ours? Yeah, it definitely affects the complexity. What we've kind of said is that with KernelCI we're not working on writing test suites ourselves; what we want to do is make it easy to run more test suites on a broader set of hardware.
So in one sense we're test-suite agnostic. On the other hand, a lot of test suites evolve in the context of a test framework and become really tied to those frameworks, and that definitely makes things complex for us. But that's one of the reasons KernelCI was created: we wanted to make a place where collaboration on these types of projects can actually happen, because there wasn't really a common place to develop test frameworks and this type of especially kernel- and hardware-focused testing.

Okay: where's the best place to go if we want to help, the mailing list, Slack, the website? Probably the mailing list to start with. We have weekly meetings as well, some video calls, so it kind of depends on the topic, but starting a conversation on the mailing list is the best place, and then we can point you to the right people and the right other conversations. I think I got all the questions... yeah, I think that's all of them. There are a couple of other comments in there. Oh, here's one: kernel developers went from loving emails and CLIs to loving web dashboards, times have changed. Yeah, I wouldn't say they've changed; kernel developers still have a strong preference for email and command-line interfaces, but there's definitely a growing set that likes using dashboards and is pretty opinionated about dashboards and so on. It's definitely an area we'd like to evolve. Are there any more questions? I'm not seeing any other questions; thank you all for the comments.

Oh, here's another question, let me publish it first: how long, roughly, does it take to test a patch? That kind of depends on how busy the builders are. For certain trees we build all sorts of different configurations, but that can all be completed within a couple of hours, so if the builders aren't super busy with lots of other trees, we can usually have results on the order of a couple of hours. However, as I mentioned, we're doing quite a few different trees, from next to mainline to the stable trees and so on, and on some days those trees update a lot, so the builders are quite busy. The build is kind of the bottleneck at this point, although as we get more compute capacity, like I mentioned with Kubernetes and so on, that's going to be less of a bottleneck and we'll be able to test patches really quickly. There's quite a bit of hardware available for the testing, so we can do a lot of the actual tests in parallel; it's the build that's the bottleneck at the moment.

How easy is it to set up and donate a new LAVA lab to KernelCI? Right now it's relatively easy. It's a bit poorly documented on our side, to be honest, but that's gotten a lot better in the last few months as we've been fixing some of the documentation. If you already have a LAVA lab running, it's actually quite easy, and that's a good conversation to start on the mailing list, because we can get existing LAVA labs hooked up to KernelCI pretty quickly.

Oh, here's a good one too: do you consider GitLab as a replacement for Jenkins? Actually yeah, we've started to look at GitLab CI, and there's a lot of stuff in GitLab CI that does the same things Jenkins does. We actually have a small proof of concept that we've done with GitLab CI.
The problem with GitLab CI is that everything is tied to a particular git repo, and we monitor, I don't know, somewhere between 50 and 100 git trees. So to do it properly in GitLab we'd have to find a way to coordinate that, or essentially give people a GitLab CI file to drop into their repos and plug into KernelCI. That's one of the things we've explored and kind of prototyped, but we haven't taken it much further than that. Okay, I think that's all the questions so far. Okay, now we're just staring at each other.

Oh, here's another... oh no, okay, there's a comment about "GitLab frw", which I don't know what that means. Do we run any code coverage tests? Not currently, no. This is another area where we're definitely open for submissions: if people have existing test suites doing kernel code coverage, we'd be glad to take a look at those. As I hinted at in the talk, there are a lot of things on our radar and on our to-do list, but we're still a small team, so for a lot of these features, if there's something you're interested in, feel free to join the mailing list, make some proposals, and pitch in.

Okay, how are we on time? Any other questions? Oh, there's one: any advice for first-time presenters at ELC? Maybe Khouloud, you can answer that one, since this is your first presentation at ELC. Do you have any advice? Well, actually, you have to figure out what you want to present as a topic, and then a bit of courage is all you need. I'm not seeing any new questions coming in. You're welcome, Thomas. Well, we hope that was useful for you.

Oh, somebody's asking about contributing more hardware. Yeah, we're always interested in more hardware. There are kind of two ways to contribute hardware: the best way for the project is for people who have hardware to set up their own LAVA lab or their own board farm, but there are a few board farms already contributing to KernelCI that may be interested in adding some hardware. There's always a bit of maintenance involved in keeping the hardware running all the time, so there's a bit of overhead there, but if you already have hardware hooked up to a lab, it's even better. Okay, well, thank you all for listening, if you're still listening at this point of silence, and we appreciate you all coming. Thanks for joining.