So, hi everybody, thanks for coming. I'm going to be talking about the state of continuous integration in the upstream Linux kernel. It's a little bit one-sided, from the side of CI and testers, and it's targeted in a big part at people like me, but I would also like to hear kernel developers say what they think about it, whether it matches their reality or not. Do we have any kernel developers here, engineers? Oh, hi, great, thanks for coming.

So, I'm Nikolay Kondrashov. I work on the CKI project at Red Hat, where we do continuous integration for the Linux kernel, both internally and for upstream. I'm also working with the Linux Foundation's KernelCI project, and I'm developing the KCIDB project within it. I also do a bit of electronics and embedded work. We will go briefly through what the landscape looks like for kernel CI, then I'm going to define continuous integration and ways of measuring it, that is, propose metrics I think describe it for the purpose of this talk, and then we'll go over what we have, what we cannot have, what is hard, and what we can do about it, or at least what I think we can do about it.

As probably many of you know, there is a whole bunch of different systems that test the Linux kernel, and this is just a small sample. Each of them sends its own reporting mails and results in its own way, at its own time during the development process, and most of them have some sort of dashboard, and those are all different as well. KernelCI is the Linux Foundation project that aims to be the kernel CI for the Linux kernel, but in a big part they are a CI system themselves: they check out kernels, build them, test them, they have hardware and use hardware provided by third parties, all connected to one central system, so they have their own dashboard and their own reports. But within the project there is also the KCIDB project, which aims to get the results from various other CI systems, put them in a single database, generate reports and notifications, have a dashboard for that, and extract some value from aggregating this data together.

The structure of KCIDB is very simple: you just send a JSON containing your test results, build results and so on, we put them into a database, show them on the dashboard, and then monitor what's coming in to generate subscription notifications about various results. We get around 300,000 test results a day, around 20,000 builds, and around 100 revisions per day in KCIDB right now. The dashboards look like this, it's a Grafana prototype, and we have reports which aggregate the results; if you look really closely, you can see that this report was generated from results from four different CI systems.

So let's define continuous integration in an abstract way. It's quite simple: you test every change done to the code, as soon as you can, and then give feedback on the result; let's stop at that. Out of this you can come up with some metrics: coverage, which tells you how much functionality you are testing, what percentage; latency, how fast you get the results after the change was made or posted; reliability, how many false positives or false negatives the system generates; and accessibility, how easy it is to interpret the results that the CI system produces, that is, how hard it is to figure out what's actually broken.
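To make the KCIDB input format mentioned above a bit more concrete, here is a minimal sketch of what a submission could look like. The field set is simplified and approximate, and the origin name, IDs and test path are made up; the authoritative, versioned schema lives in the KCIDB documentation.

```python
# A minimal sketch of a KCIDB-style submission: one checkout, one build, one
# test result, serialized as JSON.  Field names follow the general shape of
# the KCIDB schema but are simplified; consult the KCIDB docs for the real,
# versioned schema before submitting anything.
import json

ORIGIN = "example-ci"  # hypothetical CI-system name registered with KCIDB

submission = {
    "version": {"major": 4, "minor": 0},          # schema version (approximate)
    "checkouts": [{
        "id": f"{ORIGIN}:checkout-1",
        "origin": ORIGIN,
        "git_repository_url": "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git",
        "git_commit_hash": "0123456789abcdef0123456789abcdef01234567",
    }],
    "builds": [{
        "id": f"{ORIGIN}:build-1",
        "checkout_id": f"{ORIGIN}:checkout-1",
        "origin": ORIGIN,
        "architecture": "arm64",
        "valid": True,                             # the build succeeded
    }],
    "tests": [{
        "id": f"{ORIGIN}:test-1",
        "build_id": f"{ORIGIN}:build-1",
        "origin": ORIGIN,
        "path": "ltp.syscalls.openat01",           # dot-separated test path
        "status": "PASS",
    }],
}

# The actual upload would go through the kcidb client tooling; here we just
# write the JSON out so it can be inspected or submitted separately.
with open("submission.json", "w") as f:
    json.dump(submission, f, indent=2)
```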
For these metrics, of course, the ideal CI would test everything, provide instant feedback, always be spot on, and just tell you exactly what's broken, without requiring you to go in there, read the logs and try to reproduce the problems. And the worst CI would, of course, test nothing, or test something else entirely, take forever, never produce a correct result, and produce results that are basically impossible to comprehend. So the worst CI is much worse than no CI.

With the upstream kernel CI, nobody really seems to know what the coverage is like, since there is no single system, just a bunch of separate ones, and they don't really verify coverage. Everybody focuses on particular functionality, but nobody, it seems, really measures code line or function coverage. Of course, that's not terribly useful either; it doesn't really tell you how much functionality is tested exactly, but it is something we could measure, and it could look something like this (you have probably seen this kind of report before). This one is from a single run of Red Hat's CKI for ARM64, with a typical set of tests, and it includes only some directories, the most important ones but not all of them, so this percentage is, well, just some percentage. So nobody seems to know exactly what's going on there.

Typically it takes from several hours to multiple weeks to get results for your change, or for whatever you merged into a tree. A lot of CI systems still do manual reviews of any complicated results, because there are still a lot of false positives and false negatives. And accessibility varies between CI systems: some provide very clear and very useful reports, like syzbot or Intel's 0-Day, others are more bare-bones. So it's different, but the main thing is that they are all different.

There are, of course, limitations. First of all, we can only get as much coverage as we have hardware, because the kernel is an abstraction layer for hardware, so we would need all the hardware to truly test everything, and we cannot get all the hardware. That's a hard limit for coverage, and for latency as well, because to go faster you need more hardware to run on. The limit on reliability is obviously hardware reliability, but also how well the kernel itself works: if the kernel is buggy and tests fail all the time, in interesting ways that the creators of those tests didn't anticipate, the reliability of the CI system suffers, but then that's exactly what CI is supposed to fix. For accessibility, again, the hard limit is hardware availability, because often you get a failure on some hardware that you cannot access yourself and cannot reproduce on, so it's hard to figure out what's going on; you have to go through the backtraces and ask the owner of the hardware what's happening, and that's a difficult process. And of course the kernel's complexity is a limit for accessibility: the tests and their results can only be as simple as the kernel lets them be.

Okay, so coverage, I think, is mostly fine, in the sense that there are many people who want to write tests and who do write tests, but because of the other metrics, reliability and latency, it's difficult to get that coverage out there and deliver it to developers, and it's sometimes depressing how hard that is. So that's a challenge.
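For a sense of what measuring this could look like, here is a rough sketch that aggregates line coverage per top-level kernel directory from an lcov tracefile, for example one collected from a CONFIG_GCOV_KERNEL-enabled test run. The tracefile name and the path-splitting heuristic are assumptions for illustration.

```python
# Rough sketch: aggregate line coverage per top-level kernel directory from an
# lcov tracefile.  Assumes the usual SF:/LF:/LH:/end_of_record records; the
# tracefile name and the "/linux/" path prefix are hypothetical.
from collections import defaultdict

found = defaultdict(int)   # instrumented lines, per top-level directory
hit = defaultdict(int)     # executed lines, per top-level directory

current = None
with open("kernel.info") as tracefile:
    for line in tracefile:
        line = line.strip()
        if line.startswith("SF:"):
            # e.g. SF:/build/linux/fs/namei.c -> top-level directory "fs"
            path = line[3:].split("/linux/", 1)[-1]
            current = path.split("/", 1)[0]
        elif line.startswith("LF:") and current:
            found[current] += int(line[3:])
        elif line.startswith("LH:") and current:
            hit[current] += int(line[3:])
        elif line == "end_of_record":
            current = None

for directory in sorted(found):
    pct = 100.0 * hit[directory] / found[directory] if found[directory] else 0.0
    print(f"{directory:12s} {pct:5.1f}% of {found[directory]} lines")
```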
So the problem with latency is that, of course, not much pre-merge testing is done, and when you do it you have to be extra careful, because there is no authentication for the patches that people send to the mailing lists; anybody can send patches, so you think ten times before running them on real hardware, in case something blows up, for example. That's a problem. We can run tests in VMs, but that has limited effect. And then, of course, the slow human reviews required before the results go out make it take longer: if the test completed right after you left work, and you come in in the morning, start going through your mail and checking those results, it takes a while, so it can delay the results being sent to the developers quite a lot.

The main problem with reliability, I think, is the fact that the tests constantly drift away from the kernel state, and that is also a latency problem: because the latency is too high, by the time a test result arrives, the commit with the bug is already in the kernel, and it stays stuck there until we fix it, and as a result we get more failures in other systems as well. Another problem is that there are multiple kernel branches, multiple trees, and bugs jump between them as maintainers pull the changes in; it's like chasing flies with a fly swatter.

For accessibility, the only problem, I think, is the fact that we have all these different reports, and developers have to try to interpret them all in their own ways, and that takes time. So there is a sort of network of problems that affect each other. For example, low reliability and accessibility lead to reduced trust in the results on the part of kernel developers, so they are reluctant to look at them, or might say "don't send them to me" if they're unreliable or hard to comprehend. As a result, gating is impossible if the results are unreliable, and the feedback towards the code and towards the tests themselves is less than it could be, so the improvement loop is slow: people just don't get their feedback, don't run enough tests, and things don't improve. Low reliability and accessibility also lead to high latency, because of human reviews and so on, and high latency in turn is another reason why there's no gating and the reason why bugs stay longer in the public code that everybody uses, resulting in more time wasted by everyone. So there's again less time for improvement, because you have to review those results and try to understand what's going on, and this leads to even more latency. I tried to summarize how I think this whole system works in this little graph; we will take a closer look at it later, but if I had to pick one thing, it would be latency. Okay, that's enough of that; let me take a little drink.

So what can we do about this, or at least what do I think we could do, or rather, what can't we do? The first thing we must remember is that the kernel community is not a single team, and moreover it's not a single company where you can just say: okay, now we enable gating with whatever we have, fix the tests, and that way we start the whole process and tighten the feedback loop. That's just not going to happen, because nobody owes anybody much; the companies that run the tests and the developers kind of have to work on trust, and you have to first fix the tests and gain that developer trust before developers will start listening and looking at those tests. I think that's one of the main things in this situation.
So what I think we can do: for coverage, obviously, we need more hardware, and we need to attract more companies to run tests and to provide their results. If you have your own system and you don't want other people to control your hardware, you can send your results to KCIDB, and we can also help work on the reports to make them more useful. If you would rather contribute hardware and not run your own tests, you can always set up a LAVA lab and connect it to KernelCI; these links lead to instructions on how to do that. And in any case you can write to our mailing list and ask questions.

For latency, I think the most important thing to do would be pre-merge testing, because the lack of it introduces a number of public bugs and has a compound effect on many things, on reliability as well. If you'd like to do pre-merge testing of patches from the mailing lists, you could use Patchwork; that's what some CI systems do: they pick up patches via Patchwork and then test them as safely as they can, probably in VMs. But while researching this, I noticed that there are about 50 repositories in the MAINTAINERS file that are already hosted on GitHub or a GitLab instance, and that could be a way to improve the situation, for example by connecting GitHub or GitLab CI to your CI system, which is something KernelCI is thinking about how to do right now with the new KernelCI API they're working on. The idea is that you publish, for example, an action on GitHub Actions that takes your patch, submits it to KernelCI, and then gives you a check mark after it's done, or a red cross. Why this is better than Patchwork is that you get to authenticate the users who submit the merge requests: you can say, okay, these users get access to the real hardware, and the rest need to be verified before they get access to KernelCI, and then you actually get your tests running on real hardware. That effect could be a selling point for other maintainers, to get them to at least have a repo on GitHub or GitLab for testing, to get access to real hardware this way, and perhaps do the testing of merge requests and pull requests there.

Since reliability is a problem, you should not try to do all the tests at once; it's enough to just start with one test, put it into pre-merge, talk to the maintainer to decide which ones are needed, and help start the feedback loop. I have an idea to use KCIDB subscriptions for that: the KCIDB subscription system allows you to pick, for example, which branch you want reports on, which compiler, architecture, a particular CI system, anything. So if you're interested in those results, send a message to the mailing list or just talk to me.

The other thing about latency is that manual reviews are hard and slow, so it's great if you can automate them, and that's what various systems have been doing and what we are trying to build in KCIDB. For example, the Intel graphics CI does this: you can see their system for entering various patterns and parameters to match in test results, to detect when a failure is a known issue with a bug already open, so that you do not notify maintainers or developers about that problem again and you keep the CI working. The same thing is done in CKI at Red Hat.
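As an illustration of that kind of known-issue matching, here is a toy sketch in Python. The pattern fields, the example issue and the bug URL are invented, and the real systems mentioned above are considerably more elaborate.

```python
# Toy sketch of known-issue matching: if a failed test's output matches the
# patterns recorded for an already-open bug, the failure is tagged with that
# bug instead of generating a fresh report to developers.  The fields and the
# example issue are invented for illustration.
import re
from dataclasses import dataclass

@dataclass
class KnownIssue:
    bug_url: str
    test_path_pattern: str     # regex matched against the test's path/name
    log_pattern: str           # regex matched against the test's output

KNOWN_ISSUES = [
    KnownIssue(
        bug_url="https://bugzilla.example.org/show_bug.cgi?id=12345",  # hypothetical
        test_path_pattern=r"^ltp\.syscalls\.",
        log_pattern=r"TBROK.*ENOSPC",
    ),
]

def triage(test_path: str, log_text: str) -> str | None:
    """Return the URL of a matching known issue, or None if the failure is new."""
    for issue in KNOWN_ISSUES:
        if re.search(issue.test_path_pattern, test_path) and \
           re.search(issue.log_pattern, log_text):
            return issue.bug_url
    return None

# A new failure (triage() returns None) would go into the report sent to the
# maintainer; a known one would only update the existing bug.
print(triage("ltp.syscalls.fallocate05", "tst_test.c:1234: TBROK: write(): ENOSPC"))
```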
And syzbot does a similar thing, but they are additionally able to automatically extract identifying information from kernel crashes and then group the crashes under a single bug, automatically. That works very well for them, and with this they are able to process a huge number of crashes and reproduce bugs automatically, without human review.

This will probably be controversial, but what I think could be done about reliability, the best way to approach it apart from constantly fixing the tests, is to make sure that the tests are in sync with the code. What we can do is move at least the most popular, most impactful tests into the kernel repository proper, LTP for example, integrate them with the kernel documentation, and make them official. They would, of course, join KUnit and kselftests, but with wider coverage. The problem, of course, is that once you integrate a test suite into a single branch, say master, it starts tracking master, and it will stop working for the older branches, so you would need to backport whatever you integrated to those branches and fix it there. But afterwards the complexity of the tests decreases and they become more reliable, because they don't have to account for all the different kernel versions and can focus on testing this one, and you can keep them in sync. That, of course, only works if we actually run those tests, and run them fast, so in the CI system it's important to prioritize the execution of in-tree tests, so that they are executed earlier and the results are produced faster, so that you can shorten the loop and keep them working.

Accessibility is not so bad, but I think that if we had decent, uniform reports, it could save people time if they always received the same kind of report. There can be different opinions on this, of course, and the xkcd comic about competing standards often comes up: it could end up being just one more report format. But that doesn't mean we shouldn't try, so come to me and tell me what you would like to see in there, just try it, or help with the development of the report system.

Here is the influence graph again, with the improvements I think should go in there and how they could affect the whole system, in particular developer trust, and the ultimate targets: stronger feedback on the code and on the tests, which leads to better quality. That's all, thanks everybody. Any comments, questions, ideas? What do you think? There's somebody at the back.

So the question was: when patches have been sent for maintainers to look at, are there any requirements to run tests on those, is that right? Yes. Some subsystems and some maintainers do have requirements and define the tests that you need to run, but largely I don't think there is anything like that at scale, and that's what Veronica here wanted to work on, yes, you did, and I think that's what Tullis is going to be doing: talking to maintainers and trying to determine which tests could be called canonical for their subsystem, putting that into, for example, the MAINTAINERS file to have it documented, and having, for example, checkpatch output the recommended or required testing when you generate the patches.

Yes? So, do you have an idea how many of the tests can run in virtual machines versus the ones that need real hardware? I don't have a number; my feeling is that it's a fairly high percentage, especially things like LTP, which could run in a virtual machine.
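As a rough idea of what running such a test in a VM involves, here is a minimal sketch that boots a freshly built kernel under QEMU. The bzImage and initramfs paths and the init script that would launch the test suite are hypothetical; real setups (LAVA labs, virtme-style tooling and so on) handle provisioning and result collection far more carefully.

```python
# Minimal sketch of booting a just-built kernel in QEMU to run a test suite in
# a VM.  The bzImage/initramfs paths and the "run-ltp-and-report" init script
# are hypothetical; production systems handle provisioning, consoles, result
# collection and timeouts much more carefully.
import subprocess

QEMU_CMD = [
    "qemu-system-x86_64",
    "-m", "2G",
    "-smp", "2",
    "-nographic",                              # serial console on stdio
    "-kernel", "arch/x86/boot/bzImage",        # the kernel under test
    "-initrd", "test-initramfs.img",           # contains the test suite + init
    "-append", "console=ttyS0 panic=-1 init=/run-ltp-and-report",
    "-no-reboot",                              # exit instead of rebooting on panic
]

# Capture the serial console; a harness would parse it for pass/fail markers.
result = subprocess.run(QEMU_CMD, capture_output=True, text=True, timeout=3600)
print(result.stdout[-2000:])   # tail of the console log
```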
I can only speak for Red Hat: I think that our test database contains maybe 40% of tests that can run in virtual machines. Does anybody know better? That's just my feeling, I haven't calculated it, but we do have tests which are explicitly marked as able to run in a virtual machine, and that's one way to improve this, to reduce latency, and to get to pre-merge testing. Anything is better than nothing, anything that will start the feedback loop and get us closer to gating; anything goes.

Yes? How do you keep that private? Yeah, we have private tests and... oh yes, the question, thanks, Dallas. The question is: how does CKI deal with testing private hardware, how do we make sure the tests are not visible outside and the NDAs are not broken? We keep the tests themselves in a separate repository, and I think that most of the tests are still in our test database, but I'm not sure exactly how the various secret tests are added there; they do try to keep them away from the public.

Anyone else? So, at CKI there is a system for picking tests based on patches, and we have worked with it for a long time, so we can share that, it's all open source, and we can share the experience and the approach with anyone who's interested. Since we test on real hardware a lot, we have to be careful with it, obviously, not only to reduce latency but also to reduce the load, because there are a lot of patches. But yes, we have a system where we can say: okay, when these files change, run that test, things like that; it's quite sophisticated and works quite well.

Yes? Do you see better engagement from kernel subsystem engineers, and is there some way for them to trigger the tests for the things they care about? Like, when they collect the branch for the next pull request or send it to the mainline kernel, can they easily run the tests on the hardware that it affects? So, the question was, and I'm sorry, I keep forgetting to repeat the questions, the question was: do we work with maintainers, do we provide a way for them to trigger tests for a particular change that they want to run on real hardware? I'm not sure about CKI having exactly that, special branches, but CKI monitors maintainer trees, and when they push something, that triggers tests, and so do plenty of other systems. Plus there are agreements between maintainers and CI systems, where a maintainer asks: can you monitor this specific repository where I will be pushing things I want checked, and send me a report? So there is that too: a special branch that they ask to have tested, and then they get the results after a while.

Yes? The question is: is it possible for an individual contributor to request testing of a patch? Not as such, as far as I know. Some systems do pre-merge testing of patches on a mailing list for some subsystems; I think Intel's graphics CI does that, and some other subsystems too. At least one subsystem has a GitLab repository with CI set up, so when you open an MR there, maybe you will get some tests run; I don't know what kind of access control they have. But there is no universal agreement on how to do that, and there's no single CI system that does this universally well.
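Coming back to the patch-based test picking mentioned a couple of questions ago, a very rough sketch of the idea might look like this. The path-to-test mapping is invented for illustration, and CKI's real system is considerably more sophisticated.

```python
# Sketch of patch-based test selection, roughly in the spirit of what CKI
# does: map path patterns touched by a change to the tests worth running.
# The mapping below is invented for illustration.
import fnmatch
import subprocess

TEST_MAP = {
    "fs/ext4/*":     ["xfstests-ext4"],
    "fs/*":          ["ltp-fs"],
    "net/*":         ["ltp-net", "kselftest-net"],
    "mm/*":          ["ltp-mm"],
    "drivers/net/*": ["kselftest-net"],
    "*":             ["boot"],          # always at least boot-test the change
}

def changed_files(base: str, head: str) -> list[str]:
    # List the files touched between the base branch and the change under test.
    out = subprocess.run(["git", "diff", "--name-only", f"{base}..{head}"],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def select_tests(files: list[str]) -> set[str]:
    tests: set[str] = set()
    for path in files:
        for pattern, mapped in TEST_MAP.items():
            if fnmatch.fnmatch(path, pattern):
                tests.update(mapped)
    return tests

print(select_tests(changed_files("origin/master", "HEAD")))
```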
Yes? Yeah, for maintainers from outside Red Hat, that's a problem, of course. You just try your best to understand the problem, review it, and send an email out manually; that's what our people do from time to time when they notice something. So basically human review and explanations, because we cannot normally give outside people access to hardware that's inside Red Hat. For internal developers, of course, it's much easier: they can just get to the hardware, and we provide all the help to tell them exactly how to run that test. So that's a problem, of course; that's the problem I was talking about.

Yeah, of course. And syzbot is very good at that: they can generate a reproducer for you, an automatically generated C program that you can run to trigger the problem. VMs are great in that way, but of course CI systems like to control their setup, they have their own provisioning and everything, and arriving at exactly the same setup the test was running in is not always easy to communicate to a maintainer. CKI, for example, has its own provisioning and its own test setup. Thankfully that's not a problem most of the time, but it is still a bit of a problem to explain exactly, and automatically, what you need to do. I'm not sure. Of course, when we can use VMs, that's much easier, but the kernel still consists of a lot of drivers, and those do not get tested that way; that is a problem.

So the question was: could it be a good idea to at least run the test itself in a stable virtual machine environment and pass the hardware through to the VM, so that at least the test setup and framework could be automated? Yeah, that kind of makes sense; I'm not sure that anybody has tried to do that. Of course, it depends on the kind of device, and there has to be passthrough support, but it's an interesting idea, especially for some kinds of devices.

Anyone else, comments? I wish I could contribute hardware so that the maintainers would actually run tests on it and catch the bugs before they reach the mainline tree, but from what I understand, even if I go there now and try to enroll the hardware in some way, it's not going to happen, because people are not actually running any sort of pre-merge test. No, not pre-merge tests, but they still catch the things that maintainers merge for a particular subsystem, and you might get your tests executed before the code reaches Linus or another maintainer. So that's a bit of a problem, communication there is difficult, but I think it could still be worth it to talk to KernelCI and expose your LAVA lab there, as we talked about. You can talk to us in the chat or on the mailing list, just ask, tell us what you think about it, and we'll give you an extensive answer and ideas. All right, anybody else? Okay, thanks everyone.