Hi everybody, thanks for coming. I'm going to be talking about the thing that is upstream kernel CI, or something that is trying to be in its place: something that exists more as an idea than as an actual fact, but nevertheless something that people want and are trying to build. I'll be talking from the point of view of a CI system maintainer, a person who is developing a CI system for testing the kernel and participating in these discussions, not from the point of view of a kernel maintainer or developer. The talk is targeted, I guess, at people like me, because that's what I see, but I would like to hear opinions from people looking in from the outside, as well as from maintainers and developers. Do we have any maintainers here? Oh, great, thanks for coming. There's going to be quite a bit of time left for discussion, so I'll be glad to hear what you think.

I'm Nikolai Kondrashov. I work as a software engineer on the CKI project at Red Hat, where we're building a CI system for Red Hat internal use and also for running tests for upstream, for maintainers who are interested in that. I work with the Linux Foundation's KernelCI project on the KCIDB system, which I'm going to talk about a little later, and I do electronics and embedded development as a hobby, which is partially why I'm here, apart from the kernel work. I live in Finland, but I was born in Russia.

Briefly, the plan: a short overview of the kernel testing systems and what KernelCI and KCIDB are doing, then I'm going to try to define continuous integration just for the purpose of this talk, and then we'll take a look at possible metrics for measuring CI, where the current kernel CI is, what its hard limitations and challenges are, and what, in my opinion, we can possibly do about them.

These are slides that I put in all my presentations these days: there are a lot of CI systems trying to help test the Linux kernel, from all interested parties, and this is just a sample of the logos that we have there. They all send their own report emails to developers, and they all have their own dashboards or something like that. KernelCI is, as I said, the Linux Foundation project which is trying to be the kernel CI. KernelCI has its own labs, its own CI and testing system, and runs tests on various hardware, but there is also the KCIDB part, where we put together testing reports from various systems. This is the current state of who is sending data to KCIDB. We're trying to build a system which would aggregate those testing results and provide a single dashboard and a single report to developers who are interested, to save time and effort, both in development and in interpreting the results.

Conceptually it's very simple: we get some JSON from the submitters, we put it into a database, display it on a dashboard, look at what's coming in, at the changes, and generate notifications about the new results. We get about 300,000 test results per day and maybe 10,000 builds, for a bunch of revisions.
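To make that concrete, here is a minimal sketch of what such a submission could look like, assembled in Python. The field names follow the KCIDB I/O schema as I recall it, and the "myci" origin, IDs, and values are all made up for illustration; check the current schema documentation before submitting real data.

```python
import json

# A minimal KCIDB-style report: one checkout, one build, one test.
# Field names are illustrative; verify against the current I/O schema.
report = {
    "version": {"major": 4, "minor": 0},
    "checkouts": [{
        "id": "myci:checkout-1",  # object IDs are prefixed with your origin
        "origin": "myci",
        "git_repository_url":
            "https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git",
        "git_commit_hash": "0123456789abcdef0123456789abcdef01234567",
        "valid": True,
    }],
    "builds": [{
        "id": "myci:build-1",
        "origin": "myci",
        "checkout_id": "myci:checkout-1",  # links the build to the checkout
        "architecture": "aarch64",
        "valid": True,
    }],
    "tests": [{
        "id": "myci:test-1",
        "origin": "myci",
        "build_id": "myci:build-1",  # links the test to the build
        "path": "ltp.syscalls",      # dot-separated test path
        "status": "PASS",
    }],
}

print(json.dumps(report, indent=2))
```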
The dashboard is a Grafana prototype, and we have reports which can look like this; they are customizable. This one aggregates results from four CI systems and displays the build results, the test failures, and the overall status, and gives you a link to the dashboard to see the actual, up-to-date, full results.

For our purposes I'm going to simplify: the main idea of CI, as you all know, is to test everything, to test every change (or as many changes as possible) at every moment, and to provide feedback. From that point of view we can define, let's say, four base metrics to measure a CI system by. Coverage: how much functionality is tested. Latency: how fast feedback is produced; is it coming in after a few hours, or later? Reliability: how much can you trust the results; do you get a failure when it's actually a failure, and a pass when it's actually a pass? And finally accessibility: how easy is it to understand the feedback, how easily can you figure out what's actually broken and where the bug you need to fix is?

From that point of view, the ideal CI covers everything, provides instant feedback as soon as you create a change, is always right, and tells you exactly what's broken so you don't have to figure it out. The worst CI, of course, covers nothing useful, takes forever, never tells you the right thing, and you cannot understand what it says, so from that point of view it's worse than no CI at all. So, between those extremes, where are we with the current upstream kernel CI?

Some people try to measure the coverage of their tests, but because everybody is testing their own thing, there is no single entity who controls all the CI or knows everything about it, so of course we don't have overall numbers. I have heard rumors that some people tested and tried something and have some numbers, but nobody really seems to know. This is the usual lcov output page with coverage, from a run at Red Hat's CKI for aarch64, and it could look like this: we had a bunch of tests, we ran them, and about 12% of code lines were covered. But this is not the whole code base, just the most important parts, I suppose; I don't remember exactly which directories we covered, but it was quite a lot of them. That's in part because instrumentation in the kernel is difficult, doesn't always work, and makes the kernel much slower. So nobody really knows which tests we run where and how much they cover. We at KCIDB know a little bit more, because we get all those results and can take a look at them.
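As a side note on where a number like that 12% comes from: lcov's HTML pages are rendered from a tracefile (the .info file) that records, per source file, the number of instrumented lines ("LF:") and the number of lines actually hit ("LH:"), with the raw data typically gathered from a GCOV-instrumented kernel (CONFIG_GCOV_KERNEL), which is the instrumentation that slows the kernel down. A rough sketch of summing those records into an overall figure:

```python
import sys

# Sum lcov tracefile records into an overall line-coverage percentage.
# "LF:" = lines found (instrumented) per file, "LH:" = lines hit.
found = hit = 0
with open(sys.argv[1]) as tracefile:
    for record in tracefile:
        record = record.strip()
        if record.startswith("LF:"):
            found += int(record[3:])
        elif record.startswith("LH:"):
            hit += int(record[3:])

if found:
    print(f"{hit}/{found} lines covered ({100.0 * hit / found:.1f}%)")
else:
    print("no coverage data found")
```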
For latency, looking at the mailing lists (and this is unscientific; I didn't have time to really scrape the lists and establish the actual situation, because that's quite a lot of work), taking a few samples, it comes out to several hours in some cases, which is quite good for kernel CI, but it can also be a few weeks after the change was posted to the list. It's faster, of course, for merged commits, where CI systems pick up commits from a branch; pre-merge CI is mostly non-existent.

Test results are quite unreliable, and many CI systems and CI system maintainers do actual manual reviews of test results before sending them to maintainers and developers, because things go wrong quite often. Accessibility is quite good in places; some CI systems go a long way to make results as accessible as possible, and the kernel maintainers are quite demanding, so that's why, I guess. But all those results are different, all done in a different way, and that makes things a little more difficult.

Now, the hard limits. Since the kernel is an abstraction layer for hardware, to really test it you need to test against all the hardware, right? So the natural limit on coverage is how much hardware we have to test on. That's not a problem for general functionality, but it is for the hardware abstraction. Latency, again, is limited by hardware availability: how many tests you can run at the same time. The more hardware we have, the faster it can go. Reliability also depends on the hardware: if the hardware is not working reliably, it's hard to run reliable tests. And of course it depends on the kernel itself, but tests contribute to improving kernel reliability, so that's great. Accessibility, finally, is limited by hardware availability as well, because understanding what's happening without access to the hardware, which is often the case, is quite hard. That's a natural limit. Kernel complexity also affects all of this a lot, because a test can only be as simple as the kernel functionality it's testing.

OK, so what are the challenges? For coverage, I think there are quite a lot of people who want to write tests, and a lot of them do write them; there are a lot of tests in the wild, and all the companies are writing tests. So that's not the problem, except that it's held back by the other difficulties we have, which we'll go over in a moment.

The challenge with latency, I think, is that it is not safe to do pre-merge CI, except for general functionality, because anybody can post to the mailing list, and you don't want to run just anybody's code on your hardware directly: you don't want them to start mining Bitcoin, and you don't want them to wreck your hardware. This situation keeps people from running more on real hardware in pre-merge CI, even if pre-merge CI were more widespread. The slow human reviews contribute too, because somebody has to go and take a look at the results when they wake up the next day, and then they send the message, and it all takes a while.

A big problem for reliability, I think, is the fact that the tests are often out of sync with the kernel: the kernel changes, the tests change, issues get into the kernel, tests start failing, and they keep failing while somebody is still fixing the kernel. And the maintainers don't want to hear about it: if they've seen that failure already, they don't want to be told that their change failed testing when it's not their failure. The same goes for developers: nobody wants to waste their time investigating a problem they have nothing to do with.

The only real problem with accessibility, I think, is that there are just so many different things, different reports, different dashboards, that it's difficult to figure out, and that's something I think we can improve.

These are just the basic challenges, and they come together to give us more problems. Low reliability and accessibility lead, of course, to low developer trust in the results. If a developer knows that the test results often
fail because of some other issue they have nothing to do with, and if those results are hard to understand because they don't have access to the hardware and the report doesn't include some information, or includes too much, then their trust in and interest for these results drop. As a result of that, nobody wants to use those test results for gating, right? If you spend too much time investigating and then it turns out it's not your failure, you're not going to look at those results to decide whether you can merge something or not. And as a further result, the test developers don't get feedback on the results, the tests don't improve, and the code doesn't improve either, because people don't look at the test results. The whole improvement loop, the whole feedback loop, is starving because of this problem.

High latency also contributes to the lack of gating, because if you have to wait a week or two for the result, that kind of sucks. As a result, changes go in without consulting the test results, bugs get in, they stay longer in the code base, tests keep failing, and that in turn contributes to more latency, because you have to review more failing tests and more issues. Each time a test fails, somebody has to say: OK, this test is failing because of that known issue. The more issues there are, the harder it is to triage and the slower it gets, so more bugs get into the kernel tree, and that brings high latency again. It's a vicious loop. This is a little diagram of how I think these things affect each other; it's a little complicated, but I used it to record what I think about this whole situation, and we'll take a closer look at it later. If I had to summarize, I think this is the main thing we can take away from it.

But enough of that. This is my dog. That's enough gloom, and I think we can go on to actually trying to think about what we can do about this, or what we can't do. The difference with the kernel community, or any open source community actually, is that it's not owned by a single company, so you cannot really force people to do CI. In a company you can get your tests just good enough and then say: OK, now we do gating, and we fix the tests afterwards. You start up the feedback loop, and after a bit of stalling and fighting, it starts up. You cannot do that here: you have to get the tests working nearly perfectly, to keep developer trust high and keep the thing going, because without developer trust it's not going to work.

So what can we actually do? We'll go over each metric; some points are more important, some are just for notice. For coverage, companies obviously have the most hardware we could use for testing, so if we attract more companies into testing, into sending results, things will get better: we'll have more tests and more coverage, and hopefully that's going to improve things. If you're a company, you have your own CI system, and you want to contribute to kernel testing, you can talk to me, or just send a message to the mailing list, and we will set you up, give you credentials, and you can send your test results to KCIDB and help us get those results to developers. If you have hardware that you want to contribute, again, talk to us and we can set up a LAVA lab, for example, connect it to KernelCI, and build and submit tests on your hardware, and the developers will get those
results. In any case, write to the mailing list.

Latency: of course it would be great to get pre-merge testing, to limit the time bugs stay in, or just eliminate that as much as possible, and to shorten the feedback loop. Many people have been trying to do these things, and there are many approaches to pre-merge testing. Some people use Patchwork-based systems to pick up patches from the mailing list, test them, and then submit the results back to Patchwork, which has facilities for that, and that works. The only thing, again, is authentication: the patches sent there can be from anybody, so you have to be careful how you run them.

Another potential avenue, I think, is that there are about 50 repositories in the MAINTAINERS file that are either on GitHub or on a GitLab instance, which actually allows us to set up authentication for those patches and to connect a CI system. So one idea that KernelCI is exploring is, for example, offering a GitHub Action that submits your patch to KernelCI and then gives you a check mark or a red cross on your merge request. The main benefit of that is that you can actually get access to real hardware for testing in your pre-merge workflow, in your contribution workflow. If that works, we can then talk to more maintainers and encourage them to use that Git forge CI. I know this is controversial and has been discussed to death in the community, but this is one thing that some people may be open to: as we can see, a few trees are actually doing the merge request or pull request workflow, and in that case the CI integration can actually be a selling point.

To get that started, of course, the CI systems need to talk to maintainers, one step at a time. The first thing, which many systems already do, is offering to test a staging branch in the maintainer's repository, where the maintainers can push changes that they want to test. That gives us authentication and gets the tests running. It's not pre-merge, but it's better than nothing, and it can establish trust and prove that your tests are viable and stable and good enough to go to the next step. After this, you can pick a few stable tests to start with, and KCIDB can be a part of that, of course, by setting up subscriptions: KCIDB allows the user to subscribe to particular tests, particular branches, particular architectures, whatever combination you want to start with, to limit the set of data that you get notifications on and receive results for, down to just the tests you can start working with. That could be one starting point, whether you're a maintainer or a CI system developer who wants to sell your CI results.
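As an illustration of the kind of narrowing-down such a subscription does, here is a hypothetical sketch; the real KCIDB subscription mechanism is different, and the field names just mirror the submission example earlier. The idea is to start with a slice of the results small enough to trust and act on.

```python
# Hypothetical subscription filter: only notify on results that match
# a maintainer-chosen tree, architecture, test path, and status.
SUBSCRIPTION = {
    "tree_names": {"mainline"},           # only these trees/branches
    "architectures": {"aarch64"},         # only these architectures
    "test_path_prefix": "ltp.syscalls",   # only tests under this path
    "statuses": {"FAIL", "ERROR"},        # only results worth a notification
}

def wants_notification(checkout, build, test, sub=SUBSCRIPTION):
    """Decide whether one (checkout, build, test) result gets notified on."""
    return (checkout.get("tree_name") in sub["tree_names"]
            and build.get("architecture") in sub["architectures"]
            and test.get("path", "").startswith(sub["test_path_prefix"])
            and test.get("status") in sub["statuses"])

# e.g. wants_notification({"tree_name": "mainline"},
#                         {"architecture": "aarch64"},
#                         {"path": "ltp.syscalls.read01", "status": "FAIL"})
# -> True
```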
There's a thing happening across CI systems: there are manual reviews, of course, but there are just so many tests and so many failures that it's inefficient, so many systems are setting up triaging, looking at the test results and automatically determining whether a test result corresponds to a known issue. We are building a triaging system like that in KCIDB right now, but, for example, the Intel graphics CI already has a system where they record the various parameters of the tests, the conditions the test executed under, the actual files to look in for the issue, and regular expressions to match: for example, if a certain string appears in a test output or in a console log, then that failure is this known issue, and we don't raise that test failure again. The same thing is happening at Red Hat with CKI: there's a UI describing an issue and linking to a bug report. And the best system is at Google: syzbot is able to identify whether a kernel crash is the same as one they saw before, based on a string that they also extract automatically from the previous one, combine those crashes into a single issue so as not to alert the maintainers or developers about the same issue again, and at the same time provide samples of those crashes.

So we are building something like that in KCIDB, to let the CI systems that have this kind of data about known issues submit that information to KCIDB, so that results from other systems can benefit as well, and so that we can finally get something green out of there. At the moment, basically every revision that we get in KCIDB is red, because at least one build (usually tens of them) failed and multiple tests failed one way or another. As it is, it's basically impossible to send those results to maintainers, because there's always a failure. Even a single CI system has a hard time managing its output so that it's usable for somebody and not failing all the time; when you combine all of them, you get red all the time.
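Here is a hedged sketch of that kind of log-pattern triaging; the issue IDs, URLs, and patterns are made up for illustration, and real systems track much more context (test parameters, execution conditions, which files to search).

```python
import re

# Map known issues to regexes matched against test output or console logs.
# A match means "known issue": annotate or suppress instead of re-alerting.
KNOWN_ISSUES = [
    {
        "id": "example-123",
        "url": "https://bugs.example.org/123",
        "pattern": re.compile(r"BUG: KASAN: use-after-free in ext4_"),
    },
    {
        "id": "example-456",
        "url": "https://bugs.example.org/456",
        "pattern": re.compile(r"TFAIL\s+.*fallocate05"),
    },
]

def triage(log_text):
    """Return the known issue matching this failure log, or None."""
    for issue in KNOWN_ISSUES:
        if issue["pattern"].search(log_text):
            return issue
    return None

log = "console: BUG: KASAN: use-after-free in ext4_find_entry+0x3a/0x1c0"
hit = triage(log)
print(f"known issue {hit['id']}: {hit['url']}" if hit
      else "new failure, notify the maintainer")
```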
Reliability. This is going to be controversial as well, but I think a good step could be moving more of the tests into the kernel tree. That would help us keep both the kernel and the tests in sync, which is how it's normally done in CI, right? When you submit a change, you can fix the test at the same time, and then we have less desynchronization. I think LTP could be one of the test suites to start with. Of course we would need to integrate this into the kernel documentation, to tell people that this is the test suite you can run, and make it a bit more official. There is, of course, the problem that when you have the tests in a particular branch, for example in mainline, they get bound to the state of that mainline. We have the situation right now where LTP is executed against mainline, stable, and all the other branches, and LTP has to handle all of that: it has to know which branches work, where the issues are, how to work with this branch and with that branch. When you integrate it into the kernel tree, you no longer need all that, which simplifies the test code a lot; but then you can no longer test all the branches, and to handle that you would have to backport LTP to those branches as well. That's possible, and I think it actually makes things simpler in some ways. And of course, if you want that to actually work and have an effect, you have to exercise those tests more often, to help keep them in sync: the in-tree tests need to be prioritized, executed more often and faster, and closer to the actual change.

For accessibility, again, I think KCIDB can help: you can subscribe to the notifications, you can give us feedback, and you can help us make them better. There's a link to guides on how to start development. So this is the complete picture, I think, of how things could work out and how we could affect this whole situation. That's all, thank you. All right: any comments, questions, ideas?

We've got one question from the stream. Lucas asks: have you considered ignoring tests that are known to fail? They've done that at their company, and it really helped them start their CI up.

Of course. All the systems have a facility for that, some less developed, some more developed. That's what the known-issue detection does, for example, at CKI and Intel graphics CI and other systems, and that's what we are building in KCIDB. There are interesting challenges in that, and I could talk about them more, but I guess it would run a bit long for the remaining time.

Any more questions? There was a hand, yes.

I have one comment about your slide "move tests in-tree, alongside the code": I fully agree with that statement, and I'll explain why. I was the primary architect of a neural simulator a long time ago. We had about 20,000 tests in total, specified in a declarative format; YAML was one possibility, for instance. Those tests were always part of the code, and we converted the YAML specifications of the tests into HTML, which was then picked up by the documentation writers. This gave us a good guarantee that the tests were in sync with the code and that the documentation was documenting what actually works, and it also gave us a very nice structure for conversation between developers and documentation writers, including for tutorials, by the way. It's just a comment, something that popped up in my mind when I saw that slide. Thank you.

Thank you, yes. There are a lot of benefits to doing that, and the kernel of course has KUnit and kselftest in-tree already, but these are just a small set of tests that everybody wants to run; there are many more tests that could go in there.

Hello, hi. I'm the maintainer of the media subsystem, and we also have some demands there. One of the things I'm doing while at Intel is working on the test suites that are testing GPU drivers; we had a lot of documentation there, and I actually developed a new way to document things. So maybe we can sync up, and maybe we can help each other, both to increase the coverage of media drivers and to address that documentation issue, which really should be better documented than what we have these days.

Of course, thank you.

And I also have one question: right now, what do you use to run the tests? Is it just a database where you are collecting data from all the test systems, or do you also have your own testing environment where you run the tests directly?

Thank you for the question. The KernelCI project itself has a CI system, called KernelCI native, where we run LTP and other tests where there is interest, so there's a lot of that; and there is also the KCIDB system, where we put those results plus results from Intel, Microsoft, ARM, Google, and others. So there's both.

Yes, Tim? I really like the analysis of what the deficiencies are and kind of where the gaps are;
that was really good. One of the things I thought was very interesting: access to hardware is always a problem when you're a contributor, right, because you don't know if you're breaking someone else's hardware. So I always thought it would be really neat if there were a way, as a contributor, to request testing on specific hardware. But you have this chicken-and-egg problem, where people aren't going to make their boards available to third parties for random tests until they really trust that the ecosystem is not going to break their stuff. A lot of the boards you want to test on are proprietary, behind company firewalls, at least in my space, so we're not going to let some random code come in and test on our stuff. But it would be really nice, as a maintainer or as a contributor, if you could say: I want this to run on a bunch of different hardware, or maybe a bunch of different GPUs. Setting up some kind of ecosystem where that's accessible on demand would be really good. I think KernelCI has kind of the seeds of that, where someone can set up a LAVA lab and take job requests, but fleshing that out would be really nice.

Yes, absolutely, and I think the key thing for that is really the authentication of the request. You're not going to get access to NDA hardware, and the test results from NDA hardware are not going to go out in any case, but it's possible to set up a system where certain authenticated users can request this kind of thing: specify, say, I want this hardware, and I want this test to run there, for this change. That change can include a change to the test that you actually want to run, and this way you get to exercise the actual functionality you want to test on that hardware. Git forges like GitLab and GitHub provide facilities for that: you can put something in your commit message that affects which tests are actually executed, for example, and there are other ways, like having bots that read your comments and take commands from there. That's a little more involved, but with authentication and some sort of platform underneath, these things are possible.

I have a comment about reliability, and a recommended read: a blog post by Netflix on how they track performance regressions. Normally you have a threshold, and you're not allowed to go over the threshold, but a lot of the time you might have a fluke that temporarily goes above the threshold, so you increase the threshold, and then over time you suddenly have a regression. What Netflix does is look at multiple test results in the history, and then look at the derivative: are we going in the wrong direction? They can also determine whether something was just one fluke in a series of tests. It's a recommended read; it's called "Fixing Performance Regressions Before They Happen".

Yeah, that's a good idea, yes, and that's what I have in the plans for KCIDB. But I am the main contributor, I have one intern right now, and I don't have time to actually contribute much myself, because I'm busy with Red Hat work. I would love to see more people contributing to KCIDB, and I'll be working on adding more documentation for developers, so that it gets easier. Really, if you have an idea, come over; I have these plans and ideas for how to do it, and there are very interesting challenges that I can talk about, so there could be a lot of fun to be had.
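For the record, a rough sketch of the trend-based check described in that comment, assuming higher numbers are worse (latency, for example): instead of a fixed threshold, look at the slope over a window of recent runs, so a single fluke doesn't trip the alarm.

```python
def slope(history):
    """Least-squares slope of measurements over their run index."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else 0.0

def regressed(history, window=10, max_slope=0.0):
    """Flag a regression when recent runs trend in the wrong direction,
    rather than when a single run crosses a threshold."""
    recent = history[-window:]
    if len(recent) < 3:
        return False  # not enough data to tell a trend from a fluke
    return slope(recent) > max_slope

# One fluke at run 4 doesn't fire; a steady climb does.
print(regressed([100, 101, 99, 100, 130, 100, 101, 99, 100, 100]))  # False
print(regressed([100, 103, 106, 110, 113, 118, 121, 125, 128, 133]))  # True
```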
Any more questions, comments? OK, awesome. Thanks, everybody!