Hi, thanks for joining us at the Automotive Linux Summit 2021. We're here today to talk about how to build sustainable platforms, and in particular how we can drive wider adoption of testing, QA and CI throughout upstream open source projects, so we can really drive the adoption of open source and get what vendors actually distribute to users much, much closer to the projects themselves. My name is Daniel Stone; I'm the graphics lead at Collabora, covering projects such as Mesa, Wayland, Weston and the freedesktop.org ecosystem.

Hello everyone, my name is Guillaume Tucker, and I also work at Collabora. I've been leading the KernelCI project in general, and I've been working on it as part of Collabora for the past three or four years. I'm currently also chair of the advisory board for the KernelCI Linux Foundation project.

So today we're going to cover a few areas. In particular, we're going to start with the existing ecosystem of upstream open source projects and which projects are interesting to us; the challenges we've faced as we've driven the adoption of more rigorous testing, CI and QA procedures throughout them; and the results of those efforts: the testing frameworks we've been able to build out in each project, the results we've had from those, and what we've learned along the way in terms of things like process, adoption and socializing. Finally, we'll look at how we can build out from the somewhat siloed testing frameworks we have at the moment to something a little more coherent and shared between all the different projects.

The main challenge we have is the disconnect between two different models. In a traditional product development model, products are worked on in an almost waterfall fashion, fully tested at each step along the way. Everyone has a clear idea of what the goals, the metrics and the acceptance criteria are, and there are various gating processes along the way. Traditional open source projects, by comparison, have been run very much as a commons, without necessarily a shared vision or a shared set of priorities, or even the ability to break deadlocks and disagreements and enforce some kind of priorities. So testing of upstream work often fell into a gap: it wasn't really native to the projects themselves, and the users didn't necessarily see the need for upstream testing, because they already had all of their own testing on the downstream branches they shipped, and their own QA departments that would almost take over, with all of the cost falling on each vendor alone and none of the benefit being delivered to the other users of the upstream projects.

But we do believe it's possible to bridge this gap, so today we'll be talking about the work we've done throughout various projects to bring in this more native testing and CI and bake it into the process for all of these projects. The projects we've been working with include the Linux kernel itself; Mesa, which is the de facto standard for open source graphics drivers and acceleration; Wayland and Weston, the again de facto standard window and display system under Linux; and GStreamer for multimedia support as well.

So starting with the Linux kernel, we'll go through how the kernel development workflow typically works.
First of all, developers send patches via mailing lists; then maintainers apply patches, after some review, to their own branches in their own git trees. Some of them have a branch that they share to be tested in linux-next or with various CI systems, and eventually the maintainer branch gets merged into another maintainer branch until it reaches Linus's tree, which can take up to three months: there's a merge window roughly every three months for merging all the new changes.

Now, in the workflow I just explained, I didn't mention testing anywhere, so that depends on how each subsystem actually functions. Some subsystems do their testing directly when people submit changes; other subsystems rely on the maintainers to run their own tests, so they have their own manual workflows, or maybe some automated workflows, but none of that is really systemic. So you have a collection of test systems available that will test the mainline kernel and a variety of the available git branches. One of them is KernelCI, which has now become a project of the Linux Foundation, as it has been chosen as the main project for testing the upstream Linux kernel. It focuses on what we call post-merge testing, that is, after a patch has been applied to a branch: it monitors a number of git branches, and as soon as it detects a new revision it will build it, test it and send some reports. We can see that gradually more and more subsystems and maintainers are starting to engage with KernelCI and rely on the results it produces.

On this slide you can see a big-picture diagram of what test-driven kernel development currently looks like. You have a crowd of people first, the ecosystem: different types of developers, OEMs, and also maintainers. They all contribute to the kernel source code itself via git branches. They also contribute to some tests; LTP, kselftest and KUnit are the main examples of upstream-oriented test suites. Then KernelCI builds some kernels, builds the test suites, runs the tests against the kernels, and reports the results via email or a web dashboard to the developers, and that's how the loop is closed.
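To make that loop concrete, here is a deliberately minimal sketch in Python of the post-merge pattern just described: watch a branch, and when its head moves, build, test and report. This is not KernelCI's actual code; `build_and_test()` and `report()` are hypothetical stand-ins for a whole distributed system of build farms and device labs.

```python
import subprocess
import time

def remote_head(url: str, branch: str) -> str:
    """Return the commit hash at the tip of a remote branch."""
    out = subprocess.run(["git", "ls-remote", url, branch],
                         capture_output=True, text=True, check=True).stdout
    return out.split()[0]

def build_and_test(commit: str) -> dict:
    """Hypothetical stand-in for building kernels and running device labs."""
    raise NotImplementedError

def report(results: dict) -> None:
    """Hypothetical stand-in for email reports and the web dashboard."""
    raise NotImplementedError

def monitor(url: str, branch: str, poll_seconds: int = 600) -> None:
    seen = None
    while True:
        head = remote_head(url, branch)
        if head != seen:  # a new revision has landed on the branch
            seen = head
            report(build_and_test(head))
        time.sleep(poll_seconds)
```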
To understand a bit better what developers actually need in order for KernelCI to be more widely used, we ran a survey in 2020 that we called the community survey; there's a blog post on the kernelci.org website if you want to read the whole report, but here is a small summary of the main takeaways. First, ideally we would need to test patches before they get applied, pre-merge if you want to call it that, because then you get a really short feedback loop: when someone sends a patch to a mailing list, you can reply really quickly about whether it breaks anything or not. Based on the survey results, that's really important. Second, for things that run post-merge, what makes a lot more sense is to run really long tests that maintainers don't have the time to run, or things that are difficult to run by hand; on stable kernels especially, where there's normally about one release per week, it would be fine to have tests that take 24 hours, for example. The third thing is improving the web dashboard. We currently have one dashboard on kernelci.org; it's been there for several years and it shows the results, but there are many things that could be done to really improve it so that more users would use it. We're collecting user stories, feedback, ideas and suggestions from anyone about what their ideal web dashboard would be, and we're starting to derive a set of requirements from that to really design a better dashboard. This is driven by the Linux Foundation KernelCI project, and we're hoping to see some concrete results in 2022.

So KernelCI builds some kernels and then runs some tests, mostly functional tests. Initially it was doing only boot testing, to verify whether the platform would boot at all; now we've started running more and more functional tests: things like IGT to test DRM/KMS, and now some GPUs as well, and classic test suites like the Linux Test Project (LTP), kselftest and KUnit, which come with the kernel source tree itself. We're running about 15% of what LTP and kselftest provide; we're not really running KUnit yet, but that's coming soon, and we've been working with the KUnit maintainers to get it enabled in KernelCI. These are what we call the native tests: they are all orchestrated by KernelCI itself.

In addition to these tests, we can look at KernelCI from a functionality point of view: what does it do? First, it monitors a number of trees; about 100 git branches are monitored, from individual maintainers of subsystems and architectures, then the bigger integration branches, mainline and linux-next, all the stable and long-term stable branches, as well as branches for member companies of the project like CIP, and we're starting to get Chrome OS kernels as well. One really interesting feature of KernelCI is the ability to track regressions: when a test has been passing in previous revisions of a kernel and one day an individual test case starts failing, that failure is detected as a regression, and typically a bisection will be started for it automatically, which tries to find the commit between the last good revision and the first bad revision, to understand which commit actually caused the problem.
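As a rough illustration of that automatic bisection, here's a minimal Python driver around `git bisect`. KernelCI's real implementation builds and boots a kernel on actual hardware at every step; `build_and_run_test()` below is a hypothetical stub standing in for all of that.

```python
import subprocess

def build_and_run_test(test_case: str) -> bool:
    """Hypothetical stub: build the kernel at the current checkout, boot it
    on a device, run one test case, and return True if it passed."""
    raise NotImplementedError

def bisect_regression(repo: str, last_good: str, first_bad: str,
                      test_case: str) -> str:
    """Walk `git bisect` from a known-good to a known-bad revision and
    return the hash of the first bad commit."""
    subprocess.run(["git", "-C", repo, "bisect", "start", first_bad, last_good],
                   check=True)
    while True:  # git bisect converges, halving the range each round
        verdict = "good" if build_and_run_test(test_case) else "bad"
        out = subprocess.run(["git", "-C", repo, "bisect", verdict],
                             capture_output=True, text=True, check=True).stdout
        if "is the first bad commit" in out:
            subprocess.run(["git", "-C", repo, "bisect", "reset"], check=True)
            return out.split()[0]  # the offending commit hash
```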
This is particularly useful on linux-next, where you have a lot of changes from one day to the next, and thanks to this we're finding a lot of issues and can report them directly to the developers: if you know the author of the commit, you can send the message to the author and to the maintainers related to the code that was changed by the patch itself.

Another big aspect of KernelCI, which is a bit more recent, is KCIDB. This is simply a database that's meant to collect results from any CI system that's running kernel tests. The native tests I explained on the previous slide are collected there, but we're also collecting results from other test systems, and if you have your own test system, anybody can submit test results there. The idea is to reduce duplication, and in principle a new web dashboard will show this information, which is a superset of what you see right now on kernelci.org. In opposition to the native tests, you have the non-KernelCI tests, things that are run outside of KernelCI and not orchestrated directly by it: the 0-day bot, the syzkaller-based fuzzing bot syzbot, Red Hat's CKI, and several tools from Linaro as well: Linux Kernel Functional Testing (LKFT) and TuxSuite, which is more of a service anybody can subscribe to for building kernels, and which is also starting to support running tests. The results of all of these, and a few more from Arm and Gentoo Kernel CI, are currently being contributed to KCIDB. Actually it's not all of syzkaller, because that's a huge data set, but some of the syzkaller results are being contributed, and that's growing. There are weekly reports on the KernelCI mailing list where you can see the status of all the different contributors.
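To give a flavour of what "any CI system can submit results" means in practice, here's a hedged sketch of a minimal KCIDB-style report. The schema is abbreviated and reconstructed from memory, so check the KCIDB documentation for the authoritative version; the `myci` origin and all IDs here are invented.

```python
import json

# Every object carries the submitter's "origin" and an origin-prefixed ID,
# so results from many CI systems can coexist in one database.
report = {
    "version": {"major": 4, "minor": 0},
    "checkouts": [{
        "id": "myci:checkout-1",
        "origin": "myci",
        "git_commit_hash": "deadbeef" * 5,  # placeholder 40-char hash
    }],
    "builds": [{
        "id": "myci:build-1",
        "origin": "myci",
        "checkout_id": "myci:checkout-1",
        "architecture": "arm64",
        "valid": True,
    }],
    "tests": [{
        "id": "myci:test-1",
        "origin": "myci",
        "build_id": "myci:build-1",
        "path": "ltp.syscalls.openat01",  # dotted test-case path
        "status": "PASS",
    }],
}

with open("report.json", "w") as f:
    json.dump(report, f, indent=2)
# The JSON would then be sent with the kcidb client tooling.
```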
Mesa is kind of an interesting contrast to this, I think, because it's the de facto standard for open source GPU drivers on Linux: we're talking OpenGL, OpenGL ES, Vulkan, everything you need for games, accelerated desktops, you name it. It's much more limited in scope than the kernel: among the drivers we have, we cover eight different hardware vendors, obviously all with their own bigger or smaller generational bumps, and then we also have layered and virtualized drivers and our software reference driver. Mesa has a much smaller development community than the kernel, and these teams are often not directly supported by the hardware vendor. It runs the entire spectrum: from the hardware vendor having teams of people working directly on Mesa and treating their drivers as a first-class output, through a bunch in the middle where the vendor assists and supports a development team external to the hardware vendor, right the way through to completely reverse-engineered efforts where the vendor has no involvement at all.

So one challenge we've had is in bringing Mesa up from this kind of scrappy underdog, where you're happy that it works at all, to where we are now: we've gone from one driver that had been conformant for the past few years to several drivers having gone through the official Khronos conformance testing. That's something we've really needed to back up with some extensive testing, to make sure that we stay conformant. It's a really hard-won battle: we don't want to be slipping back, but we also don't want to lose the development velocity we've been able to have within Mesa. And this can be quite difficult, because as Mesa is relatively understaffed as a project compared to something like the kernel, the development community can tend to naturally silo a little bit. Even though people do work on the core of Mesa itself, often their first target is a particular driver, and so their attention might be taken away by new hardware support or particular feature enablement or anything else, which makes it difficult to have a shared global overview of Mesa as a whole rather than just your own driver world.

Luckily, one thing that's been a real gift is that Khronos has, over the past few years, made its conformance test suites publicly available. It's no longer just Khronos members who can run the official OpenGL and Vulkan conformance test suites: they're available to the whole public, and we're able to run and distribute them now, which has been a godsend. Having that large amount of API coverage from the official conformance testing is great, and running those tests is essentially the Khronos conformance process, so you know where your driver stands if you're able to run them. In addition to that, we also have other test suites such as Piglit, which is built from the reverse direction: the conformance test suites were built out by the API designers in parallel with the APIs being designed, whereas Piglit has just been incrementally built out by Mesa developers who find a bug, realize that it could be particularly common or crippling, and then put in a Piglit test to make sure that it doesn't regress. And it's possible to do this not just with actual hardware GPU drivers, but also with the software reference driver we have, which has no hardware dependencies and will just run on any CPU with an LLVM backend. That is a really nice extra arrow in our quiver, I suppose: being able to test the core of Mesa without needing dedicated hardware.

The testing that we do have in Mesa covers several generations of AMD GPUs, the Arm Mali GPUs, Broadcom's VideoCore in the Raspberry Pi, all of the Intel GPUs, the Qualcomm Adreno that comes in their Snapdragon SoCs, and also the VeriSilicon, or Vivante, GPUs, which tend to come in processors like NXP's. All of these have achieved official Khronos conformance, at least for some hardware generations or some API versions, so again we're very keen to make sure we keep that and don't regress backwards, and we do quite extensive testing of those.

The interesting contrast to the kernel, I think, is that we have a slightly more traditional (for open source, I suppose) pre-merge testing process, which is blocking: when you submit a merge request and it's been reviewed and it's good to go, you assign it to a very cold and unfeeling bot, which goes and runs a ton of tests and merges if they all succeed, or tells you that something went wrong if any of them failed. In order to support that process without everything collapsing, we want every merge pipeline to turn around in 15, or perhaps 20, minutes in extremis, but that has to cover the whole run: some generations of GPU will run over 300,000 individual tests for every MR between all the different test suites. In order to do that, we had to build out a custom test runner framework.
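For a sense of how a runner framework might keep a run of that size inside a 15-minute budget, here's a toy Python sketch of deterministic test sharding across a farm of identical devices. This isn't Mesa's actual runner; the function names are invented for illustration.

```python
import hashlib

def shard_for(test_name: str, num_shards: int) -> int:
    """Stable hash, so every pipeline sends the same test to the same
    shard; that keeps runtimes predictable and failures reproducible."""
    digest = hashlib.sha1(test_name.encode()).hexdigest()
    return int(digest, 16) % num_shards

def tests_for_shard(all_tests: list[str], shard: int,
                    num_shards: int) -> list[str]:
    """The subset of the full test list that one device should run."""
    return [t for t in all_tests if shard_for(t, num_shards) == shard]

# e.g. device 3 of a 20-board farm runs roughly 1/20th of the suite:
# tests_for_shard(full_suite, shard=3, num_shards=20)
```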
But it's not just conformance tests that we run: we also have traces from real-life workloads, captures from games or desktop clients, where we take the GL and Vulkan command streams they actually emit and replay them, making sure that the output isn't changing, or at least not changing in a way that's seen as bad. Because neither OpenGL nor Vulkan is pixel-precise, you might have minor differences here and there, so we have some tools which allow us to visualize the differences and see whether a change is acceptable. All of this has to happen within that relatively short time window, and it's something we have to get right, essentially, because rather than being a more advisory post-merge thing, where code gets pushed out into the wild and later on you get an email telling you it broke something, if a test fails then your merge request won't get merged. It turns out that having flaky tests which block people's MRs is a pretty good way to get developers to tell you what they really feel about your test fleet.
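Since the APIs aren't pixel-precise, trace replay can't compare output byte-for-byte. Here's a toy illustration of fuzzy image comparison with a per-channel tolerance; it is not the actual Mesa CI comparison code, just the basic idea.

```python
def images_match(ref: bytes, out: bytes, channel_tolerance: int = 2,
                 max_bad_pixels: int = 0) -> bool:
    """Compare two same-size RGBA8 buffers, allowing each channel to
    differ by a small tolerance before a pixel counts as 'bad'."""
    assert len(ref) == len(out), "images must have identical dimensions"
    bad = 0
    for i in range(0, len(ref), 4):  # 4 bytes per pixel: R, G, B, A
        if any(abs(ref[i + c] - out[i + c]) > channel_tolerance
               for c in range(4)):
            bad += 1
            if bad > max_bad_pixels:
                return False
    return True
```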
If we look at other projects: Wayland and Weston were very early adopters of CI and of having GitLab on freedesktop.org, but for the longest time they didn't get far beyond build testing. That's because one of the challenges we have in Wayland is the lack of an official, universal conformance test suite: we have all of the tests on the server side, inside Weston, that we've written for ourselves as we've developed it, but we don't have a target like we do with the Khronos APIs to work towards and give us a yes or no answer. Even so, we actually test Weston by starting up a new virtual machine with a clean, known kernel and a virtual KMS driver, which just simulates a display controller, and that gets us a lot of what we want, because we're able to exercise a lot of different paths within our backends and make sure they run about as well as they can when you're working with a virtual driver rather than real hardware. So that's where we are now for Wayland and Weston: backend testing, testing things like rendering correctness and internal consistency, but these are really home-built tests.

GStreamer, on the other hand, for multimedia, has a very well-established set of tests, which have almost always been there, both for its individual modules and for end-to-end functional and integration testing. GstValidate is a suite which checks the modules and makes sure they behave according to the GStreamer API contract, so in isolation they look like they do what they should; and then Cerbero is a monster integration suite which does real end-to-end testing, put through various workloads they've captured as they've gone along, and wherever they've found bugs, those have been added to the test suite to make sure that all of those corner cases keep working. That's been a part of upstream GStreamer for a good couple of years now, again concurrent with the move to GitLab on freedesktop.org. But this is all happening as software-based testing: it runs in containers and virtual machines on general-purpose hosts, and we're not yet able to test how GStreamer behaves with different hardware drivers, say for Video4Linux, or for sound, or any of those other inflection points.
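As a tiny illustration of what an automated end-to-end GStreamer check can look like, here's a sketch using the standard Python bindings: run a short synthetic pipeline to completion and assert it reaches end-of-stream without posting an error. This is just the bare pattern, not GstValidate itself.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# A short synthetic pipeline: 100 test-pattern buffers through a
# colorspace conversion into a sink that discards them.
pipeline = Gst.parse_launch(
    "videotestsrc num-buffers=100 ! videoconvert ! fakesink")
pipeline.set_state(Gst.State.PLAYING)

# Wait (up to 10 seconds) for either end-of-stream or an error on the bus.
bus = pipeline.get_bus()
msg = bus.timed_pop_filtered(
    10 * Gst.SECOND, Gst.MessageType.EOS | Gst.MessageType.ERROR)
assert msg is not None and msg.type == Gst.MessageType.EOS, "pipeline failed"

pipeline.set_state(Gst.State.NULL)
```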
Now that we've looked at how the kernel is being tested upstream, and how graphics, Wayland and GStreamer are being tested upstream, we can start thinking about a general concept: what does it take to move an upstream project to being really test-driven? As we said at the beginning, a commercial, fully integrated product is tested very thoroughly, and the team has control over its own universe, so there's no real buy-in problem in terms of people adopting the testing: it's one team working on it, so they'll adopt the workflow and the automated testing because that's just the way they do things, and they have complete freedom over how they do it.

For an open source project, you have contributors from many different backgrounds, so they might all have different views about how to work. Also, in a large project like the Linux kernel, different parts of the kernel will need different types of workflows: some parts change very quickly, other parts need to be really stable. And in the same way that open source code is built from the ground up, people contribute code and that's how it happens; nobody plans what's really going to happen in the kernel over the next few years, or even the next few months. Changes are not imposed; of course things are being designed, but nobody has a master plan deciding exactly how things are going to unfold. It's the same thing for testing: people send code and requests for comments on the mailing list about code changes, and for testing, people can provide tools and suggest ways of doing things, and as people see that there's value in it, and that it's something they can adopt, gradually it will be adopted.

You can see this happening already, as we've explained, with KernelCI: some maintainers are starting to engage and look at the results, and if KernelCI doesn't work, or doesn't provide the results they're looking for, they still have their own manual tests they can carry out. For stable kernels, typically, KernelCI sends results for each stable kernel, so if the results are there and some problems were detected by KernelCI, something will get done about it and people will try to fix them; but if for some reason KernelCI disappeared, the stable kernel would still be released, just maybe with some bugs that won't be known and will be found later. That's currently how things work. Maybe with time it's a bit like a clutch mechanism: after a while, if we all spin at the same speed, we can really engage, and then the test system will be working hand in hand with the kernel. This has already happened with Mesa CI, as Daniel just explained, and of course sometimes you get a few sparks in clutch systems, so it's not always easy to get it completely right without any smoke coming out; but it's really a price worth paying, because now, when a change comes in, it has to pass the tests, and that's really where you want to be.

So, as I've just explained, there are some tools available: KernelCI is one of them for the kernel, but also the 0-day bot will be sending you emails, and syzbot will be doing the same, fuzzing syscalls in the kernel to try to find corner cases that nobody has found before. These are available, and of course they send results by email, and if people don't like the emails, which happens sometimes ("this report is not useful"), people will reply and then things get adjusted. For Mesa CI, Wayland and GStreamer, it's maybe a little bit easier to have things enforced; it's a bit like a subsystem in the kernel: if you have a small enough subsystem, it can operate with its own autonomy and decide to accept a test-driven workflow. So that's the kind of step-by-step process we have to go through. Yes, Daniel?
I think that's a really good parallel. I mean, I always thought the problem with the kernel wasn't that it had no master plan, but that it had hundreds of them at any given time, right? Whereas, yeah, I think having that smaller scope makes it much easier for us.

So now we can look at some numbers, to get an idea of the dimension of what's being done on the kernel side, at least from KernelCI's point of view. I haven't put up stats about bisections, but every week there are one, two or three bisections that lead to actual bug fixes, and that's growing; that's a metric we'll be producing some stats about at some point. You can already see the number of tests being run: on linux-next, which has the biggest coverage, there are about 12,000 individual test cases run every day, because that's for every revision of linux-next, and that is growing as we keep adding new tests and new platforms, and we have new test labs joining, which quite quickly increases the test coverage. On the second graph you can see the KCIDB number of builds: this is all the builds from the native KernelCI builds, but in this graph you'll also see builds submitted by LKFT and all the other submitters to KCIDB, so TuxSuite, CKI from Red Hat, and also some builds from Arm. So this is gradually getting to the point where you can see the actual number of people testing the upstream Linux kernel. This is for all the revisions, so we get about 20,000 builds, and for each build you have maybe a thousand or two thousand tests. We don't have all the tests in KCIDB yet, but that gives an idea of the size, and of course, as we start seeing the results put together, maybe some duplication will be removed after a while: if we keep building the same kernels, maybe we can reuse each other's kernels. We'll see how that works out over time.

And similarly for Mesa, you can see a nice shiny graph there, with, I think, a pretty interesting pattern in the number of tests versus the number of commits, or the number of merge requests, made to Mesa. It's definitely an iterative story, essentially, of building out test coverage as wide as we can and seeing what that gives us. It turns out sometimes that's a bit overkill, so one of the traditional patterns is that we'd introduce the first iteration of testing for a particular hardware generation, and we'd run a lot of those tests just to shake it out, get it completely stable and find out where all the issues are, before we stepped it back. For example, if you submit a merge request which only modifies the AMD driver, then it's not going to run any of the tests on Panfrost for the Arm Mali, or the Freedreno driver for Qualcomm, because we know there's going to be no impact; whereas if you submit a merge request which touches core code, you can see up to 155 jobs per merge request, touching the full extent of everything we can test. And these graphs are not entirely correlated, because they're somewhat independent: we do run manual tests outside of the merge request context. Sometimes you have core changes which are much more difficult and finicky, so they require a fair few passes through the automated testing before they can be merged, just because no one can have 30 different generations of GPU available to them on their desk.
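As a rough sketch of that driver-aware job selection, here's what mapping the files touched by a merge request onto hardware jobs could look like. Mesa's real rules live in its GitLab CI configuration; the job names and glob patterns below are simplified for illustration.

```python
from fnmatch import fnmatch

# Invented, simplified path rules: which device-farm jobs a change warrants.
JOB_RULES = {
    "radeonsi-tests": ["src/gallium/drivers/radeonsi/*", "src/amd/*"],
    "panfrost-tests": ["src/panfrost/*", "src/gallium/drivers/panfrost/*"],
    "freedreno-tests": ["src/freedreno/*"],
}
CORE_PATHS = ["src/compiler/*", "src/mesa/*", "meson.build"]

def jobs_for_change(changed_files: list[str]) -> set[str]:
    """Return the set of hardware jobs worth running for this change."""
    # Core changes can affect every driver: run the full matrix.
    if any(fnmatch(f, pat) for f in changed_files for pat in CORE_PATHS):
        return set(JOB_RULES)
    return {job for job, pats in JOB_RULES.items()
            if any(fnmatch(f, p) for f in changed_files for p in pats)}

# e.g. jobs_for_change(["src/amd/common/ac_nir.c"]) -> {"radeonsi-tests"}
```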
You can also see one particular spike, which was one of the least fun months of my life, when we had a lot of infrastructure issues on freedesktop.org, not really related to the test system itself but more to things like networking, where the tests were so unreliable that we just had to keep running them over and over until they eventually passed. That one was interesting in that it definitely taught us some lessons: make sure you have not only really good monitoring for false positives, but something that's quite dynamic and really easy to modify, so you can pick those up and not push the burden back onto developers to deal with themselves. One of the things we found out is that beyond a certain point of unreliability, some developers will just smash retry every single time a test fails, even if it's failing because their code doesn't compile. So, coming back to Guillaume's point about being iterative and building confidence, which KernelCI certainly does, there's a lot of shadow testing in the background that you don't see. One thing we picked up from Mesa as well is that it's really important to deliver the results as accurately as possible, so people not only get confidence in the system but are also able to buy into it a bit more: testing becomes something that the whole community cares about, rather than having developers on one side and people who do test stuff on the other, which is never a particularly nice dynamic to be in.
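On the unreliability front, one common mitigation, sketched below in toy form with an invented `run_test` callable, is to rerun failures a few times and route intermittent results to monitoring rather than letting them block a merge request; real systems track far more context per flake than this.

```python
from typing import Callable

def classify(run_test: Callable[[], bool], retries: int = 3) -> str:
    """Distinguish a hard failure from a flake by rerunning on failure."""
    if run_test():
        return "pass"
    for _ in range(retries):
        if run_test():
            return "flaky"  # intermittent: report to monitoring, don't block
    return "fail"           # consistent: block the merge request
```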
So now we've talked about what we've been doing in our individual projects, a lot of the challenges we've had there, how we've resolved them, and the ways that we have brought testing to those projects. One really big thing coming up for us is bringing those all together, having them be more integrated and less siloed. For example, for all of the Mesa testing that we do on hardware, we pin a really specific kernel version; it's the only way to make Mesa testing tractable, because otherwise we'd just be subject to a million things outside of our control. But it would still be really useful for kernel maintainers to know that their changes broke actual running userspace workloads, like Mesa, or like GStreamer for media decode, and to have that fast, integrated feedback. And this is true everywhere: conversely, Weston pins a version of Mesa, and KernelCI pins known versions of userspace components, just to keep the whole stack tractable. So one thing we've been looking at and working on is being able to share our workload definitions and the ways we parameterize them, so we can bring in more integrated testing and at least work on fragments of the tests. For example, we couldn't run the entire conformance test suites for every GPU for every kernel revision, because there are just too many of them; but what we can do is run a smaller, more targeted subset and at least have a bit of confidence that things are roughly working as they should with new kernel versions. That's something that helps us all: it lets people know about regressions sooner, and it makes sure those regressions don't hit actual released kernels or Mesa versions or anything, but are discovered before they can be found by users, which just cuts out that manual feedback loop.

One thing we've found with Mesa, and even in Weston with the more limited testing there, is that having that testing lets us move a lot more fearlessly. We can be much less cautious about making sure we don't break things; there's a lot less manual testing taking up developers' time, and you can do your code review and be sure that something else is going to pick up the more visible aspects of correctness. It would be really great if the kernel were able to move quicker without having to worry too much about Mesa, and if Mesa and Wayland didn't have to be terrified of upgrading the kernel because who knows what might break. So that's something we're looking forward to: much more integration as time passes.

And even if you're not participating in upstream open source projects, this is still meaningful to you, because, as we were saying at the beginning, upstream QA and CI has only really been meaningful to the upstream projects themselves: there's often been such a distance between those projects and the vendors, in terms of the time to deploy new versions and the downstream customizations which get made, that those changes often never find their way back upstream, because it's been so long and the code base has moved on anyway. And this is a real problem when we think about not just things like Spectre and Meltdown but the entire security landscape: with such complex and rapidly moving software, you have to keep on top of it to ship something secure to people. So this is a huge benefit: the amount of testing we do in the upstream projects these days gives so much more assurance that vendors can pull in much newer versions of upstream software, much more quickly than they have in the past. But this is obviously only possible if you have a much smaller delta of changes against the existing upstream trees. So for the first time there's a real incentive for vendors to work with upstream, both in terms of the changes they make and in terms of helping us with the QA, the CI and the testing that we do, because everything done upstream means you can ship better software to your customers faster, with more and more assurance that what you're shipping is secure, validated and solid, and will do what you need it to do.

So that's our quick summary of the landscape. We'll have more details available in a blog post on collabora.com, with a lot more on how these upstream projects work and how to get involved and help out, especially on the testing side. But the thing to take away is that we're not talking about the old days, where open source was a kind of wild-west fun project and it was the vendors who brought some kind of rigour to these projects; it's now something that, over time, the open source projects have been able to embrace as part of their own projects and methodologies, and that's only going to increase. All of these efforts are open to contributors: KernelCI is a Linux Foundation project; Mesa CI is primarily worked on by Collabora, Google, Igalia, Red Hat and Valve, all companies who are simply interested in the long-term health of Mesa; and similarly for Weston and GStreamer, the testing
infrastructure and development is shared between the entire community. The more we do this, the more you can benefit from what we do upstream. So please come get involved; don't be afraid, help us out, and we can ship better software to you, so you can ship better software to your customers. And with that, thank you very much. We'll be here to take any questions you might have, and if you're interested in getting in touch, our email addresses are available on the title slide. Thank you.