Okay, well, thanks everyone for coming. I was not expecting this many people, but this is definitely a topic I'm pretty passionate about: improving the quality of Linux and getting more people to use upstream rather than keeping their own forks of older versions, or whatever. I don't like that approach, and that's why I'm so passionate about this topic.

So, I'm Martin Peres. I work at Intel's open-source graphics center in Finland, and I've been working on the Intel Graphics CI almost since its inception, so that's about two and a half years ago. As the title says, we're going to see how we can do validation the Linux way, and first we need to know what the Linux way actually is. So we'll first go through the unique development model of Linux, then how regressions get in and how to prevent that. We'll use Intel Graphics CI as a case study, because it really is different from the other solutions, and then a quick conclusion.

So, some numbers here. Not everything is important, but what you can see is that there are a lot of drivers. A lot of drivers, a lot of developers, and one single tree. The crazy thing is that there's a new version coming out basically every two to three months, with 14,000 commits in it — an average of eight commits per hour. The traditional way of doing validation is definitely not going to scale here.
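(A quick sanity check on those numbers, assuming a roughly ten-week release cycle: 14,000 commits ÷ (70 days × 24 hours) ≈ 8.3 commits per hour — so "eight commits per hour", around the clock, is no exaggeration.)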
So what do we have? There's no architect, which is one of the features of Linux — it's the bazaar type of development rather than the cathedral — and it has been a very successful model. So yes, there are no architects, but there are rules. If you introduce a user-visible regression, the code has to be reverted. That's the first thing. The second one — and this is especially true for the DRM subsystem — is that if you make a kernel feature without a matching user space that actually uses the feature, it's not going to be accepted.

I think this is the reason why Linux went from being a niche operating system — I mean, come on, it just started with a student making something and announcing it on a mailing list somewhere: "hey, look, I made something" — to what it is today. Someone took it and started adding stuff to it without regressing the feature set, which made it interesting to more people, and as contributions came in, the scope and the user base kept increasing. I think this is why it has been so successful: strictly improving with every version.

In practice, however, regressions happen — I'm sure you've all had this — and this is why a lot of companies just freeze the kernel and say: no, we don't update it, updating is too scary, we validated this one through our processes and that took months, and that's it. That's why your phone is running terribly old kernels, and I think this dilutes Linux development. If everyone could be working on the latest version, that would be beneficial, and that's why I would like to fix it.

So how do we prevent regressions? As I was hinting before, Linux is a validation nightmare: one code base with a lot of code sharing between drivers, and a release cadence that is just insane — two to three months. Developers usually only test on the particular machine they have at hand; if they have two, it's like, yay, got two! But come on, that's not representative of all the systems that are running Linux — and what doesn't run Linux, anyway?

On top of this, we have very few unit tests inside the kernel, which means that even the parts that are hardware-independent are not properly tested either. It's a big problem. And we don't have that many selftests — the tests that live inside the Linux kernel tree. I didn't know exactly how to count them; I found something like 600, so I rounded it up to 1,000, because at this scale it just shows: one test per driver, yay, quality.

So why can't traditional QA work? Well, there are too many hardware and software configurations; there's just no way to know them all. Even when you write code for Linux, you don't know how it's going to be used by other people. That's the beauty of Linux, but it's also what makes it difficult — and difficult to explain to marketing, like why they should be investing more money in Linux development and upstream. It's always the same problem: we don't know our users. That's why, as a user, you need to make sure that there are tests that actually capture what you're doing, and that they can be executed automatically, so you can more easily test for regressions. But we're not there yet.

Then there's the two-to-three-month release cadence, while a lot of validation cycles take weeks, even months. How can you test everything? By the time you've actually validated something, it's out of date, so what's the bloody point? That's why companies usually focus on stable trees. And what do we get then? Linux is basically tested by users — that's what it is. But actually few users run the latest version, because they don't want to break their systems; they need to rely on their machines, they're not running Linux just for fun. And this is why we do get regressions.

So why do we need CI? I think the big value of Linux is that the cost of integration has been put on the person making the change: if you change an interface and another driver is using it, you need to make sure that driver is not going to be broken. If we apply the same logic to testing, we can provide a testing system that tells you "hey, you are regressing" — or not. This way, if something is wrong, it's caught pre-merge, instead of you having to go and file a bug later on. That means less time spent on bug fixing, fewer bugs, the tree keeps working, and it scales well with the number of people — as long as your testing system scales, which is one of the challenges: if you have 10 people pushing patches to the CI system, that's one thing; if you've got more than 1,000, it's going to be another, so of course you need some decentralization there.

I'm also saying that the test system needs to execute tests that are public. Otherwise, let's say that as a developer I make a patch to some subsystem, and the test system tells me "hey, there's a regression here" — okay, cool, but how do I know what the test is doing, and what do I do to fix it? That's why we need open-source test suites.

Also, in the Linux development model, not everyone is working in the same tree: everyone has their own fork or branches, and the patches flow towards Linus Torvalds. That means that in your integration tree you sometimes need to do back-merges: you take the latest version of Linux and merge it back into your tree, and when you do so, you get thousands of commits, depending on how often you do it. So what do you do when this happens? You can't use the traditional CI model where everything has to be green at all times and, when it's not, the world is broken. When you have a model like this, where you periodically get a lot of commits, you basically need the concept of known issues, and these known issues need to be filtered out from the testing reports so that people can actually trust them — and not just say "it's not my problem, it's someone else's problem, so I can push my stuff", when actually it was their problem.
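To make that concrete, here is a minimal sketch — with invented names, not the actual CI code — of what "comparison plus known-issue filtering" could look like: diff a pre-merge run against a post-merge baseline, set aside the changes that match a known-issue signature, and only surface the rest.

```python
def compare(baseline, premerge, is_known):
    """Compare a pre-merge run against post-merge baseline results.

    baseline / premerge: dicts mapping test name -> status string
    is_known: callback deciding whether a status change matches a
              known-issue signature (see the filter sketch later).
    """
    regressions, fixes, known = [], [], []
    for test, new_status in premerge.items():
        old_status = baseline.get(test, "notrun")
        if new_status == old_status:
            continue
        change = (test, old_status, new_status)
        if is_known(change):
            known.append(change)        # still reported, but categorized
        elif new_status in ("warn", "fail", "crash", "incomplete"):
            regressions.append(change)  # the developer must act on these
        else:
            fixes.append(change)        # e.g. fail -> pass: maybe a fix!
    return regressions, fixes, known
```

Only the regressions demand the developer's attention; the known issues are listed separately, each with its associated bug.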
So what do we actually need in the pre-merge reports for developers? Quite a lot of things, actually. First, you need all the information about the hardware — for the machine, you can get a reasonable output from dmidecode and the kernel logs — and what the display configuration is; I mean, we do graphics. Then you need the full logs of the execution of the test, and of the tests that ran before it. You need to publish the different versions that you tested — the git versions of the two different trees — so that people can reproduce, and if possible you should also publish the build artifacts, the actual compiled binaries; this way, if your compiler was doing something odd, it can be checked.

The real objective here is to show only the impact of the changes made by the developer. If you add more things, it's just noise, and it increases the chance that the developer is going to ignore your reports. And integration testing is very noisy by definition. How many millions of lines of code are you exercising with, I don't know, just calling the suspend function, for instance — just writing one bit in a file? Boom: you get not only the core kernel — the x86 part — but also all the drivers that need to go to suspend, and after that you've got the BIOS. It's insane. And then the hardware itself can be flaky and not come back up, or the RTC clock can be broken. So that's what it is: it is extremely noisy, and we need a way to contain this noise and say "this is a known issue" — we need to be able to take these known issues, label them, and filter them out. And in the end, also show the list of components that changed: if you change multiple things — your test suite, the kernel, and then something else, the BIOS version, whatever — it needs to tell you, because otherwise it's difficult to keep track of things.
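The suspend example really is "one bit in a file". Here is roughly what such a suspend/resume cycle boils down to, as a sketch using the standard sysfs interfaces (/sys/power/state and the RTC wakealarm; this needs root, and real test suites like IGT wrap it in many more checks):

```python
import time

# Arm the RTC to wake the machine 15 seconds from now.
# /sys/class/rtc/rtc0/wakealarm takes an absolute Unix timestamp
# (write "0" first if an alarm is already armed).
with open("/sys/class/rtc/rtc0/wakealarm", "w") as f:
    f.write(str(int(time.time()) + 15))

# "Writing one bit in a file": this single write suspends the whole
# machine -- the x86 core, every driver, and finally the firmware.
with open("/sys/power/state", "w") as f:
    f.write("mem")

# If execution reaches this point, the machine resumed. Whether any
# of the millions of lines exercised misbehaved is only visible in
# the kernel log -- which is why the full logs go into the report.
```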
Okay, so how do we filter known issues? We need a tool for this. The tool needs to be able to take the so-called post-merge results — the executions that you do when it's not a pre-merge request — and let you create these signatures, or filters, either manually or automatically. These signatures need to be associated with bugs, and this way you get a place for communication: "okay, I understand where this bug is coming from, here is what we should be trying" — you have a form of discussion, which is very nice. And when you have this, you're going to get a shit-ton of failures, because every single failure in your integration testing is going to be a bug somewhere. So you need to prioritize, and for this you need tools again — tools that tell you what the impact of a certain bug is, what the reproduction rate is, which machines it is seen on, and things like this. And of course, if you can trigger an automatic bisection, that helps too.

So, this tool actually exists. I've been working on it for quite some time, and it finally got open-sourced — last Friday, I mean, not this past one. I was expecting to open-source it for the last FOSDEM already; it just took freaking forever. The tool is called CI Bug Log; it's hosted on freedesktop.org's GitLab, so go and check it out. And because I believe in dogfooding, I made this tool and I filed the issues myself: I wrote more than 700 bug reports last year — and that was actually just nine months, not even a full year. So yeah, interesting days. Even right before the presentation I was filing bugs. Enjoy.

But hey, here is an example of a report that the CI system gives. At the top it says what the different configurations are: it goes from a CI_DRM build — that's post-merge — to a Patchwork series, so this is a post-merge to pre-merge comparison. The comparison is a success: no regressions were found. It did find known issues, but they have been categorized. You can see that there are changes in IGT — IGT is the test suite that we use. Something went from pass to incomplete; incomplete means the machine died, and here is the bug associated with it. Then this one here, a Chamelium test on Kabylake, went from pass to fail. You also have it the other way around: sometimes a test goes from failing to passing, which means your patch series may actually have fixed something — that's a very interesting thing to know. Here it says it went from warn to pass; apparently I didn't even have time to file a bug for that one. And here, this particular failure actually matched not just one bug but two bugs, just for fun, and it was seen in two different tests — that's what this "+1" means — because sometimes you've got, you know, 500 tests that go from pass to fail for the same reason, and an insanely long report is not helping.

At the top you can see there are links, and this is Markdown, so if you render it as HTML, the links bring you straight to the URLs, which makes them easier to see. Then you can see which hosts were there; only four hosts are missing, probably because they were busy doing something else. So if you were expecting something to be tested on particular hardware, you can see that it wasn't. And the only thing that changed here is Linux: it went from this version to this version; if you want this version, you can get it from there — it's this particular SHA at this address — and for IGT it's this one. And for Patchwork, since this was pre-merge testing of a Linux series, you can see it here and fetch it from the same place. In the end there were only three commits added, and you can see all three of them. So that's an example of the report.

This is how you file a filter. You have different tags on the left. So, for instance, if you've got multiple trees — Linux stable 4.14, or the latest integration tree; here we only do the integration tree, which is called drm-tip, but at some point we're going to expand, and that's why we already have this feature. Then you can select the machines where the issue is visible: either you select an actual machine or you select a tag. For instance, Coffee Lake is a generation of hardware, and we've got a lot of Coffee Lake machines, so we just tag them as "yes, it's a Coffee Lake machine". That allows you to create filters that are a bit broader, and if a new Coffee Lake machine comes in, I don't have to change my filters — because there's a shit-ton of them.

Then you select the test — everything is filterable, of course, so you can find things more easily — and then the status: pass, fail, crash, whatever. You can see that we also run Piglit, and the "warn" of Piglit and the "warn" of IGT are not the same thing, so you want to say which one you mean. And then you have regular expressions that you can use on the stdout, the stderr, and the dmesg. That's how we write signatures right now, and so far it's been working pretty well. As you type, it tells you how many of the unknown results — the ones that have not been categorized yet — are matched by this, so you can see in real time whether you're screwing up your regular expression or not. Oh, and also, when you actually do screw it up, it tells you what the error is. Yesterday a new machine got added, and that led to a lot of new failures that needed to be categorized — enjoy; that's what I was doing.
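That filter form maps fairly directly onto a data structure. Here is a hypothetical sketch (all field names invented) of such a filter, and of the live "how many uncategorized results does this match" counter — this would be one way to implement the `is_known` callback from the earlier sketch:

```python
import fnmatch
import re
from dataclasses import dataclass, field

@dataclass
class Filter:
    """A known-issue signature, as filed through the filter form."""
    bug_url: str = "https://example.org/bugs/1234"        # placeholder
    trees: set = field(default_factory=lambda: {"drm-tip"})
    machine_tags: set = field(default_factory=lambda: {"CFL"})  # all Coffee Lakes
    test_glob: str = "*chamelium*"
    statuses: set = field(default_factory=lambda: {"fail"})  # IGT/Piglit kept separate
    stdout_re: str = ".*"
    stderr_re: str = ".*"
    dmesg_re: str = ".*"

    def matches(self, result) -> bool:
        # result: dict with tree, machine_tags (a set), test, status,
        # and the three captured log streams.
        return (result["tree"] in self.trees
                and bool(self.machine_tags & result["machine_tags"])
                and fnmatch.fnmatch(result["test"], self.test_glob)
                and result["status"] in self.statuses
                and re.search(self.stdout_re, result["stdout"]) is not None
                and re.search(self.stderr_re, result["stderr"]) is not None
                and re.search(self.dmesg_re, result["dmesg"]) is not None)

def live_match_count(f: Filter, uncategorized) -> int:
    """The 'am I screwing up my regex?' feedback: how many
    not-yet-categorized results this filter would capture."""
    return sum(1 for r in uncategorized if f.matches(r))
```

Because the filter references a machine tag rather than individual machines, a new Coffee Lake machine is covered automatically, which is exactly the point made in the talk.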
Then, as I was saying, you need a way to prioritize things, because there are too many bugs. So there is this view that shows the pass rate over history. You can see that, for instance, here it went down because one machine started dying, probably; then it went a bit up, down again, and up — it's fine again, at around 98% or something like that. The most-hitting bug on these platforms is this one, and you can see the hit rate is 0.73%, which meant 2,707 test results out of — for the entire history selected here — about 400,000 test executions. And of course it only shows the first two here, but there are 81 bugs that were caught.

Developers also need help knowing which bugs they need to look at. We have an SLA on the bugs: we say that we need to look at the bugs at a certain rate depending on their priority. For instance, this one is medium priority, which means we have 60 days to comment on it — and the deadline is, oops, in seven hours and 48 minutes, so that means someone has to have a look. Then you can see who is actually involved — who is a user, who is a developer — and when it was last updated by a user or a developer. So it helps you with the prioritization.
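That SLA logic is simple to picture; a minimal sketch (the talk only gives the 60-day figure for medium priority, so the other values here are invented placeholders):

```python
from datetime import datetime, timedelta

# Response-time budget per bug priority. "medium" = 60 days is from
# the talk; "high" and "low" are assumed for illustration.
SLA_DAYS = {"high": 14, "medium": 60, "low": 120}

def sla_deadline(priority: str, last_developer_update: datetime) -> datetime:
    """When a developer must next comment on the bug."""
    return last_developer_update + timedelta(days=SLA_DAYS[priority])

def time_remaining(priority: str, last_developer_update: datetime) -> timedelta:
    # A negative result means the bug has blown its SLA and should
    # sit at the top of the triage list.
    return sla_deadline(priority, last_developer_update) - datetime.now()
```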
So, Intel Graphics CI — a quick comparison with other CI systems. 0-Day is really mostly build testing; it can do some other things, but it's not geared towards this, it only uses some Intel servers anyway, and the result latency is really way too slow: you pretty much cannot wait for the results, and you're not even sure you're going to see them, because if it passes, you don't know that it passed — you only know when it fails. Then there's KernelCI: post-merge, distributed build and boot testing. Very interesting project, and the really nice part is that it's distributed. The problem is that it's just boot testing — I mean, come on, booting is not exactly screaming quality. Then there is snowpatch: it's not an actual hosted instance, it's just open-source software, but it's nice that there's something public that you can use.

Then Intel Graphics CI: it builds, it boots, it runs IGT — which includes a lot of suspend testing — it executes Piglit on quite a few hosts, it gets patches from the mailing list just like snowpatch, and it sends automatic emails with the results that I showed before. And it's mostly open source. Not completely: there are some parts that we don't really like anyway and need to rework, so we're not going to open-source the bad parts — but the good parts are all out. And it's 30 minutes for the basic tests, and say up to six hours for the full results; that's something we really want to keep.

Okay, so our objectives really are to provide an accurate view of the state of each hardware and software combination, and, as we were saying before, it has to be transparent, fast, visible, and stable. That's what we're trying to do. This slide is just a lot of data — if you want to watch the stream again or read the slides, here it is, it's what we do — but I guess what's important is the address: intel-gfx-ci.01.org.

Okay, so now, what can we do? We can collaborate on the infrastructure. There's a new community that got started at XDC after a testing workshop, and this community really is aiming to make a distributed system like KernelCI, but keeping the nice parts of Intel Graphics CI, and building it as a community so it can be used by all the drivers, not just Intel. Because right now, pretty much the only public CI like this is the Intel graphics one, and I think we need to change that. You can see the URL — we just started it, with Jacobo — and we're basically working on the interfaces, and on a glossary for what we actually call what. This is how we're going to create this open-source toolbox; that's really what we want to achieve. There's no single project that is going to work for everyone, so we want components that you can mix, to deploy your own system on your own hardware. But for this we need good interfaces that make sense, and for that we need more people contributing.

i915 is the name of the Intel graphics driver for Linux, and whatever part of its CI infrastructure is open source can be found at this address, on GitLab. For IGT — that's the test suite, which Eric is going to talk about in an hour — what you can do is write new tests and improve the driver-agnostic ones, or write your own driver-specific ones: AMD has been doing it, Broadcom also, so it's not just Intel anymore. And the fun part is that we can also create hardware that is meant for testing: Google has made the Chameleon project, which allows us to fake having screens connected, so we can fake hotplugging, or verify that encryption is working, or HDMI CEC — there's actually going to be a talk about this — and Chameleon allows us to send these events as if someone had pressed the remote control. So these are really very powerful tools, and there's no way around it: we need to collaborate.

Okay, conclusion: CI makes upstream development easier, faster, and less buggy. I almost started the talk like that — but not really. So yeah, that's it. Any questions?
[Audience] I have a question. When you output graphics to the screen, there are many more factors — the color, whether it's normal, whether you can output the full resolution of the screen. This tool cannot tell whether the actual screen output is normal, can it? It doesn't detect the actual result; it's just the code testing itself — a self-test of the pipeline, something like that — without actually checking the result, isn't it?

[Martin] Okay, it's hard to rephrase this question. Okay, I think I'll just explain something...

[Audience] In our chip design company, when we validate our kernel, what we usually test is every screen: whether it works normally, in different color formats and color spaces. But this tool only tests... it's like the code testing itself, without checking the actual result, isn't it? You would use a sensor to check the actual result.

[Martin] Well, of course, this tool is doing exactly what you're saying: it just looks at the results and finds signatures — that's what it does when you write filters. What you're asking — whether you'd use a sensor to check the actual output — is really "can your test suite do something like that or not", and yes, we do; but that's a separate question. Here we're talking only about the tools to keep track of the results and curate them, and it's not just for graphics: it can be used for anything. Well, I'll be in the hallway if you want to ask me anything.