My name is Nikolay Kondrashov. I'm spbnick on Freenode, Twitter, GitHub, GitLab, and everywhere else. I'm a software engineer at Red Hat, where I work on the CKI project, which stands for Continuous Kernel Integration; there we test Red Hat kernels as well as upstream kernels. I also work on the KernelCI project, which recently became a part of the Linux Foundation, and I'm the KCIDB maintainer and developer. I'm also the DIGImend maintainer, on a break right now; that's where I work on drivers for graphics tablets. I do electronics and embedded as a hobby in my free time, and I'm originally from Russia but living in Finland right now. I'm going very fast because we only have 20 minutes. So, open the slides if you would like to click the links; there are going to be some interesting ones.

As most of you probably know, there is a ton of kernel testing systems out there. These are just a few of them, and there are many more. They all have their own dashboards, maintained and presented to developers and maintainers, and they all send their own reports to developers and maintainers. This is difficult to manage for the recipients, and it's also a lot of wasted effort.

Back in autumn 2019 we were at Linux Plumbers, and after that we had a sort of hackfest organized by CKI, where lots of people from different companies came, and we discussed all things kernel testing: how to collaborate, what we can do to improve the situation, and how to make it less burdensome for developers and maintainers. Right there we came up with the idea of starting at the end of the pipeline and trying to join the systems there, namely at the reporting stage, where we would get everyone's test results and put them into a single database, so that we can have a single dashboard and a single notification message going out to developers with the results.

At that moment KCIDB was born, and we started hacking on it right away. One of the primary decisions was to start getting data as soon as possible, without thinking too much ahead of time about what the schema should be or how to express everything we want. Get the data first, whatever data there is, see what patterns emerge, formalize them, make the schema, and repeat. We are currently on our third version of the schema, and there is no end in sight.

You've probably all seen this comic from xkcd (I'm sorry, my dog is making some noise) that talks about the proliferation of standards. There's a danger of that happening here, but actually KCIDB is the first standard trying to unify reporting for kernel tests, so we hope we'll be able to keep it at one. But that's probably futile; let's see.

The bigger picture of KernelCI, the project, is that for a long time KernelCI has been building a testing system which other people can deploy on their hardware. There is a number of labs using it, running tests when commits are made to kernel repositories. These are mostly LAVA labs, although there are also custom labs reporting their results and taking tests to run. Currently KernelCI is starting to run actual tests beyond boot testing, and is planning to implement static checks. This is all done in the KernelCI backend.
But since we started KCIDB, KernelCI has also been sending data to KCIDB, the common database, as have a number of other testing systems, including CKI, where I'm working, Gentoo's GKernelCI, ARM, and Google's syzbot, with others working on joining. Actually, TuxSuite is almost ready to switch to production, and others are at various stages of interest or consideration, working with us.

The smaller picture of KCIDB is rather simple. Every submitter just generates JSON and sends it to our message queue, which gets delivered to the database. That is currently BigQuery, but we are not really tied to it very much. We have a dashboard which accesses the database, plus there is a subscription system looking at what's coming in and sending out the notifications.

So we have a bunch of command-line tools; we have a Python 3 library, a client for submission and for basic querying of the database. We have a prototype dashboard made with Grafana. We have proof-of-concept email reports, or notifications as I would call them, sent to our development mailing list. There are links down here if you'd like: they're all available, you can see everything that's linked here. There is also an alternative implementation being done at CKI for managing the data, storing it, and displaying it in a dashboard. Additionally, CKI is basing their own internal communication about results on the KCIDB schema as well, so we are stretching it in that direction too.

The schema is basically a JSON object with three arrays, containing revisions, builds, and tests respectively. You can send them in any combination, in any order: first a revision and a build, then some other revision and more builds for both of those, then some tests for the first revision in the same submission, and then another submission adding some more. It doesn't matter how you send them, but the data is processed and becomes available in the dashboard only after you've provided all its parts and it's consistent.

We use a basic ID system where, for the most part, the submitter provides the ID, because this is a distributed database. We don't maintain our own IDs: we don't want callbacks like "I want to create an object, give me an ID", then you are given an ID and have to submit with that ID, et cetera. No, we just let submitters generate their own IDs, give them a namespace with their submitter name, and off they go. At the moment revisions are different, because we use a commit hash, plus sometimes a patchset hash, to identify revisions; but let's see how that goes later in the presentation.

We also have a field for the origin: the CI system which submitted the report. And these are actually all the fields that are required for a report to be accepted and validated. You can start with just this to send your data, and as you build up your interface to KCIDB submission, you can start adding other fields, and then some more, and some more, as you feel comfortable. These are of course not all the fields, and speaking of that, we also have a special "misc" field (standing for "miscellaneous") on every object, where you can put whatever you want. You can use it for debugging, but more importantly, you can use it for stuffing in data that you cannot put into other fields right now, because there is no field for it yet.
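To give a feel for the shape of the data, here's roughly what a minimal submission could look like, written as a Python dict. The origin name and the IDs are made up for illustration, and the exact field set should be checked against the schema linked below:

```python
# A rough sketch of a minimal KCIDB submission (schema v3) as a Python dict.
# "my_ci" and all IDs are made-up examples; consult the published schema
# for the authoritative field list.
submission = {
    "version": {"major": 3, "minor": 0},
    "revisions": [{
        "id": "1a2b3c4d...",           # commit hash (plus patchset hash, if any)
        "origin": "my_ci",             # the CI system submitting the report
    }],
    "builds": [{
        "id": "my_ci:build-1",         # submitter-generated, namespaced by origin
        "revision_id": "1a2b3c4d...",
        "origin": "my_ci",
        "misc": {"internal_job": 12345},   # anything with no dedicated field yet
    }],
    "tests": [{
        "id": "my_ci:test-1",
        "build_id": "my_ci:build-1",
        "origin": "my_ci",
    }],
}
```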
You can start pushing this data immediately and get it delivered to developers, in a rather unreadable state, admittedly, but still, the data is there. More importantly, we get to accumulate the data you generate, so that we can base our decisions on formalizing it and unifying it with others on actual data instead of prolonged discussions.

You can look at all the details at this link; this is the current schema, schema v3. Going forward, you can keep your submissions at previous versions, and they will be converted to the new version as we upgrade, automatically most of the time, so you can keep using one version and move at your own pace.

We have a database in BigQuery, and the schema is defined in Python. We keep one dataset in BigQuery per I/O schema version, and the current one is named kernelci04. Our dataset is currently open to anyone on the Internet, well, anyone authenticated with Google. You can go to the BigQuery console, by clicking this link for example, and explore the database schema, as well as query various data from the database, like the revision statistics, the build statistics, or the test statistics (there's a rough sketch of such a query below). You can also use tools like Google Data Studio to make graphs, or whatever else you want to dig out of our database; these are the graphs for the queries that you saw on the previous slides.

Here's an example of how data goes from a source to our database. Here's a revision published on kernel.org; here's the same revision on the KernelCI dashboard, with the KernelCI-native tests running there; and here's the same revision on the CKI dashboard. Here's an example of the JSON generated by Red Hat, in this case by CKI, and here are both of those revisions and their builds in the dashboard: here are the Red Hat builds and there are the KernelCI builds; the results all join here. Another example is a build made by KernelCI: here's the JSON for it, and here's the dashboard with the build and the tests that were run for it. And lastly, a test example, again KernelCI-native: there's the JSON for it and the dashboard view. And the same for another test, from CKI: the JSON and the KCIDB dashboard again.

Of course, the main purpose of this exercise is to bring the data together, so we have to work on correlation. We have something done in that direction, but we will need to work some more; there's lots to do. So far we have revision IDs constructed from the commit hash, or the commit hash plus a hash of the patches that were put on top, like, for example, distribution patches or patches downloaded from a mailing-list archive. It looks like this. Here's an example of how you can generate the patchset hash using shell or Python (a rough sketch of the idea follows below). It's rather simple; of course it doesn't work for all cases, and we'll be working on formalizing it more, but it's good for a start.

We also maintain a proof-of-concept test catalog. For example, here is the kselftest entry. We just ask you to define a name for the test that we can all use, so that we can correlate the results; you define a description for the test and its home page, so that we can make sure we are talking about the same test. In JSON, this name is used in the tests like this, and in the dashboard it's visible here. Further on, you don't have to limit yourself to just the name of the test suite: you can also specify a specific test within it and report them all separately, like ARM does for LTP results. They go all the way down to the details of individual LTP test runs, and you can report your details this way.
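Speaking of querying the dataset: here's roughly what that could look like from Python. The GCP project and table names here are my assumptions for illustration; the BigQuery console link on the slide is the authoritative entry point:

```python
# A rough sketch of querying the public KCIDB dataset from Python.
# The project and table names below are assumptions; check the
# BigQuery console for the real ones.
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client(project="your-gcp-project")
query = """
    SELECT origin, COUNT(*) AS build_count
    FROM `kernelci.kernelci04.builds`   -- dataset name from the talk
    GROUP BY origin
    ORDER BY build_count DESC
"""
for row in client.query(query):  # runs the job and iterates the result rows
    print(row.origin, row.build_count)
```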
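And here's the patchset-hash idea I mentioned, as a minimal Python sketch. The details, like the separator and any normalization of the patches, are illustrative; the exact recipe on the slide may differ:

```python
# A minimal sketch of building a revision ID from a commit hash plus
# a hash of the applied patches. Separator and normalization details
# are illustrative; see the slides for the actual recipe.
import hashlib

def patchset_hash(patch_paths):
    """SHA-256 over the patch files' contents, in application order."""
    digest = hashlib.sha256()
    for path in patch_paths:
        with open(path, "rb") as patch:
            digest.update(patch.read())
    return digest.hexdigest()

def revision_id(commit_hash, patch_paths=()):
    """Commit hash alone, or commit hash plus patchset hash."""
    if not patch_paths:
        return commit_hash
    return commit_hash + "+" + patchset_hash(patch_paths)

# e.g. revision_id("1a2b3c4d...", ["0001-fix.patch", "0002-feature.patch"])
```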
You can also describe your environment, but at the moment this is not formalized: there's just a human-readable description field, and the miscellaneous field again, where you can put your stuff. We still have to get to describing the environments in detail: the CPU, the RAM, and everything.

The subscription system, at the moment, is basically Python modules that you create (with our help, of course) when you want to subscribe. You just write a Python module which gets an object that is coming into the database, or was changed in the database; you can access its fields and filter by, for example, repository, your branch in that repository, particular tests or architecture, whatever you want, and then decide whether you want a message about this or not, and where to send it. The proof-of-concept subscription for the stable repositories looks like this; this is a complete subscription at the moment (I'll include a rough sketch of the idea at the end of this section). A message can look like this: for example, here's the overall status, and the revision, whether we managed to merge it or not; the details of the revision, where it came from; an overview of the builds, here are the KernelCI builds and here are the CKI builds, in the same message. And here we see, for example, the tests from KernelCI, a bunch of them (there are many more), and a bunch of tests from CKI, for the same revision.

To submit data to KCIDB, you can use a command-line tool: just pipe JSON into it, and it should match the schema, of course. Or you can use the Python interface, which is almost as easy (a sketch follows below). Here is a screenshot of the complete source for the current KernelCI interface to KCIDB, which comes to 265 lines. There are of course hooks in other places, but this is the meat of it. Here's the Gentoo GKernelCI interface, under 200 lines of Python and shell, and another one written in Go, under 200 lines as well; for the most part there is a separate file with the schema, but that's just a technical detail, it's generated. That one talks directly to Google Cloud using the message queue interface, so that is also available if you have particular requirements, although it's of course less stable with regard to future changes.

We have a submission how-to, which you can take a look at. When you start submitting data, we give you credentials and parameters to submit into a special place called the playground. Basically, you can do anything you want there, without fear of breaking anything, and experiment with sending data. Here's a screenshot of TuxSuite sending their data there; I think they're ready to go to production, they just need to do the switch. So you get a place to play freely, experiment, and see how your system goes: submit automatically, manually, whatever you want.
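To make the subscription idea above a bit more concrete, here's a rough sketch of the shape such a module takes. The hook name, the object fields, and the Notification type are all illustrative stand-ins, not the real internal interface:

```python
# Illustrative sketch only: the real subscription modules live in the KCIDB
# tree, and their exact hook interface differs. This just shows the idea:
# get an incoming or changed object, filter it, decide whom to notify.
from dataclasses import dataclass

@dataclass
class Notification:            # stand-in for whatever the system really uses
    recipients: list
    subject: str

def match_revision(revision):
    """Return notifications to send for an incoming/changed revision."""
    # Only care about one repository and branch (field names are illustrative)
    if "stable" not in (revision.git_repository_url or ""):
        return []
    if revision.git_repository_branch != "linux-5.10.y":
        return []
    return [Notification(
        recipients=["stable-results@example.com"],
        subject=f"KCIDB results for {revision.id}",
    )]
```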
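And submitting itself, on the command line and from Python, looks roughly like this. The project and topic parameters are placeholders that you'd get together with your credentials, and the exact client API is best taken from the submission how-to:

```python
# Roughly how submission looks from Python; treat the submission how-to
# as authoritative, as the API may differ in detail.
#
# Command-line equivalent, piping JSON in:
#   kcidb-submit -p <project> -t <topic> < submission.json
import kcidb  # pip install kcidb

# A version-only report is the smallest valid submission; add revisions,
# builds, and tests as in the earlier sketch.
submission = {"version": {"major": 3, "minor": 0}}

client = kcidb.Client(project_id="<project>", topic_name="<topic>")
client.submit(submission)
```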
All of the above was about the current release, KCIDB v8. For v9 we're going to be making many changes, most of them concerning the notification system: our primary target is reaching actual developers and maintainers, and we're doing most of our work towards that.

Among the changes, we're renaming the revision object to "checkout", to represent the data better: revision was not granular enough, and checkout matches what we are actually doing. When you check out the kernel, you can submit an object for that, and reference from the build what you actually built.

Then, we are adding support for log excerpts: just a piece of the log that exposes the problem with a checkout, a build, or a test. Right now we only have a log URL, which you can use to link to your log hosted somewhere, but that doesn't work for everybody. So we're adding log excerpt fields to store just the particular piece of the log that is relevant to the issue. That will come in useful for Intel 0-Day, and this way we can also paste the log excerpt directly into the reports, or into the dashboard, if we have space.

Additionally, right now our database backend is Google BigQuery, but we've also added an SQLite backend for testing, and for implementing some (mostly testing) features; that will let you experiment without having access to Google BigQuery, or while being offline, things like that. We are expanding our object-relational mapping for the database, so that our subscriptions can explore more data, be more flexible, and present more readable reports and notifications. We are adding the ability to extract the denormalized data from the reports: we're using a distributed SQL database, which means we're using a denormalized schema, so this way we can extract more data from it and be more flexible. And the notification system is getting a little smarter.

That release is hopefully coming this spring. Further on, we're thinking about implementing known issues, so that we don't send reports for test failures that are already known and triaged, and shouldn't bother developers with; and the same for builds, hopefully. We would like to accommodate static checks and the like as well; for that we will need a separate table and objects. And going back to the environments: we'll need to somehow be able to correlate similar environments in the database, so that we can say, OK, did this test produce the same results in a similar environment or not? We have some ideas on how to implement that, but nothing concrete for now. Finally, we would like to do bisections and benchmarks, of course; who wouldn't?

Here are our main repos in the KernelCI project on GitHub; you can click the link to find them. There are of course other repos, if you're interested in KernelCI. We have "good first issues" tagged there, so if you're interested in helping, we'd be more than glad; go there and check out the issues. We have a mailing list, which you can post to directly or subscribe to. And of course we have an IRC channel on Freenode. And that's it, thank you.

Oh, actually, there is one question, from Rado. Thank you. "For what are you going to use Levenshtein distance?" Yeah, it's just a wild idea. As far as I understand it, the Levenshtein distance describes the similarity between two texts: you have two texts, you apply the Levenshtein distance function to them, and it gives you a number which says how similar they are. That's my understanding. We have to compare our environments somehow, so my idea was: OK, maybe we can just dump the JSON for both of the environments, compute the Levenshtein distance on those, and be able to say how close they are. That's just a wild idea.
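To make that wild idea concrete, here's a tiny self-contained sketch: dump both environments as JSON, with sorted keys so that key order doesn't inflate the distance, and compute a plain Levenshtein distance. Purely illustrative:

```python
import json

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (ca != cb))) # substitution
        previous = current
    return previous[-1]

def environment_distance(env_a, env_b):
    """Rough similarity of two environment objects: lower is closer."""
    return levenshtein(json.dumps(env_a, sort_keys=True),
                       json.dumps(env_b, sort_keys=True))

# e.g. environment_distance({"cpu": "x86_64", "ram_mb": 8192},
#                           {"cpu": "x86_64", "ram_mb": 16384})
```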
Maybe there is some similar function which works for arbitrary data structures like that, or maybe we can just do it as I described, something like that. We need something to be able to compare those environments, and I don't think we'll be able to keep their structure the same, unmoving and uniform, across all the submitters: some submitters have some data, other submitters have other data; it's always different. So we'll have to come up with some fluid method to compare them roughly. It doesn't have to be perfect, just give a sense of how similar the test environments are.

Actually, there is a last-minute question from Santiago: "Is there any tooling in kernelci-core to submit data from a local KernelCI instance to KCIDB?" You mean you have your own independent KernelCI instance, right? Well, I'll assume that's it, yeah, OK. The support for sending to KCIDB is merged, I think, so if you'd like to send data from your local instance, you just need to get the credentials and figure out how to set it up; I suppose it should work. We can ask Guillaume at KernelCI; jump into the channel and we can figure it out. We can give you credentials for the playground so you can try it out and see how it goes.