So after this video from the last talk, I want to try and take you into our approach to qualifying Linux. Last year we introduced a project, still not completed, on qualifying Linux to Safety Integrity Level 2. And one of the specific activities there is how to come up with a residual fault rate, or expected fault rate, in something as complex as the Linux kernel. Basically what I'm going to take you through is a short context of where this comes from — I'm not going to go into too much detail there, otherwise we'd run out of time — then some of the problems and issues that we have, how we might mitigate these issues, and how we can use this for prediction. Because really what we're trying to do is not say Linux is safe. We're trying to say we assume Linux is safe, and then we can do predictions and we can monitor these predictions. So let's get started with that — that should be the context slide.

So the context of the thing is what's called a Route 3s approach in IEC 61508 and related standards. What you are doing is you're taking a pre-existing piece of software that was not developed in compliance with a specific safety process, and you are assessing the non-compliant development against the objectives and the intent of the safety standard. So basically our primary claim here is that Linux, or a Linux-based system, should be suitable for safety-related systems at safety integrity level 2. The basic assumption that we are making here, and the claim that we have to derive from that, is that there is a process. We're not saying it's a compliant process, we're just saying there is a process in place. And now we have to look at how to develop this evidence.

Now why do we need this? It's because we have a black-box process, basically. The software that we're using, the element that we're trying to use, is done. We cannot go back into the requirements phase of the Linux kernel and say, hey, the scheduler has some systematic fault, let's fix that. We just can't do that. We have it as is. So basically requirements, design, up to the implementation is a black box for us. And we have to ask what properties this process has, what failure rates or what level of systematic faults could be in there. Then we can do an assessment of the process and come up with an estimated number of faults in the software as a result of problems in the process. And then we plug it into a relatively simple, even simplistic, viewpoint and say risk is just probability times severity. Now of course assessing the severity is not an easy thing, because that depends heavily on the specific deployment case. And it also depends on the assumption that you actually can apply statistical approaches here — that we actually have a homogeneous population of bugs underneath that we're looking at. If the bugs themselves, or the faults in the kernel, are not of a homogeneous nature, then of course this would be invalid. So we have to look at that first.

So, systematic faults. Before we go into that, I have to try and clarify a little bit of a contradiction here, because software is considered to be faulty at a systematic level. Software doesn't have random faults, in the sense that if we identify a fault in software, it is associated with a state of the machine and a specific input vector. And if we regenerate this state and present the same input vector, then the output will again be the same. Whether putting a machine into an identical state is technically possible is a different issue.
But basically that's our assumption. That means a systematic fault does not have a rate. If your file system corrupts as soon as you create a file that's larger than 4 gigabytes, it will do it more or less every time. Maybe it's a locking problem, then it's not quite so reproducible, but you can still create an environment where it triggers with a very high probability. So we cannot just look at bugs, look at their development over time in the kernel, and deduce a simple rate from that. But what we can do is look at the root cause of these faults, and those are human actions. If we say our problem is not the software fault itself, and what we're interested in is the failure rate of the humans behind it, then we can associate a failure rate with each and every phase, be it requirements, design, all the way to deployment and maintenance. And with this approach we can indirectly evaluate the failure rate of software, and that's what we're basically going to try to do.

Now, what's behind this is a modified development life cycle for a safety-related system that is building on pre-existing software. Basically all of the functional safety standards allow you to do that in one way or another, including automotive standards like ISO 26262. The idea behind it is that we know software elements can be very reliable because they have field experience or field data. And if we look at the evolution of open source components, we can also see that there's a lot of code in the Linux kernel that has been in there for 12, 14, 15 years and is not being changed, because it has simply shown that it's doing its job and working correctly. So there is a certain merit to recycling well-proven or well-tested code, but of course it also is a dangerous thing, because you are changing context when you build a new system. This can backfire if you're not careful with it. But essentially it is possible to do this level of recycling. Now, the intention of allowing this route in the safety standards is to take what industry has developed in previous projects, maybe even predating standardization, and continue to use it. We are kind of misusing this approach a little bit, because we're applying it to basically all the generic system components: the kernel, glibc, BusyBox, a runtime environment for safety applications, and the generic libraries. Of course we're not applying it to the safety applications themselves. So if you write a blinker controller or a heart rate monitor for a medical device, that's not in scope of what we're doing, because that's not a pre-existing component, that's a new component. But it has to be able to build on assured properties of the generic components.

So we handle that as we go into regular system design: taking the use case and design reference missions into account, deriving system requirements, flowing that down into the system and software architecture — which may be partitioning, maybe redundancies and high-level mitigations — and then, rather than going into the regular allocation to hardware and software, we go into a selection process and say: let's select pre-existing elements that can do the job. And from that we have to derive the requirements that we're going to put on the safety applications; there can be constraints, they may have to make certain assumptions, or we may have to limit certain capabilities of the open source or pre-existing element.
And from that we then derive a limited configuration space. So if we qualify something like the Linux kernel, this is not going to mean that you can configure each and every driver in there. It will be a subset; there will be constraints on what you can do, at least in the initial run and probably for any safety-related system built on Linux. So that's where we're plugging in the selection process, and this selection process now has to find some way of selecting elements that have suitable properties. Of course we could do it in a very qualitative way and say, well, the development process is good, so we'll just trust it, but that's kind of a dangerous thing to do. The approach that we're trying to take — and I'm not saying that we can actually do it; we have good evidence up to now, and our certification authority is happy with what we're doing, but we are not done — is to quantify each and every attribute that we will put on this element, and ideally then allow a conclusion on the overall properties.

Now at the high level there are of course Linux processes defined. The Linux process is of course not a compliant process, and we looked at how it developed over time. Initially there was no full process specification for Linux kernel development; that was added a little more than a year ago. There were odds and ends of processes — submitting patches, checklists for patches, submitting drivers — and of course there was kernel documentation, more or less in sync with the development. But if we look at a lot of the details, like the design level and requirements level, there was really no systematic process there. It was a very evolutionary process. On the other hand, the development life cycle had a lot of mechanisms in place to prevent real crap from going into the kernel, and it does not happen that often. Now aside from these high-level process specifications and specific, I would say, working instructions like submitting patches, we of course have some very generic qualitative checks in there: checkpatch catching the usual brain-dead mistakes that developers make, static code checking like Coccinelle and sparse, GCC plugins, and some external tools like Stanse or BLAST that were used on the Linux kernel as well. So we have a lot of these tools in place. We have a relatively systematic development, but of course we do not have the formal, qualification-driven approach that industry is used to. Now how good that formal approach is is a different matter. It has its weaknesses as well: you can become a certified safety engineer in three days if you just go to the right organization and pay a few thousand dollars for that three-day training, and then you walk out as a certified safety engineer for automotive systems, which frankly is a joke. But it's one level of protection, and what really protects us is not the individual measure but the overall set of measures, from the qualification and the organizational requirements down to the code and the rules for how these systems are deployed. So we have a subset of these required capabilities built into the Linux kernel process, but of course it's not complete.

So how well do we actually do? Let's start with a really trivial example. If you submit a patch to the Linux kernel that fixes a problem, you should put a Fixes: tag on it. You don't have to, but it's recommended.
Basically the process says that if you find the origin of the problem, for instance by bisecting, then the patch should carry a Fixes: tag, and the Fixes: tag should carry the hash of the commit that introduced the bug, abbreviated to a length of 12 characters. Now this is a trivial requirement. It's not enforced in any way, although checkpatch in the meantime does check it and flags it if it's wrong. But if you look at the distribution here — and this is the 4.4 to 4.4.13 kernels, a fairly short time frame — you can see that while the intended 12-character hash is the most frequent, we have quite a few others. So the failure rate of this non-enforced requirement is actually quite high, about 17% or so. Now you could say, well, that has no impact on safety. It doesn't, but the problem we are trying to address is to find out the safety properties of the system by looking at attributes of the development. And one attribute of development, of course, is how well people follow rules that do make sense. For the cases where the hash gets really short, we could have theoretical collisions and actually end up at the wrong patch during failure analysis. But it's just one small indication: we can take very high-level attributes and then evaluate how well these processes are followed.

More interesting is to look at code. So let's take a first example. Of course any coding standard would expect that we have reasonable conditions. When we look at this code, it doesn't have very reasonable conditions: if (1 != 0), if (1 == 1), and if you look at the switch statement it's very funny because it has an || 1 == 1 in there, so that will always be true. And to make it really bad, there's a comment on top of this piece of code that lists a whole bunch of variable names that don't appear in the code, and there's no documentation at all. This is in mainline Linux. I have no clue how it got in there, how this got by a subsystem maintainer. I contacted the author and asked him how he got the idea to write such absurd code. And the explanation — does anybody have an idea how this happens? Cut and paste would have been a guess, yeah, but it's not cut and paste. It's actually a Windows driver where they tried to turn off a feature and they didn't want to do it with #ifdefs. So they created a define, and depending on whether this define is set to Windows or Linux it's zero or one. And they have these conditional statements in there, but after the preprocessor, or rather the code generator, ran, it replaced the define by a literal zero. So if the statement had said if (IS_LINUX == 0) do this, it would be understandable, but this way of course it's not. So that was just a code generator problem. It's not a very frequent problem, but it's an existing problem. How was this discovered? Not by code review, but by a set of static code checkers that I call my brain-dead set. They're actually checkers that I started writing when I was checking code from students, because I got infuriated by their really stupid mistakes sometimes. I started putting that into static code checkers, and then one day I got the idea to run them on the kernel, and it was shocking what came out.

So, next one: reasonable control flow. Else-if, else-if, else, always doing the same thing. Actually the whole sequence was 56 lines of code that condensed down to two lines of code. So you had a whole else-if hierarchy always doing the same thing.
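Just to make that pattern concrete — this is a hypothetical sketch of the kind of construct described, not the actual driver code, and struct device_state, CTRL_ENABLE and set_mode() are made-up names:

    struct device_state {
            unsigned int ctrl;
    };

    #define CTRL_ENABLE 0x1

    /* Hypothetical sketch of the finding: an else-if cascade in which every
     * branch does exactly the same thing, so the whole construct collapses
     * to a single unconditional statement. */
    static void set_mode(struct device_state *dev, int mode)
    {
            if (mode == 0)
                    dev->ctrl = CTRL_ENABLE;
            else if (mode == 1)
                    dev->ctrl = CTRL_ENABLE;
            else if (mode == 2)
                    dev->ctrl = CTRL_ENABLE;
            else
                    dev->ctrl = CTRL_ENABLE;
            /* equivalent to just: dev->ctrl = CTRL_ENABLE; */
    }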
I did contact the author on this one as well and asked him how that happened. I never got an answer. So I don't know what the root cause is, but it's obviously not something you would consider reasonable control flow. As we heard in the talk before, you do expect some coding style, some coding guidelines. I'm not going to say the horrible word that starts with MI and ends with SRA — it's a horrible standard — but the Linux kernel coding style is quite reasonable. Obviously this is not very sound code. Again, I don't care about this driver, because it's some multimedia driver that we will not have in our configuration. But what I care about is that it's an indication that the review process, the intended procedural requirements for getting code into mainline, did not really work very well. Now, this driver is in staging, and "it's only staging" — I've heard that explanation a few times, but actually, statistically, staging is doing quite well. If you count the number of sparse warnings, error warnings and UBSan warnings in drivers/staging compared with some other drivers, then drivers/staging actually is doing quite well. Greg is doing very well at getting early reviews and fixing up this code. I know a number of network and Wi-Fi drivers that do much worse than staging. [Audience member:] Yeah, but this has nothing to do with technical expertise. He had eight #ifdefs in a row always doing the same thing. — That's true, and you're right that the example here from staging might not be the best one. I could have taken one that's not in staging as well; you can find some outside of staging. I didn't really pay attention to that, I just took one example from each category. But I would agree that staging should be treated differently, and statistically we do treat it differently — for instance, we look at commit logs and commit counts and so on. But fundamentally there's a baseline where I think this should be flagged quite fast, or it should never have made it in. But I agree that we could discuss whether staging is a valid case here. The other driver was not in staging as far as I know.

Next: conditions without side effects. Now this is not a staging driver, this is actually an extremely old driver. I tried to backtrack where this got in, and it goes all the way back to the 2.0 or 2.2 kernel — I'm not sure, but it's really very old code. And what is this code doing? It's just a simple way of encoding a retry, where you say, well, the access might fail the first time because the device is not ready yet or whatever, so you just try it again and hope that the second time it will work. The problem with this is that in many cases such retry loops introduce a sort of entropy into the system, where you can no longer say analytically which state a driver is in, because you don't know how many bus transactions actually happened. Now in this specific case they actually said it's fine, because it's basically proven in use — the CMD640 IDE driver is a really extremely old driver, and there's probably no point in changing this because it has simply shown that it's working. But again, it's not about saying this is wrong code at the technical level. Technically this might be an absolutely sound thing to do. The problem is that we have to come up with arguments why such behavior, which is flagged as unsound in a lot of the standards, is actually permitted in the code.
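To illustrate the kind of construct I mean — a hypothetical sketch of the pattern, not the actual CMD640 code; device_ready() and dev are made-up names:

    /* Hypothetical sketch: a retry encoded so tersely that a checker flags it
     * as a condition without visible side effects.  Nothing records whether
     * one or two bus transactions actually took place. */
    if (!device_ready(dev))
            device_ready(dev);        /* try once more, result deliberately ignored */

    /* The same intent, made explicit for a reviewer: */
    if (!device_ready(dev))
            (void)device_ready(dev);  /* retry: the device may need a second poll */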
So in this case it would be perfectly fine if this code had carried a comment saying retry — don't fail just because the first access didn't work, or something like that. If it had just had a comment saying retry, then everything would have been clear. So it's not just the technical issue of saying it's possibly a bug or possibly incorrect code. We're looking at the process side and saying: well, if you do something like this, it should carry documentation.

Okay, number of parameters in a function. Well, this is from the CIFS file system, and it's relatively new. I had to shrink the font size, otherwise the parameter list would not have fit on the slide. I don't know if it's readable, but it's 21 parameters, and about one third of them are structs. So I would say that's not a very reasonable interface, at least by any coding standard. It's basically not really possible to understand what code does if the interface complexity is that high. Now it might be justified in this case — I'm not saying this is necessarily crappy code just by looking at the parameter count. But it's definitely a piece of code where you would have to exclude it from a safety-related system, simply because it's analytically not tractable. I also seriously doubt that there's a necessity to build such Fortran-like interfaces. If we look at the overall parameter counts in the Linux kernel: the left plot is a linear plot, pretty much what you would expect, basically a power law. So most of the parameter counts in Linux kernel code look absolutely reasonable. I don't know what the mean value is, but it's somewhere around three or so, which is what you would expect for sound code design. If you look at the log plot, you can see that this exception that I just showed you from CIFS is actually not so much of an exception. There are roughly a thousand functions in the Linux kernel that have 10-plus parameters in their function interface, and that's probably not really reasonable in very many cases. It just makes code much harder to read, much harder to understand, so it's something that you probably want to avoid.

Okay. So we looked at this parameter count as one of the requirements that we would place on the code that we would have in our minimum config. We take a minimal, very restricted configuration for our safety-related system — it's a multi-core platform supporting basic containers based on cgroups, namespaces, seccomp and CPU shielding — and try to come up with a minimum configuration, and then we check the parameter counts and other attributes on that minimum configuration. There the maximum was nine and ten, and both of those were in lockdep. Lockdep is not really a tool that you would have turned on at runtime, it's a verification tool — so we have to look at verification tool properties as well — but they're not that critical, and since it is a tool that records information about the function it was called from, it's probably reasonable that it has a larger number of parameters. So the key point here: the output might not be that beautiful, but that's just how the Coccinelle scripts used to do this dump it, and we can get very precise information about where such problems might be. We can review them and basically sign them off as okay or not okay. Sometimes we might have to fix it; in other cases we just have to work around it.
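As a side note, the usual way to keep such interfaces tractable — a hypothetical sketch, not a proposal for the CIFS code; struct io_request, submit_request() and do_io() are made-up names — is to group related parameters into a descriptor:

    /* Hypothetical sketch: instead of a long positional parameter list,
     * related values are grouped into one structure that is passed around. */
    struct io_request {
            unsigned int  flags;
            size_t        len;
            loff_t        offset;
            void         *buf;
    };

    static int submit_request(struct io_request *req)
    {
            /* one parameter instead of many; call sites stay readable */
            return do_io(req->buf, req->len, req->offset, req->flags);
    }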
Okay, that's bug counts — the next issue, and one of the common ones, is type consistency. Type inconsistencies account for a lot of subtle problems, like overflows. There was one type inconsistency that we located in the scheduler — it is fixed in the meantime — and one of the kernel developers actually confirmed that it would only trigger on 32-bit systems, and probably only in a very theoretical rollover after I don't know how many years. So it's something that is very unlikely to ever affect somebody, but that we could not exclude for safety-related systems that might be in continuous operation.

So let's look at the elements that we're intending to use at the moment: the Linux kernel, glibc and BusyBox. The versions are on the slide — it's a 4.1 kernel I think that I looked at, a 2.9 glibc, and a more or less random BusyBox version that I just happened to have on my machine — but it's just to get an overview of how badly we are doing with types. The Linux kernel type system is actually one of the biggest problems that we have found up to now. It's very hard to actually weed out that type system, but even though it is a very large system, with 376,000 functions in there, the type inconsistencies are not that bad. We're talking about 2.85%, if I have that correct, of functions where we have type inconsistencies. What is a type inconsistency? Basically you have a function that's supposed to return an int and the actual return value is of a different type, or you have a function call and the return value of the call is assigned to a variable of a different type. So there are both callee-side and caller-side type inconsistencies. glibc, though it's significantly smaller — about a factor of 30 smaller — does about as well, or as badly. And BusyBox, even though it's two orders of magnitude smaller than the Linux kernel, has only about a factor of two fewer inconsistent types. Now that would imply that the Linux kernel is actually doing very well in general. I would be really interested in comparing some of these metrics with commercial safety-related operating systems; unfortunately those companies are very unwilling to let us do that.

So let's look at the kernel in a little more detail on type inconsistencies. This is only semi-automated at the moment, and it might contain some incorrect data. It is for x86_64 only, because of course when you're looking at type inconsistencies you are bound to a specific architecture, or to the natural word length of the architecture; in some cases something might be a type inconsistency on one architecture and not on another. But basically, switching sign during an assignment or a return is generally something that could be a problem, or has a relatively high probability of being a problem. So that's flagged as problematic, and we can see that there's quite a bit of it in the kernel core, and the network core also has quite a significant amount. Now in the network core it's probably explainable, because in a lot of cases you're converting network to host byte order and vice versa, and you don't care about the sign because you know that it will fit or you know that it's not going to cause a problem. But that's the disadvantage of static code checker metrics: they don't necessarily give you a causal explanation of what's going on. It's something that we need to review for the specific configuration. This is for the entire kernel at the moment; basically what we do is filter it down to the specific configuration so that we can assess it and possibly fix it, or in some cases just document it.
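To make the two categories concrete — a hypothetical sketch, not code from the kernel; struct buffer, get_remaining() and consume() are made-up names:

    struct buffer {
            unsigned long size;
            unsigned long used;
    };

    /* Callee-side inconsistency: declared to return int, but actually produces
     * an unsigned long value that gets implicitly converted. */
    static int get_remaining(struct buffer *buf)
    {
            unsigned long left = buf->size - buf->used;
            return left;                            /* unsigned long -> int */
    }

    /* Caller-side inconsistency: the signed return value is assigned to a
     * variable of a different width and signedness. */
    static void consume(struct buffer *buf)
    {
            unsigned int n = get_remaining(buf);    /* int -> unsigned int */
            (void)n;
    }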
Downsizing — that's basically a truncation problem — not that many in the kernel. I think most of those are actually bugs. We didn't yet go through all of them; some of them were fixed, but not all. The one that's probably not so self-explanatory is the false positives on there. False positives occur because the Linux kernel type system is really hard to get a complete grasp on. We found 12 different ways in which unsigned ints are declared in the Linux kernel, and for 64-bit types it gets even worse. So we have these translation tables, which are probably not completely consistent, to actually evaluate whether types are equivalent. What you see is that a type is assigned to a different type, but you don't immediately know if it's of the same length or the same signedness, so you have to map it to the basic type. And that's of course again architecture specific and not always easy to do. Actually resolving some of the types in the Linux kernel turned out to be very hard. Now this is a systematic problem with the kernel. It doesn't mean that any line, any piece of this code is wrong — it might all be correct — but it means it's harder to read, harder to review, harder to understand. And we know that a significant portion of problems in the Linux kernel comes from cut-and-paste code; a gentleman just mentioned that as a possible source before. If you have some implicit type conversion or downsizing at one end of the kernel, it might be perfectly legitimate. But you then copy this piece of code, this code fragment, to a different context, and it might well be incorrect in that specific context.

Okay. The next one was to take, as an example, an at-that-time undocumented kernel API, which was the completion API — that's why we selected it — then write formal specifications for it, and static code checkers, using Coccinelle, that check certain parts of the formal specification. Then we ran that over the kernel, and you can see that there's quite a significant number of inconsistencies that can be found. Some are very simple, like double initialization; some of them are not so simple, like being in the wrong context — although in that case there were no findings. And again signed/unsigned checks: everything from completions returning an unsigned value and that value then being checked for being negative, or being assigned to a signed variable. So there's a whole bunch of different problems that can be uncovered. We attribute the quite large number of findings related to completions to the lack of documentation. So the mitigation here was actually to sit down and write the documentation — which is in mainline by now — and of course fix up a very large number of these problems. It turned out that quite a few subsystem maintainers are not very happy with these things. I can understand that, because basically they're saying: well, you're removing a useless test or a useless condition, that's not really hurting us, and we don't want to change code that has been stable and running correctly. That's basically acceptable, but we just have to get it off our list for qualification and say: okay, based on feedback from the developers, they're not going to change it, it's working correctly as is. So again, these metrics are not about saying the code is crappy. These metrics are about how good, how solid the process is.
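For illustration, the classic shape of that signed/unsigned finding — a hedged sketch using the real wait_for_completion_timeout() API, but with hypothetical surrounding code (wait_for_device() and the 100 ms timeout are made up):

    #include <linux/completion.h>
    #include <linux/jiffies.h>
    #include <linux/errno.h>

    /* wait_for_completion_timeout() returns an unsigned long: 0 on timeout,
     * otherwise the remaining jiffies (at least 1).  It never returns a
     * negative error code, so the check below is dead code at best. */
    static int wait_for_device(struct completion *done)
    {
            int ret;                                /* signed, unlike the API */

            ret = wait_for_completion_timeout(done, msecs_to_jiffies(100));
            if (ret < 0)                            /* can never trigger */
                    return ret;
            if (ret == 0)
                    return -ETIMEDOUT;              /* the real timeout case */
            return 0;
    }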
And this is maybe one of the key issues with this review of pre-existing software for safety-related systems, and why open source has such a fundamental advantage: we actually get a lot of feedback from the people involved. When we sent out mails about the root cause of some of these bugs, the return rate was about 80%. Anybody that ever did studies in sociology knows that a return rate of 5% is something to celebrate. Open source is a little bit better.

Okay, the next one was usleep_range. usleep_range is documented — the documentation is in timers-howto.txt — but it's a slightly quirky interface. The quirk is that you pass usleep_range a min and a max to tell it that you want to sleep at least the minimum and at most the maximum. What's the intention behind this? usleep_range uses high-resolution timers, and you don't want to burden the high-resolution timer subsystem with large numbers of timers, notably ones that don't actually need to be that precise. But the next level of timers, using something like msleep, is really jiffies granularity, which is very coarse-grained. So the compromise is: you take a high-resolution timer, but you're not going to be picky about when it fires — it should fire somewhere between min and max. And if there is a timer already armed in this range, we can just hook it up there, let both callbacks fire at the same time, and we don't need to add another expiry point for the high-resolution timers. The problem with the conversion, though, is that the timer that actually gets armed is not at the minimum value but at the maximum value. And obviously, if you look at the conversions, most developers took the original delay as the minimum and extended it a bit for the maximum, expecting it to fire near the minimum — whereas in reality it fires extremely close to the maximum. So if you take a usleep_range timer and just test it, you'll find that on an idle system more or less 100% of them fire at the maximum value, not the minimum value. And on a loaded system they can fire at almost any time, because it's not an atomic context: you can be scheduled out, and your worst case will be in the range of hundreds of milliseconds even if you asked for a usleep_range of 10 to 12 microseconds. So we just looked at what the recommendation in the documentation says — the usleep_range minimum should not be smaller than 10 microseconds, and the maximum should not be more than 10 milliseconds — and did a simple check of how often that is adhered to. As you can see, for the hard-coded values, the constants, about 4.5% violate this specification. Whether that really has an impact on the correctness of the code is again not the key issue — I would expect that in almost all of these cases it doesn't matter with respect to correctness — but it does matter with respect to adherence to coding guidelines. The preprocessor-defined values were similar, and for those that are runtime variables it's a little bit hard to evaluate, so those counts are in parentheses: they're based on heuristics and rough estimations, and I wouldn't sign them off as correct. But you can see again a roughly 5% violation rate.
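As a concrete illustration of the intended use — a minimal sketch in kernel style, with a made-up function name and made-up delay values:

    #include <linux/delay.h>

    /* usleep_range(min, max) asks for a sleep of at least min and at most max
     * microseconds; the slack lets the hrtimer subsystem coalesce wakeups.
     * Note the caveats from the talk: on an idle system the wakeup tends to
     * land near max, and since this is a sleeping (non-atomic) context, a
     * loaded system can delay it much further. */
    static void wait_for_chip_settle(void)
    {
            usleep_range(100, 200);   /* at least 100us, nominally at most 200us */
    }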
Okay, are we running out of time already? Okay. So yeah, that's the bugs. Build bots — all sorts of build bots; this one is from Mark Brown's. If you look at the left plot, which shows the errors, the failures, you can see that there's no real pattern. For the warnings there does seem to be a pattern, so we can use that for a sort of reliability-growth approach.

So I have to speed up a little bit. How do we intend to use this? Basically, most of the code is clean — that's the good message. And actually, looking at the data that we have, most of the problem cases were in drivers; some in fs, which has turned out to be my favorite subsystem in the meantime. But it's not as bad as it might seem from this presentation. There is a problem with type issues that needs to be addressed. And that brings me to — I'm just going to skip that — how do we actually intend to treat bad code? Basically our trick, and that's a trick that you only have available in the open source world, is selection. We can just say: okay, let's look at the reliability attributes of different file systems and take the best one. That doesn't mean the other file systems are crappy, it just means that those other file systems are not suitable for safety-related systems. So we can select the best. In some cases we will have to introduce constraints on the usage or on the applications. And of course, we can sit down and fix some of them — we actually have been doing that; there are a little over 200 patches in the kernel in the meantime from the SIL2LinuxMP project. But the basic idea is to eliminate the problem by adjusting your configuration rather than trying to fix everything. And when we're done, of course, there's a stable base in the Linux kernel. If we take something like an allnoconfig, we can see a significant difference in these attributes.

Okay, that sort of brings me to: how stable is that process actually? So that should be the development life cycle slide. Okay, yeah. We're going to do a prediction. We call this a top-down model. The top-down model is based on looking at the kernel process as a whole. We want to evaluate certain properties — development over time, within sublevels, between sublevels — and understand how these parameters are developing. We look at different trees; I'm just going to show you stable and a little bit from linux-next, purely a time issue, because the amount of data that we generate this way is quite large. Basically, we can only look at these properties on a statistical basis if we can assume that it is a reasonably constant process. So the first thing is to look at how constant this process actually is. Well, this is just rough and probably not complete — you could find a lot of things on this life cycle picture that you could probably improve — but it should give you an impression that there are a lot of checks and balances. There is systematic development in multiple branches integrating into Linux. There are transitions for how patches come in from LKML: we mark them as rejected, going back if they are discussed, into v2, v3, v4. Then they go into mainline through the merge window, then rc1 to rc8 is the stabilization phase, and we have the next stable release. So the top part is asynchronous, basically feature-driven, and the bottom part is time-driven. Yeah, that's for 4.13. If we look at that process in practice, with respect to some of the metrics, you can see that the first rc1 has something like 12,000 commits coming in, almost 10,000 files changed. The number of lines per commit was 57, which is quite large, and you can see that it systematically goes down — with a little hiccup at rc4.
I think it's rc3 or rc2 of 4.13 that is a little bit of an outlier — but of course it's not a perfectly consistent process; there might be a reason for that. Still, you can very nicely see that the number of files being changed and the number of commits systematically go down. How does it look in the long run? That would be linux-next. This is multiple linux-next versions — I think it's 4.0, 4.1 and 4.2, 4.3 on the slide, but it doesn't matter — and you can see that the shape of these integration curves does not change over time. So we have a very consistent process in place, and that's the reason why we say we actually can apply a statistical method. So that's where the negative binomial comes in.

Okay, another way of looking at the consistency of the process: this is just the timeline. It's not a very reasonable x-axis, because it's in seconds, but it was just easier to plot it this way. We take the starting commit, plot it in seconds, and you can see that the starting points lie really close to a perfect line — if you do a linear regression on that, you'll probably get something like an R-squared of 0.99 or so. And you can also see that there is some coupling, a vertical coupling, if you look at where these points sit, which we will then use to strengthen our prediction model.

So what does the prediction model look like? If you look at the literature, most of it will say, well, bug processes are Poisson processes. But we have massive overdispersion in the Linux kernel, because a lot of the bug findings come from review and analysis, and independence doesn't really hold: you find one bug and then you find three others that are related and fix them in a set of bug fixes. So these stable fixes violate some of the basic assumptions, and we handle that by adjusting for overdispersion, by moving to a negative binomial model. You can see some other effects here, like the last few data points — which are still valid — going down to almost zero: that's from LinuxCon in Berlin to early 2017, when everybody was on holiday, so no bugs got fixed. The next data point was back in line; I didn't put it on the slide, but it still fit into my 95% confidence interval, so I'm happy. So that's basically what we're doing: we're trying to predict the stability of the system. If we do this over many kernel versions, we can actually see significant differences.

How are these kernel versions coupled? This is the stable bug fix count over kernel versions. The line going through on top is the 4.4 kernel, then 4.5, 4.6, 4.7, 4.8, and at the very beginning 4.9, which is also a long-term stable kernel. I think the coupling here is quite obviously visible. It doesn't mean it's the same bugs, but it's very probable that it is primarily backports, or bugs that affect all kernels. So we have this strong coupling of the process, and this allows us to extract properties across kernel versions. And this is what that looks like. If you look at the slope, which is the most interesting parameter, you can see that it systematically turns from positive to negative. The 4.9 value is of course not sound, because it's based on 5 degrees of freedom, 6 data points — that's not something you can use for any prediction. But the others are quite sound, and the confidence intervals and the parameters look really good. So we can say that the Linux kernel development process is systematically improving over time, and that is a critical statement for a safety-related system.
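To make the overdispersion point concrete — a sketch of the standard relationship, not the project's exact parameterization: a Poisson count model forces the variance to equal the mean, while the negative binomial adds a dispersion term, which is what clustered stable fixes require.

    \text{Poisson:} \qquad \operatorname{Var}(N) = \mathbb{E}[N] = \mu
    \qquad\qquad
    \text{Negative binomial:} \qquad \operatorname{Var}(N) = \mu + \frac{\mu^{2}}{\theta}, \quad \theta > 0

Here theta is the dispersion parameter of the usual NB2 parameterization; as theta grows large, the negative binomial reduces to the Poisson case.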
Okay, I can do it in 4, 5 minutes — two more slides. Okay, same thing for subsystems. Note that the confidence intervals for subsystems of course become much larger, because we don't have as many data points. But we can look at individual subsystems and predict roughly two years into the future what the bug trends would be and how many residual bugs we have in the system. I assure you this is one of the nice ones — if I showed you a file system, you wouldn't think it's so nice. So what do we do with this per-subsystem data? We then look at how much of each subsystem we are actually using in our specific configuration. For drivers you can see it's only around 1.5%; for arch it's 3.75% or so; for other subsystems, like kernel itself, it's a really large portion, like 60%. And so we can now take these predictions over the entire kernel, apply our stratification by what we're actually using, and extract our total residual bug rate — the expected cumulative residual bugs in our kernel configuration — which currently is around 23 plus or minus 6. Not bad, actually.

Okay, conclusions. Basically all the components that we need are there, and we think that this code-quality, basic-statistics approach is suitable for safety integrity level 2. Don't try ASIL D with this, please. No SIL 3, no SIL 4 — that's just not reasonable. The processes that we need to automate this are still being worked on, but it can be automated. From what we have seen up to now, and showed also to our certification authority, which is TÜV Rheinland, we are quite convinced that we actually can qualify a constrained, somewhat limited subset of Linux to safety integrity level 2. Don't try to use a full-featured, all-singing, all-dancing Linux for safety, please. Okay, that's it. Thank you. So yeah, that was my strategy: talk too long so there's no time for embarrassing questions. But I'm around — if there are questions, please do come and ask. I guess we have none now.