Hi, everyone. My name is Rachel Sibley. I'm the automotive QE technical lead for the automotive program at Red Hat, and here with me is Priyanka Verma, who also works with me on the initiative. We're going to talk to you today about how to qualify a safe Linux distribution in cars.

A little bit about the agenda: I'll talk about what the Red Hat In-Vehicle Operating System and Safe Linux are, our overall approach to Safe Linux, and how we're working towards compliance with ISO 26262. We'll go into the test strategy: freedom from interference, failure mode and effects analysis, the process aspect of it, how we're managing requirements based on man pages, and the test assets, traceability, and work items that go into that.

So what is the In-Vehicle Operating System? It's a smaller-footprint RHEL. We inherit everything from RHEL except the kernel, which is the only package we're rebuilding, for hardware enablement; we're using the RT kernel. It's based on OSTree, which is very common in embedded systems. We're working to achieve functional safety certification to conform to the ISO standard. The standard wasn't really made for pre-existing complex software, but there is an initiative to adapt ISO 26262 through an ISO PAS to accommodate that better. We're working with a partner, the consulting body Exida, and they're helping us achieve continuous certification for FuSa.

So what is FuSa, or functional safety? It's the absence of unreasonable risk that could lead to harm, injury, or even death out on the road. We want to ensure that we did everything in our power to avoid potential harm to the user, and doing that means compliance with this international standard. We have safety goals that are derived from potential hazards, and we provide technical solutions to be able to react to faults. A big part of it is that it's very process-oriented: a rigorous, detailed flow towards the functional safety certification. A big aspect of that is providing evidence, so that if we get pulled into court later on, or there's an audit, we have the evidence to back up what we claimed we set out to do.

You'll hear the term ASIL a lot throughout the talk. ASIL stands for Automotive Safety Integrity Level, A being the lowest level up through D, the highest level of hazard. ASIL A would be something like your rear light, which is very unlikely to cause somebody to be in an accident and be harmed in any way, whereas ASIL D would be something like the airbags failing to deploy. This slide is a high-level view of the different capabilities in a car and how they relate to the ASIL levels; for us, we're certifying against ASIL B. ASILs are determined by three factors: probability of exposure, controllability by the driver, and the severity of the failure.
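As an illustration of how those three factors combine, here is a minimal Python sketch of the ASIL risk graph from ISO 26262-3 (severity S1 to S3, exposure E1 to E4, controllability C1 to C3). Encoding the determination table as thresholds on the class sum is a common shorthand; the function and the example values are ours, not part of any Red Hat tooling.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Map S/E/C classes to an ASIL via the ISO 26262-3 risk graph.

    severity: 1-3 (S1-S3), exposure: 1-4 (E1-E4), controllability: 1-3 (C1-C3).
    The determination table is equivalent to thresholds on the class sum.
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("S/E/C class out of range")
    total = severity + exposure + controllability
    # 10 -> ASIL D (worst case), 9 -> C, 8 -> B, 7 -> A, lower -> QM only
    return {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}.get(total, "QM")

# Rear light: low severity, high exposure, easy for the driver to handle.
print(determine_asil(severity=1, exposure=4, controllability=2))  # ASIL A
# Airbag failing to deploy: worst case on all three factors.
print(determine_asil(severity=3, exposure=4, controllability=3))  # ASIL D
```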
This next slide has a lot going on, but I'll try to summarize it. Throughout the specification, the ISO mentions the V-model quite often. This is the verification and validation process, derived from the waterfall model. It isn't something we typically follow within RHEL QE, so we take what we're already doing well and complement it with the various aspects of the V-model. The safety analysis is really what identifies how much complexity and rigor we put into this; that's the FMEA, the failure mode and effects analysis, which Priyanka will get into a little later. On the left side of the V are the verification activities, where we do our system design and requirements analysis, and on the right side we do our iterative testing at the unit, integration, and system levels, and so forth.

For the In-Vehicle OS, we're leveraging all of the same tests from RHEL. They're already doing a really good job there, and we don't want to duplicate the effort. They in turn take a lot of tests from their upstreams (RHEL being our upstream, and so on further upstream). So we take all of those tests and rerun them in our environment, which is an OSTree environment. We're also working on adapting the tests to a new test framework. The tests weren't really designed to run against anything but a traditional RPM compose, so there's a bit of work to make them work on an OSTree system: the filesystem isn't freely writable, and instead of DNF we're using rpm-ostree, and so forth. And then there's additional work to adapt the tests to the framework so they run in our CI systems.

The requirements are derived from the ASIL B APIs. For each of these APIs, we need to derive test cases: targeted tests that cover the requirements 100%, which I'll talk about in a bit. But we want to reuse the technologies and tooling within RHEL QE; we don't want to fork their tests, and we don't want to duplicate what they're already doing well.

We have two main pipelines where we run our tests. One is the AutoToolchain CI pipeline, which is specific to automotive, building custom images and running the tests against them in the pipeline; there we're using Testing Farm and tmt. There were other talks about those earlier that you might have attended, but tmt stands for Test Management Tool and is heavily used within RHEL QE. The other is CKI, the kernel CI effort going on at Red Hat. They have high engagement with the upstream kernel community, testing upstream kernel trees very early in the development cycle, at the patch level, and we want to reuse that as well. And then the other effort is providing the traceability in Polarion, which complies with the ISO standard.

For requirements testing, we have package-level verification, and this has to happen per API. There's a very detailed workflow that goes into this, starting with code reviews and static code analysis, structural coverage (the code coverage analysis), and requirements-level verification, which is derived from the man page of the API. We take an API, take its man page, and break it down into low-level requirements: anything that's ambiguous gets broken down into testable parts, to ensure we have traceability for all the behaviors specified in the man page. And if we find any discrepancies in the man page, it's also a chance for us to file a merge request with those changes, to make sure the specification, i.e. the man page, actually matches what the implementation is designed to do.
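As an illustration of what a man-page-derived low-level requirement test could look like, here is a minimal sketch against strtol(3) via ctypes. The requirement ID is invented, strtol is just an example API rather than one confirmed to be in the safety scope, and the LONG_MAX value assumes a 64-bit (LP64) Linux system.

```python
# A sketch of a targeted "low-level requirement" test derived from one
# behavior stated in a man page. Requirement ID and API choice are
# illustrative, not from the actual RHIVOS safety scope.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.strtol.restype = ctypes.c_long
libc.strtol.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_char_p), ctypes.c_int]

ERANGE = 34            # Linux asm-generic errno value
LONG_MAX = 2**63 - 1   # assumes LP64, i.e. a 64-bit system

def test_llr_strtol_001_overflow():
    """LLR-strtol-001: on overflow, strtol(3) returns LONG_MAX and sets
    errno to ERANGE, per the RETURN VALUE section of the man page."""
    ctypes.set_errno(0)
    result = libc.strtol(b"999999999999999999999999", None, 10)
    assert result == LONG_MAX
    assert ctypes.get_errno() == ERANGE

test_llr_strtol_001_overflow()
print("LLR-strtol-001 verified")
```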
Then there's integration-level testing, which comes out of the safety analysis I mentioned earlier, the failure mode and effects analysis. There are specific failure modes identified within an API and its external syscalls and dependencies, and we need to ensure that our existing package-level testing covers the integration level as well. And then again, there are more details about putting that into Polarion: adding the requirements, linking requirements to test cases, test cases to test runs, and so forth. Priyanka will talk about that in a bit.

I mentioned code coverage analysis earlier. The ISO recommends that the software be verified using code coverage analysis, and the target is 100% compliance for the recommended quality metrics defined in the specification. We don't have to do this for every API, only for the complex APIs, though we also use it to assist with the simple APIs, to ensure we have 100% requirements coverage. Within the specification you'll see "recommended" and "highly recommended" for all of the techniques within the V-model; for structural coverage, it recommends both statement and branch coverage.

For the code coverage workflows, we're trying to produce more granular reports that are specific to the API level, rather than running against the entire package source, so we can do detailed analysis and drill down into a specific API to see where our gaps are. The low-level requirements also have code coverage analysis showing the traceability of tests to requirements, and the test coverage there. That helps us understand where our gaps are in code coverage and, therefore, requirements coverage, so we know where we need to develop new tests, push them upstream, and pull them back down into RHEL.
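As a sketch of what such an API-scoped coverage run might look like, here is a small wrapper around gcovr, which supports source filtering and fail-under thresholds. The build layout, filter, and API choice are illustrative, not the actual RHIVOS setup.

```python
# Run gcovr over an instrumented build, but filter the report down to the
# sources behind a single API instead of the whole package. Paths and the
# API name are hypothetical.
import subprocess
import sys

def api_branch_coverage(build_dir: str, source_filter: str) -> int:
    """Run gcovr restricted to one API's sources, demanding 100% coverage.

    gcovr exits non-zero when coverage is below the --fail-under thresholds,
    which lets CI flag the gap for analysis or a documented justification.
    """
    cmd = [
        "gcovr",
        "--root", build_dir,
        "--filter", source_filter,     # e.g. only stdlib/strtol.c
        "--branches",                  # branch coverage, as the ISO recommends
        "--fail-under-line", "100",
        "--fail-under-branch", "100",
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # Assumed layout: an instrumented build tree whose .gcda files were
    # already produced by the targeted requirement-level tests.
    sys.exit(api_branch_coverage("./glibc-build", r"stdlib/strtol\.c"))
```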
Another major aspect of functional safety for road vehicles is freedom from interference, which is about the absence of cascading failures. As you can see in the image, a cascading failure is a failure that cascades from one element to another. For example, an event causes a fault in element A, which makes element A fail, and the failure of element A causes a fault in element B, which then fails as well. Freedom from interference is mainly about avoiding these cascading failures. It's also linked to the ASILs: there should be freedom from interference between lower and higher ASILs, and from a quality-managed (QM) element to any ASIL (A, B, C, or D).

So how do we ensure freedom from interference? First, we analyze. There is DFA, dependent failure analysis. This covers the cascading failures, which relate to freedom from interference, and also the common cause failures, which relate to independence. A common cause failure, as the name suggests, is due to a common cause that fails element A and element B at the same time. These failure analyses help us figure out the failure modes that could lead to failures, and we try to mitigate them. Dependent failure analysis comes in two categories: deductive analysis and inductive analysis. As Rachel mentioned, we are targeting ASIL B, and according to the functional safety standard for road vehicles, for ASIL B it's highly recommended to do inductive analysis, which is FMEA, failure mode and effects analysis.

Under this exercise, we brainstorm and list the failure modes that could apply to a particular component, and then further analysis happens where we calculate the risk priority number. The risk priority number is simply the product of severity, occurrence, and detectability. Once we have that number, we have the data for the failure mode, and then comes the chance to mitigate it: we apply technical solutions to mitigate the failures we've listed in the FMEA. After applying a mitigation, the risk priority number is calculated again and checked against the acceptable range. And from the mitigations, finally, there is one more way of deriving requirements that comes into the picture: requirements derived from failure modes.

Now comes the process aspect, which is mainly about evidence and traceability. Evidence means we have to prove that we did what we were supposed to do according to the functional safety standard: that we tested a requirement, that a failure mode was covered by its mitigations, that we respected all the timing constraints such as the fault tolerant time interval, and a lot more. For QE, this evidence can be our test assets: our test plans, our test specifications, and our test reports that tell us when we tested a requirement, how we tested it, which techniques and methods were used, what the results were, and which tools we used. We have all the data if you want to retrieve it.

And all of that data needs to be traceable. We should be able to tell which test is for which requirement and which result is for which requirement, whether a failure mode has the right requirement associated with it, and whether it was tested fully or only partially. Traceability is one of the things that adds value to the evidence; evidence that is not traceable is not valuable.

So how do we achieve it? We're doing it with Polarion. We're using Polarion in different ways, as you can see: for our test case management, for writing our technical safety concept, for managing our requirements, and, something that's still in progress, for configuration management, tracking our change requests, whether they relate to the tools, the process, or the requirements. And lastly, we use it for the metric reports, like the traceability matrix and the pass/fail metrics per requirement.

Now, how do we derive the low-level requirements and the associated conditions of use? As Rachel said, we take the man pages of the APIs in the safety scope as our reference. First we analyze the man page, and from it we derive the low-level requirements that are testable. Then something else comes into the picture: the assumptions of use, or conditions of use. These apply when the fulfillment of a low-level safety requirement depends on the context. For example, I define a requirement for an API, but it only holds true if my system is 64-bit. That becomes my assumption of use: the requirement is only verified, and only passes, if you use the right environment. So those are the assumptions or conditions of use; verification then happens against them, and that's how we move forward.
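To make the 64-bit example concrete, here is a minimal sketch of how such an assumption of use might gate a test, assuming a pytest-style suite on LP64 Linux; the requirement ID is invented for illustration, and pytest is just one way such a gate could be expressed.

```python
# An assumption of use as an executable gate: the hypothetical requirement
# LLR-xyz-004 is only claimed for 64-bit systems, so the test carrying its
# evidence is skipped, not failed, in other environments.
import struct
import pytest

IS_64_BIT = struct.calcsize("P") * 8 == 64  # pointer width in bits

@pytest.mark.skipif(not IS_64_BIT, reason="AoU: requirement assumes a 64-bit system")
def test_llr_xyz_004_range_behaviour():
    # The body would exercise the API behavior stated in the low-level
    # requirement; a placeholder assertion stands in for it here.
    assert struct.calcsize("l") == 8  # e.g. long is 64-bit under LP64
```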
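And going back to the risk priority number from the FMEA step for a moment, the arithmetic looks like this. The 1-to-10 scales, the acceptance threshold, and the numbers are illustrative, not values from the actual analysis.

```python
# RPN = severity x occurrence x detectability, re-evaluated after mitigation.
# Scales and threshold are illustrative (1-10 each, accept when RPN <= 100).
from dataclasses import dataclass

ACCEPTABLE_RPN = 100

@dataclass
class FailureMode:
    name: str
    severity: int       # 1 (negligible) .. 10 (catastrophic)
    occurrence: int     # 1 (rare) .. 10 (frequent)
    detectability: int  # 1 (certain to detect) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detectability

# Before mitigation: a hypothetical failure mode for an allocator API.
fm = FailureMode("returns pointer to insufficient memory", 9, 4, 7)
print(fm.rpn, fm.rpn <= ACCEPTABLE_RPN)               # 252 False -> mitigate

# After a technical mitigation, occurrence and detectability improve, and
# the mitigation itself becomes one more source of derived requirements.
mitigated = FailureMode(fm.name, severity=9, occurrence=2, detectability=4)
print(mitigated.rpn, mitigated.rpn <= ACCEPTABLE_RPN)  # 72 True
```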
Next, when we talk about man pages for deriving requirements, the first thought that comes to mind is that man pages are subject to change, because they're maintained upstream, right? So how do we detect and handle those changes? We have an application that runs and gives us a diff if there is a change; a CI job is then automatically triggered and a merge request is raised. There the analysis happens that tells us whether the change relates to an API in our safety scope. If it's not in the safety scope, the MR is automatically merged; if it is in the safety scope, the MR is held pending review.

So what happens when an MR is pending review because of a change? Our change management workflow comes into the picture. A change request is opened and an impact analysis takes place. In that impact analysis, we take into account the technical impact, the schedule impact, and the different work products that will be affected by the change, whether it affects only documentation, or testing, or development as well. Based on that impact analysis, a change control board approves or rejects the change. Once the board approves the change to the man page, the related requirement gets the updated text, hence the implementation changes, followed by the V&V activities, that is, validation and verification.

Now, how does the traceability aspect fit into the complete picture? We have low-level safety requirements, we have failure modes, and we have man page based requirements. For traceability, according to the functional safety standard, we need bidirectional traceability between the different hierarchical levels of requirements. For example, from a lower-level requirement I should be able to trace back to the top-level, or parent, requirement, and from there I should be able to come back down to the related or derived low-level requirement. The requirements should likewise be traceable to the test specifications that verify them, the test results, and hence the test plans and test reports: basically all of our test assets. And the same goes for the failure modes; they should be traceable to the test plan they were planned in, the test specifications, the reports, and more.

This is an example traceability report, and how it finally looks. What you see at the top level is the man page based requirement, which is further associated with its lower-level requirements; the failure modes are the blue icons there, and the exclamation mark is for the assumptions or conditions of use. Below that you can see the low-level requirement associated with its test case, so we can see which test case verifies which requirement, and how the requirements are bidirectionally traceable to the failure modes. So that's an example.
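To make the bidirectional traceability concrete, here is a toy sketch of those links. The item IDs and the data shape are invented for illustration; this is not Polarion's data model.

```python
# A toy traceability graph: every link is stored in both directions, so you
# can walk from a low-level requirement up to its parent and down to the
# test cases, failure modes, and results that reference it. Invented IDs.
from collections import defaultdict

links: dict[str, set[str]] = defaultdict(set)

def link(a: str, b: str) -> None:
    links[a].add(b)
    links[b].add(a)  # bidirectional, as the standard requires

link("MANPAGE-REQ-strtol", "LLR-strtol-001")   # parent -> low-level req
link("LLR-strtol-001", "TC-strtol-overflow")   # requirement -> test case
link("TC-strtol-overflow", "RUN-2024-w12")     # test case -> test run/result
link("LLR-strtol-001", "FM-strtol-erange")     # requirement -> failure mode

# Walking out from any item recovers the rest of the chain of evidence.
print(links["TC-strtol-overflow"])  # {'LLR-strtol-001', 'RUN-2024-w12'}
print(links["LLR-strtol-001"])      # parent req, test case, failure mode
```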
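And returning to the man page change detection described a moment ago, here is a rough sketch of that gate, under the assumption of a simple file-based scope list; the page names, paths, and outcome labels are hypothetical stand-ins for the real CI.

```python
# Sketch of the man page change gate: diff the old and new page, then either
# auto-approve (page not in the safety scope) or hold the MR for review and
# the change management workflow. Scope contents and paths are hypothetical.
import difflib
from pathlib import Path

SAFETY_SCOPE = {"strtol.3", "malloc.3"}  # illustrative ASIL B API pages

def review_manpage_change(old: Path, new: Path) -> str:
    diff = list(difflib.unified_diff(
        old.read_text().splitlines(),
        new.read_text().splitlines(),
        lineterm="",
    ))
    if not diff:
        return "no-change"
    if new.name not in SAFETY_SCOPE:
        return "auto-merge"            # change is outside the safety scope
    # In scope: the MR stays pending and a change request is opened for
    # impact analysis (technical, schedule, affected work products).
    return "pending-review"
```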
Any questions? Yes, please. [Audience question: does the standard cover the software, or the hardware?] Yes, actually the standard covers all three levels: the software level, the hardware level, and the system level, which is software plus hardware. Currently we are concentrating on the software part, and maybe in the future, if the lifecycle permits, the hardware and the software-hardware interfacing as well.

Yeah, gcov, and gcovr as well. The question was which tool we're using for code coverage analysis: gcov and gcovr. A lot of the work we're doing for the safety scope, a big part of it, affects the glibc package, for example, so there's a very large upstream test suite that we're rerunning for RHIVOS, and we're trying to break it apart at the unit level to show the traceability to a specific failure mode, or a specific package-level test against the API. So new tests are being developed and staged, and we're coordinating with the SMEs on how to upstream those tests to the glibc project, so that RHEL QE can take advantage of them, pull them back down, and rerun them for RHEL as well.

The question was how we create this requirement hierarchy, with the high-level requirements from the man page and then the low level. There are actually even more levels beyond that. There are technical software safety requirements that are defined from our safety goals, and they trickle down to our man page requirements, which map to an API. From the API, we then look at the man page, look at the behaviors within it, and break it down into functional parts, because not every part of a man page is something you would test, for example. Those become low-level requirements, and those get fed into code coverage analysis to show that we have 100% code coverage against those low-level requirements. Now, we might not always get 100% code coverage; as long as we can provide justification for why a code path wasn't reached, that's allowed, but otherwise, for the complex APIs, we have to have 100% code coverage.

Yeah, where do we get our requirements from? The APIs, our ASIL B APIs, come from the OEM, which provides the list of APIs within the safety scope. Of course, we work with them to influence and guide them on which ones should be in the safety scope, but those come specifically from our customer, which is GM, who we're working with right now.

So the question was that some of the man pages are very minimal and point to other documentation. Was that the question? We will have to handle that, yes. There are the syscall wrappers and that sort of thing, so part of it will be redirected back to the actual API that's wrapping the call; there are different use cases depending on whether it's a syscall wrapper or the API itself. There are also different categories (very complex, complex, high, medium, low), and a different rigor is applied depending on the complexity of the API, but eventually those are going to be broken down into two classes, simple and complex, once we get to a point where we can do that classification.

Well, thank you very much.