Hello, I'm Steve VanderLeest from The Boeing Company. My co-presenter is Peter Brink from Underwriters Laboratories. The title of our talk is Debating Linux in Aerospace: Objections and Paths Forward. In this talk, we'll give an introduction, then discuss some of the assurance concepts that are important for assuring software in aerospace, and then look at a few of the standards. We'll talk briefly about the benefits of using open source, and then walk through, in a debate style, a number of objections to using Linux in safety-critical avionics.

Hi, my name is Pete Brink. I am an automotive functional safety engineer working for Underwriters Laboratories. When Steve proposed this idea of having a friendly debate, I was interested because I actually started my career working on jet engine control systems, and I'm familiar with how avionics software works, especially in a safety-critical context. This discussion was important to me because I am active in the Enabling Linux in Safety Applications (ELISA) community, working with the open source engineering process team.

My motivation for this talk is thinking through how we might use Linux in higher-assurance, safety-critical software in aerospace. There are some challenges to doing so, and we thought that working through how we might address those challenges would make a helpful presentation. Before we get into the back and forth of objections and possible answers, we want to go through a few of the concepts that are important in aerospace for safety-critical software development, and some of the standards that form the regulatory environment we must work under.

First, let's define two terms. Reliable software means a system that does the right thing: it does what you specify, acting according to the behavior you require. A safe system is the complement: it doesn't do the things you did not want. That is, there are no unexpected behaviors, no unanticipated actions coming out of the software.

In aerospace, we have the idea of several different levels, called design assurance levels or software levels. The most critical is level A: if something goes wrong with this software, fatalities could occur, so it requires the most rigorous analysis and demonstration, through evidence, that the software is correct. The least critical is level E: errors in this software don't result in any disruption to the crew, and therefore it doesn't require any formal evidence. Level A software might be, for example, the software that forms the autopilot of the plane. Level E might be the software running the coffee maker or the entertainment system, where it doesn't impact the safety of the system. Although maybe not giving coffee to the pilot could be considered an issue, generally, yes: level A is the most rigorous, and then down through B, C, and D to level E, the least rigorous.

One of the key things we look for when doing validation and verification of a safety-critical system is that we are following the appropriate processes that will minimize or eliminate the systematic error we have the opportunity to build into the system. There are two terms to look at here. The first is validation, which is the mechanism by which we demonstrate that we built what we set out to build. The second is verification.
Verification is the set of processes we go through to demonstrate that each of the steps we follow throughout the safety-critical development lifecycle is in fact being performed, and is being performed in accordance with the quality or safety framework we've established. We do all of those things so that, at the end, we can demonstrate that the requirements, the things that define what we set out to build, are in fact in place. And so we have what's called a requirements traceability matrix spanning each phase of the development lifecycle: we start with the system requirements; we write the software requirements based on those and in their context; and from those software requirements we are expected to develop an architecture, a design, and code that reflect and demonstrate that those requirements have been fulfilled, with the ultimate goal of a set of tests demonstrating that the requirements are completely satisfied. And of course, the requirements themselves must be testable.

Another part of making sure software is safe and reliable is checking that our tests have exercised the software. Level D does require testing, testing against your requirements, but it doesn't require that we demonstrate coverage; all the levels above it do. At level C, we have to show statement coverage: our set of tests must have executed every line of code. If there's any code that was not exercised by the tests, some justification analysis is required. For example, you might have some defensive coding that can't normally be tested, and that would be the justification for the tests not covering it. At level B, we not only have to cover every statement, but we have to show that every branch has been taken for every decision. And at level A, we have to show that each individual condition within a decision independently affects the outcome of that decision, known as modified condition/decision coverage, or MC/DC. (A short code sketch below makes these criteria concrete.)

Next, let's look at some of the standards in aerospace. There are a number of them that form the regulatory environment under which we must certify software, or a system carrying that software. On the U.S. side, the Federal Aviation Administration (FAA) has published an advisory circular which specifies the different means of compliance for showing how one can assure software. That advisory circular points to what is probably the key document in aerospace: DO-178C. In Europe, the parallel standard is ED-12C. It's titled Software Considerations in Airborne Systems and Equipment Certification. This document is the standard that guides how we develop software for aerospace systems that will be flight certified. It was published a little over a decade ago, and it has four companion documents, DO-330 through DO-333, that address specific topics. For example, DO-330 covers the tools we use to develop software, such as a compiler or an automated test framework, and whether those also need to be verified, or, as we say in this case, qualified. DO-297 addresses how to approach integrated modular systems, and this is sometimes referred to as partitioning. And on the U.S. military side, MIL-HDBK-516C is one of the documents that guide how one would write software, particularly Chapter 15 of that standard.
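Here is that coverage sketch: a minimal, hypothetical C function, invented purely for illustration, showing what the three structural-coverage criteria demand.

```c
#include <stdbool.h>

/* Hypothetical decision with three conditions: engage a backup mode
 * when the system is armed and either sensor reports a fault. */
bool engage_backup(bool armed, bool fault_a, bool fault_b)
{
    if (armed && (fault_a || fault_b))  /* one decision, three conditions */
        return true;
    return false;
}

/*
 * Statement coverage (level C): every statement executes. Two tests
 * suffice, e.g. (1,1,0) -> true and (0,0,0) -> false.
 *
 * Decision coverage (level B): the decision must evaluate both true
 * and false; the same two tests happen to achieve that here.
 *
 * MC/DC (level A): each condition must independently flip the outcome
 * while the others are held fixed. One sufficient set of four tests:
 *   (armed, fault_a, fault_b) = (1,1,0) -> true
 *                               (1,0,1) -> true
 *                               (1,0,0) -> false
 *                               (0,1,0) -> false
 * Rows 1 and 4 isolate armed, rows 1 and 3 isolate fault_a, and
 * rows 2 and 3 isolate fault_b.
 */
```

For a decision with n conditions, MC/DC can typically be shown with about n+1 well-chosen tests, far fewer than the 2^n combinations that exhaustively exercising the decision would require.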
Continuing with the standards, on the European side there's a document that just recently came out, AMC 20-193, that guides how we use multi-core processors. We expect a U.S. version of this document to come out fairly soon from the FAA; as of this recording it had not yet been published, so on the U.S. side we would probably refer to the CAST-32A position paper that talks about multi-core. There are also some system-level documents from SAE, ARP4754A and ARP4761, which address safety at the system level but also describe how safety requirements get allocated to hardware and software. And so this flows down into software development under DO-178C.

So with that background on assurance concepts and a quick overview of the applicable standards, why might we want to try to use an open-source product like Linux for aerospace? Well, there are quite a few benefits; we'll just mention a few of them here. First, in safety-critical domains, it's fairly important to have peer review, expert review of the software, to ensure it is suited to its purpose and is safe, correct, and assured. Using open-source-licensed software like Linux gives us better visibility, and so you get much broader review. Review by experts and review by regulatory authorities is easier because the source is open.

Another benefit is crowdsourcing. When new technology comes along, when a better way to approach something comes along, when a fix to an existing problem comes along, those get incorporated more quickly. You simply have access to a broader array of experts and perspectives contributing to the software: the crowdsourcing effect.

And then lastly, because Linux is so ubiquitous, it has a very well-known API. Linux is used across many, many different industries; it's really one of the most commonly used operating systems in the world. That gives us a couple of benefits. First, we should be able to find competent developers more easily, because most students studying computer engineering or computer science will have used Linux already; they're familiar with it. And it will be easier for others to understand the code we're developing if it runs in a familiar environment and uses Linux as its operating system.

So we've heard from Steve about the specific advantages and opportunities that open source represents in terms of how we might use it in aerospace. In the safety community, we have some specific objections, and those are listed here on the screen: that Linux doesn't have certification artifacts; that it doesn't protect the code; that the design of the Linux kernel doesn't specifically support safety; and that the development culture around open source doesn't fit within the bounds of what we expect in a safety culture.

To the first objection, let me be very clear up front: there is nothing inherently wrong with the mechanism by which Linux is developed and works. It's just that there are specific issues when we look at it from a safety or quality context, which is why this objection is here. Specifically, the first issue is that Linux was not designed up front. Linux grew organically; there was a general idea of what it was supposed to do and the job or role it was expected to fulfill, but from a quality and safety context, we really want that defined up front, so that when we go through the development process we can demonstrate that the architecture and the design are there specifically to fulfill the requirements that we set out in the first place.
So what does Linux lack? First, a set of requirements. That's a key problem, because when we talked previously about validation, one of the key things we're looking for is a mechanism whereby we can demonstrate that the product does what we want; we want a set of tests demonstrating that all of the requirements have been fulfilled. Second, it lacks a specified architecture, which is another key piece and another key problem, because what we're looking for is a mechanism whereby we can perform a safety analysis against the architecture as presented. In order to do that, we have to have that architecture in place to analyze against. And the last of these is a unit design. When we do the architecture, we're looking to demonstrate that we have a set of functional blocks and their interactions, and the unit design is there to demonstrate how those individual blocks are designed and how they fulfill the interfaces specified in the architecture.

Although Linux was not designed according to a single set of requirements and processes, to a rigorous standard to which every developer adhered, it does have a design and an architecture; they are just more emergent, really a crowdsourced effect that has produced, over decades, a fairly well-respected and well-designed system. However, for critical domains, we need to demonstrate that that design is suited to its purpose, that it's correct and safe. So how can we address these challenges? The path forward probably involves reverse engineering the certification artifacts, that is, the evidence, from the source code, and also forward engineering from the particular system requirements for a particular aircraft, to show that Linux provides the functionality required and does so in a way that's safe and reliable. Weaving these together in a way that shows the system objectives have been satisfied, that the software is safe and reliable, is a way to start providing the evidence that's required in regulatory environments like aerospace. We might also look at what an appropriate coding standard would be and then apply that to the Linux code being used for the aircraft, and perhaps identify issues that the community would want to improve, and address those problems (a brief illustration of this follows below). We could identify the architecture, perhaps encouraging the open source Linux community to start coalescing around what that architecture looks like, simply documenting what's been emergent over the decades. And finally, we could perform different safety analyses to identify any potential flaws or areas where we want to make improvements to the safety of the system.

The next objection concerns the mechanism of how Linux is developed: that Linux doesn't protect the code. What we look for in a traditional safety development lifecycle, or in a quality development lifecycle, is a specific mechanism whereby we can do an analysis on the code itself, and not just the code but the architecture and the requirements, to determine the scope of what a particular change is going to do.
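As a quick aside on the coding-standard point: here is the sort of construct such a standard is meant to catch, sketched as a hypothetical C fragment (the rule set is assumed to be something MISRA-C-like; the talk doesn't name one).

```c
/* Hypothetical sketch of the kinds of constructs a coding standard
 * is meant to flag before code ever reaches review. */

/* Non-compliant: the missing break lets case 0 fall through into
 * case 1, and there is no default clause for unexpected inputs. */
int set_alarm_bad(int level, int *alarm)
{
    switch (level) {
    case 0:
        *alarm = 0;   /* falls through: case 1 runs too, so *alarm ends up 1 */
    case 1:
        *alarm = 1;
    }
    return 0;
}

/* Compliant: one break per case and a defensive default, the same
 * defensive-coding idea mentioned earlier under statement coverage. */
int set_alarm_good(int level, int *alarm)
{
    switch (level) {
    case 0:
        *alarm = 0;
        break;
    case 1:
        *alarm = 1;
        break;
    default:
        return -1;    /* deliberately handle out-of-range inputs */
    }
    return 0;
}
```

Static analysis tools can enforce rules like these mechanically, even across a codebase as large as the kernel.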
As a consequence, with an architecture in place, we have a mechanism to determine what the scope of a change is going to be and what its impact is; that's why it's specifically called an impact analysis against that architecture and that code, so that we can determine what the scope of the change actually is. The fact also remains that there are specific competency requirements, most of the time, when we talk in a safety-critical context: the software engineers are expected to be trained and knowledgeable in the safety standards, and then to follow the specific coding guidelines and design guidelines that define how we are expected to put the overall architecture, the system, together. In Linux, by contrast, anyone can write a driver. So we're limited to whoever is out there, and there essentially isn't any confidence, or at least no demonstrated confidence, that the driver produced is going to comply with what the safety standards might expect.

Well, it is true that anyone can propose an addition to the Linux kernel. It is not true that anyone can just put their code in. There is a system of maintainers who check the quality and sufficiency of the code as it's submitted; it is examined, considered, and, eventually, if it's shown to be sound and tested, incorporated into the mainline of Linux. For the aerospace industry, one way to formalize this is to provide a curated baseline profile, that is, to configure Linux for the particular use, including only the code that's necessary, through configuration, and then maintain that in the company's own configuration management system, for example within a Git repository. That configuration would include the compiler directives for how to build the system and the configuration of which drivers are included or excluded. Since that is now a snapshot of the particular configuration of Linux on the aircraft, it can be maintained, managed, and curated.

Objection three is specifically about the design of Linux, the nature of what the Linux kernel actually looks like. This goes back a little to the idea that anybody can build a driver and anybody can make a modification to the kernel. Specifically, the kernel is monolithic, which means that the kernel and all the drivers execute in the same privileged space; on an Intel architecture, that would be ring zero. So what would happen to the kernel if a driver misbehaved? Say the driver has a wild pointer: it can overwrite memory pretty much anywhere, not just in the kernel, but in some nominal application or function executing in user space as well, because the kernel has access to all of physical memory.

If we look at how safety operating systems are typically organized, they usually have a kernel that is isolated unto itself; it is the only thing that executes in privileged space, and all of the drivers are expected to execute in user space. The kernel, during initialization or afterward, grants drivers and applications access to system resources: memory, CPU time, peripherals, or other communication media through which they interact with other applications or the outside world.
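To picture the wild-pointer hazard Pete describes, here is a contrived C sketch; the driver routine and its register structure are hypothetical.

```c
/* Hypothetical device registers, as a driver might model them. */
struct device_regs {
    unsigned int status;
    unsigned int data;
};

/* Bug: this routine trusts a pointer that was never validated. */
void hypothetical_driver_handler(struct device_regs *regs)
{
    /* If regs is stale or corrupted, this write lands at an arbitrary
     * address. In a user-space process the MMU would fault and only
     * that process would die; in a monolithic kernel the driver runs
     * at full privilege, so the write silently corrupts whatever
     * happens to live there: kernel state or another application's
     * memory. */
    regs->status = 0;
}
```

The same bug thus has process-local consequences in user space but system-wide consequences in kernel space, which is exactly why the placement of drivers matters.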
Well, it is true that Linux is monolithic, in that the kernel is relatively large; much of the functionality is within the higher-privileged operating-system level. It is not necessarily the case, though, that every driver must be included in the kernel. There is the ability to put drivers in user space, where the kernel provides a memory mapping so that the driver can get to the addresses related to the device it's managing, and no others, providing some separation. (There's a short sketch of this at the end of this part.) The other path forward is to not include code that isn't necessary. Generally, at higher design assurance levels, we want less code that has to be proved, and so we leave out, through configuration, the code that isn't needed for the particular function on the aircraft. And then, for very high levels of assurance, one can also consider partitioning, for example through a hypervisor, where Linux is the guest operating system within one virtual machine or partition and functionality is separated out into different virtual machines.

Understand that this last objection is not a condemnation of how Linux works, because it's clear that it does work, and it's a strong mechanism for producing an operating system that's used around the world. But we don't really have the confidence in it that we need when we're dealing with safety-critical systems. I'm talking about the software that operates a jet engine control system and keeps it running, or the software that executes the brake controller on a vehicle. That's the kind of confidence we want to have as part of that production. So when we look at these items: Linux doesn't have a safety culture or a quality culture. Both of those are based around software engineering, and when I say software engineering, I mean something very specific: the engineering of software. It's more like systems engineering than anything else, because we follow a specific development lifecycle process: requirements, architecture, design, implementation, and then testing at all the different levels. Now, that doesn't mean it cannot be iterative, and it doesn't mean you can't skip specific steps based on the complexity or difficulty of doing them. But ultimately, what we want at the end is a mechanism whereby we can prove that everything we expect is in the code, that there isn't anything else, and that there isn't going to be anything in there that doesn't work the way we expect it to.

One way to handle the variety of code contributions coming into Linux, from a variety of developers who aren't necessarily working to a single standard or a single approach, is to use a number of tools. First, there are now automated scanning tools that look for security vulnerabilities, and this is something crowdsourcing can help with: identifying what those vulnerabilities are and what the appropriate fixes for them are. As I mentioned under the previous objection, curation, that is, managing a particular baseline, can help ensure that the version flying on the aircraft has been carefully evaluated, and that can be done by a team at an aerospace company working within a rigorous safety process. The result is a particular distribution: think of it, for example, as produced by Yocto from a set of build recipes that have been checked and managed to ensure they are correct.
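Here is that user-space driver sketch mentioned above: a minimal example using the kernel's UIO (Userspace I/O) framework. The device name, mapping size, and register layout are hypothetical, and error handling is trimmed for brevity.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* A UIO device the kernel exports for this hypothetical card. */
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0)
        return 1;

    /* Map memory region 0 of the device (offset 0 selects region 0).
     * The process can reach these device addresses and no others. */
    volatile uint32_t *regs = (volatile uint32_t *)mmap(
        NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED)
        return 1;

    /* A blocking read completes when the device raises an interrupt;
     * the value read is the kernel's running interrupt count. */
    uint32_t irq_count;
    if (read(fd, &irq_count, sizeof(irq_count)) == (ssize_t)sizeof(irq_count))
        printf("interrupt %u, status register = 0x%x\n",
               (unsigned)irq_count, (unsigned)regs[0]);

    munmap((void *)regs, 4096);
    close(fd);
    return 0;
}
```

Because the process only maps the regions the kernel exports for this device, a wild pointer in a driver like this faults the driver process rather than corrupting the kernel.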
That curation builds trust that the particular configuration on a particular aircraft is correct and safe. Once that baseline is established, one is not forced to take a new driver or a new software update immediately. An update can be evaluated, and you can wait, letting it prove out its correctness over time, and adopt features only after they have been deemed safe.

In conclusion, to support what Steve has said with regard to the possible paths forward: I agree completely, and in fact I am part of an organization within ELISA called the Open Source Engineering Process group, where we're looking specifically at what we as an organization can do to put in place the engineering processes that are necessary for us to be able to make any safety or quality claims about the development of the code.

And with apologies to The Lord of the Rings, one does not simply walk into aerospace using Linux. That is, there are multiple challenges that have to be addressed. We've mentioned some of them here, not necessarily a complete list, but hopefully it gives you a flavor for the sorts of things one might need to do to use Linux in aerospace. If this topic interests you, we'd encourage you to check out the ELISA Aerospace Working Group. This is a group of industry, government, and academic professionals gathering under the ELISA project to think about how one could use Linux in aerospace and tackle some of these challenges. I also encourage you to look for two papers on the use of Linux in aerospace at the upcoming IEEE Digital Avionics Systems Conference in October. Thank you for listening. We hope you will contact the authors with your questions, and we will try to respond to them in a timely manner.