Hi everybody, welcome to this presentation. What I'm going to talk about today is: where do security and safety meet? First, a little bit of introduction. I'm assuming that the people attending this session have a good, solid background in security, given that this is the Linux Security Summit, and some notions about safety; I'd like to dispel some myths and get to where the two really can meet. The focus of this session is on engineering aspects: how we can design Linux-based systems, what types of features are relevant for making those systems safer, and where we can take lessons from work that has been done in the past for security. So this is the agenda, basically what I just said: we'll give a little background on where security and safety meet and where they don't, then go over some typical foundations from security and see if and how they may be relevant to supporting safety. At the end we'll discuss some practical considerations, and I'll introduce a project from the Linux Foundation called ELISA. This is also a call to action: if people want to contribute from an engineering point of view, to design and help support such features, they can contact me after this presentation. I apologize in advance; it's the first time I'm giving a presentation where I'm talking to myself, pre-recorded, but I really do hope for that type of interaction, so please do contact me. Okay, so very briefly, who am I? My current position is system safety architect, and I work at Mobileye, even though this presentation does not represent anything on behalf of my employer; it's entirely my own independent thinking and initiative. What I do on a daily basis is design, as a system architect, safety features which are deployed in Mobileye products.
Basically, where I focus is on the low-level parts of the system, close to the hardware: the hardware-software interface, device drivers, Linux infrastructure, to understand what can be done to make those areas safer. I'm pretty new to safety, and as you can see, I don't deal with anything having to do with qualification, standards, or certification; I'm leaving that to the safety experts. My work is purely engineering: design and development of these features. Beforehand, I worked for many years as a security architect, at what was formerly NDS before it was bought by Cisco, and for a few years I worked as a security consultant for some major European automotive concerns on behalf of some startups here in Israel. So I have a lot of experience as a security architect, and moving into the automotive domain at Mobileye I still do some security architecture, but my focus now is on actual engineering features which are supportive of safety, because automotive safety is so necessary in this market. Okay, so, first definition: what is functional safety? If you look it up on Wikipedia, you'll see that it's basically a sort of insurance policy: some kind of guarantee we are able to give that the user of the product won't face unacceptably high risk by using it. The way that's done is in several steps. The first is a failure analysis, somewhat similar to the risk analysis you see in security, which identifies what types of system failures you may expect to have; these failures may be due to hardware failures, software failures, operational failures, design failures, all kinds of possibilities.
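To make that failure-analysis step a bit more concrete, here is a minimal sketch in Python. It is purely illustrative: the failure modes, categories, and mechanisms are invented examples of my own, not taken from any standard or real analysis. The idea is simply that each identified failure mode gets a category and an assigned safety mechanism.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str          # the identified potential failure
    category: str      # "hardware", "software", "operational", "design"
    effect: str        # what the failure could lead to
    mechanism: str     # safety mechanism assigned to mitigate it

# Hypothetical entries from a failure analysis of a sensor subsystem.
catalog = [
    FailureMode("sensor power brown-out", "hardware",
                "stale temperature readings", "watchdog plus plausibility check"),
    FailureMode("wrong configuration file deployed", "operational",
                "thresholds set incorrectly", "config checksum verified at boot"),
    FailureMode("buffer overflow in parser", "software",
                "corruption of adjacent safety data", "memory protection features"),
]

# A completeness check: every identified failure mode needs a mitigation.
unmitigated = [f.name for f in catalog if not f.mechanism]
print(f"{len(catalog)} failure modes, {len(unmitigated)} unmitigated")
```

In a real safety process this catalog would be far richer (severity, exposure, controllability, and so on), but even this toy shape shows the mapping from identified failures to engineering building blocks.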
Based on those failures, this is where my work steps in: to understand what type of safety features we can build into the system (what the standards call safety functions, sometimes safety mechanisms), and to build the system so that we have those engineering building blocks, each of which is designed to mitigate or lower the risk of a specific type of failure that was identified. Then we have to make the safety case, which is again done by the safety experts: they take all of that and put together some kind of proof, some documentation, which demonstrates to whoever is necessary that we have made our best effort to avoid such unacceptable risk. Now, certification: there are loads and loads of standards out there. I gave two examples, for all kinds of different systems which are highly dependent on safety, and it's very painstaking work to deal with those standards. But the most important takeaway here, in the scope of this discussion, is about liability. Safety is really about liability: being able to prove the level of risk and to mitigate potential failures. I put this table together to summarize some of the key differences between the focus of security and the focus of safety. If we start at the top, you can see that the driving force of security is malicious intent. We're always dealing with this potential hacker, somebody outside, and we want to do the best we can to design our system to block any attack by that malicious hacker. In safety we don't suffer from that paranoia, let's call it; we have potential failures. Things break down: hardware breaks down and can give erratic results, software can have bugs, there may be gaps in the operational procedures (for example, a system built with the wrong configuration file), and other such issues.
We have to identify failures which are either systematic, which mostly concern software, procedures, and operations, or transient, which mostly concern hardware. There are other categories in more detail which I'm not going to go into, but the point is that we deal with failures, and we have to be able to ensure freedom from an unacceptable level of risk from those failures. In the world of security, we look for vulnerabilities, because those are the weaknesses which open the door to the hacker and provide the ability to exploit the system. In the domain of safety, we deal with faults, which are abnormal conditions that may cause failures. The most common tools for security are based on crypto, because crypto gives us the mathematical foundations to prove, for example, integrity, confidentiality, and the other important properties security experts try to build into systems. In the world of safety, we want to bring the risk down to an acceptable level, and "acceptable" is a statistical thing: one in a million is not the same as one in ten or one in a hundred, and based on the type of system there are very clear guidelines and standards in the market on what's an acceptable level for failures or mean time between failures. So the evidence is more statistical for those types of failures. For testing, in security we deal with pen testing and fuzzing; in safety, based on the failure analysis and the safety mechanisms we define, we have to test the system. But it's not pen-testing-style testing; it's more conventional software testing, where we have a hypothesis (we have this safety mechanism) and we want to demonstrate that the mechanism is sufficiently robust, that it reduces the risk to an acceptable level.
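To illustrate how statistical that notion of "acceptable" is, here is a toy Python calculation. The numbers are invented for illustration only, not targets from any standard: assuming a constant failure rate of λ per hour, the probability of at least one failure over a mission of t hours is 1 − e^(−λt), and a claimed rate can be compared against a target rate.

```python
import math

def prob_of_failure(rate_per_hour: float, hours: float) -> float:
    """P(at least one failure within `hours`), assuming a constant failure rate."""
    return 1.0 - math.exp(-rate_per_hour * hours)

# Illustrative numbers only: a target of 1e-8 failures/hour versus a
# component whose observed rate is 1e-4 failures/hour, over a lifetime
# on the order of 10,000 operating hours.
target_rate = 1e-8
observed_rate = 1e-4
mission_hours = 10_000

print(f"target rate:   P(failure) = {prob_of_failure(target_rate, mission_hours):.6f}")
print(f"observed rate: P(failure) = {prob_of_failure(observed_rate, mission_hours):.6f}")
```

The gap between the two printed probabilities is the whole point: "one in a million per hour" and "one in ten thousand per hour" lead to radically different lifetime risk, which is why the standards specify acceptable rates per system class rather than a single yes/no notion of "safe".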
That can be done by stress testing or fault injection: investigating the behavior of the system to see how the safety mechanisms which were identified and defined based on the failure analysis actually perform, and how they can be tested in a real system. In the domain of security, we can use many types of tools, either open source tools or proprietary tools provided by different vendors. In the domain of safety, life is a little more difficult, because safety certification bodies have to be able to trust that those tools actually do what they claim. Open source tools are commonly not developed in a rigid, formal software development process, which means it's much harder to demonstrate that they provide the results they claim; so there is an inherent mistrust of open source. Part of the reason for this discussion is to expand communication, to show where we have features from open source which have been proven in other domains, and to understand how we can adapt them and demonstrate their power and relevance in the safety domain as well. One area where it's not quite clear where the future will lead us is safety standards being adapted for security. One of the most prevalent examples nowadays is automotive: ISO 26262, the standard which defines safety requirements and guidelines for the automotive domain, has pretty much been adapted and modified into terminology relevant for cybersecurity. ISO 21434 is very much based on and derived from 26262. It's still in draft, and it's a real good question where that's going to lead and how well it will support cybersecurity; we really can't tell yet, because it's still being worked on. So much for the introduction; let me just wrap it up.
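The fault-injection idea above can be sketched very simply in Python. Everything here is invented for illustration: the "sensor", the fault model, and the plausibility check all stand in for real system elements. We inject corrupted readings and verify that the safety mechanism under test, here a simple range check, detects every injected fault without raising false alarms on nominal readings.

```python
import random

VALID_RANGE = (-40.0, 125.0)   # plausible temperature range (illustrative)

def plausibility_check(reading: float) -> bool:
    """Safety mechanism under test: accept only plausible readings."""
    lo, hi = VALID_RANGE
    return lo <= reading <= hi   # NaN fails this chained comparison too

def read_sensor(inject_fault: bool) -> float:
    """Return a nominal reading, or a corrupted one when a fault is injected."""
    if inject_fault:
        return random.choice([float("nan"), -1e9, 1e9])  # crude fault model
    return random.uniform(20.0, 30.0)

random.seed(0)  # reproducible campaign
detected = sum(not plausibility_check(read_sensor(True)) for _ in range(1000))
false_alarms = sum(not plausibility_check(read_sensor(False)) for _ in range(1000))
print(f"injected faults detected: {detected}/1000, false alarms: {false_alarms}/1000")
```

A real campaign would inject faults at the hardware or kernel level rather than in a pure function, but the structure is the same: hypothesize a mechanism, inject the failure modes it is supposed to catch, and measure the detection rate as evidence for the safety case.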
We want to deal with engineering foundations, and basically to understand how we can derive safety mechanisms from security engineering. Now, there's nothing new in the security engineering features themselves. What's new in this talk is how these may be adapted, or perhaps extended by relevant kernel patches or other mechanisms, so that these security features can be useful for supporting safety as well. That's where the novelty is. So let's take one example. The safety standards talk about "freedom from interference". Safety standards tend to be very, I don't know what to call it, almost philosophical; they use a lot of quasi-engineering terminology. They define freedom from interference as something very basic and very critical. Again, this is from the domain of automotive safety; I simply take my examples from there because that's what I'm familiar with, but obviously these ideas can be, and have been, adapted for other domains as well. Freedom from interference means that we have to make some kind of guarantee that there won't be cascading failures. A cascading failure is something like a domino effect: if one element of the system, software or hardware, fails, that can by this domino effect lead to failures in other areas, and as a result something which may be minor in one area can cause a major breach, or, in the terminology of safety, a violation of some safety requirement.
So if we think of this in terms of software: say we have some safety-critical code which monitors the temperature of the system, or monitors some other aspect critical to the safety of the hardware and software, and we have a lot of other code doing whatever it needs to do to support the business or functional requirements of the product at hand. If some non-safety-related code has some kind of bug or failure, or some non-safety-related hardware has a major failure, and that can, by lack of freedom from interference, through some cascading effect, cause a failure in the safety-critical hardware or software, that's what we're concerned with when we deal with freedom from interference. Now, for anybody here who is familiar with developing secure systems, this should right away ring a lot of very familiar bells. We define, and again these techniques are derived from the security domain, separate processes, each with its own well-defined access control requirements, because each has its own virtual memory space and its own data which it can or cannot access in whatever way is needed, and each is expected to have reduced privileges so that it cannot access or corrupt other areas. And we deal with containers, and perhaps even hypervisors, which combine hardware and software features for this type of separation.
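As a minimal illustration of why separate virtual memory spaces give this kind of freedom from interference, here is a small Python sketch (Linux-specific, since it uses `os.fork`; the "safety-critical value" is just a stand-in for data a safety monitor depends on). A fault in code running in one process corrupts only that process's own copy; it cannot reach the other process's memory.

```python
import os

# Stand-in for data a safety-critical monitor depends on.
safety_critical_value = 42

pid = os.fork()          # Linux: create a second process (separate address space)
if pid == 0:
    # "Faulty" non-safety code running in the child process: it scribbles
    # over the value, but only over its own copy-on-write copy of it.
    safety_critical_value = -1
    os._exit(0)

os.waitpid(pid, 0)
# The parent's copy is untouched: each process has its own virtual memory
# space, so the fault in the child cannot cascade into the monitor's data.
assert safety_critical_value == 42
print("monitor still sees", safety_critical_value)
```

This is the same hardware-plus-kernel separation (MMU, per-process page tables) that blocks a compromised process in the security setting; here it blocks a buggy one instead.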
I left out capabilities here: Linux capabilities give each process, or whatever software element, its necessary privileges, but no more and no less, and so on. We can design the system with features which, in the security domain, are meant to set up isolated islands of communication and computation, so that even if a hacker gets control of one area of the system, maybe an area which is more vulnerable, that hacker should be blocked, or his life made much more difficult, if he tries to cross over from there into a more critical area of the system. These same techniques are relevant, and have been found to be relevant, for safety as well. This is just an example to show that when the people who wrote "freedom from interference" wrote it, I don't think they were thinking in terms of these Linux elements, but in a practical sense this is the most natural way of translating the concept of freedom from interference into code. And that's what we're really looking for: how we take those standards, how we take safety requirements, how we turn them into actual code, and what type of features exist which can help us build those safety mechanisms.

So in the talk today, and I'll try to keep track of the time, we deal first with memory protection features. In the safety standards there are various sections which deal with how we protect different areas of memory: for example the heap and the stack, where we want to prevent stack overflow, stack underflow, stack corruption and so on; similarly for the heap; similarly kernel versus user-space memory. All kinds of memory protection features are built into Linux that can perhaps help us support those safety claims: to make our stack safe, or, in more formal terminology, to make sure that if we put any data on that stack which is critical to the safety of the system, that data will be protected from those types of failures; that it can't get corrupted due to some fault where somebody mistakenly overwrites our data in the stack when they should not have access or permission to do so. Then freedom from interference, which we discussed before; we'll go into a bit more detail, depending on time. Anything I don't cover in great detail, again: the slides have my email, so please feel free to contact me. That's one of the most important things I want you to take away from this, the invitation to answer this call to action and join together to expand this work. Then isolation techniques, which we briefly discussed, and timing and execution issues, which are also fundamental to safety. We have multi-threaded systems, any type of concurrency, and we want to be sure that one element cannot corrupt another due to basic issues such as deadlock, livelock, and race conditions. In the domain of security, all of these are also known to lead to potential vulnerabilities and can be exploited by hackers, so the same techniques which are used in security engineering to avoid such issues in concurrent systems, we investigate the potential to use for safety as well. eBPF is already well established as a framework for defining tools which are supportive of security, and we investigate some interesting areas in which it may be relevant for safety as well. And fault handling: when I talk about faults, in the world of security we have failures usually because of malicious actions; in the world of safety, we get faults because of system failures, hardware or software. It doesn't really matter where the faults come from; in any case we can
have built-in mechanisms, mostly at the level of the drivers, to deal with those faults: on one level by detection, on another level by correction. Some of the same features which are relevant for security in handling such faults may be relevant for safety as well. The bottom line is that safety and security are not the same, but we want to find where they meet and where they can join together in actual code, in engineering design.

Okay, so the first area I'm going to focus on is memory protection features. This is something so basic to security already, and I found it quite mind-boggling, especially when I started working on safety, that there was no familiarity with the potential and the power these features can provide. After this slide I'll go into a few examples, but there are very basic, common kernel configs, which can be enabled or disabled depending on the specific setting at hand, that are used for security; a lot of those are similarly relevant for safety, except that the justification, the context in which they are used, may be a little different. Now, what I did was start putting together a database. You see the link here; when you get the slides after this talk, you can go into the database and see the draft of what we have there: a list of kernel configs, mapped onto ISO 26262 to give them the safety context. Again, I come from the automotive domain, so that's where I focus, but a lot of what is written there is pretty similar to the mindset and the requirements defined for safety in other domains as well. They're also mapped to security CWEs, to give the context of vulnerabilities that those kernel configs have the potential to deal with. It's also broken down into different groups, simply for ease of access: different memory types, stack, kernel memory, and so on. What I provided there is a very basic layman's description, a bit more detailed and a little easier for a layman who's not an expert to understand than, for example, the Linux Kernel Driver Database, which is the common resource for kernel configs. We added some implementation guidelines and runtime performance impact issues which the user should be aware of, so that they can make judicious use of these different settings. Basically, the idea is to identify configs which are potentially relevant for safety. We don't make any safety claims; as I told you, I'm not a safety expert, and I make no claims of compliance with any standard or anything of that sort. We're just giving ideas, lighting up some light bulbs, showing people: hey, you're dealing with a problem, you have a safety mechanism which needs to resolve that problem, here are some building blocks which may be relevant for you, and it's up to you, as the integrator of the system, to look into this and see how these features impact your system when you use them. As I said before, most commonly the safety mechanisms are implemented and tested to demonstrate how the level of risk is reduced; but again, I don't deal with that, that's part of the work of the safety expert. What we do is identify configs which are potentially relevant. The database deals most completely with memory protection features; there are also some details on timing and execution issues, which we'll talk about afterwards. And this is also an invitation: hopefully others can help to expand this database, to make it more useful and more generally available, and to add results from practical implementations and use cases. So, some examples. Again, anybody who deals with security probably says,
"of course, all of this is basic security." What's new here, as I said before, is how these same features may, basically for the same price, have value for supporting safety cases. I wrote the disclaimer at the bottom, which I explained before: I don't make any safety claims, or any claims about how these comply with any safety standard. These are only examples which seem to be relevant and have been used by others, in different use cases, for this purpose; it's up to any user to make those claims and proofs in their specific system.

So, for example: DEVKMEM, when disabled, removes direct kernel memory access via /dev/kmem, which is something we normally don't want to support; again, this starts ringing bells of freedom from interference. FORTIFY_SOURCE allows us to detect buffer overflows in common string and memory functions, which again reminds us of various safety requirements relevant to buffer overflows. PROC_KCORE is related to kernel debugging features; I'm not going to go into all the details here, these are very basic, very commonly used security features, but the database provides some detail on how they may be relevant for safety as well. STRICT_KERNEL_RWX defines what should be basic, common-sense access control on kernel data and kernel code, and should also be relevant for protecting kernel code for safety purposes. And THREAD_INFO_IN_TASK moves the thread information off the stack, where it's a little more vulnerable from a security point of view, into the task_struct. Especially in heavily concurrent systems, some of that thread information may be safety-critical, and by moving it into the task structure it may be a bit easier to prove how we can protect and isolate that safety-critical data. Again, that needs to be investigated in a specific context, but it has been found by users to be useful in supporting their safety claims on that data.

So those are kernel configs which are accepted for security and seem to be relevant for safety as well. On the other hand, we've had some surprises since we started this work. The first one was /dev/mem. For anybody who, like myself, comes from security: when you disable /dev/mem, you disable direct physical memory access, and it doesn't really make any sense in a modern system for a user application to be given the power to corrupt or directly access physical memory. On the other hand, in many safety-critical systems we found that safety mechanisms were actually implemented this way, by direct memory access, which created some problems, and code actually had to be refactored. We did find, however, that enforcing this setting was very useful long-term: it created the type of separation we were looking for, which was important for safety as well. We were able to build in more checks and balances by defining code in a more appropriate and modern way, where we define the different levels of hardware versus software, we have better APIs between the hardware and the software, and we really don't expect a user application to be able to directly manipulate physical memory.

Other issues, and I gave some examples here, deal with traceability. In the domain of security we would normally want to disable these settings, for fear of information leakage. In the domain of safety, on the other hand, there is a critical need for traceability, because if we think about what happens after a plane crash or a car crash, a major incident, one of the first things they look for is that black box which records what happened up to the time of the crash or
the failure. So these trace features are necessary in the domain of safety, to provide the trace that supports that traceability. And here there's a tricky issue: having to define very clearly what the security requirements are, what level of security is necessary for the system, what the safety requirements are, and how to reconcile between them, so that we can support both the security needs and the safety requirements for traceability. The last area is something which is also very fundamental to safety: what happens if there is a failure. There's normally a requirement to transfer to a safe state, and a safe state is one in which the system can no longer damage itself or others. Most often, that is the state in which data is collected from memory, from the hardware, from wherever, so that a postmortem can be carried out by the people who analyze those results. So this necessity to support a switch from one state to another, something which is normally disabled in security because you don't want a hacker to be able to control it, means designing the system to give the necessary security support but at the same time to enable a secure transfer to the safe state. Again, it's an area where security does not naturally meet safety, but with awareness, and with a clear definition of the security requirements, the safety requirements, and the implications of the different features, the system can be designed to meet both; that's where this understanding is necessary and important. Okay, my time is getting a little tight, so I'm going to go quickly. Freedom from interference is defined in the standard in this way, and you can look at it afterwards; it's like a whole shopping list of every potential area
of corruption between different elements of the system, focusing on timing and execution, memory, or exchange of information, and it's quite mind-boggling how safety experts are expected to support this. Again, Linux provides many features, many of which are derived from security, which are helpful in this area. The next slide I'm going to skip over, because I mostly covered it before; but when we want to define the safety architecture to support freedom from interference, we deal with Linux features. I mention a few more here, a different list than the ones we had before, but these should all be familiar: namespaces, cgroups, kernel capabilities. And we found that by using well-defined configurations, such as, for example (this is not a necessity, but it's useful), systemd unit files, it's easy to use those as a basis for safety claims, because we can demonstrate what has been defined and built into the system and what has not. Timing and execution is the next area. There are some kernel configurations which can be enabled that help us test a system. Most of these settings have a strong and unacceptable impact on performance, so they're normally not used for production systems which go out to the field; but certainly for offline testing, if we want to make safety claims, we want to be able to test how robust our system is: what the level of risk is for deadlocks, race conditions, and other such issues. We use these, and they are useful for security testing as well; not in the sense of pen testing, but in the sense of understanding how well designed our system is, and whether we can expect the system to be free from potential race conditions or whatever it may be. There are also some well-known tools which are useful for this, such as KCSAN and TSan; again, I'm not going to go into the details, but these are very useful
both for security testing and for safety testing. eBPF in security is well established; in safety it's a new domain. We have found that many of the eBPF-based tools which are useful for security are also useful for the tracing and profiling which is so critical for safety. I have also, and this is my personal thing, been investigating the use of the eBPF verifier, which obviously originated as a security tool to ensure that eBPF code is not malicious when it's introduced into the system; there's an ongoing personal investigation to understand how we can use it as a model for safety runtime monitoring as well. There's also the whole domain of XDP, which basically allows us to bypass the network stack, with all of its complexity and its necessity for testing, and deal with communication directly below the network stack. It's worth understanding how that helps our safety claims: if we're dealing mostly at the hardware level, on the NIC, it's a lot easier to rely on the hardware safety claims, which are usually provided by the vendor, because the hardware is well tested before it's deployed, and we don't have to deal with safety issues at the network stack level. So especially if we're talking about an embedded system where direct hardware communication is relevant, and XDP is perhaps the most performance-effective solution, there are safety implications as well in how that fits into a safety case and how we can make claims and prove the robustness of the system from a safety point of view. Fault handling is basically how we can extend Linux drivers to support both detection and correction of errors. There are lots of features which are built in, and this obviously needs collaboration with hardware vendors, so that those safety features are built in and that open source infrastructure is provided, so that faults are
detected by the hardware, with the potential also for correcting them. There are some vendors who are very supportive in this area and provide these features in their hardware products. There is, for example, the Advanced Error Reporting infrastructure, a very complex framework which can be leveraged and is very, very useful for this type of fault handling.

Okay, some practical considerations. We have found that some of these features are less relevant, mainly for practical reasons: complexity, difficulty to maintain long-term, difficulty to scale up from simple basic examples; that's been my experience. Again, this is an invitation for people to join, participate, and contribute. If people have a way of defining SELinux policies, or actually seccomp, which is something that should be very relevant for safety, in a more practical way, that's an important area of investigation. One of the things I would like to see, similarly to the way we have Linux security modules, is perhaps to define safety modules which would be open source building blocks. That's sort of a dream; but then again, if we don't dream, we never know how far we're going to get.

And the last slide here is just an invitation for people to join forces and to contribute to ELISA, which I mentioned before. The charter says that the mission is to define and maintain a common set of elements, processes, and tools that can be incorporated into Linux-based safety-critical systems amenable to safety certification. There's a lot of ongoing work by safety experts about the safety requirements and how things can be certified; I'm not focusing on that. What I am proposing here is an invitation to people who come from the more technical community, people who actually produce such systems, to contribute, to expand on the ideas I've mentioned here, and perhaps other ideas, and see how we can join forces and give more building blocks to those safety experts, who can help define them in a way that they can actually be used in safety-critical applications and in actual safety cases. Also, I'm sure we will find gaps, so there is room to introduce enhancements, kernel patches, to expand existing Linux features in this way to support safety cases. So feel free to contact me if you have any questions, if you have anything to contribute or anything to ask, or if this area interests you. Sorry I can't be there in person, but I'll be very, very happy to hear from anybody via email, and we'll see how we can take it from there, and perhaps get together a work group of technical people within ELISA if there is sufficient interest. Thank you, have a great day, and enjoy the conference.