Hello everyone, thank you for attending this talk. Today's topic is how to enable Linux within a safety-compliant architecture. This talk was prepared by myself, Fulup Ar Foll, CEO of IoT.bzh, and by Stéphane Desneux, our CTO. Before moving further, a quick introduction about ourselves: IoT.bzh is a company located in Lorient, a harbor city in South Brittany, in the western part of France. We have a team of 30 embedded Linux engineers, and we have partners in the automotive sector but also in aeronautics and energy. We have been working with Renesas since the creation of the company, and in fact we have to send a warm thank you to both Hisao and Kurokawa, who have helped us a lot; many of the topics presented in this talk were done with their support. But we also work in other industries, with Safran in aeronautics or with Total in the energy sector. At the end of the day, the safety and security requirements are pretty similar across those industries: they use different terminology and potentially different standards, but the core requirements remain very close.

Moving to the heart of the topic: the first thing is that making a complex object safe and secure takes very hard work; it is not an easy task. In the car industry we clearly see an exponential increase in complexity: a modern car today can easily have more than 100 million lines of code. Customers are requesting a shorter and shorter time to market, which means that where we used to have five years to do something, today we only have two or three. We get smarter hardware, which is really cool because we can do more advanced things, but it also imposes more complex software.
And on top of that, the industry is moving toward the software-defined vehicle, which imposes a centralized architecture where a lot of different services are shared on the same platform, and that forces us to mix secure and non-secure services on the same hardware. Last but not least, we have full connectivity: over-the-air updates, remote control, and customers demanding new services, which means we have to add features to the car even after it is on the road, something that was obviously not possible with previous generations of car software. All of this leads to more and more risk. It starts with a significant increase of the attack surface: more complexity in the car, and strong connectivity with the outside world. At the same time, cybercriminals are getting smarter and smarter, and a car remains a very expensive piece of equipment, so it is important to protect it at the right level so that it is not attacked on the first day it is on the road. We have more and more advanced connectivity, as I said: not only are we connected to the cloud controlled by the OEM, but potentially we also talk to other services, maybe your house, multimedia services, weather forecasts and so on. And products become obsolete more quickly: people expect more reactivity; they have been trained by their phones, where they get new features every day, and they expect more or less the same behavior from their car. This obviously leads to more regulation. UN R155 and R156 are probably the two everyone is talking about today, because they will be enforced very soon in Europe and other countries, so there is a big shift toward those regulations.
But we should not forget the EU Cybersecurity Act, which requires you to maintain your connected object against cybersecurity risks for its full life cycle. We also have national data-protection laws: in Europe, GDPR imposes strict compliance on privacy. And we have more specific requirements coming from each industry itself: in the car industry we clearly have the ISO and SAE specifications for both safety and security. And because we are connected to the cloud, we should not forget that we also inherit the cloud regulations, the whole ISO 27000 series, which we have to take into account as well. Last but not least, each Tier 1 and OEM comes with its own set of specifications that you also have to take into account.

If we look at the safety standards we have to fulfill, the first one defines the general guidelines for industry: typically the IEC specifications, IEC 61508, which defines SIL levels from 1 to 4. More specifically for automotive we have ISO 26262, which defines the guidelines for ASIL, not forgetting that we need ASIL-D for autonomous driving. But there are also many related standards to take into account: specific standards for industrial communication networks, cybersecurity standards specific to the vehicle on the road, standards for software and programmable components, and so on. And depending on your industry, in the medical sector for example, you also have specific requirements coming from that sector. And there are more to come: what we have today is only the beginning. Clearly, the hot subject today is UN R155 and R156, which will be mandated for all new vehicles in Europe starting in 2024; the same regulation is going to happen in the other countries, Japan, South Korea, the UK, Canada and China.
They have slightly different schedules and enforcement dates, but more or less everyone is going to shift very, very quickly. So on one hand we need to handle more and more complexity, and on the other hand we have to comply with more and more regulation. Traditionally, compliance was achieved by leveraging the V-cycle, but the traditional V-cycle is applicable neither to modern hardware nor to modern software. On the hardware side, modern hardware is just too smart to be fully predictable. We have a lot of silicon-based security features, such as built-in secure enclaves, but we do not really know how the isolation is done. At the same time we have a lot of sharing: we share the CPU, the cache, the cores, the RAM, the I/O, and it is not really clear where the border lies between what is isolated and what is shared. On top of that, there are many optimization algorithms hidden deep inside the silicon and protected by intellectual property, so at the end of the day you do not really know what is happening within your hardware. On the software side, in theory you should be able to control everything, because most of the time you get the source code of what you are running; I say most of the time because you do not get it systematically, but most of the time you at least have access to it. But when you have 100 million lines of code, it is clear that many of those lines were written by people you do not know. They were written a long time ago, and potentially those people have retired, so they are no longer there to maintain the code; just take the example of GCC, whose first release was in 1987: you still have code from that period. And cybersecurity is clearly not compatible with a two-year certification cycle, which is another issue.
The V-cycle is a very long process, while cybersecurity demands a lot of reactivity: typically the industry asks for a delay of 30 to 90 days between the moment a CVE is published and the moment the correction is in production. The reality of the industry today is more like two to four years between the publication of a CVE and the moment you can actually push the correction to production, which is just incompatible with the risks we have to deal with today. And not only that: we also have to maintain the code for the whole product life, which for the car industry clearly means more than 10 years, in fact more like 15; that is another challenge, because until now a car was more or less shipped with one version of the software and kept it for its full life cycle.

So, facing all this new complexity, what is the response of the open source projects? What are people working on? The most visible project is clearly ELISA, which is working on the kernel configuration and options required for safety and security. It is a very, very long road; a lot of people are working on it and working pretty well, but we do not expect any result in the near term. Another group that is also working pretty well is CIP, the Civil Infrastructure Platform. It focuses on critical civil infrastructure, mostly on cybersecurity, and on very long-term kernel maintenance, 20 years or more; as of today safety is not part of the scope, so it mostly focuses on cybersecurity.
Very differently, we have a project like Zephyr. Zephyr is not a Linux; it is a real-time operating system, and we will come back to it later. Zephyr has made a very significant effort to move toward SIL-3 certification, and they also target ASIL-D, though whether that will be achieved is another story; this is for the core OS only, not the extensions. They have done pretty good work: in particular they rewrote a significant part of the code to comply with MISRA and the other documentation requirements for certification. They claim they could be ready by the end of 2023; we will see whether they are on time, but they have made good progress, and we are optimistic that they will succeed within a relatively short period of time. Other projects have been more dormant; OSADL is one of them. It focuses on safety and security for industrial Linux, but we have not seen any significant activity there in the past four years.

If you are willing to go through certification, it is important to remember that small is beautiful: the bigger your project, the harder it will be to certify. One way to reach certification is what ELISA is trying to do, which is certifying the full Linux, or at least a small, stripped-down Linux, but still a full Linux. The other option, the one we took in our work with Renesas, is to isolate the safety part in a smaller operating system. In our case we selected Zephyr, because it is both small and recent; honestly, in our opinion it is the most promising open source candidate. There are other candidates, but their code is much older and we are more skeptical about their capability to go through certification. On the Linux side we have a lot of legacy code and complexity, and let's be honest, the Linux kernel might never be certified; maybe ELISA will succeed, maybe it will fail. If we look at what Linus Torvalds said, he said that the Linux kernel he originally envisioned
was very small, very modular and very efficient, and that is not what Linux is today. Each version of Linux gets fatter and fatter, which means it will be harder and harder to certify; so either someone does a strong cleanup of the Linux kernel, or it will never be certified for safety.

Another good candidate for safety is TrustZone in the Arm architecture: inside TrustZone we can run a microkernel or a real-time operating system, which is potentially small enough to be certified relatively easily. TrustZone has one particular issue, though: it shares the cores with Linux, so Linux may have some influence on the TrustZone side itself, which might make certification a little harder; but it is still a good candidate, and we have seen some projects, especially for ASIL-B, where people run a safety loop check in TrustZone. As of today, at IoT.bzh we consider that, at least for 2022, the safe option remains full hardware isolation: running a real-time operating system on a dedicated MCU hosted within a modern SoC. And for the open source operating system, we see Zephyr as by far the best candidate, with no real competitor today.
In the commercial sector it is very different; there are a lot of offerings, ProvenCore for example, and AUTOSAR is clearly by far the best known in the automotive industry. In the cybersecurity space, on the other hand, Linux is clearly leading, with thousands of different options to implement security. And we should not forget that we cannot get safety without security; you can get security without safety, but the opposite is not true. When we look at the options we have inside Linux to implement cybersecurity, we have a lot of tools. The most well known is namespaces for container isolation, with the unshare system call inside the kernel. For mandatory access control we have SELinux and SMACK for kernel object protection; we have cgroups for resource control; we have seccomp for firewalling system calls at the kernel boundary; we have TrustZone to bootstrap the initial root of trust; we have the capability to implement encrypted filesystems; and, at least on recent versions of the kernel, we have eBPF, which lets us place hooks for introspection directly within the kernel, which is very useful if you want to implement intrusion detection. That being said, all those mechanisms might not be compatible with your embedded hardware constraints. In the embedded world we have limited hardware capabilities: your RAM, your CPU, your battery may be very limited, and some of those features may require too many resources to be compatible with your project. You may also increase your boot time to an unacceptable level; this is especially true if you start to encrypt your filesystem. Another element to take into account is that, unfortunately, we do not start from a green field: we have a lot of legacy software and hardware to deal with, and those new mechanisms might not be compatible with your existing software or hardware.
If you are still running a Linux kernel 3.x, for example, you are probably not going to use eBPF. And the last element is that, even if you are able to implement a safe and secure mechanism, if the complexity is too big and you cannot produce the documentation proving that your system is safe, you will never get through certification; so you also have to keep the complexity at a level you can document.

As I said, Linux is a very good candidate, clearly the best candidate, for cybersecurity, but it is clearly not the best candidate for safety, so Linux alone might not be enough for safety reasons. There are other cases too: if you need hard real time, Linux is clearly not the best candidate, for example if you have to do very short wake-ups on sensor data; if you have to certify for safety, as I said before; but also if you have to run on battery for a very long time. For example, if your car stays parked for six months, you are not going to keep Linux running for six months on the battery, or the battery will die. You probably want a smaller operating system that checks whether the battery is too low or whether the car is moving, and when something abnormal happens it wakes Linux up; Linux can then exchange data with the cloud or notify it of the event, but you do not keep Linux running full time just to supervise basic elements. There are other areas where Linux is not the best candidate either, where you want a dedicated microcontroller or coprocessor: video encoding is clearly one, audio processing, neural computing, and so on. So there are many places where Linux is not the right choice to implement very specific algorithms with either hard real-time constraints or strong safety requirements. The good news is that new hardware multiprocessing allows us to run different operating systems on the same SoC. So on top of the traditional symmetric
multiprocessing that we have had for quite a long time now, where we run a rich operating system on different cores sharing a compatible architecture, typically big.LITTLE on Armv8, where everything runs AArch64 on A57 or A53 cores, so Linux sees different cores but no real difference between them, the new generation of hardware introduced the capability to run very different cores within the same SoC. You can, for example, run a 32-bit core while the rest of the architecture is 64-bit. This is what happens on the Renesas R-Car Gen3, where we have a Cortex-R7, which is Armv7, while the rest of the architecture runs Armv8: Linux runs on the Armv8 cores and we run Zephyr on the Armv7 core. This allows us to run multiple OSes on the same SoC without a hypervisor. Depending on the hardware, you can run a real-time operating system plus one Linux, or even two Linuxes, because you can also split the 64-bit cores; it gives you a lot of flexibility, and you can do all of that without requiring a hypervisor, which is really cool. Obviously Renesas supports this, but they are not the only one: most modern hardware supports one or several microcontrollers and hardware isolation at the SoC level, whether it is NXP, Xilinx or ST. So clearly, modern hardware is your friend.

If we dive a little deeper into Zephyr and Linux on R-Car, this is the architecture of the R-Car Gen3: we have an Armv7 Cortex-R7, a dual-core lockstep processor, so it is designed for safety, and alongside it, for Linux, we have four A57 and four A53 cores, so we can run Linux and Zephyr on the same SoC. Looking at the R7 side, we really have a fully isolated zone: the Cortex-R7 is an 800 MHz real-time processor with dedicated devices; for example, you can allocate the CAN controller or other devices to the R7, and
then Linux cannot use them at all. We use OpenAMP with a hardware mailbox to exchange between the two worlds, the real-time world with Zephyr and the rich OS with Linux. This model lets you run the critical services inside the real-time zone, where you may have hard real-time constraints; you can go through certification, because it is a very small operating system, so the amount of documentation you have to produce remains acceptable; you potentially get very low power consumption, because you can stop Linux and keep the R7 running; and you still have a smart, standard mechanism to exchange data between Linux and the real-time side. As I said before, in many cases the real-time part does something specific, either because of real-time constraints or because of low-battery constraints, and then sends the result to Linux; Linux processes the data and either sends it to the cloud or exposes it to the end-user operating system. This architecture is available on many different platforms, ST, NXP, Xilinx and so on, even if most of the work we have done at IoT.bzh so far targets Renesas boards.

On the Linux side we also have the capability to implement isolation, but we should not forget that it might not be enough for safety; it is clearly not going to be enough anytime soon for ASIL-B. We could argue about ASIL-B, because it is a lower safety level, and we could potentially handle ASIL-B with Linux; but that clearly requires a very clean and proper mechanism where you define the privileges, you can audit your global security and safety, and you automatically generate the SELinux rules for security, but also the namespaces, the capabilities and the cgroups for safety, so you can guarantee that a given task only has access to a limited amount of resources; so you can really split your Linux into different silos.
Those silos guarantee that your critical services will have enough resources to fulfill what they have to do. On top of that you obviously have to supervise and report, because enforcing is not enough; Linux has a lot of different mechanisms for that, and containers are a key element: at IoT.bzh we use the redpak launcher to implement this type of feature. Introspection and reporting are a very important point. As techies we mostly tend to focus on the technical elements that implement safety or security, but reporting is a mandatory part, and if you do not do the reporting you will never get through certification, so never forget about it. One key, mandatory element is an IDS to report inappropriate behavior, on the security side but also on the safety side. For example, if your system is supposed to generate 30 images per second and it only generates 23, you have to report it, because the image recognition algorithm is probably not going to work properly at only 23 images per second, so you may have to stop it; you have to implement this type of mechanism and report it. The good thing is that for kernel and system introspection we have eBPF, at least on recent kernels, and we have a lot of standard, generic Linux mechanisms for supervision: whether an iptables rule has been changed, whether something uses too much memory, whether one thread is going crazy, and so on; so we have a lot of capability to introspect the system. If you have a microservice architecture, hopefully you also have the capability to do introspection directly at the microservice level; network monitoring, typically, is an area where Linux is very strong, for introspection and for local action as well. So it is important that you collect the logs, clean them up, potentially store them locally while you have no connectivity, and when you have connectivity, you send them to the cloud.
The cloud then does the global supervision: it correlates events, generates new rules, generates updates, and then updates the system in your car. So whatever you have inside Linux is not enough on its own; there is also work that happens earlier, at development time, where you define the privileges, the firewall rules, who can do what; and then, at the global supervision level, you need to collect the data, look at what is happening, clean your system, update your rules and send the updates to your devices. This obviously requires a full SOTA/FOTA mechanism, where you can update both your software and your firmware. At IoT.bzh we use Mender.io for that; Mender is an open source project from a Norwegian team and a very nice one. There are obviously other options; to be honest, in the automotive industry most of the people we talk to already have something, because they want to handle software updates together with fleet management, but for smaller systems, or at least for test and development, Mender.io is clearly a very nice option. You want this to be completely integrated with your development chain and your CI/CD, so that you produce the updates automatically; and do not forget that if you want to react very fast, with your update reaching the car in 30 to 90 days, you need a very fast and efficient model for applying the correction, checking it, running the audit tests, and then updating your boards.

Another key element in embedded is long-term support. The average age of a car in Europe is 11 years, and a car is typically scrapped after 20 years; in the maritime sector it is even longer, and in the energy sector lifetimes of 30 years or more are very common, so it is important to have long-term support for your system. If we look at the existing Linux distributions today, we clearly see that, outside
of the main IT distributions, whether Ubuntu, SUSE or Red Hat, no one provides 10 years or more of support, which is probably what you want to start from: especially if you have to support a product for 15 years, it is important to start with something that is already supported for 10. On the other hand, if we look at the embedded-oriented distributions like Buildroot or Yocto, none of them has long-term support. In our opinion, the only way to implement long-term support at an acceptable cost is to take an IT distribution and tailor it for the embedded world; that is much cheaper than the opposite, taking an embedded distribution and then trying to build long-term support on top of it.

This leads us to a kind of global architecture where, on the left, the safety part runs on a dedicated core. For this slide I took the architecture of the Renesas R-Car V3U, so it is not an R7 but an R52; we run Zephyr on that microcontroller, and it hosts all the safe and real-time parts, while on the standard A76 cores you run Linux, with different silos to isolate your different services: the core services of Linux run at the global level, with strong isolation on top through the traditional mechanisms, security modules, firewalls, cgroups and so on. And without forgetting that you have to send all the data to the cloud, where you need a security expert doing a global analysis of the full fleet, to detect even low-noise attacks, update the security rules, and then push the modifications directly to the cars through software or firmware updates over the air.

As a conclusion, what can we say? As of today we have neither a perfect solution nor an out-of-the-box solution, but we have many options to move toward a safety-compliant model
with an open source Linux architecture. First, we have a very strong model for isolation and we can implement very lightweight containers: we did it at IoT.bzh with redpak, and Red Hat is doing it with Bubblewrap. You can implement containers that are not like the traditional, rather heavy LXC containers; very lightweight containers are really cool for the embedded world. On the safety side, modern hardware architectures combined with a modern operating system such as Zephyr really enable us to run a very small footprint operating system, supporting hard real-time constraints, that is close enough to certification that the cost of documentation remains acceptable. On the update side, whether it is FOTA or SOTA, you can combine it with a smart CI/CD mechanism and apply functional or security corrections during the full life cycle of your system, which means more than 10 years; to be honest, most of the people we talk to today are asking for 15 years of support, so you have to be ready to update very, very old systems. For the root of trust, TrustZone is clearly your friend, but you may also have a hardware secure element. If possible, we recommend moving away from the traditional PKI model, which is very heavy; more modern mechanisms like OpenID Connect or OAuth2 enable lighter security and authentication models that are, in our opinion, easier to support in the long run. And for long-term support, as I said, first rely on an existing IT distribution, because the cost of maintaining a distribution by yourself is just completely crazy; but you also have to stop shipping one kernel version per project. It is very important that the embedded industry understands it has to stick to a common kernel version, and I try to push people toward at most one kernel version per year.
If you do more than that, something is really wrong inside your organization and you should work to fix it. And the future is going to be wonderful; the future is always wonderful. On the code side, I think native Rust support, both in real-time operating systems like Zephyr and in Linux and microservices, is going to help us get rid of C and C++ memory-safety bugs; remember that Android considers that around 70% of its high-severity vulnerabilities come from memory issues, so moving to a language without memory issues is really important. On the certification side, we still have a lot of work to do: today it commonly takes 6 months to 2 years to go through the certification process, and that is not compatible with cybersecurity updates that have to ship in less than 30 to 90 days. There is some promising work ongoing, and I recommend having a look at OSCAL from NIST, the Open Security Controls Assessment Language; the idea is that you should be able to describe the functional behavior of your system in this language and then test it automatically, which means that in an ideal world we could fill in the certification documentation automatically. On the introspection side, we cannot use the existing IDSes as they are, they are just far too heavy for the embedded world, but it is still possible to slim them down and build something acceptable for embedded systems. On integrity, we already have dm-verity, which lets us enforce the integrity of the system, but we still have to work on partial updates, and especially per-package updates, while keeping that integrity; we need full automation from the software factory through the production of the integrity data, which is not really done today. On privacy, that is an area where the embedded world has a
lot of work to do, and as I said, in Europe there is very strong enforcement of privacy with GDPR, so we clearly have to take care of it. I think most filesystems are going to be encrypted, potentially fully encrypted at least in the early days; but in the long run we want per-application or even per-user encryption, because we want to guarantee that one user cannot read the data of another user. This requires a lot of tuning, and we will have to check how much it impacts the boot time of the system; I think we have the low-level capabilities inside Linux, but the full integration is not there today.

So thank you for your time. If you have any questions, Stéphane will be more than happy to answer them. I also added a few pointers to the source code of the different elements I talked about, and to the software factories we have, where you can experiment with the different tools on the OSS board or the other boards we support, as well as a bunch of articles, including this presentation and documentation that you can find on our website. Thank you for your time, and we are ready to answer your questions.