My name is Miklos Balint, and my co-presenter, Ken Liu, and I are from Arm's open source software group; today we're going to focus on compartmentalization in IoT devices. First of all, I should highlight, although most of you are very well aware of it, that compartmentalization is important, and we all know at least one story where compartmentalization didn't go as planned and that had grave consequences. So this is important. But there are challenges when compartmentalization needs to be applied to IoT devices, because these are high-volume, low-cost, low-power devices. There are compelling arguments for using microcontrollers in these devices, but they have some limitations or constraints that need to be considered: they have a small die area, typically with limited hardware functionality. They don't have a memory management unit, so there's a single physical address space, which makes compartmentalization more of a challenge than in an MMU-based system. They execute code in place in flash, and they have a limited amount of SRAM to store data. There's also the problem of a wide spectrum of use cases. Each of them might have a different threat model, and those have to be considered, so any solution needs to scale in order to have reasonable penetration in this market. All of this means that there is a need for a holistic approach to IoT security. First, let's talk about establishing the right level of security. The most basic issue is that if there is some vulnerability in the business code, you may want to separate the security-aware aspects of the system, so you create a secure processing environment for them. That works for some threat models, for some IoT devices.
This is sufficient isolation in some cases, but in other cases you want to be prepared to mitigate vulnerabilities in one of your secure partitions, and you want to limit the impact of vulnerabilities by isolating a trusted computing base, so you separate trusted and secure components from each other and keep the root of trust protected. In this case, there's still a concern if you have multi-tenancy on the secure side: one vendor might provide one secure partition, a different vendor might provide a different secure partition, and they may not trust each other's code. In that case, even more isolation is needed; in some cases you want to isolate all of the secure partitions from each other, and that of course introduces more and more complexity with each additional level of isolation. There's an additional aspect to be considered. On the non-secure side, some non-secure OSes already support compartmentalization for their execution environment, and in these cases it's beneficial if the secure partition manager can be made aware of the executing non-secure context, so that different access policies can be applied depending on the active non-secure context when a call is made for a secure service or for access to a secure asset. Just a brief word on hardware isolation, which is the foundation of software isolation: you can have physical separation in your system, where non-secure code and secure code execute on different physical processing elements. In other cases you have temporal isolation, which means that secure and non-secure code share a single processing element, and security is a state in the processing element itself. Now let's look at some interaction scenarios. I will primarily focus on crossing boundaries within a single processing element, because that was the first task that we addressed in the Trusted Firmware-M project.
The problems I'm presenting also need to be investigated from a threat-modeling point of view under physical isolation, but as I mentioned, I will focus primarily on a single processing element, which has various execution states for the various components. The basic use case is that a non-secure thread requests a secure service, which means crossing from the non-secure to the secure state. I will be discussing isolated drivers, in which case you have some interrupt service routine that is implemented by a secure partition, and you don't want to run that interrupt service routine in privileged mode. There may be asynchronous events on the non-secure side that might preempt secure execution, and I will also discuss those cases. Now, the simplest use case, the basic scenario for reference, is a secure state change: a non-secure thread requests a secure service. In order to do that, it needs a crossing from the non-secure state to the secure state, and in Arm's v8-M architecture that is limited to dedicated entry points into the secure system, which lie in so-called Non-Secure Callable address ranges. In this case, we need some sort of wrapper functions to trigger privileged management code that will check the access policy for that given non-secure thread, perform parameter sanitization on behalf of the secure partition, set up the container for the secure partition, and then invoke the partition code. For invoking the partition code, Trusted Firmware-M actually has two programming models; we will discuss those later in the talk. Next, secure interrupt deprivileging. This is the scenario where we have a device driver in an isolated compartment in the system. This flow has actually been out there for a while; it's supported on legacy architectures as well, so it is not something new, but it's important to highlight from a secure compartmentalization point of view.
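To make the entry flow concrete, here is a minimal host-side C sketch of the logic a wrapper would perform: access-policy check, parameter sanitization, then invocation of the partition code. This is not the real TF-M veneer code; on actual Armv8-M hardware the entry point would live in a Non-Secure Callable region, and all names, return codes and the toy policy here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the secure-call wrapper flow described in
 * the talk. On real hardware this would sit behind an NSC veneer. */

static int access_policy(int client_id) {
    /* Toy policy: only client 1 may use this service. */
    return client_id == 1;
}

static bool params_sane(const void *buf, size_t len) {
    /* Sanitization on behalf of the partition: reject NULL or
     * oversized buffers before any secure code touches them. */
    return buf != NULL && len > 0 && len <= 256;
}

static int partition_service(const void *buf, size_t len) {
    (void)buf; (void)len;
    return 0;  /* the actual secure service body would run here */
}

/* The wrapper a non-secure caller reaches through the veneer. */
int secure_service_veneer(int client_id, const void *buf, size_t len) {
    if (!access_policy(client_id)) return -1;  /* policy check failed */
    if (!params_sane(buf, len))    return -2;  /* bad parameters */
    /* ...here the container/sandbox for the partition is set up... */
    return partition_service(buf, len);
}
```

The ordering matters: the policy and sanitization checks run in management code before any partition code is invoked, so a hostile caller cannot hand unchecked pointers to the secure service.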
In this case, a privileged interrupt service routine serves as a wrapper. It triggers partition management code, which sets up the sandbox to execute a deprivileged interrupt service routine that runs in the context of a secure partition. Non-secure execution can preempt secure execution in some cases. It's up to the system architect to decide whether that is an allowed scenario, but from a real-time point of view we anticipate that providing real-time functionality on the non-secure side is quite a common requirement. What happens in this case is that when a non-secure interrupt preempts secure operation, the secure context is stacked (this is done by hardware on Armv8-M), and the non-secure interrupt service routine is executed in the non-secure state. When the interrupt service routine returns, the secure context is unstacked from the secure stack, so execution can resume. There are some threats that need to be investigated in such a scenario, and some challenges. We have to make sure that the secure state remains consistent for the duration of the interrupt service routine; this can be done in multiple ways, and we'll discuss some of that later. There's also a concern that by using interrupts, the non-secure processing environment can starve secure execution. That's actually a very complex problem and we'll not discuss it in detail in this talk; there are a lot of aspects there and we can take the discussion offline. We've been looking at that, but as I said, time is limited. As I mentioned, there might be a need for, or a benefit in, awareness in the secure partition manager of the non-secure execution context. For that, we have a reference implementation provided as part of CMSIS's TrustZone context management functions: any time a non-secure thread is created, deleted, loaded or stored, in other words whenever any context change happens on the non-secure side,
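The deprivileging wrapper described above can be sketched as plain host-side C. This is only a toy model of the control flow, not TF-M code: global flags stand in for the processor's privilege state and the active sandbox, and all names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical model of secure interrupt deprivileging: a small
 * privileged ISR wrapper configures the partition's sandbox, then
 * the actual handler runs deprivileged in the partition context. */

static int g_active_partition = -1;  /* -1 means no sandbox active */
static int g_privileged = 1;         /* processor starts privileged */
static int g_irq_handled = 0;

static void driver_partition_isr(void) {
    /* Deprivileged ISR body: must only run inside its sandbox. */
    assert(!g_privileged && g_active_partition == 2);
    g_irq_handled++;
}

/* The privileged wrapper that the vector table would point at. */
void privileged_isr_wrapper(void) {
    int saved = g_active_partition;
    g_active_partition = 2;   /* set up the sandbox for partition 2 */
    g_privileged = 0;         /* drop privilege before the handler  */
    driver_partition_isr();
    g_privileged = 1;         /* restore privilege on the way out   */
    g_active_partition = saved;
}
```

The key property is that the driver's handler never executes with privilege: any bug in it is contained to the partition's sandbox rather than compromising the whole secure side.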
there's a notification, a function call going directly to the secure partition manager, to provide that awareness, and in this case we practically mirror the non-secure context on the secure side in a client container, if needed. An example use case: non-secure threads are created, and we prepare two contexts for these non-secure clients on the secure side. When thread one calls a secure service, the first non-secure client context is activated and the secure service starts execution. Now, if a non-secure interrupt preempts that operation and triggers a context change, we get a notification in the secure partition manager from the non-secure RTOS, and in that case we activate the second context on the secure side. If that thread makes a call to a different secure service, then that service call is handled in a different non-secure client context on the secure side. If that returns, non-secure execution of thread two resumes, and when thread two yields, we perform the same actions as previously: if a context change happens on the non-secure side, we again get a notification, the original context can be restored on the secure side based on that, and the first secure service can complete its execution. This was, of course, just an example use case, but there's quite a big scope of uses for this awareness of states, and as I mentioned, these various non-secure contexts can be associated with different access policies on the secure side, so thread one might have access to different assets on the secure side than thread two. It's important to note that this relies on the non-secure context management in the non-secure RTOS.
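The shape of these notification hooks can be sketched as follows. This is a simplified host-side model of the idea (one mirrored client context per non-secure thread, activated and deactivated as the RTOS schedules threads); the function names here are invented and are not the real CMSIS TZ_* signatures.

```c
/* Sketch of secure-side mirroring of non-secure thread contexts.
 * The non-secure RTOS would call these on thread create/delete
 * and on every context switch. Names are illustrative only. */

#define MAX_CTX 4

static int g_ctx_in_use[MAX_CTX];
static int g_active_ctx = -1;   /* which client context is active */

/* Non-secure thread created: allocate a mirrored client context. */
int spm_alloc_client_ctx(void) {
    for (int i = 0; i < MAX_CTX; i++)
        if (!g_ctx_in_use[i]) { g_ctx_in_use[i] = 1; return i; }
    return -1;  /* no free slot */
}

/* Non-secure thread deleted. */
void spm_free_client_ctx(int id)  { g_ctx_in_use[id] = 0; }

/* Non-secure thread scheduled in: activate its secure mirror. */
void spm_load_client_ctx(int id)  { g_active_ctx = id; }

/* Non-secure thread scheduled out. */
void spm_store_client_ctx(int id) { (void)id; g_active_ctx = -1; }

/* The SPM consults this to apply per-client access policies. */
int spm_active_client(void) { return g_active_ctx; }
```

With this in place, the SPM can key its access-policy decisions on `spm_active_client()` when a secure service call arrives, which is exactly the per-thread policy idea from the example above.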
If there's a vulnerability in the non-secure RTOS, there is a concern that assets will be served to an unauthorized non-secure thread, but it's important to note that these vulnerabilities are contained to the non-secure processing environment: if the non-secure RTOS is compromised, assets might be mixed up on the non-secure side, but no assets are exposed that would not anyway be available to non-secure entities. So let's take a look at the two implementations, the two programming models, that are supported by Trusted Firmware-M. One is what we call the library model, in which every secure service provided to the non-secure side or to other secure partitions is implemented as a function. This closely resembles a bare-metal programming model, and there's very substantial support for this in the Armv8-M architecture, so essentially each secure partition is a library of secure functions. This is a synchronous execution model, so a non-secure thread is blocked on a function call to the secure side, and it has a low footprint because there's an opportunity to allocate resources on the secure side on demand: resources need to be allocated only when a call is made to one of the secure services. The other programming model that we support in Trusted Firmware-M is the thread model, in which secure partitions are implemented as execution threads. This is a more robust, more prescriptive framework. There's static allocation of all secure resources, which makes it more complex but also more robust, as I mentioned. In some cases it might be beneficial from a determinism point of view, but of course it has some overhead associated with it. In this model, interaction between the various partitions happens using connections and messages, and it's possible to have asynchronous processing of secure service requests, which is why it's well suited for physical isolation.
If there's physical isolation of secure and non-secure execution and each has its own associated processing element, then this model fits quite nicely on the secure side. With that, I would like to hand over to my co-presenter, Ken, who will talk more about the interaction within the thread model. Hello, everyone. Let's continue. In the following slides, we will introduce the interaction implementation in the thread model. We use the general concept of inter-process communication for the name; IPC is the interaction mechanism of the thread model only. There are two kinds of partitions in TF-M: the secure partitions and the non-secure partition. Secure partitions provide the secure services, and the non-secure partition just requests the services. The non-secure partition represents the whole NSPE; NSPE stands for the non-secure processing environment. There is one thread in each secure partition. After the necessary initialization tasks, it runs into an infinite while loop and waits for messages there. All client calls are sent as messages. Client calls are made by clients: the non-secure partition is a client because the non-secure partition does not provide any service, and a client could be a secure partition too, because some secure partitions need to request services from other secure partitions. And an interrupt required by a secure partition is handled in thread mode; this is a point of difference from the library model. There are also compartmentalization considerations in the thread model. There's no shared memory between the partitions: memory is copied by stream-style APIs, and during the copying and processing, the memory is checked to ensure that it belongs to the corresponding partition. Peripheral usage is treated the same way: a peripheral that belongs to one secure partition is only accessible while that secure partition is running.
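The "one thread, initialize, then wait for messages forever" shape of a thread-model partition can be sketched like this. It mimics the wait/get/reply flow with a toy in-memory queue; none of these names are the real PSA or TF-M API, and a real partition would block in its wait call rather than poll.

```c
/* Host-side sketch of a thread-model secure partition's main loop:
 * client calls arrive as messages in a queue, and the partition's
 * single thread serves them one at a time. Names are invented. */

#define QLEN 4
static int g_queue[QLEN];
static int g_head, g_tail;

static int msg_pending(void) { return g_head != g_tail; }

/* A client call is converted into a message and enqueued. */
void client_send(int request) {
    g_queue[g_tail] = request;
    g_tail = (g_tail + 1) % QLEN;
}

static int msg_get(void) {
    int m = g_queue[g_head];
    g_head = (g_head + 1) % QLEN;
    return m;
}

/* One iteration of the partition's wait loop; returns the reply,
 * or -1 when there is nothing to do (where a real partition would
 * block waiting for its signal). */
int partition_poll(void) {
    if (!msg_pending()) return -1;
    int request = msg_get();
    return request * 2;  /* toy service: double the input */
}
```

Because requests are queued rather than executed synchronously in the caller's context, the same structure works unchanged when the client lives on a different processing element, which is the asynchronous, physical-isolation-friendly property mentioned above.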
And we reconfigure the protection dynamically at runtime to switch the isolation boundary. Let's take an animation as an example. Secure partition 1 requests a service from secure partition 2. The memory information is sent within the message, and secure partition 2 calls the system read API to copy the memory into its own area. Now you can see the yellow rectangle: the isolation boundary has been switched to secure partition 2. When secure partition 2 wants to return the result back, it calls a write function to copy the memory back to the destination memory, and then the result is returned back to secure partition 1. Now secure partition 1 is running, so the isolation boundary has been switched back. We are not focusing on secure partition scheduling and the like, because that's quite generic; we are focusing on the interaction between the non-secure partition and the secure partitions on the Armv8-M platform with TrustZone technology. TrustZone technology divides the whole processing environment in two: one part is called the non-secure processing environment, on the left side, and on the right side is the secure processing environment. The non-secure processing environment cannot access any resources that belong to the secure processing environment. So, since it cannot access any of those resources, how can it perform a secure call to request services? There is a staging area, the Non-Secure Callable region, for exactly this kind of usage. There is a secure gateway in this area to check whether the call has come from the non-secure side and whether the place where the secure call happened is in the Non-Secure Callable area. If the check passes, it enables the call into the secure side. During the secure call process, some registers and hardware status change; the main change is the stack pointer.
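The ownership check that guards every copy can be sketched as follows. This is a simplified host-side model of the "no shared memory" rule: before the stream-style read copies client memory into the service's own area, the SPM verifies that the source range really belongs to the client. The region table, names and return codes are all illustrative assumptions, not TF-M internals.

```c
#include <string.h>
#include <stdint.h>

/* Toy memory-ownership model: each region belongs to one partition. */
struct region { uintptr_t base; size_t size; int owner; };

static char buf1[64], buf2[64];
static struct region g_regions[] = {
    { 0, sizeof buf1, 1 },   /* owned by partition 1 (base set below) */
    { 0, sizeof buf2, 2 },   /* owned by partition 2 */
};

static void regions_init(void) {
    g_regions[0].base = (uintptr_t)buf1;
    g_regions[1].base = (uintptr_t)buf2;
}

static int range_owned_by(int owner, const void *p, size_t len) {
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < sizeof g_regions / sizeof g_regions[0]; i++)
        if (g_regions[i].owner == owner &&
            a >= g_regions[i].base &&
            a + len <= g_regions[i].base + g_regions[i].size)
            return 1;
    return 0;
}

/* Copy from the client's memory into the service's own area,
 * but only after confirming the client owns the source range. */
int spm_read(int client, void *dst, const void *src, size_t len) {
    if (!range_owned_by(client, src, len))
        return -1;  /* not the client's memory: reject the copy */
    memcpy(dst, src, len);
    return 0;
}
```

A matching write path would perform the same ownership check on the destination before copying results back, which is the flow the animation shows for secure partition 2 returning its result.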
The hardware stack pointer transitions to the secure stack pointer during the secure call process, while the general-purpose registers may stay the same and only some status changes. The most important thing is the hardware stack pointer. So let's take the simplest case: a single non-secure thread requests a secure service. On the non-secure side, one non-secure thread requests a secure service, the secure call happens, and the hardware stack pointer switches to the secure one in the entry area. The secure entry area then calls the client API in the SPM. During the non-privileged to privileged call, a call frame is generated and pushed onto the secure stack. The client API call is then converted into a message, and messaging and scheduling happen. The destination secure partition is activated and running, and the message is served. After the service is handled, the result is returned back through the client API. During the return process, the stacked call frame is popped, and then we return the result back to the non-secure caller. Notice that there are two client APIs here, one in orange and the other in yellow: the client API on the non-secure side and on the secure side. The prototype and the behavior should be identical from the caller's point of view; the caller here means one thread. So let's take a look at something more complicated: multiple non-secure threads calling the secure service. As before, the first secure call is performed, the secure partition is serving, and the first frame is pushed onto the secure stack. Then, because secure partition execution can be preempted by the non-secure side, there's a chance that the non-secure RTOS gets to run again and another thread requests a second secure service. So here comes the second secure call, and a second call frame is pushed onto the stack.
Then the SPM scheduler works, and we activate the corresponding secure partition and handle the service. After the service is handled, we need to return. But the problem is that the non-secure threads share the single secure stack pointer, so there are actually two frames pushed onto the stack. Which frame is the correct frame to return to? We don't know the non-secure status at this point. For this problem, there are two possible solutions. Actually, there may be more, but the main solutions are these two. Solution one is quite simple. The first secure call comes, the call frame is pushed onto the stack, and the service is being handled. Then, during the second secure call, when it wants to call the client API on the secure side to generate another call frame, we do a check: if we find there is an existing call frame there, which means a previous secure call is still ongoing, we just deny the second one. For solution two, we provide another set of APIs, running in privileged mode, for the non-secure RTOS. The basic idea is that we prepare a corresponding dedicated secure stack for each non-secure thread that needs a secure service: for non-secure thread one, we dedicate secure stack one, and likewise for non-secure thread two. Let's take a look at the actual working process. The non-secure scheduler activates non-secure thread one, so it calls the API to notify the SPM to prepare the secure stack for non-secure thread one. Then non-secure thread one gets running, and it may perform a secure call; during the secure call, its call frame is pushed onto secure stack one, prepared for it. If the non-secure thread gets switched, it's still the same: the non-secure scheduler syncs with the SPM to prepare secure stack two for non-secure thread two, and then non-secure thread two may perform its secure call. Okay, let's take a look at interrupt handling.
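Solution one, denying a second secure call while a frame is still on the shared secure stack, is simple enough to sketch directly. This is a toy model of the check only; the counter stands in for inspecting the real secure stack, and the names and return codes are invented.

```c
/* Sketch of "solution one": with a single shared secure stack,
 * a second secure call is denied while a previous call's frame
 * is still on the stack. Illustrative names only. */

static int g_frames_on_secure_stack = 0;

/* Called on secure-call entry, before pushing a new call frame. */
int secure_call_enter(void) {
    if (g_frames_on_secure_stack > 0)
        return -1;              /* a secure call is still ongoing: deny */
    g_frames_on_secure_stack++; /* push the call frame */
    return 0;
}

/* Called on secure-call return. */
void secure_call_exit(void) {
    g_frames_on_secure_stack--; /* pop the call frame */
}
```

Solution two removes the restriction at the cost of memory: each non-secure thread gets its own dedicated secure stack (prepared via the privileged notification API), so concurrent calls from different threads never share frames and the return path is always unambiguous.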
Secure execution in a secure partition can be preempted by a non-secure-side interrupt. Let's take an example. A secure call is in progress; secure partition one is handling the service. Just as secure partition one is running, a non-secure interrupt occurs. The context change then looks like this: the preempted context is pushed onto the secure stack, but the non-secure scheduler thinks the non-secure thread, say thread one, is still running, so it associates the preempted context with the currently running non-secure thread. Then it services the ISR on the non-secure side, shown in orange. After the ISR handles the interrupt, execution returns to the saved context and continues the secure execution in the secure partition. The TrustZone hardware ensures that no information is leaked to the non-secure side. The secure context handling works like this: the context is saved on the secure stack and all the general-purpose registers are cleared to zero, and then the hardware passes a magic value to the non-secure side to tell the non-secure scheduler that it has just preempted a secure execution but cannot know the content. The non-secure side just saves that magic value into the non-secure thread context, and after the non-secure execution is finished, it switches back to the secure side. This is the case of a non-secure interrupt preempting a secure service. A secure interrupt preempting execution is a bit different from the non-secure interrupt case. Here we have a secure call, and a secure partition is handling the service; the secure partition finds that it needs an interrupt to finish the service, so it calls an API to wait for the interrupt event.
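The save-clear-restore behavior on preemption can be modeled in a few lines. This is purely illustrative host-side C; on real Armv8-M hardware this stacking and register clearing is done automatically by the exception machinery, not by software, and the array here just stands in for the general-purpose register file.

```c
#include <string.h>

/* Toy model of what happens when a non-secure interrupt preempts
 * secure code: the secure register state is stacked on the secure
 * stack and the registers are cleared so nothing leaks to the
 * non-secure handler. Hardware does this on the real platform. */

#define NREGS 8
static int g_regs[NREGS];    /* stands in for the register file  */
static int g_saved[NREGS];   /* stands in for the secure stack   */

void on_ns_preemption(void) {
    memcpy(g_saved, g_regs, sizeof g_regs);  /* stack secure context */
    memset(g_regs, 0, sizeof g_regs);        /* clear: no data leak  */
}

void on_ns_return(void) {
    memcpy(g_regs, g_saved, sizeof g_regs);  /* unstack and resume   */
}
```

Between `on_ns_preemption()` and `on_ns_return()` the non-secure ISR sees only zeroed registers, which is the leak-prevention property the talk attributes to the hardware.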
Then the scheduler returns execution to another potential partition; it may be another secure partition or the non-secure partition. Then the secure interrupt occurs. The secure ISR first creates a message, because in the thread model the APIs are based on messages, and pushes it to the service's queue. The scheduler then wakes the secure partition that is waiting for the interrupt, and the secure partition runs, handles the interrupt and finishes the service. After all these processes are done, we return back to the preempted caller, which may be secure or non-secure; that's why the color is mixed, it may be orange and it may be green. So this is the case for the interaction between the partitions; we focused on the interactions between the non-secure partition and the secure partitions. Okay, that's all. I will hand the talk back to Miklos. Okay. So just to summarize the topics that we've discussed, and this is a very brief summary: compartmentalization in IoT is not trivial. There is no one-size-fits-all. There might be different approaches to achieving secure and non-secure isolation. We might have various considerations on the need for privilege control, both on the secure side and on the non-secure side. And the interaction between the compartments in the system may happen based on function calls, on inter-process-communication-type behavior, or, in the case of physical isolation, interaction may be as simple as a hardware mailbox. In any case, it's up to the system designer to pick and choose the right approach, from hardware selection to the software framework. The Trusted Firmware-M project is focused on providing a scalable framework which tries to cover as many of the use cases as possible with a single framework.
And so we try to provide various build configurations to support multiple configurations out of this plethora of options that's available, and we are trying to maintain a holistic view of the security of the system. With that, I would like to highlight another talk that was given earlier by Ashutosh from our team, where he was talking about security in IoT in general and how Trusted Firmware-M implements it. Trusted Firmware-M is part of the open-source, open-governance TrustedFirmware.org project. We have a team here at the Open IoT Summit, and we gladly take any questions at the Arm booth or via email, but actually we still have a few minutes left, so we're open to questions at this point. Thank you. Any questions? Did you consider using the RTOS's native scheduler to do the scheduling inside the secure part? For example, by adding some extension to support a plugin, some context-switch code, or something like that. So our primary focus in the first iteration, when Trusted Firmware-M was launched, was to get a low-overhead, simple solution. Now we're at the stage, this is the second year we're running the project, where we're looking at making the solution more modular and at creating more alignment to APIs, for example an API for plugging in standard OS interfaces or standard OS tooling for the scheduling part or for the interaction part. So we're looking at various options for that at the moment, and we are open to discussions on any proposals for how to solve that in Trusted Firmware-M. Thank you. If there are no other questions, then thank you for listening.