Hi, so I'm going to talk about shared system resources on a multiprocessor system and present a way of accessing these shared resources. So let's go to the presentation. A short word about me: I'm Lionel, and I've been working as a software engineer at STMicroelectronics for more than 10 years now. I'm currently developing on the STM32 MPU platform, which today is the STM32MP15 platform. My focus is mainly on the boot and security parts of the ecosystem we deploy, and this has led me to contribute to several open source projects: Trusted Firmware-A, OP-TEE OS, U-Boot, and the Linux kernel on the crypto driver side. Let's now have a look at the agenda of this talk. The first item is a short discussion of multiprocessor design, which is why we are talking about shared resources in the first place, and a definition of what shared resources are on such a system. The second item is how to manage these shared resources. One way to manage them is to use the SCMI protocol, so I will give a description of the SCMI protocol and of the implementations that exist for it. At the end of the talk I will go a little deeper and give you a view of how the SCMI protocol is used on the STM32 MPU platform. First of all, let me present multiprocessor design. Why are we discussing multiprocessors? Because they are the standard in SoC architecture today: an SoC packs multiple elements into a single chip and embeds several heterogeneous processors. The aim of a multiprocessor design is to isolate the different functions and to specialize, with more efficient accelerators such as real-time processors, audio DSPs, video encoders and decoders, or FPGAs, for example.
The gain of embedding multiple processors this way is to increase the performance of the overall system: you can run multiple dedicated tasks in parallel, and it is much more efficient to use processors specialized for those tasks. Another good aspect of using multiple processors is decreased power consumption: depending on the use case running on the system, you can power off the processors that are not in use. It also reduces the overall cost of the system-on-chip, because the processors are integrated in one package. But the cost of this solution is that resources must be shared between the different processing entities. Another reason for discussing multiprocessor design is that you will find these SoC architectures in a lot of objects and projects today, the most common being mobile platforms, such as SoCs from Samsung and MediaTek. There are also IoT gateways that embed a lot of connectivity-specific entities, like the STM32MP1 or the NXP i.MX series, and artificial-intelligence-specific SoCs. So let's now address shared resources. First, a point about the terminology used in this talk: what are shared resources, and what are the resources in the overall system? We can call exclusive resources those, like a peripheral clock or interrupt, which are assigned to and controlled by a single entity without any conflict; these are most of the common resources you will use. But we also have to deal with shared resources: central SoC resources shared by several processors or peripherals, which can be used by several entities at the same time and therefore have to be shared, such as GPIOs, regulators, clocks, resets, and interrupts. This is the key point of this topic.
We also have some common register banks that are mostly used inside the SoC, but those are very platform dependent. Here is a global overview of a system-on-chip: you can see different processors connected to an interconnect. Whether the processors are embedded in the same cluster, in different clusters, or are FPGAs, you may also have different peripherals, and all of them are connected to the interconnect bus and try to access shared resources such as the memories and the other ones. Memory is a bit particular, because most systems embed a specific firewall for memory management, so memory is a little apart. But the other ones, the sensors (voltage detectors, thermal sensors, and so on), clocks, resets, power, and interrupts, have to be properly handled across the overall system, because a CPU uses them, for example, for the DVFS management of its own processor, managing its power and clocks, and a CPU also needs to manage peripherals for runtime power optimization, or to manage the resets of the different peripherals it owns. Here is a big picture showing the internal STM32MP1 architecture. I won't go into detail on each block of the system, because it would take too much time, but the key point is that we have at least three different firmwares running in parallel in this SoC: two of them on the Cortex-A7, which runs two sides, the A7 secure and A7 non-secure side, each with its own firmware, and a Cortex-M4, which runs its own firmware as well.
Now the focus is on the different blocks that may be shared between the different firmwares, such as the EXTI, which is the external interrupt controller, the GPIOs, and, of particular interest for us, the RCC, which is the reset and clock control block, and the PWR block, the power controller, which is here. If you look, all these resources may be shared at the same time between the different firmwares. So let's go to the next slide: managing shared resources. The question is how to manage these shared resources in a complex system. I would say that in the past, we identified that the Linux OS was running and controlling the power of the application processor, which may be fine, but which implies some issues regarding system behavior if Linux crashes, and regarding system attacks and robustness. The idea now is to dedicate a single entity that is responsible for sharing the resources, what we will call in the coming slides the system controller. It will be the central point of our system, and the system controller must centralize the global knowledge of the shared resources' state, which means that the state of each shared resource at any moment is known and can be queried by the different processors. The system controller has a global and current view, at any time, of the state of these shared resources. The system controller must follow some specific rules, I would say, to properly manage these shared resources. The first one is reliability: as we just said, a dedicated execution context for the system controller is the key to avoid any attacker or any robustness issue preventing it from executing properly and controlling these resources, which may be quite critical in the end.
The second item is to be able to control and identify the accesses made to these shared resources: properly identify the different processors that try to access a resource, so as to accept or forbid access to a specific resource and avoid any issue. The second point about this system controller is its flexibility. Why? Because, as we know, most of these shared resources are clearly platform specific. That means the system controller must be adapted, designed, and developed tightly linked to the hardware; this is the main critical development that has to be handled by each provider. In terms of flexibility, it must also handle the fact that your peripherals and your system will evolve over time, with the use cases and the features you want to address. That means the system controller must be able to adapt, and be open to use cases that change over time, for example reallocating resources from one processor that is powered off to another that is powered on. That is the key to flexibility. The last point is maybe the most important one for this system controller: accessibility. Why? Because, as we said, we need a dedicated execution context, which means we need a stable API to access this system controller. A standard interface to access the system controller is the key to correct system controller management, because it ensures that the software implemented on the different processors, the software drivers, can remain unchanged with respect to how they access the shared resources. And this is clearly key, and this is why we will now discuss the SCMI protocol. The SCMI protocol will be the key to accessing this system controller. So let's now describe the SCMI protocol. SCMI is a standard specification from ARM: the System Control and Management Interface.
The specification defines messages that are exchanged to discover and expose services between a client, which the SCMI specification calls an agent, and a server, which is called the platform. It includes two different layers. The first layer is the protocols themselves, and protocols are multiple. There are power domain protocols, which manage the power-saving states of the various power domains of the platform. There is performance management, which gives you control of the performance of a domain, and a domain can be composed of different engines such as application processors, GPUs, or other accelerators, for example. It also includes a protocol for clock management, to set and get rates on platform-managed clocks. You have a protocol for sensors, with the ability to read sensor data and be notified of sensor value changes. There is reset domain management, which, as the name says, controls the resets of the different entities. And there is the voltage domain protocol, which gives the ability to manage the voltage level of a domain and its voltage supply. These are the protocols defined by the SCMI specification. The base protocol is a protocol that allows, I would say, discovery and self-description of the interfaces to the operating system; it is a kind of dynamic discovery of the different protocols. The second layer is the transport layer, which offers different ways of transporting messages from the agent to the platform; they are based on mailboxes and shared memory. There are different transport types, which we will explain a little later. The key point is that there are different entities: you have the platform controller, which is the entity that controls the hardware and the shared resources, and an operating system using the standard protocol interfaces.
You may have a specific device, and all of them use different channels, which can be secure or non-secure, to talk to the platform controller. The platform controller uses the transport layer and the SCMI protocols, and relies on platform-specific development to access the shared resources. At the end of the day, only the platform controller is able to access the shared resources. A little deeper now: of course, SCMI is a kind of client/server model, communicating through messages. Messages are of two kinds: agent-to-platform, the A2P messages, which are requests, and platform-to-agent, the P2A messages, which can be synchronous responses, notifications, or delayed responses. All of them transit through channels which, as we said, are based on shared memory plus a chosen transport, which can be the mailbox, the SMC transport, or the OP-TEE transport, which is currently under development and specific to OP-TEE OS. There can be one or more dedicated channels per agent, because request messages can be synchronous or asynchronous commands. The main point is that for synchronous commands, as soon as a synchronous command is sent by the agent to the platform, the channel is blocked until the platform returns a response to the agent. So if you want to have multiple commands in flight in parallel from the agent to the platform, you need multiple channels per agent. There are two different channel types: the standard channel, used to transmit agent requests and responses between an agent and the system controller, and the fast channel, a particular unidirectional channel specific to the performance protocol, meant to reduce latency. It is for sure not a synchronous command: it is just unidirectional and gives low latency for this specific kind of message. So, what about SCMI around Linux today? The SCMI specification available now is the 3.0 specification.
So, the current implementation inside the Linux kernel is v5.9, and v5.9 now supports the SCMI 2.0 specification. The first patches around SCMI were introduced in v4.17, and now we have the clock, reset, power, performance, and sensor protocols implemented. The mailbox transport is fully implemented, and the SMC transport and the notifications have been recently added in the latest official kernel, 5.9. Coming changes for 3.0 are the voltage regulator protocol and sensor extensions, and coming next are QoS management and some security and firewall related work. So let's now discuss how to implement the SCMI server inside the SoC: the system controller implementing SCMI. The first scheme, I would say the original one, given by ARM as the reference implementation, uses a dedicated processor. The example here is the Juno platform, which embeds a Cortex-M3 running the SCP firmware implementing this SCMI server. SCP-firmware is now an open source firmware implementing SCMI according to the specification, so this is, I'd say, the reference platform and reference implementation. It controls the PMIC of the platform, so the different regulators, and it also controls the clocks, voltage, and power gating of the different elements of the system. The good point of this implementation is that it is a clearly independent execution context, using a separate, dedicated processor. But it also increases the overall footprint of the SoC, because of this new dedicated processor added just to run the SCP firmware. The next possible scheme is what is currently being implemented between ARM and Linaro. The project is called Project Stratos: the idea is to use the application processor and, thanks to virtualization support, run the SCMI server in a dedicated virtual machine.
Thanks to the virtio transport, there is a way for Linux and the SCMI server to coexist in two different compartments and execution contexts, which is really what we intend for this SCMI server: a dedicated execution context. An alternative possible scheme, still on the same application processor, is to use TrustZone. This is the implementation we use on the STM32MP15: enabled by the flexibility of the ARM Cortex-A, we have a dedicated secure execution context, with its own context, working completely inside the trusted environment. For sure, the good points of this implementation are similar: it reduces the SoC footprint, because we are still using the same application processor, so it is also a cost reduction, and it gives a fully trusted and secure environment, which is, I would say, a little better in terms of security than using a virtual machine. But the fact remains that, as you are running your system controller on the application processor, you give, I would say, a master responsibility for the whole system to this application processor, because the SCMI server must be accessible to the different other OSes at any time. So let's now go deeper into the STM32MP1 implementation. This is our implementation for the STM32MP1; I have only represented the Cortex-A7, split between TrustZone, so the secure part, and the non-secure part, keeping in mind that we also have the Cortex-M4 running another coprocessor firmware, which is not represented here. The base implementation uses SCMI based on shared memory and SMC calls. As you see, the shared resources are only accessed by the secure context, and the supported implementations use either an OP-TEE OS integration or the Trusted Firmware-A as BL32, which is a small monitor.
In both implementations, the Linux part uses the standard clock framework, reset framework, and regulator framework, which are generic, so you can use your generic drivers, and the SCMI clock, reset, and voltage drivers are implemented following the associated frameworks. All of them rely on the SCMI driver to choose the correct transport layer. On our system we use the SMC transport, which issues the SMC handled on the monitor side and goes through the SCMI server on the secure side; the server dispatches the message to the SCMI clock, reset, or power domain protocol depending on the message sent, and goes to the platform-specific drivers to access the shared resources. Now a quick focus on a clock access sequence over SCMI: we will look at how a peripheral driver sets a rate on a clock. This is a typical access from a peripheral driver inside the Linux kernel doing a set_rate on a clock. As usual, the set_rate call gives the ID of the clock and, of course, the rate. The clock ID corresponds to an SCMI clock, and the SCMI clock provider has been registered with the clock framework of the Linux kernel, so it handles this call and goes through the SCMI clock-specific driver to add the correct protocol information for SCMI. This is a synchronous command, so the clock protocol ID and the clock message ID are added to the message. It then transfers to the SCMI driver, which calls the transport layer, SCMI SMC, to write the message into the shared memory and issue the SMC that invokes the SCMI secure server. Once the SCMI secure server receives the call, it parses the message, detects that it is a clock message, and gives it to the SCMI clock protocol. The SCMI clock protocol performs the set_rate operation and calls the specific platform driver; the platform driver accesses the shared resources directly in the hardware, and then we go back to the Linux side. As you see, the Linux system
at that time is blocked: this is a synchronous command, a single call, and the channel is blocked while waiting for the return of this call. The return is handled by the SCMI server, which updates the shared memory with the return status and goes back through the SMC to the Linux kernel, informing the peripheral driver whether the set_rate was properly handled or not. Let's now have a quick look at the Linux kernel side, at how SCMI is described from the device tree point of view. On the left you have the shared memory definition, which is a common shared memory definition in the Linux device tree; you can see that we have an scmi0 shared memory with a size, and this one is reused in the SCMI channel description under the firmware node. So we have a firmware node specific to SCMI, scmi0, where we find the compatible specific to the SCMI SMC transport, with a given ID that corresponds to the SMC function ID. The shared memory is referenced here and used by your channel description, and you can see that this channel describes the two different protocols it is able to handle: a clock protocol, with ID 0x14, and a reset protocol, with ID 0x16. So this is a simple description of a single channel in the Linux kernel. The question now is how to use it inside a driver and the peripheral itself. This is described here for two different peripherals of the system. We will first focus on the DSI peripheral. As you see, the clock references for the DSI are multiple: you have, I would say, the specific, non-shared, so exclusive, clock, which is the RCC DSI kernel clock, directly addressed by the Linux kernel, and we have an SCMI clock, the SCMI HSE clock, which is a global system oscillator, so it is directly managed by the SCMI
server, because it can be shared by the multiple processors and firmwares running inside the system. So operations on this clock are addressed through the clock framework and go directly to the SCMI server. The second example here is the M4 remoteproc node, which manages the M4 coprocessor, including the reset of this coprocessor. The reset of the coprocessor is shared between two different frameworks: it could be the Linux one, or the secure one if you have an OP-TEE OS running, so both must be able to manage the reset of the coprocessor, depending on whether your system is running a secure or a non-secure coprocessor management. In both cases you have to go through shared resources to manage this reset. So what is our current status on the STM32MP1? The current status is that we have fully implemented the clock and reset management of the system based on the SCMI server. As described previously, you have the possibility to embed the SCMI server in an OP-TEE or Trusted Firmware implementation, and I would say both of them are now upstream regarding the clock and reset protocol management: the OP-TEE part is fully merged now, TF-A version 2.4 will embed the same, and for U-Boot, the SCMI agent driver will be merged in the next version, v2021.01. The next steps in terms of implementation will be to drive the regulators of the platform, to implement the performance protocol for DVFS management, and, as a last step, to manage a coprocessor SCMI agent for the M4, because currently we address the M4 with another kind of resource manager, which has already been presented in another talk; you will find the link just there. Here are some links to the different specifications, SCMI and platforms using this SCMI implementation, and you will also find some direct SCMI upstream links on the OP-TEE, Trusted Firmware, and kernel sides. Thank you for attending this talk, and I'm available to answer your
questions. Thanks again.
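For reference, the device tree description walked through earlier might look roughly like this. It is a sketch based on the upstream arm,scmi bindings; the node names, addresses, sizes, and SMC function ID below are illustrative, not the exact STM32MP1 values:

```dts
/ {
	firmware {
		scmi0: scmi {
			compatible = "arm,scmi-smc";
			arm,smc-id = <0x82002000>;	/* illustrative SMC function ID */
			shmem = <&scmi0_shm>;
			#address-cells = <1>;
			#size-cells = <0>;

			scmi0_clk: protocol@14 {	/* clock protocol, ID 0x14 */
				reg = <0x14>;
				#clock-cells = <1>;
			};
			scmi0_reset: protocol@16 {	/* reset domain protocol, ID 0x16 */
				reg = <0x16>;
				#reset-cells = <1>;
			};
		};
	};

	sram@2ffff000 {				/* illustrative SRAM carve-out */
		compatible = "mmio-sram";
		reg = <0x2ffff000 0x1000>;
		#address-cells = <1>;
		#size-cells = <1>;
		ranges = <0 0x2ffff000 0x1000>;

		scmi0_shm: scmi-shm@0 {		/* the shared memory channel */
			compatible = "arm,scmi-shmem";
			reg = <0 0x80>;
		};
	};
};
```

A consumer peripheral then simply references `<&scmi0_clk N>` or `<&scmi0_reset N>` in its `clocks`/`resets` properties, exactly like any other provider, which is why the generic drivers need no change.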