Hello, and welcome to this presentation about MPU usage in STM32 with Arm Cortex-M7. The purpose of the presentation is to show the usage and setting of the MPU on STM32 with Cortex-M7, and mainly to raise awareness of the issue with speculative access on Cortex-M7, which may cause a speculative read lock. This issue can be prevented by the MPU. The presentation also covers the basic parameters of the MPU and the options we can set. At the end you can also find a few typical examples of MPU settings. The presentation does not cover the security aspect of MPU usage, like setting application permissions to access only some part of memory. The only purpose of MPU usage in this presentation is to make the project run reliably and with the best possible performance.

We start with the description of the Cortex-M7 speculative read feature. A speculative memory read may be performed by the Cortex-M7 core on normal memory regions. The purpose of speculative reads is to increase the performance of the microcontroller. A speculative memory read may cause high latency or even a system error when performed on external memories, like SDRAM or Quad-SPI. The external memory doesn't even need to be connected to the microcontroller: its address range is still accessible by speculative read, because by default its memory region is set as normal. The Arm technical reference manual lists exactly the situations in which speculative access may be done by the core, but when it actually happens cannot be predicted. It's possible to disable speculative access in core registers, but due to the performance drop this option is not recommended.

Speculative instruction fetches can be initiated to any normal, executable memory address. This can occur regardless of whether the fetched instruction gets executed or, in rare cases, whether the memory address contains any valid program instruction. Speculative data reads can be initiated to any normal, read/write or read-only memory address. In some rare cases, this can occur regardless of whether there is any instruction that causes the data read.
Speculative cache line fills can be initiated to any cacheable memory address and, in rare cases, regardless of whether there is any instruction that causes the cache line fill.

There are three types of memory regions for Cortex-M7 devices. The memory type determines which operations are allowed on a given memory region. In normal memory regions the processor can perform speculative reads or reorder transactions for efficiency. The processor can also perform unaligned accesses to normal memory, so this memory type is convenient for code execution. Both the device memory type and the strongly ordered memory type do load and store operations strictly in program order. The difference is that the device memory type is bufferable, meaning that instruction execution may continue before the memory write is done. The memory write is then finished from a buffer. For a strongly ordered memory region the CPU waits for the end of the memory access instruction. Speculative access is never made to strongly ordered and device memory areas. The device memory type is used for microcontroller registers. The non-buffered strongly ordered type is used for memories where each write needs to be visible to the device, for example external NAND memories.

In Arm Cortex-M devices there are two additional attributes to be set for each memory region. Shareable shall be set for a region if multiple masters can access the region, and it is then up to the memory system to provide data synchronization between those masters. A typical example of multiple masters accessing the same memory in STM32 is the processor core and the DMA. In this case the data cache may cause different data to be visible for the core and for the DMA. As STM32 microcontrollers don't contain any hardware feature for keeping data coherent, setting a region as shareable means that the data cache is not used in the region. If the region is not shareable, the data cache can be used, but data coherency between bus masters needs to be ensured by software. The second attribute is execute never.
When execute never is set for a region, instructions cannot be executed from that region, and any attempt to do so causes a hard fault. This attribute is used more in the security usage of the MPU. For this presentation it is more important that a speculative instruction fetch cannot be done in an execute never region.

The table on this slide shows the complete address range of a Cortex-M7 device and the default memory types in each region. Memory access in the regions follows the rules of its memory type. The default memory type setting for each region, or part of it, can be changed using the MPU. For the speculative access issue it is important to note that the external RAM region is by default of normal memory type with code execution enabled. But the system must ensure that all executable and normal memory type regions are safe to access. If any inaccessible memory location is addressed by a speculative access, the processor cannot guarantee cancellation of such a speculative read, which may lead to an extensive delay or even to a device lock.

In the default mapping of a Cortex-M7 device, the critical one is the external RAM memory region. Its type is normal, without the execute never attribute set, so speculative access can be performed to this region. But external memory doesn't have to be connected, or may not be large enough to cover the complete region. All ranges not covered by external memory shall get their memory type changed by an MPU setting to prevent the speculative read issue. The external memory itself may have the normal memory type or any other type convenient for that kind of memory. Unlike the external RAM region, both the code and SRAM regions are safe to access, since the microcontroller memory driver handles this memory range.

How to recognize the issue? Typically, if the microcontroller got locked due to a speculative read, the program main loop is not executed anymore, but interrupts are still invoked. No hard fault is triggered. If the microcontroller was in debug mode, the debug session fails and it's not possible to connect again, not even using connect to running target.
The device does not respond to reset; a power cycle is needed. Occurrence of the issue is random: even a very small change in the code may hide or reveal the problem. For example, one NOP instruction can determine whether the lock will occur or not.

To prevent the speculative access issue, all addresses that are not safe to access shall have their memory type changed to device or strongly ordered, with the execute never attribute set. After that the device cannot be locked by speculative memory access. To make such settings of memory regions, Cortex-M7 contains a memory protection unit, or MPU. On the following slides we will introduce code for setting a safe background region that sets the attributes for the critical region. We recommend using the MPU for handling the critical external RAM region in every project that uses an STM32 based on Cortex-M7.

The memory protection unit is part of the microcontroller core. Its task is to define memory attributes and access permissions. The MPU then monitors bus transactions, and if any rule violation is detected, a fault exception is triggered. In Cortex-M7 the memory management fault handler is triggered when MPU rules are violated. But, for example, in Cortex-M0+ the memory management handler is not available and an MPU fault triggers a hard fault. The number of regions which can be defined in the MPU also depends on the device used.

We also need to highlight MPU behavior with overlapping regions, which we will use further in this presentation. If some memory area is covered by more than one MPU region, the region with the highest number is used for setting attributes on that memory address. We set region 0 to prevent the speculative read issue, but any other region will get priority over region 0 to apply the preferred rules on the memory that is actually used.

When enabling the MPU in a project based on the Cube libraries, you have one additional parameter to pass into the HAL_MPU_Enable function. There are four possible settings, which cover all combinations of two bits in the MPU control register.
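As a small illustration of those four combinations, here is a host-side sketch in C. It assumes the control register layout from the ARMv7-M architecture (ENABLE in bit 0, HFNMIENA in bit 1, PRIVDEFENA in bit 2). The SKETCH_ names and the function are made up for this illustration and are not HAL identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the MPU control register (ARMv7-M layout).
 * The SKETCH_ names are illustrative only, not HAL identifiers. */
#define SKETCH_MPU_CTRL_ENABLE     (1u << 0) /* MPU switched on */
#define SKETCH_MPU_CTRL_HFNMIENA   (1u << 1) /* MPU active in HardFault, NMI and FAULTMASK handlers */
#define SKETCH_MPU_CTRL_PRIVDEFENA (1u << 2) /* default memory map is the privileged background region */

/* Build the control register value for one of the four combinations. */
uint32_t sketch_mpu_ctrl(int active_in_fault_handlers, int default_map_background)
{
    /* Each flag is a plain boolean. */
    assert(active_in_fault_handlers == 0 || active_in_fault_handlers == 1);
    assert(default_map_background == 0 || default_map_background == 1);

    uint32_t ctrl = SKETCH_MPU_CTRL_ENABLE;
    if (active_in_fault_handlers) {
        ctrl |= SKETCH_MPU_CTRL_HFNMIENA;
    }
    if (default_map_background) {
        ctrl |= SKETCH_MPU_CTRL_PRIVDEFENA;
    }
    return ctrl;
}
```

The two boolean arguments give exactly four values, one per setting. What each bit does is described next.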
The first bit sets whether the MPU is enabled during the HardFault, NMI, and FAULTMASK handlers. If kept at zero, the MPU is disabled during the HardFault, NMI, and FAULTMASK handlers. The second bit allows the default memory map. If this bit is enabled, the default memory map is used as a background region for privileged software accesses. In this case the background region acts as if it had region number -1. Any region that is defined and enabled has priority over this default map. If disabled, the default memory map is disabled and any memory access to a location not covered by any enabled region causes a fault. In the examples which you will find later in this presentation, the MPU_PRIVILEGED_DEFAULT parameter is used. This means that the MPU is disabled in fault handlers and the default memory map is enabled.

A few parameters also need to be set for each MPU region that will be used. These are the starting address of the region and its size. Then there is the TEX parameter, which together with other MPU region parameters determines the region memory type and cache behavior. Then you need to set the parameters we already discussed on previous slides. By enabling or disabling them you can make the region cacheable, bufferable, and shareable. Not all combinations of MPU region parameters are allowed; the list of possible combinations is on slide 17. Not listed on this slide is the execute never parameter. This parameter can be set for any region without influence on the memory type used. Setting execute never on a region disables code execution from that region.

The last parameter of an MPU region deserves more explanation. It's the MPU sub-region setting. For each region whose size is 256 bytes or more, it is possible to divide the region into eight sub-regions of equal size. Excluding a sub-region from the region rules is done by writing one to the corresponding position of an 8-bit value. Concrete usage is demonstrated in the picture. We choose to have an 8-kilobyte region starting at address zero. So each sub-region size is 1 kilobyte.
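The sub-region arithmetic can be checked with a small host-side model in C. This is only a sketch of the rule as described here (eight equal sub-regions, a one in the mask excludes the sub-region); the function names are made up for illustration, and the model is limited to regions smaller than 4 GB because it uses 32-bit sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one sub-region: a region is split into 8 equal parts. */
uint32_t sketch_subregion_size(uint32_t region_size)
{
    /* Sub-regions exist only for power-of-two regions of 256 bytes or more. */
    assert(region_size >= 256u && (region_size & (region_size - 1u)) == 0u);
    return region_size / 8u;
}

/* Returns 1 when addr is governed by the region rules, 0 when it lies
 * outside the region or inside a sub-region excluded by srd_mask. */
int sketch_region_applies(uint32_t base, uint32_t region_size,
                          uint8_t srd_mask, uint32_t addr)
{
    if (addr < base || addr - base >= region_size) {
        return 0; /* address is outside the region */
    }
    uint32_t index = (addr - base) / sketch_subregion_size(region_size);
    return ((srd_mask >> index) & 1u) == 0u; /* bit set means excluded */
}
```

For the 8-kilobyte region at address zero, `sketch_subregion_size(8192)` gives 1024. With a mask of 0x02, only the second sub-region (addresses 0x400 to 0x7FF) is excluded from the region rules.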
If the sub-region field is set to the value 0x3A, the second, fourth, fifth and sixth sub-regions from the start won't be included in the MPU region.

The table on this slide lists all allowed MPU configurations in STM32 microcontrollers with a memory protection unit. Values from this table shall be followed when you design the MPU setting for any region. The setting of an MPU region also determines the cache policy when the cacheable parameter is allowed in the region. Cache policy may influence performance: depending on the kind of memory and its usage, a different policy may be more efficient. But it's not the purpose of this material to cover cache policy settings more deeply. For more information, please see the related materials at the end of this presentation. It is also worth warning you that in STM32F7 microcontrollers and some older STM32H7 microcontrollers an older revision of the Arm Cortex-M7 is used. This older version of Cortex-M7 has an erratum for data cache usage when configured with the write-through policy. The conditions to reproduce the issue are very specific, but to be safe, it's recommended to use the write-back policy instead. For more details, please check the product errata sheet.

The default setting of shareability and cache policies is visible in the table. By default all cacheable regions are not shareable, so software needs to handle data coherency there. Also be aware that the write-through cache policy is used on the code memory region and part of the external RAM memory region. Again, if this default setting is not suitable for your usage, you may change it using the memory protection unit.

Now we're leaving the theoretical part and moving to examples of MPU settings, starting with region zero, which we recommend to set as a basic region. The purpose of the region is to put the unused memory, which can cause the speculative access issue, into a safe configuration which does not allow speculative access.
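As a rough sketch of the numbers behind such a region zero, the C model below assumes one 4 GB region starting at address 0 whose sub-regions are used to limit its effect to the range 0x60000000 to 0xDFFFFFFF. It computes the sub-region disable mask and the size field, using the architectural encoding in which the region size is 2^(SIZE + 1) bytes. The function names are hypothetical, and the code runs on a host PC rather than configuring real hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Sub-region disable mask for a 4 GB region at address 0 so that only
 * [first, last] stays under the region rules. Both bounds must fall on
 * 512 MB sub-region boundaries. A set bit excludes that sub-region. */
uint8_t sketch_srd_mask_4gb(uint32_t first, uint32_t last)
{
    const uint64_t sub = 0x20000000u; /* 4 GB / 8 = 512 MB */
    assert(first <= last);
    assert(first % sub == 0u && ((uint64_t)last + 1u) % sub == 0u);

    uint8_t mask = 0;
    for (unsigned i = 0; i < 8u; ++i) {
        uint64_t start = (uint64_t)i * sub;
        if (start < first || start > last) {
            mask |= (uint8_t)(1u << i); /* outside the wanted range */
        }
    }
    return mask;
}

/* Region size field: the region spans 2^(field + 1) bytes. */
uint8_t sketch_size_field(uint64_t region_bytes)
{
    /* Power of two, at least the 32-byte minimum region size. */
    assert(region_bytes >= 32u && (region_bytes & (region_bytes - 1u)) == 0u);
    uint8_t field = 0;
    while ((1ull << (field + 1u)) < region_bytes) {
        ++field;
    }
    return field;
}
```

For the range above, `sketch_srd_mask_4gb(0x60000000, 0xDFFFFFFF)` gives 0x87 (sub-regions 0, 1, 2 and 7 excluded), and `sketch_size_field` gives 31 for a 4 GB region. On a real target these two values would go into the sub-region disable and size settings of the region.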
The region is set as strongly ordered with the execute never parameter set. Using sub-regions, the range 0x60000000 to 0xE0000000 is set in region zero. The code example on this slide shows how region zero for preventing speculative memory reads can be implemented in a project. Through the sub-region setting, the address range 0x60000000 to 0xE0000000 is covered.

A few more typical use cases of the MPU are covered in this presentation. There are examples for setting up external QSPI flash memory, SDRAM on the FMC, DMA usage with internal RAM, RAM buffers for Ethernet, and an LCD configuration. All examples count on region zero from the previous slide being used to prevent the speculative read issue.

As already covered at the start of this presentation, there are some general recommendations on which memory type shall be used for various kinds of memories. For code execution the best option is the normal memory type, which allows unaligned memory access. The normal memory type is also convenient for RAM memories, as it has no additional restrictions and offers the best performance. For access to MCU registers it is important to preserve the program order of instructions. Writes can still be issued in a burst, using the bufferable attribute, so the device memory type is the best option. The strongly ordered memory type is used for memories which need each write to be a single transaction, for example NAND memories or FPGAs.

The first example is external QSPI flash memory, these days often used as memory for code execution or large data storage. In the STM32F7 family, QSPI or FMC usage needs to override the setting of region zero to allow access to address 0xA0000000, where the QSPI and FMC control registers are located. In STM32H7 microcontrollers a different address is used for the QSPI and FMC control registers, so only one MPU region for the QSPI memory range is enough. The start of the QSPI memory depends on the bank used; usually it's address 0x90000000. We recommend setting this region as normal, shareable, with the write-back cache policy.
If code won't be executed from the memory, also set the execute never attribute. The code examples on the slides show the setting for QSPI and FMC control register access, and the initialization code for the QSPI memory range.

As for QSPI, on the STM32F7 family the FMC also needs an added region to allow access to the control registers at address 0xA0000000. The start of the SDRAM depends on the bank used; usually SDRAM is mapped on bank one at address 0xC0000000. We recommend setting it as normal memory type with the write-back cache policy. If only one bus master is accessing the memory area, set it as not shareable so that the data cache is used as well. The code examples on the slides show the setting for QSPI and FMC control register access, and the SDRAM memory range setting.

DMA is very often used with the microcontroller's internal SRAM. By default the RAM memory region is not shareable, which makes the program responsible for keeping data coherent when the cache is used. If you want to achieve optimal performance, it is strongly recommended to enable both the data and instruction caches. If you don't want to keep data coherent by software, which means flushing the complete cache before each DMA usage, the best option is to set just the part of RAM which is used by the DMA as shareable. That ensures data coherency of that memory and keeps the data cache enabled for the parts of RAM where the application can safely use it, meaning where DMA is not used. In the example here we choose two buffers in RAM with a total size of one kilobyte. In the program the buffer address will be fixed to start from address 0x20020000. The slide shows the code setting for the MPU region. This setting has multiple possibilities. As instructions won't be stored in the region, it makes no difference whether the cache is completely disabled, or the cache is enabled with the region set as shareable, because then the data cache is again not used.

The Ethernet peripheral in STM32 uses DMA for data transfers between the peripheral and buffers placed in RAM. As with the RAM buffer used by DMA with cache, discussed in the previous part, the region where the Ethernet RAM buffers are placed needs to be set as shareable.
Additionally, it's recommended to use different RAM for the Ethernet buffers and for application data to achieve better performance. For example, on STM32F746 use SRAM1 for application data storage and SRAM2 for Ethernet buffers. The Ethernet peripheral demands two sets of buffers, one for the data itself and a second for the DMA descriptor tables. The transmit and receive buffers in RAM, used with DMA, are set as a normal, non-cacheable memory region. As the cache is disabled, shareable doesn't need to be enabled. The DMA descriptor tables shall be set as shared device memory.

The last example of MPU setting in this presentation is for an LCD display controlled by the flexible memory controller. In such a scenario the internal memory of the LCD display is typically used, with an 8- or 16-bit bus width. Depending on which FMC pin is used for register select, two different memory regions may be used, which makes it necessary to use two memory regions with the same setting. That is also the case in the example here. Region 1 covers 32 bytes from address 0x60000000, which is the starting address of bank 1 on STM32F7. 32 bytes is the minimal MPU region size. The first region setting is 32 bytes of strongly ordered memory starting from 0x60000000. The second region is for toggling register select. It is also 32 bytes of strongly ordered memory, in our configuration starting from address 0x60020000.

Here you can find references to other materials related to the memory protection unit from STMicroelectronics: the programming manual for Cortex-M7 microcontrollers, the application note about the level 1 cache, and the dedicated application note about the memory protection unit. And, surprisingly for some users, also the application note about the LTDC, which contains a quite detailed description of MPU usage and also mentions the speculative read lock. You can explore those materials to get more complete information about MPU usage.

In the presentation the addresses of microcontroller peripherals are often mentioned.
They can change from family to family, and you can find the peripheral addresses and the complete memory map in the reference manual. Thank you for watching this presentation, and we wish you a lot of success with STM32.