Hello and welcome to this presentation of the ICACHE module, which is embedded in all products of the STM32L5 microcontroller family.

The instruction cache, or ICACHE, is introduced on the C-AHB code bus of the Cortex-M33 processor to improve performance when fetching instructions and data from both internal and external memories. It implements a slave port connected to the Cortex-M33 C-bus and two master ports connected to the AHB5 bus matrix. The purpose of the instruction cache is to cache instruction fetches or instruction-memory loads coming from the processor. As such, ICACHE only manages read transactions and doesn't manage write transactions. ICACHE supports two configurations: two-way set associative or direct mapped. The memory address remap capability enables accesses to external memories to be steered to the C-bus instead of the S-bus.

ICACHE contributes to reducing power consumption. Accessing the small internal ICACHE memory in case of a cache hit consumes less power than reading from flash memory or external memories. A software configuration of ICACHE as direct mapped allows even lower power consumption.

ICACHE supports memory address remapping for up to four address regions. This improves bus transaction concurrency, as accesses to external memories are performed using the C-bus instead of the S-bus. The cache line size is 16 bytes, the replacement algorithm is pseudo-least-recently-used (pLRU), based on a binary tree, and critical-word-first burst ordering minimizes the latency of the instruction or data requested by the processor. Two performance counters provide statistics about the utilization of ICACHE. A hardware sequencer, activated by software, is in charge of invalidating the entire contents of ICACHE.

Dual master access is a feature used to decouple the traffic according to the targeted memory. For example, ICACHE assigns fast traffic, i.e. addressing flash and SRAM memories, to the AHB master 1 port and slow traffic, i.e.
addressing external memories sitting on the Octo-SPI and FMC interfaces, to the AHB master 2 port, thus preventing processor stalls on line refills from external memories. This allows interrupt service routine fetching from internal flash memory to take place in parallel with a cache line refill from external memory. The non-remapped traffic systematically goes to the master 1 port. For any remapped region, traffic can be routed to either the master 1 port or the master 2 port.

ICACHE flags an error, and possibly asserts an interrupt request, whenever it detects an unexpected cacheable write access. An interrupt request can also be asserted upon completion of the cache invalidation sequence. ICACHE doesn't manage AHB bus errors on master 1 or master 2 transactions, but propagates them back to the execution port that received the initial C-bus transaction.

The Armv8-M default memory map, and also the MPU, define a cacheability attribute that ICACHE uses to determine whether a lookup has to be performed. Two conditions have to be satisfied to perform the cache lookup: the AHB lookup attribute is asserted and ICACHE is enabled. In case of a non-cacheable access, ICACHE is bypassed, meaning that the AHB transaction is propagated unchanged to the master output port, except for the transaction address, which may be modified by the address remapping feature. The bypass logic, and the remap logic when it applies, doesn't increase the latency of the access to the targeted memory. ICACHE is disabled at boot.

This slide details the cache organization in the two possible configurations: two-way set associative and direct mapped. In two-way set associative mode, each way contains 256 lines of 16 bytes. Thus, the four LSBs of the address represent an offset within a cache line, and the 8-bit index selects one entry among 256 in the tag memories and in the data memories. In direct mapped mode, the unique way contains 512 lines of 16 bytes. The index therefore has one additional bit.
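The address split described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the function name `split_address` is hypothetical, and the tag is modeled simply as all address bits above the index, since the actual tag width stored by the hardware is not given in this presentation.

```python
LINE_SIZE = 16    # bytes per cache line, as stated in the presentation
OFFSET_BITS = 4   # log2(16): the four LSBs are the offset within a line


def split_address(addr, direct_mapped=False):
    """Split an address into (tag, index, offset).

    Hypothetical helper for illustration only. The index is 8 bits wide
    in two-way set associative mode (256 lines per way) and 9 bits wide
    in direct mapped mode (512 lines in the single way).
    """
    index_bits = 9 if direct_mapped else 8
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & ((1 << index_bits) - 1)
    # Assumption: the tag is everything above the index bits.
    tag = addr >> (OFFSET_BITS + index_bits)
    return tag, index, offset
```

For example, with the address 0x08001234, the two-way mode gives offset 0x4 and index 0x23, while direct mapped mode gives the same offset and the 9-bit index 0x123.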
All cache operations, such as read, refill, remapping and invalidation, remain the same in direct mapped configuration. The only functional difference is the absence of a replacement algorithm in the case of a line eviction, since only one way is possible for any data refill. The other difference concerns power consumption. In two-way set associative mode, both memory cuts, way 0 tags plus data and way 1 tags plus data, are read speculatively at each cacheable memory request. In direct mapped mode, only one hardware memory cut is accessed, the one addressed by the 9-bit index. So direct mapped mode reduces the power consumption.

A complete cache invalidation occurs in three circumstances: automatically, after the ICACHE reset is released; when software sets the CACHEINV bit in the ICACHE_CR register; and when software disables ICACHE by clearing the EN bit in the ICACHE_CR register. Cache invalidation is performed by a hardware sequencer that sets the BUSYF flag until invalidation is completed. The BSYENDF flag is set upon completion of the invalidation procedure and can be used to assert an interrupt request. Software must test the BUSYF and/or BSYENDF values before enabling ICACHE. Otherwise, if ICACHE is enabled before the end of an invalidation procedure, any cache access while BUSYF is still at one is treated as non-cacheable.

ICACHE is placed on the C-AHB bus and thus caches the code memory region, ranging from address 0x0000 0000 to 0x1FFF FFFF of the memory map. In order to make some other memory regions cacheable, ICACHE supports a memory region remapping feature. Up to four external memory regions can be defined, whose addresses have an alias in the code region. Addressing these external memory regions through their code alias address allows the memory request to be routed to the C-AHB bus and to be managed by ICACHE.
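The aliasing idea can be sketched as a base-plus-offset translation. This is a minimal Python model, not the register-level programming sequence: the `RemapRegion` class, the `remap` function and the example addresses and size are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

MAX_REGIONS = 4  # the presentation states up to four remapped regions


@dataclass(frozen=True)
class RemapRegion:
    """Hypothetical description of one remapped region."""
    alias_base: int     # base of the alias window in the code region
    physical_base: int  # base of the external memory actually accessed
    size: int           # region size in bytes


def remap(addr, regions):
    """Return the address driven on the master port for a C-bus address.

    Addresses inside an alias window are translated to the physical
    region; all other addresses pass through unchanged.
    """
    if len(regions) > MAX_REGIONS:
        raise ValueError("at most four remap regions are supported")
    for region in regions:
        if region.alias_base <= addr < region.alias_base + region.size:
            return region.physical_base + (addr - region.alias_base)
    return addr


# Illustrative values only: a 2-Mbyte window aliased into the code region.
EXAMPLE_REGIONS = [RemapRegion(0x10000000, 0x90000000, 2 * 1024 * 1024)]
```

With these example values, a fetch at the alias address 0x10000040 is translated to 0x90000040, while a fetch at 0x08000000 is left untouched because it lies outside every alias window.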
Typically, any external memory space physically mapped at an address somewhere in the range 0x6000 0000 to 0x9FFF FFFF can be aliased with an address in the range 0x0000 0000 to 0x07FF FFFF or 0x1000 0000 to 0x1FFF FFFF. The remapping functionality is also available for non-cacheable traffic and when the cache is disabled.

The burst type of AHB memory transactions for remapped regions is programmable: INCR, for incrementing, or WRAP. A cache line is aligned on its size. WRAP burst ordering minimizes the latency of the instruction or data explicitly requested by the processor. The word containing the data targeted by the address driven by the processor is transferred first. It's called the critical word because it contains the information actually needed by the processor. The remaining words are then transferred using the wrap ordering. For instance, if word 3 is transferred first, there's a wrap to the beginning of the cache line, and the sequence of words that follows is word 0, word 1 and word 2. In INCR mode, the word ordering is always the same, word 0, word 1, word 2, word 3, whatever the read request address: line address, line address plus 0x4, plus 0x8 or plus 0xC. The software can program the kind of AHB burst that is generated by the ICACHE master ports: typically, WRAP burst mode for remapped external memories accessed through the Octo-SPI interface, and INCR burst mode for external memories accessed through the FSMC interface, which doesn't support WRAP burst mode.

The hit-under-miss capability is the ability to serve processor requests for an access to cached data during an ongoing line refill due to a previous cache miss. In step one, the Cortex-M33 requests data contained in a cache line that is not currently in the cache. A cache lookup is performed because the region containing the target address is assumed to be cacheable and accessible from the C-bus, and ICACHE is active. The result of this lookup is a cache miss.
Consequently, ICACHE issues a cache line read request to memory in order to acquire the cache line containing the data explicitly requested by the processor. In step two, the memory returns the cache line and the critical word (data i) is forwarded to the processor. At the same time, the processor issues a second request targeting the same cache line. In step three, ICACHE detects this occurrence and delivers the corresponding data to the processor, thus avoiding a cache miss. The hit-under-miss feature consists in serving a hitting request just after the previous miss, without waiting for the complete refill of the cache line.

The hit monitor counts the AHB transactions at the input of the ICACHE execution port that do not generate a transaction on the ICACHE output master 1 or master 2 port. It also takes into account the hit-under-miss events. The miss monitor counts the AHB transactions at the input of the ICACHE execution port that generate a transaction on the ICACHE output master 1 or master 2 port. It also takes into account all accesses whose address is not present in either the tag memory or the refill buffer. These counters do not wrap around when they reach their maximum value. They can be dynamically enabled and disabled by software, which is useful for analyzing specific pieces of code.

ICACHE has one interrupt request output but two sources of interrupts: error detection on a cacheable write request, for which the flag is ERRF in the status register, and end of invalidate operation, for which the flag is BSYENDF. Each interrupt source has independent status, enable and clear bits.

ICACHE implements Armv8-M TrustZone support. The ICACHE registers are protected at system level, enabling only secure software to access them when TrustZone is enabled. The TAMP module can autonomously trigger the erasure of the contents of ICACHE for security reasons.
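As a rough illustration of how the hit and miss monitor values described earlier might be used, the Python sketch below models a counter that saturates instead of wrapping and derives a hit ratio from two readings. The class and function names are hypothetical, and the counter width is left as a parameter because it is not stated in this presentation.

```python
class SaturatingCounter:
    """Hypothetical model of a monitor that stops at its maximum value."""

    def __init__(self, bits):
        self.max_value = (1 << bits) - 1  # width is an assumed parameter
        self.value = 0
        self.enabled = False  # monitors are disabled by default

    def count(self, events=1):
        """Add events while enabled, clamping at the maximum (no wrap)."""
        if self.enabled:
            self.value = min(self.value + events, self.max_value)


def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when nothing was counted."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total
```

One caveat follows from the saturating behavior: once either counter has reached its maximum, the computed ratio no longer reflects the real traffic, so the measurement window should be kept short enough to avoid saturation.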
ICACHE is clocked by the Cortex-M33 C-AHB bus clock, so it is in the same clock domain as the Cortex-M33 core, with the same clock frequency and the same behavior during low-power modes. When disabled, ICACHE is bypassed, except for the remapping mechanism, which is still functional. The code bus input requests, remapped or not, are just forwarded to the master ports. To reduce power consumption, the hit and miss monitors are disabled by default. They are to be used only during code debug and optimization.

This is a list of peripherals related to ICACHE. Please refer to these peripheral trainings for more information if needed: Arm Cortex-M33, tamper and backup registers (TAMP), nested vectored interrupt controller, internal flash memory, and external memory interfaces, Octo-SPI and FSMC.