Hello and welcome to this presentation of the embedded flash memory, which is included in all products of the STM32G4 microcontroller family. The STM32G4 microcontrollers embed up to 512 kilobytes of flash memory with a dual-bank architecture. The flash memory interface manages all memory accesses (read, programming and erasing) as well as memory protection, security and option bytes. Applications using this flash memory interface benefit from its high performance together with low-power access. It supports read while write, has a small erase granularity and a short programming time, and allows dual-bank booting. It provides various security and protection mechanisms for code and data, covering both read and write access.
This slide highlights the differences in flash memory implementation between the STM32G43x and STM32G44x, called Category 2 microcontrollers, and the STM32G47x and STM32G48x, called Category 3 microcontrollers. Flash memory size is 128 kilobytes for Category 2 and 512 kilobytes for Category 3. The number of banks is 1 for Category 2, and 1 or 2 for Category 3 depending on the DBANK option bit. Note that the read-while-write capability, or RWW, is only supported when the dual-bank architecture is active. This enables programming or erasing one bank while executing code from the other bank. The page size, which sets the minimum erase granularity, is 2 kilobytes for Category 2, 4 kilobytes for Category 3 in single-bank mode and 2 kilobytes for Category 3 in dual-bank mode. The number of pages is 64 for Category 2 and 128 for Category 3. Regarding protection features, the Category 2 microcontrollers have one write-protected area, one PCROP area and one securable memory area, while Category 3 microcontrollers have two write-protected areas, two PCROP areas and two securable memory areas.
The flash memory supports page erase, bank erase and mass erase. A page, bank or mass erase operation requires only 22 milliseconds, and the programming time is only 82 microseconds for a double word.
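The page counts and timings quoted on this slide can be cross-checked with a little arithmetic. The following is a minimal sketch, not ST code: the `flash_geometry` type and both helper functions are illustrative names, and the page programming estimate assumes back-to-back standard double-word writes at the quoted 82 microseconds each, with no software overhead.

```c
/* Flash geometry as quoted in the presentation.
   The type and field names are illustrative, not part of any ST header. */
typedef struct {
    const char *name;
    unsigned total_kb;  /* total main flash size, in kilobytes */
    unsigned banks;     /* number of banks in this configuration */
    unsigned page_kb;   /* page size (erase granularity), in kilobytes */
} flash_geometry;

/* Pages per bank = (total size / number of banks) / page size. */
static unsigned pages_per_bank(const flash_geometry *g)
{
    return (g->total_kb / g->banks) / g->page_kb;
}

/* Estimated time to program one full page in standard mode, in microseconds.
   Assumption: 82 us per 8-byte double word, writes issued back to back. */
static unsigned page_program_time_us(const flash_geometry *g)
{
    unsigned double_words = (g->page_kb * 1024u) / 8u;
    return double_words * 82u;
}
```

With the figures from the slide, a Category 2 device (128 KB, 1 bank, 2 KB pages) gives 64 pages, and a Category 3 device gives 128 pages per bank in both single-bank (4 KB pages) and dual-bank (2 KB pages) mode. Under the stated assumption, a 2-kilobyte page takes about 21 milliseconds to program in standard mode, which is close to the quoted erase time.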
Fast programming mode writes a complete row of double words in one sequence and reduces the page programming time, both by eliminating the need to verify the flash locations for each double-word access and by avoiding the rise and fall time of the high voltage for each double-word write. An 8-bit ECC code is appended to each double word to be programmed. It's checked on read to detect and correct single-bit errors and to detect double-bit errors. In case of an uncorrectable error, the flash memory controller asserts the non-maskable interrupt, or NMI, to the Cortex-M4. The adaptive real-time memory accelerator, with an instruction cache, a data cache and a prefetch buffer, allows linear performance in relation to the frequency. It also helps decrease power consumption, as it belongs to the VCORE power domain.
The following protection mechanisms are supported. Write protection areas protect against unwanted write operations. With proprietary code readout protection areas, or PCROP, a part of the flash memory can be protected against access from third parties. The protected area is execute-only: it can only be reached by the STM32 CPU as an instruction code area, while other accesses such as DMA, debug, and CPU data read, write and erase are strictly prohibited. The securable memory area defines an area of code which can be executed only once at boot and never again unless a new reset occurs.
The main memory contains 64 or 128 pages depending on the category of the microcontroller. For Category 3 with a single-bank architecture, the page size is 4 kilobytes and each page consists of 8 rows of 512 bytes. For Category 3 with a dual-bank architecture and for Category 2, the page size is 2 kilobytes and each page consists of 8 rows of 256 bytes. In addition to the main flash memory, the STM32G4 supports a system memory of 28 kilobytes containing the ST bootloader, and a 1-kilobyte OTP memory that can be used to store user data that must not be erased or modified.
If only one bit is 0, the entire double word can no longer be written, even with the value 0. Option bytes contain default settings used to configure IPs in the system on chip. They are automatically loaded after a power-up reset.
The first table details the memory organization, based on a main flash memory area and an information block, for Category 3 microcontrollers with a dual-bank architecture. The second table details the granularity of the flash memory operations. Programming is done on 8-byte double words. Fast programming is done on a row of 512 bytes. Erase is done either globally, which is called mass erase, or with bank or page granularity. The securable memory is aligned on pages. Write protection is done per page. Read protection is global. Proprietary code readout protection is based on programmable start and end addresses aligned on either quad words or double words.
The dual-bank, or DBANK, option is used to select either a single bank or dual banks on Category 3 devices. The flash memory can be configured to support 2 banks with read-while-write and dual-bank boot capability, so that the device is able to boot from either bank 1 or bank 2. The BFB2 option in the user option bytes is used to select the dual-bank boot mode. When the BFB2 option is set, the device boots either from bank 2 or from bank 1, depending on which bank is valid. When the BFB2 option is cleared, the device always boots from bank 1.
In order to read the flash memory, it's necessary to configure the number of wait states to be inserted in a read access, depending on the clock frequency. The number of wait states also depends on the voltage scaling range. In range 1, the flash memory can be accessed up to 170 MHz with 7 wait states. It can be accessed with 0 wait states up to 20 MHz. For range 2, it's up to 26 MHz with 2 wait states. Thanks to the Adaptive Real-Time Accelerator, or ART Accelerator, the program can be executed with 0 wait states independent of the clock frequency.
This provides an almost linear performance in relation to the frequency, with a benchmark result of 213 Dhrystone MIPS at 170 MHz.
Data in flash memory words are 72 bits wide: 8 bits are added to each double word of 64 bits. The ECC mechanism supports single-error detection and correction, and double-error detection. When one error is detected and corrected, the ECCC flag, meaning ECC correction, is set in the flash ECC register, or FLASH_ECCR. When two errors are detected, the ECCD flag, meaning ECC detection, is set in the same FLASH_ECCR register. In this case, an NMI is generated.
Fast programming enables the programming of a row of 256 bytes, while normal programming has a granularity of 8 bytes. The main purpose of fast programming is to reduce the page programming time. This is achieved by eliminating the need to verify the flash memory locations before they are programmed, and by saving the time of high-voltage ramp-up and fall for each double word. Fast programming is one-third faster than standard-mode programming. Mass erase time, meaning a 512-kilobyte erase operation, takes approximately the same time as a page erase.
Comparing fast programming with standard programming: 512 consecutive bytes are programmed instead of 8-byte double words located anywhere in the main flash memory. 8-byte programming is more reliable due to the verification step. Note that the maximum time between two consecutive double words is around 50 microseconds. If the next double word arrives after this delay, fast programming is aborted and an error flag is set. Consequently, interrupts should be disabled to make sure that this delay is not exceeded. This table summarizes the differences between standard and fast programming.
Each program and erase operation can degrade the flash memory cells. After an accumulation of program and erase cycles, memory cells can become non-functional, causing memory errors.
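The timing constraint on fast programming described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions: the function name and the `FAST_PROG_MAX_GAP_US` constant are hypothetical, the 50-microsecond limit is the approximate figure quoted in the talk, and the timestamps are assumed to be monotonically increasing. It does not touch any hardware register.

```c
#include <stddef.h>
#include <stdint.h>

/* Approximate maximum gap between two consecutive double words,
   as quoted in the presentation (assumed value, in microseconds). */
#define FAST_PROG_MAX_GAP_US 50u

/* Given the time (in microseconds) at which each double word of a row is
   written, return the index of the first double word that arrives too late,
   or -1 if every gap stays within the limit.
   Hypothetical helper: models the abort condition, nothing more. */
static int first_missed_double_word(const uint32_t *arrival_us, size_t count)
{
    for (size_t i = 1; i < count; ++i) {
        if (arrival_us[i] - arrival_us[i - 1] > FAST_PROG_MAX_GAP_US) {
            return (int)i;  /* fast programming would be aborted here */
        }
    }
    return -1;
}
```

For example, writes at 0, 10, 75 and 85 microseconds fail at index 2, because the gap of 65 microseconds exceeds the limit. An interrupt service routine running in the middle of a row can easily create such a gap, which is why the talk recommends disabling interrupts during the sequence.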
Endurance is the maximum number of erase and programming cycles that the flash memory can support without affecting its reliability. Data retention is defined as retaining a given data pattern for a given amount of time. The retention depends on the number of program and erase cycles and also on the temperature.
The ART accelerator brings outstanding performance and reduces dynamic power consumption. It consists of a 1-kilobyte instruction cache, 256 bytes of data cache and a prefetch buffer. The instruction cache contains 32 lines of four double words and the data cache has eight lines of four double words. Once all the instruction cache lines have been filled, the LRU, or least recently used, policy is used to determine which line to replace in the instruction cache. This feature is particularly useful when code contains loops. This architecture was chosen to provide the best trade-off between cache size, power consumption and performance. After each miss, the cache is updated with only the requested double word in order to limit flash accesses and save power. As a result, the four double words in a line may not all be valid.
In case of a miss, the Cortex-M4 core takes the instruction directly from the flash memory. In parallel, the 64-bit line is copied into the current buffer and, if it is enabled, into the instruction cache. The next sequential access is therefore taken directly from the current buffer. If prefetch is enabled, another 64-bit flash access is performed to fill the prefetch buffer with sequential data. When the data is present in the current buffer, the CPU reads the current buffer. The next sequential read is performed in the prefetch buffer, which is copied into the current buffer so that the prefetch buffer is free to be filled with the next sequential data. If the data is not present in the current buffer, it's read from the prefetch buffer if it's present there. If not, it's read from the instruction cache if there's a cache hit.
Otherwise, the instruction cache behaves differently depending on whether the prefetch buffer is enabled or not. If the prefetch buffer is enabled, the ART instruction cache behaves like a branch cache. The cache is modified each time a branch or a jump occurs in the execution flow. Sequential accesses are served by the current instruction buffer and the prefetch buffer. Each time there is a hit in the prefetch buffer, its contents are transferred to the current instruction buffer and a new flash access is performed to refill the prefetch buffer. In this case, the cache content is not altered. If the prefetch buffer is disabled, the ART instruction cache behaves like a normal cache. Since no prefetch buffer is available, even a sequential access will modify the cache content. The power and performance trade-off must be evaluated for each application to know whether it's better to enable or disable the prefetch buffer. For most applications, enabling the prefetch buffer slightly increases performance, but at the cost of higher consumption. Generally, the best energy efficiency is obtained with the caches enabled and the prefetch buffer disabled, as this often reduces the number of flash accesses.
This slide shows the number of cycles needed to execute sequential 16-bit instructions without prefetch when three wait states are needed to access the flash memory. Every flash access provides 64 bits, or four instructions. Three wait states are therefore inserted every four instructions, at every flash access.
This slide shows the number of cycles needed to execute sequential 16-bit instructions with prefetch enabled when three wait states are needed to access the flash memory. After each flash access, another flash access is performed to fill the prefetch buffer. So after all instructions have been fetched from the current buffer, the next sequential instruction is read from the prefetch buffer, and no wait state is inserted as long as the instruction flow is sequential.
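The two slides just described can be summarized as a simple cycle-count model. The sketch below is a simplification, not a description of the actual hardware pipeline: it assumes purely sequential 16-bit instructions, one cycle per instruction, a starting address aligned on a 64-bit flash word, caches disabled, and a wait-state count small enough (at most three) for each prefetch to complete while the previous four instructions execute. The function names are illustrative.

```c
#include <stdint.h>

/* One 64-bit flash word holds four 16-bit instructions. */
#define INSTR_PER_FLASH_WORD 4u

/* Without prefetch: every flash access stalls the core for `wait_states`
   cycles, and one access is needed per group of four instructions. */
static uint32_t cycles_without_prefetch(uint32_t n_instr, uint32_t wait_states)
{
    uint32_t flash_accesses =
        (n_instr + INSTR_PER_FLASH_WORD - 1u) / INSTR_PER_FLASH_WORD;
    return n_instr + flash_accesses * wait_states;
}

/* With prefetch: only the first access stalls; later flash words are
   fetched in the background while the current four instructions run.
   Assumption: wait_states <= 3, so the prefetch always finishes in time. */
static uint32_t cycles_with_prefetch(uint32_t n_instr, uint32_t wait_states)
{
    if (n_instr == 0u) {
        return 0u;
    }
    return n_instr + wait_states;
}
```

Under these assumptions, eight sequential instructions with three wait states take 14 cycles without prefetch and 11 cycles with prefetch, and the gap widens as the sequential run gets longer. Any branch breaks the sequential flow and is outside the scope of this model.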
Several flash memory protection options can be configured using the option bytes. Readout protection aims to protect the contents of the flash memory, option bytes, internal CCM SRAM and backup registers against reads requested by debuggers, and against software reads caused by programs executed after a boot from SRAM or from the bootloader. Only a boot from flash memory is permitted to read the contents of these memories. Proprietary code protection is a way to mark parts of the flash memory as execute-only. Note that this kind of access permission is not supported by the memory protection unit present in the Cortex-M4 core. PCROP areas are useful to protect only a part of the flash memory against third-party reads. Write protection prevents part of the flash memory from being erased and reprogrammed.
The main purpose of the securable memory area is to protect a specific part of the flash memory against undesired access. This allows implementing software security services such as secure key storage or a secure boot in charge of image authentication. Once the processor has exited the securable memory, this part of the flash memory is no longer accessible. The securable area can only be unsecured by a reset of the device. The size of the securable memory area is aligned on pages. In addition, the code executed from the securable memory can temporarily disable debug accesses.
Option bytes are used to configure the system on chip early, before the Cortex-M4 starts. They represent 48 bytes. They are automatically loaded after a power reset, or on request by setting the OBL_LAUNCH bit in the FLASH_CR register. This capability is required to apply a new setting without resetting the device. This slide and the next two describe the various fields in the option bytes. BOOT_LOCK forces the system to boot from the main flash memory regardless of the other boot options. The readout protection level enables the readout protection for the entire flash memory.
The levels are: level 0, no protection; level 1, read protection; level 2, no debug. The following transitions are supported: level 0 to level 1; level 1 to level 0, which implies a partial or mass erase; level 0 to level 2; and level 1 to level 2. PCROP ASTRT and PCROP AEND define the proprietary code readout protection address range A. PCROP BSTRT and PCROP BEND define the proprietary code readout protection address range B. PCROP_RDP selects whether or not the PCROP area is erased when the RDP protection is changed from level 1 to level 0.
The flash memory controller supports many interrupt sources, listed in this slide and the next one. An interrupt can be asserted upon the successful end of an operation. An interrupt can also be asserted when an error occurs during a program or erase operation. Protection violations can also cause interrupts. A size error occurs when the data to be programmed is not word aligned. A programming sequence error occurs when a program operation is attempted without having previously erased the location in flash memory. A programming alignment error occurs when a complete double word is not provided before initiating a standard program operation, or when a complete row is not written before initiating a fast programming operation. A data miss programming error occurs when data is not written in time during a fast programming sequence. When a single-bit ECC error is detected and fixed, an interrupt can be asserted. When a double-bit ECC error is detected, the NMI is asserted.
The flash memory's consumption can be reduced when code is not executed from flash. The flash clock can be gated off in run and low-power run modes. It can also be configured to be gated off in sleep and low-power sleep modes. The flash clock is configured in the reset and clock controller. It's enabled by default. The flash memory can be configured in power-down mode during the sleep and low-power sleep modes.
It can also be configured in power-down mode during run and low-power run modes when the code is executed from SRAM. Gating the clock and putting the flash memory in power-down mode significantly reduces power consumption.
The flash memory module supports the following low-power capabilities: clock gating, flash memory power-down mode, and power gating of the entire module, meaning both the flash memory and its controller. In run, sleep, low-power run and low-power sleep modes, clock gating and power-down are supported. They can be used when code is executed from SRAM. In Stop 0 and Stop 1 modes, the clocks are gated and the flash memory can enter power-down mode. In shutdown mode, the power of the flash memory module is gated for both the flash memory and the controller.
Here we compare code execution performance at 150 MHz while running the EEMBC CoreMark benchmark. The maximum performance is reached when the code is executed from CCM SRAM with data located in SRAM1. When executing from flash memory at 150 MHz, the maximum CoreMark performance is reached when the ART accelerator is enabled, and there's almost no loss of performance despite the flash access time requiring 7 wait states at 150 MHz. Enabling the prefetch buffer yields a slightly higher score, 3.36 CoreMark per MHz in single-bank mode.
The flash memory module has relationships with the following other modules: the system configuration controller, or SYSCFG; the reset and clock controller, or RCC; the power controller, or PWR; the interrupt controller, or NVIC; and the memory protections. For more details, please refer to application note AN2606 about the STM32 microcontroller system memory boot mode.