Good morning, everyone. Thank you for attending this STMicroelectronics webinar. My name is Szymon Panecki, and I am here with my colleague Martin Hubek. Today we will have the pleasure of presenting the content of this webinar together. In this online event we will look at the features and architecture of STM32 microcontrollers that can significantly help to achieve low power consumption in an application and to extend battery lifetime. As an example, we will use the latest STM32 series from the ultra-low-power category, the STM32L5, which is based on the Arm Cortex-M33 core.

Here is the agenda that we prepared for today. We will start with an overview of the STM32L5 series, focusing on its key elements related to low power. Then we will discuss the architecture of STM32L5, which is optimized in terms of performance and efficiency of computation. After the theoretical part, we will show you two demonstrations. The first demonstration will be about performance, and it will be based on the CoreMark benchmark. The second demonstration will also use CoreMark, but this time it will show you the correlation between performance and low power. Later on, we will introduce the low-power modes that are available in STM32L5, focusing on consumption and wake-up time. Here, again, theory will be followed by practice, and you will see a demonstration. At the end of the session, we will summarize the most important things that we discussed today, and we will be available to answer your questions.

So let's get started with the first section, titled STM32L5 overview and ultra-low-power features. STM32L5 belongs to the ultra-low-power category. The first product of this kind was STM32L1, which was introduced in 2009. Later on, the ultra-low-power category was extended with STM32L0, in 2013. Another milestone was the introduction of STM32L4 and STM32L4+, which were created in 2015 and 2016 respectively. Finally, STM32L5 was introduced to the market in 2019.
What is important to note in this picture is that STM32L5 is a continuation of successful families like L0, L1, L4 and L4+, and it benefits greatly from the experience of STMicroelectronics in the ultra-low-power field.

OK, so we already know that the ultra-low-power portfolio of STM32 consists of five different series, and STM32L5 is the latest one. Now you might be wondering what is different or new in STM32L5 compared to STM32L0, L1, L4 and L4+. In this picture you can see a detailed comparison, but in fact the answer can be short and simple: STM32L5 offers higher performance, advanced security and lower current consumption. All these features, together with a new core, the Cortex-M33, define STM32L5 and make it unique on the market.

During this webinar we want to speak about ultra-low power, so as a starting point let's try to define the profile of a typical low-power application. At the very beginning the microcontroller has no power supply, which is typically a battery. We can simply call this first phase the off phase. Then the power supply is applied and the microcontroller starts to execute the application. The first part of the application is always related to configuration of resources like clocks, GPIOs and peripherals. We can call this phase startup initialization. This phase typically takes place only once.

Then we have the part of the application that is repetitive. It starts with the inactive phase. The microcontroller doesn't need to perform any tasks here, so the goal is to reduce consumption to a minimum. As a consequence, the microcontroller stays in a low-power mode. Then the microcontroller receives an interrupt. It might come from the external world, for example from a GPIO or a communication interface. It might also come from internal resources, for example a timer. The interrupt is a trigger to wake the MCU up from low-power mode, which is why this phase is called wake-up. After waking up, the microcontroller performs some tasks, for example reading data from a sensor.
As a consequence, this phase is the active phase. Once the activity is finished, the microcontroller goes back to low-power mode and the whole process starts again. In each of these application phases the microcontroller consumes current. So, after taking all of the phases into account, it is possible to calculate the average current consumption of the application.

Now that we are familiar with the profile of a typical low-power application, let's try to identify a list of requirements for an ultra-low-power microcontroller, and let's speak about the solutions in STM32L5 that address them.

The first requirement is computational performance. This feature is needed because it allows tasks in the active phase of the application to execute faster and reduces the amount of time spent in that phase. To address this requirement, STM32L5 offers a high-performance Cortex-M33 core, a system frequency of up to 110 MHz and the so-called ART Accelerator, a cache memory that reduces the number of flash wait states.

The second requirement is power efficiency. This feature is desired because it helps to reduce current consumption in the active phase of the application. STM32L5 is equipped with an internal SMPS, which can significantly reduce consumption. In addition, scaling the voltage and gating the clocks that supply and feed the resources of STM32L5 can further reduce consumption in the active phase.

The third requirement is a set of low-power modes. They help to achieve low current consumption in the inactive phase of the application. Here STM32L5 offers multiple low-power modes, from Sleep mode up to very deep low-power modes like Standby and Shutdown. We will take a deeper look at these modes later on.

Another requirement is a short wake-up time. It helps to reduce the transition period between the inactive and active phases of the application. Depending on the low-power mode, STM32L5 can offer a wake-up time as short as 6 clock cycles.
Finally, the last requirement is a set of smart peripherals. Typically, they can help to reduce consumption by offloading the CPU and by extending the inactive phase of the application. Here STM32L5 offers a wide range of smart low-power peripherals. Among them we can find the ADC watchdog, low-power UART, low-power timer, DMA and PKA.

As a summary of this slide, we can say that STM32L5 offers effective solutions to address all the requirements of an ultra-low-power microcontroller, and it can definitely help to optimize each phase of a low-power application. With this sentence I finish this section and let my colleague Martin continue.

Thank you, Szymon. In this section we will take a close look at the performance and efficiency of computation on STM32L5. One of the key factors that affect the performance, but also the power efficiency, of a microcontroller is the system-level architecture. Most applications run their code from either internal or external flash, and they sometimes place the most critical code in SRAM. But it's often the flash that limits performance at higher frequencies. On STM32L5 this was largely mitigated, thanks to the low number of flash wait states in the first place and also due to the cache that sits between the Cortex core and the memories.

What you see in the picture are the four bus masters on the chip, which are the Cortex core and the three DMA controllers. On the right you see the slaves, which are the internal flash and SRAMs, all the peripherals and also the external memories. The cache sits between the core and the bus matrix, and it caches instructions and data from internal memories and, when remapped, even from external memories. When the instructions and data are inside the cache, they can be accessed with zero wait states, even at the maximum frequency of 110 MHz. This affects not only the performance but also the power efficiency, and that's for two reasons.
Firstly, the code executes faster, so the application can spend more time in one of the various low-power modes. Secondly, an access to the cache is in fact more power efficient than an access to the original memory.

The next step is to measure the computational performance of an MCU. One way to do this is to simply state the core frequency or the number of instructions executed each second. The Dhrystone benchmark was one of the first that tried to tie performance to real code. But today the industry standard is CoreMark from EEMBC. Its source code is freely available, and it implements algorithms and data manipulations that can be found in common real applications, such as list processing, matrix manipulation or state machines based on if or switch statements. The outcome of CoreMark is a single number that represents the number of CoreMark iterations executed each second. You can also encounter the CoreMark score per MHz, which defines the computational efficiency.

The CoreMark score is influenced by the core itself. In the case of STM32L5 this is the Cortex-M33. The score increases with frequency, but this relationship is not necessarily linear, especially when the code is executed from flash. On L5, however, the relationship is linear, thanks to the low number of flash wait states and the cache that accelerates code execution. It's also worth mentioning that every CoreMark measurement should be accompanied by the exact version of the compiler that was used and also the optimization flags. The CoreMark code is designed in a way that prevents any elimination of code at compile time, i.e. every computation is actually performed at runtime.

In the graph you see the dependency of the CoreMark score on the core frequency for STM32L5 and for one of our competitors that is also based on Cortex-M33. In both cases the code is executed from internal flash.
At low frequency the performance is almost the same, but at higher frequencies flash wait states and the lack of a cache start to limit our competitor. L5 reaches at 80 MHz a performance similar to what the other device needs 150 MHz to achieve. This has significant implications for consumption, because the same amount of computational work can be done at a lower frequency and therefore at much lower consumption. CoreMark can also be executed from SRAM, which is always zero wait state at all frequencies, so you might encounter very different results for code executed from flash and from SRAM, and it's something to be aware of.

So how can we evaluate the power efficiency of computation? What's often used for this purpose is microamps per MHz, in other words how much current is consumed per MHz. But there are two issues with this. First of all, megahertz is a poor metric for performance, because performance depends on many other factors apart from frequency, such as the flash and cache design. Second, as you'll see later, STM32L5 includes an internal SMPS, so a unit of energy provides a better and more accurate description than current at a certain VDD voltage.

A better way to compare two products is to measure the energy required to compute a certain task, for example one CoreMark iteration. In fact, EEMBC provides a standardized method to measure the consumption while running CoreMark. The ULPMark-CM is the inverse of the energy required per CoreMark iteration. The inverse is taken simply because we intuitively expect a higher value to express better performance. To compare two products, we should always make sure that we compare at the same level of performance and not just at the same frequency.

The STM32L5 integrates an internal SMPS, which is an optional feature available on specific part numbers. From a bill-of-materials point of view it requires only a few passive components: one inductor and two capacitors.
The SMPS greatly improves the power efficiency, especially at high VDD voltage. The SMPS is configurable at runtime, so when the MCU enters one of the deeper low-power modes, specifically Stop 1 or deeper, the SMPS is automatically switched off, and it's re-enabled once the MCU wakes up. The SMPS supplies the internal LDO, which in turn supplies the whole digital domain: the CPU, digital peripherals and memories.

So this is the end of the theory part; let's now move to the practical demonstration. In this part I'll execute CoreMark on STM32L5. The code will be placed in flash, and I'll run the test at two different frequencies to confirm that the CoreMark score is in fact linear with frequency. I'm going to use the STM32L5 Discovery kit, which has the version of L5 with the integrated SMPS. On the board you'll also find an ST-LINK V3, a power shield for power consumption measurement, and various other components such as external memories, a Bluetooth module and so on. To display some runtime messages during the CoreMark execution we will use the virtual COM port of the ST-LINK, and in the next hands-on we'll also use CubeMonitor-Power to measure the consumption during the execution of CoreMark.

The USB cable is now plugged into the ST-LINK, so we can connect to the target and flash the two binaries. I have pre-compiled one binary for CoreMark running at 110 MHz and a second one running at 24 MHz. The ST-LINK also enumerates as mass storage, so the easiest way to flash a binary is to simply drag and drop it onto the ST-LINK mass storage. In just a moment you see the binary was flashed, and CoreMark is executed immediately. If I press reset, it will start from the beginning. CoreMark needs to run for at least 10 seconds to get a valid result, so let's wait a bit. You see the code ran for 11 seconds, which is fine. It ran 5000 iterations, and the CoreMark score is 443 iterations per second. I used Arm Clang v6.14 as the compiler and set the flags to optimize for speed.
The code is of course running from flash, and at the very bottom we see the confirmation of the frequency, which is 110 MHz. Let's now run the second binary. I will just flash it. Again we need to wait at least 10 seconds to get a valid result. And here we go: the CoreMark score is 98.3 iterations per second. If we divide the CoreMark score by the frequency, we get approximately 4 in both cases, which shows that the relationship is in fact linear.

Here you see another comparison of STM32L5 with one of our competitors that is also based on Cortex-M33. To make this comparison fair, we are using the same compiler and the same compiler flags. This is also the reason why the result for 110 MHz is a few percent lower than in the demonstration that we just did with Arm Clang. To summarize, the system-level architecture of a chip is an extremely important factor that affects not only the performance but also the power efficiency, as you will see in the next demonstration.

Now we will measure the consumption while running CoreMark. I will run CoreMark at 24 MHz, the frequency at which the microcontroller performs best in terms of computational workload per unit of energy. We will also try to measure the energy consumed per CoreMark iteration. We will do that thanks to the integrated power shield on the Discovery board and CubeMonitor-Power running on a PC. You might already be aware of the standalone power shield, which works as a power supply that also measures the current consumption dynamically. The power shield is now integrated on the STM32L5 Discovery, so we do not need any external components. The supply voltage can range from 1.8 to 3.3 volts. The measurement has a huge dynamic range, from 300 nanoamps up to 150 mA, and it measures with a sampling rate of 100 kHz.

I have now plugged another USB cable into the power shield. I also changed the jumper here to supply the STM32L5 target from the power shield, and I put the switch into the measure mode.
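As a side note on the first demonstration, the score-per-MHz check can be reproduced with a few lines of Python. This is only a minimal sketch: the two scores are the ones read out in the demonstration (443 at 110 MHz and 98.3 at 24 MHz), while the function and variable names are made up for illustration.

```python
# CoreMark scores quoted in the demonstration: frequency in MHz -> iterations per second.
scores = {110: 443.0, 24: 98.3}


def coremark_per_mhz(score, freq_mhz):
    """Return the computational efficiency in CoreMark/MHz."""
    return score / freq_mhz


# Efficiency at each measured frequency.
ratios = {freq: coremark_per_mhz(score, freq) for freq, score in scores.items()}
for freq, ratio in sorted(ratios.items()):
    print(f"{freq} MHz: {ratio:.2f} CoreMark/MHz")

# Relative spread between the two ratios; a small value is consistent
# with a score that scales linearly with frequency.
spread = (max(ratios.values()) - min(ratios.values())) / min(ratios.values())
print(f"relative spread: {spread:.1%}")
```

Both ratios come out close to 4 CoreMark/MHz and differ by less than 2 percent, which matches the "approximately 4 in both cases" statement. Two points cannot prove linearity over the whole frequency range, so this is a sanity check rather than a characterization.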
So now I can go to CubeMonitor-Power and connect. I see the configuration tab. Here I can set the sampling rate, so let's put it to the maximum, 100 kHz. I set the acquisition time to infinity, so it will keep measuring until I press the stop button. And let's supply the target with 3 volts. Once I press start acquisition, the target will be supplied and the microcontroller will start to execute. We still have the binary with CoreMark running at 24 MHz. So let me start the acquisition. You see there was a short current peak to charge the decoupling capacitors, so let me stop the acquisition and start again.

You see CoreMark finished just after I started the acquisition, so the consumption is now about half a milliamp. If I press the reset button and release it, the microcontroller will start to execute CoreMark, and after about 11 seconds it will finish and reconfigure the clock to 4 MHz, which is why you see this sharp drop in consumption. What we can do now is show all the measurements, and you clearly see the portion of the graph where the microcontroller is executing CoreMark. I can zoom in approximately to this interval, and I see that in the selected time frame 61 mJ was consumed. We also know from the traces that the microcontroller executed 1100 iterations of CoreMark. From that we can calculate the energy required per CoreMark iteration, or its inverse, which is in fact the ULPMark-CM score. If we plug in the numbers, we get a ULPMark-CM score of about 18, which is close enough to what we publish on the EEMBC website. Of course, the accuracy of this measurement depends on how closely we can zoom in on the actual CoreMark execution, so this is only an approximate result.

Here is another comparison of STM32L5 with one of our competitors that is also based on Cortex-M33. The conditions are the same: both microcontrollers are executing 1500 iterations of CoreMark.
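The energy-per-iteration arithmetic from the measurement above can be sketched as follows. The inputs (61 mJ and 1100 iterations) are the values read out in the demonstration. The assumption here is that the score is expressed as iterations per millijoule, since that is the unit choice that reproduces the quoted figure of about 18; the exact EEMBC scaling and run rules should be checked against their documentation. Function names are illustrative only.

```python
def energy_per_iteration_uj(energy_mj, iterations):
    """Energy spent on one CoreMark iteration, in microjoules."""
    return energy_mj * 1000.0 / iterations


def iterations_per_mj(energy_mj, iterations):
    """Inverse of the energy per iteration, expressed as iterations per millijoule.

    Assumption: this unit choice is what yields the score of about 18
    quoted in the demonstration.
    """
    return iterations / energy_mj


# Values read from the power measurement in the demonstration.
energy_mj = 61.0
iterations = 1100

per_iter = energy_per_iteration_uj(energy_mj, iterations)
score = iterations_per_mj(energy_mj, iterations)

print(f"energy per iteration: {per_iter:.2f} uJ")
print(f"approximate score:    {score:.2f} iterations/mJ")
```

As noted in the demonstration, the result is only as accurate as the selection of the time window around the CoreMark run.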
The VDD voltage is 3 volts and the code runs from flash. L5 runs at 80 MHz and the other device at 150 MHz, and as you remember from before, they have the same performance level at those frequencies. I will note that L5 is 16.9% more efficient. I will now hand over to Szymon.

Thank you, Martin. Let me continue with the next section, which is related to STM32L5 low-power modes. As already mentioned, STM32L5 has a wide range of operating modes. In this picture you can see an overview of them. Starting from the top, there is Run mode. We can see that for the maximum system frequency, which is 110 MHz, consumption is at the level of 11 mA. Of course, it is possible to use so-called clock gating and to reduce the system clock. For 80 MHz and 26 MHz, consumption is 7 mA and [inaudible] respectively. In the case of a significant system clock reduction, for example to 2 MHz, it is possible to use Low-power run mode. In that case consumption is at the level of 320 microamps. In Sleep mode the core is stopped while the rest of the resources, including the clock domain, can be active. For the same frequency as previously, consumption is reduced to 230 microamps, and the wake-up time is very fast.

Then we have three Stop modes. In these modes both the core and the peripherals are stopped. These are typical low-power modes in which a microcontroller waits for an interrupt to wake up and resume the application. Depending on the type of Stop mode, the consumption is different, the wake-up time is different, and the number of wake-up sources is different as well. The lowest current consumption, down to a few nanoamps, can be achieved by using the Standby, Shutdown and VBAT modes. However, we need to consider the limited number of wake-up sources and the longer wake-up time.

STM32L5 offers a variety of low-power modes, and it gives you the flexibility to select the most suitable one for your application. When deciding on a low-power mode there are some parameters to consider, like average consumption, peak current, performance and reaction time.
For example, Standby mode fits a profile where average consumption has to be very low, reaction time doesn't need to be quick, and the time between wake-up cycles can be long. On the other hand, Low-power sleep could be a good idea for a profile where reaction time has to be quick and there is a short period between wake-up cycles. A good trade-off between low average current consumption and short reaction time is Stop 2 mode.

STM32L5 is clearly an ultra-low-power microcontroller family. What is important to highlight is that, among the different microcontroller families with Arm Cortex-M33 available on the market, STM32L5 will give you best-in-class ultra-low-power capabilities. In terms of low-power modes, here you can see a comparison between STM32L5 and another device based on the same core. If we compare Stop mode, Standby mode and Shutdown mode with their equivalents from the competition, we can see a huge difference. Both in terms of consumption and wake-up time, STM32L5 is multiple times better than the competitor. OK, so this is the end of this section, and I hand over again to Martin.

Thank you, Szymon. In this last demonstration I would like to show you a typical example of a low-power application. The firmware will perform ADC acquisition at a 1 kHz rate. When the sample acquisition is finished, the microcontroller will enter Stop 2, which means that all high-speed clocks are turned off and the device is supplied from the low-power LDO. After wake-up, before the next sample, the microcontroller will continue executing exactly at the point where it left off. The content of the memory and the state of the processor are retained in Stop 2. I will also measure the consumption, again thanks to CubeMonitor-Power, to show the importance of the wake-up time and of the consumption in low-power mode, in this case Stop 2.

In this picture you can see the transition table between the various low-power modes on STM32L5. For this particular example we are going to use Run mode at 4 MHz to take the sample and then enter Stop 2 once it's finished.
Let's now have a look at the consumption profile for one ADC sample acquisition. When the STM32L5 is in Stop 2, the consumption is just a few microamps. When it wakes up and the core starts to execute at 4 MHz, the consumption is at a level of hundreds of microamps, so it's very important that the active phase is as short as possible. Waking up from Stop 2 takes approximately 5 microseconds. Starting the ADC acquisition, storing the value and clearing the wake-up flags takes less than 30 microseconds. So the rest of the 1 ms window, which corresponds to the acquisition rate of this application, can be spent in Stop 2. To achieve low power consumption it's necessary to have a low duty cycle between the active and inactive phases. Having a short wake-up time is absolutely essential to achieve this goal.

So now I will flash the test binary, again by dragging and dropping it onto the ST-LINK mass storage. I will connect to the power shield, and let's set the sampling frequency again to the maximum, 100 kHz, and let's set the acquisition time so that what we see is the consumption profile during exactly 100 sample acquisitions. And let's supply the target with 3 volts. Again I will start the acquisition for the second time, and here what we see is the periodic pattern during the ADC acquisition. What we don't see is the sharp transition between Stop 2 and Run mode, but that's only because we are measuring on a shunt resistor close to the power supply, so the waveform is in effect low-pass filtered. Nevertheless, the average current consumption is correct. The microcontroller consumes 47 microamps in this particular application. So this is all from me, and for the last time I will hand over to Szymon.

Thank you, Martin. So this was the last section of our webinar. I hope you enjoyed watching it. Before we finish, let me provide a short summary of the things that we discussed today. STM32L5 is a continuation of successful ultra-low-power STM32 families like L0, L1, L4 and L4+.
Thanks to this fact, STM32L5 has multiple solutions that address all phases of an ultra-low-power application. High computational performance and power efficiency reduce both the time and the consumption of the active phase. Low-power modes reduce consumption in the inactive phase, and a short wake-up time reduces the transition from the inactive to the active phase. Finally, smart low-power peripherals offload the CPU and extend the inactive phase. As a consequence, we are very confident in saying that STM32L5 is a best-in-class ultra-low-power Cortex-M33 based microcontroller family.

The last information that we would like to share with you is the references. STMicroelectronics has created application notes that focus on STM32 and ultra-low power. If you are interested in knowing more about the topics that we discussed today, we encourage you to check the documents mentioned on the slide. Thank you.
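The phase-by-phase reasoning used throughout the webinar (inactive phase, wake-up, active phase) can be turned into a rough average-current estimate. The sketch below is a minimal time-weighted average applied to the ADC example from the last demonstration, taking the 1 kHz rate, a wake-up of about 5 microseconds and an active time of about 30 microseconds. The two current values are placeholders chosen only to match "hundreds of microamps" and "a few microamps"; they are not datasheet or measured figures, and all names are illustrative.

```python
def average_current_ua(phases):
    """Time-weighted mean current in microamps.

    phases: list of (duration_us, current_ua) tuples covering one period.
    """
    total_time = sum(duration for duration, _ in phases)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    charge = sum(duration * current for duration, current in phases)
    return charge / total_time


# Timing taken from the ADC demonstration (1 kHz acquisition rate).
PERIOD_US = 1000.0
WAKEUP_US = 5.0    # approximate wake-up time from Stop 2
ACTIVE_US = 30.0   # upper bound for sampling, storing and clearing flags
STOP2_US = PERIOD_US - WAKEUP_US - ACTIVE_US

# Placeholder currents, NOT datasheet values (assumptions for illustration).
RUN_UA = 500.0     # "hundreds of microamps" while running at 4 MHz
STOP2_UA = 3.0     # "a few microamps" in Stop 2

phases = [
    (WAKEUP_US, RUN_UA),    # wake-up transition, modelled at run current
    (ACTIVE_US, RUN_UA),    # active phase
    (STOP2_US, STOP2_UA),   # inactive phase
]

duty_cycle = (WAKEUP_US + ACTIVE_US) / PERIOD_US
avg_ua = average_current_ua(phases)

print(f"duty cycle: {duty_cycle:.1%}")
print(f"estimated average current: {avg_ua:.1f} uA")
```

With these timings the device is awake for 3.5 percent of each period, which is why both the wake-up time and the Stop 2 current matter so much for the average. The estimate is not meant to reproduce the 47 microamps measured in the demonstration, since the placeholder currents ignore contributions such as the ADC itself.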