Hello and welcome to this video on how to get started using STM32Cube.AI, the new STM32CubeMX extension that brings pre-trained artificial neural networks to STM32 devices. Deploying artificial intelligence on STM32 microcontrollers lets you enable a breakthrough technology at the edge, closer to the sensor, on embedded low-power MCU devices. The STM32Cube.AI extension, referred to as X-CUBE-AI, is built around a groundbreaking library generator that imports and efficiently converts pre-trained artificial neural networks developed with popular deep learning frameworks such as Keras, Caffe and Lasagne. We will show you how to take an existing pre-trained artificial neural network and use X-CUBE-AI to generate an embedded library that runs on your STM32 device. We will also show you how to optimize your neural network with weight compression and validate the network implementation on your target STM32 device. To follow along with this getting-started guide, you will need a Cortex-M4 or Cortex-M7 based STM32 MCU, such as one from the F4, L4 or F7 families; we also support the new L5 family based on the Cortex-M33. You will also need a Keras model (we will show you how to download one from GitHub), STM32CubeMX version 5.0 or later, and of course a compatible IDE to compile the code, which can be Atollic TrueSTUDIO, IAR, Keil or System Workbench. To download the model we are going to use in this video, go to github.com using the following URL, click on model.h5, then click the download button at the top right, save the file to your desktop or downloads folder, and rename it to har_github.h5. Let's get started by installing the X-CUBE-AI CubeMX extension. First open STM32CubeMX, then go to "Install or remove embedded software packages", go to the STMicroelectronics tab, select the X-CUBE-AI extension, and click "Install now".
To start a new project, go to File > New Project; in the MCU selector window, scroll down on the left and enable the AI filter to keep only the supported part numbers. To filter the list further, you can provide the model we previously downloaded: set the model type to Keras, the type to saved model, point the model field to har_github in your download directory, and click Analyze. When you click Analyze, the X-CUBE-AI plugin analyzes the network and reports the minimum memory footprint required to run this model. As you can see, it reports a minimum required flash memory of almost 3 megabytes. This is too big for most microcontroller embedded flash memories, so we've added a feature called compression, which compresses the weights of the model. We'll start by compressing by a factor of 4; click Analyze again, and the minimum flash required is down to about 700 kilobytes. Most of these microcontrollers have between one and two megabytes of flash, and in this tutorial we're going to use an STM32F746 MCU. So let's go back to the part number search, type F746ZG, select the STM32F746ZGT in an LQFP144 package, and click Start Project. When your project is open, we'll start by enabling the X-CUBE-AI extension. Click Additional Software, and under X-CUBE-AI Core, enable the core to include the AI library. Then go to the X-CUBE-AI application, where three application templates are provided: System Performance, to benchmark the neural network implementation on the target MCU; Validation, to benchmark and compare the computed results; and Application Template, a skeleton on which to build your own user application. In this video we'll use the Validation application, which reports performance numbers and validates the network on the device.
Then you'll need to enable the plugin in your project: click Additional Software, then STMicroelectronics X-CUBE-AI, and enable both the core and the application. In the main X-CUBE-AI window you get some general information about the overall implementation on your MCU, plus a list of models with some quick information. If we go to the network tab, we can see it has been preloaded with the model we selected earlier. We can rename it, say har_github, set the model type to Keras and the type to saved model, and of course we need to apply some compression, so let's say a compression factor of 4. Click Analyze, and we get the same information that was reported earlier. Next, we want to validate on the desktop to check whether the applied compression factor caused any numerical degradation, and as you can see it's a success. Next, we'll configure the rest of the MCU to run this AI validation application. As you can see, the CRC peripheral has been enabled by default by the tool, since it is required by the AI library. Then we'll configure the UART interface used to communicate with the device. In the pinout view, use the search box to look for PD8 and PD9: these are the UART pins connected to the ST-LINK Virtual COM port. Click on PD8 and select USART3_TX, and do the same for PD9, selecting USART3_RX. Next, go to the USART3 configuration menu, select Asynchronous mode, and leave all the default parameters: 115200 bits per second, 8-bit word length, and no parity. If you are using a different board, make sure to check your board's user manual for the correct pins. Now that the USART peripheral is configured, you'll want to configure your core for optimal performance.
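For reference, the UART settings above correspond to a HAL configuration along these lines. This is a sketch assuming the STM32F7 HAL; CubeMX generates the equivalent code for you, so the handle and function names here are illustrative:

```c
/* Sketch of the USART3 setup CubeMX generates (assumes the STM32F7 HAL). */
UART_HandleTypeDef huart3;

static void MX_USART3_UART_Init(void)
{
  huart3.Instance = USART3;                       /* PD8 = TX, PD9 = RX */
  huart3.Init.BaudRate = 115200;
  huart3.Init.WordLength = UART_WORDLENGTH_8B;
  huart3.Init.StopBits = UART_STOPBITS_1;
  huart3.Init.Parity = UART_PARITY_NONE;
  huart3.Init.Mode = UART_MODE_TX_RX;
  huart3.Init.HwFlowCtl = UART_HWCONTROL_NONE;
  huart3.Init.OverSampling = UART_OVERSAMPLING_16;
  if (HAL_UART_Init(&huart3) != HAL_OK) {
    /* handle the error, e.g. call your project's Error_Handler() */
  }
}
```

You should not need to write this by hand; it is simply what the "Asynchronous, 115200, 8-bit, no parity" choices in the GUI translate to.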
Go to System Core and select Cortex-M7. You can run either through the AXI interface or through the TCM interface to use the ART accelerator; for this tutorial we'll use the AXI interface, so make sure to enable the instruction cache (I-Cache) and the data cache (D-Cache). Next, go to the clock configuration and bump the speed up to the maximum supported frequency, which is 216 MHz. As you can see, CubeMX is able to automatically configure the PLL and change the clock source if necessary to run at the desired clock speed. Next, go back to the Pinout and Configuration tab to return to the X-CUBE-AI configuration window. Finally, we want to configure the X-CUBE-AI application settings: under the Platform Settings tab, select USART Asynchronous and you should be able to see USART3. If you used a different UART interface, it should show up here. Now that your project is fully configured, go ahead and generate the code. First go to Project Manager, give your project a name, say MyProject, and a location (in my case, my desktop), and select your toolchain. To run the validation application you need to increase the heap size to 0x2000; then click Generate at the top right. Once your project is open, click Project > Make to compile it. While the project is compiling, let's go over its different components. Under Library you have the network runtime library (a .a file); this is the library used by the AI core component. Next, in the middleware AI data folder, you have har_github_data.c, the file containing all the weights of your network; if you compressed the network, you will see a smaller weight table there. Under Source you have har_github.c, which contains the calls to the AI library, and har_github.h, which exposes the public API your application can use to call the network.
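Putting that public API in context: the generated header follows X-CUBE-AI's ai_<name>_* naming convention, so a caller looks roughly like the sketch below. The argument lists are elided because the exact buffer types and helper macros vary between X-CUBE-AI versions; treat this as pseudocode and check your generated har_github.h for the real signatures.

```c
#include "har_github.h"   /* generated public API */

/* Names follow the ai_<name>_* pattern of the generated code; the
 * elided (...) arguments come from the generated data/config header. */
ai_handle network = AI_HANDLE_NULL;

ai_har_github_create(&network, ...);   /* allocate the network instance   */
ai_har_github_init(network, ...);      /* bind weights + activation buffer */
/* fill the input buffer, then run one inference: */
ai_har_github_run(network, ...);       /* input ai_buffer, output ai_buffer */
```

The validation application generated by the tool already wires this sequence up for you; you only write it yourself when moving to the application template.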
There are several calls, but the most important ones are ai_har_github_create, ai_har_github_init, and ai_har_github_run; run is the function that actually runs your network. Your user application code is under Application/User: main.c starts by configuring the cache, calling the HAL init and system clock setup, then the X-CUBE-AI init function, followed by the X-CUBE-AI process function, which is where the whole validation process happens. Now that your code compiles with no errors and no warnings, go ahead and connect the USB cable between your board and the PC; you should see a steady red LED indicating that the ST-LINK has been correctly initialized. Next, go to Project > Download > Download Active Application. You can also launch a debug session if you prefer, but it is not required for this validation application. Once the project has been downloaded onto the flash, reset the board to run the validation application. Now that your application has been programmed, go back to CubeMX, go to the har_github tab, click "Validate on target", and either use the automatic configuration or manually select the COM port. Here I know that my board is connected on COM20, so I click OK and the validation process starts. I recommend pulling up the output log so you can follow the progress of the on-target validation. Once the validation process is complete, the report shows the average time spent on each layer and the average time per inference. You can also see the reported complexity in MACC (multiply-accumulate operations), along with the ROM size and the RAM size. The error is calculated using the L2 norm; if it is below a certain threshold, the tool reports that there has been no numerical degradation. In this case we were validating using random inputs.
Next, we will show you how to use a custom data set to validate with your own input data. Go back to the configuration window, and under validation select "from custom data" and point it to a CSV file containing all your input vectors. Click "Validate on target"; this time we'll use the automatic configuration, and wait for the process to finish. As you can see, in both cases the validation was successful, meaning there has been no (or very little) numerical degradation due to the compression and the C implementation. If you choose a higher compression, say a factor of 8, and the validation process fails, it doesn't necessarily mean that your implementation will lose accuracy; some further analysis is required, and you will have to look at the output of your network. Now that you've validated your network implementation on both your host machine and your target MCU, you can either remove the validation application or switch to the application template and use the API described in har_github.h. For more information, please visit st.com/stm32cubeai, where you will find other videos, example code, function packs and more. Thank you for watching!