Hello, and welcome to this presentation of the STM32H7 FMAC block. It covers the main features of this block, which is used to perform background signal filtering tasks autonomously. The FMAC unit is built around a fixed-point multiplier and accumulator, or MAC. The MAC unit receives two 16-bit fixed-point operands from an internal 256x16-bit RAM and writes the result back to this memory. The address of the input values in local memory is determined using a set of pointers. These pointers can be loaded, incremented, decremented, or reset by the internal hardware. Software does not address them directly. The unit allows frequent or lengthy filtering operations to be offloaded from the CPU, freeing up the processor for other tasks. FIR and IIR filter functions can be realized by the FMAC. Typical applications requiring these filters are motor control, audio, power supply, lighting, and analog sensing. The FMAC offloads the CPU by executing background signal filtering tasks autonomously, thus freeing up CPU MIPS for other tasks. The FMAC unit enables the user to select the filter type, the filter order, and the coefficients, which are all programmable. This figure details the architecture of the MAC unit. X is the input sample buffer containing the raw samples to be filtered. B is the array of filter coefficients to be applied to the X samples. X and B have the same size, N plus 1 entries. Y is the output sample buffer containing the results of the filtering. A is the array of filter coefficients to be applied to the Y samples. Y and A have the same size, M plus 1 entries. The number of MAC operations needed to obtain Yn equals N plus 1 plus M: at most N plus 1 to multiply-accumulate vector X with vector B, and at most M to multiply-accumulate vector Y (Yn-1 to Yn-M) with vector A. Inputs and outputs of the FMAC use the signed fixed-point Q1.15 format. In Q1.15 format, the numeric range is minus 1 (0x8000) to 1 minus 2 to the minus 15th (0x7FFF).
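To make the Q1.15 range concrete, here is a minimal Python sketch of the conversion between floating-point values and Q1.15 integers. The helper names (float_to_q15, q15_to_float, q15_to_hex) are hypothetical, and saturating out-of-range inputs is an assumption of this sketch, not a statement about what the hardware or the FPU instructions do.

```python
Q15_SCALE = 1 << 15  # 2**15 steps per unit in Q1.15


def float_to_q15(x: float) -> int:
    """Convert a float to a signed Q1.15 integer.

    Out-of-range inputs are saturated (an assumption of this sketch).
    """
    raw = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, raw))


def q15_to_float(q: int) -> float:
    """Convert a signed Q1.15 integer back to a float."""
    return q / Q15_SCALE


def q15_to_hex(q: int) -> str:
    """Show the 16-bit two's-complement encoding of a Q1.15 value."""
    return f"0x{q & 0xFFFF:04X}"


print(q15_to_hex(float_to_q15(-1.0)))  # 0x8000, the most negative value
print(q15_to_hex(float_to_q15(1.0)))   # 0x7FFF, saturated to 1 - 2**-15
print(q15_to_float(0x7FFF))            # 0.999969482421875
```

Note that plus 1 itself is not representable, which is why the upper limit is 1 minus 2 to the minus 15th.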
32-bit single-precision floating-point numbers can be converted to or from Q1.15 format by dedicated conversion instructions that are executed in the Cortex-M7 FPU. This figure details the various formats used internally by the FMAC. The output of the multiplier, in Q2.30 format, is truncated to Q2.22 and added to the accumulator, LSB-aligned. The accumulator has 26 bits, of which 22 are fractional and 4 are integer or sign bits (Q4.22). The extra integer bits allow the accumulator to support partial accumulation in the range minus 8 (0x2000000) to just under plus 8 (0x1FFFFFF). This can occur if there are a large number of successive positive or negative coefficients. When the filter gain is less than unity for all frequencies, the accumulator value always returns to the range plus or minus 1. If the partial sum exceeds the accumulator numeric range and wraps, a sticky flag is set to help debugging. Nevertheless, provided subsequent additions undo the wrapping, a correct result is still obtained. A programmable gain can be applied at the output of the accumulator, from 0 dB to 42 dB in steps of 6 dB. This is necessary for IIR filter implementation. The FMAC unit performs arithmetic functions on vectors, which are arrays of 16-bit fixed-point scalar values. These vectors are allocated in the local SRAM. Software is in charge of configuring the X1 and X2 operand buffers and the Y output buffer through the FMAC_X1BUFCFG, FMAC_X2BUFCFG and FMAC_YBUFCFG registers. The base addresses can be chosen anywhere in internal memory, provided that all buffers fit within the internal memory address range 0x00 to 0xFF. Both the base address and the size of each buffer have to be programmed. Note that the X1, X2 and Y buffers may overlap. These buffers are not visible in the CPU memory map. Before starting a filtering operation, the CPU or DMA controller initializes the contents of the input buffers by using the initialization functions and writing to the FMAC_WDATA register.
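The buffer placement rule can be checked before programming the configuration registers. The Python sketch below assumes a simple back-to-back layout (X2, then X1, then Y, starting at address 0); that layout and the names buffer_fits and plan_buffers are hypothetical choices for illustration, since the hardware allows any placement, including overlapping buffers.

```python
FMAC_RAM_WORDS = 256  # local memory: 256 x 16-bit words, addresses 0x00 to 0xFF


def buffer_fits(base: int, size: int) -> bool:
    """True if a buffer of `size` words starting at `base` stays within 0x00..0xFF."""
    return size > 0 and base >= 0 and base + size <= FMAC_RAM_WORDS


def plan_buffers(x2_size: int, x1_size: int, y_size: int) -> dict:
    """Place X2, X1 and Y back to back from address 0 (hypothetical layout)."""
    layout = {}
    base = 0
    for name, size in (("X2", x2_size), ("X1", x1_size), ("Y", y_size)):
        if not buffer_fits(base, size):
            raise ValueError(f"{name} buffer does not fit in local memory")
        layout[name] = (base, size)  # (base address, size in words)
        base += size
    return layout


# Example: 8 coefficients, a 12-entry input buffer and a 4-entry output buffer.
print(plan_buffers(8, 12, 4))  # {'X2': (0, 8), 'X1': (8, 12), 'Y': (20, 4)}
```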
The contents of the input buffers can be either data to be filtered or filter coefficients. The data is transferred to the location within the target buffer indicated by a write pointer. After each new write, the write pointer is incremented. When the write pointer reaches the end of the allocated buffer space, it wraps back to the base address. Regarding the X1 buffer, if the number of free spaces in the buffer is less than the watermark threshold programmed in the FULL_WM field of the FMAC_X1BUFCFG register, the buffer is flagged as full. As long as the full flag is not set, interrupts or DMA requests are generated, if enabled, to request more data for the buffer. The watermark allows several data to be transferred under one interrupt without danger of overflow. Nevertheless, if an overflow does occur, the OVFL error flag is set and the write data is ignored. The write pointer is not incremented in the event of an overflow. Regarding the Y buffer, if the number of unread data in the buffer is less than the watermark threshold programmed in the EMPTY_WM field of the FMAC_YBUFCFG register, the buffer is flagged as empty. As long as the empty flag is not set, interrupts or DMA requests are generated, if enabled, to request reads from the buffer. The watermark allows several data to be transferred under one interrupt without danger of underflow. Nevertheless, if an underflow does occur, the UNFL error flag is set. In this case, the read pointer is not incremented and the read operation returns the content of the memory at the read pointer address. Each multiplication takes a value from the X1 buffer and a value from the X2 buffer and multiplies them together. A pointer in the control unit generates the read address offset, relative to the buffer base address, for each value. The pointers are managed by hardware according to the current function. This figure explains the X1 buffer operation.
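In place of the figure, the X1 write-side behaviour described so far (wrapping write pointer, full flag driven by the watermark, ignored writes on overflow) can be modelled in a few lines of Python. This is only a behavioural sketch: the class X1Buffer, its attribute names and the release method are hypothetical, and the watermark is passed directly as a count of free spaces.

```python
class X1Buffer:
    """Behavioural model of the X1 input buffer write side (illustrative only)."""

    def __init__(self, size: int, full_wm: int):
        self.mem = [0] * size
        self.size = size
        self.full_wm = full_wm  # watermark threshold, in free spaces
        self.wr = 0             # write pointer, offset from the buffer base
        self.free = size        # number of free spaces
        self.ovfl = False       # sticky overflow error flag

    @property
    def full(self) -> bool:
        # Flagged as full when free space drops below the watermark.
        return self.free < self.full_wm

    def write(self, value: int) -> None:
        if self.free == 0:
            # Overflow: the write is ignored and the pointer does not move.
            self.ovfl = True
            return
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) % self.size  # wrap back to the base address
        self.free -= 1

    def release(self, count: int = 1) -> None:
        """Model the filter consuming samples, which frees buffer space."""
        self.free = min(self.size, self.free + count)


buf = X1Buffer(size=8, full_wm=4)
for sample in range(5):
    buf.write(sample)
print(buf.free, buf.full)  # 3 True: fewer than 4 free spaces remain
```

With a watermark of 4, a request is only raised while at least four spaces are free, so a 4-beat burst can always complete without overflowing.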
When the write pointer reaches the end of the buffer, it wraps back to the beginning. If the available space in the buffer is less than the transfer size, the input buffer full flag is activated. If the top of the input set, Xn, equals the write pointer (in other words, no new sample is available), the filter stalls until a new sample is available. The processor or DMA controller must ensure that the new sample, Xn plus 1, is available in the buffer space when required. If not, the buffer is flagged as empty, which stalls the execution of the unit until a new sample is added. No underflow condition is signaled on the X1 buffer. The X1 buffer can be used as a circular buffer. New data are continually transferred into the input buffer whenever space is available. The write pointer automatically wraps around when it reaches the last 16-bit entry in the buffer, as shown in the figure. Preloading this buffer is optional for digital filters: if no input samples have been written to the buffer when the operation is started, it is flagged as empty, which triggers the CPU or DMA to load new samples until there are enough to begin operation. Preloading is nevertheless useful in the case of a vector operation, that is, when the input data is already available in system memory and circular operation is not required. The X2 buffer is used to store coefficients. It is usually loaded once during the initialization of the FMAC. Consequently, it does not support the circular addressing mode. This figure summarizes the operation of the input buffers. During step one, the filter calculates Yn from Xn-7 to Xn and loads the next four samples. During step two, Yn has now been calculated. The sample Xn-7 is removed. Then n is incremented. The filter calculates Yn from Xn-7 to Xn. No new sample is loaded. During step three, Yn has now been calculated. The sample Xn-7 is removed. Then n is incremented. The filter calculates Yn from Xn-7 to Xn. No new sample is loaded. During step four, Yn has now been calculated.
The sample Xn-7 is removed. Then n is incremented. The filter calculates Yn from Xn-7 to Xn. Four new samples are loaded. Since the upper address of the buffer has been reached, a wraparound to the beginning of the buffer occurs. This figure explains the Y buffer operation. When the write pointer reaches the end of the buffer, it wraps back to the beginning. A read pointer designates the oldest unread sample, corresponding to the output data register. When a sample is read and is not part of the output set, its space becomes free. If the write pointer equals the read pointer or the least recent sample in the output set, Yn-m, the filter stalls and the output buffer full flag is set. This figure summarizes the operation of the output buffer. During step one, the filter calculates Yn from Yn-7 to Yn-1. Eleven samples are unread. During step two, n is incremented. The filter calculates Yn from Yn-7 to Yn-1. Software or DMA reads four samples, which shifts the read pointer to the oldest unread sample. However, samples Yn-7 to Yn-5 are not deallocated because they are used in the current calculation. The FIR function performs a convolution of a vector B of length n plus 1, containing the filter coefficients, and a vector X of indefinite length, containing the sampled data. To implement the FIR in the FMAC, the buffers are used as follows. The X1 buffer contains the elements of vector X. It is a circular buffer of length n plus 1 plus D. The X2 buffer contains the elements of vector B. It is a fixed buffer of length n plus 1. The Y buffer contains the output values Yn. It is a circular buffer of length D. Here are the parameters. The parameter P contains the length n plus 1 of the coefficient vector B, in the range 2 to 127.
The parameter R contains the gain to be applied to the accumulator output. The value output to the Y buffer is multiplied by 2 to the R, where R is in the range 0 to 7. The parameter Q is not used. The function completes when the START bit in the FMAC_PARAM register is reset by software. The FIR requires n coefficients and n input samples to calculate one output sample. To optimize throughput, the input buffer size should be larger than n, by a margin epsilon, so that the next samples can be loaded while the filter is working on the current set. For example, when using 4-beat DMA transfers, epsilon should be set to 4. Also, the size of the output buffer should be set to epsilon, to transfer the resulting samples in a single AHB burst transaction. The IIR filter output vector Y is the convolution of a coefficient vector B of length n plus 1 and a vector X of indefinite length, plus the convolution of the delayed output vector Y with a second coefficient vector A of length m. To implement the IIR in the FMAC, the buffers are used as follows. The X1 buffer contains the elements of vector X. It is a circular buffer of length n plus 1 plus D. The X2 buffer contains the elements of coefficient vectors B and A concatenated: B0, B1, B2, up to Bn, then A1, A2, up to Am. It is a fixed buffer of length m plus n plus 1. The Y buffer contains the output values Yn. It is a circular buffer of length m plus D. Here are the parameters. The parameter P contains the length n plus 1 of the coefficient vector B, in the range 2 to 64. The parameter Q contains the length m of the coefficient vector A, in the range 1 to 63. The parameter R contains the gain to be applied to the accumulator output. The value output to the Y buffer is multiplied by 2 to the R, where R is in the range 0 to 7. The function completes when the START bit in the FMAC_PARAM register is reset by software. The IIR requires n feed-forward coefficients and m feedback coefficients, m being lower than n.
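The IIR structure just described, together with the internal formats given earlier (Q1.15 operands, products truncated to Q2.22, a wrapping 26-bit Q4.22 accumulator, output gain of 2 to the R), can be modelled bit-accurately in Python as a rough sketch. The function names are hypothetical, and two details are assumptions of this model and not taken from the presentation: truncation is done with an arithmetic right shift, and the output is saturated to the Q1.15 range.

```python
ACC_BITS = 26  # Q4.22 accumulator width


def wrap_signed(value: int, bits: int) -> int:
    """Wrap an integer into a signed two's-complement field of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(acc: int, a: int, b: int) -> int:
    """One MAC: Q1.15 x Q1.15 gives Q2.30, truncated to Q2.22, added with wraparound."""
    return wrap_signed(acc + ((a * b) >> 8), ACC_BITS)


def iir_q15(x, b, a, r=0):
    """y[n] = 2**r * (sum_k b[k]*x[n-k] + sum_k a[k-1]*y[n-k]), all values in Q1.15."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, bk in enumerate(b):           # feed-forward part, k = 0..N
            if n - k >= 0:
                acc = mac(acc, bk, x[n - k])
        for k, ak in enumerate(a, start=1):  # feedback part, k = 1..M
            if n - k >= 0:
                acc = mac(acc, ak, y[n - k])
        out = (acc << r) >> 7                # apply gain, then Q4.22 to Q1.15
        y.append(max(-32768, min(32767, out)))  # saturation is an assumption
    return y


# b0 = 0.5, a1 = 0.5, input 0.5 followed by zeros: a decaying response.
print(iir_q15([16384, 0, 0], b=[16384], a=[16384]))  # [8192, 4096, 2048]
```

Passing an empty list for a turns the same function into an FIR model. Because mac wraps at 26 bits, a run of large same-sign products can push the partial sum outside the accumulator range, and the final value is still correct as long as later additions bring the sum back in range.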
The input buffer size should be n plus epsilon, epsilon being the number of data in a DMA burst. The output buffer size should be m plus epsilon. The clock reference of the FMAC module is the AHB clock, HCLK. The HCLK maximum frequency is half of the CPU maximum frequency. An n-tap filter such as an FIR requires n multiplications and additions per output sample, and each MAC requires two memory reads, thus two clock cycles. As a consequence, the maximum sample rate is HCLK divided by 2 times n. Equivalently, n must be lower than the HCLK frequency divided by 2 times the maximum sample rate. Assuming the HCLK frequency of the STM32H7 is 275 megahertz, the maximum filter size at FS equals 2 MSPS is n less than 68.75, that is, at most 68 taps. The maximum sample rate for n equals 127 taps is FS less than 1.08 megahertz. Flow control can be source-, sink-, or filter-driven. This slide describes the source-driven flow control sequence. The source of the samples, such as an ADC or I2C, defines the sample data rate. The source requests the DMA or CPU to transfer data to the filter input buffer. The filter operates at a clock rate faster than 2n times the source sample rate. When the input buffer is empty (next sample not available), the filter stalls, waiting for new data. When the output buffer is not empty (one or more samples available), an output channel DMA request or interrupt is generated. The DMA or CPU transfers the output samples to memory or another peripheral, such as a DAC or PWM. This slide describes the filter-driven flow control sequence. The filter clock rate determines the throughput. An input channel DMA request or interrupt is generated whenever the input buffer is not full. The DMA or CPU transfers data into the input buffer from memory or another peripheral. As long as data is available in the input buffer, the filter generates new output samples. When the output buffer is not empty, an output channel DMA request or interrupt is generated.
The DMA or CPU transfers data from the output buffer to memory or another peripheral. This slide describes the sink-driven flow control sequence. The destination of the samples, such as a DAC or I2C, defines the sample data rate. The destination requests the DMA or CPU to transfer data from the filter output. The filter operates at a clock rate faster than 2n times the destination sample rate. When the output buffer is full, the filter stalls. When the input buffer is not full, an input channel DMA request or interrupt is generated. The DMA or CPU transfers samples from memory or another peripheral to the input buffer. The FMAC executes the filter algorithm when the input buffer is not empty and the output buffer is not full. A flag called X1FULL is set if the number of available spaces in the X1 buffer is less than the FULL_WM threshold. A DMA request can be generated when this flag is not set, in order to fill the X1 buffer. A flag called YEMPTY is set if the number of unread data is less than the EMPTY_WM threshold. A DMA request can be generated when this flag is not set, in order to empty the Y buffer. The management of buffers can also be performed by software, relying on interrupt requests that can be asserted when either flag is inactive. The filter clock frequency must be chosen according to the chosen flow control scheme. The FMAC unit is active in run, low-power run, sleep and low-power sleep modes. It is not available in other low-power modes. These peripherals may need to be specifically configured for correct use with the FMAC block. Please refer to the corresponding peripheral training modules for more information.
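As a closing check on the clock-frequency point, the throughput rule quoted earlier (two clock cycles per MAC, so a maximum sample rate of HCLK divided by 2 times n) can be evaluated with a short Python sketch. The function names are hypothetical, and the 275 megahertz figure is simply the value assumed in the presentation.

```python
def max_taps(hclk_hz: int, sample_rate_hz: int) -> int:
    """Largest tap count n such that 2 * n * sample_rate does not exceed HCLK."""
    return hclk_hz // (2 * sample_rate_hz)


def max_sample_rate(hclk_hz: int, taps: int) -> float:
    """Upper bound on the sample rate for an n-tap filter, in hertz."""
    return hclk_hz / (2 * taps)


HCLK = 275_000_000  # value assumed in the presentation

print(max_taps(HCLK, 2_000_000))         # 68 taps at 2 MSPS
print(max_sample_rate(HCLK, 127) / 1e6)  # roughly 1.08 (megahertz) for 127 taps
```

These are upper bounds from the arithmetic alone; any time the filter spends stalled waiting for input or output buffer space lowers the achievable rate.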