Hello and welcome to this presentation of the STM32G4 CORDIC co-processor block. It will cover the main features of this block, which is used to accelerate trigonometric functions.

The CORDIC co-processor provides hardware acceleration of certain mathematical functions, notably trigonometric functions, commonly used in motor control, metering, signal processing and many other applications. It speeds up the calculation of these functions compared to a software implementation, allowing a lower operating frequency or freeing up processor cycles to perform other tasks.

The CORDIC block is an AHB slave that inserts wait states when the Cortex-M4 requests the result, until the operation is completed. No input-output driver is therefore needed. Another approach consists in enabling the Cortex-M4 to handle other processing while the CORDIC calculation is in progress. In this case, an interrupt request indicates that the result is available. DMA channels can be used to provide the arguments from memory and to write the results to memory. The CORDIC block supports pipelined operation: the next arguments can be provided while the calculation with the current arguments is in progress. Note that the CORDIC block is a fixed-point arithmetic accelerator.

CORDIC, which stands for coordinate rotation digital computer, is a hardware-efficient iterative method which uses rotations to calculate a wide range of elementary functions. In trigonometric circular mode, the sine and cosine of an angle theta are determined by rotating the unit vector (1, 0) through successively decreasing angles until the cumulative sum of the rotation angles equals the input angle. The x and y Cartesian components of the rotated vector then correspond respectively to the cosine and sine of theta. Inversely, the angle of a vector (x, y), corresponding to the arctangent of y over x, is determined by rotating the vector (x, y) through successively decreasing angles to obtain the unit vector (1, 0).
The cumulative sum of the rotation angles gives the angle of the original vector. The CORDIC algorithm can also be used for calculating hyperbolic functions such as hyperbolic sine, hyperbolic cosine or inverse hyperbolic tangent, by replacing the successive circular rotations by steps along a hyperbola.

This slide lists the 10 supported mathematical functions. The first step when using the co-processor is to select the required function by programming the FUNC field of the CORDIC_CSR register accordingly. Consequently, only one function is active at a time. Several functions take two input arguments, ARG1 and ARG2, and some generate two results simultaneously, RES1 and RES2. This is a side effect of the CORDIC algorithm and means that only one operation is needed to obtain two values. This is the case, for example, when performing polar-to-rectangular conversion: the sine function also generates the cosine, while the cosine function also generates the sine. The same applies to rectangular-to-polar conversion, with the phase and modulus of (x, y), and to the hyperbolic functions, with the hyperbolic cosine and hyperbolic sine.

In q1.31 format, numbers are represented by one sign bit and 31 fractional bits. The numeric range is therefore minus 1 (0x80000000) to 1 minus 2 raised to the negative 31 (0x7FFFFFFF). The precision is 2 raised to the negative 31, i.e. around 5 times 10 raised to the negative 10. In q1.15 format, the numeric range is minus 1 (0x8000) to 1 minus 2 raised to the negative 15 (0x7FFF). This format has the advantage that two input arguments can be packed into a single 32-bit write and two results can be fetched in one 32-bit read. However, the precision is reduced to 2 raised to the negative 15, i.e. around 3 times 10 raised to the negative 5.

Angles are expressed in radians divided by pi. Consequently, only the interval from minus 1 to plus 1 is used. Several of the functions specify a scaling factor, SCALE.
This allows the function input range to be extended to cover the full range of values supported by the CORDIC without saturating the input, output or internal registers. If a scaling factor is required, it should be calculated in software and programmed into the SCALE field of the CORDIC_CSR register. The input arguments should be scaled accordingly before programming the scaled values in the CORDIC_WDATA register. The scaling should also be undone on the results read from the CORDIC_RDATA register. Note that the scaling factor entails a loss of precision due to truncation of the scaled value.

The precision of the result depends on the number of CORDIC iterations. The algorithm converges at a constant rate of one binary digit per iteration for trigonometric functions. For hyperbolic functions (hyperbolic sine, hyperbolic cosine and natural logarithm), the convergence rate is less constant due to the peculiarities of the CORDIC algorithm. The square root function converges at roughly twice the speed of the hyperbolic functions.

The format of arguments and results is programmed independently in the ARGSIZE and RESSIZE fields of the CORDIC_CSR register, as either q1.15 or q1.31. Internally, the CORDIC accelerator implements the q1.23 format. This means that rounding errors start to become significant at a precision of 2 raised to the negative 19. Continuing CORDIC iterations after the maximum precision has been reached will gradually degrade the precision. For maximum precision, q1.31 format should be used for input and output. However, given the format implemented internally, the output is limited to 20-bit precision at best. If q1.15 format is used for input, the precision will be limited to q1.15 whatever the output format.

The precision obtained depends on the number of iterations, which has to be programmed in the PRECISION field of the CORDIC_CSR register. The number of iterations is equal to the value programmed in this field multiplied by 4.
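
As an illustration of the link between the number of iterations and the precision, here is a minimal Python sketch of a textbook rotation-mode CORDIC in floating point. It is not the STM32G4 hardware implementation, which works in fixed point. The function name `cordic_cos_sin` and its parameters are hypothetical, and the sketch assumes an input angle between minus one half and plus one half in the radians-divided-by-pi convention.

```python
import math

def cordic_cos_sin(angle_over_pi, precision):
    """Textbook rotation-mode CORDIC (floating-point illustration only).

    angle_over_pi: angle in radians divided by pi, assumed in [-0.5, 0.5].
    precision: plays the role of the PRECISION field; iterations = 4 * precision.
    Returns (cos, sin) of the angle.
    """
    iterations = 4 * precision
    # Elementary rotation angles atan(2^-i), and the gain accumulated by
    # the micro-rotations, which is compensated up front.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    x, y = 1.0 / gain, 0.0             # pre-scaled unit vector (1, 0)
    z = angle_over_pi * math.pi        # remaining angle to rotate through
    for i, step in enumerate(angles):
        d = 1.0 if z >= 0.0 else -1.0  # rotate towards the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * step
    return x, y

if __name__ == "__main__":
    for precision in range(1, 7):
        c, s = cordic_cos_sin(0.25, precision)
        err = max(abs(c - math.cos(math.pi / 4)), abs(s - math.sin(math.pi / 4)))
        print(f"PRECISION={precision}: {4 * precision} iterations, error {err:.2e}")
```

Running the loop at the bottom prints the error for each PRECISION value from 1 to 6. In this floating-point model the error shrinks by roughly one binary digit per iteration, whereas the hardware is additionally limited by its internal fixed-point format.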
For maximum speed, the minimum number of iterations for the required precision should be programmed. Note that for most functions, the recommended range for this field is 3 to 6.

This slide describes the features of the cosine function. The primary argument is the angle theta in radians. It must be divided by pi before programming ARG1. The secondary argument, m, is the modulus. If m is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG2. The primary result, RES1, is the cosine of the angle multiplied by the modulus. The secondary result, RES2, is the sine of the angle multiplied by the modulus.

This slide describes the features of the sine function. The primary argument is the angle theta in radians. It must be divided by pi before programming ARG1. The secondary argument, m, is the modulus. If m is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG2. The primary result, RES1, is the sine of the angle multiplied by the modulus. The secondary result, RES2, is the cosine of the angle multiplied by the modulus.

This slide describes the features of the phase function. The primary argument is the x-coordinate, that is the magnitude of the vector in the direction of the x-axis. If the absolute value of x is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG1. The secondary argument is the y-coordinate, that is the magnitude of the vector in the direction of the y-axis. If the absolute value of y is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG2. The primary result, RES1, is the phase angle theta of the vector v. RES1 must be multiplied by pi to obtain the angle in radians. Note that values close to pi may sometimes wrap to minus pi due to the circular nature of the phase angle. The secondary result, RES2, is the modulus, given by absolute value of v equals square root of x squared plus y squared.
If the absolute value of v is greater than 1, the result in RES2 will be saturated to 1.

This slide describes the features of the modulus function. The primary argument is the x-coordinate, that is the magnitude of the vector in the direction of the x-axis. If the absolute value of x is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG1. The secondary argument is the y-coordinate, that is the magnitude of the vector in the direction of the y-axis. If the absolute value of y is greater than 1, a scaling must be applied in software to adapt it to the q1.31 range of ARG2. The primary result, RES1, is the modulus, given by absolute value of v equals square root of x squared plus y squared. If the absolute value of v is greater than 1, the result in RES1 will be saturated to 1. The secondary result, RES2, is the phase angle theta of the vector v. RES2 must be multiplied by pi to obtain the angle in radians. Note that values close to pi may sometimes wrap to minus pi due to the circular nature of the phase angle.

This slide describes the features of the arctangent function. The primary argument, ARG1, is the input value x equals tangent of theta. If the absolute value of x is greater than 1, a scaling factor of 2 raised to the negative n must be applied in software such that x times 2 raised to the negative n is greater than minus 1 and lower than 1. The scaled value, x times 2 raised to the negative n, is programmed in ARG1 and the scale factor n must be programmed in the SCALE parameter. Note that the maximum input value allowed is tangent theta equals 128, which corresponds to an angle theta equals 89.55 degrees. For absolute values of x greater than 128, a software method must be used to find the arctangent of x. The secondary argument, ARG2, is unused. The primary result, RES1, is the angle theta equals arctangent of x. RES1 must be multiplied by 2 raised to the n times pi to obtain the angle in radians. The secondary result, RES2, is unused.
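
The arctangent scaling steps just described (choose n, program the scaled value in ARG1 and n in SCALE, then multiply RES1 by 2 raised to the n times pi) can be sketched in Python as follows. This is only a host-side illustration under stated assumptions: all function names are hypothetical, `atan_model` is a stand-in for the hardware built from the relation quoted above, and any limit on the SCALE field is not checked.

```python
import math

Q31_ONE = 1 << 31  # q1.31: one sign bit and 31 fractional bits

def to_q31(value):
    """Convert a float to a signed q1.31 integer, saturating at the range limits."""
    raw = round(value * Q31_ONE)
    return max(-Q31_ONE, min(Q31_ONE - 1, raw))

def from_q31(raw):
    """Convert a signed q1.31 integer back to a float."""
    return raw / Q31_ONE

def atan_prepare(x):
    """Pick the smallest n with |x * 2^-n| < 1 and return (ARG1 in q1.31, n)."""
    if abs(x) >= 128.0:
        # Sketch choice: 128 itself is excluded because x * 2^-7 would equal 1,
        # which q1.31 cannot represent; larger values need a software method.
        raise ValueError("input outside the range handled by this sketch")
    n = 0
    while abs(x) * 2.0 ** -n >= 1.0:
        n += 1
    return to_q31(x * 2.0 ** -n), n

def atan_model(arg1_q31, n):
    """Hypothetical stand-in for the hardware: RES1 = atan(x) / (2^n * pi)."""
    x = from_q31(arg1_q31) * 2.0 ** n
    return to_q31(math.atan(x) / (2.0 ** n * math.pi))

def atan_finish(res1_q31, n):
    """Undo the scaling: theta in radians = RES1 * 2^n * pi."""
    return from_q31(res1_q31) * 2.0 ** n * math.pi

if __name__ == "__main__":
    arg1, n = atan_prepare(5.0)
    theta = atan_finish(atan_model(arg1, n), n)
    print(f"ARG1=0x{arg1 & 0xFFFFFFFF:08X}, SCALE n={n}, theta={theta:.6f} rad")
```

Choosing the smallest n that fits keeps as many significant bits as possible in ARG1, which matters because the scaling truncates the scaled value.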
This slide describes the features of the hyperbolic cosine function. The primary argument is the hyperbolic angle x. Only values of x in the range minus 1.118 to plus 1.118 are supported. Since the minimum value of cosh x is 1, which is beyond the range of the q1.31 format, a scaling factor of 2 raised to the negative n must be applied in software. The factor n equals 1 must be programmed in the SCALE parameter. The secondary argument, ARG2, is unused. The primary result, RES1, is the hyperbolic cosine, cosh x. RES1 must be multiplied by 2 to obtain the correct result. The secondary result, RES2, is the hyperbolic sine, sinh x. RES2 must be multiplied by 2 to obtain the correct result.

This slide describes the features of the hyperbolic sine function. The primary argument is the hyperbolic angle x. Only values of x in the range minus 1.118 to plus 1.118 are supported. For all input values, a scaling factor of 2 raised to the negative n must be applied in software, where n equals 1. The scaled value, x times 0.5, is programmed in ARG1 and the factor n equals 1 must be programmed in the SCALE parameter. The secondary argument, ARG2, is unused. The primary result, RES1, is the hyperbolic sine, sinh x. RES1 must be multiplied by 2 to obtain the correct result. The secondary result, RES2, is the hyperbolic cosine, cosh x. RES2 must be multiplied by 2 to obtain the correct result.

This slide describes the features of the hyperbolic arctangent function. The primary argument is the input value x. Only values of x in the range minus 0.806 to plus 0.806 are supported. The value x must be scaled by a factor 2 raised to the negative n, where n equals 1. The scaled value, x times 0.5, is programmed in ARG1 and the factor n equals 1 must be programmed in the SCALE parameter. The secondary argument, ARG2, is unused. The primary result, RES1, is the hyperbolic arctangent of x. RES1 must be multiplied by 2 to obtain the correct value. The secondary result is not used.
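
To see why the fixed scale factor n equals 1 is enough for the three hyperbolic functions above, the following Python sketch checks that, over the supported input ranges, both the scaled argument and the scaled result fit in the q1.31 interval. It works in floating point and leaves out the q1.31 conversion. The names are hypothetical, `model_res1` is a stand-in for the hardware, and the sketch assumes the cosh argument is halved in the same way as the sinh argument.

```python
import math

SCALE_N = 1  # fixed scale factor n for the three hyperbolic functions
LIMITS = {"cosh": 1.118, "sinh": 1.118, "atanh": 0.806}  # supported |x| ranges
REFERENCE = {"cosh": math.cosh, "sinh": math.sinh, "atanh": math.atanh}

def scaled_argument(func, x):
    """Return x * 2^-n, the value to convert to q1.31 and program in ARG1."""
    if abs(x) > LIMITS[func]:
        raise ValueError(f"{func}: |x| must not exceed {LIMITS[func]}")
    return x * 2.0 ** -SCALE_N

def model_res1(func, arg1):
    """Hypothetical stand-in for the hardware: RES1 = f(x) * 2^-n."""
    x = arg1 * 2.0 ** SCALE_N
    return REFERENCE[func](x) * 2.0 ** -SCALE_N

def unscaled_result(res1):
    """Undo the scaling: multiply the value read from RES1 by 2^n."""
    return res1 * 2.0 ** SCALE_N

if __name__ == "__main__":
    for func, limit in LIMITS.items():
        arg1 = scaled_argument(func, limit)
        res1 = model_res1(func, arg1)
        print(f"{func}({limit}): ARG1={arg1:.4f}, RES1={res1:.4f}, "
              f"result={unscaled_result(res1):.4f}")
```

At the edge of each range the unscaled result exceeds 1 for cosh, sinh and atanh alike, which is why the value read from RES1 always has to be doubled.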
This slide describes the features of the natural logarithm function. The primary argument is the input value x. Only values of x in the range 0.107 to 9.35 are supported. The value x must be scaled by a factor 2 raised to the negative n such that x times 2 raised to the negative n is lower than 1 minus 2 raised to the negative n. The scaled value, x times 2 raised to the negative n, is programmed in ARG1 and the factor n must be programmed in the SCALE parameter. The secondary argument is unused. The primary result, RES1, is the natural logarithm of x. RES1 must be multiplied by 2 raised to the n plus 1 to obtain the correct value. The secondary result is not used.

This slide describes the features of the square root function. The primary argument is the input value x. Only values of x in the range 0.027 to 2.34 are supported. The value x must be scaled by a factor 2 raised to the negative n such that x times 2 raised to the negative n is lower than 1 minus 2 raised to the power minus n minus 2. The scaled value, x times 2 raised to the negative n, is programmed in ARG1 and the factor n must be programmed in the SCALE parameter. The secondary argument is unused. The primary result, RES1, is the square root of x. RES1 must be multiplied by 2 raised to the n to obtain the correct value. The secondary result is not used.

The software that subcontracts a calculation to the CORDIC block doesn't need to poll a flag to determine when this calculation is completed. It simply initiates a read request of the CORDIC_RDATA register through the AHB bus. As in any AHB transaction, the slave is permitted to insert wait states by holding the HREADY signal low. Once the results are available, the CORDIC block asserts HREADY, which completes the transaction. In the meantime, the Cortex-M4 processor is frozen. This approach is called zero-overhead mode.
As soon as the results have been read from CORDIC_RDATA, in one or two reads depending on the value of NRES, the pending operation is started. A new set of arguments and settings can be written as long as no operation is pending. This means that time spent waiting for the CORDIC operation to complete can be used to prepare the next operation, and the CORDIC is never idle. The CORDIC_CSR register can be reprogrammed while a calculation is in progress without affecting the result of the ongoing calculation.

The sequence described in this slide summarizes the use of the CORDIC IP in zero-overhead mode, assuming a single-shot operation. No further calculation is scheduled, so the processor simply waits for the completion of the current operation.

The sequence described in this slide summarizes the use of the CORDIC IP in zero-overhead mode, assuming pipelined operations. By iterating steps 3 to 6, software can re-execute the same operation for an array of arguments. The seventh step is required to obtain the result of the last operation.

The sequence described in this slide summarizes the use of the CORDIC IP in polling mode. When a new result is available in the CORDIC_RDATA register, the RRDY flag is set in the CORDIC_CSR register. The flag can be polled by reading this register. It is reset by reading the CORDIC_RDATA register once or twice, depending on the NRES field of the CORDIC_CSR register. Polling the RRDY flag takes slightly longer than reading the CORDIC_RDATA register directly, since the result is not read as soon as it is available. However, the processor and bus interface are not stalled while reading the CORDIC_CSR register, so this mode may be of interest if stalling the processor is not acceptable, for instance when low-latency interrupts must be serviced.

The sequence described in this slide summarizes the use of the CORDIC IP in interrupt mode.
By setting the interrupt enable (IEN) bit in the CORDIC_CSR register, an interrupt will be generated whenever the RRDY flag is set. The interrupt is cleared when the flag is reset. This mode allows the result of the calculation to be read in an interrupt service routine and hence given a priority relative to other tasks. However, it is slower than directly reading the result or polling the flag, due to the interrupt handling delays.

DMA mode is very efficient when performing multiple calculations using the same settings. It is not possible to modify the CORDIC_CSR register by DMA. Consequently, if the settings need to be changed, the DMA should be stopped first and restarted once the new settings have been programmed. DMA writes can be combined with DMA, polling or interrupt read methods. Pipelining is always used in DMA mode. DMA write requests are enabled by setting the DMAWEN bit in the CORDIC_CSR register.

The purpose of this slide is to compare the performance of CORDIC and Arm fast math when calculating a fixed-point sine. The performance ratios are identical for the q1.15 and q1.31 formats. CORDIC is 5 times faster in zero-overhead mode and 3 times faster in DMA in and out mode.

The purpose of this slide is to compare the performance of CORDIC and Arm fast math when calculating a floating-point sine. CORDIC is 3 times faster in zero-overhead mode, including the conversion time from float32 to int32 and back.

The purpose of this slide is to compare the performance of CORDIC and Arm fast math when calculating a fixed-point square root. The performance ratios are identical for the q1.15 and q1.31 formats. CORDIC is 14 times faster in zero-overhead mode.

The purpose of this slide is to compare the performance of CORDIC and Arm fast math when calculating a floating-point square root.
CORDIC is 1.3 times faster in zero-overhead mode, including the conversion time from float32 to int32 and back.

The purpose of this slide is to compare the performance of CORDIC and Arm fast math when calculating a fixed-point q1.15 Park transform. CORDIC is 5 times faster in zero-overhead mode.

The CORDIC unit is active in run, low-power run, sleep and low-power sleep modes. It is not available in the other low-power modes.

These peripherals may need to be specifically configured for correct use with the CORDIC block. Please refer to the corresponding peripheral training modules for more information.