Hello, and welcome to this presentation of the STM32 debug and trace interface. It covers the debug and trace capabilities offered by STM32MP1 devices.
The STM32MP1 microprocessor incorporates all the familiar debug capabilities provided by the STM32 family of MCUs. These include flash download, breakpoint debugging, register and memory view, and serial wire trace. It adds high-bandwidth instruction trace, as well as cross-triggering capabilities in multi-core versions of the STM32MP1 family. The debug and trace infrastructure uses the Arm CoreSight standard, which is well supported by most tool providers.
The debug and trace infrastructure is composed of four distinct functional domains. The debug access infrastructure includes the debug port, or SWJ-DP, and the access ports, or APs, which allow an external debugger to access the target's trace and debug features. The trace infrastructure includes the serial, or SWO, and parallel, or TPIU, trace ports, the trace FIFO, or ETF, used for smoothing the trace flow, and the trace funnels, which combine the trace from each source into a single flow. There is also a system trace macrocell, or STM, which allows software-generated debug information as well as hardware events to be traced. The Cortex-A7 core domain includes the processor, single or dual core, and the embedded trace macrocell, or ETM. The Cortex-M4 core domain includes the processor and its associated debug and trace units: DWT, FPB, ITM, and ETM.
In addition, there are system debug features. The cross trigger interfaces and matrix, or CTI and CTM, allow simultaneous halting of both cores, triggering of trace, and so on. The global timestamp generator provides a common time reference for the different trace sources. The DBGMCU provides proprietary features such as freezing of timers during debug. The external trigger input and output allow an external signal to trigger debug or trace, or generate a trigger pulse for synchronizing external equipment or components.
The debug port is available on dedicated pins on all STM32MP1 packages.
Serial wire debug uses a special serial code driven by the debugger on the SWDIO/JTMS input. This is recognized by the SWJ-DP, which then switches to SWD mode. After reset, JTAG mode is configured by default. ST-LINK and most third-party debug adapters, for example ULINK, support serial wire debug.
AP0 allows access to the AXI interconnect. This gives the debugger direct access to all memory and peripheral registers. AP1 allows access to the debug and trace features on the system APB debug bus, namely the Cortex-A7 debug features and the trace subsystem. AP2 allows access to the debug and trace features integrated in the Cortex-M4 processor core via its internal AHB bus.
Applications running on either processor can access the debug features located on the system debug bus, since they are mapped in the unified address space. This includes the trace subsystem (STM, TPIU, TSGEN and ETF) as well as the Cortex-A7 features (ETM, CTI and DBG). However, only the Cortex-M4 can access the features on its private bus.
The authentication signal states are set in the boot and security, or BSEC, unit. The default state of these signals is determined by the factory state of the device, open or closed. The state can be modified by secure software. The debugger must use secure privileged transactions to access secure addresses. These transactions only succeed if the SPIDEN signal is asserted.
All debug-related registers in the Cortex-A7 core are accessed via the system debug bus, APBD, through access port AP1. The ROM table contains pointers to the base addresses of each debug component in the core. ROM tables are used by some debug tools to automatically detect the topology of the CoreSight infrastructure in the target. The debug unit, or DBG, contains the registers for controlling the processor core while in debug mode.
All debug-related registers in the Cortex-M4 core are accessed via the dedicated AHB access port, AP2.
The Cortex-M4 ROM table contains pointers to the base addresses of each debug component visible from the AP. ROM tables are used by some debug tools to automatically detect the topology of the CoreSight infrastructure in the target. The SCS, or System Control Space, contains the registers for controlling the processor core while in debug mode. The other units are described in the following slides.
A data watchpoint and trace, or DWT, comparator compares one of the following with the value held in its DWT_COMP register: a data address, an instruction address, a data value, or, for comparator 0 only, the cycle count value. For address matching, the comparator can use a mask so that it can match a range of addresses. On a successful match, the comparator generates one of the following. The first is one or more DWT data trace packets, containing one or more of the address of the instruction that caused a data access, an address offset (bits 15 to 0 of the data access address), or the matched data value. The second is a watchpoint debug event, on either the PC value or the accessed data address. The third is a CMPMATCH event that signals the match outside the DWT unit.
In dual-core devices, the Cortex-M4 breakpoint unit, or FPB, also supports flash memory patching. This feature is intended for patching erroneous code by diverting execution to volatile memory at a given address.
Software can write directly to any of the 32 32-bit instrumentation trace macrocell, or ITM, stimulus registers to generate packets. The permission level for each port can be programmed. When software writes to an enabled stimulus port, the ITM combines the identity of the port, the size of the write access, and the data written into a packet, which it writes to a FIFO. The ITM outputs packets from the FIFO onto the trace bus. Reading a stimulus port register returns the status of the stimulus register, empty or pending, in bit 0. If multiple sources generate packets at the same time, the ITM arbitrates the order in which packets are output.
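To make the ITM packet described above more concrete, here is a minimal host-side sketch in C of how a port number, an access size and the written data could be combined into one packet. The function name itm_encode_packet is hypothetical. The header layout used here (port number in bits 7 to 3, size code in bits 1 to 0, payload least significant byte first) is my understanding of the Armv7-M instrumentation packet format and should be checked against the architecture manual before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one software instrumentation packet (illustrative model only).
 * port: stimulus port number, 0..31
 * size: width of the write access in bytes (1, 2 or 4)
 * data: value written to the stimulus port
 * out:  buffer of at least 5 bytes (1 header byte + up to 4 payload bytes)
 * Returns the packet length in bytes, or 0 if the arguments are invalid. */
size_t itm_encode_packet(unsigned port, unsigned size, uint32_t data,
                         uint8_t out[5])
{
    uint8_t size_code;

    assert(out != NULL);
    if (port > 31u)
        return 0;

    /* Assumed size encoding: 1 byte -> 01, 2 bytes -> 10, 4 bytes -> 11. */
    switch (size) {
    case 1: size_code = 0x1; break;
    case 2: size_code = 0x2; break;
    case 4: size_code = 0x3; break;
    default: return 0;
    }

    /* Header: port identity in the upper five bits, size in the lower two. */
    out[0] = (uint8_t)((port << 3) | size_code);

    /* Payload: the written data, least significant byte first. */
    for (unsigned i = 0; i < size; i++)
        out[1 + i] = (uint8_t)(data >> (8u * i));

    return 1u + size;
}
```

For example, under these assumptions a one-byte write of the character 'A' to port 0 gives the two bytes 0x01 0x41, and a four-byte write to port 3 gives a five-byte packet whose header is 0x1B.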
The ITM packet sources are listed on this slide in descending order of priority.
The timestamp generator, or TSGEN, provides a 64-bit common time base for all trace packet timestamps. This allows the trace analyzer to align traces coming from different sources according to the time at which the trace was generated. By contrast, the local timestamps are not synchronized and can run at different frequencies, so on their own they make it impossible to know the precise time at which a trace was generated. Note that the Cortex-M4 only uses the 48 LSBs of the global timestamp.
The system trace macrocell, or STM, can be used for instrumentation of software. It is aimed primarily at the Cortex-A7, since the Cortex-M4 includes a simplified software trace unit, the ITM. Nevertheless, the STM is accessible to the Cortex-M4 core as well as to the DMA and MDMA engines. It can even be used by the debugger. Accesses to the stimulus ports can be guaranteed or timing invariant. Guaranteed accesses always generate a trace packet, and stall the access until the STM can accept it. Timing-invariant accesses always terminate immediately, so they are less intrusive, but if the STM is not ready to accept the write because its buffer is full, the data is discarded and no packet is generated. Trace packets always contain the identity of the master that generated them. The STM control registers are accessible via the system debug bus, or APBD.
In the STM32MP1 series, the embedded trace macrocell, or ETM, is configured for instruction trace only. In other words, data accesses are not included in the trace information.
The ITM, ETM, and STM generate trace streams, which are combined using the trace funnel. Some funnel parameters can be modified, for example the number of bytes received on one input before switching to another. The less switching that occurs, the lower the overhead, but at the cost of increased latency. It is also possible to filter trace. For example, the ITM trace can be removed from the TPIU.
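The difference between guaranteed and timing-invariant STM accesses described above can be illustrated with a small host-side model in C. This is only a sketch of the behavior, not of the hardware: the type stm_model_t, the function names and the buffer depth of 4 are all hypothetical, and the stall is modeled by draining one buffered entry per waiting cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STM_MODEL_DEPTH 4u /* hypothetical buffer depth, for illustration */

typedef struct {
    uint32_t fifo[STM_MODEL_DEPTH];
    unsigned count;        /* entries currently buffered */
    unsigned dropped;      /* timing-invariant writes that were discarded */
    unsigned stall_cycles; /* cycles that guaranteed writes spent waiting */
} stm_model_t;

/* One cycle of the trace sink: remove the oldest entry, if there is one. */
static void stm_model_drain_one(stm_model_t *s)
{
    if (s->count == 0)
        return;
    for (unsigned i = 1; i < s->count; i++)
        s->fifo[i - 1] = s->fifo[i];
    s->count--;
}

/* Timing-invariant access: always returns at once. If the buffer is full,
 * the data is discarded and no packet is generated (returns false). */
bool stm_model_write_invariant(stm_model_t *s, uint32_t data)
{
    assert(s != NULL);
    if (s->count == STM_MODEL_DEPTH) {
        s->dropped++;
        return false;
    }
    s->fifo[s->count++] = data;
    return true;
}

/* Guaranteed access: the write waits until the buffer has room, so a
 * packet is always generated, at the cost of stalling the writer. */
void stm_model_write_guaranteed(stm_model_t *s, uint32_t data)
{
    assert(s != NULL);
    while (s->count == STM_MODEL_DEPTH) {
        s->stall_cycles++;
        stm_model_drain_one(s);
    }
    s->fifo[s->count++] = data;
}
```

In this model, six back-to-back timing-invariant writes into an empty buffer leave four entries buffered and two dropped, whereas a guaranteed write into the full buffer costs one stall cycle and loses nothing.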
The ITM trace can be output on the SWO instead. Trace from the ITM, but not from the ETM, can be directed to the single wire trace port. In dual-core devices, the ITM trace from both cores can be directed to the SWO and combined in the SWO trace funnel. However, since there is no formatting in the SWO, it is not possible for a trace port analyzer to separate the trace streams. Therefore, it is recommended that the funnel be used to manually select one ITM at a time for output on the SWO. If both are needed, the TPIU should be used.
The ETF can be used as a trace buffer for storing traces on chip. The trace can be read by software or by the debugger, or flushed via the trace port. If the ETF is configured as a circular buffer, the trace is stored continuously, so the most recent trace overwrites the oldest. Alternatively, the FIFO full flag can be used to stop the trace when the buffer is full, and hence capture a trace at a particular point in time. The ETF also acts in hardware mode to smooth the flow of trace to the TPIU. Since the trace stream tends to be bursty in nature, and the instantaneous bandwidth is much higher than that of the trace port, the buffer absorbs the peaks and regulates the flow to the trace port's maximum continuous bandwidth.
The trace port width can be programmed from one to four pins. The bandwidth scales proportionally with the number of pins and with the TRACECLK frequency, which is selectable via a divider in the RCC. Full dual-core instruction trace at maximum clock frequency is likely to require the maximum bandwidth. By applying filters and triggers to the trace sources, notably the ETM, the average amount of trace data can be reduced, allowing a lower clock rate or a reduced number of pins.
The trace SWO pin is multiplexed with the JTDO signal, which is part of the JTAG interface. Hence, single wire trace is only available when the serial wire debug, or SWD, interface is used instead of JTAG.
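The two ETF capture modes described above, circular buffer and stop on full, can be sketched as a small software model in C. This is an illustration of the buffering behavior only. The names etf_model_t, etf_model_push and etf_model_read_all, and the four-word buffer size, are hypothetical and do not correspond to ETF registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETF_MODEL_WORDS 4u /* hypothetical buffer size, for illustration */

typedef enum { ETF_MODEL_CIRCULAR, ETF_MODEL_STOP_ON_FULL } etf_model_mode_t;

typedef struct {
    uint32_t ram[ETF_MODEL_WORDS];
    unsigned write_idx;    /* next slot to be written */
    unsigned count;        /* valid words, at most ETF_MODEL_WORDS */
    bool full;             /* set once the buffer has filled */
    etf_model_mode_t mode;
} etf_model_t;

/* Store one trace word. Returns false if the word was discarded because
 * the buffer is in stop-on-full mode and has already filled. */
bool etf_model_push(etf_model_t *b, uint32_t word)
{
    assert(b != NULL);
    if (b->mode == ETF_MODEL_STOP_ON_FULL && b->full)
        return false;

    /* In circular mode this overwrites the oldest word once full. */
    b->ram[b->write_idx] = word;
    b->write_idx = (b->write_idx + 1u) % ETF_MODEL_WORDS;
    if (b->count < ETF_MODEL_WORDS)
        b->count++;
    if (b->count == ETF_MODEL_WORDS)
        b->full = true;
    return true;
}

/* Copy the buffered words to out[], oldest first. Returns the word count. */
size_t etf_model_read_all(const etf_model_t *b, uint32_t out[ETF_MODEL_WORDS])
{
    assert(b != NULL && out != NULL);
    /* Once the buffer has wrapped, the oldest word sits at write_idx. */
    unsigned start = (b->count == ETF_MODEL_WORDS) ? b->write_idx : 0u;
    for (unsigned i = 0; i < b->count; i++)
        out[i] = b->ram[(start + i) % ETF_MODEL_WORDS];
    return b->count;
}
```

Pushing the words 1 to 6 into this model keeps 3, 4, 5, 6 in circular mode (the most recent trace) and 1, 2, 3, 4 in stop-on-full mode (the trace up to the point where the buffer filled).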
Cross triggering can be used in multi-core devices to halt both cores simultaneously. When one core hits a breakpoint, its halted output, indicating that it has entered debug mode, propagates to the other core and causes it to enter debug mode as well. Similarly, both cores can be restarted simultaneously. The cross trigger feature can also be used to halt the processor with an external trigger signal. This might be an edge on one of the I/O pins.
There is a cross trigger interface, or CTI, for each of the processor cores, as well as a system CTI connected to the trace components (ETF, TPIU and STM) and to the external trigger signals. To use any of the cross trigger features, the CTIs must be programmed accordingly by the debugger. The required trigger input signals, or TRIGINn, and trigger output signals, or TRIGOUTn, need to be connected to the cross trigger matrix, or CTM. The CTM comprises up to four channels, allowing four different events to be propagated in parallel. Trigger inputs can be combined in the CTI so that any one of the combined inputs causes an event on the connected channel. Similarly, a channel can be connected to several trigger outputs so that one event can trigger multiple actions.
The DBGMCU is located on the debug APB bus and can be accessed by the debugger via the APB access port, AP1. It is also accessible by the processors in the debug APB address space. The DBGMCU_IDC register provides the device ID and revision codes in STM32 standard format. The information is also available in the debug port, or DP, target ID register, accessible only to an external debugger, and in the system debug ROM table registers, SYSROM_PIDR2 to SYSROM_PIDR0, which are also accessible by software.
Low-power mode emulation means that the debugger connection is not lost when entering low-power mode. It eliminates the need to replace the low-power entry command, for example WFI or WFE, with a while() loop.
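Returning to the cross trigger routing described above, the way trigger inputs are combined onto channels, and channels fanned out to trigger outputs, can be sketched as a small model in C. This is a simplified illustration under stated assumptions: the type cti_model_t, the function names, the count of eight triggers per CTI and the trigger numbers used in the example are hypothetical, while the four channels come from the description of the CTM. Each CTI is modeled as two tables of channel bitmasks, one per trigger input and one per trigger output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTI_MODEL_TRIGGERS 8u /* hypothetical number of triggers per CTI */
#define CTM_MODEL_CHANNELS 4u /* the CTM comprises up to four channels */

typedef struct {
    /* For each trigger input: bitmask of the channels it drives. */
    uint8_t in_en[CTI_MODEL_TRIGGERS];
    /* For each trigger output: bitmask of the channels that drive it. */
    uint8_t out_en[CTI_MODEL_TRIGGERS];
} cti_model_t;

/* Channels raised by the asserted trigger inputs (bit n = input n).
 * Any one of several inputs enabled on a channel raises that channel. */
uint8_t cti_model_channels(const cti_model_t *cti, uint8_t trig_in)
{
    uint8_t channels = 0;

    assert(cti != NULL);
    for (unsigned n = 0; n < CTI_MODEL_TRIGGERS; n++)
        if (trig_in & (1u << n))
            channels |= cti->in_en[n];
    return (uint8_t)(channels & ((1u << CTM_MODEL_CHANNELS) - 1u));
}

/* Trigger outputs asserted by the active channels (bit n = output n).
 * One channel may be enabled on several outputs at once. */
uint8_t cti_model_trig_out(const cti_model_t *cti, uint8_t channels)
{
    uint8_t out = 0;

    assert(cti != NULL);
    for (unsigned n = 0; n < CTI_MODEL_TRIGGERS; n++)
        if (cti->out_en[n] & channels)
            out |= (uint8_t)(1u << n);
    return out;
}
```

As a usage example, suppose the first CTI enables its inputs 0 and 1 on channel 0, and the second CTI enables channel 0 on its outputs 0 and 2. Asserting either input on the first CTI then raises channel 0, and the second CTI asserts outputs 0 and 2 together (0x05 as a bitmask), which mirrors one core halting and the event triggering several actions on the other side.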
On exit from the emulated low-power mode, the device is in the same state as if the emulation was not active, apart from any changes made by the debugger during the low-power mode emulation.
Peripheral clock freeze is particularly useful to prevent a watchdog timeout from resetting the device while debugging, without having to rearm the watchdog with the debugger. It also allows timer values to be inspected, and the corresponding interrupts to be suspended, until normal operation is resumed.
The debug clock enable bits ensure that the debug blocks are only clocked when needed. This avoids unnecessary power consumption since, apart from the DAP, all blocks are clocked with the ungated domain clock.
On certain packages, separate TRGIN and TRGOUT pins are not available. Only a bidirectional pin is used, and its direction must be chosen using the TRGOEN bit.