Hello, and welcome to this presentation of the STM32 Debug and Trace interface. It covers the debug and trace capabilities offered by STM32H7 devices.

The STM32H7 incorporates all the familiar debug capabilities provided by the STM32 family of MCUs: flash download, breakpoint debugging, register and memory view, and serial wire trace. It adds high-bandwidth instruction trace, as well as cross-triggering capabilities in multi-core versions of the STM32H7 family. The debug and trace infrastructure uses the ARM CoreSight standard, which is well supported by most tool providers.

The debug and trace infrastructure is composed of four distinct functional domains. The debug access infrastructure includes the debug port, or SWJ-DP, and the access ports, or APs, which allow an external debugger to access the target's trace and debug features. The trace interface includes the serial (SWO) and parallel (TPIU) trace ports, the trace FIFO, or ETF, used for smoothing the trace flow, and the trace funnels, which combine the trace from each source into a single flow. The Cortex-M7 core includes the processor and its associated trace and debug units: DWT, FPB, ITM, and ETM. The fourth domain is the Cortex-M4 core, on dual-core devices only.

In addition, there are system debug features. The cross-trigger interfaces and matrix, or CTI and CTM, allow simultaneous halting of both cores, triggering of trace, and so on. The global timestamp generator provides a common time reference for the different trace sources. The DBGMCU provides proprietary features such as freezing of timers during debug. The external trigger input or output allows an external signal to trigger debug or trace, or generates a trigger pulse for synchronizing external equipment or components.

The minimum configuration for debug requires pins PA13 and PA14 to be allocated to Serial Wire Debug, as SWDIO and SWCLK respectively. Serial Wire Debug uses a special serial code driven by the debugger on the SWDIO or JTMS input.
This is recognized by the SWJ-DP, which then switches to SWD mode. After reset, JTAG mode is configured by default. ST-LINK and most third-party debug adapters, for example ULINK, support Serial Wire Debug.

AP0 allows access to the debug and trace features integrated in the Cortex-M7 processor core via an AHB-Lite bus connected to the AHBD port of the processor. AP1 allows access to the AHB bus matrix in the D3 domain. This gives visibility of the D3 domain memory and peripherals when the D1 and D2 domains are switched off. AP2 allows access to the debug and trace features on the system APB debug bus, that is, all components not included in one of the processor cores. AP3, on dual-core devices only, allows access to the debug and trace features integrated in the Cortex-M4 processor core via its internal AHB bus.

All debug-related registers in the Cortex-M7 core are accessed via the dedicated AHB access port, AP0. The ROM tables contain pointers to the base addresses of each debug component visible from the AP. They are used by some debug tools to automatically detect the topology of the CoreSight infrastructure in the target. The SCS, or system control space, contains the registers for controlling the processor core while in debug mode. The other units are described in the following slides.

A data watchpoint and trace (DWT) comparator compares one of the following with the value held in its DWT_COMP register: a data address, an instruction address, a data value, or the cycle count value (comparator 0 only). For address matching, the comparator can use a mask so that it can match a range of addresses. On a successful match, the comparator generates one of the following. One or more DWT data trace packets, containing one or more of: the address of the instruction that caused a data access, an address offset (bits 15 to 0 of the data access address), or the matched data value. A watchpoint debug event on either the PC value or the accessed data address.
Or a CMPMATCH event that signals the match outside the DWT unit.

In dual-core devices, the Cortex-M4 breakpoint unit, or FPB, also supports flash memory patching. This feature is intended for patching erroneous code by diverting execution to volatile memory at a given address. In the Cortex-M7, flash memory patching is not supported by the FPB.

Software can write directly to any of the thirty-two 32-bit instrumentation trace macrocell (ITM) stimulus registers to generate packets. The permission level for each port can be programmed. When software writes to an enabled stimulus port, the ITM combines the identity of the port, the size of the write access, and the data written into a packet and writes it to a FIFO. The ITM outputs packets from the FIFO onto the trace bus. Reading a stimulus port register returns the status of the stimulus register (empty or pending) in bit 0. If multiple sources generate packets at the same time, the ITM arbitrates the order in which packets are output. The sources are listed here in descending order of priority.

The timestamp generator, or TSGEN, provides a 64-bit common time base for all trace packet timestamps. This allows the trace analyzer to align traces coming from different sources according to the time at which the trace was generated. The local timestamps, by contrast, are not synchronized and can run at different frequencies, making it impossible to know the precise time at which a trace was generated. Note that the Cortex-M4 only uses the 48 LSBs of the global timestamp. The Cortex-M7 uses all 64 bits.

In the STM32H7, the embedded trace macrocell, or ETM, is configured for instruction trace only. In other words, data accesses are not included in the trace information. Note that the ETM in the Cortex-M7 is quite different from the one in the Cortex-M3 or Cortex-M4.

The ITM and ETM both generate trace streams, which are combined using the trace funnel. Some funnel parameters can be modified.
For example, the number of bytes received on one input before switching to another. The less switching that occurs, the lower the overhead, but at the cost of increased latency. It is also possible to filter trace. For example, the ITM trace can be removed from the TPIU and output on the SWO instead.

Trace from the ITM, but not from the ETM, can be directed to the single wire trace port. In dual-core devices, the ITM trace from both cores can be directed to the SWO and combined in the SWO trace funnel. However, since there is no formatting in the SWO, it is not possible for a trace port analyzer to separate the trace streams. Therefore, it is recommended that the funnel be used to manually select one ITM at a time for output on the SWO. If both are needed, the TPIU should be used.

The ETF can be used as a trace buffer for storing traces on chip. The trace can be read by software or by the debugger, or flushed via the trace port. If the ETF is configured as a circular buffer, the trace is stored continuously, so the most recent trace overwrites the oldest. Alternatively, the FIFO full flag can be used to stop a trace when the buffer is full, and hence capture a trace at a particular point in time. The ETF also acts in hardware mode to smooth the flow of trace to the TPIU. Since the trace stream tends to be bursty in nature and the instantaneous bandwidth is much higher than that of the trace port, the buffer absorbs the peaks and regulates the flow to the trace port's maximum continuous bandwidth.

The trace port width can be programmed from one to four pins. The bandwidth scales proportionally to the number of pins and to the TRACECLK frequency, which is selectable via a divider in the RCC. Full dual-core instruction trace at maximum clock frequency is likely to require the maximum bandwidth.
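
The proportional scaling just described can be sketched as a small calculation. This is only an illustration: the function names, the `transfers_per_clock` parameter, and the example frequencies are assumptions made for the sketch and do not come from the presentation. Set `transfers_per_clock` to 2 if the port moves data on both clock edges; check the reference manual for the actual behavior.

```python
import math

MAX_TRACE_PINS = 4  # the talk gives a port width of one to four pins


def trace_port_bandwidth_bps(pins, traceclk_hz, transfers_per_clock=1):
    """Raw trace port bandwidth in bits per second (hypothetical helper)."""
    if not 1 <= pins <= MAX_TRACE_PINS:
        raise ValueError("port width must be 1 to 4 pins")
    # Bandwidth is proportional to both the pin count and the clock rate.
    return pins * traceclk_hz * transfers_per_clock


def min_pins_for(required_bps, traceclk_hz, transfers_per_clock=1):
    """Smallest port width that sustains required_bps, or None if even
    four pins are not enough at this clock rate (hypothetical helper)."""
    per_pin_bps = traceclk_hz * transfers_per_clock
    pins = max(1, math.ceil(required_bps / per_pin_bps))
    return pins if pins <= MAX_TRACE_PINS else None


# Example with made-up numbers: a 100 MHz trace clock and four pins.
print(trace_port_bandwidth_bps(4, 100_000_000))
print(min_pins_for(250_000_000, 100_000_000))
```

Used this way, the second helper shows the trade-off in the talk: if the average trace rate drops, fewer pins or a slower clock will do.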
By applying filters and triggers to the trace sources, notably the ETM, the average amount of trace data can be reduced, allowing a lower clock rate or a reduced number of pins.

The trace SWO pin is multiplexed with the JTDO signal, which is part of the JTAG interface. Hence, single wire trace is only available when the serial wire debug, or SWD, interface is enabled.

Cross-triggering can be used in dual-core devices to halt both cores simultaneously. When one core hits a breakpoint, its halted output, indicating that it has entered debug mode, propagates to the other core and causes it to enter debug mode as well. Similarly, both cores can restart simultaneously. The cross-trigger feature can also be used to halt the processor with an external trigger signal. This might be an edge on one of the I/O pins.

There is a cross-trigger interface dedicated to each of the Cortex-M processors, as well as a system CTI connected to the trace components (ETF or TPIU) and to the external trigger signals. To use any of the cross-trigger features, the CTIs must be programmed accordingly by the debugger. The required trigger input signals (TrigInN) and trigger output signals (TrigOutN) need to be connected to the cross-trigger matrix, or CTM. The CTM comprises up to four channels, allowing four different events to be propagated in parallel. Trigger inputs can be combined in the CTI so that any one of the combined inputs causes an event on the connected channel. Similarly, a channel can be connected to several trigger outputs so that one event can trigger multiple actions.

The DBGMCU is located on the debug APB bus and can be accessed by the debugger via the APB access port, AP2. It is also accessible by the processors in the debug APB address space. The DBGMCU_IDC register provides the device ID and revision codes in STM32 standard format.
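
As an illustration of that format, here is a minimal sketch of how a tool might split an ID code word into its two fields. It assumes the layout commonly used on STM32 devices (device ID in bits 11:0, revision ID in bits 31:16), which should be checked against the reference manual. The function name and the sample word are hypothetical, and the sample is not a value read from hardware.

```python
DEV_ID_MASK = 0xFFF   # assumption: DEV_ID occupies bits 11:0
REV_ID_SHIFT = 16     # assumption: REV_ID occupies bits 31:16


def decode_dbgmcu_idc(idc):
    """Split a 32-bit ID code word into (dev_id, rev_id)."""
    idc &= 0xFFFFFFFF  # keep only the 32 register bits
    dev_id = idc & DEV_ID_MASK
    rev_id = (idc >> REV_ID_SHIFT) & 0xFFFF
    return dev_id, rev_id


# Illustrative word only; a debugger or firmware would read the register.
dev_id, rev_id = decode_dbgmcu_idc(0x20036450)
print(f"DEV_ID=0x{dev_id:03X} REV_ID=0x{rev_id:04X}")
# prints "DEV_ID=0x450 REV_ID=0x2003"
```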
The device ID and revision information is also available in the debug port (DP) target ID register, accessible only to an external debugger, and in the system debug ROM table registers SYSROM_PIDR2 to SYSROM_PIDR0, which are also accessible by software.

Low-power mode emulation means that the debugger connection is not lost when entering low-power mode. It eliminates the need to replace the low-power entry command, for example WFI or WFE, with a while() loop. On exit, the device is in the same state as if the emulation had not been active, apart from any changes made by the debugger during the low-power mode emulation.

Peripheral clock freeze is particularly useful to prevent a watchdog timeout from resetting the device while debugging, without having to rearm the watchdog with the debugger. It also allows timer values to be inspected and the corresponding interrupts to be suspended until normal operation is resumed.

The debug clock enable bits ensure that the debug blocks are only clocked when needed. This avoids unnecessary power consumption since, apart from the ADP, all blocks are clocked with the ungated domain clock.

On certain packages, the TRGIN and TRGOUT pins are not available. Only the bidirectional pin is used, and its direction must be chosen using the TRGOEN bit.