Debugging a Crash

Your firmware crashed. The LED stopped blinking, the UART went silent, and the target is frozen. This tutorial walks through diagnosing an ARM Cortex-M HardFault using mcjtag’s register and memory tools.

By the end, you will know how to read fault registers, decode the exception stack frame, and identify the instruction that caused the crash.

Step 1: Halt the target

target_control("halt")

If the CPU is already halted from the fault, this is a no-op. Check with:

target_state()

The response shows state="halted" and the current pc. If the target is sitting in a fault handler, the pc will point somewhere inside that handler’s code — not at the faulting instruction itself. We need the stack frame for that.

Step 2: Read the fault registers

read_registers()

Three registers matter most right now:

pc — Where execution stopped. If you are inside a HardFault handler, this is the handler’s code, not the original fault location.

lr — The link register. During an exception, ARM loads a special EXC_RETURN pattern into LR. Common values:

LR value	Meaning
`0xFFFFFFF1`	Return to handler mode, use MSP
`0xFFFFFFF9`	Return to thread mode, use MSP
`0xFFFFFFFD`	Return to thread mode, use PSP

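Bit 2 of this value tells you which stack pointer held the exception frame when the fault occurred (0 = MSP, 1 = PSP), which you need for the next step. On cores with an FPU, bit 4 is cleared when the frame also contains floating-point state, giving values such as 0xFFFFFFE9 and 0xFFFFFFED.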
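
As a rough illustration of the table above, the following Python sketch decodes an EXC_RETURN value by testing individual bits. The function name decode_exc_return and the returned dictionary keys are made up for this example and are not part of mcjtag.

```python
def decode_exc_return(lr: int) -> dict:
    """Decode the EXC_RETURN value found in LR during exception handling."""
    # EXC_RETURN values always have the top byte set to 0xFF.
    if (lr & 0xFF000000) != 0xFF000000:
        raise ValueError(f"{lr:#010x} is not an EXC_RETURN value")
    return {
        # Bit 3: 1 = return to thread mode, 0 = return to handler mode
        "return_mode": "thread" if lr & (1 << 3) else "handler",
        # Bit 2: 1 = frame was stacked on PSP, 0 = frame was stacked on MSP
        "stack": "psp" if lr & (1 << 2) else "msp",
        # Bit 4: 0 = extended frame with FPU state (FPU-equipped cores only)
        "fpu_frame": not (lr & (1 << 4)),
    }


print(decode_exc_return(0xFFFFFFFD))
```

If LR no longer holds an EXC_RETURN value (for example because the handler has already called another function), the sketch raises ValueError rather than guessing.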

xPSR — The program status register. Bits 0-8 contain the ISR number:

ISR Number	Exception
0	Thread mode (no exception)
2	NMI
3	HardFault
4	MemManage
5	BusFault
6	UsageFault

ISR=3 confirms you are in a HardFault handler.
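
A minimal sketch of that lookup in Python, assuming you already have the xPSR value as an integer. The helper name active_exception is hypothetical; the mapping mirrors the table above, and numbers 16 and up are external interrupts.

```python
EXCEPTION_NAMES = {
    0: "Thread mode (no exception)",
    2: "NMI",
    3: "HardFault",
    4: "MemManage",
    5: "BusFault",
    6: "UsageFault",
}


def active_exception(xpsr: int) -> str:
    """Return a readable name for the exception number in xPSR bits 0-8."""
    number = xpsr & 0x1FF  # mask off everything except bits 0-8
    if number >= 16:
        # Exception numbers 16 and above map to external interrupts IRQ0, IRQ1, ...
        return f"External interrupt IRQ{number - 16}"
    return EXCEPTION_NAMES.get(number, f"System exception {number}")


# Made-up xPSR value with the Thumb bit (bit 24) set and exception number 3.
print(active_exception(0x01000003))
```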

Step 3: Decode the exception frame

When a Cortex-M takes an exception, the hardware pushes 8 registers onto the active stack before entering the handler. This is called the exception stack frame.

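First, determine which stack pointer was active. If bit 2 of LR is set (for example 0xFFFFFFFD), the frame is on the process stack, so read the psp register. Otherwise, use msp. Note that if the frame is on the main stack and the handler has already pushed registers of its own, the frame sits above the current msp value by that amount.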

read_registers(names=["msp", "psp"])

Then read 8 words from that stack pointer value:

read_memory("<sp_value>", count=8)

The stack frame layout (lowest address first):

Offset	Register	What it tells you
+0x00	r0	First argument to the faulting function
+0x04	r1	Second argument
+0x08	r2	Third argument
+0x0C	r3	Fourth argument
+0x10	r12	Scratch register
+0x14	lr	Return address before the exception (caller of the faulting function)
+0x18	pc	Address of the faulting instruction
+0x1C	xPSR	Status flags at the time of the fault

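The pc at offset +0x18 is the instruction that caused the fault, provided the fault is precise. For an imprecise bus fault (BFSR.IMPRECISERR), it can point a few instructions past the offending access. This is the address you want to look up in your .map file or disassembly.

The lr at offset +0x14 tells you who called the function that faulted, as long as that function had not yet made a call of its own, which would have overwritten LR.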
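
The layout above can be sketched in Python as a simple mapping from word position to register name. This assumes the 8 words have already been read into a list of integers; the helper name decode_exception_frame and the sample values are invented for illustration and were not captured from real hardware.

```python
FRAME_REGISTERS = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")


def decode_exception_frame(words):
    """Map the first 8 stacked words (lowest address first) to register names."""
    if len(words) < len(FRAME_REGISTERS):
        raise ValueError(f"need {len(FRAME_REGISTERS)} words, got {len(words)}")
    return dict(zip(FRAME_REGISTERS, words))


# Made-up sample values, not captured from real hardware.
sample_words = [0x00000000, 0x20000400, 0x00000010, 0x40021000,
                0x00000000, 0x08000A35, 0x08001234, 0x21000000]
frame = decode_exception_frame(sample_words)

print(f"faulting pc : {frame['pc']:#010x}")
# The stacked lr carries the Thumb bit in bit 0; clear it before a symbol lookup.
print(f"caller (lr) : {frame['lr'] & ~1:#010x}")
```

Clearing bit 0 of the stacked lr before searching the .map file avoids an off-by-one mismatch against symbol addresses.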

Step 4: Check the Fault Status Registers

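The System Control Block (SCB) contains registers that explain why the fault occurred. Cortex-M0 and M0+ cores do not implement CFSR, HFSR, MMFAR, or BFAR, so this step applies to Cortex-M3 and later cores. If you have an SVD file loaded: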

svd_inspect(peripheral="SCB")

Or read them directly by address:

read_memory("0xE000ED28", count=5)

The five words decode as:

Address	Register	Purpose
`0xE000ED28`	CFSR	Configurable Fault Status Register (UsageFault + BusFault + MemManage)
`0xE000ED2C`	HFSR	HardFault Status Register
`0xE000ED30`	DFSR	Debug Fault Status Register (records debug events such as halt requests and breakpoints)
`0xE000ED34`	MMFAR	MemManage Fault Address (valid only if CFSR.MMARVALID is set)
`0xE000ED38`	BFAR	Bus Fault Address (valid only if CFSR.BFARVALID is set)

CFSR is the most informative. It is actually three sub-registers packed into 32 bits:

Bits	Sub-register	Fault type
0-7	MMFSR	Memory management faults (MPU violations, stack overflow)
8-15	BFSR	Bus faults (invalid address on the bus)
16-31	UFSR	Usage faults (undefined instruction, unaligned access, divide by zero)
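
To make the packing concrete, here is a small Python sketch that splits a raw CFSR word into the three sub-registers from the table. The function name split_cfsr and the sample value are assumptions for this example.

```python
def split_cfsr(cfsr: int) -> dict:
    """Split the 32-bit CFSR into its three packed sub-registers."""
    return {
        "mmfsr": cfsr & 0xFF,            # bits 0-7
        "bfsr": (cfsr >> 8) & 0xFF,      # bits 8-15
        "ufsr": (cfsr >> 16) & 0xFFFF,   # bits 16-31
    }


# Made-up CFSR value: only the BFSR byte is non-zero.
fields = split_cfsr(0x00008200)
for name, value in fields.items():
    print(f"{name} = {value:#06x}")
```

A non-zero field tells you which class of fault to investigate first.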

HFSR tells you whether the HardFault was forced (FORCED, bit 30), meaning a configurable fault escalated to HardFault because its handler was disabled or because a second fault occurred during exception handling. Bit 1 (VECTTBL) indicates a bus fault while reading the vector table.
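
A short Python sketch of checking HFSR bits, assuming the register value has been read as an integer. The helper name decode_hfsr is hypothetical; the bit positions (VECTTBL at bit 1, FORCED at bit 30, DEBUGEVT at bit 31) follow the ARMv7-M layout and are worth confirming against the reference manual for your core.

```python
HFSR_FLAGS = {
    1: "VECTTBL",    # bus fault while reading the vector table
    30: "FORCED",    # a configurable fault escalated to HardFault
    31: "DEBUGEVT",  # debug event occurred with halting debug disabled
}


def decode_hfsr(hfsr: int) -> list:
    """Return the names of the HFSR flags that are set."""
    return [name for bit, name in HFSR_FLAGS.items() if hfsr & (1 << bit)]


# Made-up HFSR value with only bit 30 set.
print(decode_hfsr(0x40000000))
```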

Step 5: Common fault causes

Once you have the CFSR bits and the faulting pc, the cause usually falls into one of these patterns:

Bus fault at an invalid address (BFSR.PRECISERR + BFAR)

The CPU tried to access a memory address that does not exist or is not mapped. Common causes:

Null pointer dereference (BFAR near 0x00000000)
Dereferencing a freed or corrupted pointer
Accessing a peripheral whose clock is not enabled

Usage fault with UNDEFINSTR (UFSR bit 0, which is CFSR bit 16)

The CPU fetched something that is not a valid instruction. Common causes:

Corrupted function pointer (jumped to data instead of code)
Stack overflow overwrote the return address
Missing Thumb bit in a branch target (the address should be odd for Thumb code); this case usually sets INVSTATE (UFSR bit 1) rather than UNDEFINSTR

MemManage fault (MMFSR)

An MPU (Memory Protection Unit) violation occurred. Common causes:

Writing to a read-only region
Executing code from a non-executable region
Stack overflow past the MPU guard region

Forced HardFault (HFSR bit 30)

The original fault was a BusFault, UsageFault, or MemManage, but the corresponding handler was not enabled (the SCB.SHCSR enable bits were clear), so it escalated to HardFault. Check the CFSR to see which underlying fault triggered it.
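
The patterns in this step can be sketched as a small triage helper in Python. This is an illustration under stated assumptions, not how mcjtag itself works: the name summarize_fault, its parameters, and the wording of the notes are invented here, and the bit names follow the ARMv7-M CFSR layout, which you should verify against the reference manual for your core.

```python
# CFSR bit positions (ARMv7-M layout; an assumption to verify for your core).
CFSR_FLAGS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def summarize_fault(cfsr, hfsr, bfar=None, mmfar=None):
    """Return (set CFSR flag names, human-readable notes) for a fault."""
    flags = [name for bit, name in CFSR_FLAGS.items() if cfsr & (1 << bit)]
    notes = []
    if hfsr & (1 << 30):
        notes.append("forced HardFault: a configurable fault escalated")
    # BFAR is only meaningful when BFARVALID is set.
    if "PRECISERR" in flags and "BFARVALID" in flags and bfar is not None:
        notes.append(f"precise bus fault accessing {bfar:#010x}")
    if "UNDEFINSTR" in flags:
        notes.append("undefined instruction: suspect a corrupted function pointer or return address")
    if "INVSTATE" in flags:
        notes.append("invalid state: suspect a branch target missing the Thumb bit")
    if "IACCVIOL" in flags or "DACCVIOL" in flags:
        # MMFAR is only meaningful when MMARVALID is set.
        where = f" at {mmfar:#010x}" if "MMARVALID" in flags and mmfar is not None else ""
        notes.append(f"MPU violation{where}")
    return flags, notes


# Made-up register values for illustration.
flags, notes = summarize_fault(0x00008200, 0x40000000, bfar=0x00000004)
print(flags)
for note in notes:
    print("-", note)
```

The notes are hints only; the stacked pc from the exception frame is still needed to locate the code involved.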

Step 6: Using the debug_crash prompt

mcjtag includes a debug_crash prompt that automates this entire workflow. When you invoke it, the LLM client will:

Halt the target
Read pc, lr, sp, and xPSR
Determine the active stack pointer
Read the exception frame
Check the fault status registers
Report the crash location, call chain, and likely cause

This is a good starting point. For complex faults (double faults, stack overflows that corrupted the frame, or faults during interrupt processing), you may need to walk through the steps manually and examine additional context.

Next steps

Hardware Setup — make sure your wiring and OpenOCD config are correct before debugging
SVD Register Decoding — decode SCB and other system peripherals with full bitfield names and descriptions
Safety Configuration — understand the memory write protections that prevent accidental flash corruption during debugging