
Debugging a Crash

Your firmware crashed. The LED stopped blinking, the UART went silent, and the target is frozen. This tutorial walks through diagnosing an ARM Cortex-M HardFault using mcjtag’s register and memory tools.

By the end, you will know how to read fault registers, decode the exception stack frame, and identify the instruction that caused the crash.

Start by halting the target so that its state stops changing:

target_control("halt")

If the target is already halted (for example, because a vector catch stopped it when the fault occurred), this is a no-op. Confirm the state with:

target_state()

The response shows state="halted" and the current pc. If the target is sitting in a fault handler, the pc points somewhere inside that handler’s code, not at the faulting instruction itself. To find the faulting instruction, you need the exception stack frame.

Next, read the core registers:

read_registers()

Three registers matter most right now:

pc — Where execution stopped. If you are inside a HardFault handler, this is the handler’s code, not the original fault location.

lr — The link register. During an exception, ARM loads a special EXC_RETURN pattern into LR. Common values:

LR value      Meaning
0xFFFFFFF1    Return to handler mode, use MSP
0xFFFFFFF9    Return to thread mode, use MSP
0xFFFFFFFD    Return to thread mode, use PSP

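Bit 2 of EXC_RETURN tells you which stack pointer was in use when the fault occurred (0 = MSP, 1 = PSP). You need this for the next step. On cores with an FPU, such as Cortex-M4F and Cortex-M7, you may also see 0xFFFFFFE1, 0xFFFFFFE9, or 0xFFFFFFED. These mean the same as the values above, but they indicate an extended stack frame that includes floating-point state.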

xPSR — The program status register. Bits 0-8 contain the ISR number:

ISR number   Exception
0            Thread mode (no exception)
2            NMI
3            HardFault
4            MemManage
5            BusFault
6            UsageFault

ISR=3 confirms you are in a HardFault handler.
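To double-check these decodes by hand, the following Python sketch applies the tables above to raw register values. It is not part of mcjtag. decode_exception_context is a hypothetical helper, and it assumes you have copied lr and xPSR out of the read_registers() output as integers. It treats bit 4 of EXC_RETURN as the floating-point frame flag, which only matters on cores with an FPU.

```python
# Exception numbers from the table above (IPSR field of xPSR, bits 0-8).
EXCEPTION_NAMES = {
    0: "Thread mode (no exception)",
    2: "NMI",
    3: "HardFault",
    4: "MemManage",
    5: "BusFault",
    6: "UsageFault",
}


def decode_exception_context(lr: int, xpsr: int) -> dict:
    """Decode an EXC_RETURN value (LR) and the exception number in xPSR."""
    if (lr >> 28) != 0xF:
        # Not an EXC_RETURN pattern: the handler has probably reused LR.
        raise ValueError(f"LR 0x{lr:08X} is not an EXC_RETURN value")
    exception_number = xpsr & 0x1FF  # bits 0-8
    return {
        "return_mode": "thread" if lr & (1 << 3) else "handler",
        "stack": "psp" if lr & (1 << 2) else "msp",
        # Bit 4 clear means an extended frame with FP state (FPU cores only).
        "fp_frame": not (lr & (1 << 4)),
        "exception_number": exception_number,
        "exception": EXCEPTION_NAMES.get(exception_number, "other"),
    }
```

For example, decode_exception_context(0xFFFFFFFD, 0x21000003) reports stack "psp", return_mode "thread", and exception "HardFault".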

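When a Cortex-M takes an exception, the hardware pushes 8 registers onto the active stack before entering the handler. This is called the exception stack frame. Cores with an FPU may push an extended frame that adds floating-point registers after these 8 words. The layout of the first 8 words is the same.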

First, determine which stack pointer was active. If bit 2 of the EXC_RETURN value in LR is set (for example, 0xFFFFFFFD), the PSP was in use, so read the psp register. Otherwise, use msp:

read_registers(names=["msp", "psp"])
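The selection rule reduces to a one-line test on bit 2, which also covers the FPU variants of EXC_RETURN. active_stack_pointer is a hypothetical host-side helper, not an mcjtag function. It assumes msp and psp have already been read as integers. It also assumes the fault handler has not pushed anything onto that stack since the exception was taken. The register values in the example are made up.

```python
def active_stack_pointer(lr: int, msp: int, psp: int) -> int:
    """Pick the stack pointer that holds the exception frame.

    EXC_RETURN bit 2 selects the stack: 1 means PSP, 0 means MSP. Testing the
    bit instead of comparing with 0xFFFFFFFD also handles values such as
    0xFFFFFFED on cores with an FPU.
    """
    return psp if lr & (1 << 2) else msp


# Example with made-up register values: build the read_memory() call for the frame.
sp = active_stack_pointer(0xFFFFFFFD, msp=0x20007FD0, psp=0x20001F60)
print(f'read_memory("0x{sp:08X}", count=8)')  # read_memory("0x20001F60", count=8)
```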

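Then read 8 words starting at that stack pointer value. This assumes the handler has not pushed anything onto the same stack since the exception was taken. If it has (for example, a C handler whose prologue saves registers), the frame sits a few words above the current stack pointer.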

read_memory("<sp_value>", count=8)

The stack frame layout (lowest address first):

Offset   Register   What it tells you
+0x00    r0         First argument to the faulting function, if not yet overwritten
+0x04    r1         Second argument (same caveat)
+0x08    r2         Third argument (same caveat)
+0x0C    r3         Fourth argument (same caveat)
+0x10    r12        Scratch register
+0x14    lr         Return address of the faulting function (usually points into its caller)
+0x18    pc         Address of the faulting instruction
+0x1C    xPSR       Status flags at the time of the fault

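The pc at offset +0x18 is the instruction that caused the fault. This is the address to look up in your .map file or disassembly. One exception is an imprecise bus fault (BFSR.IMPRECISERR). In that case the write was buffered, so the stacked pc points somewhere after the offending store.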

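The lr at offset +0x14 usually tells you who called the function that faulted. If that function had already called other functions before it faulted, lr may instead point inside the faulting function itself.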
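The following sketch turns the 8 words into named registers. parse_exception_frame is a hypothetical helper. It assumes read_memory returned the words as a list of integers in address order, so convert them first if your client shows hex strings. It also clears bit 0 of the stacked lr. Return addresses into Thumb code have that bit set, while the instruction itself sits at the even address. The word values in the example are made up.

```python
# Stacked registers in address order, lowest address first.
FRAME_FIELDS = ("r0", "r1", "r2", "r3", "r12", "lr", "pc", "xpsr")


def parse_exception_frame(words: list[int]) -> dict:
    """Map the 8 words read from the active stack pointer to register names."""
    if len(words) < 8:
        raise ValueError("need at least 8 words from the stack pointer")
    frame = dict(zip(FRAME_FIELDS, words[:8]))
    # Clear the Thumb bit so the caller address can be looked up directly.
    frame["caller_addr"] = frame["lr"] & ~1
    return frame


# Example with made-up words as read_memory might return them.
frame = parse_exception_frame([0, 1, 2, 3, 0xC, 0x08000A45, 0x08000B12, 0x21000000])
print(hex(frame["pc"]), hex(frame["caller_addr"]))  # 0x8000b12 0x8000a44
```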

The System Control Block (SCB) contains registers that explain why the fault occurred. If you have an SVD file loaded:

svd_inspect(peripheral="SCB")

Or read them directly by address:

read_memory("0xE000ED28", count=5)

The five words decode as:

Address      Register   Purpose
0xE000ED28   CFSR       Configurable Fault Status Register (UsageFault + BusFault + MemManage)
0xE000ED2C   HFSR       HardFault Status Register
0xE000ED30   DFSR       Debug Fault Status Register (debug events; usually irrelevant to a crash)
0xE000ED34   MMFAR      MemManage Fault Address (valid only if CFSR.MMARVALID is set)
0xE000ED38   BFAR       Bus Fault Address (valid only if CFSR.BFARVALID is set)
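Reading five words from 0xE000ED28 covers CFSR, HFSR, DFSR (0xE000ED30), MMFAR, and BFAR. The sketch below names those words and discards MMFAR and BFAR when CFSR says they are not valid. scb_fault_registers is a hypothetical helper, and it assumes the words arrive as a list of integers.

```python
# Registers in address order starting at 0xE000ED28 (one 32-bit word each).
SCB_FAULT_REGS = ("CFSR", "HFSR", "DFSR", "MMFAR", "BFAR")

MMARVALID = 1 << 7   # CFSR bit 7: MMFAR holds a valid address
BFARVALID = 1 << 15  # CFSR bit 15: BFAR holds a valid address


def scb_fault_registers(words: list[int]) -> dict:
    """Name the words read from 0xE000ED28 and drop invalid fault addresses."""
    if len(words) < len(SCB_FAULT_REGS):
        raise ValueError("read at least 5 words starting at 0xE000ED28")
    regs = dict(zip(SCB_FAULT_REGS, words))
    # MMFAR and BFAR may hold stale data unless the matching VALID bit is set.
    if not (regs["CFSR"] & MMARVALID):
        regs["MMFAR"] = None
    if not (regs["CFSR"] & BFARVALID):
        regs["BFAR"] = None
    return regs
```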

CFSR is the most informative. It is actually three sub-registers packed into 32 bits:

Bits    Sub-register   Fault type
0-7     MMFSR          Memory management faults (MPU violations, stack overflow)
8-15    BFSR           Bus faults (invalid address on the bus)
16-31   UFSR           Usage faults (undefined instruction, unaligned access, divide by zero)
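The following sketch splits CFSR into the three sub-registers and names the bits that are set. The names and positions follow the ARMv7-M fault status bit definitions. Bits not in the table (reserved, or added in later architecture versions) are reported as bitN. decode_cfsr is hypothetical, not an mcjtag function.

```python
# CFSR bit positions (ARMv7-M). MMFSR = bits 0-7, BFSR = 8-15, UFSR = 16-31.
CFSR_BITS = {
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def decode_cfsr(cfsr: int) -> dict:
    """Split CFSR into its three sub-registers and name the bits that are set."""
    sub = {"MMFSR": range(0, 8), "BFSR": range(8, 16), "UFSR": range(16, 32)}
    return {
        name: [CFSR_BITS.get(bit, f"bit{bit}") for bit in bits if (cfsr >> bit) & 1]
        for name, bits in sub.items()
    }
```

For example, decode_cfsr(0x00008200) returns {'MMFSR': [], 'BFSR': ['PRECISERR', 'BFARVALID'], 'UFSR': []}. That result describes a precise bus fault with a valid BFAR.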

HFSR tells you if the HardFault was forced (bit 30) — meaning a configurable fault escalated to HardFault because its handler was disabled or a secondary fault occurred during exception handling.
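On ARMv7-M, HFSR has only three defined bits. VECTTBL (bit 1) means a bus fault occurred while reading the vector table. FORCED (bit 30) is the escalation bit described above. DEBUGEVT (bit 31) is reserved for debug events. decode_hfsr below is a hypothetical helper that names whichever of these bits are set.

```python
# Defined HFSR bits (ARMv7-M); all other bits are reserved.
HFSR_BITS = {1: "VECTTBL", 30: "FORCED", 31: "DEBUGEVT"}


def decode_hfsr(hfsr: int) -> list[str]:
    """Return the names of the HFSR bits that are set, lowest bit first."""
    return [name for bit, name in sorted(HFSR_BITS.items()) if (hfsr >> bit) & 1]
```

For example, decode_hfsr(0x40000000) returns ['FORCED'], which means the CFSR holds the original cause.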

Once you have the CFSR bits and the faulting pc, the cause usually falls into one of these patterns:

Bus fault at an invalid address (BFSR.PRECISERR + BFAR)

The CPU tried to access a memory address that does not exist or is not mapped. Common causes:

  • Null pointer dereference (BFAR near 0x00000000)
  • Dereferencing a freed or corrupted pointer
  • Accessing a peripheral whose clock is not enabled

Usage fault with UNDEFINSTR (UFSR bit 0)

The CPU fetched something that is not a valid instruction. Common causes:

  • Corrupted function pointer (jumped to data instead of code)
  • Stack overflow overwrote the return address
  • Missing Thumb bit in a branch target (the address should be odd for Thumb code); this usually sets INVSTATE (UFSR bit 1) rather than UNDEFINSTR

MemManage fault (MMFSR)

An MPU (Memory Protection Unit) violation occurred. Common causes:

  • Writing to a read-only region
  • Executing code from a non-executable region
  • Stack overflow past the MPU guard region

Forced HardFault (HFSR bit 30)

The original fault was a BusFault, UsageFault, or MemManage, but the corresponding handler was not enabled (the SCB.SHCSR enable bits were clear), so it escalated to HardFault. Check the CFSR to see which underlying fault triggered it.
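Once CFSR, HFSR, and BFAR have been read, the patterns above can be checked mechanically. This sketch is a heuristic, not a diagnosis. likely_fault_patterns is hypothetical, and it only looks at the bits named in this section plus INVSTATE (UFSR bit 1), which a missing Thumb bit usually sets. The NEAR_NULL_LIMIT threshold of 0x1000 for flagging a probable null pointer is an arbitrary assumption.

```python
from typing import Optional

# Assumption: a valid BFAR below this value suggests an access through NULL.
NEAR_NULL_LIMIT = 0x1000


def likely_fault_patterns(cfsr: int, hfsr: int, bfar: Optional[int] = None) -> list[str]:
    """Map fault status bits to the crash patterns described above."""
    patterns = []
    if hfsr & (1 << 30):  # HFSR.FORCED
        patterns.append("forced HardFault: the CFSR bits show the original fault")
    if cfsr & (1 << 9):  # BFSR.PRECISERR
        bfar_valid = bfar is not None and bool(cfsr & (1 << 15))  # BFSR.BFARVALID
        if bfar_valid:
            patterns.append(f"precise bus fault at 0x{bfar:08X}")
            if bfar < NEAR_NULL_LIMIT:
                patterns.append("address near 0x00000000: possible null pointer dereference")
        else:
            patterns.append("precise bus fault (BFAR not valid)")
    if cfsr & (1 << 16):  # UFSR.UNDEFINSTR
        patterns.append("undefined instruction: corrupted function pointer or return address?")
    if cfsr & (1 << 17):  # UFSR.INVSTATE
        patterns.append("invalid state: branch target probably missing the Thumb bit")
    if cfsr & 0x7F:  # MMFSR fault bits 0-6 (bit 7 is only MMARVALID)
        patterns.append("MemManage fault: MPU violation (see MMFSR bits)")
    return patterns
```

For example, consider CFSR=0x00008200, HFSR=0x40000000, and BFAR=0x00000004. These values produce a forced HardFault caused by a precise bus fault at address 4, which the sketch flags as a possible null pointer dereference.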

mcjtag includes a debug_crash prompt that automates this entire workflow. When you invoke it, the LLM client will:

  1. Halt the target
  2. Read pc, lr, sp, and xPSR
  3. Determine the active stack pointer
  4. Read the exception frame
  5. Check the fault status registers
  6. Report the crash location, call chain, and likely cause

This is a good starting point. For complex faults (double faults, stack overflows that corrupted the frame, or faults during interrupt processing), you may need to walk through the steps manually and examine additional context.

  • Hardware Setup — make sure your wiring and OpenOCD config are correct before debugging
  • SVD Register Decoding — decode SCB and other system peripherals with full bitfield names and descriptions
  • Safety Configuration — understand the memory write protections that prevent accidental flash corruption during debugging