Tracking Traps on the MODM7AE70 – Part 2

Microchip Traps

NetBurner’s Smart Trap feature makes debugging much easier by automatically printing stack and register information when a crash occurs.

In Part 1 of this series, we reviewed the basics of reading the trap output and call stack: understanding the Faulted PC address and task information, and using addr2line to find which lines of code in your program correspond to which memory addresses.

Now we will look at a more complex type of crash, one where simply running addr2line won’t find the problem. We’ll cover how memory regions and registers are laid out on a NetBurner ARM device, and dive into the exception frame to understand what’s happening when things go wrong. These topics take a lot of study and experience to master, but today we hope to offer some lifelines to developers who are new to them.

The NetBurner MODM7AE70 and SBE70LC products use the Microchip SAME70 and SAMD20 processors to provide high-performance and reliable ARM® embedded IoT capabilities for secure automation and industrial IoT applications.

At a very low price point, this ARM® Cortex-M7 embedded system-on-module (SOM) solves the problem of securely network-enabling devices with 10/100 Ethernet, including those requiring SSL/TLS, auto certificate creation, industrial protocols, digital, analog, I2C, SPI, CAN, and much more. The products can also be used as an edge node microcontroller, edge processor or IoT gateway. Microchip® ARM® processors, combined with NetBurner’s acclaimed reliability and ease-of-use, provide a solution you can depend on.

First, let’s look at the example code for this post, shown below. It includes an option to trigger a return address corruption crash. Once you load the code onto your device and connect the device’s USB cable to your computer, the options menu should appear in your serial console (MTTTY, CoolTerm, or similar). We use a MODM7AE70 powered by the Microchip SAM E70 for this example, but any NetBurner ARM device with NNDK v3.x should work, including the SOMRT1061, although the addresses in the trap output will differ.

The Code

This example demonstrates what happens when a buffer overflow overwrites the saved return address on the stack. When the function returns, execution jumps to an invalid address and the processor raises a fault.

The code intentionally creates a buffer overflow in the corruptReturnAddress() function by copying a 69-character string into a 32-byte buffer without bounds checking. When the function tries to return, the CPU jumps to the corrupted return address, triggering a smart trap.

To run the code, create a new ARM-based NetBurner project (MODM7AE70, SOMRT1061, etc.), paste the following into your main.cpp, and run it on your NetBurner device:

#include <predef.h>
#include <stdio.h>
#include <nbrtos.h>
#include <init.h>
#include <smarttrap.h>
#include <string.h>
#include <core_cm7.h>

const char * AppName="SmartTrap";

void corruptReturnAddress()
{
    volatile char intermediateBuffer[24];
    volatile char vulnerableBuffer[32];
    const char* overflowText = "http://www.example.com/THISISTHEFAILINGPATH/Testing/OverflowText.html";

    iprintf("Calling function to force return pointer to stack\r\n");
    OSTimeDly(TICKS_PER_SECOND);

    // Fill intermediate buffer for trace visibility
    for (uint32_t i = 0; i < sizeof(intermediateBuffer); i++) {
        intermediateBuffer[i] = 0xCC;
    }

    // Overflow the buffer to corrupt the saved return address (stored at a higher address than the locals on the ARM stack)
    volatile char* overflowPtr = vulnerableBuffer;
    for (uint32_t i = 0; i < strlen(overflowText); i++) {
        overflowPtr[i] = overflowText[i];
    }
}

/* Task wrapper for return address corruption example */
void ReturnAddressCorruptionTask(void* pd)
{
    volatile char taskBuffer[48];  // For trace visibility

    for (uint32_t i = 0; i < sizeof(taskBuffer); i++) {
        taskBuffer[i] = 0xAA;
    }

    corruptReturnAddress();
    OSTimeDly(TICKS_PER_SECOND);
}

void UserMain(void *pd)
{
    init();
    WaitForActiveNetwork();
    EnableSmartTraps();

    iprintf("Application: %s\r\n", AppName);

    while (1)
    {
        iprintf("\nSmart Trap Examples - Select:\r\n");
        iprintf("5: Return Address Corruption\r\n");

        switch(getchar()) {

        case '5':
            {
                iprintf("Loading return address corruption example...\r\n");
                OSTimeDly(TICKS_PER_SECOND);
                static uint32_t ReturnCorruptionTaskStack[256];
                // OSTaskCreatewName expects the top (high end) of the stack first, then the bottom
                OSTaskCreatewName(ReturnAddressCorruptionTask,
                                 NULL,
                                 ReturnCorruptionTaskStack + 256,
                                 ReturnCorruptionTaskStack,
                                 MAIN_PRIO - 1,
                                 "ReturnCorrupt");

                iprintf("Return address corruption task created. Waiting...\r\n");
                OSTimeDly(TICKS_PER_SECOND * 3);
            }
            break;

        default:
            iprintf("Invalid choice\r\n");
        }
        OSTimeDly(TICKS_PER_SECOND);
    }
}

Understanding the Smart Trap Output

Here’s what the smart trap shows when this crash occurs:

-------------------Trap information-----------------------------
Trap Vector = (04)
MMFSR = 01
FPCAR = 204030F8
xPSR = 60000004
PriMask = 01
FaultMask = 00
BasePri = 00
Faulted PC = 50474E48
-------------------Register information-------------------------
R0 =7001414C R1 =70048FA4 R2 =0000006C R3 =00000000
R4 =4C494146 R5 =00000005 R6 =00000006 R7 =00000007
R8 =00000008 R9 =00000009 R10 =0000000A R11 =0000000B
IP[R12]=00000001 SP[R13]=70048F88 LR[R14]=70006E33 PC[R15]=50474E48
XPSR =61000000
-------------------RTOS information-----------------------------
Priority masking indicates trap from within ISR or CRITICAL RTOS section

Current task prio = 00000031
Current task TCB = 200014A0
This looks like a valid TCB
The current running task is: ReturnCorrupt#31
-------------------Task information-----------------------------
Task | State |Wait| Call Stack
Enet#26|Fifo |0002|70006776,70006E2A,70006B78,700115FC,000004A4
Config Server#28|Semaphore |0228|70006776,70006E2A,70006A90,70004A8C,70004AE8,7003C33A,70034006,000004A4
ReturnCorrupt#31|Running | |50474E48
Main#32|Timer |0028|70006776,70006E2A,700067E8,7002409C,000004A4
Idle#3F|Ready | |7002D240,000004A4
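The Trap Vector and MMFSR lines at the top also say what kind of fault this was. Assuming the Trap Vector is the Cortex-M exception number (the handler’s xPSR = 60000004 carries the same number, 4, in its exception field), vector 4 is a MemManage fault, and MMFSR = 01 has only the IACCVIOL bit set: an instruction access violation. Here is a small host-side sketch in standard C++ that decodes these fields from the ARMv7-M architecture definitions; the function names are our own, not part of the NetBurner tools.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Names of the Cortex-M (ARMv7-M) system exceptions numbered 1 to 6.
// ASSUMPTION: the smart trap's "Trap Vector" is this exception number.
const char* exceptionName(uint32_t exceptionNumber)
{
    switch (exceptionNumber) {
    case 1: return "Reset";
    case 2: return "NMI";
    case 3: return "HardFault";
    case 4: return "MemManage";
    case 5: return "BusFault";
    case 6: return "UsageFault";
    default: return "other";
    }
}

// Decode the MemManage Fault Status Register (MMFSR, the low byte of
// the CFSR). Bits 2 and 6 are reserved.
std::string decodeMmfsr(uint8_t mmfsr)
{
    std::string flags;
    if (mmfsr & 0x01) flags += "IACCVIOL (instruction access violation) ";
    if (mmfsr & 0x02) flags += "DACCVIOL (data access violation) ";
    if (mmfsr & 0x08) flags += "MUNSTKERR (fault while unstacking on exception return) ";
    if (mmfsr & 0x10) flags += "MSTKERR (fault while stacking on exception entry) ";
    if (mmfsr & 0x20) flags += "MLSPERR (fault during lazy floating-point state save) ";
    if (mmfsr & 0x80) flags += "MMARVALID (MMFAR holds the faulting data address) ";
    return flags;
}
```

For this crash, exceptionName(4) returns "MemManage" and decodeMmfsr(0x01) reports only IACCVIOL, which fits a jump to a bogus return address.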

The Key Clue: Faulted PC

The first thing to notice is the Faulted PC value: 50474E48. This doesn’t look like a normal code address. On the MODM7AE70 used here, your application’s code runs from the 70xxxxxx range, so a PC outside that range is immediately suspicious.

Read as ASCII, most significant byte first, this hex value spells “PGNH”. Because the Cortex-M7 is little-endian, the bytes actually sit in memory in the reverse order, “HNGP”, which is almost “INGP” from the “FAILINGPATH” part of our overflow text. The “H” (0x48) appears where the string has an “I” (0x49) because bit 0 of a return address is the Thumb state bit, not part of the address, so the processor clears it when loading the value into the PC.

This is the smoking gun: the return address has been overwritten with data from our overflow string.
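Reversing bytes by eye gets tedious. The sketch below is plain standard C++ you can run on a PC (the helper name wordToMemoryAscii is our own, not a NetBurner API); it prints a 32-bit word from the trap output in the order its bytes actually sit in memory.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Show a 32-bit word from the trap output as the ASCII bytes it holds in
// memory. The Cortex-M7 is little-endian, so the least significant byte
// is stored first. Non-printable bytes are shown as '.'.
std::string wordToMemoryAscii(uint32_t word)
{
    std::string text;
    for (int i = 0; i < 4; i++) {
        unsigned byte = (word >> (8 * i)) & 0xFFu;
        text += (byte >= 0x20 && byte <= 0x7E) ? static_cast<char>(byte) : '.';
    }
    return text;
}
```

Given 0x50474E48 it returns "HNGP"; putting the Thumb bit back (0x50474E49) gives "INGP", the fragment of “FAILINGPATH” that landed in the return address slot.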

Understanding NetBurner ARM Memory Layout

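Before diving deeper into registers, it’s important to understand the memory regions on NetBurner ARM devices. The address ranges below apply to the MODM7AE70 used in this example; other modules, such as the SOMRT1061, have different memory maps.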

The 70xxxxxx Region:
This memory region contains both:
Executable code (instructions): Your compiled program code lives here
Process/thread stacks (PSP): Each task’s stack for normal execution

When you see values like LR[R14]=70006E33 or the entries in a task’s call stack, those are code addresses. SP[R13]=70048F88, on the other hand, is a task stack address, yet both are in the same 70xxxxxx region.

The 20xxxxxx Region:
This region contains:
Main Stack (MSP): Used by exception handlers and interrupt service routines
System data structures: such as Task Control Blocks (TCBs)

In the example output, notice FPCAR=204030F8 and Current task TCB = 200014A0 – both are in the 20xxxxxx range.
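A rough way to apply this when scanning a trap dump is to bucket each address by its top byte. This is a hypothetical host-side helper, and the ranges are assumptions taken from the MODM7AE70 layout described above; other NetBurner ARM modules use a different memory map. The 000004A4 entry at the bottom of every task’s call stack falls below both regions, so it gets its own bucket instead of being flagged.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Address buckets for the MODM7AE70 as described in the text.
// ASSUMPTION: these top-byte ranges apply to this module only.
enum class Region {
    CodeAndTaskStacks,  // 70xxxxxx: program code and task (PSP) stacks
    MainStackAndSystem, // 20xxxxxx: main stack (MSP), TCBs, system data
    LowMemory,          // 00xxxxxx: e.g. the 000004A4 entry ending each call stack
    Unexpected          // anything else: suspicious as a code or stack address
};

Region classifyAddress(uint32_t addr)
{
    switch (addr >> 24) {
    case 0x70: return Region::CodeAndTaskStacks;
    case 0x20: return Region::MainStackAndSystem;
    case 0x00: return Region::LowMemory;
    default:   return Region::Unexpected;
    }
}

const char* regionName(Region r)
{
    switch (r) {
    case Region::CodeAndTaskStacks:  return "code/task stacks";
    case Region::MainStackAndSystem: return "main stack/system data";
    case Region::LowMemory:          return "low memory";
    default:                         return "unexpected";
    }
}
```

Run over this crash, the SP (70048F88) and the TCB (200014A0) land in their expected buckets, while the Faulted PC (50474E48) comes back as "unexpected".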

ARM’s Dual Stack Pointers:

ARM Cortex-M processors have two stack pointers, and the processor switches between them automatically on exception entry and return:

PSP (Process Stack Pointer): Used by task code during normal execution (the RTOS selects it for its tasks); on the MODM7AE70 it points into the 70xxxxxx region
MSP (Main Stack Pointer): Always used by exception and interrupt handlers; points into the 20xxxxxx region
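The processor also records which stack it was using. On exception entry, a Cortex-M core loads LR with a special EXC_RETURN value (not the LR[R14] shown in the trap, which is the task’s own link register), and fault handlers commonly test its bit 2 to decide whether to read the exception frame from the MSP or the PSP. The sketch below decodes the ARMv7-M EXC_RETURN bits in standard C++; it is a general illustration with names of our choosing, not NetBurner’s trap handler code.

```cpp
#include <cassert>
#include <cstdint>

// Fields of an ARMv7-M EXC_RETURN value: the special value the core puts
// in LR when it enters an exception handler.
struct ExcReturnInfo {
    bool valid;            // bits 31..5 must all be 1
    bool usedProcessStack; // bit 2: 1 = frame is on the PSP, 0 = on the MSP
    bool fromThreadMode;   // bit 3: 1 = interrupted thread (task) code, 0 = a handler
    bool extendedFrame;    // bit 4: 0 = frame also holds floating-point registers
};

ExcReturnInfo decodeExcReturn(uint32_t excReturn)
{
    ExcReturnInfo info{};
    info.valid            = (excReturn & 0xFFFFFFE0u) == 0xFFFFFFE0u;
    info.usedProcessStack = (excReturn & (1u << 2)) != 0;
    info.fromThreadMode   = (excReturn & (1u << 3)) != 0;
    info.extendedFrame    = (excReturn & (1u << 4)) == 0;
    return info;
}
```

For example, 0xFFFFFFFD means the fault interrupted task code running on the PSP, while 0xFFFFFFF1 means it interrupted another handler (a nested exception on the MSP).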

What the SP Register Tells You:

The SP (Stack Pointer) register shows which stack was in use when the fault occurred:

SP in 70xxxxxx range → Exception occurred during normal task code (using PSP)
SP in 20xxxxxx range → Exception occurred while already in an ISR or exception handler (using MSP – a nested exception)

In our example, SP[R13]=70048F88 is in the 70xxxxxx range, confirming the crash happened during normal task execution. The smart trap even labels it as “Process Stack Dump” rather than “Main Stack Dump”.

Understanding ARM Registers

The ARM Cortex-M7 has 16 core registers, R0 through R15, all visible in the smart trap output:

R0-R3: Function arguments and return values
R4-R11: General-purpose registers that must be preserved across function calls
R12 (IP): Intra-procedure call scratch register
R13 (SP): Stack pointer – points to the current top of the stack
R14 (LR): Link register – holds the return address when a function is called
R15 (PC): Program counter – points to the currently executing instruction

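In our crash, notice that R4 contains 4C494146, which reads as ASCII “LIAF”: “FAIL” from our overflow string, reversed by little-endian byte order. It sits just before “INGP” in the string, which suggests the function’s epilogue restored both R4 and the return address from adjacent, overwritten stack slots.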
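Spotting text in registers can be automated with a simple heuristic: flag any value whose four bytes are all printable ASCII. The helper names below are hypothetical, and the check can produce false positives, since a legitimate value can happen to consist of printable bytes, so treat a hit as a lead rather than proof.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// True if every byte of the word is a printable ASCII character.
bool looksLikeAscii(uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        uint32_t byte = (word >> (8 * i)) & 0xFFu;
        if (byte < 0x20 || byte > 0x7E) return false;
    }
    return true;
}

struct NamedRegister {
    const char* name;
    uint32_t value;
};

// Return the names of registers whose values are entirely printable ASCII.
std::vector<std::string> suspiciousRegisters(const std::vector<NamedRegister>& regs)
{
    std::vector<std::string> hits;
    for (const NamedRegister& r : regs) {
        if (looksLikeAscii(r.value)) hits.push_back(r.name);
    }
    return hits;
}
```

Fed the register values from this trap, it flags only R4 and PC, the two values loaded from the overwritten stack slots.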

The ARM Exception Frame

When a fault occurs on ARM Cortex-M, the processor automatically pushes certain registers onto the stack in a specific order. Understanding this “exception frame” is crucial for debugging stack corruption issues.

The Process Stack Dump shows the actual stack contents at the time of the crash:

-------------------Process Stack Dump----------------------------
70048F68: 7001414C 70048FA4 0000006C 00000000 00000001 70006E33 50474E48 61000000
70048F88: 2F485441 74736554 2F676E69 7265764F 776F6C66 74786554 6D74682E AAAAAA6C

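The hardware pushes this eight-word frame just below the task’s stack pointer, so it starts at 70048F68, 0x20 bytes below SP[R13]=70048F88. The layout, as byte offsets from 70048F68, is: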

00-0C: R0-R3 (function arguments)
10: R12 (IP, intra-procedure-call scratch register)
14: LR (link register value at the time of the fault)
18: PC (faulted program counter) – the corrupted return address appears here
1C: XPSR (status register)

Looking at offset 0x18 (the 7th word), we can see 50474E48 – our corrupted return address.
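To pull the frame out of a dump programmatically, a minimal sketch is to copy the eight words into a struct in push order. It assumes a basic frame without floating-point state (an exception taken with active floating-point context pushes a larger, extended frame) and uses the ARMv7-M rule that bit 9 of the stacked xPSR marks an extra alignment word above the frame. The names are ours, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Basic ARMv7-M exception frame (no floating-point state), lowest address first.
struct ExceptionFrame {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

// Fill a frame from consecutive stack dump words starting at the frame base.
bool parseExceptionFrame(const uint32_t* words, size_t count, ExceptionFrame& out)
{
    if (count < 8) return false; // a basic frame is 8 words (0x20 bytes)
    out = ExceptionFrame{words[0], words[1], words[2], words[3],
                         words[4], words[5], words[6], words[7]};
    return true;
}

// Recover the stack pointer in use before the exception. Bit 9 of the
// stacked xPSR is set when the core inserted one padding word to keep
// the frame 8-byte aligned.
uint32_t spBeforeException(uint32_t frameBase, uint32_t stackedXpsr)
{
    uint32_t sp = frameBase + 0x20;
    if (stackedXpsr & (1u << 9)) sp += 4;
    return sp;
}
```

With the first dump line from this crash, the parsed pc is 50474E48, and spBeforeException(0x70048F68, 0x61000000) gives back 70048F88, the SP[R13] value in the register list.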

Debugging When addr2line Won’t Help

Since the Faulted PC contains garbage (ASCII text instead of a valid code address), running addr2line on it won’t provide any useful information. In these cases, you need to look at other registers and call stacks:

1. Check the LR (Link Register): In our example, LR contains 70006E33, which is a valid code address. It holds the return address of the most recent function call made before the crash, so running addr2line on it shows roughly where execution was shortly before the fault. It may not point at the faulty line, but it can be a clue.

2. Examine the call stack: The trap output shows call stacks for all tasks. These can provide clues about the execution path leading to the crash.

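3. Analyze the stack dump: Look for patterns in the hex dump. Converting values to ASCII, for example by pasting them into a hex editor, can reveal strings or data that corrupted the stack. Remember that each 32-bit word’s bytes appear reversed because the processor is little-endian.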

4. Check register values: Unexpected values in registers (especially readable ASCII like we saw in R4) indicate memory corruption. The more experience you get with debugging these issues, the more patterns you’ll see in terms of what’s expected or weird.
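Step 3 is easy to script as well. This standard C++ sketch (dumpLineToAscii is a hypothetical name, not a NetBurner tool) converts one “ADDRESS: WORD WORD ...” line of a stack dump into the text its bytes spell, reversing each word’s bytes for you and printing non-printable bytes as dots.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Convert one "ADDRESS: WORD WORD ..." line of a smart trap stack dump into
// the ASCII text those bytes spell in memory (little-endian byte order).
// Non-printable bytes are shown as '.'.
std::string dumpLineToAscii(const std::string& line)
{
    std::istringstream words(line.substr(line.find(':') + 1));
    std::string token;
    std::string text;
    while (words >> token) {
        uint32_t word = static_cast<uint32_t>(std::stoul(token, nullptr, 16));
        for (int i = 0; i < 4; i++) {
            unsigned byte = (word >> (8 * i)) & 0xFFu;
            text += (byte >= 0x20 && byte <= 0x7E) ? static_cast<char>(byte) : '.';
        }
    }
    return text;
}
```

Fed the 70048F88 line from the dump above, it returns "ATH/Testing/OverflowText.html...": the tail of our overflow string followed by three 0xAA bytes, most likely from the 0xAA fill of taskBuffer in the calling task.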

The Fix

The root cause of the crash in our example is unbounded string copying. The fix is to use safe string functions and correct sizes:

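// buffer must be a char array (not a pointer) so that sizeof gives its capacity
strncpy(buffer, source, sizeof(buffer) - 1);
buffer[sizeof(buffer) - 1] = '\0'; // strncpy does not terminate when source is too long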

You can also check sizes before copying and handle unexpected sizes appropriately. For example, copying only the first 31 characters of a 69-character URL into a 32-byte buffer won’t crash the device, but the program probably won’t behave as intended.
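One way to make that size check explicit is a helper that never overflows and tells the caller when it had to truncate. This is a minimal sketch in standard C++ (copyChecked is a hypothetical name, not a NetBurner API); it works on plain char buffers, so the volatile buffers in the example would need to become ordinary arrays to use it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copy src into dst (capacity dstSize bytes), always NUL-terminating.
// Returns true only if the whole string fit; false means it was truncated
// (or dst had no room at all), so the caller can decide how to respond.
bool copyChecked(char* dst, size_t dstSize, const char* src)
{
    if (dstSize == 0) return false;
    size_t len = std::strlen(src);
    if (len >= dstSize) {
        std::memcpy(dst, src, dstSize - 1); // keep what fits
        dst[dstSize - 1] = '\0';
        return false;
    }
    std::memcpy(dst, src, len + 1); // includes the terminator
    return true;
}
```

With a 32-byte buffer and the 69-character URL from the example, copyChecked stores the first 31 characters plus a terminator and returns false, so the caller can log the problem or reject the input instead of silently continuing.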

Conclusion

Understanding ARM registers and the exception frame layout is essential for debugging complex crashes on NetBurner devices. When the Faulted PC is corrupted, you need to be a detective: examine all the registers, investigate what various hex values actually mean and why, and trace through the exception frame to understand what went wrong.

The smart trap feature provides as much information as it can, but it’s up to you to know where to look and how to interpret it. With practice, these debugging skills will help you track down elusive crashes. For harder cases, you may need to attach the GDB debugger or disassemble your program for more clues.

As always, we love feedback. If you have any comments or thoughts, please feel free to drop them in the comments section below. Alternatively, you can mail us directly at [email protected].
