C++ Exceptions, Unwind Tables, and ‘relocation truncated to fit: R_ARM_PREL31’

November 22, 2023
By NetBurner

Most embedded developers who’ve worked with a Cortex-M4 or newer part with external memory will likely have encountered a linker error at some point telling them ‘relocation truncated to fit: R_ARM_PREL31 against .ARM.extab…’. But what on earth does this cryptic message mean? Is there a solution to it? Why does it only appear in the link stage?

The standard advice is for the user to enable the GCC flag -fno-exceptions, but that doesn’t help if you actually want to use C++ exceptions. So, if that advice is a dead end, how are we to fix the underlying issue? In this post, we’re going to be condensing several rather deep topics to their relevant bits and showing when you can and can’t truly resolve this error.

C++ Exceptions: Throwing, Catching, and Unwinding the universe.

At its core, C++ exceptions are what they sound like: a language mechanism for creating alternate paths for handling ‘exceptional’ conditions. In other words, they let you write code without placing innumerable validation checks everywhere; you simply assume that things are valid until they aren’t, and the majority of the time you don’t pay any cost until then. The problem is that when you finally hit the exceptional condition, you need to unwind all the calls made up to that point, until you’re back where you should have taken a different path through the code (i.e. the catch block). This is where we first encounter what’s known as the Unwind Table.
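To make the unwinding step concrete, here is a minimal sketch in standard C++. The names Guard, inner, outer, and run_demo are purely illustrative and not taken from any real codebase. It records the order in which local objects are destroyed when an exception passes through two call frames on its way to the catch block.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Records the order of events so the unwinding sequence can be inspected.
static std::vector<std::string> events;

// A local object whose destructor must run if an exception passes through.
struct Guard {
    std::string name;
    explicit Guard(std::string n) : name(std::move(n)) {}
    ~Guard() { events.push_back("destroy " + name); }
};

void inner() {
    Guard g("inner");
    throw std::runtime_error("sensor offline");  // the 'exceptional' path
}

void outer() {
    Guard g("outer");
    inner();                          // no error check needed at the call site
    events.push_back("not reached");  // skipped when inner() throws
}

// Runs the call chain and returns the recorded events.
std::vector<std::string> run_demo() {
    events.clear();
    try {
        outer();
    } catch (const std::runtime_error& e) {
        events.push_back(std::string("caught ") + e.what());
    }
    return events;
}
```

Calling run_demo() should show both Guard destructors running, innermost first, before the handler body executes. That walk back through inner() and outer() is exactly the work the unwinder has to look up instructions for.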

As you might imagine, the Unwind Table is a table of instructions that tells a language implementation’s exception unwinding mechanism how to unwind the current state of the world, so that it can undo all the changes it has made until it finally reaches a block that will handle the current exception. Generally speaking, the entry format of the Unwind Table is specified as part of an Exception Handling Application Binary Interface (ABI), so that (in theory) it is possible to unwind exceptions regardless of what language was used for different parts of an executable.

Within the Unwind Table, there are many entries, corresponding to specific scopes of execution (function, if/else, try block, etc.), which tell the language runtime how to unwind from that point in execution. To minimize both the processing time it takes to unwind and the space required to encode them, the entries come in multiple forms, from maximally compact to maximally expressive, and use a virtual machine model with a compact instruction set that is highly focused on register and memory assignments. Which brings us to…

The ARM Exception Handling ABI

In ARM’s case, the Exception Table has two main flavors: the general form and the compact form. To start, all table entries are multiples of the word size, i.e. 32 bits. This means that in order to flag which form an entry takes, we need to sacrifice at least one bit to mark the entry format, so for that, we’ll take the top bit of the first word of the entry. This leaves us with 31 bits remaining. Now, how might we indicate the address of the unwind routine to use in the general case, if we’ve already used one of the bits in the first word of the entry? How about an offset from the entry’s address to the unwind routine?

In other words, we can create a Pointer Relative offset to a new memory location. And if we need to refer to a lower address? Let’s make it Signed. So that’s what we do: we create a Pointer Relative, 31-bit, signed offset value (a.k.a. R_ARM_PREL31), calculated as the difference between the address of the offset and the address that it points to, in this case the ‘personality routine’ used to unwind our execution context. And here we find the core of our issue: a signed 31-bit offset can only reach about 0x4000_0000 bytes (1 GiB) in either direction, which is not sufficient to cover the entire memory space of the Cortex-M series of processors.
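The encoding just described can be sketched in a few lines of C++. This is an illustration under stated assumptions rather than anything lifted from a toolchain: the function names are made up for this post, ‘place’ means the address of the table word itself, and addresses are treated as plain 32-bit integers.

```cpp
#include <cstdint>

// Top bit of the first word of an entry selects the form:
// set = compact form, clear = general form carrying a PREL31 offset.
constexpr bool is_compact_form(std::uint32_t word) {
    return (word & 0x80000000u) != 0;
}

// True when the distance from 'place' to 'target' fits in 31 signed bits,
// i.e. lies in the range -2^30 .. 2^30 - 1.
constexpr bool fits_prel31(std::uint32_t place, std::uint32_t target) {
    return std::int64_t(target) - std::int64_t(place) >= -0x40000000LL &&
           std::int64_t(target) - std::int64_t(place) <= 0x3FFFFFFFLL;
}

// Encode the offset into bits [30:0]. Only meaningful when fits_prel31()
// holds; otherwise the value is silently truncated, which is the situation
// the linker refuses to accept.
constexpr std::uint32_t encode_prel31(std::uint32_t place, std::uint32_t target) {
    return (target - place) & 0x7FFFFFFFu;
}

// Decode: sign-extend bit 30 into bit 31, then add the address of the word.
constexpr std::uint32_t decode_prel31(std::uint32_t place, std::uint32_t word) {
    return place + ((word & 0x40000000u) ? (word | 0x80000000u)
                                         : (word & 0x7FFFFFFFu));
}
```

For example, a word at 0x2000_1000 that refers back to 0x2000_0000 encodes and decodes cleanly, while fits_prel31(0x0000_1000, 0x6000_0000) is false: that distance simply has no 31-bit representation.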

So, why do these 31 bits of offset cause such a headache if the problem doesn’t crop up all the time? For that, we need to talk about hardware Memory Maps.

The Cortex-M Memory Map

Since each Cortex-M core has a slightly different flavor of behavior regarding the default map, we’ll be discussing the Cortex-M7, as that is what I am most familiar with, and where folks seem to be encountering this problem the most.

ARM defines the default Cortex-M7 Memory Map as follows (note the Normal Memory locations):

0x0000_0000-0x1FFF_FFFF: Code, Normal Memory, Typically ROM, Flash, or ITCM RAM
0x2000_0000-0x3FFF_FFFF: SRAM, Normal Memory, SRAM region, usually used for On-Chip SRAM
0x4000_0000-0x5FFF_FFFF: Peripheral, Device Memory, Memory space for On-Chip, Memory Mapped peripherals
0x6000_0000-0x7FFF_FFFF: RAM, Normal Memory, Memory with Write-Back, Write-Allocate cache attribute
0x8000_0000-0x9FFF_FFFF: RAM, Normal Memory, Memory with Write-Through cache attribute
0xA000_0000-0xBFFF_FFFF: Device, Shareable Device Memory, Shared Device Space (think shared memory for coprocessors, GPUs, Accelerators, etc.)
0xC000_0000-0xDFFF_FFFF: Device, Non-Shareable Device Memory, Non-Shared Device Space (no idea what this would be…)
0xE000_0000-0xFFFF_FFFF: System, Strongly-Ordered or Device Memory (varied by block), Contains the Private Peripheral Bus with all Cortex-M standard peripherals

Now, when you think about the historic usage of Cortex-M cores, they have mostly been used for Microcontrollers with built-in Flash memory, or more recently as a heterogeneous Real Time controller within a larger SoC, accessing only a small amount of internal SRAM. In both of these cases, the entirety of the program instruction space resides within the 0x0000_0000 to 0x3FFF_FFFF region of memory, which is just perfect for our 31 bits of signed offset. Additionally, if we want to use only the External RAM for program code, we find that it all fits nicely in the 0x6000_0000-0x9FFF_FFFF space, which again is just perfect for our 31-bit signed offset.

The trouble arises at the exact moment we want to use Off-Chip, or External, RAM alongside either Flash/ITCM or internal SRAM for instruction memory. Given that we have to put the unwinding routines somewhere, we can choose to place them in the low order memory of Flash/ITCM or SRAM, or in the high order memory of External RAM. If we place them in low order memory, the 31-bit offset is insufficient to reach them from unwind table entries that live in high order memory, and vice versa. This is the root cause of the relocation truncated to fit: R_ARM_PREL31 against '.ARM.extab...' message and the resultant linkage failure. So, now that we know what the error means, what can we do about it?

Symbols: The Ouroboros and Hydra Problem

For starters, we could try to fix this problem by just duplicating our unwind routines, placing one copy in low memory, and one in high memory. However, as we will see shortly, that creates a whole host of additional linkage issues, the primary of which is that of a priori knowledge.

To begin, how might we go about duplicating our code in the resultant executable? We could try to tell the linker to just put specific functions in multiple places in memory. Unfortunately, this is antithetical to the behavior of most linkers, and even if we could force it to create multiple copies, odds are we wouldn’t be able to convince the linker to use the closer copy when computing the necessary offset, leading to the same issue as before. Still, there has to be a way to achieve our goal.

The short answer is that there are three options for dealing with our problem: one that’s a lot easier, but incompatible with certain hardware; one that’s moderately difficult, more complicated in the device code, and a bit more expensive at runtime; and one that involves modifying compilers and linkers, which makes it political as well as very, very hard. We’re going to start with the easy one and work our way up from there.

Linker Magic and Vendor Planning

The simple solution is highly dependent on two things: manipulating a linker to create multiple Unwind Tables and running on hardware that has generally usable External Memory mapped to 0x6000_0000. If your platform does not allow you to place program data or instructions at approximately 0x6000_0000, this solution will not work, and you will need to see the later ones.

Now, what is this solution and how does it work? First, you must define a section in your linker script that will be used to store the low order memory unwind table, and a second section that will be used to store the high order memory unwind table. Specifically, if the unwind routines will live in high order memory, the low order table must live above 0x2000_0000; otherwise the truncation error will not be resolved. However, if the unwind routines are to live in low order memory, the routines themselves must instead live above 0x2000_0000, otherwise you will still fail to link.

Additionally, if the unwind routines are placed in high order memory, they must be placed at the start of the 0x6000_0000 block, such that any routine directly referenced in the low order unwind table is within 0x3FFF_FFFF of the table entry that references it. Conversely, if the unwind routines are in low order memory, the unwind routines and the high order memory unwind table must be placed such that all directly referenced routines are within 0x4000_0000 of the end of the high order unwind table in the 0x6000_0000 block.
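These placement rules are just arithmetic on the 31-bit signed range, so they can be checked at compile time. The sketch below uses hypothetical addresses chosen only to illustrate each rule; none of them come from a real linker script, and your own section addresses will differ.

```cpp
#include <cstdint>

// A PREL31 field holds a signed 31-bit byte offset: -2^30 .. 2^30 - 1.
constexpr bool reachable(std::uint32_t entry, std::uint32_t routine) {
    return std::int64_t(routine) - std::int64_t(entry) >= -0x40000000LL &&
           std::int64_t(routine) - std::int64_t(entry) <= 0x3FFFFFFFLL;
}

// Hypothetical addresses, picked only to illustrate the placement rules.
constexpr std::uint32_t kFlashTable   = 0x00010000u;  // table left in Flash/ITCM
constexpr std::uint32_t kSramTable    = 0x20010000u;  // low order table above 0x2000_0000
constexpr std::uint32_t kExtTable     = 0x60100000u;  // high order table in external RAM
constexpr std::uint32_t kExtRoutine   = 0x60000000u;  // routine at the start of external RAM
constexpr std::uint32_t kFlashRoutine = 0x00020000u;  // routine in Flash/ITCM
constexpr std::uint32_t kSramRoutine  = 0x20200000u;  // routine in on-chip SRAM

// Routines in high order memory: the low order table must sit above 0x2000_0000.
static_assert(!reachable(kFlashTable, kExtRoutine), "Flash table cannot reach 0x6000_0000");
static_assert(reachable(kSramTable, kExtRoutine), "SRAM table reaches the start of external RAM");

// Routines in low order memory: the routines themselves must sit above 0x2000_0000.
static_assert(!reachable(kExtTable, kFlashRoutine), "external table cannot reach Flash");
static_assert(reachable(kExtTable, kSramRoutine), "external table reaches on-chip SRAM");
```

Plugging in the addresses from your own map file is a quick way to see whether a given layout has any chance of linking before you start rearranging sections.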

Lastly, you must be able to tell the C++ implementation library which unwind table to use for a given instruction context. In the case of GCC using libunwind, that means defining the function unsigned __gnu_Unwind_Find_exidx(unsigned, int *), which takes an instruction address and a pointer for a second return value: you return the start of the unwind table to use, and store the table’s length at the pointed-to address.
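A rough sketch of the selection logic behind that hook might look like the following. Everything here apart from the hook’s name is an assumption made for illustration: the ExidxEntry and ExidxTable types, the kHighMemoryBase boundary, and the select_exidx helper are hypothetical, and the sketch assumes the ‘length’ the unwinder expects is a count of two-word index entries rather than a byte size, which is worth verifying against your toolchain’s unwinder source.

```cpp
#include <cstdint>

// One index-table entry is two 32-bit words.
struct ExidxEntry {
    std::uint32_t fn_offset;  // PREL31 offset to the function this entry covers
    std::uint32_t content;    // inline unwind data or offset to further data
};

// Bounds of one unwind index table. On a target these would come from
// symbols defined in your linker script instead of being passed around.
struct ExidxTable {
    const ExidxEntry* start;
    const ExidxEntry* end;
};

// Hypothetical boundary: instruction addresses at or above this use the high
// order table. 0x60000000 matches the external RAM block discussed here.
constexpr std::uint32_t kHighMemoryBase = 0x60000000u;

// Choose the table by instruction address and report its entry count.
inline const ExidxEntry* select_exidx(std::uint32_t pc, const ExidxTable& low,
                                      const ExidxTable& high, int* nrec) {
    const ExidxTable& table = (pc >= kHighMemoryBase) ? high : low;
    *nrec = static_cast<int>(table.end - table.start);
    return table.start;
}

// On the target, the hook would forward to the helper, roughly:
//   extern "C" unsigned __gnu_Unwind_Find_exidx(unsigned pc, int* nrec) {
//       return reinterpret_cast<unsigned>(
//           select_exidx(pc, kLowTable, kHighTable, nrec));
//   }
// where kLowTable and kHighTable are built from your linker-script symbols.
```

Keeping the decision in a small pure function like this makes it easy to test on a host machine, since the real hook only adds the pointer-to-integer conversion.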

And that’s it. Magically, the link error goes away: exceptions can trigger across internal memory back out into external memory, and vice versa, all day long.

Doomed by the Hardware

So what if for whatever reason, we can’t use sufficiently close memory in the 0x6000_0000 region for the easy solution? What can be done without major surgery to the linker software itself? Well, we can cheat and lie to the linker about what the actual memory map of the hardware is. That is, we can craft our linker script to include a fake region that lives at the top of the low order RAM region, but far above the end of the actual available internal SRAM, at something like 0x3F00_0000.

The good news is that this will allow us to at least link. The bad news is that there’s no way to actually use this block in the actual running image. For that, we need to reserve some additional space via the linker script, perform some manipulations of the final executable, and finally implement at least one trampoline function.

First, we will need to use a tool (in my case, objcopy) to change the destination address for the low order table from our region of fake memory to somewhere in valid low order memory, ideally RAM (you’ll see why in a moment). Next, we need to add a trampoline/veneer function in low order memory for every unwind routine that we need to call. Lucky for us, GCC/libstdc++ only uses one directly referenced routine: __gxx_personality_v0. So, we can create an assembly trampoline function that looks like:

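  .syntax unified
  .thumb
  .thumb_func                       @ mark the following label as a Thumb function
__gxx_personality_v0_veneer:
  ldr pc, =__gxx_personality_v0     @ jump to the real routine via a literal load
  .ltorg                            @ emit the literal pool right after the veneer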

Now, the Fun part begins. Recall that the values in our unwind tables use a Pointer Relative offset, that depends on the address containing the offset itself. Well, we moved where these tables are located, but haven’t actually modified the tables themselves. Predictably, the resulting addresses they refer to are now wrong! Which means we’ve got more work to do. To correct our table, we need to go through every PREL31 encoded value and calculate what the resulting address would be as if it were still at the original address, before we moved it, then calculate what the PREL31 encoded value should be for that result given where the value is now located, and then store that new value.

But what about those values that reference the unwind routines in high order memory? The new offset is in excess of what can be encoded as PREL31! Oh no, we’ve hit our own truncated to fit issue. Well, remember when we set up our trampoline function? We simply compare all the values that could reference a high order address against the address of the unwind routines. If the original calculated value matches a high order routine, we instead use the corresponding trampoline function as the ‘real’ address to refer to, and calculate the new offset value against that address.
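The per-word fix-up described in the last two paragraphs can be sketched as follows. This is a simplified illustration, not a complete table walker: the function and parameter names are invented for this post, the caller is assumed to pass only words that really are PREL31 values, and high_routine and veneer are assumed to be in the same form the table uses (with or without the Thumb bit).

```cpp
#include <cstdint>

// Sign-extend the low 31 bits of a table word into a full 32-bit offset.
constexpr std::uint32_t prel31_offset(std::uint32_t word) {
    return (word & 0x40000000u) ? (word | 0x80000000u) : (word & 0x7FFFFFFFu);
}

// Recompute one PREL31 word after its table moved from old_place to
// new_place. If the word pointed at the high order routine, retarget it to
// the low order veneer so that the new offset still fits.
constexpr std::uint32_t rebase_prel31(std::uint32_t word,
                                      std::uint32_t old_place,
                                      std::uint32_t new_place,
                                      std::uint32_t high_routine,
                                      std::uint32_t veneer) {
    std::uint32_t target = old_place + prel31_offset(word);  // address as originally linked
    if (target == high_routine) target = veneer;             // swap in the trampoline
    std::uint32_t offset = target - new_place;               // modulo 2^32 arithmetic
    return (word & 0x80000000u) | (offset & 0x7FFFFFFFu);    // bit 31 is not part of the offset
}
```

As a worked case, take a word linked at the fake address 0x3F00_0000 that points at a routine at 0x6000_0000, so it holds 0x2100_0000. After moving the table to 0x2000_4000 with a veneer at 0x2000_0100, the rebased word is 0x7FFF_C100, which decodes from the new location back to the veneer.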

And that’s it! The rest of the owl is how to implement this approach, as there are a couple ways to go about it. The easier way is to use a simple objcopy to move the table destination in the executable, but leave the values as is, and then at startup, quickly recalculate all the offsets in the table. The harder way is to use a different (probably custom) utility to extract the table data, modify it in the manner presented, write the new table back, and move the destination address along the way.

Politics: Software’s Hardest Problem

Given the complex mess the previous solution presents in terms of toolchain and build system complexities, you might ask: isn’t there a better way? And you’d be correct! The correct way for this to work is for the linker to be aware that the offset it is truncating is a reference to an unwind routine, and that it should create a trampoline, placed in a specific section, that it can refer to instead. This way, we can do away with the whole fake memory space shenanigans and not worry about how the vendor configured the platform, as it just works. The problem is that adding this feature is a much more difficult task than the kludge described just prior, both in terms of the pure software aspect and the political nature that such changes represent within the context of a large project (such as GNU LD).
