Understanding LLD Maps: Where Your Code Actually Lives
You wrote 200 lines of Rust for a Kubernetes monitoring agent. Clean code, minimal dependencies. cargo build --release produces a 47MB binary.
Or worse: you spent days writing an eBPF program to monitor GPU usage. The kernel verifier rejects it with "program too complex" and you have no idea which part is too large.
Without understanding how compilation and linking work, you're stuck guessing. You try removing random dependencies or simplifying code without knowing what actually matters. This is where linker maps become essential.
A linker map (LLD map when using LLVM's linker) is a detailed blueprint showing exactly where every function, variable, and dependency ended up in your binary. It answers questions like:
- Why is my binary 47MB when I wrote so little code?
- Which dependency is pulling in megabytes of unused code?
- How large is each function in my eBPF program?
- What sections exist in my binary and how much space do they take?
This isn't academic. When you're building systems software—especially eBPF programs that run in the kernel with strict size limits, or embedded systems with tight memory constraints—understanding where your code lives stops being optional.
How Your Code Becomes an Executable (The compilation pipeline)
Most developers have a vague mental model: "the compiler turns code into a program." The reality has distinct stages, and understanding them matters.
Source Code (.rs, .c files)
↓
COMPILER (rustc, clang)
"Translate to machine code"
↓
Object Files (.o, .rlib)
"Machine code chunks with unresolved references"
↓
LINKER (lld, ld)
"Connect all the chunks, assign addresses"
↓
Executable Binary
"Ready to run"
The compiler doesn't produce a runnable program. It produces object files—chunks of machine code where function calls and variable references are still placeholders marked "address unknown, figure this out later."
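You can stop the pipeline at this middle stage yourself and look at the result; a quick sketch (output filenames are arbitrary):
# Rust: translate to machine code but skip the linker
$ rustc --emit=obj main.rs -o main.o
# C: same idea, -c means "compile only, don't link"
$ clang -c main.c -o main.o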
Object Files: Pre-assembled Building Blocks
Take this simple Rust code:
// main.rs
extern "C" {
    // Defined in another object file (nvidia.o); the compiler only records
    // an unresolved reference to it here.
    fn query_nvidia_driver() -> u64;
}

fn main() {
    let metrics = calculate_gpu_metrics();
    println!("GPU usage: {}%", metrics);
}

fn calculate_gpu_metrics() -> u64 {
    // This calls a function from another file
    unsafe { query_nvidia_driver() }
}
After compilation, main.o contains machine code, but with unresolved references:
Machine Code:
  main:
    [x86-64 instructions]
    call +0x50        ← calculate_gpu_metrics (same file, relative offset already known)
    call <address?>   ← println! (where is it?)
  calculate_gpu_metrics:
    [x86-64 instructions]
    call <address?>   ← query_nvidia_driver (where is it?)
Symbol Table:
  Defined:
    main                   → offset 0x0000
    calculate_gpu_metrics  → offset 0x0050
  Undefined (need to find):
    query_nvidia_driver
    println!
    [dozens of std functions]
The object file says "call this function" but doesn't know where that function lives. That's the linker's job.
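You can see the split between defined and undefined symbols directly with nm (real Rust symbols will show up mangled, but the T/U markers are the same):
$ nm main.o        # value, type, name for every symbol; T = defined in this object's .text
$ nm -u main.o     # only the U (undefined) symbols the linker still has to resolve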
What the Linker Actually Does
The linker takes all your object files and libraries, resolves every undefined reference, assigns final memory addresses, and produces an executable:
Linker Input:
├─ main.o
│ Defines: main, calculate_gpu_metrics
│ Needs: query_nvidia_driver, println!
│
├─ nvidia.o
│ Defines: query_nvidia_driver
│ Needs: cuda_driver_api_init
│
├─ libstd.rlib (Rust standard library)
│ Defines: println!, Vec::new, String::from, etc.
│ Needs: malloc, free, memcpy (from libc)
│
└─ libc.a
Defines: malloc, free, memcpy, open, read, etc.
Linker Process:
1. Build complete symbol table (all definitions)
2. Resolve undefined references
3. Assign final addresses
4. Remove unused code (dead code elimination)
5. Merge duplicate functions
6. Organize into sections (.text, .data, .bss)
7. Write executable
Output:
└─ orb8 (executable)
├─ main at 0x401000
├─ calculate_gpu_metrics at 0x401050
├─ query_nvidia_driver at 0x402000
├─ println! at 0x450000
└─ [thousands more symbols with addresses]
The linker also eliminates dead code. If you depend on a library but only use 3 of its 500 functions, the linker (with the right flags) can discard the unused 497. This is why understanding linking matters for binary size.
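For Rust, most of this is switched on through the release profile rather than hand-written linker flags. A commonly used size-focused profile, as a sketch (exact savings depend on your dependency graph and Cargo version):
# Cargo.toml
[profile.release]
lto = true          # link-time optimization: the whole program is visible, so more unused code gets dropped
codegen-units = 1   # fewer, larger codegen units give LTO more to work with
opt-level = "z"     # optimize for size rather than speed
strip = "symbols"   # drop symbol tables from the shipped binary (needs a reasonably recent Cargo)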
Understanding LLD Maps (Your binary's blueprint)
When you tell the linker to generate a map file, it writes a detailed report showing exactly what it did. Here's a simplified view of what an LLD map contains (real maps add a couple more columns, such as load address and alignment, but the idea is identical):
Address Size File Symbol
0x401000 0x4a20 orb8.o main
0x405a20 0x1850 orb8.o calculate_gpu_metrics
0x4072b0 0x920 orb8.o query_nvidia_driver
0x407bd0 0x350 orb8.o parse_command_line_args
0x408000 0x15000 libstd.rlib std::io::stdio::print
0x41d000 0x8000 libk8s_openapi.rlib k8s_openapi::api::core::v1::Pod
0x425000 0x2000 libc.a malloc
...
Memory Map:
Section .text (executable code):
Start: 0x401000
Size: 0x89000 (548 KB)
Section .rodata (read-only data - strings, constants):
Start: 0x48a000
Size: 0x12000 (72 KB)
Section .data (initialized global variables):
Start: 0x49d000
Size: 0x3000 (12 KB)
Section .bss (uninitialized globals - zeroed at startup):
Start: 0x4a0000
Size: 0x1000 (4 KB)
Total mapped size: 636 KB (the 4 KB of .bss is reserved at load time and takes no space in the file)
This tells you:
- Exactly which file contributed which symbol
- How much space each function takes
- Where everything lives in memory
- Which sections are large and why
Generating an LLD Map
For Rust:
# Add to .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-Wl,-Map=output.map"]
# Or invoke directly
cargo rustc --release -- -C link-arg=-Wl,-Map=output.map
# Find the map at: target/release/output.map
For C/C++ with Clang:
clang -O2 main.c -o program -Wl,-Map=output.map
The map file is plain text. You can grep for symbols, sort by size, or analyze which dependencies contribute most to binary size.
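If grep pipelines get unwieldy, a few lines of Rust can do the aggregation for you. A minimal sketch that assumes the simplified Address / Size / File / Symbol layout shown above (real LLD maps have extra columns, so adjust the indices for your linker version):
use std::collections::HashMap;
use std::{env, fs};

fn main() {
    // Usage: map_sizes <linker-map-file>
    let path = env::args().nth(1).expect("usage: map_sizes <linker-map-file>");
    let text = fs::read_to_string(&path).expect("failed to read map file");

    let mut totals: HashMap<String, u64> = HashMap::new();
    for line in text.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Expect: <address> <size> <file> <symbol...>; skip anything else.
        if cols.len() < 4 || !cols[0].starts_with("0x") {
            continue;
        }
        let size = match u64::from_str_radix(cols[1].trim_start_matches("0x"), 16) {
            Ok(n) => n,
            Err(_) => continue,
        };
        *totals.entry(cols[2].to_string()).or_insert(0) += size;
    }

    // Print contributors from largest to smallest.
    let mut ranked: Vec<_> = totals.into_iter().collect();
    ranked.sort_by_key(|&(_, bytes)| std::cmp::Reverse(bytes));
    for (file, bytes) in ranked.iter().take(20) {
        println!("{:>10} KB  {}", bytes / 1024, file);
    }
}
Point it at the map file and it prints the twenty biggest contributors, which is the same per-library view used in the scenarios below.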
Memory Sections: Where Different Data Lives (Why .text, .data, and .bss matter)
Your executable isn't just a blob of bytes. It's organized into sections with different properties and permissions.
Executable File Layout:
┌─────────────────────────────────────┐
│ .text (Code) │ ← All your functions
│ Size: 500 KB │ Read-only, Executable
│ │
│ main() │
│ calculate_gpu_metrics() │
│ thousands of other functions │
├─────────────────────────────────────┤
│ .rodata (Read-only data) │ ← String literals, constants
│ Size: 80 KB │ Read-only, Not executable
│ │
│ "Starting orb8..." │
│ "Error: GPU not found" │
│ [constant arrays, lookup tables] │
├─────────────────────────────────────┤
│ .data (Initialized globals) │ ← Static variables with values
│ Size: 12 KB │ Read-Write, Not executable
│ │
│ static CONFIG: Config = {...}; │
│ static mut COUNTER: u64 = 42; │
├─────────────────────────────────────┤
│ .bss (Uninitialized globals) │ ← Static variables without values
│ Size: 4 KB │ Read-Write, Not executable
│ │ (doesn't take file space!)
│ static mut BUFFER: [u8; 4096] = [0; 4096]; │
└─────────────────────────────────────┘
At runtime, the OS maps these sections to memory with specific permissions:
Runtime Memory Layout:
┌─────────────────────────────────────┐ Address Permissions
│ .text │ 0x400000 r-x (read, execute)
│ → Code lives here │ Cannot modify code
│ → CPU executes from here │
├─────────────────────────────────────┤ 0x490000 r-- (read only)
│ .rodata │ Cannot modify or execute
│ → String literals │
│ → Constant data │
├─────────────────────────────────────┤ 0x4a0000 rw- (read, write)
│ .data │ Can modify, not execute
│ → Initialized globals │
├─────────────────────────────────────┤ 0x4b0000 rw- (read, write)
│ .bss │ Can modify, not execute
│ → Uninitialized globals │ Zeroed at startup
├─────────────────────────────────────┤
│ Heap (grows up →) │ Dynamic allocations
│ → malloc, Vec::new, Box │
│ │
│ Stack (grows down ←) │ Local variables
│ → Function calls, parameters │
└─────────────────────────────────────┘
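On Linux you can watch the OS apply these permissions to your own process. A small sketch that prints the mappings backing the current executable (Linux-only; addresses and paths will differ on your machine):
use std::fs;

fn main() {
    // /proc/self/maps lists every mapping of the current process with its
    // permissions (r = read, w = write, x = execute).
    let maps = fs::read_to_string("/proc/self/maps").expect("Linux-only: /proc/self/maps");
    let exe = std::env::current_exe().expect("current_exe");
    let exe = exe.to_string_lossy();

    for line in maps.lines() {
        // Keep only the mappings that come from our own executable file:
        // you should see r-xp (.text), r--p (.rodata), and rw-p (.data/.bss).
        if line.ends_with(exe.as_ref()) {
            println!("{line}");
        }
    }
}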
Why Sections Matter
.text (code) - Read-only and executable at runtime. Any attempt to modify it triggers a segmentation fault: the OS kills your process. This is a security feature that prevents code injection attacks.
.rodata (read-only data) - String literals like "Error: connection failed" and constant arrays. Marked read-only at runtime. Trying to modify this data crashes your program. Can be shared between multiple processes running the same executable (memory savings).
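If you want to see that crash for yourself, here's a deliberately broken sketch. It commits undefined behavior on purpose just to show what the page permissions do; on a typical Linux build it dies with SIGSEGV:
fn main() {
    // String literals are stored in .rodata, which the OS maps read-only.
    let message: &'static str = "this text lives in .rodata";
    let p = message.as_ptr() as *mut u8;

    // Deliberately invalid: the write targets a page mapped r--, so the
    // kernel answers with SIGSEGV and the process dies with a
    // "segmentation fault" instead of silently patching the constant.
    unsafe { std::ptr::write_volatile(p, b'X') };
}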
.data (initialized globals) - Variables with explicit initial values. Takes space in the binary file because those initial values must be stored somewhere. Loaded into memory and marked read-write at program startup.
.bss (uninitialized globals) - Variables without meaningful initial values (or initialized to zero). This is where things get clever: the linker doesn't store zeros in your binary file. It just records "allocate 4KB of zeroed memory here." The kernel provides zeroed memory at runtime. This is why declaring static mut BUFFER: [u8; 1048576] = [0; 1048576] doesn't increase your binary size by 1MB.
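You can check this yourself with two one-megabyte statics and the size tool; a sketch (static mut is used only to mirror the snippets above, and exact section sizes depend on optimization settings):
use std::hint::black_box;

// Zero-initialized: the linker records "reserve 1 MB in .bss";
// the file on disk does not grow.
static mut SCRATCH: [u8; 1_048_576] = [0; 1_048_576];

// Non-zero initializer: the 1 MB pattern must be stored in .data,
// so the binary on disk grows by roughly 1 MB.
static mut PATTERN: [u8; 1_048_576] = [0xAB; 1_048_576];

fn main() {
    // black_box hides the index from the optimizer so both arrays are really
    // touched at runtime instead of being folded away.
    let i = black_box(0usize);
    let byte = unsafe {
        SCRATCH[i] = PATTERN[i];
        SCRATCH[i]
    };
    println!("first byte: {byte}");
    // Build it, then compare `size target/release/<binary>`: removing PATTERN
    // shrinks .data and the file; removing SCRATCH only shrinks .bss.
}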
Real Debugging: When LLD Maps Matter (Practical scenarios)
Scenario 1: The 47MB Binary Problem
You're building orb8, a Kubernetes monitoring agent. The code is straightforward:
use k8s_openapi::api::core::v1::Pod;
use libbpf_rs::ObjectBuilder;
use prometheus::Registry;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the compiled eBPF object (the kernel verifies it here).
    let ebpf = ObjectBuilder::default().open_file("gpu_monitor.o")?.load()?;
    // Kubernetes API client and the Prometheus metrics registry.
    let k8s = kube::Client::try_default().await?;
    let metrics = Registry::new();
    // monitor_cluster() is the rest of the ~200-line agent.
    monitor_cluster(ebpf, k8s, metrics).await
}
You compile:
$ cargo build --release
$ ls -lh target/release/orb8
-rwxr-xr-x 1 user staff 47M Nov 12 10:30 orb8
47MB for 200 lines of code. Something pulled in the world. What?
Generate the LLD map:
$ cargo rustc --release -- -C link-arg=-Wl,-Map=orb8.map
$ grep "\.rlib\|\.a" orb8.map | \
awk '{print $2, $3}' | \
sort -k1 -rn | \
head -15
24M libk8s_openapi.rlib
12M libhyper.rlib
6M libtokio.rlib
4M libserde_json.rlib
2M libbrotli.rlib
1M libprometheus.rlib
800K liblibbpf_rs.rlib
500K libstd.rlib
...
The culprit: k8s_openapi contributes 24MB. This crate ships generated code for the entire Kubernetes API of the version you select: every resource type, whether you touch it or not. You're only querying Pods and Nodes, but you got DaemonSets, StatefulSets, CRDs, everything.
The fix:
# Before: pulls entire k8s API
[dependencies]
k8s-openapi = { version = "0.20", features = ["v1_28"] }
# After: minimal client, only the Kubernetes version you actually target
[dependencies]
kube = { version = "0.87", default-features = false, features = ["client", "rustls-tls"] }
# The generated types you never reference get dropped at link time
k8s-openapi = { version = "0.20", default-features = false, features = ["v1_28"] }
Rebuild:
$ cargo build --release
$ ls -lh target/release/orb8
-rwxr-xr-x 1 user staff 12M Nov 12 11:15 orb8
Binary size: 47MB → 12MB. Without the LLD map, you'd be guessing which dependency was the problem.
Scenario 2: eBPF Program Too Large
You wrote an eBPF program to monitor GPU memory allocations:
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct gpu_metrics {
    __u64 allocated_bytes;
    __u64 freed_bytes;
    __u64 allocation_count;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10000);
    __type(key, __u32);
    __type(value, struct gpu_metrics);
} gpu_stats SEC(".maps");

SEC("uprobe/libcuda:cuMemAlloc_v2")
int trace_gpu_alloc(struct pt_regs *ctx) {
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    __u64 size = PT_REGS_PARM2(ctx);

    struct gpu_metrics *metrics = bpf_map_lookup_elem(&gpu_stats, &pid);
    if (metrics) {
        metrics->allocated_bytes += size;
        metrics->allocation_count += 1;
        update_histogram(metrics, size);   // External helper function
        check_threshold_alerts(metrics);   // Another helper
        log_to_ring_buffer(metrics);       // More helpers
    } else {
        struct gpu_metrics init = {
            .allocated_bytes = size,
            .freed_bytes = 0,
            .allocation_count = 1
        };
        bpf_map_update_elem(&gpu_stats, &pid, &init, BPF_NOEXIST);
        initialize_histograms(pid);
    }
    return 0;
}
// Plus 5 more helper functions...
You try to load it:
$ sudo bpftool prog load gpu_monitor.o /sys/fs/bpf/gpu_mon
libbpf: prog 'trace_gpu_alloc': BPF program is too large. Processed 1500 insns
Error: failed to load program
The eBPF verifier has complexity limits. Your program exceeded them. Which part is too large?
Check the object file:
$ llvm-objdump -d gpu_monitor.o
gpu_monitor.o: file format elf64-bpf
Disassembly of section uprobe/libcuda:
0000000000000000 <trace_gpu_alloc>:
0: bf 16 00 00 00 00 00 00 r6 = r1
8: 85 00 00 00 0e 00 00 00 call 14
...
1200: 95 00 00 00 00 00 00 00 exit
Section sizes:
trace_gpu_alloc: 950 bytes (~120 instructions)
update_histogram: 640 bytes (~80 instructions)
check_threshold_alerts: 560 bytes (~70 instructions)
log_to_ring_buffer: 480 bytes (~60 instructions)
initialize_histograms: 720 bytes (~90 instructions)
Total: ~420 instructions across multiple functions. Each function call adds overhead (saving state, jumping, restoring state). The verifier sees complexity in the call graph.
The fix: Inline and simplify.
SEC("uprobe/libcuda:cuMemAlloc_v2")
int trace_gpu_alloc(struct pt_regs *ctx) {
__u32 pid = bpf_get_current_pid_tgid() >> 32;
size_t size = PT_REGS_PARM2(ctx);
// Just track allocations, skip histograms and alerts
__u64 *allocated = bpf_map_lookup_elem(&gpu_bytes, &pid);
if (allocated) {
__sync_fetch_and_add(allocated, size);
} else {
bpf_map_update_elem(&gpu_bytes, &pid, &size, BPF_NOEXIST);
}
// Move histogram and alert logic to userspace
return 0;
}
Result: ~45 instructions. Loads successfully. The advanced analytics move to your Rust agent where there are no size limits.
Scenario 3: Embedded System Memory Constraints
You're porting orb8 to run on an ARM-based edge device with 64MB of RAM. The device already runs several services. Your monitoring agent can't exceed 8MB total memory.
Current state:
$ file target/aarch64-unknown-linux-gnu/release/orb8
orb8: ELF 64-bit LSB executable, ARM aarch64
$ size target/aarch64-unknown-linux-gnu/release/orb8
text data bss dec hex filename
8234567  123456   89012  8447035   80e43b orb8
That's 8.2MB in the text segment alone (size lumps code and read-only data together under "text"). Add .data and .bss and you're already over budget before a single runtime allocation.
Analyze the map:
$ cargo rustc --release --target aarch64-unknown-linux-gnu -- \
-C link-arg=-Wl,-Map=orb8-arm.map
$ grep "^\.text" orb8-arm.map -A 100 | grep "0x" | \
awk '{print $3, $2}' | sort -rn | head -20
2.1M libhyper.rlib # HTTP client for Kubernetes API
1.8M libtokio.rlib # Async runtime
1.2M libk8s_openapi.rlib # Kubernetes types
0.9M libserde.rlib # JSON serialization
0.8M libtls.rlib # HTTPS support
...
Decisions based on the map:
- Remove the Kubernetes API client - Run orb8 as a pure eBPF agent and export metrics locally; a separate lightweight forwarder ships them to Kubernetes. Saves: 5MB
- Statically link musl instead of glibc - Smaller C library. Saves: 800KB
- Disable TLS - The edge device talks to a local collector only, no HTTPS needed. Saves: 1.2MB
- Strip debug symbols - Production doesn't need them. Saves: 1.5MB
  cargo build --release --target aarch64-unknown-linux-gnu
  aarch64-linux-gnu-strip target/aarch64-unknown-linux-gnu/release/orb8
Final binary: 2.8MB. It fits comfortably within the 8MB budget, with room left for runtime allocations.
Without the map, you'd be randomly trying default-features = false on dependencies and hoping something helps.
How eBPF Linking Differs (The kernel does its own verification)
eBPF programs don't go through normal linking. They use a special load-time verification process:
Normal Program:
Source → Compile → Link → Executable → CPU runs it
eBPF Program:
Source → Compile → Object → Load to kernel → Verify → JIT compile → Run
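The compile step is ordinary clang, just targeting the BPF instruction set instead of x86 or ARM; a typical invocation looks like this (flags vary by setup, and -g is what produces the BTF type sections shown below):
$ clang -O2 -g -target bpf -c gpu_monitor.c -o gpu_monitor.o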
When you compile an eBPF program, you get an ELF object file with special sections:
$ llvm-objdump -h gpu_monitor.o
Sections:
Idx Name Size Address Type
0 .text 00000000 00000000 CODE
1 uprobe/libcuda 000004b0 00000000 CODE ← Your program
2 .maps 00000020 00000000 DATA ← Map definitions
3 .BTF 00000800 00000000 DATA ← Type information
4 .BTF.ext 00000200 00000000 DATA ← Extended types
Your Rust agent loads this at runtime:
use libbpf_rs::ObjectBuilder;

let mut obj = ObjectBuilder::default()
    .open_file("gpu_monitor.o")?
    .load()?;                       // ← Kernel verifier runs here

// Verifier checks:
// ✓ No infinite loops
// ✓ All memory accesses provably safe
// ✓ Program terminates in finite time
// ✓ Stack usage < 512 bytes
// ✓ No out-of-bounds array access

let prog = obj.prog_mut("trace_gpu_alloc").unwrap();
let link = prog.attach()?;
The kernel's verifier is like a linker, but instead of resolving addresses, it's proving safety properties. It analyzes every possible code path and rejects the program if there's even a theoretical possibility of:
- Infinite loops
- Out-of-bounds memory access
- Stack overflow
- Unsafe pointer dereference
- Calling non-whitelisted kernel functions
If verification passes, the program gets JIT-compiled to native machine code and loaded into the kernel. This happens at runtime, not build time.
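You don't have to take the JIT step on faith; once the program is loaded, bpftool can show you both forms (run as root, with <prog_id> taken from the listing):
$ sudo bpftool prog show                      # list loaded programs and their ids
$ sudo bpftool prog dump xlated id <prog_id>  # the bytecode after verifier rewrites
$ sudo bpftool prog dump jited id <prog_id>   # the native machine code produced by the JIT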
The Point (Why this matters)
Understanding linking and memory layout isn't about memorizing ELF section names. It's about having X-ray vision into your binaries when things go wrong.
When your binary is inexplicably large, you generate a map and see exactly which dependency pulled in megabytes of unused code.
When the eBPF verifier rejects your program, you check the disassembly and see which function is too complex.
When you're debugging a crash on a particular memory access, you check the map and realize you're trying to write to .rodata (a read-only section).
When you're optimizing for embedded systems with tight constraints, the map shows you where every byte went so you can make informed decisions about what to keep and what to cut.
The tools:
# Generate linker map
cargo rustc --release -- -C link-arg=-Wl,-Map=output.map
# Analyze sections
objdump -h binary_name
readelf -S binary_name
# Symbol sizes
nm --size-sort --radix=d binary_name
# Check what's using space
size binary_name
# For eBPF objects
llvm-objdump -d program.o
bpftool prog dump xlated id <prog_id>
Systems programming means understanding the full stack from source code to running process. The linker is the bridge between your code and the executable. The map shows you exactly how that bridge was built.
Most of the time you don't need this knowledge. But when you're building eBPF monitors that run in kernel space with strict limits, or embedded systems with 64MB of RAM, or production services where every megabyte of binary size affects container startup time—this knowledge stops being optional.
The map shows you the territory. Sometimes you need to read it.
Related: Kernel Space and eBPF: The Observability Revolution