Understanding LLD Maps: Where Your Code Actually Lives
You wrote 200 lines of Rust for a Kubernetes monitoring agent. Clean code, minimal dependencies. cargo build --release produces a 47MB binary.
Or worse: you spent days writing an eBPF program to monitor GPU usage. The kernel verifier rejects it with "program too complex" and you have no idea which part is too large.
Without understanding how compilation and linking work, you're stuck guessing. You try removing random dependencies or simplifying code without knowing what actually matters. This is where linker maps become essential.
A linker map (LLD map when using LLVM's linker) is a detailed blueprint showing exactly where every function, variable, and dependency ended up in your binary. It answers questions like:
- Why is my binary 47MB when I wrote so little code?
- Which dependency is pulling in megabytes of unused code?
- How large is each function in my eBPF program?
- What sections exist in my binary and how much space do they take?
This isn't academic. When you're building systems software—especially eBPF programs that run in the kernel with strict size limits, or embedded systems with tight memory constraints—understanding where your code lives stops being optional.
How Your Code Becomes an Executable (The compilation pipeline)
Most developers have a vague mental model: "the compiler turns code into a program." The reality has distinct stages, and understanding them matters.
Source Code (.rs, .c files)
↓
COMPILER (rustc, clang)
"Translate to machine code"
↓
Object Files (.o, .rlib)
"Machine code chunks with unresolved references"
↓
LINKER (lld, ld)
"Connect all the chunks, assign addresses"
↓
Executable Binary
"Ready to run"
The compiler doesn't produce a runnable program. It produces object files—chunks of machine code where function calls and variable references are still placeholders marked "address unknown, figure this out later."
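You can stop the pipeline at this middle stage yourself and look at the result; a quick sketch (output filenames are arbitrary):
# Rust: translate to machine code but skip the linker
$ rustc --emit=obj main.rs -o main.o
# C: same idea, -c means "compile only, don't link"
$ clang -c main.c -o main.o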
Object Files: Pre-assembled Building Blocks
Take this simple Rust code:
// main.rs
extern "C" {
    // Defined in another object file (nvidia.o); the compiler only records
    // an unresolved reference to it here.
    fn query_nvidia_driver() -> u64;
}

fn main() {
    let metrics = calculate_gpu_metrics();
    println!("GPU usage: {}%", metrics);
}

fn calculate_gpu_metrics() -> u64 {
    // This calls a function from another file
    unsafe { query_nvidia_driver() }
}
After compilation, main.o contains machine code, but with unresolved references:
Machine Code:
  main:
    [x86-64 instructions]
    call +0x50        ← calculate_gpu_metrics (same file, relative offset already known)
    call <address?>   ← println! (where is it?)
  calculate_gpu_metrics:
    [x86-64 instructions]
    call <address?>   ← query_nvidia_driver (where is it?)
Symbol Table:
  Defined:
    main                   → offset 0x0000
    calculate_gpu_metrics  → offset 0x0050
  Undefined (need to find):
    query_nvidia_driver
    println!
    [dozens of std functions]
The object file says "call this function" but doesn't know where that function lives. That's the linker's job.
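You can see the split between defined and undefined symbols directly with nm (real Rust symbols will show up mangled, but the T/U markers are the same):
$ nm main.o        # value, type, name for every symbol; T = defined in this object's .text
$ nm -u main.o     # only the U (undefined) symbols the linker still has to resolve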
What the Linker Actually Does
The linker takes all your object files and libraries, resolves every undefined reference, assigns final memory addresses, and produces an executable:
Linker Input:
├─ main.o
│ Defines: main, calculate_gpu_metrics
│ Needs: query_nvidia_driver, println!
│
├─ nvidia.o
│ Defines: query_nvidia_driver
│ Needs: cuda_driver_api_init
│
├─ libstd.rlib (Rust standard library)
│ Defines: println!, Vec::new, String::from, etc.
│ Needs: malloc, free, memcpy (from libc)
│
└─ libc.a
Defines: malloc, free, memcpy, open, read, etc.
Linker Process:
1. Build complete symbol table (all definitions)
2. Resolve undefined references
3. Assign final addresses
4. Remove unused code (dead code elimination)
5. Merge duplicate functions
6. Organize into sections (.text, .data, .bss)
7. Write executable
Output:
└─ orb8 (executable)
├─ main at 0x401000
├─ calculate_gpu_metrics at 0x401050
├─ query_nvidia_driver at 0x402000
├─ println! at 0x450000
└─ [thousands more symbols with addresses]
The linker also eliminates dead code. If you depend on a library but only use 3 of its 500 functions, the linker (with the right flags) can discard the unused 497. This is why understanding linking matters for binary size.
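For Rust, most of this is switched on through the release profile rather than hand-written linker flags. A commonly used size-focused profile, as a sketch (exact savings depend on your dependency graph and Cargo version):
# Cargo.toml
[profile.release]
lto = true          # link-time optimization: the whole program is visible, so more unused code gets dropped
codegen-units = 1   # fewer, larger codegen units give LTO more to work with
opt-level = "z"     # optimize for size rather than speed
strip = "symbols"   # drop symbol tables from the shipped binary (needs a reasonably recent Cargo)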
Understanding LLD Maps (Your binary's blueprint)
When you tell the linker to generate a map file, it writes a detailed report showing exactly what it did. Here's a simplified view of what an LLD map contains (real maps add a couple more columns, such as load address and alignment, but the idea is identical):
Address Size File Symbol
0x401000 0x4a20 orb8.o main
0x405a20 0x1850 orb8.o calculate_gpu_metrics
0x4072b0 0x920 orb8.o query_nvidia_driver
0x407bd0 0x350 orb8.o parse_command_line_args
0x408000 0x15000 libstd.rlib std::io::stdio::print
0x41d000 0x8000 libk8s_openapi.rlib k8s_openapi::api::core::v1::Pod
0x425000 0x2000 libc.a malloc
...
Memory Map:
Section .text (executable code):
Start: 0x401000
Size: 0x89000 (548 KB)
Section .rodata (read-only data - strings, constants):
Start: 0x48a000
Size: 0x12000 (72 KB)
Section .data (initialized global variables):
Start: 0x49d000
Size: 0x3000 (12 KB)
Section .bss (uninitialized globals - zeroed at startup):
Start: 0x4a0000
Size: 0x1000 (4 KB)
Total mapped size: 636 KB (the 4 KB of .bss is reserved at load time and takes no space in the file)
This tells you:
- Exactly which file contributed which symbol
- How much space each function takes
- Where everything lives in memory
- Which sections are large and why
Generating an LLD Map
For Rust:
# Add to .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
rustflags = ["-C", "link-arg=-Wl,-Map=output.map"]
# Or invoke directly
cargo rustc --release -- -C link-arg=-Wl,-Map=output.map
# Find the map at: target/release/output.map
For C/C++ with Clang:
clang -O2 main.c -o program -Wl,-Map=output.map
The map file is plain text. You can grep for symbols, sort by size, or analyze which dependencies contribute most to binary size.
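If grep pipelines get unwieldy, a few lines of Rust can do the aggregation for you. A minimal sketch that assumes the simplified Address / Size / File / Symbol layout shown above (real LLD maps have extra columns, so adjust the indices for your linker version):
use std::collections::HashMap;
use std::{env, fs};

fn main() {
    // Usage: map_sizes <linker-map-file>
    let path = env::args().nth(1).expect("usage: map_sizes <linker-map-file>");
    let text = fs::read_to_string(&path).expect("failed to read map file");

    let mut totals: HashMap<String, u64> = HashMap::new();
    for line in text.lines() {
        let cols: Vec<&str> = line.split_whitespace().collect();
        // Expect: <address> <size> <file> <symbol...>; skip anything else.
        if cols.len() < 4 || !cols[0].starts_with("0x") {
            continue;
        }
        let size = match u64::from_str_radix(cols[1].trim_start_matches("0x"), 16) {
            Ok(n) => n,
            Err(_) => continue,
        };
        *totals.entry(cols[2].to_string()).or_insert(0) += size;
    }

    // Print contributors from largest to smallest.
    let mut ranked: Vec<_> = totals.into_iter().collect();
    ranked.sort_by_key(|&(_, bytes)| std::cmp::Reverse(bytes));
    for (file, bytes) in ranked.iter().take(20) {
        println!("{:>10} KB  {}", bytes / 1024, file);
    }
}
Point it at the map file and it prints the twenty biggest contributors, which is the same per-library view used in the scenarios below.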
Memory Sections: Where Different Data Lives (Why .text, .data, and .bss matter)
Your executable isn't just a blob of bytes. It's organized into sections with different properties and permissions.
Executable File Layout:
┌─────────────────────────────────────┐
│ .text (Code) │ ← All your functions
│ Size: 500 KB │ Read-only, Executable
│ │
│ main() │
│ calculate_gpu_metrics() │
│ thousands of other functions │
├─────────────────────────────────────┤
│ .rodata (Read-only data) │ ← String literals, constants
│ Size: 80 KB │ Read-only, Not executable
│ │
│ "Starting orb8..." │
│ "Error: GPU not found" │
│ [constant arrays, lookup tables] │
├─────────────────────────────────────┤
│ .data (Initialized globals) │ ← Static variables with values
│ Size: 12 KB │ Read-Write, Not executable
│ │
│ static CONFIG: Config = {...}; │
│ static mut COUNTER: u64 = 42; │
├─────────────────────────────────────┤
│ .bss (Uninitialized globals) │ ← Static variables without values
│ Size: 4 KB │ Read-Write, Not executable
│ │ (doesn't take file space!)
│ static mut BUFFER: [u8; 4096] = [0; 4096]; │
└─────────────────────────────────────┘
At runtime, the OS maps these sections to memory with specific permissions:
Runtime Memory Layout:
┌─────────────────────────────────────┐ Address Permissions
│ .text │ 0x400000 r-x (read, execute)
│ → Code lives here │ Cannot modify code
│ → CPU executes from here │
├─────────────────────────────────────┤ 0x490000 r-- (read only)
│ .rodata │ Cannot modify or execute
│ → String literals │
│ → Constant data │
├─────────────────────────────────────┤ 0x4a0000 rw- (read, write)
│ .data │ Can modify, not execute
│ → Initialized globals │
├─────────────────────────────────────┤ 0x4b0000 rw- (read, write)
│ .bss │ Can modify, not execute
│ → Uninitialized globals │ Zeroed at startup
├─────────────────────────────────────┤
│ Heap (grows up →) │ Dynamic allocations
│ → malloc, Vec::new, Box │
│ │
│ Stack (grows down ←) │ Local variables
│ → Function calls, parameters │
└─────────────────────────────────────┘
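On Linux you can watch the OS apply these permissions to your own process. A small sketch that prints the mappings backing the current executable (Linux-only; addresses and paths will differ on your machine):
use std::fs;

fn main() {
    // /proc/self/maps lists every mapping of the current process with its
    // permissions (r = read, w = write, x = execute).
    let maps = fs::read_to_string("/proc/self/maps").expect("Linux-only: /proc/self/maps");
    let exe = std::env::current_exe().expect("current_exe");
    let exe = exe.to_string_lossy();

    for line in maps.lines() {
        // Keep only the mappings that come from our own executable file:
        // you should see r-xp (.text), r--p (.rodata), and rw-p (.data/.bss).
        if line.ends_with(exe.as_ref()) {
            println!("{line}");
        }
    }
}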
Why Sections Matter
.text (code) - Read-only and executable at runtime. Any attempt to modify it triggers a segmentation fault: the OS kills your process. This is a security feature that prevents code injection attacks.
.rodata (read-only data) - String literals like "Error: connection failed" and constant arrays. Marked read-only at runtime. Trying to modify this data crashes your program. Can be shared between multiple processes running the same executable (memory savings).
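If you want to see that crash for yourself, here's a deliberately broken sketch. It commits undefined behavior on purpose just to show what the page permissions do; on a typical Linux build it dies with SIGSEGV:
fn main() {
    // String literals are stored in .rodata, which the OS maps read-only.
    let message: &'static str = "this text lives in .rodata";
    let p = message.as_ptr() as *mut u8;

    // Deliberately invalid: the write targets a page mapped r--, so the
    // kernel answers with SIGSEGV and the process dies with a
    // "segmentation fault" instead of silently patching the constant.
    unsafe { std::ptr::write_volatile(p, b'X') };
}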
.data (initialized globals) - Variables with explicit initial values. Takes space in the binary file because those initial values must be stored somewhere. Loaded into memory and marked read-write at program startup.
.bss (uninitialized globals) - Variables without meaningful initial values (or initialized to zero). This is where things get clever: the linker doesn't store zeros in your binary file. It just records "allocate 4KB of zeroed memory here." The kernel provides zeroed memory at runtime. This is why declaring static mut BUFFER: [u8; 1048576] = [0; 1048576] doesn't increase your binary size by 1MB.
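You can check this yourself with two one-megabyte statics and the size tool; a sketch (static mut is used only to mirror the snippets above, and exact section sizes depend on optimization settings):
use std::hint::black_box;

// Zero-initialized: the linker records "reserve 1 MB in .bss";
// the file on disk does not grow.
static mut SCRATCH: [u8; 1_048_576] = [0; 1_048_576];

// Non-zero initializer: the 1 MB pattern must be stored in .data,
// so the binary on disk grows by roughly 1 MB.
static mut PATTERN: [u8; 1_048_576] = [0xAB; 1_048_576];

fn main() {
    // black_box hides the index from the optimizer so both arrays are really
    // touched at runtime instead of being folded away.
    let i = black_box(0usize);
    let byte = unsafe {
        SCRATCH[i] = PATTERN[i];
        SCRATCH[i]
    };
    println!("first byte: {byte}");
    // Build it, then compare `size target/release/<binary>`: removing PATTERN
    // shrinks .data and the file; removing SCRATCH only shrinks .bss.
}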
Real Debugging: When LLD Maps Matter (Practical scenarios)
Scenario 1: The 47MB Binary Problem
You're building orb8, a Kubernetes monitoring agent. The code is straightforward:
use k8s_openapi::api::core::v1::Pod;
use libbpf_rs::ObjectBuilder;
use prometheus::Registry;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the compiled eBPF object (the kernel verifies it here).
    let ebpf = ObjectBuilder::default().open_file("gpu_monitor.o")?.load()?;
    // Kubernetes API client and the Prometheus metrics registry.
    let k8s = kube::Client::try_default().await?;
    let metrics = Registry::new();
    // monitor_cluster() is the rest of the ~200-line agent.
    monitor_cluster(ebpf, k8s, metrics).await
}
You compile:
$ cargo build --release
$ ls -lh target/release/orb8
-rwxr-xr-x 1 user staff 47M Nov 12 10:30 orb8
47MB for 200 lines of code. Something pulled in the world. What?
Generate the LLD map:
$ cargo rustc --release -- -C link-arg=-Wl,-Map=orb8.map
$ grep "\.rlib\|\.a" orb8.map | \
awk '{print $2, $3}' | \
sort -k1 -rn | \
head -15
24M libk8s_openapi.rlib
12M libhyper.rlib
6M libtokio.rlib
4M libserde_json.rlib
2M libbrotli.rlib
1M libprometheus.rlib
800K liblibbpf_rs.rlib
500K libstd.rlib
...
The culprit: k8s_openapi contributes 24MB. This crate ships generated code for the entire Kubernetes API of the version you select: every resource type, whether you touch it or not. You're only querying Pods and Nodes, but you got DaemonSets, StatefulSets, CRDs, everything.
The fix:
# Before: pulls entire k8s API
[dependencies]
k8s-openapi = { version = "0.20", features = ["v1_28"] }
# After: minimal client, only the Kubernetes version you actually target
[dependencies]
kube = { version = "0.87", default-features = false, features = ["client", "rustls-tls"] }
# The generated types you never reference get dropped at link time
k8s-openapi = { version = "0.20", default-features = false, features = ["v1_28"] }
Rebuild:
$ cargo build --release
$ ls -lh target/release/orb8
-rwxr-xr-x 1 user staff 12M Nov 12 11:15 orb8
Binary size: 47MB → 12MB. Without the LLD map, you'd be guessing which dependency was the problem.
Scenario 2: eBPF Program Too Large
You wrote an eBPF program to monitor GPU memory allocations:
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct gpu_metrics {
    __u64 allocated_bytes;
    __u64 freed_bytes;
    __u64 allocation_count;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10000);
    __type(key, __u32);
    __type(value, struct gpu_metrics);
} gpu_stats SEC(".maps");

SEC("uprobe/libcuda:cuMemAlloc_v2")
int trace_gpu_alloc(struct pt_regs *ctx) {
    __u32 pid = bpf_get_current_pid_tgid() >> 32;
    __u64 size = PT_REGS_PARM2(ctx);

    struct gpu_metrics *metrics = bpf_map_lookup_elem(&gpu_stats, &pid);
    if (metrics) {
        metrics->allocated_bytes += size;
        metrics->allocation_count += 1;
        update_histogram(metrics, size);   // External helper function
        check_threshold_alerts(metrics);   // Another helper
        log_to_ring_buffer(metrics);       // More helpers
    } else {
        struct gpu_metrics init = {
            .allocated_bytes = size,
            .freed_bytes = 0,
            .allocation_count = 1
        };
        bpf_map_update_elem(&gpu_stats, &pid, &init, BPF_NOEXIST);
        initialize_histograms(pid);
    }
    return 0;
}
// Plus 5 more helper functions...
You try to load it:
$ sudo bpftool prog load gpu_monitor.o /sys/fs/bpf/gpu_mon
libbpf: prog 'trace_gpu_alloc': BPF program is too large. Processed 1500 insns
Error: failed to load program
The eBPF verifier has complexity limits. Your program exceeded them. Which part is too large?
Check the object file:
$ llvm-objdump -d gpu_monitor.o
gpu_monitor.o: file format elf64-bpf
Disassembly of section uprobe/libcuda:
0000000000000000 <trace_gpu_alloc>:
0: bf 16 00 00 00 00 00 00 r6 = r1
8: 85 00 00 00 0e 00 00 00 call 14
...
1200: 95 00 00 00 00 00 00 00 exit
Section sizes:
trace_gpu_alloc: 950 bytes (~120 instructions)
update_histogram: 640 bytes (~80 instructions)
check_threshold_alerts: 560 bytes (~70 instructions)
log_to_ring_buffer: 480 bytes (~60 instructions)
initialize_histograms: 720 bytes (~90 instructions)
Total: ~420 instructions across multiple functions. Each function call adds overhead (saving state, jumping, restoring state). The verifier sees complexity in the call graph.
The fix: Inline and simplify.
SEC("uprobe/libcuda:cuMemAlloc_v2")
int trace_gpu_alloc(struct pt_regs *ctx) {
__u32 pid = bpf_get_current_pid_tgid() >> 32;
size_t size = PT_REGS_PARM2(ctx);
// Just track allocations, skip histograms and alerts
__u64 *allocated = bpf_map_lookup_elem(&gpu_bytes, &pid);
if (allocated) {
__sync_fetch_and_add(allocated, size);
} else {
bpf_map_update_elem(&gpu_bytes, &pid, &size, BPF_NOEXIST);
}
// Move histogram and alert logic to userspace
return 0;
}
Result: ~45 instructions. Loads successfully. The advanced analytics move to your Rust agent where there are no size limits.
Scenario 3: Embedded System Memory Constraints
You're porting orb8 to run on an ARM-based edge device with 64MB of RAM. The device already runs several services. Your monitoring agent can't exceed 8MB total memory.
Current state:
$ file target/aarch64-unknown-linux-gnu/release/orb8
orb8: ELF 64-bit LSB executable, ARM aarch64
$ size target/aarch64-unknown-linux-gnu/release/orb8
text data bss dec hex filename
8234567  123456   89012  8447035   80e43b orb8
That's 8.2MB in the text segment alone (size lumps code and read-only data together under "text"). Add .data and .bss and you're already over budget before a single runtime allocation.
Analyze the map:
$ cargo rustc --release --target aarch64-unknown-linux-gnu -- \
-C link-arg=-Wl,-Map=orb8-arm.map
$ grep "^\.text" orb8-arm.map -A 100 | grep "0x" | \
awk '{print $3, $2}' | sort -rn | head -20
2.1M libhyper.rlib # HTTP client for Kubernetes API
1.8M libtokio.rlib # Async runtime
1.2M libk8s_openapi.rlib # Kubernetes types
0.9M libserde.rlib # JSON serialization
0.8M libtls.rlib # HTTPS support
...
Decisions based on the map:
- Remove the Kubernetes API client - Run orb8 as a pure eBPF agent and export metrics locally; a separate lightweight forwarder ships them to Kubernetes. Saves: 5MB
- Statically link musl instead of glibc - Smaller C library. Saves: 800KB
- Disable TLS - The edge device talks to a local collector only, no HTTPS needed. Saves: 1.2MB
- Strip debug symbols - Production doesn't need them. Saves: 1.5MB
  cargo build --release --target aarch64-unknown-linux-gnu
  aarch64-linux-gnu-strip target/aarch64-unknown-linux-gnu/release/orb8
Final binary: 2.8MB. It fits comfortably within the 8MB budget, with room left for runtime allocations.
Without the map, you'd be randomly trying default-features = false on dependencies and hoping something helps.
How eBPF Linking Differs (The kernel does its own verification)
eBPF programs don't go through normal linking. They use a special load-time verification process:
Normal Program:
Source → Compile → Link → Executable → CPU runs it
eBPF Program:
Source → Compile → Object → Load to kernel → Verify → JIT compile → Run
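The compile step is ordinary clang, just targeting the BPF instruction set instead of x86 or ARM; a typical invocation looks like this (flags vary by setup, and -g is what produces the BTF type sections shown below):
$ clang -O2 -g -target bpf -c gpu_monitor.c -o gpu_monitor.o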
When you compile an eBPF program, you get an ELF object file with special sections:
$ llvm-objdump -h gpu_monitor.o
Sections:
Idx Name Size Address Type
0 .text 00000000 00000000 CODE
1 uprobe/libcuda 000004b0 00000000 CODE ← Your program
2 .maps 00000020 00000000 DATA ← Map definitions
3 .BTF 00000800 00000000 DATA ← Type information
4 .BTF.ext 00000200 00000000 DATA ← Extended types
Your Rust agent loads this at runtime:
use libbpf_rs::ObjectBuilder;

let mut obj = ObjectBuilder::default()
    .open_file("gpu_monitor.o")?
    .load()?;                       // ← Kernel verifier runs here

// Verifier checks:
// ✓ No infinite loops
// ✓ All memory accesses provably safe
// ✓ Program terminates in finite time
// ✓ Stack usage < 512 bytes
// ✓ No out-of-bounds array access

let prog = obj.prog_mut("trace_gpu_alloc").unwrap();
let link = prog.attach()?;
The kernel's verifier is like a linker, but instead of resolving addresses, it's proving safety properties. It analyzes every possible code path and rejects the program if there's even a theoretical possibility of:
- Infinite loops
- Out-of-bounds memory access
- Stack overflow
- Unsafe pointer dereference
- Calling non-whitelisted kernel functions
If verification passes, the program gets JIT-compiled to native machine code and loaded into the kernel. This happens at runtime, not build time.
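You don't have to take the JIT step on faith; once the program is loaded, bpftool can show you both forms (run as root, with <prog_id> taken from the listing):
$ sudo bpftool prog show                      # list loaded programs and their ids
$ sudo bpftool prog dump xlated id <prog_id>  # the bytecode after verifier rewrites
$ sudo bpftool prog dump jited id <prog_id>   # the native machine code produced by the JIT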
The Point (Why this matters)
Understanding linking and memory layout isn't about memorizing ELF section names. It's about having X-ray vision into your binaries when things go wrong.
When your binary is inexplicably large, you generate a map and see exactly which dependency pulled in megabytes of unused code.
When the eBPF verifier rejects your program, you check the disassembly and see which function is too complex.
When you're debugging a crash on a particular memory access, you check the map and realize you're trying to write to .rodata (a read-only section).
When you're optimizing for embedded systems with tight constraints, the map shows you where every byte went so you can make informed decisions about what to keep and what to cut.
The tools:
# Generate linker map
cargo rustc --release -- -C link-arg=-Wl,-Map=output.map
# Analyze sections
objdump -h binary_name
readelf -S binary_name
# Symbol sizes
nm --size-sort --radix=d binary_name
# Check what's using space
size binary_name
# For eBPF objects
llvm-objdump -d program.o
bpftool prog dump xlated id <prog_id>
Systems programming means understanding the full stack from source code to running process. The linker is the bridge between your code and the executable. The map shows you exactly how that bridge was built.
Most of the time you don't need this knowledge. But when you're building eBPF monitors that run in kernel space with strict limits, or embedded systems with 64MB of RAM, or production services where every megabyte of binary size affects container startup time—this knowledge stops being optional.
The map shows you the territory. Sometimes you need to read it.
Related: Kernel Space and eBPF: The Observability Revolution