Linux Networking and Container Networking: What I Learned Building an eBPF Observability Tool
I was staring at a dashboard showing every network flow attributed to the same entity: the node itself. Two pods were clearly communicating — I could see the traffic in tcpdump — but my eBPF-based observability agent was labeling every single flow with the node's IP instead of the actual pod names.
The agent's approach was straightforward: attach TC (Traffic Control) classifiers to network interfaces, extract source and destination IPs from packet headers, look up those IPs in a cache built from the Kubernetes API, and map them to pod names. Simple. Elegant. Completely wrong in this case.
Both pods were running with hostNetwork: true. They shared the node's IP address. My entire enrichment model — "look up the source IP, find the pod" — collapsed because multiple pods had the same IP.
To understand why that broke everything, I had to learn how Linux networking actually works. Not the textbook version with OSI layers and protocol headers. The version you need when you're attaching probes to network interfaces and trying to figure out which packets belong to which containers.
This is that knowledge.
Prerequisite: This post assumes you know what eBPF is. If terms like "TC classifier" or "ring buffer" are unfamiliar, start with Kernel Space and eBPF: The Observability Revolution.
The Packet's Journey (What actually happens when data leaves your application)
Before diving into containers, you need to understand how a packet moves through a regular Linux machine. Every networking concept that follows — namespaces, veth pairs, bridges — is a variation on this same fundamental pipeline.
When your application calls send() on a socket, the data doesn't just teleport to the network. It passes through a series of kernel subsystems, each with a specific job:
Application calls send("Hello")
↓
Socket layer: attach transport headers (TCP/UDP)
↓
IP layer: attach IP header, make routing decision
↓
Traffic Control (TC): apply bandwidth limits, run classifiers
↓
Network driver: hand to hardware
↓
NIC: electrical signals on the wire (or virtual wire)
The routing decision is the critical step. The kernel consults its routing table to determine which network interface to send the packet out of, and what the next hop should be. This is where things get interesting on a machine with multiple interfaces.
# A Kubernetes node's routing table
$ ip route show
default via 172.18.0.1 dev eth0 # Default: send out eth0
10.244.0.0/24 dev cni0 proto kernel scope link # Pod subnet: send to bridge
10.244.1.0/24 via 10.244.1.1 dev eth0 # Other node's pods: send out eth0
Reading this table: if the destination IP is in the 10.244.0.0/24 range (local pods), the kernel sends the packet to the cni0 interface — a bridge that connects to pods on this node. If the destination is 10.244.1.0/24 (pods on another node), it goes out eth0 toward the physical network. Everything else hits the default route and goes out eth0 as well.
The routing table is the kernel's decision tree for "where does this packet go next?" Every packet traverses it. When you're building network observability, understanding which interface a packet will traverse tells you where you need to be watching.
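That decision tree is a longest-prefix match: among all routes whose prefix contains the destination, the most specific one wins, and the default route is just a /0 that matches everything. A toy sketch in Rust (the prefixes and interface names mirror the table above; this is an illustration, not the kernel's actual trie-based implementation):

```rust
use std::net::Ipv4Addr;

// A route: destination prefix, prefix length, and the egress interface.
struct Route {
    prefix: Ipv4Addr,
    prefix_len: u32,
    iface: &'static str,
}

// Longest-prefix match: the most specific route containing `dst` wins.
fn lookup(routes: &[Route], dst: Ipv4Addr) -> &'static str {
    routes
        .iter()
        .filter(|r| {
            // Build the netmask; a /0 (default route) matches everything.
            let mask = if r.prefix_len == 0 { 0 } else { u32::MAX << (32 - r.prefix_len) };
            u32::from(dst) & mask == u32::from(r.prefix) & mask
        })
        .max_by_key(|r| r.prefix_len)
        .map(|r| r.iface)
        .unwrap_or("unreachable")
}

fn main() {
    let table = [
        Route { prefix: Ipv4Addr::new(0, 0, 0, 0), prefix_len: 0, iface: "eth0" },     // default
        Route { prefix: Ipv4Addr::new(10, 244, 0, 0), prefix_len: 24, iface: "cni0" }, // local pods
        Route { prefix: Ipv4Addr::new(10, 244, 1, 0), prefix_len: 24, iface: "eth0" }, // remote pods
    ];
    println!("{}", lookup(&table, Ipv4Addr::new(10, 244, 0, 7))); // local pod -> cni0
    println!("{}", lookup(&table, Ipv4Addr::new(8, 8, 8, 8)));    // internet -> eth0
}
```

A destination in 10.244.0.0/24 matches both the default route and the cni0 route, and the /24 wins because it is more specific.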
Network Interfaces (Not just hardware anymore)
A network interface is a named endpoint where the kernel sends and receives packets. On a physical machine, eth0 is your Ethernet port — the actual hardware that converts electrical signals to data. On a cloud VM, eth0 is a virtual NIC provided by the hypervisor, but it behaves identically from the kernel's perspective.
The important thing: Linux doesn't limit you to physical interfaces. It can create purely virtual interfaces that exist only in software. These virtual interfaces are the building blocks of container networking.
Here's what a typical Kubernetes node looks like:
$ ip link show
1: lo: <LOOPBACK,UP> # Loopback - packets to 127.0.0.1 (never leaves the machine)
2: eth0: <BROADCAST,UP> # Physical/virtual NIC (connects to the network)
3: cni0: <BROADCAST,UP> # Bridge (virtual switch connecting pods)
5: veth9a3f@if4: <BROADCAST,UP> # One end of a virtual pipe (connects to a pod)
7: vethb7c2@if6: <BROADCAST,UP> # Another pipe to another pod
9: vethe1d8@if8: <BROADCAST,UP> # Another pipe to yet another pod
Four types of interfaces on one machine:
- lo (loopback): Traffic to 127.0.0.1. Never leaves the machine.
- eth0: The machine's connection to the outside world. All external traffic flows through here.
- cni0: A bridge — a virtual network switch. We'll cover this in detail shortly.
- veth*: Virtual Ethernet pairs — pipes connecting container namespaces to the bridge.
Each interface carries different traffic. This is the first critical insight for observability: if you attach a probe to the wrong interface, you'll miss traffic entirely. Attach to eth0 and you'll see all external traffic but miss pod-to-pod communication on the same node. Attach to cni0 and you'll see pod traffic but miss the host's own network activity.
Network Namespaces (How containers get their own network stack)
A network namespace is a complete, isolated copy of the Linux network stack. That means its own set of interfaces, its own routing table, its own firewall rules, and its own socket table. Processes inside one namespace cannot see or interact with interfaces in another namespace.
This is the core isolation mechanism behind container networking. It's not optional complexity — it's the reason containers can each have their own eth0 with their own IP address without conflicting with each other or the host.
Network Namespace Isolation
Each network namespace gets its own isolated set of interfaces, routing table, and sockets. A new namespace starts nearly empty — it cannot see the host's network resources.
Host Network Namespace (PID 1)
Interfaces
| Name | Type | State | IP Address | MAC |
|---|---|---|---|---|
| lo | loopback | UP | 127.0.0.1/8 | 00:00:00:00:00:00 |
| eth0 | physical | UP | 10.0.0.2/24 | 02:42:0a:00:00:02 |
| cni0 | bridge | UP | 10.244.0.1/24 | 6a:3e:1f:a2:b8:01 |
| veth8a3f1c2 | veth | UP | - | be:4c:7d:e1:22:f3 |
| vethd90e4b7 | veth | UP | - | ae:12:5f:c3:44:a1 |
Routing Table
| Destination | Gateway | Interface |
|---|---|---|
| default | 10.0.0.1 | eth0 |
| 10.0.0.0/24 | - | eth0 |
| 10.244.0.0/24 | - | cni0 |
| 10.244.1.0/24 | 10.0.0.3 | eth0 |
Socket Table
| Proto | Local Address | Remote Address | State |
|---|---|---|---|
| tcp | 0.0.0.0:22 | 0.0.0.0:* | LISTEN |
| tcp | 0.0.0.0:6443 | 0.0.0.0:* | LISTEN |
| tcp | 10.0.0.2:6443 | 10.0.0.5:49312 | ESTABLISHED |
| udp | 0.0.0.0:8472 | 0.0.0.0:* | - |
Container Network Namespace (container: nginx-7d4f8)
Interfaces
| Name | Type | State | IP Address | MAC |
|---|---|---|---|---|
| lo | loopback | DOWN | 127.0.0.1/8 | 00:00:00:00:00:00 |
Routing Table
(empty -- no routes configured)
Socket Table
(no active sockets)
The host machine runs in the root namespace — the default namespace that exists when the system boots. When a container runtime (like containerd or Docker) creates a container, it creates a new network namespace for that container. Inside this new namespace, the network stack starts empty: only a loopback interface exists, and it's down.
# Create a network namespace (this is what container runtimes do)
$ ip netns add my-container
# Look inside — it's a blank slate
$ ip netns exec my-container ip link show
1: lo: <LOOPBACK> mtu 65536 state DOWN
# Only loopback. No eth0. No connectivity. Total isolation.
# The host still has all its interfaces
$ ip link show
1: lo 2: eth0 3: cni0 5: veth9a3f@if4 ...
The ip netns exec command runs a command inside a specific namespace. From the container's perspective, the host's eth0, cni0, and all those veth interfaces don't exist. It's as if the container is on a completely separate machine with no network card installed.
This raises an obvious question: if the namespace is completely isolated, how does the container communicate with anything?
Veth Pairs (Virtual pipes connecting isolated worlds)
A veth pair is two virtual network interfaces connected by an invisible pipe. Whatever goes into one end comes out the other. They always come in pairs — you can't create one without the other.
Think of it as a wormhole: two endpoints, potentially in different namespaces, with a direct connection between them. You put a packet in one end and it instantly appears at the other end.
Virtual Ethernet (veth) Pair
A veth pair acts as a tunnel between two network namespaces. Whatever enters one end exits the other.
Here's how container runtimes use them:
# Create a veth pair — two interfaces, connected
$ ip link add veth-host type veth peer name veth-pod
# Right now, both ends are in the host namespace.
# Move one end into the container's namespace:
$ ip link set veth-pod netns my-container
# Now:
# veth-host lives in the HOST namespace
# veth-pod lives in the CONTAINER namespace
# They're connected. Packets in one end → out the other.
# Give the container end an IP address and bring it up
$ ip netns exec my-container ip addr add 10.244.0.5/24 dev veth-pod
$ ip netns exec my-container ip link set veth-pod up
$ ip link set veth-host up
After this setup, the container has a network interface (veth-pod) with IP 10.244.0.5. Inside the container, this interface looks and behaves exactly like a regular eth0 — the container has no idea it's virtual. The container runtime typically renames the container-side interface to eth0 so applications don't need to know about the plumbing underneath.
The host-side interface (veth-host) appears in the host namespace as one of those vethXXXX entries you saw in the ip link show output earlier. Each running container has one veth pair, so a node running 30 pods has 30 veth interfaces on the host side.
Now the container can send packets — they emerge from the host-side veth. But where do they go from there? That's where bridges come in.
Bridges (cni0 is just a virtual network switch)
A bridge is a virtual Layer 2 network switch. If you've seen a physical network switch — the box with many Ethernet ports that connects devices on a local network — a Linux bridge is exactly that, in software.
The bridge cni0 (or docker0 if you're using Docker's default networking) connects all the host-side veth endpoints together. When pod A sends a packet to pod B on the same node, the packet travels: pod A's veth → bridge → pod B's veth. The bridge learns which MAC address lives on which port (just like a physical switch) and forwards frames accordingly.
Linux Bridge: Virtual Layer 2 Switch
A bridge (cni0) connects pod veth pairs on the same node.
Same-Node: Pod A to Pod B
Traffic stays on the bridge — never touches eth0. The bridge performs MAC-based forwarding between the two veth pairs entirely in kernel space.
Pod A → veth-A → cni0 → veth-B → Pod B
# See what's connected to the bridge
$ bridge link show
5: veth9a3f@cni0: <BROADCAST,UP> master cni0 # Pod A's connection
7: vethb7c2@cni0: <BROADCAST,UP> master cni0 # Pod B's connection
9: vethe1d8@cni0: <BROADCAST,UP> master cni0 # Pod C's connection
# The bridge maintains a forwarding table (MAC address → port)
$ bridge fdb show br cni0
aa:bb:cc:00:00:01 dev veth9a3f master cni0 # Pod A's MAC → port 5
aa:bb:cc:00:00:02 dev vethb7c2 master cni0 # Pod B's MAC → port 7
aa:bb:cc:00:00:03 dev vethe1d8 master cni0 # Pod C's MAC → port 9
The critical insight: same-node pod-to-pod traffic never touches eth0. The packet goes from one veth port on the bridge to another veth port on the bridge. It stays entirely within the bridge, just like traffic between two devices plugged into the same physical switch never hits the router.
Cross-node traffic is different. If pod A sends a packet to a pod on another node, the bridge doesn't have a port for that destination. The packet gets forwarded to eth0 (the bridge's uplink, essentially), which sends it out to the physical network toward the other node.
This distinction matters enormously for observability. If your probes are only on eth0, you're blind to all same-node pod communication. On a busy node running dozens of pods, that can be the majority of the traffic.
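The bridge's learn-and-forward behavior is easy to model: a map from MAC address to port, populated by watching source addresses, consulted for destinations. A toy sketch (not the kernel's implementation; port names are illustrative):

```rust
use std::collections::HashMap;

type Mac = [u8; 6];

// Toy Layer 2 switch: learns which port each source MAC lives on,
// forwards to the learned port, floods when the destination is unknown.
struct Bridge {
    fdb: HashMap<Mac, String>, // forwarding database: MAC -> port name
    ports: Vec<String>,
}

impl Bridge {
    // Returns the ports the frame is sent out of.
    fn handle_frame(&mut self, src: Mac, dst: Mac, in_port: &str) -> Vec<String> {
        // Learn: the source MAC is reachable via the ingress port.
        self.fdb.insert(src, in_port.to_string());
        match self.fdb.get(&dst) {
            Some(port) => vec![port.clone()],
            // Unknown destination: flood to every port except the ingress one.
            None => self.ports.iter().filter(|p| *p != in_port).cloned().collect(),
        }
    }
}

fn main() {
    let mut br = Bridge {
        fdb: HashMap::new(),
        ports: vec!["veth-A".into(), "veth-B".into(), "veth-C".into()],
    };
    let (mac_a, mac_b) = ([0xaa, 0, 0, 0, 0, 1], [0xaa, 0, 0, 0, 0, 2]);
    // First frame A -> B: destination unknown, flooded to B and C.
    println!("{:?}", br.handle_frame(mac_a, mac_b, "veth-A"));
    // Reply B -> A: A's MAC was learned, forwarded only to veth-A.
    println!("{:?}", br.handle_frame(mac_b, mac_a, "veth-B"));
}
```

This is exactly the `bridge fdb show` table above, built incrementally from observed traffic.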
Container Networking End-to-End (What actually happens when kubectl apply runs)
Now you have all the pieces. Let's trace what happens when Kubernetes creates a pod, and then what happens when that pod sends a packet to another pod.
Pod Creation: The Network Plumbing
The Container Networking Interface (CNI) is a specification — not a tool. It defines a contract: "given a container's network namespace, set up its networking." Different CNI plugins (Flannel, Calico, Cilium) implement this contract differently, but they all perform the same fundamental steps.
CNI Sequence: How a Pod Gets Its Network
Step-by-step process from pod creation to a fully networked container.
1. kubelet receives pod spec from API server
↓
2. kubelet tells containerd: "create this container"
↓
3. containerd creates a new network namespace
↓
4. kubelet calls the CNI plugin:
"Here's the namespace, set up networking"
↓
5. CNI plugin creates a veth pair
↓
6. CNI plugin moves one end into the pod's namespace
↓
7. CNI plugin assigns an IP address (e.g., 10.244.0.5)
↓
8. CNI plugin connects the host end to the bridge (cni0)
↓
9. CNI plugin configures routes inside the pod namespace
↓
10. Pod is network-ready
Different CNI plugins differ in how they handle cross-node traffic. Flannel wraps packets in VXLAN tunnels. Calico uses BGP to distribute routes. Cilium uses eBPF to replace kube-proxy and implement routing. But the local plumbing — namespace, veth, bridge — is nearly universal.
The Packet Path: Pod A → Pod B (Same Node)
Pod A (10.244.0.5) makes an HTTP request to Pod B (10.244.0.6). Both are on the same node.
Pod A's application calls connect() + send()
↓
Pod A's network namespace:
Routing table says: "10.244.0.0/24 → dev eth0"
→ Send packet out eth0 (which is the veth-podA end)
↓
Packet crosses the veth pair:
veth-podA → veth-hostA
Packet appears in the host namespace on the bridge port
↓
cni0 bridge:
Looks up destination MAC in forwarding table
Finds it on port veth-hostB
Forwards the frame to that port
↓
Packet crosses the second veth pair:
veth-hostB → veth-podB
Packet appears in Pod B's namespace
↓
Pod B's network namespace:
TCP/IP stack processes the packet
Delivers to the listening socket
↓
Pod B's application receives the HTTP request
The packet never touched eth0. It never left the node. The bridge handled everything.
The Packet Path: Pod A → Pod C (Different Node)
Pod A (10.244.0.5) makes a request to Pod C (10.244.1.8) on Node 2.
Pod A's application calls connect() + send()
↓
Pod A's namespace routing table:
"10.244.1.0/24" is NOT local → default route → eth0
↓
Packet crosses veth to host namespace
↓
Host namespace routing table:
"10.244.1.0/24 via 10.244.1.1 dev eth0"
→ Send out eth0 toward the other node
↓
[Physical/virtual network between nodes]
↓
Node 2's eth0 receives the packet
↓
Node 2's routing table:
"10.244.1.0/24 dev cni0" → forward to bridge
↓
Node 2's cni0 bridge forwards to Pod C's veth
↓
Pod C receives the packet
This time the packet did cross eth0 on both nodes. It's visible to probes on eth0. But on Node 1, the packet's source IP is still 10.244.0.5 (Pod A's real IP), and the destination is 10.244.1.8 (Pod C's real IP). No translation happened. This is a core Kubernetes networking guarantee.
Kubernetes Networking (Every pod gets an IP, and then things get complicated)
Kubernetes enforces three networking rules:
- Every pod gets a unique IP address — no sharing, no conflicts
- Pods can communicate with each other without NAT — the source IP a pod sees is the actual sender's IP
- The IP a pod sees for itself is the same IP others see — no address translation at the pod boundary
These rules make IP-based observability possible. If you capture a packet with source IP 10.244.0.5, you can look that up and know it came from a specific pod. The IP is a reliable identifier.
But then Kubernetes introduces Services, and things get more complicated.
Services and DNAT
A Kubernetes Service provides a stable virtual IP (called a ClusterIP) that load-balances across a set of backend pods. The ClusterIP doesn't belong to any interface — it exists only as a set of firewall rules.
When a pod sends traffic to a ClusterIP, kube-proxy (or its eBPF replacement in Cilium) intercepts the packet and performs DNAT (Destination Network Address Translation): it rewrites the destination IP from the ClusterIP to one of the backend pod's real IPs.
Service DNAT: How kube-proxy Rewrites Packet Headers
IP packet header, pre-DNAT: src=10.244.0.5 (Pod A), dst=10.96.0.10 (ClusterIP). The ClusterIP is a virtual IP — no pod has this address.
Packet flow: Pod A (10.244.0.5) → Service my-svc (10.96.0.10) → kube-proxy DNAT (iptables/ipvs) → Pod B (10.244.0.6)
Observation Point Matters
Probe captures before DNAT: destination is 10.96.0.10 (useless for pod lookup)
Probe captures after DNAT: destination is 10.244.0.6 (maps to Pod B)
# kube-proxy creates iptables rules like this:
$ iptables -t nat -L KUBE-SERVICES -n
# Service "my-svc" has ClusterIP 10.96.0.10, port 80
# Backend pods: 10.244.0.6 and 10.244.1.3
DNAT tcp -- 0.0.0.0/0 10.96.0.10 dpt:80 to:10.244.0.6:80 (50% probability)
DNAT tcp -- 0.0.0.0/0 10.96.0.10 dpt:80 to:10.244.1.3:80 (50% probability)
The packet transformation:
Before DNAT: src=10.244.0.5 dst=10.96.0.10 (ClusterIP — virtual)
After DNAT: src=10.244.0.5 dst=10.244.0.6 (Real pod IP)
DNAT (Destination NAT) rewrites only the destination IP address on the packet. The source IP stays intact. NAT (Network Address Translation) is the general term for rewriting IP addresses in packet headers — DNAT specifically rewrites the destination. There's also SNAT (Source NAT) which rewrites the source, but Kubernetes tries to avoid that for pod-to-pod traffic because it breaks the "pods see real IPs" guarantee.
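Mechanically, DNAT is a keyed rewrite of the destination fields of the 5-tuple, with a backend chosen from the Service's endpoints. A minimal sketch (the `FiveTuple` struct and the modulo backend selection are illustrative; real kube-proxy selects backends randomly via iptables statistic rules or ipvs schedulers):

```rust
use std::net::Ipv4Addr;

#[derive(Clone, Copy, Debug)]
struct FiveTuple {
    src_ip: Ipv4Addr,
    dst_ip: Ipv4Addr,
    src_port: u16,
    dst_port: u16,
    proto: u8, // 6 = TCP
}

// DNAT: if the packet targets the ClusterIP:port, rewrite the destination
// to one of the backends. The source fields are left untouched.
fn dnat(pkt: FiveTuple, cluster_ip: Ipv4Addr, svc_port: u16,
        backends: &[(Ipv4Addr, u16)]) -> FiveTuple {
    if pkt.dst_ip == cluster_ip && pkt.dst_port == svc_port {
        // Illustrative backend choice; kube-proxy uses random selection.
        let (ip, port) = backends[(pkt.src_port as usize) % backends.len()];
        FiveTuple { dst_ip: ip, dst_port: port, ..pkt }
    } else {
        pkt
    }
}

fn main() {
    let pkt = FiveTuple {
        src_ip: Ipv4Addr::new(10, 244, 0, 5), dst_ip: Ipv4Addr::new(10, 96, 0, 10),
        src_port: 49312, dst_port: 80, proto: 6,
    };
    let backends = [(Ipv4Addr::new(10, 244, 0, 6), 80), (Ipv4Addr::new(10, 244, 1, 3), 80)];
    let out = dnat(pkt, Ipv4Addr::new(10, 96, 0, 10), 80, &backends);
    println!("src={} dst={}", out.src_ip, out.dst_ip); // source preserved, dst rewritten
}
```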
This matters for observability because where DNAT happens determines what IP addresses you see in captured packets:
- Before DNAT (on the sending pod's veth, or early in the bridge path): dst=10.96.0.10 — a ClusterIP that doesn't map to any pod in your cache
- After DNAT (later in the path, on the receiving pod's veth): dst=10.244.0.6 — a real pod IP you can look up
Where the DNAT rules execute depends on the CNI and kube-proxy mode. With iptables-based kube-proxy, DNAT happens in the PREROUTING chain, which runs before the routing decision. This means traffic captured on the bridge is typically post-DNAT. But this is an implementation detail that varies across setups — the key point is that observation point determines what you see.
hostNetwork: true (The identity crisis that broke my enrichment pipeline)
Now we get to the thing that burned me.
Normally, a pod gets its own network namespace. It gets a unique IP, its own routing table, its own veth pair — complete isolation. hostNetwork: true skips all of that. The pod runs directly in the host's network namespace.
Normal Pod vs hostNetwork Pod
How network namespace isolation changes when a pod uses hostNetwork: true.
Normal pod (nginx-7d4f8b): own namespace with its own eth0, connected through a veth pair to the cni0 bridge. IP: 10.244.0.5 (unique to this pod).
hostNetwork pod (kube-proxy-x9k2): runs directly in the host namespace with direct access to the node's eth0. IP: 172.18.0.3 (same as the node).
IP lookup collision: multiple hostNetwork pods share the node IP, making source attribution ambiguous for IP-based identity systems.
# Normal pod — gets its own network identity
apiVersion: v1
kind: Pod
spec:
containers:
- name: my-app
image: nginx
# Result: Pod IP = 10.244.0.5 (unique, own namespace)
# Has: own eth0, own routing table, own veth pair
---
# hostNetwork pod — borrows the node's identity
apiVersion: v1
kind: Pod
spec:
hostNetwork: true
containers:
- name: my-agent
image: orb8-agent
# Result: Pod IP = 172.18.0.3 (same as the node)
# Has: node's eth0, node's routing table, no veth pair
Why would you want this? DaemonSets running system-level tools need it. An eBPF observability agent needs access to the host's network interfaces to attach probes. A CNI plugin needs to configure the host's network stack. kube-proxy needs to install iptables rules in the host namespace. These tools can't operate from inside an isolated pod namespace — they need to see and modify the host's networking directly.
The cost: hostNetwork pods share the node's IP address. If three hostNetwork pods run on a node with IP 172.18.0.3, all three have the IP 172.18.0.3.
This is what broke my enrichment pipeline. The pod watcher builds a map of IP → pod name. When it processes hostNetwork pods, it inserts the node's IP as the key. But only one entry can exist per key, so the last hostNetwork pod processed "wins." Every flow to or from that IP gets attributed to whatever pod happened to be indexed last.
Pod watcher processes events:
kube-proxy (hostNetwork) → IP 172.18.0.3 → cache: 172.18.0.3 = kube-proxy
orb8-agent (hostNetwork) → IP 172.18.0.3 → cache: 172.18.0.3 = orb8-agent (overwrites!)
Flow arrives: src=172.18.0.3 dst=10.96.0.1
Cache lookup: 172.18.0.3 → "orb8-agent"
But the actual sender was kube-proxy.
Or kubelet (a host process, not even a pod).
Or sshd.
We can't tell. They all share 172.18.0.3.
The IP → pod mapping is one-to-many for hostNetwork pods, but a hashmap is one-to-one. Information is lost.
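The overwrite is easy to reproduce. A map keyed by IP keeps only the last writer; the pod names and node IP here mirror the trace above:

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

fn main() {
    let node_ip = Ipv4Addr::new(172, 18, 0, 3);
    let mut cache: HashMap<Ipv4Addr, &str> = HashMap::new();

    // Two hostNetwork pods report the same node IP.
    cache.insert(node_ip, "kube-proxy");
    cache.insert(node_ip, "orb8-agent"); // silently overwrites kube-proxy

    // Any flow from 172.18.0.3 now resolves to whichever pod was indexed last,
    // regardless of which process actually sent the packet.
    assert_eq!(cache.get(&node_ip), Some(&"orb8-agent"));
    assert_eq!(cache.len(), 1); // one key; the second identity is gone
    println!("172.18.0.3 -> {}", cache[&node_ip]);
}
```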
For regular pods, IP-based enrichment works reliably because Kubernetes guarantees unique IPs. For hostNetwork pods, you need additional signals — port numbers, process IDs from tracepoint probes, or simply accepting that hostNetwork traffic gets labeled as "host" rather than attributed to a specific pod.
Where You Observe Determines What You See (The interface you attach to changes everything)
This is the practical synthesis of everything above. When building an eBPF observability tool, the decision of which network interface to attach TC classifiers to determines your visibility. Different interfaces see fundamentally different traffic.
Here's what each attachment point gives you:
Probes on eth0 (the host's physical/virtual NIC):
- Cross-node pod traffic (packets leaving or entering the node)
- Host process traffic (kubelet, sshd, containerd)
- hostNetwork pod traffic
- Missing: same-node pod-to-pod traffic (stays on the bridge, never touches eth0)
Probes on cni0 (the bridge):
- All pod traffic on the node, including same-node communication
- Missing: hostNetwork pod traffic (these pods bypass the bridge — they're in the host namespace)
- Missing: host process traffic
Probes on individual veth interfaces:
- Traffic for one specific pod only
- Most granular, but requires managing probes as pods come and go
- Veths are created and destroyed with pods — the probe lifecycle gets complicated
The practical approach for broad visibility: attach to both eth0 and the bridge interface. This covers cross-node traffic, same-node pod traffic, and host-level traffic. You'll get some duplicate events for cross-node pod traffic (it crosses both the bridge and eth0), but deduplication is easier than missing data.
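One naive way to suppress those duplicates, sketched under assumptions (the `Deduper`, its flow key, and the time window are all illustrative; a real agent might instead aggregate flows or key on richer per-event identifiers):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Flow key: a simplified 5-tuple (src ip, dst ip, src port, dst port, proto).
type FlowKey = (u32, u32, u16, u16, u8);

// Suppress duplicate events for the same flow seen on multiple interfaces
// (e.g. the bridge and eth0) within a short window.
struct Deduper {
    seen: HashMap<FlowKey, Instant>,
    window: Duration,
}

impl Deduper {
    // Returns true if this event should be kept (first sighting in the window).
    fn keep(&mut self, key: FlowKey, now: Instant) -> bool {
        match self.seen.get(&key) {
            Some(&t) if now.duration_since(t) < self.window => false, // duplicate
            _ => {
                self.seen.insert(key, now); // first sighting (or window expired)
                true
            }
        }
    }
}

fn main() {
    let mut d = Deduper { seen: HashMap::new(), window: Duration::from_millis(100) };
    let key = (0x0AF4_0005, 0x0AF4_0108, 49312, 80, 6);
    let t0 = Instant::now();
    println!("{}", d.keep(key, t0));                             // first sighting: kept
    println!("{}", d.keep(key, t0 + Duration::from_millis(10))); // same flow on eth0: dropped
}
```

The tradeoff is deliberate: a too-short window lets duplicates through, a too-long one can swallow legitimate repeat events, but either way no traffic is invisible.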
// Discover and attach to multiple interfaces for maximum visibility
fn discover_interfaces() -> Vec<String> {
let mut interfaces = Vec::new();
// 1. Find the default route interface (eth0 or equivalent)
// Parse /proc/net/route — the entry with destination 00000000
// is the default route
let routes = std::fs::read_to_string("/proc/net/route")
.expect("failed to read route table");
for line in routes.lines().skip(1) {
let fields: Vec<&str> = line.split_whitespace().collect();
if fields[1] == "00000000" {
interfaces.push(fields[0].to_string()); // e.g., "eth0"
break;
}
}
    // 2. Find bridge interfaces
    // Bridges expose a /sys/class/net/<name>/bridge directory in sysfs
    for entry in std::fs::read_dir("/sys/class/net").unwrap() {
        let name = entry.unwrap().file_name().to_string_lossy().to_string();
        let is_bridge =
            std::path::Path::new(&format!("/sys/class/net/{name}/bridge")).exists();
        // Well-known CNI/Docker names as a fallback for unusual environments
        if is_bridge
            || matches!(name.as_str(), "cni0" | "docker0" | "cbr0")
            || name.starts_with("br-")
        {
            interfaces.push(name);
        }
    }
interfaces
}
// Attach TC classifiers to each discovered interface
fn attach_probes(interfaces: &[String], bpf: &mut Bpf) {
for iface in interfaces {
// Ingress: packets arriving at this interface
TcBuilder::new(bpf.program_mut("classifier_ingress"))
.ifname(iface)
.direction(TcAttachType::Ingress)
.build().unwrap()
.attach().unwrap();
// Egress: packets leaving this interface
TcBuilder::new(bpf.program_mut("classifier_egress"))
.ifname(iface)
.direction(TcAttachType::Egress)
.build().unwrap()
.attach().unwrap();
}
}
Enriching Flows with Pod Names (The IP lookup that works 95% of the time)
TC classifiers operate at the network layer. They see raw packet headers: source IP, destination IP, source port, destination port, and protocol. That's it. A 5-tuple. There's no process name, no container ID, no pod label.
This is a fundamental constraint of the TC hook point. TC classifiers run in the kernel's network stack during softirq processing (the kernel's way of handling interrupts from hardware, like a network card signaling "packet arrived"). There's no process context available because the packet processing isn't running on behalf of any specific process — it's the kernel doing work triggered by a hardware interrupt. The eBPF helper bpf_get_current_cgroup_id(), which could identify the container, returns 0 in this context.
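What the classifier can extract is just fixed-offset header fields. A userspace sketch of that same parsing, using the IPv4 and TCP header byte layout (offsets per RFC 791/793; no kernel context, just bytes):

```rust
use std::net::Ipv4Addr;

// Parse src/dst IPs and ports from a raw IPv4 packet with a TCP/UDP payload.
// This mirrors what a TC classifier reads at fixed offsets: it has no access
// to process or container identity, only these header bytes.
fn parse_5tuple(pkt: &[u8]) -> Option<(Ipv4Addr, Ipv4Addr, u16, u16, u8)> {
    if pkt.len() < 20 || pkt[0] >> 4 != 4 {
        return None; // not IPv4
    }
    let ihl = ((pkt[0] & 0x0f) as usize) * 4; // IP header length in bytes
    if pkt.len() < ihl + 4 {
        return None; // too short to contain L4 ports
    }
    let proto = pkt[9]; // 6 = TCP, 17 = UDP
    let src = Ipv4Addr::new(pkt[12], pkt[13], pkt[14], pkt[15]);
    let dst = Ipv4Addr::new(pkt[16], pkt[17], pkt[18], pkt[19]);
    // Ports are the first four bytes after the IP header, big-endian.
    let src_port = u16::from_be_bytes([pkt[ihl], pkt[ihl + 1]]);
    let dst_port = u16::from_be_bytes([pkt[ihl + 2], pkt[ihl + 3]]);
    Some((src, dst, src_port, dst_port, proto))
}

fn main() {
    // Minimal synthetic packet: IPv4/TCP, 10.244.0.5:49312 -> 10.244.0.6:80
    let mut pkt = [0u8; 24];
    pkt[0] = 0x45; // version 4, IHL 5 (20-byte header)
    pkt[9] = 6;    // TCP
    pkt[12..16].copy_from_slice(&[10, 244, 0, 5]);
    pkt[16..20].copy_from_slice(&[10, 244, 0, 6]);
    pkt[20..22].copy_from_slice(&49312u16.to_be_bytes());
    pkt[22..24].copy_from_slice(&80u16.to_be_bytes());
    println!("{:?}", parse_5tuple(&pkt));
}
```

Everything in that tuple is addressing, not identity, which is why the enrichment step below exists at all.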
So you need another way to map IPs to pods. The Kubernetes API provides it.
IP Enrichment Pipeline
How raw packet 5-tuples get enriched with Kubernetes pod names via IP lookup.
Pipeline stages: raw event (5-tuple) → IP lookup → enriched flow.
- Pod network (unique IPs): Each pod gets its own IP from the CNI. Lookup is 1:1 — unambiguous attribution.
- hostNetwork (shared node IP): Pods share the node IP. Multiple pods map to the same address, breaking 1:1 lookup.
The agent runs a background task that watches the Kubernetes API for pod events. Every time a pod is created, updated, or deleted, the watcher updates an in-memory cache mapping IP addresses to pod metadata.
// Background task: watch Kubernetes API for pod changes
async fn watch_pods(cache: Arc<PodCache>, client: kube::Client) {
    let pods: Api<Pod> = Api::all(client);
    // watcher() returns a Stream; pin it and pull events via futures' TryStreamExt
    let mut events = watcher(pods, watcher::Config::default()).boxed();
    while let Some(event) = events.try_next().await.unwrap() {
match event {
            // InitApply delivers pods during the initial list; treat like Apply
            Event::Apply(pod) | Event::InitApply(pod) => {
// Pod created or updated — extract IP and metadata
let ip = pod.status.as_ref()
.and_then(|s| s.pod_ip.as_ref());
if let Some(ip) = ip {
cache.insert_by_ip(ip.parse().unwrap(), PodInfo {
namespace: pod.metadata.namespace.unwrap_or_default(),
name: pod.metadata.name.unwrap_or_default(),
});
}
}
Event::Delete(pod) => {
// Pod deleted — remove from cache
if let Some(ip) = pod.status.as_ref()
.and_then(|s| s.pod_ip.as_ref())
{
cache.remove_by_ip(&ip.parse().unwrap());
}
}
            Event::Init | Event::InitDone => {
                // Initial list boundaries — cache is warm after InitDone
            }
}
}
}
When the main event loop reads a flow event from the eBPF ring buffer, it looks up both IPs in the cache:
// Main loop: poll ring buffer, enrich, aggregate
fn process_event(event: &NetworkFlowEvent, cache: &PodCache) -> EnrichedFlow {
let src_pod = cache.get_by_ip(&event.src_ip);
let dst_pod = cache.get_by_ip(&event.dst_ip);
// Direction-aware attribution:
// Ingress = packet arriving → destination is the local pod
// Egress = packet leaving → source is the local pod
let (namespace, pod_name) = if event.direction == INGRESS {
dst_pod.or(src_pod)
} else {
src_pod.or(dst_pod)
}.map(|p| (p.namespace.clone(), p.name.clone()))
.unwrap_or(("external".into(), "unknown".into()));
EnrichedFlow { namespace, pod_name, /* ... */ }
}
Where this works:
- Regular pods: Each has a unique IP assigned by the CNI. Lookup is unambiguous.
- Cross-node traffic: Source and destination are real pod IPs. Both resolve correctly.
Where this breaks:
- hostNetwork pods: Multiple pods share the node IP. Lookup returns whichever was cached last.
- External traffic: IPs from outside the cluster aren't in the pod cache. Labeled as external/unknown.
- Service ClusterIPs: If a packet is captured before DNAT, the destination is a virtual IP not in the pod cache.
For the 95% case — regular pods communicating directly — this approach is reliable and efficient. The cache is updated in real time via the Kubernetes watch API, so new pods are resolvable within seconds of creation.
Interface Discovery (Reading /proc to find where to attach)
You can't hardcode interface names. Different Kubernetes distributions, CNI plugins, and cloud providers use different interface names and bridge configurations. The agent needs to discover them at startup.
The primary interface is found by parsing /proc/net/route, a pseudo-file the kernel exposes:
$ cat /proc/net/route
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
eth0 00000000 0101A8C0 0003 0 0 100 00000000 0 0 0
cni0 0000F40A 00000000 0001 0 0 0 00FFFFFF 0 0 0
The entry with destination 00000000 (which is 0.0.0.0 — the default route) tells you the primary interface. In this case, eth0. The gateway 0101A8C0 is 192.168.1.1 in little-endian hex.
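Decoding those fields is a small byte-order exercise: parse the hex as a u32, then read its bytes in little-endian order. A sketch (the function name is mine):

```rust
use std::net::Ipv4Addr;

// /proc/net/route stores IPv4 addresses as little-endian hex:
// "0101A8C0" is the byte sequence C0 A8 01 01, i.e. 192.168.1.1.
fn parse_route_hex(hex: &str) -> Option<Ipv4Addr> {
    let raw = u32::from_str_radix(hex, 16).ok()?;
    // to_le_bytes() reverses the on-disk byte order back into address order.
    Some(Ipv4Addr::from(raw.to_le_bytes()))
}

fn main() {
    println!("{:?}", parse_route_hex("0101A8C0")); // the gateway above
    println!("{:?}", parse_route_hex("0000F40A")); // the cni0 destination
}
```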
Bridges are found by checking which interfaces have a /sys/class/net/<name>/bridge directory — a convention the kernel uses to expose bridge metadata. Some environments use well-known names (cni0, docker0, cbr0); others use generated names (br-a1b2c3d4 in Docker networks, or kind's bridge names).
The interface landscape is also dynamic. Veth interfaces are created and destroyed as pods come and go. Bridges can be reconfigured by CNI plugins. A robust agent runs discovery at startup and periodically re-scans, or watches for netlink events (the kernel's notification mechanism for network configuration changes) to react to interface changes in real time.
The Full Picture (What I wish I knew before building this)
Container networking is layers of abstraction, each solving one problem:
- Network namespaces give containers isolated network stacks
- Veth pairs connect those isolated stacks back to the host
- Bridges let containers on the same node talk to each other
- Routing tables direct traffic to the right interface
- CNI plugins orchestrate all of the above when pods are created
- Kubernetes Services and DNAT provide stable endpoints and load balancing
- hostNetwork: true bypasses all isolation for system-level tools
For observability, the mental model is: packets flow through specific interfaces, and each interface carries specific traffic. Where you observe determines what you see. IP-based enrichment works because Kubernetes guarantees pod IP uniqueness — except when pods opt out of their own namespace via hostNetwork.
The opening mystery resolves cleanly: hostNetwork pods don't get their own IP, so the IP → pod mapping is ambiguous. The fix isn't a smarter data structure — it's accepting that hostNetwork traffic belongs to the "host" and needs different attribution strategies (port-based disambiguation, or simply labeling it as host traffic).
Building an eBPF-powered observability tool taught me that the Linux network stack is not a monolith. It's a pipeline of well-defined stages, observable at each point, with different tradeoffs at each stage. Understanding that pipeline — not just memorizing it, but knowing why each piece exists and what traffic flows where — is the difference between observability that works and observability that lies to you.
Related: Kernel Space and eBPF: The Observability Revolution