eBPF and XDP for Fast Packet Processing: A Practical Intro

When a packet arrives, the kernel normally allocates an sk_buff, walks it up the stack, and eventually a netfilter hook gets to decide its fate. That is a lot of work to do before you drop a packet you never wanted. XDP — eXpress Data Path — lets you run a small, verified program in the network driver, on the raw buffer, before any of that allocation happens. For dropping or redirecting at high packet rates, nothing in the normal stack comes close.

This is the entry point to eBPF for network engineers: not the theory, but a program you can load today.

Where XDP Sits

NIC -> [ XDP program ] -> sk_buff alloc -> tc ingress -> netfilter -> stack
         ^ runs here, earliest possible point

An XDP program returns a verdict for each packet:

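  • XDP_DROP: discard immediately. The cheapest drop in Linux.
  • XDP_PASS: let it continue up the stack as normal.
  • XDP_TX: bounce it back out the interface it arrived on (load balancing, reflection).
  • XDP_REDIRECT: send it to another interface, another CPU, or a userspace socket (AF_XDP).
  • XDP_ABORTED: the error verdict. The packet is dropped and the xdp_exception tracepoint fires, so reserve it for "this should never happen" paths.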

Because it runs before sk_buff allocation, XDP_DROP can shed millions of packets per second per core — which is exactly why it is the modern answer for volumetric DDoS filtering at the host.
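Where does "millions of packets per second" come from? Here is a back-of-the-envelope sketch. On Ethernet, every frame also costs 20 bytes on the wire: the preamble, the start delimiter and the inter-frame gap. A 10 Gb/s link full of minimum-size 64-byte frames therefore delivers about 14.88 million frames per second, which leaves roughly 67 ns per packet if one core had to absorb them all. The helper names below are illustrative only.

```c
#include <stdio.h>

/* Extra bytes every Ethernet frame occupies on the wire: 7-byte preamble,
 * 1-byte start-of-frame delimiter, 12-byte minimum inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Maximum frames per second a link can carry. frame_bytes includes the
 * 4-byte FCS, so the smallest legal Ethernet frame is 64. */
double line_rate_pps(double link_bps, double frame_bytes)
{
    return link_bps / ((frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Nanoseconds available per packet if a single core must keep up. */
double ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/* Print the packet rate and per-packet time budget for one link/frame size. */
void print_budget(double link_gbps, double frame_bytes)
{
    double pps = line_rate_pps(link_gbps * 1e9, frame_bytes);
    printf("%.0f Gb/s, %.0f-byte frames: %.2f Mpps, %.1f ns per packet\n",
           link_gbps, frame_bytes, pps / 1e6, ns_per_packet(pps));
}
```

With these numbers, print_budget(10, 64) reports about 14.88 Mpps and 67.2 ns per packet. At 100 Gb/s the same frames arrive at about 148.8 Mpps, under 7 ns each. A budget that small leaves no room for an sk_buff allocation per packet.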

A Real Program: Drop UDP to a Port

eBPF programs are C compiled to BPF bytecode, then verified by the kernel before they load. Here is one that drops UDP destined to port 9999 and passes everything else:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_udp(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // Ethernet header
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;

    // Fixed 20-byte part of the IPv4 header
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_UDP)
        return XDP_PASS;
    if (ip->ihl < 5)                                  // malformed: shorter than a legal header
        return XDP_PASS;
    if (ip->frag_off & __constant_htons(0x1FFF))      // non-first fragment: no UDP header here
        return XDP_PASS;

    // UDP header follows the variable-length (20..60 byte) IP header
    struct udphdr *udp = (void *)ip + ip->ihl * 4;
    if ((void *)(udp + 1) > data_end)
        return XDP_PASS;
    if (udp->dest == __constant_htons(9999))
        return XDP_DROP;
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Every one of those > data_end bounds checks is mandatory. The verifier rejects the program if it cannot prove you never read past the packet buffer. That is the deal eBPF makes: you get to run in the kernel’s fast path, and in exchange the verifier guarantees your code cannot crash or loop forever. Fighting the verifier is the eBPF rite of passage.
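To see exactly what those checks protect, here is the same parsing logic ported to ordinary userspace C. It operates on a byte buffer instead of struct xdp_md and adds guards for a malformed header length and for non-first fragments. Treat it as a teaching sketch, not something the kernel loads. The classify_frame name and the VERDICT_* values are hypothetical. Each "len < off + n" test plays the role of a "> data_end" check; comparing against the length avoids forming out-of-range pointers in userspace C. Because it runs in userspace, you can feed it truncated or malformed frames and confirm it never reads past the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for XDP_PASS / XDP_DROP. */
enum verdict { VERDICT_PASS, VERDICT_DROP };

#define ETH_HDR_LEN     14      /* dst MAC + src MAC + EtherType */
#define IPV4_MIN_LEN    20
#define UDP_HDR_LEN     8
#define ETHERTYPE_IPV4  0x0800
#define IPPROTO_UDP_NUM 17

/* Big-endian 16-bit read; callers have already bounds-checked p[0..1]. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Userspace mirror of xdp_drop_udp: drop IPv4/UDP to dst_port, pass the rest.
 * Every length test runs before the bytes it guards are read. */
enum verdict classify_frame(const uint8_t *data, size_t len, uint16_t dst_port)
{
    if (len < ETH_HDR_LEN)
        return VERDICT_PASS;
    if (rd16(data + 12) != ETHERTYPE_IPV4)
        return VERDICT_PASS;

    size_t ip_off = ETH_HDR_LEN;
    if (len < ip_off + IPV4_MIN_LEN)
        return VERDICT_PASS;
    const uint8_t *ip = data + ip_off;
    unsigned ihl = ip[0] & 0x0f;              /* header length in 32-bit words */
    if (ip[9] != IPPROTO_UDP_NUM)
        return VERDICT_PASS;
    if (ihl < 5)                              /* shorter than a legal IPv4 header */
        return VERDICT_PASS;
    if (rd16(ip + 6) & 0x1fff)                /* non-first fragment: no UDP header */
        return VERDICT_PASS;

    size_t udp_off = ip_off + ihl * 4;        /* 20..60 bytes past the IP header start */
    if (len < udp_off + UDP_HDR_LEN)
        return VERDICT_PASS;
    return rd16(data + udp_off + 2) == dst_port ? VERDICT_DROP : VERDICT_PASS;
}
```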

Compile, Load, Inspect

# Compile to BPF object (on Debian/Ubuntu, add -I/usr/include/$(uname -m)-linux-gnu if clang cannot find <asm/types.h>)
clang -O2 -g -target bpf -c xdp_drop_udp.c -o xdp_drop_udp.o
# Load onto an interface
ip link set dev eth0 xdp obj xdp_drop_udp.o sec xdp
# Confirm it's attached
ip link show dev eth0 # shows "xdp" and the program id
# Detach
ip link set dev eth0 xdp off

bpftool is the inspection workhorse:

bpftool prog show # loaded programs, ids, types
bpftool net show # what's attached to which interface
bpftool map dump name <map> # contents of a BPF map

Maps: State That Survives Between Packets

On its own, a program keeps no state from one packet to the next. Maps are the shared key/value store between the BPF program and userspace: counters, blocklists, config. A per-source-IP packet counter:

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __type(key, __u32);          // src IP
    __type(value, __u64);        // packet count
    __uint(max_entries, 1000000);
} pps SEC(".maps");

The program increments the count for each source; userspace reads the map and decides whom to add to a drop list — which it does by writing to another map the program checks. This is how XDP-based DDoS mitigation works: data plane in BPF, control logic in userspace, communicating through maps, with no kernel rebuild and no packet copy.
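The userspace half of that loop can be very small. Below is a sketch of just the decision step. It assumes the loader has already copied two snapshots of pps, taken interval_s seconds apart, into plain arrays. The struct and function names are hypothetical. A real loader would read the map through libbpf or bpftool and write the results into the drop-list map. The linear search is for clarity only.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry copied out of the pps map (hypothetical layout for this sketch). */
struct src_count {
    uint32_t saddr;      /* source IPv4, network byte order, as stored in pps */
    uint64_t packets;    /* cumulative count at snapshot time */
};

/* Count for saddr in the earlier snapshot, or 0 if it was not there. */
static uint64_t count_before(const struct src_count *prev, size_t n_prev, uint32_t saddr)
{
    for (size_t i = 0; i < n_prev; i++)
        if (prev[i].saddr == saddr)
            return prev[i].packets;
    return 0;                                /* new source, or evicted by the LRU */
}

/* Write every source whose packet rate between the two snapshots exceeds
 * max_pps into offenders[]; returns how many were written (<= max_out). */
size_t select_offenders(const struct src_count *prev, size_t n_prev,
                        const struct src_count *cur, size_t n_cur,
                        double interval_s, double max_pps,
                        uint32_t *offenders, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < n_cur && n < max_out; i++) {
        uint64_t before = count_before(prev, n_prev, cur[i].saddr);
        /* If the LRU evicted and re-created the entry, the count restarted. */
        uint64_t delta = cur[i].packets >= before ? cur[i].packets - before
                                                  : cur[i].packets;
        if ((double)delta / interval_s > max_pps)
            offenders[n++] = cur[i].saddr;
    }
    return n;
}
```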

Generic vs Native vs Offload

XDP runs in three modes, and the mode decides the speed:

Mode            Where                    Speed
Generic (skb)   After sk_buff alloc      Slow: a fallback, for testing
Native          In the driver            Fast: the point of XDP
Offload         On the NIC (SmartNIC)    Fastest: frees the host CPU entirely

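If your NIC driver lacks native XDP support and you attach with plain xdp, the kernel silently uses generic mode and you get none of the performance. Before benchmarking, check that ip link show reports xdp (native) and not xdpgeneric. Alternatively, attach with ip link set dev eth0 xdpdrv obj xdp_drop_udp.o sec xdp, which forces native mode and fails loudly instead of falling back. Otherwise you will conclude XDP is slow when you are just running it in the wrong mode.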
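That check is easy to script before a benchmark run. Here is a sketch in C that scans one line of ip link show output for the attach-mode keyword. It assumes the iproute2 output format, where the keyword (xdp, xdpgeneric, xdpoffload, or xdpmulti when several modes are attached) appears as its own space-separated token after the MTU. The function and enum names are hypothetical.

```c
#include <string.h>

enum xdp_attach_mode { MODE_NONE, MODE_NATIVE, MODE_GENERIC, MODE_OFFLOAD, MODE_MULTI };

/* Nonzero if `word` occurs in `line` as a whole space-delimited token. */
static int has_token(const char *line, const char *word)
{
    size_t wlen = strlen(word);
    for (const char *p = strstr(line, word); p; p = strstr(p + 1, word)) {
        int starts = (p == line) || (p[-1] == ' ');
        int ends = (p[wlen] == ' ') || (p[wlen] == '\0') || (p[wlen] == '\n');
        if (starts && ends)
            return 1;
    }
    return 0;
}

/* Classify the XDP mode from the first line of `ip link show dev <if>`.
 * "prog/xdp" on the following line does not match, because '/' is not a
 * token boundary. */
enum xdp_attach_mode xdp_mode_from_ip_link(const char *line)
{
    if (has_token(line, "xdpgeneric"))
        return MODE_GENERIC;
    if (has_token(line, "xdpoffload"))
        return MODE_OFFLOAD;
    if (has_token(line, "xdpmulti"))
        return MODE_MULTI;
    if (has_token(line, "xdp"))
        return MODE_NATIVE;
    return MODE_NONE;
}
```

A benchmark script built on this would refuse to start unless the mode is MODE_NATIVE (or MODE_OFFLOAD on a SmartNIC).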

Putting the Map to Work: Count, Then Drop

The counter map alone is observation. The mitigation pattern wires two maps together — one the program writes (per-source packet counts), one userspace writes and the program reads (the drop list). Here is the program side doing both: it bumps the per-source counter and checks the blocklist before deciding:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>

// Per-source packet counts: written by the program, read by userspace
struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __type(key, __u32);          // src IP (network byte order)
    __type(value, __u64);        // packet count
    __uint(max_entries, 1000000);
} pps SEC(".maps");

// Drop list: written by userspace, read by the program
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __type(key, __u32);          // src IP (network byte order)
    __type(value, __u8);         // 1 = blocked
    __uint(max_entries, 65536);
} blocklist SEC(".maps");

SEC("xdp")
int xdp_count_drop(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_PASS;
    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    __u32 src = ip->saddr;

    // Blocked sources are dropped after a single lookup
    __u8 *blocked = bpf_map_lookup_elem(&blocklist, &src);
    if (blocked && *blocked)
        return XDP_DROP;

    // Everyone else is counted, then passed
    __u64 *cnt = bpf_map_lookup_elem(&pps, &src);
    if (cnt) {
        __sync_fetch_and_add(cnt, 1);    // atomic: other CPUs may hit the same entry
    } else {
        __u64 init = 1;
        // BPF_NOEXIST: if another CPU created the entry first, keep its count
        bpf_map_update_elem(&pps, &src, &init, BPF_NOEXIST);
    }
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Two verifier-relevant details. First, the if (cnt) null check after every bpf_map_lookup_elem is not optional: the helper can return NULL, and the verifier rejects any dereference you have not proven is non-NULL. Second, __sync_fetch_and_add gives an atomic increment, which matters because RX queues on different CPUs touch the same map entry concurrently. For hot counters, a per-CPU map avoids the contention entirely, and you sum the per-CPU values in userspace. Here that would be BPF_MAP_TYPE_LRU_PERCPU_HASH, since pps is an LRU map.
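With a per-CPU map, each CPU increments its own private copy of the value, so no atomic is needed. The price is that a userspace lookup returns one value per possible CPU, and you add them up yourself. Below is a sketch of that userspace side. It assumes the values have already been copied into a buffer with one slot per possible CPU; libbpf's libbpf_num_possible_cpus() reports that count, and each slot is padded to 8 bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer size userspace must supply for one lookup on a per-CPU map:
 * each CPU's copy of the value is padded to a multiple of 8 bytes. */
size_t percpu_buf_bytes(size_t value_size, size_t ncpus)
{
    return ((value_size + 7) & ~(size_t)7) * ncpus;
}

/* Total a __u64 per-CPU counter. An 8-byte value needs no padding, so the
 * buffer is simply one uint64_t per possible CPU. Other CPUs may still be
 * incrementing while the kernel copies the slots out, so at high rates the
 * total is an approximate snapshot. */
uint64_t sum_percpu_u64(const uint64_t *slots, size_t ncpus)
{
    uint64_t total = 0;
    for (size_t cpu = 0; cpu < ncpus; cpu++)
        total += slots[cpu];
    return total;
}
```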

Userspace then pins the maps, watches pps, and writes offenders into blocklist:

# A loader program would normally pin these maps under /sys/fs/bpf; from the shell, bpftool can also find them by name:
bpftool map dump name pps
bpftool map update name blocklist key 203 0 113 5 value 1
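The four numbers after key are the raw bytes of the __u32 key as they sit in memory. The program stores ip->saddr unconverted, in network byte order, so 203.0.113.5 is simply 203 0 113 5 regardless of host endianness. The sketch below turns a dotted-quad string into those bytes using the standard inet_pton(). The function names are hypothetical.

```c
#include <arpa/inet.h>   /* inet_pton, struct in_addr */
#include <sys/socket.h>  /* AF_INET */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert "a.b.c.d" into the 4 key bytes for a __u32 key that holds
 * ip->saddr as-is. Returns 0 on success, -1 if not a valid IPv4 address. */
int ipv4_key_bytes(const char *dotted, uint8_t out[4])
{
    struct in_addr addr;
    if (inet_pton(AF_INET, dotted, &addr) != 1)
        return -1;
    /* s_addr is already in network byte order, the same layout as
     * ip->saddr, so its bytes are copied verbatim. */
    memcpy(out, &addr.s_addr, 4);
    return 0;
}

/* Format e.g. "key 203 0 113 5" for a bpftool map update command line. */
int format_bpftool_key(const char *dotted, char *buf, size_t buflen)
{
    uint8_t b[4];
    if (ipv4_key_bytes(dotted, b) != 0)
        return -1;
    snprintf(buf, buflen, "key %u %u %u %u",
             (unsigned)b[0], (unsigned)b[1], (unsigned)b[2], (unsigned)b[3]);
    return 0;
}
```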

The control loop runs in userspace at whatever cadence you like. Meanwhile the data plane keeps dropping at line rate, at the cost of a single hash lookup per blocked packet. No rule reload, no sk_buff, no stack.

Verifier Errors You Will Actually Hit

The verifier log is dense but the failures cluster into a few patterns. Read the log bottom-up — the last lines are where it gave up:

; if (udp->dest == __constant_htons(9999))
invalid access to packet, off=22 size=2, R3 pkt_end ...
math between pkt pointer and register with unbounded value

That “invalid access to packet” means a bounds check is missing or the compiler could not connect it to the access. The fixes, in order of how often they are the real cause:

  • Missing > data_end check before reading a header. Every pointer advance needs its own check; checking eth does not cover ip.
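  • Variable-offset access the verifier cannot bound. ip->ihl * 4 is attacker-controlled (4 bits, so 0 to 60 bytes). The verifier tracks that range, but it cannot know whether a UDP header at that offset lies inside this particular packet. After computing udp = (void *)ip + ip->ihl * 4, the (void *)(udp + 1) > data_end check is what makes the access provable. If the verifier has no range at all for an offset, it refuses the pointer arithmetic itself with the "math between pkt pointer and register" message above.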
  • Loops without a clear bound. Kernels before 5.3 reject any back-edge. Later kernels accept loops the verifier can prove terminate, so keep the trip count a small constant. #pragma unroll on a small fixed loop sidesteps the issue on any kernel, and 5.17 and later also offer the bpf_loop() helper.
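A typical place for a loop in XDP is skipping VLAN tags. The sketch below shows the bounded-loop pattern in plain userspace C so it can be tested. The depth limit is a small compile-time constant (MAX_VLAN_DEPTH, a hypothetical name), and every iteration re-checks the bounds before reading. In a BPF program you could also put #pragma unroll on the loop for older kernels. 0x8100 (802.1Q) and 0x88A8 (802.1ad) are the standard VLAN EtherTypes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VLAN_DEPTH   2        /* hypothetical limit: handles QinQ, no deeper */
#define ETHERTYPE_8021Q  0x8100
#define ETHERTYPE_8021AD 0x88A8

static int is_vlan(uint16_t proto)
{
    return proto == ETHERTYPE_8021Q || proto == ETHERTYPE_8021AD;
}

/* Walk past up to MAX_VLAN_DEPTH VLAN tags after the MAC addresses.
 * Returns the inner EtherType and sets *l3_off to the network header offset;
 * returns 0 if the frame is truncated or carries more tags than we parse. */
uint16_t skip_vlans(const uint8_t *data, size_t len, size_t *l3_off)
{
    size_t off = 12;                          /* EtherType follows dst+src MAC */
    if (len < off + 2)
        return 0;
    uint16_t proto = (uint16_t)((data[off] << 8) | data[off + 1]);
    off += 2;

    /* Constant trip count, so termination is obvious to the verifier
     * (in BPF, #pragma unroll would flatten it entirely). */
    for (int i = 0; i < MAX_VLAN_DEPTH; i++) {
        if (!is_vlan(proto))
            break;
        /* 4-byte tag remainder: 2 bytes TCI, then the next EtherType */
        if (len < off + 4)
            return 0;
        proto = (uint16_t)((data[off + 2] << 8) | data[off + 3]);
        off += 4;
    }
    if (is_vlan(proto))                       /* deeper than we are willing to go */
        return 0;
    *l3_off = off;
    return proto;
}
```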

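When stuck, raise the log verbosity at load time. The verifier then prints the register state at the failing instruction, which usually names the unchecked pointer outright. From the shell, bpftool -d prog load ... prints the full verifier log, and adding verbose to ip link set ... xdp obj ... does the same through iproute2. In a libbpf loader, set kernel_log_level in bpf_object_open_opts.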

Where XDP Fits

  • Use XDP for high-rate drop and redirect: DDoS scrubbing and layer-4 load balancing. Katran is built on XDP, and Cilium can use XDP to accelerate its load balancer.
  • Use nftables/tc for ordinary host firewalling and shaping — the rate is fine and the ergonomics are far better.
  • Use DPDK only when you are bypassing the kernel entirely for a dedicated dataplane appliance — a much bigger commitment than XDP, which coexists with the normal stack.

You do not write XDP for your office firewall. You write it when packets-per-second is the bottleneck and the normal stack's per-packet overhead is the thing you cannot afford. For everyone else, the value lies in understanding the tools you do use. Cilium, modern load balancers, and Facebook-scale DDoS mitigation are all built on this same machinery, often running their logic in the driver before the kernel even knows the packet exists.