When a packet arrives, the kernel normally allocates an sk_buff, walks it up the stack, and eventually a netfilter hook gets to decide its fate. That is a lot of work to do before you drop a packet you never wanted. XDP — eXpress Data Path — lets you run a small, verified program in the network driver, on the raw buffer, before any of that allocation happens. For dropping or redirecting at high packet rates, nothing in the normal stack comes close.
This is the entry point to eBPF for network engineers: not the theory, but a program you can load today.
Where XDP Sits
    NIC -> [ XDP program ] -> sk_buff alloc -> tc ingress -> netfilter -> stack
              ^ runs here, earliest possible point

An XDP program returns a verdict for each packet:
- XDP_DROP: discard immediately. The cheapest drop in Linux.
- XDP_PASS: let it continue up the stack as normal.
- XDP_TX: bounce it back out the same interface (load balancing, reflection).
- XDP_REDIRECT: send it to another interface or a userspace socket (AF_XDP).
Because it runs before sk_buff allocation, XDP_DROP can shed millions of packets per second per core — which is exactly why it is the modern answer for volumetric DDoS filtering at the host.
A Real Program: Drop UDP to a Port
eBPF programs are C compiled to BPF bytecode, then verified by the kernel before they load. Here is one that drops UDP destined to port 9999 and passes everything else:
    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <linux/udp.h>
    #include <linux/in.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_drop_udp(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* Ethernet header: bounds-check, then require IPv4. */
        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != __constant_htons(ETH_P_IP))
            return XDP_PASS;

        /* IPv4 header: bounds-check, then require UDP. */
        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;
        if (ip->protocol != IPPROTO_UDP)
            return XDP_PASS;

        /* UDP header starts after the variable-length IP header. */
        struct udphdr *udp = (void *)ip + ip->ihl * 4;
        if ((void *)(udp + 1) > data_end)
            return XDP_PASS;

        if (udp->dest == __constant_htons(9999))
            return XDP_DROP;

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

Every one of those > data_end bounds checks is mandatory. The verifier rejects the program if it cannot prove you never read past the packet buffer. That is the deal eBPF makes: you get to run in the kernel’s fast path, and in exchange the verifier guarantees your code cannot crash or loop forever. Fighting the verifier is the eBPF rite of passage.
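The bounds-check discipline can be exercised outside the kernel. The sketch below is a plain userspace C model of the same parse, not BPF code: classify_packet, the V_PASS/V_DROP verdicts and the length constants are hypothetical names for illustration, and it assumes an untagged Ethernet frame. Each header gets its own length check before any field is read, mirroring the > data_end tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts for this userspace model (not the kernel's enum). */
enum verdict { V_PASS = 0, V_DROP = 1 };

#define ETH_HDR_LEN     14    /* dst MAC + src MAC + EtherType */
#define IP_MIN_HDR_LEN  20    /* IPv4 header without options */
#define UDP_HDR_LEN     8
#define BLOCKED_PORT    9999

/* Read a big-endian 16-bit field; the caller has already bounds-checked it. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Model of xdp_drop_udp over a buffer of `len` bytes. */
enum verdict classify_packet(const uint8_t *data, size_t len)
{
    if (len < ETH_HDR_LEN)                        /* eth + 1 > data_end */
        return V_PASS;
    if (be16(data + 12) != 0x0800)                /* EtherType is not IPv4 */
        return V_PASS;

    const uint8_t *ip = data + ETH_HDR_LEN;
    if (len < ETH_HDR_LEN + IP_MIN_HDR_LEN)       /* ip + 1 > data_end */
        return V_PASS;
    if (ip[9] != 17)                              /* protocol is not UDP */
        return V_PASS;

    size_t ihl = (size_t)(ip[0] & 0x0f) * 4;      /* length taken from the packet */
    if (len < ETH_HDR_LEN + ihl + UDP_HDR_LEN)    /* udp + 1 > data_end */
        return V_PASS;

    const uint8_t *udp = ip + ihl;
    return be16(udp + 2) == BLOCKED_PORT ? V_DROP : V_PASS;
}
```

Feeding it truncated buffers is a cheap way to confirm that every early return fires before the corresponding read, which is the same property the verifier demands of the real program.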
Compile, Load, Inspect
    # Compile to BPF object
    clang -O2 -g -target bpf -c xdp_drop_udp.c -o xdp_drop_udp.o

    # Load onto an interface
    ip link set dev eth0 xdp obj xdp_drop_udp.o sec xdp

    # Confirm it's attached
    ip link show dev eth0    # shows "xdp" and the program id

    # Detach
    ip link set dev eth0 xdp off

bpftool is the inspection workhorse:

    bpftool prog show              # loaded programs, ids, types
    bpftool net show               # what's attached to which interface
    bpftool map dump name <map>    # contents of a BPF map

Maps: State That Survives Between Packets
A program alone is stateless per packet. Maps are the shared key/value store between the BPF program and userspace — counters, blocklists, config. A per-source-IP packet counter:
    struct {
        __uint(type, BPF_MAP_TYPE_LRU_HASH);
        __type(key, __u32);      // src IP
        __type(value, __u64);    // packet count
        __uint(max_entries, 1000000);
    } pps SEC(".maps");

The program increments the count for each source; userspace reads the map and decides whom to add to a drop list, which it does by writing to another map the program checks. This is how XDP-based DDoS mitigation works: data plane in BPF, control logic in userspace, communicating through maps, with no kernel rebuild and no packet copy.
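The userspace decision itself can be very small. As a minimal sketch, assuming the control loop reads one source's counter twice, a known interval apart, the functions below turn the two readings into a rate and compare it with a threshold. The names rate_pps and should_block, and the threshold parameter, are hypothetical; the article does not prescribe a policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packets per second between two reads of one counter entry.
 * Returns 0 if the counter went backwards (for example the LRU map
 * evicted and re-created the entry) or the interval is zero. */
uint64_t rate_pps(uint64_t prev, uint64_t cur, uint64_t interval_ms)
{
    if (cur < prev || interval_ms == 0)
        return 0;
    return (cur - prev) * 1000 / interval_ms;
}

/* True when the source exceeded the (assumed) per-source threshold. */
bool should_block(uint64_t prev, uint64_t cur,
                  uint64_t interval_ms, uint64_t threshold_pps)
{
    return rate_pps(prev, cur, interval_ms) > threshold_pps;
}
```

Working from deltas rather than absolute counts matters because the map value is a running total; a long-lived but quiet source would otherwise look like an offender.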
Generic vs Native vs Offload
XDP runs in three modes, and the mode decides the speed:
| Mode | Where | Speed |
|---|---|---|
| Generic (skb) | After sk_buff alloc | Slow — fallback, for testing |
| Native | In the driver | Fast — the point of XDP |
| Offload | On the NIC (SmartNIC) | Fastest — frees the CPU entirely |
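If your NIC driver lacks native XDP support, attaching with the plain xdp keyword silently falls back to generic mode and you get none of the performance. Check that ip link reports xdp (native) and not xdpgeneric before benchmarking, or you will conclude XDP is slow when you are just running it in the wrong mode. Attaching with xdpdrv instead of xdp forces native mode, so an unsupported driver fails loudly rather than falling back.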
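A custom loader can make the mode explicit in the same way. The sketch below maps a requested mode to the attach flag bit a loader would pass. The constants are local copies of XDP_FLAGS_SKB_MODE, XDP_FLAGS_DRV_MODE and XDP_FLAGS_HW_MODE from linux/if_link.h, redefined here only so the example compiles without kernel headers; the enum and the function name are hypothetical.

```c
#include <stdint.h>

/* Local copies of the attach-mode bits from <linux/if_link.h>. */
#define MODE_FLAG_SKB (1U << 1)   /* XDP_FLAGS_SKB_MODE: generic */
#define MODE_FLAG_DRV (1U << 2)   /* XDP_FLAGS_DRV_MODE: native  */
#define MODE_FLAG_HW  (1U << 3)   /* XDP_FLAGS_HW_MODE:  offload */

enum xdp_mode { MODE_AUTO, MODE_GENERIC, MODE_NATIVE, MODE_OFFLOAD };

/* Flag bits for the requested mode. MODE_AUTO sets no mode bit, which
 * leaves the choice to the kernel and permits the generic fallback. */
uint32_t attach_flags_for(enum xdp_mode mode)
{
    switch (mode) {
    case MODE_GENERIC: return MODE_FLAG_SKB;
    case MODE_NATIVE:  return MODE_FLAG_DRV;
    case MODE_OFFLOAD: return MODE_FLAG_HW;
    default:           return 0;
    }
}
```

A loader would pass the result as the flags argument of its attach call, for example libbpf's bpf_xdp_attach. Requesting MODE_NATIVE on a driver without support then produces an error instead of a slow attachment.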
Putting the Map to Work: Count, Then Drop
The counter map alone is observation. The mitigation pattern wires two maps together — one the program writes (per-source packet counts), one userspace writes and the program reads (the drop list). Here is the program side doing both: it bumps the per-source counter and checks the blocklist before deciding:
    struct {
        __uint(type, BPF_MAP_TYPE_LRU_HASH);
        __type(key, __u32);      // src IP
        __type(value, __u64);    // packet count
        __uint(max_entries, 1000000);
    } pps SEC(".maps");

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __type(key, __u32);      // src IP
        __type(value, __u8);     // 1 = blocked
        __uint(max_entries, 65536);
    } blocklist SEC(".maps");

    SEC("xdp")
    int xdp_count_drop(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != __constant_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *ip = (void *)(eth + 1);
        if ((void *)(ip + 1) > data_end)
            return XDP_PASS;

        __u32 src = ip->saddr;    // network byte order, used as-is for the key

        // Drop list first: blocked sources cost a single lookup.
        __u8 *blocked = bpf_map_lookup_elem(&blocklist, &src);
        if (blocked && *blocked)
            return XDP_DROP;

        // Count the packet: bump an existing entry or create it at 1.
        __u64 init = 1;
        __u64 *cnt = bpf_map_lookup_elem(&pps, &src);
        if (cnt)
            __sync_fetch_and_add(cnt, 1);
        else
            bpf_map_update_elem(&pps, &src, &init, BPF_ANY);

        return XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

Two details matter here. The null check after every bpf_map_lookup_elem (the if (cnt) test, and the blocked && guard) is not optional: the helper can return NULL, and the verifier rejects any dereference you have not proven is non-NULL. And __sync_fetch_and_add gives an atomic increment, which matters because the same map entry is touched concurrently across RX queues on different CPUs (for hot counters a BPF_MAP_TYPE_PERCPU_HASH avoids the contention entirely and you sum in userspace).
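For the per-CPU variant, the summing step in userspace is a plain reduction. This sketch assumes the map value is a __u64 and that the lookup has already filled an array with one slot per possible CPU (libbpf reports that count through libbpf_num_possible_cpus). The function name sum_percpu is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Total for one map key, given its per-CPU slots.
 * `values` must hold `ncpus` entries, one per possible CPU. */
uint64_t sum_percpu(const uint64_t *values, size_t ncpus)
{
    uint64_t total = 0;
    for (size_t i = 0; i < ncpus; i++)
        total += values[i];
    return total;
}
```

The trade is that each CPU increments its own slot without atomics, and the cost of combining them moves to the reader, which runs far less often than the data plane.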
Userspace then pins the maps, watches pps, and writes offenders into blocklist:
    # Programmatically the loader pins maps under bpffs; from the shell:
    bpftool map dump name pps
    bpftool map update name blocklist key 203 0 113 5 value 1

The control loop runs in userspace at whatever cadence you like; the data plane keeps dropping at line rate with a single hash lookup per packet. No rule reload, no sk_buff, no stack.
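The key bytes in that bpftool command are the raw bytes of the __u32 key. Because the program stores ip->saddr unchanged, the key is in network byte order, so the bytes are simply the dotted quad in order. A minimal sketch of building that key in a userspace controller follows; the function name blocklist_key is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Convert dotted-quad text to the 4 key bytes the blocklist map expects.
 * Returns 0 on success, -1 if `text` is not a valid IPv4 address. */
int blocklist_key(const char *text, uint8_t key[4])
{
    struct in_addr addr;

    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;
    memcpy(key, &addr.s_addr, 4);   /* s_addr is already in network order */
    return 0;
}
```

Converting with ntohl before writing the key would reverse the bytes on little-endian hosts, and the program's lookup would never match.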
Verifier Errors You Will Actually Hit
The verifier log is dense but the failures cluster into a few patterns. Read the log bottom-up — the last lines are where it gave up:
    ; if (udp->dest == __constant_htons(9999))
    invalid access to packet, off=22 size=2, R3 pkt_end ...
    math between pkt pointer and register with unbounded value

That “invalid access to packet” means a bounds check is missing or the compiler could not connect it to the access. The fixes, in order of how often they are the real cause:
- Missing > data_end check before reading a header. Every pointer advance needs its own check; checking eth does not cover ip.
- Variable-offset access the verifier cannot bound. ip->ihl * 4 is attacker-controlled (4 bits, 0–60 bytes). After computing udp = (void *)ip + ip->ihl * 4, the (void *)(udp + 1) > data_end check is what makes that access provable; without it the offset is “unbounded” to the verifier.
- Loops without a clear bound. Pre-5.3 kernels reject any back-edge; modern kernels accept bounded loops, but the verifier must be able to prove the trip count is finite, so keep the bound an obvious constant. #pragma unroll on a small fixed loop sidesteps it.
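The first and third patterns often meet when skipping VLAN tags. The sketch below is a userspace C model, not BPF code: it walks at most MAX_VLAN_DEPTH tags with a compile-time constant loop bound and re-checks the length for every tag it steps over. The function name l3_offset and the depth of two are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VLAN_DEPTH 2   /* assumed cap: at most two stacked tags */

/* Return the offset of the L3 header, or 0 if the frame is truncated.
 * *ethertype receives the EtherType found after skipping up to
 * MAX_VLAN_DEPTH VLAN tags. */
size_t l3_offset(const uint8_t *data, size_t len, uint16_t *ethertype)
{
    size_t off = 12;                              /* EtherType field offset */
    if (len < off + 2)
        return 0;
    uint16_t proto = (uint16_t)((data[off] << 8) | data[off + 1]);
    off += 2;

    for (int i = 0; i < MAX_VLAN_DEPTH; i++) {    /* constant trip count */
        if (proto != 0x8100 && proto != 0x88a8)   /* 802.1Q / 802.1ad */
            break;
        if (len < off + 4)                        /* each tag gets its own check */
            return 0;
        proto = (uint16_t)((data[off + 2] << 8) | data[off + 3]);
        off += 4;
    }
    *ethertype = proto;
    return off;
}
```

In a real XDP program the same shape applies with data_end comparisons in place of the length tests; the constant bound is what lets the verifier, or #pragma unroll on older kernels, establish that the loop terminates.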
When stuck, raise the log verbosity at load time, either with bpftool -d prog load ... or by setting a higher log_level in the BPF_PROG_LOAD attributes your loader passes, and the verifier prints the register state at the failing instruction, which usually names the unchecked pointer outright.
Where XDP Fits
- Use XDP for high-rate drop/redirect: DDoS scrubbing, load balancing (this is how Katran works, and how Cilium accelerates its load balancer), DDoS-resistant front doors.
- Use nftables/tc for ordinary host firewalling and shaping — the rate is fine and the ergonomics are far better.
- Use DPDK only when you are bypassing the kernel entirely for a dedicated dataplane appliance — a much bigger commitment than XDP, which coexists with the normal stack.
You do not write XDP for your office firewall. You write it when packets-per-second is the bottleneck and the normal stack’s per-packet overhead is the thing you cannot afford. For everyone else, the value is understanding that the tools you do use — Cilium, modern load balancers, Facebook-scale DDoS mitigation — are built on exactly this, running your logic in the driver before the kernel even knows the packet exists.