Traffic Control with tc: Shaping, Policing, and HTB

tc has a reputation for being write-only — you build a working config once, it survives by luck, and nobody dares touch it. That reputation comes from skipping the model and copying incantations off forums. The model itself is small: three object types, a clear hierarchy, and a couple of qdiscs you will actually use. Learn those and tc stops being scary.

The Three Objects

  • qdisc (queuing discipline) — the algorithm that decides packet ordering and timing on egress. Attached to an interface (the root qdisc) or to a class.
  • class — a subdivision of a classful qdisc’s bandwidth. Classes nest into a tree.
  • filter — rules that sort packets into classes.

Egress only. The kernel queues packets on the way out an interface; you cannot truly shape what is already arriving (more on ingress later).

fq_codel: Fix Bufferbloat First

Before any fancy shaping, the single highest-value change on most links is replacing a dumb FIFO with fq_codel, which keeps latency low under load by managing queue depth per flow:

Terminal window
tc qdisc replace dev eth0 root fq_codel

On many systems this is already the default. If a link feels laggy under load — a big upload destroying your ping — this one line is often the entire fix. Shaping is for dividing bandwidth; fq_codel is for keeping it responsive.
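
To see whether fq_codel is already the system default, one option is to read the kernel's default-qdisc setting. The sketch below assumes a Linux host that exposes /proc/sys/net/core/default_qdisc; the helper name default_qdisc is made up for illustration. The setting only governs qdiscs attached from now on, so it is still worth confirming the interface itself with tc qdisc show dev eth0.

```shell
# Hypothetical helper: report the kernel's default qdisc, or "unknown"
# when the sysctl file is not readable (e.g. inside some containers).
default_qdisc() {
    f=/proc/sys/net/core/default_qdisc
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
    fi
}

echo "default qdisc: $(default_qdisc)"
```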

HTB: Sharing Bandwidth With Guarantees

Hierarchical Token Bucket is the classful qdisc for “give each class a guaranteed minimum, let it borrow up to a ceiling when others are idle.” That borrowing behavior is the whole reason to use HTB over fixed rate limits.

Shape a 100 Mbit link, split between VoIP, business traffic, and bulk:

Terminal window
DEV=eth0
tc qdisc del dev $DEV root 2>/dev/null
# Root HTB, default unclassified traffic to class 30
tc qdisc add dev $DEV root handle 1: htb default 30
# Parent class = total link capacity
tc class add dev $DEV parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
# VoIP: 20mbit guaranteed, can borrow up to the full link
tc class add dev $DEV parent 1:1 classid 1:10 htb rate 20mbit ceil 100mbit prio 0
# Business: 50mbit guaranteed
tc class add dev $DEV parent 1:1 classid 1:20 htb rate 50mbit ceil 100mbit prio 1
# Bulk: 30mbit guaranteed, lowest priority
tc class add dev $DEV parent 1:1 classid 1:30 htb rate 30mbit ceil 100mbit prio 2

rate is the guarantee; ceil is the cap when borrowing. Because every class has a ceil of 100mbit, an idle link lets any class use the whole pipe, but the moment VoIP needs its 20mbit, HTB reclaims it from the borrowers. prio decides who is offered spare bandwidth first: lower numbers win, so VoIP borrows ahead of bulk. Guarantee plus opportunistic sharing.
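
The guarantees only hold if the child rates fit inside the parent's rate; tc will generally accept an oversubscribed tree without complaint. A minimal sanity check, assuming rates are given as whole mbit values (check_rates is a hypothetical helper, not part of tc):

```shell
# Hypothetical helper: verify child guarantees fit inside the parent rate.
# Usage: check_rates PARENT_MBIT CHILD_MBIT...
check_rates() {
    parent=$1
    shift
    sum=0
    for r in "$@"; do
        sum=$((sum + r))
    done
    if [ "$sum" -le "$parent" ]; then
        echo "ok: children sum to ${sum}mbit of ${parent}mbit"
    else
        echo "oversubscribed: children sum to ${sum}mbit, parent has ${parent}mbit"
        return 1
    fi
}

# The tree above: 20 + 50 + 30 against a 100mbit parent
check_rates 100 20 50 30
```

Running this before applying a changed tree catches the case where someone raises one guarantee and forgets to lower another.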

Add fq_codel as a leaf qdisc under each class so flows within a class stay responsive:

Terminal window
tc qdisc add dev $DEV parent 1:10 handle 110: fq_codel
tc qdisc add dev $DEV parent 1:20 handle 120: fq_codel
tc qdisc add dev $DEV parent 1:30 handle 130: fq_codel

Filters: Sorting Into Classes

Classes are useless until packets land in them. Filters do the sorting. Match VoIP by DSCP (EF), business by port:

Terminal window
# DSCP EF (46) -> VoIP class
tc filter add dev $DEV parent 1: protocol ip prio 1 \
u32 match ip tos 0xb8 0xfc flowid 1:10
# TCP 443 -> business (protocol 6 = TCP; add 'match ip dst <app-subnet>' to restrict by destination)
tc filter add dev $DEV parent 1: protocol ip prio 2 \
    u32 match ip protocol 6 0xff match ip dport 443 0xffff flowid 1:20

u32 matching is the classic, terse way. The 0xb8 is DSCP EF (46) shifted left two bits into the old TOS byte, and the 0xfc mask ignores the two ECN bits below it. For readability, tc also supports the flower classifier on modern kernels, but u32 is universal.
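
The TOS value for any other DSCP class follows the same arithmetic: shift the DSCP number left by two bits. A small sketch (dscp_to_tos is a hypothetical helper name):

```shell
# Hypothetical helper: convert a DSCP value (0-63) to the TOS byte that
# 'u32 match ip tos' expects. The low two bits are ECN, hence the shift.
dscp_to_tos() {
    printf '0x%02x\n' $(( $1 << 2 ))
}

dscp_to_tos 46   # EF   -> 0xb8
dscp_to_tos 34   # AF41 -> 0x88
```

Pair the result with the 0xfc mask so ECN marking does not break the match.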

Ingress: The IFB Trick

You cannot shape ingress directly — the packet is already in the host. The standard workaround is to redirect ingress to an Intermediate Functional Block device and shape its egress:

Terminal window
modprobe ifb
ip link set dev ifb0 up
# Redirect all ingress on eth0 to ifb0 (protocol all, so IPv6 is included too)
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all u32 \
    match u32 0 0 action mirred egress redirect dev ifb0
# Now shape ifb0's egress (= eth0's ingress) with HTB as above
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 50mbit

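Ingress shaping is always approximate: by the time you drop a packet, it has already crossed the link. It works because dropping or delaying packets makes TCP senders slow down, so it is effective for keeping responsive flows from saturating the downstream and does little against traffic that ignores loss.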

Verifying and Debugging

Terminal window
# The class tree with live byte/packet counts and borrow stats
tc -s class show dev eth0
# Qdisc stats — look at drops and 'overlimits'
tc -s qdisc show dev eth0
# Filters and where they point
tc filter show dev eth0

tc -s class show is where you confirm reality matches intent: each class shows bytes sent and how often it hit its ceiling. If a class shows zero bytes, your filter is not matching — check the filter before blaming the shaper.
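
Spotting a zero-byte class by eye gets tedious on a larger tree. The sketch below filters the output for classes that have sent nothing; idle_classes is a hypothetical helper, and the sample numbers are illustrative only.

```shell
# Hypothetical helper: list HTB classes whose "Sent" byte counter is still zero.
idle_classes() {
    awk '
        $1 == "class" && $2 == "htb" { id = $3 }   # remember the current classid
        $1 == "Sent" && $2 == 0      { print id }  # zero bytes sent so far
    '
}

# Real use (needs tc and a shaped device):
#   tc -s class show dev eth0 | idle_classes
# Demo on abbreviated, made-up sample output; prints the idle class 1:10:
idle_classes <<'EOF'
class htb 1:10 parent 1:1 leaf 110: prio 0 rate 20Mbit ceil 100Mbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
class htb 1:30 parent 1:1 leaf 130: prio 2 rate 30Mbit ceil 100Mbit
 Sent 52428800 bytes 34953 pkt (dropped 0, overlimits 0 requeues 0)
EOF
```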

Reading the Counters When a Class Misbehaves

The most common production complaint is “class X is being starved” or “the ceiling isn’t holding.” tc -s class show carries the numbers that settle it:

Terminal window
tc -s class show dev eth0
# class htb 1:10 parent 1:1 leaf 110: prio 0 rate 20Mbit ceil 100Mbit
# Sent 184320000 bytes 122880 pkt (dropped 0, overlimits 0 requeues 0)
# rate 18Mbit 1500pps
# lended: 40000 borrowed: 81920 giants: 0
# tokens: 14200 ctokens: 9100

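borrowed counts packets this class sent above its rate by borrowing from its parent; lended counts packets sent on the class's own tokens, which for an inner class such as 1:1 means bandwidth handed down to its children. Classes never lend to siblings directly: a sibling's unused rate stays with the parent, and the parent lends it out. So if a leaf shows borrowed climbing while the parent's lended climbs with it, sharing is working as designed. overlimits climbing means the class is hitting ceil: expected for bulk, a red flag for VoIP. Negative tokens means the class has used up its rate and can only send by borrowing; negative ctokens means it has hit ceil and its packets are queuing.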
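
To compare these counters across classes without scrolling, the output can be condensed to one line per class. A minimal sketch, assuming the field layout shown in the sample above (summarize_classes is a hypothetical helper):

```shell
# Hypothetical helper: one line per HTB class with its lended/borrowed counters.
summarize_classes() {
    awk '
        $1 == "class" && $2 == "htb" { id = $3 }
        $1 == "lended:" { printf "%s lended=%s borrowed=%s\n", id, $2, $4 }
    '
}

# Real use: tc -s class show dev eth0 | summarize_classes
# Demo on the sample output from this section:
summarize_classes <<'EOF'
class htb 1:10 parent 1:1 rate 20Mbit ceil 100Mbit
 Sent 184320000 bytes 122880 pkt (dropped 0, overlimits 0 requeues 0)
 rate 18Mbit 1500pps
 lended: 40000 borrowed: 81920 giants: 0
 tokens: 14200 ctokens: 9100
EOF
```

Running it twice a few seconds apart and diffing the result shows which counters are actually moving.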

A subtle one: HTB enforces rate/ceil in bits on the wire, but it has to estimate per-packet overhead. On links with small packets (VoIP again) the default overhead model under-counts, so the shaper lets through slightly more than configured. Tell HTB about the link-layer framing so the math is honest:

Terminal window
# Account for Ethernet framing (or 'atm' for old DSL).
# 'change' because 1:10 already exists; 'add' would fail with "File exists".
tc class change dev $DEV parent 1:1 classid 1:10 htb \
    rate 20mbit ceil 100mbit prio 0 overhead 24 mpu 64 linklayer ethernet

mpu 64 sets the minimum packet unit so tiny packets are billed at the real minimum frame size, and overhead 24 covers the per-frame bytes the kernel does not count: 8 bytes of preamble, 4 bytes of FCS, and the 12-byte inter-frame gap. Skip this and your shaped 95mbit can pass 100+ on small-packet traffic, the queue migrates upstream, and bufferbloat you thought you fixed comes back.
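
The arithmetic shows why small packets suffer most. The sketch below uses a simplified billing model, billed = max(len + overhead, mpu), which is an assumption for illustration rather than a statement of the kernel's exact accounting; billed_bytes is a hypothetical helper.

```shell
# Simplified model (assumption): each packet is billed as
# max(len + overhead, mpu) bytes.
billed_bytes() {
    len=$1
    overhead=$2
    mpu=$3
    b=$((len + overhead))
    if [ "$b" -lt "$mpu" ]; then
        b=$mpu
    fi
    echo "$b"
}

# Share of on-wire bytes that would go uncounted without 'overhead 24'
for len in 200 1500; do
    wire=$(billed_bytes "$len" 24 64)
    echo "len=$len billed=$wire uncounted=$(( (wire - len) * 100 / wire ))%"
done
```

Under this model a 200-byte voice packet hides roughly a tenth of its wire cost from the shaper, while a 1500-byte bulk packet hides about one percent.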

A Failure Drill: The Shaper That Silently Stopped Working

Scenario: shaping config survives a reboot via systemd, but after a NIC firmware update someone enabled multiqueue, and the boot-time config now leaves mq as the root qdisc. The HTB is either missing or attached under just one of eight hardware queues. Symptom: throughput far above the configured ceiling, no errors logged.

Reproduce and confirm whether your qdisc actually owns egress:

Terminal window
# Does the root qdisc cover the whole device, or did mq take over?
tc qdisc show dev eth0
# qdisc mq 0: root <-- mq is root, your htb is per-queue or missing
# qdisc fq_codel 0: parent :1 ...
# Check hardware queue count
ls /sys/class/net/eth0/queues/ | grep -c tx-
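
For a monitoring check, the same inspection can be reduced to a single word. A minimal sketch that reads tc qdisc show output on stdin (root_qdisc_kind is a hypothetical helper):

```shell
# Hypothetical helper: print the kind of the root qdisc (e.g. "htb" or "mq").
root_qdisc_kind() {
    awk '$1 == "qdisc" {
        for (i = 3; i <= NF; i++)
            if ($i == "root") { print $2; exit }
    }'
}

# Real use: tc qdisc show dev eth0 | root_qdisc_kind
# Demo on the output shown above; prints "mq":
printf 'qdisc mq 0: root\nqdisc fq_codel 0: parent :1\n' | root_qdisc_kind
```

An alert that fires whenever this prints anything other than htb would have caught the scenario on the first boot after the firmware update.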

If mq is root, an HTB attached to root handle 1: either failed to attach or is shaping one queue. The fix for a software shaper that must see all traffic is to force a single transmit queue, or shape on an IFB device where multiqueue does not apply:

Terminal window
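# Collapse to one tx queue so one HTB sees everything
ethtool -L eth0 combined 1
tc qdisc replace dev eth0 root handle 1: htb default 30
# Then re-add the classes, leaf qdiscs, and filters: with no class 1:30,
# default traffic passes through unshaped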

The drill to run before go-live: configure the shaper, then push line-rate traffic with iperf3 and confirm that tc -s class show byte counts climb on the right class and that overlimits starts climbing exactly when you cross ceil. If iperf3 reports more than your ceiling, the qdisc is not on the path the traffic actually takes, so investigate mq before touching rates.
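
The pass/fail part of the drill can be scripted. The sketch below compares a measured throughput in bits per second against the configured ceiling with a tolerance; exceeds_ceiling and its 5% default tolerance are assumptions for illustration. The measured value is assumed to come from the iperf3 summary (for example its JSON output via -J).

```shell
# Hypothetical helper: succeed (exit 0) when measured throughput is above
# the ceiling plus a tolerance. tc's "mbit" means 1,000,000 bits per second.
# Usage: exceeds_ceiling MEASURED_BPS CEIL_MBIT [TOLERANCE_PCT]
exceeds_ceiling() {
    awk -v bps="$1" -v ceil="$2" -v tol="${3:-5}" 'BEGIN {
        limit = ceil * 1000000 * (1 + tol / 100)
        exit !(bps > limit)
    }'
}

# Example figure: 131 Mbit/s measured against a 100mbit ceiling
if exceeds_ceiling 131000000 100; then
    echo "over ceiling: check whether mq is root"
else
    echo "within ceiling"
fi
```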

What Trips People Up

  • No filter, no class. Traffic with no matching filter falls to default. If everything lands in default, your filters are wrong, and the elaborate class tree does nothing.
  • Shaping below the bottleneck. Set your rate slightly under the real link speed (e.g. 95mbit on a 100mbit link). If you shape at line rate, the queue forms in the modem/upstream where you cannot control it, and bufferbloat returns.
  • Persistence. tc config vanishes on reboot. Wire it into a systemd unit or your network manager hook; do not leave it in someone’s shell history.
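
For the persistence point, one approach is a small idempotent script that a oneshot systemd unit or a network hook can call. The sketch below reuses the class tree from earlier; the run wrapper, the DRY_RUN switch, and the apply_shaper name are assumptions for illustration, and leaf qdiscs and filters are left out for brevity.

```shell
#!/bin/sh
# Sketch of an idempotent shaper script. DEV can be overridden from the
# environment; DRY_RUN=1 prints the commands instead of running them.
DEV=${DEV:-eth0}

# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_shaper() {
    # "replace" works whether or not the object already exists
    run tc qdisc replace dev "$DEV" root handle 1: htb default 30
    run tc class replace dev "$DEV" parent 1: classid 1:1 htb rate 100mbit ceil 100mbit
    run tc class replace dev "$DEV" parent 1:1 classid 1:10 htb rate 20mbit ceil 100mbit prio 0
    run tc class replace dev "$DEV" parent 1:1 classid 1:20 htb rate 50mbit ceil 100mbit prio 1
    run tc class replace dev "$DEV" parent 1:1 classid 1:30 htb rate 30mbit ceil 100mbit prio 2
}

# Dry run: prints the five tc commands. Unset DRY_RUN and run as root to apply.
DRY_RUN=1
apply_shaper
```

Because every command uses replace, running the script a second time leaves the tree as it was, which makes it safe to call from a unit that may fire again on link changes.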

tc is not mysterious once you hold the model: a tree of classes under an HTB root, filters sorting packets into them, fq_codel keeping each class honest. Build it that way, verify with tc -s class show, and it is as maintainable as anything else on the box.