BGP Route Servers with BIRD: Running an IXP Peering Fabric

At an internet exchange, every member could peer with every other member directly. With 200 members that is 199 BGP sessions each and a configuration nightmare. The route server solves it: each member peers once, with the route server, and gets routes from everyone else. It is the piece of infrastructure that makes an IXP scale.
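The arithmetic behind that claim is easy to make concrete. A minimal sketch, using helper names that are purely illustrative (they come from no tool), compares session counts for a full bilateral mesh with those for a single route server:

```shell
#!/bin/sh
# Session counts for an exchange with N members (illustrative helpers).

# Sessions each member maintains in a full bilateral mesh: N - 1.
mesh_per_member() { echo $(( $1 - 1 )); }

# Distinct sessions across the whole fabric in a full mesh: N * (N - 1) / 2.
mesh_total() { echo $(( $1 * ($1 - 1) / 2 )); }

# With one route server each member holds a single session, so N in total.
rs_total() { echo "$1"; }

n=200
echo "full mesh: $(mesh_per_member "$n") per member, $(mesh_total "$n") in total"
echo "route server: 1 per member, $(rs_total "$n") in total"
```

For 200 members that is 19,900 sessions fabric-wide in a full mesh against 200 with a route server, which is why the per-member view (199 versus 1) understates the saving.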

The subtle part is that a route server is not a normal BGP router. It must not insert itself into the path, must not change next-hops, and must keep each member’s routes logically separate. BIRD handles all of this, but only if you configure it deliberately.

What Makes a Route Server Different

A transit router puts its own AS in the path and rewrites the next-hop to itself. A route server does neither:

  • Transparent AS_PATH — the route server’s AS does not appear, so traffic flows directly member-to-member, not through the RS.
  • Next-hop unchanged — the next-hop stays the advertising member’s IP on the peering LAN.
  • Per-client RIB — each member’s best-path selection is computed independently, so member A can be sent a different route than member B for the same prefix (because A and B have different export policies).

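BIRD’s rs client option turns on the first two: the session stops prepending the route server’s AS and passes the next-hop through as received. Per-client best-path selection is a separate decision. Either give each member its own table connected by pipes, or keep one shared (sorted) table and enable the secondary option on the channel, so that BIRD offers a client the next-best route when its export filter rejects the best one.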

Base Configuration

BIRD 2 unifies IPv4/IPv6 into one daemon. Define the router id and the RPKI-backed ROA tables first:

    router id 192.0.2.10;

    roa4 table r4;
    roa6 table r6;

    protocol rpki validator {
        roa4 { table r4; };
        roa6 { table r6; };
        remote "127.0.0.1" port 3323;  # local RPKI validator over RTR
        retry keep 30;
    }

The Filtering Functions

Every route from a member is filtered before it enters the table and before it leaves to another member. The filters do four jobs: drop bogons, enforce a sane prefix length, check RPKI, and honor member control communities.

    function is_bogon_v4() {
        return net ~ [
            0.0.0.0/8+, 10.0.0.0/8+, 100.64.0.0/10+, 127.0.0.0/8+,
            169.254.0.0/16+, 172.16.0.0/12+, 192.0.2.0/24+,
            192.168.0.0/16+, 198.18.0.0/15+, 224.0.0.0/3+
        ];
    }

    function reasonable_len_v4() {
        return net.len >= 8 && net.len <= 24;
    }

    function rpki_ok() {
        case roa_check(r4, net, bgp_path.last_nonaggregated) {
            ROA_VALID: return true;
            ROA_UNKNOWN: return true;   # accept unknown, reject only invalid
            ROA_INVALID: return false;
        }
    }

Rejecting RPKI-invalid and accepting unknown is the standard posture: you drop demonstrably wrong origins without blackholing the large share of space not yet covered by ROAs.

Member Control Communities

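Members expect to steer their announcements: “send this prefix to everyone except AS64502,” or “do not announce this at all.” The usual IXP convention is a pair of action communities, classically written 0:peer-as (do not export to that AS) and rs-as:peer-as (export only to that AS). The function below uses the large-community form of the same idea, so that 32-bit ASNs fit: (rs-as, 0, peer-as) means do not export to that AS, and (rs-as, 1, peer-as) means export only to the ASes listed that way.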

    define RS_ASN = 65500;

    function honor_control(int peer_as) {
        # (RS_ASN, 0, 0) = announce to none
        if (RS_ASN, 0, 0) ~ bgp_large_community then return false;
        # (RS_ASN, 0, peer) = do not announce to this peer
        if (RS_ASN, 0, peer_as) ~ bgp_large_community then return false;
        # if any selective-announce community is set, export only to listed peers
        # (a wildcard is only valid inside a set, matched against the community list)
        if bgp_large_community ~ [(RS_ASN, 1, *)] then {
            if (RS_ASN, 1, peer_as) !~ bgp_large_community then return false;
        }
        return true;
    }
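To sanity-check the semantics before touching a live filter, it can help to model the decision outside BIRD. The following is a toy sketch only: would_export is a hypothetical helper, and writing communities as rs:function:peer strings is an assumption made for readability. It mirrors the three rules of honor_control() and nothing else.

```shell
#!/bin/sh
# Toy model of the export decision, for reasoning about community semantics only.
# Usage: would_export PEER_AS COMMUNITY...   (communities written as rs:function:peer)
RS_ASN=65500

would_export() {
    peer_as=$1; shift
    selective=no   # becomes "yes" once any RS_ASN:1:* community is seen
    listed=no      # becomes "yes" if RS_ASN:1:peer_as is present
    for c in "$@"; do
        case "$c" in
            "$RS_ASN:0:0")        return 1 ;;   # announce to nobody
            "$RS_ASN:0:$peer_as") return 1 ;;   # do not announce to this peer
            "$RS_ASN:1:$peer_as") selective=yes; listed=yes ;;
            "$RS_ASN:1:"*)        selective=yes ;;
        esac
    done
    # Selective announce is in effect but this peer is not on the list.
    if [ "$selective" = yes ] && [ "$listed" = no ]; then return 1; fi
    return 0
}

# Exit status 0 means "export", non-zero means "reject".
would_export 64502 65500:0:64502 && echo export || echo reject
would_export 64501 65500:0:64502 && echo export || echo reject
would_export 64502 65500:1:64501 && echo export || echo reject
would_export 64501 65500:1:64501 65500:1:64503 && echo export || echo reject
```

The four calls cover a block aimed at the peer, a block aimed at someone else, a selective list that omits the peer, and a selective list that includes it.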

A Member Session

Each member is an rs client. The import filter validates; the export filter applies control communities for that member’s AS:

    protocol bgp member_64500 {
        local 192.0.2.10 as RS_ASN;
        neighbor 192.0.2.1 as 64500;
        rs client;
        ipv4 {
            import keep filtered;   # keep rejected routes so "show route filtered" can display them
            import filter {
                if is_bogon_v4() then reject;
                if !reasonable_len_v4() then reject;
                if !rpki_ok() then reject;
                # strip any control communities the member shouldn't set inbound
                accept;
            };
            export filter {
                if !honor_control(64500) then reject;
                accept;
            };
            import limit 200000 action restart;
        };
    }
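Nobody writes two hundred of these stanzas by hand. A minimal generator sketch follows, assuming a member list of "ASN address" lines on standard input; that input format and the gen_members name are hypothetical, and the emitted stanza simply repeats the pattern used in this article.

```shell
#!/bin/sh
# Emit one BIRD "protocol bgp" stanza per member.
# Input (assumed format): one "ASN peering-LAN-address" pair per line on stdin.
gen_members() {
    while read -r asn ip; do
        # Skip blank lines and comments.
        case "$asn" in ''|'#'*) continue ;; esac
        cat <<EOF
protocol bgp member_${asn} {
    local 192.0.2.10 as RS_ASN;
    neighbor ${ip} as ${asn};
    rs client;
    ipv4 {
        import keep filtered;
        import filter {
            if is_bogon_v4() then reject;
            if !reasonable_len_v4() then reject;
            if !rpki_ok() then reject;
            accept;
        };
        export filter {
            if !honor_control(${asn}) then reject;
            accept;
        };
        import limit 200000 action restart;
    };
}
EOF
    done
}

gen_members <<'LIST'
# asn  peering-LAN address
64500 192.0.2.1
64502 192.0.2.2
LIST
```

Generating from one list keeps the AS number in the protocol name, the neighbor line, and the honor_control() argument in step, which is the mistake hand-edited copies tend to make.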

import limit is not optional. A member that fat-fingers a redistribute and leaks the full table should hit a ceiling and have its session restarted, not flood every other member.

Verification

    # Sessions and prefix counts per member
    birdc show protocols
    # RPKI session to the validator established? (the rpki protocol above is named "validator")
    birdc show protocols all validator
    # What did a specific member send, and did it pass filters?
    birdc show route protocol member_64500
    birdc show route filtered protocol member_64500   # needs "import keep filtered" on the channel
    # Confirm AS_PATH does NOT contain the RS ASN
    birdc show route 198.51.100.0/24 all
    # BGP.as_path should start with the announcing member's AS, no 65500

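That last check is the one that proves the route server is transparent, with one caveat: BIRD would add its own AS only when exporting, so the route server’s own table shows the path as received. Run the same lookup on a member router or a looking glass fed by the route server. If 65500 shows up in the AS_PATH there, rs client is missing on that session and you have turned your route server into an accidental transit hop.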
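With hundreds of members, eyeballing show protocols stops working. A small filter sketch, assuming the BIRD 2 column layout (Name, Proto, Table, State, Since, Info) and that a healthy session ends its line with Established; the function name and the sample text are made up for illustration:

```shell
#!/bin/sh
# List BGP protocols that are not in the Established state.
# Reads "birdc show protocols" style output on stdin.
not_established() {
    awk '$2 == "BGP" && $NF != "Established" { print $1 }'
}

# Made-up sample in the assumed layout; in production use: birdc show protocols | not_established
not_established <<'SAMPLE'
Name       Proto      Table      State  Since         Info
device1    Device     ---        up     2024-05-01
validator  RPKI       ---        up     2024-05-01    Established
member_64500 BGP      ---        up     2024-05-01    Established
member_64502 BGP      ---        start  2024-05-02    Active        Socket: Connection refused
SAMPLE
```

Matching on the last field rather than a fixed column keeps the filter working whether the Since column holds a date or a date plus time.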

Operational Drills

Test                                   Expected
Member announces a bogon               Rejected, visible in show route filtered
Member announces RPKI-invalid origin   Rejected
Member tags 0:0 on a prefix            Prefix announced to nobody
Member exceeds import limit            Session restarts, others unaffected
Validator (Routinator) restarts        RPKI session re-establishes, ROAs reload

Next-Hop and the Third-Party Trap

The rule that a route server never rewrites the next-hop has a sharp edge: because the attribute is passed through as received, a misbehaving member can advertise a next-hop pointing at another member’s IP, and with recursive gateway resolution BIRD can even accept a next-hop that is not directly reachable on the peering LAN. That is third-party next-hop, and on a shared IXP fabric it lets one member silently steer traffic for its prefixes onto someone else’s port.

Pin the next-hop check explicitly rather than trusting defaults:

    protocol bgp member_64500 {
        local 192.0.2.10 as RS_ASN;
        neighbor 192.0.2.1 as 64500;
        rs client;
        ipv4 {
            next hop keep;          # preserve the member's next-hop, don't rewrite to self
            import keep filtered;   # keep rejected routes so "show route filtered" can display them
            import filter {
                # reject any route whose next-hop is not the peer's own address
                if from != bgp_next_hop then reject;
                if is_bogon_v4() then reject;
                if !reasonable_len_v4() then reject;
                if !rpki_ok() then reject;
                accept;
            };
            export filter {
                if !honor_control(64500) then reject;
                accept;
            };
            import limit 200000 action restart;
        };
    }

if from != bgp_next_hop then reject is the guard: from is the session’s peer address, bgp_next_hop is what the route claims. If a member announces a prefix with someone else’s next-hop, the route is dropped before it can poison the table. On IXPs this single line closes a real attack surface.

Scaling: Sessions, RAM, and Split Daemons

A 500-member exchange means 500 sessions and potentially several million paths once you count every member’s full prefix set times the per-client export computation. Two operational facts decide whether BIRD copes.

First, run IPv4 and IPv6 in separate BIRD instances, not one daemon with both channels. BIRD 2 is single-threaded per process, so splitting v4 and v6 gives you two cores’ worth of convergence and means an IPv6 reconfigure does not stall v4 sessions.

    # Separate config + control socket per address family
    bird -c /etc/bird/bird4.conf -s /run/bird/bird4.ctl
    bird -c /etc/bird/bird6.conf -s /run/bird/bird6.ctl

    birdc -s /run/bird/bird4.ctl show memory
    # BGP attributes / route tables dominate; watch "Total" against box RAM

Second, setting interpret communities off and keeping per-client filters lean matters at scale, because every export filter runs per prefix per client. Watch reconfigure time, because that is when a route server hurts:

    birdc -s /run/bird/bird4.ctl configure
    # "Reconfigured" should return in well under a second
    # If it takes seconds, your filters are doing too much per route; precompute with prefix sets

    birdc -s /run/bird/bird4.ctl show protocols | grep -c BGP   # session count sanity

A route server that takes ten seconds to reconfigure is a route server you become afraid to touch during business hours, which is how stale filters accumulate. Keep the per-route work minimal and add members in batches.

The Ticket You Will Get Most Often

“My prefix is not showing up” lands in the queue daily, and 90% of the time the answer is in show route filtered — but the other 10% is the export side, and members never think to check there. A prefix can pass every import filter, sit healthy in the route server’s table, and still not reach a given member because that member’s export filter dropped it. The common cause is a control community the announcing member set without realizing its reach.

Trace it from both ends:

    # Did it pass import? (the announcing member's session)
    birdc show route filtered protocol member_64500
    # absent here = not rejected on import (or never received); look at export next

    # Is it in the master table at all?
    birdc show route 198.51.100.0/24 all
    # check the large communities attached: (65500, 0, X) or (65500, 1, X)?

    # Would it be exported toward the complaining member's session?
    birdc show route 198.51.100.0/24 export member_64502
    # empty = honor_control() rejected it for AS64502 specifically

show route <prefix> export <protocol> is the command that ends the argument: it runs the prefix through that member’s export filter and shows exactly what they would receive. If it is empty while the master table has the route, the announcing member tagged a do-not-announce community, (65500, 0, peer-as), or a selective-announce community, (65500, 1, peer-as), that excludes the complainant. Point them at their own community, not at your route server.
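Reading raw large communities in a ticket thread invites mistakes, so a small translator can help. This is a sketch under assumptions: it expects show route ... all output to carry a BGP.large_community: line with values printed as (a, b, c), it relies on grep -o (common, though not strictly POSIX), and explain_controls is a hypothetical name. The sample text is invented.

```shell
#!/bin/sh
# Translate this exchange's control communities into plain English.
# Reads "birdc show route <prefix> all" style output on stdin.
RS_ASN=65500

explain_controls() {
    grep 'BGP.large_community:' |
    grep -o '([0-9]*, [0-9]*, [0-9]*)' |
    tr -d '(),' |
    while read -r rs func peer; do
        # Ignore large communities that do not belong to the route server.
        [ "$rs" = "$RS_ASN" ] || continue
        case "$func:$peer" in
            0:0) echo "announce to nobody" ;;
            0:*) echo "do not announce to AS$peer" ;;
            1:*) echo "announce only to listed peers, including AS$peer" ;;
        esac
    done
}

# Invented sample; in production pipe in: birdc show route 198.51.100.0/24 all
explain_controls <<'SAMPLE'
198.51.100.0/24      unicast [member_64500 2024-05-01] * (100) [AS64500i]
        via 192.0.2.1 on eth0
        BGP.as_path: 64500
        BGP.large_community: (65500, 0, 64502) (64500, 1, 1)
SAMPLE
```

Pasting that one-line answer into the ticket tends to settle things faster than the raw attribute dump.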

What I Run Beside It

A route server is not a replacement for members building proper bilateral sessions for their most important traffic — it is the easy 95%. Pair BIRD with a looking glass and per-member traffic stats, and publish your filtering policy so members know exactly why a prefix was dropped. The single biggest source of IXP support tickets is “why isn’t my prefix showing up,” and birdc show route filtered answers it in one line.