EVPN Multihoming with ESI-LAG: Active-Active Without MLAG

MLAG works, but it ties two switches together with a proprietary peer-link and a control-plane handshake that only the same vendor speaks. EVPN multihoming does the same job — a server dual-homed to two leaves, both links forwarding — using nothing but BGP EVPN routes. No peer-link, no vendor lock, and it scales past two switches.

The trade is conceptual: instead of “these two boxes pretend to be one,” you describe a shared Ethernet Segment and let the control plane sort out who forwards what. Once it clicks, it is cleaner than MLAG.

The Ethernet Segment

An Ethernet Segment (ES) is the set of links from one host (or downstream switch) to multiple provider edge devices (PEs; here, the leaves). It is identified by an Ethernet Segment Identifier (ESI), a 10-byte value configured identically on both leaves for that bundle.

[Leaf 1]            [Leaf 2]
    \                  /
     \   ae0 (ESI X)  /
      \              /
       \            /
        [  Server  ]
        LACP bundle

The server runs an ordinary LACP bond. It has no idea two different switches terminate it. That illusion requires three things to match across the leaves:

  • the ESI on the bundle interface
  • the LACP system-id (so the host bundles all members into one LAG)
  • the VLAN-to-VNI mapping
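
The three-way match above can be checked mechanically before the bundle goes live. The sketch below is a minimal Python illustration, not a Junos tool: `LeafBundle`, `normalize`, and `mismatches` are hypothetical names, the VNI numbers are made up, and the values would come from whatever inventory or config parser you already use.

```python
from dataclasses import dataclass, field

HEX_DIGITS = set("0123456789abcdef")


@dataclass
class LeafBundle:
    """Hypothetical record of one leaf's view of a multihomed bundle."""
    leaf: str
    esi: str                    # 10 colon-separated hex octets
    lacp_system_id: str         # 6 colon-separated hex octets
    vlan_to_vni: dict = field(default_factory=dict)


def normalize(hex_id, octets):
    """Lower-case an identifier and check it has the expected octet count."""
    parts = hex_id.strip().lower().split(":")
    if len(parts) != octets or any(
        len(p) != 2 or not set(p) <= HEX_DIGITS for p in parts
    ):
        raise ValueError(f"expected {octets} hex octets, got {hex_id!r}")
    return ":".join(parts)


def mismatches(a, b):
    """Return the names of the attributes that differ between two leaves."""
    problems = []
    if normalize(a.esi, 10) != normalize(b.esi, 10):
        problems.append("esi")
    if normalize(a.lacp_system_id, 6) != normalize(b.lacp_system_id, 6):
        problems.append("lacp_system_id")
    if a.vlan_to_vni != b.vlan_to_vni:
        problems.append("vlan_to_vni")
    return problems


leaf1 = LeafBundle("leaf1", "00:11:22:33:44:55:66:77:88:01",
                   "00:00:00:00:00:01", {10: 10010, 20: 10020})
# Deliberate typo in the last ESI octet on the second leaf.
leaf2 = LeafBundle("leaf2", "00:11:22:33:44:55:66:77:88:10",
                   "00:00:00:00:00:01", {10: 10010, 20: 10020})

print(mismatches(leaf1, leaf2))   # ['esi']
```

Running a check like this across every ES in the fabric is cheap, and it catches the mistyped-ESI case before a host ever sees a duplicate broadcast.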

Configuring ESI-LAG on Junos

The aggregated interface carries the ESI and a shared LACP system-id:

interfaces {
    ae0 {
        esi {
            00:11:22:33:44:55:66:77:88:01;
            all-active;
        }
        aggregated-ether-options {
            lacp {
                active;
                system-id 00:00:00:00:00:01;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan { members [ v10 v20 ]; }
            }
        }
    }
}

all-active is the mode that makes both links forward. The alternative, single-active, keeps one link hot and the other standby — useful for devices that cannot bond, but it throws away half your bandwidth.

The identical config goes on Leaf 2: same ESI, same system-id. Use a different ESI and LACP system-id for each ES, but keep both identical on the two leaves sharing that ES.

The Routes That Make It Work

EVPN multihoming runs on two route types most people never look at until something breaks:

Route    Name              Job
Type 4   Ethernet Segment  Discover which PEs share an ESI; elect the DF
Type 1   Ethernet A-D      Advertise reachability and enable aliasing/fast withdraw

Designated Forwarder (DF) election decides which leaf forwards broadcast, unknown-unicast, and multicast (BUM) toward the host. Without it, both leaves would flood the same broadcast to the server and it would see duplicates. Type-4 routes let the leaves discover each other and agree on the DF per VLAN.

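Aliasing is the Type-1 magic. A remote leaf may learn a host’s MAC from only one of the two local leaves (whichever one the host’s traffic reached first), but the Type-1 A-D routes that both leaves advertise for the ESI tell it the segment is reachable through both. Strictly, aliasing relies on the per-EVI A-D route, validated against the per-ES A-D route; the per-ES route is also what drives fast withdraw. So the remote leaf load-balances across both, even though it only ever saw the MAC advertised once.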
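
The aliasing rule can be sketched as a small resolution function. This is a simplified Python model of what a remote VTEP does, assuming a leaf counts as a next hop for the ESI only if it has advertised an A-D route for it. The dictionaries and the `next_hops` helper are illustrative, not a real BGP API, and the fallback to the Type-2 advertiser is a simplification; a real implementation may handle a missing A-D route differently.

```python
def next_hops(mac, type2_routes, ad_routes):
    """Resolve the VTEPs a remote leaf may use to reach `mac`.

    type2_routes: dict mac -> (esi, advertising_vtep)
    ad_routes:    dict esi -> set of VTEPs that advertised Type-1 A-D routes
    """
    esi, advertiser = type2_routes[mac]
    aliased = ad_routes.get(esi, set())
    # With no A-D routes for the ESI, only the Type-2 advertiser is usable.
    return sorted(aliased) if aliased else [advertiser]


ESI = "00:11:22:33:44:55:66:77:88:01"
type2 = {"aa:bb:cc:dd:ee:ff": (ESI, "10.255.1.11")}      # MAC seen once
ad = {ESI: {"10.255.1.11", "10.255.1.12"}}               # both leaves on the ES

print(next_hops("aa:bb:cc:dd:ee:ff", type2, ad))
# ['10.255.1.11', '10.255.1.12']
```

The MAC was advertised by one leaf only, yet the resolved path set contains both, which is the whole point of aliasing.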

Split-horizon stops a BUM frame that the host sent into one leaf from being flooded back onto the same segment by the peer leaf. With MPLS encapsulation EVPN filters on the ESI label; with VXLAN it typically uses local bias, keyed on the source VTEP address. Either way it is automatic, but it is also the thing that breaks when an ESI is mistyped and the two leaves think they are on different segments.
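
To see why a mistyped ESI defeats both protections at once, here is a toy Python model of the decision a leaf makes when a BUM frame arrives from the overlay. It abstracts the actual filtering mechanism into a plain membership test, and `floods_to_host` plus its inputs are hypothetical names chosen for this sketch.

```python
def floods_to_host(leaf, source_leaf, esi, es_members, df):
    """Decide whether `leaf` floods an overlay BUM frame onto segment `esi`.

    es_members: dict esi -> set of leaves known to share the segment
    df:         dict esi -> elected designated forwarder for the segment
    """
    if df[esi] != leaf:
        return False    # DF filter: only the DF floods toward the host
    if source_leaf in es_members[esi]:
        return False    # split-horizon: the frame came from this segment
    return True


GOOD = "00:11:22:33:44:55:66:77:88:01"
TYPO = "00:11:22:33:44:55:66:77:88:10"

# Healthy: both leaves share one ES, leaf2 is DF. The host's broadcast
# enters leaf1, crosses the overlay, and leaf2 must not send it back.
healthy = floods_to_host("leaf2", "leaf1", GOOD,
                         {GOOD: {"leaf1", "leaf2"}}, {GOOD: "leaf2"})

# Mistyped: leaf2 believes it is alone on a different ES, so it elects
# itself DF and sees leaf1 as a stranger. The frame loops back to the host.
broken = floods_to_host("leaf2", "leaf1", TYPO,
                        {GOOD: {"leaf1"}, TYPO: {"leaf2"}},
                        {GOOD: "leaf1", TYPO: "leaf2"})

print(healthy, broken)   # False True
```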

Verification

Confirm both leaves see the segment and agree on a DF:

# Ethernet Segment state, DF election result
show evpn ethernet-segment esi 00:11:22:33:44:55:66:77:88:01 extensive
# Look for:
# Designated forwarder: <one leaf's IP> per VLAN
# Number of remote PEs: 1 (the other leaf)

Check the LAG actually bundled on the host side:

show lacp interfaces ae0
# Both members: Collecting + Distributing = Yes

Confirm aliasing — a remote MAC reachable via two VTEPs:

show evpn database mac-address aa:bb:cc:dd:ee:ff extensive
# Active source should list both leaf VTEP addresses for the ESI

The check that catches the most mistakes:

show evpn ethernet-segment
# Both leaves must list the SAME ESI as "local".
# If only one does, the ESI is mistyped on the other.

Failure Behavior

This is where EVPN multihoming earns its keep. When a leaf-to-host link fails:

  1. The leaf withdraws its Type-1 A-D per-ESI route.
  2. Remote VTEPs stop hashing flows to that leaf for that ESI — this is fast withdraw, a single route update instead of per-MAC churn.
  3. The surviving leaf keeps forwarding. The host’s LACP drops one member and keeps the other.

Convergence is one BGP update per segment, not one per MAC. On a leaf carrying thousands of hosts that difference is the gap between sub-second and tens of seconds.
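
The difference between per-segment and per-MAC convergence is easy to see in a toy model of a remote VTEP's state. The `RemoteVtep` class below is an illustrative Python sketch, not how any particular implementation stores routes, and the count of 5000 MACs behind one segment is an arbitrary number picked for the demonstration.

```python
class RemoteVtep:
    """Toy forwarding state on a remote VTEP (illustrative only)."""

    def __init__(self):
        self.mac_to_esi = {}        # learned from Type-2 routes
        self.esi_to_pes = {}        # learned from Type-1 A-D routes
        self.updates_processed = 0

    def learn_mac(self, mac, esi):
        self.mac_to_esi[mac] = esi

    def advertise_ad(self, esi, pe):
        self.esi_to_pes.setdefault(esi, set()).add(pe)

    def withdraw_ad(self, esi, pe):
        """One update removes `pe` as a path for every MAC behind `esi`."""
        self.esi_to_pes[esi].discard(pe)
        self.updates_processed += 1

    def paths(self, mac):
        return sorted(self.esi_to_pes[self.mac_to_esi[mac]])


vtep = RemoteVtep()
esi = "00:11:22:33:44:55:66:77:88:01"
for pe in ("10.255.1.11", "10.255.1.12"):
    vtep.advertise_ad(esi, pe)
for i in range(5000):
    vtep.learn_mac(f"aa:bb:cc:00:{i >> 8:02x}:{i & 0xff:02x}", esi)

vtep.withdraw_ad(esi, "10.255.1.11")    # host link on Leaf 1 fails
moved = sum(1 for m in vtep.mac_to_esi if vtep.paths(m) == ["10.255.1.12"])
print(vtep.updates_processed, moved)    # 1 5000
```

Because MACs point at the ESI rather than at a leaf, removing one leaf from the ESI's path set repoints every MAC in a single step.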

Drills to run before trusting it:

Test                          Expected
Pull one host link            Traffic shifts to surviving leaf, DF unchanged if it was the survivor
Reload the DF leaf            DF re-elects to the peer, BUM keeps flowing
Mistype an ESI deliberately   Host sees duplicate broadcasts; proves split-horizon depends on matching ESI

That last test is worth doing once in a lab so you recognize the symptom. Duplicate ARP replies and a host that intermittently sees its own broadcasts are the signature of an ESI mismatch, and they are maddening to diagnose if you have never seen them.

DF Election and the Modulo Trap

The default DF election algorithm is modulo: every PE sharing the ESI sorts the candidate PE addresses in ascending order, and for each VLAN the DF is the PE at position (VLAN-ID mod number-of-PEs) in that list, counting from zero. With two leaves and a mix of even and odd VLAN IDs, this balances BUM duty across both. With two leaves and only one active VLAN, or VLAN IDs that are all even or all odd, one leaf carries all the BUM forwarding and the other carries none, which is fine until the DF leaf reloads and every BUM flow re-elects at once.
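
The modulo rule is short enough to write out. The Python sketch below follows the description above (sort the PE addresses numerically, index by VLAN ID mod the number of PEs); `modulo_df` is a name chosen for this example, and it assumes the VLANs named v10 and v20 in the earlier config carry VLAN IDs 10 and 20.

```python
import ipaddress


def modulo_df(pe_addresses, vlan_id):
    """Pick the DF for one VLAN: sort PEs numerically, index by VLAN mod N."""
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]


leaves = ["10.255.1.12", "10.255.1.11"]
for vlan in (10, 20, 11, 21):
    print(vlan, modulo_df(leaves, vlan))
# 10 10.255.1.11
# 20 10.255.1.11
# 11 10.255.1.12
# 21 10.255.1.12
```

Under that assumption both v10 and v20 land on the same leaf, because both IDs are even. That is the trap in practice: tidy VLAN numbering in steps of ten puts all BUM duty on one box.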

The election timer matters more than the algorithm. When a leaf boots, it must not declare itself DF before it has heard the Type-4 routes from its peers, or both leaves briefly think they are DF and the host sees duplicate broadcasts during convergence.

interfaces {
    ae0 {
        esi {
            00:11:22:33:44:55:66:77:88:01;
            all-active;
            df-election-type {
                preference {
                    value 32767; # higher wins; the peer leaf needs a lower value for this to pin the DF
                }
            }
        }
    }
}

# Confirm who won and how long the election waited
show evpn ethernet-segment esi 00:11:22:33:44:55:66:77:88:01 extensive
# DF Election Algorithm: MOD based (or "preference based")
# Designated forwarder: 10.255.1.11 (per VLAN/VNI)
# DF election next time: --

On a fabric where BUM is heavy, with lots of broadcast-chatty hosts, preference-based election lets you place the DF on the leaf with more uplink headroom instead of leaving the choice to a VLAN-ID modulo.
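
For comparison with the modulo rule, preference-based election reduces to picking the highest configured value. The sketch below is a hedged Python illustration: `preference_df` is a hypothetical helper, and breaking ties on the numerically lowest address is an assumption of this example, so check your platform's documented tie-break before relying on it.

```python
import ipaddress


def preference_df(candidates):
    """Pick the DF from a dict of pe_address -> configured preference.

    Highest preference wins. Ties fall back to the numerically lowest
    address, which is an assumption made for this sketch.
    """
    return min(candidates,
               key=lambda pe: (-candidates[pe], ipaddress.ip_address(pe)))


# Leaf 2 is given the higher preference, so it wins for the segment.
print(preference_df({"10.255.1.11": 32767, "10.255.1.12": 40000}))
# 10.255.1.12

# Equal values leave the outcome to the tie-break.
print(preference_df({"10.255.1.11": 32767, "10.255.1.12": 32767}))
# 10.255.1.11
```

The second case shows why setting the same value on both leaves pins nothing: the preference only matters when the two sides differ.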

A Failure Drill With Real Output

The drill that proves the whole design is pulling the host link from the current DF leaf and watching aliasing hold the flow up. Before the pull, a remote VTEP shows the ESI reachable through both leaves:

show evpn database mac-address aa:bb:cc:dd:ee:ff extensive
# Active source: 00:11:22:33:44:55:66:77:88:01 (ESI)
# Remote PE: 10.255.1.11 (Leaf 1)
# Remote PE: 10.255.1.12 (Leaf 2)

Pull the link on Leaf 1. Leaf 1 withdraws its per-ESI Type-1 A-D route. Within one BGP update the remote VTEP recomputes:

show evpn database mac-address aa:bb:cc:dd:ee:ff extensive
# Active source: 00:11:22:33:44:55:66:77:88:01 (ESI)
# Remote PE: 10.255.1.12 (Leaf 2) <- single path now

No per-MAC withdrawal, no flood of updates: one Type-1 route gone, and every flow that was hashing to Leaf 1 moves to Leaf 2. Time it. If convergence stretches past a second on a leaf carrying thousands of MACs, you are probably relying on per-MAC Type-2 withdrawal because the per-ESI A-D route is missing. That points back at an ESI mismatch, or at all-active not actually being set on one side.
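
One way to time the cutover is to run a fast probe (for example a ping at a short fixed interval) through the link pull and measure the longest gap in replies. The Python sketch below assumes you have already collected the probe results as `(timestamp_ms, reply_received)` pairs; `longest_outage_ms` and the sample data are illustrative, and the result is an upper bound limited by the probe interval.

```python
def longest_outage_ms(probes):
    """Return the longest gap between replies that spans lost probes.

    probes: list of (timestamp_ms, reply_received) tuples in time order.
    A gap is measured from the last reply before a run of losses to the
    first reply after it, so it can overstate the real outage by up to
    one probe interval on each side.
    """
    longest = 0
    last_ok = None
    in_outage = False
    for ts, ok in probes:
        if ok:
            if in_outage and last_ok is not None:
                longest = max(longest, ts - last_ok)
            last_ok = ts
            in_outage = False
        else:
            in_outage = True
    return longest


# Made-up sample: probes every 100 ms, two lost during the link pull.
sample = [(0, True), (100, True), (200, False), (300, False),
          (400, True), (500, True)]
print(longest_outage_ms(sample))   # 300
```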

Symptom after a link pull            Likely cause
Sub-second cutover, one BGP update   Healthy: Type-1 fast withdraw working
Tens of seconds, MAC-by-MAC          Per-ESI A-D route absent; check ESI match
Traffic never shifts                 Remote VTEP only ever saw one PE for the ESI; aliasing broken
Duplicate frames during cutover      DF re-election racing; tighten election timing

When ESI-LAG Is the Wrong Tool

If both switches are the same vendor, sitting in the same rack, and you will never scale past a pair, traditional MLAG is less to configure and operationally familiar. EVPN multihoming pays off when you want vendor flexibility, more than two attachment points, or you are already running EVPN-VXLAN for the overlay and would rather not bolt a separate MLAG control plane beside it.

The mental model is the whole battle. Stop thinking “two switches acting as one” and start thinking “a segment the fabric knows is reachable through several doors.” The config is short; the routes do the work.