NX-OS VXLAN/EVPN Fabric: Underlay and Overlay End to End

March 20, 2026 · 8 min read

The spine/leaf cabling is done and the underlay pings. Now the actual job starts: turning a routed fabric into something that carries tenant L2 and L3 across racks. VXLAN with BGP EVPN is the standard answer, but the config touches five features that all have to agree, and a single mismatch leaves you with a fabric that looks healthy and forwards nothing.

This is the order I build it in, and the checks I run at each layer before moving up.

The Two Planes

VXLAN separates the transport from the service:

  Overlay (EVPN)   tenant MAC/IP reachability via BGP
  ───────────────────────────────────────────────
  Underlay (OSPF)  loopback reachability + ECMP

The underlay only needs to do one thing well: every VTEP loopback reachable from every other, with equal-cost paths to the spines. The overlay rides on top and never appears in the underlay routing table.

Underlay: OSPF and PIM

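Enable features first. On NX-OS a protocol's commands do not even parse until its feature is on. feature pim is only needed if you plan to use multicast replication, and feature fabric forwarding is what unlocks the anycast gateway commands used later.

feature ospf
feature pim
feature interface-vlan
feature vn-segment-vlan-based
feature nv overlay
nv overlay evpn
feature bgp
feature fabric forwarding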

Point-to-point links to the spines, plus loopbacks for the BGP router-id and the VTEP source:

interface loopback0
  description ROUTER-ID
  ip address 10.255.0.11/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface loopback1
  description VTEP-NVE-SOURCE
  ip address 10.255.1.11/32
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface Ethernet1/1
  description TO-SPINE-1
  no switchport
  ip address 10.1.1.1/31
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  mtu 9216
  no shutdown
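
The ip router ospf UNDERLAY lines attach interfaces to a named OSPF process, and that process is normally declared too. A minimal sketch, assuming the process reuses the loopback0 address as its router-id (my assumption; the stanza is not part of the config above):

router ospf UNDERLAY
  router-id 10.255.0.11

Pinning the router-id keeps it from changing if loopbacks are added or renumbered later.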

Two details that cause silent failures:

MTU. VXLAN adds 50 bytes. If the underlay is 1500, every full-size tenant frame drops. Set 9216 fabric-wide.
Separate loopbacks. Keep router-id (lo0) and NVE source (lo1) distinct. With vPC, both leaves share an anycast lo1 address, and you do not want that bleeding into your BGP router-id.

Verify before going further:

show ip ospf neighbors
show ip route 10.255.1.12   # remote VTEP loopback, must be /32 via spines
show ip pim rp               # if using multicast replication

If a remote VTEP loopback is not in the table with ECMP next-hops, stop. The overlay cannot work.
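
The MTU detail is worth proving rather than assuming. The 50 bytes break down as outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN header (8). A 1500-byte tenant IP packet travels in a 1514-byte Ethernet frame, so the underlay needs an IP MTU of at least 1514 + 8 + 8 + 20 = 1550. A hedged sketch of a check from this leaf, assuming 10.255.1.12 is a remote VTEP loopback and that the ping options behave on your release as they do on the ones I have used:

ping 10.255.1.12 source 10.255.1.11 packet-size 9000 df-bit count 5
show interface Ethernet1/1 | include MTU

If the large ping fails while a default-size ping succeeds, some hop in the path is still at a smaller MTU.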

Overlay: BGP EVPN

Spines are route reflectors; leaves are clients. The address family that matters is l2vpn evpn.

router bgp 65000
  router-id 10.255.0.11
  address-family l2vpn evpn
    retain route-target all
  neighbor 10.255.0.1
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
  neighbor 10.255.0.2
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended

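send-community extended is not optional: EVPN route-targets travel as extended communities. Drop it and routes propagate but import nowhere. retain route-target all earns its keep on the spines, which hold no tenant VRFs and would otherwise discard routes whose route-targets they do not import; on a leaf it is harmless.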
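
The block above is the leaf side only. For reference, a sketch of the matching stanza on a spine, assuming spine 1 uses 10.255.0.1 as its router-id and peers with this leaf at 10.255.0.11 (repeat the neighbor block per leaf, or use a peer template):

router bgp 65000
  router-id 10.255.0.1
  address-family l2vpn evpn
    retain route-target all
  neighbor 10.255.0.11
    remote-as 65000
    update-source loopback0
    address-family l2vpn evpn
      send-community extended
      route-reflector-client

route-reflector-client is what lets the spine pass one leaf's EVPN routes on to the others. Without it, the iBGP rule against re-advertising iBGP-learned routes keeps every leaf ignorant of its neighbors.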

Confirm the sessions came up in the right AFI:

show bgp l2vpn evpn summary
# State/PfxRcd should show a number, not Idle/Active

Mapping VLANs to VXLAN

Each tenant VLAN gets a VNI. L2 VNIs carry MAC; an L3 VNI per VRF carries inter-subnet routing.

vlan 10
  vn-segment 10010
vlan 20
  vn-segment 10020
vlan 999
  vn-segment 50999     # L3 VNI for tenant VRF

vrf context TENANT-A
  vni 50999
  rd auto
  address-family ipv4 unicast
    route-target both auto evpn

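rd auto and route-target both auto evpn let NX-OS derive the values for you: the RD from the BGP router-id plus a locally assigned number, the RT from the BGP AS and the VNI. It is consistent and saves a class of typo bugs, so use it unless you have a specific reason to pin values (for example, leaves in different AS numbers, where auto-derived RTs will not match each other).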

The NVE Interface

This is the encapsulation engine. Ingress replication via BGP avoids needing multicast in the underlay, which is the simpler choice for most fabrics:

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  member vni 10010
    ingress-replication protocol bgp
  member vni 10020
    ingress-replication protocol bgp
  member vni 50999
    associate-vrf

associate-vrf on the L3 VNI is what makes inter-subnet routing work across the fabric. Forget it and L2 stretches fine but tenants cannot route between subnets.
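
associate-vrf is one of several pieces the L3 VNI needs. On the NX-OS releases I have worked with, VLAN 999 also needs an SVI in the tenant VRF with ip forward and no IP address, and the VRF is usually declared under BGP as well. A sketch of those stanzas, which the config above leaves out; whether advertise l2vpn evpn is required or merely accepted depends on the release, so check the configuration guide for yours:

interface Vlan999
  description L3VNI-TENANT-A
  no shutdown
  vrf member TENANT-A
  ip forward

router bgp 65000
  vrf TENANT-A
    address-family ipv4 unicast
      advertise l2vpn evpn

The SVI gives routed traffic an interface on which to enter and leave the VRF. Without it the L3 VNI tends to stay down even though everything else looks configured.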

Tie the VLANs into EVPN:

evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto
  vni 10020 l2
    rd auto
    route-target import auto
    route-target export auto
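
With auto used everywhere, the derived values are predictable. As a worked example, assuming the auto-RD format I have seen on NX-OS (router-id:32767 + VLAN ID for an L2 VNI; treat the exact offset as release-dependent) and RT as AS:VNI, this leaf would end up with:

  VLAN 10 / VNI 10010     RD 10.255.0.11:32777      RT 65000:10010
  VLAN 20 / VNI 10020     RD 10.255.0.11:32787      RT 65000:10020
  VRF TENANT-A / 50999    RD 10.255.0.11:<vrf-id>   RT 65000:50999

The <vrf-id> is assigned locally, so it can differ from leaf to leaf. To compare against what the switch actually derived:

show bgp l2vpn evpn vni-id 10010

The route distinguisher reported there should match the first row. The RTs are the part that must agree across leaves; the RDs are expected to differ per leaf.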

Distributed Anycast Gateway

Every leaf is the default gateway for every subnet, using the same MAC everywhere. A VM keeps its gateway after a live migration to another rack — no ARP relearning, no traffic tromboning to one switch.

fabric forwarding anycast-gateway-mac 0000.2222.3333

interface Vlan10
  no shutdown
  vrf member TENANT-A
  ip address 10.10.10.1/24
  fabric forwarding mode anycast-gateway

interface Vlan20
  no shutdown
  vrf member TENANT-A
  ip address 10.10.20.1/24
  fabric forwarding mode anycast-gateway

The same Vlan10 SVI with the same IP is configured on every leaf. That is intentional. The anycast-gateway-mac must be identical fabric-wide.

Proving It Forwards

Control plane up is not the same as data plane working. Walk it end to end.

# Are remote VTEPs discovered as NVE peers?
show nve peers

# Is the local VTEP up with the right source?
show nve interface nve1 detail

# Type-2 routes (MAC/IP) learned from other leaves?
show bgp l2vpn evpn

# A specific host's MAC reachable, and via which VTEP?
show l2route evpn mac all
show l2route evpn mac-ip all

# VXLAN-aware MAC table: remote MACs show nve1(<peer-ip>) in the Ports column
show mac address-table dynamic vlan 10

The single most useful check is show nve peers. If two leaves host the same VNI but do not appear as peers to each other, ingress replication has nothing to replicate to, and east-west traffic between them blackholes while local traffic looks fine.
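
When a peer is missing, a few more commands narrow down why. This is a sketch of where I would look next, not an exhaustive procedure:

# Is the VNI up on this leaf, and is it using BGP ingress replication?
show nve vni

# Which remote VTEPs are in the replication list for each VNI?
show nve vni ingress-replication

# Type-3 (inclusive multicast) routes are what populate that list
show bgp l2vpn evpn route-type 3

A VNI that is up locally with an empty replication list usually means the remote leaf's Type-3 route never arrived or never imported, which points back at route-targets or send-community extended.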

For L3, confirm the tenant VRF sees remote subnets via the L3 VNI:

show ip route vrf TENANT-A
# remote subnets show the remote VTEP as next-hop (in %default), with segid 50999 (the L3 VNI) and encap VXLAN
show forwarding vrf TENANT-A route

Failure Drills Worth Running Pre-Production

Test                         Expected                      Common bug it catches
Shut one spine uplink        ECMP reroutes, no loss        Missing ip router ospf on a link
Reload a leaf                Hosts reconverge in seconds   NVE source on lo0 instead of lo1
Move a host between leaves   MAC mobility, no duplicate    Mismatched anycast-gateway-mac
Ping across subnets          Routed via L3 VNI             associate-vrf forgotten

Run these with traffic flowing, not on a quiet fabric. The failures that matter only show up under load.

The Bug That Looks Like a Hardware Fault

The most time-consuming VXLAN failure I have chased was a fabric where two leaves hosted the same VNI, OSPF was full, BGP EVPN was up, Type-2 routes were present on both sides — and a host on Leaf 11 could not reach a host on Leaf 12. Local traffic worked. Everything looked healthy.

show nve peers is where it surfaced:

show nve peers
# Interface Peer-IP        State LearnType Uptime   Router-Mac
# nve1      10.255.1.99    Up    CP        01:14:22  n/a

A peer was there, control-plane learned, but it was not 10.255.1.12, the address Leaf 12 was supposed to source from. The MAC table agreed:

show mac address-table dynamic vlan 10
# the remote MAC pointed at nve1(10.255.1.99), an address that belonged to Leaf 99

The cause was a duplicated NVE source loopback. Leaf 12 had been templated from Leaf 99 and its interface loopback1 still carried 10.255.1.99/32. OSPF advertised it from both leaves, BGP next-hops resolved to it, and frames for Leaf 12's hosts were VXLAN-encapsulated toward an address that two leaves claimed, so the underlay delivered some of them to Leaf 99, which had no such host. The fix was one line, but the symptom (partial reachability with a clean control plane) screams hardware until you check the source addresses.

The lesson: when east-west blackholes but the control plane is green, audit show nve interface nve1 detail for the source IP on every leaf and confirm each one is unique. A duplicated VTEP source raises no alarm in any protocol-level check: adjacencies are full, sessions are established, and routes are present.
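
One way to run that audit without logging in to every leaf is from a spine, where each VTEP /32 should resolve to exactly one leaf. A sketch, assuming all NVE sources live in 10.255.1.0/24 as they do in this fabric:

show ip route 10.255.1.0/24 longer-prefixes
# every /32 should have a single next-hop (one leaf);
# two next-hops toward different leaves means the address is configured twice

The deliberate exception is a vPC anycast VTEP address, which is expected to show both vPC peers.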

vPC and the Anycast VTEP

Two leaves running vPC must present a single VTEP to the fabric, or MAC moves between the vPC peers look like host flaps to the rest of the EVPN domain. The mechanism is a shared secondary IP on the NVE source loopback — the anycast VTEP address — advertised by both peers.

interface loopback1
  description VTEP-NVE-SOURCE
  ip address 10.255.1.11/32                 # unique primary
  ip address 10.255.1.100/32 secondary      # shared anycast VTEP, identical on the vPC peer
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode

interface nve1
  source-interface loopback1
  source-interface hold-down-time 180

Remote leaves see one VTEP (10.255.1.100) regardless of which vPC peer learned the MAC, so a host hashed to either peer is just “behind the anycast VTEP.” The source-interface hold-down-time suppresses advertisement of the NVE source loopback after a reload until the timer expires, giving the underlay, vPC, and overlay time to settle. Bring the VTEP up too early and you advertise reachability before forwarding is ready, blackholing traffic for the few seconds it takes BGP to converge.

Verify both peers advertise the same secondary and that orphan hosts (single-homed to one peer) are reachable:

show nve interface nve1 detail | include "Source\|Anycast"
show vpc                                    # both peers vPC-consistent
show ip route 10.255.1.100                  # run on a spine: the /32 should list both vPC peers as next-hops

What I Skip and Why

I do not start with multicast underlay replication. Ingress replication via BGP is one less protocol to operate, and for fabrics up to a few dozen leaves the replication cost is negligible. Multicast earns its place only at scale, and by then you have the EVPN fundamentals solid.

Get the underlay boringly reliable, bring up EVPN, then add VNIs one tenant at a time with show nve peers open in another window. A fabric built this way fails predictably, which is the only kind of failure you want.