Fabric is up. vPC formed. Port-channels bundled. Then a link fails, and traffic blackholes. Or a leaf reboots, and half the servers lose connectivity. Or ECMP doesn’t balance as expected.
Spine/leaf sounds simple — until failure scenarios reveal configuration gaps. The time to discover these is before production, not during an outage.
vPC as an Operational Object
What vPC Actually Is
vPC (Virtual Port Channel) makes two Nexus switches appear as one to downstream devices:
        [Spine 1]      [Spine 2]
            │              │
      ┌─────┴──────────────┴─────┐
      │                          │
  [Leaf 1]──vPC Peer Link──[Leaf 2]
      │                          │
      └──────────┬───────────────┘
                 │
             [Server]
          (port-channel)

The server sees one port-channel to one “switch.” In reality, half the links go to Leaf 1, half to Leaf 2.
Critical vPC Components
! vPC Domain configuration
vpc domain 1
  peer-switch
  peer-keepalive destination 10.0.0.2 source 10.0.0.1
  peer-gateway
  layer3 peer-router
  auto-recovery
  delay restore 120
  delay restore interface-vlan 60

Key elements:
| Component | Purpose | Failure Impact |
|---|---|---|
| Peer-link | Sync MAC tables, forward orphan traffic | vPC suspends on secondary |
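| Peer-keepalive | Detect whether the peer is still alive when the peer-link is down | Split-brain if peer-link and keepalive both fail |
| Peer-gateway | Let each peer route frames addressed to the other peer’s router MAC | Traffic blackhole |
| Auto-recovery | Bring vPCs back up when the peer stays unreachable | Manual intervention needed |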
vPC Health Checks
# Overall vPC status
show vpc

# Expected output:
# vPC domain id : 1
# Peer status : peer adjacency formed ok
# vPC keep-alive status : peer is alive
# Configuration consistency status : success
# Per-vlan consistency status : success
# Type-2 consistency status : success
# vPC role : primary
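The expected values above lend themselves to a scripted check. The following is a minimal Python sketch, assuming the `show vpc` text has already been collected (for example over SSH) and that its field labels match the expected output listed above. The function name `vpc_health_problems` and the sample text are hypothetical; real output carries more fields and varies by NX-OS release.

```python
# Expected value for each field, taken from the expected output above.
EXPECTED = {
    "Peer status": "peer adjacency formed ok",
    "vPC keep-alive status": "peer is alive",
    "Configuration consistency status": "success",
    "Per-vlan consistency status": "success",
    "Type-2 consistency status": "success",
}


def vpc_health_problems(show_vpc_text):
    """Return a list of human-readable problems found in `show vpc` text."""
    seen = {}
    for line in show_vpc_text.splitlines():
        # Fields look like "Peer status   : peer adjacency formed ok".
        name, sep, value = line.partition(":")
        if sep:
            seen[name.strip()] = value.strip()

    problems = []
    for field, wanted in EXPECTED.items():
        actual = seen.get(field)
        if actual is None:
            problems.append(f"{field}: not found in output")
        elif actual != wanted:
            problems.append(f"{field}: expected '{wanted}', got '{actual}'")
    return problems


# Hypothetical sample with one deliberate failure.
sample = """\
vPC domain id                     : 1
Peer status                       : peer adjacency formed ok
vPC keep-alive status             : peer is alive
Configuration consistency status  : success
Per-vlan consistency status       : success
Type-2 consistency status         : failed
vPC role                          : primary
"""

for problem in vpc_health_problems(sample):
    print(problem)
```

An empty list means every field matched. The vPC role is deliberately not checked, since it differs between the two peers.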
# Peer-link status
show vpc peer-link

# vPC consistency check
show vpc consistency-parameters global
show vpc consistency-parameters interface port-channel 10

Consistency Check Failures
vPC requires certain configs to match on both peers:
# Check what's inconsistent
show vpc consistency-parameters global

# Type 1 (must match or vPC won't form):
# - STP mode, VLAN state, port-type
# - vPC domain settings

# Type 2 (warning, vPC still works):
# - VLAN configurations
# - IGMP snooping settings

Fix pattern: Compare configs side by side:
# On both switches
show run vpc
show run interface port-channel X

# Look for differences in:
# - allowed VLANs
# - switchport mode
# - STP settings

Port-Channel Hygiene
LACP Configuration
Always use LACP, never static:
! Server-facing port-channel (vPC)
interface port-channel 10
  description Server-Cluster-01
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  vpc 10

interface Ethernet1/1
  description Server-Cluster-01-Link1
  switchport mode trunk
  switchport trunk allowed vlan 100-110
  channel-group 10 mode active

! LACP must be active on both ends
! "mode active" = initiate LACP
! "mode passive" = respond only (avoid)

Allowed VLANs
Only allow VLANs that should traverse the link:
! WRONG: Allow all VLANs
interface port-channel 10
  switchport trunk allowed vlan all

! RIGHT: Explicit VLAN list
interface port-channel 10
  switchport trunk allowed vlan 100-110,200

Why it matters:
- Broadcast domains stay contained
- STP topology is cleaner
- Troubleshooting is easier
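An explicit list is also easy to compare across the two vPC peers, where the allowed VLANs on a vPC port-channel need to match. Below is a minimal Python sketch, assuming the allowed-VLAN strings have been copied by hand from `show run interface port-channel X` on each peer. The helper names `expand_vlans` and `vlan_diff` are hypothetical and not part of any Cisco tooling.

```python
def expand_vlans(spec):
    """Expand an allowed-VLAN string such as "100-110,200" into a set of IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans


def vlan_diff(peer_a_spec, peer_b_spec):
    """Return (only_on_a, only_on_b) as sorted lists of VLAN IDs."""
    a, b = expand_vlans(peer_a_spec), expand_vlans(peer_b_spec)
    return sorted(a - b), sorted(b - a)


# Hypothetical example: peer B is missing VLAN 110 and carries an extra VLAN 300.
only_a, only_b = vlan_diff("100-110,200", "100-109,200,300")
print("Only on peer A:", only_a)  # [110]
print("Only on peer B:", only_b)  # [300]
```

Two empty lists mean the trunks carry the same VLANs, even when the ranges are written differently on each side.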
Native VLAN
Match native VLAN on both ends to avoid untagged traffic issues:
! Set explicit native VLAN
interface port-channel 10
  switchport trunk native vlan 999

! Verify
show interface port-channel 10 trunk

MTU Configuration
Jumbo frames require consistent MTU end-to-end:
! System MTU (affects all L2 interfaces)
system jumbomtu 9216

! Per-interface MTU (L3)
interface Ethernet1/1
  mtu 9216

! Verify
show interface port-channel 10 | include MTU

! Test end-to-end
ping 10.0.1.100 df-bit packet-size 9000

Port-Channel Verification
# Status overview
show port-channel summary

# Expected output:
# 10 Po10(SU) Eth LACP Eth1/1(P) Eth1/2(P)
# SU = Layer2, Up
# P = member is up and bundled
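The member flags in that summary line are simple to check in a script. This is a minimal Python sketch, assuming one summary row has been captured as text and that members are printed in the `Eth1/1(P)` form shown above. The function name `unbundled_members` and the sample row are hypothetical.

```python
import re

# Matches a member entry such as "Eth1/1(P)" and captures the name and the flag.
MEMBER = re.compile(r"(Eth\S+?)\((\w)\)")


def unbundled_members(summary_line):
    """Return member interfaces whose flag is anything other than P (bundled)."""
    return [name for name, flag in MEMBER.findall(summary_line) if flag != "P"]


# Hypothetical row where the second member is down.
line = "10    Po10(SU)    Eth    LACP    Eth1/1(P)    Eth1/2(D)"
print(unbundled_members(line))  # ['Eth1/2']
```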
# Detailed status
show port-channel database interface port-channel 10

# LACP counters
show lacp counters interface port-channel 10

# Member interface status
show lacp neighbor interface port-channel 10

Underlay Routing Sanity
OSPF Underlay Checks
# Verify all adjacencies are FULL
show ip ospf neighbors

# Expected: All neighbors in FULL state
# FULL/DR, FULL/BDR, FULL/DROTHER

# Check for stuck adjacencies
show ip ospf neighbors | include INIT|2WAY|EXSTART

# Verify routes are learned
show ip route ospf

# Check OSPF database consistency
show ip ospf database summary

BGP Underlay Checks
For eBGP spine/leaf:
# All neighbors established
show bgp ipv4 unicast summary

# Expected: State = Established, or showing prefix count
# Neighbor        V    AS MsgRcvd MsgSent State/PfxRcd
# 10.0.1.1        4 65001    1234    1234 10
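Because an established session shows a prefix count in the last column and anything else shows a state name, the summary can be screened with a few lines of Python. This is a sketch under the assumption that the summary text has already been captured and that the state is a single token such as `Idle` or `Active`. The function name and the sample rows are hypothetical, and real output has more columns than the excerpt above.

```python
def bgp_neighbors_not_established(summary_text):
    """Return {neighbor: state} for rows whose last column is not a prefix count."""
    down = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Neighbor rows start with an IPv4 address; skip headers and blank lines.
        if not fields or fields[0].count(".") != 3:
            continue
        state_or_prefixes = fields[-1]
        if not state_or_prefixes.isdigit():
            down[fields[0]] = state_or_prefixes
    return down


# Hypothetical sample: the second neighbor never left Idle.
sample = """\
Neighbor        V    AS MsgRcvd MsgSent State/PfxRcd
10.0.1.1        4 65001    1234    1234 10
10.0.1.2        4 65001       0       0 Idle
"""
print(bgp_neighbors_not_established(sample))  # {'10.0.1.2': 'Idle'}
```

A prefix count of 0 still passes this check, so it is worth also confirming that routes actually arrive from every spine.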
# Check for routes from all spines
show bgp ipv4 unicast

# Verify ECMP
show ip route 10.0.2.0/24

# Should show multiple next-hops if ECMP working
# via 10.0.1.1, Eth1/49, via 10.0.1.2, Eth1/50

ECMP Behavior
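Multiple next-hops in the routing table do not mean each packet is sprayed across them: ECMP picks a path per flow by hashing packet header fields, so one heavy flow stays on one link. The Python sketch below illustrates the idea only. SHA-256 and the 5-tuple key are stand-ins chosen for the example, not the hash the switch hardware uses, and `pick_next_hop` is a hypothetical name.

```python
import hashlib
from collections import Counter

NEXT_HOPS = ["10.0.1.1", "10.0.1.2"]  # the two spine next-hops from the route above


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops=NEXT_HOPS):
    """Map a flow's 5-tuple to one next-hop; the same flow always gets the same path."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# One flow: every packet takes the same path, however much traffic it carries.
flow = ("10.0.1.100", "10.0.2.100", 6, 40000, 443)
assert len({pick_next_hop(*flow) for _ in range(1000)}) == 1

# Many flows (varying source port): the paths end up shared roughly evenly.
spread = Counter(
    pick_next_hop("10.0.1.100", "10.0.2.100", 6, port, 443)
    for port in range(40000, 42000)
)
print(dict(spread))
```

This is why a test with a single stream can look badly unbalanced while production traffic with many flows spreads well.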
# Check maximum ECMP paths
show running-config | include maximum-paths

! Configure if needed
router bgp 65001
  address-family ipv4 unicast
    maximum-paths 4
    maximum-paths ibgp 4

# Verify load balancing
show ip load-sharing

# Test ECMP path selection
show routing hash 10.0.1.100 10.0.2.100 ip

Timer Alignment
Fast convergence requires aggressive timers:
! BGP timers (NX-OS sets these under the neighbor)
router bgp 65001
  neighbor 10.0.1.1
    timers 3 9
    bfd

! OSPF timers
interface Ethernet1/49
  ip ospf hello-interval 1
  ip ospf dead-interval 3
  ip ospf bfd

! BFD configuration
feature bfd
bfd interval 250 min_rx 250 multiplier 3

Failure Drills
What to Test Before Go-Live
| Failure Scenario | Expected Behavior | Verify |
|---|---|---|
| Single uplink fails | Traffic shifts to other uplinks | show port-channel summary |
| vPC member fails | vPC still operational | show vpc brief |
| Peer-link fails | Secondary suspends vPCs | show vpc |
| Leaf fails | Servers failover to peer | Ping from server |
| Spine fails | ECMP removes path | show ip route |
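To put a number on the expected behavior in each row, a continuous ping can run during the drill and the longest outage can be compared with the detection time implied by the timers. The sketch below is a minimal Python example, assuming probe results have been recorded as (timestamp in ms, success) pairs. The function names and the sample capture are hypothetical; the 250 ms interval and multiplier of 3 come from the BFD configuration under Timer Alignment.

```python
def longest_outage_ms(samples):
    """Longest gap between the last good probe before a loss and the next good probe.

    `samples` is a time-ordered list of (timestamp_ms, success) tuples.
    A loss that never recovers within the capture is not counted.
    """
    longest = 0
    last_ok = None
    saw_loss = False
    for timestamp_ms, ok in samples:
        if ok:
            if saw_loss and last_ok is not None:
                longest = max(longest, timestamp_ms - last_ok)
            last_ok = timestamp_ms
            saw_loss = False
        else:
            saw_loss = True
    return longest


def bfd_detection_ms(interval_ms, multiplier):
    """BFD detection time: the interval times the number of missed packets tolerated."""
    return interval_ms * multiplier


# Hypothetical capture: one probe every 100 ms, lost from t=200 to t=800.
samples = [(t, not 200 <= t <= 800) for t in range(0, 1100, 100)]

print("Longest outage:", longest_outage_ms(samples), "ms")        # 800 ms
print("BFD detection budget:", bfd_detection_ms(250, 3), "ms")    # 750 ms
```

The measured outage is only accurate to within one probe interval, and it includes reconvergence after detection, so a result somewhat above the detection budget is expected.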
Drill 1: Single Uplink Failure
# On leaf, shut one uplink
interface Ethernet1/49
  shutdown

# Verify:
# 1. Port-channel stays up (degraded)
show port-channel summary

# 2. Routing adjusts
show ip route

# 3. Traffic still flows
ping <destination>

# Restore
no shutdown

Drill 2: vPC Member Failure
# Shut one member of server port-channel
interface Ethernet1/1
  shutdown

# Verify:
# 1. vPC stays up
show vpc brief

# 2. Server still has connectivity (via peer)
# Test from server

# 3. Traffic flows through peer-link if needed
show interface port-channel <peer-link> counters

# Restore
no shutdown

Drill 3: Peer-Link Failure
Caution: This drill is disruptive. Schedule a maintenance window.
# Simulate peer-link failure
interface port-channel 1  # peer-link
  shutdown

# Expected on secondary:
# - vPCs suspend
# - Peer-keepalive maintains split-brain prevention

show vpc
# Role should show: secondary, operational secondary

# Restore immediately
no shutdown

Drill 4: Leaf Failure
# Simulate complete leaf failure (reload)
reload

# On peer leaf, verify:
show vpc orphan-ports
show vpc

# Servers should failover to surviving leaf
# vPC ports on surviving leaf stay up

Drill 5: Spine Failure
# On spine, shut all downlinks
interface Ethernet1/1-48
  shutdown

# On leaves, verify:
# 1. OSPF/BGP removes routes via failed spine
show ip route

# 2. ECMP still works via remaining spine(s)
show ip route <destination>

# 3. Traffic flows
ping <destination> source <loopback>

Pre-Production Checklist
vPC Checklist
[ ] Peer-link is port-channel (not single link)
[ ] Peer-keepalive uses dedicated link/VRF
[ ] Consistency checks pass (show vpc consistency-parameters global)
[ ] Auto-recovery is configured
[ ] Delay restore timers appropriate for environment
[ ] peer-gateway enabled
[ ] layer3 peer-router enabled (if routing on vPC VLANs)

Port-Channel Checklist

[ ] LACP mode active (not passive or on)
[ ] Allowed VLANs explicitly configured
[ ] Native VLAN matches both ends
[ ] MTU consistent end-to-end
[ ] Spanning-tree port type configured (edge for servers)
[ ] BPDU guard enabled on edge ports

Underlay Routing Checklist

[ ] All adjacencies FULL/Established
[ ] Routes learned from all spines
[ ] ECMP working (multiple next-hops)
[ ] BFD enabled for fast failure detection
[ ] Timers aligned (hello/dead intervals)
[ ] Loopback addresses reachable from all leaves

Failure Testing Checklist

[ ] Single uplink failure tested
[ ] vPC member failure tested
[ ] Peer-link failure tested (with maintenance window)
[ ] Leaf failure simulated
[ ] Spine failure simulated
[ ] Convergence time documented
[ ] Alerts verified during failures

Monitoring Commands Summary
# vPC status
show vpc
show vpc brief
show vpc peer-link
show vpc consistency-parameters global
show vpc orphan-ports

# Port-channel status
show port-channel summary
show port-channel database
show lacp counters
show lacp neighbor

# Routing status
show ip ospf neighbors
show bgp ipv4 unicast summary
show ip route
show ip route summary

# Interface status
show interface status
show interface trunk
show interface counters errors

# Spanning tree
show spanning-tree summary
show spanning-tree vlan <id>

The Lesson
Spine/leaf operations require:
- vPC hygiene — consistency checks, proper peer-link/keepalive, recovery settings
- Port-channel discipline — LACP active, explicit VLANs, matching MTU
- Underlay verification — all adjacencies up, ECMP working, fast timers
- Failure drills — test every failure scenario before go-live
The fabric that “just works” in the lab will surprise you in production when:
- A link fails and traffic asymmetry begins
- A leaf reboots and half the vPCs suspend
- ECMP doesn’t balance because of misconfigured maximum-paths
Test failures before they test you. Document expected behavior. Verify convergence times. A spine/leaf fabric is only as reliable as your preparation.