NX-OS Spine/Leaf Operations: vPC, Port-Channels, and Pre-Production Checks

Fabric is up. vPC formed. Port-channels bundled. Then a link fails, and traffic blackholes. Or a leaf reboots, and half the servers lose connectivity. Or ECMP doesn’t balance as expected.

Spine/leaf sounds simple — until failure scenarios reveal configuration gaps. The time to discover these is before production, not during an outage.

vPC as an Operational Object

What vPC Actually Is

vPC (Virtual Port Channel) makes two Nexus switches appear as one to downstream devices:

[Spine 1]            [Spine 2]
    │                    │
[Leaf 1]───Peer Link───[Leaf 2]
    │                    │
    └─────────┬──────────┘
          [Server]
       (port-channel)

The server sees one port-channel to one “switch.” In reality, half the links go to Leaf 1, half to Leaf 2.
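The flow-pinning behavior can be modeled with a short sketch. This is an illustrative hash, not Cisco's proprietary load-balancing algorithm, and the IPs and member names are made up — the point is that one flow always lands on the same member link while different flows spread across both leaves:

```python
# Illustrative model of port-channel member selection (NOT Cisco's
# actual hardware hash): the flow's tuple is hashed, and the result
# modulo the member count picks the physical link.
import hashlib

def pick_member(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                members: list) -> str:
    key = f"{src_ip}:{dst_ip}:{src_port}:{dst_port}".encode()
    digest = int(hashlib.sha256(key).hexdigest(), 16)
    return members[digest % len(members)]

# Hypothetical member links: half on each vPC peer
members = ["Leaf1-Eth1/1", "Leaf2-Eth1/1"]

# The same flow is always pinned to one member (no packet reordering);
# many distinct flows spread across both leaves.
a = pick_member("10.0.100.5", "10.0.200.9", 49152, 443, members)
b = pick_member("10.0.100.5", "10.0.200.9", 49152, 443, members)
assert a == b
print(a)
```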

Critical vPC Components

! vPC Domain configuration
vpc domain 1
peer-switch
peer-keepalive destination 10.0.0.2 source 10.0.0.1
peer-gateway
layer3 peer-router
auto-recovery
delay restore 120
delay restore interface-vlan 60

Key elements:

| Component      | Purpose                                      | Failure Impact             |
| -------------- | -------------------------------------------- | -------------------------- |
| Peer-link      | Syncs MAC tables, forwards orphan traffic    | vPC suspends on secondary  |
| Peer-keepalive | Detects peer failure                         | Split-brain if both fail   |
| Peer-gateway   | Lets the peer route for the other's HSRP MAC | Traffic blackhole          |
| Auto-recovery  | Re-enables vPCs after split-brain            | Manual intervention needed |

vPC Health Checks

Terminal window
# Overall vPC status
show vpc
# Expected output:
# vPC domain id : 1
# Peer status : peer adjacency formed ok
# vPC keep-alive status : peer is alive
# Configuration consistency status : success
# Per-vlan consistency status : success
# Type-2 consistency status : success
# vPC role : primary
# Peer-link status
show vpc peer-link
# vPC consistency check
show vpc consistency-parameters global
show vpc consistency-parameters interface port-channel 10

Consistency Check Failures

vPC requires certain configs to match on both peers:

Terminal window
# Check what's inconsistent
show vpc consistency-parameters global
# Type 1 (must match or vPC won't form):
# - STP mode, VLAN state, port-type
# - vPC domain settings
# Type 2 (warning, vPC still works):
# - VLAN configurations
# - IGMP snooping settings

Fix pattern: Compare configs side by side:

Terminal window
# On both switches
show run vpc
show run interface port-channel X
# Look for differences in:
# - allowed VLANs
# - switchport mode
# - STP settings
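The side-by-side comparison can be scripted. A minimal sketch, assuming you have captured `show run interface port-channel X` from each peer into strings (the configs below are illustrative, not from a real device):

```python
# Minimal sketch: diff the per-interface config lines captured from
# both vPC peers to surface mismatches that trip consistency checks.
def config_lines(cfg: str) -> set:
    """Normalize a config snippet into a set of non-empty lines."""
    return {line.strip() for line in cfg.splitlines() if line.strip()}

# Hypothetical captures from each peer -- note the VLAN-range mismatch
peer1 = """
interface port-channel10
  switchport mode trunk
  switchport trunk allowed vlan 100-110
"""
peer2 = """
interface port-channel10
  switchport mode trunk
  switchport trunk allowed vlan 100-111
"""

only_1 = config_lines(peer1) - config_lines(peer2)
only_2 = config_lines(peer2) - config_lines(peer1)
for line in sorted(only_1):
    print(f"only on peer 1: {line}")
for line in sorted(only_2):
    print(f"only on peer 2: {line}")
```

Set arithmetic ignores line ordering, which is usually what you want when diffing device configs.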

Port-Channel Hygiene

LACP Configuration

Always use LACP, never static:

Terminal window
! Server-facing port-channel (vPC)
interface port-channel 10
description Server-Cluster-01
switchport mode trunk
switchport trunk allowed vlan 100-110
vpc 10
interface Ethernet1/1
description Server-Cluster-01-Link1
switchport mode trunk
switchport trunk allowed vlan 100-110
channel-group 10 mode active
! LACP must be active on both ends
! "mode active" = initiate LACP
! "mode passive" = respond only (avoid)

Allowed VLANs

Only allow VLANs that should traverse the link:

Terminal window
! WRONG: Allow all VLANs
interface port-channel 10
switchport trunk allowed vlan all
! RIGHT: Explicit VLAN list
interface port-channel 10
switchport trunk allowed vlan 100-110,200

Why it matters:

  • Broadcast domains stay contained
  • STP topology is cleaner
  • Troubleshooting is easier

Native VLAN

Match native VLAN on both ends to avoid untagged traffic issues:

Terminal window
! Set explicit native VLAN
interface port-channel 10
switchport trunk native vlan 999
! Verify
show interface port-channel 10 trunk

MTU Configuration

Jumbo frames require consistent MTU end-to-end:

Terminal window
! System MTU (affects all L2 interfaces)
system jumbomtu 9216
! Per-interface MTU (L3)
interface Ethernet1/1
mtu 9216
! Verify
show interface port-channel 10 | include MTU
! Test end-to-end
ping 10.0.1.100 df-bit packet-size 9000
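The headroom arithmetic is worth sanity-checking. Whether a given platform's `packet-size` counts headers varies, so treat the overhead accounting here as illustrative:

```python
# Back-of-envelope check that a 9000-byte jumbo test fits under a
# 9216-byte MTU. Header sizes are the standard minimums; platforms
# differ on whether "packet-size" includes them.
IP_HEADER = 20      # bytes, IPv4 with no options
ICMP_HEADER = 8     # bytes, echo request
payload = 9000

ip_packet = payload + ICMP_HEADER + IP_HEADER
print(ip_packet)            # 9028
assert ip_packet <= 9216    # fits within the configured jumbo MTU
assert ip_packet > 1500     # would be dropped on a default-MTU hop with df-bit set
```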

Port-Channel Verification

Terminal window
# Status overview
show port-channel summary
# Expected output:
# 10 Po10(SU) Eth LACP Eth1/1(P) Eth1/2(P)
# S = switched (Layer 2), U = up
# P = member is up and bundled
# Detailed status
show port-channel database interface port-channel 10
# LACP counters
show lacp counters interface port-channel 10
# Member interface status
show lacp neighbor interface port-channel 10
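Checking that every member carries the (P) flag is easy to automate. A sketch, assuming the summary line format shown above — column layout varies by NX-OS release, so adjust the pattern to your output:

```python
import re

# Given a summary line like
#   "10   Po10(SU)   Eth   LACP   Eth1/1(P)  Eth1/2(P)",
# return any member interfaces NOT showing the (P) "up and bundled" flag.
def unbundled_members(summary_line: str) -> list:
    members = re.findall(r"(Eth\d+/\d+)\(([A-Za-z])\)", summary_line)
    return [intf for intf, flag in members if flag != "P"]

# Illustrative line with one down member
line = "10   Po10(SU)    Eth    LACP    Eth1/1(P)   Eth1/2(D)"
print(unbundled_members(line))   # ['Eth1/2']
```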

Underlay Routing Sanity

OSPF Underlay Checks

Terminal window
# Verify all adjacencies are FULL
show ip ospf neighbors
# Expected: All neighbors in FULL state
# FULL/DR, FULL/BDR, FULL/DROTHER
# Check for stuck adjacencies
show ip ospf neighbors | include INIT|2WAY|EXSTART
# Verify routes are learned
show ip route ospf
# Spot-check the OSPF link-state database
show ip ospf database

BGP Underlay Checks

For eBGP spine/leaf:

Terminal window
# All neighbors established
show bgp ipv4 unicast summary
# Expected: State/PfxRcd shows a prefix count (a number means Established)
# Neighbor V AS MsgRcvd MsgSent State/PfxRcd
# 10.0.1.1 4 65001 1234 1234 10
# Check for routes from all spines
show bgp ipv4 unicast
# Verify ECMP
show ip route 10.0.2.0/24
# Should show multiple next-hops if ECMP working
# via 10.0.1.1, Eth1/49, via 10.0.1.2, Eth1/50

ECMP Behavior

Terminal window
# Check maximum ECMP paths
show running-config | include maximum-paths
! Configure if needed
router bgp 65001
address-family ipv4 unicast
maximum-paths 4
maximum-paths ibgp 4
# Verify load balancing
show ip load-sharing
# Test ECMP path selection
show routing hash 10.0.1.100 10.0.2.100 ip
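The per-flow behavior behind `show routing hash` can be modeled in a few lines. This is an illustrative hash only — real hardware uses a vendor-specific function with a per-switch seed to avoid polarization — and the spine addresses are the sample ones from above:

```python
import hashlib

# Illustrative per-flow ECMP model: the flow tuple hashes to one of the
# equal-cost next-hops, so a single flow never reorders across paths
# while many flows spread over all spines.
def ecmp_next_hop(src: str, dst: str, proto: int, sport: int, dport: int,
                  next_hops: list) -> str:
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    h = int(hashlib.md5(key).hexdigest(), 16)
    return next_hops[h % len(next_hops)]

spines = ["10.0.1.1", "10.0.1.2"]  # two equal-cost next-hops

# 100 flows differing only in source port; they spread across the spines
chosen = {ecmp_next_hop("10.0.100.5", "10.0.2.10", 6, p, 443, spines)
          for p in range(49152, 49252)}
print(sorted(chosen))
```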

Timer Alignment

Fast convergence requires aggressive timers:

Terminal window
! BGP timers
router bgp 65001
neighbor 10.0.1.1 timers 3 9
neighbor 10.0.1.1 bfd
! OSPF timers
interface Ethernet1/49
ip ospf hello-interval 1
ip ospf dead-interval 3
ip ospf bfd
! BFD configuration
feature bfd
bfd interval 250 min_rx 250 multiplier 3
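The detection times implied by those timers are worth writing down. A quick calculation under the values configured above:

```python
# Worst-case failure-detection times implied by the timers above.
# BFD detection = multiplier * receive interval; protocols without BFD
# wait for the dead/hold interval to expire.
bfd_ms = 3 * 250       # multiplier 3, min_rx 250 ms -> 750 ms
ospf_dead_s = 3        # hello 1 s, dead 3 s
bgp_hold_s = 9         # keepalive 3 s, hold 9 s

print(bfd_ms / 1000, ospf_dead_s, bgp_hold_s)  # 0.75 3 9

# BFD detects loss roughly 4x faster than the OSPF dead interval and
# 12x faster than the BGP hold time, which is why it backs both.
assert bfd_ms / 1000 < ospf_dead_s < bgp_hold_s
```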

Failure Drills

What to Test Before Go-Live

| Failure Scenario    | Expected Behavior               | Verify                    |
| ------------------- | ------------------------------- | ------------------------- |
| Single uplink fails | Traffic shifts to other uplinks | show port-channel summary |
| vPC member fails    | vPC still operational           | show vpc brief            |
| Peer-link fails     | Secondary suspends vPCs         | show vpc                  |
| Leaf fails          | Servers fail over to peer       | Ping from server          |
| Spine fails         | ECMP removes path               | show ip route             |
Drill 1: Single Uplink Failure

Terminal window
# On leaf, shut one uplink
interface Ethernet1/49
shutdown
# Verify:
# 1. Port-channel stays up (degraded)
show port-channel summary
# 2. Routing adjusts
show ip route
# 3. Traffic still flows
ping <destination>
# Restore
no shutdown

Drill 2: vPC Member Failure

Terminal window
# Shut one member of server port-channel
interface Ethernet1/1
shutdown
# Verify:
# 1. vPC stays up
show vpc brief
# 2. Server still has connectivity (via peer)
# Test from server
# 3. Traffic flows through peer-link if needed
show interface port-channel <peer-link> counters
# Restore
no shutdown

Drill 3: Peer-Link Failure

Caution: This drill is disruptive; schedule a maintenance window.

Terminal window
# Simulate peer-link failure
interface port-channel 1 # peer-link
shutdown
# Expected on secondary:
# - vPCs suspend
# - Peer-keepalive stays up, preventing dual-active (split-brain)
show vpc
# Role should show: secondary, operational secondary
# Restore immediately
no shutdown

Drill 4: Leaf Failure

Terminal window
# Simulate complete leaf failure (reload)
reload
# On peer leaf, verify:
show vpc orphan-ports
show vpc
# Servers should failover to surviving leaf
# vPC ports on surviving leaf stay up

Drill 5: Spine Failure

Terminal window
# On spine, shut all downlinks
interface Ethernet1/1-48
shutdown
# On leaves, verify:
# 1. OSPF/BGP removes routes via failed spine
show ip route
# 2. ECMP still works via remaining spine(s)
show ip route <destination>
# 3. Traffic flows
ping <destination> source <loopback>

Pre-Production Checklist

vPC Checklist

[ ] Peer-link is port-channel (not single link)
[ ] Peer-keepalive uses dedicated link/VRF
[ ] Consistency checks pass (show vpc consistency-parameters global)
[ ] Auto-recovery is configured
[ ] Delay restore timers appropriate for environment
[ ] peer-gateway enabled
[ ] layer3 peer-router enabled (if routing on vPC VLANs)

Port-Channel Checklist

[ ] LACP mode active (not passive or on)
[ ] Allowed VLANs explicitly configured
[ ] Native VLAN matches both ends
[ ] MTU consistent end-to-end
[ ] Spanning-tree port type configured (edge for servers)
[ ] BPDU guard enabled on edge ports

Underlay Routing Checklist

[ ] All adjacencies FULL/Established
[ ] Routes learned from all spines
[ ] ECMP working (multiple next-hops)
[ ] BFD enabled for fast failure detection
[ ] Timers aligned (hello/dead intervals)
[ ] Loopback addresses reachable from all leaves

Failure Testing Checklist

[ ] Single uplink failure tested
[ ] vPC member failure tested
[ ] Peer-link failure tested (with maintenance window)
[ ] Leaf failure simulated
[ ] Spine failure simulated
[ ] Convergence time documented
[ ] Alerts verified during failures

Monitoring Commands Summary

Terminal window
# vPC status
show vpc
show vpc brief
show vpc peer-link
show vpc consistency-parameters global
show vpc orphan-ports
# Port-channel status
show port-channel summary
show port-channel database
show lacp counters
show lacp neighbor
# Routing status
show ip ospf neighbors
show bgp ipv4 unicast summary
show ip route
show ip route summary
# Interface status
show interface status
show interface trunk
show interface counters errors
# Spanning tree
show spanning-tree summary
show spanning-tree vlan <id>

The Lesson

Spine/leaf operations require:

  1. vPC hygiene — consistency checks, proper peer-link/keepalive, recovery settings
  2. Port-channel discipline — LACP active, explicit VLANs, matching MTU
  3. Underlay verification — all adjacencies up, ECMP working, fast timers
  4. Failure drills — test every failure scenario before go-live

The fabric that “just works” in the lab will surprise you in production when:

  • A link fails and traffic asymmetry begins
  • A leaf reboots and half the vPCs suspend
  • ECMP doesn’t balance because of misconfigured maximum-paths

Test failures before they test you. Document expected behavior. Verify convergence times. A spine/leaf fabric is only as reliable as your preparation.