You need to restart the routing daemon. Maybe for an upgrade, maybe for a config reload. Normal behavior: neighbors detect the restart, withdraw your routes, and traffic reroutes. Convergence takes seconds to minutes.
Graceful restart keeps forwarding while the control plane restarts. Neighbors know you’re restarting (not dead) and keep your routes. The data plane continues forwarding. After the restart, routing state resynchronizes. Little or no traffic loss.
Graceful restart prevents traffic loss during planned maintenance.
How Graceful Restart Works
Normal Restart (Without GR)
1. Router A routing daemon restarts
2. Router B detects adjacency down
3. Router B withdraws all routes from A
4. Traffic reconverges to alternate paths
5. Router A comes back up
6. Adjacency re-established
7. Routes re-learned
8. Traffic returns to original path

Impact: Minutes of reconvergence, possible blackhole

Graceful Restart
1. Router A signals "entering graceful restart"
2. Router A daemon restarts, forwarding plane continues
3. Router B (helper) keeps routes, marks them "stale"
4. Router A comes back up quickly
5. Router A re-establishes adjacency, refreshes routes
6. Router B removes "stale" flag
7. No route withdrawal, no reconvergence

Impact: Near-zero traffic disruption

BGP Graceful Restart
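In BGP terms, the helper's side of the graceful restart sequence can be sketched as a tiny state machine. This is an illustrative toy model in Python, not how FRR or VyOS implement it; the `HelperRib` class and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HelperRib:
    """Toy model of the routes a helper (Router B) learned from one peer."""
    # prefix -> True if the route is currently marked stale
    routes: dict = field(default_factory=dict)

    def learn(self, prefix: str) -> None:
        # A fresh advertisement installs the route or clears its stale flag.
        self.routes[prefix] = False

    def peer_restarting(self) -> None:
        # Session dropped with GR negotiated: keep every route, mark it stale.
        for prefix in self.routes:
            self.routes[prefix] = True

    def drop_stale(self) -> None:
        # Called when the peer finishes re-advertising (End-of-RIB) or when
        # the restart timer expires: remove whatever is still stale.
        self.routes = {p: s for p, s in self.routes.items() if not s}


rib = HelperRib()
rib.learn("10.1.0.0/16")
rib.learn("10.2.0.0/16")
rib.peer_restarting()        # both routes kept, both marked stale
rib.learn("10.1.0.0/16")     # peer is back and re-advertises one prefix
rib.drop_stale()             # the prefix that was not refreshed is removed
print(sorted(rib.routes))    # ['10.1.0.0/16']
```

The point of the model is that nothing is withdrawn at the moment the session drops; routes only disappear if they are still stale when the peer is done refreshing or the timer runs out.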
Basic Configuration
configure
# Enable graceful restart for BGP
set protocols bgp parameters graceful-restart

# Optional: Set restart time (how long helper waits)
set protocols bgp parameters graceful-restart restart-time 120

# Optional: Set stalepath time (how long to keep stale routes)
set protocols bgp parameters graceful-restart stalepath-time 360

commit

Per-Neighbor Configuration
# Enable/disable per neighbor
set protocols bgp neighbor 10.0.0.2 capability graceful-restart

# Some neighbors might not support GR
# Disable for specific neighbor:
set protocols bgp neighbor 10.0.0.3 capability graceful-restart disable

BGP GR Timers
| Timer | Purpose | Default | Range |
|---|---|---|---|
| restart-time | Time helper waits for restart | 120s | 1-4095s |
| stalepath-time | Time to keep stale routes | 360s | 1-4095s |
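To see how the two timers interact, here is a simplified model. It assumes both timers start when the session to the restarting peer drops, that `restart-time` bounds session re-establishment, and that `stalepath-time` bounds how long stale routes survive before the peer finishes re-advertising. Implementations differ on exactly when the stalepath timer starts, so treat this as a sketch under those assumptions rather than a description of FRR or VyOS internals. The function name and parameters are hypothetical.

```python
def stale_routes_survive(restart_time: int, stalepath_time: int,
                         session_up_after: float, refreshed_after: float) -> bool:
    """Return True if the helper still holds the stale routes when they are refreshed.

    All values are seconds measured from the moment the session dropped.
    session_up_after: when the BGP session is re-established.
    refreshed_after:  when the peer has finished re-advertising its routes.
    """
    if session_up_after > restart_time:
        return False   # peer did not return in time; helper flushed stale routes
    if refreshed_after > stalepath_time:
        return False   # stale routes aged out before they were refreshed
    return True


# Defaults from the table: restart-time 120s, stalepath-time 360s.
print(stale_routes_survive(120, 360, session_up_after=45, refreshed_after=60))    # True
print(stale_routes_survive(120, 360, session_up_after=150, refreshed_after=170))  # False
```

In the second call the daemon needed 150 seconds to bring the session back, which exceeds the 120-second restart time, so the helper falls back to a normal withdrawal.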
# Adjust timers
set protocols bgp parameters graceful-restart restart-time 180
set protocols bgp parameters graceful-restart stalepath-time 600

Verify BGP GR
# Check neighbor capabilities
show bgp neighbors 10.0.0.2

# Look for:
#   Graceful Restart Capability: advertised and received
#   Remote Restart timer is 120 seconds
#   Address families by peer:
#   IPv4 Unicast(Preserved)

# Check current GR state
show bgp neighbors 10.0.0.2 graceful-restart

OSPF Graceful Restart
Basic Configuration
configure
# Enable OSPF graceful restart
set protocols ospf graceful-restart

# Set grace period
set protocols ospf graceful-restart grace-period 180

commit

OSPF GR Helper Mode
# Helper mode (support other routers restarting)
set protocols ospf graceful-restart helper enable

# Can restrict helper mode
set protocols ospf graceful-restart helper strict-lsa-checking
# If LSA changes during restart, exit GR (safer)

OSPF GR Timers
| Timer | Purpose | Default |
|---|---|---|
| grace-period | Time to complete restart | 180s |
# Adjust grace period
set protocols ospf graceful-restart grace-period 300

Verify OSPF GR
# Check OSPF graceful restart status
show ip ospf graceful-restart

# Check neighbor state during restart
show ip ospf neighbor
# During GR, neighbor might show special state

Testing Graceful Restart
Test BGP GR
# Terminal 1: Watch BGP neighbor
watch -n 1 'vtysh -c "show bgp neighbors 10.0.0.2"'

# Terminal 2: Restart FRR (this restarts all FRR daemons, including bgpd)
systemctl restart frr

# Observe:
# - Neighbor should stay established (or show "Restart")
# - Routes should not be withdrawn
# - Quick re-establishment

Test OSPF GR
# Terminal 1: Watch OSPF neighbor
watch -n 1 'vtysh -c "show ip ospf neighbor"'

# Terminal 2: Restart FRR (this restarts all FRR daemons, including ospfd)
systemctl restart frr

# Observe:
# - Neighbor should not go Down
# - Routes should persist

Verify Forwarding Continues
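To put a number on the continuous-ping check in this section, one option is to save the ping output and count missing sequence numbers. The sketch below assumes Linux iputils-style reply lines containing `icmp_seq=N`; the function name and the sample lines are made up for illustration, not captured from a real router.

```python
import re


def missing_sequences(ping_output: str) -> list:
    """Return the icmp_seq numbers between the first and last reply that got no reply."""
    seen = sorted(int(m) for m in re.findall(r"icmp_seq=(\d+)", ping_output))
    if not seen:
        return []
    return sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))


# Synthetic sample output (replies 3 and 4 are missing).
sample = """\
64 bytes from 192.0.2.10: icmp_seq=1 ttl=63 time=0.41 ms
64 bytes from 192.0.2.10: icmp_seq=2 ttl=63 time=0.39 ms
64 bytes from 192.0.2.10: icmp_seq=5 ttl=63 time=0.44 ms
"""

lost = missing_sequences(sample)
interval = 0.1  # seconds, matching ping -i 0.1
print(lost)                                      # [3, 4]
print(f"~{len(lost) * interval:.1f}s of loss")   # ~0.2s of loss
```

Note that this only sees gaps between the first and last reply, so keep the ping running until well after the daemon is back.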
# From another host, continuous ping through router
ping -i 0.1 destination-through-router

# During restart:
# Without GR: Packet loss during convergence
# With GR: Zero or minimal packet loss

Long-Lived Graceful Restart (LLGR)
For BGP, LLGR extends stale route retention:
configure
# Enable LLGR
set protocols bgp parameters graceful-restart long-lived

# Set LLGR stale time (much longer than regular)
set protocols bgp parameters graceful-restart long-lived stale-time 86400

commit

LLGR keeps stale routes much longer, but at lower preference: they are tagged with the LLGR_STALE community and treated as least preferred. Useful for edge cases where a restart takes very long.
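The preference rule can be illustrated with a toy best-path comparison. RFC 9494 defines the well-known community LLGR_STALE (65535:6) for routes retained under LLGR and says such routes are to be treated as least preferred. The sketch below models only that one rule; the `Route` type and `best_path` function are hypothetical and ignore the rest of BGP best-path selection.

```python
from dataclasses import dataclass

LLGR_STALE = "65535:6"  # well-known community defined in RFC 9494


@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int
    communities: frozenset = frozenset()


def best_path(candidates):
    """Pick a route: any route without LLGR_STALE beats any route with it,
    then higher local preference wins (all other tie-breakers omitted)."""
    return max(candidates,
               key=lambda r: (LLGR_STALE not in r.communities, r.local_pref))


stale = Route("10.0.0.2", local_pref=200, communities=frozenset({LLGR_STALE}))
fresh = Route("10.0.0.3", local_pref=100)

print(best_path([stale, fresh]).next_hop)  # 10.0.0.3
print(best_path([stale]).next_hop)         # 10.0.0.2
```

Even with a higher local preference, the stale route loses to any fresh alternative. It is only used when nothing else is available.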
Graceful Restart vs BFD
They serve different purposes:
| Feature | Graceful Restart | BFD |
|---|---|---|
| Purpose | Survive planned restarts | Detect failures fast |
| Trigger | Control plane restart | Link/peer failure |
| Response | Keep routes | Withdraw routes fast |
| Use together | Yes | Yes |
# Use both!
# BFD: Detect actual failures quickly
# GR: Survive planned restarts

set protocols bgp neighbor 10.0.0.2 bfd
set protocols bgp parameters graceful-restart

When GR Doesn’t Help
Unplanned Failures
# Router crashes (not graceful)
# Forwarding plane also fails
# GR signal never sent

# Solution: BFD detects quickly, traffic reroutes

Forwarding Plane Restart
# If forwarding (kernel/hardware) restarts:
# GR won't help - traffic still disrupted

# GR only helps when:
# - Control plane (FRR) restarts
# - Forwarding (kernel routes) continues

Configuration Changes
# Major config change might require route refresh anyway
# GR preserves old routes, but new config applies

# Be careful: neighbors might briefly keep stale routes from the old config

Troubleshooting GR
GR Not Working
# Check if GR capability exchanged
show bgp neighbors 10.0.0.2 | grep -i graceful

# If "not received":
# - Peer doesn't support GR
# - Peer has GR disabled
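If you collect this output from many routers, the same check can be scripted. The sketch below classifies the capability line using the "advertised and received" wording from the Verify BGP GR section. The exact strings vary between FRR and VyOS versions, so the matching rules, return values, and function name here are assumptions to adapt, and the sample lines are synthetic.

```python
def gr_capability_state(neighbor_output: str) -> str:
    """Classify the Graceful Restart capability line of `show bgp neighbors` output."""
    for line in neighbor_output.splitlines():
        text = line.strip().lower()
        if not text.startswith("graceful restart capability"):
            continue
        advertised = "advertised" in text
        received = "received" in text and "not received" not in text
        if advertised and received:
            return "negotiated"
        if advertised:
            return "peer did not send GR capability"
        if received:
            return "local side is not advertising GR"
        return "unknown"
    return "capability line not found"


# Synthetic sample lines for illustration only.
print(gr_capability_state("    Graceful Restart Capability: advertised and received"))
# negotiated
print(gr_capability_state("    Graceful Restart Capability: advertised"))
# peer did not send GR capability
```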
# Check OSPF GR status
show ip ospf graceful-restart

# If disabled, check config:
show configuration commands | grep graceful

Routes Withdrawn Anyway
# Possible causes:
# 1. Restart took too long (exceeded timer)
# 2. Helper router cleared routes
# 3. GR not properly negotiated

# Check timers
show bgp neighbors 10.0.0.2 | grep -i timer
show bgp neighbors 10.0.0.2 | grep -i restart

# Increase restart-time if needed

Helper Not Preserving Routes
# Check helper configuration
show configuration commands | grep helper

# OSPF might need explicit helper mode
set protocols ospf graceful-restart helper enable

Best Practices
1. Enable on All Routers
# GR is peer-to-peer negotiation
# Both sides should have it enabled

# Without GR on peer:
# - Your restart withdraws routes from peer
# - Peer's restart withdraws routes from you

2. Test Before Production
# Test GR in lab/staging
# Verify:
# - Capabilities exchanged
# - Routes preserved during restart
# - Forwarding continues

3. Monitor During Maintenance
# During planned restart, monitor:
show bgp summary
show ip ospf neighbor

# Watch for state changes
# Verify quick re-establishment

4. Tune Timers for Your Environment
# Fast restart (SSD, modern hardware)
set protocols bgp parameters graceful-restart restart-time 60

# Slow restart (older hardware, large config)
set protocols bgp parameters graceful-restart restart-time 300

Configuration Summary
BGP Graceful Restart
configure
# Basic GR
set protocols bgp parameters graceful-restart

# Timers
set protocols bgp parameters graceful-restart restart-time 120
set protocols bgp parameters graceful-restart stalepath-time 360

# Per-neighbor (optional)
set protocols bgp neighbor 10.0.0.2 capability graceful-restart

commit

OSPF Graceful Restart
configure
# Basic GR
set protocols ospf graceful-restart
set protocols ospf graceful-restart grace-period 180

# Helper mode
set protocols ospf graceful-restart helper enable

commit

The Lesson
Graceful restart prevents traffic loss during planned maintenance.
Without GR:
- Daemon restart = all routes withdrawn
- Traffic reconverges (seconds to minutes)
- Users see disruption
With GR:
- Daemon restart signaled to neighbors
- Neighbors keep routes (marked stale)
- Forwarding continues
- Daemon comes back, routes refreshed
- Users notice nothing
Every production router should have graceful restart enabled. It’s free insurance for maintenance windows.
The 30 seconds you spend configuring GR saves minutes of disruption every time you restart the routing daemon.