Graceful Restart: Maintaining Forwarding During Protocol Restarts

You need to restart the routing daemon. Maybe for an upgrade, maybe for a config reload. Normal behavior: neighbors detect the restart, withdraw routes, and traffic reroutes. Convergence takes seconds to minutes.

Graceful restart keeps forwarding while the control plane restarts. Neighbors know you’re restarting (not dead) and keep routes. Data plane continues forwarding. After restart, routing state resynchronizes. Ideally, no traffic loss.

Graceful restart prevents traffic loss during planned maintenance.

How Graceful Restart Works

Normal Restart (Without GR)

1. Router A routing daemon restarts
2. Router B detects adjacency down
3. Router B withdraws all routes from A
4. Traffic reconverges to alternate paths
5. Router A comes back up
6. Adjacency re-established
7. Routes re-learned
8. Traffic returns to original path
Impact: Seconds to minutes of reconvergence, possible blackholing

Graceful Restart

1. Router A signals GR to its neighbors (BGP: capability negotiated when the session came up; OSPF: Grace-LSA sent just before the restart)
2. Router A daemon restarts, forwarding plane continues
3. Router B (helper) keeps routes, marks them "stale"
4. Router A comes back up quickly
5. Router A re-establishes adjacency, refreshes routes
6. Router B removes "stale" flag
7. No route withdrawal, no reconvergence
Impact: Near-zero traffic disruption
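
The helper side of the sequence above can be sketched as a small model. This is an illustration in Python, not how any routing daemon implements it; the class and method names (HelperRib, peer_restarting, and so on) and the prefixes are made up for the example.

```python
class HelperRib:
    """Toy model of the routes a helper (Router B) learned from one peer."""

    def __init__(self):
        self.routes = {}  # prefix -> {"next_hop": str, "stale": bool}

    def learn(self, prefix, next_hop):
        # A fresh announcement installs the route and clears any stale flag.
        self.routes[prefix] = {"next_hop": next_hop, "stale": False}

    def peer_restarting(self):
        # Step 3: keep forwarding on every route, but mark it stale.
        for route in self.routes.values():
            route["stale"] = True

    def restart_complete(self):
        # Step 6: the peer finished re-advertising. Anything still stale
        # was not refreshed, so remove it and return the removed prefixes.
        removed = sorted(p for p, r in self.routes.items() if r["stale"])
        for prefix in removed:
            del self.routes[prefix]
        return removed

    def timer_expired(self):
        # The peer never came back in time: fall back to a normal withdrawal.
        removed = sorted(self.routes)
        self.routes.clear()
        return removed


rib = HelperRib()
rib.learn("192.0.2.0/24", "10.0.0.1")
rib.learn("198.51.100.0/24", "10.0.0.1")
rib.peer_restarting()                    # both routes kept, marked stale
rib.learn("192.0.2.0/24", "10.0.0.1")    # peer re-advertises one of them
print(rib.restart_complete())            # ['198.51.100.0/24']
```

The point of the stale flag is that forwarding never stops: a route is only removed if the restarted peer fails to re-advertise it, or if the timer runs out.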

BGP Graceful Restart

Basic Configuration

configure
# Enable graceful restart for BGP
set protocols bgp parameters graceful-restart
# Optional: Set restart time (how long helper waits)
set protocols bgp parameters graceful-restart restart-time 120
# Optional: Set stalepath time (how long to keep stale routes)
set protocols bgp parameters graceful-restart stalepath-time 360
commit

Per-Neighbor Configuration

# Enable/disable per neighbor
set protocols bgp neighbor 10.0.0.2 capability graceful-restart
# Some neighbors might not support GR
# Disable for specific neighbor:
set protocols bgp neighbor 10.0.0.3 capability graceful-restart disable

BGP GR Timers

| Timer | Purpose | Default | Range |
|---|---|---|---|
| restart-time | Time helper waits for restart | 120s | 1-4095s |
| stalepath-time | Time to keep stale routes | 360s | 1-4095s |

# Adjust timers
set protocols bgp parameters graceful-restart restart-time 180
set protocols bgp parameters graceful-restart stalepath-time 600
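
To see how the two timers interact, here is a rough Python sketch that classifies a restart from the helper's point of view. The function name and its inputs are hypothetical, and it assumes both timers start counting when the session drops; check how your platform starts the stalepath timer before relying on the exact numbers.

```python
def gr_outcome(reestablish_s, refresh_done_s, restart_time=120, stalepath_time=360):
    """Classify a restart as seen by the helper.

    reestablish_s:  seconds from session drop until the session is back up
    refresh_done_s: seconds from session drop until the peer has finished
                    re-advertising its routes
    Assumption: both timers start when the session drops.
    """
    if reestablish_s > restart_time:
        return "withdrawn: session not back within restart-time"
    if refresh_done_s > stalepath_time:
        return "partial: unrefreshed stale routes removed at stalepath-time"
    return "preserved"


print(gr_outcome(45, 90))     # comfortably inside both timers
print(gr_outcome(150, 200))   # daemon took longer than restart-time
print(gr_outcome(100, 400))   # session came back, refresh was too slow
```

The defaults mirror the table above. If a measured restart lands in either failure branch, that is the timer to raise.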

Verify BGP GR

# Check neighbor capabilities
show bgp neighbors 10.0.0.2
# Look for:
# Graceful Restart Capability: advertised and received
# Remote Restart timer is 120 seconds
# Address families by peer:
# IPv4 Unicast(Preserved)
# Check current GR state
show bgp neighbors 10.0.0.2 graceful-restart
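
If you want to check this from a script, the lines to look for can be pulled out with a couple of regular expressions. This Python sketch assumes the output contains the strings quoted above; real output differs between versions, so treat the patterns and the parse_gr_status helper as a starting point rather than a stable interface.

```python
import re


def parse_gr_status(neighbor_output):
    """Extract GR negotiation details from 'show bgp neighbors' text."""
    cap = re.search(r"Graceful Restart Capability:\s*(.+)", neighbor_output)
    timer = re.search(r"Remote Restart timer is (\d+) seconds", neighbor_output)
    state = cap.group(1).strip().lower() if cap else ""
    return {
        # "advertised and received" means both sides negotiated GR
        "advertised": "advertised" in state and "not advertised" not in state,
        "received": "received" in state and "not received" not in state,
        "remote_restart_time": int(timer.group(1)) if timer else None,
    }


# Sample text built from the lines listed above, not captured from a router.
sample = """
  Graceful Restart Capability: advertised and received
    Remote Restart timer is 120 seconds
    Address families by peer:
      IPv4 Unicast(Preserved)
"""
status = parse_gr_status(sample)
print(status)
```

A result with received set to False is the scripted equivalent of the "not received" case covered under troubleshooting.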

OSPF Graceful Restart

Basic Configuration

configure
# Enable OSPF graceful restart
set protocols ospf graceful-restart
# Set grace period
set protocols ospf graceful-restart grace-period 180
commit

OSPF GR Helper Mode

# Helper mode (support other routers restarting)
set protocols ospf graceful-restart helper enable
# Can restrict helper mode
set protocols ospf graceful-restart helper strict-lsa-checking
# If LSA changes during restart, exit GR (safer)
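
The effect of strict LSA checking is easiest to see as a decision the helper makes while it waits. The Python sketch below is a simplified model with made-up names (helper_exit_reason and its parameters), not code from any OSPF implementation.

```python
def helper_exit_reason(strict_lsa_checking, lsdb_changed, elapsed_s, grace_period_s=180):
    """Return why a helper stops helping, or None if it keeps helping."""
    if elapsed_s >= grace_period_s:
        # The restarting router did not finish in time.
        return "grace period expired"
    if strict_lsa_checking and lsdb_changed:
        # The topology moved while the neighbor was restarting, so its
        # preserved forwarding state can no longer be trusted.
        return "topology changed during restart"
    return None


print(helper_exit_reason(True, False, 30))    # None: keep helping
print(helper_exit_reason(True, True, 30))     # strict mode aborts
print(helper_exit_reason(False, True, 30))    # None: change is tolerated
print(helper_exit_reason(False, False, 200))  # grace period ran out
```

Without strict checking the helper rides through topology changes, which avoids unnecessary reconvergence but risks forwarding on outdated paths until the restart completes.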

OSPF GR Timers

| Timer | Purpose | Default |
|---|---|---|
| grace-period | Time to complete restart | 180s |

# Adjust grace period
set protocols ospf graceful-restart grace-period 300

Verify OSPF GR

# Check OSPF graceful restart status
show ip ospf graceful-restart
# Check neighbor state during restart
show ip ospf neighbor
# During GR, neighbor might show special state

Testing Graceful Restart

Test BGP GR

# Terminal 1: Watch the BGP neighbor
watch -n 1 'vtysh -c "show bgp neighbors 10.0.0.2"'
# Terminal 2: Restart FRR (this restarts all routing daemons, including bgpd)
systemctl restart frr
# Observe:
# - The BGP session itself drops and re-establishes; that is expected
# - On the peer, routes from this router stay installed (marked stale)
# - Quick re-establishment, well inside restart-time

Test OSPF GR

# Terminal 1: Watch OSPF neighbor
watch -n 1 'vtysh -c "show ip ospf neighbor"'
# Terminal 2: Restart FRR (this restarts all routing daemons, including ospfd)
systemctl restart frr
# Observe:
# - On the helper, the neighbor should not go Down
# - Routes should persist

Verify Forwarding Continues

# From another host, continuous ping through router
ping -i 0.1 destination-through-router
# During restart:
# Without GR: Packet loss during convergence
# With GR: Zero or minimal packet loss
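
To turn the ping result into a number you can compare between runs, multiply the lost packets by the send interval. The Python sketch below parses the summary line that ping prints on exit; the sample line is invented for illustration, and outage_from_ping is a hypothetical helper.

```python
import re


def outage_from_ping(summary_line, interval_s=0.1):
    """Estimate outage from ping's 'N packets transmitted, M received' line.

    Returns (lost_packets, approx_outage_seconds). The estimate assumes the
    losses were consecutive, so read it as an upper bound on a single gap.
    """
    m = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary_line)
    if m is None:
        raise ValueError("not a ping summary line")
    sent, received = int(m.group(1)), int(m.group(2))
    lost = sent - received
    return lost, lost * interval_s


# Invented sample: one minute of pings at -i 0.1 with three replies missing.
line = "600 packets transmitted, 597 received, 0.5% packet loss, time 60012ms"
lost, seconds = outage_from_ping(line)
print(f"{lost} lost, roughly {seconds:.1f}s of outage")
```

Run the same test with GR disabled first, so you have a baseline to compare against.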

Long-Lived Graceful Restart (LLGR)

For BGP, LLGR extends stale route retention:

configure
# Enable LLGR
set protocols bgp parameters graceful-restart long-lived
# Set LLGR stale time (much longer than regular)
set protocols bgp parameters graceful-restart long-lived stale-time 86400
commit

LLGR keeps stale routes far longer than regular GR, but marks them with the LLGR_STALE community and treats them as least preferred, so any other available path wins. Useful for edge cases where a restart takes very long.
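
The two retention phases can be laid out on a timeline. This Python sketch is a simplified model that assumes the long-lived period begins when restart-time expires; the function name and state labels are illustrative, and the defaults reuse the values from the examples in this article.

```python
def route_state(seconds_since_drop, restart_time=120, llgr_stale_time=86400):
    """State of a route from a peer that has not come back yet."""
    if seconds_since_drop <= restart_time:
        return "stale"        # ordinary GR: kept at normal preference
    if seconds_since_drop <= restart_time + llgr_stale_time:
        return "llgr-stale"   # kept, but least preferred
    return "removed"


for t in (60, 3600, 90000):
    print(t, route_state(t))
```

With these values a route survives a full day without its peer, but only as a last resort once the first two minutes have passed.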

Graceful Restart vs BFD

They serve different purposes:

| Feature | Graceful Restart | BFD |
|---|---|---|
| Purpose | Survive planned restarts | Detect failures fast |
| Trigger | Control plane restart | Link/peer failure |
| Response | Keep routes | Withdraw routes fast |
| Use together | Yes | Yes |

# Use both!
# BFD: Detect actual failures quickly
# GR: Survive planned restarts
set protocols bgp neighbor 10.0.0.2 bfd
set protocols bgp parameters graceful-restart

When GR Doesn’t Help

Unplanned Failures

# Router crashes (not graceful)
# Forwarding plane also fails
# GR signal never sent
# Solution: BFD detects quickly, traffic reroutes

Forwarding Plane Restart

# If forwarding (kernel/hardware) restarts:
# GR won't help - traffic still disrupted
# GR only helps when:
# - Control plane (FRR) restarts
# - Forwarding (kernel routes) continues

Configuration Changes

# A major config change might require a route refresh anyway
# GR preserves the old routes while the new config is applied
# Be careful: routes computed under the old config can linger until the refresh completes

Troubleshooting GR

GR Not Working

# Check if GR capability exchanged
show bgp neighbors 10.0.0.2 | grep -i graceful
# If "not received":
# - Peer doesn't support GR
# - Peer has GR disabled
# Check OSPF GR status
show ip ospf graceful-restart
# If disabled, check config:
show configuration commands | grep graceful

Routes Withdrawn Anyway

# Possible causes:
# 1. Restart took too long (exceeded timer)
# 2. Helper router cleared routes
# 3. GR not properly negotiated
# Check timers
show bgp neighbors 10.0.0.2 | grep -i timer
show bgp neighbors 10.0.0.2 | grep -i restart
# Increase restart-time if needed

Helper Not Preserving Routes

# Check helper configuration
show configuration commands | grep helper
# OSPF might need explicit helper mode
set protocols ospf graceful-restart helper enable

Best Practices

1. Enable on All Routers

# GR is negotiated per peer
# Both sides should have it enabled
# Without GR on the peer:
# - When you restart, the peer drops your routes
# - When the peer restarts, you drop its routes

2. Test Before Production

# Test GR in lab/staging
# Verify:
# - Capabilities exchanged
# - Routes preserved during restart
# - Forwarding continues

3. Monitor During Maintenance

# During planned restart, monitor:
show bgp summary
show ip ospf neighbor
# Watch for state changes
# Verify quick re-establishment

4. Tune Timers for Your Environment

# Fast restart (SSD, modern hardware)
set protocols bgp parameters graceful-restart restart-time 60
# Slow restart (older hardware, large config)
set protocols bgp parameters graceful-restart restart-time 300
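
Rather than guessing between 60 and 300, you can derive a value from restarts you have timed in the lab. The Python sketch below takes the slowest measured restart, applies a safety margin, and clamps to the 1-4095 range from the timer table. The 2x margin and the function name are assumptions for illustration, not a recommendation from any vendor.

```python
def suggest_restart_time(measured_restarts_s, margin=2.0, low=1, high=4095):
    """Suggest a restart-time from measured daemon restart durations."""
    worst = max(measured_restarts_s)
    # Leave headroom above the slowest run, then clamp to the valid range.
    return max(low, min(high, round(worst * margin)))


print(suggest_restart_time([22, 31, 27]))   # 62
print(suggest_restart_time([140, 155]))     # 310
```

Measure with a production-sized routing table loaded, since restart time grows with the number of routes the daemon has to relearn.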

Configuration Summary

BGP Graceful Restart

configure
# Basic GR
set protocols bgp parameters graceful-restart
# Timers
set protocols bgp parameters graceful-restart restart-time 120
set protocols bgp parameters graceful-restart stalepath-time 360
# Per-neighbor (optional)
set protocols bgp neighbor 10.0.0.2 capability graceful-restart
commit

OSPF Graceful Restart

configure
# Basic GR
set protocols ospf graceful-restart
set protocols ospf graceful-restart grace-period 180
# Helper mode
set protocols ospf graceful-restart helper enable
commit

The Lesson

Graceful restart prevents traffic loss during planned maintenance.

Without GR:

  • Daemon restart = all routes withdrawn
  • Traffic reconverges (seconds to minutes)
  • Users see disruption

With GR:

  • Daemon restart signaled to neighbors
  • Neighbors keep routes (marked stale)
  • Forwarding continues
  • Daemon comes back, routes refreshed
  • Users notice nothing

Every production router should have graceful restart enabled. It’s free insurance for maintenance windows.

The 30 seconds you spend configuring GR saves minutes of disruption every time you restart the routing daemon.