Having two internet connections means nothing if your router doesn’t know when one fails. I’ve seen setups where the “failover” just meant two default routes with different metrics — the primary could be completely dead, and the router would happily keep trying to send traffic through it until the metrics were manually adjusted.
Real failover requires active health checking. VyOS provides this, but it needs proper configuration. Let’s build multi-WAN that actually works.
The Multi-WAN Architecture
Typical setup:
- eth0: Primary ISP (faster, preferred)
- eth1: Secondary ISP (backup)
- eth2: LAN
Goals:
- Use primary when healthy
- Failover to secondary when primary fails
- Fail back when primary recovers
- All of this automatically
Basic Interface Setup
configure
# Primary WAN
set interfaces ethernet eth0 description 'WAN-PRIMARY'
set interfaces ethernet eth0 address dhcp
# Secondary WAN
set interfaces ethernet eth1 description 'WAN-SECONDARY'
set interfaces ethernet eth1 address dhcp
# LAN
set interfaces ethernet eth2 description 'LAN'
set interfaces ethernet eth2 address '10.0.0.1/24'
commit
The Wrong Way: Static Metrics
You might think this works:
# DON'T DO THIS (or at least, don't rely only on this)
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 distance 10
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 distance 20
This creates two default routes. The lower distance (10) is preferred. But here’s the problem: if the primary ISP goes down at layer 3 (routing issue, ISP outage, etc.), the interface might still be up. The router keeps using the “preferred” route that goes nowhere.
The Right Way: Health Checking
VyOS gives you several ways to detect a dead uplink: interface state tracking on static routes, custom health check scripts, and the built-in WAN load balancing feature with its own health checks. They are listed below from least to most robust.
Option 1: Interface State Tracking
Basic tracking — failover when interface goes down:
configure
# Primary route with interface tracking
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 distance 10
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 interface 'eth0'
# Secondary route - used when primary interface is down
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 distance 20
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 interface 'eth1'
commit
This helps but only detects link failure, not upstream issues.
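To make that gap concrete, here is a minimal sketch that separates "link down" from "link up but upstream dead". It assumes the Linux sysfs carrier file under /sys/class/net/<iface>/carrier and a ping target of your choosing. The function name classify_wan and the SYSFS_NET override are hypothetical, made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify a WAN as link-down, upstream-down, or healthy.
# SYSFS_NET can be overridden for testing; it defaults to the real sysfs tree.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

classify_wan() {
    iface="$1"
    target="$2"
    carrier_file="$SYSFS_NET/$iface/carrier"

    # The carrier file reads 1 when the link has a carrier. Reading it
    # fails when the interface is administratively down or missing.
    carrier=$(cat "$carrier_file" 2>/dev/null) || carrier=0
    if [ "$carrier" != "1" ]; then
        echo "link-down"
        return
    fi

    # Link is up; now test whether traffic actually gets anywhere.
    if ping -c 3 -W 2 -I "$iface" "$target" > /dev/null 2>&1; then
        echo "healthy"
    else
        echo "upstream-down"
    fi
}

# Example: classify_wan eth0 8.8.8.8
```

Interface tracking only ever sees the first case. The "upstream-down" case is the one that needs active probing.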
Option 2: Scripted Health Checks
For proper SLA monitoring, create a health check script:
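#!/bin/bash
PRIMARY_GW="192.168.1.1"
SECONDARY_GW="192.168.2.1"
CHECK_TARGET="8.8.8.8"
PRIMARY_METRIC=10
SECONDARY_METRIC=20
FAILOVER_METRIC=5
# Check primary WAN by pinging through it
if ping -c 3 -W 2 -I eth0 "$CHECK_TARGET" > /dev/null 2>&1; then
    # Primary is healthy - ensure both normal routes exist
    ip route replace default via "$PRIMARY_GW" metric "$PRIMARY_METRIC"
    ip route replace default via "$SECONDARY_GW" metric "$SECONDARY_METRIC"
    # Routes with different metrics are separate entries, so the failover
    # route must be deleted explicitly or the secondary stays preferred
    ip route del default via "$SECONDARY_GW" metric "$FAILOVER_METRIC" 2>/dev/null
else
    # Primary is down - add a secondary route that beats the primary's metric
    ip route replace default via "$SECONDARY_GW" metric "$FAILOVER_METRIC"
    logger "WAN Failover: Primary down, using secondary"
fi
Schedule it with the task scheduler, which is cron underneath. The interval is in minutes unless suffixed with h or d, so '30' here means every 30 minutes; lower it (the minimum is '1') for faster detection:
set system task-scheduler task wan-check executable path '/config/scripts/wan-health-check.sh'
set system task-scheduler task wan-check interval '30'
Option 3: VyOS WAN Load Balancing (Recommended)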
VyOS has built-in WAN load balancing with health checks:
configure
# Define WAN interfaces for load balancing
set load-balancing wan interface-health eth0 failure-count '3'
set load-balancing wan interface-health eth0 nexthop '192.168.1.1'
set load-balancing wan interface-health eth0 success-count '3'
set load-balancing wan interface-health eth0 test 10 resp-time '5'
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 ttl-limit '1'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth1 failure-count '3'
set load-balancing wan interface-health eth1 nexthop '192.168.2.1'
set load-balancing wan interface-health eth1 success-count '3'
set load-balancing wan interface-health eth1 test 10 resp-time '5'
set load-balancing wan interface-health eth1 test 10 target '8.8.4.4'
set load-balancing wan interface-health eth1 test 10 ttl-limit '1'
set load-balancing wan interface-health eth1 test 10 type 'ping'
# Define load balancing rule
set load-balancing wan rule 10 inbound-interface 'eth2'
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover
# Sticky connections (optional - keeps sessions on same WAN)
set load-balancing wan sticky-connections inbound
set load-balancing wan enable-local-traffic
commit
Key parameters:
- failure-count: How many failed tests before marking down
- success-count: How many successes before marking up
- weight: Higher = more traffic (100:1 means primary gets almost all traffic)
- failover: Enable failover mode (not just load balancing)
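The interplay of failure-count and success-count is easier to see as a small state machine. The sketch below only illustrates the counting idea, assuming it is consecutive results that count; it is not VyOS's actual implementation. The function name next_state and the "status:streak" encoding are invented for this example.

```shell
#!/bin/sh
# Illustrative debounce logic (hypothetical; not VyOS source code).
# State is "<up|down>:<streak>", where streak counts consecutive probe
# results that contradict the current status.
next_state() {
    state="$1"; result="$2"; failure_count="$3"; success_count="$4"
    status="${state%%:*}"
    streak="${state##*:}"

    if [ "$status" = "up" ]; then
        if [ "$result" = "fail" ]; then
            streak=$((streak + 1))
            # Enough consecutive failures: mark the interface down.
            if [ "$streak" -ge "$failure_count" ]; then
                status="down"; streak=0
            fi
        else
            streak=0   # a success resets the failure streak
        fi
    else
        if [ "$result" = "ok" ]; then
            streak=$((streak + 1))
            # Enough consecutive successes: mark the interface up again.
            if [ "$streak" -ge "$success_count" ]; then
                status="up"; streak=0
            fi
        else
            streak=0   # a failure resets the recovery streak
        fi
    fi
    echo "$status:$streak"
}

# Example: two failures, one success, then three failures in a row.
s="up:0"
for r in fail fail ok fail fail fail; do
    s=$(next_state "$s" "$r" 3 3)
done
echo "$s"
```

A single lost ping never flips the state, which is what keeps a slightly lossy link from flapping between WANs.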
Understanding the Health Check
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth0 test 10 resp-time '5'
This pings 8.8.8.8 through eth0. If the response takes more than 5 seconds or fails, it counts as a failure. After 3 failures (failure-count), the interface is marked down.
Choose your test target wisely:
- Public DNS (8.8.8.8, 1.1.1.1) - highly available
- Your ISP’s gateway - tests only first hop
- Multiple targets for more confidence
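If you are running a scripted check like Option 2 instead, the "multiple targets" idea can be sketched as a simple quorum: probe several targets and call the WAN healthy only if enough of them answer. This is a minimal sketch under my own assumptions; probe and wan_reachable are hypothetical names, and the quorum threshold is yours to pick.

```shell
#!/bin/sh
# Hypothetical multi-target check for a scripted setup.
# probe() is split out so it can be swapped for another test type.
probe() {
    # $1 = interface, $2 = target; succeeds if the target answers.
    ping -c 2 -W 2 -I "$1" "$2" > /dev/null 2>&1
}

# Succeeds when at least $2 of the listed targets answer through $1.
wan_reachable() {
    iface="$1"; required="$2"; shift 2
    ok=0
    for target in "$@"; do
        if probe "$iface" "$target"; then
            ok=$((ok + 1))
        fi
    done
    [ "$ok" -ge "$required" ]
}

# Example: treat eth0 as healthy if at least 1 of 2 targets answers.
# wan_reachable eth0 1 8.8.8.8 1.1.1.1 && echo "eth0 healthy"
```

Requiring only one of several targets guards against a single target having its own bad day; requiring all of them is stricter but more prone to false failovers.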
# Multiple tests - all must pass
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth0 test 20 target '1.1.1.1'
set load-balancing wan interface-health eth0 test 20 type 'ping'
NAT for Multi-WAN
Each WAN needs its own NAT rule:
configure
# NAT for primary WAN
set nat source rule 100 outbound-interface name 'eth0'
set nat source rule 100 source address '10.0.0.0/24'
set nat source rule 100 translation address 'masquerade'
# NAT for secondary WAN
set nat source rule 110 outbound-interface name 'eth1'
set nat source rule 110 source address '10.0.0.0/24'
set nat source rule 110 translation address 'masquerade'
commit
masquerade automatically uses the correct outbound IP based on which interface traffic exits.
Monitoring WAN Status
# Check WAN health status
show wan-load-balance
# Check current routing
show ip route
# Check NAT sessions
show nat source translations
Sticky Sessions: Why They Matter
VyOS balances per connection by default, so an established outbound flow stays on the WAN it started on. The trouble is in the other direction: without sticky connections, a connection that arrives on WAN1 might have its replies sent out WAN2 with a different source IP. The remote side sees packets from an address it never contacted and drops them.
set load-balancing wan sticky-connections inbound
With sticky connections enabled, replies to inbound connections leave through the WAN interface they arrived on. New outbound connections still go to whichever WAN is preferred at that moment.
Exclude Certain Traffic from Load Balancing
Some traffic should always use a specific WAN:
# VPN traffic always uses primary (to maintain stable VPN connection)
set load-balancing wan rule 5 inbound-interface 'eth2'
set load-balancing wan rule 5 destination port '51820'
set load-balancing wan rule 5 protocol 'udp'
set load-balancing wan rule 5 interface eth0 weight '100'
# VoIP traffic uses secondary (more stable latency)
set load-balancing wan rule 6 inbound-interface 'eth2'
set load-balancing wan rule 6 destination port '5060-5061'
set load-balancing wan rule 6 protocol 'udp'
set load-balancing wan rule 6 interface eth1 weight '100'
Rules are processed in order. Rules 5 and 6 handle the specific traffic, and rule 10 (from earlier) handles everything else.
Active-Active vs Active-Passive
Active-Passive (Failover):
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover
Primary handles all traffic. Secondary is only used when primary fails.
Active-Active (Load Sharing):
set load-balancing wan rule 10 interface eth0 weight '70'
set load-balancing wan rule 10 interface eth1 weight '30'
# Remove 'failover' flag
Traffic is distributed across both: roughly 70% to primary, 30% to secondary.
Active-Active provides more bandwidth but complicates troubleshooting and may cause issues with services that expect consistent source IP.
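To sanity-check a weight split before committing it, the expected share is just each weight divided by the sum of all weights. The helper below is a hypothetical sketch of that arithmetic, assuming connections are distributed in proportion to weight. In failover mode the weights act as a priority order instead, so this applies to the active-active case.

```shell
#!/bin/sh
# Hypothetical helper: approximate share (percent, rounded to nearest) of
# one interface when connections are distributed in proportion to weight.
# Usage: weight_share <this-weight> <all weights...>
weight_share() {
    weight="$1"; shift
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    # Integer arithmetic with rounding to the nearest percent.
    echo $(( (weight * 100 + total / 2) / total ))
}

weight_share 70 70 30    # prints 70
weight_share 100 100 1   # prints 99
```

Real traffic will only approximate these numbers, because the split is per connection and connections vary wildly in size.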
Testing Failover
Before relying on failover, test it:
- Verify both WANs work independently
# Test via primary
ping -I eth0 8.8.8.8
# Test via secondary
ping -I eth1 8.8.8.8
- Simulate primary failure
# Temporarily block test target from primary using output filter
set firewall ipv4 name TEST rule 1 action 'drop'
set firewall ipv4 name TEST rule 1 destination address '8.8.8.8'
set firewall ipv4 output filter rule 100 outbound-interface name 'eth0'
set firewall ipv4 output filter rule 100 action 'jump'
set firewall ipv4 output filter rule 100 jump-target 'TEST'
commit
# Watch failover happen (prefix with 'run' while still in configure mode)
run show wan-load-balance
# Remove test firewall
delete firewall ipv4 name TEST
delete firewall ipv4 output filter rule 100
commit
- Physically disconnect primary
Unplug eth0. Verify traffic continues via eth1. Reconnect and verify fail-back.
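"Verify traffic continues" is more useful with a number attached. One rough way to get it, sketched here under my own assumptions, is to run a one-per-second ping from a LAN host during the test and look for the longest run of missing replies. The helper name longest_gap is hypothetical, and it assumes the usual Linux iputils output format ("64 bytes from ...: icmp_seq=N ...").

```shell
#!/bin/sh
# Hypothetical helper: read `ping` output on stdin and print the largest
# run of missing icmp_seq numbers, i.e. the longest stretch of lost replies.
longest_gap() {
    awk '
        # Only count real replies, not "Destination Host Unreachable" lines.
        /bytes from/ && match($0, /icmp_seq=[0-9]+/) {
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq - prev - 1 > gap) gap = seq - prev - 1
            prev = seq; seen = 1
        }
        END { print gap + 0 }
    '
}

# Example (run from a LAN host while you break the primary WAN):
#   ping -i 1 -c 120 8.8.8.8 | longest_gap
```

With a one-second ping interval, the printed number is roughly the outage in seconds. If it is far larger than your failure-count and test timing would suggest, something other than the health check is slowing failover down.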
Complete Multi-WAN Configuration
# === Interfaces ===
set interfaces ethernet eth0 description 'WAN-PRIMARY'
set interfaces ethernet eth0 address dhcp
set interfaces ethernet eth1 description 'WAN-SECONDARY'
set interfaces ethernet eth1 address dhcp
set interfaces ethernet eth2 description 'LAN'
set interfaces ethernet eth2 address '10.0.0.1/24'
# === NAT ===
set nat source rule 100 outbound-interface name 'eth0'
set nat source rule 100 source address '10.0.0.0/24'
set nat source rule 100 translation address 'masquerade'
set nat source rule 110 outbound-interface name 'eth1'
set nat source rule 110 source address '10.0.0.0/24'
set nat source rule 110 translation address 'masquerade'
# === WAN Load Balancing with Health Check ===
set load-balancing wan interface-health eth0 failure-count '3'
set load-balancing wan interface-health eth0 nexthop 'dhcp'
set load-balancing wan interface-health eth0 success-count '3'
set load-balancing wan interface-health eth0 test 10 resp-time '5'
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth1 failure-count '3'
set load-balancing wan interface-health eth1 nexthop 'dhcp'
set load-balancing wan interface-health eth1 success-count '3'
set load-balancing wan interface-health eth1 test 10 resp-time '5'
set load-balancing wan interface-health eth1 test 10 target '8.8.4.4'
set load-balancing wan interface-health eth1 test 10 type 'ping'
set load-balancing wan rule 10 inbound-interface 'eth2'
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover
set load-balancing wan sticky-connections inbound
set load-balancing wan enable-local-traffic
The Lesson
Multi-WAN without proper health checking is false confidence. Your router might report two routes while happily sending traffic into a black hole.
Real failover requires:
- Active health checks that test actual connectivity, not just link state
- Reasonable timers - fast enough to detect failures quickly, slow enough to avoid flapping
- Testing - verify failover actually works before you need it
- Monitoring - alerts when failover happens so you know to investigate
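For the monitoring point, the useful signal is the transition, not the steady state; a check that logs "primary down" every cycle just teaches you to ignore it. Here is a minimal sketch of a transition-only reporter. The name report_transition and the STATE_FILE path are assumptions for illustration; how you determine the current state (parsing show wan-load-balance, your own probe, etc.) is left to you.

```shell
#!/bin/sh
# Hypothetical transition reporter: remembers the last known WAN state in a
# file and prints a message only when the state changes.
# STATE_FILE is an assumed path; pick one that suits your system.
STATE_FILE="${STATE_FILE:-/tmp/wan-state}"

report_transition() {
    current="$1"   # e.g. "primary" or "secondary"
    previous=$(cat "$STATE_FILE" 2>/dev/null) || previous="unknown"

    if [ "$current" != "$previous" ]; then
        echo "$current" > "$STATE_FILE"
        echo "WAN state changed: $previous -> $current"
        return 0
    fi
    return 1   # no change, nothing to report
}

# Example: forward changes to syslog, stay quiet otherwise.
#   msg=$(report_transition secondary) && logger -t wan-monitor "$msg"
```

From there the message can go wherever your alerts already go: syslog, email, a webhook.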
VyOS’s WAN load balancing handles the health checking and failover out of the box. Configure it, test it, and trust it, but verify with monitoring.