Multi-WAN on VyOS: Failover That Actually Works

Having two internet connections means nothing if your router doesn’t know when one fails. I’ve seen setups where the “failover” just meant two default routes with different metrics — the primary could be completely dead, and the router would happily keep trying to send traffic through it until the metrics were manually adjusted.

Real failover requires active health checking. VyOS provides this, but it needs proper configuration. Let’s build multi-WAN that actually works.

The Multi-WAN Architecture

Typical setup:

  • eth0: Primary ISP (faster, preferred)
  • eth1: Secondary ISP (backup)
  • eth2: LAN

Goals:

  1. Use primary when healthy
  2. Failover to secondary when primary fails
  3. Fail back when primary recovers
  4. All of this automatically

Basic Interface Setup

Terminal window
configure
# Primary WAN
set interfaces ethernet eth0 description 'WAN-PRIMARY'
set interfaces ethernet eth0 address dhcp
# Secondary WAN
set interfaces ethernet eth1 description 'WAN-SECONDARY'
set interfaces ethernet eth1 address dhcp
# LAN
set interfaces ethernet eth2 description 'LAN'
set interfaces ethernet eth2 address '10.0.0.1/24'
commit

The Wrong Way: Static Metrics

You might think this works:

Terminal window
# DON'T DO THIS (or at least, don't rely only on this)
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 distance 10
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 distance 20

This creates two default routes. The lower distance (10) is preferred. But here’s the problem: if the primary ISP goes down at layer 3 (routing issue, ISP outage, etc.), the interface might still be up. The router keeps using the “preferred” route that goes nowhere.

The Right Way: Health Checking

VyOS offers three approaches, in increasing order of robustness: static routes bound to an interface, a scripted health check, and the built-in WAN load balancer, which runs its own health tests.

Option 1: Interface State Tracking

Basic tracking — failover when interface goes down:

Terminal window
configure
# Primary route with interface tracking
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 distance 10
set protocols static route 0.0.0.0/0 next-hop 192.168.1.1 interface 'eth0'
# Secondary route - used when primary interface is down
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 distance 20
set protocols static route 0.0.0.0/0 next-hop 192.168.2.1 interface 'eth1'
commit

This helps but only detects link failure, not upstream issues.
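
To make the difference concrete, here is a minimal sketch that separates "link is down" from "link is up but upstream is dead". The helper names are mine, not anything VyOS ships. It assumes Linux exposes link state in /sys/class/net/<if>/operstate and that the ping target answers whenever the path is healthy.

```shell
# Hypothetical helper: classify a WAN from two observations.
#   $1 = link state as read from /sys/class/net/<if>/operstate
#   $2 = exit status of a ping sent through that interface (0 = reply)
classify_wan() {
    local link_state="$1" ping_rc="$2"
    if [ "$link_state" != "up" ]; then
        echo "link-down"          # interface tracking catches this case
    elif [ "$ping_rc" -ne 0 ]; then
        echo "upstream-failure"   # link is up but traffic goes nowhere
    else
        echo "healthy"
    fi
}

# Gather both observations for a real interface and classify it.
# Anything other than "up" (including "unknown") is treated as down here.
probe_wan() {
    local ifname="$1" target="$2" link_state ping_rc=0
    link_state="$(cat "/sys/class/net/${ifname}/operstate" 2>/dev/null || echo down)"
    ping -c 3 -W 2 -I "$ifname" "$target" > /dev/null 2>&1 || ping_rc=$?
    classify_wan "$link_state" "$ping_rc"
}
```

Running probe_wan eth0 8.8.8.8 prints healthy, link-down, or upstream-failure. Interface tracking only ever reacts to link-down.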

Option 2: Scripted Health Checks

For proper SLA monitoring, create a health check script:

/config/scripts/wan-health-check.sh
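#!/bin/bash
PRIMARY_GW="192.168.1.1"
SECONDARY_GW="192.168.2.1"
CHECK_TARGET="8.8.8.8"
PRIMARY_METRIC=10
SECONDARY_METRIC=20
FAILOVER_METRIC=5
# Check primary WAN by pinging through it
if ping -c 3 -W 2 -I eth0 "$CHECK_TARGET" > /dev/null 2>&1; then
    # Primary is healthy - ensure the normal metrics are in place
    ip route replace default via "$PRIMARY_GW" metric "$PRIMARY_METRIC"
    ip route replace default via "$SECONDARY_GW" metric "$SECONDARY_METRIC"
    # Routes with different metrics are separate entries, so "replace" alone
    # would leave the failover route installed and traffic would never fail back.
    if ip route del default via "$SECONDARY_GW" metric "$FAILOVER_METRIC" 2> /dev/null; then
        logger "WAN Failover: Primary recovered, failing back"
    fi
else
    # Primary is down - add a secondary route that outranks the primary.
    # The primary route stays installed so later checks can still probe it.
    ip route replace default via "$SECONDARY_GW" metric "$FAILOVER_METRIC"
    logger "WAN Failover: Primary down, using secondary"
fi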

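Schedule it with the task scheduler (cron under the hood):

Terminal window
set system task-scheduler task wan-check executable path '/config/scripts/wan-health-check.sh'
# A bare number is read as minutes; one minute is the shortest interval
set system task-scheduler task wan-check interval '1m'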
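
The task scheduler is cron-based, so once a minute is as often as it can fire. If you want checks closer together, one option is a small wrapper that runs the check more than once per invocation. This is a sketch under that assumption; run_repeatedly and SLEEP_CMD are names I made up for illustration.

```shell
# Hypothetical wrapper: run a command several times in one scheduler
# invocation, pausing between runs.
#   $1 = number of runs, $2 = pause in seconds, remaining args = command
# SLEEP_CMD exists only so the pause can be stubbed out when testing.
run_repeatedly() {
    local runs="$1" pause="$2" i=1
    shift 2
    while [ "$i" -le "$runs" ]; do
        "$@"
        # Skip the pause after the final run so the wrapper exits promptly
        if [ "$i" -lt "$runs" ]; then
            "${SLEEP_CMD:-sleep}" "$pause"
        fi
        i=$((i + 1))
    done
}
```

A wrapper script that calls run_repeatedly 2 25 /config/scripts/wan-health-check.sh, scheduled once a minute, checks roughly every 30 seconds. Keep the total runtime under 60 seconds so invocations don't overlap.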

Option 3: Built-In WAN Load Balancing

VyOS has built-in WAN load balancing with health checks:

Terminal window
configure
# Define WAN interfaces for load balancing
set load-balancing wan interface-health eth0 failure-count '3'
set load-balancing wan interface-health eth0 nexthop '192.168.1.1'
set load-balancing wan interface-health eth0 success-count '3'
set load-balancing wan interface-health eth0 test 10 resp-time '5'
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth1 failure-count '3'
set load-balancing wan interface-health eth1 nexthop '192.168.2.1'
set load-balancing wan interface-health eth1 success-count '3'
set load-balancing wan interface-health eth1 test 10 resp-time '5'
set load-balancing wan interface-health eth1 test 10 target '8.8.4.4'
set load-balancing wan interface-health eth1 test 10 type 'ping'
# Define load balancing rule
set load-balancing wan rule 10 inbound-interface 'eth2'
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover
# Sticky connections (optional - keeps sessions on same WAN)
set load-balancing wan sticky-connections inbound
set load-balancing wan enable-local-traffic
commit

Key parameters:

  • failure-count: How many failed tests before marking down
  • success-count: How many successes before marking up
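  • weight: Higher = more traffic when load sharing; in failover mode the healthy interface with the highest weight carries everything
  • failover: Use only the preferred healthy interface instead of balancing across all of them
  • enable-local-traffic: Also apply the rules to traffic that originates on the router itself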

Understanding the Health Check

Terminal window
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth0 test 10 resp-time '5'

This pings 8.8.8.8 through eth0. If a reply takes longer than 5 seconds or never arrives, the test counts as a failure. After 3 failures (failure-count), the interface is marked down.
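
The two counters are easier to reason about as a tiny state machine. The sketch below is a model, not VyOS's implementation: it assumes the counters track consecutive results and reset on the opposite outcome. next_state and simulate are hypothetical names.

```shell
# Hypothetical model of failure-count / success-count hysteresis.
#   $1 = current state (up|down)   $2 = consecutive failures so far
#   $3 = consecutive successes     $4 = latest test result (ok|fail)
#   $5 = failure-count             $6 = success-count
# Prints "<state> <failures> <successes>" after applying the result.
next_state() {
    local state="$1" fails="$2" oks="$3" result="$4" fail_limit="$5" ok_limit="$6"
    if [ "$result" = "ok" ]; then
        fails=0
        oks=$((oks + 1))
        if [ "$state" = "down" ] && [ "$oks" -ge "$ok_limit" ]; then
            state="up"
        fi
    else
        oks=0
        fails=$((fails + 1))
        if [ "$state" = "up" ] && [ "$fails" -ge "$fail_limit" ]; then
            state="down"
        fi
    fi
    echo "$state $fails $oks"
}

# Feed a sequence of results through the model, starting from "up 0 0".
#   $1 = failure-count, $2 = success-count, remaining args = ok|fail ...
simulate() {
    local fail_limit="$1" ok_limit="$2" s="up 0 0" r
    shift 2
    for r in "$@"; do
        # $s is deliberately unquoted so it splits into three arguments
        s="$(next_state $s "$r" "$fail_limit" "$ok_limit")"
    done
    echo "${s%% *}"
}
```

simulate 3 3 fail fail ok fail leaves the interface up, because the success resets the failure counter before it reaches three. That is the anti-flapping behavior the two counts are there to provide.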

Choose your test target wisely:

  • Public DNS (8.8.8.8, 1.1.1.1) - highly available
  • Your ISP’s gateway - tests only first hop
  • Multiple targets for more confidence

Terminal window
# Multiple tests per interface (confirm in your version's docs how the results are combined)
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth0 test 20 target '1.1.1.1'
set load-balancing wan interface-health eth0 test 20 type 'ping'

NAT for Multi-WAN

Each WAN needs its own NAT rule:

Terminal window
configure
# NAT for primary WAN
set nat source rule 100 outbound-interface name 'eth0'
set nat source rule 100 source address '10.0.0.0/24'
set nat source rule 100 translation address 'masquerade'
# NAT for secondary WAN
set nat source rule 110 outbound-interface name 'eth1'
set nat source rule 110 source address '10.0.0.0/24'
set nat source rule 110 translation address 'masquerade'
commit

masquerade automatically uses the correct outbound IP based on which interface traffic exits.

Monitoring WAN Status

Terminal window
# Check WAN health status
show wan-load-balance
# Check current routing
show ip route
# Check NAT sessions
show nat source translations

Sticky Sessions: Why They Matter

The load balancer picks an uplink per connection. The trouble starts with connections that arrive from the internet, such as a port forward on WAN1: without sticky connections, the reply can be balanced out WAN2 with a different source IP. The remote client sees an answer from an address it never contacted and drops it.

Terminal window
set load-balancing wan sticky-connections inbound

With this set, replies to a connection that came in on a WAN interface leave through that same interface. It cannot carry a session across a failover, though: a connection that was using the failed WAN has to be re-established over the surviving one.

Exclude Certain Traffic from Load Balancing

Some traffic should always use a specific WAN:

Terminal window
# VPN traffic always uses primary (to maintain stable VPN connection)
set load-balancing wan rule 5 inbound-interface 'eth2'
set load-balancing wan rule 5 destination port '51820'
set load-balancing wan rule 5 protocol 'udp'
set load-balancing wan rule 5 interface eth0 weight '100'
# VoIP traffic uses secondary (more stable latency)
set load-balancing wan rule 6 inbound-interface 'eth2'
set load-balancing wan rule 6 destination port '5060-5061'
set load-balancing wan rule 6 protocol 'udp'
set load-balancing wan rule 6 interface eth1 weight '100'

Rules are processed in numeric order, and the first match wins. Rules 5 and 6 handle specific traffic; rule 10 (from earlier) handles everything else.
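
As a mental model of the ordering, here is a sketch that mimics the match criteria of rules 5, 6, and 10 with a hypothetical match_rule function. It only models protocol and destination port; the real evaluation happens inside VyOS.

```shell
# Hypothetical first-match model of rules 5, 6, and 10 from this article.
#   $1 = protocol (udp|tcp|...)   $2 = destination port
match_rule() {
    local proto="$1" dport="$2"
    if [ "$proto" = "udp" ] && [ "$dport" -eq 51820 ]; then
        echo "rule 5 -> eth0"
    elif [ "$proto" = "udp" ] && [ "$dport" -ge 5060 ] && [ "$dport" -le 5061 ]; then
        echo "rule 6 -> eth1"
    else
        # Nothing more specific matched, so the catch-all rule applies
        echo "rule 10 -> failover pool"
    fi
}
```

Note that match_rule tcp 5060 falls through to rule 10: rule 6 only matches UDP, so SIP over TCP would need its own rule.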

Active-Active vs Active-Passive

Active-Passive (Failover):

Terminal window
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover

The primary handles all traffic. The secondary is used only when the primary fails its health check.

Active-Active (Load Sharing):

Terminal window
set load-balancing wan rule 10 interface eth0 weight '70'
set load-balancing wan rule 10 interface eth1 weight '30'
# Remove the 'failover' flag if it was set earlier
delete load-balancing wan rule 10 failover

Traffic is distributed across both links: roughly 70% to the primary and 30% to the secondary.

Active-Active provides more bandwidth but complicates troubleshooting and may cause issues with services that expect consistent source IP.
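
The weights are ratios, not percentages, so it helps to do the arithmetic. A minimal sketch, assuming new connections are spread in proportion to weight when the failover flag is absent (weight_share is a hypothetical helper):

```shell
# Approximate share of new connections for one interface, as a whole
# percentage of the total weight. Integer division rounds down.
#   $1 = this interface's weight, remaining args = the other weights
weight_share() {
    local mine="$1" total="$1" w
    shift
    for w in "$@"; do
        total=$((total + w))
    done
    echo $((mine * 100 / total))
}
```

weight_share 70 30 prints 70, and weight_share 100 1 prints 99, which is why a 100:1 ratio without the failover flag still sends the occasional connection out the backup link.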

Testing Failover

Before relying on failover, test it:

  1. Verify both WANs work independently

    Terminal window
    # Test via primary
    ping -I eth0 8.8.8.8
    # Test via secondary
    ping -I eth1 8.8.8.8
  2. Simulate primary failure

    Terminal window
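    # Temporarily block the test target on the primary using the output filter
    # 'return' hands all other traffic back to the calling chain
    set firewall ipv4 name TEST default-action 'return'
    set firewall ipv4 name TEST rule 1 action 'drop'
    set firewall ipv4 name TEST rule 1 destination address '8.8.8.8'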
    set firewall ipv4 output filter rule 100 outbound-interface name 'eth0'
    set firewall ipv4 output filter rule 100 action 'jump'
    set firewall ipv4 output filter rule 100 jump-target 'TEST'
    commit
    # Watch failover happen
    show wan-load-balance
    # Remove test firewall
    delete firewall ipv4 name TEST
    delete firewall ipv4 output filter rule 100
    commit
  3. Physically disconnect primary

    Unplug eth0. Verify traffic continues via eth1. Reconnect and verify fail-back.

Complete Multi-WAN Configuration

Terminal window
# === Interfaces ===
set interfaces ethernet eth0 description 'WAN-PRIMARY'
set interfaces ethernet eth0 address dhcp
set interfaces ethernet eth1 description 'WAN-SECONDARY'
set interfaces ethernet eth1 address dhcp
set interfaces ethernet eth2 description 'LAN'
set interfaces ethernet eth2 address '10.0.0.1/24'
# === NAT ===
set nat source rule 100 outbound-interface name 'eth0'
set nat source rule 100 source address '10.0.0.0/24'
set nat source rule 100 translation address 'masquerade'
set nat source rule 110 outbound-interface name 'eth1'
set nat source rule 110 source address '10.0.0.0/24'
set nat source rule 110 translation address 'masquerade'
# === WAN Load Balancing with Health Check ===
set load-balancing wan interface-health eth0 failure-count '3'
set load-balancing wan interface-health eth0 nexthop 'dhcp'
set load-balancing wan interface-health eth0 success-count '3'
set load-balancing wan interface-health eth0 test 10 resp-time '5'
set load-balancing wan interface-health eth0 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0 test 10 type 'ping'
set load-balancing wan interface-health eth1 failure-count '3'
set load-balancing wan interface-health eth1 nexthop 'dhcp'
set load-balancing wan interface-health eth1 success-count '3'
set load-balancing wan interface-health eth1 test 10 resp-time '5'
set load-balancing wan interface-health eth1 test 10 target '8.8.4.4'
set load-balancing wan interface-health eth1 test 10 type 'ping'
set load-balancing wan rule 10 inbound-interface 'eth2'
set load-balancing wan rule 10 interface eth0 weight '100'
set load-balancing wan rule 10 interface eth1 weight '1'
set load-balancing wan rule 10 failover
set load-balancing wan sticky-connections inbound
set load-balancing wan enable-local-traffic

The Lesson

Multi-WAN without proper health checking is false confidence. Your router might report two routes while happily sending traffic into a black hole.

Real failover requires:

  1. Active health checks that test actual connectivity, not just link state
  2. Reasonable timers - fast enough to detect failures quickly, slow enough to avoid flapping
  3. Testing - verify failover actually works before you need it
  4. Monitoring - alerts when failover happens so you know to investigate
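
For the monitoring item, the useful signal is a change of state, not the state itself. Here is a minimal sketch that logs only when the active WAN differs from the last recorded one. notify_on_change and the state file path are assumptions; how you determine the active WAN (for example from show wan-load-balance) is left to the caller.

```shell
# Hypothetical alert helper: log only when the active WAN changes.
#   $1 = state file path (any writable location works)
#   $2 = name of the WAN currently carrying traffic
# Prints "changed" or "unchanged" so callers can react.
notify_on_change() {
    local state_file="$1" current="$2" previous=""
    if [ -f "$state_file" ]; then
        previous="$(cat "$state_file")"
    fi
    if [ "$current" = "$previous" ]; then
        echo "unchanged"
        return 0
    fi
    # Record the new state before alerting so a failed alert is not repeated forever
    printf '%s\n' "$current" > "$state_file"
    logger "WAN monitor: active WAN changed from '${previous:-unknown}' to '${current}'" 2>/dev/null || true
    echo "changed"
}
```

Swap the logger call for mail or a webhook if syslog isn't where you look first.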

VyOS’s WAN load balancing covers the health checks and timers out of the box; the testing and the alerting are still on you. Configure it, test it, and trust it, but verify with monitoring.