Every network incident postmortem I’ve read includes some variation of “a configuration change was made.” Manual changes on production routers are the leading cause of outages. We know this. We still do it.
Automation isn’t about being fancy. It’s about reducing the blast radius of human error. When configs live in Git, changes are reviewed before deployment, and rollback is one command away — you still make mistakes, but they’re smaller and recoverable.
This is how to automate VyOS configuration management in a way that actually works.
The Problem with Manual Configuration
Picture this:
- Need to add a firewall rule
- SSH into router
- Type commands from memory
- Typo in IP address
- Commit
- Traffic drops
- Panic
Now picture:
- Edit rule in Git
- PR reviewed by colleague (catches typo)
- Merge triggers automated deploy
- Change applied
- If wrong,
git revertand redeploy
Same change. One is an incident, one is Tuesday.
Config Backup Strategy
Before automating changes, automate backups. You need to recover from whatever you’re about to break.
Manual Backup Commands
# Full config as set commandsshow configuration commands > /config/backup-$(date +%Y%m%d).txt
# Config as JSON (useful for parsing)show configuration json > /config/backup-$(date +%Y%m%d).jsonAutomated Backup Script
Create /config/scripts/backup-config.sh:
#!/bin/bash
BACKUP_DIR="/config/backups"DATE=$(date +%Y%m%d-%H%M%S)HOSTNAME=$(hostname)BACKUP_FILE="${BACKUP_DIR}/${HOSTNAME}-${DATE}.cfg"
# Create backup directorymkdir -p "${BACKUP_DIR}"
# Export config/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin/opt/vyatta/bin/cli-shell-api showCfg --show-active-only > "${BACKUP_FILE}"/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
# Compressgzip "${BACKUP_FILE}"
# Keep last 30 daysfind "${BACKUP_DIR}" -name "*.cfg.gz" -mtime +30 -delete
# Optional: Push to remote storage# scp "${BACKUP_FILE}.gz" backup-server:/backups/Schedule via cron:
configureset system task-scheduler task backup-config cron-spec '0 * * * *'set system task-scheduler task backup-config executable path '/config/scripts/backup-config.sh'commitHourly backups, 30 days retention.
Off-Router Backup
Backups on the router die with the router. Push to external storage:
#!/bin/bashHOSTNAME=$(hostname)DATE=$(date +%Y%m%d)REMOTE="git@git.example.com:network/configs.git"WORK_DIR="/tmp/config-backup"
# Clone reporm -rf "${WORK_DIR}"git clone "${REMOTE}" "${WORK_DIR}"
# Export config/opt/vyatta/bin/cli-shell-api showCfg --show-active-only > "${WORK_DIR}/${HOSTNAME}.cfg"
# Commit and pushcd "${WORK_DIR}"git add "${HOSTNAME}.cfg"git commit -m "Automated backup: ${HOSTNAME} ${DATE}" || truegit push
# Cleanuprm -rf "${WORK_DIR}"Now every config change is version-controlled, even manual ones.
Configuration as Code
Store your configs in Git from the start, not just as backups.
Repository Structure
vyos-configs/├── README.md├── inventory/│ ├── production.yml│ └── staging.yml├── templates/│ ├── base/│ │ ├── system.j2│ │ ├── interfaces.j2│ │ └── firewall.j2│ └── roles/│ ├── edge-router.j2│ └── core-router.j2├── vars/│ ├── common.yml│ └── per-router/│ ├── router1.yml│ └── router2.yml├── configs/│ ├── router1.cfg│ └── router2.cfg└── scripts/ ├── generate.py ├── deploy.sh └── validate.shJinja2 Templates
Templates let you define config patterns once and instantiate for each router.
Template Example
templates/base/interfaces.j2:
{# Interface configuration template #}
{% for iface in interfaces %}set interfaces ethernet {{ iface.name }} address '{{ iface.address }}'set interfaces ethernet {{ iface.name }} description '{{ iface.description }}'{% if iface.vrrp is defined %}set high-availability vrrp group {{ iface.vrrp.group }} interface '{{ iface.name }}'set high-availability vrrp group {{ iface.vrrp.group }} virtual-address '{{ iface.vrrp.vip }}'set high-availability vrrp group {{ iface.vrrp.group }} priority '{{ iface.vrrp.priority }}'{% endif %}{% endfor %}Variables File
vars/per-router/router1.yml:
hostname: router1router_id: 10.255.255.1
interfaces: - name: eth0 address: 10.0.0.2/24 description: LAN vrrp: group: LAN vip: 10.0.0.1/24 priority: 200 - name: eth1 address: 203.0.113.2/24 description: WANGeneration Script
scripts/generate.py:
#!/usr/bin/env python3import yamlimport jinja2import sysfrom pathlib import Path
def generate_config(router_name): # Load variables common = yaml.safe_load(open('vars/common.yml')) router = yaml.safe_load(open(f'vars/per-router/{router_name}.yml'))
# Merge variables variables = {**common, **router}
# Load templates env = jinja2.Environment( loader=jinja2.FileSystemLoader('templates'), undefined=jinja2.StrictUndefined )
# Render each template output = [] for template_file in sorted(Path('templates/base').glob('*.j2')): template = env.get_template(f'base/{template_file.name}') output.append(template.render(**variables))
return '\n'.join(output)
if __name__ == '__main__': router = sys.argv[1] config = generate_config(router) print(config)Generate config:
python scripts/generate.py router1 > configs/router1.cfgAnsible Integration
Ansible is the standard tool for network automation. VyOS has a collection.
Inventory
inventory/production.yml:
all: children: vyos_routers: hosts: router1: ansible_host: 10.0.0.2 router2: ansible_host: 10.0.0.3 vars: ansible_user: vyos ansible_network_os: vyos.vyos.vyos ansible_connection: ansible.netcommon.network_cliPlaybook: Apply Configuration
playbooks/apply-config.yml:
---- name: Apply VyOS configuration hosts: vyos_routers gather_facts: no
tasks: - name: Load configuration from file set_fact: config_lines: "{{ lookup('file', 'configs/' + inventory_hostname + '.cfg').split('\n') }}"
- name: Apply configuration vyos.vyos.vyos_config: lines: "{{ config_lines }}" save: yes register: result
- name: Show changes debug: var: result.commands when: result.changedRun:
ansible-playbook -i inventory/production.yml playbooks/apply-config.ymlPlaybook: Backup Before Change
Always backup before deploying:
---- name: Safe configuration deployment hosts: vyos_routers gather_facts: no
tasks: - name: Backup current configuration vyos.vyos.vyos_config: backup: yes backup_options: filename: "{{ inventory_hostname }}-{{ ansible_date_time.iso8601 }}.cfg" dir_path: ./backups/
- name: Apply new configuration vyos.vyos.vyos_config: src: "configs/{{ inventory_hostname }}.cfg" save: yesSafe Deployment Practices
Automation without safety is just faster mistakes.
1. Dry Run First
VyOS doesn’t have a true dry-run, but you can compare:
#!/bin/bashROUTER=$1NEW_CONFIG=$2
# Get current configssh vyos@${ROUTER} 'show configuration commands' > /tmp/current.cfg
# Comparediff -u /tmp/current.cfg "${NEW_CONFIG}"Review the diff before deploying.
2. Staged Rollout
Don’t deploy to all routers at once:
# Deploy to staging first- hosts: staging_routers tasks: - include_tasks: apply-config.yml
# Wait and validate- hosts: staging_routers tasks: - name: Wait for convergence pause: minutes: 5
- name: Validate connectivity vyos.vyos.vyos_command: commands: - ping 8.8.8.8 count 3 register: ping_result failed_when: "'0 received' in ping_result.stdout[0]"
# Only then production- hosts: production_routers tasks: - include_tasks: apply-config.yml3. Rollback Procedure
When things go wrong (they will), rollback fast:
#!/bin/bashROUTER=$1BACKUP_FILE=$2
echo "Rolling back ${ROUTER} to ${BACKUP_FILE}"
# Load backup configssh vyos@${ROUTER} "configure; load ${BACKUP_FILE}; commit; save; exit"
echo "Rollback complete"Or with Ansible:
- name: Emergency rollback hosts: "{{ target_router }}" gather_facts: no
tasks: - name: Load backup configuration vyos.vyos.vyos_config: src: "backups/{{ inventory_hostname }}-{{ backup_date }}.cfg" save: yes4. Change Windows
Automate deployment timing, not just deployment:
# Only deploy during change window- name: Check change window hosts: localhost tasks: - name: Verify time is within change window assert: that: - ansible_date_time.weekday in ['Saturday', 'Sunday'] - ansible_date_time.hour | int >= 2 - ansible_date_time.hour | int <= 6 fail_msg: "Outside change window (Sat-Sun 02:00-06:00)"5. Validation After Deploy
Don’t just deploy and hope:
- name: Post-deployment validation hosts: vyos_routers tasks: - name: Check BGP sessions vyos.vyos.vyos_command: commands: - show ip bgp summary register: bgp_status
- name: Verify BGP established assert: that: - "'Established' in bgp_status.stdout[0]" fail_msg: "BGP session not established!"
- name: Check VRRP status vyos.vyos.vyos_command: commands: - show vrrp register: vrrp_status
- name: Check route count vyos.vyos.vyos_command: commands: - show ip route summary register: route_countGitOps Workflow
Full GitOps: Git is the source of truth. Changes go through Git, not directly to routers.
Workflow
1. Engineer creates branch2. Edits config in vars/ or templates/3. Runs generate.py locally4. Commits generated config5. Opens PR6. Colleague reviews diff7. CI validates (syntax, linting)8. PR merged9. CD pipeline deploys to routers10. Monitoring confirms successCI Pipeline (GitHub Actions Example)
.github/workflows/validate.yml:
name: Validate Config
on: [pull_request]
jobs: validate: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
- name: Setup Python uses: actions/setup-python@v5 with: python-version: '3.11'
- name: Install dependencies run: pip install jinja2 pyyaml
- name: Generate configs run: | for router in vars/per-router/*.yml; do name=$(basename $router .yml) python scripts/generate.py $name > configs/$name.cfg done
- name: Check for config drift run: | git diff --exit-code configs/CD Pipeline
.github/workflows/deploy.yml:
name: Deploy Config
on: push: branches: [main] paths: - 'configs/**'
jobs: deploy: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
- name: Setup Ansible run: | pip install ansible ansible-galaxy collection install vyos.vyos
- name: Deploy to staging run: | ansible-playbook -i inventory/staging.yml playbooks/apply-config.yml
- name: Validate staging run: | ansible-playbook -i inventory/staging.yml playbooks/validate.yml
- name: Deploy to production run: | ansible-playbook -i inventory/production.yml playbooks/apply-config.ymlThe Lesson
Automation reduces manual errors — if you have rules of the game.
Automation without process is just automated mistakes. The value comes from:
- Version control: Every change tracked, reviewable, revertible
- Code review: Someone else catches your typos
- Testing: Validate before production
- Staged rollout: Break staging, not production
- Fast rollback: Recover in minutes, not hours
The router config should never be edited directly. Changes flow through Git. If it’s not in Git, it didn’t happen (or it shouldn’t have).
Start small. Automate backups first — that’s pure upside. Then move to templated configs. Then add Ansible deployment. Then CI/CD. Each step reduces risk and increases confidence.
The goal isn’t to eliminate human involvement. It’s to move humans from “typing commands at 2 AM” to “reviewing diffs in daylight.” That’s where we make fewer mistakes.