Automation & GitOps for VyOS: Templates, Backups, Safe Deploy

Every network incident postmortem I’ve read includes some variation of “a configuration change was made.” Manual changes on production routers are among the most common causes of outages. We know this. We still do it.

Automation isn’t about being fancy. It’s about reducing the blast radius of human error. When configs live in Git, changes are reviewed before deployment, and rollback is one command away — you still make mistakes, but they’re smaller and recoverable.

This is how to automate VyOS configuration management in a way that actually works.

The Problem with Manual Configuration

Picture this:

  1. Need to add a firewall rule
  2. SSH into router
  3. Type commands from memory
  4. Typo in IP address
  5. Commit
  6. Traffic drops
  7. Panic

Now picture:

  1. Edit rule in Git
  2. PR reviewed by colleague (catches typo)
  3. Merge triggers automated deploy
  4. Change applied
  5. If wrong, git revert and redeploy

Same change. One is an incident, one is Tuesday.

Config Backup Strategy

Before automating changes, automate backups. You need to recover from whatever you’re about to break.

Manual Backup Commands

```
# Full config as set commands
show configuration commands > /config/backup-$(date +%Y%m%d).txt
# Config as JSON (useful for parsing)
show configuration json > /config/backup-$(date +%Y%m%d).json
```

Automated Backup Script

Create /config/scripts/backup-config.sh:

```bash
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/config/backups"
DATE=$(date +%Y%m%d-%H%M%S)
HOSTNAME=$(hostname)
BACKUP_FILE="${BACKUP_DIR}/${HOSTNAME}-${DATE}.cfg"
# Create backup directory
mkdir -p "${BACKUP_DIR}"
# Export the active (committed) config in curly-brace format
/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin
/opt/vyatta/bin/cli-shell-api showCfg --show-active-only > "${BACKUP_FILE}"
/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper end
# Compress
gzip "${BACKUP_FILE}"
# Delete backups last modified more than 30 full days ago
find "${BACKUP_DIR}" -name "*.cfg.gz" -mtime +30 -delete
# Optional: push to remote storage
# scp "${BACKUP_FILE}.gz" backup-server:/backups/
```

Make the script executable (`chmod +x /config/scripts/backup-config.sh`), then schedule it with the VyOS task scheduler:

```
configure
set system task-scheduler task backup-config crontab-spec '0 * * * *'
set system task-scheduler task backup-config executable path '/config/scripts/backup-config.sh'
commit
save
```

Hourly backups, 30 days retention.
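The retention line is easy to misread. `find` counts whole 24-hour periods and drops the fraction, so `-mtime +30` only matches files at least 31 days old. This small Python model of that rule is a sketch for illustration only (the function names are made up), and it makes the boundary explicit:

```python
import math
from datetime import datetime, timedelta


def older_than_mtime(age: timedelta, days: int) -> bool:
    """Model `find -mtime +DAYS`: whole 24-hour periods (fraction dropped) must exceed DAYS."""
    whole_days = math.floor(age.total_seconds() / 86400)
    return whole_days > days


def files_to_prune(mtimes: dict[str, datetime], now: datetime, days: int = 30) -> list[str]:
    """Names that `find ... -mtime +DAYS -delete` would remove, given each file's mtime."""
    return sorted(name for name, mtime in mtimes.items()
                  if older_than_mtime(now - mtime, days))


now = datetime(2024, 6, 1, 12, 0)
backups = {
    "router1-20240501-120000.cfg.gz": now - timedelta(days=31),
    "router1-20240502-000000.cfg.gz": now - timedelta(days=30, hours=12),
    "router1-20240531-120000.cfg.gz": now - timedelta(days=1),
}
print(files_to_prune(backups, now))  # → ['router1-20240501-120000.cfg.gz']
```

The 30.5-day-old file survives because its age rounds down to 30 whole days, which is not greater than 30.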

Off-Router Backup

Backups on the router die with the router. Push to external storage:

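/config/scripts/backup-remote.sh:

```bash
#!/bin/bash
set -euo pipefail
HOSTNAME=$(hostname)
DATE=$(date +%Y%m%d)
REMOTE="git@git.example.com:network/configs.git"
WORK_DIR=$(mktemp -d)
# Remove the temporary clone on exit, even if a step fails
trap 'rm -rf "${WORK_DIR}"' EXIT
# Clone repo (the router needs an SSH key with push access to REMOTE)
git clone --quiet "${REMOTE}" "${WORK_DIR}"
# Export config. It can contain password hashes and keys, so keep the repo private.
/opt/vyatta/bin/cli-shell-api showCfg --show-active-only > "${WORK_DIR}/${HOSTNAME}.cfg"
# Commit and push only when something changed
cd "${WORK_DIR}"
git add "${HOSTNAME}.cfg"
if ! git diff --cached --quiet; then
    git -c user.name="${HOSTNAME}" -c user.email="backup@${HOSTNAME}" \
        commit --quiet -m "Automated backup: ${HOSTNAME} ${DATE}"
    git push --quiet
fi
```

Schedule it with the task scheduler as well, and every config change ends up version-controlled, manual ones included, at the granularity of the schedule.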
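Both backup scripts store `cli-shell-api showCfg` output, which uses VyOS's curly-brace format, while generate.py produces `set` commands. On the router, `show configuration commands` does that conversion for you. Off the box, a rough converter like this Python sketch can flatten a backup so it diffs cleanly against a generated file. It is simplified and is not VyOS tooling: it assumes one statement per line, skips `//` and `/* */` comment lines, and single-quotes every leaf value, as `show configuration commands` does.

```python
import shlex


def curly_to_set(text: str) -> list[str]:
    """Flatten a VyOS curly-brace config into `set` commands (simplified sketch)."""
    path: list[list[str]] = []      # open nodes, e.g. [['interfaces'], ['ethernet', 'eth0']]
    has_children: list[bool] = []   # parallel to `path`: has this node contained anything yet?
    commands: list[str] = []

    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(('//', '/*')):
            continue                # blank line or comment
        if line == '}':
            node_path = ' '.join(word for node in path for word in node)
            if not has_children.pop():
                commands.append(f"set {node_path}")  # empty node, e.g. `loopback lo { }`
            path.pop()
            continue
        if has_children:
            has_children[-1] = True  # the enclosing node is not empty
        if line.endswith('{'):
            path.append(shlex.split(line[:-1]))  # open a node such as `ethernet eth0 {`
            has_children.append(False)
            continue
        # Leaf: a bare flag like `disable`, or a name followed by a value.
        name, *value = shlex.split(line)
        leaf_path = ' '.join([word for node in path for word in node] + [name])
        if value:
            joined = ' '.join(value)
            commands.append(f"set {leaf_path} '{joined}'")
        else:
            commands.append(f"set {leaf_path}")
    return commands
```

Double-quoted values such as `description "LAN side"` are unquoted by `shlex` and re-quoted with single quotes, so the output lines up with `show configuration commands`.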

Configuration as Code

Store your configs in Git from the start, not just as backups.

Repository Structure

```
vyos-configs/
├── README.md
├── inventory/
│   ├── production.yml
│   └── staging.yml
├── templates/
│   ├── base/
│   │   ├── system.j2
│   │   ├── interfaces.j2
│   │   └── firewall.j2
│   └── roles/
│       ├── edge-router.j2
│       └── core-router.j2
├── vars/
│   ├── common.yml
│   └── per-router/
│       ├── router1.yml
│       └── router2.yml
├── configs/
│   ├── router1.cfg
│   └── router2.cfg
├── playbooks/
│   ├── apply-config.yml
│   └── validate.yml
└── scripts/
    ├── generate.py
    ├── deploy.sh
    └── validate.sh
```

Jinja2 Templates

Templates let you define config patterns once and instantiate for each router.

Template Example

templates/base/interfaces.j2:

```jinja
{# Interface configuration template #}
{% for iface in interfaces %}
set interfaces ethernet {{ iface.name }} address '{{ iface.address }}'
set interfaces ethernet {{ iface.name }} description '{{ iface.description }}'
{% if iface.vrrp is defined %}
set high-availability vrrp group {{ iface.vrrp.group }} interface '{{ iface.name }}'
set high-availability vrrp group {{ iface.vrrp.group }} vrid '{{ iface.vrrp.vrid }}'
set high-availability vrrp group {{ iface.vrrp.group }} virtual-address '{{ iface.vrrp.vip }}'
set high-availability vrrp group {{ iface.vrrp.group }} priority '{{ iface.vrrp.priority }}'
{% endif %}
{% endfor %}
```

Variables File

vars/per-router/router1.yml:

```yaml
hostname: router1
router_id: 10.255.255.1
interfaces:
  - name: eth0
    address: 10.0.0.2/24
    description: LAN
    vrrp:
      group: LAN
      vrid: 10          # VRRP ID (example value); must match on both routers in the group
      vip: 10.0.0.1/24
      priority: 200
  - name: eth1
    address: 203.0.113.2/24
    description: WAN
```
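Code review catches typos, but a machine check catches them earlier. The opening example's failure, a mistyped IP address, is cheap to detect before anything is rendered. Here is a Python sketch using the standard `ipaddress` module. The function name is hypothetical, it assumes the per-router YAML shape shown above with a `vrid` key in each VRRP block, and the 1-255 ranges are my assumption about VyOS's limits, so check them against your version's CLI help.

```python
import ipaddress


def validate_router_vars(router: dict) -> list[str]:
    """Return problems found in one router's variables; an empty list means OK."""
    problems = []
    for iface in router.get('interfaces', []):
        name = iface.get('name', '?')
        try:
            addr = ipaddress.ip_interface(iface['address'])
        except (KeyError, ValueError) as exc:
            problems.append(f"{name}: bad or missing address ({exc})")
            continue
        vrrp = iface.get('vrrp')
        if vrrp is None:
            continue
        try:
            vip = ipaddress.ip_interface(vrrp['vip'])
        except (KeyError, ValueError) as exc:
            problems.append(f"{name}: bad or missing VRRP vip ({exc})")
            continue
        # The virtual address must live in the interface's own subnet.
        if vip.ip not in addr.network:
            problems.append(f"{name}: VIP {vip.ip} is outside {addr.network}")
        # Assumed valid ranges; confirm against your VyOS version.
        for key in ('vrid', 'priority'):
            value = vrrp.get(key)
            if not isinstance(value, int) or not 1 <= value <= 255:
                problems.append(f"{name}: VRRP {key} must be 1-255, got {value!r}")
    return problems


router1 = {'interfaces': [
    {'name': 'eth0', 'address': '10.0.0.2/24', 'description': 'LAN',
     'vrrp': {'group': 'LAN', 'vrid': 10, 'vip': '10.0.1.1/24', 'priority': 200}},
]}
print(validate_router_vars(router1))  # → ['eth0: VIP 10.0.1.1 is outside 10.0.0.0/24']
```

Calling it on each `vars/per-router/*.yml` file before rendering, and failing on a non-empty list, turns the typo from the opening scenario into a red CI check instead of dropped traffic.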

Generation Script

scripts/generate.py:

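```python
#!/usr/bin/env python3
"""Render VyOS set-command configs for one router from Jinja2 templates."""
import sys
from pathlib import Path

import jinja2
import yaml


def load_yaml(path):
    with open(path) as f:
        return yaml.safe_load(f) or {}


def generate_config(router_name):
    # Load variables; per-router values override common ones
    # (shallow merge: a nested mapping is replaced, not merged)
    common = load_yaml('vars/common.yml')
    router = load_yaml(f'vars/per-router/{router_name}.yml')
    variables = {**common, **router}

    env = jinja2.Environment(
        loader=jinja2.FileSystemLoader('templates'),
        undefined=jinja2.StrictUndefined,  # fail on missing variables
        trim_blocks=True,                  # no blank lines left by {% %} tags
        lstrip_blocks=True,
    )

    # Render each base template in alphabetical order
    output = []
    for template_file in sorted(Path('templates/base').glob('*.j2')):
        template = env.get_template(f'base/{template_file.name}')
        rendered = template.render(**variables).strip()
        if rendered:
            output.append(rendered)
    return '\n'.join(output)


if __name__ == '__main__':
    if len(sys.argv) != 2:
        sys.exit(f'usage: {sys.argv[0]} ROUTER_NAME')
    print(generate_config(sys.argv[1]))
```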

Generate config:

```bash
python scripts/generate.py router1 > configs/router1.cfg
```
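`{**common, **router}` in generate.py is a shallow merge. When common.yml and a router file both define the same nested mapping, the router's copy replaces the whole mapping, not just the keys it mentions. If you want per-router files to override individual keys, a recursive merge is one option. This is a sketch, and the variable names in the example are hypothetical:

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: `override` merged into `base`, recursing into nested dicts.

    Non-dict values (including lists) from `override` replace those in `base`.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


common = {'syslog': {'host': '192.0.2.10', 'facility': 'all'}, 'domain': 'example.com'}
router = {'syslog': {'facility': 'local7'}}
print({**common, **router}['syslog'])        # → {'facility': 'local7'}
print(deep_merge(common, router)['syslog'])  # → {'host': '192.0.2.10', 'facility': 'local7'}
```

In generate.py this would replace `variables = {**common, **router}` with `variables = deep_merge(common, router)`. Lists are replaced rather than concatenated, which keeps per-router overrides predictable.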

Ansible Integration

Ansible is a popular choice for network automation, and VyOS has its own collection, `vyos.vyos`.

Inventory

inventory/production.yml:

```yaml
all:
  children:
    vyos_routers:
      hosts:
        router1:
          ansible_host: 10.0.0.2
        router2:
          ansible_host: 10.0.0.3
      vars:
        ansible_user: vyos
        ansible_network_os: vyos.vyos.vyos
        ansible_connection: ansible.netcommon.network_cli
```

Playbook: Apply Configuration

playbooks/apply-config.yml:

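```yaml
---
- name: Apply VyOS configuration
  hosts: vyos_routers
  gather_facts: false
  tasks:
    - name: Load configuration from file
      ansible.builtin.set_fact:
        # Resolve the path from the repo root, not the playbooks/ directory,
        # and drop blank lines: vyos_config expects only set/delete lines.
        config_lines: "{{ lookup('file', playbook_dir ~ '/../configs/' ~ inventory_hostname ~ '.cfg').splitlines() | select | list }}"

    - name: Apply configuration
      vyos.vyos.vyos_config:
        lines: "{{ config_lines }}"
        save: true
      register: result

    - name: Show changes
      ansible.builtin.debug:
        var: result.commands
      when: result.changed
```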

Run:

```bash
ansible-playbook -i inventory/production.yml playbooks/apply-config.yml
```

Playbook: Backup Before Change

Always backup before deploying:

```yaml
---
- name: Safe configuration deployment
  hosts: vyos_routers
  gather_facts: false
  tasks:
    - name: Backup current configuration
      vyos.vyos.vyos_config:
        backup: true
        backup_options:
          # ansible_date_time needs fact gathering (disabled here), so use now()
          filename: "{{ inventory_hostname }}-{{ now(fmt='%Y%m%d-%H%M%S') }}.cfg"
          dir_path: "{{ playbook_dir }}/../backups"

    - name: Apply new configuration
      vyos.vyos.vyos_config:
        src: "{{ playbook_dir }}/../configs/{{ inventory_hostname }}.cfg"
        save: true
```

Safe Deployment Practices

Automation without safety is just faster mistakes.

1. Dry Run First

VyOS doesn’t have a true dry-run, but you can compare:

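scripts/diff-config.sh:

```bash
#!/bin/bash
set -euo pipefail
ROUTER=$1
NEW_CONFIG=$2
# Get current config. Op-mode commands generally aren't available in a plain
# non-interactive SSH session, so load VyOS's scripting environment first.
ssh "vyos@${ROUTER}" 'vbash -s' > /tmp/current.cfg <<'EOF'
source /opt/vyatta/etc/functions/script-template
run show configuration commands
EOF
# Compare, sorted so line order doesn't count as a difference (exits 1 if they differ)
diff -u <(sort /tmp/current.cfg) <(sort "${NEW_CONFIG}")
```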

Review the diff before deploying.
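A line-based diff is sensitive to order and quoting, and the running config always contains settings your templates don't manage (such as `hw-id`). If that noise gets in the way, one alternative is to compare the two files as sets of commands. This is a Python sketch, not part of any VyOS tooling. It ignores blank lines, order, and single quotes, and reports each side's extra lines:

```python
def _index(config: str) -> dict[str, str]:
    """Map each `set` line, with quotes and outer whitespace removed, to the original line."""
    return {line.strip().replace("'", ""): line.strip()
            for line in config.splitlines()
            if line.strip().startswith('set ')}


def config_drift(running: str, intended: str) -> tuple[list[str], list[str]]:
    """Compare two `set`-command configs, ignoring order, blank lines, and single quotes.

    Returns (missing_on_router, only_on_router), each sorted.
    """
    have, want = _index(running), _index(intended)
    missing = [want[key] for key in sorted(want.keys() - have.keys())]
    extra = [have[key] for key in sorted(have.keys() - want.keys())]
    return missing, extra
```

With the files from the script above, `config_drift(open('/tmp/current.cfg').read(), open('configs/router1.cfg').read())` returns what a deploy would add and what exists only on the router. The second list mixes unmanaged settings with genuine drift, so expect to filter it.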

2. Staged Rollout

Don’t deploy to all routers at once:

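```yaml
# Assumes inventory groups named staging_routers and production_routers.
# include_tasks can't include a playbook, so the apply task is written out.

# Deploy to staging first
- name: Deploy to staging
  hosts: staging_routers
  gather_facts: false
  tasks:
    - name: Apply configuration
      vyos.vyos.vyos_config:
        src: "{{ playbook_dir }}/../configs/{{ inventory_hostname }}.cfg"
        save: true

# Wait and validate; any failure stops the whole run
- name: Validate staging
  hosts: staging_routers
  gather_facts: false
  any_errors_fatal: true
  tasks:
    - name: Wait for convergence
      ansible.builtin.pause:
        minutes: 5
    - name: Validate connectivity
      vyos.vyos.vyos_command:
        commands:
          - ping 8.8.8.8 count 3
      register: ping_result
      failed_when: "' 0 received' in ping_result.stdout[0]"

# Only then production
- name: Deploy to production
  hosts: production_routers
  gather_facts: false
  tasks:
    - name: Apply configuration
      vyos.vyos.vyos_config:
        src: "{{ playbook_dir }}/../configs/{{ inventory_hostname }}.cfg"
        save: true
```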
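Stripped of Ansible, the plays above encode one rule: finish a wave, validate it, and start the next wave only if validation passed. Here is a Python sketch of that control flow, where `deploy` and `validate` are hypothetical callbacks standing in for the vyos_config and ping tasks:

```python
from typing import Callable, Sequence


def staged_rollout(
    waves: Sequence[tuple[str, Sequence[str]]],
    deploy: Callable[[str], None],
    validate: Callable[[str], bool],
) -> list[str]:
    """Deploy wave by wave, validating each wave before starting the next.

    Returns every host deployed. Raises RuntimeError as soon as a wave fails
    validation, so later waves are never touched.
    """
    deployed: list[str] = []
    for wave_name, hosts in waves:
        for host in hosts:
            deploy(host)
            deployed.append(host)
        # A real pipeline would wait for convergence here before validating.
        failed = [host for host in hosts if not validate(host)]
        if failed:
            raise RuntimeError(f"wave {wave_name!r} failed validation on {failed}; halting")
    return deployed


log: list[str] = []
waves = [("staging", ["stg-router1"]), ("production", ["router1", "router2"])]
try:
    staged_rollout(waves, deploy=log.append, validate=lambda host: False)
except RuntimeError as exc:
    print(exc)  # → wave 'staging' failed validation on ['stg-router1']; halting
print(log)  # → ['stg-router1']
```

Passing fakes for the two callbacks, as in the example, makes the halt-on-failure behavior easy to test without touching a router.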

3. Rollback Procedure

When things go wrong (they will), rollback fast:

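scripts/rollback.sh:

```bash
#!/bin/bash
set -euo pipefail
ROUTER=$1
BACKUP_FILE=$2  # path ON THE ROUTER to a config file in curly-brace format
echo "Rolling back ${ROUTER} to ${BACKUP_FILE}"
# configure/load/commit only work in VyOS's scripting environment
ssh "vyos@${ROUTER}" 'vbash -s' <<EOF
source /opt/vyatta/etc/functions/script-template
configure
load ${BACKUP_FILE}
commit
save
EOF
echo "Rollback complete"
```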

Or with Ansible:

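```yaml
- name: Emergency rollback
  hosts: "{{ target_router }}"
  gather_facts: false
  tasks:
    # vyos_config only pushes lines missing from the running config; it does
    # not delete settings added after the backup was taken. For a full
    # replace, use the load-based script above.
    - name: Load backup configuration
      vyos.vyos.vyos_config:
        src: "{{ playbook_dir }}/../backups/{{ inventory_hostname }}-{{ backup_date }}.cfg"
        save: true
```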
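Loading a backup through `vyos_config` is not a full rollback. The module compares the file's `set` lines with the running config and sends only the missing ones, so anything the bad change added stays in place. One way around that is to compute explicit `delete` lines first. This is a simplified Python sketch with a hypothetical function name. It treats every `set` line as an independent fact and knows nothing about VyOS's schema, so review its output before using it.

```python
def rollback_commands(current: list[str], backup: list[str]) -> list[str]:
    """Commands that move a `set`-format config from `current` back to `backup`.

    Lines only in `current` become `delete` lines; lines only in `backup` are
    re-added. Comparison ignores single quotes and surrounding whitespace.
    """
    def index(lines: list[str]) -> dict[str, str]:
        return {line.strip().replace("'", ""): line.strip()
                for line in lines if line.strip().startswith('set ')}

    now, then = index(current), index(backup)
    deletes = ['delete ' + now[key][len('set '):] for key in sorted(now.keys() - then.keys())]
    re_adds = [then[key] for key in sorted(then.keys() - now.keys())]
    return deletes + re_adds
```

The result can go into `vyos_config`'s `lines` option, which applies it in one commit. Deleting leaf by leaf can leave an empty parent node behind (a static route with no next-hop, say), which VyOS may refuse to commit. In that case, delete the parent node instead.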

4. Change Windows

Automate deployment timing, not just deployment:

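```yaml
# Only deploy during change window. Make this the first play of the deploy
# playbook; ansible_date_time uses the controller's clock (often UTC in CI).
- name: Check change window
  hosts: localhost
  gather_facts: true
  tasks:
    - name: Verify time is within change window
      ansible.builtin.assert:
        that:
          - ansible_date_time.weekday in ['Saturday', 'Sunday']
          - ansible_date_time.hour | int >= 2
          - ansible_date_time.hour | int < 6
        fail_msg: "Outside change window (Sat-Sun 02:00-06:00)"
```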
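The same rule is easy to unit-test outside Ansible. Writing it down also forces two decisions into the open: which timezone the window is in, and whether 06:00 itself counts. Here is a Python sketch with a hypothetical helper that takes an explicit timezone and treats 06:00 as the end of the window:

```python
from datetime import datetime, timezone, tzinfo


def in_change_window(moment: datetime, tz: tzinfo = timezone.utc) -> bool:
    """True if `moment` falls on Saturday or Sunday, from 02:00 up to (not including) 06:00, in `tz`."""
    if moment.tzinfo is None:
        # A naive datetime would silently be read as the machine's local time.
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(tz)
    return local.weekday() in (5, 6) and 2 <= local.hour < 6  # Monday == 0


print(in_change_window(datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)))  # → True (Saturday)
print(in_change_window(datetime(2024, 6, 1, 6, 0, tzinfo=timezone.utc)))  # → False (window closed)
```

For a window defined in local time, pass something like `zoneinfo.ZoneInfo("Europe/Berlin")` as `tz`.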

5. Validation After Deploy

Don’t just deploy and hope:

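```yaml
- name: Post-deployment validation
  hosts: vyos_routers
  gather_facts: false
  tasks:
    - name: Check BGP sessions
      vyos.vyos.vyos_command:
        commands:
          - show ip bgp neighbors
      register: bgp_status

    # FRR's summary table prints a prefix count, not the word "Established",
    # for healthy peers, so compare per-neighbor states instead.
    - name: Verify all BGP sessions established
      ansible.builtin.assert:
        that:
          - "'BGP neighbor is' in bgp_status.stdout[0]"
          - "bgp_status.stdout[0].count('BGP state = Established') == bgp_status.stdout[0].count('BGP neighbor is')"
        fail_msg: "BGP session not established!"

    - name: Check VRRP status
      vyos.vyos.vyos_command:
        commands:
          - show vrrp
      register: vrrp_status

    - name: Check route count
      vyos.vyos.vyos_command:
        commands:
          - show ip route summary
      register: route_count

    - name: Record VRRP and routing state for review
      ansible.builtin.debug:
        msg: "{{ vrrp_status.stdout_lines[0] + route_count.stdout_lines[0] }}"
```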

GitOps Workflow

Full GitOps: Git is the source of truth. Changes go through Git, not directly to routers.

Workflow

1. Engineer creates branch
2. Edits config in vars/ or templates/
3. Runs generate.py locally
4. Commits generated config
5. Opens PR
6. Colleague reviews diff
7. CI validates (syntax, linting)
8. PR merged
9. CD pipeline deploys to routers
10. Monitoring confirms success

CI Pipeline (GitHub Actions Example)

.github/workflows/validate.yml:

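```yaml
name: Validate Config
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install jinja2 pyyaml
      - name: Generate configs
        run: |
          for router in vars/per-router/*.yml; do
            name=$(basename "$router" .yml)
            python scripts/generate.py "$name" > "configs/$name.cfg"
          done
      # Fails if committed configs don't match what the templates produce
      # (vars/templates edited without regenerating, or configs/ hand-edited).
      # git status also catches new, untracked config files.
      - name: Check generated configs are committed
        run: |
          git status --porcelain configs/
          test -z "$(git status --porcelain configs/)"
```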
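The workflow's validation step promises syntax checks and linting, but the job above only checks that the configs were regenerated. A few cheap checks on the generated files catch common template mistakes before review. This is a Python sketch with hypothetical names; it could run as one more CI step over `configs/*.cfg`.

```python
from collections import Counter


def lint_config(text: str) -> list[str]:
    """Return problems found in a generated `set`-command config (empty list = clean)."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines()]
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        # vyos_config expects every line to start with `set` or `delete`.
        if not line.startswith(('set ', 'delete ')):
            problems.append(f"line {number}: not a set/delete command: {line!r}")
        # Leftover Jinja syntax means something wasn't rendered.
        if any(token in line for token in ('{{', '}}', '{%', '%}')):
            problems.append(f"line {number}: unrendered template syntax: {line!r}")
        # An odd number of single quotes usually means a broken value.
        if line.count("'") % 2:
            problems.append(f"line {number}: unbalanced quote: {line!r}")
    counts = Counter(line for line in lines if line.strip())
    problems += [f"duplicate line: {line!r}" for line, n in counts.items() if n > 1]
    return problems
```

A duplicate line is harmless to VyOS but usually points to two templates managing the same setting, which is worth catching in review.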

CD Pipeline

.github/workflows/deploy.yml:

```yaml
name: Deploy Config
on:
  push:
    branches: [main]
    paths:
      - 'configs/**'
jobs:
  deploy:
    # The runner must reach the routers' management addresses and hold SSH
    # credentials for them; for private addresses that usually means a
    # self-hosted runner.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Ansible
        run: |
          pip install ansible ansible-pylibssh
          ansible-galaxy collection install vyos.vyos
      - name: Deploy to staging
        run: |
          ansible-playbook -i inventory/staging.yml playbooks/apply-config.yml
      - name: Validate staging
        run: |
          ansible-playbook -i inventory/staging.yml playbooks/validate.yml
      - name: Deploy to production
        run: |
          ansible-playbook -i inventory/production.yml playbooks/apply-config.yml
```

The Lesson

Automation reduces manual errors, but only when it runs under clear rules.

Automation without process is just automated mistakes. The value comes from:

  1. Version control: Every change tracked, reviewable, revertible
  2. Code review: Someone else catches your typos
  3. Testing: Validate before production
  4. Staged rollout: Break staging, not production
  5. Fast rollback: Recover in minutes, not hours

The router config should never be edited directly. Changes flow through Git. If it’s not in Git, it didn’t happen (or it shouldn’t have).

Start small. Automate backups first — that’s pure upside. Then move to templated configs. Then add Ansible deployment. Then CI/CD. Each step reduces risk and increases confidence.

The goal isn’t to eliminate human involvement. It’s to move humans from “typing commands at 2 AM” to “reviewing diffs in daylight.” That’s where we make fewer mistakes.