NetBox as Source of Truth: Driving Config from Inventory

Everyone has a “source of truth.” Usually it is a spreadsheet that was accurate the day it was made and has been quietly wrong ever since. A source of truth is only real if the running config is generated from it — the moment someone can change a device without changing the database, the database is just documentation, and documentation rots.

NetBox is the common answer for the data model. The discipline that makes it work is the rule that nothing gets configured by hand.

Model the Intent, Not the Config

NetBox holds intent: what should be true. Sites, racks, devices, interfaces, IP addresses, prefixes, VLANs, VRFs, and the relationships between them. It does not hold CLI — it holds the facts your CLI is generated from.

The pieces that drive most automation:

  • IPAM — prefixes, IP addresses, VRFs, VLANs. The authoritative answer to “what is this interface’s address.”
  • DCIM — devices, interfaces, cabling. The physical and logical topology.
  • Config Contexts — JSON data attached to devices by role/site/platform: NTP servers, SNMP communities, BGP AS, syslog targets. This is the per-device variable bag your templates consume.

Pulling Data with pynetbox

The API is the integration point. pynetbox wraps it:

import pynetbox
nb = pynetbox.api("https://netbox.example.net", token="...")
# Every active device at a site. filter() returns a one-shot iterator,
# so wrap it in list() if you loop over the devices more than once.
devices = list(nb.dcim.devices.filter(site="dc1", status="active"))
for dev in devices:
    ctx = dev.config_context  # merged JSON for this device
    intfs = nb.dcim.interfaces.filter(device_id=dev.id)
    ips = nb.ipam.ip_addresses.filter(device_id=dev.id)
    # hand this structured data to a template

config_context is the key feature: NetBox merges JSON from site, role, platform, and device-level contexts into one dictionary per device. Your template asks for ctx["bgp"]["asn"] and never cares where in the hierarchy it was defined.
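
To make the merge concrete, here is a minimal sketch of the idea in plain Python. It is not NetBox's own code: deep_merge, render_context, and the sample contexts are hypothetical, and it assumes contexts are applied in ascending weight so that higher-weight data overrides lower-weight data.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def render_context(contexts):
    """contexts: list of (weight, data) pairs; lower weights are applied first."""
    result = {}
    for _, data in sorted(contexts, key=lambda c: c[0]):
        result = deep_merge(result, data)
    return result


# Hypothetical contexts: a site-wide one and a more specific role-level one
site_ctx = (1000, {"ntp": ["10.0.0.5"], "bgp": {"asn": 65000, "timers": "3 9"}})
role_ctx = (2000, {"bgp": {"asn": 65101}})

ctx = render_context([role_ctx, site_ctx])
print(ctx["bgp"]["asn"])     # 65101
print(ctx["bgp"]["timers"])  # 3 9
```

The template sees only the merged result, which is why it never needs to know that the ASN came from the role and the timers from the site.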

Generating Config

The data from NetBox feeds a template, exactly as in any automation stack — but now the variables are authoritative, not hand-maintained:

from jinja2 import Environment, FileSystemLoader
env = Environment(loader=FileSystemLoader("templates"))
tmpl = env.get_template("leaf.j2")
for dev in devices:
    data = {
        "hostname": dev.name,
        "ctx": dict(dev.config_context),
        "interfaces": [
            {"name": i.name, "description": i.description,
             "ips": [str(a) for a in nb.ipam.ip_addresses.filter(interface_id=i.id)]}
            for i in nb.dcim.interfaces.filter(device_id=dev.id)
        ],
    }
    # One rendered config per device; the out/ directory must already exist
    with open(f"out/{dev.name}.cfg", "w") as fh:
        fh.write(tmpl.render(**data))

Push the result with NAPALM in dry-run mode, review the diff, commit. The chain is: NetBox → render → diff → commit. No hop in that chain involves a human typing an IP address.
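
The render, diff, commit chain can be sketched as a small gate function. This is an illustration under assumptions, not a finished deploy tool: review_and_commit and the approve callback are hypothetical names, and device is assumed to be an already-open NAPALM driver. Only the four NAPALM methods it calls (load_replace_candidate, compare_config, commit_config, discard_config) come from the library.

```python
def review_and_commit(device, rendered, approve):
    """Load a rendered config, show the diff to a reviewer, commit only if approved.

    device:   an open NAPALM driver (assumed)
    rendered: full config text rendered from NetBox data
    approve:  hypothetical callable taking the diff text and returning True/False
    """
    device.load_replace_candidate(config=rendered)
    diff = device.compare_config()
    if not diff:
        # Device already matches intent; nothing to review
        device.discard_config()
        return "unchanged"
    if approve(diff):
        device.commit_config()
        return "committed"
    device.discard_config()
    return "rejected"
```

Passing approve=lambda diff: False gives a dry run: the diff is computed and the candidate is always discarded.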

NetBox as Dynamic Inventory

The cleanest integration skips the manual export entirely — let your automation read NetBox directly as inventory. Ansible has a NetBox inventory plugin:

# netbox_inventory.yml
plugin: netbox.netbox.nb_inventory
api_endpoint: https://netbox.example.net
token: "{{ lookup('env','NETBOX_TOKEN') }}"
group_by:
  - device_roles
  - sites
query_filters:
  - status: active

ansible-inventory -i netbox_inventory.yml --graph

Now “add a device” means “add it in NetBox,” and the next automation run picks it up. There is no second list of hosts to forget to update. Nornir has an equivalent inventory plugin (nornir_netbox) for the Python side.

Verifying the Inventory Before You Trust It

Before any playbook runs against NetBox-as-inventory, prove the inventory resolves the way you think. The nb_inventory plugin builds groups and host vars from the API, and silent filtering is the usual surprise — a query_filters typo quietly drops half your fleet and the playbook “succeeds” against the wrong subset.

# Full tree: groups and the hosts under them
ansible-inventory -i netbox_inventory.yml --graph
# Everything the plugin knows about one host, including config_context as vars
ansible-inventory -i netbox_inventory.yml --host leaf01.dc1 --yaml
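
The same check can be automated. The sketch below assumes you have saved the JSON that ansible-inventory prints with --list and that you keep a list of hostnames you expect to see; resolved_hosts, missing_hosts, and the sample data are hypothetical names for illustration.

```python
import json


def resolved_hosts(inventory):
    """Collect every hostname found in `ansible-inventory --list` JSON."""
    hosts = set(inventory.get("_meta", {}).get("hostvars", {}))
    for name, group in inventory.items():
        if name != "_meta" and isinstance(group, dict):
            hosts.update(group.get("hosts", []))
    return hosts


def missing_hosts(inventory, expected):
    """Return the expected hostnames the inventory did not resolve."""
    return sorted(set(expected) - resolved_hosts(inventory))


# Sample shaped like `ansible-inventory -i netbox_inventory.yml --list` output
sample = json.loads("""
{
  "_meta": {"hostvars": {"leaf01.dc1": {"primary_ip4": "10.1.0.11"}}},
  "all": {"children": ["ungrouped", "sites_dc1"]},
  "sites_dc1": {"hosts": ["leaf01.dc1"]}
}
""")

print(missing_hosts(sample, ["leaf01.dc1", "leaf02.dc1"]))  # ['leaf02.dc1']
```

Failing the pipeline when this list is non-empty turns a silently filtered fleet into a loud error before any playbook runs.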

If a device is missing, the cause is almost always query_filters or device status. Map host vars explicitly so playbooks reference stable names rather than whatever NetBox field happened to populate:

# netbox_inventory.yml
plugin: netbox.netbox.nb_inventory
api_endpoint: https://netbox.example.net
token: "{{ lookup('env','NETBOX_TOKEN') }}"
group_by:
  - device_roles
  - sites
compose:
  ansible_host: primary_ip4
device_query_filters:
  - has_primary_ip: 'true'

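compose: ansible_host: primary_ip4 is the line people forget. Wherever ansible_host ends up unset, Ansible connects by device name, and DNS for management interfaces is rarely complete. Depending on the plugin version, ansible_host may already be filled from the device's primary IP; setting it explicitly removes the guesswork, and the has_primary_ip filter keeps devices without one out of the run. Pulling primary_ip4 from NetBox means the IP you connect on is the same IP the source of truth records, closing one more gap where reality and the database could disagree.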

Keeping It Honest: Drift Detection

A source of truth that nobody verifies drifts the first time someone makes an emergency change at 2 a.m. and forgets to update NetBox. Catch it by diffing intended-from-NetBox against actual-from-device on a schedule:

from napalm import get_network_driver
driver = get_network_driver("ios")
# user, pw, and alert() are placeholders for your own credentials and notifier
with driver(dev.name, user, pw) as d:
    d.load_replace_candidate(config=rendered_from_netbox)  # replace, so removals/extra lines show as drift
    diff = d.compare_config()
    d.discard_config()
if diff:
    alert(f"{dev.name} drifted from NetBox:\n{diff}")

A non-empty diff means reality and the database disagree. Either the device was changed out-of-band (fix the device or update NetBox) or NetBox was changed without deploying (deploy it). Both are findings you want surfaced, and running this nightly turns “the spreadsheet is wrong again” into a daily alert instead of a quarterly surprise.

Custom Fields and Webhooks: Extending the Model

The stock schema covers physical and logical topology, but every network has facts NetBox does not model out of the box — a maintenance window, an out-of-band management IP type, a “monitored by” flag. Custom fields attach those to existing object types without forking the data model:

# Read a custom field the same way you read a built-in one
for dev in nb.dcim.devices.filter(site="dc1"):
    if dev.custom_fields.get("oob_managed"):
        push_oob_profile(dev)

Webhooks turn NetBox from a passive database into an event source. Register a webhook on the dcim.device object for created/updated events (from NetBox 3.7 onward this is configured as an event rule whose action is a webhook), point it at your CI trigger, and “someone changed a device in NetBox” becomes “a render pipeline started.” That closes the loop: the database edit is the deploy trigger, so there is no gap where the model is ahead of the devices.

A practical gotcha: webhooks fire on every save, including bulk imports. Debounce on the receiving side or you will kick off a hundred pipeline runs when someone uploads a rack of new switches.
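
One way to debounce is to collect device names as events arrive and trigger a single pipeline run once the burst has gone quiet. The sketch below is a minimal in-memory version under assumptions: Debouncer and its method names are hypothetical, the webhook handler is assumed to call record(), and some scheduler is assumed to call flush_if_quiet() periodically.

```python
import time


class Debouncer:
    """Collapse a burst of events into one batch after a quiet period."""

    def __init__(self, quiet_seconds, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.pending = set()
        self.last_event = None

    def record(self, device_name):
        """Call from the webhook handler for every received event."""
        self.pending.add(device_name)
        self.last_event = self.clock()

    def flush_if_quiet(self):
        """Return the sorted batch to render, or None if events are still arriving."""
        if not self.pending:
            return None
        if self.clock() - self.last_event < self.quiet_seconds:
            return None
        batch, self.pending = sorted(self.pending), set()
        return batch
```

With a 30-second quiet period, a bulk import of a hundred switches produces one batch of a hundred names instead of a hundred pipeline runs. A production receiver would also need to persist pending names across restarts, which this sketch does not attempt.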

Scaling the API Pulls

The naive loop above issues one interface query per device and then one IP query per interface. On a 500-device fabric that is tens of thousands of round trips, and a nightly drift job that should take minutes takes an hour. Two fixes matter.

First, pull in bulk and index in memory instead of filtering per-device:

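# One query for interfaces, then group locally
all_intfs = list(nb.dcim.interfaces.filter(site="dc1"))
intfs_by_dev = {}
for i in all_intfs:
    intfs_by_dev.setdefault(i.device.id, []).append(i)
# IP addresses carry no site of their own, so scope them by the devices found above
# (split the ID list into chunks if the request URL gets too long)
all_ips = list(nb.ipam.ip_addresses.filter(device_id=list(intfs_by_dev)))
ips_by_intf = {}
for a in all_ips:
    ips_by_intf.setdefault(a.assigned_object_id, []).append(a)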

Second, raise the page size so pynetbox makes fewer paginated requests. nb.dcim.interfaces paginates at the API’s default limit (50 records per page unless the server's PAGINATE_COUNT was changed); pass a larger limit to cut the number of HTTP calls:

# Request a bigger page so a 4000-interface site is a handful of calls, not 80.
# NetBox caps the value at its MAX_PAGE_SIZE setting (1000 by default).
intfs = nb.dcim.interfaces.filter(site="dc1", limit=1000)
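
The arithmetic behind those numbers is a ceiling division. The snippet below is only a back-of-the-envelope helper (page_requests is a hypothetical name), assuming a default page size of 50 records.

```python
import math


def page_requests(record_count, page_size):
    """Number of paginated HTTP requests needed to fetch record_count records."""
    return math.ceil(record_count / page_size)


print(page_requests(4000, 50))    # 80
print(page_requests(4000, 1000))  # 4
```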

For read-heavy jobs, point pynetbox at a NetBox read replica or enable caching in front of the API. The source of truth is authoritative, but it does not need to serve every nightly diff from the primary database.

When the Diff Is Noise

Drift detection earns trust only if a clean device produces an empty diff. The first time you run compare_config() against a NAPALM full-replace candidate, expect false positives — and learn to read them:

- snmp-server location DC1-RACK-A
+ snmp-server location DC1 Rack A
- ntp server 10.0.0.5
+ ntp server 10.0.0.5 prefer

That is not drift; that is your template not matching how the device normalizes config. Three usual culprits:

  • Ordering — devices reorder ACL entries, route-maps, or snmp-server host lines. NAPALM’s diff is line-based and will flag reordering as a change.
  • Defaults — one side spells out a keyword or default value (prefer, a default timeout) that the other omits. Either render the line exactly as the device stores it or strip it before comparing.
  • Secrets — hashed passwords and SNMP communities render differently every time. Exclude them from the compared config rather than chasing a diff that never clears.
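
One way to handle all three is to normalize both configs before comparing them offline. The sketch below uses Python's difflib rather than NAPALM's compare_config, so it is a swapped-in technique, not what the drift job above does. The patterns in SECRET_PATTERNS and UNORDERED_PREFIXES are hypothetical examples to tune per platform, and normalize and drift are illustrative names.

```python
import difflib
import re

# Hypothetical patterns; adjust them to the platform being compared
SECRET_PATTERNS = [re.compile(r"^snmp-server community "), re.compile(r" (secret|password) ")]
UNORDERED_PREFIXES = ("snmp-server host ",)


def normalize(config):
    """Drop blank and secret-bearing lines, and sort lines whose order is not significant."""
    lines = [ln.rstrip() for ln in config.splitlines() if ln.strip()]
    lines = [ln for ln in lines if not any(p.search(ln) for p in SECRET_PATTERNS)]
    # Sort only the order-insensitive lines, keeping them in their original slots
    slots = [i for i, ln in enumerate(lines) if ln.startswith(UNORDERED_PREFIXES)]
    for slot, ln in zip(slots, sorted(lines[i] for i in slots)):
        lines[slot] = ln
    return lines


def drift(intended, running):
    """Return '-' lines (only on the device) and '+' lines (only in the intended config)."""
    diff = difflib.unified_diff(normalize(running), normalize(intended), lineterm="", n=0)
    body = list(diff)[2:]  # drop the '---' / '+++' file headers
    return [ln for ln in body if not ln.startswith("@@")]
```

Anything excluded here is no longer checked at all, so keep the exclusion lists short and review them like code.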

The discipline is to drive the template until a known-good device diffs clean, then turn on alerting. A drift detector that cries wolf nightly gets muted in a week, and a muted detector is the same as no detector.

The Cultural Part

The tooling is the easy 20%. The 80% is the rule the team has to actually keep: no manual changes that bypass NetBox. The first time an engineer SSHes in and fixes something live without updating the model, the source of truth is dead — it is now lying, and everyone learns to distrust it. Drift detection is what enforces the rule, because it makes the violation visible the next morning instead of letting it hide until it causes an outage.

Model intent, generate from it, read it as inventory, and diff reality against it nightly. Do that and NetBox stops being documentation and becomes what the name promises.