High availability sounds like a feature you enable. Click “HA,” and VMs automatically restart when a node fails. Magic.
It’s not magic. It’s fencing, quorum, shared storage, and very specific failure handling. Get any of these wrong and HA either doesn’t work, or worse — causes split-brain where VMs run on multiple nodes simultaneously, corrupting data.
HA without testing is just a checkbox. A checkbox that might destroy your data when you actually need it.
HA Prerequisites
Before enabling HA, you need:
1. Cluster (3+ Nodes Recommended)
```
# Check cluster status
pvecm status

# Need quorum for HA decisions
# 2 nodes = no node can fail without losing quorum
# 3 nodes = 1 node can fail
```

Two-node clusters need a QDevice for HA to work reliably.
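The quorum arithmetic behind these rules is plain majority voting. A minimal sketch (the helper function is mine, for illustration; Proxmox computes this internally):

```shell
# Majority needed for quorum: floor(votes/2) + 1
# (hypothetical helper, not a Proxmox tool)
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

quorum_needed 2   # 2 votes needed: neither node can fail
quorum_needed 3   # 2 votes needed: one node can fail
quorum_needed 5   # 3 votes needed: two nodes can fail
```

This is why two nodes give you clustering but not fault tolerance: losing either node drops you below the majority.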
2. Shared Storage
HA VMs must be on storage accessible from all nodes:
```
# Check shared storage
pvesm status

# Valid for HA:
# - Ceph (RBD)
# - NFS
# - iSCSI
# - GlusterFS

# NOT valid:
# - local
# - local-lvm
# - local-zfs (unless Ceph ZFS)
```

If the storage isn't shared, the VM can't start on another node.
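For reference, a shared NFS store in /etc/pve/storage.cfg looks roughly like this (the storage ID, server address, and export path are placeholder values):

```
nfs: vm-store
        export /export/vms
        path /mnt/pve/vm-store
        server 10.0.0.50
        content images,rootdir
```

Network storage types like NFS are treated as shared automatically; local storage types are per-node by definition.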
3. Fencing Capability
Fencing ensures a failed node is truly dead before starting VMs elsewhere. Without fencing, you risk:
```
Node 1: Appears dead (network issue)
Node 2: Starts VM copy
Node 1: Actually alive, VM still running
Result: Two VMs, same disk, corruption
```

Fencing (The Critical Part)
What Fencing Does
Fencing forces a failed node to stop before HA restarts VMs:
- Node detected as failed
- HA manager tries to fence (kill) the node
- Only after successful fence → start VMs on other node
Fencing Methods
Hardware fencing (recommended):
- IPMI/iLO/DRAC power off
- PDU power cut
- SBD (Storage-Based Death)
Software fencing:
- Watchdog timer (self-fence)
- SSH fence (tell node to shutdown)
Configuring Watchdog Fencing
This is the most common setup in homelabs: the node kills itself if it loses quorum:
```
# Enable the software watchdog module at boot
echo "softdog" >> /etc/modules

# Load module now
modprobe softdog

# Verify the watchdog device exists
ls /dev/watchdog
```

Proxmox HA uses the watchdog automatically. If a node loses quorum and can't reach the cluster, the watchdog triggers a reboot.
IPMI Fencing (Production)
For reliable fencing, use IPMI:
```
# Install fence agents
apt install fence-agents

# Test IPMI fencing manually
ipmitool -H 10.0.0.200 -U admin -P password power status
ipmitool -H 10.0.0.200 -U admin -P password power off
```

Configure in /etc/pve/ha/fence.cfg:

```
# Fence configuration
# Not directly supported in PVE GUI, but can use with custom scripts
```

Storage-Based Fencing (SBD)
Nodes write heartbeats to shared storage. Missing heartbeat = fence:
```
# Create SBD device on shared storage
sbd -d /dev/sdb create

# Configure SBD (watchdog timeout 60s, msgwait 120s)
sbd -d /dev/sdb -1 60 -4 120 create
```

Enabling HA for VMs
Add VM to HA
```
# Enable HA for VM 100
ha-manager add vm:100

# With specific group
ha-manager add vm:100 --group production

# Check HA status
ha-manager status
```

Via Web UI: Datacenter → HA → Add → Select VM
HA States
| State | Meaning |
|---|---|
| started | HA will ensure VM is running |
| stopped | HA will ensure VM is stopped |
| disabled | HA ignores this VM |
| ignored | Temporarily ignore (migration) |
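These states are set with ha-manager. A short sketch (assumes a Proxmox cluster with VM 100 already under HA management):

```
# Keep the VM stopped, but still under HA control
ha-manager set vm:100 --state stopped

# Temporarily take HA's hands off (e.g. for manual work)
ha-manager set vm:100 --state ignored

# Resume normal HA management
ha-manager set vm:100 --state started
```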
HA Groups
Groups define which nodes can run HA VMs:
```
# Create group preferring pve1 and pve2
ha-manager groupadd production --nodes pve1,pve2

# Add VM to group
ha-manager set vm:100 --group production

# Node priority (higher = preferred)
ha-manager groupadd production --nodes pve1:3,pve2:2,pve3:1
```

With priorities, VMs prefer pve1, fail over to pve2, and use pve3 as a last resort.
Restricted Groups
Only allow VMs on specific nodes:
```
# Create restricted group
ha-manager groupadd gpu-nodes --nodes pve2,pve3 --restricted

# VMs in this group can ONLY run on pve2 or pve3
ha-manager set vm:200 --group gpu-nodes
```

Useful for VMs needing specific hardware (GPU, special storage).
HA Manager Behavior
Node Failure Sequence
1. Node stops responding to cluster heartbeats
2. Other nodes detect failure (after timeout)
3. Quorum check: do remaining nodes have a majority?
4. If quorate:
   a. Attempt to fence failed node
   b. Wait for fence confirmation
   c. Start VMs on surviving nodes
5. If not quorate:
   a. Cluster freezes
   b. No HA actions (prevents split-brain)

Failover Timing
```
Detection timeout: 30 seconds (default)
Fence attempt: variable (IPMI: seconds, watchdog: ~60s)
VM startup: 10-60 seconds

Total failover time: 1-3 minutes typical
```

For faster failover, tune detection timeouts, but beware false positives.
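As a back-of-envelope check, the worst case with watchdog fencing adds up like this (illustrative values based on the defaults above):

```shell
# Worst-case failover budget with watchdog fencing (illustrative values)
detect=30   # failure detection timeout (s)
fence=60    # watchdog fence window (s)
start=60    # VM startup (s)

total=$(( detect + fence + start ))
echo "worst case: ${total}s"
```

Shaving the detection timeout buys seconds; a flaky cluster network that triggers false fencing costs far more.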
Resource Migration
When node comes back online, VMs don’t automatically migrate back:
```
# VMs stay on failover node until:
# 1. Manual migration
# 2. Next failure
# 3. Maintenance mode + recovery

# To migrate back manually
qm migrate 100 pve1 --online
```

This is intentional. Automatic "failback" risks unnecessary disruption.
Maintenance Mode
Before working on a node, use maintenance mode:
```
# Request maintenance (HA migrates VMs away)
ha-manager crm-command node-maintenance enable pve1

# Check status
ha-manager status

# Wait for migrations to complete
# Do maintenance work

# Disable maintenance
ha-manager crm-command node-maintenance disable pve1
```

This gracefully moves VMs, unlike a failure, which is disruptive.
Manual VM Migration
For HA VMs, use:
```
# Request HA to migrate
ha-manager migrate vm:100 pve2

# Or set VM to ignored temporarily
ha-manager set vm:100 --state ignored
qm migrate 100 pve2 --online
ha-manager set vm:100 --state started
```

Don't just qm migrate an HA VM — the HA manager might fight you.
Testing HA (Critical)
Test 1: Simulated Node Failure
```
# On the node to "fail"
systemctl stop pve-cluster corosync

# Watch from another node
ha-manager status

# VMs should migrate to other nodes
# After 1-2 minutes, check VMs are running elsewhere

# Restore node
systemctl start corosync pve-cluster
```

Test 2: Hard Power Off
Warning: These commands immediately crash the node without graceful shutdown.
```
# Physical power button, or:
echo b > /proc/sysrq-trigger  # Immediate reboot (no sync)

# Or IPMI (preferred for remote testing):
ipmitool chassis power off

# This tests actual fencing behavior
```

Test 3: Network Partition
```
# On the node, drop cluster traffic
iptables -A INPUT -p udp --dport 5405:5412 -j DROP
iptables -A OUTPUT -p udp --dport 5405:5412 -j DROP

# Node should fence itself (watchdog) or be fenced (IPMI)
# VMs should migrate

# Restore
iptables -F
```

Test 4: Storage Failure
```
# If using NFS, unmount it
umount -l /mnt/nfs-storage

# HA behavior depends on configuration
# VMs using that storage should fail
# Other VMs should continue

# Document what happens!
```

Document Test Results
```
HA Test Report - 2025-01-08

Test: Node power off (pve2)
Method: IPMI power off
Expected: VMs 100, 101 migrate to pve1 or pve3

Timeline:
- 00:00 Power off pve2
- 00:32 Cluster detects failure
- 00:45 Fence confirmed
- 01:15 VM 100 started on pve1
- 01:28 VM 101 started on pve3

Total failover: 1 minute 28 seconds
Result: PASS

Issues: None
Tested by: Admin
```

Common HA Problems
"No quorum" — Nothing Happens
```
# Check quorum
pvecm status | grep Quorate

# If "Quorate: No", cluster can't make decisions
# Need majority of nodes online
```

Fix: add more nodes, add a QDevice, or manually set expected votes (dangerous).
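Adding a QDevice is usually the cleanest fix for two-node clusters. Roughly (the qnetd host IP is a placeholder; assumes a separate always-on machine outside the cluster):

```
# On the external host (NOT a cluster node)
apt install corosync-qnetd

# On every cluster node
apt install corosync-qdevice

# From one cluster node, register the QDevice
pvecm qdevice setup 10.0.0.5
```

The external host contributes a tie-breaking vote, so either cluster node can fail without losing quorum.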
VMs Won’t Start After Failover
```
# Check HA manager logs
journalctl -u pve-ha-lrm -f

# Common causes:
# - Shared storage not available
# - Resource constraints (RAM, CPU)
# - Start dependencies
```

Split-Brain Detected
If somehow VMs ran on multiple nodes:
```
# IMMEDIATELY stop VMs on one node
qm stop 100 --skiplock

# Check for disk corruption
# Restore from backup if needed
```

This is catastrophic. Prevent it with proper fencing.
HA Service Stuck
```
# Restart HA services
systemctl restart pve-ha-crm
systemctl restart pve-ha-lrm

# Check status
ha-manager status
```

HA Architecture
Minimum Viable HA
- 3 nodes minimum (for quorum)
- Shared storage (NFS, Ceph, iSCSI)
- Fencing (watchdog at minimum)

Production HA

- 3+ nodes
- Redundant network (bonding)
- Dedicated cluster network
- Ceph or enterprise SAN
- Hardware fencing (IPMI)
- UPS with monitoring

HA Network Topology
```
┌───────────────────────────────────┐
│          Cluster Network          │
│      (Corosync, fencing, HA)      │
└───────────┬───────────┬───────────┘
            │           │
┌───────────┴───┐   ┌───┴───────────┐
│     pve1      │   │     pve2      │
│   (node 1)    │   │   (node 2)    │
└───────┬───────┘   └───────┬───────┘
        │                   │
┌───────┴───────────────────┴───────┐
│          Storage Network          │
│        (Ceph, iSCSI, NFS)         │
└───────────────────────────────────┘
```

Separate networks for cluster and storage prevent storage issues from affecting HA decisions.
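On Debian-based Proxmox nodes, that split can be expressed in /etc/network/interfaces; a minimal sketch (NIC names and subnets are placeholder values for this example):

```
# Cluster network (corosync heartbeats) - dedicated NIC
auto eno2
iface eno2 inet static
        address 10.10.10.1/24

# Storage network (Ceph/iSCSI/NFS traffic) - dedicated NIC
auto eno3
iface eno3 inet static
        address 10.20.20.1/24
```

With corosync bound to its own subnet, a saturated storage link can no longer delay heartbeats and trigger false fencing.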
The Lesson
HA without tests is just a checkbox.
Enabling HA takes 30 seconds. Testing it takes hours. But that testing is what determines whether HA works when you need it.
The checkbox says “HA enabled.” The test proves:
- Fencing actually works
- VMs actually migrate
- Storage is actually shared
- Recovery time meets requirements
Every HA setup has edge cases. The node that takes 5 minutes to fence. The VM that won’t start because of resource constraints. The storage path that fails under load.
You find these in testing, or you find them in production. Testing is cheaper.
Schedule regular HA tests. Document what happens. Fix what’s broken. That’s how you turn a checkbox into actual high availability.