A Proxmox cluster looks simple: join nodes, share configuration, migrate VMs between them. Click a button, cluster created. The web UI makes it seem like magic.
It’s not magic. It’s distributed systems, and distributed systems fail in ways that single nodes don’t. Split-brain scenarios, quorum loss, network partitions: these aren’t theoretical. They happen, and when they do, your VMs stop or their data gets corrupted.
Clustering is not a button. It’s network discipline and failure planning.
What a Cluster Actually Is
A Proxmox cluster is:
- Shared configuration: /etc/pve is replicated across all nodes via pmxcfs (a cluster filesystem)
- Corosync: Cluster communication layer handling membership and messaging
- Quorum: Voting system to prevent split-brain
- Optional: Shared storage, HA, live migration
```
┌─────────────┐    Corosync    ┌─────────────┐
│   Node 1    │◄──────────────►│   Node 2    │
│  (vote: 1)  │                │  (vote: 1)  │
└──────┬──────┘                └──────┬──────┘
       │                              │
       │           Corosync           │
       │◄────────────────────────────►│
       │                              │
       ▼                              ▼
               ┌─────────────┐
               │   Node 3    │
               │  (vote: 1)  │
               └─────────────┘

Quorum: 2 of 3 votes required (majority)
```
Before You Cluster
Network Requirements
Corosync needs reliable, low-latency networking:
- Dedicated network recommended: Separate from VM traffic
- Same subnet: Keep all nodes on the same L2 network for Corosync
- Low latency: Under 2ms round-trip ideally
- Redundant links: For production, use bonding or multiple Corosync rings
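To check the latency guideline above between two nodes, one option is to parse the summary line that Linux ping prints. A minimal sketch, assuming iputils ping and its "rtt min/avg/max/mdev = a/b/c/d ms" summary format; the function names are hypothetical, and the 2 ms default is simply the guideline from the list above, not a Proxmox setting.

```shell
# Hypothetical helpers for checking Corosync link latency (not Proxmox tools).

# Read ping output on stdin and print the average RTT in ms.
# Assumes the iputils summary line: "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
  awk -F' = ' '/^rtt / { split($2, v, "/"); print v[2] }'
}

# Succeed if the average RTT (ms) is below the limit (default: 2 ms).
rtt_ok() {
  local avg=$1 limit=${2:-2}
  awk -v a="$avg" -v l="$limit" 'BEGIN { exit !(a + 0 < l + 0) }'
}

# Ping a peer on the Corosync network and report against the limit.
check_peer_latency() {
  local host=$1 avg
  avg=$(ping -c 10 -q "$host" | avg_rtt_ms)
  if [ -z "$avg" ]; then
    echo "$host: no reply"
    return 1
  fi
  if rtt_ok "$avg"; then
    echo "$host: avg ${avg} ms (ok)"
  else
    echo "$host: avg ${avg} ms (above limit)"
    return 1
  fi
}
```

Run it from each node against every peer's Corosync address, for example check_peer_latency 10.10.0.11 on pve1. Ten pings only give a snapshot; latency under real VM load can look different.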
Bad ideas:
- Corosync over WAN (latency kills it)
- Corosync over congested VM network
- Single network link (any failure = cluster issues)
Hostname and DNS
Before clustering, every node needs:
```
# Correct hostname
hostnamectl set-hostname pve1.lab.local

# /etc/hosts must resolve all cluster nodes
cat /etc/hosts
127.0.0.1 localhost
10.0.0.10 pve1.lab.local pve1
10.0.0.11 pve2.lab.local pve2
10.0.0.12 pve3.lab.local pve3
```
Critical: Hostnames cannot change after clustering. Get them right now.
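Since every node needs the same entries, it helps to check each node's hosts file against the full list of cluster names and IPs. A minimal sketch; the function names are hypothetical, and it only checks that each name maps to the expected IP in the file, not what DNS returns.

```shell
# Hypothetical helper: print the IP that a hosts file maps NAME to (empty if none).
hosts_ip() {
  local name=$1 file=${2:-/etc/hosts}
  awk -v n="$name" '
    /^[ \t]*#/ { next }        # skip comment lines
    { sub(/#.*/, "") }         # drop trailing comments
    { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
  ' "$file"
}

# Check "ip name" pairs against a hosts file.
# Prints one line per problem; returns non-zero if any pair is wrong or missing.
check_cluster_hosts() {
  local file=$1 rc=0 ip name actual
  shift
  while [ $# -ge 2 ]; do
    ip=$1 name=$2
    shift 2
    actual=$(hosts_ip "$name" "$file")
    if [ "$actual" != "$ip" ]; then
      echo "$name: expected $ip, found '${actual:-none}'"
      rc=1
    fi
  done
  return $rc
}
```

On each node, something like check_cluster_hosts /etc/hosts 10.0.0.10 pve1 10.0.0.11 pve2 10.0.0.12 pve3 should print nothing and return 0.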
Time Synchronization
All nodes must have synchronized time:
```
# Check time sync
timedatectl status
# Should show "System clock synchronized: yes"
```
Time drift causes certificate issues and cluster instability.
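With several nodes it's easier to check all clocks in one pass. A minimal sketch, assuming you can SSH as root to each node and that timedatectl status prints the "System clock synchronized:" line shown above; the function names are hypothetical.

```shell
# Hypothetical helper: succeed if `timedatectl status` output on stdin
# reports a synchronized clock.
clock_synced() {
  grep -Eq '^[[:space:]]*System clock synchronized:[[:space:]]*yes'
}

# Check every node over SSH; prints one line per node and
# returns non-zero if any node is unsynchronized or unreachable.
check_cluster_time() {
  local rc=0 node
  for node in "$@"; do
    if ssh "root@$node" timedatectl status | clock_synced; then
      echo "$node: synchronized"
    else
      echo "$node: NOT synchronized (or unreachable)"
      rc=1
    fi
  done
  return $rc
}
```

Usage: check_cluster_time pve1 pve2 pve3. An unreachable node is reported the same way as an unsynchronized one, since either way it needs attention before clustering.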
Creating the Cluster
On First Node (pve1)
```
# Create cluster
pvecm create my-cluster

# Verify
pvecm status
```
Output shows:
```
Cluster information
-------------------
Name: my-cluster
Config Version: 1
Transport: knet
Secure auth: on

Quorum information
------------------
Date: ...
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.5
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1
```
Joining Additional Nodes (pve2, pve3)
From each node to join:
```
# Join cluster (run on pve2)
pvecm add 10.0.0.10
# Enter root password for pve1 when prompted
```
After joining:
```
# Check status from any node
pvecm status

# Should show all nodes
pvecm nodes
```
Quorum: Why It Matters
Quorum prevents split-brain — where two halves of a cluster both think they’re in charge, making conflicting decisions.
How Quorum Works
Each node has votes (default: 1). Quorum requires a majority of the expected votes:
| Nodes | Votes | Quorum Needed | Can Lose |
|---|---|---|---|
| 1 | 1 | 1 | 0 nodes |
| 2 | 2 | 2 | 0 nodes |
| 3 | 3 | 2 | 1 node |
| 4 | 4 | 3 | 1 node |
| 5 | 5 | 3 | 2 nodes |
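The table follows from one rule: quorum is a strict majority, floor(votes / 2) + 1, and the number of nodes you can lose is whatever is left over. A small sketch that reproduces it; the function names are illustrative, and it assumes one vote per member (the same arithmetic covers 2 nodes + 1 QDevice = 3 votes).

```shell
# Votes needed for quorum: a strict majority of the expected votes.
quorum_needed() {
  local votes=$1
  echo $(( votes / 2 + 1 ))
}

# How many single-vote members can fail while the rest stay quorate.
can_lose() {
  local votes=$1
  echo $(( votes - (votes / 2 + 1) ))
}

# Print one row per cluster size, matching the table.
for n in 1 2 3 4 5; do
  printf '%d votes: quorum %d, can lose %d\n' "$n" "$(quorum_needed "$n")" "$(can_lose "$n")"
done
```

Note how adding a fourth node raises the quorum from 2 to 3 without increasing how many nodes you can lose; odd vote counts are the efficient ones.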
Two-node problem: With 2 nodes, losing either one (or the link between them) leaves each survivor with 1 of 2 votes, short of the 2 needed. Whatever is left freezes.
Two-Node Cluster Solutions
Option 1: QDevice (recommended)
An external quorum device provides a tie-breaking vote:
```
# On a separate lightweight VM/LXC (not on cluster nodes!)
apt install corosync-qnetd

# On each cluster node
apt install corosync-qdevice

# Then, on one cluster node
pvecm qdevice setup 10.0.0.100  # QDevice IP
```
Now you have 2 nodes + 1 QDevice = 3 votes. Can survive 1 node failure.
Option 2: Expected votes override (dangerous)
```
# On surviving node during split
pvecm expected 1
```
This tells the node “expect only 1 vote for quorum.” Dangerous: only use when you’re certain the other node is truly dead.
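Before reaching for pvecm expected, it helps to see exactly where the vote count stands. A minimal sketch that pulls the Quorate, Expected votes and Total votes fields out of pvecm status output; the function name is hypothetical, and the parsing assumes the "Field: value" layout shown in the sample output earlier.

```shell
# Hypothetical helper: summarize quorum fields from `pvecm status` output on stdin.
# Assumes "Field: value" lines such as "Quorate: Yes" and "Total votes: 1".
# Returns 0 if the node reports itself quorate, 1 otherwise.
quorum_summary() {
  awk -F':' '
    {
      key = $1; val = $2
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
    }
    key == "Quorate"        { q = val }
    key == "Expected votes" { e = val }
    key == "Total votes"    { t = val }
    END {
      printf "quorate=%s expected=%s total=%s\n", q, e, t
      exit (q == "Yes" ? 0 : 1)
    }
  '
}
```

Usage: pvecm status | quorum_summary. In a two-node cluster that has lost its peer you would expect to see quorate=No with expected=2 and total=1, which is exactly the situation the override is meant for, and only once you have confirmed the other node is down.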
Checking Quorum Status
```
# Detailed quorum info
pvecm status

# Is cluster quorate?
pvecm status | grep Quorate
# Quorate: Yes (means cluster can operate)
# Quorate: No (means cluster is frozen)
```
Corosync Configuration
View Current Config
```
cat /etc/pve/corosync.conf
```
Redundant Corosync Links
For production, use multiple networks:
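```
# View current link status
corosync-cfgtool -s

# pvecm has no command to add a link to a running cluster; edit the config instead.
# (New clusters can set both up front: pvecm create/add ... --link0 <addr> --link1 <addr>)
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# In the .new file: add ring1_addr (10.10.0.10 / .11 / .12) to each node entry,
# add a totem interface with linknumber: 1, and increment config_version
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```
Now Corosync uses both networks. If one fails, the other maintains the cluster.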
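When adding a second link by editing corosync.conf by hand, it's easy to miss a node or forget to bump the version. A minimal sketch that lists node entries lacking a ring1_addr and prints config_version, so an edited copy can be checked before it replaces the live file. It assumes the usual nodelist layout (one "node { ... }" block per node with name: and ringN_addr: keys); the function names are hypothetical.

```shell
# Hypothetical helper: print the name of every node block in a corosync.conf
# (file argument or stdin) that has no ring1_addr entry.
missing_ring1() {
  awk '
    /^[ \t]*node[ \t]*[{]/          { in_node = 1; name = ""; has1 = 0; next }
    in_node && /^[ \t]*name:/       { name = $2 }
    in_node && /^[ \t]*ring1_addr:/ { has1 = 1 }
    in_node && /^[ \t]*[}]/         { if (!has1) print name; in_node = 0 }
  ' "$@"
}

# Hypothetical helper: print the config_version value (file argument or stdin).
config_version() {
  awk '/^[ \t]*config_version:/ { print $2; exit }' "$@"
}
```

On the edited copy, missing_ring1 should print nothing, and config_version should report a higher number than it does for the current /etc/pve/corosync.conf.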
Network Interface Configuration
Ensure Corosync interfaces are correctly configured:
```
# Check which interfaces Corosync uses
corosync-cfgtool -s
# Should show ring status for each link
```
Common Cluster Operations
Node Maintenance
Before working on a node:
```
# Migrate all VMs off the node
# Via Web UI or (qm covers VMs only; containers need pct migrate):
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
  qm migrate "$vmid" pve2 --online
done

# If using HA, disable it temporarily
ha-manager set vm:100 --state disabled
```
Removing a Node
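```
# On the node being removed: stop cluster services and drop its cluster config
systemctl stop pve-cluster corosync
pmxcfs -l                    # mount /etc/pve in local mode
rm /etc/pve/corosync.conf
rm -rf /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster

# On a remaining node: remove pve3 from the membership, then (after recovering
# any VM/CT configs you still need) delete its leftover config directory
pvecm delnode pve3
rm -rf /etc/pve/nodes/pve3

# On the removed node: clean up old Corosync state
rm -rf /var/lib/corosync/*
```
Adding Node Back After Removal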
The node must be completely clean:
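```
# On the node to re-add (/etc/pve is only mounted while pmxcfs runs, so use local mode)
systemctl stop pve-cluster corosync
pmxcfs -l
rm -f /etc/pve/corosync.conf
rm -rf /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
rm -rf /var/lib/corosync/*

# Then join fresh
pvecm add 10.0.0.10
```
Split-Brain Scenarios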
What Happens
Network partition between nodes:
```
┌─────────┐         X         ┌─────────┐
│  pve1   │─────────X─────────│  pve2   │
│ (alone) │         X         │ (alone) │
└─────────┘   (network cut)   └─────────┘

Both nodes think: "Is the other dead, or just unreachable?"
```
Without quorum:
- Neither can be sure the other is truly dead
- Both freeze rather than risk conflicting operations
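- /etc/pve becomes read-only, so VMs can’t be started, changed, or migrated; with HA enabled, the node fences itself and its VMs stop (better than corruption)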
With quorum (3+ nodes or QDevice):
- Majority side continues operating
- Minority side freezes
- Clear decision, no conflict
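The majority rule decides which side of a partition keeps running. A small sketch that, given the cluster's expected votes and the vote count on each side of a split, reports which sides stay quorate; the function name is illustrative, and it assumes plain majority voting with no extra votequorum options.

```shell
# First argument: expected votes for the whole cluster.
# Remaining arguments: votes present on each side of the partition.
partition_outcome() {
  local expected=$1 side=1 votes
  local needed=$(( expected / 2 + 1 ))
  shift
  for votes in "$@"; do
    if [ "$votes" -ge "$needed" ]; then
      echo "side $side: $votes of $expected votes, quorate (keeps running)"
    else
      echo "side $side: $votes of $expected votes, not quorate (freezes)"
    fi
    side=$(( side + 1 ))
  done
}
```

partition_outcome 3 2 1 models a three-node cluster (or two nodes plus a QDevice) split 2/1: side 1 keeps running and side 2 freezes. partition_outcome 2 1 1 is the two-node split from the diagram: both sides freeze.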
Recovering from Split-Brain
If both sides made changes (shouldn’t happen with proper quorum):
```
# Check pmxcfs status
cat /etc/pve/.members

# Force resync (DANGEROUS - data loss possible)
systemctl stop pve-cluster
pmxcfs -l                 # Local mode
# Review /etc/pve, fix conflicts manually
killall pmxcfs            # stop the local-mode instance before restarting the service
systemctl start pve-cluster
```
This is why you prevent split-brain rather than recover from it.
Troubleshooting
Cluster Won’t Form
```
# Check Corosync status
systemctl status corosync

# Check logs
journalctl -u corosync -f

# Common issues:
# - Firewall blocking ports 5405-5412/udp
# - Hostname mismatch
# - Time drift
```
Node Shows as Offline
```
# Check from "offline" node
pvecm status

# Check network connectivity
ping -c 3 pve1
ping -c 3 pve2

# Check Corosync communication
corosync-cfgtool -s
# Each link should list the other nodes as connected
```
“Cluster not quorate” Error
```
# Check how many nodes are visible
pvecm nodes

# If nodes are missing, check network
# If all nodes present but not quorate, check vote count
pvecm status | grep -E "Expected|Total"
```
Network Design for Clusters
Minimum (Lab)
```
                   ┌─────────────┐
All traffic ──────►│   Switch    │
                   └──────┬──────┘
             ┌────────────┼────────────┐
             ▼            ▼            ▼
           pve1         pve2         pve3
        10.0.0.10    10.0.0.11    10.0.0.12
```
Single network for everything. Works, but any network issue affects the cluster.
Recommended (Production)
```
         Corosync Network (dedicated)
               ┌─────────────┐
               │  Switch A   │
               └──────┬──────┘
         ┌────────────┼────────────┐
         ▼            ▼            ▼
       pve1         pve2         pve3
    10.10.0.10   10.10.0.11   10.10.0.12

           Management + VM Network
               ┌─────────────┐
               │  Switch B   │
               └──────┬──────┘
         ┌────────────┼────────────┐
         ▼            ▼            ▼
       pve1         pve2         pve3
     10.0.0.10    10.0.0.11    10.0.0.12
```
Separate networks. Corosync traffic isolated from VM traffic.
Best (Production + Redundancy)
```
      Corosync Ring 0               Corosync Ring 1
         Switch A                      Switch B
             │                             │
        ┌────┼────┐                   ┌────┼────┐
        ▼    ▼    ▼                   ▼    ▼    ▼
       pve1 pve2 pve3                pve1 pve2 pve3
```
Both rings active. Either can fail without cluster impact.
The Lesson
A cluster is not a button. It’s network discipline and failure planning.
Clicking “Create Cluster” is the easy part. The hard part is:
- Network reliability (Corosync needs it)
- Quorum planning (how many nodes can you lose?)
- Split-brain prevention (QDevice for 2 nodes)
- Failure testing (does it actually fail over?)
A cluster that hasn’t been failure-tested is a cluster that will surprise you. Test node failures. Test network partitions. Know what happens before production depends on it.
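While running those tests, it helps to have each node log the moment it gains or loses quorum. A minimal sketch, assuming the "Quorate:" line that pvecm status prints; the function names are hypothetical, and it polls once per second, so very brief flaps may be missed.

```shell
# Hypothetical helper: read "TIMESTAMP STATE" lines on stdin and print only
# the lines where STATE differs from the previous one.
state_changes() {
  awk '$2 != last { print; fflush(); last = $2 }'
}

# Poll this node's quorum state once per second and log every change.
# Assumes the "Quorate: Yes|No" line from pvecm status. Stop with Ctrl-C.
watch_quorum() {
  local q
  while true; do
    q=$(pvecm status 2>/dev/null | awk -F: '/^Quorate:/ { gsub(/[ \t]/, "", $2); print $2 }')
    printf '%s %s\n' "$(date +%T)" "${q:-unknown}"
    sleep 1
  done | state_changes
}
```

Start watch_quorum on every node, then pull a cable or power off a node: the timestamps show how quickly each side noticed, and which side kept quorum.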
The goal isn’t “we have a cluster.” The goal is “we understand how our cluster fails and have planned for it.”