Cluster Setup: Joining Nodes, Quorum, and Corosync Realities

A Proxmox cluster looks simple: join nodes, share configuration, migrate VMs between them. Click a button, cluster created. The web UI makes it seem like magic.

It’s not magic. It’s distributed systems, and distributed systems fail in ways that single nodes don’t. Split-brain scenarios, quorum loss, network partitions: these aren’t theoretical. They happen, and when they do, a badly planned cluster can block operations, take VMs down, or corrupt data.

Clustering is not a button. It’s network discipline and failure planning.

What a Cluster Actually Is

A Proxmox cluster is:

  1. Shared configuration: /etc/pve is replicated across all nodes via pmxcfs (a cluster filesystem)
  2. Corosync: Cluster communication layer handling membership and messaging
  3. Quorum: Voting system to prevent split-brain
  4. Optional: Shared storage, HA, live migration

┌─────────────┐    Corosync    ┌─────────────┐
│   Node 1    │◄──────────────►│   Node 2    │
│  (vote: 1)  │                │  (vote: 1)  │
└──────┬──────┘                └──────┬──────┘
       │                              │
       │           Corosync           │
       │◄────────────────────────────►│
       │                              │
       ▼                              ▼
               ┌─────────────┐
               │   Node 3    │
               │  (vote: 1)  │
               └─────────────┘

Quorum: 2 of 3 votes required (majority)

Before You Cluster

Network Requirements

Corosync needs reliable, low-latency networking:

  • Dedicated network recommended: Separate from VM traffic
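  • Same network: Keep all nodes on one LAN, ideally the same L2 segment; the knet transport is unicast UDP, so routed setups can work, but every hop adds latency and failure points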
  • Low latency: Under 2ms round-trip ideally
  • Redundant links: For production, use bonding or multiple Corosync rings

Bad ideas:

  • Corosync over WAN (latency kills it)
  • Corosync over congested VM network
  • Single network link (any failure = cluster issues)
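
The latency guideline above is easy to measure before you cluster. A minimal sketch, assuming the iputils `ping` summary line (`rtt min/avg/max/mdev = ...`); the function names are hypothetical helpers, not Proxmox tools, and the 2 ms default simply mirrors the guideline in this section.

```shell
# Read `ping` output on stdin and print the average round-trip time in ms.
# Assumes a summary line of the form "rtt min/avg/max/mdev = a/b/c/d ms".
avg_rtt_ms() {
    awk -F'=' '/min\/avg\/max/ { split($2, a, "/"); print a[2] }'
}

# Succeed if the measured average ($1, in ms) is below the threshold
# ($2, in ms, default 2). An empty measurement counts as a failure.
rtt_within() {
    [ -n "$1" ] || return 1
    awk -v rtt="$1" -v max="${2:-2}" 'BEGIN { exit !(rtt + 0 < max + 0) }'
}
```

Typical use from pve1 against the Corosync address of pve2: `rtt_within "$(ping -c 10 -q 10.0.0.11 | avg_rtt_ms)" || echo "too slow for Corosync"`. Run it while the network is busy, not only when it is idle.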

Hostname and DNS

Before clustering, every node needs:

Terminal window
# Correct hostname
hostnamectl set-hostname pve1.lab.local
# /etc/hosts must resolve all cluster nodes
cat /etc/hosts
127.0.0.1 localhost
10.0.0.10 pve1.lab.local pve1
10.0.0.11 pve2.lab.local pve2
10.0.0.12 pve3.lab.local pve3

Critical: Hostnames cannot change after clustering. Get them right now.
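
Since a missing hosts entry is cheap to catch before joining and annoying to debug afterwards, a quick check can help. This is a minimal sketch: `missing_hosts` is a hypothetical helper that only inspects a hosts file, it does not query DNS.

```shell
# Print every node name that has no entry in the given hosts file.
# $1 = hosts file, remaining arguments = names to look for.
missing_hosts() {
    local file=$1
    shift
    local name
    for name in "$@"; do
        # Ignore comment lines; accept the name in any column after the IP.
        awk -v n="$name" '
            /^[ \t]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || printf '%s\n' "$name"
    done
}
```

Run `missing_hosts /etc/hosts pve1 pve2 pve3` on every node. No output means every name has an entry; anything printed needs fixing before you create or join the cluster.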

Time Synchronization

All nodes must have synchronized time:

Terminal window
# Check time sync
timedatectl status
# Should show "System clock synchronized: yes"

Time drift causes certificate issues and cluster instability.
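
To turn the check above into a pass/fail test, something like the following works. It is a small sketch: `clock_synced` is a hypothetical helper that only looks for the "System clock synchronized: yes" line in `timedatectl status` output.

```shell
# Read `timedatectl status` output on stdin; succeed only if it reports
# that the system clock is synchronized.
clock_synced() {
    grep -Eq '^[[:space:]]*System clock synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

Assuming root SSH access between the nodes, you can check them all in one go: `for n in pve1 pve2 pve3; do ssh "root@$n" timedatectl status | clock_synced || echo "$n: clock not synchronized"; done`.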

Creating the Cluster

On First Node (pve1)

Terminal window
# Create cluster
pvecm create my-cluster
# Verify
pvecm status

Output shows:

Cluster information
-------------------
Name: my-cluster
Config Version: 1
Transport: knet
Secure auth: on

Quorum information
------------------
Date: ...
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.5
Quorate: Yes

Votequorum information
----------------------
Expected votes: 1
Highest expected: 1
Total votes: 1
Quorum: 1

Joining Additional Nodes (pve2, pve3)

On each node that is joining:

Terminal window
# Join cluster (run on pve2)
pvecm add 10.0.0.10
# Enter root password for pve1 when prompted

After joining:

Terminal window
# Check status from any node
pvecm status
# Should show all nodes
pvecm nodes

Quorum: Why It Matters

Quorum prevents split-brain — where two halves of a cluster both think they’re in charge, making conflicting decisions.

How Quorum Works

Each node has votes (default: 1). Quorum requires majority:

Nodes   Votes   Quorum Needed   Can Lose
1       1       1               0 nodes
2       2       2               0 nodes
3       3       2               1 node
4       4       3               1 node
5       5       3               2 nodes

Two-node problem: With 2 nodes, quorum needs both votes, so losing either node leaves the survivor without quorum. The survivor stays up, but cluster operations on it are blocked.
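
The numbers in the table follow from one rule: quorum is a strict majority of the total votes. A minimal sketch of that arithmetic, assuming one vote per node as in the default setup; the function names are made up for illustration.

```shell
# Votes needed for quorum: a strict majority of the total votes ($1).
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Votes that can disappear while the remainder still holds quorum.
can_lose() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the rule for small clusters (one vote per node assumed).
for n in 1 2 3 4 5; do
    printf '%s votes: quorum %s, can lose %s\n' "$n" "$(quorum_needed "$n")" "$(can_lose "$n")"
done
```

Note that 4 votes tolerate no more failures than 3. Even vote counts buy you nothing, which is why odd totals are the usual target.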

Two-Node Cluster Solutions

Option 1: QDevice (recommended)

External quorum device provides tie-breaking vote:

Terminal window
# On a separate lightweight host or VM/LXC (not hosted on the cluster nodes!)
apt install corosync-qnetd
# On every cluster node
apt install corosync-qdevice
# On one cluster node only: register the QDevice (10.0.0.100 = QDevice IP)
pvecm qdevice setup 10.0.0.100

Now you have 2 nodes + 1 QDevice = 3 votes. Can survive 1 node failure.

Option 2: Expected votes override (dangerous)

Terminal window
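# On the surviving node, after confirming the other node is powered off
pvecm expected 1

This tells the node “expect only 1 vote for quorum.” Dangerous: only use it when you’re certain the other node is truly dead. If the other node is merely unreachable and someone does the same there, both sides accept changes and you have created the split-brain that quorum exists to prevent.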

Checking Quorum Status

Terminal window
# Detailed quorum info
pvecm status
# Is cluster quorate?
pvecm status | grep Quorate
# Quorate: Yes (means cluster can operate)
# Quorate: No (means cluster is frozen)
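
For scripts and monitoring, an exit code is more useful than eyeballing grep output. A minimal sketch: `is_quorate` is a hypothetical helper that reads `pvecm status` output and matches only the line whose label is exactly "Quorate", so other lines that happen to contain the word are ignored.

```shell
# Read `pvecm status` output on stdin; succeed only if the "Quorate" field
# is "Yes". The label is matched exactly, so a line such as "Flags: Quorate"
# does not count.
is_quorate() {
    awk -F':[ \t]*' '
        $1 ~ /^[ \t]*Quorate$/ { q = $2 }
        END { exit !(q ~ /^Yes/) }
    '
}
```

Usage: `pvecm status | is_quorate || echo "cluster not quorate on $(hostname)"`. Missing or unparseable output counts as not quorate, which is the safe default for an alert.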

Corosync Configuration

View Current Config

Terminal window
cat /etc/pve/corosync.conf

For production, use multiple networks:

Terminal window
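# View current link status
corosync-cfgtool -s
# pvecm has no subcommand for adding a link to an existing cluster.
# Edit a copy of corosync.conf instead: give each node a ring1_addr
# (10.10.0.10, 10.10.0.11, 10.10.0.12), add an interface block with
# linknumber: 1 to the totem section, and increment config_version
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
# On a new cluster, declare both links up front instead:
# pvecm create my-cluster --link0 10.0.0.10 --link1 10.10.0.10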

Corosync can then use both networks. If one fails, the other keeps the cluster together.
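
Any hand edit of /etc/pve/corosync.conf, such as adding a second address per node, also needs the config_version value in the totem section raised, and that step is easy to forget. A minimal sketch of a helper that does it on a copy; `bump_config_version` is a hypothetical name, and it assumes the usual `config_version: N` line format.

```shell
# Print the corosync.conf at path $1 with its first config_version value
# incremented by one. The file itself is not modified.
bump_config_version() {
    awk '
        /^[ \t]*config_version:[ \t]*[0-9]+[ \t]*$/ && !bumped {
            match($0, /[0-9]+/)
            n = substr($0, RSTART, RLENGTH) + 1
            print substr($0, 1, RSTART - 1) n substr($0, RSTART + RLENGTH)
            bumped = 1
            next
        }
        { print }
    ' "$1"
}
```

One way to use it: `bump_config_version /etc/pve/corosync.conf > /etc/pve/corosync.conf.new`, make your edits in the `.new` file, review the diff, then move it into place. Writing to a copy first means a half-finished edit never reaches the live config.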

Network Interface Configuration

Ensure Corosync interfaces are correctly configured:

Terminal window
# Check which interfaces Corosync uses
corosync-cfgtool -s
# Should show ring status for each link

Common Cluster Operations

Node Maintenance

Before working on a node:

Terminal window
# Migrate all VMs off the node
# Via Web UI or:
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate "$vmid" pve2 --online
done
# If using HA, disable it temporarily (note: "disabled" also stops the guest)
ha-manager set vm:100 --state disabled
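
Before running a migration loop for real, it helps to see what it would do. A minimal sketch, assuming the usual `qm list` layout where the first three columns are VMID, NAME, and STATUS; both function names are hypothetical, and the commands are only printed, never executed.

```shell
# Read `qm list` output on stdin and print the VMIDs whose STATUS is "running".
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Read VMIDs on stdin and print one migrate command per VM for target node $1.
plan_migrations() {
    local target=$1 vmid
    while read -r vmid; do
        printf 'qm migrate %s %s --online\n' "$vmid" "$target"
    done
}
```

`qm list | running_vmids | plan_migrations pve2` prints the plan. Once it looks right, pipe the same output to `sh` to execute it. Stopped VMs are left out here on purpose; migrate those separately if the node will be down for long.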

Removing a Node

Terminal window
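# On node being removed - after migrating its guests away, stop cluster services
# (or simply power it off)
systemctl stop pve-cluster corosync
# On a remaining node - remove it from the cluster
pvecm delnode pve3
# On a remaining node - optionally delete the leftover node directory
rm -rf /etc/pve/nodes/pve3
# On removed node - clean up Corosync state before it touches the cluster network again
rm -f /etc/corosync/*
rm -rf /var/lib/corosync/*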

Adding Node Back After Removal

The node must be completely clean:

Terminal window
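# On the node to re-add (a fresh reinstall is the cleanest route)
systemctl stop pve-cluster corosync
# /etc/pve is only mounted while pmxcfs runs, so start it in local mode
pmxcfs -l
rm -f /etc/pve/corosync.conf
rm -rf /etc/corosync/*
rm -rf /var/lib/corosync/*
# Stop the local-mode instance and start the normal service
killall pmxcfs
systemctl start pve-cluster
# Then join fresh
pvecm add 10.0.0.10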

Split-Brain Scenarios

What Happens

Network partition between nodes:

┌─────────┐         X         ┌─────────┐
│  pve1   │─────────X─────────│  pve2   │
│ (alone) │         X         │ (alone) │
└─────────┘   (network cut)   └─────────┘

Both nodes think: "Is the other dead, or just unreachable?"

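Without quorum:

  • Neither can be sure the other is truly dead
  • Both block cluster changes rather than risk conflicting operations: /etc/pve becomes read-only, so no config edits, VM starts, or migrations
  • Already-running VMs keep running; with HA enabled, a node that loses quorum fences itself and its HA guests go down with it (better than corruption)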

With quorum (3+ nodes or QDevice):

  • Majority side continues operating
  • Minority side freezes
  • Clear decision, no conflict
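
You can work out in advance which side of a given partition keeps running. A minimal sketch, assuming one vote per node plus one for a QDevice if present; `side_state` is a hypothetical helper, not a Proxmox command.

```shell
# $1 = total votes configured in the cluster
# $2 = votes still reachable from this side of the partition
# Prints "quorate" if this side holds a strict majority, else "not quorate".
side_state() {
    local total=$1 seen=$2
    if [ "$seen" -ge $(( total / 2 + 1 )) ]; then
        echo "quorate"
    else
        echo "not quorate"
    fi
}
```

Three nodes split 2 and 1: `side_state 3 2` reports quorate and `side_state 3 1` does not. Two nodes with no QDevice: `side_state 2 1` is not quorate on either side. Add a QDevice and the node that can still reach it sees 2 of 3 votes and carries on.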

Recovering from Split-Brain

If both sides made changes (shouldn’t happen with proper quorum):

Terminal window
# Check pmxcfs status
cat /etc/pve/.members
# Force local mode (DANGEROUS - data loss possible)
systemctl stop pve-cluster
pmxcfs -l # Local mode
# Review /etc/pve, fix conflicts manually
killall pmxcfs # Stop the local-mode instance before restarting the service
systemctl start pve-cluster

This is why you prevent split-brain rather than recover from it.

Troubleshooting

Cluster Won’t Form

Terminal window
# Check Corosync status
systemctl status corosync
# Check logs
journalctl -u corosync -f
# Common issues:
# - Firewall blocking ports 5405-5412/udp
# - Hostname mismatch
# - Time drift

Node Shows as Offline

Terminal window
# Check from "offline" node
pvecm status
# Check network connectivity
ping pve1
ping pve2
# Check Corosync communication
corosync-cfgtool -s
# With the knet transport, each link should list the other nodes as "connected"

”Cluster not quorate” Error

Terminal window
# Check how many nodes are visible
pvecm nodes
# If nodes are missing, check network
# If all nodes present but not quorate, check vote count
pvecm status | grep -E "Expected|Total"
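
The same comparison can be scripted so it fails loudly. A minimal sketch: `votes_ok` is a hypothetical helper that reads the "Total votes" and "Quorum" fields from `pvecm status` output and succeeds only when the total reaches the quorum threshold.

```shell
# Read `pvecm status` output on stdin, print the two vote counts, and
# succeed only if the total votes present meet the quorum threshold.
votes_ok() {
    awk -F':[ \t]*' '
        $1 ~ /^[ \t]*Total votes$/ { total = $2 + 0 }
        $1 ~ /^[ \t]*Quorum$/      { quorum = $2 + 0 }
        END {
            printf "total=%d quorum=%d\n", total, quorum
            exit !(quorum > 0 && total >= quorum)
        }
    '
}
```

Usage: `pvecm status | votes_ok || echo "not enough votes present"`. If the total is below the threshold while `pvecm nodes` shows every node, look at per-node vote settings and the QDevice rather than the network.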

Network Design for Clusters

Minimum (Lab)

                    ┌─────────────┐
All traffic ───────►│   Switch    │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
            pve1         pve2         pve3
          10.0.0.10    10.0.0.11    10.0.0.12

Single network for everything. Works, but any network issue affects cluster.

Corosync Network (dedicated)
                    ┌─────────────┐
                    │  Switch A   │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
            pve1         pve2         pve3
         10.10.0.10   10.10.0.11   10.10.0.12

Management + VM Network
                    ┌─────────────┐
                    │  Switch B   │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
            pve1         pve2         pve3
          10.0.0.10    10.0.0.11    10.0.0.12

Separate networks. Corosync traffic isolated from VM traffic.

Best (Production + Redundancy)

  Corosync Ring 0          Corosync Ring 1
     Switch A                 Switch B
         │                        │
     ┌───┼───┐                ┌───┼───┐
     ▼   ▼   ▼                ▼   ▼   ▼
   pve1 pve2 pve3           pve1 pve2 pve3

Both rings active. Either can fail without cluster impact.

The Lesson

A cluster is not a button. It’s network discipline and failure planning.

Clicking “Create Cluster” is the easy part. The hard part is:

  • Network reliability (Corosync needs it)
  • Quorum planning (how many nodes can you lose?)
  • Split-brain prevention (QDevice for 2 nodes)
  • Failure testing (does it actually fail over?)

A cluster that hasn’t been failure-tested is a cluster that will surprise you. Test node failures. Test network partitions. Know what happens before production depends on it.

The goal isn’t “we have a cluster.” The goal is “we understand how our cluster fails and have planned for it.”