Cluster Setup: Joining Nodes, Quorum, and Corosync Realities

August 22, 2025 · 8 min read

A Proxmox cluster looks simple: join nodes, share configuration, migrate VMs between them. Click a button, cluster created. The web UI makes it seem like magic.

It’s not magic. It’s distributed systems, and distributed systems fail in ways that single nodes don’t. Split-brain scenarios, quorum loss, network partitions: these aren’t theoretical. They happen, and when they do, nodes freeze, HA fences hosts, and in the worst case VM data gets corrupted.

Clustering is not a button. It’s network discipline and failure planning.

What a Cluster Actually Is

A Proxmox cluster is:

Shared configuration: /etc/pve is replicated across all nodes via pmxcfs (a cluster filesystem)
Corosync: Cluster communication layer handling membership and messaging
Quorum: Voting system to prevent split-brain
Optional: Shared storage, HA, live migration

┌─────────────┐    Corosync    ┌─────────────┐
│    Node 1   │◄──────────────►│    Node 2   │
│  (vote: 1)  │                │  (vote: 1)  │
└──────┬──────┘                └──────┬──────┘
       │                              │
       │         Corosync             │
       │◄────────────────────────────►│
       │                              │
       ▼                              ▼
┌─────────────┐
│    Node 3   │
│  (vote: 1)  │
└─────────────┘

Quorum: 2 of 3 votes required (majority)

Before You Cluster

Network Requirements

Corosync needs reliable, low-latency networking:

Dedicated network recommended: Separate from VM traffic
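Same subnet recommended: Corosync 3 (knet) uses unicast UDP, so a shared L2 segment is not strictly required, but every node must reach every other node on UDP 5405-5412 at LAN latency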
Low latency: Under 2ms round-trip ideally
Redundant links: For production, use bonding or multiple Corosync rings

Bad ideas:

Corosync over WAN (latency kills it)
Corosync over congested VM network
Single network link (any failure = cluster issues)
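
The 2ms guideline can be checked before clustering. A minimal sketch, assuming the summary line printed by Linux iputils ping (rtt min/avg/max/mdev = ...); the function name avg_rtt_ok and its threshold argument are illustrative and not part of Proxmox:

```shell
#!/usr/bin/env bash
# avg_rtt_ok THRESHOLD_MS
# Reads `ping` output on stdin and succeeds when the average round-trip
# time is below THRESHOLD_MS. Fails if no rtt summary line is found.
# Assumed summary format: rtt min/avg/max/mdev = a/b/c/d ms
avg_rtt_ok() {
    local threshold_ms="$1"
    awk -v limit="$threshold_ms" -F'[/ ]+' '
        /^rtt / { avg = $8; found = 1 }   # 8th field is the average
        END { exit !(found && (avg + 0 < limit + 0)) }
    '
}

# Usage on a node (not run here):
#   ping -c 10 -q 10.0.0.11 | avg_rtt_ok 2 || echo "latency too high for Corosync"
```

Run it in both directions and under normal load; an idle link tells you little about how Corosync behaves when backups saturate the switch.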

Hostname and DNS

Before clustering, every node needs:

# Correct hostname
hostnamectl set-hostname pve1.lab.local

# /etc/hosts must resolve all cluster nodes
cat /etc/hosts
127.0.0.1 localhost
10.0.0.10 pve1.lab.local pve1
10.0.0.11 pve2.lab.local pve2
10.0.0.12 pve3.lab.local pve3

Critical: Hostnames cannot change after clustering. Get them right now.
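
Since a missing entry is easy to overlook on one node, the hosts file can be checked mechanically. A small sketch, assuming the standard hosts-file layout (address first, names after); missing_hosts is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# missing_hosts HOSTS_FILE NAME...
# Prints every NAME that has no entry in the given hosts-format file.
# Comments and blank lines are ignored; the address column is skipped.
missing_hosts() {
    local hosts_file="$1"; shift
    local name
    for name in "$@"; do
        awk -v want="$name" '
            { sub(/#.*/, "") }                                   # strip comments
            { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# Usage on each node (not run here):
#   missing_hosts /etc/hosts pve1 pve2 pve3
```

No output means every name has an entry. Run it on every node, because each node keeps its own /etc/hosts.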

Time Synchronization

All nodes must have synchronized time:

# Check time sync
timedatectl status

# Should show "System clock synchronized: yes"

Time drift causes certificate issues and cluster instability.
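
Checking one node by eye is easy; checking all of them before a join is easier to script. A sketch that only looks for the "System clock synchronized: yes" line mentioned above; clock_synced is a hypothetical name, and the SSH loop in the comment assumes root SSH access between nodes:

```shell
#!/usr/bin/env bash
# clock_synced
# Reads `timedatectl status` output on stdin and succeeds only if it
# contains the line "System clock synchronized: yes".
clock_synced() {
    grep -Eq '^[[:space:]]*System clock synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Usage (not run here; assumes root SSH to each node):
#   for node in pve1 pve2 pve3; do
#       ssh "root@$node" timedatectl status | clock_synced \
#           || echo "$node: clock NOT synchronized"
#   done
```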

Creating the Cluster

On First Node (pve1)

# Create cluster
pvecm create my-cluster

# Verify
pvecm status

Output shows:

Cluster information
-------------------
Name:             my-cluster
Config Version:   1
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             ...
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.5
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1

Joining Additional Nodes (pve2, pve3)

Run this on each joining node, which must not yet hold any VMs or containers:

# Join cluster (run on pve2)
pvecm add 10.0.0.10

# Enter root password for pve1 when prompted

After joining:

# Check status from any node
pvecm status

# Should show all nodes
pvecm nodes

Quorum: Why It Matters

Quorum prevents split-brain — where two halves of a cluster both think they’re in charge, making conflicting decisions.

How Quorum Works

Each node has votes (default: 1). Quorum requires majority:

Nodes	Votes	Quorum Needed	Can Lose
1	1	1	0 nodes
2	2	2	0 nodes
3	3	2	1 node
4	4	3	1 node
5	5	3	2 nodes

Two-node problem: With 2 nodes, quorum is 2 of 2 votes, so losing either node leaves the survivor without quorum and it freezes.
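
The table follows from one rule: quorum is a strict majority of the total votes. A short sketch of that arithmetic, assuming one vote per node as in the defaults above (the function names are illustrative):

```shell
#!/usr/bin/env bash
# quorum_needed VOTES: smallest strict majority, floor(VOTES / 2) + 1
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# can_lose VOTES: votes that can disappear while the rest stay quorate
can_lose() {
    echo $(( $1 - $(quorum_needed "$1") ))
}

# Reproduce the table for 1 to 5 single-vote nodes
for n in 1 2 3 4 5; do
    printf '%s nodes: quorum %s, can lose %s\n' \
        "$n" "$(quorum_needed "$n")" "$(can_lose "$n")"
done
```

Note that 4 nodes tolerate no more failures than 3: an even node count adds hardware without adding fault tolerance.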

Two-Node Cluster Solutions

Option 1: QDevice (recommended)

External quorum device provides tie-breaking vote:

# On a separate lightweight VM/LXC (not on cluster nodes!)
apt install corosync-qnetd

# On each cluster node
apt install corosync-qdevice

# On one cluster node only (run once; it configures all nodes)
pvecm qdevice setup 10.0.0.100  # QDevice IP

Now you have 2 nodes + 1 QDevice = 3 votes. Can survive 1 node failure.

Option 2: Expected votes override (dangerous)

# On surviving node during split
pvecm expected 1

This tells the node “expect only 1 vote for quorum.” Dangerous — only use when you’re certain the other node is truly dead.

Checking Quorum Status

# Detailed quorum info
pvecm status

# Is cluster quorate?
pvecm status | grep Quorate
# Quorate: Yes  (means cluster can operate)
# Quorate: No   (means cluster is frozen)
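
For monitoring, an exit code is more useful than grep output. A minimal sketch that parses the "Quorate:" field shown in the status output earlier; is_quorate is a hypothetical helper, not a Proxmox command:

```shell
#!/usr/bin/env bash
# is_quorate
# Reads `pvecm status` output on stdin and succeeds only when the
# "Quorate:" field is "Yes". Missing field counts as not quorate.
is_quorate() {
    awk '$1 == "Quorate:" { state = $2 } END { exit (state != "Yes") }'
}

# Usage from cron or a monitoring check (not run here):
#   pvecm status | is_quorate || echo "cluster is NOT quorate" >&2
```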

Corosync Configuration

View Current Config

cat /etc/pve/corosync.conf

Redundant Corosync Links

For production, use multiple networks:

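# View current link status
corosync-cfgtool -s

# New cluster: declare both links at creation (on pve1)
pvecm create my-cluster --link0 10.0.0.10 --link1 10.10.0.10

# Joining node: pass its own address on each link (on pve2)
pvecm add 10.0.0.10 --link0 10.0.0.11 --link1 10.10.0.11

# Existing cluster: pvecm has no subcommand for adding a link.
# Edit a copy of /etc/pve/corosync.conf, add a ring1_addr to every
# node entry, increment config_version, then move the copy into place.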

Now Corosync uses both networks. If one fails, the other keeps the cluster connected.
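
Link settings end up in /etc/pve/corosync.conf, and any manual edit to that file has to raise config_version so the nodes treat it as a newer configuration. A sketch of that one step, assuming the usual "config_version: N" line inside the totem section; bump_config_version is a hypothetical helper and the file paths in the usage comment are examples:

```shell
#!/usr/bin/env bash
# bump_config_version FILE
# Prints FILE with config_version incremented by one.
# The input file itself is not modified.
bump_config_version() {
    awk '
        $1 == "config_version:" { sub(/[0-9]+/, $2 + 1) }   # keeps indentation
        { print }
    ' "$1"
}

# Usage (not run here): work on a copy, review it, then move it into place
#   cp /etc/pve/corosync.conf /root/corosync.conf.edit
#   ... add the extra ringX_addr lines to /root/corosync.conf.edit ...
#   bump_config_version /root/corosync.conf.edit > /etc/pve/corosync.conf.new
#   mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```

Working on a copy matters because pmxcfs distributes the file as soon as it changes; a half-edited corosync.conf reaches every node.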

Network Interface Configuration

Ensure Corosync interfaces are correctly configured:

# Check which interfaces Corosync uses
corosync-cfgtool -s

# Should show ring status for each link

Common Cluster Operations

Node Maintenance

Before working on a node:

# Migrate all VMs off the node
# Via Web UI or:
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate $vmid pve2 --online
done

# If the VM is HA-managed, take it out of HA control temporarily
# ("disabled" would stop the VM; "ignored" leaves it running)
ha-manager set vm:100 --state ignored
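
The loop above passes every VMID on the node to a live migration, stopped guests included. A variant that selects only running VMs, assuming the qm list column order VMID, NAME, STATUS; running_vmids is a hypothetical helper:

```shell
#!/usr/bin/env bash
# running_vmids
# Reads `qm list` output on stdin and prints the VMID of every guest
# whose STATUS column is "running". The header line is skipped.
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Usage on the node being drained (not run here):
#   qm list | running_vmids | while read -r vmid; do
#       qm migrate "$vmid" pve2 --online
#   done
```

Stopped VMs can then be moved offline in a second pass. Containers are not covered by qm and need their own handling.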

Removing a Node

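# First migrate all guests off pve3, then power it off

# On a remaining node
pvecm delnode pve3

# On a remaining node - optional cleanup of the leftover directory
rm -rf /etc/pve/nodes/pve3

# On the removed node, only if it will be reused without a reinstall
# (boot it disconnected from the cluster network first)
systemctl stop pve-cluster corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
rm /var/lib/corosync/*
killall pmxcfs
systemctl start pve-cluster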

Adding Node Back After Removal

The node must be completely clean:

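# On the node to re-add (it must hold no VMs or containers)
systemctl stop pve-cluster corosync
pmxcfs -l                    # mount /etc/pve in local mode
rm /etc/pve/corosync.conf
rm -rf /etc/corosync/*
rm -rf /var/lib/corosync/*
killall pmxcfs
systemctl start pve-cluster

# Then join fresh
pvecm add 10.0.0.10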

Split-Brain Scenarios

What Happens

Network partition between nodes:

┌─────────┐         X         ┌─────────┐
│  pve1   │─────────X─────────│  pve2   │
│ (alone) │         X         │ (alone) │
└─────────┘   (network cut)   └─────────┘

Both nodes think: "Is the other dead, or just unreachable?"

Without quorum:

Neither can be sure the other is truly dead
Both freeze rather than risk conflicting operations
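Running VMs keep running, but /etc/pve turns read-only, so nothing can be started, migrated, or reconfigured; with HA enabled, a non-quorate node fences itself and its VMs stop (better than corruption)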

With quorum (3+ nodes or QDevice):

Majority side continues operating
Minority side freezes
Clear decision, no conflict
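
The two cases can be worked through with the majority rule. A sketch that takes the total vote count and the votes on each side of a partition and reports what each side does; both function names are illustrative, and a QDevice is simply counted as one more vote on whichever side can still reach it:

```shell
#!/usr/bin/env bash
# side_is_quorate TOTAL_VOTES SIDE_VOTES
# Succeeds when SIDE_VOTES is a strict majority of TOTAL_VOTES.
side_is_quorate() {
    [ "$2" -ge $(( $1 / 2 + 1 )) ]
}

# describe_partition TOTAL_VOTES SIDE_VOTES...
# Prints one line per side saying whether it continues or freezes.
describe_partition() {
    local total="$1" side
    shift
    for side in "$@"; do
        if side_is_quorate "$total" "$side"; then
            echo "side with $side of $total votes: continues"
        else
            echo "side with $side of $total votes: freezes"
        fi
    done
}

describe_partition 2 1 1   # two nodes, no QDevice: both freeze
describe_partition 3 2 1   # three votes: majority continues, minority freezes
```

At most one side can hold a strict majority, which is why two sides can never both continue.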

Recovering from Split-Brain

If both sides made changes (shouldn’t happen with proper quorum):

# Check pmxcfs status
cat /etc/pve/.members

# Force resync (DANGEROUS - data loss possible)
systemctl stop pve-cluster
pmxcfs -l  # Local mode
# Review /etc/pve, fix conflicts manually
killall pmxcfs  # Leave local mode before restarting the service
systemctl start pve-cluster

This is why you prevent split-brain rather than recover from it.

Troubleshooting

Cluster Won’t Form

# Check Corosync status
systemctl status corosync

# Check logs
journalctl -u corosync -f

# Common issues:
# - Firewall blocking ports 5405-5412/udp
# - Hostname mismatch
# - Time drift

Node Shows as Offline

# Check from "offline" node
pvecm status

# Check network connectivity
ping pve1
ping pve2

# Check Corosync communication
corosync-cfgtool -s
# With knet, each link should list every other node as "connected"

“Cluster not quorate” Error

# Check how many nodes are visible
pvecm nodes

# If nodes are missing, check network
# If all nodes present but not quorate, check vote count
pvecm status | grep -E "Expected|Total"

Network Design for Clusters

Minimum (Lab)

                    ┌─────────────┐
All traffic ───────►│   Switch    │
                    └──────┬──────┘
              ┌────────────┼────────────┐
              ▼            ▼            ▼
           pve1         pve2         pve3
        10.0.0.10    10.0.0.11    10.0.0.12

Single network for everything. Works, but any network issue affects the cluster.

Recommended (Production)

Corosync Network (dedicated)
          ┌─────────────┐
          │  Switch A   │
          └──────┬──────┘
    ┌────────────┼────────────┐
    ▼            ▼            ▼
 pve1         pve2         pve3
10.10.0.10  10.10.0.11  10.10.0.12

Management + VM Network
          ┌─────────────┐
          │  Switch B   │
          └──────┬──────┘
    ┌────────────┼────────────┐
    ▼            ▼            ▼
 pve1         pve2         pve3
10.0.0.10   10.0.0.11   10.0.0.12

Separate networks. Corosync traffic isolated from VM traffic.

Best (Production + Redundancy)

Corosync Ring 0          Corosync Ring 1
    Switch A                 Switch B
       │                        │
   ┌───┼───┐                ┌───┼───┐
   ▼   ▼   ▼                ▼   ▼   ▼
 pve1 pve2 pve3           pve1 pve2 pve3

Both rings active. Either can fail without cluster impact.

The Lesson

A cluster is not a button. It’s network discipline and failure planning.

Clicking “Create Cluster” is the easy part. The hard part is:

Network reliability (Corosync needs it)
Quorum planning (how many nodes can you lose?)
Split-brain prevention (QDevice for 2 nodes)
Failure testing (does it actually fail over?)

A cluster that hasn’t been failure-tested is a cluster that will surprise you. Test node failures. Test network partitions. Know what happens before production depends on it.

The goal isn’t “we have a cluster.” The goal is “we understand how our cluster fails and have planned for it.”