A Proxmox cluster exists so that no single node is a single point of failure. That same property is what lets you upgrade it without an outage — drain a node, upgrade it, bring it back, repeat. The trick is doing it in an order that never loses quorum and never leaves Ceph trying to heal data that is about to come right back.
Done carelessly, an upgrade is how you turn routine patching into a 2 a.m. recovery. Done in order, VMs never notice.
The Pre-Flight Check
Proxmox ships a checker for major-version upgrades. Run it on every node and resolve everything it flags before touching a package:
    # major version upgrade readiness (name tracks the version pair)
    pve8to9 --full    # run on each node; fix all warnings/failures first
It catches the things that actually break upgrades: a node out of quorum, Ceph not HEALTH_OK, a too-old corosync, insufficient free space, or repositories pointing at the wrong release. Do not start until it is clean cluster-wide.
Confirm the cluster is healthy and has quorum to spare:
    pvecm status         # Quorate: Yes, and note the expected vote count
    ha-manager status    # all services started, none in error/recovery
    ceph -s              # HEALTH_OK before you begin (if running Ceph)
The Rolling Procedure
Upgrade one node at a time, never more. With a 3+ node cluster you always keep quorum because only one vote is ever missing.
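The vote arithmetic behind that rule can be checked with a few lines of shell. This is a minimal sketch that assumes every node carries exactly one vote and that no QDevice is involved; the function names `quorum_threshold` and `stays_quorate` are made up for illustration and are not Proxmox tools.

```shell
#!/bin/sh
# Votes needed for quorum when every node has one vote: floor(n/2) + 1.
quorum_threshold() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if the cluster stays quorate with "down" of "total" nodes offline.
stays_quorate() {
    total=$1
    down=$2
    [ $(( total - down )) -ge "$(quorum_threshold "$total")" ]
}

for n in 3 4 5; do
    echo "$n nodes: quorum needs $(quorum_threshold "$n") votes"
done
stays_quorate 3 1 && echo "3 nodes, 1 down: quorate"
stays_quorate 3 2 || echo "3 nodes, 2 down: quorum lost"
```

A 4-node cluster needs 3 votes, so it tolerates exactly one missing node, the same as a 3-node cluster. That is one more reason to keep the upgrade strictly serial.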
1. Drain the node
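Move every running VM off the node with live migration; this is the no-downtime part. Containers cannot live-migrate and move in restart mode instead, which costs each one a brief stop and start. With HA configured, you can let HA do it, but explicit is clearer: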
    # Live-migrate each VM off this node to a target
    qm migrate 101 node2 --online
    qm migrate 102 node3 --online
    # containers: restart migration (brief stop/start; LXC has no live migration)
    pct migrate 201 node2 --restart
Or drain via HA, which respects HA groups and rules:
    ha-manager crm-command node-maintenance enable node1
    # HA relocates services off node1; wait until it's empty
Confirm the node is empty before proceeding:
    qm list ; pct list    # should show nothing running on this node
2. Upgrade the node
Now it carries no workload, so a reboot costs nothing:
    apt update
    apt dist-upgrade    # full upgrade; for major versions, after the repo switch
    # reboot if the kernel or other core packages were updated
    reboot
3. Rejoin and verify before moving on
After reboot, confirm it is back in quorum and healthy before draining the next node:
    pvecm status    # node back, Quorate: Yes
    ceph -s         # back to HEALTH_OK if running Ceph
    ha-manager crm-command node-maintenance disable node1
Only then drain and upgrade node 2. Patience here is the whole method: two nodes down at once on a 3-node cluster is a lost quorum and a frozen cluster.
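The "verify before moving on" step can be turned into a gate rather than a glance at the terminal. The sketch below is one possible shape, not a Proxmox-provided tool: `is_quorate`, `wait_until`, and `node_quorate` are hypothetical helper names, and the only thing assumed about `pvecm status` is that it prints a `Quorate: Yes` line when the node has quorum.

```shell
#!/bin/sh
# Reads `pvecm status` text on stdin; succeeds only if it reports "Quorate: Yes".
is_quorate() {
    grep -Eq '^[[:space:]]*Quorate:[[:space:]]*Yes'
}

# Retries a command until it succeeds: wait_until <tries> <delay_seconds> <command...>
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$(( tries - 1 ))
        [ "$tries" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Hypothetical wrapper around the real command, so it can be passed to wait_until.
node_quorate() {
    pvecm status 2>/dev/null | is_quorate
}

# Usage on a cluster node (up to 60 checks, 5 seconds apart):
#   wait_until 60 5 node_quorate || echo "not quorate yet; do not drain the next node" >&2
```

Keeping the parser separate from the command makes it easy to try against saved output before trusting it in a maintenance window.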
Ceph: Order Matters
If the cluster runs Ceph, the daemons upgrade in a strict order, and you must stop Ceph from rebalancing while a node is briefly down.
Set noout so OSDs going down for a reboot do not trigger a full data rebalance that you will only have to undo minutes later:
    ceph osd set noout
    # ... do the node upgrade/reboot ...
    ceph osd unset noout    # after all nodes done
The daemon upgrade order is fixed (monitors, then managers, then OSDs, then MDS/gateways):
    # 1. Monitors (one at a time, check quorum between each)
    ceph mon stat
    # 2. Managers
    ceph mgr stat
    # 3. OSDs: restart per node, wait for healthy PGs before the next
    ceph -s    # wait for all PGs active+clean between OSD nodes
Restarting OSDs out of order, or without noout, sends Ceph into a rebalance storm that hammers your disks and risks data movement during a window when redundancy is already reduced. One node’s OSDs at a time, noout set, active+clean confirmed between each. While noout is set, ceph -s reports HEALTH_WARN for the flag itself, so active+clean PGs are the signal to wait for rather than a literal HEALTH_OK.
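Because everything in this section depends on noout actually being set, a small guard before each reboot may be worth having. The sketch below assumes that `ceph osd dump` prints a line of the form `flags noout,sortbitwise,...`; verify that against your Ceph release before relying on it. The helper name `has_noout` is made up for illustration.

```shell
#!/bin/sh
# Reads `ceph osd dump` text on stdin; succeeds if the cluster-wide flags line lists noout.
# Assumption: the dump contains a line such as "flags noout,sortbitwise,...".
has_noout() {
    grep -E '^flags[[:space:]]' \
        | tr ',' '\n' \
        | sed 's/^flags[[:space:]]*//' \
        | grep -qx 'noout'
}

# Usage on a Ceph node, before rebooting:
#   ceph osd dump | has_noout || { echo "noout is not set; refusing to reboot" >&2; exit 1; }
```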
Corosync: Do Not Break the Heartbeat
Quorum rides on corosync. Two rules during upgrades:
- Never upgrade corosync on multiple nodes simultaneously — a version mismatch mid-cluster can drop membership. The pre-flight check flags incompatible jumps.
- Keep the corosync network healthy. If corosync shares a link with VM/migration traffic, a migration storm during the upgrade can starve the heartbeat and cause a spurious fence. A dedicated (or QoS-protected) corosync link is the standing recommendation, and it matters most during the upgrade churn.
    corosync-quorumtool -s          # members, quorum, and that all nodes agree
    journalctl -u corosync -n 50    # watch for retransmits/membership flaps
When a Migration Refuses to Move
The drain step assumes every VM live-migrates cleanly. Some will not, and finding out mid-upgrade with the node half-drained is the worst time. The usual blockers:
- Local resources. A VM with a passed-through PCI device or a CD-ROM mounted from a local ISO cannot live-migrate, and a disk on local storage blocks a plain live migration (it moves only with --with-local-disks, which copies the disk over the network). qm migrate fails fast and tells you which.
- CPU type mismatch. A VM pinned to the host CPU type may refuse to land on a node with a different microarchitecture. Use a named model (e.g. x86-64-v2-AES) on mixed hardware so VMs migrate across the whole cluster.
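A rough first pass for the blockers listed above can be scripted from `qm config` output. The filter below is only a sketch. It assumes the ISO storage is named `local` (the default name, but yours may differ), it does not try to detect local disks, and `migration_blockers` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads `qm config <vmid>` output on stdin and prints lines that commonly pin a VM
# to its node: PCI passthrough, the host CPU type, or a CD-ROM on the "local" ISO store.
# Exit status is 0 when at least one such line is found, 1 when none is.
migration_blockers() {
    grep -E '^hostpci[0-9]+:|^cpu:[[:space:]]*host(,|$)|^(ide|sata|scsi)[0-9]+:[[:space:]]*local:iso/'
}

# Usage on the node that is about to be drained (VM IDs are examples):
#   for id in 101 102 105; do
#       qm config "$id" | migration_blockers | sed "s/^/VM $id: /"
#   done
```

An empty result is not proof that a VM will move, only that none of these three patterns matched.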
Check before you drain, not during:
    # Ask the precondition API what would block node1 -> elsewhere (moves nothing)
    pvesh get /nodes/node1/qemu/101/migrate
    # returns local_disks, local_resources, allowed_nodes, not_allowed_nodes
    # Inspect what a VM holds that might pin it
    qm config 101 | grep -E 'hostpci|^ide|^sata|machine|cpu'
For a VM that genuinely cannot live-migrate (a GPU passthrough host, say), the honest options are: accept a brief offline migration in a planned window, or upgrade that node last and shut the VM down for its reboot only. Do not let one un-migratable VM tempt you into rebooting a node that still hosts it live.
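    # Offline migration for a VM that can't move online; it is briefly down
    qm shutdown 105         # qm migrate refuses a running VM without --online
    qm migrate 105 node2    # no --online: moves the stopped VM
    # then start it on the target: run qm start 105 on node2
A Failure Mid-Upgrade: Recovering Quorum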
Suppose node1 is drained and rebooting when node2 unexpectedly hard-fails. On a 3-node cluster you now have one node up out of three — quorum is lost, and the surviving node freezes the cluster filesystem (/etc/pve goes read-only) to protect against split-brain. VMs already running keep running; you just cannot start, stop, or migrate anything.
    pvecm status
    # Expected: Quorate: No, with Total votes below the quorum threshold
The correct fix is to get a node back, not to fight the safety mechanism. Bringing node1 back from its reboot restores 2/3 votes and quorum returns on its own. Only if a node is genuinely gone for the duration do you lower the expected votes, and this is a deliberate, documented step, not a reflex:
    # ONLY when a node is confirmed down for an extended period and you must
    # operate the survivors. Temporarily reduce expected votes.
    pvecm expected 1
    # Restore the real value once the cluster is whole again.
Setting expected votes low while a node is merely rebooting is how you create the split-brain the quorum was preventing. The discipline is the same as the upgrade itself: one node out at a time, and if a second leaves unexpectedly, your job is to restore it, not to convince corosync the missing nodes do not count.
The Order, Condensed
- pveXtoY --full on every node; fix everything.
- ceph osd set noout (if Ceph).
- Per node, one at a time: drain (live-migrate) → apt dist-upgrade → reboot → verify quorum + HEALTH_OK → next.
- Ceph daemons in order: mon → mgr → osd → mds, active+clean between OSD nodes.
- ceph osd unset noout.
- Final check: pvecm status, ceph -s, ha-manager status all green.
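For a change ticket or a runbook, the same order can be printed as a checklist for a concrete node list. The sketch below runs nothing and only echoes the steps. `rolling_plan` and the `CHECKER` variable are illustrative names, and it assumes a Ceph-backed cluster; drop the noout lines otherwise.

```shell
#!/bin/sh
# Prints the serialized upgrade order for the given nodes. It executes nothing.
# CHECKER is a placeholder for the version-pair checker (for example pve8to9).
rolling_plan() {
    checker=${CHECKER:-pveXtoY}
    echo "all nodes: $checker --full"
    echo "once: ceph osd set noout"
    for node in "$@"; do
        echo "$node: drain -> apt dist-upgrade -> reboot -> verify quorum and Ceph"
    done
    echo "once: ceph osd unset noout"
    echo "final: pvecm status; ceph -s; ha-manager status"
}

rolling_plan node1 node2 node3
```

The loop is deliberately sequential: each node gets its own line, and nothing in the output suggests running two of them in parallel.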
The discipline is boring and that is the point. The clusters that go down during upgrades are the ones where someone got impatient and rebooted two nodes “to save time.” On a quorum-based system, saving time by parallelizing the one thing you must serialize is how you turn a clean rolling upgrade into a frozen cluster and a recovery you did not plan for.