Replacing A Proxmox Virtual Environment Server in a Ceph cluster

The Situation

Imagine a situation in which you have a cluster of Proxmox VE nodes as part of a hyperconverged Ceph installation, and you want to replace one of those nodes with a brand-new installation of Proxmox VE. You don't intend to migrate its drives; you want to replace the node outright, with new drives and a fresh installation of Proxmox VE. For this to work, there are a handful of things to consider in order to prevent (1) unnecessary Ceph rebalances and (2) broken HA in Proxmox VE. The broken HA issue stems from the fact that Proxmox VE uses SSH internally to move data around and to run commands on remote nodes, and Proxmox does not remove the corresponding SSH keys when a node is decommissioned, so stale keys are left behind.

The Strategy

Migrate all virtual machines from the node. Ensure that you have made copies of any local data or backups that you want to keep. In addition, make sure to remove any scheduled replication jobs to the node to be removed.

Failure to remove replication jobs targeting a node before removing that node will leave those jobs stuck and irremovable. Note in particular that replication automatically switches direction when a replicated VM is migrated, so migrating a replicated VM off the node to be deleted will automatically create replication jobs pointing back at that node.
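
If you want to double-check from the CLI before migrating anything, replication jobs are managed with the pvesr tool. A minimal sketch (the job ID 100-0 is only an example; use the IDs shown by pvesr list):

# list all replication jobs defined in the cluster
pvesr list
# delete any job that still targets the node being removed;
# add --force if the target node is already unreachable
pvesr delete 100-0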

Replacing a Proxmox Virtual Environment Server in a hyperconverged Ceph configuration

This guide assumes a four node cluster with hostnames node1-node4. We're assuming a replacement of node2, and for convenience, we're migrating to node1, but it could be to any other node in the cluster. In this guide, we are specifically replacing node2 with a new installation of PVE on an upgraded server that shares the same IP(s) and hostname, not migrating an existing installation to a new chassis.

Confirm cluster status (prerequisite, optional).

From node2's CLI, run pvecm nodes.

node2# pvecm nodes
Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1
         2          1 node2 (local)
         3          1 node3
         4          1 node4

Confirm that the node you intend to decommission is listed with the "(local)" identifier next to its name. This confirms we're on the right machine.

Migrate VMs, CTs, templates, and storage off of node2.

Using HA, migrate running VMs from node2 to node1.

Manually migrate all remaining non-running VMs, CTs, and templates from node2 to node1.
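
If you prefer the CLI to the GUI for migrations, the same moves can be made from node2's shell. This is only a sketch; VMID 101 and CTID 201 are placeholders:

# live-migrate a running VM to node1
qm migrate 101 node1 --online
# migrate a stopped container
pct migrate 201 node1
# for an HA-managed guest, request the migration through the HA stack instead
ha-manager migrate vm:101 node1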

Decommission the node from Ceph.

  1. Set OSDs on node2 to "out" and wait for the rebalance to finish. (These steps can also be done from the CLI; see the sketch after this list.)
    • On node2, go to Ceph -> OSD -> node2, then for each OSD listed, select the OSD and press "Out" in the top right of the control bar.
  2. Following rebalance, stop and destroy all OSDs on node2.
    • In the same screen of the PVE GUI, select each out OSD and press "Stop", and once stopped, select "Destroy" under the "More" submenu.
  3. Remove the Ceph services (monitor, manager, and metadata server) from node2.
    • In Ceph -> Monitor, select node2's monitor and press "Stop", and once stopped, press "Destroy".
    • In Ceph -> Monitor, select node2's manager in the bottom panel, press "Stop", and once stopped, press "Destroy".
    • In Ceph -> CephFS -> Metadata Servers, stop and destroy the metadata server for node2.
  4. On node2's CLI, clean up the Ceph CRUSH map and remove the host bucket using "ceph osd crush remove node2".
  5. From node1's CLI, run "pvecm delnode node2".
Important: Power off node2 before running pvecm delnode. The SSH keys and some internal references to the cluster may still exist on the node being removed, and it may attempt to perform sync operations within corosync and any distributed storage (e.g. replicated ZFS pools).
At this point, it is possible that you will receive an error message stating Could not kill node (error = CS_ERR_NOT_EXIST). This does not signify an actual failure in the deletion of the node, but rather a failure in corosync trying to kill an offline node. Thus, it can be safely ignored.
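
For reference, the whole decommissioning can also be driven from the CLI. This is only a sketch, assuming node2 owns OSDs 4 and 5 (substitute the IDs shown by ceph osd tree):

# on node2: mark its OSDs out, then wait for the rebalance to finish
ceph osd out osd.4 osd.5
ceph -s                                # repeat until health is OK and nothing is misplaced
ceph osd safe-to-destroy osd.4 osd.5
# stop and destroy each OSD
systemctl stop ceph-osd@4 ceph-osd@5
pveceph osd destroy 4 --cleanup
pveceph osd destroy 5 --cleanup
# remove node2's monitor, manager, and metadata server
pveceph mon destroy node2
pveceph mgr destroy node2
pveceph mds destroy node2
# remove the now-empty host bucket from the CRUSH map
ceph osd crush remove node2
# finally, with node2 powered off, from node1:
pvecm delnode node2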

Confirm node deleted.

From node1's CLI, run pvecm nodes.

node1# pvecm nodes
Membership information
----------------------
    Nodeid      Votes Name
         1          1 node1 (local)
         3          1 node3
         4          1 node4

Clean up SSH Keys.

There is a major detail missing from the official documentation: if you intend to join a node to the cluster with the same hostname and IP address the previous node had, SSH between the nodes (and with it migration and other cluster operations) will break unless you take some prerequisite actions first.

Power on the new node2 preconfigured with the same hostname and IP address.

From node1, SSH into the new node2. The SSH client will report a host key mismatch, and the error output includes the command needed to remove the stale key from the known hosts file. Use this command syntax to clear these keys:

ssh-keygen -f '/etc/ssh/ssh_known_hosts' -R 'XXX.XXX.XXX.XXX'
ssh-keygen -f '/etc/ssh/ssh_known_hosts' -R 'node2.domain.com'

Now attempt to SSH into node2 again. You will get a nearly identical error, only this time the path will point to the known_hosts file in the root user's .ssh directory:

ssh-keygen -f '/root/.ssh/known_hosts' -R 'XXX.XXX.XXX.XXX'
ssh-keygen -f '/root/.ssh/known_hosts' -R 'node2.domain.com'
This second series of commands, pointing to /root/, is not replicated across the cluster members and must be run manually on every member of the cluster.
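
If you'd rather not log in to every node by hand, a quick loop from node1 over the remaining members does the same cleanup. A minimal sketch, assuming the surviving nodes are node1, node3, and node4, and keeping the same placeholders for node2's IP and FQDN:

for n in node1 node3 node4; do
  # remove node2's stale host key entries from each node's root known_hosts file
  ssh "$n" "ssh-keygen -f /root/.ssh/known_hosts -R 'XXX.XXX.XXX.XXX'"
  ssh "$n" "ssh-keygen -f /root/.ssh/known_hosts -R 'node2.domain.com'"
done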

After executing these commands, when we join the node to the cluster, SSH will work correctly.

Join node to cluster.

  1. Power on the server and configure its lights-out management (LOM), such as a Dell DRAC, to use the correct IP address.
  2. Edit /etc/hostname and /etc/hosts to confirm the hostname matches the previous install's hostname.
  3. Reboot and verify hostname and IP are correct.
  4. If the previous machine had a Proxmox license, apply it now.
  5. Validate network connectivity to all other nodes on the corosync network and on both the Ceph frontend (consumption and management) and backend (replication) networks.
  6. Join the Proxmox cluster (see the CLI sketch after this list).
  7. Install Ceph.
  8. Add a Ceph monitor and a Ceph manager to this node.
  9. Migrate a test VM to the new node to confirm consumption.
  10. If there are any other maintenance tasks to complete (like swapping another node with the previous node's hardware), do NOT add OSDs to node2 until you are ready.
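
A minimal CLI sketch for steps 6 through 8, run from the new node2's shell. It assumes node1 is reachable as the join target; pvecm add will prompt for node1's root password and, depending on your corosync setup, may need additional --link options:

# join the existing cluster by pointing at any current member
pvecm add node1
# install the Ceph packages on the new node
pveceph install
# create a monitor and a manager on this node
pveceph mon create
pveceph mgr create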