Replacing a Proxmox Virtual Environment Server in a Ceph cluster
Replacing a Proxmox Virtual Environment server in a hyperconverged Ceph configuration.
This guide assumes a four-node cluster with hostnames node1 through node4. We assume node2 is being replaced.
- Using HA, migrate running VMs from node2 to node1, or to any other node that has ample resources and is currently a member of Ceph.
- Set the OSDs on node2 to "out" and wait for the rebalance to complete (see the command sketch after this list).
- Once the rebalance completes, stop and destroy all OSDs on node2.
- Remove the Ceph monitor and manager from node2.
- Clean up the Ceph CRUSH map and remove the host bucket using "ceph osd crush remove node2".
- Shut down node2, then from a node that's still part of the cluster, run "pvecm delnode node2".
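A minimal command sketch of the decommission steps above, assuming node2 hosts osd.4 through osd.7 (the IDs are placeholders; adjust them to your layout). The cluster-wide ceph commands can be run from any remaining node; the pveceph commands below are run on node2 itself:

ceph osd out 4 5 6 7               # mark node2's OSDs out and let data drain off them
ceph -s                            # wait until all PGs are active+clean again
systemctl stop ceph-osd@4.service  # on node2: stop each OSD (repeat for 5, 6, 7)
pveceph osd destroy 4              # on node2: destroy each OSD (repeat for 5, 6, 7)
pveceph mon destroy node2          # on node2: remove the monitor
pveceph mgr destroy node2          # on node2: remove the manager
ceph osd crush remove node2        # drop the now-empty host bucket from the CRUSH map
pvecm delnode node2                # from a remaining node, with node2 powered off
pvecm nodes                        # confirm node2 is gone from the Proxmox cluster
ceph osd tree                      # confirm the node2 bucket and its OSDs are gone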
The node is now decommissioned and no longer part of the Ceph or Proxmox cluster. It can be physically removed. Let's install the replacement.
- Physically remove the old server and install the new one. Cable and power it.
- Configure lights-out management (LOM), such as Dell iDRAC, to use the correct IP address.
- Set the new node2's management IP address to the IP of the previous machine. Validate connectivity.
- Edit /etc/hostname and /etc/hosts so the hostname matches the previous install's hostname.
- Reboot and verify hostname and IP are correct.
- If the previous machine had a Proxmox license, apply it now.
- Validate network connectivity on the corosync network and on both the Ceph frontend (consumption and management) and backend (replication) networks to all other nodes.
- Join the Proxmox cluster (see the command sketch after this list).
- Install Ceph.
- Add a Ceph monitor and manager to this node.
- Migrate a test VM to the new node to confirm it can consume Ceph storage.
- If there are any other maintenance tasks to complete (like moving the previous node2's hardware to another node), do NOT add OSDs back to node2 until you are ready.
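A hedged sketch of the commissioning commands above, run on the new node2 except where noted. The cluster member address 10.0.0.1, the subscription key, the no-subscription repository, and VMID 100 are placeholders; the --repository option assumes a recent Proxmox VE release:

pvesubscription set XXXX-XXXX-XXXX-XXXX        # apply the previous machine's key, if any
pvecm add 10.0.0.1                             # join the cluster via any existing member
pveceph install --repository no-subscription   # install the Ceph packages
pveceph mon create                             # add a monitor on this node
pveceph mgr create                             # add a manager on this node
qm migrate 100 node2 --online                  # run from the node hosting test VM 100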
A similar series of steps can be taken if existing drives are being moved to a new server installation, keeping the OS and OSDs, as opposed to installing new drives. We'll assume node1 is replaced with the previous node2's hardware.
- Using HA, migrate running VMs from node1 to node2, or to any other node that has ample resources and is currently a member of Ceph.
- Unlike before, set the noout flag: the OSDs aren't actually going anywhere, so we do not want a rebalance (see the sketch after this list).
- Shut down node1.
- Physically move the boot and data drives from node1 to the donor chassis that was previously node2.
- Un-rack the now driveless node1 and replace it with the now-populated donor chassis. Cable it.
- Power on and configure the LOM (such as Dell iDRAC) to use the correct IP address.
- Validate system boot.
- Validate network connectivity on the corosync network and on both the Ceph frontend (consumption and management) and backend (replication) networks to all other nodes.
- Verify that the OSDs are online and that all PGs report active+clean.
- Unset the noout flag.
- With that maintenance complete, we can now safely add OSDs back to the new node2 and allow the cluster to rebalance. This could take a long time, up to several days, depending on the quantity of storage.
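A minimal sketch of the noout handling and the deferred OSD creation, assuming the new OSDs on node2 go on blank disks /dev/sdb through /dev/sdd (device names are placeholders):

ceph osd set noout            # before shutting down node1: suppress rebalancing
ceph osd tree                 # after the swapped hardware boots: all OSDs should show up/in
ceph pg stat                  # wait until all PGs report active+clean
ceph osd unset noout          # restore normal recovery behaviour
pveceph osd create /dev/sdb   # on node2: one OSD per blank disk (repeat for sdc, sdd)
ceph -s                       # watch recovery/backfill until the cluster is HEALTH_OK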