While building a test environment of three Proxmox servers, I cloned one of the servers before realizing it was easier and faster to simply build the next virtual machine. Everything looked fine until I set up the Ceph monitors, managers, and metadata servers: the cloned server showed two entries. One was good, with the green checkmark, and the other had a question mark and an unknown state. The IP addresses, hostnames, and hosts files were all fine, but by a stroke of luck I figured out the problem and the resolution.
After cloning the Proxmox server to another virtual machine, I changed the IP address, the hostname, and the hosts file. Here are the files and settings that were changed.
    # Change the IP address in this file.
    /etc/network/interfaces
    # Change the IP address and server name in this file.
    /etc/hosts
    # Change the hostname
    hostnamectl set-hostname mynewhostnamegoeshere
    # Change IP on only one of the cluster nodes
    /etc/pve/corosync.conf
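Before moving on, it can help to confirm that none of these files still carries the original server's address or name. The following is only a minimal sketch: `find_stale_refs` is a hypothetical helper name, and `192.0.2.10` is a placeholder for whatever the old IP address was.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): print the names of the given
# files that still contain a fixed string, such as the old IP or hostname.
# Usage: find_stale_refs OLD_VALUE FILE...
find_stale_refs() {
    old_value=$1
    shift
    # -F: treat the value literally, so dots in an IP are not wildcards
    # -w: match whole words only, so 192.0.2.1 does not match 192.0.2.10
    # -l: print only the names of files that match
    # -s: stay quiet about files that do not exist
    grep -F -w -l -s -- "$old_value" "$@"
}

# Example, with a placeholder for the old address:
#   find_stale_refs 192.0.2.10 /etc/network/interfaces /etc/hosts /etc/pve/corosync.conf
```

No output means none of the listed files mention the old value. Keep in mind that `/etc/pve/corosync.conf` is meant to be edited on only one cluster node, as noted above.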
The next step, and the one I missed, is to change the machine ID. The clone still carried the original server's machine ID, which is what created the unknown duplicate entry, and the hostnamectl command is what revealed it.
    rm -f /etc/machine-id /var/lib/dbus/machine-id
    dbus-uuidgen --ensure=/etc/machine-id
    dbus-uuidgen --ensure
Check the machine ID again with hostnamectl. It should be a new one, and the Ceph cluster should be happy too.
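One way to make the "should be a new one" check concrete is to save the ID before regenerating it and compare afterwards. This is a minimal sketch under stated assumptions: `machine_id_changed` is a hypothetical helper, and `/root/machine-id.before` is just an example location for the saved copy.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only when the machine ID in NEW_FILE
# is non-empty and differs from the ID saved in OLD_FILE.
# Usage: machine_id_changed OLD_FILE [NEW_FILE]
machine_id_changed() {
    old_file=$1
    new_file=${2:-/etc/machine-id}
    old_id=$(cat "$old_file" 2>/dev/null)
    new_id=$(cat "$new_file" 2>/dev/null)
    # An empty or missing new ID means regeneration did not happen.
    [ -n "$new_id" ] && [ "$new_id" != "$old_id" ]
}

# Example (the copy has to be made before the rm command above):
#   cp /etc/machine-id /root/machine-id.before
#   ... regenerate the machine ID ...
#   machine_id_changed /root/machine-id.before && echo "machine ID changed"
```

The same comparison works against a copy of `/etc/machine-id` taken from the server the clone was made from, which is the ID that must not be shared.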
Clean the Ceph Dashboard of HEALTH_WARN
You may have crashed the Ceph cluster while trying to remedy the situation before finding this possible solution. That was the case for me, and along the way I learned a couple of commands that clear out the Ceph crash warnings and further clean up the dashboard.
    ceph crash ls
    ceph crash archive-all
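If you would rather look at what is being acknowledged before archiving it, the two steps can be wrapped together. This is a rough sketch rather than an official procedure: `review_and_archive_crashes` is a hypothetical helper, the `CEPH` variable is an assumption added so the binary can be swapped out, and `ceph crash ls-new` is the variant of the list command that shows only crashes that have not been archived yet.

```shell
#!/bin/sh
# Hypothetical helper: show crash reports that are not archived yet, then
# archive them so they stop raising a warning on the dashboard.
# CEPH may point at a different binary; it defaults to "ceph".
review_and_archive_crashes() {
    ceph_cmd=${CEPH:-ceph}
    # Stop here if the listing fails, so nothing is archived unseen.
    "$ceph_cmd" crash ls-new || return 1
    "$ceph_cmd" crash archive-all
}

# Example:
#   review_and_archive_crashes
```

Archived crashes still appear in `ceph crash ls`, so the history is kept even after the warning clears.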
Source(s)
- https://wiki.debian.org/MachineId
- https://forum.proxmox.com/threads/change-cluster-nodes-ip-addresses.33406/
- https://forum.proxmox.com/threads/resetting-ceph-warnings.65778/