Cody Bunch Some Random IT Guy - OpenStack, DevOps, Cloud, Things

[Kubernetes] Fix: Node Password Rejected...

[Kubernetes] Fix: Node Password Rejected…

Node password rejected, duplicate hostname or contents of...

Judging by the error message, my Kubernetes cluster was having about the same Monday as I was.

What led here was trying to add a node back into the cluster after having rebuilt it.

Some steps…

To fix this, we need to uninstall the k3s-agent from the node in question, remove the local password file on said node, and finally, remove the node entry from the primary node.

On the node

On the node having the issue, we log in and uninstall the k3s agent and remove the local password file. As no workloads have been scheduled to the node, due to the “Node password rejected…” error, this should be non-impacting. The following commands show how to do this:

Uninstall k3s:

$ sudo /usr/local/bin/k3s-agent-uninstall.sh

Remove local password file:

$ sudo rm -f /etc/rancher/node/password

On the primary node

Log into primary node, and check for the node in /var/lib/rancher/k3s/server/cred/node-passwd:

# cat /var/lib/rancher/k3s/server/cred/node-passwd
fb4544f0d6cb016cc0c261e77ac214a4,swarm-08,swarm-08,
2a08234cdcf20072b2643eb934b04080,swarm-07,swarm-07,
e107b05b351cc5284e6b6babbe87145e,swarm-04,swarm-04,
d89f5ac3a70ccdbe3da828c63285b49d,swarm-03,swarm-03,
3a1882f29f1ac1e2f4029a580aa5836e,swarm-01,swarm-01,
2b2a5926b30407a7335fe01b1e6122b1,swarm-05,swarm-05,
bf92afadf05e21ee2bc22fb147760174,swarm-09,swarm-09,
2864b53210d3785b36ee304fc163a45d,swarm-02,swarm-02,
46a16d8dc4d227258f19caa2557b4bac,swarm-06,swarm-06,

# Count the number of entries
# cat /var/lib/rancher/k3s/server/cred/node-passwd | wc -l
9

Next we use sed to test removing the line:

# sed '/swarm-02/c\' /var/lib/rancher/k3s/server/cred/node-passwd
fb4544f0d6cb016cc0c261e77ac214a4,swarm-08,swarm-08,
2a08234cdcf20072b2643eb934b04080,swarm-07,swarm-07,
e107b05b351cc5284e6b6babbe87145e,swarm-04,swarm-04,
d89f5ac3a70ccdbe3da828c63285b49d,swarm-03,swarm-03,
3a1882f29f1ac1e2f4029a580aa5836e,swarm-01,swarm-01,
2b2a5926b30407a7335fe01b1e6122b1,swarm-05,swarm-05,
bf92afadf05e21ee2bc22fb147760174,swarm-09,swarm-09,
46a16d8dc4d227258f19caa2557b4bac,swarm-06,swarm-06,

Looks good, but once more to be sure:

# sed '/swarm-02/c\' /var/lib/rancher/k3s/server/cred/node-passwd | wc -l
8

Now that we have verified our sed syntax, we can remove the line and restart the k3s service:

# sed -i '/swarm-02/c\' /var/lib/rancher/k3s/server/cred/node-passwd

# sudo systemctl restart k3s

Finally, we reinstall k3s on the node using k3sup from Alex Ellis:

$ k3sup join \
  --ip ${nodeIP} \
  --server-ip ${primaryNodeIP} \
  --user ${k3sUser} \
  --k3s-version "v1.18.9+k3s1"

Note: I tag the specific version of k3s to use as my cluster is running on Raspberry Pi Model 2 B, which are not quite strong enough to stay super current.

Once the installation is finished, you can check that the node has indeed joined the cluster happily:

kubectl get node
NAME       STATUS                     ROLES    AGE     VERSION
swarm-03   Ready                      <none>   72d     v1.18.9+k3s1
swarm-08   Ready                      <none>   72d     v1.18.9+k3s1
swarm-06   Ready                      <none>   72d     v1.18.9+k3s1
swarm-04   Ready                      <none>   72d     v1.18.9+k3s1
swarm-01   Ready,SchedulingDisabled   master   72d     v1.18.9+k3s1
swarm-02   Ready                      <none>   7m12s   v1.18.9+k3s1
swarm-09   Ready                      <none>   72d     v1.18.9+k3s1
swarm-05   Ready                      <none>   72d     v1.18.9+k3s1
swarm-07   Ready                      <none>   68d     v1.18.9+k3s1

Addendum

You may also need to use the following commands to completely eject the node:

Note: These commands alone did not fix the “Node password rejected…” issue, as the kubectl delete node command did not clear out the node entry in /var/lib/rancher/k3s/server/cred/node-passwd

# Drain the node
$ kubectl drain [nodeName] --force --delete-local-data

# Delete the node
$ kubectl delete node [nodeName]

Resources