- 环境查看
系统环境
# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
# uname -a
Linux CentOS7K8SMaster01005101 3.10.0-1160.114.2.el7.x86_64 #1 SMP Wed Mar 20 15:54:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
软件环境
# kubectl version
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.4", GitCommit:"c27b913fddd1a6c480c229191a087698aa92f0b1", GitTreeState:"clean", BuildDate:"2019-02-28T13:37:52Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
# /opt/etcd/bin/etcd --version
etcd Version: 3.3.10
Git SHA: 27fc7e2
Go Version: go1.10.4
Go OS/Arch: linux/amd64
- 故障现象
阿里云一台服务器出现挖矿之后恢复了磁盘快照etcd无法启动
查看/var/log/messages日志发现以下日志
etcd: member fa1b88064457e060 has already been bootstrapped
systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
systemd: Unit etcd.service entered failed state.
systemd: etcd.service failed.
systemd: etcd.service holdoff time over, scheduling restart.
etcd: recognized environment variable ETCD_NAME, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_DATA_DIR, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_LISTEN_PEER_URLS, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_LISTEN_CLIENT_URLS, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_ADVERTISE_CLIENT_URLS, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_INITIAL_CLUSTER, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_INITIAL_CLUSTER_TOKEN, but unused: shadowed by corresponding flag
etcd: recognized environment variable ETCD_INITIAL_CLUSTER_STATE, but unused: shadowed by corresponding flag
etcd: etcd Version: 3.3.10
- 原因分析
恢复快照之后数据不同步导致 - 修复步骤
删除/var/lib/etcd/重启etcd无法修复
需要修改etcd启动文件
完整的配置文件如下
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd
ExecStart=/opt/etcd/bin/etcd \
--name=${ETCD_NAME} \
--data-dir=${ETCD_DATA_DIR} \
--listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
--listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
--advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-token=${ETCD_INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=existing \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/server.pem \
--peer-key-file=/opt/etcd/ssl/server-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536[Install]
WantedBy=multi-user.target
重启
# systemctl reload-daemon
# systemctl restart etcd
查看是否修复
# kubectl get cs
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-2 Healthy {"health":"true"}