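A Complete Guide to Building a Highly Available etcd Cluster with CFSSL (Including TLS Certificate Management)
Abstract: This article walks through issuing TLS certificates with the CFSSL toolkit and deploying a production-grade, highly available etcd cluster. It covers certificate lifecycle management, cluster configuration tuning, and security hardening, and applies to Kubernetes and other distributed-system scenarios.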
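1. Environment Planning and Architecture Design
1.1 Node Information
Node IP | Role | Hostname | Certificate SANs |
---|---|---|---|
192.167.14.228 | etcd member | etcd-1 | All three node IPs; DNS: etcd-cluster.local |
192.167.14.229 | etcd member | etcd-2 | All three node IPs; DNS: etcd-cluster.local |
192.167.14.246 | etcd member | etcd-3 | All three node IPs; DNS: etcd-cluster.local |
etcd has no fixed master or backup role: the three members are peers and elect a leader through Raft. All nodes share one certificate, so its SAN extension lists the three node IPs plus the cluster DNS name.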
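Three members is the smallest cluster that survives the loss of one node, because Raft only accepts writes while a majority (quorum) of members is reachable. The arithmetic can be sketched as follows; the function names are illustrative helpers, not part of etcd.

```shell
#!/usr/bin/env bash
# Quorum arithmetic for a Raft-based cluster such as etcd.
# quorum and fault_tolerance are illustrative helpers, not etcd tooling.

# Smallest majority of n members: floor(n/2) + 1
quorum() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Number of members that can fail while a majority remains
fault_tolerance() {
  local n=$1
  echo $(( n - (n / 2 + 1) ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Note that a fourth member raises the quorum to three without tolerating any additional failure, which is why odd cluster sizes are the usual choice.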
1.2 Port Planning
Port | Protocol | Purpose |
---|---|---|
2379 | HTTPS | Client communication |
2380 | HTTPS | Peer communication between nodes |
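2. End-to-End Certificate Management with CFSSL
2.1 Install the CFSSL Toolchain
# If pkg.cfssl.org is unreachable, the cfssl project also publishes release
# binaries on its GitHub releases page (github.com/cloudflare/cfssl).
wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64 \
     https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64 \
     https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
chmod +x cfssl*
mv cfssl_linux-amd64 /usr/local/bin/cfssl
mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo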
2.2 Generate the Root Certificate Authority (CA)
mkdir -p ~/etcd_tls && cd ~/etcd_tls

# CA signing configuration (876000h is roughly 100 years)
cat > ca-config.json <<EOF
{
  "signing": {
    "default": {"expiry": "876000h"},
    "profiles": {
      "kubernetes": {
        "expiry": "876000h",
        "usages": ["signing", "key encipherment", "server auth", "client auth"]
      }
    }
  }
}
EOF

# CA certificate signing request
cat > ca-csr.json <<EOF
{
  "CN": "Kubernetes",
  "key": {"algo": "rsa", "size": 2048},
  "names": [{"C": "CN", "L": "Xi'an", "O": "k8s", "OU": "Cluster"}]
}
EOF

# Generate the CA certificate and private key (ca.pem, ca-key.pem)
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
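2.3 Issue the etcd Server Certificate
# The hosts list becomes the certificate's SAN extension; it must contain every node IP
cat > etcd-csr.json <<EOF
{
  "CN": "etcd",
  "hosts": [
    "192.167.14.228",
    "192.167.14.229",
    "192.167.14.246",
    "etcd-cluster.local"
  ],
  "key": {"algo": "rsa", "size": 2048},
  "names": [{"C": "CN", "L": "Xi'an", "O": "k8s", "OU": "ETCD"}]
}
EOF

# Sign with the CA from section 2.2 (produces etcd.pem and etcd-key.pem)
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json -profile=kubernetes \
  etcd-csr.json | cfssljson -bare etcd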
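Because a node whose IP is missing from the SAN list will fail TLS verification, it is worth confirming the issued certificate before distributing it. The sketch below assumes `openssl` is installed; `check_sans` is a hypothetical helper that reads the text dump of a certificate on stdin and reports any expected SAN that is absent.

```shell
#!/usr/bin/env bash
# check_sans SAN...
# Illustrative helper: reads the output of `openssl x509 -noout -text` on stdin
# and verifies that every SAN given as an argument is present.
check_sans() {
  local sans san missing=0
  # openssl prints SAN entries as "IP Address:<ip>" and "DNS:<name>"
  sans=$(grep -oE '(IP Address|DNS):[^,[:space:]]+' | sed -E 's/^(IP Address|DNS)://')
  for san in "$@"; do
    # -x requires a whole-line match, so a partial IP does not count as present
    if ! printf '%s\n' "$sans" | grep -qxF -- "$san"; then
      echo "missing SAN: $san"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (assumes etcd.pem is in the current directory):
# openssl x509 -in etcd.pem -noout -text |
#   check_sans 192.167.14.228 192.167.14.229 192.167.14.246 etcd-cluster.local
```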
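3. Deploying the etcd Cluster
3.1 Install the etcd Binaries
ETCD_VER=v3.5.9
wget https://github.com/etcd-io/etcd/releases/download/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar -zxvf etcd-${ETCD_VER}-linux-amd64.tar.gz
mkdir -p /opt/etcd/{bin,cfg,ssl}
mv etcd-${ETCD_VER}-linux-amd64/{etcd,etcdctl} /opt/etcd/bin/
# Place the certificates from section 2 where the service expects them.
# On the other nodes, copy these three files over first (for example with scp).
cp ~/etcd_tls/{ca,etcd,etcd-key}.pem /opt/etcd/ssl/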
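Before unpacking a downloaded release on a production host, the tarball's checksum can be compared against the value published with the release. This is a minimal sketch assuming `sha256sum` from coreutils and a `SHA256SUMS` file obtained from the same release page; `verify_sha256` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Illustrative helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Usage (assumes SHA256SUMS was downloaded next to the tarball):
# expected=$(grep "etcd-${ETCD_VER}-linux-amd64.tar.gz\$" SHA256SUMS | awk '{print $1}')
# verify_sha256 "etcd-${ETCD_VER}-linux-amd64.tar.gz" "$expected"
```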
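3.2 Node Configuration Template (etcd-1 as the Example)
The file is loaded through systemd's EnvironmentFile, so each setting is written as an ETCD_-prefixed KEY=value line, which etcd reads as the equivalent command-line flag. On etcd-2 and etcd-3, change ETCD_NAME and replace 192.167.14.228 with the node's own IP.
cat > /opt/etcd/cfg/etcd.conf <<EOF
#[Member]
ETCD_NAME="etcd-1"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://192.167.14.228:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.167.14.228:2379,https://127.0.0.1:2379"
#[Cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.167.14.228:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.167.14.228:2379"
ETCD_INITIAL_CLUSTER="etcd-1=https://192.167.14.228:2380,etcd-2=https://192.167.14.229:2380,etcd-3=https://192.167.14.246:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF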
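The template above is specific to etcd-1; etcd-2 and etcd-3 need the same file with their own name and IP. One way to avoid hand-editing is a small generator, sketched below. It assumes the file is consumed through systemd's EnvironmentFile, so it emits ETCD_-prefixed KEY=value lines (the environment-variable form of etcd's flags). `render_etcd_conf` and `INITIAL_CLUSTER` are hypothetical names introduced for this example.

```shell
#!/usr/bin/env bash
# render_etcd_conf NAME IP
# Illustrative helper: prints an etcd environment file for one cluster member.
# Member names and IPs are taken from the node table in section 1.1.
INITIAL_CLUSTER="etcd-1=https://192.167.14.228:2380,etcd-2=https://192.167.14.229:2380,etcd-3=https://192.167.14.246:2380"

render_etcd_conf() {
  local name=$1 ip=$2
  cat <<EOF
#[Member]
ETCD_NAME="${name}"
ETCD_DATA_DIR="/var/lib/etcd"
ETCD_LISTEN_PEER_URLS="https://${ip}:2380"
ETCD_LISTEN_CLIENT_URLS="https://${ip}:2379,https://127.0.0.1:2379"
#[Cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://${ip}:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://${ip}:2379"
ETCD_INITIAL_CLUSTER="${INITIAL_CLUSTER}"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
EOF
}

# Usage on etcd-2:
# render_etcd_conf etcd-2 192.167.14.229 > /opt/etcd/cfg/etcd.conf
```

Only the member name and the node's own URLs differ between nodes; the initial cluster list, token, and state must be identical everywhere.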
3.3 Systemd Service Configuration
cat > /usr/lib/systemd/system/etcd.service <<EOF
[Unit]
Description=ETCD KeyValue Store
Documentation=https://etcd.io
After=network.target

[Service]
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
  --cert-file=/opt/etcd/ssl/etcd.pem \
  --key-file=/opt/etcd/ssl/etcd-key.pem \
  --peer-cert-file=/opt/etcd/ssl/etcd.pem \
  --peer-key-file=/opt/etcd/ssl/etcd-key.pem \
  --trusted-ca-file=/opt/etcd/ssl/ca.pem \
  --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
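4. Cluster Initialization and Verification
4.1 Start the Cluster
Run these commands on all three nodes. The cluster becomes available once a majority of members (two of three) are up.
systemctl daemon-reload
systemctl enable --now etcd
4.2 Cluster Health Check
ETCDCTL_API=3 /opt/etcd/bin/etcdctl \
  --cacert=/opt/etcd/ssl/ca.pem \
  --cert=/opt/etcd/ssl/etcd.pem \
  --key=/opt/etcd/ssl/etcd-key.pem \
  --endpoints="https://192.167.14.228:2379,https://192.167.14.229:2379,https://192.167.14.246:2379" \
  endpoint health --write-out=table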
Expected output (timings will vary):
+-----------------------------+--------+-------------+-------+
|          ENDPOINT           | HEALTH |    TOOK     | ERROR |
+-----------------------------+--------+-------------+-------+
| https://192.167.14.228:2379 |   true | 14.567345ms |       |
| https://192.167.14.229:2379 |   true | 15.234512ms |       |
| https://192.167.14.246:2379 |   true | 16.789123ms |       |
+-----------------------------+--------+-------------+-------+
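Repeating the TLS flags on every etcdctl call gets tedious during day-to-day operations. etcdctl with the v3 API also accepts its flags as ETCDCTL_-prefixed environment variables, so they can be exported once per shell session. The sketch below reuses the paths and endpoints from this guide.

```shell
#!/usr/bin/env bash
# Export the etcdctl TLS settings once; each variable mirrors the flag of the
# same name (for example ETCDCTL_CACERT corresponds to --cacert).
export ETCDCTL_API=3
export ETCDCTL_CACERT=/opt/etcd/ssl/ca.pem
export ETCDCTL_CERT=/opt/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/opt/etcd/ssl/etcd-key.pem
export ETCDCTL_ENDPOINTS="https://192.167.14.228:2379,https://192.167.14.229:2379,https://192.167.14.246:2379"

# Usage once the variables are exported:
# /opt/etcd/bin/etcdctl member list --write-out=table
# /opt/etcd/bin/etcdctl endpoint status --write-out=table   # includes an IS LEADER column
```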
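5. Production Optimization Recommendations
5.1 Security Hardening
# Require client certificates (append this flag to ExecStart in etcd.service)
--client-cert-auth=true
# Check the certificate validity period, and rotate certificates regularly (for example, once a year)
openssl x509 -in /opt/etcd/ssl/etcd.pem -noout -dates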
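Checking the dates by eye is easy to forget, so the check is often scripted. The following is a minimal sketch, assuming GNU `date` (standard on Linux) and the `notAfter=` line that `openssl x509 -noout -enddate` prints; `days_until_expiry` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# days_until_expiry "notAfter=<date>" [NOW_EPOCH]
# Illustrative helper: converts the line printed by
# `openssl x509 -noout -enddate` into whole days remaining.
# NOW_EPOCH defaults to the current time; assumes GNU date for the -d option.
days_until_expiry() {
  local enddate=${1#notAfter=}
  local now=${2:-$(date +%s)}
  local end
  end=$(date -d "$enddate" +%s) || return 1
  echo $(( (end - now) / 86400 ))
}

# Usage:
# left=$(days_until_expiry "$(openssl x509 -in /opt/etcd/ssl/etcd.pem -noout -enddate)")
# [ "$left" -lt 30 ] && echo "etcd certificate expires in $left days"
```

A check like this could run from cron or a monitoring agent so that rotation is scheduled before the certificate lapses.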
5.2 Performance Tuning
# Raise the backend storage quota (append to ExecStart in etcd.service)
--quota-backend-bytes=8589934592   # 8 GiB
# Logging options
--log-level=warn
--logger=zap
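The quota flag takes a raw byte count, which is easy to mistype. As a small worked example, the value can be derived from a size in GiB; `gib_to_bytes` is an illustrative helper, not an etcd tool.

```shell
#!/usr/bin/env bash
# gib_to_bytes N
# Illustrative helper: converts N GiB to bytes (1 GiB = 1024 * 1024 * 1024 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Prints the flag exactly as it would be appended to ExecStart
echo "--quota-backend-bytes=$(gib_to_bytes 8)"
```

Running it prints `--quota-backend-bytes=8589934592`, matching the value used above.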
6. Firewall Policy (Required in Production)
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.167.14.0/24" port port="2379-2380" protocol="tcp" accept'
firewall-cmd --reload
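The rule above admits the whole /24 subnet on both ports. If a tighter policy is wanted for the peer port, one option is a rule per cluster member. The sketch below only prints the commands for review rather than running them; `PEERS` and `peer_rules` are hypothetical names, and the IPs come from the node table.

```shell
#!/usr/bin/env bash
# Print (do not execute) one firewalld rich rule per cluster member,
# restricting the peer port 2380 to the members themselves.
PEERS="192.167.14.228 192.167.14.229 192.167.14.246"

peer_rules() {
  local ip
  for ip in $PEERS; do
    printf "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"%s/32\" port port=\"2380\" protocol=\"tcp\" accept'\n" "$ip"
  done
}

peer_rules
# After reviewing and running the printed commands, apply them with:
# firewall-cmd --reload
```

The client port 2379 still needs its own rule for whichever hosts run etcd clients.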
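7. Troubleshooting Guide
Symptom | Diagnostic command | Resolution |
---|---|---|
Node cannot join the cluster | journalctl -u etcd -f | Check that the certificate SANs match the node IPs |
Client connection times out | telnet <IP> 2379 | Verify the firewall and SELinux policies |
Insufficient storage space | du -sh /var/lib/etcd/member/ | Clean up snapshots or expand storage |
Certificate expired | cfssl-certinfo -cert etcd.pem | Reissue the certificates and perform a rolling restart of the cluster |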
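Recommended tools:
- etcd-browser: web management interface
- etcd-backup-operator: automated backup tool
This guide has covered building and maintaining a production etcd cluster. Run disaster recovery drills regularly to confirm that the cluster stays highly available.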