I. Version Compatibility and Server Planning
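Component | Version / configuration | Notes |
---|---|---|
Operating system | Anolis OS 8.9 | Kernel Linux 5.10.134-17.3.an8.x86_64 |
Kernel | Linux 5.10.134-17.3.an8.x86_64 | Compatible with Kubernetes 1.29 |
Architecture | x86-64 | |
Kubernetes | v1.29.5 | Patch release of the 1.29 line; runs on the 5.10 kernel |
Docker | 24.0.7 | Configure the systemd cgroup driver. Kubernetes 1.24+ has no dockershim, so the kubelet talks to the containerd installed alongside Docker (containerd.io) |
Calico | v3.27.3 | Supports Kubernetes 1.29 on x86-64 |
Dashboard | v2.7.0 | Installed from the v2.7.0 recommended.yaml manifest |
Master IP | 192.168.153.200 | Control-plane node |
Node1 IP | 192.168.153.201 | Worker node 1 |
Node2 IP | 192.168.153.202 | Worker node 2 |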
II. Environment Preparation (run on all nodes)
1. Set the hostname (the hosts file is edited in step 5)
(master node only)
hostnamectl set-hostname master
(node1 only)
hostnamectl set-hostname node1
(node2 only)
hostnamectl set-hostname node2
2. Disable the firewall and SELinux
sudo systemctl disable --now firewalld
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=disabled/g' /etc/selinux/config
3. Disable swap (by default the kubelet refuses to start while swap is on)
sudo swapoff -a
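sudo sed -ri '/swap/s/^/#/' /etc/fstab
4. Configure kernel parameters
# The net.bridge.* keys exist only after the br_netfilter module is loaded; containerd also needs overlay
sudo modprobe overlay
sudo modprobe br_netfilter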
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
5. Configure hostname resolution (all nodes)
sudo tee -a /etc/hosts <<EOF
192.168.153.200 k8s-endpoint
192.168.153.200 master
192.168.153.201 node1
192.168.153.202 node2
EOF
6. Time synchronization
sudo dnf install chrony -y
sudo systemctl enable --now chronyd
7. Add color and a timestamp to the shell prompt (personal preference)
(1) Open ~/.bashrc:
vim ~/.bashrc
(2) Find or add the following line to set PS1, the variable that defines the prompt:
export PS1='\[\e[0;92m\][\u@\h \t]# \[\e[0m\]'
(3) Save and close the file:
:wq
(4) Apply the change:
source ~/.bashrc
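The preparation steps above are written to run once. Running the `tee -a` again duplicates hosts entries. Running the swap `sed` again adds extra `#` marks. The `modprobe` calls do not survive a reboot. The following is a minimal sketch of re-runnable helpers, assuming a POSIX shell and awk. The function names (`add_host_entry`, `comment_swap_entries`, `write_k8s_modules_conf`) are hypothetical. The optional ROOT argument lets you dry-run them against a scratch directory.

```shell
#!/bin/sh
# Sketch only: idempotent versions of the preparation steps (hypothetical helpers).

# Append "IP NAME" to a hosts file only if NAME is not already mapped
# on a non-comment line.
add_host_entry() {  # usage: add_host_entry HOSTS_FILE IP NAME
  if ! awk -v n="$3" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) f = 1 } END { exit !f }' "$1"; then
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Comment out active swap entries (third fstab field is "swap"); lines that
# are already commented are left untouched. Rewrites the file via a temp copy.
comment_swap_entries() {  # usage: comment_swap_entries FSTAB_FILE
  awk '$0 !~ /^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1" > "$1.tmp" &&
    mv "$1.tmp" "$1"
}

# Persist the modules needed by the bridge sysctls and containerd;
# systemd-modules-load reads /etc/modules-load.d/*.conf at boot.
write_k8s_modules_conf() {  # usage: write_k8s_modules_conf [ROOT]
  root=${1:-/}
  mkdir -p "$root/etc/modules-load.d"
  printf '%s\n' overlay br_netfilter > "$root/etc/modules-load.d/k8s.conf"
}

# Example on a real node (as root):
#   add_host_entry /etc/hosts 192.168.153.201 node1
#   comment_swap_entries /etc/fstab
#   write_k8s_modules_conf /
```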
III. Install Docker (all nodes)
# Remove Podman and related packages; they tend to conflict with docker-ce
sudo dnf remove podman buildah skopeo catatonit --nobest -y
# Remove leftover dependencies
sudo dnf autoremove -y
# Clear the old cache
sudo dnf clean all
# Docker repository (Aliyun mirror)
sudo dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install the pinned Docker version (containerd.io is the runtime the kubelet will use)
sudo dnf install -y docker-ce-24.0.7 docker-ce-cli-24.0.7 containerd.io
# Configure Docker
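sudo mkdir -p /etc/docker
# If the registry mirror below is unreachable from your network, replace it with one you can access
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn/"]
}
EOF
# Enable Docker at boot and start it now
sudo systemctl enable --now docker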
IV. Install Kubernetes Components (all nodes)
# Add the Kubernetes repository (Aliyun mirror)
sudo tee /etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes-new/core/stable/v1.29/rpm/repodata/repomd.xml.key
EOF
# Install kubeadm, kubelet and kubectl (pinned version)
sudo dnf install -y kubelet-1.29.5 kubeadm-1.29.5 kubectl-1.29.5
# Enable kubelet at boot but do not start it yet (kubeadm init/join configures and starts it)
sudo systemctl enable kubelet
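Before initializing, it is worth confirming that all three packages landed at the same version that `--kubernetes-version` will request. Below is a small sketch using rpm's query format. The helper name `check_k8s_pkg_versions` is hypothetical, and you pass the expected version string yourself.

```shell
#!/bin/sh
# Sketch: compare installed kubelet/kubeadm/kubectl versions with an expected one.
# `rpm -q` prints "package X is not installed" and exits non-zero for a
# missing package, which is reported here as "missing".
check_k8s_pkg_versions() {  # usage: check_k8s_pkg_versions 1.29.5
  rc=0
  for pkg in kubelet kubeadm kubectl; do
    v=$(rpm -q --qf '%{VERSION}' "$pkg" 2>/dev/null) || v="missing"
    if [ "$v" = "$1" ]; then
      echo "OK   $pkg $v"
    else
      echo "FAIL $pkg $v (expected $1)"
      rc=1
    fi
  done
  return $rc
}

# Usage on each node:
#   check_k8s_pkg_versions 1.29.5
```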
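V. Initialize the Master Node (master only)
# Initialization command (every continued line must end with a backslash)
# k8s-endpoint resolves to 192.168.153.200 through /etc/hosts; the printed join command uses this name
sudo kubeadm init \
  --apiserver-advertise-address=192.168.153.200 \
  --control-plane-endpoint=k8s-endpoint \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --kubernetes-version v1.29.5 \
  --service-cidr=10.96.0.0/16 \
  --pod-network-cidr=172.20.0.0/16
# Configure kubectl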
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
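The init flags above can also be kept in a kubeadm configuration file, which is easier to review and reuse. Below is a sketch under these assumptions:

- kubeadm v1.29 reads `kubeadm.k8s.io/v1beta3`.
- The file name is arbitrary.
- `controlPlaneEndpoint` uses the `k8s-endpoint` alias from /etc/hosts.
- The `KubeletConfiguration` document pins `cgroupDriver: systemd` to match containerd's `SystemdCgroup = true`. This is kubeadm's default since v1.22, but here it is made explicit.

The helper name `write_kubeadm_config` is hypothetical.

```shell
#!/bin/sh
# Sketch: write the same init parameters as a kubeadm config file.
write_kubeadm_config() {  # usage: write_kubeadm_config OUTPUT_FILE
  cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.153.200
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.5
controlPlaneEndpoint: "k8s-endpoint:6443"
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/16
  podSubnet: 172.20.0.0/16
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
}

# Usage on the master:
#   write_kubeadm_config kubeadm-config.yaml
#   sudo kubeadm init --config kubeadm-config.yaml
```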
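If initialization fails: check and repair the containerd configuration (all nodes)
# A common cause is that the config.toml shipped with containerd.io disables the CRI plugin (disabled_plugins = ["cri"]), so kubeadm cannot reach the runtime.
Step 1: Regenerate the containerd configuration
# Move the existing containerd config aside (it may contain conflicting settings):
sudo mv /etc/containerd/config.toml /root/config.toml
# Regenerate the default configuration (the CRI plugin is enabled in it):
containerd config default | sudo tee /etc/containerd/config.toml
Step 2: Edit /etc/containerd/config.toml and make sure the following settings are present
vim /etc/containerd/config.toml
# Pull the sandbox (pause) image from the Aliyun mirror. kubeadm v1.29 expects pause:3.9; an older tag such as 3.6 also runs, but kubeadm warns about the mismatch:
sandbox_image = "registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9"
# Use the systemd cgroup driver so containerd matches the kubelet (under the runc options section, change false to true):
SystemdCgroup = true
Step 3: Restart containerd
sudo systemctl restart containerd
Step 4: Check the containerd service status
sudo systemctl status containerd
# The output should contain "Active: active (running)"
Step 5: Kubernetes 1.29.5 requires containerd 1.6.0 or later. Check the version with:
containerd --version
# Before re-running kubeadm init after a failed attempt, clean up the partial state with: sudo kubeadm reset -f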
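Steps 2 and 5 can be scripted so that every node gets identical settings. Below is a minimal sketch, assuming GNU sed and GNU sort (`-V`), and a config.toml generated by `containerd config default`. The names `patch_containerd_config`, `version_ge` and `containerd_version` are hypothetical. The patch edits the file in place, so run it as root and keep the backup from step 1.

```shell
#!/bin/sh
# Sketch (hypothetical helpers); assumes GNU sed and GNU sort.

# Set sandbox_image and switch runc to the systemd cgroup driver,
# preserving each line's indentation. Edits CONFIG_TOML in place.
patch_containerd_config() {  # usage: patch_containerd_config CONFIG_TOML SANDBOX_IMAGE
  sed -i \
    -e "s|^\([[:space:]]*\)sandbox_image = .*|\1sandbox_image = \"$2\"|" \
    -e 's|^\([[:space:]]*\)SystemdCgroup = false|\1SystemdCgroup = true|' \
    "$1"
}

# Succeeds if HAVE >= NEED, using version-aware sorting (1.10.0 > 1.6.0).
version_ge() {  # usage: version_ge HAVE NEED
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Third field of `containerd --version`, e.g. "containerd containerd.io 1.6.28 <commit>",
# with a leading "v" stripped for upstream builds.
containerd_version() {
  containerd --version | awk '{ sub(/^v/, "", $3); print $3 }'
}

# Usage as root on each node:
#   patch_containerd_config /etc/containerd/config.toml \
#     registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.9
#   systemctl restart containerd
#   version_ge "$(containerd_version)" 1.6.0 && echo "containerd version OK"
```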
VI. Join the Worker Nodes to the Cluster (run on node1/node2)
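# Run the kubeadm join command printed at the end of kubeadm init on the master, for example:
kubeadm join k8s-endpoint:6443 --token xxxx.xxxxxxxxxxxx \
  --discovery-token-ca-cert-hash sha256:xxxxxxxx...
# List the current tokens (on the master):
kubeadm token list
# If no token is valid (bootstrap tokens expire after 24 hours by default), create a new one and print a complete join command (on the master):
kubeadm token create --print-join-command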
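`--print-join-command` already includes the CA hash. If you only need the `--discovery-token-ca-cert-hash` value, the kubeadm join documentation gives an openssl pipeline that recomputes it from the cluster CA. The sketch below wraps that pipeline. It assumes openssl is installed and that the CA uses the default RSA key at /etc/kubernetes/pki/ca.crt. The function name `ca_cert_hash` is hypothetical.

```shell
#!/bin/sh
# Sketch: SHA-256 of the CA's DER-encoded public key, as used by kubeadm join.
ca_cert_hash() {  # usage: ca_cert_hash [CA_CERT]
  openssl x509 -pubkey -in "${1:-/etc/kubernetes/pki/ca.crt}" |
    openssl rsa -pubin -outform der 2>/dev/null |
    openssl dgst -sha256 -hex |
    sed 's/^.* //'   # keep only the hex digest after "...= "
}

# Usage on the master; prefix the result with "sha256:" in the join command:
#   echo "sha256:$(ca_cert_hash)"
```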
VII. Deploy the Calico Network Plugin (master only)
# Download the manifest for the matching Calico version (from GitHub; the image registries are switched below)
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.27.3/manifests/calico.yaml
# Change the pool CIDR to match --pod-network-cidr. In the stock manifest the CALICO_IPV4POOL_CIDR entry is commented out, so this sed only edits the comment; according to the Calico install docs, on kubeadm clusters Calico picks up the pod CIDR from the cluster configuration anyway.
sed -i 's/192.168.0.0\/16/172.20.0.0\/16/' calico.yaml
# Switch the images to the Aliyun registry
sed -i 's|docker.io/calico/|registry.aliyuncs.com/calico/|g' calico.yaml
# If the Aliyun registry does not work, try this domestic mirror: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico
sed -i 's|registry.aliyuncs.com/calico/|swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/|g' calico.yaml
[root@master 11:36:19]# cat calico.yaml | grep image:
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.27.3
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/cni:v3.27.3
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.27.3
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/node:v3.27.3
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/calico/kube-controllers:v3.27.3
# Apply the manifest
kubectl apply -f calico.yaml
# Check the network pods
[root@master 11:40:14]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-6b44cbc54d-fdxjk 1/1 Running 0 109m
calico-node-7q9pl 1/1 Running 0 109m
calico-node-kwtrj 1/1 Running 0 109m
calico-node-vwskq 1/1 Running 0 109m
coredns-5f98f8d567-5lldq 1/1 Running 0 123m
coredns-5f98f8d567-j5874 1/1 Running 0 123m
etcd-master 1/1 Running 0 123m
kube-apiserver-master 1/1 Running 0 123m
kube-controller-manager-master 1/1 Running 1 (115m ago) 123m
kube-proxy-96xr5 1/1 Running 0 123m
kube-proxy-f9wl6 1/1 Running 0 120m
kube-proxy-sqfrh 1/1 Running 0 118m
kube-scheduler-master 1/1 Running 1 (115m ago) 123m
# Check node status
[root@master 11:44:38]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 124m v1.29.5
node1 Ready <none> 121m v1.29.5
node2 Ready <none> 118m v1.29.5
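If you want the pod CIDR set explicitly rather than auto-detected, the commented entry has to be uncommented, not just edited. The NIC fix from Problem 2 in section IX can be applied to the same file before `kubectl apply`. Below is a sketch, assuming GNU sed and the v3.27 manifest layout. In that layout the CIDR entry appears as `# - name: CALICO_IPV4POOL_CIDR` followed by a commented `value` line, and the BGP address entry appears as `- name: IP` / `value: "autodetect"`. The helper names are hypothetical. `ens160` is this lab's interface, so substitute yours.

```shell
#!/bin/sh
# Sketch (hypothetical helpers): patch calico.yaml before applying it.

# Uncomment CALICO_IPV4POOL_CIDR and set its value, keeping the indentation.
set_calico_pool_cidr() {  # usage: set_calico_pool_cidr CALICO_YAML CIDR
  sed -i "/^[[:space:]]*# - name: CALICO_IPV4POOL_CIDR/{s|# - name|- name|;n;s|#   value: \".*\"|  value: \"$2\"|;}" "$1"
}

# Insert IP_AUTODETECTION_METHOD right after the "- name: IP" / "autodetect"
# entry; does nothing if an active IP_AUTODETECTION_METHOD entry exists.
set_calico_ip_autodetect() {  # usage: set_calico_ip_autodetect CALICO_YAML METHOD
  grep -q '^[[:space:]]*- name: IP_AUTODETECTION_METHOD' "$1" && return 0
  awk -v m="$2" '
    { print }
    prev ~ /^[ \t]*- name: IP$/ && $0 ~ /value: "autodetect"/ {
      ind = prev; sub(/-.*/, "", ind)   # leading spaces of the "- name: IP" line
      print ind "- name: IP_AUTODETECTION_METHOD"
      print ind "  value: \"" m "\""
    }
    { prev = $0 }' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage on the master, then re-apply the manifest:
#   set_calico_pool_cidr calico.yaml 172.20.0.0/16
#   set_calico_ip_autodetect calico.yaml interface=ens160
#   kubectl apply -f calico.yaml
```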
VIII. Deploy the Dashboard (master only)
# Download the official manifest
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# Change the kubernetes-dashboard Service to type NodePort, which exposes it on every node's IP at a fixed port,
# and add nodePort: 30001, the node port used for external access.
vim recommended.yaml
[root@master 13:20:28]# grep -A 7 'spec:' recommended.yaml | head -n 8
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001
  selector:
    k8s-app: kubernetes-dashboard
# Replace the images with the Aliyun mirror registry.cn-hangzhou.aliyuncs.com/google_containers
[root@master 13:09:40]# cat recommended.yaml | grep image:
          image: registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard:v2.7.0
          image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-scraper:v1.0.8
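The image substitution above can be done in one step for whichever mirror you can reach. Below is a sketch assuming GNU sed and the upstream v2.7.0 manifest, whose two images are written as `kubernetesui/dashboard` and `kubernetesui/metrics-scraper`. The helper name `mirror_dashboard_images` is hypothetical.

```shell
#!/bin/sh
# Sketch: point both kubernetesui/* images at a mirror prefix, then show the result.
mirror_dashboard_images() {  # usage: mirror_dashboard_images RECOMMENDED_YAML PREFIX
  sed -i "s|image: kubernetesui/|image: $2/|" "$1"
  grep 'image:' "$1"
}

# Aliyun mirror:
#   mirror_dashboard_images recommended.yaml registry.cn-hangzhou.aliyuncs.com/google_containers
# Huawei Cloud mirror:
#   mirror_dashboard_images recommended.yaml swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui
```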
# If the Aliyun registry is unavailable, try this domestic mirror: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui
[root@master 11:45:26]# cat recommended.yaml | grep image:
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui/dashboard:v2.7.0
          image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/kubernetesui/metrics-scraper:v1.0.8
# Deploy
kubectl apply -f recommended.yaml
# Check the status of all pods
[root@master 12:59:29]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-6b44cbc54d-fdxjk 1/1 Running 0 3h4m
kube-system calico-node-7q9pl 1/1 Running 0 3h4m
kube-system calico-node-kwtrj 1/1 Running 0 3h4m
kube-system calico-node-vwskq 1/1 Running 0 3h4m
kube-system coredns-5f98f8d567-5lldq 1/1 Running 0 3h18m
kube-system coredns-5f98f8d567-j5874 1/1 Running 0 3h18m
kube-system etcd-master 1/1 Running 0 3h18m
kube-system kube-apiserver-master 1/1 Running 0 3h18m
kube-system kube-controller-manager-master 1/1 Running 1 (3h10m ago) 3h18m
kube-system kube-proxy-96xr5 1/1 Running 0 3h18m
kube-system kube-proxy-f9wl6 1/1 Running 0 3h15m
kube-system kube-proxy-sqfrh 1/1 Running 0 3h13m
kube-system kube-scheduler-master 1/1 Running 1 (3h10m ago) 3h18m
kubernetes-dashboard dashboard-metrics-scraper-bd84c9d8b-x2gmj 1/1 Running 0 136m
kubernetes-dashboard kubernetes-dashboard-5cc694d9b-825gq 1/1 Running 0 136m
# Check the resources in the kubernetes-dashboard namespace
kubectl get pods,svc -n kubernetes-dashboard
[root@master 12:59:29]# kubectl get pods,svc -n kubernetes-dashboard
NAME READY STATUS RESTARTS AGE
pod/dashboard-metrics-scraper-bd84c9d8b-x2gmj 1/1 Running 0 136m
pod/kubernetes-dashboard-5cc694d9b-825gq 1/1 Running 0 136m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/dashboard-metrics-scraper ClusterIP 10.96.100.159 <none> 8000/TCP 136m
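service/kubernetes-dashboard NodePort 10.96.67.20 <none> 443:30001/TCP 136m
1. Get the node IPs (a NodePort service answers on every node's IP)
kubectl get nodes -o wide
2. Open the Dashboard in a browser (HTTPS is required):
https://192.168.153.200:30001
# Bypass the self-signed certificate warning (development environments only)
Chrome: type thisisunsafe anywhere on the warning page (no Enter needed).
Firefox: click Advanced -> Accept the Risk and Continue.
3. Create an administrator account (if you have not created one yet):
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin
4. Get the token
# Since Kubernetes 1.24, creating a ServiceAccount no longer creates a token Secret automatically, so this command fails with NotFound until the Secret is created as described below.
# On 1.29 a time-limited token can also be requested directly: kubectl -n kubernetes-dashboard create token dashboard-admin
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'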
Fixing errors or garbled output when retrieving the token:
# Step 1: Check which Secret the ServiceAccount is associated with
[root@master 12:59:31]# kubectl -n kubernetes-dashboard describe sa dashboard-admin
Name: dashboard-admin
Namespace: kubernetes-dashboard
Labels: <none>
Annotations: <none>
Image pull secrets: <none>
Mountable secrets: <none>
Tokens: dashboard-admin-token
Events: <none>
# Step 2: If no Secret was generated, create one manually and bind it to the ServiceAccount:
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: dashboard-admin-token
  namespace: kubernetes-dashboard
  annotations:
    kubernetes.io/service-account.name: dashboard-admin
type: kubernetes.io/service-account-token
EOF
# Step 3: Verify that the Secret contains a token field (the token controller fills it in shortly after creation):
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o jsonpath='{.data.token}'
# If the output is empty, delete the old Secret and create it again:
kubectl -n kubernetes-dashboard delete secret dashboard-admin-token
kubectl apply -f <the YAML file above>
Step 4: Get the token
kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'
[root@master 12:59:30]# kubectl -n kubernetes-dashboard get secret dashboard-admin-token -o go-template='{{.data.token | base64decode}}'
eyJhbGciOiJSUzI1NiIsImtpZCI6IlBINlNjODNwR1duZWR4TWVfV3pkRWZsTG1UUzJxZGRTb1pyTHBrNkFZRUUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOWVhMWY0OGUtZDNjYS00ZjViLWEzODAtN2U3MjE3MGRiYTdmIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.tyg0bnwtzbdR7bqGOPZ8ZgkfDvVd9ZpSoeDCX1qbMM4SgUu6mdlbd5UTNdR-Yq6e-F3TzVHfobkSfWvLBumdRjTPj9qvDedzPhl2nB8vdx2VNE4dvkyJ_OlB3MqJdFH9wuzU93ovRbbOULjTnTm2AOUWck1eJFw8YVmbgHmx4xnfLSlcSFOIbeJmhm1rPGZlsRDQgIlcnVAhPkPpuBO21wrLtzQwL0D6aVGxRaNXQMhlj1lqz-duaXd6aK7kkXQvO1M4xJoktmT2Ey-JDf9fygt7AP2saC86KRWK0B3drRkNNkSFeZ9VDhoPPf6KsZ9hG1zVUjUOpZFTED6zDZ0PMw
# Make sure the Dashboard Service is exposed (NodePort 30001 here), then open it in the browser:
[root@master 12:59:31]# kubectl get svc -n kubernetes-dashboard | grep NodePort
kubernetes-dashboard NodePort 10.96.67.20 <none> 443:30001/TCP 136m
IX. Troubleshooting and Operations
# Restart Docker
systemctl restart docker
# Restart the kubelet
systemctl restart kubelet
# Check Docker status
systemctl status docker
# Check kubelet status
systemctl status kubelet
# Check all pods
kubectl get pods -A
# Check the kubernetes-dashboard resources
kubectl get pods,svc -n kubernetes-dashboard
Problem 1: CoreDNS pods keep restarting in CrashLoopBackOff
# Edit the Corefile and delete the `loop` line so CoreDNS stops exiting when it detects a forwarding loop
kubectl edit -n kube-system cm coredns
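Deleting `loop` silences the detector but leaves the loop itself in place. The CoreDNS `loop` plugin documentation attributes most Kubernetes loops to the node's /etc/resolv.conf pointing at a local resolver, for example systemd-resolved's 127.0.0.53, so CoreDNS ends up forwarding queries to itself. A quick check on each node is sketched below. It assumes the standard resolv.conf format. The function name `has_loopback_nameserver` is hypothetical.

```shell
#!/bin/sh
# Sketch: print loopback nameservers from a resolv.conf and succeed if any exist.
has_loopback_nameserver() {  # usage: has_loopback_nameserver [RESOLV_CONF]
  awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2; found = 1 }
       END { exit !found }' "${1:-/etc/resolv.conf}"
}

# Usage on each node:
#   has_loopback_nameserver && echo "loopback resolver found: point the kubelet's resolvConf at a file with real upstream servers"
```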
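# After editing the ConfigMap, delete the CoreDNS pods; the Deployment recreates them with the new configuration
# (pod names differ in every cluster, so select the pods by label rather than by name)
kubectl -n kube-system delete pod -l k8s-app=kube-dns
Problem 2: After deploying Calico, calico-node shows READY 0/1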
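# Cause: the master has several network interfaces and Calico autodetected the wrong one.
# Fix: set the correct interface name in calico.yaml, then re-apply the manifest.
vim calico.yaml
# Auto-detect the BGP IP address.
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens160"
# ens160 is this lab's interface name; check yours with `ip addr`. On a running cluster the same setting can be applied with:
# kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens160
Problem 3: Configuring the port exposed on the nodes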
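vim recommended.yaml
# Set type: NodePort and nodePort: 30001 on the kubernetes-dashboard Service, as in section VIII
Problem 4: The Kubernetes Dashboard does not show CPU and memory data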
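# Download Metrics Server
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Add a flag that skips verification of the kubelets' serving certificates (works around their self-signed certificates)
vim components.yaml
# Add to the container args:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-insecure-tls   # add this line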
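The args edit above can be applied without opening an editor. The sketch below inserts the flag after the existing `--secure-port` line, reusing that line's indentation, and skips the edit if the flag is already there. It assumes GNU sed, which accepts `\n` in a replacement, and the `- --secure-port=` line layout of the upstream components.yaml. The helper name `add_kubelet_insecure_tls` is hypothetical. The flag disables certificate checks, so it only suits lab clusters like this one.

```shell
#!/bin/sh
# Sketch: add --kubelet-insecure-tls to the metrics-server args (idempotent).
add_kubelet_insecure_tls() {  # usage: add_kubelet_insecure_tls COMPONENTS_YAML
  grep -q -- '--kubelet-insecure-tls' "$1" && return 0
  # "&" re-emits the matched --secure-port line; "\1" reuses its indentation.
  sed -i 's|^\([[:space:]]*\)- --secure-port=.*$|&\n\1- --kubelet-insecure-tls|' "$1"
}

# Usage:
#   add_kubelet_insecure_tls components.yaml
```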
# Switch the image to the Aliyun mirror
sed -i 's|registry.k8s.io/metrics-server/metrics-server|registry.aliyuncs.com/google_containers/metrics-server|g' components.yaml
# Deploy
kubectl apply -f components.yaml
# Check the pod status:
[root@master 13:11:31]# kubectl get pods -A | grep metrics-server
kube-system metrics-server-85c75cb9b4-8nrqh 1/1 Running 0 76s
# View node and pod resource usage:
kubectl top nodes     # CPU and memory usage of every node
kubectl top pods -A   # CPU and memory usage of the pods in all namespaces
[root@master 13:11:49]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 211m 7% 1974Mi 54%
node1 129m 4% 2148Mi 59%
node2 103m 3% 2100Mi 58%
[root@master 13:11:55]# kubectl top pods -A
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system calico-kube-controllers-6b44cbc54d-fdxjk 2m 21Mi
kube-system calico-node-7q9pl 38m 186Mi
kube-system calico-node-kwtrj 38m 192Mi
kube-system calico-node-vwskq 28m 199Mi
kube-system coredns-5f98f8d567-5lldq 3m 21Mi
kube-system coredns-5f98f8d567-j5874 2m 20Mi
kube-system etcd-master 29m 70Mi
kube-system kube-apiserver-master 55m 319Mi
kube-system kube-controller-manager-master 15m 98Mi
kube-system kube-proxy-96xr5 1m 22Mi
kube-system kube-proxy-f9wl6 1m 21Mi
kube-system kube-proxy-sqfrh 1m 18Mi
kube-system kube-scheduler-master 4m 35Mi
kube-system metrics-server-85c75cb9b4-8nrqh 4m 16Mi
kubernetes-dashboard dashboard-metrics-scraper-bd84c9d8b-x2gmj 1m 15Mi
kubernetes-dashboard kubernetes-dashboard-5cc694d9b-825gq 1m 23Mi
Problem 5: The Dashboard login token expires too quickly
Method 1: Increase the Dashboard token TTL
# Print the current Deployment
kubectl -n kubernetes-dashboard get deploy kubernetes-dashboard -o yaml
# Edit it and add the argument - --token-ttl=43200 (in seconds; 43200 s = 12 hours; the Dashboard default is 15 minutes)
kubectl -n kubernetes-dashboard edit deploy kubernetes-dashboard
spec:
  containers:
  - args:
    - --auto-generate-certificates
    - --namespace=kubernetes-dashboard
    - --token-ttl=43200   # new argument, in seconds (43200 s = 12 hours)
# Verify the change
[root@master 14:01:34]# kubectl -n kubernetes-dashboard get deploy kubernetes-dashboard -o yaml | grep "token-ttl"
        - --token-ttl=43200
# If the change prevents the Dashboard from starting, roll it back with:
kubectl -n kubernetes-dashboard rollout undo deploy kubernetes-dashboard
Method 2: Generate a long-lived token directly
# Valid for 720 hours (30 days); the API server may cap the duration via --service-account-max-token-expiration
kubectl -n kubernetes-dashboard create token dashboard-admin --duration=720h
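To confirm what lifetime the API server actually granted, you can read the token's `exp` claim (Unix seconds). A ServiceAccount token is a JWT whose middle segment is base64url-encoded JSON. The sketch below decodes only that segment and does not verify the signature. It assumes coreutils `base64 -d`. The function name `jwt_payload` is hypothetical.

```shell
#!/bin/sh
# Sketch: print the JSON claims (payload) of a JWT without verifying it.
jwt_payload() {  # usage: jwt_payload TOKEN
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')   # base64url -> base64 alphabet
  case $(( ${#p} % 4 )) in                               # restore stripped "=" padding
    2) p="$p==" ;;
    3) p="$p=" ;;
  esac
  printf '%s' "$p" | base64 -d
}

# Usage:
#   TOKEN=$(kubectl -n kubernetes-dashboard create token dashboard-admin --duration=720h)
#   jwt_payload "$TOKEN"; echo
```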