1. How It Works
- Set the HPA's minimum scale-up step to the number of availability zones, so that each scale-up event adds Pods to all zones in lockstep
- Set a TopologySpreadConstraints zone-spread rule with maxSkew: 1, so that Pods are distributed across zones as evenly as possible (see the excerpt below)
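In manifest terms, these are the two knobs, excerpted from the full manifests in Section 2. With 2 zones and 2 running Pods, a scale-up step of 2 lets the scheduler place one new Pod per zone (2 → 4) instead of an unbalanced 2 → 3:

# HPA scale-up behavior: each step adds at least as many Pods as there are zones
policies:
- type: Pods
  value: 2                                # = number of availability zones
# Pod template: best-effort even spread across zones
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway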
2. Experimental Verification
2.1. Prepare a Kind Cluster
Prepare the following configuration file and name it kind-cluster.yaml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
- role: worker
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
  labels:
    topology.kubernetes.io/zone: "us-east-1a"
- role: worker
  image: kindest/node:v1.24.0@sha256:0866296e693efe1fed79d5e6c7af8df71fc73ae45e3679af05342239cdc5bc8e
  labels:
    topology.kubernetes.io/zone: "us-east-1c"
This configuration defines two worker nodes for the cluster, each labeled with a different availability zone.
Run the following command to create the Kubernetes cluster:
$ kind create cluster --config kind-cluster.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.24.0) 🖼
 ✓ Preparing nodes 📦 📦 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
 ✓ Joining worker nodes 🚜
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂
Verify that the cluster is running and the zone labels are in place:
$ kubectl get node --show-labels
NAME STATUS ROLES AGE VERSION LABELS
kind-control-plane Ready control-plane 161m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-control-plane,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node.kubernetes.io/exclude-from-external-load-balancers=
kind-worker Ready <none> 160m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-worker,kubernetes.io/os=linux,topology.kubernetes.io/zone=us-east-1a
kind-worker2 Ready <none> 160m v1.24.0 beta.kubernetes.io/arch=arm64,beta.kubernetes.io/os=linux,kubernetes.io/arch=arm64,kubernetes.io/hostname=kind-worker2,kubernetes.io/os=linux,topology.kubernetes.io/zone=us-east-1c
2.2. Install metrics-server
HPA relies on metrics-server for resource metrics. Install it with the following command:
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Tip: networks in mainland China may not be able to pull the registry.k8s.io/metrics-server/metrics-server:v0.6.4 image directly; you can substitute the equivalent shidaqiu/metrics-server:v0.6.4. In addition, TLS verification of the kubelets must be disabled, since the kubelet serving certificates in a kind cluster are self-signed.
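A minimal sketch of the TLS step, assuming the stock components.yaml layout: append the --kubelet-insecure-tls flag to the metrics-server container args, either by editing components.yaml before applying it or with a JSON patch afterwards:

$ kubectl -n kube-system patch deployment metrics-server --type=json \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'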
Check that the deployed metrics-server is working:
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kind-control-plane 238m 5% 667Mi 8%
kind-worker 76m 1% 207Mi 2%
kind-worker2 41m 1% 110Mi 1%
2.3. Deploy the Test Service
Prepare the following YAML and name it hpa-php-demo.yaml.
Note: the Deployment's topologySpreadConstraints is configured to spread Pods across availability zones. Because whenUnsatisfiable is ScheduleAnyway, the spread is best-effort: scheduling is never blocked, the scheduler merely prefers placements that keep the zone skew within maxSkew.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-web-demo
spec:
  selector:
    matchLabels:
      run: php-web-demo
  replicas: 1
  template:
    metadata:
      labels:
        run: php-web-demo
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            run: php-web-demo
      containers:
      - name: php-web-demo
        image: shidaqiu/hpademo:latest
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-web-demo
  labels:
    run: php-web-demo
spec:
  ports:
  - port: 80
  selector:
    run: php-web-demo
Deploy the service:
$ kubectl apply -f hpa-php-demo.yaml
2.4. Deploy the HPA Configuration
Prepare the HPA configuration file and name it hpa-demo.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-web-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-web-demo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 15
      selectPolicy: Max
Apply the HPA configuration:
$ kubectl apply -f hpa-demo.yaml
The scaleUp and scaleDown sections above define the scaling behavior: each scale-up step adds 100% of the current replicas or 2 Pods, whichever is larger, and each scale-down step removes 50% of the current replicas or 2 Pods, whichever is larger. Note that the scale-up Pods value of 2 matches the number of availability zones, which is what lets every scale-up step place one new Pod in each zone; under sustained load the replica count ramps 2 → 4 → 8 → 10 (capped by maxReplicas).
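Before applying load, you can confirm the HPA is receiving metrics (output omitted; once metrics flow, the TARGETS column shows a live CPU percentage against the 50% target instead of <unknown>):

$ kubectl get hpa php-web-demo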
2.5. Verify Scaling
Before scaling up, observe that the two Pods are running in different zones:
$ kubectl get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
php-web-demo-d6d66c8d5-22tn6 1/1 Running 0 6m57s 10.244.2.3 kind-worker2 <none> <none>
php-web-demo-d6d66c8d5-tz8m9 1/1 Running 0 76s 10.244.1.3 kind-worker <none> <none>
Apply load to the service:
$ kubectl run -it --rm load-generator --image=busybox /bin/sh
Inside the container, run the following loop:
while true; do wget -q -O- http://php-web-demo; done
You can observe that as the Pods scale up, new replicas are created in both availability zones at the same time, achieving the synchronized cross-zone scale-up described in Section 1.
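One way to watch this live is to follow the HPA status and the Pod placement in separate terminals (standard kubectl watch flags; output omitted):

$ kubectl get hpa php-web-demo -w
$ kubectl get pod -owide -w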
After the load is removed, you can observe that the Pods remain spread across the zones as they scale back down.
How can we guarantee that Pods stay evenly spread across availability zones after scale-down?
Consider the descheduler's rebalancing capability, specifically its RemovePodsViolatingTopologySpreadConstraint strategy; see https://github.com/kubernetes-sigs/descheduler?tab=readme-ov-file#removepodsviolatingtopologyspreadconstraint
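A minimal policy sketch for that strategy, based on the descheduler v1alpha2 policy format in the linked README (field names vary across descheduler versions, so treat this as a starting point rather than a definitive configuration):

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
- name: default
  pluginConfig:
  - name: "RemovePodsViolatingTopologySpreadConstraint"
    args:
      # our Deployment uses whenUnsatisfiable: ScheduleAnyway (a soft
      # constraint), so ScheduleAnyway must be listed for it to be considered
      constraints:
      - ScheduleAnyway
  plugins:
    balance:
      enabled:
      - "RemovePodsViolatingTopologySpreadConstraint"

Because the descheduler only evicts the offending Pods and lets the scheduler re-place them, the Deployment's own topologySpreadConstraints then restores the even spread.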