Background
While manually deploying Prometheus to scrape node metrics from the kubelet, I ran into an error.
My Prometheus configuration file and the Prometheus ClusterRole are shown below:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  # - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
---
apiVersion: v1
data:
  prometheus.yml: |-
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
    - job_name: 'kubernetes-nodes'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
    - job_name: 'kubernetes-service'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: service
    - job_name: 'kubernetes-endpoints'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints
    - job_name: 'kubernetes-ingress'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: ingress
    - job_name: 'kubernetes-pods'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
kind: ConfigMap
metadata:
  name: prometheus-config
---
apiVersion: v1
kind: "Service"
metadata:
  name: prometheus
  labels:
    name: prometheus
spec:
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      serviceAccount: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        ports:
        - containerPort: 9090
        volumeMounts:
        - mountPath: "/etc/prometheus"
          name: prometheus-config
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
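Solution
I had already set insecure_skip_verify: true in the Prometheus configuration file, which skips TLS certificate verification. With that in place, the scrape failed with "server returned HTTP status 403 Forbidden", which clearly points to a permissions problem on the endpoint rather than a TLS problem.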
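Question 1: What is the https://10.101.12.132:10250/metrics endpoint for?
https://10.101.12.132:10250/metrics is the metrics path served by the kubelet on a node (10250 is the kubelet's HTTPS port). It is used to fetch metrics for a node in the Kubernetes cluster; in other words, it exposes node-level monitoring metrics.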
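Question 2: Which resources control access to this endpoint?
In Kubernetes, access to the https://10.101.12.132:10250/metrics endpoint is usually managed through RBAC, with a ClusterRole and a ClusterRoleBinding. Both are cluster-scoped. A ClusterRole defines a set of permissions that can be used across the whole cluster. A ClusterRoleBinding binds that role to users, groups, or service accounts, granting them the corresponding permissions. When the kubelet delegates authorization to the API server, a request for /metrics is checked against the nodes/metrics subresource, so that is the resource the role needs to grant. To grant access to https://10.101.12.132:10250/metrics, the following ClusterRole and ClusterRoleBinding can serve as a reference: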
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-metrics-reader
rules:
- apiGroups: [""]
  resources: ["nodes/metrics"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-metrics-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-metrics-reader
subjects:
- kind: User
  name: <username> # replace with the actual user or group name
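In this article's setup, Prometheus authenticates with the token of the prometheus ServiceAccount in the default namespace rather than as a named user, so the binding's subject would be a ServiceAccount instead. The following is a minimal sketch under that assumption; the binding name node-metrics-reader-prometheus is hypothetical and chosen only for illustration.

```yaml
# Sketch: bind the node-metrics-reader ClusterRole to the ServiceAccount
# that the Prometheus Deployment runs as.
# The binding name is hypothetical; the other names come from the manifests above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-metrics-reader-prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-metrics-reader
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default
```

A separate role like this keeps the extra permission isolated, whereas adding nodes/metrics to the existing prometheus ClusterRole keeps everything in one place.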
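I then edited the Prometheus ClusterRole shown earlier, removing the comment marker so that - nodes/metrics is included in the resources the role may access. After that, the problem was resolved.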
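For reference, the ClusterRole after this change would look roughly like the sketch below. It assumes nothing else in the original manifest was changed; the only difference is that nodes/metrics is no longer commented out.

```yaml
# Sketch of the prometheus ClusterRole after the fix.
# Only change from the original: nodes/metrics is uncommented.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - nodes/metrics   # lets the ServiceAccount read the kubelet /metrics endpoint
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
```

Since the existing ClusterRoleBinding already binds this role to the prometheus ServiceAccount, re-applying the manifest should be enough for the change to take effect.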
Reference article: "部署了 prometheus, 在 target 中显示 cadvisor 与 nodes 的状态都是 down" (After deploying Prometheus, the cadvisor and nodes targets both show as down)