Monitoring the k8s controller manager and scheduler: creating ServiceMonitor and Rules resources

Contents

1. Modify the kube-controller-manager and kube-scheduler YAML files

2. Create the Service, Endpoints, and ServiceMonitor

3. Prometheus verification

4. Create the PrometheusRule resources

5. Prometheus verification


Straight to the practical part.

1. Modify the kube-controller-manager and kube-scheduler YAML files

Note: make the change one node at a time, and wait until the services on the node you just changed have started normally before moving on to the next node.

kube-controller-manager file path: /etc/kubernetes/manifests/kube-controller-manager.yaml
kube-scheduler file path: /etc/kubernetes/manifests/kube-scheduler.yaml

vim /etc/kubernetes/manifests/kube-controller-manager.yaml
vim /etc/kubernetes/manifests/kube-scheduler.yaml
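
The post does not show which fields were edited. On kubeadm-built clusters, a common change for this purpose is switching `--bind-address` from `127.0.0.1` to `0.0.0.0` so that Prometheus can reach ports 10257 and 10259 from another host; that is an assumption here, not something the original states. A minimal sketch under that assumption, using a hypothetical helper named `set_bind_address`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite --bind-address=127.0.0.1 to 0.0.0.0 in one static Pod manifest.
# Assumption: the manifest currently contains "--bind-address=127.0.0.1" (the kubeadm default).
set_bind_address() {
  local manifest="$1"
  local tmp
  tmp="$(mktemp)" || return 1
  # The temp file lives outside the manifests directory on purpose: the kubelet
  # treats files it finds in that directory as static Pod manifests.
  sed 's/--bind-address=127\.0\.0\.1/--bind-address=0.0.0.0/' "$manifest" > "$tmp" \
    && cat "$tmp" > "$manifest"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage, one control-plane node at a time (the component label is the kubeadm convention):
#   set_bind_address /etc/kubernetes/manifests/kube-controller-manager.yaml
#   kubectl -n kube-system get pods -l component=kube-controller-manager -w
#   set_bind_address /etc/kubernetes/manifests/kube-scheduler.yaml
```

The kubelet restarts the static Pod on its own once the manifest changes, so watching the Pod return to Running before touching the next node follows the note above.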

2. Create the Service, Endpoints, and ServiceMonitor

kube-controller-monitor.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manage-monitor
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
    targetPort: 10257
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manage-monitor
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.50.238.191
  - ip: 10.50.107.48
  - ip: 10.50.140.151
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager

kube-scheduler-monitor.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler-monitor
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
    targetPort: 10259
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler-monitor
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.50.238.191
  - ip: 10.50.107.48
  - ip: 10.50.140.151
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: https-metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-scheduler

root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/serviceMonitor/kubernetes-cluster# kubectl  apply -f ./
service/kube-controller-manage-monitor created
endpoints/kube-controller-manage-monitor created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
service/kube-scheduler-monitor created
endpoints/kube-scheduler-monitor created
servicemonitor.monitoring.coreos.com/kube-scheduler created
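
To double-check what `kubectl apply` created, the three objects for each component can be listed by the `k8s-app` label set in the manifests. `list_monitor_objects` is a hypothetical helper name; the sketch assumes the ServiceMonitor CRD from the Prometheus Operator is installed, which the apply output implies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Service, Endpoints, and ServiceMonitor for one component.
# $1 = value of the k8s-app label (kube-controller-manager or kube-scheduler).
list_monitor_objects() {
  kubectl -n kube-system get service,endpoints,servicemonitor -l "k8s-app=$1"
}

# Usage:
#   list_monitor_objects kube-controller-manager
#   list_monitor_objects kube-scheduler
```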

3. Prometheus verification
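
One way to confirm that the new targets are being scraped is to query the `up` series through the Prometheus HTTP API. The sketch below assumes Prometheus is reachable at a base URL you supply (for example through a port-forward); the URL and the helper name `prom_query` are illustrative, not from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# $1 = Prometheus base URL (assumption, e.g. http://localhost:9090)
# $2 = PromQL expression
prom_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage: each control-plane node should show up as a series with value "1".
#   prom_query http://localhost:9090 'up{job=~"kube-controller-manager|kube-scheduler"}'
```

The job names come from `jobLabel: k8s-app` in the ServiceMonitors, so they match the `k8s-app` label on each Service.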

4. Create the PrometheusRule resources

kube-controller-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-namespace: cattle-monitoring-system
    prometheus-operator-validated: "true"
  generation: 3
  labels:
    app: rancher-monitoring
    app.kubernetes.io/instance: rancher-monitoring
    app.kubernetes.io/part-of: rancher-monitoring
  name: kube-controller-manager
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: kube-controller-manager.rule
    rules:
    - alert: K8SControllerManagerDown
      expr: absent(up{job="kube-controller-manager"} == 1)
      for: 1m
      labels:
        severity: critical
        cluster: manage-prod
      annotations:
        description: "There is no running K8S controller manager. Deployments and replication controllers are not making progress."
        summary: "No kubernetes controller manager is reachable"
    - alert: K8SControllerManagerDown
      expr: up{job="kube-controller-manager"} == 0
      for: 1m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: "kubernetes controller manager {{ $labels.instance }} is down. {{ $labels.instance }} isn't reachable"
        summary: "kubernetes controller manager is down"
    - alert: K8SControllerManagerUserCPU
      expr: sum(rate(container_cpu_user_seconds_total{pod=~"kube-controller-manager.*",container_name!="POD"}[5m])) by (pod) > 5
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: "kubernetes controller manager {{ $labels.pod }} user CPU time is > 5s"
        summary: "kubernetes controller manager CPU load is high (over 5s)"
    - alert: K8SControllerManagerUseMemory
      expr: sum(container_memory_usage_bytes{pod=~"kube-controller-manager.*",container_name!="POD"}) by (pod) / 1024 / 1024 > 20
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        description: "kubernetes controller manager {{ $labels.pod }} is using more than 20MB of memory"
        summary: "kubernetes controller manager memory usage exceeds 20MB"
    - alert: K8SControllerManagerQueueTimedelay
      expr: histogram_quantile(0.99, sum(rate(workqueue_queue_duration_seconds_bucket{job="kube-controller-manager"}[5m])) by (le)) > 10
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: "kubernetes controller manager {{ $labels.instance }} queue wait time is more than 10s"
        summary: "kubernetes controller manager queue wait time exceeds 10 seconds, please check the controller manager"

kube-scheduler-rules.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  annotations:
    meta.helm.sh/release-namespace: cattle-monitoring-system
    prometheus-operator-validated: "true"
  generation: 3
  labels:
    app: rancher-monitoring
    app.kubernetes.io/instance: rancher-monitoring
    app.kubernetes.io/part-of: rancher-monitoring
  name: kube-scheduler
  namespace: cattle-monitoring-system
spec:
  groups:
  - name: kube-scheduler.rule
    rules:
    - alert: K8SSchedulerDown
      expr: absent(up{job="kube-scheduler"} == 1)
      for: 1m
      labels:
        severity: critical
        cluster: manage-prod
      annotations:
        description: "There is no running K8S scheduler. New pods are not being assigned to nodes."
        summary: "all k8s schedulers are down"
    - alert: K8SSchedulerDown
      expr: up{job="kube-scheduler"} == 0
      for: 1m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        description: "K8S scheduler {{ $labels.instance }} is not running. New pods are not being assigned to nodes."
        summary: "k8s scheduler {{ $labels.instance }} is down"
    - alert: K8SSchedulerUserCPU
      expr: sum(rate(container_cpu_user_seconds_total{pod=~"kube-scheduler.*",container_name!="POD"}[5m])) by (pod) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.pod }} user CPU time is > 1s"
        summary: "kubernetes scheduler CPU load is high (over 1s), current value: {{$value}}"
    - alert: K8SSchedulerUseMemory
      expr: sum(container_memory_usage_bytes{pod=~"kube-scheduler.*",container_name!="POD"}) by (pod) / 1024 / 1024 > 20
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.pod }} is using more than 20MB of memory"
        summary: "kubernetes scheduler memory usage exceeds 20MB, current value: {{$value}}MB"
    - alert: K8SSchedulerPodPending
      expr: sum(scheduler_pending_pods{job="kube-scheduler"}) by (queue) > 5
      for: 5m
      labels:
        severity: info
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler queue {{ $labels.queue }} has more than 5 pending pods"
        summary: "kubernetes scheduler: more than 5 pods cannot be scheduled, current value: {{$value}}"
    - alert: K8SSchedulerPodPending
      expr: sum(scheduler_pending_pods{job="kube-scheduler"}) by (queue) > 10
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler queue {{ $labels.queue }} has more than 10 pending pods"
        summary: "kubernetes scheduler: more than 10 pods cannot be scheduled, current value: {{$value}}"
    - alert: K8SSchedulerPodPending
      expr: sum(rate(scheduler_binding_duration_seconds_count{job="kube-scheduler"}[5m])) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.instance }}"
        summary: "kubernetes scheduler: pods cannot be bound, scheduling has a problem, current value: {{$value}}"
    - alert: K8SSchedulerVolumeSpeed
      expr: sum(rate(scheduler_volume_scheduling_duration_seconds_count{job="kube-scheduler"}[5m])) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.instance }}"
        summary: "kubernetes scheduler: pod volume scheduling is delayed, current value: {{$value}}"
    - alert: K8SSchedulerClientRequestSlow
      expr: histogram_quantile(0.99, sum(rate(rest_client_request_duration_seconds_bucket{job="kube-scheduler"}[5m])) by (verb, url, le)) > 1
      for: 5m
      labels:
        severity: warning
        cluster: manage-prod
      annotations:
        current_value: '{{$value}}'
        description: "kubernetes scheduler {{ $labels.instance }}"
        summary: "kubernetes scheduler: client requests are slow, current value: {{$value}}"

root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/rules# kubectl  apply -f kube-controller-rules.yaml 
prometheusrule.monitoring.coreos.com/kube-apiserver-rules configured
root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/rules# kubectl  apply -f kube-scheduler-rules.yaml 
prometheusrule.monitoring.coreos.com/kube-apiserver-rules configured
root@10-50-238-191:/home/sunwenbo/prometheus-serviceMonitor/rules# 

5. Prometheus verification
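
To check that Prometheus actually loaded the two rule groups, the `/api/v1/rules` endpoint of the Prometheus HTTP API can be searched for the group names used in the rule files. This is a rough sketch: the base URL is an assumption, `loaded_rule_groups` is a hypothetical helper name, and the `grep` relies on the compact JSON that the API returns rather than on a real JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print which of the two rule groups Prometheus has loaded.
# $1 = Prometheus base URL (assumption, e.g. http://localhost:9090)
loaded_rule_groups() {
  curl -s "$1/api/v1/rules" \
    | grep -oE '"name":"kube-(controller-manager|scheduler)\.rule"' \
    | sort -u
}

# Usage: two lines of output mean both groups were picked up.
#   loaded_rule_groups http://localhost:9090
```

If a group is missing, a likely place to look is whether the labels on the PrometheusRule match the rule selector of the Prometheus instance.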
