k8s集群部署vmalert和prometheusalert实现钉钉告警

先决条件

安装以下软件包:git, kubectl, helm, helm-docs,请参阅本教程。

1、安装 helm

wget https://xxx-xx.oss-cn-xxx.aliyuncs.com/helm-v3.8.1-linux-amd64.tar.gz
tar xvzf helm-v3.8.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin
rm -rf linux-amd64

2、安装victoria-metrics-alert

(1)使用以下命令添加 helm chart存储库

helm repo add vm https://victoriametrics.github.io/helm-charts/helm repo update

(2)列出vm/victoria-metrics-alert可供安装的helm版本

helm search repo vm/victoria-metrics-alert -l

(3)victoria-metrics-alert将图表的默认值导出到文件values.yaml

helm show values vm/victoria-metrics-alert > values.yaml

(4)根据环境需要更改values.yaml文件中的值,完整配置参考如下

# Default values for victoria-metrics-alert.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.serviceAccount:# Specifies whether a service account should be createdcreate: true# Annotations to add to the service accountannotations: {}# The name of the service account to use.# If not set and create is true, a name is generated using the fullname templatename:# mount API token to pod directlyautomountToken: trueimagePullSecrets: []rbac:create: truepspEnabled: truenamespaced: falseextraLabels: {}annotations: {}server:name: serverenabled: trueimage:repository: victoriametrics/vmalerttag: "" # rewrites Chart.AppVersionpullPolicy: IfNotPresentnameOverride: ""fullnameOverride: ""## See `kubectl explain poddisruptionbudget.spec` for more## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/podDisruptionBudget:enabled: false# minAvailable: 1# maxUnavailable: 1labels: {}# -- Additional environment variables (ex.: secret tokens, flags) https://github.com/VictoriaMetrics/VictoriaMetrics#environment-variablesenv:[]# - name: VM_remoteWrite_basicAuth_password#   valueFrom:#     secretKeyRef:#       name: auth_secret#       key: passwordreplicaCount: 1# deployment strategy, set to standard k8s defaultstrategy:type: RollingUpdaterollingUpdate:maxSurge: 25%maxUnavailable: 25%# specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing/terminating# 0 is the standard k8s defaultminReadySeconds: 0# vmalert reads metrics from source, next section represents its configuration. It can be any service which supports# MetricsQL or PromQL.datasource:url: "http://192.168.47.9:8481/select/0/prometheus/"basicAuth:username: ""password: ""remote:write:url: ""read:url: ""notifier:alertmanager:url: "http://192.168.112.68:9093"extraArgs:envflag.enable: "true"envflag.prefix: VM_loggerFormat: json# Additional hostPath mountsextraHostPathMounts:[]# - name: certs-dir#   mountPath: /etc/kubernetes/certs#   subPath: ""#   hostPath: /etc/kubernetes/certs#   readOnly: true# Extra Volumes for the podextraVolumes:[]#- name: example#  configMap:#    name: example# Extra Volume Mounts for the containerextraVolumeMounts:[]# - name: example#   mountPath: /exampleextraContainers:[]#- name: config-reloader#  image: reloader-imageservice:annotations: {}labels: {}clusterIP: ""## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips##externalIPs: []loadBalancerIP: ""loadBalancerSourceRanges: []servicePort: 8880type: ClusterIP# Ref: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip# externalTrafficPolicy: "local"# healthCheckNodePort: 0ingress:enabled: falseannotations: {}#   kubernetes.io/ingress.class: nginx#   kubernetes.io/tls-acme: 'true'extraLabels: {}hosts: []#   - name: vmselect.local#     path: /select#     port: httptls: []#   - secretName: vmselect-ingress-tls#     hosts:#       - vmselect.local# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress# ingressClassName: nginx# -- pathType is only for k8s >= 1.1=pathType: PrefixpodSecurityContext: {}# fsGroup: 2000securityContext:{}# capabilities:#   drop:#   - ALL# readOnlyRootFilesystem: true# runAsNonRoot: true# runAsUser: 1000resources:{}# We usually recommend not to specify default resources and to leave this as a conscious# choice for the user. This also increases chances charts run on environments with little# resources, such as Minikube. If you do want to specify resources, uncomment the following# lines, adjust them as necessary, and remove the curly braces after 'resources:'.# limits:#   cpu: 100m#   memory: 128Mi# requests:#   cpu: 100m#   memory: 128Mi# Annotations to be added to the deploymentannotations: {}# labels to be added to the deploymentlabels: {}# Annotations to be added to podpodAnnotations: {}podLabels: {}nodeSelector: {}priorityClassName: ""tolerations: []affinity: {}# vmalert alert rules configuration configuration:# use existing configmap if specified# otherwise .config values will be usedconfigMap: ""config:alerts:groups:- name: 磁盘挂载错误rules:- alert: 磁盘挂载错误expr: mount_error == 1for: 1mlabels:level: 1severity: warningannotations:description: "{{$labels.job}}链{{$labels.instance}}节点磁盘挂载错误!"serviceMonitor:enabled: falseextraLabels: {}annotations: {}
#    interval: 15s
#    scrapeTimeout: 5s# -- Commented. HTTP scheme to use for scraping.
#    scheme: https# -- Commented. TLS configuration to use when scraping the endpoint
#    tlsConfig:
#      insecureSkipVerify: truealertmanager:enabled: truereplicaCount: 1podMetadata:labels: {}annotations: {}image: prom/alertmanagertag: v0.20.0retention: 120hnodeSelector: {}priorityClassName: ""resources: {}tolerations: []imagePullSecrets: []podSecurityContext: {}extraArgs: {}# key: value# external URL, that alertmanager will expose to receiversbaseURL: ""# use existing configmap if specified# otherwise .config values will be usedconfigMap: ""config:global:resolve_timeout: 5mroute:# default receiverreceiver: ops_notify# tag to group bygroup_by: [alertname]# How long to initially wait to send a notification for a group of alertsgroup_wait: 30s# How long to wait before sending a notification about new alerts that are added to a groupgroup_interval: 60s# How long to wait before sending a notification again if it has already been sent successfully for an alertrepeat_interval: 1hreceivers:- name: ops_notifywebhook_configs:- url: http://192.168.157.59:8080/prometheusalert?type=dd&tpl=prometheus-dd&split=falsesend_resolved: trueinhibit_rules:- source_match:severity: 'warning'target_match:severity: 'warning'equal: ['alertname', 'job']templates: {}#  alertmanager.tmpl: |-service:annotations: {}type: ClusterIPport: 9093# if you want to force a specific nodePort. Must be use with service.type=NodePort# nodePort:ingress:enabled: falseannotations: {}#   kubernetes.io/ingress.class: nginx#   kubernetes.io/tls-acme: 'true'extraLabels: {}hosts: []#   - name: alertmanager.local#     path: /#     port: webtls: []#   - secretName: alertmanager-ingress-tls#     hosts:#       - alertmanager.local# For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName# See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress# ingressClassName: nginx# -- pathType is only for k8s >= 1.1=pathType: PrefixpersistentVolume:# -- Create/use Persistent Volume Claim for alertmanager component. Empty dir if falseenabled: false# -- Array of access modes. Must match those of existing PV or dynamic provisioner. Ref: [http://kubernetes.io/docs/user-guide/persistent-volumes/](http://kubernetes.io/docs/user-guide/persistent-volumes/)accessModes:- ReadWriteOnce# -- Persistant volume annotationsannotations: {}# -- StorageClass to use for persistent volume. Requires alertmanager.persistentVolume.enabled: true. If defined, PVC created automaticallystorageClass: ""# -- Existing Claim name. If defined, PVC must be created manually before volume will be boundexistingClaim: ""# -- Mount path. Alertmanager data Persistent Volume mount root path.mountPath: /data# -- Mount subpathsubPath: ""# -- Size of the volume. Better to set the same as resource limit memory property.size: 50Mi

(5)使用命令测试安装:

helm install vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics --debug --dry-run

(6)使用以下命令安装

helm install vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics

(7)通过运行以下命令获取 pod 列表

kubectl get pods -A | grep 'alert'

(8)通过运行以下命令获取应用程序

helm list -f vmalert -n victoria-metrics

(9)使用命令查看应用程序版本的历史记录vmalert

helm history vmalert -n victoria-metrics

(10)更新配置

cd /root/vmalert
#修改value.yaml文件
helm upgrade vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics

(11)查看service

kubectl get svc -n victoria-metrics

3、安装prometheusalert

(1)使用helm部署

git clone https://github.com/feiyu563/PrometheusAlert.git
cd PrometheusAlert/example/helm/prometheusalert
#如需修改配置文件,请更新config中的app.conf
helm install -n victoria-metrics prometheus-alert .

(2)values.yaml配置文件参考

cat /root/PrometheusAlert/example/helm/prometheusalert/values.yaml

# Default values for prometheusalert.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.global:imagePullSecrets: []# - name: "registry-secret"replicaCount: 1image:# 支持配置自定义模版需要重出镜像,或者使用本人构建镜像:lusson/prometheusalert:v1.0repository: feiyu563/prometheus-alert:v4.8pullPolicy: IfNotPresentnameOverride: ""
fullnameOverride: ""service:type: ClusterIPport: 8080ingress:enabled: trueannotations: {}# kubernetes.io/ingress.class: nginx# kubernetes.io/tls-acme: "true"hosts:- host: prometheusalert.xxxxx.compaths: ["/"]tls: []resources:limits:cpu: 1000mmemory: 1024Mirequests:cpu: 100mmemory: 128MinodeSelector: {}tolerations: []affinity: {}

(3)app.conf配置参考

cat /root/PrometheusAlert/example/helm/prometheusalert/config/app.conf

(4)ingress.yaml配置参考

cat /root/PrometheusAlert/example/helm/prometheusalert/templates/ingress.yaml{{- if .Values.ingress.enabled -}}
{{- $fullName := include "prometheusalert.fullname" . -}}
{{- $svcPort := .Values.service.port -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:name: {{ $fullName }}labels:
{{ include "prometheusalert.labels" . | indent 4 }}{{- with .Values.ingress.annotations }}annotations:{{- toYaml . | nindent 4 }}{{- end }}
spec:
{{- if .Values.ingress.tls }}tls:{{- range .Values.ingress.tls }}- hosts:{{- range .hosts }}- {{ . | quote }}{{- end }}secretName: {{ .secretName }}{{- end }}
{{- end }}rules:{{- range .Values.ingress.hosts }}- host: {{ .host | quote }}http:paths:{{- range .paths }}- path: {{ . }}pathType: Prefixbackend:service:name: {{ $fullName }}port:number: {{ $svcPort }}{{- end }}{{- end }}
{{- end }}

(5)更新配置

cd /root/PrometheusAlert/example/helm/prometheusalert
helm upgrade -n victoria-metrics prometheus-alert .

(6)重启pod

删除Pod
helm delete prometheus-alert -n victoria-metrics查看pods和service
kubectl get pods -n victoria-metrics
kubectl get svc -n victoria-metrics重新安装
helm install -n victoria-metrics prometheus-alert .查看pods和service
kubectl get pods -n victoria-metrics
kubectl get svc -n victoria-metrics

(1)告警测试

模板内容:

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
{{if eq $v.status "resolved"}}
## [巡检恢复信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.level}}
###### 开始时间:{{$v.startsAt}}
###### 故障主机:{{$v.labels.instance}}
##### {{$v.annotations.description}}
{{else}}
## [巡检告警信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.level}}
###### 开始时间:{{$v.startsAt}}
###### 故障主机:{{$v.labels.instance}}
##### {{$v.annotations.description}}
{{end}}
{{ end }}

(2)查看日志

(3)查看钉钉告警

参考文档:https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-alert

参考文档:https://github.com/feiyu563/PrometheusAlert/tree/master/example/helm/prometheusalert

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/38784.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

12 注册登录

12 注册登录 整体概述 使用数据库连接池实现服务器访问数据库的功能,使用POST请求完成注册和登录的校验工作。 本文内容 介绍同步实现注册登录功能,具体涉及到流程图、载入数据库表、提取用户名和密码、注册登录流程与页面跳转的代码实现。 流程图&a…

六、Linux系统下,文件操作命令都有哪些?

总括: 创建文件/文件夹:touch; 查看:cat/more; 复制:copy; 移动文件/文件夹:mv; 删除:rm; 1、创建文件 (1)语法&#x…

阿里云与中国中医科学院合作,推动中医药行业数字化和智能化发展

据相关媒体消息,阿里云与中国中医科学院的合作旨在推动中医药行业的数字化和智能化发展。随着互联网的进步和相关政策的支持,中医药产业受到了国家的高度关注。这次合作将以“互联网 中医药”为载体,致力于推进中医药文化的传承和创新发展。…

AIGC绘画:基于Stable Diffusion进行AI绘图

文章目录 AIGC深度学习模型绘画系统stable diffusion简介stable diffusion应用现状在线网站云端部署本地部署Stable Diffusion AIGC深度学习模型绘画系统 stable diffusion简介 Stable Diffusion是2022年发布的深度学习文本到图像生成模型,它主要用于根据文本的描述…

通俗讲解-动量梯度下降法原理与代码实例

本站原创文章,转载请说明来自《老饼讲解-BP神经网络》bp.bbbdata.com 目录 一.动量梯度下降法介绍 1.1 动量梯度下降法简介与思想 1.2 动量梯度下降法的算法流程 二.动量梯度下降法代码实例 2.1 动量梯度下降法实例代码 一.动量梯度下降法介绍…

2023年上半年数学建模竞赛题目汇总与难度分析

2023年上半年数学建模竞赛题目汇总与难度分析 ​由于近年来国赛ABC题出题方式漂浮不定,没有太大的定性,目前总体的命题方向为,由之前的单一模型问题变为数据分析评价优化或者预测类题目是B、C题的主要命题方向。为了更好地把握今年命题的主方…

vue3-vuex

一、概念 (1)Vuex 是一个状态和数据管理的框架,负责管理项目中多个组件和多个页面共享的数据。 (2)在开发项目的时候,我们就会把数据分成两个部分,一种数据是在某个组件内部使用,我…

【C++】STL案例1-评委打分

0.前言 1.系统自动生成的评委评分代码&#xff1a; #include <iostream> using namespace std; #include <deque> #include <vector> #include <algorithm> #include <string>//选手类 class Player { public:Player(string name, float score)…

机器学习深度学习——机器翻译(序列生成策略)

&#x1f468;‍&#x1f393;作者简介&#xff1a;一位即将上大四&#xff0c;正专攻机器学习的保研er &#x1f30c;上期文章&#xff1a;机器学习&&深度学习——seq2seq实现机器翻译&#xff08;详细实现与原理推导&#xff09; &#x1f4da;订阅专栏&#xff1a;机…

最新AI系统ChatGPT网站程序源码+搭建教程/公众号/H5端/安装配置教程/完整知识库

1、前言 SparkAi系统是基于国外很火的ChatGPT进行开发的Ai智能问答系统。本期针对源码系统整体测试下来非常完美&#xff0c;可以说SparkAi是目前国内一款的ChatGPT对接OpenAI软件系统。 那么如何搭建部署AI创作ChatGPT&#xff1f;小编这里写一个详细图文教程吧&#xff01;…

基于IDE Eval Resetter延长IntelliJ IDEA等软件试用期的方法(包含新版本软件的操作方法)

本文介绍基于IDE Eval Resetter插件&#xff0c;对集成开发环境IntelliJ IDEA等JetBrains公司下属的多个开发软件&#xff0c;加以试用期延长的方法。 我们这里就以IntelliJ IDEA为例&#xff0c;来介绍这一插件发挥作用的具体方式。不过&#xff0c;需要说明使用IDE Eval Rese…

Hlang社区-社区主页实现

文章目录 前言首页结构固定导航栏左侧导航itemitem标志头部推荐文章展示ITEM实现ToolTip完整实现首页完整实现前言 废话不多说,直接看到效果,这里的话是我们社区主页,不是产品宣传主页哈: 是的也许你已经发现了这个页面和某个网站长得贼像。没错是这样的,这个布局我确实…

vue3+vite+pinia

目录 一、项目准备 1.1、Vite搭建项目 1.2、vue_cli创建项目 二、组合式API(基于setup) 2.1、ref 2.2、reactive 2.3、toRefs 2.4、watch和watchEffect 2.5、computed 2.6、生命周期钩子函数 2.7、setup(子组件)的第一个参数-props 2.8、setup(子组件)的第二个参数…

STM32 CubeMX (Freertos任务:创建、删除、挂起、恢复)

STM32 CubeMX Freertos STM32 CubeMX &#xff08;Freertos任务&#xff1a;创建、删除、挂起、恢复&#xff09; STM32 CubeMX Freertos前言一、STM32 CubeMX 配置时钟树配置使能串口&#xff0c;用于用于检查实验现象使用STM32 CubeMX 库&#xff0c;配置Freertos创建任务 二…

概念解析 | 长尾分布:从无处不在的‘少数派’中挖掘价值

注1:本文系“概念解析”系列之一,致力于简洁清晰地解释、辨析复杂而专业的概念。本次辨析的概念是:长尾分布(Long-Tail Distribution)。 揭秘长尾分布:从无处不在的‘少数派’中挖掘价值 What is a Long Tail Distribution? (Definition & Example) - Statology 一、背…

神经网络基础-神经网络补充概念-02-逻辑回归

概念 逻辑回归是一种用于二分分类问题的统计学习方法&#xff0c;尽管名字中带有"回归"一词&#xff0c;但实际上它用于分类任务。逻辑回归的目标是根据输入特征来预测数据点属于某个类别的概率&#xff0c;然后将概率映射到一个离散的类别标签。 逻辑回归模型的核…

多主题自适应知识变现博客论坛,支持docker一键部署

iblog 给大家推荐一个多主题自适应&#xff0c;支持付费收款的博客论坛系统&#xff0c;支持docker一键部署&#xff0c;支持企业微信通知。 前端 多主题 自适应 个人页 后端 H2 console 运行命令 docker run -d --name iblog --restartalways -p 8080:8080 -e consoletrue …

RabbitMQ简单使用

RabbitMq是一个消息中间件&#xff1a;它接收消息、转发消息。你可以把它理解为一个邮局&#xff1a;当你向邮箱里寄出一封信后&#xff0c;邮递员们就能最终将信送到收信人手中。 RabbitMq、消息相关术语如下&#xff1a; 生产者&#xff1a;生产者只发送消息&#xff0c;发…

Postman接口自动化测试实例

一.实例背景 在实际业务中&#xff0c;经常会出现让用户输入用户密码进行验证的场景。而为了安全&#xff0c;一般都会先请求后台服务器获取一个随机数做为盐值&#xff0c;然后将盐值和用户输入的密码通过前端的加密算法生成加密后串传给后台服务器&#xff0c;后台服务器接到…

针对Android项目蓝牙如何学习

一、概述(Overview) 蓝牙是一种专有的开放式无线技术标准,用于在固定和移动设备之间进行短距离数据交换(使用2400–2480 MHz ISM波段的短波长无线电传输),从而创建具有高度安全性的个人局域网(PANs)。由电信供应商爱立信(telecoms vendor Ericsson)于1994年创建,[1…