Prometheus + Alertmanager + webhook: alerting through a DingTalk robot

Versions: CentOS 7.9, Python 3.9.5, Alertmanager 0.25.0, Prometheus 2.46.0

Install Alertmanager and Prometheus, then configure the webhook

# Extract (in /app/jiankong, where the later configs expect the binaries):
cd /app/jiankong
tar -xvf alertmanager-0.25.0.linux-amd64.tar.gz
tar -xvf prometheus-2.46.0.linux-amd64.tar.gz
mv alertmanager-0.25.0.linux-amd64 alertmanager
mv prometheus-2.46.0.linux-amd64 prometheus

# Install Python 3.9.5 from source
yum install -y zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gcc make libffi-devel
cd /app
wget https://www.python.org/ftp/python/3.9.5/Python-3.9.5.tgz
tar -xvf Python-3.9.5.tgz
cd Python-3.9.5
./configure --prefix=/usr/local/python3
make && make install
ln -s /usr/local/python3/bin/python3.9 /usr/bin/python3
ln -s /usr/local/python3/bin/pip3.9 /usr/bin/pip3
pip3 install -U pip

# Configure the webhook
cd /app
mkdir webhook
cd webhook
yum install -y epel-release
yum install -y openssl11 openssl11-devel
# urllib3 2.x requires OpenSSL 1.1.1+, but CentOS 7's system OpenSSL is 1.0.2, so pin 1.26.x
pip3 install urllib3==1.26.15
pip3 install --upgrade cryptography
pip3 install --upgrade pyopenssl
pip3 install --upgrade requests
pip3 install flask
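The urllib3 pin above exists because urllib3 2.x refuses to import when Python's `ssl` module is linked against OpenSSL older than 1.1.1. The script below (a hypothetical `check_tls.py`) is a small sanity check. Run it with the same interpreter the webhook uses. It reports which OpenSSL that interpreter links against and whether the installed urllib3 major version can work with it. It only reads package metadata, so it does not need urllib3 to import successfully.

```python
# check_tls.py (hypothetical name): run with /usr/local/python3/bin/python3.9
import ssl
from importlib import metadata


def openssl_at_least(major: int, minor: int, fix: int) -> bool:
    """True if the OpenSSL this Python links against is >= major.minor.fix."""
    # OPENSSL_VERSION_INFO looks like (1, 0, 2, 11, 15) for OpenSSL 1.0.2k
    return tuple(ssl.OPENSSL_VERSION_INFO[:3]) >= (major, minor, fix)


def urllib3_major_version():
    """Major version of the installed urllib3 (from metadata), or None if absent."""
    try:
        return int(metadata.version("urllib3").split(".")[0])
    except metadata.PackageNotFoundError:
        return None


def urllib3_compatible() -> bool:
    """urllib3 1.x works with old OpenSSL; 2.x needs OpenSSL 1.1.1 or newer."""
    major = urllib3_major_version()
    if major is None or major < 2:
        return True
    return openssl_at_least(1, 1, 1)


print("ssl module linked against:", ssl.OPENSSL_VERSION)
print("urllib3 major version:", urllib3_major_version())
print("urllib3 usable with this OpenSSL:", urllib3_compatible())
```

If the last line prints `False`, reinstall `urllib3==1.26.15` as shown above.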
vim /app/webhook/main.py
#!/usr/bin/python3
# coding: utf-8
import json
from datetime import datetime

import requests
from flask import Flask, request

app = Flask(__name__)


# Default route: forward each alert to the WeChat sender service
@app.route('/', methods=['POST'])
def send_wechat():
    data = json.loads(request.get_data().decode('utf-8'))
    for alert in data.get('alerts', []):
        webchat(alert)
    return "success\n"


# DingTalk robot #1
@app.route('/dingtalk', methods=['POST'])
def send_dingtalk():
    data = json.loads(request.get_data().decode('utf-8'))
    access_token = 'dxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxb'
    for alert in data.get('alerts', []):
        dingding_sendmsg(access_token, dingtalk_msgformat(alert))
    return "success\n"


# DingTalk robot used by Alertmanager (see alertmanager.yml)
@app.route('/prometheus_dingtalk', methods=['POST'])
def send_prodingtalk():
    data = json.loads(request.get_data().decode('utf-8'))
    access_token = '8xxxxxxxxxxxxxxxxxxxxxxxxxxx0'  # DingTalk robot access token
    for alert in data.get('alerts', []):
        dingding_sendmsg(access_token, dingtalk_msgformat(alert))
    return "success\n"


# Generic sender: the request body is {"access_token": "...", "content": "..."}
@app.route('/dingding_send', methods=['POST'])
def dingding_send():
    data = json.loads(request.get_data().decode('utf-8'))
    dingding_sendmsg(data["access_token"], data["content"])
    return "ok"


def webchat(data):
    url = 'http://192.168.60.xxx:4567/send'
    # /app/webhook/users holds one "name:account" pair per line; blank lines are skipped
    with open('/app/webhook/users', encoding='utf-8') as f:
        usernames = dict(line.strip().split(':', 1) for line in f if line.strip())
    users_cn = data.get('annotations').get('sendUsers')
    users_list = [usernames[name] for name in users_cn.split(',') if usernames.get(name)]
    users = ','.join(users_list)
    message = '''status: %s
alertlevel: %s
alertname: %s
message: %s
startsAt: %s
endsAt: %s
sent at: %s
sent to: %s''' % (data.get('status'), data.get('annotations').get('severity'),
                 data.get('labels').get('alertname'), data.get('annotations').get('message'),
                 data.get('startsAt'), data.get('endsAt'),
                 datetime.now().isoformat(), users_cn)
    params = {'tos': users, 'content': message}
    requests.post(url=url, data=params)


def dingtalk_msgformat(data):
    message = f'''status: {data.get('status')}
alertlevel: {data.get('annotations').get('severity')}
alertname: {data.get('labels').get('alertname')}
message: {data.get('annotations').get('message')}
startsAt: {data.get('startsAt')}
endsAt: {data.get('endsAt')}
sent at: {datetime.now().isoformat()}
sent to: {data.get('annotations').get('sendUsers')}'''
    return message


def dingding_sendmsg(access_token, content):
    headers = {
        'content-type': 'application/json',
        'Accept': 'application/json;charset=utf-8',
    }
    payload = {
        "msgtype": "text",
        "text": {"content": content},
        "at": {"atMobiles": [], "isAtAll": False},  # atMobiles is a list of phone numbers
    }
    webhook_url = 'https://oapi.dingtalk.com/robot/send?access_token=%s' % access_token
    # Network/HTTP errors propagate, so Flask logs them and returns a 500
    response = requests.post(webhook_url, data=json.dumps(payload), headers=headers, timeout=10)
    response.raise_for_status()


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
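You may want to check the message formatting without actually posting to DingTalk. The sketch below runs a hand-written sample Alertmanager payload through a copy of the same formatting and payload logic, offline. The payload values (instance, job, `sendUsers` names and accounts) are hypothetical. The field names (`status`, `alerts[].labels`, `annotations`, `startsAt`, `endsAt`) follow the Alertmanager webhook body. It also shows a blank-line-safe parser for the `name:account` users file that `webchat()` reads.

```python
# Offline check of the webhook's formatting; all sample values are hypothetical.
import json
from datetime import datetime

SAMPLE_PAYLOAD = {
    "version": "4",
    "status": "firing",
    "receiver": "wechat",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "InstanceDown", "instance": "192.0.2.10:9100",
                       "job": "node", "severity": "page"},
            "annotations": {"severity": "Warning", "sendUsers": "ops-a,ops-b",
                            "message": "192.0.2.10:9100 from job node is down"},
            "startsAt": "2024-05-31T10:25:00Z",
            "endsAt": "0001-01-01T00:00:00Z",  # Alertmanager zeroes endsAt while firing
        }
    ],
}


def dingtalk_msgformat(alert):
    """Mirror of main.py's formatter: one text block per alert."""
    annotations = alert.get("annotations", {})
    return "\n".join([
        f"status: {alert.get('status')}",
        f"alertlevel: {annotations.get('severity')}",
        f"alertname: {alert.get('labels', {}).get('alertname')}",
        f"message: {annotations.get('message')}",
        f"startsAt: {alert.get('startsAt')}",
        f"endsAt: {alert.get('endsAt')}",
        f"sent at: {datetime.now().isoformat()}",
        f"sent to: {annotations.get('sendUsers')}",
    ])


def build_dingtalk_payload(content):
    """The "text" message body that main.py posts to the robot URL."""
    return {"msgtype": "text", "text": {"content": content},
            "at": {"atMobiles": [], "isAtAll": False}}


def parse_users(lines):
    """Parse "name:account" lines (the /app/webhook/users format), skipping blanks."""
    users = {}
    for line in lines:
        line = line.strip()
        if line:
            name, _, account = line.partition(":")
            users[name] = account
    return users


def resolve_recipients(send_users, users):
    """Map the comma-separated sendUsers annotation to known accounts."""
    return ",".join(users[n] for n in send_users.split(",") if n in users)


# Simulate what the Flask route receives: raw bytes from request.get_data()
raw = json.dumps(SAMPLE_PAYLOAD).encode("utf-8")
data = json.loads(raw.decode("utf-8"))
for alert in data.get("alerts", []):
    print(json.dumps(build_dingtalk_payload(dingtalk_msgformat(alert)),
                     ensure_ascii=False, indent=2))

users = parse_users(["ops-a:1001\n", "\n", "ops-b:1002\n"])
print(resolve_recipients("ops-a,ops-b,nobody", users))
```

Unknown names in `sendUsers` are silently dropped, the same way `webchat()` skips names missing from the users file.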
# Run it as a service: /etc/systemd/system/webhook.service
[Unit]
Description=Webhook wechat for prometheus
After=network.target

[Service]
#Restart=always
#RestartSec=30
#Type=simple
WorkingDirectory=/app/webhook
ExecStart=/usr/local/python3/bin/python3.9 /app/webhook/main.py

[Install]
WantedBy=multi-user.target

Configure Alertmanager and Prometheus

# Create the data storage directories
mkdir -p /data/prometheus/prometheus /data/prometheus/alertmanager
[root@rabbit4-64 prometheus]# ls
alertmanager  prometheus
[root@rabbit4-64 data]# useradd prometheus
[root@rabbit4-64 data]# chown -R prometheus:prometheus /data/prometheus
[root@rabbit4-64 prometheus]# ll
total 0
drwxr-xr-x. 2 prometheus prometheus 6 May 31 10:25 alertmanager
drwxr-xr-x. 2 prometheus prometheus 6 May 31 10:25 prometheus

# Alertmanager configuration
[root@rabbit3-63 alertmanager]# cat /app/jiankong/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'cluster', 'service', 'instance']
  group_wait: 10s
  group_interval: 5s
  repeat_interval: 1h
  receiver: 'wechat'
receivers:
  - name: 'wechat'
    webhook_configs:
      - url: 'http://192.168.xxxxx:5000/prometheus_dingtalk'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
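The two less obvious settings above are `route.group_by` and `inhibit_rules`. The matching logic can be modeled roughly as follows. This is a simplified, hypothetical illustration, not Alertmanager's implementation: there are no timers, so `group_wait`/`group_interval`/`repeat_interval` are ignored. Note one consequence for this setup. Inhibition matches alert *labels*, and the rules later in this article set the label `severity: page` (with "Warning" only in an annotation). The critical/warning rule as written would therefore not mute any of them, as the third sample alert shows.

```python
# Simplified model of group_by and inhibit_rules; alerts below are hypothetical.
from collections import defaultdict

GROUP_BY = ["alertname", "cluster", "service", "instance"]
INHIBIT_RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}


def group_alerts(alerts, group_by):
    """Alerts with identical values for every group_by label share one group
    (and therefore one webhook notification). A missing label counts as ""."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


def matches(labels, required):
    return all(labels.get(k) == v for k, v in required.items())


def is_inhibited(target, firing, rule):
    """Mute `target` if another firing alert matches source_match and agrees
    with it on every label listed in `equal` (missing == empty)."""
    if not matches(target["labels"], rule["target_match"]):
        return False
    for source in firing:
        if source is target or not matches(source["labels"], rule["source_match"]):
            continue
        if all(source["labels"].get(n, "") == target["labels"].get(n, "")
               for n in rule["equal"]):
            return True
    return False


alerts = [
    {"labels": {"alertname": "HostDown", "instance": "a:9100", "severity": "critical"}},
    {"labels": {"alertname": "HostDown", "instance": "a:9100", "severity": "warning"}},
    {"labels": {"alertname": "HostDown", "instance": "b:9100", "severity": "page"}},
]
groups = group_alerts(alerts, GROUP_BY)
print(len(groups))                                              # 2
print([is_inhibited(a, alerts, INHIBIT_RULE) for a in alerts])  # [False, True, False]
```

Because `dev` is absent from both alerts, it compares as empty on both sides and does not block the inhibition.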
# Run it as a service
[root@rabbit3-63 alertmanager]# cat /etc/systemd/system/alertmanager.service
[Unit]
Description=Prometheus Alert Manager
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
LimitNOFILE=65535
WorkingDirectory=/app/jiankong/alertmanager
ExecStart=/app/jiankong/alertmanager/alertmanager \
  --config.file=/app/jiankong/alertmanager/alertmanager.yml \
  --storage.path=/data/prometheus/alertmanager
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=60s

[Install]
WantedBy=multi-user.target

# Prometheus configuration
mkdir /app/jiankong/prometheus/rules
cd  /app/jiankong/prometheus/
chown -R prometheus.prometheus rules/
[root@rabbit4-64 prometheus]# cat prometheus.yml
global:
  scrape_interval: 60s      # scrape targets every 60 seconds (the default is 1 minute)
  evaluation_interval: 60s  # evaluate rules every 60 seconds (the default is 1 minute)
  scrape_timeout: 55s       # must not exceed scrape_interval
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.70.xx:9093   # Alertmanager address
rule_files:
  - '/app/jiankong/prometheus/rules/*.rules'
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: "node"
    static_configs:
      - targets: ["192.168.x0.xx:9100", "192.168.x0.xx:9100"]
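Two timing details follow from the global block above. First, Prometheus refuses to load a config whose `scrape_timeout` exceeds `scrape_interval`. Second, a rule's `for:` is only checked at evaluation time. With `evaluation_interval: 60s`, the `up == 0` rule with `for: 30s` therefore turns from pending to firing one evaluation later, about 60 s after it is first seen, not 30 s. The sketch below is hypothetical. `parse_duration` is a simplified helper that only accepts single-unit values such as `55s` or `5m`, while Prometheus also accepts forms like `1h30m`. The function assumes the expression stays true.

```python
# Sketch: check the global timings and estimate when a "for:" rule fires.
import math

UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration(text):
    """Simplified: single-unit durations only ("55s", "5m", "250ms")."""
    # "ms" is tested before "s", since "250ms" also ends with "s"
    for unit in ("ms", "s", "m", "h", "d", "w"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unsupported duration: {text!r}")


def check_global(cfg):
    """Mirror Prometheus's rule that scrape_timeout <= scrape_interval."""
    interval = parse_duration(cfg["scrape_interval"])
    timeout = parse_duration(cfg["scrape_timeout"])
    if timeout > interval:
        raise ValueError("scrape_timeout must not exceed scrape_interval")
    return interval, timeout


def fire_delay(for_seconds, evaluation_interval):
    """Seconds from the first evaluation that sees the expression true (alert
    goes pending) to the evaluation where pending time >= for (alert fires)."""
    return math.ceil(for_seconds / evaluation_interval) * evaluation_interval


GLOBAL = {"scrape_interval": "60s", "evaluation_interval": "60s", "scrape_timeout": "55s"}
interval, timeout = check_global(GLOBAL)
evaluation = parse_duration(GLOBAL["evaluation_interval"])
print(interval, timeout)                               # 60.0 55.0
print(fire_delay(parse_duration("30s"), evaluation))   # 60.0 for up == 0 with for: 30s
print(fire_delay(parse_duration("5m"), evaluation))    # 300.0 for the 5m host rules
```

On top of this delay, it can take up to one more scrape interval before a dead target's `up` series actually drops to 0.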
# Run it as a service
[root@localhost alertmanager]# cat /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
LimitNOFILE=65535
WorkingDirectory=/app/jiankong/prometheus
ExecStart=/app/jiankong/prometheus/prometheus --log.level=info \
  --config.file=/app/jiankong/prometheus/prometheus.yml \
  --storage.tsdb.retention.time=10d \
  --storage.tsdb.path=/data/prometheus/prometheus
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=60s

[Install]
WantedBy=multi-user.target

# Alerting rules
[root@rabbit4-64 rules]# ls
linux_hosts.rules  up.rules
[root@rabbit4-64 rules]# cat up.rules
groups:
  - name: instance_status
    rules:
      - alert: System or service down, ops please check urgently!!!
        expr: up == 0
        for: 30s
        labels:
          severity: page
        annotations:
          sendUsers: "李处长"
          message: "{{$labels.instance}} from job {{$labels.job}} has failed to be scraped for more than 30 seconds."
          severity: "Warning"
[root@rabbit4-64 rules]# cat linux_hosts.rules
groups:
  - name: linux_host_status
    rules:
      - alert: data_node_cpu_too_load_high
        expr: (1 - avg by (instance, job) (irate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100 > 60
        for: 5m
        labels:
          severity: page
        annotations:
          sendUsers: "李处长,陈处长"
          message: "{{ $labels.instance }} from job {{ $labels.job }}: CPU usage has been above 60% for five minutes, current value: {{ $value }}. Please check the host's applications!"
          severity: "Warning"
      - alert: node_filesystem_usage_high
        expr: node_filesystem_free_bytes{device !~'tmpfs', fstype!~'rootfs'} / node_filesystem_size_bytes < 0.15
        for: 5m
        labels:
          severity: page
        annotations:
          sendUsers: "李处长,陈处长"
          message: "{{ $labels.instance }} from job {{ $labels.job }}: disk {{$labels.device}} mounted at {{$labels.mountpoint}} is more than 85% full, current free ratio: {{ $value }}. Please check the host's disk!"
          severity: "Warning"
      - alert: data_node_memory_too_usage_high
        expr: (node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Buffers_bytes - node_memory_Cached_bytes) / node_memory_MemTotal_bytes * 100 > 95
        for: 5m
        labels:
          severity: page
        annotations:
          sendUsers: "李处长,陈处长"
          message: "{{ $labels.instance }} from job {{ $labels.job }}: memory usage has been above 95% for five minutes, current value: {{ $value }}. Please check the host's applications!"
          severity: "Warning"
      - alert: yunclassroom_process_memory_use_high
        expr: namedprocess_namegroup_memory_bytes{job="yunbanji_web",memtype="resident"}/1024/1024 > 4000
        for: 5m
        labels:
          severity: page
        annotations:
          sendUsers: "李处长,陈处长"
          message: "{{ $labels.instance }}: process group {{ $labels.groupname }} has used more than 4000 MiB of resident memory for five minutes, current value: {{ $value }}. Please check the host's applications!"
          severity: "Warning"
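The CPU, disk and memory expressions above are plain arithmetic over node_exporter series. This worked sketch reproduces them with made-up (hypothetical) sample values. It re-implements the arithmetic only and is not how Prometheus evaluates PromQL internally. `irate()` looks at the last two samples inside its `[5m]` window. With `scrape_interval: 60s`, those two samples are about 60 s apart.

```python
# Arithmetic behind linux_hosts.rules, using hypothetical sample values.


def irate(prev, last):
    """Per-second rate from the last two (timestamp, value) counter samples."""
    (t1, v1), (t2, v2) = prev, last
    increase = v2 if v2 < v1 else v2 - v1  # a drop means the counter was reset
    return increase / (t2 - t1)


def cpu_usage_percent(idle_samples_per_cpu):
    """(1 - avg over CPUs of irate(idle)) * 100, as in the CPU rule."""
    rates = [irate(prev, last) for prev, last in idle_samples_per_cpu]
    return (1 - sum(rates) / len(rates)) * 100


def disk_nearly_full(free_bytes, size_bytes):
    """free / size < 0.15, i.e. usage above 85%."""
    return free_bytes / size_bytes < 0.15


def memory_used_percent(total, free, buffers, cached):
    """(MemTotal - MemFree - Buffers - Cached) / MemTotal * 100."""
    return (total - free - buffers - cached) / total * 100


# Two CPUs scraped 60 s apart; each spent 24 s of that minute idle.
cpu = cpu_usage_percent([((0, 1000.0), (60, 1024.0)), ((0, 2000.0), (60, 2024.0))])
print(f"CPU {cpu:.1f}% -> alert condition true: {cpu > 60}")
print("disk alert:", disk_nearly_full(10 * 2**30, 100 * 2**30))  # 10% free
print(f"memory {memory_used_percent(16000, 200, 100, 300):.2f}%")
```

The CPU rule uses a strict `> 60`, so a host sitting at exactly 60% does not trigger it.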

Start the services

systemctl daemon-reload
systemctl start webhook
systemctl start alertmanager
systemctl start prometheus
systemctl status xxx   # check the status of a service
[root@localhost jiankong]# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:5000            0.0.0.0:*               LISTEN      60670/python3.9     
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1014/sshd           
tcp6       0      0 :::9093                 :::*                    LISTEN      60467/alertmanager  
tcp6       0      0 :::9094                 :::*                    LISTEN      60467/alertmanager  
tcp6       0      0 :::9100                 :::*                    LISTEN      1013/node_exporter  
tcp6       0      0 :::22                   :::*                    LISTEN      1014/sshd           
tcp6       0      0 :::9090                 :::*                    LISTEN      60500/prometheus 

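Open ip:9090 (Prometheus UI) and ip:9093 (Alertmanager UI) in a browser. Port 9094 is Alertmanager's cluster port, not its web UI.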
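Instead of checking each UI by hand, you can probe the stack from a script. Prometheus and Alertmanager both serve `GET /-/healthy`. The Flask webhook defines only POST routes, so the sketch below (hypothetical helper names) just checks that its TCP port accepts connections. The host address is an assumption; pass your own.

```python
# Hypothetical health probe for the three services started above.
import socket
import urllib.request


def http_ok(url, timeout=3.0):
    """True if GET url answers 200; any connection/HTTP error counts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host="127.0.0.1"):
    return {
        "prometheus": http_ok(f"http://{host}:9090/-/healthy"),
        "alertmanager": http_ok(f"http://{host}:9093/-/healthy"),
        "webhook": port_open(host, 5000),
    }
```

For example, `print(check_stack("192.168.x.x"))` prints one True/False per service.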

Test: when one of the hosts goes down, an alert is sent.
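To exercise only the webhook-to-DingTalk leg without taking a host down, you can POST a hand-made Alertmanager-style payload straight to the Flask route. This is a hypothetical smoke test. `WEBHOOK_URL`, the `SmokeTest` alert name and the instance value are assumptions; adjust them to your host. It targets `/prometheus_dingtalk`, the route Alertmanager is configured to call.

```python
# Hypothetical smoke test: send a fake firing alert directly to the webhook.
import json
import urllib.request
from datetime import datetime, timezone

WEBHOOK_URL = "http://127.0.0.1:5000/prometheus_dingtalk"  # assumption: local webhook


def fake_alert_payload(instance, status="firing"):
    """A minimal body shaped like Alertmanager's webhook JSON."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "version": "4",
        "status": status,
        "receiver": "wechat",
        "alerts": [{
            "status": status,
            "labels": {"alertname": "SmokeTest", "instance": instance,
                       "job": "node", "severity": "page"},
            "annotations": {"severity": "Warning", "sendUsers": "",
                            "message": f"smoke test for {instance}"},
            "startsAt": now,
            "endsAt": "0001-01-01T00:00:00Z",
        }],
    }


def build_request(url, payload):
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST",
                                  headers={"Content-Type": "application/json"})


def send(url=WEBHOOK_URL, instance="192.0.2.10:9100"):
    """Post the fake alert; main.py's route replies "success\\n" when it is done."""
    with urllib.request.urlopen(build_request(url, fake_alert_payload(instance)),
                                timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Call `send()` while the webhook service is running, then check that the DingTalk group received the test message.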

Source: http://www.mzph.cn/diannao/22134.shtml
