Getting Started with Prometheus
Setup
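Reference: https://prometheus.io/docs/introduction/overview/
- Exporters: deployed alongside the application you want metrics from. An exporter accepts requests from Prometheus, collects data from the application, converts it to the correct format, and returns it to Prometheus.
- Service Discovery: once applications are running with their exporters, Prometheus needs to know where they are so that it knows what to monitor and can tell whether each target is responding.
- Retrieval: Prometheus sends HTTP requests to the targets, then parses and stores the responses.
- TSDB: the time series database. Prometheus stores its data in a custom database; an SSD is recommended.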
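The exporter role described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any official exporter is implemented: the metric name `demo_app_uptime_seconds`, the port 8000, and the helper names are hypothetical. It shows the general shape of the exchange, where Prometheus sends an HTTP GET to `/metrics` and receives plain text in the exposition format.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START_TIME = time.time()


def render_metrics(uptime_seconds: float) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_app_uptime_seconds Seconds since the demo app started.",
        "# TYPE demo_app_uptime_seconds gauge",
        f"demo_app_uptime_seconds {uptime_seconds:.3f}",
    ]
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer scrape requests on /metrics; everything else is a 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(time.time() - START_TIME).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, serving metrics on the given (hypothetical) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling `serve()` and adding `localhost:8000` as a target in a scrape config would let Prometheus collect this gauge. A real exporter would normally use an official client library rather than hand-built strings.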
# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.45.5/prometheus-2.45.5.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-2.45.5.linux-amd64/
# Start
./prometheus --config.file=prometheus.yml
Edit prometheus.yml:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
Run Prometheus:
./prometheus
Then open http://localhost:9090/ in a browser.
Node Exporter
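Node Exporter exposes kernel-level and machine-level metrics on Unix systems such as Linux. It provides the standard metrics, such as CPU, memory, disk space, disk I/O, and network bandwidth.
Download page: https://prometheus.io/download/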
dean@mint:~$ wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
dean@mint:~$ tar -xzf node_exporter-1.8.1.linux-amd64.tar.gz
dean@mint:~$ cd node_exporter-1.8.1.linux-amd64/
dean@mint:~/node_exporter-1.8.1.linux-amd64$ ./node_exporter
For Prometheus to monitor Node Exporter, update prometheus.yml with an additional scrape config:
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
In a query, job="node" is called a label matcher. It restricts the returned metrics to the series whose job label is "node".
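As a rough illustration of the label matcher in use, the sketch below sends the instant query `up{job="node"}` to the Prometheus HTTP API endpoint `/api/v1/query`. It assumes Prometheus is listening on `localhost:9090` as configured above. The helper names `instant_query_url` and `run_instant_query` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query, URL-encoding the PromQL text."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def run_instant_query(base_url: str, promql: str) -> list:
    """Run the query and return the list of matching series."""
    with urlopen(instant_query_url(base_url, promql), timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]
```

With both processes running, `run_instant_query("http://localhost:9090", 'up{job="node"}')` should return only the series scraped by the "node" job. Dropping the matcher and querying plain `up` would also include the "prometheus" job.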
Alerting
Alerting has two parts:
- Alerting rules added to Prometheus, which define the logic of each alert.
- Alertmanager, which turns firing alerts into notifications such as emails.
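Press Ctrl-C to stop Node Exporter. The Targets page then shows the Node Exporter status as in the figure: the error is "connection refused", because nothing is listening on the TCP port and the HTTP request is rejected.
An alerting rule needs a PromQL expression that returns only the results you want to be alerted on.
Add this expression to an alerting rule in Prometheus, and tell Prometheus which Alertmanager to talk to.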
prometheus.yml:
global:
  scrape_interval: 10s
  evaluation_interval: 10s

rule_files:
  - rules.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
rules.yml:
The rule is evaluated every 10 seconds. If the expression keeps returning a series for at least one minute, the alert is considered firing.
groups:
  - name: Example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
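The effect of `for: 1m` with a 10-second evaluation interval can be illustrated with a small simulation. This is a simplified model of the inactive, pending, and firing states for a single series, not Prometheus's actual rule engine. The function name `alert_states` and its parameters are hypothetical.

```python
def alert_states(up_samples, eval_interval_s=10, for_s=60):
    """Return the alert state after each rule evaluation.

    up_samples holds the value of `up` seen at each evaluation
    (1 = target up, 0 = target down).
    """
    states = []
    active_since = None  # evaluation time at which `up == 0` first matched
    for i, up in enumerate(up_samples):
        now = i * eval_interval_s
        if up != 0:
            # The expression returns nothing, so the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert fires once the expression has matched for at least `for_s`.
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


print(alert_states([1, 0, 0, 0, 0, 0, 0, 0, 1]))
```

In this model the target has to stay down for six further evaluations after the first failing one before the state changes from pending to firing. A single successful scrape in between resets the timer.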
Now that an alert is firing, an Alertmanager is needed to handle it.
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar -xzf alertmanager-0.27.0.linux-amd64.tar.gz
cd alertmanager-0.27.0.linux-amd64/
Alertmanager needs to be configured. Email is a common notification channel for Alertmanager; QQ Mail is used here. On the QQ Mail side:
- Enable the SMTP service.
- Configure it according to the QQ Mail authorization code instructions. Reference: https://wx.mail.qq.com/list/readtemplate?name=app_intro.html#/agreement/authorizationCode
alertmanager.yml:
global:
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_from: 'sender@qq.com'
  smtp_auth_username: 'sender@qq.com'
  smtp_auth_password: 'auth_passwd'
  smtp_require_tls: false

route:
  receiver: example-email

receivers:
  - name: example-email
    email_configs:
      - to: 'receiver@qq.com'
Start Alertmanager and Prometheus. After a while, the notification email arrives in the mailbox.
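To check the email path without waiting for the rule to fire, one option is to push a synthetic alert straight to Alertmanager's v2 API (`POST /api/v2/alerts`). The sketch below assumes Alertmanager is listening on `localhost:9093` as in the configuration above. The helper names, label values, and annotation text are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone
from urllib.request import Request, urlopen


def build_test_alert(instance: str, minutes: int = 5) -> list:
    """Build a one-element alert list shaped like what Prometheus sends."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": "InstanceDown", "job": "node", "instance": instance},
        "annotations": {"summary": "Synthetic test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(alerts: list, base_url: str = "http://localhost:9093") -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req, timeout=5) as resp:
        return resp.status
```

Running `post_alerts(build_test_alert("localhost:9100"))` against a running Alertmanager should route the alert to the example-email receiver. How soon the email arrives depends on the route's grouping settings.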