Building a System Monitoring Stack on Ubuntu

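Outline

Data producer
- Installation and running
- Verification
Data collection, storage, and distribution
- Download and extraction
- Modifying the configuration
- Running
- Verification
Data consumer
- Download and running
- Verification
  - Adding a data source
  - Adding a dashboard
  - Linking the dashboard to the data source
  - Result
References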

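Every monitoring system has a "data producer" and a "data consumer". The data producer generates the metrics that need to be monitored. The data consumer uses those metrics to provide additional information and features, such as charts of the data and alerts on anomalous values.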
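As the number of data producers grows, the system usually evolves a "data collector" that gathers the data in one place. Data consumers can then get all of the data from the collector.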

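As the number of data consumers grows, different consumers have different needs. Some want only the data from producer A; others want data from both A and B. The system therefore evolves a "data distributor" to serve these different needs.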
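As the volume of data grows, and because production and consumption do not have to be tightly coupled, a "data store" emerges between the collector and the distributor. It decouples the collector from the distributor and makes the whole system more robust.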
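The roles above can be sketched as a toy pipeline in shell. This is purely illustrative and says nothing about how Prometheus works internally. The functions `produce_a`, `produce_b`, `collect` and `distribute` are hypothetical, and a temporary file stands in for the data store.

```shell
#!/usr/bin/env bash
# Toy model of the producer / collector / store / distributor roles.
# All function names are hypothetical; this is not how Prometheus is implemented.
set -euo pipefail

STORE=$(mktemp)   # the "data store": a plain file shared by collector and distributor

# Two data producers, each emitting "metric value" lines
produce_a() { echo "cpu_load 0.42"; }
produce_b() { echo "disk_free_bytes 1048576"; }

# Data collector: pulls from every producer and appends "source metric value" to the store
collect() {
  local src name value
  for src in a b; do
    "produce_${src}" | while read -r name value; do
      echo "${src} ${name} ${value}" >> "$STORE"
    done
  done
}

# Data distributor: reads only from the store and returns the sources a consumer asks for
distribute() {
  local wanted="$1"   # extended regex of source names, e.g. "a" or "a|b"
  grep -E "^(${wanted}) " "$STORE"
}

collect
echo "consumer 1 (wants A):"
distribute "a"
echo "consumer 2 (wants A and B):"
distribute "a|b"
```

Producers and consumers never talk to each other directly. Either side can change without the other noticing, and that is the decoupling the data store provides.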
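In practice, Prometheus and Grafana are commonly used to implement the two main parts of this system.
Prometheus collects, stores, and distributes the data. It can also display data, but its visualization features are limited, so here we keep it out of the consumer role.
Grafana consumes the data. It presents the data in various kinds of dashboards and charts, and it provides rule-based alerting.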
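The data producer must expose its data to Prometheus in the format Prometheus expects. This article does not cover that format; it focuses on building and verifying the system. For simplicity, we use node_exporter, an open-source exporter maintained by the Prometheus project, as the data producer.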

Data producer

Installation and running

Download and extract node_exporter (the latest version is available at https://prometheus.io/download/#node_exporter):

wget https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz 
tar xvfz node_exporter-1.7.0.linux-amd64.tar.gz
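The version number appears several times in these commands, so a small helper can build the download URL from a version variable. This is a sketch that assumes the naming used in the URL above: `<name>-<version>.linux-amd64.tar.gz` under `https://github.com/prometheus/<name>/releases/download/v<version>/`. The Prometheus release downloaded later follows the same pattern. `release_url` is a hypothetical helper. Check the download page for the current version.

```shell
#!/usr/bin/env bash
# Build a GitHub release URL for a component published under github.com/prometheus.
# release_url is a hypothetical helper; the naming pattern is taken from the URLs in this article.
set -euo pipefail

release_url() {
  local name="$1" version="$2" platform="${3:-linux-amd64}"
  echo "https://github.com/prometheus/${name}/releases/download/v${version}/${name}-${version}.${platform}.tar.gz"
}

NODE_EXPORTER_VERSION=1.7.0
url=$(release_url node_exporter "$NODE_EXPORTER_VERSION")
echo "$url"

# To actually download and extract (needs network access):
# wget "$url" && tar xvfz "$(basename "$url")"
```

The optional third argument selects another platform build, for example `linux-arm64`.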

Run node_exporter:

cd node_exporter-1.7.0.linux-amd64/
./node_exporter

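Verification

On the local machine, open localhost:9100/metrics to view the data produced by node_exporter. When accessing it from another machine, use the server's IP address instead of localhost.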
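This page is plain text in the Prometheus text exposition format. It consists of `# HELP` and `# TYPE` comment lines followed by one `name{labels} value` sample per line. The sketch below pulls the samples of a single metric out of that text with awk. The sample text is illustrative and its values are made up. `metric_samples` is a hypothetical helper, and it assumes label values contain no spaces. That holds for typical node_exporter series but not in general.

```shell
#!/usr/bin/env bash
# Extract the samples of one metric from Prometheus text exposition format.
# metric_samples is a hypothetical helper; assumes label values contain no spaces.
set -euo pipefail

metric_samples() {
  awk -v want="$1" '
    /^#/   { next }                      # skip HELP and TYPE comment lines
    NF < 2 { next }                      # skip blank or malformed lines
    {
      i = index($1, "{")                 # labels, if any, start at the first brace
      name = (i > 0) ? substr($1, 1, i - 1) : $1
      if (name == want) print $1, $2     # series identifier and sample value
    }
  '
}

# Illustrative excerpt in the same shape as node_exporter output (values are made up)
SAMPLE='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_cpu_seconds_total{cpu="0",mode="system"} 234.5'

printf '%s\n' "$SAMPLE" | metric_samples node_cpu_seconds_total

# Against the running exporter instead (requires curl):
# curl -s http://localhost:9100/metrics | metric_samples node_load1
```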

Data collection, storage, and distribution

Download and extraction

Download and extract Prometheus (the latest version is available at https://prometheus.io/download/#prometheus):

wget https://github.com/prometheus/prometheus/releases/download/v2.51.0/prometheus-2.51.0.linux-amd64.tar.gz
tar -zvxf prometheus-2.51.0.linux-amd64.tar.gz

Modifying the configuration

prometheus.yml is located in the extracted prometheus directory:

cd prometheus-2.51.0.linux-amd64/

Edit prometheus.yml to add a scrape job for node_exporter.
Part of the original configuration:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

The modified configuration:

scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]
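If you prefer to script this edit rather than make it by hand, the helper below appends a job under `scrape_configs`. It skips the append when a job with the same name already exists, because Prometheus rejects a configuration with duplicate job names. This is a sketch under stated assumptions:

- `add_scrape_job` is hypothetical.
- It relies on `scrape_configs` being the last top-level key and on the file ending with a newline, as in the shipped prometheus.yml.
- The demo runs against a temporary copy rather than your real file.

```shell
#!/usr/bin/env bash
# Append a static scrape job to a prometheus.yml whose last top-level key is scrape_configs.
# add_scrape_job is a hypothetical helper, not part of Prometheus.
set -euo pipefail

add_scrape_job() {
  local file="$1" job="$2" target="$3"
  # Duplicate job names make the config invalid, so do nothing if the job is already there
  if grep -qF "job_name: \"${job}\"" "$file"; then
    echo "job ${job} already present in ${file}, skipping" >&2
    return 0
  fi
  # Two-space indentation matches the list items under scrape_configs in the default file
  cat >> "$file" <<EOF
  - job_name: "${job}"
    static_configs:
      - targets: ["${target}"]
EOF
}

# Demo on a temporary copy of the relevant part of the original configuration
demo=$(mktemp)
cat > "$demo" <<'EOF'
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
EOF

add_scrape_job "$demo" node_exporter localhost:9100
add_scrape_job "$demo" node_exporter localhost:9100   # second call is a no-op
cat "$demo"
```

On a real installation, run `add_scrape_job ./prometheus.yml node_exporter localhost:9100` inside the Prometheus directory. Then validate the result with `./promtool check config ./prometheus.yml`; promtool ships in the same tarball.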

Running

Start Prometheus, specifying the configuration file explicitly:

./prometheus --config.file=./prometheus.yml

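Verification

On the local machine, open localhost:9090 to reach the Prometheus web UI. When accessing it from another machine, use the server's IP address instead of localhost.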
node_exporter should now be listed as a monitored target (for example under Status > Targets).
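The same check can be made from the command line through the Prometheus HTTP API. An instant query for `up{job="node_exporter"}` returns 1 when the last scrape of that target succeeded and 0 when it failed. The sketch below extracts that value from the JSON response with sed, so no extra tools are needed. `up_value` is a hypothetical helper, the sample response is illustrative, and the parsing assumes the result contains a single series.

```shell
#!/usr/bin/env bash
# Read the value of a single-series instant query result from the Prometheus HTTP API.
# up_value is a hypothetical helper; it assumes exactly one series in the result.
set -euo pipefail

up_value() {
  # The sample value is the quoted string in "value":[<timestamp>,"<value>"]
  sed -n 's/.*"value":\[[^,]*,"\([^"]*\)"].*/\1/p'
}

# Illustrative response body in the shape returned by /api/v1/query
RESPONSE='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"up","instance":"localhost:9100","job":"node_exporter"},"value":[1711000000.123,"1"]}]}}'

printf '%s\n' "$RESPONSE" | up_value

# Against the running server instead (requires curl):
# curl -sG http://localhost:9090/api/v1/query --data-urlencode 'query=up{job="node_exporter"}' | up_value
```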

We can also enter the following expressions in the query box on the Graph page to see the data plotted as charts.

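Expression	Meaning
rate(node_cpu_seconds_total{mode="system"}[1m])	Average CPU time spent in system mode per second over the last minute, in seconds (one series per CPU)
node_filesystem_avail_bytes	Filesystem space available to non-root users, in bytes
rate(node_network_receive_bytes_total[1m])	Average network traffic received per second over the last minute, in bytes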

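`rate()` turns an ever-increasing counter such as `node_cpu_seconds_total` into a per-second average over the window. The awk sketch below shows the core arithmetic on made-up `timestamp value` samples. It sums the increases between consecutive samples, treating a drop as a counter reset, and divides by the elapsed time. This is a simplification: `simple_rate` is a hypothetical helper, and PromQL's real `rate()` also extrapolates to the edges of the window, so its results can differ slightly.

```shell
#!/usr/bin/env bash
# Simplified per-second rate of a counter from "timestamp value" lines.
# simple_rate is hypothetical; unlike PromQL rate() it does not extrapolate to the window edges.
set -euo pipefail

simple_rate() {
  awk '
    NR == 1 { t0 = $1; prev = $2; inc = 0; next }
    {
      d = $2 - prev
      if (d < 0) d = $2            # counter reset: assume it restarted from 0
      inc += d
      prev = $2
      t1 = $1
    }
    END {
      if (NR < 2) exit 1           # need at least two samples
      printf "%.4f\n", inc / (t1 - t0)
    }
  '
}

# One minute of made-up samples for node_cpu_seconds_total{cpu="0",mode="system"}, 15 s apart
printf '%s\n' "0 100.0" "15 100.3" "30 100.6" "45 100.9" "60 101.2" | simple_rate
```

Here the counter grows by 1.2 s over 60 s, giving 0.0200. That means CPU 0 spent about 2% of each second in system mode during that minute. Because the series carries a `cpu` label, the query in the table returns one such value per CPU.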

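Data consumer

Prometheus can be configured with some dashboards and alerts, but visualization is not its core strength. We therefore bring in Grafana, which handles visualization much better, as the data consumer.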

Download and running

Download and extract Grafana (the latest version is available at https://grafana.com/grafana/download), then start it:

wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.4.1.linux-amd64.tar.gz
tar -zxvf grafana-enterprise-10.4.1.linux-amd64.tar.gz
cd grafana-v10.4.1/
./bin/grafana server
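Grafana listens on port 3000 by default. All three services in this article run in the foreground, so it can be handy to confirm from another terminal that they are up before moving on. The sketch below polls a URL with curl until it answers successfully or a timeout expires. `wait_for_http` is a hypothetical helper. The endpoints in the comments are assumed defaults for these versions and may need adjusting: `/metrics` on node_exporter, `/-/ready` on Prometheus, and `/api/health` on Grafana.

```shell
#!/usr/bin/env bash
# Poll an HTTP endpoint until it returns a success status or the timeout (seconds) expires.
# wait_for_http is a hypothetical helper; it needs curl.
set -euo pipefail

wait_for_http() {
  local url="$1" timeout="${2:-30}" waited=0
  until curl -fsS -o /dev/null --max-time 2 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${url}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "${url} is up"
}

# Typical checks for the services in this article (default ports, local machine):
# wait_for_http http://localhost:9100/metrics     # node_exporter
# wait_for_http http://localhost:9090/-/ready     # Prometheus
# wait_for_http http://localhost:3000/api/health  # Grafana
```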