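Prometheus is one of the most widely used monitoring tools. It is usually deployed inside Kubernetes to monitor containerized workloads. This post takes a different route: installing the binaries directly on a machine and monitoring the processes running on that machine.
Monitoring usually stops at the host level, so how do we monitor individual processes? The Prometheus ecosystem has an exporter for exactly that, process-exporter. GitHub - ncabatoff/process-exporter: Prometheus exporter that mines /proc to report on selected processes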
Deploying Prometheus
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux
# ".*" is needed so the whole existing value is replaced, not just the key
sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
setenforce 0
Download and extract the package
wget https://github.com/prometheus/prometheus/releases/download/v2.26.0/prometheus-2.26.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.26.0.linux-amd64.tar.gz -C /usr/local
mv /usr/local/prometheus-2.26.0.linux-amd64/ /usr/local/prometheus
Create the user
groupadd prometheus
useradd -g prometheus -s /sbin/nologin prometheus
Create the Prometheus data directory
mkdir -p /var/lib/prometheus
chown -R prometheus /var/lib/prometheus
chown -R prometheus:prometheus /usr/local/prometheus/
Edit the configuration file
# Add a job_name under scrape_configs that points at the target to monitor
# process-exporter runs on the monitored host and listens on port 9256
[root@prometheus prometheus]# cat /usr/local/prometheus/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 10s
    static_configs:
      - targets: ['192.168.207.165:9256']
        labels:
          instance: node
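Since YAML is indentation-sensitive, it can be worth validating the file before starting the server. The release tarball ships a promtool binary next to prometheus, and the sketch below wraps its `check config` subcommand. The function name check_prometheus_config is made up for illustration, and the default paths assume the install layout used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a Prometheus config with the bundled promtool.
# $1 = install directory (assumed: /usr/local/prometheus)
# $2 = config file (assumed: prometheus.yml inside the install directory)
check_prometheus_config() {
  local install_dir="${1:-/usr/local/prometheus}"
  local config_file="${2:-$install_dir/prometheus.yml}"
  local promtool="$install_dir/promtool"

  if [ ! -x "$promtool" ]; then
    echo "promtool not found at $promtool" >&2
    return 1
  fi
  # promtool exits non-zero and reports the error if the file is invalid.
  "$promtool" check config "$config_file"
}

# Usage (on the Prometheus host):
#   check_prometheus_config /usr/local/prometheus
```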
Start the service
/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml
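The command above runs Prometheus in the foreground as the current user, and it does not use the prometheus user or the /var/lib/prometheus directory created earlier. One possible way to tie those together is a systemd unit. The sketch below only prints such a unit. The helper name render_prometheus_unit and the unit contents are assumptions for illustration, not something the original setup includes. `--config.file` and `--storage.tsdb.path` are standard Prometheus flags.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a minimal systemd unit for the layout used here.
# Assumes the binary lives in /usr/local/prometheus and data goes to
# /var/lib/prometheus, both owned by the prometheus user.
render_prometheus_unit() {
  cat << 'EOF'
[Unit]
Description=Prometheus
After=network-online.target

[Service]
User=prometheus
Group=prometheus
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/var/lib/prometheus
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   render_prometheus_unit > /etc/systemd/system/prometheus.service
#   systemctl daemon-reload
#   systemctl enable --now prometheus
```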
Deploying process-exporter
Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
Disable SELinux
# ".*" is needed so the whole existing value is replaced, not just the key
sed -i "s/^SELINUX=.*/SELINUX=disabled/g" /etc/selinux/config
setenforce 0
Download and extract the package
wget https://github.com/ncabatoff/process-exporter/releases/download/v0.8.2/process-exporter-0.8.2.linux-amd64.tar.gz
tar zxf process-exporter-0.8.2.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/process-exporter-0.8.2.linux-amd64/
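Prepare the configuration file
# Monitor all processes
# The GitHub README documents many other ways to write this
cat > config.yml << 'EOF'
process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'
EOF

# Monitor specific processes
# To monitor more services, keep adding entries in the same format
cat > config.yml << 'EOF'
process_names:
  - name: "{{.Matches}}"
    cmdline:
    - 'sshd'
  - name: "{{.Matches}}"
    cmdline:
    - 'mysqld'
EOF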
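A note on the "specific processes" variant: as far as I can tell from the process-exporter README, `{{.Matches}}` is filled from named capture groups in the cmdline regexes, so a pattern without groups may yield a group name like map[] rather than sshd. If that happens, one alternative is to select by the process name with the `comm` selector and omit `name`, which falls back to the exporter's default name template (the executable's base name, according to the README). The sketch below writes that variant to a separate file. The filename config-by-comm.yml is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: the kernel process name (comm) is enough to identify the
# services, i.e. the daemons really run as "sshd" and "mysqld".
# No "name" key is given, so the exporter's default group name is used.
cat > config-by-comm.yml << 'EOF'
process_names:
  - comm:
    - sshd
  - comm:
    - mysqld
EOF
```

To try it, start the exporter with `-config.path config-by-comm.yml` instead of config.yml.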
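Start the exporter
# This run uses the monitor-all-processes config shown above
# (each heredoc overwrites config.yml, so run only the one you want)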
./process-exporter -config.path config.yml
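Before involving Prometheus, it can help to confirm that the exporter itself is serving metrics. The sketch below filters the exporter's text output down to one metric for one group. The helper name filter_group_metric is hypothetical. namedprocess_namegroup_num_procs is one of the metrics process-exporter exposes, and 9256 is the port mentioned earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the samples of one metric for one group.
# Reads Prometheus text-format metrics on stdin.
# $1 = metric name, $2 = value of the groupname label
filter_group_metric() {
  # First grep drops comments and other metrics, second picks the group.
  grep "^$1{" | grep "groupname=\"$2\""
}

# Usage (on the monitored host, exporter listening on port 9256):
#   curl -s http://localhost:9256/metrics \
#     | filter_group_metric namedprocess_namegroup_num_procs sshd
```

If this prints nothing, the problem lies in the exporter or its config.yml and not in the Prometheus scrape configuration.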
Verification
Open the Prometheus web UI and check whether the process metrics are arriving, or query the HTTP API with curl. The example below uses curl to fetch data for the sshd process.
# Replace <prometheus_host> with your own IP
curl -G 'http://<prometheus_host>:9090/api/v1/query' --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd", instance="node", job="node"}'
# Example
curl -G 'http://192.168.207.131:9090/api/v1/query' --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd", instance="node", job="node"}'
# The query returns data
[root@bogon ~]# curl -G 'http://192.168.207.131:9090/api/v1/query' --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd", instance="node", job="node"}'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"namedprocess_namegroup_context_switches_total","ctxswitchtype":"nonvoluntary","groupname":"sshd","instance":"node","job":"node"},"value":[1718781659.381,"2"]}]}}
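In the response, the interesting part is the `value` pair: a Unix timestamp followed by the sample value as a string. For a quick shell check, the value can be pulled out with sed, as sketched below. The helper name extract_value is hypothetical, and this is plain text matching, not a JSON parser: it assumes the compact single-line layout shown above and exactly one series in the result.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample value from an instant-query response.
# Reads the JSON body on stdin. Assumes a single series and the compact
# "value":[<timestamp>,"<value>"] layout that the API returned above.
extract_value() {
  sed -n 's/.*"value":\[[0-9.]*,"\([^"]*\)"\].*/\1/p'
}

# Usage:
#   curl -sG 'http://<prometheus_host>:9090/api/v1/query' \
#     --data-urlencode 'query=namedprocess_namegroup_context_switches_total{ctxswitchtype="nonvoluntary", groupname="sshd"}' \
#     | extract_value
```

For the response shown above this prints 2. For anything beyond a quick check, a real JSON tool is the safer choice.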