Monitoring with Zabbix 3.2

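Automated operations framework

Standard operations processes, monitoring management, capacity management, dependency relationships, task management, automated deployment, distributed clusters, traditional clusters, machine management, security control, disaster management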

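Automated monitoring

Monitoring assessment
Data collection
  Active data collection: client, common plugins, custom scripts
  Passive service status: service status, program status, user access quality
  Third-party information: related in-house systems
Data processing: complex calculation, threshold evaluation, intelligent analysis
Alerting and linkage: alerting policy, linked handling, alert tracking, problem management
API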

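Common monitoring approaches

Monitoring via SNMP (NMS/agent)
  SNMP v1 and v2 transmit in plaintext, which is insecure (acceptable only for non-sensitive data); v3 adds authentication, but it is described here as still weak and is not yet widely used.
  Data is collected with SNMP get, get-next, get-bulk, and so on. But how should that data be stored? Using snmptrap for active monitoring carries considerable risk.
  SNMP is needed for devices that cannot run an agent. Installing an agent is the best option; if neither is available, fall back to SSH.
Cacti (written in PHP) can help with:
  1. Collecting data (it relies on SNMP, so no agent is needed on the client; it is not a monitoring tool in itself)
  2. Storing data
  3. Displaying data
  4. Analysing data and alerting, although its alerting is mediocre. It can also use an RRD round-robin database and RRD graphing; rrdtool is a powerful tool in its own right (alerting can also be added through plugins).
Nagios (a powerful alerting tool that also works as a standalone monitoring tool)
  It strengthens alerting (thresholds) and handles alert state transitions with notifications by email, MSN, or SMS.
  It is not suited to displaying or analysing data, or to trend analysis.
  It can analyse dependencies. It does not store data and only cares about state transitions; storing data requires extra plugin configuration.
  As a standalone monitoring tool, it can also do good active monitoring with an agent installed on the monitored hosts. With more than about 200 nodes, expect delays.
Agent + control-side architecture
  Gives reliable monitoring and more powerful management features (but routers cannot run an agent).
Neither Cacti nor Nagios suits large-scale monitoring, and the two are usually combined.
Zabbix can be described as Nagios and Cacti combined. It auto-discovers monitored devices and supports distributed monitoring, and offers a complete solution for small and medium-sized deployments.
    monitoring with snmp --> network device
    monitoring with zabbix agent --> servers with zabbix agent
    monitoring with ping or port check --> servers without zabbix agent
  Supported platforms
    solaris, mac, windows, hp-ux, freebsd, unix, openbsd
SSH scripts (an account is required)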

Functions of a monitoring system (MS)

  Data gathering, data storage, visualisation, alerting

Zabbix monitoring features

zabbix agent: cpu/mem/network/disk/service/log/file/other (Windows performance counters)
snmp agent
ipmi agent
agentless monitoring
web monitoring: response time/download speed/response code/content/web scenarios/https and http
database monitoring
internal check
calculated monitoring
custom command monitoring

Supported notification methods

email
sms
jabber
chat message
command execution

Zabbix architecture

zabbix web gui
zabbix database
zabbix server (zabbix proxy)-->web pages-->icmp/ipmi/snmp:device-->agent:os

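Logical architecture

Common Zabbix terms

host: a networked device to be monitored, specified by IP address or DNS name
hostgroup: a logical container for hosts
item: the core of data collection; each item is identified by a "key"
trigger: an expression that evaluates whether the data received for a particular item of a monitored object is within an acceptable range
event: an occurrence worth attention, such as a state change, or the auto-registration of a new agent or of an agent coming back online
action: a predefined way of handling a particular event, usually consisting of operations and conditions
escalation: alert escalation
media: the delivery medium for notifications
notification: information about an event sent to a user through the selected media
remote command
template: usually contains items, triggers, graphs, screens, and so on
application: a group of items
web scenario: one or more HTTP requests used to check the availability of a website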
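
To make the relationship between item, trigger, and event concrete, here is a minimal Python sketch. It is not Zabbix code: the Trigger class, its threshold field, and the event tuples are hypothetical names chosen for illustration, and a real trigger is an expression evaluated by the Zabbix server. The point it shows is that an event is recorded only when the trigger state changes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trigger:
    """Hypothetical stand-in for a trigger of the form 'item value > threshold'."""
    item_key: str
    threshold: float
    state: str = "OK"  # current trigger state
    events: List[Tuple[str, float]] = field(default_factory=list)

    def feed(self, value: float) -> None:
        """Evaluate a newly collected item value; record an event only on a state change."""
        new_state = "PROBLEM" if value > self.threshold else "OK"
        if new_state != self.state:
            self.state = new_state
            self.events.append((new_state, value))


# The item key is a real agent key; the threshold and values are made up.
trigger = Trigger(item_key="system.cpu.load[all,avg1]", threshold=5.0)
for load in (1.2, 6.3, 7.0, 2.1):
    trigger.feed(load)
print(trigger.events)
```

Four values are fed in but only two events come out: one when the load first crosses the threshold and one when it recovers. An action would then match such events against its conditions and run its operations.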

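Monitoring steps: defining a complete monitoring setup

1. Decide what Zabbix will monitor, that is, add host nodes (manually or by auto-discovery). host --> grouped into hostgroups
2. Define items (item keys). item --> grouped into applications; when several hosts monitor the same metric, define a template
3. Define graphs ---> grouped into screens. For display, custom graphs can be combined into screens
4. Define thresholds on the collected metrics with triggers --> these produce events (discovery also produces events). Zabbix only allows dependencies to be defined on triggers, whereas Nagios supports them at host level
5. actions (action, conditions, operations (sms/remote command))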
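
Step 1 can also be scripted through the Zabbix JSON-RPC API rather than the GUI. The sketch below only builds the request body for the host.create method; it does not send anything. The helper name build_host_create is hypothetical, and the host name, IP, group ID, template ID, and auth token are placeholders that would have to come from a real installation.

```python
import json


def build_host_create(host, ip, group_id, template_id, auth_token, request_id=1):
    """Build (but do not send) a JSON-RPC body for the Zabbix API method host.create."""
    return {
        "jsonrpc": "2.0",
        "method": "host.create",
        "params": {
            "host": host,
            # One agent interface: type 1 = Zabbix agent, 10050 = default agent port.
            "interfaces": [{
                "type": 1, "main": 1, "useip": 1,
                "ip": ip, "dns": "", "port": "10050",
            }],
            "groups": [{"groupid": group_id}],            # hostgroup membership
            "templates": [{"templateid": template_id}],   # items and triggers come from the template
        },
        "auth": auth_token,
        "id": request_id,
    }


# All values below are placeholders, not real IDs or tokens.
payload = build_host_create("web01", "192.168.0.80", "2", "10001", "PLACEHOLDER_TOKEN")
print(json.dumps(payload, indent=2))
```

In a real setup this body would be POSTed to the frontend's api_jsonrpc.php endpoint after obtaining a token with user.login; linking a template this way covers steps 2 to 4 for every host that uses it.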

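Monitoring MySQL with a template in Zabbix 3.2

1. In the Zabbix GUI, link the Template App MySQL template to the host
2. On the zabbix-agent side, add the following UserParameter lines to zabbix_agentd.conf, then restart zabbix-agent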
　　UserParameter=mysql.status[*],/usr/local/mysql/bin/mysql -uroot -prootabcd -h127.0.0.1 -e "show global status" | awk '/$1\>/ {print $$2}'
　　UserParameter=mysql.ping,/usr/local/mysql/bin/mysqladmin -uroot -prootabcd -h 127.0.0.1 ping | grep -c alive
　　UserParameter=mysql.version,/usr/local/mysql/bin/mysql -V
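
To show what the mysql.status[*] pipeline extracts, here is a small Python sketch of the same idea: pick one variable out of the two-column output of "show global status". The function name and the sample text are hypothetical. Note one deliberate difference: the sketch compares the first column exactly, whereas the awk pattern above matches the name anywhere in the line up to a word boundary.

```python
# Hypothetical sample of tab-separated "show global status" output.
SAMPLE_STATUS = (
    "Variable_name\tValue\n"
    "Uptime_since_flush_status\t120\n"
    "Uptime\t86400\n"
    "Questions\t1523\n"
)


def mysql_status_value(status_output, variable):
    """Return the value of one status variable as a string, or None if it is absent."""
    for line in status_output.splitlines():
        fields = line.split()
        # Exact match on the variable name in the first column.
        if len(fields) >= 2 and fields[0] == variable:
            return fields[1]
    return None


print(mysql_status_value(SAMPLE_STATUS, "Uptime"))
```

With the UserParameter in place, an item key such as mysql.status[Uptime] passes Uptime as $1, and the agent returns the second column of the matching row.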

Monitoring nginx status with Zabbix 3.2

1. Enable nginx status

location /status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow 192.168.0.80;
    deny all;
}

# curl http://192.168.0.80/status
Active connections: 7 
server accepts handled requests
 67 67 136 
Reading: 0 Writing: 1 Waiting: 6

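Active connections: the number of client connections currently open, including Waiting connections
accepts: 67 client connections accepted since startup
handled: 67 connections handled since startup; this equals accepts, so no connections were dropped
requests: 136 client requests processed in total
Reading: the number of connections on which nginx is currently reading the request header
Writing: the number of connections on which nginx is currently writing the response back to the client
Waiting: with keep-alive enabled, this equals active - (reading + writing), that is, idle connections that nginx has finished serving and that are waiting for the next request

When requests are served quickly, a fairly high Waiting count is normal. If reading + writing is high, concurrency is high and many requests are still being processed.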
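
The fields above can be checked programmatically. This is a minimal Python sketch that parses the sample output shown earlier and confirms the Waiting relationship; the function name parse_stub_status and the dictionary keys are hypothetical, and the sample text is copied from the curl output above.

```python
import re

# Sample copied from the curl output above.
SAMPLE = (
    "Active connections: 7 \n"
    "server accepts handled requests\n"
    " 67 67 136 \n"
    "Reading: 0 Writing: 1 Waiting: 6\n"
)


def parse_stub_status(text):
    """Parse nginx stub_status text into a dict of integer counters."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The counters line holds exactly three integers: accepts, handled, requests.
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    rww = re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text)
    if not (active and totals and rww):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(x) for x in totals.groups())
    reading, writing, waiting = (int(x) for x in rww.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts, "handled": handled, "requests": requests,
        "reading": reading, "writing": writing, "waiting": waiting,
    }


stats = parse_stub_status(SAMPLE)
# Waiting should equal active - (reading + writing): 7 - (0 + 1) = 6.
print(stats["waiting"] == stats["active"] - (stats["reading"] + stats["writing"]))
```

For the sample, accepts equals handled (67 and 67), which is the "no dropped connections" case described above.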

2. On the zabbix-agent side, add the following UserParameter lines to zabbix_agentd.conf, then restart zabbix-agent
　　UserParameter=Nginx.active[*],/usr/bin/curl -s "http://$1:$2/status" | awk '/^Active/ {print $NF}'
　　UserParameter=Nginx.accepts[*],/usr/bin/curl -s "http://$1:$2/status" | awk 'NR==3 {print $$1}'
　　UserParameter=Nginx.handled[*],/usr/bin/curl -s "http://$1:$2/status" | awk 'NR==3 {print $$2}'
　　UserParameter=Nginx.requests[*],/usr/bin/curl -s "http://$1:$2/status" | awk 'NR==3 {print $$3}'
　　UserParameter=Nginx.reading[*],/usr/bin/curl -s "http://$1:$2/status" | grep 'Reading' | cut -d " " -f2
　　UserParameter=Nginx.writing[*],/usr/bin/curl -s "http://$1:$2/status" | grep 'Writing' | cut -d " " -f4
　　UserParameter=Nginx.waiting[*],/usr/bin/curl -s "http://$1:$2/status" | grep 'Waiting' | cut -d " " -f6

3. Add an item on the host node (example)
  Name:         Nginx.active
  Key:          Nginx.active[{HOST.IP},80]
  Applications: Nginx Status

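All keys for the Zabbix agent item type

log[file,<regexp>,<encoding>,<maxlines>,<mode>,<output>] Monitor a log file
logrt[file_pattern,<regexp>,<encoding>,<maxlines>,<mode>,<output>] Monitor a rotated log file
net.dns[<ip>,zone,<type>,<timeout>,<count>] Check whether the DNS service is up
net.dns.record[<ip>,zone,<type>,<timeout>,<count>] Perform a DNS query and return the result
net.if.collisions[if] Number of out-of-window collisions
net.if.discovery List network interfaces; typically used for low-level discovery
net.if.in[if,<mode>] Incoming traffic statistics on a network interface
net.if.out[if,<mode>] Outgoing traffic statistics on a network interface
net.if.total[if,<mode>] Sum of incoming and outgoing traffic on a network interface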

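net.tcp.listen[port] Check whether the TCP port is in LISTEN state
net.tcp.port[<ip>,port] Check whether a TCP connection can be made to the port
net.tcp.service[service,<ip>,<port>] Check whether the service is running and the port is reachable
net.tcp.service.perf[service,<ip>,<port>] Check service performance
net.udp.listen[port] Check whether the UDP port is in LISTEN state
proc.mem[<name>,<user>,<mode>,<cmdline>] Memory used by a user's processes
proc.num[<name>,<user>,<state>,<cmdline>] Number of processes for a given user in a given state
sensor[device,sensor,<mode>] Read a hardware sensor
system.boottime System boot time as a timestamp
system.cpu.intr Number of device interrupts
system.cpu.load[<cpu>,<mode>] CPU load
system.cpu.num[<type>] Number of CPUs
system.cpu.switches Number of CPU context switches
system.cpu.util[<cpu>,<type>,<mode>] CPU utilisation
system.hostname[<type>] Return the host name
system.hw.chassis[<info>] Return chassis information
system.hw.cpu[<cpu>,<info>] Return CPU information
system.hw.devices[<type>] List PCI or USB devices
system.hw.macaddr[<interface>,<format>] List MAC addresses
system.localtime[<type>] System time
system.run[command,<mode>] Run a command on the specified host (requires EnableRemoteCommands=1)
system.stat[resource,<type>] Virtual memory statistics
system.sw.arch Return the software architecture
system.sw.os[<info>] Return operating system information
system.sw.packages[<package>,<manager>,<format>] List installed packages
system.swap.in[<device>,<type>] Swap in (from disk to memory)
system.swap.out[<device>,<type>] Swap out (from memory to disk)
system.swap.size[<device>,<type>] Swap space size
system.uname Return detailed host system information
system.uptime System uptime in seconds
system.users.num Number of logged-in users
vfs.dev.read[<device>,<type>,<mode>] Disk read statistics
vfs.dev.write[<device>,<type>,<mode>] Disk write statistics
vfs.file.cksum[file] Compute the file checksum
vfs.file.contents[file,<encoding>] Get the contents of a file
vfs.file.exists[file] Check whether a file exists
vfs.file.md5sum[file] MD5 checksum of a file
vfs.file.regexp[file,regexp,<encoding>,<start line>,<end line>,<output>] Search for a string in a file; returns the matching line
vfs.file.regmatch[file,regexp,<encoding>,<start line>,<end line>] Search for a string in a file; returns 1 if found, 0 if not
vfs.file.size[file] File size
vfs.fs.discovery List mounted filesystems
vfs.fs.inode[fs,<mode>] Number of inodes on a filesystem
vfs.fs.size[fs,<mode>] Filesystem space
vm.memory.size[<mode>] Memory size
web.page.get[host,<path>,<port>] Get the content of a web page
web.page.perf[host,<path>,<port>] Time taken to load a full web page
web.page.regexp[host,<path>,<port>,<regexp>,<length>,<output>] Search for a string in a web page
vfs.file.time[file,<mode>] File time information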
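
All of the keys above share one shape: a name followed by an optional bracketed, comma-separated parameter list, where parameters in angle brackets are optional and may be left empty. The Python sketch below splits a key into its name and parameters. It is a simplified illustration, not the agent's own parser: parse_item_key is a hypothetical helper that handles unquoted and double-quoted parameters only, and ignores nested arrays and leading spaces.

```python
def parse_item_key(key):
    """Split 'name[p1,p2,...]' into (name, [params]); simplified quoting rules."""
    if "[" not in key:
        return key, []
    if not key.endswith("]"):
        raise ValueError("unterminated parameter list: %r" % key)
    name, _, rest = key.partition("[")
    body = rest[:-1]  # drop the closing bracket
    params, buf, quoted = [], [], False
    i = 0
    while i < len(body):
        ch = body[i]
        if quoted:
            if ch == "\\" and i + 1 < len(body) and body[i + 1] == '"':
                buf.append('"')  # escaped quote inside a quoted parameter
                i += 1
            elif ch == '"':
                quoted = False
            else:
                buf.append(ch)
        elif ch == '"' and not buf:
            quoted = True  # a quote at the start of a parameter opens a quoted string
        elif ch == ",":
            params.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    params.append("".join(buf))
    return name, params


print(parse_item_key("net.tcp.service[http,192.168.0.80,80]"))
print(parse_item_key("system.cpu.load[,avg1]"))
```

The second call shows an omitted optional parameter: the first slot is an empty string, which is how a key such as system.cpu.load[,avg1] skips <cpu> and still sets <mode>.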

Zabbix simple checks

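Simple checks are typically used to check services on remote hosts that have no agent or client installed. With simple checks, the monitored host does not need the Zabbix agent; the Zabbix server collects the data directly. They are mostly used to check whether a port on a remote server is listening.

The ICMP checks need fping on the Zabbix server:

# yum install fping
# grep FpingLocation /usr/local/zabbix/etc/zabbix_server.conf
FpingLocation=/usr/sbin/fping
# chown root:zabbix /usr/sbin/fping
# chmod 4710 /usr/sbin/fping

icmpping[<target>,<packets>,<interval>,<size>,<timeout>] Check whether the host responds to ICMP ping
icmppingloss[<target>,<packets>,<interval>,<size>,<timeout>] Return the percentage of lost packets
icmppingsec[<target>,<packets>,<interval>,<size>,<timeout>,<mode>] Return the ICMP ping response time
net.tcp.service[service,<ip>,<port>] Check whether a service is running and accepting TCP connections
net.tcp.service.perf[service,<ip>,<port>] Check service performance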
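
The idea behind an agentless TCP check can be sketched in a few lines of Python: the checking side simply tries to open a connection and reports 1 or 0. This illustrates the concept only and is not how the Zabbix server implements it; the function name tcp_port_open and the default timeout are assumptions.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return 1 if a TCP connection to host:port succeeds within timeout, else 0."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return 0
```

For example, tcp_port_open("192.168.0.80", 80) would return 1 while the nginx instance from the earlier section is accepting connections, and 0 otherwise. Nothing has to be installed on the target host, which is the point of simple checks.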