TCollector

tcollector is a client-side process that gathers data from local collectors and pushes the data to OpenTSDB. You run it on all your hosts, and it does the work of sending each host's data to the TSD.


OpenTSDB is designed to make it easy to collect and write data to it. It has a simple protocol, simple enough for even a shell script to start sending data. However, to do so reliably and consistently is a bit harder. What do you do when your TSD server is down? How do you make sure your collectors stay running? This is where tcollector comes in.
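The protocol really is one text line per datapoint: "put <metric> <timestamp> <value> <tag=value> ...". A minimal hand-rolled sender might look like the sketch below; this connection handling is exactly the boilerplate tcollector takes off your hands (the host name and metric are placeholders; 4242 is the TSD's default port):

import socket
import time

def send_datapoint(metric, value, tags, host="tsd.example.com", port=4242):
    # "tsd.example.com" is a placeholder; 4242 is the default TSD port.
    tagstr = " ".join("%s=%s" % kv for kv in sorted(tags.items()))
    line = "put %s %d %s %s\n" % (metric, int(time.time()), value, tagstr)
    sock = socket.create_connection((host, port))
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()

send_datapoint("sys.demo.temperature", 22.5, {"host": "web01"})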


Tcollector does several things for you:

  • Runs all of your data collectors and gathers their data
  • Does all of the connection management work of sending data to the TSD
  • You don't have to embed all of this code in every collector you write
  • Does de-duplication of repeated values
  • Handles all of the wire protocol work for you, as well as future enhancements

Deduplication

Typically you want to gather data about everything in your system. This generates a lot of datapoints, the majority of which don't change very often over time (if ever). However, you want fine-grained resolution when they do change. Tcollector remembers the last value and timestamp that was sent for all of the time series for all of the collectors it manages. If the value doesn't change between sample intervals, it suppresses sending that datapoint. Once the value does change (or 10 minutes have passed), it sends the last suppressed value and timestamp, plus the current value and timestamp. In this way all of your graphs and such are correct. Deduplication typically reduces the number of datapoints TSD needs to collect by a large fraction. This reduces network load and storage in the backend. A future OpenTSDB release however will improve on the storage format by using RLE (among other things), making it essentially free to store repeated values.
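A simplified sketch of this suppression logic for a single time series (the idea only, not tcollector's actual code):

class Dedup(object):
    MAX_SUPPRESS = 600  # resend after at most 10 minutes, as described above

    def __init__(self, emit):
        self.emit = emit        # callback that actually sends a datapoint
        self.last_sent = None   # (ts, value) of the last datapoint sent
        self.suppressed = None  # newest (ts, value) held back as a repeat

    def add(self, ts, value):
        if self.last_sent is None:
            self.last_sent = (ts, value)
            self.emit(ts, value)
            return
        unchanged = value == self.last_sent[1]
        expired = ts - self.last_sent[0] >= self.MAX_SUPPRESS
        if unchanged and not expired:
            self.suppressed = (ts, value)  # same value, recent: don't send
            return
        if self.suppressed is not None:
            # Flush the last suppressed point first, so graphs stay flat
            # right up to the moment the value changed.
            self.emit(*self.suppressed)
            self.suppressed = None
        self.last_sent = (ts, value)
        self.emit(ts, value)

d = Dedup(lambda ts, v: print("proc.demo.value %d %s" % (ts, v)))
d.add(1000, 5); d.add(1015, 5); d.add(1030, 7)  # sends 1000, then 1015 and 1030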


Collecting lots of metrics with tcollector

Collectors in tcollector can be written in any language. They just need to be executable and output their data to stdout; tcollector handles the rest. The collectors are placed in the collectors directory. Tcollector iterates over every numerically named directory in there and runs all the collectors in each directory. If you name the directory 60, then tcollector will try to run every collector in that directory every 60 seconds. Use the directory 0 for any collectors that are long-lived and run continuously; tcollector will read their output and respawn them if they die. Generally you want to write long-lived collectors since that has less overhead. OpenTSDB is designed to have lots of datapoints for each metric (for most metrics we send datapoints every 15 seconds).

Any non-numeric-named directories in the collectors directory are ignored. We've included a lib and etc directory for library and config data used by all collectors.
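A minimal long-lived collector is just a loop that prints datapoint lines to stdout. The following hypothetical example (the file and metric names are made up for illustration) could be dropped into collectors/0/, and tcollector would supervise and respawn it:

#!/usr/bin/env python
# Hypothetical collector: collectors/0/demo_loadavg.py
# tcollector reads our stdout; each line is "<metric> <timestamp> <value> [tags]".
import sys
import time

def main():
    while True:
        with open("/proc/loadavg") as f:
            load1 = f.read().split()[0]
        print("proc.demo.loadavg.1min %d %s" % (int(time.time()), load1))
        sys.stdout.flush()  # flush so tcollector sees each datapoint promptly
        time.sleep(15)

if __name__ == "__main__":
    main()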


Installation of tcollector

You need to clone tcollector from GitHub:

git clone git://github.com/OpenTSDB/tcollector.git

and edit the 'tcollector/startstop' script to set the following variable: TSD_HOST=dns.name.of.tsd

To avoid having to run mkmetric for every metric that tcollector tracks, you can start TSD with the --auto-metric flag (otherwise each new metric must first be registered with the 'tsdb mkmetric' command). This is useful to get started quickly, but keeping the flag in the long term is not recommended, to avoid accidental metric creation.
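Assuming the default checkout layout, you can then start tcollector with the bundled script (it follows the usual init-script convention, so stop works the same way):

tcollector/startstop start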


Collectors bundled with tcollector

The following are the collectors we've included as part of the base package, together with all of the metric names they report on and what they mean. If you have any others you'd like to contribute, we'd love to hear about them so we can reference them or include them with your permission in a future release.


General collectors

0/dfstat.py

These stats are similar to the ones provided by the /usr/bin/df utility.

  • df.bytes.total

    total size of data

  • df.bytes.used

    bytes used

  • df.bytes.free

    bytes free

  • df.inodes.total

    total number of inodes

  • df.inodes.used

    number of inodes used

  • df.inodes.free

    number of inodes free

These metrics include time series tagged with each mount point and the filesystem's fstype. This collector filters out any cgroup, debugfs, devtmpfs, rpc_pipefs, and rootfs filesystems, as well as any mountpoints mounted under /dev/, /sys/, /proc/, and /lib/.


With these tags you can select to graph just a specific filesystem, or all filesystems with a particular fstype (e.g. ext3).

Example output:

[root@etch171 mars171 0]# ./dfstat.py   
df.bytes.total 1413306095 4159016960 mount=/ fstype=ext3
df.bytes.used 1413306095 3396472832 mount=/ fstype=ext3
df.bytes.percentused 1413306095 81.6652796722 mount=/ fstype=ext3
df.bytes.free 1413306095 762544128 mount=/ fstype=ext3
df.inodes.total 1413306095 1048576 mount=/ fstype=ext3
df.inodes.used 1413306095 74363 mount=/ fstype=ext3
df.inodes.percentused 1413306095 7.09180831909 mount=/ fstype=ext3
df.inodes.free 1413306095 974213 mount=/ fstype=ext3
df.bytes.total 1413306095 241564782592 mount=/data1 fstype=ext3
df.bytes.used 1413306095 202218672128 mount=/data1 fstype=ext3
df.bytes.percentused 1413306095 83.7119839896 mount=/data1 fstype=ext3
df.bytes.free 1413306095 39346110464 mount=/data1 fstype=ext3
df.inodes.total 1413306095 60882944 mount=/data1 fstype=ext3
df.inodes.used 1413306095 645826 mount=/data1 fstype=ext3
df.inodes.percentused 1413306095 1.06076670668 mount=/data1 fstype=ext3
df.inodes.free 1413306095 60237118 mount=/data1 fstype=ext3
......

  

0/ifstat.py

These stats are from /proc/net/dev.

  • proc.net.bytes

    (rate) Bytes in/out

  • proc.net.packets

    (rate) Packets in/out

  • proc.net.errs

    (rate) Packet errors in/out

  • proc.net.dropped

    (rate) Dropped packets in/out

These are interface counters, tagged with the interface (iface=) and direction (direction=in or direction=out). Only ethN interfaces are tracked. We intentionally exclude bondN interfaces, because bonded interfaces still keep counters on their child ethN interfaces and we don't want to double-count a box's network traffic if you don't select on iface=.
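A simplified sketch of how a collector can derive these series from /proc/net/dev (only the four counters listed above are emitted here; the bundled ifstat.py reports more fields, as the sample output below shows):

import time

def collect_once():
    ts = int(time.time())
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:       # the first two lines are headers
            iface, counters = line.split(":", 1)
            iface = iface.strip()
            if not iface.startswith("eth"):  # only ethN, as explained above
                continue
            vals = counters.split()
            # Receive counters come first; transmit counters start at field 8.
            for name, v in zip(("bytes", "packets", "errs", "dropped"), vals[0:4]):
                print("proc.net.%s %d %s iface=%s direction=in" % (name, ts, v, iface))
            for name, v in zip(("bytes", "packets", "errs", "dropped"), vals[8:12]):
                print("proc.net.%s %d %s iface=%s direction=out" % (name, ts, v, iface))

collect_once()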

Example output:

proc.net.fifo.errs 1413338912 0 iface=eth0 direction=in
proc.net.frame.errs 1413338912 0 iface=eth0 direction=in
proc.net.compressed 1413338912 0 iface=eth0 direction=in
proc.net.multicast 1413338912 6869312 iface=eth0 direction=in
proc.net.bytes 1413338912 1064085376 iface=eth0 direction=out
proc.net.packets 1413338912 7305051 iface=eth0 direction=out
proc.net.errs 1413338912 0 iface=eth0 direction=out
proc.net.dropped 1413338912 0 iface=eth0 direction=out
proc.net.fifo.errs 1413338912 0 iface=eth0 direction=out
proc.net.collisions 1413338912 0 iface=eth0 direction=out
proc.net.carrier.errs 1413338912 0 iface=eth0 direction=out
proc.net.compressed 1413338912 0 iface=eth0 direction=out
proc.net.bytes 1413338912 100779466516 iface=eth1 direction=in
proc.net.packets 1413338912 862873063 iface=eth1 direction=in
proc.net.errs 1413338912 124 iface=eth1 direction=in
proc.net.dropped 1413338912 0 iface=eth1 direction=in
proc.net.fifo.errs 1413338912 0 iface=eth1 direction=in
proc.net.frame.errs 1413338912 124 iface=eth1 direction=in
proc.net.compressed 1413338912 0 iface=eth1 direction=in
proc.net.multicast 1413338912 781541 iface=eth1 direction=in
proc.net.bytes 1413338912 90765358317 iface=eth1 direction=out
proc.net.packets 1413338912 976995995 iface=eth1 direction=out
proc.net.errs 1413338912 0 iface=eth1 direction=out
proc.net.dropped 1413338912 0 iface=eth1 direction=out

 

0/iostat.py

Data is from /proc/diskstats.

  • iostat.disk.*

    per-disk stats

  • iostat.part.*

    per-partition stats (see the note below; the available metrics differ depending on whether your 2.6 kernel is older or newer than 2.6.25)

See the kernel documentation in iostats.txt. The raw data looks like this:

[root@typhoeus79 ice_test_m avaliables]# more /proc/diskstats
1       0 ram0 0 0 0 0 0 0 0 0 0 0 0
1       1 ram1 0 0 0 0 0 0 0 0 0 0 0
1       2 ram2 0 0 0 0 0 0 0 0 0 0 0
1       3 ram3 0 0 0 0 0 0 0 0 0 0 0
1       4 ram4 0 0 0 0 0 0 0 0 0 0 0
1       5 ram5 0 0 0 0 0 0 0 0 0 0 0
1       6 ram6 0 0 0 0 0 0 0 0 0 0 0
1       7 ram7 0 0 0 0 0 0 0 0 0 0 0
1       8 ram8 0 0 0 0 0 0 0 0 0 0 0
1       9 ram9 0 0 0 0 0 0 0 0 0 0 0
1      10 ram10 0 0 0 0 0 0 0 0 0 0 0
1      11 ram11 0 0 0 0 0 0 0 0 0 0 0
1      12 ram12 0 0 0 0 0 0 0 0 0 0 0
1      13 ram13 0 0 0 0 0 0 0 0 0 0 0
1      14 ram14 0 0 0 0 0 0 0 0 0 0 0
1      15 ram15 0 0 0 0 0 0 0 0 0 0 0
8       0 sda 194745 287649 6810384 578134 68316366 101295831 1361828191 887830852 0 157754620 888373328
8       1 sda1 5048 1768 178050 23162 130155 188328 2548968 2781202 0 722873 2804315
8       2 sda2 1100 5771 53512 4594 506 24646 201216 10205 0 7300 14798
8       3 sda3 53769 7820 1194332 125424 13980592 15361786 234893716 116798689 0 55437523 116913333
8       4 sda4 2 0 4 34 0 0 0 0 0 34 34
8       5 sda5 5325 158518 165019 5156 7897969 16671932 196569642 65590406 0 28552584 65588961
8       6 sda6 67094 57688 2033043 200871 42956415 34695714 621323346 634120014 0 86205636 634303960
8       7 sda7 62381 56041 3185872 218802 3350729 34353425 306291303 68530336 0 11667279 68747830
3       0 hda 0 0 0 0 0 0 0 0 0 0 0
9       0 md0 0 0 0 0 0 0 0 0 0 0 0

/proc/diskstats has 11 stats for a given physical device. These are all rate counters, except ios_in_progress:
.read_requests       Number of reads completed
.read_merged         Number of reads merged
.read_sectors        Number of sectors read
.msec_read           Time in msec spent reading
.write_requests      Number of writes completed
.write_merged        Number of writes merged
.write_sectors       Number of sectors written
.msec_write          Time in msec spent writing
.ios_in_progress     Number of I/O operations in progress
.msec_total          Time in msec doing I/O
.msec_weighted_total Weighted time doing I/O (multiplied by ios_in_progress)
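As an illustration of how those 11 fields line up, a minimal parser sketch (not the bundled iostat.py; it assumes the 11-field full-disk format, see the partition note below):

FIELDS = ("read_requests", "read_merged", "read_sectors", "msec_read",
          "write_requests", "write_merged", "write_sectors", "msec_write",
          "ios_in_progress", "msec_total", "msec_weighted_total")

def parse_diskstats_line(line):
    # Each line is: major minor device-name, then the 11 counters above.
    parts = line.split()
    return parts[2], dict(zip(FIELDS, (int(x) for x in parts[3:14])))

with open("/proc/diskstats") as f:
    for line in f:
        dev, stats = parse_diskstats_line(line)
        # e.g. stats["read_requests"] is .read_requests for sda, sda1, ...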

In 2.6.25 and later, by-partition stats are reported the same as disk stats.

Note

In 2.6 kernels before 2.6.25, partitions have only 4 stats per partition:

.read_issued
.read_sectors
.write_issued
.write_sectors

For partitions, these *_issued are counters collected before requests are merged, so they aren't the same as *_requests (which is post-merge and more closely represents the actual number of disk transactions).

Given that diskstats provides both per-disk and per-partition data, for TSDB purposes we put them under different metrics (versus the same metric and different tags). Otherwise, if you look at a given metric, the data for a given box will be double-counted, since a given operation will increment both the disk series and the partition series. To fix this, we output by-disk data to iostat.disk.* and by-partition data to iostat.part.*.

Example output:

[root@typhoeus79 ice_test_m avaliables]# ./iostat.py
iostat.disk.read_requests 1413341296 194745 dev=sda
iostat.disk.read_merged 1413341296 287649 dev=sda
iostat.disk.read_sectors 1413341296 6810384 dev=sda
iostat.disk.msec_read 1413341296 578134 dev=sda
iostat.disk.write_requests 1413341296 68320119 dev=sda
iostat.disk.write_merged 1413341296 101301793 dev=sda
iostat.disk.write_sectors 1413341296 1361905911 dev=sda
iostat.disk.msec_write 1413341296 887834437 dev=sda
iostat.disk.ios_in_progress 1413341296 0 dev=sda
iostat.disk.msec_total 1413341296 157756976 dev=sda
iostat.disk.msec_weighted_total 1413341296 888376910 dev=sda

0/netstat.py

Socket allocation and network statistics. The collector reads /proc/net/sockstat, /proc/net/netstat, and /proc/net/snmp:

sockstat = open("/proc/net/sockstat")
netstat = open("/proc/net/netstat")
snmp = open("/proc/net/snmp")

Example contents of /proc/net/sockstat:

[root@eos176 data1]# cat /proc/net/sockstat 
sockets: used 200
TCP: inuse 88 orphan 2 tw 290 alloc 89 mem 39
UDP: inuse 8 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

 


Metrics from /proc/net/sockstat.

  • net.sockstat.num_sockets

    Number of sockets allocated (only TCP)

  • net.sockstat.num_timewait

    Number of TCP sockets currently in TIME_WAIT state

  • net.sockstat.sockets_inuse

    Number of sockets in use (TCP/UDP/raw)

  • net.sockstat.num_orphans

    Number of orphan TCP sockets (not attached to any file descriptor)

  • net.sockstat.memory

    Memory allocated for this socket type (in bytes)

  • net.sockstat.ipfragqueues

    Number of IP flows for which there are currently fragments queued for reassembly

Metrics from /proc/net/netstat (netstat -s command).

  • net.stat.tcp.abort

    Number of connections that the kernel had to abort. type=memory is especially bad: the kernel had to drop a connection due to having too many orphaned sockets. Other types are normal (e.g. timeout)

  • net.stat.tcp.abort.failed

    Number of times the kernel failed to abort a connection because it didn't even have enough memory to reset it (bad)

  • net.stat.tcp.congestion.recovery

    Number of times the kernel detected spurious retransmits and was able to recover part or all of the CWND

  • net.stat.tcp.delayedack

    Number of delayed ACKs sent of different types.

  • net.stat.tcp.failed_accept

    Number of times a connection had to be dropped after the 3WHS (three-way handshake). reason=full_acceptq indicates that the application isn't accepting connections fast enough; you should see SYN cookies too

  • net.stat.tcp.invalid_sack

    Number of invalid SACKs we saw of diff types. (requires Linux v2.6.24-rc1 or newer)

  • net.stat.tcp.memory.pressure

    Number of times a socket entered the "memory pressure" mode (not great).

  • net.stat.tcp.memory.prune

    Number of times a socket had to discard received data due to low memory conditions (bad)

  • net.stat.tcp.packetloss.recovery

    Number of times we recovered from packet loss by type of recovery (e.g. fast retransmit vs SACK)

  • net.stat.tcp.receive.queue.full

    Number of times a received packet had to be dropped because the socket's receive queue was full. (requires Linux v2.6.34-rc2 or newer)

  • net.stat.tcp.reording

    Number of times we detected re-ordering and how

  • net.stat.tcp.syncookies

    SYN cookies (both sent and received)


0/nfsstat.py

These stats are from /proc/net/rpc/nfs.

  • nfs.client.rpc.stats

    RPC stats counter

It is tagged with the type (type=) of operation. There are 3 operation types: authrefrsh (number of times the authentication information was refreshed), calls (number of calls conducted), and retrans (number of retransmissions).

  • nfs.client.rpc

    RPC calls counter

It is tagged with the version (version=) of the NFS server that conducted the operation, and the name of the operation (op=).

Descriptions of the operations can be found in the appropriate RFCs: NFS v3 (RFC 1813), NFS v4 (RFC 3530), NFS v4.1 (RFC 5661).

 

0/procnettcp.py


These stats are all from /proc/net/tcp{,6}. (Note that if IPv6 is enabled, some IPv4 connections seem to get put into /proc/net/tcp6.) The collector sleeps 60 seconds between intervals. Due in part to a kernel performance issue in older kernels and in part to systems with many TCP connections, this collector can sometimes take 5 minutes or more to run one interval, so the frequency of datapoints can be highly variable depending on the system.

  • proc.net.tcp

    Number of TCP connections

For each run of the collector, we classify each connection and generate subtotals. TSD will automatically total these up when displaying the graph, but you can drill down to each possible total or a particular one. Each connection is broken down with a user=username tag (from a fixed list of users we care about, or put under "other" if not in the list). It is also broken down by state with state= (established, time_wait, etc.), and by service with service= (http, mysql, memcache, etc.). Note that once a connection is closed, Linux seems to forget who opened/handled it; connections in time_wait, for example, will always show user=root. This collector does generate a large number of datapoints: the number of points is S*(U+1)*V, where S is the number of TCP states, U the number of users you track, and V the number of services (collections of ports). For example, tracking 7 users and 6 services across the 10 TCP states would give 10 * (7+1) * 6 = 480 potential series. The deduper reduces this very well, as only 3 of the 10 TCP states are generally ever seen; on a typical server this can dedup down to under 10 values per interval.
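A sketch of the state classification (the hex state codes come from the kernel's tcp_states.h; only a few are mapped here, everything else falls into "other", and the user and service breakdowns are omitted):

import time
from collections import Counter

# Hex values of the "st" field in /proc/net/tcp, per include/net/tcp_states.h.
TCP_STATES = {"01": "established", "06": "time_wait", "0A": "listen"}

def count_states(path="/proc/net/tcp"):
    counts = Counter()
    with open(path) as f:
        for line in f.readlines()[1:]:   # skip the header line
            state = line.split()[3]
            counts[TCP_STATES.get(state, "other")] += 1
    return counts

ts = int(time.time())
for state, n in count_states().items():
    print("proc.net.tcp %d %d state=%s" % (ts, n, state))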

 

0/procstats.py

Miscellaneous stats from /proc.

  • proc.stat.cpu

    (rate) CPU counters (jiffies), tagged by cpu type (type=user, nice, system, idle, iowait, irq, softirq, etc). As a rate they should aggregate up to approximately 100*numcpu per host. Best viewed as type=* or maybe type={user|nice|system|iowait|irq}

  • proc.stat.intr

    (rate) Number of interrupts

  • proc.stat.ctxt

    (rate) Number of context switches

See http://www.linuxhowtos.org/System/procstat.htm

  • proc.vmstat.*

    A subset of VM stats from /proc/vmstat (mix of rate and non-rate). See http://www.linuxinsight.com/proc_vmstat.html.

  • proc.meminfo.*

    Memory usage stats from /proc/meminfo. See the Linux kernel documentation

  • proc.loadavg.*

    1min, 5min, 15min, runnable, total_threads metrics from /proc/loadavg

  • proc.uptime.total

    (rate) Seconds since boot

  • proc.uptime.now

    (rate) Seconds since boot that the system has been idle

  • proc.kernel.entropy_avail

    Amount of entropy (in bits) available in the input pool (the one that's cryptographically strong and backing /dev/random among other things). Watch this value on your frontend servers that do SSL unwrapping; if it gets too low, your SSL performance will suffer

  • sys.numa.zoneallocs

    Number of pages allocated from the preferred node (type=hit) or not (type=miss)

  • sys.numa.foreign_allocs

    Number of pages this node allocated because the preferred node didn't have a free page to accommodate the request

  • sys.numa.allocation

    Number of pages allocated locally (type=local) or remotely (type=remote) for processes executing on this node

  • sys.numa.interleave

    Number of pages allocated successfully by the interleave strategy

     

0/smart-stats.py

Stats from SMART disks.

  • smart.raw_read_error_rate

    Data related to the rate of hardware read errors that occurred when reading data from a disk surface. The raw value has different structure for different vendors and is often not meaningful as a decimal number. (vendor specific)

  • smart.throughput_performance

    Overall throughput performance of a hard disk drive

  • smart.spin_up_time

    Average time of spindle spin up (from zero RPM to fully operational [millisecs])

  • smart.start_stop_count

    A tally of spindle start/stop cycles

  • smart.reallocated_sector_ct

    Count of reallocated sectors

  • smart.seek_error_rate

    Rate of seek errors of the magnetic heads. (vendor specific)

  • smart.seek_time_performance

    Average performance of seek operations of the magnetic heads

  • smart.power_on_hours

    Count of hours in power-on state, shows total count of hours (or minutes, or seconds) in power-on state. (vendor specific)

  • smart.spin_retry_count

    Count of retry of spin start attempts

  • smart.recalibration_retries

    The count that recalibration was requested (under the condition that the first attempt was unsuccessful)

  • smart.power_cycle_count

    The count of full hard disk power on/off cycles

  • smart.soft_read_error_rate

    Uncorrected read errors reported to the operating system

  • smart.program_fail_count_chip

    Total number of Flash program operation failures since the drive was deployed

  • smart.erase_fail_count_chip

    "Pre-Fail" Attribute

  • smart.wear_leveling_count

    The maximum number of erase operations performed on a single flash memory block

  • smart.used_rsvd_blk_cnt_chip

    The number of a chip’s used reserved blocks

  • smart.used_rsvd_blk_cnt_tot

    "Pre-Fail" Attribute (at least HP devices)

  • smart.unused_rsvd_blk_cnt_tot

    "Pre-Fail" Attribute (at least Samsung devices)

  • smart.program_fail_cnt_total

    Total number of Flash program operation failures since the drive was deployed

  • smart.erase_fail_count_total

    "Pre-Fail" Attribute

  • smart.runtime_bad_block

    The total count of all read/program/erase failures

  • smart.end_to_end_error

    The count of parity errors which occur in the data path to the media via the drive's cache RAM (at least Hewlett-Packard)

  • smart.reported_uncorrect

    The count of errors that could not be recovered using hardware ECC

  • smart.command_timeout

    The count of aborted operations due to HDD timeout

  • smart.high_fly_writes

    HDD producers implement a Fly Height Monitor that attempts to provide additional protections for write operations by detecting when a recording head is flying outside its normal operating range. If an unsafe fly height condition is encountered, the write process is stopped, and the information is rewritten or reallocated to a safe region of the hard drive. This attribute indicates the count of these errors detected over the lifetime of the drive

  • smart.airflow_temperature_celsius

    Airflow temperature

  • smart.g_sense_error_rate

    The count of errors resulting from externally induced shock & vibration

  • smart.power-off_retract_count

    The count of times the heads are loaded off the media

  • smart.load_cycle_count

    Count of load/unload cycles into head landing zone position

  • smart.temperature_celsius

    Current internal temperature

  • smart.hardware_ecc_recovered

    The count of errors that were recovered using hardware ECC

  • smart.reallocated_event_count

    Count of remap operations. The raw value of this attribute shows the total count of attempts to transfer data from reallocated sectors to a spare area

  • smart.current_pending_sector

    Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors)

  • smart.offline_uncorrectable

    The total count of uncorrectable errors when reading/writing a sector

  • smart.udma_crc_error_count

    The count of errors in data transfer via the interface cable as determined by ICRC (Interface Cyclic Redundancy Check)

  • smart.write_error_rate

    The total count of errors when writing a sector

  • smart.media_wearout_indicator

    The normalized value starts at 100 (when the SSD is new) and declines to a minimum value of 1

  • smart.transfer_error_rate

    Count of times the link is reset during a data transfer

  • smart.total_lba_writes

    Total count of LBAs written

  • smart.total_lba_read

    Total count of LBAs read

Descriptions of these metrics can be found in the S.M.A.R.T. article on Wikipedia. The best way to understand a particular metric is to look at the drive vendor's specification.

 

Other collectors

0/couchbase.py

0/elasticsearch.py

0/hadoop_datanode_jmx.py

0/haproxy.py

0/hbase_regionserver_jmx.py

0/mongo.py

0/mysql.py

Stats from MySQL (relational database).

Refer to the following MySQL documentation for metric descriptions: InnoDB monitors, SHOW STATUS (global), SHOW ENGINE, SHOW SLAVE STATUS, and SHOW PROCESSLIST.

 

0/postgresql.py

0/redis-stats.py

Stats from Redis (key-value store).

Refer to the documentation of the Redis INFO command for metric descriptions.

0/riak.py

0/varnishstat.py

Stats from Varnish (HTTP accelerator).

0/zookeeper.py

Stats from Zookeeper (centralized service for distributed synchronization).

Refer to the following documentation for metrics description: Zookeeper admin commands.

The collectors directory layout looks like this:

[root@etch171 mars171 collectors]# tree -L 2
.
|-- 0
|   |-- couchbase.py
|   |-- dfstat.py
|   |-- elasticsearch.py
|   |-- graphite_bridge.py
|   |-- hadoop_datanode.py
|   |-- hadoop_namenode.py
|   |-- haproxy.py
|   |-- hbase_master.py
|   |-- hbase_regionserver.py
|   |-- ifstat.py
|   |-- iostat.py
|   |-- mongo.py
|   |-- mysql.py
|   |-- netstat.py
|   |-- nfsstat.py
|   |-- opentsdb.sh
|   |-- postgresql.py
|   |-- procnettcp.py
|   |-- procstats.py
|   |-- redis-stats.py
|   |-- riak.py
|   |-- smart-stats.py
|   |-- udp_bridge.py
|   |-- varnishstat.py
|   |-- zabbix_bridge.py
|   |-- zfsiostats.py
|   |-- zfskernstats.py
|   `-- zookeeper.py
|-- __init__.py
|-- etc
|   |-- __init__.py
|   |-- config.py
|   |-- graphite_bridge_conf.py
|   |-- mysqlconf.py
|   |-- postgresqlconf.py
|   |-- udp_bridge_conf.py
|   `-- zabbix_bridge_conf.py
`-- lib
    |-- __init__.py
    |-- hadoop_http.py
    `-- utils.py

  

 

References

1. http://opentsdb.net/docs/build/html/user_guide/utilities/tcollector.html
2. http://en.wikipedia.org/wiki/Wire_protocol
3. http://www.ttlsa.com/opentsdb/opentsdb-nagios-monitoring-and-alarming-realization/

Reposted from: https://www.cnblogs.com/gsblog/p/4025482.html
