Nagios process monitoring with check_ps.sh
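The check_procs plugin that ships with nagios-plugins can be used for process monitoring, but it is built around counting matching processes, for example alerting when an existing process has forked more than a given number of children. check_ps.sh takes a different approach: it looks up a process by name and reports that process's CPU and memory usage, and its output can easily be graphed with pnp4nagios.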
To use it, open the check_ps.sh project page on the Nagios site and download check_ps.sh. Put check_ps.sh in /usr/local/nagios/libexec and check_ps.php in /usr/local/pnp4nagios/share/templates.
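The two copy steps above can be wrapped in a small helper. This is a minimal sketch, not something shipped with check_ps.sh: the function name install_check_ps is hypothetical, it assumes both downloaded files sit in one directory, and the file modes (0755 for the plugin, 0644 for the template) are an assumption. It uses the coreutils install command, which copies a file and sets its mode in one step.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of check_ps.sh): copy the downloaded plugin and
# pnp4nagios template into place.
#   $1  directory holding the downloaded check_ps.sh and check_ps.php
#   $2  plugin directory   (default: /usr/local/nagios/libexec)
#   $3  template directory (default: /usr/local/pnp4nagios/share/templates)
install_check_ps() {
    local src="$1"
    local libexec="${2:-/usr/local/nagios/libexec}"
    local templates="${3:-/usr/local/pnp4nagios/share/templates}"
    # The plugin must be executable so Nagios can run it (mode is an assumption).
    install -m 0755 "$src/check_ps.sh" "$libexec/check_ps.sh" || return 1
    # The template is only read by pnp4nagios, so it does not need the x bit.
    install -m 0644 "$src/check_ps.php" "$templates/check_ps.php" || return 1
}
```

For example, after sourcing the function, install_check_ps ~/Downloads would install into the default paths; it has to run as a user allowed to write to those directories, and both target directories must already exist.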
The script's usage, as printed by -h:
[root@jjh-cc libexec]# ./check_ps.sh -h
check_ps.sh -p firefox [-w 10] [-c 20] [-t cpu]
Options:
  -p/--process)
     You need to provide a string for which the ps output is then
     then "greped".
  -w/--warning)
     Defines a warning level for a target which is explained
     below. Default is: off
  -c/--critical)
     Defines a critical level for a target which is explained
     below. Default is: off
  -t/--target)
     A target can be defined via -t. Choose between cpu and mem.
     Default is: mem
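The help text does not spell out how a value is compared against -w and -c. As an illustration only, here is a minimal sketch of the usual Nagios plugin convention (alert when the value exceeds the threshold; exit code 0 = OK, 1 = WARNING, 2 = CRITICAL). The function nagios_state is hypothetical and is not taken from check_ps.sh, whose own comparison may differ. Because values such as 31.2 are decimals and bash arithmetic is integer-only, the comparison is done in awk.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of Nagios-style threshold handling; check_ps.sh's actual
# logic may differ. Usage: nagios_state VALUE [WARN] [CRIT]
# An empty WARN or CRIT means that threshold is off (report values only).
nagios_state() {
    local value="$1" warn="${2:-}" crit="${3:-}"
    # awk exits 0 (shell "true") when the value is above the threshold.
    if [ -n "$crit" ] && awk -v v="$value" -v t="$crit" 'BEGIN { exit !(v + 0 > t + 0) }'; then
        echo "CRITICAL"
        return 2
    fi
    if [ -n "$warn" ] && awk -v v="$value" -v t="$warn" 'BEGIN { exit !(v + 0 > t + 0) }'; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}
```

With the memory value from the example output below, nagios_state 31.2 50 80 prints OK, while nagios_state 85.5 50 80 prints CRITICAL and returns 2, the exit code Nagios maps to a critical state.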
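By default the thresholds apply to memory usage; use -t cpu to apply them to CPU usage instead. You can also leave the thresholds out entirely and only collect the values without alerting, for example: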
[root@cc libexec]# /usr/local/nagios/libexec/check_ps.sh -p 'JjhControlServerMain'
OK - Process: JjhControlServerMain, User: www, CPU: 1.0%, RAM: 31.2%, Start: Nov30, CPU Time: 8049 min | 'cpu'=1.0 'memory'=31.2 'cputime'=8049
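The example above monitors a Java program (JjhControlServerMain) and shows its output. With the check_ps.php template, pnp4nagios turns this performance data into two graphs: CPU/memory usage in percent, and cumulative CPU time in minutes.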
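The template refers to the values as $DS[1], $DS[2] and $DS[3]. As far as I understand pnp4nagios, each label in the performance data (the part after the first |) becomes one data source, numbered in the order it appears, so here cpu is DS[1], memory is DS[2] and cputime is DS[3]. The hypothetical helper perf_labels below is only a sketch that makes this ordering visible for a given plugin output line; it is not part of pnp4nagios.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list performance-data labels in the order a pnp4nagios
# template would see them as $DS[1], $DS[2], ...
perf_labels() {
    local line="$1" item label value i=0 q="'"
    local -a items
    # No '|' means the plugin printed no performance data.
    [[ "$line" == *"|"* ]] || return 1
    # Split the text after the first '|' on whitespace (no glob expansion).
    read -r -a items <<< "${line#*|}"
    for item in "${items[@]}"; do
        i=$((i + 1))
        label="${item%%=*}"       # text before '=' is the label
        label="${label//$q/}"     # drop the optional single quotes
        value="${item#*=}"        # text after '='
        value="${value%%;*}"      # cut off any ;warn;crit;min;max fields
        echo "DS[$i] $label $value"
    done
}
```

Fed the JjhControlServerMain output line shown above, it prints DS[1] cpu 1.0, DS[2] memory 31.2 and DS[3] cputime 8049, one per line.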
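However, the template uploaded to the project page contains a small mistake: one concatenation operator too many. The line that first defines $def[2] uses .= instead of =, so it appends to an array element that does not exist yet, and PHP reports "Notice: Undefined offset: 2" when the template is used. The corrected template: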
<?php
$opt[1] = "--vertical-label \"percent\" -u 100 -l 0 -r --title \"CPU/Memory Usage for $hostname / $servicedesc\" ";
# No -r (rigid) here: with a rigid upper limit of 100, CPU times above
# 100 minutes (8049 in the example above) would be clipped off the graph.
$opt[2] = "--vertical-label \"minutes\" -u 100 -l 0 --title \"cputime for $hostname / $servicedesc\" ";
$def[1]  = "DEF:cpu=$rrdfile:$DS[1]:AVERAGE ";
$def[1] .= "DEF:memory=$rrdfile:$DS[2]:AVERAGE ";
# Plain "=" here: this line creates $def[2]. The uploaded template used ".=",
# which appends to an undefined element and triggers "Undefined offset: 2".
$def[2]  = "DEF:cputime=$rrdfile:$DS[3]:AVERAGE ";
$def[1] .= "COMMENT:\"\t\t\tLAST\t\t\tAVERAGE\t\t\tMAX\n\" ";
$def[2] .= "COMMENT:\"\t\t\tLAST\t\t\tAVERAGE\t\t\tMAX\n\" ";
$def[1] .= "LINE2:cpu#E80C3E:\"CPU\t\t\" ";
$def[1] .= "GPRINT:cpu:LAST:\"%6.2lf %%\t\t\" ";
$def[1] .= "GPRINT:cpu:AVERAGE:\"%6.2lf \t\t\" ";
$def[1] .= "GPRINT:cpu:MAX:\"%6.2lf \n\" ";
$def[1] .= "LINE2:memory#008000:\"Memory\t\" ";
$def[1] .= "GPRINT:memory:LAST:\"%6.2lf %%\t\t\" ";
$def[1] .= "GPRINT:memory:AVERAGE:\"%6.2lf \t\t\" ";
$def[1] .= "GPRINT:memory:MAX:\"%6.2lf \n\" ";
$def[2] .= "AREA:cputime#E80C3E:\"CPUTime\t\" ";
$def[2] .= "GPRINT:cputime:LAST:\"%6.2lf min\t\t\" ";
$def[2] .= "GPRINT:cputime:AVERAGE:\"%6.2lf min\t\t\" ";
$def[2] .= "GPRINT:cputime:MAX:\"%6.2lf min\n\" ";
?>
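The script has one drawback: when a process name matches several processes, the result is inaccurate. nginx, for example, may run several child processes; if we want the total CPU and memory used by all nginx processes, the script does not give the right answer. For example: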
root@test:#./check_ps.sh -p nginx
OK - Process: nginx, User: root, CPU: 0.0%, RAM: 0.7%, Start: Oct17, CPU Time: 7 min | 'cpu'=0.0 'memory'=0.7 'cputime'=7
root@test:#ps auxf|grep nginx
root 12732 0.0 0.0 103244 796 pts/1 S+ 10:30 0:00 \_ grep nginx
root 31302 0.0 0.7 70836 27992 ? Ss Oct17 0:07 nginx: master process /App/nginx/sbin/nginx -c /App/nginx/conf/nginx.conf
www 32610 2.6 1.4 96804 55452 ? S Dec07 88:56 \_ nginx: worker process
www 32611 2.5 1.4 96804 55508 ? S Dec07 87:17 \_ nginx: worker process
www 32612 2.8 1.4 96276 55108 ? S Dec07 96:41 \_ nginx: worker process
www 32613 2.7 1.4 96980 55800 ? S Dec07 92:02 \_ nginx: worker process
root@test:#./check_ps.sh -p 'nginx: worker process'
OK - Process: nginx: worker process, User: www, CPU: 2.6%, RAM: 1.4%, Start: Dec07, CPU Time: 5338 min | 'cpu'=2.6 'memory'=1.4 'cputime'=5338
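As the output shows, check_ps.sh reports only the first matching process: -p nginx returns the master process (CPU 0.0%, RAM 0.7%), and -p 'nginx: worker process' returns a single worker (CPU 2.6%, RAM 1.4%). Since it is just a shell script, though, it can be adapted to cases like nginx with a small change: add a step that sums the values over all matching processes. The details are not covered here.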
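The summing step could look roughly like the sketch below. It is not a patch to check_ps.sh; the function sum_usage is hypothetical. It reads lines of "%CPU %MEM command-name" on stdin and adds up %CPU and %MEM for every line whose command name equals the argument. Matching the command name exactly, instead of grepping the full command line, avoids counting the grep or awk process itself. On Linux the command name of nginx workers is normally still nginx even though nginx rewrites the command line shown by ps, so master and workers are all counted. Command names containing spaces are not handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not part of check_ps.sh): total %CPU and %MEM over all
# processes whose command name equals $1.
# stdin: "pcpu pmem comm" per line, e.g. from: ps -e -o pcpu= -o pmem= -o comm=
sum_usage() {
    local name="$1"
    awk -v name="$name" '
        $3 == name { cpu += $1; mem += $2; n++ }
        END {
            # Non-zero exit when nothing matched, so callers can alert on it.
            if (n == 0) { print "NOTFOUND"; exit 1 }
            printf "procs=%d cpu=%.1f mem=%.1f\n", n, cpu, mem
        }'
}
```

On a live system it would be called as ps -e -o pcpu= -o pmem= -o comm= | sum_usage nginx. Fed the five nginx processes from the ps listing above (plus the grep line, which is ignored), it prints procs=5 cpu=10.6 mem=6.3.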
- Author: shisekong
- Link: https://blog.361way.com/nagios-check-processes/2873.html
- License: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.