Monitoring the Postfix queue with Nagios
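A few days ago our company mail system stopped being able to send or receive mail, and the cause turned out to be one of our own employees. He had added alerting to a Java program: whenever an alert fired, the company mail server sent an alert email to his 163 mailbox. The program was not written carefully and fell into an infinite loop, so it kept sending mail to the 163 address without pause. Once 163's anti-spam filtering kicked in, none of those messages could be delivered and they piled up in the queue. By the time anyone noticed, more than 150,000 messages were waiting to be sent. The upshot: the boss was furious that the mail system could break without anyone knowing, and asked why it had never been added to Nagios monitoring.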
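The boss had spoken, so off I scurried to get it done. I had planned to write my own plugin, but why bother when ready-made ones exist? I first went to exchange.nagios.org to look for plugins that monitor the Postfix queue. There are several; see this page: http://exchange.nagios.org/index.php?option=com_mtree&task=search&Itemid=74&searchword=postfix . At a glance they are all much the same. While searching I also came across another script that monitors both the total size of the queued mail and the number of messages in the queue (put bluntly, all of these scripts simply wrap mailq and postqueue -p):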
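As a small illustration of that last point, here is a minimal sketch of how the summary line printed by mailq or postqueue -p can be turned into two numbers. The function name parse_queue_summary is hypothetical, and the sketch assumes the usual Postfix summary shape ("-- 23 Kbytes in 2 Requests." or "Mail queue is empty"); other MTAs or localized output may differ.

```shell
#!/bin/bash
# Hypothetical helper: print "<requests> <size_in_KB>" for a mailq summary line.
# Assumes the summary looks like "-- 23 Kbytes in 2 Requests."
parse_queue_summary() {
    if [ "$1" = "Mail queue is empty" ]; then
        echo "0 0"
        return 0
    fi
    # Field 2 is the size in KB, field 5 is the number of requests.
    # Exit non-zero if either field is not a plain number.
    printf '%s\n' "$1" | awk '
        $2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $5, $2; ok = 1 }
        END { exit !ok }'
}

# Real use on a mail server would look like this:
#   parse_queue_summary "$(postqueue -p | tail -n 1)"
```

The complete plugin below does the same field extraction with cut; validating the fields first simply avoids comparing empty strings when the output is not what was expected.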
#!/bin/bash
# Nagios plugin exit codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

# default options
postfix_dir=/var/spool/postfix
warning_active=100
critical_active=2000
warning_deferred=500
critical_deferred=1000
warning_other=1
critical_other=100

function usage {
    echo "$0 [--dir|-d postfix_dir] [--wa|-w warning_active] [--ca|-c critical_active] [--wd warning_deferred] [--cd critical_deferred] [--wo warning_other] [--co critical_other]" 1>&2
}

# Every option has a default, so no arguments are required.
while test -n "$1"; do
    case "$1" in
        --dir|-d ) postfix_dir=$2
            shift;;
        --wa|-w ) warning_active=$2
            shift;;
        --ca|-c ) critical_active=$2
            shift;;
        --wd ) warning_deferred=$2
            shift;;
        --cd ) critical_deferred=$2
            shift;;
        --wo ) warning_other=$2
            shift;;
        --co ) critical_other=$2
            shift;;
        *) echo "Wrong arguments!" 1>&2
            usage
            exit $STATE_UNKNOWN ;;
    esac
    shift
done

# The last line of mailq is the summary, e.g. "-- 23 Kbytes in 2 Requests."
queue=$(/usr/bin/mailq | tail -n 1)
# queue empty = ok
if [ "$queue" == "Mail queue is empty" ] ; then
    perfdata="'req'=0;;; 'size'=0KB;;; 'active'=0;$warning_active;$critical_active; 'bounce'=0;$warning_other;$critical_other; 'corrupt'=0;$warning_other;$critical_other; 'deferred'=0;$warning_deferred;$critical_deferred; 'maildrop'=0;$warning_other;$critical_other; "
    output="$queue"
    echo "OK - ${output} | ${perfdata}"
    exit $STATE_OK
else
    queue_req=$(echo "$queue" | cut -d ' ' -f 5)
    queue_size=$(echo "$queue" | cut -d ' ' -f 2) # in KB
    # Count the queue files in each Postfix spool directory
    queue_active=$(find "$postfix_dir/active" -type f | wc -l)
    queue_bounce=$(find "$postfix_dir/bounce" -type f | wc -l)
    queue_corrupt=$(find "$postfix_dir/corrupt" -type f | wc -l)
    queue_deferred=$(find "$postfix_dir/deferred" -type f | wc -l)
    queue_maildrop=$(find "$postfix_dir/maildrop" -type f | wc -l)
    perfdata="'req'=$queue_req;;; 'size'=${queue_size}KB;;; 'active'=$queue_active;$warning_active;$critical_active; 'bounce'=$queue_bounce;$warning_other;$critical_other; 'corrupt'=$queue_corrupt;$warning_other;$critical_other; 'deferred'=$queue_deferred;$warning_deferred;$critical_deferred; 'maildrop'=$queue_maildrop;$warning_other;$critical_other; "
fi

#echo $perfdata
#echo "postfix_dir $postfix_dir - warning_active $warning_active - critical_active $critical_active - warning_deferred $warning_deferred - critical_deferred $critical_deferred - warning_other $warning_other - critical_other $critical_other"

returnCrit=0
returnWarn=0
errorString=""
# Check critical and warning state for each queue
if [ "$queue_active" -ge "$critical_active" ]; then
    returnCrit=1
    errorString="$errorString - CRIT $queue_active >= $critical_active actives"
elif [ "$queue_active" -ge "$warning_active" ]; then
    returnWarn=1
    errorString="$errorString - WARN $queue_active >= $warning_active actives"
fi
if [ "$queue_bounce" -ge "$critical_other" ]; then
    returnCrit=1
    errorString="$errorString - CRIT $queue_bounce >= $critical_other bounce"
elif [ "$queue_bounce" -ge "$warning_other" ]; then
    returnWarn=1
    errorString="$errorString - WARN $queue_bounce >= $warning_other bounce"
fi
if [ "$queue_corrupt" -ge "$critical_other" ]; then
    returnCrit=1
    errorString="$errorString - CRIT $queue_corrupt >= $critical_other corrupt"
elif [ "$queue_corrupt" -ge "$warning_other" ]; then
    returnWarn=1
    errorString="$errorString - WARN $queue_corrupt >= $warning_other corrupt"
fi
if [ "$queue_deferred" -ge "$critical_deferred" ]; then
    returnCrit=1
    errorString="$errorString - CRIT $queue_deferred >= $critical_deferred deferred"
elif [ "$queue_deferred" -ge "$warning_deferred" ]; then
    returnWarn=1
    errorString="$errorString - WARN $queue_deferred >= $warning_deferred deferred"
fi
if [ "$queue_maildrop" -ge "$critical_other" ]; then
    returnCrit=1
    errorString="$errorString - CRIT $queue_maildrop >= $critical_other maildrop"
elif [ "$queue_maildrop" -ge "$warning_other" ]; then
    returnWarn=1
    errorString="$errorString - WARN $queue_maildrop >= $warning_other maildrop"
fi

# Report the worst state found across all queues
output="$queue_req request(s) ($queue_size kB)"
if [ $returnCrit == 0 ] && [ $returnWarn == 0 ] ; then
    echo "OK - ${output} | ${perfdata}"
    returnCode=$STATE_OK
elif [ $returnCrit == 0 ] && [ $returnWarn == 1 ] ; then
    echo "WARNING - ${output} ${errorString} | ${perfdata}"
    returnCode=$STATE_WARNING
else
    echo "CRITICAL - ${output} ${errorString} | ${perfdata}"
    returnCode=$STATE_CRITICAL
fi

exit $returnCode
Note: the script had problems when I first got hold of it. I have already fixed the faulty checks, so you can take it and use it as is.
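The five near-identical threshold blocks in the script could be folded into one helper. The following is only a sketch of that idea, not part of the original plugin: the function name check_queue and the sample counts are made up for illustration, while the thresholds are the script's defaults.

```shell
#!/bin/bash
returnCrit=0
returnWarn=0
errorString=""

# Hypothetical helper: check_queue NAME COUNT WARN_THRESHOLD CRIT_THRESHOLD
# Sets returnCrit/returnWarn and appends a message, like the inline blocks do.
check_queue() {
    if [ "$2" -ge "$4" ]; then
        returnCrit=1
        errorString="$errorString - CRIT $2 >= $4 $1"
    elif [ "$2" -ge "$3" ]; then
        returnWarn=1
        errorString="$errorString - WARN $2 >= $3 $1"
    fi
}

# Made-up counts checked against the script's default thresholds
check_queue active   150 100 2000
check_queue deferred 10  500 1000
check_queue bounce   0   1   100

echo "crit=$returnCrit warn=$returnWarn$errorString"
```

With these sample numbers only the active queue crosses its warning threshold, so the last line prints crit=0 warn=1 - WARN 150 >= 100 active. Keeping the comparison in one place also makes it harder for a single branch to end up with the wrong label or the wrong threshold variable.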
Next, edit the nrpe.cfg file on the mail server and add the following command definition:
command[check_postque]=/App/nagios/libexec/check_postque -w 50 -c 100 -W 3000000 -C 5000000 -p postfix
# Warning when the queue holds more than 50 messages, critical above 100. Warning when the total mail size reaches 300M, critical at 500M.
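Note: my Nagios is installed under /App/nagios. If yours is installed somewhere else, change the path in the command above to match, otherwise the check will fail. Also note that -W, -C and -p are not among the options parsed by the script listed earlier (it would answer "Wrong arguments!"), so make sure the arguments match the plugin you actually install as check_postque.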
Then kill the nrpe process and start it again:
/App/nagios/bin/nrpe -c /App/nagios/etc/nrpe.cfg -d
On the central Nagios server, a matching service definition also has to be added:
define service{
        use                     local-service,srv-pnp
        host_name               XXX.XX.XX.XX
        service_description     check_postque
        check_command           check_nrpe!check_postque
        }
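I replaced my own IP with XXX above; substitute your own. The use line also has to match the templates in your existing Nagios configuration. When that is done, run /App/nagios/bin/nagios -v /App/nagios/etc/nagios.cfg on the central server to check the configuration files for errors. If there are none, go to the /etc/init.d directory and reload the configuration with ./nagios reload.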
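If something goes wrong, you can also run
./check_nrpe -H XXX.XX.XX.XX -c check_postque
on the central server and see whether it returns any output, which tells you whether the NRPE communication is working.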
Since no mail was being sent on my side, the result I got was OK : Mail queue is empty.
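When testing by hand like this, the exit status matters as much as the text, because Nagios derives the service state from it. A minimal sketch for translating the status into a state name follows; state_name is a hypothetical helper, and the mapping is the one given by the STATE_ constants at the top of the plugin.

```shell
#!/bin/bash
# Hypothetical helper: map a plugin exit status to its Nagios state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3, and anything unexpected, is reported as UNKNOWN here
    esac
}

# Usage on the central server (the address is the article's placeholder):
#   ./check_nrpe -H XXX.XX.XX.XX -c check_postque; state_name $?
```

If the text looks fine but the state is UNKNOWN, the usual suspects are the arguments passed in nrpe.cfg or the permissions of the user that nrpe runs as.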
- Author: shisekong
- Link: https://blog.361way.com/nagios-postfix/1248.html
- License: This work is under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.