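Cacti is very strong at traffic monitoring, but in other areas its monitoring capabilities fall somewhat short of Nagios. Among open-source monitoring systems, Nagios is the most common choice in IT companies, and it has a companion plugin for traffic graphs: pnp4nagios. However, as with most traffic-graphing tools, the data is usually collected over SNMP and stored in rrdtool format. SNMP is certainly powerful, but more Nagios users probably run their checks through NRPE. The two don't conflict when used alongside Nagios, of course. So is there no way to get pnp4nagios graphs from a check run via check_nrpe? Of course there is.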

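The approach: read the interface byte counters from ifconfig twice, some time apart, subtract the old values from the new ones and divide by the elapsed time to get the average rate, then print the result in the format pnp4nagios expects. The script is as follows: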

#!/bin/sh
########################################################
#                                                      #
#         www.361way.com                               #
# Usage: check_traffic -i Interface -w warn -c crit    #
#                                                      #
########################################################
# Nagios exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
while getopts ":i:c:w:h" optname
  do
    case "$optname" in
      "i")
        INT=$OPTARG
        ;;
      "c")
        CRIT=$OPTARG
        ;;
      "w")
        WARN=$OPTARG
        ;;
      "h")
        echo "Usage: check_traffic -i Interface -w warn -c crit"
        exit 3
        ;;
      "?")
        echo "Unknown option $OPTARG"
        exit 3
        ;;
      ":")
        echo "No argument value for option $OPTARG"
        exit 3
        ;;
      *)
      # Should not occur
        echo "Unknown error while processing options"
        exit 3
        ;;
    esac
  done
[ -z "$INT" ] && echo "Please specify an interface with -i" && exit 3
if ! ifconfig "$INT" >/dev/null 2>&1; then
        echo "error: no device $INT"
        exit 3
fi
DEVICE=$INT
# Default thresholds in bit/s (1 Mbit/s and 2 Mbit/s, 1024-based)
[ -z "$WARN" ] && WARN=1048576
[ -z "$CRIT" ] && CRIT=2097152
DIR=/App/nagios/tmp
# State file fields: time, RX bytes, TX bytes, last In rate, last Out rate
FILE=$DIR/.network-$DEVICE.tmp
[ -d "$DIR" ] || mkdir -p "$DIR"
chown -R nagios:nagios "$DIR"
New_Time=$(date +%s)
# Take one ifconfig snapshot so RX and TX are read at the same moment.
# This parsing assumes the older net-tools output format:
#   RX bytes:123456 (120.5 KiB)  TX bytes:654321 (639.0 KiB)
RX_LINE=$(ifconfig "$DEVICE" | grep "RX bytes")
New_In=$(echo "$RX_LINE" | awk '{print $2}' | awk -F: '{print $NF}')
New_Out=$(echo "$RX_LINE" | awk '{print $6}' | awk -F: '{print $NF}')
if [ -z "$New_In" ] || [ -z "$New_Out" ]; then
        echo "UNKNOWN - cannot read byte counters for $DEVICE"
        exit 3
fi
if [ ! -s "$FILE" ]; then
        # First run: only store a baseline sample (tab-separated)
        printf '%s\t%s\t%s\n' "$New_Time" "$New_In" "$New_Out" > "$FILE"
        echo "This is first run"
        exit 0
fi
Old_Time=$(awk '{print $1}' "$FILE")
Old_In=$(awk '{print $2}' "$FILE")
Old_Out=$(awk '{print $3}' "$FILE")
Diff_Time=$((New_Time - Old_Time))
[ "$Diff_Time" -le 5 ] && echo "interval less than 5s" && exit 0
# Average rate in bit/s: bytes transferred * 8 / elapsed seconds
Diff_In=$(( (New_In - Old_In) * 8 / Diff_Time ))
Diff_Out=$(( (New_Out - Old_Out) * 8 / Diff_Time ))
# A negative delta means the counter wrapped or was reset:
# reuse the last stored rate, or 0 if none has been stored yet
if [ "$Diff_In" -lt 0 ]; then
        Diff_In=$(awk '{print $4}' "$FILE")
        Diff_In=${Diff_In:-0}
fi
if [ "$Diff_Out" -lt 0 ]; then
        Diff_Out=$(awk '{print $5}' "$FILE")
        Diff_Out=${Diff_Out:-0}
fi
printf '%s\t%s\t%s\t%s\t%s\n' "$New_Time" "$New_In" "$New_Out" "$Diff_In" "$Diff_Out" > "$FILE"
# Performance data: label=value;warn;crit;min, entries separated by spaces
PERF="In=${Diff_In};${WARN};${CRIT};0 Out=${Diff_Out};${WARN};${CRIT};0"
# Only the inbound rate is compared against the thresholds
if [ "$Diff_In" -ge "$CRIT" ]; then
        echo "CRITICAL - $Diff_In|$PERF"
        exit 2
elif [ "$Diff_In" -ge "$WARN" ]; then
        echo "WARNING - $Diff_In|$PERF"
        exit 1
else
        echo "OK - $Diff_In|$PERF"
        exit 0
fi
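The two core steps of the check, averaging the counter delta over the interval and turning the rate into a Nagios status line with performance data, can be isolated into small functions for testing. This is a minimal sketch; the function names `avg_rate_bps` and `traffic_status` are hypothetical and not part of the original script, and the perfdata layout (`label=value;warn;crit;min`, entries separated by spaces) follows the standard Nagios plugin format.

```sh
#!/bin/sh
# Hypothetical helpers isolating two steps of check_traffic;
# the names are illustrative, not from the original script.

# Average rate in bit/s between two byte-counter samples taken SECONDS apart.
# Usage: avg_rate_bps OLD_BYTES NEW_BYTES SECONDS
avg_rate_bps() {
    if [ "$3" -le 0 ]; then
        echo "interval must be positive" >&2
        return 1
    fi
    # bytes delta * 8 bits per byte / elapsed seconds (integer division)
    echo $(( ($2 - $1) * 8 / $3 ))
}

# Map an inbound rate to a Nagios status line plus performance data
# and return the matching exit code: 0=OK, 1=WARNING, 2=CRITICAL.
# Usage: traffic_status IN_BPS OUT_BPS WARN CRIT
traffic_status() {
    perf="In=$1;$3;$4;0 Out=$2;$3;$4;0"
    if [ "$1" -ge "$4" ]; then
        echo "CRITICAL - $1|$perf"
        return 2
    elif [ "$1" -ge "$3" ]; then
        echo "WARNING - $1|$perf"
        return 1
    fi
    echo "OK - $1|$perf"
    return 0
}
```

For example, a counter that goes from 1000000 to 1600000 bytes in 60 seconds gives `avg_rate_bps 1000000 1600000 60` = 80000 bit/s, and `traffic_status 80000 0 1048576 2097152` prints `OK - 80000|In=80000;1048576;2097152;0 Out=0;1048576;2097152;0` and returns 0.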

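Alternatively, the counters can be taken from the output of cat /proc/net/dev. The script below can be adapted for this; it works on the same principle as the ifconfig script above, averaging the traffic over an interval: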

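#!/bin/bash
echo -n "which nic? "
read -r eth
echo "the nic is $eth"
echo -n "how many seconds: "
read -r sec
echo "duration is $sec seconds, wait please..."
# /proc/net/dev lines look like "iface: rx_bytes (8 RX columns) tx_bytes ...".
# Replacing the first ":" with a space makes $2 the RX bytes and $10 the
# TX bytes whether or not the kernel prints a space after the colon;
# $1 == dev avoids matching other names that contain $eth (e.g. veth0).
infirst=$(awk -v dev="$eth" '{sub(/:/, " ")} $1 == dev {print $2}' /proc/net/dev)
outfirst=$(awk -v dev="$eth" '{sub(/:/, " ")} $1 == dev {print $10}' /proc/net/dev)
sumfirst=$((infirst + outfirst))
sleep "$sec"
inend=$(awk -v dev="$eth" '{sub(/:/, " ")} $1 == dev {print $2}' /proc/net/dev)
outend=$(awk -v dev="$eth" '{sub(/:/, " ")} $1 == dev {print $10}' /proc/net/dev)
sumend=$((inend + outend))
sum=$((sumend - sumfirst))
echo "$sec seconds total: $sum bytes"
aver=$((sum / sec))
echo "average: $aver bytes/sec"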
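When adapting the /proc/net/dev approach into a Nagios check, it helps to read RX and TX in a single pass so both counters come from the same snapshot. The sketch below assumes the usual /proc/net/dev layout (interface name, eight receive columns starting with bytes, then transmit bytes); the function name `nic_bytes` and its optional file argument are hypothetical, the latter added only so it can be tried against sample data.

```sh
#!/bin/sh
# Hypothetical helper: print "RX_BYTES TX_BYTES" for one interface.
# The first ":" is turned into a space so both "eth0:123" (older kernels)
# and "eth0: 123" split the same way; $1 == dev avoids partial matches
# such as "veth0" when asking for "eth0".
# Usage: nic_bytes IFACE [FILE]   (FILE defaults to /proc/net/dev)
nic_bytes() {
    awk -v dev="$1" '{ sub(/:/, " ") } $1 == dev { print $2, $10 }' "${2:-/proc/net/dev}"
}
```

A caller could then do `set -- $(nic_bytes eth0)` and use `$1` as the RX byte count and `$2` as the TX byte count.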

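Note: the second script reports its results in bytes (B), which differs by a factor of 8 from the figures shown by traffic tools such as iptraf, which report in bits (b), e.g. kb/s or Mb/s. To match the bandwidth figures IDC providers usually quote, the first script converts its results to bits.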
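The byte-to-bit conversion is just a multiplication by 8; for example, an average of 125000 bytes/sec is 1 Mb/s. The sketch below converts a byte rate to bits and picks a readable unit. The function name `human_bps` is hypothetical, and using 1000 as the k/M step is an assumption (common for line rates); the check_traffic default thresholds (1048576, 2097152) are 1024-based instead.

```sh
#!/bin/sh
# Hypothetical helper: convert bytes/sec to bits/sec and format it.
# Uses 1000-based k/M steps (an assumption, see the text above).
# Usage: human_bps BYTES_PER_SEC
human_bps() {
    awk -v b="$1" 'BEGIN {
        bits = b * 8                        # 1 byte = 8 bits
        if (bits >= 1000000)   printf "%.2f Mb/s\n", bits / 1000000
        else if (bits >= 1000) printf "%.2f kb/s\n", bits / 1000
        else                   printf "%d b/s\n", bits
    }'
}
```

With this, `human_bps 125000` prints `1.00 Mb/s` and `human_bps 1500` prints `12.00 kb/s`.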

The pnp4nagios template used for the traffic graph is:

<?php
$opt[1] = "--vertical-label bits/s --title \"Traffic for $hostname / $servicedesc\" ";
$colors = array(
       'red' => '#FF0000',
       'green' => '#00FF00',
       'blue' => '#0000FF',
       'yellow' => '#FFFF00',
       'black' => '#000000',
       'deepred' => '#330000',
        );
$def[1] =  "DEF:var1=$rrdfile:$DS[1]:AVERAGE " ;
$def[1] .= "DEF:var2=$rrdfile:$DS[2]:AVERAGE " ;
$def[1] .= "HRULE:$WARN[1]#FFFF00 ";
$def[1] .= "HRULE:$CRIT[1]#FF0000 ";
$def[1] .= "AREA:var1$colors[green]:\"In \" " ;
$def[1] .= "GPRINT:var1:LAST:\"%6.2lf last\" " ;
$def[1] .= "GPRINT:var1:AVERAGE:\"%6.2lf avg\" " ;
$def[1] .= "GPRINT:var1:MAX:\"%6.2lf max\\n\" ";
$def[1] .= "LINE1:var2$colors[blue]:\"Out \" " ;
$def[1] .= "GPRINT:var2:LAST:\"%6.2lf last\" " ;
$def[1] .= "GPRINT:var2:AVERAGE:\"%6.2lf avg\" " ;
$def[1] .= "GPRINT:var2:MAX:\"%6.2lf max\\n\" " ;
/*
$def[1] .= "CDEF:total=var1,var2,+ " ;
$def[1] .= "LINE1:total$colors[black]:\"Total \" " ;
*/
?>

The resulting graph looks like this:

[Figure: check_traffic_pnp4nagios]

I also found a plugin on http://exchange.nagios.org/ that claims to achieve the same result through check_nrpe, although it uses a different template. See its page on Nagios Exchange: http://exchange.nagios.org/directory/Plugins/Network-Connections%2C-Stats-and-Bandwidth/check_iftraffic_nrpe/details . Give it a try if you're interested.