Nagios templates, hosts, services and notifications
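Monitoring serves two purposes. First, current and historical data show how a host has been running and how heavily it has been loaded over a period of time. Second, when something goes wrong, it sends timely notifications so that the responsible people can deal with it. This post is about the second purpose. In Nagios configuration there are three main ways to attach notification recipients to host and service states: directly through contacts or contact_groups; through contact definitions (define contact) that inherit from templates; and through host or service templates that carry the contacts.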
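This post is a follow-up to my earlier post on Nagios grouping, which ended by pointing out how flexible the way Nagios configuration objects reference each other is. Here I illustrate that flexibility using the different ways of referencing notification contacts.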
I. Contact reference method one (via contacts or contact_groups)
First define the contact and its notification method with define contact, then reference it from a host or service like this:
define service{
    use                     windows-service     ; service template defined in templates.cfg
    host_name               jjh
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
    contacts                admin1              ; must already be defined in contacts.cfg
    }
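Note: use refers to a template, i.e. the kind of definition usually kept in templates.cfg, while contacts refers to a contact defined in contacts.cfg. Inline comments must start with ";", because Nagios treats "#" as a comment marker only at the beginning of a line.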
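For method one to work, admin1 has to exist in contacts.cfg. A minimal sketch, assuming admin1 simply inherits from the generic-contact template shown in method three; the alias, the email address and the ops-team group are hypothetical placeholders:

```cfg
; contacts.cfg: hypothetical definition of admin1 (alias and address are placeholders)
define contact{
    contact_name    admin1
    use             generic-contact     ; periods, options and commands come from the template
    alias           Admin One
    email           admin1@example.com
    }
; hypothetical group, so services can use contact_groups instead of listing people
define contactgroup{
    contactgroup_name   ops-team
    alias               Operations team
    members             admin1
    }
```

With such a group in place, a service can use contact_groups ops-team instead of contacts admin1, which is easier to maintain when people join or leave.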
II. Contact reference method two (contacts that inherit from templates)
1. First define the contact
define contact{
    contact_name                    ZheJiang
    use                             generic-contact     ; the contact inherits from a contact template
    alias                           ZheJiang_Mobile
    service_notification_commands   notify-service-by-email,notify-service-by-sms
    email                           [email protected],[email protected]
    pager                           "1366XXXXXXX,13819XXXXXX"
    }
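The contact above calls notify-service-by-sms, which, unlike notify-service-by-email, is not part of the stock commands.cfg, so it has to be defined somewhere. A sketch, assuming a hypothetical gateway script /usr/local/bin/send_sms.sh that accepts the pager string and a message text; the $...$ macros are standard Nagios macros:

```cfg
; commands.cfg: hypothetical SMS command; send_sms.sh is an assumed gateway
; script taking a phone number list and a message, not something Nagios ships
define command{
    command_name    notify-service-by-sms
    command_line    /usr/local/bin/send_sms.sh "$CONTACTPAGER$" "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$"
    }
```

Because the pager field holds two comma-separated numbers, the assumed script would have to split that string itself.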
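2. Reference the contact from the services
define service{
    use                     generic-service     ; use must name a service template
    host_name               ZJ-ZJ-App
    service_description     CPU Load
    low_flap_threshold      0
    high_flap_threshold     0.999
    check_command           check_nrpe!check_load
    contacts                ZheJiang            ; the contact defined above
    }
define service{
    use                     generic-service
    host_name               ZJ-ZJ-App
    service_description     Check_Disk
    check_command           check_nrpe!check_disk
    contacts                ZheJiang
    }
Note: use works much like include in programming: it pulls a previously defined template into the current definition. Nagios only lets an object inherit from a template of the same type, so a service's use has to name a service template (here generic-service from templates.cfg), and the contact is attached with contacts. The template part of this method sits in the contact itself: ZheJiang uses generic-contact from templates.cfg, and templates.cfg typically defines the notification trigger conditions, time periods and so on.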
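To make the include analogy concrete: once Nagios has merged in generic-contact, the ZheJiang contact behaves roughly as if it had been written out in full as below. This is an illustration only, not something you need to write. Directives set locally override the template, and register is never inherited, so ZheJiang is a real, registered contact even though generic-contact has register 0:

```cfg
; Illustration only: roughly how the ZheJiang contact looks after
; generic-contact has been merged in. You do not write this yourself.
define contact{
    contact_name                    ZheJiang
    alias                           ZheJiang_Mobile
    service_notification_period     24x7                    ; inherited from generic-contact
    host_notification_period        24x7                    ; inherited from generic-contact
    service_notification_options    w,u,c,r,f,s             ; inherited from generic-contact
    host_notification_options       d,u,r,f,s               ; inherited from generic-contact
    service_notification_commands   notify-service-by-email,notify-service-by-sms ; local value overrides the template
    host_notification_commands      notify-host-by-email    ; inherited from generic-contact
    pager                           "1366XXXXXXX,13819XXXXXX" ; local value
    ; the local email line is kept unchanged as well
    ; register is not inherited, so this contact is registered (default 1)
    }
```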
III. Contact reference method three (via host and service templates)
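This method is essentially the reverse of method two: define the contacts first, then reference them from the templates in templates.cfg with contacts or contact_groups, and have the host-xxxx.cfg files use those templates. Since contact definitions in contacts.cfg were already covered in method two, I skip them here and only list a few common definitions from templates.cfg: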
# Contact template
define contact{
    name                            generic-contact         ; The name of this contact template
    service_notification_period     24x7                    ; service notifications can be sent anytime
    host_notification_period        24x7                    ; host notifications can be sent anytime
    service_notification_options    w,u,c,r,f,s             ; notification trigger conditions (likewise for hosts below)
    host_notification_options       d,u,r,f,s               ; send notifications for all host states, flapping events, and scheduled downtime events
    service_notification_commands   notify-service-by-email ; send service notifications via email
    host_notification_commands      notify-host-by-email    ; send host notifications via email
    register                        0                       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL CONTACT, JUST A TEMPLATE!
    }
# Host templates
define host{
    name                            generic-host    ; The name of this host template
    notifications_enabled           1               ; Host notifications are enabled
    event_handler_enabled           1               ; Host event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information across program restarts
    retain_nonstatus_information    1               ; Retain non-status information across program restarts
    notification_period             24x7            ; Send host notifications at any time
    register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
    }
define host{
    name                    JJH-server          ; The name of this host template
    use                     generic-host        ; This template inherits other values from the generic-host template
    check_period            24x7                ; By default, Linux hosts are checked round the clock
    check_interval          5                   ; Actively check the host every 5 minutes
    retry_interval          1                   ; Schedule host check retries at 1 minute intervals
    max_check_attempts      10                  ; Check each Linux host 10 times (max)
    check_command           check-host-alive    ; Default command to check Linux hosts
    notification_period     workhours           ; Send host notifications during working hours only
    notification_interval   120                 ; Resend notifications every 2 hours
    notification_options    d,u,r               ; Only send notifications for specific host states
    contact_groups          admins-jjh          ; contact group to notify
    hostgroups              JJH-servers         ; hosts using this template join this host group
    register                0                   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
    }
# Service templates
define service{
    name                            generic-service ; The 'name' of this service template
    active_checks_enabled           1               ; Active service checks are enabled
    passive_checks_enabled          1               ; Passive service checks are enabled/accepted
    parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
    obsess_over_service             1               ; We should obsess over this service (if necessary)
    check_freshness                 0               ; Default is to NOT check service 'freshness'
    notifications_enabled           1               ; Service notifications are enabled
    event_handler_enabled           1               ; Service event handler is enabled
    flap_detection_enabled          1               ; Flap detection is enabled
    failure_prediction_enabled      1               ; Failure prediction is enabled
    process_perf_data               1               ; Process performance data
    retain_status_information       1               ; Retain status information across program restarts
    retain_nonstatus_information    1               ; Retain non-status information across program restarts
    is_volatile                     0               ; The service is not volatile
    check_period                    24x7            ; The service can be checked at any time of the day
    max_check_attempts              3               ; Re-check the service up to 3 times in order to determine its final (hard) state
    normal_check_interval           10              ; Check the service every 10 minutes under normal conditions
    retry_check_interval            2               ; Re-check the service every two minutes until a hard state can be determined
    contact_groups                  admins          ; Notifications get sent out to everyone in the 'admins' group
    notification_options            w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
    notification_interval           60              ; Re-notify about service problems every hour
    notification_period             24x7            ; Notifications can be sent out at any time
    register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }
# Setting notifications_enabled to 0 turns notifications off
define service{
    name                    no-notice-service   ; The name of this service template
    use                     generic-service     ; Inherit default values from the generic-service definition
    max_check_attempts      4                   ; Re-check the service up to 4 times in order to determine its final (hard) state
    normal_check_interval   5                   ; Check the service every 5 minutes under normal conditions
    notifications_enabled   0                   ; Service notifications are disabled
    event_handler_enabled   0                   ; Service event handler is disabled
    retry_check_interval    1                   ; Re-check the service every minute until a hard state can be determined
    register                0                   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    }
# The following service templates specify a notification (contact) group
define service{
    name                    windows-service     ; The name of this service template
    use                     generic-service     ; Inherit default values from the generic-service definition
    max_check_attempts      4                   ; Re-check the service up to 4 times in order to determine its final (hard) state
    normal_check_interval   5                   ; Check the service every 5 minutes under normal conditions
    retry_check_interval    1                   ; Re-check the service every minute until a hard state can be determined
    register                0                   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    contact_groups          admins-win          ; contact group to notify
    }
define service{
    name                    JJH-service         ; The name of this service template
    use                     generic-service     ; Inherit default values from the generic-service definition
    max_check_attempts      4                   ; Re-check the service up to 4 times in order to determine its final (hard) state
    normal_check_interval   5                   ; Check the service every 5 minutes under normal conditions
    retry_check_interval    1                   ; Re-check the service every minute until a hard state can be determined
    register                0                   ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
    contact_groups          admins-jjh          ; contact group to notify
    }
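The templates above point at the contact groups admins-jjh and admins-win, JJH-server places every host that uses it into the JJH-servers host group, and it sends host notifications only during workhours, so those objects must exist as well (admins is already defined in the stock contacts.cfg). A minimal sketch: the names come from the templates, while the aliases and member lists are placeholders reusing admin1 and ZheJiang from the earlier sections. Such objects usually live in contacts.cfg and timeperiods.cfg, though Nagios does not care which file holds them. The stock timeperiods.cfg already ships a workhours period much like this one, so only add it if you do not load that file, otherwise Nagios reports a duplicate:

```cfg
; Objects the templates above refer to; aliases and members are placeholders
define contactgroup{
    contactgroup_name   admins-jjh
    alias               JJH administrators
    members             admin1
    }
define contactgroup{
    contactgroup_name   admins-win
    alias               Windows administrators
    members             admin1,ZheJiang
    }
; hosts that use JJH-server join this group through its hostgroups directive
define hostgroup{
    hostgroup_name      JJH-servers
    alias               JJH servers
    }
; working-hours period used by JJH-server's notification_period,
; modeled on the sample timeperiods.cfg (times are the server's local time)
define timeperiod{
    timeperiod_name     workhours
    alias               Normal Work Hours
    monday              09:00-17:00
    tuesday             09:00-17:00
    wednesday           09:00-17:00
    thursday            09:00-17:00
    friday              09:00-17:00
    }
```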
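Note: as you can see, templates are very flexible: they can set whether to notify, which contact group to notify, how often to notify, the trigger conditions and so on. Templates exist so that xxxhost.cfg files can simply use them and avoid repeating the same directives everywhere. In that sense they resemble variables in programming.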
A host file such as 361way.cfg then references the templates like this:
# Referencing templates
define host{
    use                     JJH-server          ; use the host template
    host_name               jjh-cc
    parents                 aliyun
    statusmap_image         linux40.gd2
    alias                   jjh-cc
    address                 115.29.161.54
    notification_interval   0
    process_perf_data       1
    action_url              /pnp4nagios/graph?host=$HOSTNAME$
    }
define service{
    use                     JJH-service,srv-pnp ; Name of service template to use
    host_name               jjh-cc
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
    }
define service{
    use                     JJH-service,srv-pnp
    host_name               jjh-cc
    service_description     check_cpu
    check_command           check_nrpe!check_cpu
    }
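Two precedence rules are at work in this file. A directive set on the object itself beats the template: jjh-cc's notification_interval 0 (send only one problem notification, no re-notification) overrides the 120 inherited from JJH-server. And when use lists several templates, as in JJH-service,srv-pnp, the template listed first wins if both define the same directive. The no-notice-service template can be used the same way to keep a check running without alerts; a hypothetical example for a disk check on jjh-cc:

```cfg
; hypothetical: watch the disk on jjh-cc but never send alerts for it
define service{
    use                     no-notice-service   ; notifications_enabled 0
    host_name               jjh-cc
    service_description     Check_Disk
    check_command           check_nrpe!check_disk
    }
```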
IV. Summary
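The examples above try to explain the flexible references among contacts.cfg, templates.cfg and XXXhost.cfg in Nagios. I have left out timeperiods.cfg, which defines the time ranges in which notifications are sent, for example working hours versus off hours, or China time versus US time. If reading the configurations or the three methods above only leaves you more confused, the following points may help.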
1. Start from the most naive approach: when you define a check in hostxxx.cfg, you can put notification_options, notification_period, notification_interval, contact_groups and similar directives directly into it. That meets your notification needs just as well.
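2. To simplify that naive approach, you give this set of directives a name, define it as a template in templates.cfg, and then pull it into hostxxx.cfg with use plus the name defined in templates.cfg. The directives now live in the template and can be left out of each definition.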
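3. When there are many contacts and different applications and hosts must notify different people, a separate contacts.cfg file is used to define and group the people to be notified. Whether the contacts use templates or the templates reference the contacts from contacts.cfg, the end result is simply a consolidated set of settings for hostxxx.cfg to use.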
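4. The number and names of the configuration files do not matter; if you like, you can put everything into a single file. Multiple files simply make things easier to tell apart, find and maintain. All that matters is that nagios.cfg loads them (with cfg_file for a single file or cfg_dir for a whole directory), and Nagios will process them all.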
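5. You can think of define as defining a variable, and of use as referencing that variable or including a configuration file. contacts, contact_groups and the like are Nagios directives, which you can think of as built-in functions.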
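Point 4 in practice: a sketch of the relevant nagios.cfg lines. The paths follow a typical source install under /usr/local/nagios, and the servers directory is an assumption; adjust both to your own layout. After editing, running nagios -v against nagios.cfg checks the whole configuration before you restart Nagios.

```cfg
# nagios.cfg (excerpt); paths are examples for a source install
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# every file ending in .cfg under this directory (e.g. 361way.cfg) is loaded too
cfg_dir=/usr/local/nagios/etc/servers
```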
Reference: Nagios online documentation
- Author: shisekong
- Link: https://blog.361way.com/nagios-templates-contacts/2858.html
- License: This work is under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.