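Monitoring multiple Prometheus exporter ports in a single pod
1. Background
A colleague ran into a customer on Huawei Cloud CCE who runs several processes in one pod, each of which needs its own Prometheus exporter. For example, a pod runs nginx, mysql, and php at the same time, and all three need Prometheus monitoring. On an ECS virtual machine this is simple: run several exporter programs and add them to the Prometheus configuration. On k8s it works slightly differently.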
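Approach: there are two ways to implement this.
- Run one more process or sidecar container that aggregates all the exporters, for example exporter-merger.
- Hard-code the extra endpoints in the prometheus annotations of the pod.
Method 2 is relatively complex to implement. Method 1 costs extra resources but is simpler to implement.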
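To make method 1 concrete, here is a minimal Python sketch of what an aggregator has to do with the text scraped from each exporter. It is not how exporter-merger is implemented. The function name `merge_expositions` and the added `exporter` label are my own assumptions for illustration. The label is there so that samples both exporters publish under the same name (for example the Go runtime metrics) do not collide.

```python
import re

# "# HELP name ..." and "# TYPE name ..." comment lines.
META_RE = re.compile(r"^#\s+(HELP|TYPE)\s+(\S+)")
# A sample line: metric name, optional {labels}, then value (and optional timestamp).
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(.+)$")
# Histogram and summary samples carry these suffixes on the family name.
SUFFIXES = ("", "_bucket", "_sum", "_count")


def merge_expositions(sources):
    """Merge several exposition texts into one.

    `sources` maps a (hypothetical) exporter name to the text it returned.
    Every sample gets an extra exporter="<name>" label, and samples of the
    same metric family are grouped under a single HELP/TYPE header.
    """
    families = {}  # family name -> {"meta": {kind: line}, "samples": [lines]}
    for exporter, text in sources.items():
        current = None  # family named by the most recent HELP/TYPE line
        for raw in text.splitlines():
            line = raw.strip()
            if not line:
                continue
            meta = META_RE.match(line)
            if meta:
                kind, current = meta.groups()
                family = families.setdefault(current, {"meta": {}, "samples": []})
                family["meta"].setdefault(kind, line)  # keep the first header seen
                continue
            if line.startswith("#"):
                continue  # any other comment
            sample = SAMPLE_RE.match(line)
            if sample is None:
                continue  # not a sample line; skipped in this sketch
            name, labels, rest = sample.groups()
            in_current = current is not None and name in {current + s for s in SUFFIXES}
            family_name = current if in_current else name
            inner = labels[1:-1] if labels else ""
            tag = 'exporter="%s"' % exporter
            merged = tag + "," + inner if inner else tag
            family = families.setdefault(family_name, {"meta": {}, "samples": []})
            family["samples"].append(f"{name}{{{merged}}} {rest}")

    lines = []
    for family in families.values():
        for kind in ("HELP", "TYPE"):
            if kind in family["meta"]:
                lines.append(family["meta"][kind])
        lines.extend(family["samples"])
    return "\n".join(lines) + "\n"
```

A real aggregator would also fetch each exporter over HTTP and serve the merged text on a single port, which is the only port Prometheus then needs to scrape.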
2. Building the test image
The files involved are:
[root@ecs-82f5]~/make# ls
create_mysql_user.sql  Dockerfile  nginx_status.conf  prometheus-mysqld-exporter  start.sh
The Dockerfile:
[root@ecs-82f5]~/make# more Dockerfile
FROM ubuntu:latest
RUN apt-get update \
 && apt-get -y install nginx prometheus-nginx-exporter mysql-server prometheus-mysqld-exporter
COPY nginx_status.conf /etc/nginx/sites-enabled/nginx_status.conf
COPY prometheus-mysqld-exporter /etc/default/prometheus-mysqld-exporter
COPY create_mysql_user.sql /tmp/create_mysql_user.sql
COPY start.sh /opt/start.sh
EXPOSE 80 9113 9104
ENTRYPOINT ["/bin/bash","/opt/start.sh"]
#ENTRYPOINT ["/bin/bash"]
The startup script:
[root@ecs-82f5]~/make# cat start.sh
#!/bin/bash
# -------------------------------
set -e
service nginx start
/etc/init.d/prometheus-nginx-exporter start
#/usr/bin/prometheus-nginx-exporter &
#mkdir -p /nonexistent
service mysql start
# The next line must run in the background; otherwise start fails after a restart or after stop
mysql
The nginx_status.conf file:
[root@ecs-82f5]~/make# cat nginx_status.conf
server {
    listen 8080;
    server_name localhost;
    location /stub_status {
        stub_status on;
        access_log off;
    }
}
The SQL file that creates the user:
[root@ecs-82f5]~/make# cat create_mysql_user.sql
CREATE USER prometheus@localhost IDENTIFIED BY 'StrongPassword';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO prometheus@localhost;
FLUSH PRIVILEGES;
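The mysql exporter is configured here to connect over the Unix socket. Username and password authentication also works, but then the credentials in the configuration file have to be updated whenever the password changes.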
[root@ecs-82f5]~/make# cat prometheus-mysqld-exporter
ARGS=""
### Database authentication
#
# By default the DATABASE connection string will be read from
# the file specified with the -config.my-cnf parameter. For example:
# ARGS='--config.my-cnf /etc/mysql/debian.cnf'
#
# Note that SSL options can only be set using a cnf file.
# To set a connection string from the environment instead, set the
# DATA_SOURCE_NAME variable.
# To use UNIX domain sockets authentication with or without password:
# DATA_SOURCE_NAME="prometheus:nopassword@unix(/run/mysqld/mysqld.sock)/"
DATA_SOURCE_NAME="prometheus@unix(/run/mysqld/mysqld.sock)/"
# To use a TCP connection and password authentication:
# DATA_SOURCE_NAME="prometheus:password@(hostname:port)/dbname"
After the image is built, run the container. The running container looks like this:
[root@ecs-82f5]~/make# docker ps
CONTAINER ID   IMAGE          COMMAND                  CREATED       STATUS         PORTS                        NAMES
ec003b9ed2df   6c6463dd846d   "/bin/bash /opt/star…"   5 hours ago   Up 3 seconds   80/tcp, 9104/tcp, 9113/tcp   wonderful_diffie
The metrics can then be viewed at the container IP and the corresponding port:
curl http://172.16.0.135:9104/metrics
curl http://172.16.0.135:9113/metrics
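Reading the full curl output by eye gets tedious, so here is a small Python sketch that pulls one sample out of the exposition text. It assumes the exporters publish `mysql_up` and `nginx_up`, as the upstream mysqld and nginx exporters do. The helper names `metric_value` and `fetch` are hypothetical.

```python
import urllib.request


def metric_value(exposition_text, name):
    """Return the value of the first unlabelled sample called `name`, or None."""
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def fetch(url, timeout=5):
    """Download the metrics page of one exporter as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the container from above running, `metric_value(fetch("http://172.16.0.135:9104/metrics"), "mysql_up")` should come back as 1.0 when the exporter can reach MySQL, and likewise for `nginx_up` on port 9113.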
3. Running on CCE and monitoring with Prometheus
Pushing the image to SWR is simple and is skipped here. The Deployment YAML for the application:
kind: Deployment
apiVersion: apps/v1
metadata:
  name: mysqlnginx
  namespace: default
  generation: 4
  labels:
    appgroup: ''
    version: v1
  annotations:
    deployment.kubernetes.io/revision: '4'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysqlnginx
      version: v1
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysqlnginx
        version: v1
      annotations:
        metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"prometheus","path":"","port":"","names":""}]'
        prometheus.io/mypath: /metrics
        prometheus.io/myport: '9104'
        prometheus.io/path: /metrics
        prometheus.io/port: '9113'
        prometheus.io/scrape: 'true'
    spec:
      containers:
        - name: container-0
          image: 'swr.la-north-2.myhuaweicloud.com/app1/nginxmysql:exporter'
          resources:
            limits:
              cpu: '2'
              memory: 2Gi
            requests:
              cpu: 250m
              memory: 512Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          imagePullPolicy: IfNotPresent
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      dnsPolicy: ClusterFirst
      securityContext: {}
      imagePullSecrets:
        - name: default-secret
      affinity: {}
      schedulerName: default-scheduler
      tolerations:
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
      dnsConfig:
        options:
          - name: timeout
            value: ''
          - name: ndots
            value: '5'
          - name: single-request-reopen
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
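Note the prometheus.io annotations. Two ports are declared here (port and myport), and more can be added. If the metrics paths differ, several paths can be declared in the same way (path and mypath).
Next, edit the Prometheus ConfigMap. CCE provides a prometheus add-on; once it is installed, append the following job to the end of the scrape configuration in its ConfigMap: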
- job_name: kubernetes-pods1
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: pod
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app]
    separator: ;
    # Must equal the pod's app label, i.e. the workload name used when creating it in CCE
    # (the Deployment shown above carries app: mysqlnginx)
    regex: nginx-mysql
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_mypath]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_myport]
    separator: ;
    regex: ([^:]+)(?::\d+)?;(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  # Caution: as written, (.+) matches every pod name, so this rule would drop all targets.
  # Compare it with the add-on's stock kubernetes-pods job before copying it.
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.+)
    replacement: $1
    action: drop
  metric_relabel_configs:
  - source_labels: [__name__]
    separator: ;
    regex: kube_node_labels
    replacement: $1
    action: drop
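As you can see, I added the prometheus_io_myport part. The app label rule uses an exact match; without it, many more pods would be matched. A simple way to experiment is to copy the stock kubernetes-pods job, change only the mypath and myport parts, and compare how Targets and Configuration change in the Prometheus UI.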
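The `__address__` rule is the least obvious part, so here is a small Python sketch of how a `replace` relabel rule is evaluated: the source label values are joined with the separator, the regex has to match the whole string, and `$1`, `$2` are filled in from the capture groups. The function `relabel_replace` and the sample pod IP are hypothetical; the regex and replacement are the ones from the job above.

```python
import re

# Regex and replacement copied from the __address__ rule of the job above.
ADDRESS_REGEX = r"([^:]+)(?::\d+)?;(\d+)"
ADDRESS_REPLACEMENT = "$1:$2"


def relabel_replace(source_values, regex, replacement, separator=";"):
    """Mimic a Prometheus `replace` relabel rule on already-resolved label values.

    Returns the new value for the target label, or None when the regex does
    not match (Prometheus then leaves the target label unchanged).
    """
    joined = separator.join(source_values)
    # Prometheus anchors relabel regexes at both ends, hence fullmatch.
    match = re.fullmatch(regex, joined)
    if match is None:
        return None
    # Turn $1, $2, ... into Python's \g<1>, \g<2>, ... before expanding.
    template = re.sub(r"\$(\d+)", lambda m: "\\g<" + m.group(1) + ">", replacement)
    return match.expand(template)
```

For a pod whose discovered address is `10.0.0.5:9113` and whose `prometheus.io/myport` annotation is `9104`, `relabel_replace(["10.0.0.5:9113", "9104"], ADDRESS_REGEX, ADDRESS_REPLACEMENT)` gives `10.0.0.5:9104`. If the annotation is missing, the second value is empty, the regex does not match, and the address stays as it was.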
Useful information can be read from the figure below:
For this part, see also the discussion on GitHub: https://github.com/prometheus/prometheus/issues/3756
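Note that the change did not take effect right after the ConfigMap was edited, which seems odd, because the Huawei prometheus add-on ships with an automatic reload helper by default: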
[root@qqqq-84409 ~]# kubectl exec -it prometheus-0 -n monitoring -- /bin/bash
Defaulted container "prometheus-server-configmap-reload" out of: prometheus-server-configmap-reload, prometheus-server
OCI runtime exec failed: exec failed: container_linux.go:330: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory": unknown
command terminated with exit code 126
prometheus-server runs with a sidecar here, and I deliberately did not specify a container. The output shows two containers: prometheus-server-configmap-reload and prometheus-server.
After the ConfigMap was updated, the logs viewed with the following command showed no configuration reload:
[root@qqqq-84409 ~]# kubectl logs -f prometheus-0 -c prometheus-server -n monitoring
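That will not stump a seasoned ops engineer, though. See my earlier post: 如何重载Prometheus配置 (How to reload the Prometheus configuration).
Either of the following reloads the configuration (the HTTP endpoint only works when Prometheus is started with --web.enable-lifecycle):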
curl -X POST IP:9090/-/reload
kill -HUP prometheus-server-pid
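After reloading, confirm in the Prometheus web UI under Status > Configuration that the new configuration is in effect, then check on the Targets page that both ports are being scraped.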
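The same check can be done without the web UI through the Prometheus HTTP API, which lists targets at `/api/v1/targets`. A minimal Python sketch follows. The helper names are hypothetical, and I am assuming port 9113 is picked up by the add-on's stock kubernetes-pods job while port 9104 comes from the kubernetes-pods1 job added above.

```python
import json
import urllib.request


def fetch_targets(base_url, timeout=5):
    """Fetch the target list from a Prometheus server, e.g. http://IP:9090."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def scrape_urls_for_job(targets_payload, job):
    """Map scrape URL -> health for the active targets of one job."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return {
        target["scrapeUrl"]: target["health"]
        for target in active
        if target.get("labels", {}).get("job") == job
    }
```

Calling `scrape_urls_for_job(fetch_targets("http://IP:9090"), "kubernetes-pods1")` should list a URL ending in `:9104/metrics` with health `up` once the reload has gone through.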
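4. Conclusion
The configuration above works not only for the default case of one container per pod (with several processes inside it) but also for pods with multiple containers (two or more). I tested both cases and both work. The custom scrape configuration is relatively complex and has a higher technical barrier, though; aggregating the exporters through a plugin is simpler to implement.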
- Author: shisekong
- Link: https://blog.361way.com/mulit-port-prometheus-pod/6797.html
- License: This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Kindly fulfill the requirements of the aforementioned License when adapting or creating a derivative of this work.