I. Requirements

HUAWEI CLOUD CCE provides a Prometheus add-on that is easy to install, but it offers no add-on for Grafana. You may also want to install the monitoring stack in a customized way. This article records how to use Helm to install Prometheus + Grafana on a CCE Kubernetes cluster.

II. Preparing the Helm environment

Helm can be installed on any host that can manage the CCE Kubernetes cluster, i.e. any host whose kubectl can reach it. You can also refer to Huawei Cloud's official guide: https://support.huaweicloud.com/intl/en-us/usermanual-cce/cce_10_0144.html .

The installation commands are as follows:

# configure kubeconfig: place the cluster's kubeconfig file at ~/.kube/config
[root@ccetest-87180 .kube]# mv /tmp/kubeconfig.json config
[root@ccetest-87180 .kube]# ll
total 8
-rw------- 1 root root 5759 Aug 29 22:26 config
[root@ccetest-87180 .kube]# kubectl get nodes
NAME            STATUS   ROLES    AGE   VERSION
192.168.0.249   Ready    <none>   16m   v1.19.16-r1-CCE22.5.1

# install helm
wget https://get.helm.sh/helm-v3.3.0-linux-amd64.tar.gz
tar -xzvf helm-v3.3.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm
helm version
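The steps above download the amd64 build. As a minimal sketch, assuming get.helm.sh keeps its `helm-<version>-linux-<arch>.tar.gz` naming, the helper below picks the tarball that matches the host CPU, which matters if the management host is an Arm (aarch64) server. The function name `helm_tarball_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Helm tarball URL for a version and CPU type.
# Assumption: get.helm.sh keeps the helm-<version>-linux-<arch>.tar.gz naming.
helm_tarball_url() {
  local version="$1" machine="${2:-$(uname -m)}" arch
  # Map the kernel's machine name (uname -m) to Helm's architecture suffix.
  case "$machine" in
    x86_64|amd64)  arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  printf 'https://get.helm.sh/helm-%s-linux-%s.tar.gz\n' "$version" "$arch"
}
```

For example, `wget "$(helm_tarball_url v3.3.0)"` fetches the same file as above on an x86_64 host; on an Arm host the archive extracts to `linux-arm64/` instead of `linux-amd64/`.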

III. Installing the Prometheus suite

prometheus-community provides the kube-prometheus-stack chart, which bundles Prometheus, Grafana and Alertmanager, together with the Prometheus Operator, node-exporter and kube-state-metrics (see the service list below). It can be installed easily with Helm.

[root@ccetest-87180 ~]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
[root@ccetest-87180 ~]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈ Happy Helming!⎈
[root@ccetest-87180 ~]# helm install prometheus prometheus-community/kube-prometheus-stack
NAME: prometheus
LAST DEPLOYED: Mon Aug 29 22:30:15 2022
NAMESPACE: default
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=prometheus"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
[root@ccetest-87180 ~]# kubectl get svc
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   1s
kubernetes                                ClusterIP   10.247.0.1       <none>        443/TCP                      21m
prometheus-grafana                        ClusterIP   10.247.193.101   <none>        80/TCP                       26s
prometheus-kube-prometheus-alertmanager   ClusterIP   10.247.105.191   <none>        9093/TCP                     26s
prometheus-kube-prometheus-operator       ClusterIP   10.247.198.72    <none>        443/TCP                      26s
prometheus-kube-prometheus-prometheus     ClusterIP   10.247.28.103    <none>        9090/TCP                     26s
prometheus-kube-state-metrics             ClusterIP   10.247.125.106   <none>        8080/TCP                     26s
prometheus-operated                       ClusterIP   None             <none>        9090/TCP                     1s
prometheus-prometheus-node-exporter       ClusterIP   10.247.104.255   <none>        9100/TCP                     26s

# expose Grafana (container port 3000) through a NodePort service
[root@ccetest-87180 ~]# kubectl expose deployment prometheus-grafana --target-port=3000 --type=NodePort --name=prometheus-grafana-ext
# or forward it to localhost:3000 on the current host
kubectl port-forward deployment/prometheus-grafana 3000
# default Grafana login: username admin, password prom-operator

# customize the chart values, then apply them to the release
[root@ccetest-87180 ~]# vim values.yaml
[root@ccetest-87180 ~]# helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f values.yaml
Release "prometheus" has been upgraded. Happy Helming!
NAME: prometheus
LAST DEPLOYED: Tue Aug 30 01:16:25 2022
NAMESPACE: default
STATUS: deployed
REVISION: 2
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace default get pods -l "release=prometheus"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
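Rather than relying on the documented default, the Grafana password can be read back from the cluster. This sketch assumes the release name `prometheus`, the `default` namespace, and that the Grafana subchart stores the credentials in a Secret named `<release>-grafana` under the `admin-password` key; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Decode a base64 value as stored in a Secret's .data field.
decode_b64() {
  printf '%s' "$1" | base64 -d
}

# Hypothetical helper: print the Grafana admin password of a release.
# Needs kubectl with access to the CCE cluster.
# Assumption: Secret "<release>-grafana" with the key "admin-password".
grafana_admin_password() {
  local release="${1:-prometheus}" ns="${2:-default}" encoded
  encoded="$(kubectl -n "$ns" get secret "${release}-grafana" \
    -o jsonpath='{.data.admin-password}')" || return 1
  decode_b64 "$encoded"
  echo
}
```

Run `grafana_admin_password prometheus default` on the management host. The default value comes from `grafana.adminPassword` in the chart values, so setting that key in values.yaml is a simple way to change it.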

This installs only the Prometheus suite. To install other software, you can likewise use charts from a Helm repository:

# note: the "stable" repository is archived and no longer receives updates
helm repo add stable https://charts.helm.sh/stable
helm search repo software-name

# list installed releases and uninstall a release
helm list
helm uninstall release-name

# list the configured repositories
helm repo list

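After the installation is complete, open Grafana in a browser. The ClusterIP addresses shown by kubectl get svc are only reachable inside the cluster, so use a node IP plus the NodePort that kubectl get svc reports for prometheus-grafana-ext, or http://localhost:3000 when using port-forward.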

(Screenshots: cce-prometheus, cce-prometheus-grafana)
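To find the address of the NodePort service, the following sketch reads the assigned nodePort and the first node's InternalIP with kubectl. It assumes the service name `prometheus-grafana-ext` from the expose command above and that the node's internal address is reachable from your browser (otherwise use the node's EIP); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Format the Grafana URL from a node address and a node port.
grafana_url() {
  local node_ip="$1" node_port="$2"
  printf 'http://%s:%s\n' "$node_ip" "$node_port"
}

# Hypothetical helper: look up the NodePort service and the first node's
# InternalIP with kubectl, then print the URL.
# Assumption: service "prometheus-grafana-ext" in namespace "default".
grafana_nodeport_url() {
  local svc="${1:-prometheus-grafana-ext}" ns="${2:-default}" ip port
  port="$(kubectl -n "$ns" get svc "$svc" \
    -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
  ip="$(kubectl get nodes \
    -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')" || return 1
  grafana_url "$ip" "$port"
}
```

`grafana_nodeport_url` prints a URL of the form `http://<node-ip>:<nodePort>`.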

IV. Using EVS or SFS Turbo for persistent storage

With the default installation above, Prometheus, Grafana and Alertmanager store their data in emptyDir volumes, so the data is lost whenever a pod is recreated (for example, after it is rescheduled or deleted). To persist the data, you can use one of three Huawei Cloud storage services: EVS, OBS or SFS. OBS object storage is slow and therefore unsuitable for this scenario, so it is excluded here.

Storage is configured in the chart values, so first dump the default configuration with helm inspect values (an alias of helm show values in Helm 3), modify the relevant settings, and then install or upgrade the release with Helm.

[root@ccetest-87180 ~]# helm inspect values prometheus-community/kube-prometheus-stack > prometheus.values.yaml
[root@ccetest-87180 ~]# helm upgrade --install prometheus prometheus-community/kube-prometheus-stack --values /root/prometheus.values.yaml
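In the dumped kube-prometheus-stack values, the storage settings ship as empty maps (`storageSpec: {}` for Prometheus, `storage: {}` for Alertmanager). A quick way to locate them, sketched here with a hypothetical helper, is a grep for those placeholders; depending on the chart version it may also match other components such as Thanos Ruler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print line numbers of empty storage placeholders
# (storageSpec: {} / storage: {}) in a dumped values file.
find_storage_keys() {
  grep -nE '^[[:space:]]*(storageSpec|storage):[[:space:]]*\{\}' "$1"
}
```

`find_storage_keys prometheus.values.yaml` prints the matching lines with their line numbers, which you can then jump to in vim.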

1. Storing data on EVS

Before deploying with prometheus.values.yaml, modify the storage definitions of the services. For Prometheus, the storageSpec key under prometheus.prometheusSpec becomes:

## Prometheus StorageSpec for persistent data
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
##
#storageSpec: {}
storageSpec:
  ## Using PersistentVolumeClaim
  ##
  volumeClaimTemplate:
    spec:
      storageClassName: csi-disk
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 80Gi
#    selector: {}

With EVS there is no need to create the PV/PVC in advance: with this configuration, the csi-disk StorageClass provisions the EVS disk and creates the PV/PVC automatically.

Upgrade the release:

[root@ccetest-87180 ~]# helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus.values.yaml
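The Prometheus snippet above covers only one of the three services. As a sketch, assuming the csi-disk StorageClass from that snippet and hypothetical sizes, the helper below writes a separate override file that also gives Alertmanager (`alertmanager.alertmanagerSpec.storage`) and Grafana (`grafana.persistence`, from the Grafana subchart) EVS-backed volumes. The function name and the file name `storage-evs.yaml` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a values override enabling EVS (csi-disk) PVCs
# for Prometheus, Alertmanager and Grafana. Sizes are illustrative only.
write_evs_values() {
  local out="${1:-storage-evs.yaml}"
  cat > "$out" <<'EOF'
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: csi-disk
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 80Gi
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: csi-disk
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
grafana:
  persistence:
    enabled: true
    type: pvc
    storageClassName: csi-disk
    accessModes: ["ReadWriteOnce"]
    size: 10Gi
EOF
}
```

Pass the file after the main values file, for example `helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus.values.yaml -f storage-evs.yaml`; when several `-f` files set the same key, the rightmost one wins.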

2. Storing data on SFS Turbo

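To use SFS Turbo shared storage, create the SFS Turbo file system, the PV and the PVC in advance, and then update the relevant content in the values file. The PVC name must be exactly the name the Prometheus StatefulSet expects (<volumeClaimTemplate name>-<pod name>), so that the StatefulSet binds to the pre-created claim instead of creating a new one.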

[root@ccetest-87180 ~]# cat pv-sfsturbo.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-sfsturbo-example
  annotations:
    pv.kubernetes.io/provisioned-by: everest-csi-provisioner
spec:
  mountOptions:
  - hard
  - timeo=600
  - nolock
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 500Gi
  claimRef:                 # reserve this PV for the PVC that Prometheus expects
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: prometheus-prometheus-kube-prometheus-prometheus-db-prometheus-prometheus-kube-prometheus-prometheus-0
    namespace: default
  csi:
    driver: sfsturbo.csi.everest.io
    fsType: nfs
    volumeAttributes:
      everest.io/share-export-location: 192.168.0.236:/   # SFS Turbo shared path
      storage.kubernetes.io/csiProvisionerIdentity: everest-csi-provisioner
    volumeHandle: 65aac901-875f-40fa-961f-40e50c5a46f8    # SFS Turbo file system ID
  persistentVolumeReclaimPolicy: Retain
  storageClassName: csi-sfsturbo

[root@ccetest-87180 ~]# cat pvc-sfsturbo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: everest-csi-provisioner
  name: prometheus-prometheus-kube-prometheus-prometheus-db-prometheus-prometheus-kube-prometheus-prometheus-0
  namespace: default
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 500Gi
  storageClassName: csi-sfsturbo
  volumeName: pv-sfsturbo-example

[root@ccetest-87180 ~]# kubectl create -f pv-sfsturbo.yaml -f pvc-sfsturbo.yaml
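The long PVC name in these manifests is not arbitrary: a StatefulSet names its claims `<claim template>-<pod name>`, and the Prometheus Operator (assumed default behaviour, no custom claim template name) names the StatefulSet `prometheus-<Prometheus resource>` and the claim template `prometheus-<Prometheus resource>-db`. This sketch derives the claim name from the Prometheus resource name, which matches the `prometheus-kube-prometheus-prometheus` service shown by kubectl get svc; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the PVC name a Prometheus replica will claim.
# Assumptions: StatefulSet = "prometheus-<cr>", claim template = "<sts>-db",
# pods = "<sts>-<ordinal>", PVC = "<claim template>-<pod>".
prometheus_pvc_name() {
  [ -n "$1" ] || { echo "usage: prometheus_pvc_name <prometheus-resource> [ordinal]" >&2; return 1; }
  local cr="$1" ordinal="${2:-0}"
  local sts="prometheus-${cr}"
  printf '%s-db-%s-%s\n' "$sts" "$sts" "$ordinal"
}
```

`prometheus_pvc_name prometheus-kube-prometheus-prometheus` prints the same name used in pv-sfsturbo.yaml and pvc-sfsturbo.yaml; pass a second argument for other replicas.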

The corresponding content in the values file is:

## Prometheus StorageSpec for persistent data
## ref: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/storage.md
##
#storageSpec: {}
storageSpec:
  ## Using PersistentVolumeClaim
  ##
  volumeClaimTemplate:
    spec:
      storageClassName: csi-sfsturbo
      volumeName: pv-sfsturbo-example
      accessModes: ["ReadWriteMany"]   # must match the pre-created PV/PVC
      resources:
        requests:
          storage: 500Gi
#    selector: {}

Install or upgrade the release in the same way:

[root@ccetest-87180 ~]# helm install prometheus prometheus-community/kube-prometheus-stack --values /root/prometheus.values.yaml
# or, if the release already exists
[root@ccetest-87180 ~]# helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -f prometheus.values.yaml

(Screenshot: cloud-volume-sfs-cce)

For details, refer to the official documentation: https://support.huaweicloud.com/intl/zh-cn/usermanual-cce/cce_01_0272.html

The storageClassName value is important: each cloud vendor defines its own StorageClasses, so confirm the available classes and the resulting PVs and PVCs with the following commands:

[root@ccetest-87180 ~]# kubectl get sc
[root@ccetest-87180 ~]# kubectl get pv
[root@ccetest-87180 ~]# kubectl get pvc
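When a PVC stays Pending, a wrong storageClassName or a mismatch between PV and PVC (capacity, access mode, claimRef) is a common cause. As a small sketch, the hypothetical filter below reads `kubectl get pvc --no-headers` output, where STATUS is the second column, and lists every claim that is not Bound.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read "kubectl get pvc --no-headers" lines on stdin and
# print "<name> (<status>)" for every claim whose STATUS is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1 " (" $2 ")" }'
}
```

Use it as `kubectl get pvc --no-headers | unbound_pvcs`; for each reported claim, `kubectl describe pvc <name>` shows the provisioning events that explain why it is not bound.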