In the following set of exercises, StatefulSets are presented in a practical manner. The PostgreSQL [11] RDBMS serves as the example because the database is both widely known and of great utility to any developer. The goal of the exercises is not to build production-grade automation for PostgreSQL but to illustrate StatefulSet concepts.

Designing the StatefulSet

In order to create the PostgreSQL StatefulSet we proceed with the following steps:

  • Identify or build a container image
  • Specify a headless Service
  • Specify a StatefulSet
  • Provision the Service and StatefulSet
  • Conduct simple experiments

Start by finding a container image.

The Container Image

An essential part of the StatefulSet for PostgreSQL is the database server itself. Luckily, there is no need to containerize PostgreSQL as this has already been done. Hence, use the official PostgreSQL Docker image found on Docker Hub [1]. Pause this tutorial for a moment and have a quick look at the description of the container image. It explains how the container image can be parameterized, which is a major challenge when creating the PostgreSQL StatefulSet. In particular, it explains how to set a proper password. Try to find the corresponding setting yourself so that you understand the structure of the Secret described in the next section.

Creating a Secret

Starting PostgreSQL requires an administrator password which will be stored as a Secret.

Create a PostgreSQL Secret containing the password for the admin user:

kubectl create secret generic postgresql-secrets --from-literal=POSTGRES_PASSWORD=tes6Aev8

Verify its creation:

kubectl describe secret postgresql-secrets
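
Note that kubectl describe lists only the keys of a Secret and the size of each value, not the values themselves. As a minimal sketch, assuming a POSIX shell and a base64 utility that accepts --decode, the stored value could be read back as follows. The helper names decode_b64 and show_postgres_password are made up for this illustration.

```shell
# Decode a base64 string read from stdin.
# Secret values are stored base64-encoded in the Kubernetes API.
decode_b64() {
  base64 --decode
}

# Hypothetical helper (requires cluster access, not executed here):
# print the decoded POSTGRES_PASSWORD of the Secret created above.
show_postgres_password() {
  kubectl get secret postgresql-secrets \
    -o jsonpath='{.data.POSTGRES_PASSWORD}' | decode_b64
}
```

Calling show_postgres_password against the cluster should print the literal passed to kubectl create secret. Base64 is an encoding, not encryption, so anyone allowed to read the Secret can recover the password this way.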

Creating a Service

To give the PostgreSQL server a stable network identity, create a headless Service by creating the file 10-service.yaml:

apiVersion: v1
kind: Service
metadata:
  name: postgresql-svc
  labels:
    app: postgresql-a
spec:
  ports:
    - port: 5432
      name: postgresql-port
  clusterIP: None
  selector:
    app: postgresql-a

The attribute clusterIP: None of the Service specification denotes that this is a headless Service.

Create the Service:

kubectl apply -f 10-service.yaml

Inspect the Service:

kubectl describe service postgresql-svc

The output should be similar to:

Name:              postgresql-svc
Namespace:         default
Labels:            app=postgresql-a
Annotations:       Selector:  app=postgresql-a
Type:              ClusterIP
IP:                None
Port:              postgresql-port  5432/TCP
TargetPort:        5432/TCP
Endpoints:         172.17.0.5:5432
Session Affinity:  None
Events:            <none>

Pay attention to the IP attribute.

Normally, a Kubernetes Service has a cluster-internal IP address as seen in the Service example of the ReplicaSet lesson. Requests to the Service IP are then load balanced across the Service endpoints, i.e. the Pods bound to the Service through matching labels.

In contrast to a regular Service, a headless Service does not have a cluster IP address. This is what the clusterIP: None declaration expresses.

Consequently, a headless Service does not perform load balancing. Because this Service defines a selector, cluster-internal DNS entries are created that point directly at the Pods matching it.
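
The DNS name under which such a Service is reachable follows the usual Kubernetes naming pattern. The sketch below assembles that name and, as an optional step, resolves it from inside the cluster. It assumes the default cluster domain cluster.local and the namespace default; the function names and the busybox:1.36 image are illustrative choices, not part of the exercise.

```shell
# Build the cluster-internal DNS name of a Service.
# Usage: service_fqdn <service> [namespace] [cluster-domain]
service_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "${2:-default}" "${3:-cluster.local}"
}

# Hypothetical helper (requires cluster access, not executed here):
# start a throwaway Pod and resolve the Service name from inside the cluster.
lookup_service() {
  kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
    -- nslookup "$(service_fqdn "$1" "$2")"
}

# Print the name of the Service defined in 10-service.yaml.
service_fqdn postgresql-svc
```

For a headless Service with a selector, such a lookup returns the IP addresses of the matching Pods rather than a single cluster IP.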

Creating the StatefulSet

Now, with the preliminaries covered, the actual StatefulSet can be created in 30-stateful-set.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-sfs
spec:
  selector:
    matchLabels:
      app: postgresql-a # has to match .spec.template.metadata.labels
  serviceName: 'postgresql-svc'
  replicas: 1 # by default is 1
  template:
    metadata:
      labels:
        app: postgresql-a # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: postgresql-db
          image: postgres:14.5
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgresql-secrets
                  key: POSTGRES_PASSWORD
          ports:
            - containerPort: 5432
              name: postgresql-port
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: "csi-disk"
        resources:
          requests:
            storage: 1Gi

Have you noticed how the Secret is exposed to the container as an environment variable, as described in the container image description [1]?

Also notice the volumeClaimTemplates section. The term Volume Claim Template indicates that this is not itself a Persistent Volume Claim (PVC) but a template from which PVCs are created. Suppose the StatefulSet specifies multiple replicas, three (3) for instance. In this case three Persistent Volume Claims need to be created. As each PVC is parameterized with the identity of its replica's Pod, the actual Persistent Volume Claims are similar but not identical. The Volume Claim Template describes their commonalities.
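
To make the naming concrete: Kubernetes names each generated PVC after the template, the StatefulSet and the Pod ordinal. The following sketch prints the names to expect for a given replica count; the function name pvc_names is an assumption made for illustration.

```shell
# Print the PVC names a StatefulSet creates from one volume claim template.
# Pattern: <template-name>-<statefulset-name>-<ordinal>
# Usage: pvc_names <template> <statefulset> <replicas>
pvc_names() {
  i=0
  while [ "$i" -lt "$3" ]; do
    printf '%s-%s-%s\n' "$1" "$2" "$i"
    i=$((i + 1))
  done
}

# Names for a hypothetical three-replica variant of the spec above.
pvc_names data postgresql-sfs 3
```

With replicas: 1 as in the spec above, only the first of these claims is created. kubectl get pvc lists the claims that actually exist in the namespace.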

Execute the spec:

kubectl apply -f 30-stateful-set.yaml

List the StatefulSets:

kubectl get statefulsets

Describe the StatefulSet:

kubectl describe statefulset postgresql-sfs

If your StatefulSet doesn’t become ready, investigate its Pod, which can be selected using a label.

List the Pods of the StatefulSet by using its label app=postgresql-a:

kubectl get pods -l app=postgresql-a

Your Pod may have entered the CrashLoopBackOff state because it failed to start several times in a row, so it’s time to do some detective work.

By listing the Pods you have obtained the Pod name postgresql-sfs-0. The name demonstrates the Pod identity: the ordinal 0 appended to the StatefulSet name postgresql-sfs.
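
Together with the headless Service, this identity also yields a stable DNS name per Pod. A minimal sketch of how that name is composed, assuming the namespace default and the default cluster domain cluster.local; pod_fqdn is a hypothetical helper name.

```shell
# Build the stable DNS name of a StatefulSet Pod.
# Pattern: <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
# Usage: pod_fqdn <statefulset> <ordinal> <service> [namespace] [cluster-domain]
pod_fqdn() {
  printf '%s-%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-default}" "${5:-cluster.local}"
}

# Name of the first (and here only) replica behind postgresql-svc.
pod_fqdn postgresql-sfs 0 postgresql-svc
```

Clients inside the cluster can use this name to reach one specific replica, which matters once a database setup distinguishes between individual servers.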

Retrieve the Pod’s logs:

kubectl logs postgresql-sfs-0

You should see output similar to:

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
It contains a lost+found directory, perhaps due to it being a mountpoint.
Using a mountpoint directly as the data directory is not recommended.
Create a subdirectory under the mountpoint.

The PostgreSQL Image description [1] says:

PGDATA This optional variable can be used to define another location – like a subdirectory – for the database files. The default is /var/lib/postgresql/data. If the data volume you’re using is a filesystem mountpoint (like with GCE persistent disks) or remote folder that cannot be chowned to the postgres user (like some NFS mounts), Postgres initdb recommends a subdirectory be created to contain the data.

This means we have to tell PostgreSQL to change its data directory to something like /var/lib/postgresql/data/pgdata by passing the path in the PGDATA environment variable. Update 30-stateful-set.yaml accordingly:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql-sfs
spec:
  selector:
    matchLabels:
      app: postgresql-a # has to match .spec.template.metadata.labels
  serviceName: 'postgresql-svc'
  replicas: 1 # by default is 1
  template:
    metadata:
      labels:
        app: postgresql-a # has to match .spec.selector.matchLabels
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: postgresql-db
          image: postgres:14.5
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgresql-secrets
                  key: POSTGRES_PASSWORD
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
              name: postgresql-port
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ['ReadWriteOnce']
        storageClassName: "csi-disk"
        resources:
          requests:
            storage: 1Gi

First delete the existing StatefulSet:

kubectl delete statefulset postgresql-sfs

There is no problem with the Persistent Volume as it’s empty (apart from the lost+found folder). So with the newly introduced environment variable PGDATA you can apply the spec again:

kubectl apply -f 30-stateful-set.yaml

Then check the StatefulSet:

kubectl get statefulset postgresql-sfs

You should see the StatefulSet report its single replica as ready (1/1 in the READY column).
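
For scripted checks, readiness can be derived from the ready and desired replica counts. The sketch below is one possible approach, not the only one; all_ready and statefulset_ready are hypothetical helper names, and the second function needs access to the cluster.

```shell
# Succeed if a value such as "1/1" (ready/desired) reports all replicas ready.
all_ready() {
  ready=${1%/*}    # part before the slash
  desired=${1#*/}  # part after the slash
  [ "$ready" = "$desired" ] && [ "$desired" -gt 0 ]
}

# Hypothetical helper (requires cluster access, not executed here):
# read ready and desired replica counts from the StatefulSet and compare them.
# An unset readyReplicas field yields an empty string, which counts as not ready.
statefulset_ready() {
  counts=$(kubectl get statefulset "$1" \
    -o jsonpath='{.status.readyReplicas}/{.spec.replicas}')
  all_ready "$counts"
}
```

A call such as statefulset_ready postgresql-sfs could then be used in a retry loop. Alternatively, kubectl rollout status statefulset/postgresql-sfs blocks until the rollout has completed.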

Congratulations! You have deployed your first StatefulSet.

  1. PostgreSQL Docker Image at Docker Hub, https://hub.docker.com/_/postgres
  2. Kubernetes Examples on GitHub, Persistent Volume Provisioning, https://github.com/kubernetes/examples/blob/master/staging/persistent-volume-provisioning/README.md
  3. PostgreSQL Documentation – psql, https://www.postgresql.org/docs/12/app-psql.html
  4. Kelsey Hightower @ Twitter, https://twitter.com/kelseyhightower/status/935252923721793536?lang=en
  5. Cloud Foundry, https://www.cloudfoundry.org/
  6. Open Service Broker API, https://www.openservicebrokerapi.org/
  7. anynines, a9s Data Services, https://www.anynines.com/data-services
  8. Kubernetes Documentation, Concepts, ServiceCatalog, https://kubernetes.io/docs/concepts/extend-kubernetes/service-catalog/
  9. Wikipedia, Principle of Least Privilege, https://en.wikipedia.org/wiki/Principle_of_least_privilege
  10. Kubernetes Documentation, Concepts, Services Networking, Service, https://kubernetes.io/docs/concepts/services-networking/service/#headless-services
  11. PostgreSQL, https://www.postgresql.org/