Running stateful applications with Red Hat OpenShift on Nutanix HCI


This is the fourth and final blog post in the Red Hat OpenShift on Nutanix HCI series. The previous three posts covered deploying an OpenShift cluster on Nutanix and installing the Nutanix CSI Operator on OpenShift to consume the rich set of Nutanix Unified Storage capabilities. This post picks up where the third one left off, so at this point you should ideally have a fully functional OpenShift cluster combined with Nutanix data services. If you haven’t had the chance to read the earlier posts, please review them first: Blog 1, Blog 2, Blog 3.

Overview

Although Kubernetes was originally designed to run stateless workloads, the technology has matured over time and enterprises are increasingly adopting the platform to run their stateful applications. In a survey conducted by the Data on Kubernetes community, 90% of the respondents believe that Kubernetes is ready for stateful workloads, and 70% of them are already running them in production with databases taking the top spot. Having the ability to standardize different workloads on Kubernetes and ensure consistency are seen as the key factors that drive value for businesses.

Nutanix provides an industry-leading HCI platform that is ideal for running cloud-native workloads on Kubernetes at scale. The Nutanix architecture offers better resilience for both Kubernetes platform components and application data. Each HCI node you add not only scales the Kubernetes compute capacity but also brings an additional storage controller, which improves storage performance for your stateful applications.

Nutanix Unified Storage is made available to cloud-native applications through the Nutanix CSI driver. Applications use standard Kubernetes objects such as PersistentVolumeClaims, PersistentVolumes, and StorageClasses to access its capabilities. The CSI driver also enables users to take Persistent Volume snapshots using the API objects VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass. A snapshot represents a point-in-time copy of a volume and can be used to provision a new volume or to restore an existing volume to the state captured in the snapshot. OpenShift Container Platform deploys the snapshot controller and the related API objects as part of the Nutanix CSI Operator, as described in Blog 3.

In this blog, we will deploy a PostgreSQL database and see how data stored on the Nutanix platform can be recovered in the event of a disaster by leveraging the Nutanix CSI Operator.

Prerequisites

  • Install Git

Please clone this GitHub repo before proceeding further. It contains YAML manifest files and scripts that will assist you with the process.

git clone https://github.com/nutanixdev/stateful-app_ocp_nutanix.git && cd stateful-app_ocp_nutanix

In this demo, we will use oc, which is similar to kubectl, to manage the OpenShift cluster. Please note that most of the operations can be performed from the OpenShift web console as well.

We will be using the Helm package manager to deploy PostgreSQL.

Note: The OpenShift Container Platform also provides OperatorHub in the web console interface, from where you can install numerous applications as Operators.

Verifying Nutanix CSI Operator storage

  1. Ensure that the CSI pods are in Running state.
    oc get pods -n ntnx-system
  2. Ensure that the CSI driver secret used while interacting with the storage system is accurate. 
    oc get secret ntnx-secret -n ntnx-system -o jsonpath='{.data.key}' |base64 -d && echo
  3. Create a StorageClass from the provided YAML manifest. We will be using Nutanix Volumes to provide block storage. In case you have already created a StorageClass in the previous blog, you will still have to create this as the rest of this blog assumes there will be a StorageClass named nutanix-volumes.
    allowVolumeExpansion: true
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      annotations:
        storageclass.kubernetes.io/is-default-class: "true"
      name: nutanix-volumes
    parameters:
      csi.storage.k8s.io/provisioner-secret-name: ntnx-secret
      csi.storage.k8s.io/provisioner-secret-namespace: ntnx-system
      csi.storage.k8s.io/node-publish-secret-name: ntnx-secret 
      csi.storage.k8s.io/node-publish-secret-namespace: ntnx-system
      csi.storage.k8s.io/controller-expand-secret-name: ntnx-secret
      csi.storage.k8s.io/controller-expand-secret-namespace: ntnx-system
      csi.storage.k8s.io/fstype: ext4
      #isSegmentedIscsiNetwork: is-segmented-iscsi-network
      flashMode: ENABLED
      storageContainer: SelfServiceContainer
      #chapAuth: ENABLED | DISABLED
      storageType: NutanixVolumes
      #whitelistIPMode: ENABLED/DISABLED  
      #whitelistIPAddr: ip-address
    provisioner: csi.nutanix.com
    reclaimPolicy: Delete

    oc create -f manifests/storageclass.yaml 
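
The pod check in step 1 can also be scripted. The following is a minimal sketch rather than part of the provided repository: `all_pods_running` is a hypothetical helper name, and it assumes the usual `oc get pods --no-headers` column layout (NAME, READY, STATUS, ...). It reads the listing from standard input, so it can be tried without a cluster.

```shell
#!/usr/bin/env bash
# all_pods_running: hypothetical helper, not part of the repository.
# Reads `oc get pods --no-headers` output on stdin and succeeds only if
# at least one pod is listed and every STATUS column reads "Running".
all_pods_running() {
  awk '
    NF >= 3 { total++; if ($3 != "Running") bad++ }
    END     { exit (total > 0 && bad == 0) ? 0 : 1 }
  '
}
```

Against a live cluster, the helper would be fed with: oc get pods -n ntnx-system --no-headers | all_pods_running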

Install PostgreSQL

Let’s go ahead and deploy the Helm chart for PostgreSQL.

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install postgresql-prod bitnami/postgresql

Please refer to this guide if you wish to tweak the values and customize the installation. 
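
As one example of such customization, the release can be pinned to the nutanix-volumes StorageClass explicitly instead of relying on the default class. This is a sketch under assumptions: `install_postgres` is a hypothetical wrapper, and the `primary.persistence.*` value names apply to recent versions of the Bitnami PostgreSQL chart (older chart versions use different keys, so check the chart's values for your version).

```shell
#!/usr/bin/env bash
# install_postgres: hypothetical wrapper around `helm install`.
# Assumes a Bitnami PostgreSQL chart version that exposes the
# primary.persistence.* values; older chart versions use other keys.
# Set HELM=echo to print the command instead of running it.
install_postgres() {
  local release="$1"
  local storage_class="${2:-nutanix-volumes}"
  local size="${3:-8Gi}"
  "${HELM:-helm}" install "$release" bitnami/postgresql \
    --set "primary.persistence.storageClass=${storage_class}" \
    --set "primary.persistence.size=${size}"
}
```

Running it as HELM=echo install_postgres postgresql-prod prints the resulting helm arguments without touching the cluster, which is a convenient way to review them first.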

Wait until the database and the corresponding pods are running. Also, verify that the PV is created and the PVC is bound to the PV.

oc get statefulset
NAME              READY   AGE
postgresql-prod   1/1     94s
oc get pods
NAME                READY   STATUS    RESTARTS   AGE
postgresql-prod-0   1/1     Running   0          63s
oc get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-postgresql-prod-0   Bound    pvc-3b93dbf4-05fe-4e57-bac5-f9c696863f6a   8Gi        RWO            nutanix-volumes   116s
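
Rather than re-running oc get pvc by hand, the wait can be scripted. The sketch below polls the claim's status.phase field until it reads Bound; `wait_for_pvc_bound` and its retry and delay defaults are illustrative choices, not something the repository provides.

```shell
#!/usr/bin/env bash
# wait_for_pvc_bound: hypothetical helper that polls a PVC until its
# status.phase is "Bound" or the retry budget is exhausted.
wait_for_pvc_bound() {
  local pvc="$1"
  local retries="${2:-30}"
  local delay="${3:-5}"
  local phase="" i
  for ((i = 0; i < retries; i++)); do
    # Tolerate a failing lookup (for example, the PVC does not exist yet).
    phase=$(oc get pvc "$pvc" -o jsonpath='{.status.phase}' 2>/dev/null || true)
    if [ "$phase" = "Bound" ]; then
      return 0
    fi
    sleep "$delay"
  done
  echo "PVC ${pvc} is not Bound (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For the release above, the call would be: wait_for_pvc_bound data-postgresql-prod-0
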
Note: This creates a default user "postgres" for the database.

To get the password for the user, please run this.
$ oc get secret --namespace default postgresql-prod -o jsonpath="{.data.postgres-password}" | base64 --decode && echo

To login to the database, please run this.
$ oc exec -it postgresql-prod-0 -- psql -U postgres -d postgres -p 5432

Workflow

We will be populating the PostgreSQL database with some sample data. The database consists of multiple related tables. A table consists of rows and columns which store structured data.

We will be creating two tables, one for Nutanix and the other for Red Hat, and inserting values into both these tables. To automate this process, there is a script provided that inserts the data at different times.  
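
To give a rough idea of what such a data generator could execute, here is a hypothetical sketch. It is not the repository's data.sh: the helper name `sample_table_sql` is made up, the column names follow the query output shown later in this post, and the column types are assumptions.

```shell
#!/usr/bin/env bash
# sample_table_sql: hypothetical helper, not the repository's data.sh.
# Prints SQL that creates a table and inserts one row. Column names
# match the query output shown in this post; column types are assumed.
sample_table_sql() {
  local table="$1" id="$2" name="$3" location="$4" address="$5" created_on="$6"
  cat <<SQL
CREATE TABLE IF NOT EXISTS ${table} (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  location VARCHAR(50),
  address VARCHAR(100),
  created_on TIMESTAMP NOT NULL
);
INSERT INTO ${table} (id, name, location, address, created_on)
VALUES (${id}, '${name}', '${location}', '${address}', '${created_on}');
SQL
}
```

The generated SQL could then be piped into psql inside the pod, for example with oc exec -i postgresql-prod-0 -- psql followed by the connection string used in the commands later in this post.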

Also, there is a second script provided that takes volume snapshots of the database at regular intervals. Similar to the StorageClass, the VolumeSnapshotClass object describes the classes of storage when provisioning a volume snapshot.

Note the driver csi.nutanix.com used in the YAML manifest volumesnapshotclass.yaml.

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: nutanix-volume-snapshot-class
driver: csi.nutanix.com
parameters:
  storageType: NutanixVolumes
  csi.storage.k8s.io/snapshotter-secret-name: ntnx-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: ntnx-system
deletionPolicy: Delete

The VolumeSnapshot object is similar to a PVC – it denotes the request for a volume snapshot from a user. Here we dynamically provision a snapshot by specifying a PVC as the data source.

apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: postgresql-snapshot-0
spec:
  volumeSnapshotClassName: nutanix-volume-snapshot-class
  source:
    persistentVolumeClaimName: data-postgresql-prod-0

The second script will take a snapshot of the database volume every minute for a total of five times. During this time, the first script would have modified the database contents.
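
The loop described above can be sketched as follows. This is an illustration rather than the repository's snapshot.sh: the function names are hypothetical, and the manifest simply mirrors the VolumeSnapshot shown earlier with the index substituted into the name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's snapshot.sh.

# Print a VolumeSnapshot manifest named postgresql-snapshot-<index>.
render_snapshot() {
  local index="$1"
  local pvc="${2:-data-postgresql-prod-0}"
  cat <<YAML
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: postgresql-snapshot-${index}
spec:
  volumeSnapshotClassName: nutanix-volume-snapshot-class
  source:
    persistentVolumeClaimName: ${pvc}
YAML
}

# Create <count> snapshots, pausing <interval> seconds between them.
take_snapshots() {
  local count="${1:-5}"
  local interval="${2:-60}"
  local i
  for ((i = 0; i < count; i++)); do
    render_snapshot "$i" | oc create -f -
    if ((i < count - 1)); then
      sleep "$interval"
    fi
  done
}
```

Calling take_snapshots 5 60 would request five snapshots one minute apart, named postgresql-snapshot-0 through postgresql-snapshot-4.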

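Minute   Data Insertion   Snapshot
0                         postgresql-snapshot-0
1                         postgresql-snapshot-1
2                         postgresql-snapshot-2
3                         postgresql-snapshot-3
4                         postgresql-snapshot-4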

The table above lists the data objects created once both scripts have run successfully. Follow the steps below to ingest the data, then check the results in the Data verification section.

Let us ensure that there are no tables in the database currently.

export POSTGRES_PASSWORD=$(oc get secret --namespace default postgresql-prod -o jsonpath="{.data.postgres-password}" | base64 --decode)

oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "\d ;"
Did not find any relations.
  1. Create the VolumeSnapshotClass.
    oc create -f manifests/volumesnapshotclass.yaml
  2. Run the data generator script.
    nohup /bin/bash scripts/data.sh &>/dev/null &
  3. Run the snapshot generator script.
    nohup /bin/bash scripts/snapshot.sh &>/dev/null &
    

Verify that both scripts are running in the background.

$ jobs
[1]-  Running                 nohup /bin/bash scripts/data.sh &> /dev/null &
[2]+  Running                 nohup /bin/bash scripts/snapshot.sh &> /dev/null &

Data verification 

You can see that the nutanix table is created right after the script is executed, and that a row with some values has been inserted into it.

oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "\d ;"
               List of relations
 Schema |      Name      |   Type   |  Owner   
--------+----------------+----------+----------
 public | nutanix        | table    | postgres
 public | nutanix_id_seq | sequence | postgres
(2 rows)
oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "SELECT * FROM nutanix ;"
 id |  name  | location | address  |     created_on      
----+--------+----------+----------+---------------------
  1 | Karbon | USA      | San Jose | 2009-09-01 00:00:00
(1 row)

This is probably a good time to take a coffee break and return to your desk after a little more than five minutes, once the scripts have finished running.

If you check the data that exists after a while, you should see that the redhat table has also been added.

oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "\d ;"
               List of relations
 Schema |      Name      |   Type   |  Owner   
--------+----------------+----------+----------
 public | nutanix        | table    | postgres
 public | nutanix_id_seq | sequence | postgres
 public | redhat         | table    | postgres
 public | redhat_id_seq  | sequence | postgres
(4 rows)
oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "SELECT * FROM redhat ;"
 id |   name    | location |    address     |     created_on      
----+-----------+----------+----------------+---------------------
  2 | OpenShift | USA      | North Carolina | 1993-03-26 00:00:00
(1 row)

Finally, verify that all five volume snapshots have been created. Note that consecutive snapshots are one minute apart.

oc get volumesnapshot
NAME                    READYTOUSE   SOURCEPVC                SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
postgresql-snapshot-0   true         data-postgresql-prod-0                           8Gi           nutanix-volume-snapshot-class   snapcontent-5e8b96c9-f391-4678-8020-f63fdf299309   8m27s          8m28s
postgresql-snapshot-1   true         data-postgresql-prod-0                           8Gi           nutanix-volume-snapshot-class   snapcontent-ffab2731-d502-45c7-8f60-73a185c6125b   7m27s          7m28s
postgresql-snapshot-2   true         data-postgresql-prod-0                           8Gi           nutanix-volume-snapshot-class   snapcontent-466db932-c631-4f7f-8c44-65d8e70bb16e   6m28s          6m28s
postgresql-snapshot-3   true         data-postgresql-prod-0                           8Gi           nutanix-volume-snapshot-class   snapcontent-e1dab8a1-68ae-4904-8ae8-fe914d2525cf   5m28s          5m28s
postgresql-snapshot-4   true         data-postgresql-prod-0                           8Gi           nutanix-volume-snapshot-class   snapcontent-12e32469-d684-4a48-bc3c-5520d08b9296   4m27s          4m28s
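
The READYTOUSE column can also be checked from a script. The sketch below counts the snapshots whose status.readyToUse field is true; `count_ready_snapshots` is a hypothetical helper name, and it looks only at the current project.

```shell
#!/usr/bin/env bash
# count_ready_snapshots: hypothetical helper that prints how many
# VolumeSnapshots in the current project report readyToUse=true.
count_ready_snapshots() {
  oc get volumesnapshot \
    -o jsonpath='{range .items[*]}{.status.readyToUse}{"\n"}{end}' |
    grep -c '^true$'
}
```

A script could then gate the next step on a test such as [ "$(count_ready_snapshots)" -eq 5 ].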

Simulating application failure

We can simulate a PostgreSQL database failure by uninstalling the Helm release, which removes the StatefulSet, and then deleting the associated PVC.

helm uninstall postgresql-prod

oc delete pvc data-postgresql-prod-0
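
To confirm that the objects are really gone before restoring, a small polling loop can help. This is a sketch with a hypothetical helper name, `wait_until_deleted`; note that it treats any failure of oc get (including a connection or permission error) as "not found", so it is only a rough check.

```shell
#!/usr/bin/env bash
# wait_until_deleted: hypothetical helper that polls until `oc get`
# can no longer find the given resource. Any `oc get` failure is
# treated as "not found", so this is only a rough check.
wait_until_deleted() {
  local kind="$1" name="$2"
  local retries="${3:-30}" delay="${4:-2}"
  local i
  for ((i = 0; i < retries; i++)); do
    if ! oc get "$kind" "$name" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

For this walkthrough, the calls would be wait_until_deleted statefulset postgresql-prod and wait_until_deleted pvc data-postgresql-prod-0.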

Data restoration

Once you have verified that the database has been removed, restore the data from a snapshot first and then deploy the database again.

Let’s assume we wished to restore the production database from the latest snapshot available. We would create a PVC pointing to the fifth snapshot as the data source (postgresql-snapshot-4).

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
 name: data-postgresql-prod-0
spec:
 storageClassName: nutanix-volumes
 dataSource:
   name: postgresql-snapshot-4
   kind: VolumeSnapshot
   apiGroup: snapshot.storage.k8s.io
 accessModes:
   - ReadWriteOnce
 resources:
   requests:
     storage: 8Gi
oc create -f manifests/pvc-prod.yaml

Once we verify the PVC has been bound to the PV, we can go ahead and deploy the database again.

helm install postgresql-prod bitnami/postgresql

Wait for a couple of minutes for the database to be in a Running state and then verify the data. We see the redhat table and the correct values in the columns as expected.

oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "\d ;"
               List of relations
 Schema |      Name      |   Type   |  Owner   
--------+----------------+----------+----------
 public | nutanix        | table    | postgres
 public | nutanix_id_seq | sequence | postgres
 public | redhat         | table    | postgres
 public | redhat_id_seq  | sequence | postgres
(4 rows)
oc exec -it postgresql-prod-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "SELECT * FROM redhat ;"
 id |   name    | location |    address     |     created_on      
----+-----------+----------+----------------+---------------------
  2 | OpenShift | USA      | North Carolina | 1993-03-26 00:00:00
(1 row)

Now suppose the dev team wants to roll back to an earlier snapshot to review some changes. We can create another PVC, this time pointing to the first snapshot (postgresql-snapshot-0) as the data source.

The YAML manifest is provided as pvc-dev.yaml.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
 name: data-postgresql-dev-0
spec:
 storageClassName: nutanix-volumes
 dataSource:
   name: postgresql-snapshot-0
   kind: VolumeSnapshot
   apiGroup: snapshot.storage.k8s.io
 accessModes:
   - ReadWriteOnce
 resources:
   requests:
     storage: 8Gi
oc create -f manifests/pvc-dev.yaml
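
The prod and dev manifests differ only in the PVC name and the snapshot they point to, so the pattern can be parameterized. The helper below is a hypothetical sketch (`render_restore_pvc` is not part of the repository) that prints a restore PVC for any snapshot; the 8Gi default matches the restore size reported for the snapshots in this post.

```shell
#!/usr/bin/env bash
# render_restore_pvc: hypothetical helper that prints a PVC manifest
# restoring from a named VolumeSnapshot.
render_restore_pvc() {
  local pvc="$1" snapshot="$2" size="${3:-8Gi}"
  cat <<YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${pvc}
spec:
  storageClassName: nutanix-volumes
  dataSource:
    name: ${snapshot}
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
YAML
}
```

To restore from the third snapshot instead, for instance, the output could be piped straight to the cluster: render_restore_pvc data-postgresql-dev-0 postgresql-snapshot-2 | oc create -f -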

Deploy another instance of the database as postgresql-dev.

helm install postgresql-dev bitnami/postgresql

Once the pods are running, we can verify that the nutanix table exists but the redhat table does not!

oc exec -it postgresql-dev-0 -- psql "postgresql://postgres:$POSTGRES_PASSWORD@127.0.0.1/postgres" postgres -c "\d ;"
               List of relations
 Schema |      Name      |   Type   |  Owner   
--------+----------------+----------+----------
 public | nutanix        | table    | postgres
 public | nutanix_id_seq | sequence | postgres
(2 rows)
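
The same comparison can be made non-interactively, which is handy in scripts where no terminal is attached (hence no -it on oc exec). The helpers below are hypothetical and assume POSTGRES_PASSWORD is still exported as shown earlier; they query the pg_tables catalog view with psql's unaligned, tuples-only output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for non-interactive checks. They assume
# POSTGRES_PASSWORD is exported as shown earlier in this post.

# Print the names of all tables in the public schema, one per line.
list_public_tables() {
  local pod="$1"
  oc exec "$pod" -- psql \
    "postgresql://postgres:${POSTGRES_PASSWORD}@127.0.0.1/postgres" \
    -At -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public' ORDER BY tablename;"
}

# Succeed if the named table exists in the given pod's database.
table_exists() {
  local pod="$1" table="$2"
  list_public_tables "$pod" | grep -Fqx "$table"
}
```

With these in place, table_exists postgresql-dev-0 redhat should fail for the dev instance restored from the first snapshot, while table_exists postgresql-prod-0 redhat should succeed.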

Summary

The Nutanix cloud platform has consistently been recognized as a leader in unified storage solutions. We have now seen that it can also run containerized stateful workloads in production. Key enterprise use cases such as disaster recovery are addressed through snapshots and data restoration. The Nutanix CSI Operator also delivers other key features, such as volume expansion and volume cloning, which have not been explored in this blog.

Nutanix and Red Hat work together to delight customers with a full-stack platform that can build and scale containerized and virtualized applications in a hybrid multi-cloud environment.

Note: If you wish to delete the PostgreSQL database and the associated objects and restore the OpenShift cluster back to the default state, then run the reset.sh script that’s provided.

$ nohup /bin/bash reset.sh &>/dev/null &