Red Hat OpenShift IPI on Nutanix Cloud Platform


Nutanix and Red Hat continue to offer the certified, streamlined solutions that our shared customers are looking for as they navigate the hybrid multicloud landscape. While it is certainly possible to install a platform-agnostic Red Hat OpenShift cluster on Nutanix, this method often relies on the administrator to deploy all of the required machines, the operating system image, the load balancer, DNS entries and so on. Nutanix NCM Self-Service and Red Hat Ansible Automation Platform can be leveraged for end-to-end automation of these workflows, but our customers require a natively integrated solution between the two platforms.

We are happy to announce that with the release of Red Hat OpenShift 4.11, the platform’s full-stack automated installation process, known as installer-provisioned infrastructure (IPI), is now available for the Nutanix Cloud Platform.

With the IPI method, the OpenShift installer integrates with the Nutanix Prism APIs to create the AHV virtual machines, install the boot image and bootstrap the entire cluster. There is no requirement to create and configure an external load balancer, as load balancing is integrated into the cluster during installation. Furthermore, scaling the cluster up and down to accommodate changing workloads can be done without user intervention. This is made possible by Nutanix’s full Machine API support, based on the upstream Cluster API project and custom OpenShift resources.

Let’s dig deeper into the IPI installation deployment workflow with a step-by-step process.

Prerequisites

  • Red Hat OpenShift Container Platform version 4.11 has been tested for specific compatibility with the following software versions:
    • Prism Central version pc.2022.4
    • AOS version 5.20.4 (LTS) and 6.1.1 (STS)
  • A Nutanix cluster with a minimum of 800 GB of storage.  
  • A Nutanix user account assigned a role that can perform CRUD operations on VMs, categories and images.
  • A valid SSL certificate issued by a trusted CA for Prism Central. For information on using self-signed certificates, please refer to the OpenShift documentation.
  • If a firewall exists in the environment, ensure that port 9440 on the Prism Central IP address is reachable.
  • You must use AHV IP Address Management (IPAM) for the machine network and ensure that it is configured to provide persistent IP addresses to the cluster machines.
  • Two static IP addresses must be reserved for the cluster API VIP and ingress VIP. If you are using Nutanix’s IPAM feature, you can reserve them from a subnet by running the following command from any Controller VM in the Prism Element cluster.
acli net.add_to_ip_blacklist <network_name> ip_list=ip_address1,ip_address2
  • You must create DNS records for the two static IP addresses in the appropriate DNS server. These must be of the form
    • api.<cluster_name>.<base_domain>. for the API VIP
    • *.apps.<cluster_name>.<base_domain>. for the ingress VIP


In our demo, we will be installing a cluster with cluster name ocp-demo and base domain mypcfqdn.uk. Let’s verify that the API and ingress VIP wildcard DNS entries are valid.

[root@openshift_provisoner ~]# dig +short api.ocp-demo.mypcfqdn.uk
10.55.68.150

[root@openshift_provisoner ~]# dig +short test.apps.ocp-demo.mypcfqdn.uk
10.55.68.151

Download the installation program

Log in to the Red Hat Hybrid Cloud Console and navigate to the Nutanix AOS page to get started. Download and extract the OpenShift installer program and the oc tools required to manage the cluster. Make sure you download the pull secret as well. Alternatively, you can obtain the installation program from here.
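
If you prefer to work entirely from the command line, the installer and client tarballs can also be pulled from the public OpenShift mirror (mirror.openshift.com). The URLs below point at the Linux x86_64 stable-4.11 builds and are shown only as an example; adjust them for the release you need.

$ curl -LO https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable-4.11/openshift-install-linux.tar.gz
$ curl -LO https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/stable-4.11/openshift-client-linux.tar.gz
$ tar -xzf openshift-install-linux.tar.gz openshift-install
$ tar -xzf openshift-client-linux.tar.gz oc kubectl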

$ ls
kubectl  oc  openshift-install  pull_secret.json

Cloud Credential Operator utility

The Cloud Credential Operator (CCO) is a controller that lets OpenShift request credentials for a particular cloud provider. For now, Nutanix only supports running the CCO in manual mode, where a user manages cloud credentials instead of the operator. You will have to extract the CredentialsRequest custom resources from the release image and create Kubernetes Secrets from them using the Prism Central credentials. We will be using the ccoctl tool for this.

Let’s extract the ccoctl binary from the release image.

  • Obtain the OpenShift release image.
$ RELEASE_IMAGE=$(./openshift-install version | awk '/release image/ {print $3}')
  • Get the CCO container image from the release image.
$ CCO_IMAGE=$(oc adm release info --image-for='cloud-credential-operator' $RELEASE_IMAGE)
  • Extract the ccoctl binary from the CCO container image and make it executable. 
$ oc image extract $CCO_IMAGE --file="/usr/bin/ccoctl" -a pull_secret.json
$ chmod u+x ccoctl && cp ccoctl /usr/local/bin/
$ ls
ccoctl  kubectl  oc  openshift-install  pull_secret.json
Note: ccoctl is a Linux binary and must be run in a Linux environment.

Let’s create a YAML file that holds the Prism credentials in a directory “creds”. Below is a sample of the credentials format.

$ cat creds/pc_credentials.yaml 
credentials:
- type: basic_auth
  data:
    prismCentral:
      username: <username_for_prism_central>
      password: <password_for_prism_central>

Extract the CredentialsRequest objects for Nutanix Cloud Platform from the release image and store them in a directory called “credreqs”.

$ oc adm release extract --credentials-requests --cloud=nutanix --to=credreqs -a pull_secret.json $RELEASE_IMAGE

Finally, use the ccoctl tool to process the CredentialsRequest objects and generate the secret manifests, which will be required later.

$ ccoctl nutanix create-shared-secrets --credentials-requests-dir=credreqs --output-dir=. --credentials-source-filepath=creds/pc_credentials.yaml
Output:
2022/08/02 04:01:58 Saved credentials configuration to: manifests/openshift-machine-api-nutanix-credentials-credentials.yaml

Verify that the file has been created; the expected output should be as seen below.

$ cat manifests/openshift-machine-api-nutanix-credentials-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: nutanix-credentials
  namespace: openshift-machine-api
type: Opaque
data:
  credentials: ******************************************************************************************************************************************************
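
Since the data field is just base64-encoded YAML, you can optionally decode it and confirm that it matches the credentials you supplied. This is a quick sketch, assuming the manifest lives under the manifests directory shown above:

$ awk '/credentials:/ {print $2}' manifests/openshift-machine-api-nutanix-credentials-credentials.yaml | base64 -d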

Creating the installation configuration file

You will use three sets of files during the installation – an installation configuration file named install-config.yaml, Kubernetes manifests and Ignition config files for your machine types. The install-config.yaml file contains Nutanix platform-specific details, which are transformed into Kubernetes manifests. These manifests are then wrapped into Ignition config files that the installation program uses to create the cluster.

Creating the install-config.yaml file is an interactive process. Run openshift-install create install-config and respond to the prompts:
  • Select nutanix as the platform
  • Provide Prism Central details such as the endpoint, the Prism Element cluster and the network subnet to use
  • Provide OpenShift cluster details such as cluster name, base domain and the VIP for API and Ingress
$ openshift-install create install-config

? SSH Public Key /root/.ssh/nk_id_rsa.pub
? Platform nutanix
? Prism Central ntnxdemo.mypcfqdn.uk
? Port 9440
? Username demo-admin
? Password [? for help] ************
INFO Connecting to Prism Central ntnxdemo.mypcfqdn.uk 
? Prism Element DM3-POC068
? Subnet Secondary-demo
? Virtual IP Address for API 10.55.68.150
? Virtual IP Address for Ingress 10.55.68.151
? Base Domain mypcfqdn.uk
? Cluster Name ocp-demo
? Pull Secret [? for help] ******************************************************************************************************************************************************
***************************************************************************************************
INFO Install-Config created in: .   

Once the install-config.yaml file is created, make sure to back it up safely as this file will be consumed and deleted during the installation process. You can also reuse the same file to build multiple clusters by modifying the necessary parameters such as the cluster name, VIPs and so on.
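
For example, a simple copy keeps the file safe outside the installation directory (the destination path is just an illustration):

$ cp install-config.yaml /root/backup/ocp-demo-install-config.yaml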

Below is a sample install-config.yaml file that was generated from the process. Before moving forward, make sure to open it up and confirm that the information is accurate. We’ll continue using the defaults.

$ cat install-config.yaml 
apiVersion: v1
baseDomain: mypcfqdn.uk
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
credentialsMode: Manual
metadata:
  creationTimestamp: null
  name: ocp-demo
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  nutanix:
    apiVIP: 10.55.68.150
    ingressVIP: 10.55.68.151
    prismCentral:
      endpoint:
        address: ntnxdemo.mypcfqdn.uk
        port: 9440
      password: ********
      username: demo-admin
    prismElements:
    - endpoint:
        address: 10.55.68.37
        port: 9440
      uuid: 0005e4c8-1f34-9dc5-0000-000000014039
    subnetUUIDs:
    - 8346214e-584c-4689-b525-c6019bbc4856
publish: External
pullSecret: '{"auths": …}'
sshKey: '********'

To learn more about all the parameters available and to customize the file as you wish, please refer to the OpenShift documentation.
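
As an illustration, the worker machine pool can be sized explicitly instead of relying on the installer defaults. The snippet below is a sketch based on the Nutanix machine-pool parameters described in that documentation (cpus, coresPerSocket, memoryMiB and osDisk.diskSizeGiB); verify the exact fields for your release before using it.

compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 3
  platform:
    nutanix:
      cpus: 4
      coresPerSocket: 2
      memoryMiB: 16384
      osDisk:
        diskSizeGiB: 120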

Generate the installation manifests from the install-config.yaml file.

$ openshift-install create manifests
INFO Consuming Install Config from target directory 
INFO Manifests created in: manifests and openshift 
Note: Ensure that the openshift-machine-api-nutanix-credentials-credentials.yaml file we generated earlier exists in the manifests directory.
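
If you ran ccoctl in a different directory, copy its generated secret into the installer’s manifests directory before continuing (the source path below is only a placeholder):

$ cp /path/to/ccoctl-output/manifests/openshift-machine-api-nutanix-credentials-credentials.yaml manifests/
$ ls manifests/ | grep nutanix-credentials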

We are ready to create the cluster now.

Deploying the cluster

Initialize the cluster deployment by running openshift-install create cluster. This creates the bootstrap, master and worker Ignition config files, and the installer then builds the cluster from them. The installer fetches and creates the boot image, which is used to power on the bootstrap and control plane nodes. When the control plane is ready, it destroys the bootstrap node and creates the worker nodes specified in the configuration.

$ openshift-install create cluster

INFO Consuming Common Manifests from target directory 
INFO Consuming Worker Machines from target directory 
INFO Consuming Openshift Manifests from target directory 
INFO Consuming Master Machines from target directory 
INFO Consuming OpenShift Install (Manifests) from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s (until 4:43AM) for the Kubernetes API at https://api.ocp-demo.mypcfqdn.uk:6443... 
INFO API v1.24.0+9546431 up                       
INFO Waiting up to 30m0s (until 4:55AM) for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 40m0s (until 5:15AM) for the cluster at https://api.ocp-demo.mypcfqdn.uk:6443 to initialize... 
INFO Waiting up to 10m0s (until 4:56AM) for the openshift-console route to be created... 
INFO Install complete!                            
INFO To access the cluster as the system:admin user when using 'oc', run 
INFO     export KUBECONFIG=/root/IPI/auth/kubeconfig 
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp-demo.mypcfqdn.uk 
INFO Login to the console with user: "kubeadmin", and password: "xxxx-xxxx-xxxx" 
INFO Time elapsed: 25m29s   

When the cluster deployment completes successfully, the terminal displays the login credentials along with the link to access the OpenShift web console.
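
Before heading to the web console, you can point oc at the new cluster and verify that the nodes and cluster operators are healthy (the kubeconfig path matches the installer output above):

$ export KUBECONFIG=/root/IPI/auth/kubeconfig
$ oc get nodes
$ oc get clusteroperators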

If you switch back to Prism Central, you will see that the machines have been deployed across the AHV nodes in the cluster. Prism gives you insights not only into the VM configuration but also detailed statistics on networking, storage, efficiency and so on.

 Here is a screenshot captured from Prism Central after the deployment.

Let’s grab the OpenShift link and log in to the cluster’s administrator console. You should see that the provider is Nutanix, as we expect.

Nutanix CSI Operator

After the installation, let’s configure storage for the cluster. Applications on OpenShift can consume Nutanix storage via the Nutanix CSI driver, which is packaged as a certified Red Hat OpenShift Operator. The Nutanix CSI Operator provides scalable, persistent storage by leveraging Nutanix Volumes for block storage and Nutanix Files for file storage.

Head over to the Operators tab in the console and search for Nutanix. Install the certified Nutanix Operator and follow the instructions to create StorageClasses and PVCs. 
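
Once a StorageClass is in place, requesting storage is standard Kubernetes. Below is a minimal PersistentVolumeClaim sketch, assuming you named your Volumes-backed StorageClass nutanix-volume:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nutanix-volume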

For a detailed walkthrough, please refer to this demo.

Machine API provider

With the IPI installation from OpenShift 4.11, Nutanix also introduces support for the Machine API. The Machine API Operator manages the underlying OpenShift VMs through the concept of Machines and MachineSets. 

If we select the Nodes view, we should see all the Nutanix VMs that have been created here.

Moving to the Machines view, we see that each Node is managed as a Machine by the Machine API.

The installer creates a MachineSet by default with the three worker machines in it. MachineSets are to Machines as ReplicaSets are to Pods.

Let’s jump back to the command line to take a look.

$ oc get machinesets -n openshift-machine-api
NAME                    DESIRED   CURRENT   READY   AVAILABLE   AGE
ocp-demo-t82dn-worker   3         3         3       3           64m
$ oc get machines -n openshift-machine-api
NAME                          PHASE     TYPE   REGION   ZONE   AGE
ocp-demo-t82dn-master-0       Running                          64m
ocp-demo-t82dn-master-1       Running                          64m
ocp-demo-t82dn-master-2       Running                          64m
ocp-demo-t82dn-worker-bpl9w   Running                          59m
ocp-demo-t82dn-worker-cwntr   Running                          59m
ocp-demo-t82dn-worker-ndvlr   Running                          59m

You can manually control the Machine count by modifying the MachineSet’s replicas. When you delete a Machine, the MachineSet will recreate it as well. You can also create additional MachineSets for special purposes like Infrastructure Nodes.
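
For example, to grow the default worker MachineSet from three to five machines manually:

$ oc scale machineset ocp-demo-t82dn-worker --replicas=5 -n openshift-machine-api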

Although this is excellent, the autoscaler provides even greater value: it is critical to have a cluster that adapts to changing workloads. We will create the cluster autoscaler and machine autoscaler objects, which enable OpenShift to automatically scale the infrastructure to meet deployment needs.

Below are sample YAML objects we will be using.

apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  resourceLimits:
    maxNodesTotal: 20
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s

The ClusterAutoscaler allows you to specify cluster-wide scaling limits for resources such as cores and memory. Here, we have specified a maximum of 20 nodes in the cluster.
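
If you also want to cap total CPU and memory across the cluster, the same resource accepts core and memory ranges. The values below are purely illustrative (memory is expressed in GiB):

spec:
  resourceLimits:
    maxNodesTotal: 20
    cores:
      min: 8
      max: 128
    memory:
      min: 16
      max: 512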

apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "ocp-demo-t82dn-worker"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1 
  maxReplicas: 12 
  scaleTargetRef: 
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet 
    name: ocp-demo-t82dn-worker

The MachineAutoscaler automatically adjusts the number of machines in a machine set. Ensure that the name value in the spec matches an existing machine set (in our example, it will be ocp-demo-t82dn-worker). 

Let’s go ahead and create these objects.

$ oc create -f clusterautoscaler.yaml 
clusterautoscaler.autoscaling.openshift.io/default created
$ oc create -f machineautoscaler.yaml 
machineautoscaler.autoscaling.openshift.io/ocp-demo-t82dn-worker created
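
You can confirm that both objects exist before generating any load:

$ oc get clusterautoscaler
$ oc get machineautoscaler -n openshift-machine-api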

Refer to the official OpenShift documentation to learn about all the other parameters and limits you may want to configure in the cluster.

Autoscaling the cluster

Now let’s generate some load to demonstrate scaling OpenShift on Nutanix. We will create a project named autoscale-example and create a Job in it that generates a heavy load on the cluster. A large number of pods will be created, forcing the autoscaler to scale out the worker machines.

$ oc adm new-project autoscale-example && oc project autoscale-example
Created project autoscale-example
Now using project "autoscale-example" on server "https://api.ocp-demo.mypcfqdn.uk:6443".

The Job below creates 50 pods that run in parallel. Each of them requests 500Mi of memory and 500m of CPU, and terminates after 8 minutes.

apiVersion: batch/v1
kind: Job
metadata:
  generateName: demo-job-
spec:
  template:
    spec:
      containers:
      - name: work
        image: busybox
        command: ["sleep",  "480"]
        resources:
          requests:
            memory: 500Mi
            cpu: 500m
      restartPolicy: Never
  backoffLimit: 4
  completions: 50
  parallelism: 50

$ oc create -f demo-job.yaml

If we wait a minute and check the pod status, we see a large number of pods, some already running and many still pending.

$ oc get pod -n autoscale-example
NAME                   READY   STATUS    RESTARTS   AGE
demo-job-jfj5k-29xqc   0/1     Pending   0          63s
demo-job-jfj5k-2j8zf   1/1     Running   0          63s
demo-job-jfj5k-2xnng   1/1     Running   0          63s
demo-job-jfj5k-5ftfc   0/1     Pending   0          63s
demo-job-jfj5k-5kw7z   0/1     Pending   0          63s
demo-job-jfj5k-69j48   0/1     Pending   0          63s
demo-job-jfj5k-69xbw   1/1     Running   0          63s
demo-job-jfj5k-6z9n5   1/1     Running   0          63s
demo-job-jfj5k-7lxz2   0/1     Pending   0          63s
demo-job-jfj5k-7pz2g   1/1     Running   0          63s
demo-job-jfj5k-bbwgj   0/1     Pending   0          63s
demo-job-jfj5k-bnlzv   0/1     Pending   0          63s
demo-job-jfj5k-bqdq5   0/1     Pending   0          63s
demo-job-jfj5k-c4psn   1/1     Running   0          63s
demo-job-jfj5k-c9gxs   0/1     Pending   0          63s
demo-job-jfj5k-cc4ws   0/1     Pending   0          63s
demo-job-jfj5k-crbsf   0/1     Pending   0          63s
demo-job-jfj5k-fj2p5   0/1     Pending   0          63s
.
.
.

Looking at the machine list, we see that a number of new machines are being created and added.

$ oc get machines -n openshift-machine-api
NAME                          PHASE          TYPE   REGION   ZONE   AGE
ocp-demo-t82dn-master-0       Running                               120m
ocp-demo-t82dn-master-1       Running                               120m
ocp-demo-t82dn-master-2       Running                               120m
ocp-demo-t82dn-worker-7zgzb   Provisioning                          17s
ocp-demo-t82dn-worker-8gh6f   Provisioning                          17s
ocp-demo-t82dn-worker-bpl9w   Running                               116m
ocp-demo-t82dn-worker-cwntr   Running                               116m
ocp-demo-t82dn-worker-dcvch   Provisioning                          17s
ocp-demo-t82dn-worker-dffnb   Provisioning                          17s
ocp-demo-t82dn-worker-m5jzm   Provisioning                          17s
ocp-demo-t82dn-worker-mh8vh   Provisioning                          17s
ocp-demo-t82dn-worker-ndvlr   Running                               116m
ocp-demo-t82dn-worker-vz4r4   Provisioning                          17s

If we return to the Prism console, we see these VMs are being created on the Nutanix platform.

After eight minutes, the workload starts to terminate and the load on the cluster drops. We can now see the machine autoscaler begin to delete the unneeded Machines from the MachineSet.

$ oc get machines -n openshift-machine-api
NAME                          PHASE      TYPE   REGION   ZONE   AGE
ocp-demo-t82dn-master-0       Running                           130m
ocp-demo-t82dn-master-1       Running                           130m
ocp-demo-t82dn-master-2       Running                           128m
ocp-demo-t82dn-worker-7zgzb   Deleting                          10m7s
ocp-demo-t82dn-worker-bpl9w   Running                           126m
ocp-demo-t82dn-worker-cwntr   Running                           126m
ocp-demo-t82dn-worker-dcvch   Running                           10m7s
ocp-demo-t82dn-worker-dffnb   Running                           10m7s
ocp-demo-t82dn-worker-m5jzm   Deleting                          10m7s
ocp-demo-t82dn-worker-ndvlr   Running                           126m
ocp-demo-t82dn-worker-vz4r4   Running                           10m7s

Additionally, the same is visible on the OpenShift console and we see the VM deletions on Nutanix from Prism.
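
If you prefer to follow the scale-down from the command line, a watch on the MachineSet shows the replica count converging back toward the autoscaler’s minimum:

$ oc get machineset ocp-demo-t82dn-worker -n openshift-machine-api -w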

Conclusion

With OpenShift 4.11, deploying and managing an enterprise Kubernetes solution on Nutanix has become easier. The IPI installer’s integration allows for automated provisioning of the OpenShift cluster using the native Nutanix APIs. Operators save significant time and effort not only on day-0 work but also on day-2 operations. By dynamically provisioning the underlying Nutanix infrastructure, the OpenShift cluster can automatically scale up and down to accommodate the changing needs of modern applications.
