Nutanix and Red Hat continue to offer the certified, streamlined solutions that our shared customers are looking for as they navigate the hybrid multicloud landscape. While it is certainly possible to install a platform-agnostic Red Hat OpenShift cluster on Nutanix, this method often relies on the administrator to deploy all of the required machines, the operating system image, the load balancer, DNS entries and so on. Nutanix NCM Self-Service and Red Hat Ansible Automation Platform can be leveraged for end-to-end automation of these workflows, but our customers require a natively integrated solution between the two platforms.
We are happy to announce that with the release of Red Hat OpenShift 4.11, the platform’s full-stack automated installation process, known as Installer-Provisioned Infrastructure (IPI), is now available for the Nutanix Cloud Platform.
With the IPI method, the OpenShift installer integrates with the Nutanix Prism APIs to create the AHV virtual machines, install the boot image and bootstrap the entire cluster. There is no requirement to create and configure an external load balancer, as one is integrated into the cluster during installation. Furthermore, scaling the cluster up and down to accommodate changing workloads can be done without user intervention. This is made feasible by Nutanix’s full Machine API support, based on the upstream Cluster API project and custom OpenShift resources.
Let’s dig deeper into the IPI installation deployment workflow with a step-by-step process.
Prerequisites
- Red Hat OpenShift Container Platform version 4.11 has been tested for compatibility with the following software versions:
  - Prism Central version pc.2022.4
  - AOS versions 5.20.4 (LTS) and 6.1.1 (STS)
- A Nutanix cluster with a minimum of 800 GB of storage.
- A Nutanix user account assigned a role that can perform CRUD operations on VMs, categories and images.
- A valid SSL certificate for Prism Central, issued by a trusted CA. For information on using self-signed certificates, please refer to the OpenShift documentation.
- If a firewall exists in the environment, ensure that port 9440 to the Prism Central IP address is accessible.
- You must use AHV IP Address Management (IPAM) for the machine network and ensure that it is configured to provide persistent IP addresses to the cluster machines.
- Two static IP addresses must be reserved for the cluster’s API VIP and ingress VIP. If you are using Nutanix’s IPAM feature, you can reserve them from a subnet by running the following command from any Controller VM in the Prism Element cluster.
acli net.add_to_ip_blacklist <network_name> ip_list=ip_address1,ip_address2
- You must create DNS records for the two static IP addresses in the appropriate DNS server. These must be of the form:
  - api.<cluster_name>.<base_domain>. for the API VIP
  - *.apps.<cluster_name>.<base_domain>. for the ingress VIP
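As an illustration, in a BIND-style zone file the two records might look like the following sketch (the cluster name, domain and addresses here are the demo values used later in this walkthrough; adjust them for your environment):

```text
api.ocp-demo.mypcfqdn.uk.     IN A 10.55.68.150
*.apps.ocp-demo.mypcfqdn.uk.  IN A 10.55.68.151
```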
In our demo, we will install a cluster with the cluster name ocp-demo and the base domain mypcfqdn.uk. Let’s verify that the API and ingress wildcard DNS entries resolve correctly.
[root@openshift_provisoner ~]# dig +short api.ocp-demo.mypcfqdn.uk
10.55.68.150
[root@openshift_provisoner ~]# dig +short test.apps.ocp-demo.mypcfqdn.uk
10.55.68.151
Download the installation program
Log in to the Red Hat Hybrid Cloud Console and navigate to the Nutanix AOS page to get started. Download and extract the OpenShift installer program and the oc tools required to manage the cluster. Make sure you download the pull secret as well. Alternatively, the installation program can be obtained from the official OpenShift mirror.
$ ls
kubectl oc openshift-install pull_secret.json
Cloud Credential Operator utility
The Cloud Credential Operator (CCO) is a controller that lets OpenShift Container Platform request credentials for a particular cloud provider. For now, Nutanix supports only setting the CCO to manual mode. In manual mode, a user manages cloud credentials instead of the CCO. You will have to extract the CredentialsRequest CRs from the release image and create the Kubernetes Secrets from them using the Prism Central credentials. We will be using the ccoctl tool for this.
Let’s extract the ccoctl binary from the release image.
- Obtain the OpenShift release image.
$ RELEASE_IMAGE=$(./openshift-install version | awk '/release image/ {print $3}')
- Get the CCO container image from the release image.
$ CCO_IMAGE=$(oc adm release info --image-for='cloud-credential-operator' $RELEASE_IMAGE)
- Extract the ccoctl binary from the CCO container image and make it executable.
$ oc image extract $CCO_IMAGE --file="/usr/bin/ccoctl" -a pull_secret.json
$ chmod u+x ccoctl && cp ccoctl /usr/local/bin/
$ ls
ccoctl kubectl oc openshift-install pull_secret.json
Note: ccoctl is a Linux binary and must be run in a Linux environment.
Let’s create a YAML file that holds the Prism credentials in a directory “creds”. Below is a sample of the credentials format.
$ cat creds/pc_credentials.yaml
credentials:
- type: basic_auth
  data:
    prismCentral:
      username: <username_for_prism_central>
      password: <password_for_prism_central>
Extract the CredentialsRequest objects for the Nutanix Cloud Platform from the release image and store them in a directory called “credreqs”.
$ oc adm release extract --credentials-requests --cloud=nutanix --to=credreqs -a pull_secret.json $RELEASE_IMAGE
Finally, use the ccoctl tool to process the CredentialsRequest objects and generate the secret manifests, which will be required later.
$ ccoctl nutanix create-shared-secrets --credentials-requests-dir=credreqs --output-dir=. --credentials-source-filepath=creds/pc_credentials.yaml
Output:
2022/08/02 04:01:58 Saved credentials configuration to: manifests/openshift-machine-api-nutanix-credentials-credentials.yaml
Verify that the file has been created; the expected output is shown below.
$ cat manifests/openshift-machine-api-nutanix-credentials-credentials.yaml
apiVersion: v1
kind: Secret
metadata:
  name: nutanix-credentials
  namespace: openshift-machine-api
type: Opaque
data:
  credentials: ******************************************************************************************************************************************************
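As with any Kubernetes Secret, the data.credentials value is base64-encoded. A quick sanity check, shown here with a dummy payload standing in for the real credentials, is to confirm a value round-trips cleanly through base64:

```shell
# Dummy stand-in for the real credentials blob; never echo real secrets.
payload='prismCentral credentials placeholder'
# Encode the payload the same way Kubernetes stores Secret data values.
encoded=$(printf '%s' "$payload" | base64)
# Decoding should reproduce the original payload exactly.
printf '%s' "$encoded" | base64 -d
```

The same decode step can be used against the generated manifest to confirm the embedded credentials are intact.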
Creating the installation configuration file
You will use three sets of files during the installation: an installation configuration file named install-config.yaml, Kubernetes manifests, and Ignition config files for your machine types. The install-config file contains the Nutanix platform-specific details, which are transformed into Kubernetes manifests. These manifests are then wrapped into Ignition config files that the installation program uses to create the cluster.
Creating the install-config file is an interactive process. Launch it by running the following command.
$ openshift-install create install-config
- Select nutanix as the platform
- Provide Prism Central details such as the endpoint, the Prism Element cluster and the network subnet to use
- Provide OpenShift cluster details such as cluster name, base domain and the VIP for API and Ingress
$ openshift-install create install-config
? SSH Public Key /root/.ssh/nk_id_rsa.pub
? Platform nutanix
? Prism Central ntnxdemo.mypcfqdn.uk
? Port 9440
? Username demo-admin
? Password [? for help] ************
INFO Connecting to Prism Central ntnxdemo.mypcfqdn.uk
? Prism Element DM3-POC068
? Subnet Secondary-demo
? Virtual IP Address for API 10.55.68.150
? Virtual IP Address for Ingress 10.55.68.151
? Base Domain mypcfqdn.uk
? Cluster Name ocp-demo
? Pull Secret [? for help] ******************************************************************************************************************************************************
***************************************************************************************************
INFO Install-Config created in: .
Once the install-config.yaml file is created, make sure to back it up safely, as this file is consumed and deleted during the installation process. You can also reuse the same file to build multiple clusters by modifying the necessary parameters, such as the name and VIPs.
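A minimal sketch of that backup step is shown below; it runs in a scratch directory with a stand-in file, so the commands are safe to try anywhere:

```shell
# Work in a throwaway directory and create a stand-in install-config.yaml;
# in a real install this file is produced by 'openshift-install create install-config'.
workdir=$(mktemp -d)
cd "$workdir"
printf 'apiVersion: v1\nmetadata:\n  name: ocp-demo\n' > install-config.yaml
# Keep a copy: the installer consumes and deletes the original.
cp install-config.yaml install-config.yaml.bak
ls
```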
Below is a sample install-config.yaml file generated by the process. Before moving forward, open it and confirm that the information is accurate. We’ll continue using the defaults.
$ cat install-config.yaml
apiVersion: v1
baseDomain: mypcfqdn.uk
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: {}
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform: {}
  replicas: 3
credentialsMode: Manual
metadata:
  creationTimestamp: null
  name: ocp-demo
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  nutanix:
    apiVIP: 10.55.68.150
    ingressVIP: 10.55.68.151
    prismCentral:
      endpoint:
        address: ntnxdemo.mypcfqdn.uk
        port: 9440
      password: ********
      username: demo-admin
    prismElements:
    - endpoint:
        address: 10.55.68.37
        port: 9440
      uuid: 0005e4c8-1f34-9dc5-0000-000000014039
    subnetUUIDs:
    - 8346214e-584c-4689-b525-c6019bbc4856
publish: External
pullSecret: '{"auths": …}'
sshKey: '********'
To learn more about all the parameters available and to customize the file as you wish, please refer to the OpenShift documentation.
Generate the installation manifests from the install-config.yaml file.
$ openshift-install create manifests
INFO Consuming Install Config from target directory
INFO Manifests created in: manifests and openshift
Note: Ensure that the openshift-machine-api-nutanix-credentials-credentials.yaml file we generated earlier exists in the manifests directory.
We are ready to create the cluster now.
Deploying the cluster
Initialize the cluster deployment by running the following command.
$ openshift-install create cluster
This creates the bootstrap, master and worker Ignition files, and the installer then creates the cluster from them. The installer fetches and creates the boot image, which is used to power on the bootstrap and control plane nodes. When the control plane is ready, the installer deletes the bootstrap node and creates the worker nodes as specified in the configuration.
$ openshift-install create cluster
INFO Consuming Common Manifests from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Master Machines from target directory
INFO Consuming OpenShift Install (Manifests) from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 4:43AM) for the Kubernetes API at https://api.ocp-demo.mypcfqdn.uk:6443...
INFO API v1.24.0+9546431 up
INFO Waiting up to 30m0s (until 4:55AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 5:15AM) for the cluster at https://api.ocp-demo.mypcfqdn.uk:6443 to initialize...
INFO Waiting up to 10m0s (until 4:56AM) for the openshift-console route to be created...
INFO Install complete!
INFO To access the cluster as the system:admin user when using 'oc', run
INFO export KUBECONFIG=/root/IPI/auth/kubeconfig
INFO Access the OpenShift web-console here: https://console-openshift-console.apps.ocp-demo.mypcfqdn.uk
INFO Login to the console with user: "kubeadmin", and password: "xxxx-xxxx-xxxx"
INFO Time elapsed: 25m29s
When the cluster deployment completes successfully, the terminal displays the login credentials along with the link to access the OpenShift web console.
If you switch back to Prism Central, you will see that the machines have been deployed across the AHV nodes in the cluster. Prism gives you insights not only into the VM configuration but also detailed statistics on their network, storage, efficiency and so on.
Here is a screenshot captured from Prism Central after the deployment.
Let’s grab the OpenShift link and log in to the cluster’s administrator console. You should see that the provider is Nutanix, as expected.
Nutanix CSI Operator
After the installation, let’s configure storage for the cluster. Applications on OpenShift can consume Nutanix storage via the Nutanix CSI driver, which is packaged as a certified Red Hat OpenShift Operator. The Nutanix CSI Operator provides this scalable and persistent storage by leveraging Nutanix Volumes for block storage and Nutanix Files for file storage.
Head over to the Operators tab in the console and search for Nutanix. Install the certified Nutanix Operator and follow the instructions to create StorageClasses and PVCs.
For a detailed walkthrough, please refer to this demo.
Machine API provider
With the IPI installation in OpenShift 4.11, Nutanix also introduces support for the Machine API. The Machine API Operator manages the underlying OpenShift VMs through the concepts of Machines and MachineSets.
If we select the Nodes view, we should see all the Nutanix VMs that have been created here.
Moving to the Machines view, we see that each Node is managed as a Machine by the Machine API.
The installer creates a MachineSet by default with the three worker machines in it. MachineSets are to Machines as ReplicaSets are to Pods.
Let’s jump back to the command line to take a look.
$ oc get machinesets -n openshift-machine-api
NAME DESIRED CURRENT READY AVAILABLE AGE
ocp-demo-t82dn-worker 3 3 3 3 64m
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-demo-t82dn-master-0 Running 64m
ocp-demo-t82dn-master-1 Running 64m
ocp-demo-t82dn-master-2 Running 64m
ocp-demo-t82dn-worker-bpl9w Running 59m
ocp-demo-t82dn-worker-cwntr Running 59m
ocp-demo-t82dn-worker-ndvlr Running 59m
You can manually control the Machine count by modifying the MachineSet’s replicas. When you delete a Machine, the MachineSet will recreate it as well. You can also create additional MachineSets for special purposes like Infrastructure Nodes.
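As a sketch of that manual scaling (the MachineSet name is taken from this demo and will differ in your environment; the commands require a live cluster and an authenticated oc session):

```shell
# Scale the demo worker MachineSet from 3 to 5 replicas.
oc scale machineset ocp-demo-t82dn-worker --replicas=5 -n openshift-machine-api
# Watch the new Machines get provisioned.
oc get machines -n openshift-machine-api
```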
Manual scaling is useful, but the autoscaler provides even greater value: it is critical to have a cluster that adapts to changing workloads. We will create the ClusterAutoscaler and MachineAutoscaler objects, which enable OpenShift to automatically scale the infrastructure to meet deployment needs.
Below are sample YAML objects we will be using.
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  resourceLimits:
    maxNodesTotal: 20
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
The ClusterAutoscaler allows you to specify cluster-wide scaling limits for resources such as cores and memory. Here, we have capped the maximum number of nodes in the cluster at 20.
apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
  name: "ocp-demo-t82dn-worker"
  namespace: "openshift-machine-api"
spec:
  minReplicas: 1
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: ocp-demo-t82dn-worker
The MachineAutoscaler automatically adjusts the number of Machines in a MachineSet. Ensure that the name value in the spec matches an existing MachineSet (in our example, ocp-demo-t82dn-worker).
Let’s go ahead and create these objects.
$ oc create -f clusterautoscaler.yaml
clusterautoscaler.autoscaling.openshift.io/default created
$ oc create -f machineautoscaler.yaml
machineautoscaler.autoscaling.openshift.io/ocp-demo-t82dn-worker created
Refer to the official OpenShift documentation to learn about all the other parameters and limits you may want to configure in the cluster.
Autoscaling the cluster
Now let’s generate some load to demonstrate scaling OpenShift on Nutanix. We will create a project named autoscale-example and a Job in it that generates a heavy load on the cluster. A large number of pods will be created, forcing the autoscaler to scale up the number of worker machines.
$ oc adm new-project autoscale-example && oc project autoscale-example
Created project autoscale-example
Now using project "autoscale-example" on server "https://api.ocp-demo.mypcfqdn.uk:6443".
The Job below creates 50 pods that run in parallel. Each requests 500Mi of memory and 500m of CPU, and terminates after 8 minutes.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: demo-job-
spec:
  template:
    spec:
      containers:
      - name: work
        image: busybox
        command: ["sleep", "480"]
        resources:
          requests:
            memory: 500Mi
            cpu: 500m
      restartPolicy: Never
  backoffLimit: 4
  completions: 50
  parallelism: 50
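Some quick arithmetic shows why this Job overwhelms the initial three workers; the aggregate numbers follow directly from the Job spec above:

```shell
# Aggregate resources requested by the Job: 50 parallel pods,
# each asking for 500m CPU and 500Mi memory.
PODS=50; CPU_M=500; MEM_MI=500
echo "total cpu requested: $((PODS * CPU_M))m"     # 25000m = 25 cores
echo "total memory requested: $((PODS * MEM_MI))Mi" # 25000Mi, roughly 24.4 GiB
```

Far more than three modestly sized workers can schedule, so most pods land in Pending and trigger a scale-out.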
$ oc create -f demo-job.yaml
If we wait a minute and check the pod status, we see a large number of pods running or pending.
$ oc get pod -n autoscale-example
NAME READY STATUS RESTARTS AGE
demo-job-jfj5k-29xqc 0/1 Pending 0 63s
demo-job-jfj5k-2j8zf 1/1 Running 0 63s
demo-job-jfj5k-2xnng 1/1 Running 0 63s
demo-job-jfj5k-5ftfc 0/1 Pending 0 63s
demo-job-jfj5k-5kw7z 0/1 Pending 0 63s
demo-job-jfj5k-69j48 0/1 Pending 0 63s
demo-job-jfj5k-69xbw 1/1 Running 0 63s
demo-job-jfj5k-6z9n5 1/1 Running 0 63s
demo-job-jfj5k-7lxz2 0/1 Pending 0 63s
demo-job-jfj5k-7pz2g 1/1 Running 0 63s
demo-job-jfj5k-bbwgj 0/1 Pending 0 63s
demo-job-jfj5k-bnlzv 0/1 Pending 0 63s
demo-job-jfj5k-bqdq5 0/1 Pending 0 63s
demo-job-jfj5k-c4psn 1/1 Running 0 63s
demo-job-jfj5k-c9gxs 0/1 Pending 0 63s
demo-job-jfj5k-cc4ws 0/1 Pending 0 63s
demo-job-jfj5k-crbsf 0/1 Pending 0 63s
demo-job-jfj5k-fj2p5 0/1 Pending 0 63s
.
.
.
Looking at the machine count, we see new machines being created and added.
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-demo-t82dn-master-0 Running 120m
ocp-demo-t82dn-master-1 Running 120m
ocp-demo-t82dn-master-2 Running 120m
ocp-demo-t82dn-worker-7zgzb Provisioning 17s
ocp-demo-t82dn-worker-8gh6f Provisioning 17s
ocp-demo-t82dn-worker-bpl9w Running 116m
ocp-demo-t82dn-worker-cwntr Running 116m
ocp-demo-t82dn-worker-dcvch Provisioning 17s
ocp-demo-t82dn-worker-dffnb Provisioning 17s
ocp-demo-t82dn-worker-m5jzm Provisioning 17s
ocp-demo-t82dn-worker-mh8vh Provisioning 17s
ocp-demo-t82dn-worker-ndvlr Running 116m
ocp-demo-t82dn-worker-vz4r4 Provisioning 17s
If we return to the Prism console, we see these VMs being created on the Nutanix platform.
After eight minutes, the workload starts to terminate and the load on the cluster drops. The machine autoscaler then begins to delete the unneeded Machines from the MachineSet.
$ oc get machines -n openshift-machine-api
NAME PHASE TYPE REGION ZONE AGE
ocp-demo-t82dn-master-0 Running 130m
ocp-demo-t82dn-master-1 Running 130m
ocp-demo-t82dn-master-2 Running 128m
ocp-demo-t82dn-worker-7zgzb Deleting 10m7s
ocp-demo-t82dn-worker-bpl9w Running 126m
ocp-demo-t82dn-worker-cwntr Running 126m
ocp-demo-t82dn-worker-dcvch Running 10m7s
ocp-demo-t82dn-worker-dffnb Running 10m7s
ocp-demo-t82dn-worker-m5jzm Deleting 10m7s
ocp-demo-t82dn-worker-ndvlr Running 126m
ocp-demo-t82dn-worker-vz4r4 Running 10m7s
The same is visible in the OpenShift console, and the VM deletions can be seen in Prism as well.
Conclusion
With OpenShift 4.11, deploying and managing an enterprise Kubernetes solution on Nutanix has become easier. The IPI installer’s integration enables automated provisioning of the OpenShift cluster using the native Nutanix APIs. Operators gain back significant time and effort not only in day-0 work but also through day-2 capabilities. By dynamically provisioning the underlying Nutanix infrastructure, the OpenShift cluster can automatically scale up and down to accommodate the changing needs of modern applications.