Run Attention on Nutanix



Summary

The goal of this article is to show how customers can land their next-generation AI workloads on the Nutanix Cloud Platform (NCP). As a first step, we show how to implement the Attention mechanism from scratch on NCP. Attention is the core concept behind cutting-edge generative AI models and Large Language Models (LLMs), so we chose to demonstrate how the kernel of the latest AI models can be easily deployed on Nutanix AHV. We go deep into the Attention mechanism to give our users insight into these models while showing them the simplicity of our platform.

Use Cases for Nutanix

Many enterprises need support for running advanced AI/ML models so that they can extract value out of stored data. Here are a few examples: 

  1. Visual perception e.g., optical character recognition, object detection
  2. Automated chatbot e.g., ChatGPT and numerous variants
  3. Speech recognition systems e.g., Siri, Alexa, Google Assistant
  4. Retail e.g., Amazon Go

For these models, Attention is used as a fundamental building block. Therefore, it is critical that our customers understand the basic mechanism and also have the ability to run Attention from scratch on a Nutanix cluster. The following figure shows how AI/ML is being integrated into the core Nutanix infrastructure layer, with Attention-based Transformer models taking a central role in this integration. By enabling the Attention mechanism from scratch, we strive to ensure customers can build a wide array of AI apps, such as NLP, vision, contact center, and recommendation applications.

AI/ML integration into the core Nutanix infrastructure layer

Introduction

ChatGPT has captivated society's attention and accelerated the AI transformation of the enterprise. It demonstrated the power of the Attention mechanism, which dispenses with the need for recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs) in sequence modeling. Hence we start our Nutanix AI journey with a deep dive into the Attention mechanism.

For a given token in an input sequence, the Attention mechanism weighs the importance of different parts of the sequence by assigning weights relative to that token. The mechanism is implemented using self-attention. In self-attention, the input sequence is transformed into three separate vectors:

  • Query (Q)
  • Key (K)
  • Value (V)

The Query vector corresponds to the current token being processed, while the Key and Value vectors are computed for every token in the input sequence. The attention output is then expressed as a function of (Q, K, V).
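Following the scaled dot-product formulation of Vaswani et al. (see References), where d_k is the dimension of the key vectors:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$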

In the following sections, we will implement the Attention mechanism from scratch. 

Setting up Nutanix Infrastructure for Attention

  1. Log into Prism Element.
Nutanix Prism Element UI
  2. Set up the VM with appropriate resource configurations. For this article, we used the following VM spec:
    • 16 single-core vCPUs
    • 64 GB of RAM
    • NVIDIA® A100 Tensor Core passthrough GPU with 80 GB of memory, installed with the NVIDIA® RTX 15.0 driver for Ubuntu (NVIDIA-Linux-x86_64-525.60.13-grid.run)
    • Ubuntu 22.04 operating system
Virtual Machine creation and configuration

Running the nvidia-smi command produces the following output, which shows the driver version as 525.60.13 and the CUDA® version as 12.0.

Output from running nvidia-smi

It also shows other GPU specifications as mentioned in the following table.

Item | Value | Description
GPU | 0 | GPU Index
Name | NVIDIA A100 | GPU Name
Temp | 36 C | Core GPU Temperature
Perf | P0 | GPU Performance State
Persistence-M | On | Persistence Mode
Pwr: Usage/Cap | 65 W / 300 W | GPU Power Usage and its Cap
Bus-Id | 00000000:00:06.0 | domain:bus:device.function
Disp. A | Off | Display Active
Memory-Usage | 44136 MiB / 81920 MiB | Memory Allocation out of Total Memory
Volatile Uncorr. ECC | 0 | Counter of Uncorrectable ECC Memory Errors
GPU-Util | 0% | GPU Utilization
Compute M. | Default | Compute Mode
MIG M. | Disabled | Multi-Instance GPU Mode
GPU specification details
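
Beyond nvidia-smi, the GPU can also be verified from inside the guest OS with a few lines of Python. The following is a minimal sanity check, assuming a CUDA-enabled PyTorch build is installed in the VM; the exact device name and reported CUDA version may differ slightly from the table above.

```python
import torch

# Confirms that the passthrough A100 is visible to the CUDA runtime inside the VM.
print(torch.cuda.is_available())        # expected: True
print(torch.cuda.get_device_name(0))    # expected: an NVIDIA A100 80GB device
print(torch.version.cuda)               # CUDA version this PyTorch build was compiled against
```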

Implementation of Attention Mechanism

  1. We take an input sentence, assign an index to each word based on alphabetical order, and compile a dictionary of {word: index}.
  2. Next, we use this dictionary to assign an integer index to each word of the sentence.
  3. Now, using the integer-vector representation of the input sentence, we use an embedding layer to encode the inputs into a real-valued embedding. Here, we use a 16-dimensional embedding, so that each input word is represented by a 16-dimensional vector. Since the sentence consists of 11 words, this results in an 11 x 16 dimensional embedding. The choice of embedding dimension is completely arbitrary.
  4. Next, we define three weight matrices for Q, K, and V. Self-attention utilizes three weight matrices, referred to as Wq, Wk, and Wv, which are adjusted as model parameters during training. These matrices project the inputs into the query, key, and value components of the sequence, respectively.
  5. Now suppose we are interested in computing the attention vector for the second input element; the second input element acts as the query here.
  6. Next, we compute K and V for the input sentence.
  7. Finally, we compute the attention vector using the formulation proposed by Vaswani et al. A code sketch of all seven steps follows this list.
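
The sketch below walks through the seven steps end to end in PyTorch, assuming PyTorch is installed on the VM configured above. The test sentence is the one used in the visualization section that follows; the variable names, the random seed, and the choice of equal query/key/value dimensions (16) are illustrative assumptions, not the exact code from this walkthrough.

```python
import torch

torch.manual_seed(42)  # illustrative seed; the weights remain untrained/random

# Step 1: dictionary of {word: index}, with indices assigned in alphabetical order
sentence = "The animal didn't cross the street because it was too tired"
words = sentence.split()
dc = {w: i for i, w in enumerate(sorted(words))}

# Step 2: map each word of the sentence to its integer index
sentence_int = torch.tensor([dc[w] for w in words])

# Step 3: 16-dimensional embedding -> an 11 x 16 real-valued representation
d = 16
embed = torch.nn.Embedding(len(dc), d)
embedded_sentence = embed(sentence_int).detach()      # shape: (11, 16)

# Step 4: weight matrices Wq, Wk, Wv (random here; learned during training)
d_q = d_k = d_v = 16                                  # arbitrary, kept equal for simplicity
W_query = torch.rand(d, d_q)
W_key   = torch.rand(d, d_k)
W_value = torch.rand(d, d_v)

# Step 5: query vector for the second input element
x_2 = embedded_sentence[1]
query_2 = x_2 @ W_query                               # shape: (16,)

# Step 6: keys and values for the whole input sentence
keys   = embedded_sentence @ W_key                    # shape: (11, 16)
values = embedded_sentence @ W_value                  # shape: (11, 16)

# Step 7: scaled dot-product attention (Vaswani et al.)
omega_2 = query_2 @ keys.T                            # unnormalized attention scores
attention_weights_2 = torch.softmax(omega_2 / d_k**0.5, dim=0)
context_vector_2 = attention_weights_2 @ values       # attention vector for element 2

print(attention_weights_2)                            # weights over the 11 input words
print(context_vector_2.shape)                         # torch.Size([16])
```

With untrained, random Wq, Wk, and Wv, the resulting attention weights are essentially arbitrary; they only become meaningful once the weight matrices are learned during training.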

Self-Attention Visualization

The chart below visualizes the self-attention mechanism in action for the word "animal" in the test sentence: "The animal didn't cross the street because it was too tired." According to the attention scores, "animal" is most strongly related to itself. The other prominent words are "was", "tired", and "it"; these are the words with high importance or high information content with respect to "animal." Note that this attention model is untrained and the weight vectors are fully randomized; the attention scores will become more informative after thorough training.

Self-attention mechanism in action with random weight vectors
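
A chart of this kind can be produced with matplotlib. The sketch below recomputes untrained attention weights for the "animal" query (the second word) using the same illustrative choices as the implementation sketch above; it assumes PyTorch and matplotlib are installed and is not the exact plotting code behind the original figure.

```python
import torch
import matplotlib.pyplot as plt

torch.manual_seed(42)
sentence = "The animal didn't cross the street because it was too tired"
words = sentence.split()
dc = {w: i for i, w in enumerate(sorted(words))}
tokens = torch.tensor([dc[w] for w in words])

d = 16
embedded = torch.nn.Embedding(len(dc), d)(tokens).detach()   # 11 x 16 embedding
W_query, W_key = torch.rand(d, d), torch.rand(d, d)          # random, untrained weights

query = embedded[1] @ W_query                      # "animal" acts as the query
scores = query @ (embedded @ W_key).T / d**0.5     # scaled dot-product scores
weights = torch.softmax(scores, dim=0)             # attention weight per word

plt.bar(range(len(words)), weights, tick_label=words)
plt.xticks(rotation=45, ha="right")
plt.ylabel("attention weight")
plt.title('Self-attention for "animal" (untrained, random weights)')
plt.tight_layout()
plt.show()
```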

Conclusion

Thus we have shown how the Nutanix Cloud Platform can be used to land the broad class of Attention/Transformer-based AI models, the heart of the current wave of generative AI and LLMs, on very modest infrastructure. NCP allows customers to achieve consistency across their entire infrastructure stack, from the edge to the cloud. In addition, we believe NCP can improve customers' ROI for intensive AI workloads through a combination of class-leading unified storage, better integrated management, operations, and security, and strong data management and governance.

References

Vaswani, A., et al. "Attention Is All You Need." NIPS 2017.

Appendix

Use Cases of Attention-based AI Models in the Industry

  • Visual perception e.g., optical character recognition, object detection.
  • Automated chatbot e.g., ChatGPT and numerous variants.
  • Speech recognition systems e.g., Siri, Alexa, Google Assistant.
  • Automated decision-making systems e.g., AlphaGo.
  • Robotics e.g., retail packaging, manufacturing pipelines.
  • Predictive healthcare e.g., early disease recognition, AI driven protocols in cancer.
  • Drug discovery e.g., AlphaFold, high throughput docking.
  • Compliance e.g., named entity recognition, contract understanding.
  • Legal e.g., document summarization, compliance enforcement.
  • Manufacturing e.g., predictive maintenance tasks.
  • Transportation e.g., autonomous vehicle tasks.
  • Developer productivity e.g., code completion.
  • Retail e.g., reducing shrinkage, visual supply chain, fraud detection.

© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.