Nutanix Benefit 1: Dynamically Distributed Storage

Nutanix.dev - Nutanix Benefit 1 Dynamically Distributed Storage


View all current content in this series and make sure you don’t miss upcoming installments: Nutanix Top 10 Benefits Series.

We released an infographic along with AOS 6.5 LTS showing that the Nutanix Cloud Platform (NCP) is built to meet the demands of your business-critical applications and databases. In this blog series, we'll lay out the technical background of each claim so you can see how the Nutanix architecture benefits your critical applications.

In this first entry, we'll focus on dynamically distributed storage and what it means for high-performance applications, resilience, and scale.

Dynamically Distributed Storage

One of the most important and challenging architectural decisions made at Nutanix was to build a real-time, fully distributed data path for AOS, the Nutanix storage operating system. Static data placement, typical of the conventional RAID used in scale-up disk subsystems, would have been easier to build, but a fully distributed model is required to handle the network, component, and node failures that are typical in a scale-out model.

Only a fully distributed model can deliver the performance that truly matters: consistent low latency and fast, parallel rebuilds during failure conditions. This architectural design point was top of mind for the Nutanix founding engineers, who had worked on the distributed systems pioneered by software-defined, scale-out hyperscalers like Google and Amazon.
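To see why distribution matters for rebuilds, consider a deliberately simplified model (our own assumption, not an AOS formula): a conventional RAID rebuild is bounded by the write bandwidth of a single replacement disk, while a distributed rebuild can split the work across every surviving disk in the cluster. The function name `rebuild_hours` and all input numbers below are hypothetical and chosen only for illustration.

```python
def rebuild_hours(data_tb: float, per_disk_mb_s: float, parallel_disks: int) -> float:
    """Hours to re-protect `data_tb` of data when the rebuild work is split
    evenly across `parallel_disks` disks, each sustaining `per_disk_mb_s`.
    Simplified model: ignores network limits and competing foreground I/O."""
    total_mb = data_tb * 1_000_000            # decimal TB -> MB
    aggregate_mb_s = per_disk_mb_s * parallel_disks
    return total_mb / aggregate_mb_s / 3600   # seconds -> hours


# Hypothetical inputs: 4 TB to re-protect, 100 MB/s of rebuild bandwidth per disk.
single = rebuild_hours(4, 100, parallel_disks=1)   # one replacement disk absorbs every write
spread = rebuild_hours(4, 100, parallel_disks=40)  # 40 surviving disks share the work
print(f"single-disk rebuild: {single:.1f} h")
print(f"distributed rebuild: {spread:.2f} h")
```

Under these assumptions the distributed rebuild finishes 40 times faster (roughly 11 hours versus under 20 minutes), and the speedup grows with the number of participating disks. In a real cluster, network bandwidth and foreground I/O cap the gain, which this model intentionally leaves out.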

AOS introduced the concept of a Controller Virtual Machine (CVM) that runs distributed services on each node. AOS real-time intelligence is divided into these services and spread across the scale-out cluster, so no single entity is a point of failure and any node can assume leadership of any service as required.
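The idea that "any node can assume leadership of any service" can be sketched with a toy model. This is a minimal illustration under our own assumptions: the `Cluster` class, the "fewest services led, then lowest node id" rule, and the service names are all hypothetical, and AOS uses its own distributed coordination that this sketch does not reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model of per-service leadership across CVMs (hypothetical rule)."""
    nodes: set[int]
    failed: set[int] = field(default_factory=set)
    leaders: dict[str, int] = field(default_factory=dict)

    def leader(self, service: str) -> int:
        current = self.leaders.get(service)
        if current is not None and current not in self.failed:
            return current                      # healthy leaders keep their role
        candidates = sorted(self.nodes - self.failed)
        if not candidates:
            raise RuntimeError(f"no healthy node can lead {service!r}")
        # Count how many services each healthy node already leads, then pick
        # the least-loaded node (ties broken by node id) so leadership stays
        # spread out and no single CVM leads everything.
        load = {n: 0 for n in candidates}
        for node in self.leaders.values():
            if node in load:
                load[node] += 1
        new_leader = min(candidates, key=lambda n: (load[n], n))
        self.leaders[service] = new_leader
        return new_leader


cluster = Cluster(nodes={1, 2, 3, 4})
services = ["metadata", "io", "scan", "ui"]       # hypothetical service names
before = {s: cluster.leader(s) for s in services}
cluster.failed.add(before["metadata"])            # the metadata leader's node fails
after = {s: cluster.leader(s) for s in services}
print(before)
print(after)
```

Only the service whose leader failed moves to a surviving node; services with healthy leaders keep them, which avoids needless churn during a failure.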

The figure above shows only four of the many distributed, optimized services that make up AOS. For a detailed breakdown of all the distributed services, refer to the Nutanix Bible. Next, let's look at how AOS makes data placement decisions inside a node.

Data Placement for Performance Optimization

AOS always writes dynamically to the best disk based on a real-time assessment of the performance, usage, and health of disks across the cluster. No static binding is made when a vdisk is created or when it is first written. The storage layer in AOS is divided into two separate regions: the oplog and the extent store. AOS characterizes writes as they enter the system and places them accordingly. The I/O path below is described in detail in the Nutanix Bible.
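Choosing "the best disk based on a real-time assessment" can be illustrated with a scoring function that is re-evaluated for every write instead of being fixed when a vdisk is created. This is a minimal sketch under our own assumptions: the `Disk` fields, the `score` formula, and `pick_disk` are hypothetical and are not the algorithm AOS uses.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    latency_ms: float     # recently observed latency (hypothetical signal)
    used_fraction: float  # 0.0 = empty, 1.0 = full
    healthy: bool


def score(d: Disk) -> float:
    # Lower latency and lower utilization both raise the score.
    return (1.0 / (1.0 + d.latency_ms)) * (1.0 - d.used_fraction)


def pick_disk(disks: list[Disk]) -> Disk:
    """Choose a target for the next write at write time; unhealthy or
    full disks are never eligible."""
    candidates = [d for d in disks if d.healthy and d.used_fraction < 1.0]
    if not candidates:
        raise RuntimeError("no writable disk available")
    return max(candidates, key=score)


disks = [
    Disk("ssd-1", latency_ms=0.3, used_fraction=0.90, healthy=True),
    Disk("ssd-2", latency_ms=0.4, used_fraction=0.20, healthy=True),
    Disk("ssd-3", latency_ms=0.2, used_fraction=0.10, healthy=False),
]
print(pick_disk(disks).name)
```

Here `ssd-3` is fastest but unhealthy, and `ssd-1` is nearly full, so `ssd-2` wins. Because the choice is recomputed per write, a change in health or fullness immediately redirects new writes without any reconfiguration.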

The oplog is similar to a filesystem journal. Its job is to quickly persist and replicate bursty random writes. The oplog is dynamically provisioned on the highest-performing disks in each storage node, and only a portion of those disks forms the oplog. The oplogs on all nodes coordinate replication of writes among themselves to provide data availability. An important point to remember is that the oplog is not a caching layer: writes to the oplog are persistent.
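The key property described above, that a write is persisted and replicated before it is acknowledged, can be sketched as follows. This is a minimal illustration under stated assumptions: `Oplog`, `replicated_write`, and the `rf` (number of copies) parameter are hypothetical, and the in-memory list stands in for a journal that a real system would flush to a fast persistent device (for example with `os.fsync`) before returning.

```python
from dataclasses import dataclass, field


@dataclass
class Oplog:
    """Toy per-node journal. The list stands in for persistent storage;
    a real journal would flush each append to a fast device first."""
    node: str
    entries: list[tuple[int, bytes]] = field(default_factory=list)

    def append(self, offset: int, data: bytes) -> None:
        self.entries.append((offset, data))


def replicated_write(local: Oplog, peers: list[Oplog], offset: int,
                     data: bytes, rf: int = 2) -> bool:
    """Persist the write to the local oplog plus (rf - 1) peer oplogs and
    return True (acknowledge) only once every copy has been appended."""
    if len(peers) < rf - 1:
        return False  # cannot meet the requested number of copies; write nothing
    for log in [local] + peers[: rf - 1]:
        log.append(offset, data)
    return True


a, b, c = Oplog("node-a"), Oplog("node-b"), Oplog("node-c")
ack = replicated_write(a, [b, c], offset=4096, data=b"x" * 512, rf=2)
print(ack, len(a.entries), len(b.entries), len(c.entries))
```

With two copies requested, the write lands on `node-a` and one peer, and the acknowledgment is returned only after both appends. This is what distinguishes a persistent journal from a cache: an acknowledged write already exists in more than one place.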

The extent store is the bulk storage for AOS. It spans all disk tiers and receives writes either directly (sustained random or sequential writes) or after they have been coalesced and sequentially drained from the oplog.
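"Coalesced and sequentially drained" can be made concrete with a small example: scattered, possibly overlapping journal entries are merged into contiguous runs, so the extent store receives a few large sequential writes instead of many small random ones. This sketch is our own illustration; the `coalesce` function is hypothetical and works byte by byte for clarity, whereas a practical implementation would operate on ranges.

```python
def coalesce(entries: list[tuple[int, bytes]]) -> list[tuple[int, bytes]]:
    """Merge journaled writes (offset, data) into contiguous runs sorted by
    offset. Entries are applied in arrival order, so a later write to the
    same bytes overwrites an earlier one (last write wins)."""
    latest: dict[int, int] = {}            # byte offset -> byte value
    for offset, data in entries:
        for i, value in enumerate(data):
            latest[offset + i] = value

    runs: list[tuple[int, bytes]] = []
    start, buf = None, bytearray()
    for pos in sorted(latest):
        if start is not None and pos == start + len(buf):
            buf.append(latest[pos])        # byte extends the current run
        else:
            if start is not None:
                runs.append((start, bytes(buf)))
            start, buf = pos, bytearray([latest[pos]])
    if start is not None:
        runs.append((start, bytes(buf)))
    return runs


journal = [(0, b"aaaa"), (8, b"cccc"), (4, b"bbbb"), (2, b"ZZ"), (100, b"q")]
print(coalesce(journal))
```

Five random writes collapse into two sequential ones: a 12-byte run at offset 0 (with the later `ZZ` overwrite applied) and a single byte at offset 100.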

Depending on how incoming data is characterized, AOS dynamically assigns it to the most appropriate storage devices for the best performance and capacity. Algorithms in AOS further optimize placement by moving data between disk tiers as application access patterns change. Spreading data dynamically across multiple devices optimizes performance without creating a bottleneck on any single device.
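Moving data between tiers "depending on access patterns" can be sketched as a frequency-driven promotion and demotion policy. This is a minimal illustration, not the AOS algorithm: the `TierManager` class, the `promote_after` threshold, the fixed SSD capacity, the observation window, and the choice to start new extents on the capacity tier are all hypothetical.

```python
from collections import Counter


class TierManager:
    """Toy two-tier placement driven by read frequency (hypothetical policy)."""

    def __init__(self, ssd_capacity: int, promote_after: int = 3):
        self.ssd_capacity = ssd_capacity    # max extents on the fast tier
        self.promote_after = promote_after  # reads per window that mark an extent hot
        self.tier: dict[str, str] = {}      # extent id -> "ssd" or "hdd"
        self.reads: Counter[str] = Counter()

    def add(self, extent: str) -> None:
        self.tier[extent] = "hdd"           # in this sketch, new extents start on HDD

    def read(self, extent: str) -> None:
        self.reads[extent] += 1
        ssd_used = sum(1 for t in self.tier.values() if t == "ssd")
        if (self.tier[extent] == "hdd"
                and self.reads[extent] >= self.promote_after
                and ssd_used < self.ssd_capacity):
            self.tier[extent] = "ssd"       # hot: promote to the performance tier

    def end_window(self) -> None:
        # Demote fast-tier extents that were not read during this window,
        # freeing room for whatever is hot now, then start a fresh window.
        for extent, t in self.tier.items():
            if t == "ssd" and self.reads[extent] == 0:
                self.tier[extent] = "hdd"
        self.reads.clear()


tm = TierManager(ssd_capacity=1)
for e in ("e1", "e2"):
    tm.add(e)
for _ in range(3):
    tm.read("e1")
print(tm.tier)    # e1 has become hot and sits on SSD
tm.end_window()   # e1 was read this window, so it stays on SSD
tm.end_window()   # an idle window: e1 is demoted
for _ in range(3):
    tm.read("e2")
print(tm.tier)    # e2 now occupies the fast tier
```

The window-based demotion is what keeps the fast tier tracking current behavior: data that cools off gives its space to data that has become hot.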

Availability Domains

Availability domains determine where data components are placed. Like most cloud systems, Nutanix creates multiple copies of data across the cluster. To reduce the capacity cost of replication, administrators can also select distributed parity with erasure coding to provide data redundancy.
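The capacity trade-off between full copies and distributed parity is simple arithmetic: n full copies consume n units of raw capacity per unit of usable data, while a stripe of k data blocks plus m parity blocks consumes (k + m) / k. The sketch below is generic; the helper names and the 4+1 and 4+2 stripe sizes are illustrative assumptions, not a statement of the stripe sizes AOS uses.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw capacity consumed per unit of usable data with full copies."""
    return Fraction(copies)


def erasure_overhead(data_blocks: int, parity_blocks: int) -> Fraction:
    """Raw capacity consumed per unit of usable data for a k+m parity stripe."""
    return Fraction(data_blocks + parity_blocks, data_blocks)


usable_tb = 100  # hypothetical amount of usable data
for label, factor in [
    ("2 copies", replication_overhead(2)),
    ("3 copies", replication_overhead(3)),
    ("4 data + 1 parity", erasure_overhead(4, 1)),
    ("4 data + 2 parity", erasure_overhead(4, 2)),
]:
    raw_tb = float(factor * usable_tb)
    print(f"{label:18} {float(factor):.2f}x raw -> {raw_tb:.0f} TB for {usable_tb} TB usable")
```

With a typical parity scheme, a k+m stripe survives the loss of any m members, so 4+1 parity protects against one failure like two copies do, but at 1.25x instead of 2x raw capacity. The cost is extra work to compute parity and to reconstruct data from the surviving stripe members.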

By default, AOS provides disk and node awareness, which means data copies are placed so that no two copies of the same data (whether the cluster keeps two or three copies) share a disk or a node. Nutanix can also provide block and rack awareness, which ensures that data copies are not placed on the same block or rack, protecting against block and rack failures. AOS distributes data this way dynamically, without manual setup or administrator intervention. For more details on availability domains, refer to the Nutanix Bible.
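Domain-aware placement boils down to one rule: no two copies may share the chosen failure domain. The sketch below applies that rule at node, block, or rack level; keeping copies on different nodes also keeps them on different disks. The `Node` type, the `place_replicas` function, and the sample topology are hypothetical, and this greedy version simply takes the first eligible node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    block: str
    rack: str


def place_replicas(nodes: list[Node], copies: int, level: str = "node") -> list[Node]:
    """Pick `copies` nodes so that no two copies share the failure domain
    named by `level` ("node", "block", or "rack")."""
    if copies <= 0:
        return []
    domain_of = {
        "node": lambda n: n.name,
        "block": lambda n: n.block,
        "rack": lambda n: n.rack,
    }[level]
    chosen: list[Node] = []
    used_domains: set[str] = set()
    for n in nodes:
        domain = domain_of(n)
        if domain not in used_domains:   # skip nodes in an already-used domain
            chosen.append(n)
            used_domains.add(domain)
            if len(chosen) == copies:
                return chosen
    raise ValueError(f"only {len(used_domains)} distinct {level} domains for {copies} copies")


cluster = [
    Node("n1", block="b1", rack="r1"),
    Node("n2", block="b1", rack="r1"),
    Node("n3", block="b2", rack="r1"),
    Node("n4", block="b3", rack="r2"),
]
for level in ("node", "block", "rack"):
    print(level, [n.name for n in place_replicas(cluster, 2, level)])
```

Raising the awareness level changes which nodes are eligible: node awareness allows `n1` and `n2`, block awareness skips `n2` because it shares block `b1`, and rack awareness reaches across to `n4`. A production placer would also weigh the performance, usage, and health signals discussed earlier instead of taking the first eligible node.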

Why Does This Matter?

The Nutanix founding engineers made the hard decision to build a dynamic, intelligent, and resilient architecture. As a result, HCI administrators don't have to estimate and manually configure static decisions, such as dedicating caching disks exclusively for performance. With AOS, they can build an enterprise platform that scales dynamically. Most importantly, this architecture delivers consistent low latency during failure conditions, which is why Nutanix is uniquely qualified to deliver the performance and availability that performance-sensitive, business-critical applications require.

There are no manual disk groups or dedicated cache drives to manage with AOS, so administrators can automate resource provisioning and start using new resources immediately. AOS also lets administrators mix all-flash and hybrid nodes in the same cluster to balance performance and large capacity. By automatically characterizing incoming I/O and placing it in the appropriate tier, AOS delivers optimal performance for your applications. As application data access patterns change, AOS automatically moves data to the right tier to keep performance consistent. Beyond meeting performance needs, AOS also places data intelligently across availability domains to provide data redundancy and availability for applications.

In the next blog, we'll dig deeper into how dynamically distributed storage is achieved with automated, app-aware data management. You'll learn how Nutanix AOS easily and automatically handles failures and provides self-healing while driving consistent read and write performance.

© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.