Simplifying Access to Geo-Distributed Object Data using Global Namespaces

Introduction

Federation is an exciting and important new feature that was added to Nutanix Objects in the 4.0 release (Spring 2023). It enables a global namespace to be created across multiple Nutanix object stores, even if they are thousands of miles apart in entirely different geographic locations. Buckets hosted by these different object stores then appear to exist within a single object store, offering a consolidated view of the data. Organizations with object data spread across multiple edge and core sites around the world that are looking to simplify and consolidate access to their distributed datasets can thus benefit greatly from this feature. While much of Federation’s value lies in how it dramatically simplifies data access in geo-distributed environments, it’s also well suited to scenarios where there is simply a requirement for extreme scalability – a single Federation can store hundreds of petabytes of data across scores of object stores, all under one S3 namespace.

This blog explores the Federation feature, digging into how it works, how it’s managed and where it proves most useful. Note that the terms federated namespace and global namespace are used interchangeably.

Federation Architecture

Let’s start by understanding what Federation looks like from an architectural standpoint. The makeup of a Federation is visualized nicely in Prism Central (see figure 1). 

Figure 1: Prism view of Federation components – members and core members

During the process of creating a Federation (which is when you’d see figure 1), you add members to it. These are simply existing object stores that contribute resources to the federated namespace, adding to its overall capacity and performance capabilities. A subset of these members is then selected to serve as core members – core members are just regular members with extra responsibilities. Between them, the core members maintain and manage the Federation namespace. The services that run inside the core members are:

  1. Federation Metadata Service – responsible for tracking which buckets are hosted on which object stores (members) within the Federation. The metadata service instances running across the core members are also responsible for maintaining consensus within the Federation.
  2. Federation Controller – a “traffic cop” type service responsible for routing client requests to the correct member. This service communicates with the Federation metadata service to understand what lives where (e.g. chi-bkt01 is hosted on oss-chicago). It also directly handles create, update and delete (CRUD) operations for Federation buckets, thus eliminating the possibility of conflicts arising within the namespace. For example, you can’t have two identically named buckets in the same namespace.

It’s possible to have just 1 core member, although 3 or more core members are recommended for fault tolerance. The maximum number of core members is 5; this provides the highest level of fault tolerance for the Federation, i.e. the namespace remains available even if 2 core members go offline.
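
To make the division of responsibilities a little more concrete, here is a deliberately simplified sketch in Python. This is a toy model for illustration only, not the actual Nutanix implementation: it simply shows the metadata service acting as the record keeper for bucket placement, and the controller resolving bucket names while rejecting duplicates. All names used are hypothetical.

# Toy model for illustration only - not the actual implementation.
class FederationMetadataService:
    """Tracks which member hosts which Federation bucket."""
    def __init__(self):
        self._locations = {}                  # e.g. {"chi-bkt01": "oss-chicago"}

    def lookup(self, bucket):
        return self._locations.get(bucket)    # None if the bucket doesn't exist

    def record(self, bucket, member):
        self._locations[bucket] = member


class FederationController:
    """Routes bucket operations and enforces namespace-wide uniqueness."""
    def __init__(self, metadata):
        self._metadata = metadata

    def resolve(self, bucket):
        member = self._metadata.lookup(bucket)
        if member is None:
            raise KeyError(f"{bucket} does not exist in this Federation")
        return member                         # e.g. "oss-chicago"

    def create_bucket(self, bucket, member):
        if self._metadata.lookup(bucket) is not None:
            raise ValueError(f"{bucket} already exists in the namespace")
        self._metadata.record(bucket, member)


# Example: record a bucket, then resolve it
meta = FederationMetadataService()
ctrl = FederationController(meta)
ctrl.create_bucket("chi-bkt01", "oss-chicago")
print(ctrl.resolve("chi-bkt01"))              # -> oss-chicago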

You may have noticed in the Prism Central (PC) visualization in figure 1 that three members sit under one PC while two sit under another. Given that PCs are designed to manage Nutanix resources only in their own region, it’s likely that in a widely geo-distributed environment different members will be managed by different PCs. That’s not a problem for Federation, but a key thing to remember is that, for Federation to work, trust needs to be established between these PCs. This is achieved by pairing the Availability Zones that the PCs represent (see figure 2). Once this has been done, IAM user access keys can be replicated between PCs and you’re good to go.

Figure 2: Establishing trust across availability zones (Prism Centrals in different regions)

Viewing Utilization and Scaling

Prism Central provides a Federated Namespaces page that conveniently shows the utilization across all members of the Federation in terms of number of buckets deployed and capacity consumed within those buckets (see figure 3). Note that the numbers reported are specific to the Federation in question, so utilization metrics pertaining to members’ local namespaces and any other Federations they may belong to (discussed a little later) are filtered out. 

Figure 3: Viewing utilization within a Federation

From this same view it’s possible to add more members to, or remove existing members from, a Federation. If removing a member, you must first remove any Federation buckets that member is hosting. Once that’s done, removing a member, like adding one, is quick and easy.

How Client Requests Are Routed

Let’s examine the sequence of events when an S3 client issues a GET, PUT or HEAD (metadata) request against a Federation bucket. Note that the client remains connected to the same Federation member throughout the entire process.

  1. The client connects to a member (let’s call this the ‘client-connected member’) and issues a GET, PUT or HEAD (metadata) request
  2. Upon receipt of the request, the client-connected member checks its cache to see if it has handled a request for this bucket before. If it has, steps 3 and 4 below are bypassed
  3. If the bucket location is not in the client-connected member’s cache, the client-connected member asks a core member which member is hosting the bucket
  4. In the core member, the Federation controller service asks the Federation metadata service which member is hosting the bucket. Once it has this information, the Federation controller returns it to the client-connected member
  5. The client-connected member now knows which Federation member is hosting the bucket (and has cached this information for future requests), so it relays the client request directly to that member
  6. The requested operation is performed by the member hosting the bucket, and the data (or simply the success code in the case of a PUT) is returned to the client-connected member
  7. The client-connected member relays the data and/or success code back to the client
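
From the client’s perspective all of this routing is invisible; a Federation bucket is read and written like any other S3 bucket. The following minimal sketch uses the Python boto3 SDK; the Federation endpoint, credentials, bucket name and object key are all hypothetical.

import boto3

# The endpoint is the Federation's FQDN; DNS resolves it to one of the
# members, which becomes the client-connected member for this session.
s3 = boto3.client(
    "s3",
    endpoint_url="https://fed01.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# PUT: the object is stored on whichever member hosts chi-bkt01
s3.put_object(Bucket="chi-bkt01", Key="reports/q1.csv", Body=b"sample data")

# GET: the hosting member returns the data via the client-connected member
obj = s3.get_object(Bucket="chi-bkt01", Key="reports/q1.csv")
print(obj["Body"].read())

# HEAD: metadata only, routed the same way
meta = s3.head_object(Bucket="chi-bkt01", Key="reports/q1.csv")
print(meta["ContentLength"])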

Things are a little different when a client issues a bucket create, update or delete (CRUD) request. Here’s what happens with a create bucket request for example:

  1. The client connects to a member (‘client-connected member’) and issues a request to create a bucket named “bucket01”
  2. Upon receipt of the request, the client-connected member asks the core member to create the bucket in the federated namespace and place it on the client-connected member (i.e., itself)
  3. In the core member, the Federation controller service asks the Federation metadata service if a bucket with the name “bucket01” already exists in the namespace. If it does, the request is rejected and the failure code is passed back to the client via the client-connected member. If a bucket with this name does not already exist, the Federation controller:
    1. instructs the client-connected member to create “bucket01”
    2. instructs the Federation metadata service to update the Federation metadata with the name and location of “bucket01”
  4. The core member’s Federation controller informs the client-connected member that creation of “bucket01” has completed in full (with Federation metadata also successfully updated)
  5. The client-connected member informs the client that “bucket01” has been successfully created in the federated namespace
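
On the client side the same flow is just a standard S3 create bucket call against the client-connected member; the namespace-wide uniqueness check happens behind the scenes. A short sketch, reusing the hypothetical boto3 client (s3) from the previous example:

from botocore.exceptions import ClientError

try:
    # Created in the federated namespace and placed on the client-connected member
    s3.create_bucket(Bucket="bucket01")
except ClientError as err:
    # If a bucket named "bucket01" already exists anywhere in the Federation,
    # the controller rejects the request and the error is relayed to the client
    print("Create failed:", err.response["Error"]["Code"])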

Managing Namespaces

What impact do federated namespaces have on members’ own local namespaces? As ever, this is best explained using an example. In figure 4 below we have 5 object stores all participating in a Federation together. An S3 client looking into the namespace sees 6 buckets listed. The 6 buckets, as you can see in the diagram, are spread out across 5 different object stores in 5 different locations, but to the client it looks like all of those buckets belong to the same single object store.

Figure 4: Consolidated view of distributed buckets

To be clear, when an object store joins a Federation, its existing data (if there is any) remains completely unaffected. The federated namespace that the object store is now a member of sits alongside its own local namespace (and any other Federation namespaces the object store may be a member of) and all namespaces are managed completely independently. This distinction is clear in Prism Central, as shown in figure 5. Looking at the bucket listings for the individual object store, we see three tabs. The first tab represents the object store’s local namespace, followed by a tab for each Federation the object store is a member of – two in the example shown but in fact an object store can be a member of up to 32 different Federations at the same time. 

Figure 5: Prism view of the namespaces associated with an object store

Under the local namespace tab you’re presented with a list of all the buckets in the local namespace. When a Federation namespace tab is selected, however, you see a list of whichever buckets in that Federation are hosted on this particular object store (figure 6) – what you do not see are other buckets in the same Federation that happen to be hosted on other object stores. 

Figure 6: List of Federation buckets hosted on an individual Federation member

To obtain that consolidated view you can use Objects Browser. This can be launched either using the link provided in Prism Central, or by entering a special URL into your browser (more on that further below). For those not familiar with Objects Browser, it’s a browser-based S3 client that also provides a healthy amount of self-service object store management – its name really doesn’t do it justice! Objects Browser is included in every Nutanix Objects deployment. With the 4.0 release, Objects Browser is Federation-aware, meaning it can provide a complete listing of all buckets within a federated namespace (see figure 7), assuming of course the authenticated IAM user has the appropriate access permissions to all the buckets. With the right permissions the authenticated user can perform puts and gets to any Federation bucket, create or delete buckets in the global namespace, and manage lifecycle policies associated with Federation buckets.

Figure 7: Objects browser provides a consolidated view of all buckets in a Federation

You may be aware that a Nutanix object store’s local namespace can be accessed using the following Objects Browser URL format: https://<object store name>/objectsbrowser. While that URL will take you to a list of buckets in the object store’s local namespace, the following URL can be used to access a Federation namespace via a specific member (let’s say the one the user knows is closest to them): https://<object store name>/objectsbrowser?namespace=<federation name>. Any buckets created using Objects Browser will be hosted on the member specified at the beginning of the URL.
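For example, with a member reachable at oss-chicago.example.com and a Federation named fed01 (both names hypothetical), the URL would be https://oss-chicago.example.com/objectsbrowser?namespace=fed01.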

In fact, any S3 client can access all buckets in a federated namespace regardless of the individual bucket locations. The difference with Objects Browser is that (a) you choose the member through which you connect to the federated namespace and (b) it’s a Nutanix UI.
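
To illustrate, listing buckets through the same hypothetical boto3 client from earlier should return every bucket in the federated namespace that the authenticated user is allowed to see, wherever those buckets are physically hosted:

# Consolidated listing of the federated namespace through a single member
for bucket in s3.list_buckets()["Buckets"]:
    print(bucket["Name"])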

Objects Replication: Fault Tolerance and Migration

In many (if not most) Federation environments, using replication makes a lot of sense, whether to provide fault tolerance for buckets within the federated namespace or to migrate data between namespaces (see figure 8). Streaming replication relationships are easily set up, and there’s a lot of flexibility in the options available. Replication can be set up, in either direction, between:

  • buckets in the same federated namespace (fault tolerance use case)
  • buckets in different federated namespaces (data migration use case)
  • a bucket in a local namespace and a bucket in a federated namespace (data migration use case)
Figure 8: Replication use cases for federated namespaces

It’s also worth noting that the source and destination buckets can be on the same object store, which is useful for migration use cases. Figure 9 below shows the different namespace types presented by Prism Central that can be set up as replication targets.

Figure 9: Objects namespace types to which replication can be set up

When it comes to using replication to achieve fault tolerance within the namespace, a Global Server Load Balancer (GSLB) can help greatly in terms of detecting the loss of an object store and seamlessly redirecting client requests to the replication target bucket (essentially the failover). 

Client Access Locality

A GSLB is also useful for enforcing client access locality in a geo-distributed Federation, in other words ensuring that clients connect to their nearest member. This helps provide the best possible client experience when it comes to performance. Let’s explore that a little more… One of the things required to make Federation work is the addition of all participating object stores’ public IP addresses to DNS under the Federation’s name. However, if participating object stores are geo-distributed, clients could end up connected to an object store that is remote to them. Accesses will still work, but they might be routed across long distances, even for buckets hosted close to the client – this of course adds unwelcome latency. The introduction of a GSLB avoids this situation by using techniques such as network latency detection to ensure that clients are directed to their nearest object store when they connect to the federated namespace.

In fact there is another way of ensuring access locality, one that does not involve a GSLB. The problem scenario described above assumes the use of global DNS; if, however, multi-site DNS has been configured in the environment, local object store IP addresses can be set up for local clients. This ensures that clients’ DNS lookups of the Federation namespace always resolve to a member local to them.

With both of these solutions all clients can still access all data in the Federation (accesses to remotely hosted buckets are routed internally by the local object store, as described earlier), but accessing local data does not require a long-distance trip.
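
A quick way to sanity-check the locality your clients will actually get is to see which member IPs the Federation name resolves to from each site. A minimal sketch in Python, assuming a hypothetical Federation FQDN of fed01.example.com:

import socket

# With a GSLB or multi-site DNS in place, clients at different sites should
# see different (local) member addresses for the same Federation name.
for info in socket.getaddrinfo("fed01.example.com", 443, proto=socket.IPPROTO_TCP):
    print(info[4][0])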

Continuing with the locality theme, it’s also worth noting that the placement of newly created buckets can be controlled by creating the bucket in Prism Central (admins only) or via Objects Browser, as discussed earlier. When it comes to creating buckets with any other S3 client, a GSLB or multi-site DNS are the only ways to ensure that the bucket is created locally.

Note: To avoid any confusion, the GSLBs we are talking about here are very different to the load balancers we automatically include in each Nutanix Objects deployment. Our integrated load balancers are essentially local traffic managers, discrete to a single object store, whereas a GSLB works across multiple object stores.

Multi-Cluster and Federation: What’s the Difference? 

If you already have experience with Nutanix Objects, you may be wondering how Federation differs from our existing multi-cluster feature. After all, multi-cluster too allows an Objects namespace to span multiple physical clusters, so what’s the difference? In fact, multi-cluster and Federation are different in terms of the problems they set out to solve and in how they actually work. Some of the key differences are:

  • The resources being scaled are different. Multi-cluster adds physical AOS clusters to the namespace whereas Federation adds object store instances
  • Multi-cluster scales storage capacity only, whereas Federation scales storage capacity and performance (i.e. the Objects services themselves are also scaled)
  • Multi-cluster supports up to 5 AOS clusters in the same namespace, whereas Federation supports up to 128 object stores in a single global namespace
  • With multi-cluster all clusters must be in the same data centre whereas with Federation the object store members can be in entirely different locations
  • With multi-cluster each bucket spans all the participating clusters, whereas with Federation buckets are hosted discretely on individual members, i.e. a bucket will not spill over onto another member. In a geo-distributed environment this means performance will be consistent within each bucket.

Ultimately multi-cluster was designed for simple capacity scaling, whereas Federation simplifies access to data distributed across multiple different locations. It also provides scaling on a different level altogether.

Wrapping Up

You can expect to see more capabilities added to Federation over time, things that will make life even simpler in complex geo-distributed environments and further expand the use cases Federation is able to address. Stay tuned!

© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.