Reducing consumption and improving sustainability: Part 2, Take only what you need

It’s become pretty much standard IT practice to “over-provision” resources for many reasons, such as redundancy and projected (and unexpected) increases in demand (growth). Or maybe just because it’s easier to buy far more than you expect to need than to spend time accurately gathering requirements and reviewing sizing options, especially when demand is a moving feast that’s hard to predict. What’s more, changing or adding resources after they are deployed may be viewed as a lot of extra effort (and bureaucracy!), so trading some extra upfront hardware spend for people’s time and sanity can seem like a pretty good deal.

So deploying far more resources than a system is ever likely to use has often been the de facto method of capacity planning. Additionally, with different teams responsible for different layers of the IT stack (from hardware, through software, to service), each layer tends to add its own “fat” to resource requests. This can have a cumulative effect, resulting in environments that are even more overprovisioned than anyone would sensibly have intended.
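
To see why this padding compounds rather than simply adds, here is a minimal sketch in which each layer applies its own safety margin to the (already padded) request it receives from the layer above. The three 20% margins are hypothetical figures chosen purely for illustration.

```python
from math import prod


def cumulative_headroom(layer_margins: list[float]) -> float:
    """Overall provisioning multiplier when each layer pads the
    (already padded) request it receives by its own safety margin."""
    # Margins multiply rather than add: 20% on top of 20% is 44%, not 40%.
    return prod(1 + m for m in layer_margins)


# Hypothetical margins added by the service, software and hardware teams.
margins = [0.20, 0.20, 0.20]
print(f"Provisioned: {cumulative_headroom(margins):.2f}x the real need")
```

With these assumed figures the three layers end up provisioning roughly 1.73x the real requirement, rather than the 1.6x you might expect from adding the margins together.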

Furthermore, procurement departments and their interactions with IT generally involve long, drawn-out processes, checks and double checks that aren’t really set up for the efficient, incremental addition of capacity. As a result, IT teams have become accustomed to revisiting the buying process as infrequently as possible, and to buying as much as they think they can get away with when they can’t avoid it.

Add to this ever-changing application demands and general “VM sprawl”, and you can see how much potential efficiency can be gained from right-sizing and optimizing IT resources, as well as from ensuring that the consumers of IT take ownership of what they have requested.

You have the power… and the responsibility

When it comes to getting capacity right, whether for new environments or for additions to existing ones, the right tools can really help. Nutanix has its Collector™ and Sizer™ tools to gather requirements and help accurately fit them to a Nutanix® solution across a range of hardware platforms, including those in the public cloud. For existing environments, there are also the capacity runway planning features of the Nutanix Cloud Manager™ (NCM), which use machine learning, “what if” scenario planning and trend analysis to accurately predict future requirements (covered in more detail later in this blog series), whether those involve adding new environments or deprecating existing workloads.

That doesn’t mean that humans are no longer required; they are, of course, ultimately responsible. Has anyone handed their procurement over entirely to AI yet? It’s doubtful anyone ever will. So let’s discuss some of the factors in the decision-making process and the choices folks can make.

Risks vs. Frugality

There is, of course, a balance to be struck.  A lack of capacity can mean poor performance, missing SLAs, additional “fire drill” effort or, in the worst case scenario, an outage stopping an organization’s ability to function altogether. This can lead to loss of revenue, loss of reputation, additional costs and, not least, employee stress, among other ill effects. So, as a rule, it is not desirable to sacrifice availability or performance in the short term for long term sustainability objectives. After all, if a business closes due to a major outage, then clearly the sustainability goals have not been achieved.  And similarly, if IT staff are stressed and overworked then that could also adversely impact an organization’s long term outlook. So in the risk vs. savings stakes, adding undue risk for the sake of saving a few kWh of energy or a few pieces of hardware does not generally add up to a wise business decision.

But through awareness, enablement, a change in mindset and the smart use of technology, risk can be mitigated and progress can be made towards a more intelligent and ultimately more sustainable use of resources. Helping our customers make better decisions, plan more strategically and manage their IT resources more effectively is very much what Nutanix is all about. So is giving organizations a range of options to choose from.

Choice: The great enabler

One of Nutanix’s key value propositions for a long time has been choice, in particular choice of hardware or platform. Key to enabling that choice is Nutanix Sizer’s ability to present solutions across a range of hardware, cloud or service provider platforms so that requirement scenarios can be easily analyzed to estimate their use of the available resources.  Sizer also allows users to select different hardware components, whether those are disks (HDD, SSD and/or NVMe), NIC or CPU (Intel or AMD models) and build solutions to match their requirements as closely as possible, even if the solution involves nodes with different hardware configurations (i.e. heterogeneous clusters).  Whether those requirements are based on performance, capacity, supply chain or power consumption, Sizer helps Nutanix customers consider a wide range of options and build a solution without the need to hugely overprovision, reducing waste and unnecessary expense.

Example of a solution sized on Nutanix NX hardware in Sizer

This means that a Nutanix solution can be tightly aligned to the storage, compute and feature requirements, while also letting customers choose the hardware that suits their sustainability objectives, whether they are focussed on power consumption, data center footprint or embodied emissions.
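
Sizer’s real calculations account for far more than this, but the core idea of fitting hardware to requirements without blanket overprovisioning can be sketched in a few lines. This is a deliberately simplified model, not Sizer’s algorithm: the node specifications, the replication factor of 2 and the single spare node (N+1) are all hypothetical assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Resources one node offers to workloads (hypothetical figures)."""
    cores: int
    memory_gib: int
    raw_storage_tib: float


def nodes_required(cores: int, memory_gib: int, storage_tib: float,
                   node: NodeSpec, replication_factor: int = 2,
                   spare_nodes: int = 1) -> int:
    """Smallest node count that covers every resource dimension,
    plus spare nodes for failure tolerance (N+1 by default)."""
    # Each block of data is stored replication_factor times,
    # so usable capacity is a fraction of raw capacity.
    usable_tib_per_node = node.raw_storage_tib / replication_factor
    per_dimension = [
        math.ceil(cores / node.cores),
        math.ceil(memory_gib / node.memory_gib),
        math.ceil(storage_tib / usable_tib_per_node),
    ]
    # The most demanding dimension sets the count; the others carry slack.
    return max(per_dimension) + spare_nodes


# Same hypothetical requirement, two candidate node configurations.
need = dict(cores=150, memory_gib=1800, storage_tib=40.0)
print(nodes_required(**need, node=NodeSpec(32, 512, 20.0)))  # 6
print(nodes_required(**need, node=NodeSpec(48, 512, 20.0)))  # 5
```

In this toy example the CPU-bound first configuration needs a whole extra node, while the better-balanced second one fits the same requirement more tightly. That is the kind of trade-off that a choice of hardware makes possible.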

Requirements Right

Of course, optimal solution sizing can only be achieved if you gather the right requirements. Collector is a Nutanix tool that analyzes the metrics of customers’ virtualized environments, as well as some NAS and database technologies. It captures both historical and point-in-time metrics, as well as configurations, so that those proposing a Nutanix solution can simply and accurately gather metrics, review them and feed them into Sizer as requirements.

Example of requirements presented in Nutanix Collector

To provide a level of assurance that these tools are being used optimally and that best practices are followed, Nutanix offers the Sizing Associate badge, and an associated learning plan, to partner and service provider practitioners. This reinforces practitioners’ understanding of the Nutanix core architecture and teaches the proper methods for analyzing discovery data and creating sizing scenarios.

Incremental Growth

Key to only taking what you need is being able to plan capacity well enough to support incremental growth, i.e. ideally, only ordering new hardware in time for it to arrive and be deployed just before you need it (or keeping a small standby pool; see later in this article). This approach not only reduces the amount of hardware unnecessarily consuming electricity, but also reduces the embodied emissions associated with manufacturing the physical hardware. It also turns requirement gathering, capacity planning and demand optimization from irregular fire drills or infrequent ad-hoc programmes into regular, almost day-to-day activities that are kept up to date as part of normal operations. That is perhaps easier said than done, but it becomes achievable through the smart use of technologies like the Nutanix Cloud Manager™ capacity runway feature and the AIOps features of the Nutanix Prism® management console.
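
“Just in time” ordering boils down to working backwards from the date capacity is forecast to run out. The sketch below assumes that exhaustion date is already known (for example, from a capacity runway forecast); the function name, the lead times and the 14-day safety margin are hypothetical.

```python
from datetime import date, timedelta


def latest_order_date(capacity_exhausted: date, supply_lead_days: int,
                      deploy_days: int, safety_days: int = 14) -> date:
    """Latest date to place an order so that new hardware is racked and
    serving workloads just before existing capacity runs out."""
    # Work backwards from the forecast exhaustion date, allowing for
    # delivery, installation and a small safety margin.
    total_days = supply_lead_days + deploy_days + safety_days
    return capacity_exhausted - timedelta(days=total_days)


# Hypothetical inputs: the forecast says capacity runs out on 30 September.
print(latest_order_date(date(2024, 9, 30), supply_lead_days=60, deploy_days=5))
```

With these assumed figures the order needs to go in by 13 July 2024. Anything ordered much earlier simply sits in the data center consuming power and ageing before it is needed.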

Further risk can be mitigated by being able to provision on-demand capacity, such as the Nutanix Cloud Clusters™ (NC2) solution on the AWS® or Azure® public clouds. This enables IT administrators to quickly and simply spin up Nutanix environments in the public cloud or on a range of partner bare metal services, potentially in as little as an hour. Not only can this provide additional resources very quickly to deal with immediate capacity issues, but it also has a number of other benefits that we’ll discuss later in this blog series. Even if there is no immediate plan to use one of these services, either for on-demand capacity or other use cases, it’s still a good idea to put some of the necessary processes and training in place, and even to run a PoC. Planning ahead with NC2 like this could mitigate a range of sustainability and other business risks, including those associated with availability, supply chain resilience or climate risk, by making it possible to quickly spin up a new site, all for a relatively modest investment of time and money.

Of course, for incremental growth to be fully realized, an organization needs to do more than take care of the technological aspects of adding resources to mitigate risk. It also needs to review its procurement, financial and even legal processes to ensure that when capacity is required, ordering and deployment aren’t unduly onerous or time consuming. As discussed earlier in this blog, one of the reasons IT departments sometimes avoid or stall procurement is that it is often seen as troublesome, unproductive or excessively bureaucratic “non-IT work”. Time spent improving the process is likely time well spent, and streamlining procurement is something that public cloud providers have been incredibly good at, at least on the face of it.

Under the covers, there can still often be a lot of friction and additional effort when it comes to optimisation and efficiency in the public cloud. Although in some ways you can consume in smaller chunks, those chunks are still boxed into t-shirt sizes and come with constraints such as fixed CPU contention, restricted storage IO and limited bandwidth. This means you can’t always deploy a resource in the exact configuration that suits your requirements with the same flexibility you might have in an on-premises virtualized environment (or with a less rigid approach to cloud, like NC2). There are also the complexities of reserved instances, upfront commitments, private offers and marketplace offers, plus the continuing efforts of FinOps teams to prevent overprovisioning and overspend. So although public cloud has definitely changed the procurement process for IT, it has not necessarily “fixed” everything.
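
To illustrate how fixed t-shirt sizes strand capacity, here is a minimal sketch that picks the smallest size satisfying a requirement and reports what is left over. The catalogue below, with its fixed 1 vCPU : 4 GiB ratio, is entirely hypothetical and does not represent any specific cloud provider’s instance types.

```python
from typing import NamedTuple


class Size(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical instance catalogue with a fixed 1 vCPU : 4 GiB ratio,
# ordered smallest first.
CATALOGUE = [
    Size("small", 2, 8),
    Size("medium", 4, 16),
    Size("large", 8, 32),
    Size("xlarge", 16, 64),
]


def smallest_fit(vcpus: int, memory_gib: int) -> Size:
    """Return the smallest catalogue size meeting both requirements."""
    for size in CATALOGUE:
        if size.vcpus >= vcpus and size.memory_gib >= memory_gib:
            return size
    raise ValueError("requirement exceeds the largest available size")


need_vcpus, need_mem = 6, 12
chosen = smallest_fit(need_vcpus, need_mem)
print(f"{chosen.name}: {chosen.vcpus - need_vcpus} vCPUs and "
      f"{chosen.memory_gib - need_mem} GiB paid for but unused")
```

Here a workload needing 6 vCPUs and 12 GiB lands on the “large” size, leaving 2 vCPUs and 20 GiB paid for but unused. In a virtualized environment with flexible VM sizing, the same workload could be given exactly what it needs.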

But just as with public cloud, a change in mindset is required to achieve incremental growth and cloud-like operations in the data center. And that change in mindset must extend across other supporting functions, including finance, FinOps, legal, compliance, facilities, hosting providers and whoever else is involved in the procurement and deployment of new capacity.

The right tool for the job

It also helps considerably if the right tools are available to combine information on upcoming requirements with current and future utilization trends, and to communicate that information easily to other departments. Two great features of the Nutanix Cloud Platform™ (NCP) capacity runway capability that really help with this are:

  1. “What if?” scenario planning with the built-in recommendation engine.
  2. Capacity Planning Report export (Generate PDF).

These features are natively part of NCP and greatly facilitate an IT department’s ability to plan and communicate with other departments, by helping them gather requirement specifications and then demonstrate the required resources to the procurement chain.

Screenshot of Capacity Runway trend analysis in Prism Ops

We’ll go into these in more detail later in the blog series, but the following video gives an excellent overview, and the AI Ops and Automation Test Drive takes you through a capacity planning exercise step by step.

There is also the trend analysis pictured above, which can be used to create alerts or trigger actions (e.g. an ITSM ticket) via the X-Play™ functionality of NCP. By using these features effectively, you can take the guesswork (and spreadsheets!) out of your IT department’s capacity planning and easily reassure your procurement team and other stakeholders that you are ordering the correct hardware resources, whilst also mitigating the risk of a capacity shortfall.

Example capacity planning report
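
The trend-plus-alert idea can be illustrated with a much simpler model than the machine learning NCM actually uses. The sketch below swaps in a plain least-squares straight line fitted to daily usage, estimates the remaining runway, and flags when that runway is shorter than the procurement lead time. The samples, capacity and lead time are hypothetical, and the `print` merely stands in for whatever would raise the alert or ticket; no real X-Play API is used.

```python
from typing import Optional


def days_until_full(daily_used: list[float], capacity: float) -> Optional[float]:
    """Fit a least-squares straight line to daily usage samples and estimate
    how many days after the latest sample usage will reach capacity.
    Returns None if the trend is flat or falling."""
    n = len(daily_used)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2                 # x values are day indices 0..n-1
    mean_y = sum(daily_used) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used))
        / sum((x - mean_x) ** 2 for x in range(n))
    )
    if slope <= 0:
        return None
    # Use the fitted value for today rather than the raw last sample,
    # so a single noisy reading does not skew the estimate.
    fitted_today = mean_y + slope * ((n - 1) - mean_x)
    return (capacity - fitted_today) / slope


LEAD_TIME_DAYS = 75                         # hypothetical order-to-deploy time
usage_tib = [50.0, 51.0, 52.0, 53.0, 54.0]  # hypothetical daily samples
runway = days_until_full(usage_tib, capacity=100.0)
if runway is not None and runway < LEAD_TIME_DAYS:
    # Placeholder for whatever raises the alert or ITSM ticket.
    print(f"Runway of {runway:.0f} days is shorter than the lead time")
```

With usage growing by 1 TiB a day, there are 46 days of runway left against a 75-day lead time, so this is exactly the situation where you would want an alert raised well before the fire drill starts.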

Avoid dual running and forklift upgrades

Although it might be tempting to maximize the size of each purchase to negotiate a better bulk discount, once you also factor in the dual running periods required for forklift upgrades (long periods of running both new and old hardware), the case for incremental scaling becomes even clearer. Traditionally, in a 3-tier environment, the storage array has to be deployed, commissioned, integrated and presented to the hosts it supports before migrations can start. This can introduce a long period of dual running, during which the old and the new equipment must both be running at the same time. This can have a doubly negative effect from a sustainability and cost point of view: not only are you running two platforms side by side, but the early deployment of the new platform to enable the migration also effectively shortens its useful asset lifespan.

By having a platform that scales incrementally and can add new capacity in a matter of minutes, like the Nutanix Cloud Infrastructure™ (NCI) HCI solution, organizations can avoid these dual running periods. For example, it typically takes less than an hour to add a new node to a Nutanix cluster. Once the new node is cabled and ready to be powered on, it’s a very simple process requiring only a few clicks, followed by waiting for the new storage and compute to come online. Capacities are updated with the newly added resources and workloads are balanced across the new capacity. Any old nodes that are End of Life (EoL) and require removal from the cluster can then easily be “ejected”. Once the VMs they are running and the data they are storing have been moved to the other nodes in the cluster, the ejected node is free to be powered down and removed from the data center within a matter of hours. With this method, sometimes called “grandfathering”, extensive dual running periods are avoided and the hardware sees almost its entire supported lifespan. With some legacy migration projects taking six months or even longer, over a 5-year hardware lifespan that’s 10% of run time that can be saved by avoiding dual running periods.
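
The 10% figure is simply the migration period divided by the hardware’s supported life, as this small sketch shows. The function name is hypothetical; the six-month and five-year figures come from the text.

```python
def dual_running_share(migration_months: float,
                       lifespan_months: float = 5 * 12) -> float:
    """Share of a platform's supported life spent running in parallel
    with the platform it replaces."""
    return migration_months / lifespan_months


print(f"{dual_running_share(6):.0%}")   # six-month migration, 5-year life: 10%
```

On the same basis, a twelve-month migration would double that to 20% of the platform’s supported life.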

The diagram below is an oversimplified, but hopefully useful, representation of an environment scaling over time, and demonstrates the potential benefits of fractional consumption and non-forklift upgrades.

In this diagram each coloured cell represents a node, with the different colors (purple, green, peach) showing each purchasing “wave”, even if it’s incremental. The grey areas show the nodes that would be purchased if a “5-year, all upfront” purchase approach is taken, in essence showing what can be saved. 

In this example, we arbitrarily grow the environment by one node per year, so growth starts at around 15% year on year and gradually decreases.
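
The arithmetic behind the one-node-per-year example can be sketched using node-years (one node powered on for one year) as a rough proxy for both energy use and hardware purchased. The sketch assumes a starting cluster of 7 nodes, which the text does not state but which matches the “around 15%” first-year growth. It ignores hardware improvements, separate storage arrays and seasonal demand.

```python
def node_counts(start_nodes: int, years: int, added_per_year: int = 1) -> list[int]:
    """Nodes in service in each year when growing by a fixed number per year."""
    return [start_nodes + added_per_year * year for year in range(years)]


START_NODES = 7   # hypothetical: the starting size is not stated in the text
YEARS = 5

incremental = node_counts(START_NODES, YEARS)     # [7, 8, 9, 10, 11]
all_upfront = [incremental[-1]] * YEARS           # year-5 size bought on day one
growth = [later / earlier - 1 for earlier, later in zip(incremental, incremental[1:])]

print("year-on-year growth:", ", ".join(f"{g:.1%}" for g in growth))
print("node-years, incremental:", sum(incremental))   # 45
print("node-years, all upfront:", sum(all_upfront))   # 55
```

Under these assumptions, growth falls from about 14.3% to 10.0% a year. Buying incrementally also avoids 10 node-years, roughly 18% of what the “5-year, all upfront” approach would keep racked and powered.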

But this topic probably deserves a post of its own, because there are many factors to discuss, including how hardware improves over time, the effect of adding separate storage arrays, and how seasonal changes in demand might affect the way future requirements are satisfied.

Leave it off!

For organizations that need to provision a good deal of spare capacity for potential demand spikes or for those that just feel that the “just in time” ordering of hardware doesn’t suit them, incremental scaling might seem like an approach that doesn’t fit the risk profile of their business or use case.

But if adding capacity is a quick and simple operation, as it is with a Nutanix cluster, keeping hardware ready but turned off can be a great way to mitigate these perceived risks whilst still saving energy. Alternatively, an organization could use Nutanix Cloud Clusters (NC2) as burst capacity for spikes, or to mitigate extended hardware lead times or other unexpected events; we’ll explore this later in this blog series.

But for now, consider that when deploying a new Nutanix environment, you may wish to initially deploy all the nodes for validation purposes, but then scale down the cluster to a base three or four nodes whilst the wider configuration / testing takes place and migrations start. Then you can simply scale the cluster up as demand increases and only use what you actually need.
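
The scale-down-then-grow approach comes down to a simple rule: keep the base cluster running, add a node only when demand (plus some headroom) requires it, and leave the rest racked but switched off. Here is a minimal sketch of that decision. The three-node base matches the text, but the single spare node, the rack size, the demand figures and the per-node capacity are all hypothetical.

```python
import math


def nodes_in_cluster(demand: float, capacity_per_node: float,
                     base_nodes: int = 3, spare_nodes: int = 1) -> int:
    """Nodes that need to be powered on for the current demand, never
    dropping below the base cluster size."""
    needed = math.ceil(demand / capacity_per_node) + spare_nodes
    return max(base_nodes, needed)


RACKED_NODES = 8   # hypothetical: all delivered and validated up front
# Hypothetical monthly demand, in the same units as capacity_per_node.
for month, demand in enumerate([40, 90, 150, 210], start=1):
    on = nodes_in_cluster(demand, capacity_per_node=50)
    print(f"month {month}: {on} nodes in the cluster, "
          f"{RACKED_NODES - on} racked but powered off")
```

In this example, five of the eight nodes stay powered off for the first two months, and only six are running even by month four. The sketch only covers how many nodes to run, not the mechanics of adding or removing them from the cluster.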

In Conclusion

There are many tools available that help with and simplify capacity planning, but the key to taking full advantage of them is people, and not necessarily just the IT teams. Aligning procurement, finance and other stakeholders, and educating them on the tools you are using, the approaches you are taking and how you intend to mitigate risk, will be key to streamlining your organization’s use of resources.

To this end, Nutanix’s tools and features help you to:

  1. Gather the right requirements – e.g. by using Nutanix Collector
  2. Size solutions optimally – e.g. by using Nutanix Sizer
  3. Operationalise capacity planning and the addition of resources – e.g. by using NCM capacity runway
  4. Automatically adjust VM resources – e.g. by using NCM X-Play and auto-pilot

But it’s up to people to start using the tools! The good news is that once capacity planning has been operationalised, it should become a relatively trivial process, and one that can also save a lot of money on new technology purchases. So it can be well worth the effort!

By working together and making use of these tools, teams can better set an organization’s strategy to consume only what it needs, both in terms of energy demand and the hardware resources it purchases.

© 2023 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.

This post may contain express and implied forward-looking statements, which are not historical facts and are instead based on our current expectations, estimates and beliefs. The accuracy of such statements involves risks and uncertainties and depends upon future events, including those that may be beyond our control, and actual results may differ materially and adversely from those anticipated or implied by such statements. Any forward-looking statements included herein speak only as of the date hereof and, except as required by law, we assume no obligation to update or otherwise revise any of such forward-looking statements to reflect subsequent events or circumstances