Configuring HA & Live Migration in Acropolis

In Acropolis (AHV) managed clusters, you can enable high availability to ensure that VMs are migrated or restarted on another node if a host fails. If you have not modified the high availability configuration from previous Acropolis base software releases, best-effort VM availability is enabled by default.

Virtual Machine High Availability (VMHA)

VMHA ensures that critical VMs are restarted on another Acropolis Hypervisor (AHV) host in the cluster if a host fails.

Enabling High Availability for the Cluster

Log in to Nutanix Prism.

From the gear icon pull-down list on the main menu, select Manage VM High Availability.

There are two VM high availability modes:

Default - This mode requires no configuration and is enabled by default when an Acropolis Hypervisor-based Nutanix cluster is installed. When an AHV host becomes unavailable, the VMs that were running on the failed host restart on the remaining hosts, depending on the available resources. If the remaining hosts do not have sufficient resources, not all of the failed VMs will restart.

Review the settings using the aCLI ha.get command.
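The best-effort behavior described above can be sketched in a few lines. This is a conceptual illustration only, not Nutanix's actual scheduler: failed VMs are restarted on the remaining hosts while free memory allows, and any VM that does not fit stays powered off.

```python
# Conceptual sketch of best-effort HA placement (hypothetical logic,
# not the Acropolis scheduler): restart failed VMs on remaining hosts
# only while free memory allows.

def best_effort_restart(failed_vms, free_mem_per_host):
    """failed_vms: {vm_name: memory_gib}; free_mem_per_host: {host: free_gib}.
    Returns (restarted, not_restarted)."""
    free = dict(free_mem_per_host)
    restarted, skipped = {}, []
    # Place the largest VMs first so big VMs are not starved by small ones.
    for vm, mem in sorted(failed_vms.items(), key=lambda kv: -kv[1]):
        # Pick the host with the most free memory that still fits the VM.
        host = max((h for h in free if free[h] >= mem),
                   key=lambda h: free[h], default=None)
        if host is None:
            skipped.append(vm)      # insufficient resources: VM stays down
        else:
            free[host] -= mem
            restarted[vm] = host
    return restarted, skipped

# A 64 GiB VM cannot fit on hosts with 48 and 40 GiB free; the rest restart.
restarted, skipped = best_effort_restart(
    {"db": 64, "web": 16, "cache": 32},
    {"host-2": 48, "host-3": 40})
```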

Guarantee - This non-default configuration reserves space to guarantee that all failed VMs will restart on other hosts in the AHV cluster during a host failure.

To enable Guarantee mode, select the Enable HA check box. A message appears displaying the amount of memory reserved and how many AHV host failures can be tolerated.

Review the settings using the aCLI ha.get command.

VM Summary on the Prism Main Dashboard:
OK: This state indicates that the cluster is protected against a host failure.

The Guarantee mode configuration will reserve resources to protect against:
  • One Acropolis host failure, if the cluster is configured for Nutanix Fault Tolerance 1 (Redundancy Factor 2).
  • Up to two Acropolis host failures, if the cluster is configured for Nutanix Fault Tolerance 2 (Redundancy Factor 3).
Using aCLI, the command for designating the maximum number of AHV host failures to be tolerated is:

nutanix@cvm$ acli ha.update num_host_failures_to_tolerate=x

In Guarantee mode, one or more AHV hosts will be reserved for VMHA when the cluster consists of homogeneous nodes (i.e., where hosts are the same model and RAM size). Resources may be reserved across multiple hosts in chunks called segments if the cluster consists of heterogeneous nodes, with some nodes containing more RAM than others.

There are two different VMHA reservation types in Guarantee mode:
  • HAReserveHosts - One or more hosts are reserved for VMHA.
  • HAReserveSegments - Resources are reserved across multiple hosts.

The homogeneous cluster reservation type can be changed using aCLI.
  • Set ReserveHosts
nutanix@cvm$ acli ha.update reservation_type=kAcropolisHAReserveHosts
  • Set ReserveSegments
nutanix@cvm$ acli ha.update reservation_type=kAcropolisHAReserveSegments

Heterogeneous clusters will always use the segment reservation type.
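The selection rule above can be expressed as a short sketch. This is hypothetical logic for illustration, not Nutanix's implementation: a homogeneous cluster may reserve whole hosts, while any RAM mismatch forces the segment reservation type.

```python
# Hypothetical illustration of Guarantee-mode reservation-type selection
# (not Nutanix's implementation): homogeneous clusters can reserve whole
# hosts; heterogeneous clusters must reserve segments across hosts.

def reservation_type(host_mem_gib):
    """host_mem_gib: list of per-host RAM sizes for same-model hosts."""
    homogeneous = len(set(host_mem_gib)) == 1
    return ("kAcropolisHAReserveHosts" if homogeneous
            else "kAcropolisHAReserveSegments")
```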

VMHA can be disabled per VM by setting a negative value (-1) when creating or updating a VM.

nutanix@CVM$ acli vm.update <VM Name> ha_priority=-1
nutanix@CVM$ acli vm.create <VM Name> ha_priority=-1

When a failed AHV host comes back online after a VMHA event, VMs that were previously running on the host will be migrated back to the original host to maintain data locality.

Live Migration

Live migration lets you move a user VM from one Acropolis host to another while the VM is powered on. This feature follows similar resource rules as VMHA to determine if migration can occur—as long as enough RAM and CPU cycles are available on the target host, live migration will initiate.
Acropolis selects a target host automatically, but you can specify a target if required.

Live migration can be started with any of the following methods:
  • Put the Acropolis host in maintenance mode (VM evacuation).
  • Prism UI (VM page)
  • aCLI (automatic, targeted, or maintenance mode)
  • REST API (automatic, targeted, or maintenance mode)

CPU types are abstracted from the VMs because AHV uses virtual CPUs that are compatible between all nodes in the AHV cluster. As long as enough CPU cycles are available in the destination AHV host, a VM can be migrated.

By default, live migration will use as much available bandwidth as required over the Acropolis host management interface, br0 and bond0. The bandwidth allocated for migration can be restricted via aCLI and the REST API using bandwidth_mbps=x during each migration.

The following aCLI example enforces a 100 Mbps limit when live migrating the VM slow-lane-VM1 to a specified Acropolis host:

nutanix@CVM$ acli vm.migrate slow-lane-VM1 bandwidth_mbps=100 host=<target host> live=yes

The live option defines whether the VM remains powered on (live=yes) or is suspended (live=no) during the migration.
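To see how a bandwidth cap affects migration time, consider a rough pre-copy model. This is generic live-migration arithmetic for illustration, not an AHV internal: each copy pass retransmits the memory dirtied during the previous pass, so the migration converges only if the cap exceeds the VM's page-dirty rate.

```python
# Rough pre-copy transfer estimate (hypothetical math, not an AHV
# internal): each pass sends the memory dirtied during the previous
# pass; convergence requires the bandwidth cap to exceed the dirty rate.

def estimate_migration_seconds(ram_mb, bandwidth_mbps, dirty_mbps=0.0,
                               max_passes=30):
    if dirty_mbps >= bandwidth_mbps:
        return None                     # pre-copy never converges
    bw_mb_per_s = bandwidth_mbps / 8    # megabits/s -> megabytes/s
    total, remaining = 0.0, float(ram_mb)
    for _ in range(max_passes):
        seconds = remaining / bw_mb_per_s
        total += seconds
        remaining = seconds * (dirty_mbps / 8)  # memory dirtied meanwhile
        if remaining < 1:               # small enough for the final pause
            return round(total, 1)
    return None
```

For example, a 4 GiB idle VM capped at 100 Mbps transfers in roughly 4096 MB / 12.5 MB/s, about five and a half minutes.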

To prevent resource contention, Acropolis limits the number of simultaneous automatic live migration events to two. Nutanix does not recommend any specific live migration method; use the method suited to the current administrative task.

Note that user-initiated live migrations are not limited in simultaneous number, so consider the following to avoid resource contention from both a compute and network perspective.
  • Do not exceed two simultaneously initiated live migrations that share the same destination or source hosts.
  • Keep the total number of simultaneously initiated live migrations in an AHV cluster to a maximum of one per destination and source AHV host pair. However, one live migration per destination and source AHV host pair might be too much in large AHV clusters from a network performance perspective. For smaller AHV clusters it may be possible to run more than one live migration per destination and source AHV host pair, depending on available network capacity.
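The per-host-pair guideline above lends itself to a simple pre-flight check. This is a hypothetical helper, not a Nutanix tool: it flags any planned batch of user-initiated migrations that runs more than one migration over the same source/destination host pair at the same time.

```python
# Hypothetical pre-flight check (not a Nutanix tool): flag any planned
# batch of live migrations that exceeds one concurrent migration per
# (source, destination) AHV host pair.
from collections import Counter

def violations(planned):
    """planned: list of (vm, source_host, dest_host) tuples.
    Returns the host pairs scheduled for more than one migration."""
    pairs = Counter((src, dst) for _, src, dst in planned)
    return sorted(pair for pair, n in pairs.items() if n > 1)

plan = [("vm1", "ahv-1", "ahv-2"),
        ("vm2", "ahv-1", "ahv-2"),   # second migration on the same pair
        ("vm3", "ahv-3", "ahv-4")]
```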


