vSphere High Availability (HA) | Clusters and High Availability (2024)

This chapter is from the book 

VCP-DCV for vSphere 7.x (Exam 2V0-21.20) Official Cert Guide, 4th Edition

Learn More Buy

This chapter is from the book

This chapter is from the book 

VCP-DCV for vSphere 7.x (Exam 2V0-21.20) Official Cert Guide, 4th Edition

Learn More Buy

vSphere High Availability (HA)

vSphere HA is a cluster service that provides high availability for the virtual machines running in the cluster. You can enable vSphere High Availability (HA) on a vSphere cluster to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines. vSphere HA provides application availability in the following ways:

  • It protects against server failure by restarting the virtual machines on other hosts in the cluster when a host failure is detected, as illustrated in Figure 4-2.

    FIGURE 4-2 vSphere HA Host Failover

  • It protects against application failure by continuously monitoring a virtual machine and resetting it if a failure is detected.

  • It protects against datastore accessibility failures by restarting affected virtual machines on other hosts that still have access to their datastores.

  • It protects virtual machines against network isolation by restarting them if their host becomes isolated on the management or vSAN network. This protection is provided even if the network has become partitioned.

Benefits of vSphere HA over traditional failover solutions include the following:

  • Minimal configuration

  • Reduced hardware cost

  • Increased application availability

  • DRS and vMotion integration

vSphere HA can detect the following types of host issues:

  • Failure: A host stops functioning.

  • Isolation: A host cannot communicate with any other hosts in the cluster.

  • Partition: A host loses network connectivity with the primary host.

When you enable vSphere HA on a cluster, the cluster elects one of the hosts to act as the primary host. The primary host communicates with vCenter Server to report cluster health. It monitors the state of all protected virtual machines and secondary hosts, and it uses network and datastore heartbeating to detect failed hosts, isolation, and network partitions. If the primary host fails, the surviving hosts elect a new primary host.

vSphere HA takes appropriate actions to respond to host failures, host isolation, and network partitions. For host failures, the typical reaction is to restart the failed virtual machines on surviving hosts in the cluster. If a network partition occurs, a primary host is elected in each partition. If a specific host is isolated, vSphere HA takes the predefined host isolation action, which may be to shut down or power off the host’s virtual machines. You can also configure vSphere HA to monitor and respond to virtual machine failures, such as guest OS failures, by monitoring heartbeats from VMware Tools.

Note

Although vCenter Server is required to implement vSphere HA, the health of an HA cluster is not dependent on vCenter Server. If vCenter Server fails, vSphere HA still functions. If vCenter Server is offline when a host fails, vSphere HA can fail over the affected virtual machines.

vSphere HA Requirements

When planning a vSphere HA cluster, you need to address the following requirements:

  • The cluster must have at least two hosts, licensed for vSphere HA.

  • Hosts must use static IP addresses or guarantee that IP addresses assigned by DHCP persist across host reboots.

  • Each host must have at least one—and preferably two—management networks in common.

  • To ensure that virtual machines can run on any host in the cluster, all hosts must have access to the same virtual machine networks and datastores.

  • To use VM Monitoring, you need to install VMware Tools in each virtual machine.

  • IPv4 or IPv6 can be used.

Note

The Virtual Machine Startup and Shutdown (automatic startup) feature is disabled and unsupported for all virtual machines residing in a vSphere HA cluster.

vSphere HA Response to Failures

You can configure how a vSphere HA cluster should respond to different types of failures, as described in Table 4-7.

Table 4-7 vSphere HA Response to Failure Settings

| Option | Description |
| --- | --- |
| Host Failure Response > Failure Response | If Enabled, the cluster responds to host failures by restarting virtual machines. If Disabled, host monitoring is turned off, and the cluster does not respond to host failures. |
| Host Failure Response > Default VM Restart Priority | You can indicate the order in which virtual machines are restarted when the host fails (higher priority machines first). |
| Host Failure Response > VM Restart Priority Condition | This condition must be met before HA restarts the next priority group. |
| Response for Host Isolation | You can indicate the action that you want to occur if a host becomes isolated. You can choose Disabled, Shutdown and Restart VMs, or Power Off and Restart VMs. |
| VM Monitoring | You can indicate the sensitivity (Low, High, or Custom) with which vSphere HA responds to lost VMware Tools heartbeats. |
| Application Monitoring | You can indicate the sensitivity (Low, High, or Custom) with which vSphere HA responds to lost application heartbeats. |

Note

If multiple hosts fail, vSphere HA restarts the virtual machines from the first failed host in order of priority, and then the virtual machines from the next host.

Heartbeats

The primary host and secondary hosts exchange network heartbeats every second. When the primary host stops receiving these heartbeats from a secondary host, it checks for ping responses or the presence of datastore heartbeats from the secondary host. If the primary host does not receive a response after checking for a secondary host’s network heartbeat, ping, or datastore heartbeats, it declares that the secondary host has failed. If the primary host detects datastore heartbeats for a secondary host but no network heartbeats or ping responses, it assumes that the secondary host is isolated or in a network partition.

If any host is running but no longer observes network heartbeats, it attempts to ping the set of cluster isolation addresses. If those pings also fail, the host declares itself to be isolated from the network.
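The heartbeat checks described above can be summarized as a small decision function. The following Python sketch is only an illustration of that reasoning, not the HA agent's actual logic: the names `HostState`, `classify_secondary`, and `declares_itself_isolated` are hypothetical, and the `UNDETERMINED` state covers a combination (ping responses without network heartbeats) that the text does not describe.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    FAILED = "failed"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    UNDETERMINED = "not described in the text"


def classify_secondary(network_heartbeat: bool, ping_response: bool,
                       datastore_heartbeat: bool) -> HostState:
    """Classify a secondary host from the primary host's point of view."""
    if network_heartbeat:
        return HostState.LIVE
    if not ping_response and not datastore_heartbeat:
        # Nothing answers: the primary declares the host failed.
        return HostState.FAILED
    if datastore_heartbeat and not ping_response:
        # The host is still writing datastore heartbeats, so it is running
        # but unreachable over the network.
        return HostState.ISOLATED_OR_PARTITIONED
    # Ping answers without network heartbeats: outside the cases covered above.
    return HostState.UNDETERMINED


def declares_itself_isolated(sees_network_heartbeats: bool,
                             isolation_ping_results: list) -> bool:
    """A running host declares isolation only if every isolation ping fails."""
    return not sees_network_heartbeats and not any(isolation_ping_results)


print(classify_secondary(False, False, True).value)
print(declares_itself_isolated(False, [False, False]))
```

Note that the two functions model different viewpoints: the first is the primary host judging a secondary host, and the second is any host judging its own connectivity.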

vSphere HA Admission Control

vSphere uses admission control when you power on a virtual machine. It checks the amount of unreserved compute resources and determines whether it can guarantee any reservation configured for the virtual machine. If so, it allows the virtual machine to power on. Otherwise, it generates an “Insufficient Resources” warning.

vSphere HA Admission Control is a setting that you can use to specify whether virtual machines can be started if they violate availability constraints. The cluster reserves resources so that failover can occur for all running virtual machines on the specified number of hosts. When you configure vSphere HA admission control, you can set options described in Table 4-8.

Table 4-8 vSphere HA Admission Control Options

| Option | Description |
| --- | --- |
| Host Failures Cluster Tolerates | Specifies the maximum number of host failures for which the cluster guarantees failover |
| Define Host Failover Capacity By set to Cluster Resource Percentage | Specifies the percentage of the cluster’s compute resources to reserve as spare capacity to support failovers |
| Define Host Failover Capacity By set to Slot Policy (powered-on VMs) | Specifies a slot size policy that covers all powered-on VMs |
| Define Host Failover Capacity By set to Dedicated Failover Hosts | Specifies the designated hosts to use for failover actions |
| Define Host Failover Capacity By set to Disabled | Disables admission control |
| Performance Degradation VMs Tolerate | Specifies the percentage of performance degradation the VMs in a cluster are allowed to tolerate during a failure |

If you disable vSphere HA admission control, then you enable the cluster to allow virtual machines to power on regardless of whether they violate availability constraints. In the event of a host failover, you may discover that vSphere HA cannot start some virtual machines.

Starting with vSphere 6.5, the default Admission Control setting is Cluster Resource Percentage, which reserves a percentage of the total available CPU and memory resources in the cluster. For simplicity, the percentage is calculated automatically from the number of host failures to tolerate (FTT) that you define. The percentage changes dynamically as hosts are added to or removed from the cluster. Another enhancement introduced in that release is the Performance Degradation VMs Tolerate setting, which controls the amount of performance reduction that is tolerated after a failure. A value of 0% indicates that no performance degradation is tolerated.
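To make the automatic calculation concrete, here is a minimal Python sketch. It assumes that all hosts contribute equal capacity, so that tolerating a given number of host failures means reserving that fraction of the cluster. The text does not state the formula, so treat it and the function name `reserved_percentage` as illustrative assumptions.

```python
def reserved_percentage(host_count: int, failures_to_tolerate: int) -> float:
    """Percentage of cluster resources to reserve, assuming identical hosts."""
    if host_count < 2:
        raise ValueError("a vSphere HA cluster needs at least two hosts")
    if not 0 < failures_to_tolerate < host_count:
        raise ValueError("failures to tolerate must be between 1 and host_count - 1")
    return 100.0 * failures_to_tolerate / host_count


# The reserved share shrinks as hosts are added to the cluster.
for hosts in (4, 5, 8):
    print(hosts, "hosts, FTT=1:", reserved_percentage(hosts, 1), "%")
```

Under this assumption, adding a fifth host to a four-host cluster with FTT set to 1 lowers the reservation from 25% to 20%, which is the kind of dynamic adjustment the paragraph above describes.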

With the Slot Policy option, vSphere HA admission control ensures that a specified number of hosts can fail, leaving sufficient resources in the cluster to accommodate the failover of the impacted virtual machines. Using the Slot Policy option, when you perform certain operations, such as powering on a virtual machine, vSphere HA applies admission control in the following manner:

  • Step 1. HA calculates the slot size, which is a logical representation of memory and CPU resources. By default, it is sized to satisfy the requirements for any powered-on virtual machine in the cluster. That is, the CPU component accommodates the virtual machine with the greatest CPU reservation, and the memory component accommodates the virtual machine with the greatest memory reservation.

  • Step 2. HA determines how many slots each host in the cluster can hold.

  • Step 3. HA determines the current failover capacity of the cluster, which is the number of hosts that can fail and still leave enough slots to satisfy all the powered-on virtual machines.

  • Step 4. HA determines whether the current failover capacity is less than the configured failover capacity (provided by the user).

  • Step 5. If the current failover capacity is less than the configured failover capacity, admission control disallows the operation.
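The five steps above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the algorithm as implemented by vSphere HA: the `VM` and `Host` classes and all function names are hypothetical, memory overhead is ignored, hosts are assumed to fail largest first, and `default_mem_mb=100` is an arbitrary placeholder for virtual machines without a memory reservation (the 32 MHz CPU default is the one given for das.vmcpuminmhz).

```python
from dataclasses import dataclass


@dataclass
class VM:
    cpu_reservation_mhz: int
    mem_reservation_mb: int


@dataclass
class Host:
    cpu_mhz: int
    mem_mb: int


def slot_size(vms, default_cpu_mhz=32, default_mem_mb=100):
    """Step 1: largest CPU reservation and largest memory reservation."""
    cpu = max(vm.cpu_reservation_mhz or default_cpu_mhz for vm in vms)
    mem = max(vm.mem_reservation_mb or default_mem_mb for vm in vms)
    return cpu, mem


def slots_per_host(host, slot):
    """Step 2: a host holds as many slots as its scarcer resource allows."""
    slot_cpu, slot_mem = slot
    return min(host.cpu_mhz // slot_cpu, host.mem_mb // slot_mem)


def current_failover_capacity(hosts, vms):
    """Step 3: number of hosts that can fail while every VM keeps a slot."""
    slot = slot_size(vms)
    slots = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    capacity = 0
    for failed in range(1, len(slots)):
        # Assume the hosts with the most slots are the ones that fail.
        if sum(slots[failed:]) < len(vms):
            break
        capacity = failed
    return capacity


def admits(hosts, powered_on_vms, candidate, configured_capacity):
    """Steps 4 and 5: allow the power-on only if capacity is not violated."""
    vms = powered_on_vms + [candidate]
    return current_failover_capacity(hosts, vms) >= configured_capacity


hosts = [Host(9000, 9216), Host(9000, 6144), Host(6000, 6144)]
vms = [VM(2000, 1024), VM(2000, 1024), VM(1000, 2048),
       VM(1000, 1024), VM(1000, 1024)]
print("slot size:", slot_size(vms))
print("current failover capacity:", current_failover_capacity(hosts, vms))
print("admit small VM:", admits(hosts, vms, VM(1000, 1024), 1))
print("admit VM with 4000 MHz reservation:", admits(hosts, vms, VM(4000, 1024), 1))
```

In this example the last power-on is refused because a single virtual machine with a large CPU reservation doubles the CPU component of the slot size and cuts the number of slots on every host.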

If a cluster has a few virtual machines that have much larger reservations than the others, they will distort slot size calculation. To remediate this, you can specify an upper bound for the CPU or memory component of the slot size by using advanced options. You can also set a specific slot size (CPU size and memory size). The next section describes the advanced options that affect the slot size.

vSphere HA Advanced Options

You can set vSphere HA advanced options by using the vSphere Client or in the fdm.cfg file on the hosts. Table 4-9 provides some of the advanced vSphere HA options.

Table 4-9 Advanced vSphere HA Options

| Option | Description |
| --- | --- |
| das.isolationaddressX | Provides the addresses to use to test for host isolation when no heartbeats are received from other hosts in the cluster. If this option is not specified (which is the default setting), the management network default gateway is used to test for isolation. To specify multiple addresses, you can set das.isolationaddressX, where X is a number between 0 and 9. |
| das.usedefaultisolationaddress | Specifies whether to use the default gateway IP address for isolation tests. |
| das.isolationshutdowntimeout | For scenarios where the host’s isolation response is to shut down, specifies the period of time that the virtual machine is permitted to shut down before the system powers it off. |
| das.slotmeminmb | Defines the maximum bound on the memory slot size. |
| das.slotcpuinmhz | Defines the maximum bound on the CPU slot size. |
| das.vmmemoryminmb | Defines the default memory resource value assigned to a virtual machine whose memory reservation is not specified or is zero. This is used for the Host Failures Cluster Tolerates admission control policy. |
| das.vmcpuminmhz | Defines the default CPU resource value assigned to a virtual machine whose CPU reservation is not specified or is zero. This is used for the Host Failures Cluster Tolerates admission control policy. If no value is specified, the default of 32 MHz is used. |
| das.heartbeatdsperhost | Specifies the number of heartbeat datastores required per host. The default is 2. The acceptable values are 2 to 5. |
| das.config.fdm.isolationPolicyDelaySec | Specifies the number of seconds the system delays before executing the isolation policy after determining that a host is isolated. The minimum is 30. A lower value results in a 30-second delay. |
| das.respectvmvmantiaffinityrules | Determines whether vSphere HA should enforce VM–VM anti-affinity rules even when DRS is not enabled. |
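Several of the options in Table 4-9 carry value constraints that are easy to get wrong when settings are prepared in bulk. The following Python sketch checks a dictionary of option names and string values against the constraints stated in the table. It does not talk to vCenter Server, and the function names and the example addresses are hypothetical.

```python
VALID_SUFFIXES = tuple("0123456789")
ISOLATION_PREFIX = "das.isolationaddress"


def check_ha_advanced_options(options):
    """Return a list of problems found in a dict of advanced option strings."""
    problems = []
    for name, value in options.items():
        if name.startswith(ISOLATION_PREFIX):
            suffix = name[len(ISOLATION_PREFIX):]
            # das.isolationaddressX: X must be a single digit from 0 to 9.
            if suffix and suffix not in VALID_SUFFIXES:
                problems.append(f"{name}: X must be a number between 0 and 9")
        elif name == "das.heartbeatdsperhost":
            # Acceptable values are 2 to 5 (the default is 2).
            if not value.isdigit() or not 2 <= int(value) <= 5:
                problems.append(f"{name}: value must be between 2 and 5")
    return problems


def effective_isolation_delay(requested_seconds):
    """das.config.fdm.isolationPolicyDelaySec: values below 30 act as 30."""
    return max(30, requested_seconds)


settings = {
    "das.isolationaddress0": "192.0.2.1",
    "das.isolationaddress12": "192.0.2.2",
    "das.heartbeatdsperhost": "6",
}
for problem in check_ha_advanced_options(settings):
    print(problem)
print("effective delay:", effective_isolation_delay(10), "seconds")
```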

Virtual Machine Settings

To use the Host Isolation Response Shutdown and Restart VMs setting, you must install VMware Tools on the virtual machine. If a guest OS fails to shut down in 300 seconds (or a value specified by das.isolationshutdowntimeout), the virtual machine is powered off.

You can override the cluster’s settings for Restart Priority and Isolation Response for each virtual machine. For example, you might want to prioritize virtual machines providing infrastructure services such as DNS or DHCP.

At the cluster level, you can create dependencies between groups of virtual machines. You can create VM groups, host groups, and dependency rules between the groups. In the rules, you can specify that one VM group cannot be restarted until another specific VM group has started.

VM Component Protection (VMCP)

Virtual Machine Component Protection (VMCP) is a vSphere HA feature that can detect datastore accessibility issues and provide remediation for affected virtual machines. When a failure occurs such that a host can no longer access the storage path for a specific datastore, vSphere HA can respond by taking actions such as creating event alarms or restarting a virtual machine on other hosts. The main requirements are that vSphere HA is enabled in the cluster and that ESXi 6.0 or later is used on all hosts in the cluster.

The failures VMCP detects are permanent device loss (PDL) and all paths down (APD). PDL is an unrecoverable loss of accessibility to the storage device that cannot be fixed without powering down the virtual machines. APD is a transient accessibility loss or other issue that is recoverable.

For PDL and APD failures, you can set VMCP to either issue event alerts or to power off and restart virtual machines. For APD failures only, you can additionally control the restart policy for virtual machines by setting it to Conservative or Aggressive. With the Conservative setting, the virtual machine is powered off only if HA determines that it can be restarted on another host. With the Aggressive setting, HA powers off the virtual machine regardless of the state of other hosts.
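The difference between the response settings can be expressed as a short decision function. The Python sketch below only restates the paragraph above. The function name and the string values (`"issue_events"`, `"restart"`) are hypothetical labels chosen for this illustration, not vSphere API identifiers.

```python
def vmcp_powers_off_vm(failure, response, apd_policy="Conservative",
                       can_restart_elsewhere=False):
    """Return True if VMCP would power off the VM so HA can restart it."""
    if failure not in ("PDL", "APD"):
        raise ValueError("failure must be 'PDL' or 'APD'")
    if response == "issue_events":
        return False  # Only event alerts are raised; the VM keeps running.
    if response != "restart":
        raise ValueError("response must be 'issue_events' or 'restart'")
    if failure == "PDL":
        return True
    if apd_policy == "Aggressive":
        return True  # Power off regardless of the state of other hosts.
    if apd_policy == "Conservative":
        return can_restart_elsewhere  # Only if another host can run the VM.
    raise ValueError("apd_policy must be 'Conservative' or 'Aggressive'")


print(vmcp_powers_off_vm("APD", "restart", "Conservative", can_restart_elsewhere=False))
print(vmcp_powers_off_vm("APD", "restart", "Aggressive", can_restart_elsewhere=False))
```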

Virtual Machine and Application Monitoring

VM Monitoring restarts specific virtual machines if their VMware Tools heartbeats are not received within a specified time. Likewise, Application Monitoring can restart a virtual machine if the heartbeats from a specific application in the virtual machine are not received. If you enable these features, you can configure the monitoring settings to control the failure interval and reset period. Table 4-10 lists these settings.

Table 4-10 VM Monitoring Settings

| Setting | Failure Interval | Reset Period |
| --- | --- | --- |
| High | 30 seconds | 1 hour |
| Medium | 60 seconds | 24 hours |
| Low | 120 seconds | 7 days |

The Maximum per-VM resets setting can be used to configure the maximum number of times vSphere HA attempts to restart a specific failing virtual machine within the reset period.
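The interaction between the failure interval, the reset period, and the maximum number of resets can be sketched as follows. The preset values come from Table 4-10. The function name, the timestamp representation (seconds on an arbitrary clock), and the illustrative limit of three resets are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringLevel:
    failure_interval_s: int
    reset_period_s: int


LEVELS = {
    "High": MonitoringLevel(30, 60 * 60),
    "Medium": MonitoringLevel(60, 24 * 60 * 60),
    "Low": MonitoringLevel(120, 7 * 24 * 60 * 60),
}


def should_reset(level, seconds_since_heartbeat, reset_times, now, max_resets=3):
    """Decide whether a VM with missing heartbeats should be reset."""
    settings = LEVELS[level]
    if seconds_since_heartbeat < settings.failure_interval_s:
        return False  # Heartbeats have not been missing long enough.
    # Count only the resets that happened within the current reset period.
    recent = [t for t in reset_times if now - t < settings.reset_period_s]
    return len(recent) < max_resets


print(should_reset("High", 45, reset_times=[], now=1000))
print(should_reset("High", 45, reset_times=[100, 200, 300], now=1000))
```

The second call returns a refusal because the virtual machine has already been reset three times within the one-hour reset period, which is how the limit prevents an endless reset loop.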

vSphere HA Best Practices

You should provide network path redundancy between cluster nodes. To do so, you can use NIC teaming for the virtual switch. You can also create a second management network connection, using a separate virtual switch.

When performing disruptive network maintenance operations on the network used by clustered ESXi hosts, you should suspend the Host Monitoring feature to ensure that vSphere HA does not falsely detect network isolation or host failures. You can reenable host monitoring after completing the work.

To keep vSphere HA agent traffic on the specified network, you should ensure that the VMkernel virtual network adapters used for HA heartbeats (enabled for management traffic) do not share the same subnet as VMkernel adapters used for vMotion and other purposes.

Use the das.isolationaddressX advanced option to add an isolation address for each management network.

Proactive HA

Proactive High Availability (Proactive HA) integrates with select hardware partners to detect degraded components and evacuate VMs from affected vSphere hosts before an incident causes a service interruption. Hardware partners offer a vCenter Server plug-in to provide the health status of the system memory, local storage, power supplies, cooling fans, and network adapters. As hardware components become degraded, Proactive HA determines which hosts are at risk and places them into either Quarantine Mode or Maintenance Mode. When a host enters Maintenance Mode, DRS evacuates its virtual machines to healthy hosts, and the host is not used to run virtual machines. When a host enters Quarantine Mode, DRS leaves the current virtual machines running on the host but avoids placing or migrating virtual machines to the host. If you prefer that Proactive HA simply make evacuation recommendations rather than automatic migrations, you can set Automation Level to Manual.

The vendor-provided health providers read sensor data in the server and provide the health state to vCenter Server. The health states are Healthy, Moderate Degradation, Severe Degradation, and Unknown.
