Configuring Remediation Settings

You can control the behavior of the ESXi hosts and virtual machines during the remediation process. You can create a global remediation policy that applies to all clusters in a vCenter Server instance. You can also set a remediation policy for a specific cluster.

When you run cluster compliance checks, the Coordinator module runs a series of checks on each host to determine its state and whether additional actions must be taken to ensure the success of the remediation operation. If one or more hosts in the cluster are evaluated as non-compliant, additional checks are run on those hosts to evaluate whether they must be rebooted or put into maintenance mode. Currently, VMware provides a set of behavior controls (remediation policies) regarding the virtual machines and the hosts in a cluster. This set of remediation policies might change with the next vSphere release.

How Remediation Policy Overrides Work

The vSphere Lifecycle Manager provides a default global policy configuration that is applied to each cluster during remediation. Through the vSphere Automation APIs, you can change the global policies and create cluster-specific policies. Before remediating a cluster, you can use the APIs to determine the effective global and cluster-specific remediation policies. The following graphic describes how the mechanism of policy overrides works.

Figure 1. How Remediation Policies Work

Remediation policies override depending on the effective global and cluster-specific policies that you define.

All clusters in a vCenter Server instance inherit the default or the overridden global policy settings unless the global policy is explicitly overridden on a cluster level.
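The inheritance rule above can be illustrated with a small, self-contained model. This is only a sketch in plain Python, not the vSphere Automation SDK: the `PolicySpec` class, the `PowerAction` enum, and the `overlay` and `effective_policy` functions are hypothetical names, and the assumption that overrides are merged setting by setting (an unset setting falls through to the next level) is made for illustration. The default values are the ones stated under Remediation Policy Options. The failure action setting is left out for brevity.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum
from typing import Optional


class PowerAction(Enum):
    """Hypothetical mirror of the documented PreRemediationPowerAction values."""
    DO_NOT_CHANGE_VMS_POWER_STATE = "DO_NOT_CHANGE_VMS_POWER_STATE"
    POWER_OFF_VMS = "POWER_OFF_VMS"
    SUSPEND_VMS = "SUSPEND_VMS"


@dataclass(frozen=True)
class PolicySpec:
    """Hypothetical stand-in for a policy spec; None means 'not configured'."""
    disable_dpm: Optional[bool] = None
    disable_hac: Optional[bool] = None
    evacuate_offline_vms: Optional[bool] = None
    pre_remediation_power_action: Optional[PowerAction] = None
    enable_quick_boot: Optional[bool] = None


# Default global policy, using the defaults stated in this topic.
DEFAULTS = PolicySpec(
    disable_dpm=True,
    disable_hac=False,
    evacuate_offline_vms=False,
    pre_remediation_power_action=PowerAction.DO_NOT_CHANGE_VMS_POWER_STATE,
    enable_quick_boot=False,
)


def overlay(base: PolicySpec, override: Optional[PolicySpec]) -> PolicySpec:
    """Return base with every configured (non-None) setting of override applied."""
    if override is None:
        return base
    changes = {
        f.name: getattr(override, f.name)
        for f in fields(override)
        if getattr(override, f.name) is not None
    }
    return replace(base, **changes)


def effective_policy(
    global_override: Optional[PolicySpec] = None,
    cluster_override: Optional[PolicySpec] = None,
) -> PolicySpec:
    """Defaults first, then the global override, then the cluster override."""
    return overlay(overlay(DEFAULTS, global_override), cluster_override)


# Usage: Quick Boot is enabled globally, one cluster also disables HA admission control.
eff = effective_policy(
    global_override=PolicySpec(enable_quick_boot=True),
    cluster_override=PolicySpec(disable_hac=True),
)
print(eff.enable_quick_boot, eff.disable_hac, eff.disable_dpm)  # True True True
```

In this model, a cluster with no cluster-specific settings simply receives the global result, which matches the inheritance behavior described above.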

Editing Global or Cluster-Specific Remediation Policies

To view the currently set global remediation policy, call the get() method of the com.vmware.esx.settings.defaults.clusters.policies.Apply interface. The method returns a com.vmware.esx.settings.defaults.clusters.policies.ApplyTypes.ConfiguredPolicySpec instance that contains the configuration settings of the global remediation policy. To edit the global remediation policy, call the set(policy_spec) method of the same interface. Pass as an argument a com.vmware.esx.settings.defaults.clusters.policies.ApplyTypes.ConfiguredPolicySpec instance that defines the new values for the global policy settings. To view the effective global remediation policy settings, call the get() method of the com.vmware.esx.settings.defaults.clusters.policies.apply.Effective interface. The method returns an EffectivePolicySpec instance that contains the effective global policies applicable for all clusters in your vCenter Server environment.

To view the cluster-specific remediation policies, call the get(cluster_ID) method of the com.vmware.esx.settings.clusters.policies.Apply interface. The method returns a com.vmware.esx.settings.clusters.policies.ApplyTypes.ConfiguredPolicySpec instance that contains the cluster-specific policies to be applied during remediation. To change the cluster-specific policy, call the set(cluster_ID, policy_spec) method of the same Apply interface. Pass as an argument a com.vmware.esx.settings.clusters.policies.ApplyTypes.ConfiguredPolicySpec instance that describes the cluster-specific remediation policies. To view the effective cluster-specific policies, call the get(cluster_ID) method of the com.vmware.esx.settings.clusters.policies.apply.Effective interface. The method returns an EffectivePolicySpec instance that describes the effective cluster-specific policies.

Remediation Policy Options

Use the com.vmware.esx.settings.defaults.clusters.policies.ApplyTypes.ConfiguredPolicySpec and com.vmware.esx.settings.clusters.policies.ApplyTypes.ConfiguredPolicySpec classes to describe a global or cluster-specific remediation policy. For the vSphere 7.0 release, VMware provides the following methods to configure a global or cluster-specific policy.
Method Description
setDisableDpm(disableDpm) Disable the VMware Distributed Power Management (DPM) feature for all clusters or for a specific cluster. DPM monitors the resource consumption of the virtual machines in a cluster. If the hosts in a cluster have more capacity than the virtual machines need, DPM powers off (or recommends powering off) one or more hosts after migrating their virtual machines. When demand increases and more capacity is needed, DPM powers on (or recommends powering on) hosts. Virtual machines are migrated back to these hosts.

During the cluster remediation, the vSphere Lifecycle Manager cannot wake up and remediate hosts that are automatically put into standby mode by DPM. These hosts stay non-compliant when DPM turns them on. The vSphere Distributed Resource Scheduler (DRS) is unable to migrate virtual machines to hosts that are not remediated with the desired state for the cluster.

To disable DPM during the cluster remediation, call the setDisableDpm(disableDpm) method of the ConfiguredPolicySpec instance and pass as argument true. By default, the vSphere Lifecycle Manager temporarily disables DPM and turns on the hosts to complete the remediation. DPM is enabled again when the cluster remediation finishes.

setDisableHac(disableHac) Disable the vSphere HA admission control. vSphere HA uses admission control to ensure that a cluster has sufficient resources to guarantee virtual machine recovery when a host fails. If vSphere HA admission control is enabled during remediation, putting a host into maintenance mode might fail because vMotion cannot migrate virtual machines within the cluster for capacity reasons.

To allow the vSphere Lifecycle Manager to temporarily disable vSphere HA admission control, call the setDisableHac(disableHac) method of the ConfiguredPolicySpec instance and pass as argument true. By default, the vSphere HA admission control is enabled because DRS should be able to detect issues with the admission control and disable it to allow the remediation to complete.

setEvacuateOfflineVms(evacuateOfflineVms) Migrate the suspended and powered-off virtual machines from the hosts that must enter maintenance mode to other hosts in the cluster. To enable this remediation policy, call the setEvacuateOfflineVms(evacuateOfflineVms) method of the ConfiguredPolicySpec instance and pass as argument true. By default, this setting is disabled in the global remediation policy.
setFailureAction(failureAction) Specify what actions the vSphere Lifecycle Manager must take if a host fails to enter maintenance mode during the remediation. To configure this policy on a global or cluster-specific level, call the setFailureAction(failureAction) method of the ConfiguredPolicySpec instance. Pass as argument an ApplyTypes.FailureAction instance. You can set the number of times that the vSphere Lifecycle Manager tries to put a host into maintenance mode and the delay between the tries. If the host has still not entered maintenance mode when the threshold is reached, the cluster remediation fails. By default, the vSphere Lifecycle Manager tries to put a host into maintenance mode three times with a five-minute delay between tries before the cluster remediation fails.
setPreRemediationPowerAction(preRemediationPowerAction) Specify how the power state of the virtual machines must change before the host enters maintenance mode. If DRS is not enabled on a cluster or the automation level of a DRS cluster is not set to fully automated, the Coordinator module fails to remediate the cluster if the remediation requires a reboot or maintenance mode. You can set a policy that powers off or suspends the virtual machines on hosts that must be rebooted or must enter maintenance mode during remediation. The DRS takes care of changing the power state of the virtual machines when the host enters and exits maintenance mode.
To set a policy for the power state of the virtual machines during remediation, call the setPreRemediationPowerAction(preRemediationPowerAction) method of the ConfiguredPolicySpec instance. Pass as argument one of the following PreRemediationPowerAction values:
  • PreRemediationPowerAction.DO_NOT_CHANGE_VMS_POWER_STATE - Leave the power state of the virtual machines unchanged.
  • PreRemediationPowerAction.POWER_OFF_VMS - Power off the virtual machines.
  • PreRemediationPowerAction.SUSPEND_VMS - Suspend the virtual machines.
By default, the Coordinator module leaves the power state of the virtual machines unchanged.
setEnableQuickBoot(enableQuickBoot) Reduce the reboot time of an ESXi host by skipping all the hardware initialization processes and restarting only the hypervisor. This policy is applicable only if the host platform supports the Quick Boot feature.

To enable the Quick Boot feature on the hosts during remediation, call the setEnableQuickBoot(enableQuickBoot) method of the ConfiguredPolicySpec instance and pass as argument true. By default, this policy is disabled.
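The retry behavior described for setFailureAction can be pictured with a short, self-contained model. This is a sketch in plain Python rather than SDK code: the `FailureAction` dataclass, its field names, and the `enter_maintenance_mode_with_retries` function are hypothetical, and the sketch assumes that "three times" means three attempts in total, with the delay applied only between attempts.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FailureAction:
    """Hypothetical stand-in for a failure action; defaults follow this topic."""
    retry_count: int = 3             # total attempts (assumption)
    retry_delay_seconds: int = 300   # five minutes between attempts


def enter_maintenance_mode_with_retries(
    try_enter: Callable[[], bool],
    action: FailureAction = FailureAction(),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call try_enter until it succeeds or the attempts are used up.

    Returns True if the host entered maintenance mode, False if the
    threshold was reached, in which case remediation would fail.
    """
    for attempt in range(1, action.retry_count + 1):
        if try_enter():
            return True
        # Wait only if another attempt follows.
        if attempt < action.retry_count:
            sleep(action.retry_delay_seconds)
    return False


# Usage: the host fails twice, then enters maintenance mode on the third try.
# The sleep function is replaced with a recorder so the example runs instantly.
outcomes = iter([False, False, True])
delays = []
ok = enter_maintenance_mode_with_retries(lambda: next(outcomes), sleep=delays.append)
print(ok, delays)  # True [300, 300]
```

Passing the sleep function as a parameter is a design choice for the sketch only. It keeps the example quick to run and makes the delay between tries visible.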