Tips and Best Practices
This section discusses the advanced transport mechanisms and other backup and restore issues.
VDDK 5.0 introduced two VixDiskLib calls, PrepareForAccess and EndAccess, to disable and re-enable Storage vMotion during backup. This prevents stale disk images from being left behind if a virtual machine's storage is moved while a backup is taking place. VMware strongly recommends use of these calls.
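For example, a backup program can bracket its snapshot and data-transfer work with these calls. This is a minimal sketch, assuming cnxParams already identifies the target virtual machine and supplies credentials; the identity string "BackupAppliance" is arbitrary:

    /* Disable Storage vMotion for the duration of the backup. */
    VixError err = VixDiskLib_PrepareForAccess(&cnxParams, "BackupAppliance");
    if (VIX_FAILED(err)) {
        /* Handle error: Storage vMotion could not be disabled. */
    }
    /* ... create snapshot, open and read disks, remove snapshot ... */
    /* Re-enable Storage vMotion when the backup is done. */
    err = VixDiskLib_EndAccess(&cnxParams, "BackupAppliance");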
When an ESXi host is managed by vCenter Server, vSphere API calls cannot contact the host directly: they must go through vCenter. If necessary, especially during disaster recovery, the administrator must disassociate the ESXi host from vCenter Server before the host can be contacted directly.
Advanced transports allow programs to transfer data in the most efficient manner. SAN transport is available only when the physical-machine host has SAN access. HotAdd works for the appliance model, where backup is done from inside virtual machines. HotAdd requires the virtual machine datastore to be accessible from the backup appliance. NBDSSL is a secure fallback when over-the-network backup is your only choice.
SAN transport is supported only on physical machines, and HotAdd transport is supported only on virtual machines. SAN requires a physical proxy to share a LUN with the ESXi host where a datastore resides, enabling direct access to raw data, and bypassing the host altogether for I/O operations. HotAdd involves attaching a virtual disk to the backup proxy just like attaching the disk to a virtual machine.
Best Practices for SAN Transport
For array-based storage, SAN transport is often the best performing choice for backups when running on a physical proxy. It is disabled inside virtual machines, so use SCSI HotAdd instead on a virtual proxy.
SAN transport is not always the best choice for restores. It offers the best performance on thick disks, but the worst performance on thin disks, because of round trips through two disk manager APIs, AllocateBlock and ClearLazyZero. For thin disk restore, NBDSSL is usually faster. Changed Block Tracking (CBT) must be disabled for SAN restores. SAN transport does not support writing to redo logs (child disks including linked clones and snapshots), only to base disks. SAN transport is not supported on VVol datastores.
Before vSphere 5.5, when writing to SAN during restore, disk size had to be a multiple of the underlying VMFS block size; otherwise the write to the last fraction of a disk would fail. For example, with a 1MB VMFS block size and a 16.3MB virtual disk, the last 0.3MB did not get written unless the restore software appended 0.7MB of zeroes to complete the block. This was fixed in the ESXi 5.5 release.
Programs that open a local virtual disk in SAN mode might be able to read (if the disk is empty), but writing will throw an error. Even if programs call VixDiskLib_ConnectEx() with a NULL transport-mode parameter to accept the default, SAN is selected as the preferred mode if SAN storage is connected to the ESXi host. VixDiskLib should, but does not, check SAN accessibility on open. With local disk, programs must explicitly request NBD or NBDSSL mode.
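For example, a program can pass an explicit transport list instead of NULL. A sketch, assuming cnxParams and the snapshot moref string snapshotRef are already prepared:

    /* Request NBDSSL, falling back to NBD, instead of letting
     * VixDiskLib prefer SAN transport. */
    VixDiskLibConnection connection;
    VixError err = VixDiskLib_ConnectEx(&cnxParams,
                                        TRUE,          /* read-only, for backup */
                                        snapshotRef,   /* snapshot moref string */
                                        "nbdssl:nbd",  /* transports, in order */
                                        &connection);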
For a Windows Server 2008 and later proxy, set SAN policy to onlineAll. Set SAN disk to read-only except for restore. You can use the diskpart utility to clear the read-only flag. SAN policy varies by Windows Server 2008 edition. For Enterprise and Datacenter editions, the default Windows SAN policy is offline, which is unnecessary when vSphere mediates SAN storage.
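For example, from an elevated command prompt on the proxy, diskpart can set the SAN policy and clear the read-only flag; the disk number 1 here is only illustrative:

    DISKPART> san policy=OnlineAll
    DISKPART> select disk 1
    DISKPART> attributes disk clear readonly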
Best Practices for HotAdd Transport
Deploy the proxy on VMFS-5 volumes, or on VMFS-3 volumes capable of large block size (see About the HotAdd Proxy) so that the proxy can back up very large virtual disks. HotAdd is a SCSI feature and does not work for IDE disks. The paravirtual SCSI controller (PVSCSI) is not supported for HotAdd; use the LSI controller instead.
A redo log is created for HotAdded disks, on the same datastore as the base disks. Do not remove the target virtual machine (the one being backed up) while a HotAdded disk is still attached. If it is removed, HotAdd cannot properly clean up redo logs, and virtual disks must be removed manually from the backup appliance. Also, do not remove the snapshot until after cleanup; removing it could result in an unconsolidated redo log.
Removing all disks on a controller with the vSphere Client also removes the controller. You might want to include checks in your appliance code to detect this, and reconfigure the proxy to add controllers back in.
HotAdded disks should be released with VixDiskLib_Cleanup() before snapshot delete. Cleanup might cause improper removal of the change tracking (.ctk) file; you can fix it by power cycling the virtual machine.
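A sketch of the recommended ordering, assuming connection was opened against the target virtual machine identified by cnxParams:

    uint32 numCleanedUp, numRemaining;
    /* Close disk handles and the connection, then release HotAdded disks. */
    VixDiskLib_Disconnect(connection);
    VixError err = VixDiskLib_Cleanup(&cnxParams, &numCleanedUp, &numRemaining);
    /* Only after cleanup: remove the snapshot through the vSphere API. */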
A virtual disk created on Windows by HotAdd backup or restore may have a different disk signature than the original virtual disk. The workaround is to reread or rewrite the first disk sector in NBD mode. Customers running a Windows Server 2008 or later proxy should set SAN policy to onlineAll (see the SAN section above).
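A sketch of the sector-0 rewrite, assuming nbdConnection was opened with transport mode "nbd", diskPath names the restored disk, and sector0 is a buffer of VIXDISKLIB_SECTOR_SIZE bytes holding the original first sector:

    VixDiskLibHandle diskHandle;
    VixError err = VixDiskLib_Open(nbdConnection, diskPath, 0, &diskHandle);
    if (!VIX_FAILED(err)) {
        /* Rewrite sector 0 so the disk signature matches the original. */
        err = VixDiskLib_Write(diskHandle, 0, 1, sector0);
        VixDiskLib_Close(diskHandle);
    }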
The HotAdd implementation assumes that proxy and target VMs are on the same datastore and accessible from the same connection, that is, the same vCenter Server. This is so VADP can obtain a list of all disks on the target VM from a connection. In vCloud environments where two vCenter Servers share a single datastore, causing VADP to make two connections, there is no mechanism for informing one connection of the disks available on the other connection. VMware needs to redesign HotAdd to support multiple connections.
Best Practices for NBDSSL Transport
Various versions of ESXi have different defaults for timeouts. Before ESXi 5.0 there were no default network file copy (NFC) timeouts. Default NFC timeout values may change in future releases.
VMware recommends that you specify default NFC timeouts in the VixDiskLib configuration file. If you do not specify a timeout, older versions of ESXi hold the corresponding disk open indefinitely, until vpxa or hostd is restarted. However, if you do specify a timeout, you might need to perform some “keepalive” operation to prevent the disk from being closed on the server side. Reading block 0 periodically is a good keepalive operation. As a starting point, recommended settings are three minutes for Accept and Request, one minute for Read, ten minutes for Write, and no timeouts (0) for nfcFssrvr and nfcFssrvrWrite.
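For example, those starting-point values translate into milliseconds in the VixDiskLib configuration file. The key names below follow the VDDK documentation for the 5.x releases; verify them against your VDDK version:

    vixDiskLib.nfc.AcceptTimeoutMs = 180000
    vixDiskLib.nfc.RequestTimeoutMs = 180000
    vixDiskLib.nfc.ReadTimeoutMs = 60000
    vixDiskLib.nfc.WriteTimeoutMs = 600000
    vixDiskLib.nfcFssrvr.TimeoutMs = 0
    vixDiskLib.nfcFssrvrWrite.TimeoutMs = 0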
General Backup and Restore
With SSL certificate checking in vSphere 5.1 and later, DNS services must be configured in the backup proxy; otherwise SSL_Verify will fail with a “no host found” error.
For incremental backup of virtual disk, always enable changed block tracking (CBT) before the first snapshot. When doing full restores of virtual disk, disable CBT for the duration of the restore. File-based restores affect change tracking, but disabling CBT is optional for partial restore (file-level restore), except with SAN transport. CBT must be disabled for SAN writes because of thin-provisioning and clear-lazy-zero operations.
Backup software should ignore independent disks, which are not capable of snapshots. These virtual disks are unsuitable for backup; attempting a snapshot on them throws an error.
To back up thick disk, the proxy's datastore must have at least as much free space as the maximum configured disk size for the backed-up virtual machine. Thick disk takes up all its allocated size in the datastore. To save space, you can choose thin-provisioned disk, which consumes only the space actually containing data.
If you do a full backup of lazy-zeroed thick disk with CBT disabled, the software reads all sectors, converting data in empty (lazy-zero) sectors to actual zeros. Upon restore, this full backup data will produce eager-zeroed thick disk. This is one reason why VMware recommends enabling CBT before the first snapshot.
When backing up virtual disk residing on NFS datastores, do not enable CBT, which is a VMFS-only feature. If you enable CBT with NFS, thin-provisioned virtual disk may turn thick upon restore. CBT does not work with virtual machines residing on NFS, and QueryChangedDiskAreas("*") returns an error when run against NFS version 3 and earlier.
Backup and Restore of Thin-Provisioned Disk
Thin-provisioned virtual disk is allocated on first write, so the first-time write to thin-provisioned disk involves extra overhead compared to thick disk, whether using NBD, NBDSSL, or HotAdd. This is due to block allocation overhead, not to the VDDK advanced transports. Once thin disk has been allocated, however, performance is similar to thick disk, as discussed in the Performance Study of VMware vStorage Thin Provisioning.
When applications perform random I/O or write to previously unallocated areas of thin-provisioned disk, subsequent backups can be larger than expected, even with CBT enabled. In some cases, disk defragmentation might help reduce the size of backups.
Virtual Machine Configuration
Do not make verbatim copies of configuration files, whose contents change over time. For example, entries in the .vmx file point to the current snapshot, not the base disk. The .vmx file contains virtual-machine specific information about current disks, and attempting to restore this information verbatim could fail. Instead, use the PropertyCollector and keep a record of the ConfigInfo structure.
About Changed Block Tracking
QueryChangedDiskAreas("*") returns information about areas of a virtual disk that are in use (allocated). The current implementation depends on VMFS properties, similar to properties that SAN transport mode uses to locate data on a SCSI LUN. Both rely on unallocated areas (file holes) in virtual disk, and the LazyZero designation for VMFS blocks. Thus, changed block tracking yields meaningful results only on VMFS. On other storage types, it either fails, or returns a single extent covering the entire disk.
You should enable changed block tracking in the order recommended by Enabling Changed Block Tracking. The first time you call QueryChangedDiskAreas("*"), it should return allocated areas of virtual disk. Subsequent calls return changed areas, instead of allocated areas. If you call QueryChangedDiskAreas after a snapshot but before you enable changed block tracking, it also returns unallocated areas of virtual disk. With thin-provisioned virtual disk this could be a large amount of zero data.
The guest operating system has no visibility of changed block tracking. Once a virtual machine has written to a block on virtual disk, the block is considered in use. The information required for the "*" query is computed when changed block tracking is enabled, and the .ctk file is pre-filled with allocated blocks. The mechanism cannot report changes made to virtual disk before changed block tracking was enabled.
HotAdd and SCSI Controller IDs
When using HotAdd backup, always add SCSI controllers to virtual machines in numeric order.
Most systems lack an interface to report which SCSI controller is assigned to which bus ID. HotAdd assumes that the unique ID for a SCSI controller corresponds to its bus ID. This assumption could be false. For instance, if the first SCSI controller on a VM is assigned to bus ID 0, but you add a SCSI controller and assign it to bus ID 3, HotAdd transport may fail because it expects unique ID 1. To avoid problems when adding SCSI controllers to a VM, make the bus assignment for each new controller the next available bus number in sequence.
Also note that VMware implicitly adds a SCSI controller to a VM if a bus:disk assignment for a newly created virtual disk refers to a controller that does not yet exist. For instance, if disks 0:0 and 0:1 are already in place, adding disk 1:0 is fine, but adding disk 3:0 breaks the bus ID sequence, implicitly creating out-of-sequence SCSI controller 3. To avoid HotAdd problems, you should add virtual disks in numeric sequence.
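As an illustration, code that adds controllers or disks could pick the lowest unused bus number so the sequence never breaks. This helper is hypothetical, not part of VixDiskLib, and the limit of four buses is an assumption to check against your VM's configuration:

    #define MAX_SCSI_BUSES 4

    /* Given which SCSI bus IDs are already in use on the VM, return the
     * next bus number to assign so controllers stay in numeric sequence. */
    int NextScsiBusNumber(const int busInUse[MAX_SCSI_BUSES])
    {
        for (int bus = 0; bus < MAX_SCSI_BUSES; bus++) {
            if (!busInUse[bus]) {
                return bus;   /* first gap is the next in sequence */
            }
        }
        return -1;            /* all buses taken */
    }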
To deal with more disks than can fit on a single controller, you must add some permanent dummy disks to the proxy VM, one on each additional controller that might be needed. Adding only the controller does not cause the controller to remain attached to the proxy VM. A real VMDK must be added on the controller to keep it attached to the proxy VM.