Best Practices for NBD Transport

NBD (network block device) is the most universal of the VDDK transport modes. Unlike HotAdd, it does not require dedicated backup proxy VMs, and it works on all datastore types, not just SAN. The sections below give tips for improving NFC (network file copy) performance for NBD backups.

Parallel jobs on one NFC server: vSphere hosts run two NFC servers: one in hostd and the other in vpxa. For connections through vCenter Server, VDDK acts as an NFC client and connects to the NFC server in vpxa. For connections made directly to ESXi hosts, VDDK connects to the NFC server in hostd.

If programs connect directly to ESXi hosts, the hostd NFC server memory limit can be increased from its 48 MB default by editing the /etc/vmware/hostd/config.xml file. If programs connect through vCenter Server, the NFC memory limit in vpxa is not configurable.
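
For reference, the NFC server settings live in the nfcsvc section of that file. The excerpt below is a sketch only: element names and defaults can vary by ESXi release and the values are in bytes, so check the file on the host before editing, and restart hostd afterward for the change to take effect.

    <!-- /etc/vmware/hostd/config.xml (excerpt, illustrative values) -->
    <nfcsvc>
       <path>libnfcsvc.so</path>
       <enabled>true</enabled>
       <!-- raise the NFC server memory limit above the 48 MB default -->
       <maxMemory>100663296</maxMemory>
    </nfcsvc>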

If connecting through vCenter Server, VMware recommends backing up 50 or fewer disks in parallel on a host. The NFC server can service only a limited number of requests at a time; additional requests are queued until earlier requests complete.

Dedicated backup network: As of vSphere 7.0, ESXi hosts support a dedicated network for NBD transport. When the tag vSphereBackupNFC is applied to a VMkernel adapter's NIC type, NBD backup traffic goes through the chosen virtual NIC. Programmers can apply the tag by making the following vSphere API call:
HostVirtualNicManager->SelectVnicForNicType(nicType, device);
Customers can use an ESXi command like this, which designates interface vmk2 for NBD backup:
esxcli network ip interface tag add -t vSphereBackupNFC -i vmk2

Network I/O Control (NIOC) for NFC backup: When NIOC is enabled on a vSphere Distributed Switch (VDS or DVS), switch traffic is divided into predefined network resource pools, which now include one dedicated to vSphere Backup NFC. The API enumeration for this network resource pool is VADP_NIOConBackupNfc. System administrators can set this up in the vSphere Client by selecting the switch, then Configure > System Traffic > Edit, and optionally changing the resource settings. Thereafter, any VADP NBD traffic is shaped by these VDS settings. NIOC can be used together with the dedicated backup network feature above, but this is not a requirement.

VDDK 7.0.1 introduced two new error codes, VIX_E_HOST_SERVER_SHUTDOWN and VIX_E_HOST_SERVER_NOT_AVAILABLE, to indicate that a host is entering maintenance mode (EMM) or is already in maintenance mode, respectively. After VixDiskLib_ConnectEx to vCenter Server, if the backup application calls VixDiskLib_Open for a virtual disk on an EMM host, vCenter switches to a different host if possible. The host switch is non-disruptive and the backup continues. If it is too late for a host switch, vCenter returns VIX_E_HOST_SERVER_SHUTDOWN, meaning the backup application should retry after a short delay in the hope that a host switch becomes possible. If no other hosts are available and the original host is in maintenance mode, vCenter returns VIX_E_HOST_SERVER_NOT_AVAILABLE. The backup application may choose to wait, or fail the backup. The table below summarizes the recommended retry behavior; a retry sketch follows.

Error Code                          Retry            Comment
VIX_E_HOST_NETWORK_CONN_REFUSED     Frequently       Usually caused by a network error.
VIX_E_HOST_SERVER_SHUTDOWN          Soon, 3 times    Host will enter maintenance mode (EMM).
VIX_E_HOST_SERVER_NOT_AVAILABLE     After waiting?   Host is in maintenance mode (post EMM).

Host switch to avoid EMM could fail if encryption keys are not shared among hosts.
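
The following C++ sketch shows one way a backup application might react to these codes when opening a disk. It is illustrative, not prescriptive: the helper name, retry count, and delay are assumptions; only the error codes and the VixDiskLib_Open() call come from the VDDK API.

    // Sketch: open a disk over NBD, retrying around maintenance-mode errors.
    #include <chrono>
    #include <thread>
    #include "vixDiskLib.h"

    static VixError OpenWithEmmRetry(VixDiskLibConnection connection,
                                     const char *path, uint32 flags,
                                     VixDiskLibHandle *handle)
    {
        const int kShutdownRetries = 3;            // "Soon, 3 times" per the table
        for (int attempt = 0; ; ++attempt) {
            VixError err = VixDiskLib_Open(connection, path, flags, handle);
            if (err == VIX_OK) {
                return err;                        // disk opened, backup proceeds
            }
            if (err == VIX_E_HOST_SERVER_SHUTDOWN && attempt < kShutdownRetries) {
                // Host is entering maintenance mode: retry after a short delay,
                // hoping vCenter switches the connection to another host.
                std::this_thread::sleep_for(std::chrono::seconds(10));
                continue;
            }
            // VIX_E_HOST_SERVER_NOT_AVAILABLE (host already in maintenance mode)
            // or any other error: let the caller decide to wait longer or fail.
            return err;
        }
    }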

NFC compress flags: In vSphere 6.5 and later, NBD performance can be significantly improved by using data compression. Three types are available (zlib, fastlz, and skipz), specified as flags when opening virtual disks with the VixDiskLib_Open() call; a usage sketch follows the list. Data layout can affect how well each algorithm performs.

  • VIXDISKLIB_FLAG_OPEN_COMPRESSION_ZLIB – zlib compression
  • VIXDISKLIB_FLAG_OPEN_COMPRESSION_FASTLZ – fastlz compression
  • VIXDISKLIB_FLAG_OPEN_COMPRESSION_SKIPZ – skipz compression
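
As an illustration, the sketch below opens a disk read-only with zlib compression for an NBD backup; the connection is assumed to come from a prior VixDiskLib_ConnectEx() call and the datastore path is a placeholder.

    // Sketch: request zlib NFC compression when opening a disk for NBD backup.
    VixDiskLibHandle diskHandle = NULL;
    uint32 openFlags = VIXDISKLIB_FLAG_OPEN_READ_ONLY |
                       VIXDISKLIB_FLAG_OPEN_COMPRESSION_ZLIB;
    VixError vixError = VixDiskLib_Open(connection,    // from VixDiskLib_ConnectEx()
                                        "[datastore1] testvm/testvm.vmdk",
                                        openFlags, &diskHandle);
    if (vixError != VIX_OK) {
        char *msg = VixDiskLib_GetErrorText(vixError, NULL);
        fprintf(stderr, "VixDiskLib_Open failed: %s\n", msg);
        VixDiskLib_FreeErrorText(msg);
    }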

Asynchronous I/O: In vSphere 6.7 and later, asynchronous I/O is available for NBD transport mode and can greatly improve its data transfer speed. To implement asynchronous I/O for NBD, use the functions VixDiskLib_ReadAsync() and VixDiskLib_WriteAsync(), which take a completion callback, and call VixDiskLib_Wait() to wait for all outstanding asynchronous operations to complete. In the development kit, see vixDiskLibSample.cpp for code examples, following the logic of the -readasyncbench and -writeasyncbench options.
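
The condensed sketch below shows the asynchronous read pattern, assuming the documented behavior that a successfully queued async request returns a pending status (VIX_ASYNC); buffer handling and the 1 MB request size are simplified for illustration, so refer to vixDiskLibSample.cpp for the complete benchmark logic.

    // Sketch: read a disk with asynchronous NBD I/O.
    #include <algorithm>
    #include <vector>
    #include "vixDiskLib.h"

    // Completion callback, invoked by VDDK when one async request finishes.
    static void ReadDone(void *cbData, VixError result)
    {
        if (result != VIX_OK) {
            *static_cast<VixError *>(cbData) = result;   // remember a failure
        }
    }

    static VixError ReadDiskAsync(VixDiskLibHandle diskHandle,
                                  VixDiskLibSectorType totalSectors)
    {
        const VixDiskLibSectorType sectorsPerRequest = 2048;   // 1 MB per request
        VixError cbStatus = VIX_OK;
        std::vector<std::vector<uint8> > buffers;

        for (VixDiskLibSectorType offset = 0; offset < totalSectors;
             offset += sectorsPerRequest) {
            VixDiskLibSectorType count =
                std::min(sectorsPerRequest, totalSectors - offset);
            buffers.push_back(std::vector<uint8>(count * VIXDISKLIB_SECTOR_SIZE));
            VixError err = VixDiskLib_ReadAsync(diskHandle, offset, count,
                                                buffers.back().data(),
                                                ReadDone, &cbStatus);
            if (err != VIX_OK && err != VIX_ASYNC) {
                return err;                    // request could not be queued
            }
        }
        // Block until all queued asynchronous requests have completed.
        VixError waitErr = VixDiskLib_Wait(diskHandle);
        return (waitErr != VIX_OK) ? waitErr : cbStatus;
    }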

Many factors impact write performance. Network latency is not necessarily a significant factor. Here are test results showing improvements with VDDK 6.7:

  • stream read over 10 Gbps network with asynchronous I/O, speed of NBD is ~210 MBps
  • stream read over 10 Gbps network with block I/O, speed of NBD is ~160 MBps
  • stream write over 10 Gbps network with asynchronous I/O, speed of NBD is ~70 MBps
  • stream write over 10 Gbps network with block I/O, speed of NBD is ~60 MBps

I/O buffer improvements: In the vSphere 7.0 release, changed block tracking (CBT) has adaptable block size and configurable VMkernel memory limits for higher performance. To benefit, no software changes are required. Adaptable block size is up to four times more space efficient.

In vSphere 6.7 and later, VDDK splits read and write buffers into 64KB chunks. Changing the buffer size on the VDDK side does not lead to different memory consumption results on the NFC server side.

In vSphere 6.5 and earlier, the larger the buffer size on the VDDK side, the more memory was consumed on the NFC server side. With buffer size set to 1MB, VMware recommended backing up no more than 20 disks in parallel on an ESXi host. For a 2MB I/O buffer, no more than 10 disks, and so on.

Session limits and vCenter session reuse: In vSphere 6.5 and later, programs can reuse a vCenter Server session to avoid connection overflow. For details, see "Reuse a vCenter Server Session" in chapter 4.
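
A hedged sketch of the pattern follows, using connection parameters only: the soapCookie, vmMoRef, sslThumbprint, and password variables are placeholders the application must supply from its own vSphere API login, and chapter 4 remains the authoritative description.

    // Sketch: pass an existing vCenter session cookie to VDDK instead of
    // opening a new session per connection. Placeholder values throughout.
    VixDiskLibConnectParams *params = VixDiskLib_AllocateConnectParams();
    params->vmxSpec = (char *)vmMoRef;                    // e.g. "moref=vm-1234"
    params->serverName = (char *)"vcenter.example.com";
    params->thumbPrint = (char *)sslThumbprint;
    params->credType = VIXDISKLIB_CRED_SESSIONID;
    params->creds.sessionId.cookie = (char *)soapCookie;  // from the vSphere API login
    params->creds.sessionId.userName = (char *)"administrator@vsphere.local";
    params->creds.sessionId.key = (char *)password;       // fallback credentials

    VixDiskLibConnection connection = NULL;
    VixError vixError = VixDiskLib_ConnectEx(params, TRUE /* read only */,
                                             NULL /* snapshotRef */,
                                             "nbd" /* transport modes */,
                                             &connection);
    // Later: VixDiskLib_Disconnect(connection); VixDiskLib_FreeConnectParams(params);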

Network bandwidth considerations: VMware suggests that NBD backups be done on a network with bandwidth of 10 Gbps or higher. Operations such as VM cloning and offline migration also consume memory in the NFC server, so users should schedule their backup window to avoid conflicts with such operations.

Log analysis for performance issues: The VDDK sample code can be run to assist with I/O performance analysis. In the VDDK configuration file, set the NFC log level to its highest value, vixDiskLib.nfc.LogLevel=4. For NFC asynchronous I/O, there is no need to set the log level on the server. Then run the sample code and examine vddk.log and the vpxa log to assess performance.
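
For example, the entry goes into the VDDK configuration file that the application passes as the configFile argument of VixDiskLib_InitEx(); the file path shown here is a placeholder.

    # /etc/vddk.conf -- passed to VixDiskLib_InitEx() as configFile
    vixDiskLib.nfc.LogLevel=4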