Metrics Examples

The examples below use the default “text” format for readability. For actual programming, you will probably use one of the machine-parsable formats (XML, JSON, or YAML).

Fetch Available Statistics

Running the stat raw command with no further parameters fetches a list of available statistics:

$ vmware-toolbox-cmd stat raw

session

host

resources

vscsi scsi0:0

vnet 00:0c:29:1e:23:f3

■

session – the current session. This changes infrequently. Session changes can cause discontinuities.

■

host – information about the current hypervisor and hardware the virtual machine is running on.

■

resources – the CPU and memory usage of this virtual machine.

■

vscsi <name> – storage statistics for a specific virtual disk, where <name> is the internal name in the virtual machine’s configuration file. The library does not aggregate storage statistics across disks.

■

vnet <Ethernet Address> – network statistics for a specific virtual NIC, where <Ethernet Address> is the MAC address currently programmed into the virtual NIC. The library does not aggregate network statistics across NICs.

To fetch a specific statistic, append the output format and the statistic's name to the command, as in the examples that follow.

Session Information

$ vmware-toolbox-cmd stat raw text session

session = 4004861987670969122

uptime = 1036293956

version = VMware ESXi 6.0.0 build-12345

provider =

uuid.bios = 56 4d 2c 53 43 56 66 8e-7c 05 fd 7e 51 1e 23 f3

Sessions change for a virtual machine with power on, suspend, revert to snapshot, vMotion, reset, or when it experiences some type of hot-plug. Statistics might be discontinuous across a session change. The reason for a session change, or whether session change constitutes a host change, is not exposed to virtual machines.

■

session – a cryptographically strong random number indicating the current session. Expected to contain at least 63 bits of entropy. Changes with every session.

■

uptime – microseconds since the last session change, as measured by the host.

■

version – string representation of the hypervisor version. Not expected to be parsed. A guest OS should never change its behavior based on the hypervisor version or build number.

■

provider – string representing the provider. Set by the ExtraConfig tools.guestlib.stat.provider and intended for use by vSphere providers such as vCloud Air. Opaque contents defined by the provider.

■

uuid.bios – the virtual machine’s SMBIOS UUID, cached at boot time. VMware maintains the SMBIOS UUID as a component of guest licensing; migrating a VM retains the same UUID while cloning a VM generates a different UUID. The algorithm to generate a new UUID varies from release to release.
The vSphere API permits changing a VM’s UUID. Although vSphere prevents unintentional duplication of UUID, it allows deliberate UUID duplication, because legitimate workflows (such as lab environments) often require duplicated UUIDs.
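Because statistics might be discontinuous across a session change, a monitoring script can compare the session value of two samples before computing any difference between them. The following is a minimal sketch, assuming the text output shown above has already been captured as a string; parse_stats and same_session are hypothetical helper names, not part of VMware Tools.

```python
def parse_stats(text):
    """Parse 'key = value' lines from the text format into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and anything without '='
            stats[key.strip()] = value.strip()
    return stats


def same_session(earlier, later):
    """True if two parsed samples were taken within the same session."""
    return earlier["session"] == later["session"]


# Output copied from the session example above.
sample = """\
session = 4004861987670969122
uptime = 1036293956
version = VMware ESXi 6.0.0 build-12345
provider =
uuid.bios = 56 4d 2c 53 43 56 66 8e-7c 05 fd 7e 51 1e 23 f3
"""

stats = parse_stats(sample)
uptime_seconds = int(stats["uptime"]) / 1_000_000  # uptime is in microseconds
```

If same_session returns False for two samples, cumulative counters from the earlier sample are best discarded rather than subtracted.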

Host Hardware

In the next example, only the first two values are provided by default. A virtual machine with ExtraConfig tools.guestlib.enableHostInfo = TRUE (a non-default setting) supplies the remaining values.

$ vmware-toolbox-cmd stat raw text host

host.cpu.processorMHz = 2399

host.cpu.coresPerPkg = 4

host.cpu.packages = 2

host.cpu.cores = 8

host.cpu.threads = 16

host.dmi.product = ProLiant ML350 G6

host.dmi.vendor = HP

■

host.cpu.processorMHz – nominal processor speed. Other metrics, such as vm.cpu.used below, are normalized to this speed.

■

host.cpu.coresPerPkg – actual cores per socket, not including hyperthreads. Useful for determining cache effects and other aspects of socket sharing. Information is also available with CPUID instruction. Note that vSphere does not implement virtual hyperthreads.

■

host.cpu.packages – number of CPU sockets on the host (non-default).

■

host.cpu.cores – number of cores on the host across all sockets, not including hyperthreads (non-default).

■

host.cpu.threads – number of logical CPUs on the host across all sockets, including hyperthreads (non-default).

■

host.dmi.product – “product” field in the host SMBIOS data (non-default).

■

host.dmi.vendor – “vendor” field in the host SMBIOS data (non-default).

Host information (dmi.product and dmi.vendor) and total capacity are hidden by default, because this information is considered sensitive and not relevant to virtual machine execution. VMware discourages use of such information, but permits it to be made available to help with support.

CPU and Memory Statistics

For implementation reasons, a virtual machine tracks CPU and memory resources slightly differently. CPU resources, including NUMA, indicate virtualization overhead, shown with vm. prefix. Memory resources are broken out by guest memory, shown with guest. prefix, and by overhead memory, with ovhd. prefix. Future implementations may add additional metrics.

This example shows various CPU and memory statistics:

$ vmware-toolbox-cmd stat raw text resources

vm.cpu.reserved = 4798

vm.cpu.limit = 11995

vm.cpu.used = 224057517

vm.cpu.contention.cpu = 65606184

vm.cpu.contention.mem = 1488848

vm.numa.local = 1837248

vm.numa.remote = 0

guest.mem.reserved = 204800

guest.mem.limit = 1536000

guest.mem.mapped = 1810144

guest.mem.consumed = 1521680

guest.mem.swapped = 3236

guest.mem.ballooned = 27104

guest.mem.swapIn = 3416

guest.mem.swapOut = 6588

ovhd.mem.swapped = 0

ovhd.mem.swapIn = 0

ovhd.mem.swapOut = 0

■

vm.cpu.reserved – (static) MHz of current CPU type reserved. Covers all virtual CPU plus overheads, so for example a 2 virtual CPU machine would need 2x host.cpu.processorMHz to be fully reserved. Overheads are insignificant except during transient conditions such as taking a backup snapshot or during a vMotion. Default 0.

■

vm.cpu.limit – (static) MHz that the virtual machine will not exceed. Default –1 means unlimited.

■

vm.cpu.used – (cumulative) microseconds of CPU time used by this virtual machine. Equivalent to esxtop %USED. See Comparison to esxtop for details.

■

vm.cpu.contention.cpu – (cumulative) CPU time the virtual machine could have run, but did not run due to CPU contention. This metric includes time losses due to hypervisor factors, such as overcommit. Specific sources of contention vary widely from release to release. See Comparison to esxtop for details about calculating CPU contention.

■

vm.cpu.contention.mem – (cumulative) CPU time the virtual machine could have run, but did not run due to memory contention. This metric includes losses due to swapping. Equivalent to esxtop %SWPWT.

■

vm.numa.local – (instantaneous) KB of memory currently local, sum across the VM’s NUMA nodes.

■

vm.numa.remote – (instantaneous) KB of memory currently remote, sum across the VM’s NUMA nodes.

■

guest.mem.reserved – (static) KB of memory reserved for the guest OS. This indicates memory that will never be ballooned or swapped. Default is 0.

■

guest.mem.limit – (static) KB of memory the guest must operate within. Default –1 means unlimited.

■

guest.mem.mapped – (instantaneous) KB of memory currently mapped into the guest; that is, memory the guest can access with zero read latency. This metric represents memory use from a guest perspective.

■

guest.mem.consumed – (instantaneous) KB of memory used to provide current mapped memory. This might be lower than mapped due to ballooning, memory sharing, or future optimizations. This metric represents memory use from a host perspective. The difference between guest.mem.mapped and guest.mem.consumed is additional memory made available due to hypervisor optimizations.

■

guest.mem.swapped – (instantaneous) KB of memory swapped to disk. A fully reserved virtual machine should never see memory swapped out in steady-state usage. Transient conditions, such as resume from memory-included snapshot, might show some swap usage.

■

guest.mem.ballooned – (instantaneous) KB of memory deliberately copied on write (COWed) to zero in the guest OS, to reduce memory usage.

■

guest.mem.swapIn – (cumulative) KB of memory swapped in for the current session.

■

guest.mem.swapOut – (cumulative) KB of memory swapped out for the current session.

■

ovhd.mem.swapped – (instantaneous) KB of overhead memory currently swapped.

■

ovhd.mem.swapIn – (cumulative) KB of overhead memory swapped in for the current session.

■

ovhd.mem.swapOut – (cumulative) KB of overhead memory swapped out for the current session.
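The reservation rule in the vm.cpu.reserved description (a fully reserved machine needs its virtual CPU count times host.cpu.processorMHz) can be sketched as follows. The function names are hypothetical, and the virtual CPU count of 2 is an assumption, since it is not reported by the commands shown here. Overheads are ignored.

```python
def fully_reserved_mhz(num_vcpus, processor_mhz):
    """MHz needed to fully reserve every virtual CPU (overheads ignored)."""
    return num_vcpus * processor_mhz


def reserved_fraction(reserved_mhz, num_vcpus, processor_mhz):
    """Share of the full reservation that vm.cpu.reserved covers."""
    return reserved_mhz / fully_reserved_mhz(num_vcpus, processor_mhz)


def limit_or_none(value):
    """Map the -1 'unlimited' value of vm.cpu.limit or guest.mem.limit to None."""
    return None if value == -1 else value


NUM_VCPUS = 2  # assumption: not reported by the commands shown here

# vm.cpu.reserved = 4798 and host.cpu.processorMHz = 2399 from the examples.
fraction = reserved_fraction(4798, NUM_VCPUS, 2399)
print(fraction)  # 1.0, that is, fully reserved under the 2 vCPU assumption
```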

Expected values for some of the statistics

■

vm.cpu.contention.mem – usually < 1%, anything greater indicates memory overcommit.

■

vm.cpu.contention.cpu – < 5% of incremental time during undercommit, < 50% of incremental time at normal levels of overcommit (vSphere is tuned to perform best when somewhat overcommitted).

■

When contention is < 5%, performance will be deterministic but the host is not fully used.

■

When contention is between 5% and 50%, the host is becoming fully used (maximum CPU throughput) but individual virtual machines might see less deterministic performance.

■

vm.numa.local – Expected to match guest.mem.mapped. Transient conditions such as NUMA rebalance can cause this to temporarily decrease, then return to normal as memory is migrated.

■

vm.numa.remote – Expected to be approximately zero in non-overcommitted scenarios.

■

guest.mem.mapped – Expected to equal configured guest memory; might be smaller if virtual machine has yet to access all its memory.

■

guest.mem.consumed – Expected to be approximately equal to configured guest memory; will be smaller if host memory is overcommitted.

■

guest.mem.swapped – Expected to be zero. Non-zero indicates non-graceful memory overcommit.

■

guest.mem.ballooned – Expected to be zero. Non-zero indicates graceful memory overcommit.

■

ovhd.mem.swapped – Expected to be zero. Non-zero indicates memory overcommit.
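The memory expectations above lend themselves to a simple check. This is a sketch only, assuming the statistics have already been parsed into a dict of integers; memory_warnings is a hypothetical name, and the messages simply restate the expectations listed above.

```python
def memory_warnings(stats):
    """Return a list of notes for memory counters expected to be zero."""
    warnings = []
    if stats.get("guest.mem.swapped", 0) > 0:
        warnings.append("guest.mem.swapped > 0: non-graceful memory overcommit")
    if stats.get("guest.mem.ballooned", 0) > 0:
        warnings.append("guest.mem.ballooned > 0: graceful memory overcommit")
    if stats.get("ovhd.mem.swapped", 0) > 0:
        warnings.append("ovhd.mem.swapped > 0: memory overcommit")
    return warnings


# Values from the resources example above.
example = {
    "guest.mem.swapped": 3236,
    "guest.mem.ballooned": 27104,
    "ovhd.mem.swapped": 0,
}
for note in memory_warnings(example):
    print(note)
```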

Equations for CPU and memory metrics

■

session uptime = vm.cpu.used + vm.cpu.contention.cpu + vm.cpu.contention.mem + CPU idle time

■

configured memory size = guest.mem.mapped + guest.mem.swapped + (memory not yet touched)

■

configured memory size = vm.numa.local + vm.numa.remote
(another formula for arriving at the same statistic above)

■

guest.mem.mapped = guest.mem.consumed + guest.mem.ballooned + (other copy-on-write sources)
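Rearranging these equations yields the quantities that are not reported directly. The sketch below applies them as written to the values from the resources and session examples. It assumes the contention counters use the same microsecond unit as uptime, and it does not address whether the CPU terms need scaling by virtual CPU count. All function names are hypothetical.

```python
def cpu_idle_us(uptime, used, contention_cpu, contention_mem):
    """CPU idle time, from the session uptime equation."""
    return uptime - used - contention_cpu - contention_mem


def untouched_kb(numa_local, numa_remote, mapped, swapped):
    """Memory not yet touched, taking configured size as the NUMA sum."""
    configured = numa_local + numa_remote
    return configured - mapped - swapped


def other_cow_kb(mapped, consumed, ballooned):
    """Memory saved by copy-on-write sources other than ballooning."""
    return mapped - consumed - ballooned


idle = cpu_idle_us(1036293956, 224057517, 65606184, 1488848)
untouched = untouched_kb(1837248, 0, 1810144, 3236)
other_cow = other_cow_kb(1810144, 1521680, 27104)
print(idle, untouched, other_cow)  # 745141407 23868 261360
```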

Comparison to esxtop

Individual reasons for lack of vCPU progress are available to vSphere administrators (using either esxtop or the vSphere API) but are hidden from the guest OS to preserve isolation between the virtual machine and the configuration of the infrastructure it runs upon. The guest OS sees only an aggregate metric.

■

vm.cpu.used is equivalent to the esxtop statistic %USED for a virtual machine.

■

vm.cpu.contention.cpu is equivalent to
(%RDY – %MLMTD) + %MLMTD + %CSTP + %WAIT + (%RUN – %USED)

■

(%RDY – %MLMTD) represents time the guest OS could not run due to host CPU overutilization. Note that %RDY includes %MLMTD, which is why it is subtracted before being added.

■

%MLMTD represents time the guest OS did not run due to administrator-configured resource limits.
ESXi 6.0 and earlier did not add %MLMTD to this computation, but this is fixed in ESXi 6.5.

■

%CSTP represents time the guest OS could not run due to uneven vCPU progress.

■

%WAIT represents time the guest OS could not run due to hypervisor overheads.

■

(%RUN – %USED) corrects for any frequency scaling of the host CPU.

■

vm.cpu.contention.mem is equivalent to %SWPWT.

See https://communities.vmware.com/docs/DOC-9279 for details about esxtop.

Note on nominal CPU speed and CPU metrics

The host.cpu.processorMHz metric (in the host section) reports a nominal speed, and the virtual machine CPU metrics are normalized to the processorMHz metric. Actual processor speed might be higher or lower depending on host power management.

A virtual machine can see vm.cpu.used exceed wall clock time due to Turbo Boost, or can see vm.cpu.used lag wall clock time due to power saving modes used in conjunction with idle guests. Actual processor speed is not available to the guest OS, but is expected to be close to nominal clock speed when the guest OS is active. See http://www.vmware.com/files/pdf/techpaper/hpm-perf-vsphere55.pdf for more information about vSphere host power management.

Normalizing CPU metrics to nominal CPU speed allows the guest OS to avoid dependence on host power management settings.

Note on vm.cpu.contention.cpu

Using the Extended Guest Statistics discussed in this section, you can obtain a contention ratio by comparing contention time to actual time for a particular time interval. As contention time is reported as a sum across VCPUs, and wall time is reported for the entire virtual machine, the wall time must be scaled up by the number of VCPUs to normalize contention to a 0-100% range.

Contention% = 100 * (contention_T2 – contention_T1) / (VCPUs * (time_T2 – time_T1))
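As an illustration, the formula can be written as a small function. The sample values are hypothetical: the second sample is assumed to be taken 10 seconds after the first on a 2 VCPU machine, and contention is assumed to be reported in the same microsecond unit as uptime.

```python
def contention_percent(contention_t1, contention_t2, time_t1, time_t2, vcpus):
    """Contention over an interval, normalized to a 0-100% range."""
    elapsed = time_t2 - time_t1
    if elapsed <= 0 or vcpus <= 0:
        raise ValueError("need a positive interval and VCPU count")
    return 100 * (contention_t2 - contention_t1) / (vcpus * elapsed)


# Hypothetical samples: 1.5 s of contention accumulated over 10 s, 2 VCPUs.
pct = contention_percent(
    contention_t1=65_606_184,
    contention_t2=67_106_184,
    time_t1=1_036_293_956,
    time_t2=1_046_293_956,
    vcpus=2,
)
print(pct)  # 7.5
```

Both samples should come from the same session, because uptime restarts at a session change.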

The vm.cpu.contention.cpu metric is similar to “stolen time” returned by VMGuestLib_GetCpuStolenMs (see Accessor Functions for Virtual Machine Data), except “stolen time” excludes time the virtual machine did not run due to configured resource limits. Comparing this value to esxtop requires denormalizing the contention ratio, because esxtop reports a sum of percentages across VCPUs. So:

((%RDY – %MLMTD) + %MLMTD + %CSTP + %WAIT + (%RUN – %USED)) ~= Contention% * VCPUs

Due to sample aliasing where in-guest time samples and esxtop time samples do not occur simultaneously, instantaneous esxtop values will not match instantaneous in-guest statistics. Longer time samples or averaging values collected over time will produce more comparable results.
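The comparison can be sketched as follows. The esxtop percentages are hypothetical values chosen for illustration, not measurements, and the function names are not part of any VMware tool.

```python
def esxtop_contention_sum(rdy, mlmtd, cstp, wait, run, used):
    """Sum of esxtop percentages that corresponds to vm.cpu.contention.cpu."""
    return (rdy - mlmtd) + mlmtd + cstp + wait + (run - used)


def denormalized_contention(contention_pct, vcpus):
    """Scale the in-guest 0-100% ratio to esxtop's sum across VCPUs."""
    return contention_pct * vcpus


# Hypothetical esxtop readings for a 2 VCPU virtual machine.
host_side = esxtop_contention_sum(
    rdy=10.0, mlmtd=2.0, cstp=1.0, wait=3.0, run=50.0, used=49.0
)
guest_side = denormalized_contention(7.5, vcpus=2)
print(host_side, guest_side)  # 15.0 15.0
```

In practice the two sides agree only approximately, so a tolerance is more appropriate than an equality test.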

A contention value of < 5% is normal “undercommit” operating behavior, representing minor hypervisor overheads. A contention value > 50% is “excess overcommit” and indicates CPU resource starvation – the workload would benefit from additional CPUs or migrating virtual machines to different hosts. A contention value between 5% and 50% is “normal overcommit” and is more complicated. The goal of this metric is to allow direct measurement of the performance improvement that can be obtained by adding CPU resources.
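A minimal sketch of these three bands, assuming the contention percentage has already been computed. The function name is hypothetical, and the treatment of the exact boundary values 5 and 50 is a choice made here, not something the text specifies.

```python
def classify_contention(pct):
    """Map a contention percentage to the band names used in the text."""
    if pct < 5:
        return "undercommit"
    if pct <= 50:
        return "normal overcommit"
    return "excess overcommit"


for value in (2.0, 7.5, 40.0, 65.0):
    print(value, classify_contention(value))
```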

The next section, CPU utilization across all virtual machines, illustrates these concepts.

CPU utilization across all virtual machines

VMware best practices describe the available CPU capacity of an ESXi host as equal to the number of cores (not including hyperthreads). A 16 core host with 2.0GHz processors has 16 cores * 2000 MHz/core = 32000 MHz available compute capacity. When actual usage is below that calculated capacity, the hypervisor is considered “under committed” – the hypervisor is scaling linearly with load applied, and is wasting capacity.
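The capacity arithmetic above can be sketched as a short calculation. The function names are hypothetical. The second call uses host.cpu.cores and host.cpu.processorMHz from the host example, which are available only when host information is enabled.

```python
def host_capacity_mhz(cores, mhz_per_core):
    """Available compute capacity: cores (not hyperthreads) times nominal MHz."""
    return cores * mhz_per_core


def is_undercommitted(usage_mhz, capacity_mhz):
    """True while actual usage stays below the calculated capacity."""
    return usage_mhz < capacity_mhz


print(host_capacity_mhz(16, 2000))  # 32000, the example in the text
print(host_capacity_mhz(8, 2399))   # 19192, from the host example values
```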

As actual usage exceeds available compute capacity, the hypervisor begins utilizing hyperthreads for running virtual machines to keep performance degradation graceful. Maximum aggregate utilization occurs during this “normal overcommit” (between 5% and 50% contention) where each virtual machine sees somewhat degraded performance but overall system throughput still increases. In this “normal overcommit” region, adding load still improves overall efficiency, though at a declining rate. Eventually, all hyperthreads are fully used. Efficiency peaks and starts to degrade; this “excess overcommit” (>50% contention) indicates the workload would be more efficient if spread across more hosts for better throughput.

One specific scenario deserves special mention: the “monster VM” that attempts to give a single VM all available compute capacity. A VM configured to match the number of host cores (not including hyperthreads) will peak at the capacity of those cores (with < 5% contention) but at a performance about 20% lower than an equivalent physical machine utilizing all cores and hyperthreads. A VM configured to match the number of host threads (2x host cores) will peak at a performance level more analogous to a physical machine, but will show about 40% contention (the upper end of “normal overcommit”) running half the cores on hyperthreads. This contention metric indicates the load would run better on a larger host with additional cores, so it is technically “overcommitted” even though performance is better than a hypervisor running at full commit. This behavior is expected when attempting to run maximally sized virtual machines.

Storage Statistics

The following example shows some I/O statistics:

$ vmware-toolbox-cmd stat raw text vscsi scsi0:0

num.reads = 12605

num.writes = 1039

size.reads = 533612032

size.writes = 14279680

latency.reads = 1944173239

latency.writes = 102025122

■

num.reads – number of read commands.

■

num.writes – number of write commands.

■

size.reads – bytes read.

■

size.writes – bytes written.

■

latency.reads – total microseconds spent on all read commands.

■

latency.writes – total microseconds spent on all write commands.

The read IOPS over the last 10 seconds can be calculated as:

Read IOPS = ((num.reads @ TimeNow) – (num.reads @ Time10sec)) / (TimeNow – Time10sec)

The average latency of reads over the last ten seconds can be calculated as:

Average read latency = ((latency.reads @ TimeNow) – (latency.reads @ Time10sec)) / ((num.reads @ TimeNow) – (num.reads @ Time10sec))
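Both calculations can be sketched in a few lines. The earlier sample uses the values from the storage example. The later sample is hypothetical and assumed to be taken 10 seconds afterwards. The function names are illustrative only.

```python
def read_iops(reads_then, reads_now, seconds):
    """Read commands per second over the sampling interval."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (reads_now - reads_then) / seconds


def avg_read_latency_us(latency_then, latency_now, reads_then, reads_now):
    """Average microseconds per read, or None if no reads completed."""
    reads = reads_now - reads_then
    if reads == 0:
        return None
    return (latency_now - latency_then) / reads


# Earlier sample from the example above; later sample is hypothetical.
then = {"num.reads": 12605, "latency.reads": 1944173239}
now = {"num.reads": 13105, "latency.reads": 1946673239}

iops = read_iops(then["num.reads"], now["num.reads"], seconds=10)
latency = avg_read_latency_us(
    then["latency.reads"], now["latency.reads"],
    then["num.reads"], now["num.reads"],
)
print(iops, latency)  # 50.0 5000.0
```

Returning None when no reads completed avoids a division by zero on an idle disk.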

Network Statistics

Reservation and limit are supported on DVS (Distributed Virtual Switch) or “opaque” (NSX) switch types only; they are not supported on the default VSS switch type. Between reservation and limit, bandwidth is allocated on a share-based system, which is not meaningful to expose to a guest OS.

$ vmware-toolbox-cmd stat raw text vnet 00:0c:29:1e:23:f3

num.tx = 209

size.tx = 38137

num.rx = 10920

size.rx = 1312789

reservation = 0

limit = -1

■

num.tx – number of packets transmitted.

■

size.tx – bytes transmitted.

■

num.rx – number of packets received.

■

size.rx – bytes received.

■

reservation – guaranteed minimum bandwidth for this vNIC.

■

limit – maximum bandwidth allowed for this vNIC.
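By analogy with the storage calculations, packet and byte rates can be derived from two samples. This sketch assumes the num.* and size.* counters are cumulative, which the text above does not state explicitly. The later sample and the function names are hypothetical.

```python
def rate_per_second(then, now, seconds):
    """Change in a counter per second over the sampling interval."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (now - then) / seconds


def average_packet_bytes(size, num):
    """Mean bytes per packet, or None if no packets were counted."""
    return size / num if num else None


# Earlier sample from the example above; later sample is hypothetical,
# assumed to be taken 10 seconds afterwards.
tx_packets_per_s = rate_per_second(209, 309, seconds=10)
tx_bytes_per_s = rate_per_second(38137, 58137, seconds=10)
avg_tx_packet = average_packet_bytes(38137, 209)
print(tx_packets_per_s, tx_bytes_per_s)  # 10.0 2000.0
```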
