Thursday 4 February 2010

ESXTOP

Esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk and network.
Understanding all the information in esxtop can seem like quite a lot to take in at first, but once you use esxtop and understand the output you won't stop using it.

The following keys are the ones I use the most.
Open a console session or SSH to the ESX(i) host and type:
esxtop

By default the screen is refreshed every 5 seconds; change this by typing:
s 2

Changing views is easy; type the following keys for the associated views:
c = cpu 
m = memory 
n = network 
i = interrupts 
d = disk adapter 
u = disk device 
v = disk VM

To add/remove fields:
f

Changing the order:
o

Saving all the settings you’ve changed:
W
To capture the information and export it to a CSV, use the following command:
esxtop -b -d 2 -n 100 > esxtopcapture.csv

Where "-b" stands for batch mode, "-d 2" is a delay of 2 seconds and "-n 100" means 100 iterations. In this specific case esxtop will log all metrics for 200 seconds.
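Once you have the capture you can process the CSV offline. Below is a minimal Python sketch that pulls a single counter out of the file; it assumes the perfmon-style layout esxtop batch mode writes (first row holds the counter names, first column of every data row is the timestamp), and the file name and the "% Ready" substring are only examples to adjust to your own capture.

import csv

# Minimal sketch: filter one counter out of an esxtop batch capture.
# Assumes the perfmon-style CSV layout esxtop batch mode writes:
# row 1 = quoted counter names, column 1 = sample timestamp.
CAPTURE = "esxtopcapture.csv"   # file produced by the command above
PATTERN = "% Ready"             # example substring; adjust to the counter you need

with open(CAPTURE, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    # Indices of the columns whose counter name contains the pattern.
    wanted = [i for i, name in enumerate(header) if PATTERN in name]
    for i in wanted:
        print(header[i])
    for row in reader:
        timestamp = row[0]
        values = [row[i] for i in wanted]
        print(timestamp, values)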

Help: ?

Here are a few of the metric thresholds that I use; a short script that checks a capture against a couple of them follows the table:


Display | Metric | Threshold | Explanation
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason's explanation for vSMP VMs.
CPU | %CSTP | 100 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities.
CPU | %MLMTD | 0 | If larger than 0 the world is being throttled. Possible cause: limit on CPU.
CPU | %SWPWT | 1 | VM waiting on swapped pages to be read from disk. Possible cause: memory overcommitment.
CPU | TIMER/S (H) | 1000 | High timer-interrupt rate. It may be possible to reduce this rate and thus reduce overhead. The amount of overhead increases with the number of vCPUs assigned to a VM.
MEM | MCTLSZ (I) | 1 | If larger than 0 the host is forcing VMs to inflate the balloon driver to reclaim memory as the host is overcommitted.
MEM | SWCUR (J) | 1 | If larger than 0 the host has swapped memory pages in the past. Possible cause: overcommitment.
MEM | SWR/s (J) | 1 | If larger than 0 the host is actively reading from swap (vswp). Possible cause: excessive memory overcommitment.
MEM | SWW/s (J) | 1 | If larger than 0 the host is actively writing to swap (vswp). Possible cause: excessive memory overcommitment.
MEM | N%L (F) | 80 | If less than 80 the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and "remotely" uses memory via the "interconnect".
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization.
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization.
DISK | GAVG (H) | 25 | Look at "DAVG" and "KAVG" as the sum of both is GAVG.
DISK | DAVG (H) | 25 | Disk latency most likely caused by the array.
DISK | KAVG (H) | 5 | Disk latency caused by the VMkernel; high KAVG usually means queuing. Check "QUED".
DISK | QUED (F) | 1 | Queue maxed out. Possibly queue depth set too low. Check with the array vendor for the optimal queue depth value.
DISK | ABRTS/s (K) | 1 | Aborts issued by the guest (VM) because storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, when paths have failed or the array is not accepting any I/O.
DISK | RESETS/s (K) | 1 | The number of commands reset per second.
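The same batch capture can be checked against a few of these thresholds offline. The sketch below is illustrative only: the counter-name substrings and the two thresholds shown are assumptions based on the table above and will almost certainly need adjusting to the exact counter names in your capture file.

import csv

# Illustrative sketch: flag samples in an esxtop batch capture that cross a
# threshold from the table above. The counter-name substrings are assumptions;
# adjust them to the names that actually appear in your capture's header row.
CAPTURE = "esxtopcapture.csv"
THRESHOLDS = {
    "% Ready": 10.0,      # CPU %RDY
    "% Swap Wait": 1.0,   # CPU %SWPWT
}

with open(CAPTURE, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    for row in reader:
        timestamp = row[0]
        for i, name in enumerate(header):
            for pattern, limit in THRESHOLDS.items():
                if pattern in name:
                    try:
                        value = float(row[i])
                    except (ValueError, IndexError):
                        continue
                    if value > limit:
                        print(f"{timestamp}: {name} = {value} (threshold {limit})")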
For a more detailed overview of ESXTOP, read the following VMware article.
