Esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk and network.
Understanding all the information in esxtop can seem like quite a lot to take in at first, but once you start using esxtop and understand what the data means, you won’t stop using it. The following keys are the ones I use the most.
Open a console session or SSH to the ESX(i) host and type:
esxtop
By default the screen refreshes every 5 seconds; change this (to 2 seconds, for example) by typing:
s 2
Changing views is easy; type the following keys for the associated views:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device
v = disk VM
To add/remove fields:
f
Changing the order:
o
Saving all the settings you’ve changed:
W
(the configuration file this writes can be loaded again later with esxtop -c <filename>)
To capture the information and export it to a CSV, use the following command:
esxtop -b -d 2 -n 100 > esxtopcapture.csv
Where “-b” stands for batch mode, “-d 2” is a delay of 2 seconds between snapshots and “-n 100” means 100 iterations. In this specific case esxtop will log all metrics for 200 seconds.
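The batch file is a perfmon-style CSV: the first row holds one quoted column name per counter and every following row is one sample, so it can be opened in Windows perfmon, esxplot or any scripting language. Below is a minimal sketch of pulling the capture apart with pandas on a workstation; the file name and the “Group Cpu”/“% Ready” substrings are assumptions for this example, so verify them against the header of your own capture.

import pandas as pd

# Load the esxtop batch capture (copy it off the host first; pandas is not available on ESXi).
df = pd.read_csv("esxtopcapture.csv")

# Column names look roughly like \\hostname\Group Cpu(1234:vmname)\% Ready.
# Collect every CPU "% Ready" counter and report the worst value seen per world.
ready_cols = [c for c in df.columns if "Group Cpu" in c and "% Ready" in c]
peaks = df[ready_cols].apply(pd.to_numeric, errors="coerce").max()
print(peaks.sort_values(ascending=False).head(10))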
Help: ?
Here are a few of the metric thresholds that I use:
Display | Metric | Threshold | Explanation |
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason’s explanation for vSMP VMs. |
CPU | %CSTP | 100 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities. |
CPU | %MLMTD | 0 | If larger than 0 the world is being throttled. Possible cause: Limit on CPU. |
CPU | %SWPWT | 1 | VM waiting on swapped pages to be read from disk. Possible cause: Memory overcommitment. |
CPU | TIMER/S (H) | 1000 | High timer-interrupt rate. It may be possible to reduce this rate and thus reduce overhead. The amount of overhead increases with the number of vCPUs assigned to a VM. |
MEM | MCTLSZ (I) | 1 | If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory because the host is overcommitted. |
MEM | SWCUR (J) | 1 | If larger than 0, the host has swapped memory pages in the past. Possible cause: Overcommitment. |
MEM | SWR/s (J) | 1 | If larger than 0, the host is actively reading from swap (.vswp). Possible cause: Excessive memory overcommitment. |
MEM | SWW/s (J) | 1 | If larger than 0, the host is actively writing to swap (.vswp). Possible cause: Excessive memory overcommitment. |
MEM | N%L (F) | 80 | If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”. |
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization. |
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization. |
DISK | GAVG (H) | 25 | Look at “DAVG” and “KAVG” as the sum of both is GAVG. |
DISK | DAVG (H) | 25 | Disk latency most likely to be caused by array. |
DISK | KAVG (H) | 5 | Disk latency caused by the VMkernel, high KAVG usually means queuing. Check “QUED”. |
DISK | QUED (F) | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value. |
DISK | ABRTS/s (K) | 1 | Aborts issued by the guest (VM) because the storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, by failed paths or an array that is not accepting any I/O for whatever reason. |
DISK | RESETS/s (K) | 1 | The number of commands reset per second. |
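To put the thresholds above to work against a batch capture, a rough sketch like the one below will flag every counter whose peak value crosses its limit. The substrings used to match the CSV column names (and the file name) are assumptions about how the counters are labelled in batch mode, so check them against your own capture before trusting the output.

import pandas as pd

# Thresholds from the table above, keyed by a substring assumed to appear in the
# corresponding batch-mode column name -- verify these against your capture first.
THRESHOLDS = {
    "% Ready": 10,      # CPU %RDY
    "% Swap Wait": 1,   # CPU %SWPWT
}

df = pd.read_csv("esxtopcapture.csv")

for needle, limit in THRESHOLDS.items():
    for col in df.columns:
        if needle not in col:
            continue
        peak = pd.to_numeric(df[col], errors="coerce").max()
        if peak > limit:
            print(f"{col}: peak {peak:.1f} exceeds threshold {limit}")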