Esxtop allows monitoring and collection of data for all system resources: CPU, memory, disk and network.
Understanding all the information in esxtop can seem like quite a lot to take in at first, but once you start using esxtop and understand what the data means, you won’t stop using it. The following keys are the ones I use the most.
Open a console session or SSH to the ESX(i) host and type:
esxtop
By default the screen refreshes every 5 seconds; change this (to 2 seconds, for example) by typing:
s 2
Changing views is easy; type the following keys for the associated views:
c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device
v = disk VM
To add/remove fields:
f
Changing the order:
o
Saving all the settings you’ve changed:
W
(the configuration file this writes can be loaded again later with esxtop -c <filename>)
To capture the information and export it to a CSV, use the following command:
esxtop -b -d 2 -n 100 > esxtopcapture.csv
Where “-b” stands for batch mode, “-d 2” is a delay of 2 seconds between snapshots and “-n 100” means 100 iterations. In this specific case esxtop will log all metrics for 200 seconds.
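The batch file is a perfmon-style CSV: the first row holds one quoted column name per counter and every following row is one sample, so it can be opened in Windows perfmon, esxplot or any scripting language. Below is a minimal sketch of pulling the capture apart with pandas on a workstation; the file name and the “Group Cpu”/“% Ready” substrings are assumptions for this example, so verify them against the header of your own capture.

import pandas as pd

# Load the esxtop batch capture (copy it off the host first; pandas is not available on ESXi).
df = pd.read_csv("esxtopcapture.csv")

# Column names look roughly like \\hostname\Group Cpu(1234:vmname)\% Ready.
# Collect every CPU "% Ready" counter and report the worst value seen per world.
ready_cols = [c for c in df.columns if "Group Cpu" in c and "% Ready" in c]
peaks = df[ready_cols].apply(pd.to_numeric, errors="coerce").max()
print(peaks.sort_values(ascending=False).head(10))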
Help: ?
Here are a few of the metric thresholds that I use:
Display | Metric | Threshold | Explanation |
CPU | %RDY | 10 | Overprovisioning of vCPUs, excessive usage of vSMP or a limit (check %MLMTD) has been set. See Jason’s explanation for vSMP VMs. |
CPU | %CSTP | 100 | Excessive usage of vSMP. Decrease the number of vCPUs for this particular VM. This should lead to increased scheduling opportunities. |
CPU | %MLMTD | 0 | If larger than 0 the world is being throttled. Possible cause: Limit on CPU. |
CPU | %SWPWT | 1 | VM waiting on swapped pages to be read from disk. Possible cause: Memory overcommitment. |
CPU | TIMER/S (H) | 1000 | High timer-interrupt rate. It may be possible to reduce this rate and thus reduce overhead. The amount of overhead increases with the number of vCPUs assigned to a VM. |
MEM | MCTLSZ (I) | 1 | If larger than 0, the host is forcing VMs to inflate the balloon driver to reclaim memory because the host is overcommitted. |
MEM | SWCUR (J) | 1 | If larger than 0, the host has swapped memory pages in the past. Possible cause: Overcommitment. |
MEM | SWR/s (J) | 1 | If larger than 0, the host is actively reading from swap (.vswp). Possible cause: Excessive memory overcommitment. |
MEM | SWW/s (J) | 1 | If larger than 0, the host is actively writing to swap (.vswp). Possible cause: Excessive memory overcommitment. |
MEM | N%L (F) | 80 | If less than 80, the VM experiences poor NUMA locality. If a VM has a memory size greater than the amount of memory local to each processor, the ESX scheduler does not attempt to use NUMA optimizations for that VM and “remotely” uses memory via “interconnect”. |
NETWORK | %DRPTX | 1 | Dropped packets transmitted, hardware overworked. Possible cause: very high network utilization. |
NETWORK | %DRPRX | 1 | Dropped packets received, hardware overworked. Possible cause: very high network utilization. |
DISK | GAVG (H) | 25 | Look at “DAVG” and “KAVG” as the sum of both is GAVG. |
DISK | DAVG (H) | 25 | Disk latency most likely to be caused by array. |
DISK | KAVG (H) | 5 | Disk latency caused by the VMkernel, high KAVG usually means queuing. Check “QUED”. |
DISK | QUED (F) | 1 | Queue maxed out. Possibly the queue depth is set too low. Check with the array vendor for the optimal queue depth value. |
DISK | ABRTS/s (K) | 1 | Aborts issued by the guest (VM) because the storage is not responding. For Windows VMs this happens after 60 seconds by default. Can be caused, for instance, by failed paths or an array that is not accepting any I/O for whatever reason. |
DISK | RESETS/s (K) | 1 | The number of commands reset per second. |
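To put the thresholds above to work against a batch capture, a rough sketch like the one below will flag every counter whose peak value crosses its limit. The substrings used to match the CSV column names (and the file name) are assumptions about how the counters are labelled in batch mode, so check them against your own capture before trusting the output.

import pandas as pd

# Thresholds from the table above, keyed by a substring assumed to appear in the
# corresponding batch-mode column name -- verify these against your capture first.
THRESHOLDS = {
    "% Ready": 10,      # CPU %RDY
    "% Swap Wait": 1,   # CPU %SWPWT
}

df = pd.read_csv("esxtopcapture.csv")

for needle, limit in THRESHOLDS.items():
    for col in df.columns:
        if needle not in col:
            continue
        peak = pd.to_numeric(df[col], errors="coerce").max()
        if peak > limit:
            print(f"{col}: peak {peak:.1f} exceeds threshold {limit}")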