Introduction

This article showcases the native packet-sniffing tools in ESX. Specifically, we focus on scenarios where high volumes of data (e.g. network storage traffic) must be sniffed.

If you are simply sniffing management traffic (or something lightweight), you have much more freedom and less risk of running out of space. The focus here is the type of packet capture that must be carefully tuned to collect the desired forensic data while an issue is occurring.

Available Tools

The following are common tools used for sniffing on ESX. Most of them are native (no install required). Try them and choose your favorite.

tcpdump-uw is a native utility available in ESX 4.x and later. It can only sniff vmk's (i.e. vmkernel interfaces, such as those used for NFS or iSCSI). It is excellent for sniffing NFS traffic and is the tool we cover in the examples. More info at VMware KB1031186.

pktcap-uw is a native utility available in ESX 5.5 and later. It can sniff vmk, dvs, VM, etc. This is the latest and greatest and is worth checking out. More info at VMware KB2051814.

vem-pkt is only available if ESX has the Cisco Nexus 1000V VEM vib installed (included with all N1KV editions). This tool may become a thing of the past, considering that 3rd party switches are EOL beyond vSphere 6.5. Even though I run the 1000V, I still prefer the native ESX tools, which can sniff N1KV traffic with no problem.

Choosing a vmk Interface

You may be interested in a particular vmkernel interface for iSCSI or NFS.

Tip: List your vmk's with esxcfg-vmknic -l (via CLI or SSH)

Output

You'll have to be smart about where you place your capture files, because they can quickly consume all space on the ESX filesystem. If you have run out of space on the ESX root filesystem (e.g. by writing to the default of /var/tmp), you should evacuate the host and reboot it. Even though the system may seem fine, you may no longer be getting valid logs.

The following are some common choices to place your output:

  • /var/tmp (ESX filesystem, use with caution)
  • /vmfs/volumes/datastore1 (local datastore, ok)
  • /vmfs/volumes/some-remote-path (remote datastore, ok)
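
To judge whether a location is large enough, it helps to work out the worst case before starting. tcpdump-uw's -C (file size) and -W (file count) options, used in the examples later in this article, define a ring of files, so the maximum footprint is roughly size times count. The following is a minimal sketch of that arithmetic. The function names capture_footprint_mb and fits_in_free_mb are hypothetical helpers, not ESX commands, and the free-space figure is an assumption you supply yourself (for example, read from df -h).

```shell
#!/bin/sh
# Hypothetical helpers (not ESX commands): estimate the worst-case disk
# usage of a rotating capture and compare it with the free space you have.

# capture_footprint_mb SIZE_MB FILE_COUNT
# Prints SIZE_MB * FILE_COUNT, the most space the ring of files can occupy.
capture_footprint_mb() {
    echo $(( $1 * $2 ))
}

# fits_in_free_mb SIZE_MB FILE_COUNT FREE_MB
# Succeeds (exit 0) only if the full ring fits within FREE_MB.
fits_in_free_mb() {
    [ "$(capture_footprint_mb "$1" "$2")" -le "$3" ]
}
```

For instance, capture_footprint_mb 100 5 prints 500, while capture_footprint_mb 100 120 prints 12000, which is why the larger ring belongs on a datastore rather than in /var/tmp.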

SNIFFING SHOULD BE PERFORMED FROM THE DCUI

Although sniffing via SSH works just fine, the preference is to use the Direct Console User Interface (DCUI). This means working either directly at the server with a keyboard and monitor, or through out-of-band connectivity such as iDRAC, iLO, or RSA.

Also, you must be careful to press CTRL + C cleanly when it's time to cancel. If you accidentally move the sniffing session to the background with another key combination, it will keep running until ESX reboots, which could impact your running workloads.

All examples use tcpdump-uw, which is native to ESXi.

Example #1

tcpdump-uw -i vmk3 -s 9014 -C 100M -W 5 -w /var/tmp/esx007.pcap

  • Use the tcpdump-uw sniffer to capture network activity
  • Only collect from the NFS vmkernel interface -i vmk3
  • Create five sniffer trace files in total -W 5
  • Each file will be 100 MB in size -C 100M
  • Sniff jumbo frames -s 9014 (optional, though most interesting frames are small)
  • The older files are automatically replaced until you press CTRL + C to cancel.
  • The files will be saved to /var/tmp on ESX in this example
  • Delivers just a few minutes of traces, depending on I/O

If writing to ESX filesystems (i.e. /var/tmp), don't go higher than -W 10
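
The rule of thumb above can be turned into a small guard that assembles the command line for you. This is only a sketch: build_capture_cmd is a hypothetical helper name, it checks only paths under /var/tmp (other ESX filesystem paths are not covered), and it prints the command rather than running it, so you can review it before starting a capture.

```shell
#!/bin/sh
# Hypothetical helper: print a tcpdump-uw command line, refusing large
# rings on the ESX filesystem. It only prints; it never starts a capture.

# build_capture_cmd IFACE SIZE FILE_COUNT OUTPUT_PATH
# Note: plain POSIX sh has no "local", so these variables are global.
build_capture_cmd() {
    iface=$1 size=$2 count=$3 out=$4
    case "$out" in
        /var/tmp/*)
            # Rule of thumb from this article: at most -W 10 on /var/tmp.
            if [ "$count" -gt 10 ]; then
                echo "refusing: -W $count is too high for /var/tmp" >&2
                return 1
            fi
            ;;
    esac
    echo "tcpdump-uw -i $iface -C $size -W $count -w $out"
}
```

Calling build_capture_cmd vmk3 100M 5 /var/tmp/esx007.pcap prints a command equivalent to Example #1 without the optional -s 9014, whereas asking for -W 120 on /var/tmp returns an error instead of a command.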

Example #2

Warning: This command uses high values, so its output must be saved somewhere with plenty of space, such as a VMFS volume or an NFS ISO share.

In this case, we use the local disk attached to the ESX host.

tcpdump-uw -i vmk3 -C 100M -W 120 -w /vmfs/volumes/Local007/esx007.pcap

  • Saves to local disk on ESX (your path will vary)
  • Captures network activity using default frame size (i.e. no jumbo frames)
  • Only sniffing vmk3 (your choice may vary)
  • Here, we create 120 total sniffer trace files -W 120
  • Delivers up to 1 hour of traces, depending on I/O.

Stopping the Captures

The objective is to stop the capture once the issue has been observed. This can be tricky, considering that the sniffer captures roll over frequently. For hosts with high activity, the time before roll-over may be shorter.

As such, consider pointing to a VMFS volume instead of the ESX local filesystem (e.g. /var/tmp). That way, you can safely increase to -W 120 for a total of 120 files that will be created before the oldest begins to be overwritten.
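
To get a feel for how long you have to react before the evidence is overwritten, you can estimate the time window that a ring of files covers. The sketch below assumes a steady capture rate in whole MB per second, which real traffic will not honor, so treat the result as a rough guide only. The name ring_window_seconds is a hypothetical helper, and the 4 MB/s used afterwards is an illustrative figure, not a measurement.

```shell
#!/bin/sh
# Hypothetical helper: rough time window covered by a ring of capture
# files, assuming a constant capture rate (integer MB per second).

# ring_window_seconds SIZE_MB FILE_COUNT RATE_MB_PER_SEC
ring_window_seconds() {
    if [ "$3" -le 0 ]; then
        echo "rate must be a positive integer" >&2
        return 1
    fi
    # Total ring capacity divided by the rate (integer division).
    echo $(( $1 * $2 / $3 ))
}
```

At an assumed 4 MB/s, ring_window_seconds 100 5 4 prints 125 (about two minutes), while ring_window_seconds 100 120 4 prints 3000 (50 minutes), in line with the "few minutes" and "up to 1 hour" figures given for the two examples.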

Sharing Captures with Support

The examples here show how to save the captures in the commonly accepted Wireshark format known as .pcap. These files are binary, so when transferring them, choose Automatic or Binary mode instead of TEXT/ASCII mode.

Also note that this network traffic is likely in the clear (not encrypted), so be careful about how you handle the files. Speak with your InfoSec team about policies related to such traffic.
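
One way to confirm that a transfer did not alter a capture (for example, through an accidental ASCII-mode copy) is to compare a checksum taken on the ESX host with one taken on the receiving machine. The following is a minimal sketch. pcap_checksum is a hypothetical helper name, and the availability of sha256sum on both ends is an assumption; substitute whichever checksum tool both systems share.

```shell
#!/bin/sh
# Hypothetical helper: print only the SHA-256 hash of a capture file so
# the value can be compared between the ESX host and the destination.
# Assumes sha256sum is installed on the system where this runs.

# pcap_checksum FILE
pcap_checksum() {
    # Reading from stdin keeps the file name out of the output;
    # cut then drops the trailing "-" placeholder that sha256sum prints.
    sha256sum < "$1" | cut -d ' ' -f 1
}
```

Run it once on each side, for example pcap_checksum /var/tmp/esx007.pcap on the host, and compare the two strings. If they differ, the file changed in transit and should be copied again in binary mode.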

With that said, these traces are hugely beneficial and sometimes the only way to answer the toughest questions in your data center.