Identify cause of performance issue with Hyper-V Cluster and Shared Storage

Do you have a Hyper-V cluster with multiple nodes and all of your VMs live on shared storage?  Do you often have users complain that the virtual servers are “slow”? 

When you check Task Manager inside the VM, you don’t see any obvious issues. You then open Resource Monitor, go to the Disk tab, and see “Active Time” sitting at 100%. The VM feels slow even though Task Manager otherwise says it’s idle. You check your shared storage console, and the latency, cache hit rate, and other metrics all look normal, with nothing obvious to indicate a performance issue.

How do you determine exactly what is causing the performance woes? I found myself in this exact situation and found a solution. I figured that if it helped me, it can help you.

The Solution:

Run the Get-ClusterDiskPerVMUsageMetrics.ps1 script below on one of the Hyper-V hosts in your cluster. Any host will do, so long as it’s part of the cluster.

The script connects to each Hyper-V host and reads the “Read Bytes/sec” and “Write Bytes/sec” performance counters for every virtual storage device. It then combines the data for all VMs on all hosts and adds a third “TotalBytes/s” column. Finally, it displays a grid view of every individual VHDX file with its current read/write bytes per second. If one or more VMs stay exceedingly high for a long period of time, those VMs are almost certainly sucking up all of your disk IO and starving the other VMs, which results in the user complaints of slowness.
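Because the script takes a single sample per run, one way to confirm that a VHDX stays high "for a long period of time" is to average several samples. The sketch below is only an illustration: `Get-AverageVhdxThroughput` is a hypothetical helper name, not part of the scripts in this post, and the interval and sample count in the commented example are arbitrary assumptions.

```powershell
function Get-AverageVhdxThroughput {
    # Hypothetical helper: averages read + write bytes/sec per VHDX over several sample sets.
    param(
        [Parameter(Mandatory)] [object[]]$CounterSample,  # objects exposing InstanceName and CookedValue
        [Parameter(Mandatory)] [int]$SampleCount          # number of sample sets that were collected
    )

    $CounterSample |
        Where-Object { $_.InstanceName -match 'vhdx' } |
        Group-Object -Property InstanceName |
        ForEach-Object {
            # Each group holds the read and write values from every sample set.
            $sum = ($_.Group | Measure-Object -Property CookedValue -Sum).Sum
            [PSCustomObject]@{
                VMInstance     = $_.Name
                'AvgTotalMB/s' = [math]::Round($sum / $SampleCount / 1MB, 1)
            }
        } |
        Sort-Object -Property 'AvgTotalMB/s' -Descending
}

# Example on a single Hyper-V host: six samples, five seconds apart (about 30 seconds).
# $sets = Get-Counter -Counter @(
#     '\Hyper-V Virtual Storage Device(*)\Read Bytes/sec',
#     '\Hyper-V Virtual Storage Device(*)\Write Bytes/sec'
# ) -SampleInterval 5 -MaxSamples 6
# Get-AverageVhdxThroughput -CounterSample $sets.CounterSamples -SampleCount $sets.Count
```

Keeping the aggregation in a function that accepts plain objects means it can be tried with made-up sample data before pointing it at a real host.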

Crucially, at the end the script outputs the sum of TotalBytes/s across all VMs, in MB/s. This becomes your one-stop-shop value for determining whether high read/write throughput on your shared storage is the cause of your performance issues. In our environment, anything up to 200 MB/s is generally OK. Once it gets over 300 MB/s, people start to complain, and over 400 MB/s, VMs nearly grind to a halt when used interactively.
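If you want the script to flag the total for you, the thresholds above could be turned into a small rating function. This is a minimal sketch: `Get-StorageLoadRating` and its labels are hypothetical, the numbers are the ones from our environment and will likely differ on your storage, and the "Elevated" band between 200 and 300 MB/s is my own assumption.

```powershell
function Get-StorageLoadRating {
    # Hypothetical helper: maps a cluster-wide MB/s total to a rough rating.
    # Thresholds come from one environment; tune them for your own storage.
    param(
        [Parameter(Mandatory)] [double]$TotalMBPerSec
    )

    if     ($TotalMBPerSec -gt 400) { 'Critical' }   # VMs nearly grind to a halt
    elseif ($TotalMBPerSec -gt 300) { 'High' }       # users start to complain
    elseif ($TotalMBPerSec -gt 200) { 'Elevated' }   # assumed band between OK and High
    else                            { 'OK' }         # generally fine
}

# Example: rate the $Total value computed by Get-ClusterDiskPerVMUsageMetrics.ps1
# Get-StorageLoadRating -TotalMBPerSec $Total
```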


Identify what files on a specific VM are using all the disk IO

Now that you know one or more VMs are using all the disk IO on your SAN, how do you identify which specific files are being read or written inside those VMs? You could log into the VMs, open Resource Monitor, and check the Disk tab, but that’s not always a readily available option. An alternative solution is below.

This leverages the amazing NirSoft “FileActivityWatch” tool (https://www.nirsoft.net/utils/file_activity_watch.html) to output a list of the most accessed files inside the VM. The script for that, Get-VMPerFileUsage.ps1, is below. This version is intended to be run from an RMM tool, so modify it for your use case as needed.

Get-ClusterDiskPerVMUsageMetrics.ps1

$HyperVHosts = (Get-ClusterNode).Name

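# Stop here if no cluster nodes were found ('return' exits the script; 'break' outside a loop can also terminate the caller)
if (-not $HyperVHosts) { Write-Warning "Clustered Hyper-V hosts not found. Is this script running on a clustered Hyper-V host?"; return }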

$allCounterData = @()

foreach ($HyperVHost in $HyperVHosts) {
    $counters = Get-Counter -ComputerName $HyperVHost -Counter @(
        "\Hyper-V Virtual Storage Device(*)\Read Bytes/sec",
        "\Hyper-V Virtual Storage Device(*)\Write Bytes/sec"
    )

    ForEach ($sample in $counters.CounterSamples) {
        $allCounterData += [PSCustomObject]@{
            HyperVHost   = $HyperVHost
            InstanceName = $sample.InstanceName
            Path         = $sample.Path
            CounterName  = if ($sample.Path -like '*Read Bytes/sec') { 'Read' } else { 'Write' }
            BytesPerSec  = $sample.CookedValue
        }
    }
}

$results = $allCounterData | Group-Object HyperVHost, InstanceName | ForEach-Object {
    
    # Each group holds the read and write samples for one VHDX on one host
    $group = $_.Group
    $hyperVHost = $group[0].HyperVHost
    $instanceName = $group[0].InstanceName

    if($instancename -match 'vhdx')
    {
        # Use [int64] so values above ~2 GB/s do not overflow a 32-bit [int]
        [int64]$readBytes = [math]::Round((($group | Where-Object { $_.CounterName -eq 'Read' }).BytesPerSec | Measure-Object -Sum | Select-Object -ExpandProperty Sum), 0)
        [int64]$writeBytes = [math]::Round((($group | Where-Object { $_.CounterName -eq 'Write' }).BytesPerSec | Measure-Object -Sum | Select-Object -ExpandProperty Sum), 0)
        [int64]$totalBytes = $readBytes + $writeBytes

        [PSCustomObject]@{
            HyperVHost  = $hyperVHost
            VMInstance  = $instanceName
            'ReadBytes/s'   = $ReadBytes
            'WriteBytes/s'  = $writebytes
            'TotalBytes/s'  = $totalbytes
        }
    }
}

$results | Sort-Object 'TotalBytes/s' -Descending | Out-GridView

# Sum the per-VHDX totals and convert bytes/s to MB/s
$Total = [math]::Round(($results | Measure-Object -Property 'TotalBytes/s' -Sum).Sum / 1MB, 0)

$HyperVHostsString = $HyperVHosts -join ";"

$ts = [string](get-date)
write-host "Data Collection Date: $ts" -ForegroundColor Green
write-host "[$HyperVHostsString] processed $Total MB/s" -ForegroundColor yellow

Get-VMPerFileUsage.ps1

# Do what you need to do to ensure the FileActivityWatch.exe program from Nirsoft is in c:\windows\temp before running this script
# https://www.nirsoft.net/utils/file_activity_watch.html

$LogFile = "c:\windows\temp\fileactivity.csv"
if (Test-Path $logFile) { Remove-Item $LogFile }

c:\windows\temp\FileActivityWatch.exe /scomma $LogFile /capturetime 10000

Start-sleep -Seconds 12

Import-Csv $LogFile | ForEach-Object {

    $ReadBytes  = [int64]$_.'Read Bytes'
    $WriteBytes = [int64]$_.'Write Bytes'
    $TotalBytes = $ReadBytes + $WriteBytes

    [PSCustomObject]@{
        FileName       = $_.'Filename'
        ProcessName    = $_.'Process Name'
        ProcessID      = $_.'Process ID'
        'ReadBytes/s'  = ("{0:n0}" -f $ReadBytes) + " bytes/s"
        'WriteBytes/s' = ("{0:n0}" -f $WriteBytes) + " bytes/s"
        'TotalBytes/s' = ("{0:n0}" -f $TotalBytes) + " bytes/s"
        totalbytesraw  = $TotalBytes   # numeric copy used only for sorting
    }
} | Sort-Object totalbytesraw -Descending |
    Select-Object -First 25 -Property * -ExcludeProperty totalbytesraw |
    Format-List


The output lists the 25 busiest files along with the process touching each one. In the example run, several ISOs were being copied on the C: drive during the 10-second interval the command collected data from.

You are now armed with the information you need to quiet the VM disk utilization and stop your users from complaining that the VMs are “slow”.
