Troubleshoot Hyper-V Cluster Node Blue Screen Issue

We had an incident last Friday: a couple of Hyper-V cluster nodes blue-screened and rebooted themselves. With the Windows debugging tools and some knowledge of Failover Clustering, I think I have figured out the cause.

1) Run the Windows debugger (WinDbg, part of Debugging Tools for Windows) and set the symbol path to SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

2) Copy the dump file to your local machine; the dump files are located at \\Server\c$\windows\Minidump\. Open the dump in the debugger and run the following commands to find the crashed process:
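If you collect dumps from several nodes, a small script can grab the newest one for you. The following is a minimal Python sketch, assuming the admin share is reachable from your machine and that the newest `*.dmp` file (by modification time) belongs to the crash you are investigating; the function name `copy_latest_dump` is hypothetical.

```python
import shutil
from pathlib import Path


def copy_latest_dump(dump_dir: str, dest_dir: str) -> Path:
    """Copy the most recently modified *.dmp file from dump_dir into dest_dir."""
    dumps = list(Path(dump_dir).glob("*.dmp"))
    if not dumps:
        raise FileNotFoundError(f"no .dmp files found in {dump_dir}")
    # Assumption: the newest dump is the one from the crash being investigated.
    latest = max(dumps, key=lambda p: p.stat().st_mtime)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's timestamps, handy when matching dumps to log entries.
    return Path(shutil.copy2(latest, dest / latest.name))
```

For example, `copy_latest_dump(r"\\Server\c$\windows\Minidump", r"C:\Dumps")`, where `Server` is the crashed node and `C:\Dumps` is a hypothetical local folder.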

!analyze -v

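The analysis names the netft module. What is netft? It is the Failover Clustering network fault-tolerant driver (netft.sys). Display its details with: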

lmvm netft

According to the output of the following command, the crashed process is rhs.exe (the cluster Resource Hosting Subsystem):

!process fffffa80291d5b30
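The same debugger commands can also be run non-interactively with cdb.exe, the console debugger from Debugging Tools for Windows, using its `-z` (dump file), `-y` (symbol path) and `-c` (initial commands) options. This Python sketch only builds and runs that command line; it assumes `cdb.exe` is on your PATH, and the helper names are hypothetical. The `!process` address is the one from this particular dump, so replace it with the address reported in your own `!analyze -v` output.

```python
import subprocess
from typing import List, Sequence

SYMBOL_PATH = r"SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols"

# Commands used in this walkthrough. The !process address is specific to this dump.
DEFAULT_COMMANDS = ("!analyze -v", "lmvm netft", "!process fffffa80291d5b30")


def build_cdb_args(dump_file: str,
                   commands: Sequence[str] = DEFAULT_COMMANDS,
                   symbol_path: str = SYMBOL_PATH,
                   cdb: str = "cdb.exe") -> List[str]:
    """Build a cdb command line that opens a dump, runs commands, then quits ("q")."""
    script = "; ".join(list(commands) + ["q"])
    return [cdb, "-z", dump_file, "-y", symbol_path, "-c", script]


def run_cdb(dump_file: str, **kwargs) -> str:
    """Run cdb (assumed to be on PATH) and return its console output."""
    result = subprocess.run(build_cdb_args(dump_file, **kwargs),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

Saving the returned text alongside the dump keeps a record of the analysis for later comparison.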

This is expected behavior: the Cluster service is configured to blue-screen and reboot the node when a critical process such as RHS fails. To confirm, run the following command on a cluster node and check the value of HangRecoveryAction:

cluster /cluster:<cluster-name> /prop
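To check the setting from a script, the sketch below pulls HangRecoveryAction out of the `cluster /prop` output. The value meanings in the table reflect our reading of Microsoft's documentation for this property, and the output line layout the regular expression expects is an assumption; verify both against your Windows Server version. The names `HANG_RECOVERY_ACTIONS` and `hang_recovery_action` are hypothetical.

```python
import re
from typing import Optional

# Value meanings as we understand Microsoft's documentation (verify for your OS version).
HANG_RECOVERY_ACTIONS = {
    0: "disabled (no hang detection)",
    1: "log an event in the System event log",
    2: "terminate the Cluster service",
    3: "bug-check (blue-screen) the node",
}

# Assumed layout: the property name followed by whitespace and its decimal value,
# e.g. "...  HangRecoveryAction   3 (0x3)".
_PROP_RE = re.compile(r"\bHangRecoveryAction\s+(\d+)")


def hang_recovery_action(prop_output: str) -> Optional[int]:
    """Return the HangRecoveryAction value found in `cluster /prop` output, or None."""
    match = _PROP_RE.search(prop_output)
    return int(match.group(1)) if match else None
```

You could pass it the captured output of `cluster /cluster:<cluster-name> /prop` and look the result up in `HANG_RECOVERY_ACTIONS`.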

3) Now we know the issue is related to the cluster. Let’s generate the cluster log by running the following command on the node, then copy the log file to your local machine. The file is located at \\Server\c$\Windows\Cluster\Reports\Cluster.log

cluster log /g

4) Let’s see what happened at that time (I use Trace32 to open the log file). The ISO-Images disk was deadlocked for some reason; I confirmed with the network admin that an abrupt network outage happened around that time that day. It is unclear why this happened only to the ISO-Images disk (which is in the Available Storage group) while all the CSV disks were fine. I think the only shared disks in a Hyper-V cluster should be CSVs, so we decided to remove the ISO-Images disk to prevent the issue from happening again.
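Cluster.log can be large, so narrowing it to one resource and a time window around the crash helps. Here is a minimal Python sketch, assuming each line carries a prefix of the form `<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm` (check this against your own log) and that the window is given in GMT, like the log itself; `lines_for_resource` is a hypothetical helper.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Tuple

# Assumed line prefix: "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm " followed by the message.
_TS_RE = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s")


def lines_for_resource(lines: Iterable[str], resource: str,
                       start: datetime, end: datetime) -> Iterator[Tuple[datetime, str]]:
    """Yield (timestamp, line) for lines mentioning `resource` within [start, end] (GMT)."""
    for line in lines:
        m = _TS_RE.search(line)
        if not m or resource not in line:
            continue  # no parsable timestamp, or the line is about another resource
        ts = datetime.strptime(m.group(1), "%Y/%m/%d-%H:%M:%S.%f")
        if start <= ts <= end:
            yield ts, line.rstrip("\n")
```

For example, iterate over the opened Cluster.log file with the resource name ISO-Images and a window of a few minutes around the crash.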

Don’t forget that cluster log timestamps are in GMT, so you need to convert them to your local time.
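Converting by hand is easy to get wrong around daylight-saving changes, so here is a small Python sketch. It assumes the `YYYY/MM/DD-HH:MM:SS.mmm` timestamp form (an assumption about the log layout) and converts from UTC to the machine's local time zone unless you pass another `tzinfo`; `cluster_ts_to_local` is a hypothetical name.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def cluster_ts_to_local(ts: str, local_tz: Optional[tzinfo] = None) -> datetime:
    """Convert a GMT cluster.log timestamp (assumed 'YYYY/MM/DD-HH:MM:SS.mmm') to local time."""
    utc = datetime.strptime(ts, "%Y/%m/%d-%H:%M:%S.%f").replace(tzinfo=timezone.utc)
    # astimezone(None) converts to the local time zone of the machine running the script.
    return utc.astimezone(local_tz)
```

Passing an explicit `tzinfo` is useful when you analyze logs on a machine in a different time zone from the cluster.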

Published by Jackie Chen
