Improve your Hyper-V Virtual Availability - Live Migrate VMs on Shutdown

Tags: Clustering, Hyper-V, PowerShell

Hyper-V clustering is a pretty rock-solid thing, and Live Migration (introduced, as we all know, with Server 2008 R2) is virtually identical to VMware's long-available VMotion technology - pick up a running VM and move it to another host in the cluster without users noticing. Generally speaking, you might see a small hiccup - one ping lost as the machine stops on one host and starts on another:

Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Request timed out
Reply from 10.67.1.141: bytes=32 time=133ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127

But if you shut down a cluster host, say, because you're deploying a Windows update, or a new version of a backup or monitoring client, the situation is different. Windows will use Quick Migration to move the virtual machines from one host to another - and Quick Migration is nothing like VMotion and Live Migration.

Instead of copying the VM memory and processor state across the network, the virtual machine is saved (to your SAN) on one host, then restored from that saved state on another. The difference is obvious:

Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Request timed out
Reply from 10.67.1.141: bytes=32 time=133ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127
Reply from 10.67.1.141: bytes=32 time<1ms TTL=127

That save and restore process can take anywhere from 25 to 90 seconds, during which time the VM is off the network.

Your remote desktop session? Dropped.

Your Outlook client? Offline.

Your open files on the file server? Connection lost, better hope you can save elsewhere.

You'd think there would be a better way. And there is - PowerShell.

Twelve lines of PowerShell (excluding white space) and Group Policy are all you need to take all the virtual machines on your host and distribute them across the cluster. The script works in four steps.

First, we get the local computer name and use it to suspend (pause) this node in the cluster. This prevents it from taking over other cluster resources.

Then, we get a list of all the other nodes in the cluster with a state of "Up", meaning that they will be able to accept live migration requests. We also need a counter variable ($i) - we'll use this to keep track of the host to which we will move a VM.

Next, we get a list of all the Virtual Machine resource groups that are currently hosted by the server we're shutting down.

Finally, we cycle through the list, moving virtual machines to each of the other hosts in order. Once we're done, we resume the node (note that because this is intended to run as a shutdown script, resuming presents very little risk of having groups moved TO this node in the short time between this script finishing and the host restarting).
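The four steps above can be sketched roughly as follows. This is a reconstruction from the description, not the original 12-line script, and it makes a few assumptions: the FailoverClusters module is available (Server 2008 R2 or later), the script runs with cluster administrator rights, and Move-ClusterVirtualMachineRole performs a live migration by default on your version. The function names Get-EvacuationPlan and Invoke-HostEvacuation are made up for illustration.

```powershell
# Sketch only: reconstructed from the prose, not the author's original script.

function Get-EvacuationPlan {
    # Hypothetical helper: pairs each VM group with a target node, round-robin.
    param(
        [string[]]$VmGroups,
        [string[]]$TargetNodes
    )
    $i = 0   # index of the node that receives the next VM
    foreach ($vm in $VmGroups) {
        New-Object PSObject -Property @{ VM = $vm; Node = $TargetNodes[$i] }
        $i = ($i + 1) % $TargetNodes.Count   # wrap around to the first node
    }
}

function Invoke-HostEvacuation {
    Import-Module FailoverClusters -ErrorAction Stop
    $computer = $env:COMPUTERNAME

    # Step 1: pause this node so it does not take over other cluster resources.
    Suspend-ClusterNode -Name $computer | Out-Null

    # Step 2: names of the other nodes that are Up.
    $nodes = @(Get-ClusterNode |
        Where-Object { $_.Name -ne $computer -and $_.State -eq 'Up' } |
        ForEach-Object { $_.Name })

    # Step 3: names of groups on this node that contain a Virtual Machine resource.
    $groups = @(Get-ClusterNode -Name $computer | Get-ClusterGroup |
        Where-Object { $_ | Get-ClusterResource |
            Where-Object { $_.ResourceType -like 'Virtual Machine' } } |
        ForEach-Object { $_.Name })

    # Step 4: move each VM to the next node in turn (skipped if there is nothing to do).
    if ($nodes.Count -gt 0 -and $groups.Count -gt 0) {
        foreach ($move in Get-EvacuationPlan -VmGroups $groups -TargetNodes $nodes) {
            Move-ClusterVirtualMachineRole -Name $move.VM -Node $move.Node
        }
    }

    Resume-ClusterNode -Name $computer | Out-Null
}

# When deploying as the shutdown script, uncomment the next line:
# Invoke-HostEvacuation
```

Keeping the round-robin logic in its own function means it can be checked on any machine with made-up VM and node names, without touching a cluster.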

Having written the script, we save it to a known location on each cluster node (perhaps C:\Scripts\Evac-VMs.ps1). At this point you have a choice to make; there are two options:

  1. Sign the script (you'll need a Code Signing certificate and private key, then: $cert = @(gci cert:\CurrentUser\My -CodeSigningCert)[0]; Set-AuthenticodeSignature Evac-VMs.ps1 $cert);
  2. Set the script execution policy to Unrestricted (Set-ExecutionPolicy -ExecutionPolicy Unrestricted)

I strongly recommend option 1 for security, but in a lab or low security environment (i.e. you WANT your hosts to be compromised) option 2 might be acceptable.

Finally, you need to configure Group Policy or Local Policy for the host to have a Shutdown Script. You'll find these settings under Computer Configuration > Windows Settings > Scripts (Startup/Shutdown).

On the PowerShell Scripts tab (not the default Scripts tab), set PowerShell scripts to run first, so that the live migrations are the first tasks executed during shutdown.

Then add a new script and set the script path to your saved file.

It's worth noting that scripts executed by Group Policy do not need to be signed - they bypass the script execution policy settings. Nevertheless, you're going to want to sign it so that:

  1. If someone changes the script you will know when you run it manually;
  2. You can test the script;
  3. You can use it for reasons other than shutting down a host.

That's all that needs to be done - now it's testing time. I've attached the (unsigned) script below so you can just save it, sign it and test it.

Download the script.
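One way to put a number on the test: leave a continuous ping running against one of the VMs from another machine while the host shuts down, save the output, and count the longest run of timeouts. A minimal sketch, assuming the ping output has been saved as text; the function name Get-LongestOutage is made up for illustration.

```powershell
function Get-LongestOutage {
    # Hypothetical helper: returns the longest run of consecutive
    # "Request timed out" lines in saved ping output.
    param([string[]]$PingOutput)
    $longest = 0
    $current = 0
    foreach ($line in $PingOutput) {
        if ($line -match 'Request timed out') {
            $current++
            if ($current -gt $longest) { $longest = $current }
        } elseif ($line -match 'Reply from') {
            $current = 0   # a reply ends the current outage
        }
    }
    $longest
}
```

For example, Get-LongestOutage -PingOutput (Get-Content .\ping.log), where ping.log is whatever file you saved the output to. Going by the captures earlier in this post, a live migration should come back with 1 and a quick migration with something closer to 18.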
