One of Microsoft's big marketing claims I've heard several times is that Live Migration isn't that important, because customers don't change how they handle hardware work even when they have Live Migration. I'll cover in depth why this is flawed thinking for an enterprise in a future blog entry.
Along comes a critical use case this past week. MS08-067 came out and threw most companies I know of into serious chaos while they rolled the patch out ASAP. This one affects every Windows OS, including Server Core, so anyone running Hyper-V is affected right now. Let's walk through deploying it to 120 Hyper-V hosts with Quick Migration (which causes a service interruption) as fast as humanly possible, with business buy-off to do it ASAP outside of maintenance zones. Let's also assume we are talking about a patch that ONLY affects virtualization hosts (I know, I know, not realistic with Hyper-V; bear with me).
Hyper-V Scenario with Quick Migration:
Assumptions for the setup:
- The standard business day runs from 6am to 10pm, so with the business units' agreement we can apply the patch from 10pm to 6am each night, which gives us 8 hours of work.
- Applying the patch takes 30 mins, including reboot and checkout time. It has to be applied to both the primary and the secondary host, since any real enterprise has HA set up and configured; it's "Defined as Free by Microsoft," right?
- Each Hyper-V host runs 20 servers, and there are 120 host pairs.
- There's a fail-over host available for each of these 120 hosts (let's not talk about the amazingly wasted resources). This is being generous to Microsoft, since we fail over only once per cluster.
- Each host takes about 30 mins to "Quick Migrate" all 20 server virtual machines, and one person can handle 4 hosts at a time without causing other unplanned outages.
- It's all hands on deck for part of our team, with some folks staying awake during normal work hours for support. Let's say, reasonably, that 6 people are working on this.
6 people * 4 hosts per person per hour gives us roughly 24 hosts updated each hour. 240 hosts (120 pairs) divided by 24 gives us 10 hours to get through all of these migrations at a rush, with patch start times staggered by about 7 minutes for each server, which is not unreasonable considering console connect and login times. That also assumes every person executes perfectly. Ten hours is more than the entire 8-hour overnight window, so even this best case spills into a second night.
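The back-of-the-envelope math above can be checked with a short Python sketch. It only encodes the figures from the assumptions (240 hosts, 6 people, 4 hosts per person per hour, an 8-hour window) and rounds up to whole overnight windows. The function name is hypothetical, and the flat per-hour rate is a simplification that ignores the staggered start times.

```python
import math

# Figures from the assumptions above; adjust them for your own environment.
HOST_PAIRS = 120
HOSTS = HOST_PAIRS * 2           # primary + fail-over host in every pair
ENGINEERS = 6
HOSTS_PER_ENGINEER_PER_HOUR = 4  # flat rate, ignores staggered starts
WINDOW_HOURS = 8                 # 10pm to 6am change window


def quick_migration_estimate(hosts, engineers, rate, window_hours):
    """Hypothetical helper: return (total work hours, overnight windows needed)."""
    hosts_per_hour = engineers * rate
    work_hours = hosts / hosts_per_hour
    # Any leftover work rolls into another full overnight window.
    nights = math.ceil(work_hours / window_hours)
    return work_hours, nights


hours, nights = quick_migration_estimate(
    HOSTS, ENGINEERS, HOSTS_PER_ENGINEER_PER_HOUR, WINDOW_HOURS
)
print(f"{hours:.1f} hours of work, {nights} overnight windows")
# prints "10.0 hours of work, 2 overnight windows"
```

Changing any single assumption (more people, fewer VMs per host) only moves the total hours; as long as the result exceeds the window, the patch still can't finish in one night.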
This doesn't take into account the problems it causes for the business units that depend on your services:
- Apps that don't handle a Quick Migration well and fail checkout.
- Hit to your team's morale.
- Hit to your team's reputation for using this Virtualization Solution.
- People aren't perfect and make mistakes, and patches don't always apply cleanly.
VMware Scenario with DRS & VMotion:
- Put the host that needs the patch into Maintenance Mode. If the cluster is large enough, do two at the same time.
- Apply the patch to the Host and reboot it.
- Check the host out and release it for use by taking it out of Maintenance Mode.
- Repeat until every Host is finished.
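The four steps above amount to a rolling update. The Python sketch below is a toy simulation of that loop, not VMware's API: Host, evacuate and rolling_patch are hypothetical names, and evacuate is only a crude stand-in for what DRS does with VMotion when a host enters Maintenance Mode. It illustrates the property that matters here: VMs move between hosts but are never powered off, and batch_size caps how many hosts are out of service at once.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of an ESX host and the VMs running on it."""
    name: str
    vms: list = field(default_factory=list)
    patched: bool = False
    in_maintenance: bool = False


def evacuate(host, cluster):
    """Stand-in for DRS: move each VM to the least-loaded active host."""
    targets = [h for h in cluster if not h.in_maintenance]
    for vm in host.vms:
        min(targets, key=lambda h: len(h.vms)).vms.append(vm)
    host.vms = []


def rolling_patch(cluster, batch_size=1):
    """Patch every host, batch_size at a time; return the order they finished."""
    # At least one host must stay active to receive the migrated VMs.
    if not 1 <= batch_size < len(cluster):
        raise ValueError("cluster too small for this batch size")
    finished = []
    pending = [h for h in cluster if not h.patched]
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        for host in batch:              # 1. enter Maintenance Mode
            host.in_maintenance = True
        for host in batch:
            evacuate(host, cluster)     #    (running VMs move, no outage)
            host.patched = True         # 2. apply the patch and reboot
        for host in batch:              # 3. check out, exit Maintenance Mode
            host.in_maintenance = False
            finished.append(host.name)
    return finished                     # 4. loop repeats until all are done


# Hypothetical four-host cluster with 30 VMs per host, patched two at a time.
cluster = [Host(f"esx{i:02d}", vms=[f"vm{i}-{j}" for j in range(30)]) for i in range(4)]
print(rolling_patch(cluster, batch_size=2))
# prints "['esx00', 'esx01', 'esx02', 'esx03']"
```

The sketch leaves the VMs unevenly spread when it finishes; in a real cluster DRS would rebalance them afterwards, which is not modeled here.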
I have been able to do a full, rushed patch deployment like this in my environment, with an average of 30 servers per host, in about 6 hours by myself.
We would start applying this patch immediately upon notification, since VMotion does not cause a network outage or service interruption. That keeps the window of potential infection very small, because I don't wait for a maintenance zone; the update starts on the hosts right away.
So the question for a real enterprise is: how much is this worth? For me it's pretty obviously worth it. No downtime. No service impact. Just a continuously available service for my clients, who don't have to care about the latest patch.