Windows Azure VM downtime due to Host and Guest OS updates, and how to manage it in a multi-instance Windows Azure application

I have seen concerns about Azure VM downtime from Windows Azure users who run 2 or more instances to meet the 99.95% SLA. The specific concerns relate to VM downtime when a Guest or Host OS update is scheduled.

So let's consider the following 3 scenarios:

Scenario 1:

  • If you have only 1 instance, then this instance will be down during the update and will be ready again after the update is finished. There is nothing you can do about it.

Scenario 2:

  • If you have 2 instances, then only 1 instance will be down at a given time, and you will be running at half capacity during the update process. Even so, the second VM could go down for patching while the first VM is still getting ready, so there may be a very small time slice when both VMs are not ready to serve requests.

Scenario 3:

  • If you have many more instances, then only 1 instance is taken down for patching at a given time as well. However, during the update process one machine may be going down while another is still coming up, so having 10 machines in total does not mean that you are sure to have 9 machines ready at any given time.

 

Note: In Scenarios 2 and 3 you have some control over when and how your instances are updated while the Guest OS is being updated. In these scenarios, the concept of upgrade domains is used when the Guest OS update is performed.
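
To make the capacity reasoning in the scenarios concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how the Azure fabric is actually implemented: it assumes instances are spread round-robin across upgrade domains, that exactly one upgrade domain is taken down at a time, and that the number of upgrade domains defaults to 5. The function names are hypothetical. It also ignores the brief overlap described above, where one instance is still coming up while the next is going down, so real availability can dip below these numbers for a short time.

```python
def assign_upgrade_domains(instance_count, upgrade_domain_count=5):
    """Spread instances round-robin over upgrade domains (assumed placement)."""
    if instance_count < 1 or upgrade_domain_count < 1:
        raise ValueError("need at least 1 instance and 1 upgrade domain")
    # There can never be more non-empty domains than instances.
    used_domains = min(instance_count, upgrade_domain_count)
    domains = [[] for _ in range(used_domains)]
    for i in range(instance_count):
        domains[i % used_domains].append(f"instance_{i}")
    return domains


def worst_case_available(instance_count, upgrade_domain_count=5):
    """Instances still running while the largest upgrade domain is down."""
    domains = assign_upgrade_domains(instance_count, upgrade_domain_count)
    largest = max(len(d) for d in domains)
    return instance_count - largest


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, "instances ->", worst_case_available(n), "available during update")
```

Under these assumptions, a single instance leaves nothing running during the update, 2 instances leave 1, and 10 instances over 5 upgrade domains leave 8, because each domain then holds 2 instances. Spreading the same 10 instances over 10 upgrade domains would leave 9.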

 

Guest OS Update (the OS in your Azure VM):

To master the art of setting upgrade domains and fault domains, I suggest you read the blog below and architect your multi-instance Windows Azure application after digesting this info:

 

Host OS Update (the OS on which your Azure VM is running):

- You don’t have any control over when and how the Host OS will be updated.

 

In summary, you can control the timing of your own VM (web/worker role) OS updates to a certain extent using the upgrade domain/fault domain concept, but you cannot control when the Host OS is updated.

 

Now you may ask: how often does the Host OS usually get updated? Monthly, weekly, or daily? Also, does it tend to be scheduled during off-peak or night hours in the time zone where the datacenter being updated is located, or can it happen at any time during the day?

  • Based on the historic schedule located at https://msdn.microsoft.com/en-us/library/ee924680.aspx you can see that it happens almost monthly, and it can happen at any time during the day. In fact, each datacenter takes multiple days to walk all the upgrade domains, so any given tenant within the datacenter can be upgraded at any time over the course of a couple of days. Most of the time the Host OS update depends on security fixes, so if the security fixes are not applicable to the Host OS, the interval between Host OS updates can be longer than a month.