The IT Back Out Plan: How NOT to get caught with your pants down

Written by Mark Farrugia, Senior Microsoft Premier Field Engineer.

Photo Credit: zirconicusso

You have all seen it a million times on the silver screen: our hero is caught, trapped, or facing impossible odds, and all of a sudden makes a miraculous escape. It seems that the whole time, while I would have been saying my prayers, our hero had a back out plan to get out of that particular jam and go on to save the day. I.T. operations is pretty much the same. You have to become the Indiana Jones of operations in order to run a successful shop. I am not saying to go and search for gold in your servers, but rather that proper planning is the key to making a successful change in your environment. I alluded to this topic briefly in one of my other posts, titled “6 Steps For Managing Risk in IT Operations”. Today I want to expand on it.

I have encountered problems at client sites which did not end well because someone did not have a proper back out plan prepared. One example: a client rolled out a hotfix for a critical portion of the operating system, and the hotfix ended up revealing a corruption in their image that had lain dormant, unnoticed, for years. The client took an extended period of time to react to the issue because testing had not shown any problems in the lab, only in production. Ideally they would have reacted more quickly and enacted the back out plan, but alas they did not have one. They encountered unnecessary grief because no one had created and tested a proper recovery plan.

What is a Back out Plan?

The concept of the back out plan is very simple: reverse the changes introduced as quickly as possible, with the least amount of impact possible. The back out plan should be documented as part of any change request, and should include the contact information for every party needed to act on the plan. Should the panic button be hit, a proper communications plan also needs to be acted upon to keep all stakeholders apprised of the issue.
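To make the idea of reversing a change concrete, here is a minimal Python sketch. It assumes each step of a change can be paired with an action that undoes it; the function name `apply_with_back_out`, the step names, and the health check are all hypothetical and only for illustration, not part of any Microsoft tooling.

```python
from typing import Callable, List, Tuple

# Hypothetical representation of a change step: (name, apply action, undo action).
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def apply_with_back_out(steps: List[Step], healthy: Callable[[], bool]) -> bool:
    """Apply steps in order; if anything fails, undo completed steps in reverse."""
    done: List[Step] = []
    try:
        for step in steps:
            _name, apply, _undo = step
            apply()
            done.append(step)  # only steps that finished are recorded for undo
        if not healthy():
            raise RuntimeError("post-change health check failed")
        return True
    except Exception as err:
        print(f"Change failed ({err}); backing out")
        # Undo in reverse order so later steps are reversed before earlier ones.
        for name, _apply, undo in reversed(done):
            undo()
            print(f"  undone: {name}")
        return False


# Illustrative run: the actions only record what happened, and the
# health check is forced to fail so the back out path is exercised.
log: List[str] = []
steps: List[Step] = [
    ("install hotfix", lambda: log.append("install"), lambda: log.append("uninstall")),
    ("restart service", lambda: log.append("restart"), lambda: log.append("restart again")),
]
ok = apply_with_back_out(steps, healthy=lambda: False)
```

In this run `ok` is `False` and `log` ends with the undo actions in reverse order. A real back out is rarely this tidy, since a step that fails halfway may need manual cleanup, which is one reason the plan should be written down and tested beforehand.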

The back out plan is the last thing any I.T. administrator wants to enact, as they then have to account to management for why the proposed change failed, which is usually the result of improper (or non-existent) planning and testing. It’s also important to note that a failed change is not always a bad thing if everyone involved, including management and the organization as a whole, learns from the mistake and improves processes and procedures as a result.

Elements of a Good Back Out Plan

So, to summarize, any change introduced into a production environment should have a properly documented back out plan. A good back out plan will contain some or all of the following:

  • The change request manager to be contacted upon a failed change
  • A step-by-step guide to returning the environment to a working state
  • The contact details of all parties needed to get the environment back up and running
  • A communications plan to keep all involved informed of the progress of the change and the time to recovery
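The elements above could be captured in a simple record that is checked before a change request is approved. The following is only a sketch: the class name `BackOutPlan`, its field names, and the sample values are hypothetical, and treating all four elements as required is a stricter assumption than the "some or all" wording above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BackOutPlan:
    """Hypothetical record of the four elements listed above."""
    change_manager: str = ""                                 # who to contact on a failed change
    recovery_steps: List[str] = field(default_factory=list)  # ordered steps back to a working state
    contacts: Dict[str, str] = field(default_factory=dict)   # role -> contact details
    communications_plan: str = ""                            # how stakeholders are kept informed

    def missing_elements(self) -> List[str]:
        """Return the names of the elements that have not been filled in."""
        checks = {
            "change request manager": bool(self.change_manager.strip()),
            "recovery steps": len(self.recovery_steps) > 0,
            "contact details": len(self.contacts) > 0,
            "communications plan": bool(self.communications_plan.strip()),
        }
        return [name for name, present in checks.items() if not present]


# Illustrative plan with placeholder values; contacts and communications are left out.
plan = BackOutPlan(
    change_manager="Change manager on call",
    recovery_steps=["Uninstall the hotfix", "Restart the affected service"],
)
missing = plan.missing_elements()
print("Missing from the back out plan:", ", ".join(missing) or "nothing")
```

A check like this only confirms that the plan has been written down. It says nothing about whether the recovery steps actually work, which still has to be established by testing them.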

More information on constructing a proper back out plan can be found in the Planning Phase of the Microsoft Operations Framework documentation, along with the following resource: Best Practices for Change Management.