6 Steps For Managing Risk in IT Operations
Written by Mark Farrugia, Senior Microsoft Premier Field Engineer.
BusinessDictionary.com defines Managed Risk as the following:
Identified probability of loss, or exposure to a danger, that has been minimized to an acceptable level through careful planning and implementation of effective countermeasures.
What Exactly Does That Mean?
We all manage risk every day through our ordinary actions. For example, when walking down a staircase, most people manage the risk of falling by holding the handrail. When we drive an automobile, we mitigate the risk by wearing our seatbelts and abiding by the laws that apply to those roads. Both examples involve a possible danger (falling, or getting hurt in a car), and both show how that danger has been minimized to an acceptable level.
So How Does This Pertain To Our Information Technology (IT) Operations?
Quite simply, we need to identify the risks associated with our business IT operations and put in place countermeasures that are effective enough to mitigate those risks. I say enough because there are methodologies and technologies that will allow our operations to reach five nines of uptime, but would the return on investment be worth it? For some IT shops where real-time financial data is mission critical, such as a trading floor, the slightest outage could cost enormous amounts of money. For other organizations, that level of investment in mitigation technologies may not be necessary.
This is where the acceptable level of risk comes into the equation. An organization may choose to cluster its file services, yet still be willing to accept 4 hours of downtime should an outage occur beyond its control. The acceptable risk formula is going to be unique to every operation and person out there. Two businesses may share the same business model, such as two financial institutions, but one may place more importance on email than the other, so the two face different technology risks.
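To make the trade-off concrete, here is a minimal sketch of the arithmetic behind an availability target. It assumes a 365-day year, and the function names are hypothetical, chosen only for illustration. It converts an availability percentage into a yearly downtime budget and back again.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_from_downtime(downtime_minutes: float) -> float:
    """Availability percentage implied by a total yearly downtime in minutes."""
    if not 0 <= downtime_minutes <= MINUTES_PER_YEAR:
        raise ValueError("downtime must fit within one year")
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


# Five nines versus a single accepted 4-hour outage per year.
print(f"{allowed_downtime_minutes(99.999):.2f} minutes per year")
print(f"{availability_from_downtime(4 * 60):.2f} percent")
```

Under these assumptions, five nines leaves a little over five minutes of downtime per year, while one accepted 4-hour outage per year corresponds to roughly 99.95 percent availability. The gap between those two figures is the investment decision described above.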
6 Steps to Help Organizations Deal with Risk
- Identify Risks in Operations
- Analyze and Prioritize
- Plan and Schedule
- Track and Report
- Controlling Risk
- Learn From Risk
The six steps above feed the information needed for a proper risk statement into the four risk lists, known as the Components of Risk:
- Master Risks List – serves as a central repository for all risks
- Top Risks List – helps teams focus on the most important risks for mitigation
- Risks by Services – lists risks associated with core business services end-to-end
- Retired Risk List – serves as a knowledge base repository
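One way to picture how the four lists relate is to treat the master list as the single source of data and derive the other three from it. The sketch below is an illustration under that assumption, not a prescribed design. The class names, the `priority` field (lower number means more important), and the `retired` flag are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Risk:
    name: str
    service: str           # business service the risk belongs to
    priority: int          # hypothetical ranking: 1 is the most important
    retired: bool = False  # retired risks are kept as a knowledge base


@dataclass
class RiskRegister:
    # Master Risks List: the central repository for all risks.
    master: List[Risk] = field(default_factory=list)

    def add(self, risk: Risk) -> None:
        self.master.append(risk)

    def top_risks(self, n: int = 3) -> List[Risk]:
        """Top Risks List: the n most important active risks."""
        active = [r for r in self.master if not r.retired]
        return sorted(active, key=lambda r: r.priority)[:n]

    def risks_by_service(self) -> Dict[str, List[Risk]]:
        """Risks by Services: active risks grouped by business service."""
        grouped: Dict[str, List[Risk]] = {}
        for r in self.master:
            if not r.retired:
                grouped.setdefault(r.service, []).append(r)
        return grouped

    def retired_risks(self) -> List[Risk]:
        """Retired Risk List: risks that are no longer active."""
        return [r for r in self.master if r.retired]
```

Deriving the views from one master list keeps them from drifting apart, although a team could just as reasonably maintain them as separate documents.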
1. Identification of Risks in Operations
A risk has to be clearly defined before it can be managed. A successful risk statement is written in natural language and consists of two parts: the condition and the consequence.
- Condition – a description of an existing state of affairs or attribute that operations feels may result in a loss or reduction in gain
- Consequence – a description of the undesirable state of affairs or attribute that may contribute to the loss or reduction in gain
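The two-part structure can be sketched as a small data type. This is only an illustration: the class name, the sentence template, and the example risk are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskStatement:
    condition: str    # the existing state of affairs
    consequence: str  # the undesirable outcome that may follow

    def __post_init__(self) -> None:
        # A statement missing either part is not a usable risk statement.
        if not self.condition.strip() or not self.consequence.strip():
            raise ValueError("both condition and consequence are required")

    def as_sentence(self) -> str:
        """Render the two parts as one natural-language sentence."""
        return f"{self.condition}, so {self.consequence}."


statement = RiskStatement(
    condition="The file server has a single power supply",
    consequence="a power supply failure may take file services offline",
)
print(statement.as_sentence())
```

Requiring both fields up front means an incomplete risk cannot be recorded in the first place.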
2. Analyze and Prioritize
Risk probability measures how likely it is that the consequence described in the risk statement will actually occur, and each risk is assigned a probability score. The probability must be greater than zero and less than one hundred percent: zero would mean the risk will never happen, and one hundred would mean it is certain to happen.
Prioritizing the risks identified allows the organization to focus on the risks that pose the most threat to their operation.
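As a rough sketch of scoring and ranking, the example below validates the probability range and sorts risks from most to least threatening. The article does not say how threat is measured. This sketch assumes a common approach, exposure = probability × impact, where `impact` is a hypothetical relative cost figure and all risk names and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredRisk:
    name: str
    probability: float  # percent, strictly between 0 and 100
    impact: float       # assumption: relative cost if the risk occurs

    def __post_init__(self) -> None:
        if not 0 < self.probability < 100:
            raise ValueError("probability must be greater than 0 and less than 100")

    @property
    def exposure(self) -> float:
        # Assumed ranking metric, not defined in the article.
        return self.probability / 100 * self.impact


def prioritize(risks: List[ScoredRisk]) -> List[ScoredRisk]:
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


risks = [
    ScoredRisk("File server failure", probability=60, impact=10),
    ScoredRisk("Data center flood", probability=1, impact=500),
    ScoredRisk("Mail outage", probability=20, impact=50),
]
for risk in prioritize(risks):
    print(risk.name, round(risk.exposure, 2))
```

With these made-up numbers, the mail outage ranks first even though the file server failure is more likely, because its assumed impact is much larger.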
3. Plan and Schedule
The planning and scheduling step takes the prioritized risks and converts them into action plans. Planning involves developing detailed strategies and actions for each of the top risks, prioritizing risk actions, and creating an integrated risk management plan.
4. Track and Report
As action plans are implemented, IT Operations tracks the changes to the operating environment and measures how risks are changing. This tracking and reporting of risk mitigation is important to support the next action.
5. Controlling Risk
During this step, individuals carry out activities related to contingency plans because triggers have been reached. Corrective actions are initiated based on risk tracking information.
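The idea of a trigger can be sketched as a threshold check against tracked readings. Everything here is a hypothetical illustration: the `Trigger` fields, the metric names, and the plan text are assumptions, and a real operation may define triggers in terms other than numeric thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trigger:
    metric: str            # name of the tracked measurement
    threshold: float       # value at which the trigger is reached
    contingency_plan: str  # plan to carry out once the trigger is reached


def plans_to_activate(triggers: List[Trigger], readings: Dict[str, float]) -> List[str]:
    """Return the contingency plans whose triggers have been reached."""
    return [
        t.contingency_plan
        for t in triggers
        # A metric with no reading is skipped rather than guessed at.
        if t.metric in readings and readings[t.metric] >= t.threshold
    ]


triggers = [
    Trigger("disk_usage_percent", 90, "Move archives to secondary storage"),
    Trigger("failed_backups_in_a_row", 2, "Switch to the standby backup server"),
]
readings = {"disk_usage_percent": 93, "failed_backups_in_a_row": 0}
print(plans_to_activate(triggers, readings))
```

In this example only the disk usage trigger has been reached, so only its contingency plan is returned.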
6. Learn from Risk
Risk learning should be a continuous activity throughout the entire risk management process and may begin at any time.