Availability—Looking Behind the Numbers

Office 365 Product Manager Kumar Venkateswar discussed many of Office 365’s high availability features and described our approach to Service Availability in Office 365. With cloud service availability in the news and very important to our customers, I knew that those choosing and using cloud services would want to learn more, so I drew from the Office 365 service documents and the Google Apps Service Level Agreement to create this post. I found that while both Microsoft Office 365 and Google Apps strive for 99.9% availability, how they approach it differs considerably from a customer’s point of view. In fact, there are seven important criteria in considering cloud service availability.

1. Percentage of users affected by an issue
The Google Apps service level agreement dictates whether there is a service issue. That is, the problem must affect at least 5% of the client’s users for Google to consider it to be an issue. I wonder how many of Google Apps' issues don’t become part of Google’s availability calculation because they aren’t affecting more than 5% of a client’s users. Is this practice a way for Google to conveniently hide its scheduled maintenance downtime?

Through Microsoft’s service level agreement, if one user is affected there is an issue for the client. This holds true for Office 365 and for each of Microsoft’s online services. In addition, Microsoft explicitly states when it schedules maintenance for Office 365. Customers appreciate being arbiters of what the issues are, and appreciate knowing when maintenance downtime will occur!

2. Which customers are used in calculating availability
Google includes its consumer email service users in its availability calculation for Google Apps for Business Gmail users, making the Gmail availability figure higher than it would otherwise be.

Gmail Availability % = (time Gmail available to consumer users + time Gmail available to business users) / total time period X 100, with each term weighted by that group's share of all Gmail users

Exchange Online Availability % = (time Exchange Online is available to its customers) / total time period X 100

While Google Apps has 40 million users, the seven-year-old Gmail service had 260 million consumer users as of October 2011. Google is using 300 million users as a basis to calculate availability for a service which has 40 million users! If consumer Gmail were available, while enterprise Gmail and Exchange Online were unavailable for 30 minutes in a 724-hour month, the math would be:

Gmail Availability % = ((724 X 260)/300 + (723.5 X 40)/300) / 724 X 100 = 99.99%

Exchange Online Availability % = 723.5 / 724 X 100 = 99.93%

The Gmail availability would be identical to that for Exchange Online, at 99.93%, if Google didn’t spread the downtime across all Gmail users, consumer and business alike. Google plays with the math to make their availability look higher!
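The arithmetic above can be checked with a short sketch. It uses only the figures quoted in this post (260 million consumer users, 40 million business users, a 724-hour month, a 30-minute outage); the function name blended_availability and the user-weighted averaging are my own illustration of the calculation described here, not a formula published by either company.

```python
def blended_availability(total_hours, groups):
    """Return user-weighted availability as a percentage.

    groups is a list of (users_in_millions, hours_available) pairs.
    Hypothetical helper for illustration only.
    """
    total_users = sum(users for users, _ in groups)
    weighted_hours = sum(users * hours for users, hours in groups)
    return weighted_hours / (total_users * total_hours) * 100


MONTH_HOURS = 724    # month length used in the example above
OUTAGE_HOURS = 0.5   # 30-minute outage for business users only

# Consumer users see no outage; business users lose 30 minutes.
blended = blended_availability(
    MONTH_HOURS,
    [(260, MONTH_HOURS), (40, MONTH_HOURS - OUTAGE_HOURS)],
)

# Counting only the 40 million business users.
business_only = blended_availability(
    MONTH_HOURS,
    [(40, MONTH_HOURS - OUTAGE_HOURS)],
)

print(f"Blended: {blended:.2f}%")              # Blended: 99.99%
print(f"Business only: {business_only:.2f}%")  # Business only: 99.93%
```

The same 30-minute outage moves the reported figure by less than a hundredth of a percentage point once 260 million unaffected consumer users are averaged in.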

3. Services reflected in the availability calculation
Since Microsoft Exchange Online encompasses much more functionality than Gmail does, to come even close in functionality Google needs Gmail, Google Calendar, Google Contacts, Google Groups, Google Tasks and Postini. Yet if there is a Google Calendar outage, it doesn't figure into Gmail availability. If there is a Google Contacts outage, Gmail availability is unaffected. Let the buyer beware. Gmail’s claim of 99.99% uptime in 2011 encompasses email only, and none of the capabilities people use alongside it. Is Gmail availability a good metric to gauge Google Apps' overall performance? I don't think so.

4. Number of cloud services covered by the Service Level Agreement
The Office 365 Service Level Agreement covers the entire online suite including SharePoint Online, Outlook Web Applications, Exchange Online and Lync Online. If any customer witnesses an issue in any component of the service, Microsoft addresses the issue under the terms of the Microsoft Online Services Service Level Agreement.

If Google Apps users experience issues with services they also use, such as Labs features, Google Voice, Google+ Hangouts, and Google Tasks, the Google Apps 99.9% uptime guarantee does not cover these issues. If high availability for features like these is important, the customer is vulnerable. Customers receive no technical support for these services except through customer help center channels.

5. Number of cloud service outages across a range of providers
What I find interesting is that availability track records for Google, Microsoft and Amazon show that we all experience outages and have work ahead of us. CNET summarized on September 9th:

“But Microsoft certainly isn't alone. Google has also seen its share of downtime. Just this past Wednesday, Google Docs was offline for about 30 minutes. In May, the company's Blogger service was unavailable for the greater part of a day. And in 2009, a host of Google services went down briefly throughout the world. Early last month, many Yahoo Mail users around the world were unable to access the service for almost a day. Amazon, too, has experienced outages over the past year with its Elastic Compute Cloud (EC2) service, which hosts the Web sites of many major companies. One disruption in April affected such customers as Quora and Reddit, while another one last month took Netflix, Foursquare, Quora, and Reddit offline.”

6. Amount of resources the provider invests in delivery
Microsoft is investing 90% of its US$9.6 billion R&D budget in cloud strategies this year. We have put the right infrastructure in place and are committed to strengthening availability. After all, with thirty years’ experience in servicing customers, and twenty-two years’ experience in providing data center services, Microsoft has invested in a globally distributed datacenter infrastructure. It will take some time for other companies to obtain similar levels of expertise. Meanwhile, it is just this type of investment that attracts prestigious service firms to begin using Office 365.

“Having Microsoft manage our Office 365 environment is freeing up our IT resources, so we can be a lot more proactive in other areas of our business. The kind of resilience on [Microsoft’s] infrastructure is not something a company can implement without heavy investment.”

--Shane Izaks, IT General Manager, The Hongkong and Shanghai Hotels Limited

7. Amount and timing of customer remedy for any lost productivity
When a Google Apps customer experiences less than 99.9% monthly uptime for more than 5% of its users, the customer receives a 10% service credit in the form of 3 days added to their contract. Should an Office 365 customer experience less than 99.9% monthly uptime for any number of users, Microsoft credits the customer 25% of the monthly amount. In addition, both Microsoft and Google have their respective remedies at 99% and 95% availability tiers. I’m not including the specifics here as they fall along the same lines: service extension vs. financial credit.
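To make the difference at the 99.9% tier concrete, here is a minimal sketch of the two remedies as described in the paragraph above. The function names, the monthly fee, and the way uptime and affected-user share are passed in are all assumptions for illustration; the actual SLA documents define how downtime is measured and also include the 99% and 95% tiers, which are omitted here.

```python
def google_apps_remedy(uptime_pct, affected_user_share):
    """Remedy at the 99.9% tier as described in this post (hypothetical helper).

    affected_user_share is the fraction of the customer's users hit by the issue.
    """
    # Per the post, an issue counts only if it affects more than 5% of users.
    if uptime_pct < 99.9 and affected_user_share > 0.05:
        return "3 days of service added to the contract"
    return "no remedy"


def office_365_remedy(uptime_pct, monthly_fee):
    """Financial credit at the 99.9% tier as described in this post (hypothetical helper)."""
    # Per the post, any number of affected users counts.
    if uptime_pct < 99.9:
        return 0.25 * monthly_fee
    return 0.0


# Hypothetical month: 99.8% uptime, 3% of users affected, 1,000 monthly fee.
print(google_apps_remedy(99.8, 0.03))    # no remedy
print(office_365_remedy(99.8, 1000.0))   # 250.0
```

In this hypothetical month the Google Apps customer falls under the 5% threshold and receives nothing, while the Office 365 customer receives a credit of 25% of the monthly amount.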

In the event of an outage, to remedy lost productivity, we believe our customers value receiving a generous financial credit within the coming month rather than a comparatively less generous service extension with no financial credit in the coming year.

It’s clear to me. Microsoft’s practices in calculating service availability and addressing availability concerns show that the company values its Office 365 customers. Microsoft’s accountable, financial remedies show it is committed to providing strong service availability.