Implementing a data analytics framework improves data management and reduces costs
Using an improved data analytics framework, Microsoft IT was able to decommission about 25,000 underused servers and right-size another 10,000, saving Microsoft millions of dollars a month. How did we do it? Using enhanced server metadata, we identified thousands of unused or underused servers and then retired, right-sized, or migrated those servers to the cloud. We expect to save even more by using a similar framework to evaluate and improve utilization on the entire Microsoft Azure platform.
Our business intelligence experts used a data analytics framework to consolidate, cleanse, analyze, and publish data, so that everyone from service engineers to executives can get information that they can act on, in the format they need. Stakeholders can access data more easily, which enables them to plan, evaluate, and collaborate more effectively.
Gathering data is not enough
To provide business intelligence about our global data center infrastructure, we gather data from many sources. This data ranges from unstructured information gathered manually, to information that is automatically gathered about the configuration, health, and performance of tens of thousands of devices. Because of variability in how data was being collected and presented, however, it was often challenging to use it effectively.
Before the data analytics framework, issues with datacenter infrastructure, configuration, health, and performance data included:
Data was stored in a variety of formats (for example, SQL databases, Microsoft Excel spreadsheets, and SharePoint lists).
Different groups owned data sources, and data management within those groups varied widely.
Sometimes data was simply missing. At other times, data was duplicated, or data drawn from different sources didn’t agree.
Sometimes data was highly structured and other times it was nearly raw.
Data was sometimes out of date, and sometimes lacked time information.
So even though there was a lot of data, using that data was problematic. For example, different teams came to meetings with conflicting information, or performance was measured in one way in one group and in a different way by another group.
Framing a solution
Our business intelligence specialists began to see that many of the problems they were encountering resulted from inconsistencies in how data was being measured, stored, analyzed, and reported. To address these problems, they built a flexible and highly configurable data analytics framework. The benefits of this data analytics framework were many, including speeding the decision-making process, increasing user confidence in data, and helping Microsoft save millions of dollars.
As shown in Figure 1, the data analytics framework:
Extracted data from a variety of data sources to a single location.
Cleaned and consolidated the extracted data.
Published data as a portfolio of analytics solutions, including dashboards, scorecards, reports, and data views.
Enabled users to create their own reports and graphs.
Alerted operators about data source issues, such as databases being offline.
Prevented processed data from being deleted when the data source became unavailable.
Figure 1. The data analytics framework automatically extracts, cleanses, consolidates, analyzes, and publishes data from a variety of sources.
The first layer of the framework automatically extracts data from a variety of sources, such as spreadsheets, databases, and SharePoint lists.
Cleansing and consolidating data
After data is all in one place, it is automatically cleansed and consolidated. For example, if there appear to be multiple sources of the same data, those sources are evaluated and then data from the most reliable source is used. Information missing from one source can sometimes be filled in from another source. Duplicate data is detected and eliminated. Cleansing and consolidating data from multiple sources are the most important steps because they make data more reliable, unified, and coherent—a single source that all can use with confidence.
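The consolidation rules described above (prefer the most reliable source, fill gaps from other sources, drop duplicates) can be sketched in a few lines of Python. This is a minimal illustration, not the framework's actual implementation: the `SOURCE_PRIORITY` ranking, the record layout, and the `consolidate` function are all assumptions made for the example.

```python
# Hypothetical reliability ranking: a lower number means a more trusted source.
SOURCE_PRIORITY = {"sql_database": 0, "sharepoint_list": 1, "excel_spreadsheet": 2}


def consolidate(records: list[dict]) -> dict[str, dict]:
    """Merge per-source records into one record per server.

    Each input record is a dict with a "server" key, a "source" key, and any
    number of attribute keys. A value of None means the source had no data.
    """
    merged: dict[str, dict] = {}
    # Visit the most reliable sources first so that their values win conflicts.
    # Unknown sources are ranked after every known source.
    ordered = sorted(
        records,
        key=lambda r: SOURCE_PRIORITY.get(r["source"], len(SOURCE_PRIORITY)),
    )
    for record in ordered:
        target = merged.setdefault(record["server"], {})
        for field, value in record.items():
            if field in ("server", "source") or value is None:
                continue
            # Fill a field only if no more reliable source has supplied it.
            # Exact duplicates therefore collapse into a single value.
            target.setdefault(field, value)
    return merged
```

In this sketch a spreadsheet row can supply an owner that the database record lacks, while the database still wins any disagreement about fields that both sources report.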
After data has been cleansed and consolidated, it is ready for analysis. At this layer of the framework, business rules can be applied and raw data is transformed into actionable information. For example, data about the version of software that is running on servers in a datacenter becomes information about which devices need software updates. Some data consumers may choose to access cleansed data directly and do their own analysis and publishing by using software such as Microsoft Excel, Power BI, or Microsoft SQL Server Reporting Services (SSRS).
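As a rough illustration of how a business rule turns raw data into actionable information, the sketch below flags servers that run software older than a required minimum version. The function names, the dotted version format, and the sample inventory are assumptions for the example, not details of the framework itself.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "10.0.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def servers_needing_update(installed: dict[str, str], minimum: str) -> list[str]:
    """Return the names of servers whose installed version is below the minimum."""
    required = parse_version(minimum)
    return sorted(
        name
        for name, version in installed.items()
        if parse_version(version) < required
    )


# Hypothetical inventory: server name -> installed software version.
inventory = {"srv01": "10.0.2", "srv02": "9.12.0", "srv03": "10.1.0"}
needs_update = servers_needing_update(inventory, "10.0.2")
```

Comparing versions as integer tuples rather than as strings avoids ranking "9.12.0" above "10.0.2".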
After data has been analyzed, the resulting information can be published in a variety of formats (such as dashboards, scorecards, or reports) using SSRS or Power BI, or as data views using SQL queries. Senior managers may choose scorecards that track progress toward quarterly business goals. Others may prefer charts and graphs. People interested in security might choose information about software updates, while others might be more interested in information about underused servers. Published data also feeds a semiautomatic process that analyzes server utilization, and it is distributed to other processes and to automation tools.
Putting the framework to work
After spending months optimizing the framework for data management, reporting, and cost savings, we deployed it and made its output available to users. Here are some of the benefits that we saw and the lessons that we learned along the way.
Some of the benefits provided by the framework are:
Users have confidence that data produced by the framework is the best available. Just being able to access reliable data quickly and easily increases confidence and productivity significantly.
Data becomes highly available in a centralized repository, not a fragmented collection of sources.
A single data source emerges where previously there had often been differing views and interpretations of data. Managers and their employees might use different views of data, but they use the same data set and performance measures.
Users can see information on data dashboards in almost real time.
Self-serve reporting is possible. Using Power BI, data consumers do not have to rely on others to create new or customized views of data.
Scorecards and dashboards enable data-driven business reviews. Instead of disagreeing about data, managers can open a live dashboard and ask, “Why is this number down?”
A semi-automated framework significantly reduces the time required to manage, query, and process data. The framework resulted in an approximately 60 percent reduction in developer time and a 50 percent reduction in operations time.
Consistent data significantly enhances collaboration.
When all data is aggregated, our BI experts can look for inconsistencies and help owners to correct the problem at the source, which improves data quality.
Data center infrastructure security is improved because servers that are running out-of-date software are detected and reported.
Systems become more highly available because of insights into infrastructure design.
Server defects can be identified and eradicated through application of Six Sigma methodologies.
Support is improved by tracking and reporting information about tickets used to track incidents and changes to servers (such as ticket volume and time required to resolve incidents).
Datacenter capacity and provisioning management are improved.
Best practices and lessons learned
Some of the best practices and lessons we learned include:
Get leaders on board early by showing them the value of the framework. For example, showing server usage data to the person who is paying for the servers makes the value of the framework real. Buy-in and approval from senior managers can significantly facilitate both development and deployment.
Communicate. Give everyone who will use the framework—or whose work will be affected by it—a voice in its development and deployment. A data analytics framework does not just gather and distribute data; it can also select, sort, remove, and reformat it. A data analytics framework also quickly becomes a tool used to make critical decisions. For these reasons, it is important to make a list of all potentially impacted stakeholders and make sure that their voices are heard.
Be decisive. Although all voices should be heard, it is impossible to satisfy everyone’s demands. Giving people a firm answer coupled with a reasonable explanation helps to keep them engaged in the process.
Consider implementing data contracts that define how data is managed and stored by its owners. At the very least, establish and maintain good communication with data owners.
Seek to unify and consolidate data sources, and use your experience with the framework to improve data quality. In the beginning, this can be as simple as emailing data owners about any questions or shortcomings in their data.
Try to make the framework immune to data variability and anomalies. For example, if new data becomes unavailable, existing data should not simply be replaced with nothing. Instead, the system should pause and retain known, good data until the source stabilizes.
Implement high standards for data transformation and presentation so that information is to-the-point and actionable. A data dashboard that is poorly designed can significantly reduce the usefulness of the data it presents. Users with specialized needs can use tools or services such as Power BI to create custom views of data with the scope and detail that they require.
Give users as much self-service access to information as possible, because systems and data change constantly.
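The practice of retaining known, good data when a source becomes unavailable can be sketched as a small last-known-good cache. This is one possible approach under stated assumptions: the `LastKnownGoodCache` class, its `stale` flag, and the `fetch` callable are hypothetical names, and a real system would catch narrower exceptions and alert operators rather than only setting a flag.

```python
from typing import Callable, Optional


class LastKnownGoodCache:
    """Keep the last successful extract when a refresh fails or comes back empty."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.stale = False  # True when the most recent refresh was rejected

    def refresh(self, fetch: Callable[[], Optional[list[dict]]]) -> list[dict]:
        try:
            new_rows = fetch()
        except Exception:
            # The source is unreachable: treat it the same as an empty result.
            new_rows = None
        if new_rows:
            self.rows = new_rows
            self.stale = False
        else:
            # Do not replace good data with nothing; flag it for operators.
            self.stale = True
        return self.rows
```

The design choice here is that an empty or failed extract never overwrites existing rows, so dashboards keep showing the last good data until the source stabilizes.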
Using the framework saves millions
The Microsoft Hosting Resource and Recovery (HRR) program examines our server utilization for efficiency. Using the data analytics framework significantly increased the speed and ease of tracking server utilization. In 18 months, HRR was able to decommission almost 25,000 servers, reducing the server footprint at our data centers by over 30 percent, and saving the company millions of dollars per month.
Next, the HRR program started moving overprovisioned virtual machines to devices with more appropriate (and significantly less expensive) capabilities. Using the data analytics framework, HRR was able to right-size 10,000 servers in one year. At an average savings of $200 per server per month, right-sizing those 10,000 servers saved 24 million dollars in a single year.
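The annual figure follows directly from the numbers above. The short calculation below reproduces it; the function name is ours, chosen for illustration.

```python
def annual_savings(servers: int, monthly_savings_per_server: int) -> int:
    """Yearly savings in dollars from right-sizing a number of servers."""
    months_per_year = 12
    return servers * monthly_savings_per_server * months_per_year


# 10,000 servers x $200 per server per month x 12 months
right_sizing_savings = annual_savings(10_000, 200)
```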
The data analytics framework enhanced the HRR program’s capabilities by enabling it to track server usage over time and determine not just that a server was underused, but for how long. It identified and turned off devices hosted on the Azure platform that were used only periodically, for example, only on weekdays during the last three weeks of a quarter. That saved the cost of over 10 million hours of operation.
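To illustrate the idea of measuring not just whether a server is underused but for how long, the sketch below finds the longest run of consecutive days on which utilization stayed under a threshold. The function name, the daily sampling interval, and the 5 percent threshold are assumptions for the example, not values taken from the HRR program.

```python
def longest_underused_streak(daily_utilization: list[float], threshold: float) -> int:
    """Return the longest run of consecutive days with utilization below threshold."""
    longest = 0
    current = 0
    for value in daily_utilization:
        if value < threshold:
            current += 1
            longest = max(longest, current)
        else:
            # A busy day ends the current streak.
            current = 0
    return longest


# Hypothetical daily CPU utilization for one server, as fractions of capacity.
samples = [0.02, 0.01, 0.40, 0.03, 0.02, 0.01, 0.50]
streak = longest_underused_streak(samples, threshold=0.05)
```

A server with a long streak is a candidate for decommissioning, while one with short, regular streaks may only need to be turned off between its busy periods.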
We plan to apply the same cost-cutting principles based on data analytics beyond our data centers. There are hundreds of thousands of virtual machines running on the Azure platform, many at just 2 percent utilization. We expect that increasing the net utilization of those resources to 15 percent is well within reach.
Our data analytics framework, combined with products like Microsoft SQL Server and Power BI, is saving time, clarifying goals and metrics, enhancing confidence, and helping our business to run faster and better.
For more information
Microsoft IT Showcase
© 2016 Microsoft Corporation. All rights reserved. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.