Architecting your network for successful Azure adoption
Technical Case Study
October 2015 (updated May 2016)
Microsoft IT experienced network-related challenges as it moved its application portfolio to the public cloud. To address these challenges, it took a series of steps to improve network performance and reliability, which included analyzing network traffic flow changes, redesigning the corporate network edge to the Internet, and adding ExpressRoute connections for Microsoft Azure.
As Microsoft IT moved more of its line-of-business applications to Microsoft Azure and other public cloud-based properties, it faced network infrastructure challenges.
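To resolve these issues, Microsoft IT analyzed network traffic flow changes, redesigned the corporate network edge to the Internet, and added ExpressRoute connections for Microsoft Azure.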
The increasing shift of IT systems and workflows to the cloud benefits companies in multiple ways, from hardware and maintenance savings to increased scalability and easier implementation. Many enterprises are eagerly planning cloud-based strategies to achieve newer, better systems for long-term IT success, and Microsoft is no different. Microsoft IT is discovering the ways that Microsoft Azure can benefit the implementation and operation of the line-of-business (LOB) applications it develops and the infrastructure services it supports for its business customers.
Note: Microsoft Azure is a separate business unit from Microsoft IT. Although both are Microsoft entities, all operations are separate, and Microsoft IT is functionally the same as any other Microsoft Azure enterprise customer.
One aspect of this move to the cloud that was not fully appreciated, however, was the effect it would have on existing network infrastructure. Specifically, cloud migration significantly changed the volume and nature of traffic flows within and outside the corporate network, including both user-to-system and system-to-system traffic. Over a 15-month period, Microsoft IT noted a tenfold (10x) increase in traffic from internal corporate networks to the Internet. Most of this traffic was destined for its own public cloud services, replacing traffic that would previously have traveled to on-premises services on internal networks. Overall, the IT networking infrastructure that was in place to deploy and support new cloud-based solutions at Microsoft proved to be insufficient in the following key areas:
Edge demand issues. Traffic destinations for LOB and productivity services shifted from on-premises datacenters to Internet-based cloud services. This created much more demand at edge access points where corporate network users access the public Internet. User productivity was threatened if traffic became bottlenecked at this edge. Instead of merely supporting users who need to view public websites, the Internet edge now needed to support all traffic going to and from the LOB applications and services running in the public cloud.
Application migration challenges. Many LOB applications moving from on-premises data center locations to Microsoft Azure required the migration of large virtual hard disks and data sources. Like the Internet edge challenges, these migrations presented significant delays given the capacity required to transfer data over existing Internet-facing network infrastructure.
Lack of secure, efficient system-to-system traffic flows. Microsoft IT considers many of its LOB applications to be hybrid, consisting of or depending on a mixture of public cloud-based and on-premises resources. Until all of its LOB applications run in Microsoft Azure, Microsoft IT requires a secure design for system-to-system traffic flows between its corporate data centers and Microsoft Azure to support these hybrid applications. Depending on how a hybrid application is designed, accessing Microsoft Azure services using traditional Internet addresses introduces security considerations. It also subjects such traffic to changes in availability (such as slowdowns or outages) that are difficult to predict and can adversely affect application performance. To resolve these issues, in the case of Microsoft IT, a private, secure network connection was needed for these hybrid applications.
Lack of a traditional network “edge” in public cloud services. Microsoft IT policy requires inspection of outbound traffic from internal corporate networks to the Internet to detect data leakage or intrusion attempts. Because public cloud services are inherently multi-tenant, meaning that other customers occupy them as well, Microsoft IT could not assume that network destinations could be trusted. It also could not assume that assets moved to the public cloud would not travel further, beyond its control and beyond its ability to track them simply by examining or limiting network connectivity. (To learn more, see the link for “Making security a priority when moving applications and data to the cloud” at the end of this document.)
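To make the application migration challenge above more concrete, the following minimal Python sketch estimates how long a large virtual hard disk takes to move over a link that is shared with user traffic. The function name, the disk size, the link speeds, and the utilization fractions are all hypothetical values chosen for illustration; the case study does not publish these figures.

```python
def transfer_hours(size_gb: float, link_mbps: float, utilization: float = 0.5) -> float:
    """Estimate the hours needed to move size_gb over a link of link_mbps.

    utilization is the fraction of the link the migration may consume;
    the remainder is left for normal user traffic. All inputs are
    hypothetical planning values, not figures from the case study.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, link speed > 0, utilization in (0, 1]")
    size_megabits = size_gb * 1000 * 8         # decimal gigabytes -> megabits
    usable_mbps = link_mbps * utilization      # share of the link available
    return size_megabits / usable_mbps / 3600  # seconds -> hours


# Hypothetical comparison: a 500 GB disk over a 1 Gbps Internet edge link
# limited to a quarter of capacity, versus a 10 Gbps dedicated link.
shared = transfer_hours(500, 1_000, utilization=0.25)
dedicated = transfer_hours(500, 10_000, utilization=1.0)
print(f"shared edge: {shared:.1f} h, dedicated link: {dedicated:.2f} h")
```

The estimate ignores protocol overhead and latency, so real transfers would take somewhat longer; it is only meant to show why edge capacity dominates migration time.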
As part of its strategy to address these issues, Microsoft IT performed a preliminary analysis of traffic flows to understand specific changes. It was immediately clear, however, that these issues were already affecting network availability and performance to a degree that in-depth analysis was not feasible. Among its recommendations, Microsoft IT strongly suggests that other enterprises charting a path to public cloud-based productivity resources do a proactive, detailed analysis of traffic modeling change requirements before implementing the cloud-based solution so that issues can be addressed in a more deliberate way.
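As a starting point for the kind of traffic modeling recommended above, the reported tenfold increase over 15 months can be converted into an implied monthly growth rate. The sketch below assumes steady compound growth, which is an assumption: the case study reports only the start and end points. The function names are illustrative.

```python
def monthly_growth_rate(total_multiplier: float, months: int) -> float:
    """Return the constant monthly growth rate that yields total_multiplier
    after the given number of months (assumes smooth compound growth)."""
    if total_multiplier <= 0 or months <= 0:
        raise ValueError("multiplier and months must be positive")
    return total_multiplier ** (1 / months) - 1


def projected_traffic(current: float, monthly_rate: float, months_ahead: int) -> float:
    """Project traffic forward at a constant monthly rate (same units as current)."""
    return current * (1 + monthly_rate) ** months_ahead


# Tenfold growth over 15 months, as reported in the case study.
rate = monthly_growth_rate(10, 15)
print(f"implied monthly growth: {rate:.1%}")

# Hypothetical projection: edge traffic six months ahead, relative to today (1.0).
print(f"relative traffic in 6 months: {projected_traffic(1.0, rate, 6):.2f}x")
```

Under this assumption the implied growth is roughly 16.6 percent per month. A real model would use measured per-application flows rather than a single aggregate rate.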
Microsoft IT had a mandate to:
Support more than 200,000 workers in more than 880 locations.
Support more than 2,100 LOB applications running on more than 40,000 servers.
Work with business units to develop roadmaps for moving these applications out of the seven Microsoft data centers and into the Microsoft Azure public cloud.
Microsoft IT needed a way to address the bottlenecks in its network design while continuing to meet the needs of Microsoft and its partners. It also needed to sustain the IT best practices of reducing costs and gaining agility for its own organization.
The key to addressing its network design challenges was for Microsoft IT to focus on improving traffic management between the Microsoft corporate network and the company’s growing number of public cloud-based resources.
Beyond addressing the immediate challenges, it was important for Microsoft IT to anticipate the industry’s cloud roadmap to make the best use of its investment. The strategy was to choose components that can also serve other types of connectivity, stay ahead of development decisions, and be a leader in influencing holistic design patterns.
Long-term, Microsoft IT knew that its network traffic would have less to do with data center connectivity and more to do with maintaining secure and efficient traffic patterns among its cloud-based properties. This understanding led Microsoft IT to begin revising network designs based on new cloud-based traffic flows, security models, and bandwidth priorities.
Using a dual approach, Microsoft IT deployed a technology called ExpressRoute to provide a dedicated private connection to its Microsoft Azure properties. They also redefined their traffic management policy along the existing public network edge to more intelligently reroute various user and server requests based on their destinations.
As part of this approach, Microsoft IT also addressed bandwidth issues on the edge by:
Validating bandwidth to ensure sufficient capacity for the network changes (traffic moving to public cloud-based properties in addition to moving to traditional data center properties).
Deploying additional high-performance firewall devices at the public edge to provide sufficient capacity on the corporate network backbone to support all traffic if there is an outage.
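The bandwidth validation described above can be approximated with a simple redundancy check: does the edge still carry peak traffic if one device is out of service? The following Python sketch shows one way to express that check. The traffic and capacity figures are hypothetical, and the single-failure model is an assumption; the case study does not describe how Microsoft IT performed its validation.

```python
import math


def survives_single_failure(peak_gbps: float, device_capacity_gbps: float,
                            device_count: int) -> bool:
    """Return True if the remaining devices can carry peak traffic
    after any one device fails (a single-failure redundancy check)."""
    if peak_gbps < 0 or device_capacity_gbps <= 0 or device_count < 1:
        raise ValueError("invalid capacity inputs")
    return (device_count - 1) * device_capacity_gbps >= peak_gbps


def devices_needed(peak_gbps: float, device_capacity_gbps: float) -> int:
    """Smallest device count that carries the peak with one spare device."""
    if peak_gbps < 0 or device_capacity_gbps <= 0:
        raise ValueError("invalid capacity inputs")
    return math.ceil(peak_gbps / device_capacity_gbps) + 1


# Hypothetical edge: 25 Gbps peak, firewalls rated at 10 Gbps each.
print(survives_single_failure(25, 10, 3))  # three devices, one failed
print(survives_single_failure(25, 10, 4))  # four devices, one failed
print(devices_needed(25, 10))
```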
Solution Design Principles
The new solution was derived from the following key design principles:
Enable hybrid applications. While it is the stated mission of Microsoft IT to run all its applications in the cloud, this cloud is a mixture of both private cloud (on-premises) resources and public cloud technologies, such as Microsoft Azure, Office 365, SharePoint Online, Microsoft Dynamics Online, and Visual Studio Online. Like many other enterprises, Microsoft IT supports hybrid LOB applications whose resources are located in both cloud types—a necessary solution as long as cloud-based applications need on-premises resources. Microsoft IT uses many hybrid applications and, therefore, it accommodated these applications when redefining its network design.
Provide robust access to Microsoft public cloud services. As a way to establish a more secure and faster method of operating its hybrid LOB applications, Microsoft IT needed to design a way for its system-to-system traffic to use a dedicated route to Microsoft Azure, rather than using the public Internet beyond their network edge. Traffic from the corporate network edge now uses a dedicated path to engage the Microsoft cloud network. In this way, Microsoft IT created a reliable network path with a defined end-to-end service level.
Use software-defined networking. Instead of relying only on network hardware to manage traffic routing for ExpressRoute, Microsoft IT applied software-defined networking principles to enable automatic router configuration changes using a software portal. Developers and other users lacking network expertise can access the portal and request connectivity by describing the interaction of their Microsoft Azure virtual machines with the corporate network. The software performs the required configuration changes on the routers, both at the Microsoft IT data center and Microsoft Azure locations.
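The software-defined networking principle above can be illustrated with a small sketch: a connectivity request is validated and translated into one planned change for each side of the connection. This is not the Microsoft IT portal. The class names, the router labels, and the text of each planned action are hypothetical, and the action strings are descriptive placeholders rather than real router syntax.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import List


@dataclass(frozen=True)
class ConnectivityRequest:
    """What a developer might submit to a self-service portal (hypothetical)."""
    requester: str
    azure_vnet_prefix: str   # address space of the Azure virtual network
    corp_prefix: str         # corporate network range the VMs must reach
    port: int                # TCP port used by the application


@dataclass(frozen=True)
class RouterChange:
    """One planned change on one side of the connection (hypothetical)."""
    router: str
    action: str


def plan_changes(req: ConnectivityRequest) -> List[RouterChange]:
    """Validate a request and plan a change for both ends of the circuit."""
    vnet = ip_network(req.azure_vnet_prefix)
    corp = ip_network(req.corp_prefix)
    if vnet.overlaps(corp):
        raise ValueError("Azure and corporate address ranges must not overlap")
    if not 1 <= req.port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return [
        RouterChange("datacenter-edge",
                     f"advertise {corp}; permit {vnet} -> {corp} tcp/{req.port}"),
        RouterChange("azure-edge",
                     f"advertise {vnet}; permit {corp} -> {vnet} tcp/{req.port}"),
    ]


# Hypothetical request: an application team connecting a virtual network
# to a corporate database range.
request = ConnectivityRequest("app-team", "10.50.0.0/24", "10.10.0.0/16", 1433)
for change in plan_changes(request):
    print(change.router, "|", change.action)
```

Keeping validation in software is the point of the design: requesters describe intent, and the tool rejects conflicting address ranges before any router is touched.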
Distributed Internet and Azure public edge
Users accessing productivity applications and Microsoft Azure created a new volume of Internet and public cloud traffic at the corporate network edge. Microsoft IT replaced legacy network proxies with hardware-based firewalls to achieve a more distributed, higher-capacity configuration. Implementing both a default Internet edge and an Azure public edge gives Microsoft IT the option to treat Microsoft cloud destinations differently from all other Internet-bound traffic in terms of security and service levels.
Implementing a distributed Internet edge
Understanding the value of a distributed Internet edge means acknowledging that corporate network user traffic to and from the public Internet has changed greatly with the advent of public cloud-based services. Before, most user traffic was occasional and traveled to unknown destinations; now, a high percentage of user traffic goes to and from the public cloud-based services, and almost all users are engaged in this traffic throughout their day.
Typically, all traffic passing between the Microsoft corporate network and the Internet is subject to rigorous scrutiny by various technologies running on dedicated network devices. Data loss prevention (DLP) software inspects each outgoing packet to monitor, detect, and block sensitive data leaving the corporate environment, while intrusion detection services (IDS) monitor incoming packets to identify and isolate any potential malicious activities or violations of IT policy.
For traffic going to unknown locations outside the company (for example, external websites or third-party services), deep packet inspection—scrutinizing all traffic for potential compliance and security issues—is much needed and well worth the significant investment Microsoft IT makes in DLP and IDS technologies. This level of inspection may not be necessary for all situations, or the cloud service itself may offer the functionality, such as DLP in Office 365.
Figure 1. In the Microsoft IT environment, edge security devices identify packets bound for either known or unknown public cloud destinations. The packets are subjected to required levels of inspection based upon security and compliance policies.
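The destination-based handling shown in Figure 1 can be sketched as a prefix lookup: a packet bound for a known cloud destination gets one level of inspection, and everything else gets deep inspection. The sketch below uses Python's standard ipaddress module. The prefixes are reserved documentation ranges standing in for real cloud address space, and the two inspection labels are assumptions; the case study does not specify the actual policy levels.

```python
from ipaddress import ip_address, ip_network

# Hypothetical list of known cloud destinations. These are reserved
# documentation ranges, not real Microsoft cloud prefixes.
KNOWN_CLOUD_PREFIXES = [
    ip_network("203.0.113.0/24"),
    ip_network("198.51.100.0/24"),
]


def inspection_level(destination: str) -> str:
    """Return the inspection level for a destination IP address.

    "reduced" applies to known cloud destinations and "deep" to all other
    Internet destinations. Both labels are illustrative assumptions.
    """
    addr = ip_address(destination)
    if any(addr in prefix for prefix in KNOWN_CLOUD_PREFIXES):
        return "reduced"
    return "deep"


print(inspection_level("203.0.113.10"))  # inside a known cloud prefix
print(inspection_level("192.0.2.1"))     # unknown Internet destination
```

A production edge would match on published service address ranges and would keep that list current, which this sketch does not attempt.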
Microsoft IT initially deployed ExpressRoute to connect on-premises applications and infrastructure services reliably and securely to compute resources in Microsoft Azure. ExpressRoute partners may brand the technology with other names, but the value is the same: helping IT organizations operate hybrid and traditional LOB applications effectively in the Microsoft Azure public cloud without sacrificing security or performance.
Figure 2. ExpressRoute provides private connectivity to Azure compute services on virtual networks using the customer’s own address space and dedicated connectivity to Microsoft public cloud services.
ExpressRoute Service Scenarios
Customers work with partners to connect their network infrastructures on their own premises or in a co-location environment. They have the option of using the Exchange Provider or Network Service Provider model. (To learn more, see the link for “ExpressRoute pricing and telecommunications scenarios” at the end of this document.)
Figure 3. ExpressRoute connectivity can be implemented through partners using two distinct models, Exchange Provider and Network Service Provider.
Microsoft IT implemented the Exchange Provider model, using its own networking group and infrastructure to offer ExpressRoute as a service to its business customers. Similarly, large enterprises with robust telecommunication capabilities in data centers can add ExpressRoute to their networks and then make the newly secure Microsoft Azure connectivity available to business groups within their corporate organizations.
ExpressRoute creates virtual circuits to Microsoft Azure that do not go over the public Internet.
The private peering option uses the customer’s own IP addresses in Microsoft Azure, essentially extending the customer’s network into the public cloud. This method helps secure and optimize traffic flows between Microsoft Azure and the corporate network by addressing Microsoft Azure resources using private addressing. Because the Microsoft Azure IP data is decoupled from the request, connection to the Microsoft Azure-based resources is kept private. It can also be a dedicated connection, unencumbered by other network activities. Private peering provides access to Azure compute services that are connected to virtual networks. Azure compute services include Azure Infrastructure as a Service (IaaS) virtual machines, Platform as a Service (PaaS) web and worker roles, and web sites.
Public peering options allow access to ExpressRoute-enabled public cloud services exposed to the Internet, not only compute services on Azure virtual networks. This allows for robust connectivity to these public services, without the performance and security uncertainties of using the Internet as transport.
The Azure public variation allows dedicated connectivity to nearly all Azure services. Service-specific ExpressRoute peerings, for example to Office 365, provide this same type of connectivity only to a particular cloud service. In all cases, the ExpressRoute partner and Microsoft provide the service level guarantees after the traffic is handed off to their infrastructure.
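The three peering options described above can be summarized as a lookup from service type to peering type. The following Python sketch is only an illustration of that mapping. The service labels are hypothetical identifiers invented for this example, and "azure-storage" is included as an assumed example of a public Azure service; it is not named in this case study.

```python
from enum import Enum


class Peering(Enum):
    PRIVATE = "private"                    # compute services on virtual networks
    AZURE_PUBLIC = "azure-public"          # public Azure services
    SERVICE_SPECIFIC = "service-specific"  # a single cloud service, such as Office 365


# Hypothetical service labels, grouped as the text describes.
VNET_COMPUTE = {"iaas-vm", "paas-web-role", "paas-worker-role", "web-site"}
AZURE_PUBLIC_SERVICES = {"azure-storage"}  # assumed example, not from the case study
SERVICE_SPECIFIC_SERVICES = {"office-365"}


def choose_peering(service: str) -> Peering:
    """Map a service label to the peering type that would carry its traffic."""
    key = service.lower()
    if key in VNET_COMPUTE:
        return Peering.PRIVATE
    if key in SERVICE_SPECIFIC_SERVICES:
        return Peering.SERVICE_SPECIFIC
    if key in AZURE_PUBLIC_SERVICES:
        return Peering.AZURE_PUBLIC
    raise ValueError(f"unknown service label: {service!r}")


for name in ("iaas-vm", "office-365", "azure-storage"):
    print(name, "->", choose_peering(name).value)
```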
Migrating LOB applications
Enhanced connectivity has greatly increased the ability of Microsoft IT to migrate to the public cloud. Modern applications going to PaaS and Software as a Service (SaaS) account for more than 30 percent of the Microsoft IT portfolio today, and that share will continue to increase. The applications not being modernized can still function in the public cloud on Azure IaaS, using ExpressRoute private connectivity to operate nearly the same as they did on-premises.
As of September 2015, Microsoft IT has migrated more than 250 of its traditional LOB applications, consisting of nearly 7,000 individual virtual machines, to Azure IaaS using ExpressRoute. They have a base commitment of migrating 300 per month, with a maximum capacity to move about 1,000 a month, until all possible applications have been moved to Microsoft Azure.
More secure, efficient enterprise-level application connectivity. ExpressRoute provides the level of dedicated infrastructure needed for a large enterprise to provide secure, efficient connectivity of its LOB applications to the Microsoft Azure cloud.
Increased availability of critical services. As part of adding ExpressRoute to its network, Microsoft IT implemented redundant hardware throughout its upgrade solution, following its best practice of eliminating single points of failure as it increased the availability and scalability of LOB applications.
Faster time-to-productivity for Microsoft Azure projects. Robust connectivity and ease of migration to Microsoft Azure means that Microsoft IT can make better use of its cloud-based technologies and deliver value to its business teams sooner than was possible before.
Higher bandwidth capacity. The distributed user edge and ExpressRoute solutions reduce bottlenecks in user-to-system and system-to-system network traffic flows respectively, improving the user experience and expanding functionality options for bandwidth-intensive, cloud-based applications.
Improved predictability of performance. Where application and network speed are required, ExpressRoute’s private network design provides better predictability than previous scenarios by using a dedicated connection to Microsoft Azure. This level of predictability makes it possible for Microsoft IT to offer service guarantees that meet the needs of its LOB application owners.
Improved security for outgoing corporate network requests. Because the newly distributed user edge routes eligible packets directly to trusted public cloud resources, and because ExpressRoute uses a virtual private network within Microsoft Azure to use corporate network IP addresses, these technologies help significantly improve the security of both user and server connections that would otherwise have been vulnerable to Internet-based attacks.
Reduced on-premises data center footprint and associated investments. Microsoft IT has begun to realize a reduction in the total cost of ownership (TCO) in data center equipment and maintenance investments. The shift to an on-demand capacity based model allows them to build for current demand, not for peak or for a future anticipated demand that may never materialize. In the longer term, there is more cost avoidance as future on-premises hardware replacement and facilities upgrades are no longer necessary. Microsoft IT also expects significant further savings as more of its LOB applications move to PaaS and SaaS, removing the responsibility of maintaining lower-level components of the stack, such as the operating system and applications themselves.
Analyze network traffic patterns for optimal cloud-based performance. Microsoft IT’s network changes were the result of preliminary analysis that focused on commonly used productivity services and a sample of LOB applications that represented the larger portfolio. This allowed the IT networking team to extrapolate the expected impact on the network and make changes to accommodate the new demands. With more time, a detailed analysis of traffic patterns might have provided additional insights into the new network design. Microsoft IT recommends that any enterprise planning to migrate its services to a public cloud make time to analyze and consider the impacts of this change to its traffic flows, and plan network changes accordingly.
Consider the distributed cloud and Internet edge. As Microsoft IT discovered, replicating its new distributed edge design in multiple locations helps scale out the volume of cloud and Internet-bound traffic. This also reduced their use of dedicated WAN circuits and the corporate backbone, as the traffic was no longer hauled back to central hub sites or data centers to reach the egress points.
Include security teams in the application migration planning process. Successful migration of applications to the public cloud requires both network and security expertise. Research and analysis done by network teams can be audited by security experts to ensure holistic, secure migration planning and positive results.
Understand application flow details before migration. When planning the migration of a hybrid application, it’s important to examine the application architecture to understand which components will remain in the data center and which will be moved to the public cloud. Also, understanding how users access the application is important for understanding traffic flows, and can be used to validate the application’s performance after migration.
Note: Microsoft IT used the BlueStripe FactFinder product and Microsoft System Center Operations Manager data to perform instance-based profiling, both for gauging readiness of an application to be migrated to Azure IaaS and to understand dependencies when application ecosystems were being refactored to PaaS. The FactFinder technology has since been acquired by Microsoft and is being folded into the Operations Management Suite cloud service.
For complex environments, make both a near-term and a long-term plan. Because the Microsoft IT migration of LOB applications is ongoing, not all changes can be addressed by a single solution. Over time, however, hybrid applications will gradually be engineered to require fewer data center resources. The trend toward fully cloud-based applications will continue to change network traffic patterns until the use of traditional data center resources is minimized and most of the services are located in Microsoft Azure.
For more information
Microsoft IT Showcase
ExpressRoute home page
ExpressRoute technical overview
ExpressRoute pricing and telecommunications scenarios
Making security a priority when moving applications and data to the cloud
TechNet Radio: Delivering Results–How Microsoft IT Prepared its Network for the Cloud
© 2015 Microsoft Corporation. All rights reserved. Microsoft and Windows are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners. This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.