Implementing VPN split tunneling for Office 365

Note

This topic is part of a set of topics that address Office 365 optimization for remote users.

For many years enterprises have been using VPNs to support remote experiences for their users. Whilst core workloads remained on-premises, a VPN from the remote client routed through a datacenter on the corporate network was the primary method for remote users to access corporate resources. To safeguard these connections, enterprises build layers of network security solutions along the VPN paths. This was done to protect internal infrastructure as well as to safeguard mobile browsing of external web sites by rerouting traffic into the VPN and then out through the on-premises Internet perimeter. VPNs, network perimeters and associated security infrastructure were often purpose built and scaled for a defined volume of traffic, typically with the majority of connectivity being initiated from within the corporate network, and most of it staying withing the internal network boundaries.

For quite some time, VPN models where all connections from the remote user device are routed back into the on-premises network (known as forced tunneling) were largely sustainable as long as the concurrent scale of remote users was modest and the traffic volumes traversing VPN were low. Some customers continued to use VPN force tunneling as the status quo even after their applications moved from inside the corporate perimeter to public SaaS clouds, Office 365 being a prime example.

The use of forced tunneled VPNs for connecting to distributed and performance sensitive cloud applications is extremely suboptimal, but the negative impact of that may have been accepted by some enterprises so as to maintain the status quo from a security perspective. An example diagram of this scenario can be seen below:

Split Tunnel VPN configuration

This problem has been growing for a number of years, with many customers reporting a significant shift of network traffic patterns. Traffic which used to stay on premises now connects to external cloud endpoints. Numerous Microsoft customers report that previously, around 80% of their network traffic was to some internal source (represented by the dotted line in the above diagram). In 2020 that number is now around 20% or lower as they have shifted major workloads to the cloud, these trends are not uncommon with other enterprises. Over time, as the cloud journey progresses, the above model becomes increasingly cumbersome and unsustainable, preventing an organization from being agile as they move into a cloud first world.

The worldwide COVID-19 crisis has escalated this problem to require immediate remediation. The need to ensure employee safety has generated unprecedented demands on enterprise IT to support work-from-home productivity at a massive scale. Microsoft Office 365 is well positioned to help customers fulfil that demand, but high concurrency of users working from home generates a large volume of Office 365 traffic which, if routed through forced tunnel VPN and on-premises network perimeters, causes rapid saturation and runs VPN infrastructure out of capacity. In this new reality, using VPN to access Office 365 is no longer just a performance impediment, but a hard wall which not only impacts Office 365 but critical business operations which still have to rely on the VPN to operate.

Microsoft has been working closely with customers and the wider industry for many years to provide effective, modern solutions to these problems from within our own services, and to align with industry best practice. Connectivity principles for the Office 365 service have been designed to work efficiently for remote users whilst still allowing an organization to maintain security and control over their connectivity. These solutions can also be implemented very quickly with limited work yet achieve a significant positive impact on the problems outlined above.

Microsoft's recommended strategy for optimizing remote worker's connectivity is focused on rapidly alleviating the problems with the traditional approach and also providing high performance with a few simple steps. These steps adjust the legacy VPN approach for a small number of defined endpoints which bypass bottlenecked VPN servers. An equivalent or even superior security model can be applied at different layers to remove the need to secure all traffic at the egress of the corporate network. In most cases this can be effectively achieved within hours and is then scalable to other workloads as requirements demand and time allows.

Common VPN scenarios

In the list below you'll see the most common VPN scenarios seen in enterprise environments. Most customers traditionally operate model 1 (VPN Forced Tunnel). This section will help you to quickly and securely transition to model 2, which is achievable with relatively little effort, and which has enormous benefits to network performance and user experience.

Model Description
1. VPN Forced Tunnel 100% of traffic goes into VPN tunnel, including on-premise, Internet and all O365/M365
2. VPN Forced Tunnel with few exceptions VPN tunnel is used by default (default route points to VPN), with few, most important exempt scenarios that are allowed to go direct
3. VPN Forced Tunnel with broad exceptions VPN tunnel is used by default (default route points to VPN), with broad exceptions that are allowed to go direct (such as all Office 365, All Salesforce, All Zoom)
4. VPN Selective Tunnel VPN tunnel is used only for corpnet based services. Default route (Internet and all Internet based services) goes direct.
5. No VPN A variation of #2, where instead of legacy VPN, all corpnet services are published through modern security approaches (like Zscaler ZPA, AAD Proxy/MCAS, etc)

1. VPN Forced Tunnel

This is the most common starting scenario for most enterprise customers. A forced VPN is used which means 100% of traffic is directed into the corporate network regardless of the fact the endpoint resides within the corporate network or not. Any external (Internet) bound traffic such as Office 365 or Internet browsing is then hairpinned back out of the on premises security equipment such as proxies. In the current climate with nearly 100% of users working remotely, this model therefore puts extremely high load on the VPN infrastructure and is likely to significantly hinder performance of all corporate traffic and thus the enterprise to operate efficiently at a time of crisis.

VPN Forced Tunnel model 1

2. VPN Forced Tunnel with a small number of trusted exceptions

This model is significantly more efficient for an enterprise to operate under as it allows a small number of controlled and defined endpoints which are very high load and latency sensitive to bypass the VPN tunnel and go direct to the Office 365 service in this example. This significantly improves the performance for the offloaded services, and also decreases the load on the VPN infrastructure, thus allowing elements which still require it to operate with lower contention for resources. It is this model which this article concentrates on assisting with the transition to as it allows for simple, defined actions to be taken very quickly with numerous positive outcomes.

Split Tunnel VPN model 2

3. VPN Forced Tunnel with broad exceptions

The third model broadens the scope of model two as rather than just sending a small group of defined endpoints direct, it instead sends all traffic to trusted services such Office 365, SalesForce etc. direct. This further reduces the load on the corporate VPN infrastructure and improves the performance of the services defined. As this model is likely to take more time to assess the feasibility of and implement, it is likely a step which can be taken iteratively at a later date once model two is successfully in place.

Split Tunnel VPN model 3

4. VPN selective Tunnel

This model reverses the third model in that only traffic identified as having a corporate IP address is sent down the VPN tunnel and thus the Internet path is the default route for everything else. This model requires an organization to be well on the path to Zero Trust in able to safely implement this model. It should be noted that this model or some variation thereof will likely become the necessary default over time as more and more services move away from the corporate network and into the cloud. Microsoft uses this model internally; you can find more information on Microsoft's implementation of VPN split tunneling at Running on VPN: How Microsoft is keeping its remote workforce connected.

Split Tunnel VPN model 4

5. No VPN

A more advanced version of model number two, whereby any internal services are published through a modern security approach or SDWAN solution such as Azure AD Proxy, MCAS, Zscaler ZPA etc.

Split Tunnel VPN model 5

Implement VPN split tunneling

In this section, you'll find the simple steps required to migrate your VPN client architecture from a VPN forced tunnel to a VPN forced tunnel with a small number of trusted exceptions, VPN split tunnel model #2 in the Common VPN scenarios section.

The diagram below illustrates how the recommended VPN split tunnel solution works:

Split tunnel VPN solution detail

1. Identify the endpoints to optimize

In the Office 365 URLs and IP address ranges topic, Microsoft clearly identifies the key endpoints you need to optimize and categorizes them as Optimize. There are currently just four URLS which need to be optimized and twenty IP subnets. This small group of endpoints accounts for around 70% - 80% of the volume of traffic to the Office 365 service including the latency sensitive endpoints such as those for Teams media. Essentially this is the traffic that we need to take special care of and is also the traffic which will put incredible pressure on traditional network paths and VPN infrastructure.

URLs in this category have the following characteristics:

  • Are Microsoft owned and managed endpoints, hosted on Microsoft infrastructure
  • Have IPs provided
  • Low rate of change and are expected to remain small in number (currently 20 IP subnets)
  • Are bandwidth and/or latency sensitive
  • Are able to have required security elements provided in the service rather than inline on the network
  • Account for around 70-80% of the volume of traffic to the Office 365 service

Note

Microsoft has committed to suspending changes to Optimize endpoints for Office 365 until at least June 30 2020, allowing customers to focus on other challenges rather than maintaining the endpoint whitelist once initially implemented. This article will be updated to reflect any future changes.

For more information about Office 365 endpoints and how they are categorized and managed, see the article Managing Office 365 endpoints.

Optimize URLs

The current Optimize URLs can be found in the table below. Under most circumstances, you should only need to use URL endpoints in a browser PAC file where the endpoints are configured to be sent direct, rather than to the proxy.

Optimize URLs Port/Protocol Purpose
https://outlook.office365.com TCP 443 This is one of the primary URLs Outlook uses to connect to its Exchange Online server and has a high volume of bandwidth usage and connection count. Low network latency is required for online features including: instant search, other mailbox calendars, free / busy lookup, manage rules and alerts, Exchange online archive, emails departing the outbox.
https://outlook.office.com TCP 443 This URL is used for Outlook Online Web Access to connect to Exchange Online server, and is sensitive to network latency. Connectivity is particularly required for large file upload and download with SharePoint Online.
https://<tenant>.sharepoint.com TCP 443 This is the primary URL for SharePoint Online and has high bandwidth usage.
https://<tenant>-my.sharepoint.com TCP 443 This is the primary URL for OneDrive for Business and has high bandwidth usage and possibly high connection count from the OneDrive for Business Sync tool.
Teams Media IPs (no URL) UDP 3478, 3479, 3480, and 3481 Relay Discovery allocation and real time traffic (3478), Audio (3479), Video (3480), and Video Screen Sharing (3481). These are the endpoints used for Skype for Business and Microsoft Teams Media traffic (calls, meetings etc). Most endpoints are provided when the Microsoft Teams client establishes a call (and are contained within the required IPs listed for the service). Use of the UDP protocol is required for optimal media quality.

In the above examples, tenant should be replaced with your Office 365 tenant name. For example, contoso.onmicrosoft.com would use contoso.sharepoint.com and constoso-my.sharepoint.com.

Optimize IP address ranges

At the time of writing the IP ranges which these endpoints correspond to are as follows. It is very strongly advised you use a script such as this example, the Office 365 IP and URL web service or the URL/IP page to check for any updates when applying the configuration, and put a policy in place to do so on a regular basis.

104.146.128.0/17
13.107.128.0/22
13.107.136.0/22
13.107.18.10/31
13.107.6.152/31
13.107.64.0/18
131.253.33.215/32
132.245.0.0/16
150.171.32.0/22
150.171.40.0/22
191.234.140.0/22
204.79.197.215/32
23.103.160.0/20
40.104.0.0/15
40.108.128.0/17
40.96.0.0/13
52.104.0.0/14
52.112.0.0/14
52.96.0.0/14
52.120.0.0/14

2. Optimize access to these endpoints via the VPN

Now that we have identified these critical endpoints, we need to divert them away from the VPN tunnel and allow them to use the user's local Internet connection to connect directly to the service. The manner in which this is accomplished will vary depending on the VPN product and machine platform used but most VPN solutions will allow some simple configuration of policy to apply this logic. For information VPN platform-specific split tunnel guidance, see HOWTO guides for common VPN platforms.

If you wish to test the solution manually, you can execute the following PowerShell example to emulate the solution at the route table level. This example adds a route for each of the Teams Media IP subnets into the route table. You can test Teams media performance before and after, and observe the difference in routes for the specified endpoints.

Example: Add Teams Media IP subnets into the route table

$intIndex = "" # index of the interface connected to the internet
$gateway = "" # default gateway of that interface
$destPrefix = "52.120.0.0/14", "52.112.0.0/14", "13.107.64.0/18" # Teams Media endpoints
# Add routes to the route table
foreach ($prefix in $destPrefix) {New-NetRoute -DestinationPrefix $prefix -InterfaceIndex $intIndex -NextHop $gateway}

In the above script, $intIndex is the index of the interface connected to the internet (find by running get-netadapter in PowerShell; look for the value of ifIndex) and $gateway is the default gateway of that interface (find by running ipconfig in a command prompt or (Get-NetIPConfiguration | Foreach IPv4DefaultGateway).NextHop in PowerShell).

Once you have added the routes, you can confirm that the route table is correct by running route print in a command prompt or PowerShell. The output should contain the routes you added, showing the interface index (22 in this example) and the gateway for that interface (192.168.1.1 in this example):

Route print output

To add routes for all current IP address ranges in the Optimize category, you can use the following script variation to query the Office 365 IP and URL web service for the current set of Optimize IP subnets and add them to the route table.

Example: Add all Optimize subnets into the route table

$intIndex = "" # index of the interface connected to the internet
$gateway = "" # default gateway of that interface
# Query the web service for IPs in the Optimize category
$ep = Invoke-RestMethod ("https://endpoints.office.com/endpoints/worldwide?clientrequestid=" + ([GUID]::NewGuid()).Guid)
# Output only IPv4 Optimize IPs to $optimizeIps
$destPrefix = $ep | where {$_.category -eq "Optimize"} | Select-Object -ExpandProperty ips | Where-Object { $_ -like '*.*' }
# Add routes to the route table
foreach ($prefix in $destPrefix) {New-NetRoute -DestinationPrefix $prefix -InterfaceIndex $intIndex -NextHop $gateway}

If you inadvertently added routes with incorrect parameters or simply wish to revert your changes, you can remove the routes you just added with the following command:

foreach ($prefix in $destPrefix) {Remove-NetRoute -DestinationPrefix $prefix -InterfaceIndex $intIndex -NextHop $gateway}

The VPN client should be configured so that traffic to the Optimize IPs are routed in this way. This allows the traffic to utilize local Microsoft resources such as Office 365 Service Front Doors such as the Azure Front Door which deliver Office 365 services and connectivity endpoints as close to your users as possible. This allows us to deliver extremely high performance levels to users wherever they are in the world and takes full advantage of Microsoft's world class global network, which is very likely within a small number of milliseconds of your users' direct egress.

Configuring and securing Teams media traffic

Some administrators may require more detailed information on how call flows operate in Teams using a split tunneling model and how connections are secured.

Configuration

For both calls and meetings, as long as the required Optimize IP subnets for Teams media are correctly in place in the route table, when Teams calls the GetBestRoute method to determine which interface it should use for a particular destination, the local interface will be returned for Microsoft destinations in the Microsoft IP blocks listed above.

Some VPN client software allows routing manipulation based on URL. However, Teams media traffic has no URL associated with it, so control of routing for this traffic must be done using IP subnets.

In certain scenarios, often unrelated to Teams client configuration, media traffic still traverses the VPN tunnel even with the correct routes in place. If you encounter this scenario then using a firewall rule to block the Teams IP subnets or ports from using the VPN should suffice.

Note

A current requirement for this to work in 100% of scenarios is to also add the IP range 13.107.60.1/32. This should not be necessary very shortly due to an update in the Teams client due for release by June 2020. We will update this article with the build details as soon as this information is available.

Signalling traffic is performed over HTTPS and is not as latency sensitive as the media traffic and is marked as Allow in the URL/IP data and thus can safely be routed through the VPN client if desired.

Security

One common argument for avoiding split tunnels is that it is less secure to do so, i.e any traffic that does not go through the VPN tunnel will not benefit from whatever encryption scheme is applied to the VPN tunnel, and is therefore less secure.

The main counter-argument to this is that media traffic is already encrypted via Secure Real-Time Transport Protocol (SRTP), a profile of Real-Time Transport Protocol (RTP) that provides confidentiality, authentication, and replay attack protection to RTP traffic. SRTP itself relies on a randomly generated session key, which is exchanged via the TLS secured signaling channel. This is covered in great detail within this security guide, but the primary section of interest is media encryption.

Media traffic is encrypted using SRTP, which uses a session key generated by a secure random number generator and exchanged using the signaling TLS channel. In addition, media flowing in both directions between the Mediation Server and its internal next hop is also encrypted using SRTP.

Skype for Business Online generates username/passwords for secure access to media relays over Traversal Using Relays around NAT (TURN). Media relays exchange the username/password over a TLS-secured SIP channel. It is worth noting that even though a VPN tunnel may be used to connect the client to the corporate network, the traffic still needs to flow in its SRTP form when it leaves the corporate network to reach the service.

Information on how Teams mitigates common security concerns such as voice or Session Traversal Utilities for NAT (STUN) amplification attacks can be found in this article.

You can also read about modern security controls in remote work scenarios at Alternative ways for security professionals and IT to achieve modern security controls in today's unique remote work scenarios (Microsoft Security Team blog).

Testing

Once the policy is in place, you should confirm it is working as expected. There are multiple ways of testing the path is correctly set to use the local Internet connection:

  • Run the Microsoft 365 connectivity test which will run connectivity tests for you including trace routes as above. We're also adding in VPN tests into this tooling which should also provide some additional insight.

  • A simple tracert to an endpoint within scope of the split tunnel should show the path taken, for example:

    tracert worldaz.tr.teams.microsoft.com
    

    You should then see a path via the local ISP to this endpoint which should resolve to an IP in the Teams ranges we have configured for split tunneling.

  • Take a network capture using a tool such as Wireshark. Filter on UDP during a call and you should see traffic flowing to an IP in the Teams Optimize range. If the VPN tunnel is being used for this traffic, then the media traffic will not be visible in the trace.

Additional support logs

If you need further data to troubleshoot, or are requesting assistance from Microsoft support, obtaining the following information should allow you to expedite finding a solution. Microsoft support's TSS Windows CMD based universal TroubleShooting Script toolset can help you to collect the relevant logs in a simple manner. The tool and instructions on use can be found at https://aka.ms/TssTools.

HOWTO guides for common VPN platforms

This section provides links to detailed guides for implementing split tunneling for Office 365 traffic from the most common partners in this space. We'll add additional guides as they become available.

FAQ

The Microsoft Security Team have published an article which outlines key ways for security professionals and IT can achieve modern security controls in today's unique remote work scenarios. In addition, below are some of the common customer questions and answers on this subject.

How do I stop users accessing other tenants I do not trust where they could exfiltrate data?

The answer is a feature called tenant restrictions. Authentication traffic is not high volume nor especially latency sensitive so can be sent through the VPN solution to the on-premises proxy where the feature is applied. An allow list of trusted tenants is maintained here and if the client attempts to obtain a token to a tenant which is not trusted, the proxy simply denies the request. If the tenant is trusted, then a token is accessible if the user has the right credentials and rights.

So even though a user can make a TCP/UDP connection to the Optimize marked endpoints above, without a valid token to access the tenant in question, they simply cannot login and access/move any data.

Does this model allow access to consumer services such as personal OneDrive accounts?

No, it does not, the Office 365 endpoints are not the same as the consumer services (Onedrive.live.com as an example) so the split tunnel will not allow a user to directly access consumer services. Traffic to consumer endpoints will continue to use the VPN tunnel and existing policies will continue to apply.

How do I apply DLP and protect my sensitive data when the traffic no longer flows through my on-premises solution?

To help you prevent the accidental disclosure of sensitive information, Office 365 has a rich set of built-in tools. You can use the built-in DLP capabilities of Teams and SharePoint to detect inappropriately stored or shared sensitive information. If part of your remote work strategy involves a bring-your-own-device (BYOD) policy, you can use Conditional Access App Control to prevent sensitive data from being downloaded to users' personal devices

How do I evaluate and maintain control of the user's authentication when they are connecting directly?

In addition to the tenant restrictions feature noted in Q1, conditional access policies can be applied to dynamically assess the risk of an authentication request and react appropriately. Microsoft recommends the Zero Trust model is implemented over time and we can use Azure AD conditional access policies to maintain control in a mobile and cloud first world. Conditional access policies can be used to make a real-time decision on whether an authentication request is successful based on numerous factors such as:

  • Device, is the device known/trusted/Domain joined?
  • IP – is the authentication request coming from a known corporate IP address? Or from a country we do not trust?
  • Application – Is the user authorized to use this application?

We can then trigger policy such as approve, trigger MFA or block authentication based on these policies.

How do I protect against viruses and malware?

Again, Office 365 provides protection for the Optimize marked endpoints in various layers in the service itself, outlined in this document. As noted, it is vastly more efficient to provide these security elements in the service itself rather than try and do it in line with devices which may not fully understand the protocols/traffic.By default, SharePoint Online automatically scans file uploads for known malware

For the Exchange endpoints listed above, Exchange Online Protection and Office 365 Advanced Threat Protection do an excellent job of providing security of the traffic to the service.

Can I send more than just the Optimize traffic direct?

Priority should be given to the Optimize marked endpoints as these will give maximum benefit for a low level of work. However, if you wish, the Allow marked endpoints are required for the service to work and have IPs provided for the endpoints which can be used if required.

There are also various vendors who offer cloud based proxy/security solutions called secure web gateways which provide central security, control and corporate policy application for general web browsing. These solutions can work well in a cloud first world, if highly available, performant, and provisioned close to your users by allowing secure Internet access to be delivered from a cloud based location close to the user. This removes the need for a hairpin through the VPN/corporate network for general browsing traffic, whilst still allowing central security control.

Even with these solutions in place however, Microsoft still strongly recommends that Optimize marked Office 365 traffic is sent direct to the service.

For guidance on allowing direct access to an Azure Virtual Network, see the article Remote work using Azure VPN Gateway Point-to-site.

Why is port 80 required? Is traffic sent in the clear?

Port 80 is only used for things like redirect to a port 443 session, no customer data is sent or is accessible over port 80. This article outlines encryption for data in transit and at rest for Office 365, and this article outlines how we use SRTP to protect Teams media traffic.

Does this advice apply to users in China using a worldwide instance of Office 365?

No, it does not. The one caveat to the above advice is users in the PRC who are connecting to a worldwide instance of Office 365. Due to the common occurrence of cross border network congestion in the region, direct Internet egress performance can be variable. Most customers in the region operate using a VPN to bring the traffic into the corporate network and utilize their authorized MPLS circuit or similar to egress outside the country via an optimized path. This is outlined further in the article Office 365 performance optimization for China users.

Overview: VPN split tunneling for Office 365

Office 365 performance optimization for China users

Alternative ways for security professionals and IT to achieve modern security controls in today's unique remote work scenarios (Microsoft Security Team blog)

Enhancing VPN performance at Microsoft: using Windows 10 VPN profiles to allow auto-on connections

Running on VPN: How Microsoft is keeping its remote workforce connected

Office 365 Network Connectivity Principles

Assessing Office 365 network connectivity

Office 365 network and performance tuning