Create a resilient access control management strategy with Microsoft Entra ID

Note

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft can't guarantee the accuracy of any information presented after the date of publication.

Organizations that rely on a single access control, such as multifactor authentication or a single network location, to secure their IT systems are susceptible to access failures to their apps and resources if that single access control becomes unavailable or misconfigured. For example, a natural disaster can result in the unavailability of large segments of telecommunications infrastructure or corporate networks. Such a disruption could prevent end users and administrators from being able to sign in.

This document provides guidance on strategies an organization should adopt to build resilience and reduce the risk of lockout during unforeseen disruptions. It covers the following scenarios:

  • Organizations can increase their resiliency to reduce the risk of lockout before a disruption by implementing mitigation strategies or contingency plans.
  • Organizations can continue to access apps and resources they choose during a disruption by having mitigation strategies and contingency plans in place.
  • Organizations should make sure they preserve information, such as logs, after a disruption and before they roll back any contingencies they implemented.
  • Organizations that haven’t implemented prevention strategies or alternative plans may be able to implement emergency options to deal with the disruption.

Key guidance

There are four key takeaways in this document:

  • Avoid administrator lockout by using emergency access accounts.
  • Implement MFA using Conditional Access rather than per-user MFA.
  • Mitigate user lockout by using multiple Conditional Access controls.
  • Mitigate user lockout by provisioning multiple authentication methods or equivalents for each user.

Before a disruption

Mitigating an actual disruption must be an organization’s primary focus in dealing with access control issues that may arise. Mitigating includes planning for an actual event plus implementing strategies to make sure access controls and operations are unaffected during disruptions.

Why do you need resilient access control?

Identity is the control plane for users accessing apps and resources. Your identity system controls which users get access to applications and under which conditions, such as access controls or authentication requirements. When one or more authentication or access control requirements become unavailable due to unforeseen circumstances, organizations can experience one or both of the following issues:

  • Administrator lockout: Administrators can’t manage the tenant or services.
  • User lockout: Users can’t access apps or resources.

Administrator lockout contingency

To unlock admin access to your tenant, you should create emergency access accounts. These emergency access accounts, also known as break glass accounts, allow access to manage Microsoft Entra configuration when normal privileged account access procedures aren’t available. At least two emergency access accounts should be created following the emergency access account recommendations.

Mitigating user lockout

To mitigate the risk of user lockout, use Conditional Access policies with multiple controls to give users a choice of how they access apps and resources. By giving a user the choice between, for example, signing in with MFA or signing in from a managed device or signing in from the corporate network, if one of the access controls is unavailable the user has other options to continue to work.

Microsoft recommendations

Incorporate the following access controls in your organization's existing Conditional Access policies:

  • Provision multiple authentication methods for each user that rely on different communication channels, for example, the Microsoft Authenticator app (internet-based), OATH token (generated on-device), and SMS (telephonic).
  • Deploy Windows Hello for Business on Windows 10 devices to satisfy MFA requirements directly from device sign-in.
  • Use trusted devices via Microsoft Entra hybrid join or Microsoft Intune. Trusted devices improve user experience because the trusted device itself can satisfy the strong authentication requirements of policy without an MFA challenge to the user. MFA will then be required when enrolling a new device and when accessing apps or resources from untrusted devices.
  • Use Microsoft Entra ID Protection risk-based policies that prevent access when the user or sign-in is at risk in place of fixed MFA policies.
  • If you're protecting VPN access using Microsoft Entra multifactor authentication NPS extension, consider federating your VPN solution as a SAML app and determine the app category as recommended below.

Note

Risk-based policies require Microsoft Entra ID P2 licenses.

The following example describes the policies you must create to provide resilient access control for users accessing their apps and resources. In this example, you require a security group AppUsers with the target users you want to give access to, a group named CoreAdmins with the core administrators, and a group named EmergencyAccess with the emergency access accounts. This example policy set grants selected users in AppUsers access to selected apps if they connect from a trusted device OR provide strong authentication, for example MFA. It excludes emergency accounts and core administrators.

Conditional Access mitigation policies set:

  • Policy 1: Block access to people outside target groups
    • Users and Groups: Include all users. Exclude AppUsers, CoreAdmins, and EmergencyAccess
    • Cloud Apps: Include all apps
    • Conditions: (None)
    • Grant Control: Block
  • Policy 2: Grant access to AppUsers requiring MFA OR trusted device.
    • Users and Groups: Include AppUsers. Exclude CoreAdmins, and EmergencyAccess
    • Cloud Apps: Include all apps
    • Conditions: (None)
    • Grant Control: Grant access, require multifactor authentication, require device to be compliant. For multiple controls: Require one of the selected controls.
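As a rough illustration, the two policies above could be expressed as request bodies for the Microsoft Graph conditionalAccessPolicy resource. This is a sketch rather than an official template: the group object IDs are placeholders, the function names are invented for the example, and the code only builds the JSON bodies without calling any API.

```python
import json

# Placeholder object IDs (assumptions): replace with the IDs of your own groups.
APP_USERS = "00000000-0000-0000-0000-000000000001"
CORE_ADMINS = "00000000-0000-0000-0000-000000000002"
EMERGENCY_ACCESS = "00000000-0000-0000-0000-000000000003"


def block_outside_target_groups():
    """Policy 1: block everyone who is not in one of the three groups."""
    return {
        "displayName": "Block access to people outside target groups",
        "state": "enabled",
        "conditions": {
            "clientAppTypes": ["all"],
            "applications": {"includeApplications": ["All"]},
            "users": {
                "includeUsers": ["All"],
                "excludeGroups": [APP_USERS, CORE_ADMINS, EMERGENCY_ACCESS],
            },
        },
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


def grant_app_users_mfa_or_compliant_device():
    """Policy 2: AppUsers must satisfy MFA OR use a compliant device."""
    return {
        "displayName": "Grant access to AppUsers requiring MFA OR trusted device",
        "state": "enabled",
        "conditions": {
            "clientAppTypes": ["all"],
            "applications": {"includeApplications": ["All"]},
            "users": {
                "includeGroups": [APP_USERS],
                "excludeGroups": [CORE_ADMINS, EMERGENCY_ACCESS],
            },
        },
        # "OR" corresponds to "Require one of the selected controls".
        "grantControls": {
            "operator": "OR",
            "builtInControls": ["mfa", "compliantDevice"],
        },
    }


if __name__ == "__main__":
    for build in (block_outside_target_groups, grant_app_users_mfa_or_compliant_device):
        print(json.dumps(build(), indent=2))
```

Keeping the policy definitions in code like this makes it easier to review the include and exclude lists side by side and to confirm that the emergency access group is excluded from every policy.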

Contingencies for user lockout

Alternatively, your organization can create contingency policies. To create contingency policies, you must define tradeoff criteria between business continuity, operational cost, financial cost, and security risks. For example, you may activate a contingency policy only for a subset of users, for a subset of apps, for a subset of clients, or from a subset of locations. Contingency policies give administrators and end users access to apps and resources during a disruption when no mitigation method was implemented. Microsoft recommends enabling contingency policies in report-only mode when not in use so that administrators can monitor the potential impact of the policies should they need to be turned on.

Understanding your exposure during a disruption helps reduce your risk and is a critical part of your planning process. To create your contingency plan, first determine the following business requirements of your organization:

  1. Determine your mission critical apps ahead of time: What are the apps that you must give access to, even with a lower risk/security posture? Build a list of these apps and make sure your other stakeholders (business, security, legal, leadership) all agree that if all access control goes away, these apps still must continue to run. You're likely going to end up with categories of:
    • Category 1 mission critical apps that can't be unavailable for more than a few minutes, for example, apps that directly affect the revenue of the organization.
    • Category 2 important apps that the business needs to be accessible within a few hours.
    • Category 3 low-priority apps that can withstand a disruption of a few days.
  2. For apps in categories 1 and 2, Microsoft recommends you preplan what level of access you want to allow:
    • Do you want to allow full access or restricted session, like limiting downloads?
    • Do you want to allow access to part of the app but not the whole app?
    • Do you want to allow information worker access and block administrator access until the access control is restored?
  3. For those apps, Microsoft also recommends you plan which avenues of access you'll deliberately open and which ones you'll close:
    • Do you want to allow browser only access and block rich clients that can save offline data?
    • Do you want to allow access only for users inside the corporate network and keep outside users blocked?
    • Do you want to allow access from certain countries or regions only during the disruption?
    • Do you want the contingency policies, especially for mission critical apps, to fail or succeed if an alternative access control isn't available?

Microsoft recommendations

A contingency Conditional Access policy is a backup policy that omits Microsoft Entra multifactor authentication, third-party MFA, risk-based or device-based controls. In order to minimize unexpected disruption when a contingency policy is enabled, the policy should remain in report-only mode when not in use. Administrators can monitor the potential impact of their contingency policies using the Conditional Access Insights workbook. When your organization decides to activate your contingency plan, administrators can enable the policy and disable the regular control-based policies.

Important

Disabling policies that enforce security on your users, even temporarily, will reduce your security posture while the contingency plan is in place.

  • Configure a set of fallback policies if a disruption in one credential type or one access control mechanism impacts access to your apps. Configure a policy in report-only state that requires Domain Join as a control, as a backup for an active policy that requires a third-party MFA provider.
  • Reduce the risk of bad actors guessing passwords, when MFA isn't required, by following the practices in the password guidance white paper.
  • Deploy Microsoft Entra Self-Service Password Reset (SSPR) and Microsoft Entra Password Protection to make sure users don’t use common passwords and terms you choose to ban.
  • Use policies that restrict the access within the apps if a certain authentication level isn't attained instead of simply falling back to full access. For example:
    • Configure a backup policy that sends the restricted session claim to Exchange and SharePoint.
    • If your organization uses Microsoft Defender for Cloud Apps, consider falling back to a policy that engages Defender for Cloud Apps and then allow read-only access but not uploads.
  • Name your policies to make sure it's easy to find them during a disruption. Include the following elements in the policy name:
    • A label number for the policy.
    • Text to show that this policy is for emergencies only. For example: ENABLE IN EMERGENCY
    • The disruption it applies to. For example: During MFA Disruption
    • A sequence number to show the order you must activate the policies.
    • The apps it applies to.
    • The controls it will apply.
    • The conditions it requires.

The naming standard for the contingency policies is as follows:

EMnnn - ENABLE IN EMERGENCY: [Disruption][i/n] - [Apps] - [Controls] [Conditions]
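To keep names consistent, the pattern above can be generated rather than typed by hand. The following is a minimal sketch; the function name and its parameters are hypothetical and chosen only to mirror the elements of the naming standard.

```python
def emergency_policy_name(label, disruption, step, total, apps, controls, conditions=""):
    """Build a name following EMnnn - ENABLE IN EMERGENCY: [Disruption][i/n] - [Apps] - [Controls] [Conditions]."""
    name = (
        f"EM{label:03d} - ENABLE IN EMERGENCY: "
        f"{disruption}[{step}/{total}] - {apps} - {controls}"
    )
    # Conditions are optional; append them only when given.
    return f"{name} {conditions}" if conditions else name


if __name__ == "__main__":
    print(emergency_policy_name(
        1, "MFA Disruption", 1, 4,
        "Exchange SharePoint", "Require Microsoft Entra hybrid join",
    ))
```

Zero-padding the label number keeps the policies sorted in the intended order when they are listed alphabetically.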

The following example, Example A - Contingency Conditional Access policies to restore access to mission-critical collaboration apps, is a typical corporate contingency. In this scenario, the organization typically requires MFA for all Exchange Online and SharePoint Online access, and the disruption is an outage of the organization's MFA provider (whether Microsoft Entra multifactor authentication, an on-premises MFA provider, or third-party MFA). The policies mitigate this outage by allowing specific targeted users to access these apps from trusted Windows devices only when they're on the trusted corporate network. The policies also exclude emergency accounts and core administrators from these restrictions. The targeted users then gain access to Exchange Online and SharePoint Online, while other users still don't have access to the apps because of the outage. This example requires a named network location CorpNetwork, a security group ContingencyAccess with the target users, a group named CoreAdmins with the core administrators, and a group named EmergencyAccess with the emergency access accounts. The contingency requires four policies to provide the desired access.

Example A - Contingency Conditional Access policies to restore Access to mission-critical Collaboration Apps:

  • Policy 1: Require Domain Joined devices for Exchange and SharePoint
    • Name: EM001 - ENABLE IN EMERGENCY: MFA Disruption[1/4] - Exchange SharePoint - Require Microsoft Entra hybrid join
    • Users and Groups: Include ContingencyAccess. Exclude CoreAdmins, and EmergencyAccess
    • Cloud Apps: Exchange Online and SharePoint Online
    • Conditions: Any
    • Grant Control: Require Domain Joined
    • State: Report-only
  • Policy 2: Block platforms other than Windows
    • Name: EM002 - ENABLE IN EMERGENCY: MFA Disruption[2/4] - Exchange SharePoint - Block access except Windows
    • Users and Groups: Include all users. Exclude CoreAdmins, and EmergencyAccess
    • Cloud Apps: Exchange Online and SharePoint Online
    • Conditions: Device Platform Include All Platforms, exclude Windows
    • Grant Control: Block
    • State: Report-only
  • Policy 3: Block networks other than CorpNetwork
    • Name: EM003 - ENABLE IN EMERGENCY: MFA Disruption[3/4] - Exchange SharePoint - Block access except Corporate Network
    • Users and Groups: Include all users. Exclude CoreAdmins, and EmergencyAccess
    • Cloud Apps: Exchange Online and SharePoint Online
    • Conditions: Locations Include any location, exclude CorpNetwork
    • Grant Control: Block
    • State: Report-only
  • Policy 4: Block EAS Explicitly
    • Name: EM004 - ENABLE IN EMERGENCY: MFA Disruption[4/4] - Exchange - Block EAS for all users
    • Users and Groups: Include all users
    • Cloud Apps: Include Exchange Online
    • Conditions: Client apps: Exchange Active Sync
    • Grant Control: Block
    • State: Report-only

Order of activation:

  1. Exclude ContingencyAccess, CoreAdmins, and EmergencyAccess from the existing MFA policy. Verify a user in ContingencyAccess can access SharePoint Online and Exchange Online.
  2. Enable Policy 1: Verify users on Domain Joined devices who aren't in the exclude groups are able to access Exchange Online and SharePoint Online. Verify users in the Exclude group can access SharePoint Online and Exchange from any device.
  3. Enable Policy 2: Verify users who aren't in the exclude group can't get to SharePoint Online and Exchange Online from their mobile devices. Verify users in the Exclude group can access SharePoint and Exchange from any device (Windows/iOS/Android).
  4. Enable Policy 3: Verify users who aren't in the exclude groups can't access SharePoint and Exchange off the corporate network, even with a domain joined machine. Verify users in the Exclude group can access SharePoint and Exchange from any network.
  5. Enable Policy 4: Verify all users can't get Exchange Online from the native mail applications on mobile devices.
  6. Disable the existing MFA policy for SharePoint Online and Exchange Online.
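The verification steps above can be rehearsed on paper before an outage with a toy model of the four Example A policies. The sketch below is a simplification, not the Conditional Access engine: the SignIn class, its fields, and denying_policies are hypothetical, the sign-in is assumed to target Exchange Online or SharePoint Online, and the existing MFA policy is not modeled. An empty result only means that none of the four contingency policies denies the sign-in.

```python
from dataclasses import dataclass

# Groups excluded from Policies 1 to 3 in Example A.
EXCLUDED_GROUPS = {"CoreAdmins", "EmergencyAccess"}


@dataclass
class SignIn:
    groups: set
    platform: str = "windows"      # for example "windows", "ios", "android"
    on_corp_network: bool = True   # inside the CorpNetwork named location
    hybrid_joined: bool = True     # device satisfies "Require Domain Joined"
    client_app: str = "browser"    # or "exchangeActiveSync"


def denying_policies(sign_in, enabled=(1, 2, 3, 4)):
    """Return the numbers of the enabled Example A policies that deny this sign-in."""
    excluded = bool(EXCLUDED_GROUPS & set(sign_in.groups))
    denied = []
    # Policy 1: ContingencyAccess users must be on a domain joined device.
    if (1 in enabled and not excluded
            and "ContingencyAccess" in sign_in.groups
            and not sign_in.hybrid_joined):
        denied.append(1)
    # Policy 2: block every platform except Windows.
    if 2 in enabled and not excluded and sign_in.platform != "windows":
        denied.append(2)
    # Policy 3: block every location except CorpNetwork.
    if 3 in enabled and not excluded and not sign_in.on_corp_network:
        denied.append(3)
    # Policy 4: block Exchange ActiveSync for all users, with no exclusions.
    if 4 in enabled and sign_in.client_app == "exchangeActiveSync":
        denied.append(4)
    return denied


if __name__ == "__main__":
    user = SignIn(groups={"ContingencyAccess"}, platform="ios", hybrid_joined=False)
    print(denying_policies(user))                # [1, 2]
    print(denying_policies(user, enabled=(1,)))  # [1]
```

Passing a shorter enabled tuple mirrors the staged activation: after each policy is switched on, the expected result for a test user can be compared with what that user actually experiences.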

In the next example, Example B - Contingency Conditional Access policies to allow mobile access to Salesforce, access to a business app is restored. In this scenario, the customer typically allows its sales employees to access Salesforce (configured for single sign-on with Microsoft Entra ID) from mobile devices only when those devices are compliant. The disruption in this case is an issue with evaluating device compliance, and the outage is happening at a sensitive time when the sales team needs access to Salesforce to close deals. These contingency policies grant critical users access to Salesforce from a mobile device so that they can continue to close deals without disrupting the business. In this example, SalesforceContingency contains all the sales employees who need to retain access and SalesAdmins contains the necessary admins of Salesforce.

Example B - Contingency Conditional Access policies:

  • Policy 1: Block everyone not in the SalesforceContingency group
    • Name: EM001 - ENABLE IN EMERGENCY: Device Compliance Disruption[1/2] - Salesforce - Block All users except SalesforceContingency
    • Users and Groups: Include all users. Exclude SalesAdmins and SalesforceContingency
    • Cloud Apps: Salesforce.
    • Conditions: None
    • Grant Control: Block
    • State: Report-only
  • Policy 2: Block the Sales team from any platform other than mobile (to reduce surface area of attack)
    • Name: EM002 - ENABLE IN EMERGENCY: Device Compliance Disruption[2/2] - Salesforce - Block All platforms except iOS and Android
    • Users and Groups: Include SalesforceContingency. Exclude SalesAdmins
    • Cloud Apps: Salesforce
    • Conditions: Device Platform Include All Platforms, exclude iOS and Android
    • Grant Control: Block
    • State: Report-only

Order of activation:

  1. Exclude SalesAdmins and SalesforceContingency from the existing device compliance policy for Salesforce. Verify a user in the SalesforceContingency group can access Salesforce.
  2. Enable Policy 1: Verify users outside of SalesforceContingency can't access Salesforce. Verify users in SalesAdmins and SalesforceContingency can access Salesforce.
  3. Enable Policy 2: Verify users in the SalesforceContingency group can't access Salesforce from their Windows/Mac laptops but can still access it from their mobile devices. Verify users in SalesAdmins can still access Salesforce from any device.
  4. Disable the existing device compliance policy for Salesforce.

Contingencies for user lockout from on-premises resources (NPS extension)

If you're protecting VPN access using Microsoft Entra multifactor authentication NPS extension, consider federating your VPN solution as a SAML app and determine the app category as recommended below.

If you have deployed the Microsoft Entra multifactor authentication NPS extension to protect on-premises resources, such as VPN and Remote Desktop Gateway, with MFA, you should consider in advance whether you're ready to disable MFA in an emergency.

In this case, you can disable the NPS extension. As a result, the NPS server will only verify primary authentication and won't enforce MFA on the users.

Disable NPS extension:

  • Export the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AuthSrv\Parameters registry key as a backup.
  • Delete the registry values for “AuthorizationDLLs” and “ExtensionDLLs”, not the Parameters key.
  • Restart the Network Policy Service (IAS) service for the changes to take effect.
  • Determine if primary authentication for VPN is successful.

Once the service has recovered and you're ready to enforce MFA on your users again, enable the NPS extension:

  • Import the registry key from the backup of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AuthSrv\Parameters.
  • Restart the Network Policy Service (IAS) service for the changes to take effect.
  • Determine if primary authentication and secondary authentication for VPN are successful.
  • Review NPS server and the VPN log to determine which users have signed in during the emergency window.

Deploy password hash sync even if you're federated or use pass-through authentication

User lockout can also occur if the following conditions are true:

  • Your organization uses a hybrid identity solution with pass-through authentication or federation.
  • Your on-premises identity systems (such as Active Directory, AD FS, or a dependent component) are unavailable.

To be more resilient, your organization should enable password hash sync, because it enables you to switch to using password hash sync if your on-premises identity systems are down.

Microsoft recommendations

Enable password hash sync using the Microsoft Entra Connect wizard, regardless of whether your organization uses federation or pass-through authentication.

Important

It isn't required to convert users from federated to managed authentication to use password hash sync.

During a disruption

If you opted for implementing a mitigation plan, you're able to automatically survive a single access control disruption. However, if you opted to create a contingency plan, you're able to activate your contingency policies during the access control disruption:

  1. Enable your contingency policies that grant targeted users access to specific apps from specific networks.
  2. Disable your regular control-based policies.
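The switch-over can be planned as an ordered list of state changes. The sketch below assumes each policy is a dictionary with id, displayName, and state keys, that contingency policies follow the ENABLE IN EMERGENCY naming standard with an [i/n] sequence marker, and that the caller passes in only the regular policies it intends to disable. The state strings match the values used by the Microsoft Graph conditionalAccessPolicy resource; activation_plan itself is a hypothetical helper and makes no API calls.

```python
import re

MARKER = "ENABLE IN EMERGENCY"
SEQUENCE = re.compile(r"\[(\d+)/(\d+)\]")


def activation_plan(policies):
    """Return (policy id, new state) pairs: contingency policies on first, regular ones off last."""
    def step(policy):
        match = SEQUENCE.search(policy["displayName"])
        return int(match.group(1)) if match else 0

    contingency = [p for p in policies if MARKER in p["displayName"]]
    regular = [p for p in policies if MARKER not in p["displayName"]]

    # Enable contingency policies in their documented sequence order.
    plan = [(p["id"], "enabled") for p in sorted(contingency, key=step)]
    # Then disable the regular policies that are currently enforced.
    plan += [(p["id"], "disabled") for p in regular if p["state"] == "enabled"]
    return plan


if __name__ == "__main__":
    example = [
        {"id": "r1", "displayName": "Require MFA for Exchange and SharePoint", "state": "enabled"},
        {"id": "c2", "displayName": "EM002 - ENABLE IN EMERGENCY: MFA Disruption[2/4] - Exchange SharePoint - Block access except Windows", "state": "enabledForReportingButNotEnforced"},
        {"id": "c1", "displayName": "EM001 - ENABLE IN EMERGENCY: MFA Disruption[1/4] - Exchange SharePoint - Require Microsoft Entra hybrid join", "state": "enabledForReportingButNotEnforced"},
    ]
    for policy_id, state in activation_plan(example):
        print(policy_id, "->", state)
```

Producing the plan as data first, and applying it in a separate step, also gives you a record of every change and the previous state for the later rollback.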

Microsoft recommendations

Depending on which mitigations or contingencies are used during a disruption, your organization could be granting access with just passwords. Having no additional safeguard is a considerable security risk that must be weighed carefully. Organizations must:

  1. As part of your change control strategy, document every change and the previous state to be able to roll back any contingencies you implemented as soon as the access controls are fully operational.
  2. Assume that malicious actors will attempt to harvest passwords through password spray or phishing attacks while MFA is disabled. Also, bad actors might already have passwords that previously didn't grant access to any resource and that they can try during this window. For critical users such as executives, you can partially mitigate this risk by resetting their passwords before disabling MFA for them.
  3. Archive all sign-in activity to identify who accessed what during the time MFA was disabled.
  4. Triage all risk detections reported during this window.
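One way to scope the sign-in archive is to filter the sign-in logs by the start and end of the emergency window. The sketch below builds an OData filter expression on createdDateTime of the kind accepted by the Microsoft Graph sign-in logs endpoint (auditLogs/signIns); the helper name is hypothetical, the example dates are made up, and the code does not call the API.

```python
from datetime import datetime, timezone


def sign_in_filter(start, end):
    """Build an OData $filter expression covering the window from start to end (UTC)."""
    def fmt(moment):
        if moment.tzinfo is None:
            # Naive datetimes are ambiguous, so they are rejected here.
            raise ValueError("use timezone-aware datetimes")
        return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return f"createdDateTime ge {fmt(start)} and createdDateTime le {fmt(end)}"


if __name__ == "__main__":
    window_start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    window_end = datetime(2024, 1, 1, 17, 30, tzinfo=timezone.utc)
    print(sign_in_filter(window_start, window_end))
```

Recording the window boundaries when you enable and disable the contingency policies makes this filter straightforward to reproduce later.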

After a disruption

Once the service that caused the disruption is restored, undo the changes you made as part of the activated contingency plan.

  1. Enable the regular policies.
  2. Return your contingency policies to report-only mode.
  3. Roll back any other changes you made and documented during the disruption.
  4. If you used an emergency access account, remember to regenerate credentials and physically secure the new credentials details as part of your emergency access account procedures.
  5. Continue to triage all risk detections reported after the disruption for suspicious activity.
  6. Revoke all refresh tokens that were issued to the targeted set of users. Revoking all refresh tokens is especially important for privileged accounts used during the disruption, because it forces them to reauthenticate and meet the controls of the restored policies.
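Token revocation for a list of users can be scripted against the Microsoft Graph revokeSignInSessions action on the user resource. The following sketch only prepares the requests; the helper name and the example object IDs are hypothetical, and sending the requests with an authenticated client is left out.

```python
GRAPH = "https://graph.microsoft.com/v1.0"


def revoke_requests(user_ids):
    """Return one (method, URL) pair per user whose sign-in sessions should be revoked."""
    return [
        ("POST", f"{GRAPH}/users/{user_id}/revokeSignInSessions")
        for user_id in user_ids
    ]


if __name__ == "__main__":
    # Placeholder object IDs (assumptions) for accounts used during the disruption.
    affected = [
        "00000000-0000-0000-0000-00000000000a",
        "00000000-0000-0000-0000-00000000000b",
    ]
    for method, url in revoke_requests(affected):
        print(method, url)
```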

Emergency options

In an emergency, if your organization didn't previously implement a mitigation or contingency plan, follow the recommendations in the Contingencies for user lockout section if you already use Conditional Access policies to enforce MFA. If your organization uses legacy per-user MFA policies, you can consider the following alternatives:

  • If you have the corporate network's outbound IP addresses, you can add them as trusted IPs to enable authentication only from the corporate network.
  • If you don’t have the inventory of outbound IP addresses, or you're required to enable access inside and outside the corporate network, you can add the entire IPv4 address space as trusted IPs by specifying 0.0.0.0/1 and 128.0.0.0/1.
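The claim that 0.0.0.0/1 and 128.0.0.0/1 together cover the entire IPv4 address space can be checked with Python's standard ipaddress module. This is only a quick arithmetic check of the two ranges, not a configuration step.

```python
import ipaddress

halves = [
    ipaddress.ip_network("0.0.0.0/1"),    # 0.0.0.0 through 127.255.255.255
    ipaddress.ip_network("128.0.0.0/1"),  # 128.0.0.0 through 255.255.255.255
]

# Adjacent networks collapse into the smallest set of covering networks.
merged = list(ipaddress.collapse_addresses(halves))
total = sum(network.num_addresses for network in halves)

print(merged)          # [IPv4Network('0.0.0.0/0')]
print(total == 2**32)  # True
```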

Important

If you broaden the trusted IP addresses to unblock access, risk detections associated with IP addresses (for example, impossible travel or unfamiliar locations) won't be generated.

Note

Configuring trusted IPs for Microsoft Entra multifactor authentication is only available with Microsoft Entra ID P1 or P2 licenses.

Learn more