Azure Operational Security best practices

This article provides a set of operational best practices for protecting your data, applications, and other assets in Azure.

The best practices are based on a consensus of opinion, and they work with current Azure platform capabilities and feature sets. Opinions and technologies change over time and this article is updated on a regular basis to reflect those changes.

Define and deploy strong operational security practices

Azure operational security refers to the services, controls, and features available to users for protecting their data, applications, and other assets in Azure. Azure operational security is built on a framework that incorporates the knowledge gained through capabilities that are unique to Microsoft, including the Security Development Lifecycle (SDL), the Microsoft Security Response Center program, and deep awareness of the cybersecurity threat landscape.

Manage and monitor user passwords

The following are some best practices for managing user passwords:

Best practice: Ensure you have the proper level of password protection in the cloud.
Detail: Follow the guidance in Microsoft Password Guidance, which is scoped to users of the Microsoft identity platforms (Azure Active Directory, Active Directory, and Microsoft account).

Best practice: Monitor for suspicious actions related to your user accounts.
Detail: Monitor for users at risk and risky sign-ins by using Azure AD security reports.

Best practice: Automatically detect and remediate high-risk passwords.
Detail: Azure AD Identity Protection is a feature of the Azure AD Premium P2 edition that enables you to:

  • Detect potential vulnerabilities that affect your organization’s identities
  • Configure automated responses to detected suspicious actions that are related to your organization’s identities
  • Investigate suspicious incidents and take appropriate actions to resolve them

Receive incident notifications from Microsoft

Be sure your security operations team receives Azure incident notifications from Microsoft. An incident notification lets your security team know you have compromised Azure resources so they can quickly respond to and remediate potential security risks.

In the Azure enrollment portal, you can ensure admin contact information includes details that notify security operations. Contact information is an email address and phone number.

Organize Azure subscriptions into management groups

If your organization has many subscriptions, you might need a way to efficiently manage access, policies, and compliance for those subscriptions. Azure management groups provide a level of scope that’s above subscriptions. You organize subscriptions into containers called management groups and apply your governance conditions to the management groups. All subscriptions within a management group automatically inherit the conditions applied to the management group.

You can build a flexible structure of management groups and subscriptions into a directory. Each directory is given a single top-level management group called the root management group. This root management group is built into the hierarchy to have all management groups and subscriptions fold up to it. The root management group allows global policies and RBAC assignments to be applied at the directory level.
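The inheritance described above can be illustrated with a small in-memory model. This is only a sketch of the concept and does not call any Azure API; the Scope class, the group and subscription names, and the assignment names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scope:
    """A management group or subscription in a simplified, in-memory hierarchy."""
    name: str
    parent: Optional["Scope"] = None
    # Policies or RBAC assignments made directly at this scope (hypothetical names).
    assignments: List[str] = field(default_factory=list)

    def effective_assignments(self) -> List[str]:
        """Return inherited assignments (root first) followed by this scope's own."""
        inherited = self.parent.effective_assignments() if self.parent else []
        return inherited + self.assignments


# Hypothetical hierarchy: root -> "Production" management group -> one subscription.
root = Scope("Root", assignments=["audit-resource-locations"])
production = Scope("Production", parent=root, assignments=["require-encryption"])
subscription = Scope("payments-sub", parent=production)

print(subscription.effective_assignments())
# ['audit-resource-locations', 'require-encryption']
```

The subscription has no assignments of its own, yet it picks up everything assigned at the root and at its parent management group, which is the behavior the text describes.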

Here are some best practices for using management groups:

Best practice: Ensure that new subscriptions apply governance elements like policies and permissions as they are added.
Detail: Use the root management group to assign enterprise-wide security elements that apply to all Azure assets. Policies and permissions are examples of elements.

Best practice: Align the top levels of management groups with segmentation strategy to provide a point for control and policy consistency within each segment.
Detail: Create a single management group for each segment under the root management group. Don’t create any other management groups under the root.

Best practice: Limit management group depth to avoid confusion that hampers both operations and security.
Detail: Limit your hierarchy to three levels, including the root.
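A depth limit like this can be checked mechanically. The following sketch assumes the hierarchy has already been exported into a simple child-to-parent mapping; the mapping, the group names, and the helper functions are hypothetical and are not part of any Azure SDK.

```python
from typing import Dict, List, Optional

MAX_LEVELS = 3  # the limit suggested above, counting the root as level 1


def level_of(group: str, parents: Dict[str, Optional[str]]) -> int:
    """Count levels from the root (level 1) down to the given management group."""
    level = 1
    while parents[group] is not None:
        group = parents[group]
        level += 1
    return level


def too_deep(parents: Dict[str, Optional[str]], max_levels: int = MAX_LEVELS) -> List[str]:
    """Return the management groups that sit below the allowed number of levels."""
    return sorted(g for g in parents if level_of(g, parents) > max_levels)


# Hypothetical hierarchy: each management group maps to its parent; the root maps to None.
hierarchy = {
    "Root": None,
    "Production": "Root",
    "NonProduction": "Root",
    "Prod-Payments": "Production",
    "Prod-Payments-EU": "Prod-Payments",  # fourth level, exceeds the limit
}

print(too_deep(hierarchy))  # ['Prod-Payments-EU']
```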

Best practice: Carefully select which items to apply to the entire enterprise with the root management group.
Detail: Ensure root management group elements have a clear need to be applied across every resource and that they’re low impact.

Good candidates include:

  • Regulatory requirements that have a clear business impact (for example, restrictions related to data sovereignty)
  • Requirements with near-zero potential negative effect on operations, like policy with audit effect or RBAC permission assignments that have been carefully reviewed

Best practice: Carefully plan and test all enterprise-wide changes on the root management group before applying them (policy, RBAC model, and so on).
Detail: Changes in the root management group can affect every resource on Azure. While they provide a powerful way to ensure consistency across the enterprise, errors or incorrect usage can negatively affect production operations. Test all changes to the root management group in a test lab or production pilot.

Streamline environment creation with blueprints

The Azure Blueprints service enables cloud architects and central information technology groups to define a repeatable set of Azure resources that implements and adheres to an organization's standards, patterns, and requirements. Azure Blueprints makes it possible for development teams to rapidly build and stand up new environments with a set of built-in components and the confidence that they're creating those environments within organizational compliance.

Monitor storage services for unexpected changes in behavior

Diagnosing and troubleshooting issues in a distributed application hosted in a cloud environment can be more complex than it is in traditional environments. Applications can be deployed in a PaaS or IaaS infrastructure, on-premises, on a mobile device, or in some combination of these environments. Your application's network traffic might traverse public and private networks, and your application might use multiple storage technologies.

You should continuously monitor the storage services that your application uses for any unexpected changes in behavior (such as slower response times). Use logging to collect more detailed data and to analyze a problem in depth. The diagnostics information that you obtain from both monitoring and logging helps you to determine the root cause of the issue that your application encountered. Then you can troubleshoot the issue and determine the appropriate steps to remediate it.

Azure Storage Analytics performs logging and provides metrics data for an Azure storage account. We recommend that you use this data to trace requests, analyze usage trends, and diagnose issues with your storage account.
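As an illustration of watching for slower response times, the sketch below compares recent latency samples against a baseline. It assumes the latency values have already been extracted from your metrics data; the function name, the sample values, and the 1.5x tolerance are hypothetical choices, not Azure defaults.

```python
from statistics import mean
from typing import Sequence


def is_slower_than_baseline(
    baseline_ms: Sequence[float],
    recent_ms: Sequence[float],
    tolerance: float = 1.5,
) -> bool:
    """Return True when the recent average latency exceeds the baseline average
    by more than the tolerance factor."""
    return mean(recent_ms) > tolerance * mean(baseline_ms)


# Hypothetical latency samples in milliseconds.
baseline = [12.0, 15.0, 11.0, 14.0, 13.0]  # average 13 ms
recent = [25.0, 31.0, 28.0]                # average 28 ms

if is_slower_than_baseline(baseline, recent):
    print("Storage latency is above baseline; review the logs for this period.")
```

A simple average is easy to reason about but can hide short spikes; a percentile-based comparison may be a better fit for some workloads.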

Prevent, detect, and respond to threats

Azure Security Center helps you prevent, detect, and respond to threats by providing increased visibility into (and control over) the security of your Azure resources. It provides integrated security monitoring and policy management across your Azure subscriptions, helps detect threats that might otherwise go unnoticed, and works with various security solutions.

The Free tier of Security Center offers limited security for only your Azure resources. The Standard tier extends these capabilities to on-premises and other clouds. Security Center Standard helps you find and fix security vulnerabilities, apply access and application controls to block malicious activity, detect threats by using analytics and intelligence, and respond quickly when under attack. You can try Security Center Standard at no cost for the first 60 days. We recommend that you upgrade your Azure subscription to Security Center Standard.

Use Security Center to get a central view of the security state of all your Azure resources. At a glance, verify that the appropriate security controls are in place and configured correctly, and quickly identify any resources that need attention.

Security Center also integrates with Windows Defender Advanced Threat Protection (ATP), which provides comprehensive Endpoint Detection and Response (EDR) capabilities. With Windows Defender ATP integration, you can spot abnormalities. You can also detect and respond to advanced attacks on server endpoints monitored by Security Center.

Almost all enterprise organizations have a security information and event management (SIEM) system to help identify emerging threats by consolidating log information from diverse signal-gathering devices. The logs are then analyzed by a data analytics system to help separate what’s “interesting” from the noise that is inevitable in all log gathering and analytics solutions.

Azure Sentinel is a scalable, cloud-native, security information and event management (SIEM) and security orchestration automated response (SOAR) solution. Azure Sentinel provides intelligent security analytics and threat intelligence via alert detection, threat visibility, proactive hunting, and automated threat response.

Here are some best practices for preventing, detecting, and responding to threats:

Best practice: Increase the speed and scalability of your SIEM solution by using a cloud-based SIEM.
Detail: Investigate the features and capabilities of Azure Sentinel and compare them with the capabilities of what you’re currently using on-premises. Consider adopting Azure Sentinel if it meets your organization’s SIEM requirements.

Best practice: Find the most serious security vulnerabilities so you can prioritize investigation.
Detail: Review your Azure secure score to see the recommendations resulting from the Azure policies and initiatives built into Azure Security Center. These recommendations help address top risks like security updates, endpoint protection, encryption, security configurations, missing WAF, internet-connected VMs, and many more.

The secure score, which is based on Center for Internet Security (CIS) controls, lets you benchmark your organization’s Azure security against external sources. External validation helps validate and enrich your team’s security strategy.

Best practice: Monitor the security posture of machines, networks, storage and data services, and applications to discover and prioritize potential security issues.
Detail: Follow the security recommendations in Security Center, starting with the highest-priority items.

Best practice: Integrate Security Center alerts into your security information and event management (SIEM) solution.
Detail: Most organizations with a SIEM use it as a central clearinghouse for security alerts that require an analyst response. Processed events produced by Security Center are published to the Azure Activity Log, one of the logs available through Azure Monitor. Azure Monitor offers a consolidated pipeline for routing any of your monitoring data into a SIEM tool. See Integrate security solutions in Security Center for instructions. If you’re using Azure Sentinel, see Connect Azure Security Center.

Best practice: Integrate Azure logs with your SIEM.
Detail: Use Azure Monitor to gather and export data. This practice is critical for enabling security incident investigation, and online log retention is limited. If you’re using Azure Sentinel, see Connect data sources.

Best practice: Speed up your investigation and hunting processes and reduce false positives by integrating Endpoint Detection and Response (EDR) capabilities into your attack investigation.
Detail: Enable Windows Defender ATP integration via your Security Center security policy. Consider using Azure Sentinel for threat hunting and incident response.

Use end-to-end scenario-based network monitoring

Customers build an end-to-end network in Azure by combining network resources like a virtual network, ExpressRoute, Application Gateway, and load balancers. Monitoring is available on each of the network resources.

Azure Network Watcher is a regional service. Use its diagnostic and visualization tools to monitor and diagnose conditions at a network scenario level in, to, and from Azure.

The following are best practices for network monitoring and available tools.

Best practice: Automate remote network monitoring with packet capture.
Detail: Monitor and diagnose networking issues without logging in to your VMs by using Network Watcher. Trigger packet capture by setting alerts and gain access to real-time performance information at the packet level. When you see an issue, you can investigate in detail for better diagnoses.

Best practice: Gain insight into your network traffic by using flow logs.
Detail: Build a deeper understanding of your network traffic patterns by using network security group flow logs. Information in flow logs helps you gather data for compliance, auditing, and monitoring your network security profile.
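To show the kind of analysis flow logs make possible, the sketch below parses individual flow tuples and counts denied inbound flows per source address. It assumes the comma-separated tuple layout of timestamp, source IP, destination IP, source port, destination port, protocol, direction, and decision; check the flow log format reference for the version you have enabled. The sample tuples and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class FlowTuple(NamedTuple):
    timestamp: int
    source_ip: str
    dest_ip: str
    source_port: int
    dest_port: int
    protocol: str   # "T" (TCP) or "U" (UDP)
    direction: str  # "I" (inbound) or "O" (outbound)
    decision: str   # "A" (allowed) or "D" (denied)


def parse_flow_tuple(raw: str) -> FlowTuple:
    """Parse the first eight comma-separated fields; any extra fields are ignored."""
    ts, src, dst, sport, dport, proto, direction, decision = raw.split(",")[:8]
    return FlowTuple(int(ts), src, dst, int(sport), int(dport), proto, direction, decision)


def denied_inbound_by_source(raw_tuples: Iterable[str]) -> Dict[str, int]:
    """Count denied inbound flows for each source IP address."""
    flows = (parse_flow_tuple(raw) for raw in raw_tuples)
    return dict(Counter(f.source_ip for f in flows if f.direction == "I" and f.decision == "D"))


# Hypothetical sample tuples using documentation-reserved public IP ranges.
samples = [
    "1542110377,203.0.113.7,10.0.0.4,44931,3389,T,I,D",
    "1542110402,203.0.113.7,10.0.0.4,44932,3389,T,I,D",
    "1542110424,10.0.0.4,198.51.100.20,50122,443,T,O,A",
    "1542110437,198.51.100.9,10.0.0.4,51000,22,T,I,D",
]

print(denied_inbound_by_source(samples))
# {'203.0.113.7': 2, '198.51.100.9': 1}
```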

Best practice: Diagnose VPN connectivity issues.
Detail: Use Network Watcher to diagnose your most common VPN Gateway and connection issues. You can not only identify the issue but also use detailed logs to further investigate.

Secure deployment by using proven DevOps tools

Use the following DevOps best practices to ensure that your enterprise and teams are productive and efficient.

Best practice: Automate the build and deployment of services.
Detail: Infrastructure as code is a set of techniques and practices that help IT pros remove the burden of day-to-day build and management of modular infrastructure. It enables IT pros to build and maintain their modern server environment in much the same way that software developers build and maintain application code.

You can use Azure Resource Manager to provision your applications by using a declarative template. In a single template, you can deploy multiple services along with their dependencies. You use the same template to repeatedly deploy your application in every stage of the application lifecycle.
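As a rough illustration of a declarative template that deploys a service together with its dependency, the sketch below builds a minimal Resource Manager template as a Python dictionary and serializes it to JSON. The parameter name, the SKU, and the apiVersion values are assumptions chosen for the example; check the template reference for the values that apply to your environment.

```python
import json

# Minimal template sketch: an App Service plan plus a web app that depends on it.
template = {
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "appName": {"type": "string"},  # hypothetical parameter name
    },
    "variables": {
        "planName": "[concat(parameters('appName'), '-plan')]",
    },
    "resources": [
        {
            "type": "Microsoft.Web/serverfarms",
            "apiVersion": "2022-03-01",  # assumed; use a version valid for you
            "name": "[variables('planName')]",
            "location": "[resourceGroup().location]",
            "sku": {"name": "S1"},  # assumed SKU
        },
        {
            "type": "Microsoft.Web/sites",
            "apiVersion": "2022-03-01",  # assumed; use a version valid for you
            "name": "[parameters('appName')]",
            "location": "[resourceGroup().location]",
            # The web app is deployed only after the plan exists.
            "dependsOn": [
                "[resourceId('Microsoft.Web/serverfarms', variables('planName'))]"
            ],
            "properties": {
                "serverFarmId": "[resourceId('Microsoft.Web/serverfarms', variables('planName'))]"
            },
        },
    ],
}

template_json = json.dumps(template, indent=2)
print(template_json)
```

Because the template takes the application name as a parameter, the same file can be reused for each stage of the lifecycle by supplying a different parameter value per environment.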

Best practice: Automatically build and deploy to Azure web apps or cloud services.
Detail: You can configure your Azure DevOps Projects to automatically build and deploy to Azure web apps or cloud services. After every code check-in, Azure DevOps builds the code and automatically deploys the resulting binaries to Azure. The package build process is equivalent to the Package command in Visual Studio, and the publishing steps are equivalent to the Publish command in Visual Studio.

Best practice: Automate release management.
Detail: Azure Pipelines is a solution for automating multiple-stage deployment and managing the release process. Create managed continuous deployment pipelines to release quickly, easily, and often. With Azure Pipelines, you can automate your release process, and you can have predefined approval workflows. Deploy on-premises and to the cloud, extend, and customize as required.

Best practice: Check your app's performance before you launch it or deploy updates to production.
Detail: Run cloud-based load tests to:

  • Find performance problems in your app.
  • Improve deployment quality.
  • Make sure that your app is always available.
  • Make sure that your app can handle traffic for your next launch or marketing campaign.

One option for running load tests is Apache JMeter, a free, popular open-source tool with strong community backing.

Best practice: Monitor application performance.
Detail: Azure Application Insights is an extensible application performance management (APM) service for web developers on multiple platforms. Use Application Insights to monitor your live web application. It automatically detects performance anomalies. It includes analytics tools to help you diagnose issues and to understand what users actually do with your app. It's designed to help you continuously improve performance and usability.

Mitigate and protect against DDoS

Distributed denial of service (DDoS) is a type of attack that tries to exhaust application resources. The goal is to affect the application’s availability and its ability to handle legitimate requests. These attacks are becoming more sophisticated and larger in size and impact. They can be targeted at any endpoint that is publicly reachable through the internet.

Designing and building for DDoS resiliency requires planning and designing for a variety of failure modes. Following are best practices for building DDoS-resilient services on Azure.

Best practice: Ensure that security is a priority throughout the entire lifecycle of an application, from design and implementation to deployment and operations. Applications can have bugs that allow a relatively low volume of requests to use a lot of resources, resulting in a service outage.
Detail: To help protect a service running on Microsoft Azure, you should have a good understanding of your application architecture and focus on the five pillars of software quality. You should know typical traffic volumes, the connectivity model between the application and other applications, and the service endpoints that are exposed to the public internet.

Ensuring that an application is resilient enough to handle a denial of service that's targeted at the application itself is most important. Security and privacy are built into the Azure platform, beginning with the Security Development Lifecycle (SDL). The SDL addresses security at every development phase and ensures that Azure is continually updated to make it even more secure.

Best practice: Design your applications to scale horizontally to meet the demand of an amplified load, specifically in the event of a DDoS attack. If your application depends on a single instance of a service, it creates a single point of failure. Provisioning multiple instances makes your system more resilient and more scalable.
Detail: For Azure App Service, select an App Service plan that offers multiple instances.

For Azure Cloud Services, configure each of your roles to use multiple instances.

For Azure Virtual Machines, ensure that your VM architecture includes more than one VM and that each VM is included in an availability set. We recommend using virtual machine scale sets for autoscaling capabilities.

Best practice: Layering security defenses in an application reduces the chance of a successful attack. Implement secure designs for your applications by using the built-in capabilities of the Azure platform.
Detail: The risk of attack increases with the size (surface area) of the application. You can reduce the surface area by using whitelisting to close down the exposed IP address space and listening ports that are not needed on the load balancers (Azure Load Balancer and Azure Application Gateway).
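The matching logic behind an IP allowlist can be sketched with the Python standard library. In Azure, you would express such a restriction in the configuration of the relevant service rather than in application code; the function name and the address ranges below are hypothetical and use documentation-reserved IP blocks.

```python
import ipaddress
from typing import Iterable


def is_allowed(source_ip: str, allowed_ranges: Iterable[str]) -> bool:
    """Return True when the source address falls inside any allowed CIDR range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


# Hypothetical allowlist: only these ranges may reach the endpoint.
ALLOWED_RANGES = ["203.0.113.0/24", "198.51.100.16/28"]

print(is_allowed("203.0.113.45", ALLOWED_RANGES))  # True
print(is_allowed("192.0.2.10", ALLOWED_RANGES))    # False
```

Everything not explicitly listed is rejected, which is what shrinks the exposed address space.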

Network security groups are another way to reduce the attack surface. You can use service tags and application security groups to minimize complexity for creating security rules and configuring network security, as a natural extension of an application’s structure.

You should deploy Azure services in a virtual network whenever possible. This practice allows service resources to communicate through private IP addresses. Azure service traffic from a virtual network uses public IP addresses as source IP addresses by default.

Using service endpoints switches service traffic to use virtual network private addresses as the source IP addresses when they're accessing the Azure service from a virtual network.

We often see customers' on-premises resources getting attacked along with their resources in Azure. If you're connecting an on-premises environment to Azure, minimize exposure of on-premises resources to the public internet.

Azure has two DDoS service offerings that provide protection from network attacks:

  • Basic protection is integrated into Azure by default at no additional cost. The scale and capacity of the globally deployed Azure network provides defense against common network-layer attacks through always-on traffic monitoring and real-time mitigation. Basic requires no user configuration or application changes and helps protect all Azure services, including PaaS services like Azure DNS.
  • Standard protection provides advanced DDoS mitigation capabilities against network attacks. It's automatically tuned to protect your specific Azure resources. Protection is simple to enable when you create a virtual network. You can also enable it after creation, and it requires no application or resource changes.

Enable Azure Policy

Azure Policy is a service in Azure that you use to create, assign, and manage policies. These policies enforce rules and effects over your resources, so those resources stay compliant with your corporate standards and service-level agreements. Azure Policy meets this need by evaluating your resources for non-compliance with assigned policies.

Enable Azure Policy to monitor and enforce your organization’s written policy. This will ensure compliance with your company or regulatory security requirements by centrally managing security policies across your hybrid cloud workloads. Learn how to create and manage policies to enforce compliance. See Azure Policy definition structure for an overview of the elements of a policy.

Here are some security best practices to follow after you adopt Azure Policy:

Best practice: Policy supports several types of effects. You can read about them in Azure Policy definition structure. Business operations can be negatively affected by the deny effect and the remediate effect, so start with the audit effect to limit the risk of negative impact from policy.
Detail: Start policy deployments in audit mode and then later progress to deny or remediate. Test and review the results of the audit effect before you move to deny or remediate.

For more information, see Create and manage policies to enforce compliance.
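One way to stage an audit-first rollout is to make the effect a parameter of the policy definition, so the same definition can be assigned with an audit effect first and a deny effect later. The sketch below builds such a definition as a Python dictionary. The display name, the description, the reference "SEC-001", and the region values are hypothetical; see Azure Policy definition structure for the authoritative schema.

```python
import json

policy_definition = {
    "properties": {
        "displayName": "Allowed locations (example)",
        # Hypothetical reference back to the organization's written policy.
        "description": "Implements written policy SEC-001: deploy only to approved regions.",
        "mode": "Indexed",
        "parameters": {
            "effect": {
                "type": "String",
                "allowedValues": ["Audit", "Deny", "Disabled"],
                "defaultValue": "Audit",  # start in audit mode
            },
            "allowedLocations": {"type": "Array"},
        },
        "policyRule": {
            "if": {
                "not": {"field": "location", "in": "[parameters('allowedLocations')]"}
            },
            "then": {"effect": "[parameters('effect')]"},
        },
    }
}

# Parameter values for two successive assignments of the same definition.
audit_stage = {
    "effect": {"value": "Audit"},
    "allowedLocations": {"value": ["westeurope", "northeurope"]},
}
deny_stage = {**audit_stage, "effect": {"value": "Deny"}}  # only the effect changes

print(json.dumps(policy_definition, indent=2))
```

Keeping the rule identical between stages means the audit results you review describe exactly what the deny assignment will later block.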

Best practice: Identify the roles responsible for monitoring for policy violations and ensuring the right remediation action is taken quickly.
Detail: Have the assigned role monitor compliance through the Azure portal or via the command line.

Best practice: Azure Policy is a technical representation of an organization’s written policies. Map all Azure policies to organizational policies to reduce confusion and increase consistency.
Detail: Document mapping in your organization’s documentation or in the Azure policy itself by adding a reference to the organizational policy in the Azure policy description or the Azure policy initiative description.

Monitor Azure AD risk reports

The vast majority of security breaches take place when attackers gain access to an environment by stealing a user’s identity. Discovering compromised identities is no easy task. Azure AD uses adaptive machine learning algorithms and heuristics to detect suspicious actions that are related to your user accounts. Each detected suspicious action is stored in a record called a risk event. Risk events are recorded in Azure AD security reports. For more information, read about the users at risk security report and the risky sign-ins security report.

Next steps

See Azure security best practices and patterns for more security best practices to use when you’re designing, deploying, and managing your cloud solutions by using Azure.

The following resources are available to provide more general information about Azure security and related Microsoft services: