Isolation in the Azure Public Cloud

Azure allows you to run applications and virtual machines (VMs) on shared physical infrastructure. One of the prime economic motivations to running applications in a cloud environment is the ability to distribute the cost of shared resources among multiple customers. This practice of multi-tenancy improves efficiency by multiplexing resources among disparate customers at low costs. Unfortunately, it also introduces the risk of sharing physical servers and other infrastructure resources to run your sensitive applications and VMs that may belong to an arbitrary and potentially malicious user.

This article outlines how Azure provides isolation against both malicious and non-malicious users and serves as a guide for architecting cloud solutions by offering various isolation choices to architects.

Tenant Level Isolation

One of the primary benefits of cloud computing is concept of a shared, common infrastructure across numerous customers simultaneously, leading to economies of scale. This concept is called multi-tenancy. Microsoft works continuously to ensure that the multi-tenant architecture of Microsoft Cloud Azure supports security, confidentiality, privacy, integrity, and availability standards.

In the cloud-enabled workplace, a tenant can be defined as a client or organization that owns and manages a specific instance of that cloud service. With the identity platform provided by Microsoft Azure, a tenant is simply a dedicated instance of Microsoft Entra ID that your organization receives and owns when it signs up for a Microsoft cloud service.

Each Microsoft Entra directory is distinct and separate from other Microsoft Entra directories. Just like a corporate office building is a secure asset specific to only your organization, a Microsoft Entra directory was also designed to be a secure asset for use by only your organization. The Microsoft Entra architecture isolates customer data and identity information from co-mingling. This means that users and administrators of one Microsoft Entra directory can't accidentally or maliciously access data in another directory.

Azure Tenancy

Azure tenancy (Azure Subscription) refers to a “customer/billing” relationship and a unique tenant in Microsoft Entra ID. Tenant level isolation in Microsoft Azure is achieved using Microsoft Entra ID and Azure role-based access control offered by it. Each Azure subscription is associated with one Microsoft Entra directory.

Users, groups, and applications from that directory can manage resources in the Azure subscription. You can assign these access rights using the Azure portal, Azure command-line tools, and Azure Management APIs. A Microsoft Entra tenant is logically isolated using security boundaries so that no customer can access or compromise co-tenants, either maliciously or accidentally. Microsoft Entra ID runs on “bare metal” servers isolated on a segregated network segment, where host-level packet filtering and Windows Firewall block unwanted connections and traffic.

Diagram showing Azure tenancy.

  • Access to data in Microsoft Entra ID requires user authentication via a security token service (STS). Information on the user’s existence, enabled state, and role is used by the authorization system to determine whether the requested access to the target tenant is authorized for this user in this session.

  • Tenants are discrete containers and there's no relationship between these.

  • No access across tenants unless tenant admin grants it through federation or provisioning user accounts from other tenants.

  • Physical access to servers that comprise the Microsoft Entra service, and direct access to Microsoft Entra ID’s back-end systems, is restricted.

  • Microsoft Entra users have no access to physical assets or locations, and therefore it isn't possible for them to bypass the logical Azure RBAC policy checks stated following.

For diagnostics and maintenance needs, an operational model that employs a just-in-time privilege elevation system is required and used. Microsoft Entra Privileged Identity Management (PIM) introduces the concept of an eligible admin. Eligible admins should be users that need privileged access now and then, but not every day. The role is inactive until the user needs access, then they complete an activation process and become an active admin for a predetermined amount of time.

Microsoft Entra Privileged Identity Management

Microsoft Entra ID hosts each tenant in its own protected container, with policies and permissions to and within the container solely owned and managed by the tenant.

The concept of tenant containers is deeply ingrained in the directory service at all layers, from portals all the way to persistent storage.

Even when metadata from multiple Microsoft Entra tenants is stored on the same physical disk, there's no relationship between the containers other than what is defined by the directory service, which in turn is dictated by the tenant administrator.

Azure role-based access control (Azure RBAC)

Azure role-based access control (Azure RBAC) helps you to share various components available within an Azure subscription by providing fine-grained access management for Azure. Azure RBAC enables you to segregate duties within your organization and grant access based on what users need to perform their jobs. Instead of giving everybody unrestricted permissions in Azure subscription or resources, you can allow only certain actions.

Azure RBAC has three basic roles that apply to all resource types:

  • Owner has full access to all resources including the right to delegate access to others.

  • Contributor can create and manage all types of Azure resources but can’t grant access to others.

  • Reader can view existing Azure resources.

Azure role-based access control (Azure RBAC)

The rest of the Azure roles in Azure allow management of specific Azure resources. For example, the Virtual Machine Contributor role allows the user to create and manage virtual machines. It doesn't give them access to the Azure Virtual Network or the subnet that the virtual machine connects to.

Azure built-in roles list the roles available in Azure. It specifies the operations and scope that each built-in role grants to users. If you're looking to define your own roles for even more control, see how to build Custom roles in Azure RBAC.

Some other capabilities for Microsoft Entra ID include:

  • Microsoft Entra ID enables SSO to SaaS applications, regardless of where they're hosted. Some applications are federated with Microsoft Entra ID, and others use password SSO. Federated applications can also support user provisioning and password vaulting.

  • Access to data in Azure Storage is controlled via authentication. Each storage account has a primary key (storage account key, or SAK) and a secondary secret key (the shared access signature, or SAS).

  • Microsoft Entra ID provides Identity as a Service through federation by using Active Directory Federation Services, synchronization, and replication with on-premises directories.

  • Microsoft Entra multifactor authentication requires users to verify sign-ins by using a mobile app, phone call, or text message. It can be used with Microsoft Entra ID to help secure on-premises resources with the Multi-Factor Authentication Server, and also with custom applications and directories using the SDK.

  • Microsoft Entra Domain Services lets you join Azure virtual machines to an Active Directory domain without deploying domain controllers. You can sign in to these virtual machines with your corporate Active Directory credentials and administer domain-joined virtual machines by using Group Policy to enforce security baselines on all your Azure virtual machines.

  • Azure Active Directory B2C provides a highly available global-identity management service for consumer-facing applications that scales to hundreds of millions of identities. It can be integrated across mobile and web platforms. Your consumers can sign in to all your applications through customizable experiences by using their existing social accounts or by creating credentials.

Isolation from Microsoft Administrators & Data Deletion

Microsoft takes strong measures to protect your data from inappropriate access or use by unauthorized persons. These operational processes and controls are backed by the Online Services Terms, which offer contractual commitments that govern access to your data.

  • Microsoft engineers don't have default access to your data in the cloud. Instead, they're granted access, under management oversight, only when necessary. That access is carefully controlled and logged, and revoked when it's no longer needed.
  • Microsoft may hire other companies to provide limited services on its behalf. Subcontractors may access customer data only to deliver the services for which, we have hired them to provide, and they're prohibited from using it for any other purpose. Further, they're contractually bound to maintain the confidentiality of our customers’ information.

Business services with audited certifications such as ISO/IEC 27001 are regularly verified by Microsoft and accredited audit firms, which perform sample audits to attest that access, only for legitimate business purposes. You can always access your own customer data at any time and for any reason.

If you delete any data, Microsoft Azure deletes the data, including any cached or backup copies. For in-scope services, that deletion will occur within 90 days after the end of the retention period. (In-scope services are defined in the Data Processing Terms section of our Online Services Terms.)

If a disk drive used for storage suffers a hardware failure, it's securely erased or destroyed before Microsoft returns it to the manufacturer for replacement or repair. The data on the drive is overwritten to ensure that the data can't be recovered by any means.

Compute Isolation

Microsoft Azure provides various cloud-based computing services that include a wide selection of compute instances & services that can scale up and down automatically to meet the needs of your application or enterprise. These compute instance and service offer isolation at multiple levels to secure data without sacrificing the flexibility in configuration that customers demand.

Isolated Virtual Machine Sizes

Azure Compute offers virtual machine sizes that are Isolated to a specific hardware type and dedicated to a single customer. The Isolated sizes live and operate on specific hardware generation and will be deprecated when the hardware generation is retired or new hardware generation is available.

Isolated virtual machine sizes are best suited for workloads that require a high degree of isolation from other customers’ workloads. This is sometimes required to meet compliance and regulatory requirements. Utilizing an isolated size guarantees that your virtual machine is the only one running on that specific server instance.

Additionally, as the Isolated size VMs are large, customers may choose to subdivide the resources of these VMs by using Azure support for nested virtual machines.

The current Isolated virtual machine offerings include:

  • Standard_E80ids_v4
  • Standard_E80is_v4
  • Standard_E104i_v5
  • Standard_E104is_v5
  • Standard_E104id_v5
  • Standard_E104ids_v5
  • Standard_M192is_v2
  • Standard_M192ims_v2
  • Standard_M192ids_v2
  • Standard_M192idms_v2
  • Standard_F72s_v2
  • Standard_M128ms

Note

Isolated VM Sizes have a limited lifespan due to hardware deprecation.

Deprecation of Isolated VM Sizes

Isolated VM sizes have a hardware limited lifespan. Azure issues reminders 12 months in advance of the official deprecation date of the sizes and provides an updated isolated offering for your consideration. The following sizes have retirement announced.

Size Isolation Retirement Date
Standard_DS15_v2 May 15, 2021
Standard_D15_v2 May 15, 2021
Standard_G5 February 15, 2022
Standard_GS5 February 15, 2022
Standard_E64i_v3 February 15, 2022
Standard_E64is_v3 February 15, 2022
Standard_M192is_v2 March 31, 2027
Standard_M192ims_v2 March 31, 2027
Standard_M192ids_v2 March 31, 2027
Standard_M192idms_v2 March 31, 2027

FAQ

Q: Is the size going to get retired or only its "isolation" feature?

A: Any size that is published as isolated but have no "i" in the name, the isolation feature of the VM sizes is being retired unless communicated differently. Sizes with "i" in the name will be deprecated.

Q: Is there a downtime when my vm lands on a nonisolated hardware?

A: For VM sizes, where only isolation is deprecating but not the size, no action is needed and there will be no downtime. On contrary if isolation is required, announcement includes the recommended replacement size. Selecting the replacement size requires customers to resize their VMs.

Q: Is there any cost delta for moving to a nonisolated virtual machine?

A: No

Q: When are the other isolated sizes going to retire?

A: We provide reminders 12 months in advance of the official deprecation of the isolated size. Our latest announcement includes isolation feature retirement of Standard_G5, Standard_GS5, Standard_E64i_v3 and Standard_E64i_v3.

Q: I'm an Azure Service Fabric Customer relying on the Silver or Gold Durability Tiers. Does this change impact me?

A: No. The guarantees provided by Service Fabric's Durability Tiers will continue to function even after this change. If you require physical hardware isolation for other reasons, you may still need to take one of the actions described above.

Q: What are the milestones for D15_v2 or DS15_v2 isolation retirement?

A:

Date Action
May 15, 20201 D/DS15_v2 isolation retirement announcement
May 15, 2021 D/DS15_v2 isolation guarantee removed

1 Existing customer using these sizes will receive an announcement email with detailed instructions on the next steps.

Q: What are the milestones for G5, Gs5, E64i_v3 and E64is_v3 isolation retirement?

A:

Date Action
Feb 15, 20211 G5/GS5/E64i_v3/E64is_v3 isolation retirement announcement
Feb 28, 2022 G5/GS5/E64i_v3/E64is_v3 isolation guarantee removed

1 Existing customer using these sizes will receive an announcement email with detailed instructions on the next steps.

Next steps

Customers can also choose to further subdivide the resources of these Isolated virtual machines by using Azure support for nested virtual machines.

Dedicated hosts

In addition to the isolated hosts described in the preceding section, Azure also offers dedicated hosts. Dedicated hosts in Azure is a service that provides physical servers that can host one or more virtual machines, and which are dedicated to a single Azure subscription. Dedicated hosts provide hardware isolation at the physical server level. No other VMs will be placed on your hosts. Dedicated hosts are deployed in the same datacenters and share the same network and underlying storage infrastructure as other, non-isolated hosts. For more information, see the detailed overview of Azure dedicated hosts.

Hyper-V & Root OS Isolation Between Root VM & Guest VMs

Azure’s compute platform is based on machine virtualization—meaning that all customer code executes in a Hyper-V virtual machine. On each Azure node (or network endpoint), there's a Hypervisor that runs directly over the hardware and divides a node into a variable number of Guest Virtual Machines (VMs).

Hyper-V & Root OS Isolation Between Root VM & Guest VMs

Each node also has one special Root VM, which runs the Host OS. A critical boundary is the isolation of the root VM from the guest VMs and the guest VMs from one another, managed by the hypervisor and the root OS. The hypervisor/root OS pairing leverages Microsoft's decades of operating system security experience, and more recent learning from Microsoft's Hyper-V, to provide strong isolation of guest VMs.

The Azure platform uses a virtualized environment. User instances operate as standalone virtual machines that don't have access to a physical host server.

The Azure hypervisor acts like a micro-kernel and passes all hardware access requests from guest virtual machines to the host for processing by using a shared-memory interface called VM Bus. This prevents users from obtaining raw read/write/execute access to the system and mitigates the risk of sharing system resources.

Advanced VM placement algorithm & protection from side channel attacks

Any cross-VM attack involves two steps: placing an adversary-controlled VM on the same host as one of the victim VMs, and then breaching the isolation boundary to either steal sensitive victim information or affect its performance for greed or vandalism. Microsoft Azure provides protection at both steps by using an advanced VM placement algorithm and protection from all known side channel attacks including noisy neighbor VMs.

The Azure Fabric Controller

The Azure Fabric Controller is responsible for allocating infrastructure resources to tenant workloads, and it manages unidirectional communications from the host to virtual machines. The VM placing algorithm of the Azure fabric controller is highly sophisticated and nearly impossible to predict as physical host level.

The Azure Fabric Controller

The Azure hypervisor enforces memory and process separation between virtual machines, and it securely routes network traffic to guest OS tenants. This eliminates possibility of and side channel attack at VM level.

In Azure, the root VM is special: it runs a hardened operating system called the root OS that hosts a fabric agent (FA). FAs are used in turn to manage guest agents (GA) within guest operating systems on customer VMs. FAs also manage storage nodes.

The collection of Azure hypervisor, root OS/FA, and customer VMs/GAs comprises a compute node. FAs are managed by a fabric controller (FC), which exists outside of compute and storage nodes (compute and storage clusters are managed by separate FCs). If a customer updates their application’s configuration file while it’s running, the FC communicates with the FA, which then contacts GAs, which notify the application of the configuration change. In the event of a hardware failure, the FC will automatically find available hardware and restart the VM there.

Azure Fabric Controller

Communication from a Fabric Controller to an agent is unidirectional. The agent implements an SSL-protected service that only responds to requests from the controller. It cannot initiate connections to the controller or other privileged internal nodes. The FC treats all responses as if they were untrusted.

Fabric Controller

Isolation extends from the Root VM from Guest VMs, and the Guest VMs from one another. Compute nodes are also isolated from storage nodes for increased protection.

The hypervisor and the host OS provide network packet - filters to help assure that untrusted virtual machines cannot generate spoofed traffic or receive traffic not addressed to them, direct traffic to protected infrastructure endpoints, or send/receive inappropriate broadcast traffic.

Additional Rules Configured by Fabric Controller Agent to Isolate VM

By default, all traffic is blocked when a virtual machine is created, and then the fabric controller agent configures the packet filter to add rules and exceptions to allow authorized traffic.

There are two categories of rules that are programmed:

  • Machine configuration or infrastructure rules: By default, all communication is blocked. There are exceptions to allow a virtual machine to send and receive DHCP and DNS traffic. Virtual machines can also send traffic to the “public” internet and send traffic to other virtual machines within the same Azure Virtual Network and the OS activation server. The virtual machines’ list of allowed outgoing destinations doesn't include Azure router subnets, Azure management, and other Microsoft properties.
  • Role configuration file: This defines the inbound Access Control Lists (ACLs) based on the tenant's service model.

VLAN Isolation

There are three VLANs in each cluster:

VLAN Isolation

  • The main VLAN – interconnects untrusted customer nodes
  • The FC VLAN – contains trusted FCs and supporting systems
  • The device VLAN – contains trusted network and other infrastructure devices

Communication is permitted from the FC VLAN to the main VLAN, but cannot be initiated from the main VLAN to the FC VLAN. Communication is also blocked from the main VLAN to the device VLAN. This assures that even if a node running customer code is compromised, it cannot attack nodes on either the FC or device VLANs.

Storage Isolation

Logical Isolation Between Compute and Storage

As part of its fundamental design, Microsoft Azure separates VM-based computation from storage. This separation enables computation and storage to scale independently, making it easier to provide multi-tenancy and isolation.

Therefore, Azure Storage runs on separate hardware with no network connectivity to Azure Compute except logically. This means that when a virtual disk is created, disk space isn't allocated for its entire capacity. Instead, a table is created that maps addresses on the virtual disk to areas on the physical disk and that table is initially empty. The first time a customer writes data on the virtual disk, space on the physical disk is allocated, and a pointer to it's placed in the table.

Isolation Using Storage Access control

Access Control in Azure Storage has a simple access control model. Each Azure subscription can create one or more Storage Accounts. Each Storage Account has a single secret key that's used to control access to all data in that Storage Account.

Isolation Using Storage Access control

Access to Azure Storage data (including Tables) can be controlled through a SAS (Shared Access Signature) token, which grants scoped access. The SAS is created through a query template (URL), signed with the SAK (Storage Account Key). That signed URL can be given to another process (that is, delegated), which can then fill in the details of the query and make the request of the storage service. A SAS enables you to grant time-based access to clients without revealing the storage account’s secret key.

The SAS means that we can grant a client limited permissions, to objects in our storage account for a specified period of time and with a specified set of permissions. We can grant these limited permissions without having to share your account access keys.

IP Level Storage Isolation

You can establish firewalls and define an IP address range for your trusted clients. With an IP address range, only clients that have an IP address within the defined range can connect to Azure Storage.

IP storage data can be protected from unauthorized users via a networking mechanism that's used to allocate a dedicated or dedicated tunnel of traffic to IP storage.

Encryption

Azure offers the following types of Encryption to protect data:

  • Encryption in transit
  • Encryption at rest

Encryption in Transit

Encryption in transit is a mechanism of protecting data when it's transmitted across networks. With Azure Storage, you can secure data using:

Encryption at Rest

For many organizations, data encryption at rest is a mandatory step towards data privacy, compliance, and data sovereignty. There are three Azure features that provide encryption of data that's "at rest":

For more information, see Overview of managed disk encryption options.

Azure Disk Encryption

Azure Disk Encryption for Linux VMs and Azure Disk Encryption for Windows VMs help you address organizational security and compliance requirements by encrypting your VM disks (including boot and data disks) with keys and policies you control in Azure Key Vault.

The Disk Encryption solution for Windows is based on Microsoft BitLocker Drive Encryption, and the Linux solution is based on dm-crypt.

The solution supports the following scenarios for IaaS VMs when they're enabled in Microsoft Azure:

  • Integration with Azure Key Vault
  • Standard tier VMs: A, D, DS, G, GS, and so forth, series IaaS VMs
  • Enabling encryption on Windows and Linux IaaS VMs
  • Disabling encryption on OS and data drives for Windows IaaS VMs
  • Disabling encryption on data drives for Linux IaaS VMs
  • Enabling encryption on IaaS VMs that are running Windows client OS
  • Enabling encryption on volumes with mount paths
  • Enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm
  • Enabling encryption on Linux VMs by using LVM(Logical Volume Manager) for data disks
  • Enabling encryption on Windows VMs that are configured by using storage spaces
  • All Azure public regions are supported

The solution doesn't support the following scenarios, features, and technology in the release:

  • Basic tier IaaS VMs
  • Disabling encryption on an OS drive for Linux IaaS VMs
  • IaaS VMs that are created by using the classic VM creation method
  • Integration with your on-premises Key Management Service
  • Azure Files (shared file system), Network File System (NFS), dynamic volumes, and Windows VMs that are configured with software-based RAID systems

SQL Database Isolation

SQL Database is a relational database service in the Microsoft cloud based on the market-leading Microsoft SQL Server engine and capable of handling mission-critical workloads. SQL Database offers predictable data isolation at account level, geography / region based and based on networking— all with near-zero administration.

SQL Database Application Model

Microsoft SQL Database is a cloud-based relational database service built on SQL Server technologies. It provides a highly available, scalable, multi-tenant database service hosted by Microsoft in cloud.

From an application perspective, SQL Database provides the following hierarchy: Each level has one-to-many containment of levels below.

SQL Database Application Model

The account and subscription are Microsoft Azure platform concepts to associate billing and management.

Logical SQL servers and databases are SQL Database-specific concepts and are managed by using SQL Database, provided OData and TSQL interfaces or via the Azure portal.

Servers in SQL Database aren't physical or VM instances, instead they 're collections of databases, sharing management and security policies, which are stored in so called “logical master” database.

SQL Database

Logical master databases include:

  • SQL logins used to connect to the server
  • Firewall rules

Billing and usage-related information for databases from the same server aren't guaranteed to be on the same physical instance in the cluster, instead applications must provide the target database name when connecting.

From a customer perspective, a server is created in a geo-graphical region while the actual creation of the server happens in one of the clusters in the region.

Isolation through Network Topology

When a server is created and its DNS name is registered, the DNS name points to the so called “Gateway VIP” address in the specific data center where the server was placed.

Behind the VIP (virtual IP address), we have a collection of stateless gateway services. In general, gateways get involved when there's coordination needed between multiple data sources (master database, user database, etc.). Gateway services implement the following:

  • TDS connection proxying. This includes locating user database in the backend cluster, implementing the login sequence and then forwarding the TDS packets to the backend and back.
  • Database management. This includes implementing a collection of workflows to do CREATE/ALTER/DROP database operations. The database operations can be invoked by either sniffing TDS packets or explicit OData APIs.
  • CREATE/ALTER/DROP login/user operations
  • Server management operations via OData API

Isolation through Network Topology

The tier behind the gateways is called “back-end”. This is where all the data is stored in a highly available fashion. Each piece of data is said to belong to a “partition” or “failover unit”, each of them having at least three replicas. Replicas are stored and replicated by SQL Server engine and managed by a failover system often referred to as “fabric”.

Generally, the back-end system doesn't communicate outbound to other systems as a security precaution. This is reserved to the systems in the front-end (gateway) tier. The gateway tier machines have limited privileges on the back-end machines to minimize the attack surface as a defense-in-depth mechanism.

Isolation by Machine Function and Access

SQL Database is composed of services running on different machine functions. SQL Database is divided into “backend” Cloud Database and “front-end” (Gateway/Management) environments, with the general principle of traffic only going into back-end and not out. The front-end environment can communicate to the outside world of other services and in general, has only limited permissions in the back-end (enough to call the entry points it needs to invoke).

Networking Isolation

Azure deployment has multiple layers of network isolation. The following diagram shows various layers of network isolation Azure provides to customers. These layers are both native in the Azure platform itself and customer-defined features. Inbound from the Internet, Azure DDoS provides isolation against large-scale attacks against Azure. The next layer of isolation is customer-defined public IP addresses (endpoints), which are used to determine which traffic can pass through the cloud service to the virtual network. Native Azure virtual network isolation ensures complete isolation from all other networks, and that traffic only flows through user configured paths and methods. These paths and methods are the next layer, where NSGs, UDR, and network virtual appliances can be used to create isolation boundaries to protect the application deployments in the protected network.

Networking Isolation

Traffic isolation: A virtual network is the traffic isolation boundary on the Azure platform. Virtual machines (VMs) in one virtual network cannot communicate directly to VMs in a different virtual network, even if both virtual networks are created by the same customer. Isolation is a critical property that ensures customer VMs and communication remains private within a virtual network.

Subnet offers an additional layer of isolation with in virtual network based on IP range. IP addresses in the virtual network, you can divide a virtual network into multiple subnets for organization and security. VMs and PaaS role instances deployed to subnets (same or different) within a VNet can communicate with each other without any extra configuration. You can also configure network security group (NSGs) to allow or deny network traffic to a VM instance based on rules configured in access control list (ACL) of NSG. NSGs can be associated with either subnets or individual VM instances within that subnet. When an NSG is associated with a subnet, the ACL rules apply to all the VM instances in that subnet.

Next Steps