Isolation in the Azure Public Cloud
Azure allows you to run applications and virtual machines (VMs) on shared physical infrastructure. One of the prime economic motivations to running applications in a cloud environment is the ability to distribute the cost of shared resources among multiple customers. This practice of multi-tenancy improves efficiency by multiplexing resources among disparate customers at low costs. Unfortunately, it also introduces the risk of sharing physical servers and other infrastructure resources to run your sensitive applications and VMs that may belong to an arbitrary and potentially malicious user.
This article outlines how Azure provides isolation against both malicious and non-malicious users and serves as a guide for architecting cloud solutions by offering various isolation choices to architects.
Tenant Level Isolation
One of the primary benefits of cloud computing is concept of a shared, common infrastructure across numerous customers simultaneously, leading to economies of scale. This concept is called multi-tenancy. Microsoft works continuously to ensure that the multi-tenant architecture of Microsoft Cloud Azure supports security, confidentiality, privacy, integrity, and availability standards.
In the cloud-enabled workplace, a tenant can be defined as a client or organization that owns and manages a specific instance of that cloud service. With the identity platform provided by Microsoft Azure, a tenant is simply a dedicated instance of Azure Active Directory (Azure AD) that your organization receives and owns when it signs up for a Microsoft cloud service.
Each Azure AD directory is distinct and separate from other Azure AD directories. Just like a corporate office building is a secure asset specific to only your organization, an Azure AD directory was also designed to be a secure asset for use by only your organization. The Azure AD architecture isolates customer data and identity information from co-mingling. This means that users and administrators of one Azure AD directory cannot accidentally or maliciously access data in another directory.
Azure tenancy (Azure Subscription) refers to a “customer/billing” relationship and a unique tenant in Azure Active Directory. Tenant level isolation in Microsoft Azure is achieved using Azure Active Directory and role-based controls offered by it. Each Azure subscription is associated with one Azure Active Directory (AD) directory.
Users, groups, and applications from that directory can manage resources in the Azure subscription. You can assign these access rights using the Azure portal, Azure command-line tools, and Azure Management APIs. An Azure AD tenant is logically isolated using security boundaries so that no customer can access or compromise co-tenants, either maliciously or accidentally. Azure AD runs on “bare metal” servers isolated on a segregated network segment, where host-level packet filtering and Windows Firewall block unwanted connections and traffic.
- Access to data in Azure AD requires user authentication via a security token service (STS). Information on the user’s existence, enabled state, and role is used by the authorization system to determine whether the requested access to the target tenant is authorized for this user in this session.
Tenants are discrete containers and there is no relationship between these.
No access across tenants unless tenant admin grants it through federation or provisioning user accounts from other tenants.
Physical access to servers that comprise the Azure AD service, and direct access to Azure AD’s back-end systems, is restricted.
Azure AD users have no access to physical assets or locations, and therefore it is not possible for them to bypass the logical RBAC policy checks stated following.
For diagnostics and maintenance needs, an operational model that employs a just-in-time privilege elevation system is required and used. Azure AD Privileged Identity Management (PIM) introduces the concept of an eligible admin. Eligible admins should be users that need privileged access now and then, but not every day. The role is inactive until the user needs access, then they complete an activation process and become an active admin for a predetermined amount of time.
Azure Active Directory hosts each tenant in its own protected container, with policies and permissions to and within the container solely owned and managed by the tenant.
The concept of tenant containers is deeply ingrained in the directory service at all layers, from portals all the way to persistent storage.
Even when metadata from multiple Azure Active Directory tenants is stored on the same physical disk, there is no relationship between the containers other than what is defined by the directory service, which in turn is dictated by the tenant administrator.
Azure Role-Based Access Control (RBAC)
Azure Role-Based Access Control (RBAC) helps you to share various components available within an Azure subscription by providing fine-grained access management for Azure. Azure RBAC enables you to segregate duties within your organization and grant access based on what users need to perform their jobs. Instead of giving everybody unrestricted permissions in Azure subscription or resources, you can allow only certain actions.
Azure RBAC has three basic roles that apply to all resource types:
Owner has full access to all resources including the right to delegate access to others.
Contributor can create and manage all types of Azure resources but can’t grant access to others.
Reader can view existing Azure resources.
The rest of the RBAC roles in Azure allow management of specific Azure resources. For example, the Virtual Machine Contributor role allows the user to create and manage virtual machines. It does not give them access to the Azure Virtual Network or the subnet that the virtual machine connects to.
RBAC built-in roles list the roles available in Azure. It specifies the operations and scope that each built-in role grants to users. If you're looking to define your own roles for even more control, see how to build Custom roles in Azure RBAC.
Some other capabilities for Azure Active Directory include:
Azure AD enables SSO to SaaS applications, regardless of where they are hosted. Some applications are federated with Azure AD, and others use password SSO. Federated applications can also support user provisioning and password vaulting.
Access to data in Azure Storage is controlled via authentication. Each storage account has a primary key (storage account key, or SAK) and a secondary secret key (the shared access signature, or SAS).
Azure AD provides Identity as a Service through federation by using Active Directory Federation Services, synchronization, and replication with on-premises directories.
Azure Multi-Factor Authentication is the multi-factor authentication service that requires users to verify sign-ins by using a mobile app, phone call, or text message. It can be used with Azure AD to help secure on-premises resources with the Azure Multi-Factor Authentication server, and also with custom applications and directories using the SDK.
Azure AD Domain Services lets you join Azure virtual machines to an Active Directory domain without deploying domain controllers. You can sign in to these virtual machines with your corporate Active Directory credentials and administer domain-joined virtual machines by using Group Policy to enforce security baselines on all your Azure virtual machines.
Azure Active Directory B2C provides a highly available global-identity management service for consumer-facing applications that scales to hundreds of millions of identities. It can be integrated across mobile and web platforms. Your consumers can sign in to all your applications through customizable experiences by using their existing social accounts or by creating credentials.
Isolation from Microsoft Administrators & Data Deletion
Microsoft takes strong measures to protect your data from inappropriate access or use by unauthorized persons. These operational processes and controls are backed by the Online Services Terms, which offer contractual commitments that govern access to your data.
Microsoft engineers do not have default access to your data in the cloud. Instead, they are granted access, under management oversight, only when necessary. That access is carefully controlled and logged, and revoked when it is no longer needed.
Microsoft may hire other companies to provide limited services on its behalf. Subcontractors may access customer data only to deliver the services for which, we have hired them to provide, and they are prohibited from using it for any other purpose. Further, they are contractually bound to maintain the confidentiality of our customers’ information.
Business services with audited certifications such as ISO/IEC 27001 are regularly verified by Microsoft and accredited audit firms, which perform sample audits to attest that access, only for legitimate business purposes. You can always access your own customer data at any time and for any reason.
If you delete any data, Microsoft Azure deletes the data, including any cached or backup copies. For in-scope services, that deletion will occur within 90 days after the end of the retention period. (In-scope services are defined in the Data Processing Terms section of our Online Services Terms.)
If a disk drive used for storage suffers a hardware failure, it is securely erased or destroyed before Microsoft returns it to the manufacturer for replacement or repair. The data on the drive is overwritten to ensure that the data cannot be recovered by any means.
Microsoft Azure provides various cloud-based computing services that include a wide selection of compute instances & services that can scale up and down automatically to meet the needs of your application or enterprise. These compute instance and service offer isolation at multiple levels to secure data without sacrificing the flexibility in configuration that customers demand.
Isolated Virtual Machine Sizes
Azure Compute offers virtual machine sizes that are Isolated to a specific hardware type and dedicated to a single customer. These virtual machine sizes are best suited for workloads that require a high degree of isolation from other customers for workloads involving elements like compliance and regulatory requirements. Customers can also choose to further subdivide the resources of these Isolated virtual machines by using Azure support for nested virtual machines.
Utilizing an isolated size guarantees that your virtual machine will be the only one running on that specific server instance. The current Isolated virtual machine offerings include:
You can learn more about each available isolated size here.
Retiring D15_v2/DS15_v2 isolation on February 15, 2020
We recently announced the Preview of Azure Dedicated Host, which allows you to run your organization’s Linux and Windows virtual machines on single-tenant physical servers. We plan to fully replace isolated Azure VMs with Azure Dedicated Host. After February 15, 2020 the D15_v2/DS15_v2 Azure VMs may no longer be hardware isolated.
How does this affect me?
After February 15, 2020, we will no longer provide an isolation guarantee for your D15_v2/DS15_v2 Azure virtual machines.
What actions should I take?
If hardware isolation is not required for you, there is no action you need to take.
If isolation is required to you, before February 15, 2020, you would need to either:
• Migrate your workload to Azure Dedicated Host Preview
• Request access to a D15i_v2 and DS15i_v2 Azure VM, to get the same price performance. This option is only available for pay-as-you-go and one-year reserved instance scenarios.
• Migrate your workload to another Azure isolated virtual machine.
For details see below:
|Nov 18, 2019||Availability of D/DS15i_v2 (PAYG, 1-year RI)|
|Feb 14, 2020||Last day to buy D/DS15i_v2 1-year RI|
|Feb 15, 2020||D/DS15_v2 isolation guarantee removed|
|May 15, 2021||Retire D/DS15i_v2 (all customers except who bought 3-year RI of D/DS15_v2 before November 18, 2019)|
|Nov 17, 2022||Retire D/DS15i_v2 when 3-year RIs done (for customers who bought 3-year RI of D/DS15_v2 before November 18, 2019)|
Q: Why am I not seeing the new D/DS15i_v2 sizes in the portal?
A: If you are a current D/DS15_v2 customer and want to use the new D/DS15i_v2 sizes, please fill this form
Q: Why I am not seeing any quota for the new D/DS15i_v2 sizes?
A: If you are a current D/DS15_v2 customer and want to use the new D/DS15i_v2 sizes, please fill this form
Q: When are the other isolated sizes going to retire?
A: We will provide reminders 12 months in advance of the official decommissioning of the sizes.
Q: Is there a downtime when my vm lands on a non-isolated hardware?
A: If you do not need isolation you do not need to take any action and you would not see any downtime.
Q: Are there any cost changes for moving to a non-isolated virtual machine?
Q: I already purchased 1- or 3-year Reserved Instance for D15_v2 or Ds15_v2. How will the discount be applied to my VM usage?
A: RIs purchased before November 18, 2019 will automatically extend coverage to the new isolated VM series.
|RI||Instance Size Flexibility||Benefit eligibility|
|D15_v2||Off||D15_v2 and D15i_v2|
|D15_v2||On||D15_v2 series and D15i_v2 will all receive the RI benefit.|
|D14_v2||On||D15_v2 series and D15i_v2 will all receive the RI benefit.|
Likewise for Dsv2 series.
Q: I want to purchase additional Reserved Instances for Dv2. Which one should I choose?
A: All RIs purchased after Nov 18, 2019, have the following behavior.
|RI||Instance Size Flexibility||Benefit eligibility|
|D15_v2||On||D15_v2 series will receive the RI benefit. The new D15i_v2 will not be eligible for RI benefit from this RI type.|
Instance Size Flexibility cannot be used to apply to any other sizes such as D2_v2, D4_v2, or D15_v2. Likewise, for Dsv2 series.
Q: Can I buy a new 3-year RI for D15i_v2 and DS15i_v2?
A: Unfortunately no, only 1-year RI is available for new purchase.
Q: Can I move my existing D15_v2/DS15_v2 Reserve Instance to an isolated size Reserved Instance?
A: This is not necessary since the benefit will apply to both isolated and non-isolated sizes. But Azure will support changing existing D15_v2/DS15_v2 Reserved Instances to D15i_v2/DS15i_v2. For all other Dv2/Dsv2 Reserved Instances, use the existing Reserved Instance or buy new Reserved Instances for the isolated sizes.
Hyper-V & Root OS Isolation Between Root VM & Guest VMs
Azure’s compute platform is based on machine virtualization—meaning that all customer code executes in a Hyper-V virtual machine. On each Azure node (or network endpoint), there is a Hypervisor that runs directly over the hardware and divides a node into a variable number of Guest Virtual Machines (VMs).
Each node also has one special Root VM, which runs the Host OS. A critical boundary is the isolation of the root VM from the guest VMs and the guest VMs from one another, managed by the hypervisor and the root OS. The hypervisor/root OS pairing leverages Microsoft's decades of operating system security experience, and more recent learning from Microsoft's Hyper-V, to provide strong isolation of guest VMs.
The Azure platform uses a virtualized environment. User instances operate as standalone virtual machines that do not have access to a physical host server.
The Azure hypervisor acts like a micro-kernel and passes all hardware access requests from guest virtual machines to the host for processing by using a shared-memory interface called VMBus. This prevents users from obtaining raw read/write/execute access to the system and mitigates the risk of sharing system resources.
Advanced VM placement algorithm & protection from side channel attacks
Any cross-VM attack involves two steps: placing an adversary-controlled VM on the same host as one of the victim VMs, and then breaching the isolation boundary to either steal sensitive victim information or affect its performance for greed or vandalism. Microsoft Azure provides protection at both steps by using an advanced VM placement algorithm and protection from all known side channel attacks including noisy neighbor VMs.
The Azure Fabric Controller
The Azure Fabric Controller is responsible for allocating infrastructure resources to tenant workloads, and it manages unidirectional communications from the host to virtual machines. The VM placing algorithm of the Azure fabric controller is highly sophisticated and nearly impossible to predict as physical host level.
The Azure hypervisor enforces memory and process separation between virtual machines, and it securely routes network traffic to guest OS tenants. This eliminates possibility of and side channel attack at VM level.
In Azure, the root VM is special: it runs a hardened operating system called the root OS that hosts a fabric agent (FA). FAs are used in turn to manage guest agents (GA) within guest OSes on customer VMs. FAs also manage storage nodes.
The collection of Azure hypervisor, root OS/FA, and customer VMs/GAs comprises a compute node. FAs are managed by a fabric controller (FC), which exists outside of compute and storage nodes (compute and storage clusters are managed by separate FCs). If a customer updates their application’s configuration file while it’s running, the FC communicates with the FA, which then contacts GAs, which notify the application of the configuration change. In the event of a hardware failure, the FC will automatically find available hardware and restart the VM there.
Communication from a Fabric Controller to an agent is unidirectional. The agent implements an SSL-protected service that only responds to requests from the controller. It cannot initiate connections to the controller or other privileged internal nodes. The FC treats all responses as if they were untrusted.
Isolation extends from the Root VM from Guest VMs, and the Guest VMs from one another. Compute nodes are also isolated from storage nodes for increased protection.
The hypervisor and the host OS provide network packet - filters to help assure that untrusted virtual machines cannot generate spoofed traffic or receive traffic not addressed to them, direct traffic to protected infrastructure endpoints, or send/receive inappropriate broadcast traffic.
Additional Rules Configured by Fabric Controller Agent to Isolate VM
By default, all traffic is blocked when a virtual machine is created, and then the fabric controller agent configures the packet filter to add rules and exceptions to allow authorized traffic.
There are two categories of rules that are programmed:
Machine configuration or infrastructure rules: By default, all communication is blocked. There are exceptions to allow a virtual machine to send and receive DHCP and DNS traffic. Virtual machines can also send traffic to the “public” internet and send traffic to other virtual machines within the same Azure Virtual Network and the OS activation server. The virtual machines’ list of allowed outgoing destinations does not include Azure router subnets, Azure management, and other Microsoft properties.
Role configuration file: This defines the inbound Access Control Lists (ACLs) based on the tenant's service model.
There are three VLANs in each cluster:
The main VLAN – interconnects untrusted customer nodes
The FC VLAN – contains trusted FCs and supporting systems
The device VLAN – contains trusted network and other infrastructure devices
Communication is permitted from the FC VLAN to the main VLAN, but cannot be initiated from the main VLAN to the FC VLAN. Communication is also blocked from the main VLAN to the device VLAN. This assures that even if a node running customer code is compromised, it cannot attack nodes on either the FC or device VLANs.
Logical Isolation Between Compute and Storage
As part of its fundamental design, Microsoft Azure separates VM-based computation from storage. This separation enables computation and storage to scale independently, making it easier to provide multi-tenancy and isolation.
Therefore, Azure Storage runs on separate hardware with no network connectivity to Azure Compute except logically. This means that when a virtual disk is created, disk space is not allocated for its entire capacity. Instead, a table is created that maps addresses on the virtual disk to areas on the physical disk and that table is initially empty. The first time a customer writes data on the virtual disk, space on the physical disk is allocated, and a pointer to it is placed in the table.
Isolation Using Storage Access control
Access Control in Azure Storage has a simple access control model. Each Azure subscription can create one or more Storage Accounts. Each Storage Account has a single secret key that is used to control access to all data in that Storage Account.
Access to Azure Storage data (including Tables) can be controlled through a SAS (Shared Access Signature) token, which grants scoped access. The SAS is created through a query template (URL), signed with the SAK (Storage Account Key). That signed URL can be given to another process (that is, delegated), which can then fill in the details of the query and make the request of the storage service. A SAS enables you to grant time-based access to clients without revealing the storage account’s secret key.
The SAS means that we can grant a client limited permissions, to objects in our storage account for a specified period of time and with a specified set of permissions. We can grant these limited permissions without having to share your account access keys.
IP Level Storage Isolation
You can establish firewalls and define an IP address range for your trusted clients. With an IP address range, only clients that have an IP address within the defined range can connect to Azure Storage.
IP storage data can be protected from unauthorized users via a networking mechanism that is used to allocate a dedicated or dedicated tunnel of traffic to IP storage.
Azure offers the following types of Encryption to protect data:
Encryption in transit
Encryption at rest
Encryption in Transit
Encryption in transit is a mechanism of protecting data when it is transmitted across networks. With Azure Storage, you can secure data using:
Transport-level encryption, such as HTTPS when you transfer data into or out of Azure Storage.
Wire encryption, such as SMB 3.0 encryption for Azure File shares.
Client-side encryption, to encrypt the data before it is transferred into storage and to decrypt the data after it is transferred out of storage.
Encryption at Rest
For many organizations, data encryption at rest is a mandatory step towards data privacy, compliance, and data sovereignty. There are three Azure features that provide encryption of data that is “at rest”:
Storage Service Encryption allows you to request that the storage service automatically encrypt data when writing it to Azure Storage.
Client-side Encryption also provides the feature of encryption at rest.
Azure Disk Encryption allows you to encrypt the OS disks and data disks used by an IaaS virtual machine.
Azure Disk Encryption
Azure Disk Encryption for virtual machines (VMs) helps you address organizational security and compliance requirements by encrypting your VM disks (including boot and data disks) with keys and policies you control in Azure Key Vault.
The solution supports the following scenarios for IaaS VMs when they are enabled in Microsoft Azure:
Integration with Azure Key Vault
Standard tier VMs: A, D, DS, G, GS, and so forth, series IaaS VMs
Enabling encryption on Windows and Linux IaaS VMs
Disabling encryption on OS and data drives for Windows IaaS VMs
Disabling encryption on data drives for Linux IaaS VMs
Enabling encryption on IaaS VMs that are running Windows client OS
Enabling encryption on volumes with mount paths
Enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm
Enabling encryption on Linux VMs by using LVM(Logical Volume Manager) for data disks
Enabling encryption on Windows VMs that are configured by using storage spaces
All Azure public regions are supported
The solution does not support the following scenarios, features, and technology in the release:
Basic tier IaaS VMs
Disabling encryption on an OS drive for Linux IaaS VMs
IaaS VMs that are created by using the classic VM creation method
Integration with your on-premises Key Management Service
Azure Files (shared file system), Network File System (NFS), dynamic volumes, and Windows VMs that are configured with software-based RAID systems
SQL Azure Database Isolation
SQL Database is a relational database service in the Microsoft cloud based on the market-leading Microsoft SQL Server engine and capable of handling mission-critical workloads. SQL Database offers predictable data isolation at account level, geography / region based and based on networking— all with near-zero administration.
SQL Azure Application Model
Microsoft SQL Azure Database is a cloud-based relational database service built on SQL Server technologies. It provides a highly available, scalable, multi-tenant database service hosted by Microsoft in cloud.
From an application perspective SQL Azure provides the following hierarchy: Each level has one-to-many containment of levels below.
The account and subscription are Microsoft Azure platform concepts to associate billing and management.
Logical servers and databases are SQL Azure-specific concepts and are managed by using SQL Azure, provided OData and TSQL interfaces or via SQL Azure portal that integrated into Azure portal.
SQL Azure servers are not physical or VM instances, instead they are collections of databases, sharing management and security policies, which are stored in so called “logical master” database.
Logical master databases include:
SQL logins used to connect to the server
Billing and usage-related information for SQL Azure databases from the same logical server are not guaranteed to be on the same physical instance in SQL Azure cluster, instead applications must provide the target database name when connecting.
From a customer perspective, a logical server is created in a geo-graphical region while the actual creation of the server happens in one of the clusters in the region.
Isolation through Network Topology
When a logical server is created and its DNS name is registered, the DNS name points to the so called “Gateway VIP” address in the specific data center where the server was placed.
Behind the VIP (virtual IP address), we have a collection of stateless gateway services. In general, gateways get involved when there is coordination needed between multiple data sources (master database, user database, etc.). Gateway services implement the following:
TDS connection proxying. This includes locating user database in the backend cluster, implementing the login sequence and then forwarding the TDS packets to the backend and back.
Database management. This includes implementing a collection of workflows to do CREATE/ALTER/DROP database operations. The database operations can be invoked by either sniffing TDS packets or explicit OData APIs.
CREATE/ALTER/DROP login/user operations
Logical server management operations via OData API
The tier behind the gateways is called “back-end”. This is where all the data is stored in a highly available fashion. Each piece of data is said to belong to a “partition” or “failover unit”, each of them having at least three replicas. Replicas are stored and replicated by SQL Server engine and managed by a failover system often referred to as “fabric”.
Generally, the back-end system does not communicate outbound to other systems as a security precaution. This is reserved to the systems in the front-end (gateway) tier. The gateway tier machines have limited privileges on the back-end machines to minimize the attack surface as a defense-in-depth mechanism.
Isolation by Machine Function and Access
SQL Azure (is composed of services running on different machine functions. SQL Azure is divided into “backend” Cloud Database and “front-end” (Gateway/Management) environments, with the general principle of traffic only going into back-end and not out. The front-end environment can communicate to the outside world of other services and in general, has only limited permissions in the back-end (enough to call the entry points it needs to invoke).
Azure deployment has multiple layers of network isolation. The following diagram shows various layers of network isolation Azure provides to customers. These layers are both native in the Azure platform itself and customer-defined features. Inbound from the Internet, Azure DDoS provides isolation against large-scale attacks against Azure. The next layer of isolation is customer-defined public IP addresses (endpoints), which are used to determine which traffic can pass through the cloud service to the virtual network. Native Azure virtual network isolation ensures complete isolation from all other networks, and that traffic only flows through user configured paths and methods. These paths and methods are the next layer, where NSGs, UDR, and network virtual appliances can be used to create isolation boundaries to protect the application deployments in the protected network.
Traffic isolation: A virtual network is the traffic isolation boundary on the Azure platform. Virtual machines (VMs) in one virtual network cannot communicate directly to VMs in a different virtual network, even if both virtual networks are created by the same customer. Isolation is a critical property that ensures customer VMs and communication remains private within a virtual network.
Subnet offers an additional layer of isolation with in virtual network based on IP range. IP addresses in the virtual network, you can divide a virtual network into multiple subnets for organization and security. VMs and PaaS role instances deployed to subnets (same or different) within a VNet can communicate with each other without any extra configuration. You can also configure network security group (NSGs) to allow or deny network traffic to a VM instance based on rules configured in access control list (ACL) of NSG. NSGs can be associated with either subnets or individual VM instances within that subnet. When an NSG is associated with a subnet, the ACL rules apply to all the VM instances in that subnet.
This includes the classic front-end and back-end scenario where machines in a particular back-end network or subnetwork may only allow certain clients or other computers to connect to a particular endpoint based on an allow list of IP addresses.
Microsoft Azure provides a various cloud-based computing services that include a wide selection of compute instances & services that can scale up and down automatically to meet the needs of your application or enterprise.
Microsoft Azure separates customer VM-based computation from storage. This separation enables computation and storage to scale independently, making it easier to provide multi-tenancy and isolation. Therefore, Azure Storage runs on separate hardware with no network connectivity to Azure Compute except logically. All requests run over HTTP or HTTPS based on customer’s choice.