Azure Security and Compliance Blueprint: Data Warehouse for UK NHS

Overview

This Azure Security and Compliance Blueprint provides a reference architecture and guidance for a data warehouse solution suitable for securely ingesting, staging, storing, and interacting with sensitive healthcare data. This solution demonstrates ways in which customers can comply with guidance provided in the Cloud Security Good Practice Guide published by NHS Digital, a partner of the United Kingdom's (UK) Department of Health and Social Care (DHSC). The Cloud Security Good Practice Guide is based on the 14 Cloud Security Principles published by the UK National Cyber Security Centre (NCSC).

This reference architecture, implementation guide, and threat model are intended to serve as a foundation for customers to adjust to their specific requirements and should not be used as-is in a production environment without additional configuration. Customers are responsible for conducting appropriate security and compliance assessments of any solution built using this architecture, as requirements may vary based on the specifics of each customer's implementation.

Architecture diagram and components

This solution provides a reference architecture which implements a high-performance and secure cloud data warehouse. There are two separate data tiers in this architecture: one where data is imported, stored, and staged within a clustered SQL environment, and another for the Azure SQL Data Warehouse where the data is loaded using an extract, transform, load tool (e.g. PolyBase T-SQL queries) for processing. Once data is stored in Azure SQL Data Warehouse, analytics can run at a massive scale.

Azure offers a variety of reporting and analytics services for the customer. This solution includes SQL Server Reporting Services for quick creation of reports from the Azure SQL Data Warehouse. All SQL traffic is encrypted with SSL through the inclusion of self-signed certificates. As a best practice, Azure recommends the use of a trusted certificate authority for enhanced security.

Data in the Azure SQL Data Warehouse is stored in relational tables with columnar storage, a format that significantly reduces the data storage costs while improving query performance. Depending on usage requirements, Azure SQL Data Warehouse compute resources can be scaled up or down or shut off completely if there are no active processes requiring compute resources.

A SQL load balancer manages SQL traffic, ensuring high performance. All virtual machines in this reference architecture deploy in an availability set with SQL Server instances configured in an Always On availability group for high-availability and disaster-recovery capabilities.

For enhanced security, all resources in this solution are managed as a resource group through Azure Resource Manager. Azure Active Directory role-based access control is used for controlling access to deployed resources and keys in Azure Key Vault. System health is monitored through Azure Security Center and Azure Monitor. Customers configure both monitoring services to capture logs and display system health in a single, easily navigable dashboard.

This data warehouse reference architecture also includes an Active Directory tier for management of resources within the architecture. The Active Directory subnet enables easy adoption under a larger Active Directory forest structure, allowing for continuous operation of the environment even when access to the larger forest is unavailable. All virtual machines are domain-joined to the Active Directory tier and use Active Directory group policies to enforce security and compliance configurations at the operating system level.

A virtual machine serves as a management bastion host, providing a secure connection for administrators to access deployed resources. The data loads into the staging area through this management bastion host. Microsoft recommends configuring a VPN or Azure ExpressRoute connection for management and data import into the reference architecture subnet.

Data Warehouse for UK NHS reference architecture diagram

This solution uses the following Azure services. Details of the deployment architecture are in the Deployment architecture section.

  • Availability Sets
    • Active Directory domain controllers
    • SQL Cluster Nodes and Witness
  • Azure Active Directory
  • Azure Data Catalog
  • Azure Key Vault
  • Azure Monitor
  • Azure Security Center
  • Azure Load Balancer
  • Azure Storage
  • Azure Automation
  • Azure Virtual Machines
    • (1) Bastion host
    • (2) Active Directory domain controller
    • (2) SQL Server Cluster Node
    • (1) SQL Server Witness
  • Azure Virtual Network
    • (1) /16 Network
    • (4) /24 Networks
    • (4) Network security groups
  • Recovery Services Vault
  • SQL Data Warehouse
  • SQL Server Reporting Services

Deployment architecture

The following section details the deployment and implementation elements.

SQL Data Warehouse: SQL Data Warehouse is an Enterprise Data Warehouse that leverages massively parallel processing to quickly run complex queries across petabytes of data, allowing users to efficiently identify healthcare data. Users can use simple PolyBase T-SQL queries to import big data into the SQL Data Warehouse and utilize the power of massively parallel processing to run high-performance analytics.

SQL Server Reporting Services: SQL Server Reporting Services provides quick creation of reports with tables, charts, maps, gauges, matrixes, and more for Azure SQL Data Warehouse.

Data Catalog: Data Catalog makes data sources easily discoverable and understandable by the users who manage the data. Common data sources can be registered, tagged, and searched for health-related data. The data remains in its existing location, but a copy of its metadata is added to Data Catalog, along with a reference to the data source location. The metadata is also indexed to make each data source easily discoverable via search and understandable to the users who discover it.

Bastion host: The bastion host is the single point of entry that allows users to access the deployed resources in this environment. The bastion host provides a secure connection to deployed resources by only allowing remote traffic from public IP addresses on a safe list. To permit remote desktop (RDP) traffic, the source of the traffic needs to be defined in the network security group.

This solution creates a virtual machine as a domain-joined bastion host with the following configurations:

Virtual network

This reference architecture defines a private virtual network with an address space of 10.0.0.0/16.

Network security groups: Network security groups contain access control lists that allow or deny traffic within a virtual network. Network security groups can be used to secure traffic at a subnet or individual virtual machine level. The following Network security groups exist:

  • A network security group for the Data Tier (SQL Server Clusters, SQL Server Witness, and SQL load balancer)
  • A network security group for the management bastion host
  • A network security group for Active Directory
  • A network security group for SQL Server Reporting Services

Each of the Network security groups have specific ports and protocols open so that the solution can work securely and correctly. In addition, the following configurations are enabled for each network security group:

Subnets: Each subnet is associated with its corresponding network security group.

Data at rest

The architecture protects data at rest through multiple measures, including encryption and database auditing.

The architecture protects data at rest through encryption, database auditing, and other measures.

Azure Storage: To meet encrypted data at rest requirements, all Azure Storage uses Storage Service Encryption. This helps protect and safeguard data in support of organizational security commitments and compliance requirements defined by NHS Digital.

Azure Disk Encryption: Azure Disk Encryption leverages the BitLocker feature of Windows to provide volume encryption for data disks. The solution integrates with Azure Key Vault to help control and manage the disk-encryption keys.

Azure SQL Database: The Azure SQL Database instance uses the following database security measures:

  • Active Directory authentication and authorization enables identity management of database users and other Microsoft services in one central location.
  • SQL database auditing tracks database events and writes them to an audit log in an Azure storage account.
  • Azure SQL Database is configured to use transparent data encryption, which performs real-time encryption and decryption of the database, associated backups, and transaction log files to protect information at rest. Transparent data encryption provides assurance that stored data has not been subject to unauthorized access.
  • Firewall rules prevent all access to database servers until proper permissions are granted. The firewall grants access to databases based on the originating IP address of each request.
  • SQL Threat Detection enables the detection and response to potential threats as they occur by providing security alerts for suspicious database activities, potential vulnerabilities, SQL injection attacks, and anomalous database access patterns.
  • Encrypted Columns ensure that sensitive data never appears as plaintext inside the database system. After enabling data encryption, only client applications or application servers with access to the keys can access plaintext data.
  • SQL Database dynamic data masking limits sensitive data exposure by masking the data to non-privileged users or applications. Dynamic data masking can automatically discover potentially sensitive data and suggest the appropriate masks to be applied. This helps to identify and reduce access to data such that it does not exit the database via unauthorized access. Customers are responsible for adjusting dynamic data masking settings to adhere to their database schema.

Identity management

The following technologies provide capabilities to manage access to data in the Azure environment:

  • Azure Active Directory is Microsoft's multi-tenant cloud-based directory and identity management service. All users for this solution are created in Azure Active Directory, including users accessing the Azure SQL Database.
  • Authentication to the application is performed using Azure Active Directory. For more information, see integrating applications with Azure Active Directory. Additionally, the database column encryption uses Azure Active Directory to authenticate the application to Azure SQL Database. For more information, see how to protect sensitive data in Azure SQL Database.
  • Azure role-based access control enables administrators to define fine-grained access permissions to grant only the amount of access that users need to perform their jobs. Instead of giving every user unrestricted permission for Azure resources, administrators can allow only certain actions for accessing data. Subscription access is limited to the subscription administrator.
  • Azure Active Directory Privileged Identity Management enables customers to minimize the number of users who have access to certain information. Administrators can use Azure Active Directory Privileged Identity Management to discover, restrict, and monitor privileged identities and their access to resources. This functionality can also be used to enforce on-demand, just-in-time administrative access when needed.
  • Azure Active Directory Identity Protection detects potential vulnerabilities affecting an organization's identities, configures automated responses to detected suspicious actions related to an organization's identities, and investigates suspicious incidents to take appropriate action to resolve them.

Security

Secrets management: The solution uses Azure Key Vault for the management of keys and secrets. Azure Key Vault helps safeguard cryptographic keys and secrets used by cloud applications and services. The following Azure Key Vault capabilities help customers protect and access such data:

  • Advanced access policies are configured on a need basis.
  • Key Vault access policies are defined with minimum required permissions to keys and secrets.
  • All keys and secrets in Key Vault have expiration dates.
  • All keys in Key Vault are protected by specialized hardware security modules. The key type is a hardware security module protected 2048-bit RSA Key.
  • All users and identities are granted minimum required permissions using role-based access control.
  • Diagnostics logs for Key Vault are enabled with a retention period of at least 365 days.
  • Permitted cryptographic operations for keys are restricted to the ones required.

Patch management: Windows virtual machines deployed as part of this reference architecture are configured by default to receive automatic updates from Windows Update Service. This solution also includes the Azure Automation service through which updated deployments can be created to patch virtual machines when needed.

Malware protection: Microsoft Antimalware for virtual machines provides real-time protection capability that helps identify and remove viruses, spyware, and other malicious software, with configurable alerts when known malicious or unwanted software attempts to install or run on protected virtual machines.

Azure Security Center: With Azure Security Center, customers can centrally apply and manage security policies across workloads, limit exposure to threats, and detect and respond to attacks. Additionally, Azure Security Center accesses existing configurations of Azure services to provide configuration and service recommendations to help improve security posture and protect data.

Azure Security Center uses a variety of detection capabilities to alert customers of potential attacks targeting their environments. These alerts contain valuable information about what triggered the alert, the resources targeted, and the source of the attack. Azure Security Center has a set of predefined security alerts, which are triggered when a threat, or suspicious activity takes place. Custom alert rules in Azure Security Center allow customers to define new security alerts based on data that is already collected from their environment.

Azure Security Center provides prioritized security alerts and incidents, making it simpler for customers to discover and address potential security issues. A threat intelligence report is generated for each detected threat to assist incident response teams in investigating and remediating threats.

Furthermore, this reference architecture utilizes the vulnerability assessment in Azure Security Center. Once configured, a partner agent (e.g. Qualys) reports vulnerability data to the partner’s management platform. In turn, the partner's management platform provides vulnerability and health monitoring data back to Azure Security Center, allowing customers to quickly identify vulnerable virtual machines.

Business continuity

High availability: Server workloads are grouped in an Availability Set to help ensure high availability of virtual machines in Azure. At least one virtual machine is available during a planned or unplanned maintenance event, meeting the 99.95% Azure SLA.

Recovery Services Vault: The Recovery Services Vault houses backup data and protects all configurations of Azure virtual machines in this architecture. With a Recovery Services Vault, customers can restore files and folders from an IaaS virtual machine without restoring the entire virtual machine, enabling faster restore times.

Logging and auditing

Azure services extensively log system and user activity, as well as system health:

  • Activity logs: Activity logs provide insight into operations performed on resources in a subscription. Activity logs can help determine an operation's initiator, time of occurrence, and status.
  • Diagnostic logs: Diagnostic logs include all logs emitted by every resource. These logs include Windows event system logs, Azure Storage logs, Key Vault audit logs, and Application Gateway access and firewall logs. All diagnostic logs write to a centralized and encrypted Azure storage account for archival. The retention is user-configurable, up to 730 days, to meet organization-specific retention requirements.

Azure Monitor logs: These logs are consolidated in Azure Monitor logs for processing, storing, and dashboard reporting. Once collected, the data is organized into separate tables for each data type, which allows all data to be analyzed together regardless of its original source. Furthermore, Azure Security Center integrates with Azure Monitor logs allowing customers to use Kusto queries to access their security event data and combine it with data from other services.

The following Azure monitoring solutions are included as a part of this architecture:

  • Active Directory Assessment: The Active Directory Health Check solution assesses the risk and health of server environments on a regular interval and provides a prioritized list of recommendations specific to the deployed server infrastructure.
  • SQL Assessment: The SQL Health Check solution assesses the risk and health of server environments on a regular interval and provides customers with a prioritized list of recommendations specific to the deployed server infrastructure.
  • Agent Health: The Agent Health solution reports how many agents are deployed and their geographic distribution, as well as how many agents which are unresponsive and the number of agents which are submitting operational data.
  • Activity Log Analytics: The Activity Log Analytics solution assists with analysis of the Azure activity logs across all Azure subscriptions for a customer.

Azure Automation: Azure Automation stores, runs, and manages runbooks. In this solution, runbooks help collect logs from Azure SQL Database. The Automation Change Tracking solution enables customers to easily identify changes in the environment.

Azure Monitor: Azure Monitor helps users track performance, maintain security, and identify trends by enabling organizations to audit, create alerts, and archive data, including tracking API calls in their Azure resources.

Threat model

The data flow diagram for this reference architecture is available for download or can be found below. This model can help customers understand the points of potential risk in the system infrastructure when making modifications.

Data Warehouse for UK NHS threat model

Compliance documentation

The Azure Security and Compliance Blueprint – UK NHS Customer Responsibility Matrix lists all UK NHS requirements. This matrix details whether the implementation of each principle is the responsibility of Microsoft, the customer, or shared between the two.

The Azure Security and Compliance Blueprint – UK NHS Data Warehouse Implementation Matrix provides information on which UK NHS requirements are addressed by the data warehouse architecture, including detailed descriptions of how the implementation meets the requirements of each covered principle.

Guidance and recommendations

VPN and ExpressRoute

A secure VPN tunnel or ExpressRoute needs to be configured to securely establish a connection to the resources deployed as a part of this PaaS web application reference architecture. By appropriately setting up a VPN or ExpressRoute, customers can add a layer of protection for data in transit.

By implementing a secure VPN tunnel with Azure, a virtual private connection between an on-premises network and an Azure virtual network can be created. This connection takes place over the Internet and allows customers to securely "tunnel" information inside an encrypted link between the customer's network and Azure. Site-to-site VPN is a secure, mature technology that has been deployed by enterprises of all sizes for decades. The IPsec tunnel mode is used in this option as an encryption mechanism.

Because traffic within the VPN tunnel does traverse the Internet with a site-to-site VPN, Microsoft offers another, even more secure connection option. Azure ExpressRoute is a dedicated WAN link between Azure and an on-premises location or an Exchange hosting provider. As ExpressRoute connections do not go over the Internet, these connections offer more reliability, faster speeds, lower latencies, and higher security than typical connections over the Internet. Furthermore, because this is a direct connection of customer's telecommunication provider, the data does not travel over the Internet and therefore is not exposed to it.

Best practices for implementing a secure hybrid network that extends an on-premises network to Azure are available.

Extract-Transform-Load process

PolyBase can load data into Azure SQL Data Warehouse without the need for a separate extract, transform, load or import tool. PolyBase allows access to data through T-SQL queries. Microsoft's business intelligence and analysis stack, as well as third-party tools compatible with SQL Server, can be used with PolyBase.

Azure Active Directory setup

Azure Active Directory is essential to managing the deployment and provisioning access to personnel interacting with the environment. An existing Windows Server Active Directory can be integrated with Azure Active Directory in four clicks. Customers can also tie the deployed Active Directory infrastructure (domain controllers) to an existing Azure Active Directory by making the deployed Active Directory infrastructure a subdomain of an Azure Active Directory forest.

Optional services

Azure offers a variety of services to assist with the storage and staging of formatted and unformatted data. The following services can be added to this reference architecture depending on customer requirements:

  • Azure Data Factory is a managed cloud service that is built for complex hybrid extract-transform-load, and data integration projects. Azure Data Factory has capabilities to help trace and locate health-related data, including visualization and monitoring tools to identify when data arrived and where it came from. Using Azure Data Factory, customers can create and schedule data-driven workflows called pipelines that ingest data from disparate data stores. These pipelines allow customers to ingest data from both internal and external sources. Customers can then process and transform the data for output into data stores such as Azure SQL Data Warehouse.
  • Customers can stage unstructured data in Azure Data Lake Store, which enables the capture of data of any size, type, and ingestion speed in a single place for operational and exploratory analytics. Azure Data Lake has capabilities that enable the extraction and conversion of data. Azure Data Lake Store is compatible with most open source components in the Hadoop ecosystem and integrates nicely with other Azure services such as Azure SQL Data Warehouse.

Disclaimer

  • This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet website references, may change without notice. Customers reading this document bear the risk of using it.
  • This document does not provide customers with any legal rights to any intellectual property in any Microsoft product or solutions.
  • Customers may copy and use this document for internal reference purposes.
  • Certain recommendations in this document may result in increased data, network, or compute resource usage in Azure, and may increase a customer's Azure license or subscription costs.
  • This architecture is intended to serve as a foundation for customers to adjust to their specific requirements and should not be used as-is in a production environment.
  • This document is developed as a reference and should not be used to define all means by which a customer can meet specific compliance requirements and regulations. Customers should seek legal support from their organization on approved customer implementations.