Azure Security and Compliance Blueprint: Data Warehouse for FedRAMP Automation
The Federal Risk and Authorization Management Program (FedRAMP) is a United States government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services. This Azure Security and Compliance Blueprint provides guidance for how to deliver a Microsoft Azure data warehouse architecture that helps implement a subset of FedRAMP High controls. This solution provides guidance on the deployment and configuration of Azure resources for a common reference architecture, demonstrating ways in which customers can meet specific security and compliance requirements and serves as a foundation for customers to build and configure their own data warehouse solutions in Azure.
This reference architecture, associated control implementation guides, and threat models are intended to serve as a foundation for customers to adjust to their specific requirements and should not be used as-is in a production environment. Deploying an application into this environment without modification is insufficient to completely meet the requirements of the FedRAMP High baseline. Please note the following:
- The architecture provides a baseline to help customers deploy workloads to Azure in a FedRAMP-compliant manner.
- Customers are responsible for conducting appropriate security and compliance assessments of any solution built using this architecture, as requirements may vary based on the specifics of each customer's implementation.
Architecture diagram and components
This solution provides a data warehouse reference architecture which implements a high-performance and secure cloud data warehouse. There are two separate data tiers in this architecture: one where data is imported, stored, and staged within a clustered SQL environment, and another for the Azure SQL Data Warehouse where the data is loaded using an ETL tool (e.g. PolyBase T-SQL queries) for processing. Once data is stored in Azure SQL Data Warehouse, analytics can run at a massive scale.
Microsoft Azure offers a variety of reporting and analytics services for the customer. This solution includes SQL Server Reporting Services (SSRS) for quick creation of reports from the Azure SQL Data Warehouse. All SQL traffic is encrypted with SSL through the inclusion of self-signed certificates. As a best practice, Azure recommends the use of a trusted certificate authority for enhanced security.
Data in the Azure SQL Data Warehouse is stored in relational tables with columnar storage, a format that significantly reduces the data storage costs while improving query performance. Depending on usage requirements, Azure SQL Data Warehouse compute resources can be scaled up or down or shut off completely if there are no active processes requiring compute resources.
A SQL load balancer manages SQL traffic, ensuring high performance. All virtual machines in this reference architecture deploy in an availability set with SQL Server instances configured in an AlwaysOn availability group for high-availability and disaster-recovery capabilities.
This data warehouse reference architecture also includes an Active Directory (AD) tier for management of resources within the architecture. The Active Directory subnet enables easy adoption under a larger AD forest structure, allowing for continuous operation of the environment even when access to the larger forest is unavailable. All virtual machines are domain-joined to the Active Directory tier and use Active Directory group policies to enforce security and compliance configurations at the operating system level.
A virtual machine serves as a management bastion host, providing a secure connection for administrators to access deployed resources. The data loads into the staging area through this management bastion host. Azure recommends configuring a VPN or Azure ExpressRoute connection for management and data import into the reference architecture subnet.
This solution uses the following Azure services. Details of the deployment architecture are in the Deployment architecture section.
Azure Virtual Machines
- (1) Bastion Host
- (2) Active Directory domain controller
- (2) SQL Server Cluster Node
- (1) SQL Server Witness
- (1) Active Directory domain controllers
- (1) SQL cluster nodes and witness
- (4) Subnets
- (4) Network Security Groups
SQL Data Warehouse
SQL Server Reporting Services
Azure SQL Load Balancer
Azure Active Directory
Recovery Services Vault
Azure Key Vault
Azure Monitor logs
The following section details the development and implementation elements.
SQL Data Warehouse: SQL Data Warehouse is an Enterprise Data Warehouse (EDW) that leverages Massively Parallel Processing (MPP) to quickly run complex queries across petabytes of data. Import big data into SQL Data Warehouse with simple PolyBase T-SQL queries and use the power of MPP to run high-performance analytics.
SQL Server Reporting Services: SQL Server Reporting Services enables quick creation of reports with tables, charts, maps, gauges, matrixes, and more for Azure SQL Data Warehouse.
Bastion host: The bastion host is the single point of entry that allows users to access the deployed resources in this environment. The bastion host provides a secure connection to deployed resources by only allowing remote traffic from public IP addresses on a safe list. To permit remote desktop (RDP) traffic, the source of the traffic needs to be defined in the Network Security Group (NSG).
A virtual machine was created as a domain-joined bastion host with the following configurations:
- Antimalware extension
- Azure Monitor logs extension
- Azure Diagnostics extension
- Azure Disk Encryption using Azure Key Vault (respects Azure Government, PCI DSS, HIPAA and other requirements)
- An auto-shutdown policy to reduce consumption of virtual machine resources when not in use
- Windows Defender Credential Guard enabled so that credentials and other secrets run in a protected environment that is isolated from the running operating system
This reference architecture defines a private virtual network with an address space of 10.0.0.0/16.
Network security groups: NSGs contain Access Control Lists (ACLs) that allow or deny traffic within a VNet. NSGs can be used to secure traffic at a subnet or individual VM level. The following NSGs exist:
- An NSG for the Data Tier (SQL Server Clusters, SQL Server Witness, and SQL Load Balancer)
- An NSG for the management bastion host
- An NSG for Active Directory
- An NSG for SQL Server Reporting Services
Each of the NSGs have specific ports and protocols open so that the solution can work securely and correctly. In addition, the following configurations are enabled for each NSG:
- Diagnostic logs and events are enabled and stored in a storage account
- Azure Monitor logs is connected to the NSG's diagnostics
Subnets: Each subnet is associated with its corresponding NSG.
Data at rest
The architecture protects data at rest through encryption, database auditing, and other measures.
Azure Disk Encryption Azure Disk Encryption leverages the BitLocker feature of Windows to provide volume encryption for OS and data disks. The solution integrates with Azure Key Vault to help control and manage the disk-encryption keys.
Azure SQL Database The Azure SQL Database instance uses the following database security measures:
- AD Authentication and Authorization enables identity management of database users and other Microsoft services in one central location.
- SQL database auditing tracks database events and writes them to an audit log in an Azure storage account.
- SQL Database is configured to use Transparent Data Encryption (TDE), which performs real-time encryption and decryption of data and log files to protect information at rest. TDE provides assurance that stored data has not been subject to unauthorized access.
- Firewall rules prevent all access to database servers until proper permissions are granted. The firewall grants access to databases based on the originating IP address of each request.
- SQL Threat Detection enables the detection and response to potential threats as they occur by providing security alerts for suspicious database activities, potential vulnerabilities, SQL injection attacks, and anomalous database access patterns.
- Always Encrypted columns ensure that sensitive data never appears as plaintext inside the database system. After enabling data encryption, only client applications or app servers with access to the keys can access plaintext data.
- SQL Database dynamic data masking can be done after the reference architecture deploys. Customers will need to adjust dynamic data masking settings to adhere to their database schema.
High availability: Server workloads are grouped in an Availability Set to help ensure high availability of virtual machines in Azure. At least one virtual machine is available during a planned or unplanned maintenance event, meeting the 99.95% Azure SLA.
Recovery Services Vault: The Recovery Services Vault houses backup data and protects all configurations of Azure Virtual Machines in this architecture. With a Recovery Services Vault, customers can restore files and folders from an IaaS VM without restoring the entire VM, enabling faster restore times.
Logging and audit
Azure Monitor logs provides extensive logging of system and user activity, as well as system health. The Azure Monitor logs solution collects and analyzes data generated by resources in Azure and on-premises environments.
- Activity Logs: Activity logs provide insight into operations performed on resources in a subscription.
- Diagnostic Logs: Diagnostic logs include all logs emitted by every resource. These logs include Windows event system logs and Azure Blob storage, tables, and queue logs.
- Firewall Logs: The Application Gateway provides full diagnostic and access logs. Firewall logs are available for WAF-enabled Application Gateway resources.
- Log Archiving: All diagnostic logs write to a centralized and encrypted Azure storage account for archival with a defined retention period of 2 days. These logs connect to Azure Monitor logs for processing, storing, and dashboard reporting.
Additionally, the following monitoring solutions are included as a part of this architecture:
- AD Assessment: The Active Directory Health Check solution assesses the risk and health of server environments on a regular interval and provides a prioritized list of recommendations specific to the deployed server infrastructure.
- Antimalware Assessment: The Antimalware solution reports on malware, threats, and protection status.
- Azure Automation: The Azure Automation solution stores, runs, and manages runbooks.
- Security and Audit: The Security and Audit dashboard provides a high-level insight into the security state of resources by providing metrics on security domains, notable issues, detections, threat intelligence, and common security queries.
- SQL Assessment: The SQL Health Check solution assesses the risk and health of server environments on a regular interval and provides customers with a prioritized list of recommendations specific to the deployed server infrastructure.
- Update Management: The Update Management solution allows customer management of operating system security updates, including a status of available updates and the process of installing required updates.
- Agent Health: The Agent Health solution reports how many agents are deployed and their geographic distribution, as well as how many agents which are unresponsive and the number of agents which are submitting operational data.
- Azure Activity Logs: The Activity Log Analytics solution assists with analysis of the Azure activity logs across all Azure subscriptions for a customer.
- Change Tracking: The Change Tracking solution allows customers to easily identify changes in the environment.
The following technologies provide identity management capabilities in the Azure environment:
- Active Directory (AD) can be Microsoft's multi-tenant cloud-based directory and identity management service. All users for the solution were created in Azure Active Directory, including users accessing the SQL Database.
- Authentication to the application is performed using Azure AD. For more information, see Integrating applications with Azure Active Directory. Additionally, the database column encryption uses Azure AD to authenticate the application to Azure SQL Database. For more information, see how to protect sensitive data in SQL Database.
- Azure Active Directory Identity Protection detects potential vulnerabilities affecting an organization’s identities, configures automated responses to detected suspicious actions related to an organization’s identities, and investigates suspicious incidents to take appropriate action to resolve them.
- Azure Role-based Access Control (RBAC) enables focused access management for Azure. Subscription access is limited to the subscription administrator.
To learn more about using the security features of Azure SQL Database, see the Contoso Clinic Demo Application sample.
Secrets management: The solution uses Azure Key Vault for the management of keys and secrets. Azure Key Vault helps safeguard cryptographic keys and secrets used by cloud applications and services.
Malware protection: Microsoft Antimalware for Virtual Machines provides real-time protection capability that helps identify and remove viruses, spyware, and other malicious software, with configurable alerts when known malicious or unwanted software attempts to install or run on protected virtual machines.
Patch management: Windows virtual machines deployed as part of this reference architecture are configured by default to receive automatic updates from Windows Update Service. This solution also includes the Azure Automation service through which updated deployments can be created to patch virtual machines when needed.
Guidance and recommendations
ExpressRoute and VPN
ExpressRoute or a secure VPN tunnel needs to be configured to securely establish a connection to the resources deployed as a part of this data warehouse reference architecture. As ExpressRoute connections do not go over the Internet, these connections offer more reliability, faster speeds, lower latencies, and higher security than typical connections over the Internet. By appropriately setting up ExpressRoute or a VPN, customers can add a layer of protection for data in transit.
Extract-Transform-Load (ETL) process
PolyBase can load data into Azure SQL Data Warehouse without the need for a separate ETL or import tool. PolyBase allows access to data through T-SQL queries. Microsoft's business intelligence and analysis stack, as well as third-party tools compatible with SQL Server, can be used with PolyBase.
Azure Active Directory setup
Azure Active Directory is essential to managing the deployment and provisioning access to personnel interacting with the environment. An existing Windows Server Active Directory can be integrated with AAD in four clicks. Customers can also tie the deployed Active Directory infrastructure (domain controllers) to an existing AAD by making the deployed Active Directory infrastructure a subdomain of an AAD forest.
Although this data warehouse architecture is not intended for deployment to the Azure Commercial environment, similar objectives can be achieved through the services described in this reference architecture, as well as additional services available only in the Azure Commercial environment. Please note that Azure Commercial maintains a FedRAMP JAB P-ATO at the Moderate Impact Level, allowing government agencies and partners to deploy moderately sensitive information to the cloud leveraging the Azure Commercial environment.
Azure Commercial offers a wide variety of services that handle formatted and unformatted data storage and staging to be used in data warehousing, including:
- Azure Data Factory is a managed cloud service that is built for complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. Using Azure Data Factory, customers can create and schedule data-driven workflows called pipelines that ingest data from disparate data stores. Customers can then process and transform the data for output into data stores such as Azure SQL Data Warehouse.
- Azure Data Lake Store enables the capture of data of any size, type, and ingestion speed in a single place for operational and exploratory analytics. Azure Data Lake Store is compatible with most open source components in the Hadoop ecosystem and integrates nicely with other Azure services such as Azure SQL Data Warehouse.
The data flow diagram (DFD) for this reference architecture is available for download or can be found below:
The Azure Security and Compliance Blueprint – FedRAMP High Customer Responsibility Matrix lists all security controls required by the FedRAMP High baseline. Similarly, the Azure Security and Compliance Blueprint – FedRAMP Moderate Customer Responsibility Matrix lists all security controls required by the FedRAMP Moderate baseline. Both documents detail whether the implementation of each control is the responsibility of Microsoft, the customer, or shared between the two.
The Azure Security and Compliance Blueprint - FedRAMP High Control Implementation Matrix and the Azure Security and Compliance Blueprint - FedRAMP Moderate Control Implementation Matrix provide information on which controls are covered by the data warehouse architecture for each FedRAMP baseline, including detailed descriptions of how the implementation meets the requirements of each covered control.
- This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet website references, may change without notice. Customers reading this document bear the risk of using it.
- This document does not provide customers with any legal rights to any intellectual property in any Microsoft product or solutions.
- Customers may copy and use this document for internal reference purposes.
- Certain recommendations in this document may result in increased data, network, or compute resource usage in Azure, and may increase a customer's Azure license or subscription costs.
- This architecture is intended to serve as a foundation for customers to adjust to their specific requirements and should not be used as-is in a production environment.
- This document is developed as a reference and should not be used to define all means by which a customer can meet specific compliance requirements and regulations. Customers should seek legal support from their organization on approved customer implementations.