Analyze observational patient data by using OHDSI with the OMOP CDM

Azure Pipelines
Azure SQL Database
Azure Virtual Machine Scale Sets
Azure Blob Storage
Azure Container Registry

Observational Health Data Sciences and Informatics (OHDSI) created and maintains the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) standard and associated OHDSI software tools to visualize and analyze clinical health data. These tools facilitate the design and execution of analyses on standardized, patient-level, observational data.

OHDSI on Azure allows organizations that want to use the OMOP CDM and the associated analytical tools to easily deploy and operate the solution on the Azure platform.

"Terraform" is either a registered trademark or a trademark of HashiCorp in the United States and/or other countries. No endorsement by HashiCorp is implied by the use of this mark.

Architecture

Diagram that shows an architecture for analyzing patient data by using OHDSI.


The preceding diagram illustrates the solution architecture at a high level. The solution is made up of two major resource groups:

  • Bootstrap resource group. Contains a foundational set of Azure resources that support the deployment of the OMOP resource group.
  • OMOP resource group. Contains the OHDSI-specific Azure resources.

Azure Pipelines orchestrates all deployment automation.

This article is primarily intended for DevOps engineering teams. If you plan to deploy this scenario, you should have experience with the Azure portal and Azure DevOps.

Workflow

  1. Deploy the Bootstrap resource group to support the resources and permissions needed for deployment of the OHDSI resources.
  2. Deploy the OMOP resource group for the OHDSI-specific Azure resources. This step should complete your infrastructure-related setup.
  3. Provision the OMOP CDM and vocabularies. This step deploys the data model and loads the OMOP controlled vocabularies into the CDM in Azure SQL Database.
  4. Deploy the OHDSI applications:
    1. Set up the Atlas UI and WebAPI by using the BroadSea WebTools image. Atlas is a web UI that integrates features from various OHDSI applications. It's supported by the WebAPI layer.
    2. Set up Achilles and Synthea by using the BroadSea Methods image. Achilles is an R-based script that runs data characterization and quality assessments on the OMOP CDM. The Synthea ETL script is an optional tool that enables users to load synthetic patient data into the OMOP CDM.
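
To make the vocabulary provisioning step more concrete, the following is a minimal sketch, not the solution's actual pipeline. It assumes the vocabulary files are tab-delimited text, uses an in-memory SQLite database as a stand-in for Azure SQL Database, and loads a reduced, illustrative subset of the CONCEPT table columns. The sample rows and the function names `load_concepts` and `concepts_per_domain` are hypothetical. The final count per domain is a very small example of the kind of summary that a characterization tool such as Achilles produces at much larger scale.

```python
import csv
import io
import sqlite3

# Hypothetical three-row excerpt in a tab-delimited layout. Real CONCEPT
# files carry more columns and far more rows than this illustration.
SAMPLE_CONCEPT_TSV = (
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\n"
    "8507\tMALE\tGender\tGender\n"
    "8532\tFEMALE\tGender\tGender\n"
    "201826\tType 2 diabetes mellitus\tCondition\tSNOMED\n"
)


def load_concepts(conn, tsv_text):
    """Create a reduced CONCEPT table and bulk-insert rows from tab-delimited text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept ("
        "concept_id INTEGER PRIMARY KEY, concept_name TEXT NOT NULL, "
        "domain_id TEXT NOT NULL, vocabulary_id TEXT NOT NULL)"
    )
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [
        (int(r["concept_id"]), r["concept_name"], r["domain_id"], r["vocabulary_id"])
        for r in reader
    ]
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", rows)
    return len(rows)


def concepts_per_domain(conn):
    """Return a {domain_id: row_count} summary of the loaded concepts."""
    cursor = conn.execute(
        "SELECT domain_id, COUNT(*) FROM concept GROUP BY domain_id"
    )
    return dict(cursor)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(load_concepts(connection, SAMPLE_CONCEPT_TSV), "concepts loaded")
    print(concepts_per_domain(connection))
```

In a real deployment, the raw files would be read from Blob Storage and bulk-loaded into SQL Database by the pipeline rather than inserted row by row from a script.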

Components

  • Microsoft Entra ID is a multitenant cloud-based directory and identity management service. Microsoft Entra ID is used to manage permissions for environment deployment.
  • Azure Pipelines automatically builds and tests code projects. This Azure DevOps service combines continuous integration (CI) and continuous delivery (CD). Azure Pipelines uses these practices to constantly and consistently test and build code and ship it to any target. Pipelines define and run this deployment approach for OHDSI on Azure.
  • Azure Virtual Machine Scale Sets enable you to create and manage a group of heterogeneous load-balanced virtual machines (VMs). These VMs coordinate the deployment of the environment.
  • Azure Blob Storage is a storage service that's optimized for storing massive amounts of unstructured data. Blob Storage is used to store the Terraform state file and the raw OMOP vocabulary files (before ingestion into the CDM).
  • Azure Key Vault is an Azure service for storing and accessing secrets, keys, and certificates with improved security. Key Vault provides HSM-backed security and audited access through role-based access controls that are integrated with Microsoft Entra ID. In this architecture, Key Vault stores all secrets, including API keys, passwords, cryptographic keys, and certificates.
  • Azure SQL Database is a fully managed platform as a service (PaaS) database engine. SQL Database handles database management functions like upgrading, patching, backups, and monitoring. This service houses the OMOP CDM and all associated relational data.
  • Azure Web Application Firewall helps protect applications from common web-based attacks like OWASP vulnerabilities, SQL injection, and cross-site scripting. This technology is cloud native. It doesn't require licensing and is pay-as-you-go.
  • Azure Container Registry enables you to build, store, and manage container images and artifacts in a private registry for all types of container deployments. In this solution, it stores OHDSI application images (BroadSea WebTools and BroadSea Methods) for deployment into Azure App Service.
  • Azure App Service is an HTTP-based service for hosting web applications, REST APIs, and mobile back ends. This service supports the OHDSI WebAPI and Atlas applications.

Alternatives

If you require more scalability or control, consider these alternatives:

Scenario details

The ability to federate, harmonize, visualize, segment, and analyze clinical patient data has rapidly become a popular use case in the healthcare industry. Many organizations, including academic institutions, government agencies, and organizations in the private sector, are looking for ways to use their patient health data to accelerate research and development. Unfortunately, most IT teams struggle to collaborate effectively with researchers to provide a work environment where researchers can feel productive and empowered.

OHDSI is an initiative that includes thousands of collaborators in over 70 countries/regions. It offers one of the few available solutions in an open-source format for researchers. OHDSI created and maintains the OMOP CDM standard and associated OHDSI software tools to visualize and analyze clinical health data.

Potential use cases

Several types of healthcare organizations can benefit from this solution, including:

  • Academic institutions that want to enable scientific researchers to tackle observational cohort studies by using clinical data.
  • Governmental agencies that want to federate large amounts of disparate data sources to accelerate scientific discovery.
  • Private sector companies that want to streamline the identification of potential patients for clinical trials.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

SQL Database includes zone-redundant databases, failover groups, geo-replication, and automatic backups. These features allow your application to continue running if there's a maintenance event or outage. For more information, see Azure SQL Database availability capabilities.

You might want to consider using Application Insights to monitor the health of your application. With Application Insights, you can generate alerts and respond to performance problems that affect the customer experience. For more information, see What is Application Insights?.

For more information about reliability, see Designing reliable Azure applications.

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

This scenario uses managed identities for Azure resources, which provide an identity for an application to use when it connects to resources that support Microsoft Entra authentication. Managed identities eliminate the need to manage secrets and credentials for each Azure resource.
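
As an illustration of what connecting without stored credentials can look like, here is a hedged sketch of one common pattern for reaching SQL Database from Python with a managed identity. It is not taken from the OHDSI on Azure deployment. It assumes the `azure-identity` and `pyodbc` packages and ODBC Driver 18 for SQL Server are installed, and the function names and the `server` and `database` parameters are hypothetical. The token is obtained at run time and passed to the driver as a length-prefixed UTF-16-LE byte structure, so no password appears in the connection string.

```python
import struct

# Connection attribute ID that the Microsoft ODBC driver uses for access tokens.
SQL_COPT_SS_ACCESS_TOKEN = 1256


def pack_access_token(token: str) -> bytes:
    """Encode a token as UTF-16-LE, prefixed with its 4-byte little-endian byte length."""
    raw = token.encode("utf-16-le")
    return struct.pack(f"<I{len(raw)}s", len(raw), raw)


def connect_with_managed_identity(server: str, database: str):
    """Open a pyodbc connection by using a managed identity token (assumed setup)."""
    # Imported lazily so pack_access_token works without these packages installed.
    import pyodbc
    from azure.identity import ManagedIdentityCredential

    credential = ManagedIdentityCredential()
    token = credential.get_token("https://database.windows.net/.default").token
    connection_string = (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server=tcp:{server},1433;Database={database};Encrypt=yes;"
    )
    return pyodbc.connect(
        connection_string,
        attrs_before={SQL_COPT_SS_ACCESS_TOKEN: pack_access_token(token)},
    )
```

The connection function only works on an Azure resource that has a managed identity assigned and that has been granted access to the database.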

SQL Database uses a layered approach to help protect customer data. It covers network security, access management, threat protection, and information protection. For more information on SQL Database security, see Azure SQL Database security and compliance.

If high-security networking is a critical requirement, consider using Azure Private Link to connect App Service to Azure SQL. Doing so removes public internet access to the SQL database, which is a commonly used attack vector. You can also use private endpoints for Azure Storage to access data over an Azure private link with increased security. These implementations aren't currently included in the solution, but you can add them if you need to.

For general guidance on designing secure solutions, see the Azure Security documentation.

Cost optimization

Cost optimization is about reducing unnecessary expenses and improving operational efficiencies. For more information, see Overview of the cost optimization pillar.

To better understand the cost of running this scenario on Azure, use the Azure pricing calculator. Base your estimate on the default configuration of all Azure resources deployed via infrastructure as code. Your actual costs can differ depending on the size of your data and on resources in your organization that might be shared, like Microsoft Entra ID or Azure DevOps.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

This scenario uses App Service, which can optionally autoscale the number of instances that support the Atlas UI so that capacity keeps up with end-user demand. For more information about autoscaling, see Autoscaling best practices.
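
The logic behind a rule-based autoscale setting can be sketched in a few lines. The following is a simplified illustration only, not the App Service implementation: the `ScaleRule` class, the `desired_instances` function, the CPU thresholds, and the instance limits are all hypothetical values chosen for the example. It shows how a metric threshold adds or removes one instance at a time while staying inside configured minimum and maximum bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical autoscale settings; tune these for a real workload."""
    min_instances: int = 1
    max_instances: int = 5
    scale_out_cpu: float = 70.0  # add an instance above this average CPU %
    scale_in_cpu: float = 30.0   # remove an instance below this average CPU %


def desired_instances(current: int, avg_cpu_percent: float,
                      rule: ScaleRule = ScaleRule()) -> int:
    """Return the instance count after applying one scale evaluation."""
    if avg_cpu_percent > rule.scale_out_cpu:
        target = current + 1
    elif avg_cpu_percent < rule.scale_in_cpu:
        target = current - 1
    else:
        target = current
    # Clamp the result to the configured bounds.
    return max(rule.min_instances, min(rule.max_instances, target))


if __name__ == "__main__":
    for cpu in (85.0, 50.0, 10.0):
        print(cpu, "->", desired_instances(2, cpu))
```

Keeping a gap between the scale-out and scale-in thresholds helps avoid repeatedly adding and removing instances when load hovers near a single value.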

For more information, see Performance efficiency checklist.

Deploy this scenario

See these resources for more information on deploying an OHDSI tool suite and for additional detailed documentation:

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:

Other contributors:


Next steps