Deploy and configure OMOP analytics in healthcare data solutions (preview)

Article
03/14/2024

[This article is prerelease documentation and is subject to change.]

OMOP analytics prepares data for standardized analytics using the Observational Medical Outcomes Partnership (OMOP) open community standards. You can deploy and configure this capability after you deploy healthcare data solutions (preview) and the Healthcare data foundations capability to your Fabric workspace.

OMOP analytics is an optional capability under healthcare data solutions in Microsoft Fabric (preview). You have the flexibility to decide whether or not to use it, depending on your specific needs or scenarios.

Deployment prerequisites

OMOP analytics has a direct dependency on the Healthcare data foundations capability. Hence, make sure you successfully deploy, configure, and execute the Healthcare data foundations pipelines first. For more information, go to Deploy and configure Healthcare data foundations.
To run the provided sample notebooks, make sure you download the sample data to your environment as explained in Deploy sample data.

Deploy OMOP analytics

To deploy the OMOP analytics capability to your workspace, follow these steps:

Navigate to the healthcare data solutions home page on Fabric.
Select the OMOP analytics tile.
On the capability page, select Deploy to workspace.
The deployment can take a few minutes to complete. Refrain from closing the tab or the browser while the deployment is in progress. In the meantime, you can work in another tab.
After the deployment completes, you'll be notified. Select the Manage capability button from the message bar to navigate to the capability management page. Here, you can view, configure, and manage the following deployed notebooks:
- healthcare#_msft_silver_omop
- healthcare#_msft_omop_sample_drug_exposure_era
- healthcare#_msft_omop_sample_drug_exposure_insights

Configure the OMOP silver notebook

The healthcare#_msft_silver_omop notebook uses the OMOP APIs shipped as part of the healthcare data solutions (preview) library for data transformation. The notebook transforms resources in the healthcare#_msft_silver lakehouse into the OMOP common data model. The silver_database_name parameter in the healthcare#_msft_config_notebook identifies the source silver lakehouse. The transformed data is inserted into the OMOP lakehouse, which is identified by the omop_database_name parameter in the same configuration notebook.

By default, you don't need to change the notebook configuration. If you prefer to point to different source and target lakehouses, change the values in the healthcare#_msft_config_notebook as explained in Configure the global configuration notebook.
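The relationship between the two parameters can be sketched in plain Python. This is only an illustration of how defaults and overrides might be combined: the placeholder values and the resolve_lakehouses helper are hypothetical and aren't part of the healthcare data solutions library. Only the parameter names silver_database_name and omop_database_name come from the configuration described above.

```python
# Hypothetical placeholder defaults; the real values live in the
# global configuration notebook and are not reproduced here.
DEFAULTS = {
    "silver_database_name": "<silver lakehouse name>",
    "omop_database_name": "<OMOP lakehouse name>",
}


def resolve_lakehouses(overrides=None, defaults=DEFAULTS):
    """Return (source, target) lakehouse names after applying overrides.

    Hypothetical helper: rejects parameter names that are not known,
    so a typo doesn't silently fall back to the default lakehouse.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Unknown configuration parameter(s): {sorted(unknown)}")
    merged = {**defaults, **overrides}
    return merged["silver_database_name"], merged["omop_database_name"]


# Point only the target at a different (made-up) OMOP lakehouse.
source, target = resolve_lakehouses({"omop_database_name": "my_omop_lakehouse"})
print(source, "->", target)
```

Leaving the overrides empty returns the defaults unchanged, which mirrors the default behavior where no configuration change is needed.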

We recommend scheduling this notebook job to run every 4 hours. The initial run might not have data to consume due to concurrent and dependent jobs, leading to latency. Adjusting the frequency of higher layer jobs can reduce this latency.

To learn more about the notebook execution, see Use OMOP analytics.

Configure the drug exposure era sample notebook

The healthcare#_msft_omop_sample_drug_exposure_era sample notebook demonstrates how to generate drug_era table records in OMOP using PySpark (Python) in an Azure Synapse Analytics notebook, primarily for exploratory purposes. Record generation follows the OHDSI drug era sample script, adapted to work with PySpark in Azure Synapse Analytics. The drug era generator code is included in a custom Python library, which is packaged as a wheel (.whl) file and uploaded to an Apache Spark pool for easy access.

Before executing the notebook, keep the following prerequisites in mind:

Ensure that the OMOP database has valid data in the following tables:
- drug_exposure
- concept
- concept_ancestor
You can generate this data using the sample data or your own data by running the FHIR to OMOP pipeline.
Ensure the custom library wheel package is attached to the Spark pool that you use to run this notebook.
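As a rough pre-flight check for the table prerequisite, the sketch below reports which required tables are empty. It's a minimal illustration, not part of the provided notebooks: find_empty_tables is a hypothetical helper, and a non-zero row count is only a proxy for "valid data". The row counter is passed in as a callable so the logic can be tried without a Spark session. In a notebook, one assumed option is a function built on spark.table(...).count().

```python
REQUIRED_TABLES = ("drug_exposure", "concept", "concept_ancestor")


def find_empty_tables(count_rows, tables=REQUIRED_TABLES):
    """Return the required tables that have no rows.

    count_rows is any callable that maps a table name to its row count.
    In a Spark notebook this could be, as an assumption:
        lambda t: spark.table(f"{omop_database_name}.{t}").count()
    """
    return [table for table in tables if count_rows(table) == 0]


# Stand-in row counts for illustration only.
fake_counts = {"drug_exposure": 120, "concept": 5000, "concept_ancestor": 0}
empty = find_empty_tables(fake_counts.get)
if empty:
    print("Populate these tables before running the notebook:", empty)
```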

The key configuration parameter for this notebook is omop_database_name. This parameter identifies the OMOP database that contains the data for generating the drug_era table. Update this value only if your OMOP database differs from the default database configured in the healthcare#_msft_config_notebook global configuration notebook.

If the OMOP drug_exposure table is populated with valid data, this notebook invokes the DrugEraGenerator module, which strings together the periods of time during which a person is exposed to an active drug ingredient, allowing for a gap of up to 30 days between exposures. The DrugEraGenerator module deletes all existing drug_era records and generates new records based on the latest OMOP data.
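The era-building idea can be illustrated with a small, self-contained sketch in plain Python. This isn't the DrugEraGenerator implementation or the OHDSI script. It assumes each exposure is already mapped to its ingredient concept (the real process relies on the concept and concept_ancestor tables for that), and it treats a new exposure that starts within 30 days of the current era's end as part of the same era. The build_drug_eras function and the sample concept ID are hypothetical; the output keys follow the OMOP drug_era column names.

```python
from datetime import date, timedelta
from itertools import groupby

GAP_DAYS = 30  # allowed gap between exposures within one era


def build_drug_eras(exposures, gap_days=GAP_DAYS):
    """Merge exposures into eras per person and ingredient.

    exposures: iterable of (person_id, ingredient_concept_id, start, end)
    tuples, where start and end are datetime.date values.
    """
    ordered = sorted(exposures, key=lambda e: (e[0], e[1], e[2]))
    eras = []
    for (person_id, concept_id), group in groupby(ordered, key=lambda e: (e[0], e[1])):
        current = None
        for _, _, start, end in group:
            within_gap = (
                current is not None
                and start <= current["drug_era_end_date"] + timedelta(days=gap_days)
            )
            if within_gap:
                # Extend the running era; keep the later end date.
                current["drug_era_end_date"] = max(current["drug_era_end_date"], end)
                current["drug_exposure_count"] += 1
            else:
                # Gap too large (or first exposure): start a new era.
                current = {
                    "person_id": person_id,
                    "drug_concept_id": concept_id,
                    "drug_era_start_date": start,
                    "drug_era_end_date": end,
                    "drug_exposure_count": 1,
                }
                eras.append(current)
    return eras


sample = [
    (1, 100, date(2023, 1, 1), date(2023, 1, 10)),
    (1, 100, date(2023, 2, 5), date(2023, 2, 15)),  # 26 days after previous end
    (1, 100, date(2023, 6, 1), date(2023, 6, 10)),  # more than 30 days later
]
for era in build_drug_eras(sample):
    print(era)
```

In this sample, the first two exposures fall into one era and the third starts a new one, because it begins more than 30 days after the previous era ends.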

To learn more about the notebook execution, go to Use the OMOP analytics sample notebooks.

Configure the drug exposure insights sample notebook

The healthcare#_msft_omop_sample_drug_exposure_insights sample notebook demonstrates an exploratory analysis of the drug_era table using PySpark in an Azure Synapse Analytics notebook. The analysis generates a histogram of patients' secondary drug exposures to active ingredients, stratified by gender and age for a specific year. The drug_era table is generated by the custom DrugEraGenerator library, which the previous notebook, healthcare#_msft_omop_sample_drug_exposure_era, invokes. This analysis extends the drug exposure query DEX03 (Distribution of age, stratified by drug) by stratifying on both gender and age.

Before executing the notebook, keep the following prerequisites in mind:

If you wish to edit the notebook configuration, ensure you make a copy of this notebook. Don't update the notebook directly.
Ensure the drug_era table contains data by executing the drug exposure era notebook first. Running the drug exposure era notebook replaces any existing drug_era records with new records based on the latest OMOP data.
Use this notebook as-is for exploratory analysis and create a copy to perform custom analysis.

The following are the key notebook configuration parameters. You can modify them to run alternative exploratory analyses on patient drug exposures:

primary_drug_concept_id: The primary active ingredient exposure for patients.
secondary_drug_concept_id: The secondary active ingredient exposure for patients.
year: The target year during which patients were actively exposed to both the primary and secondary drugs.
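To show how these three parameters interact, here's a minimal plain-Python sketch of the stratification. It isn't the notebook's PySpark code. It assumes a patient qualifies when they have at least one drug era for the primary ingredient and one for the secondary ingredient that each overlap the target year, and it approximates age as the target year minus the year of birth. The exposure_histogram function, the 10-year age buckets, and the sample concept IDs are hypothetical.

```python
from collections import Counter
from datetime import date


def overlaps_year(start, end, year):
    """True if the [start, end] period touches any day of the given year."""
    return start <= date(year, 12, 31) and end >= date(year, 1, 1)


def exposure_histogram(drug_eras, persons, primary_drug_concept_id,
                       secondary_drug_concept_id, year, bucket_size=10):
    """Count patients exposed to both drugs in `year`, by gender and age bucket.

    drug_eras: iterable of dicts with person_id, drug_concept_id,
               drug_era_start_date, and drug_era_end_date.
    persons:   dict mapping person_id to (gender_label, year_of_birth).
    """
    exposed = {primary_drug_concept_id: set(), secondary_drug_concept_id: set()}
    for era in drug_eras:
        concept_id = era["drug_concept_id"]
        if concept_id in exposed and overlaps_year(
            era["drug_era_start_date"], era["drug_era_end_date"], year
        ):
            exposed[concept_id].add(era["person_id"])

    both = exposed[primary_drug_concept_id] & exposed[secondary_drug_concept_id]
    histogram = Counter()
    for person_id in both:
        gender, year_of_birth = persons[person_id]
        age = year - year_of_birth  # approximation: ignores birth month and day
        low = (age // bucket_size) * bucket_size
        histogram[(gender, f"{low}-{low + bucket_size - 1}")] += 1
    return histogram
```

Changing year or either concept ID changes which patients fall into the intersection, which is why those three values are the main levers for alternative analyses.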