IyanuloluwaEmmanuel-9324 asked · SamaraSoucy-MSFT answered

How to stream data from on-premises Oracle in real time to Azure Synapse?

I am trying to deploy a real-time ETL solution on Microsoft Azure, but I am running into challenges.

My data source is an Oracle database, with a self-hosted Integration Runtime installed on the server that hosts the database.

I want to pick up data from the Oracle database as it comes in (in real time) and send it to Google BigQuery and the Azure managed Elasticsearch service, using a data lake as a staging area.

What I have done so far:

- I have the Integration Runtime installed on the server with the Oracle database, and a Copy activity in my ADF pipeline that picks up yesterday's data from the Oracle database (configured in my select script).

- Then a Python script picks up the data from the data lake, transforms it, and sends it to Google BigQuery and Elasticsearch. The script runs on Azure Databricks, and I run the Databricks notebook from an Azure Data Factory pipeline.

So my ADF pipeline has one activity that copies data from Oracle to the data lake, and another activity that runs the Databricks notebook.
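For reference, the "yesterday's data" filter in my select script is along these lines (table and column names are illustrative, assuming a LOAD_DATE timestamp column):

```sql
-- Illustrative only: daily window of the kind described above
SELECT *
FROM   my_source_table
WHERE  load_date >= TRUNC(SYSDATE) - 1
AND    load_date <  TRUNC(SYSDATE)
```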

But I am having issues configuring streaming from the Oracle database. How do I tell ADF to trigger the pipeline in real time when there is new data in the Oracle database? How do I go about it?

azure-synapse-analytics


1 Answer

SamaraSoucy-MSFT answered

The short answer is that Synapse Pipelines/Data Factory isn't really meant for real-time data processing, but there are a couple of options. The simplest would be to change your daily data load into one that triggers every few minutes and checks for new data. If your delay tolerance is a minute or more, this would probably be fine and would be simpler to implement. Option two is to use an Event Grid trigger and configure a change-feed solution of some form on your Oracle database that sends a message to Event Grid when there is an update that needs to be picked up.

Depending on what your pipeline does, you may also want to consider Logic Apps or possibly Functions instead of Pipelines. Synapse Pipelines/ADF are built for periodically processing large amounts of data; your daily pipeline is exactly that. When it comes to constantly moving small amounts of data, Logic Apps is often a better fit, and it does offer an on-premises connector. If it has all the capabilities you need, it may be worth putting your scenario through the pricing calculator to see which would be more cost-effective for you. From a trigger standpoint you have the same two strategies, though the Logic Apps timer trigger can fire multiple times a minute, so that option can get closer to real time.
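As a point of comparison, a Logic Apps Recurrence trigger can be defined with a sub-minute interval. A sketch in the workflow-definition syntax (the 30-second interval is just an example value):

```json
"triggers": {
    "Recurrence": {
        "type": "Recurrence",
        "recurrence": {
            "frequency": "Second",
            "interval": 30
        }
    }
}
```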

