Coding alternative to Azure Data Factory

Raja 81 Reputation points
2021-09-14T06:08:14.177+00:00

Hi Team,
I have been using Azure Data Factory for various ETL activities: copying from various data sources, transforming the data, and loading it into other destinations. Sometimes the pipeline I'm creating becomes very complex (it's not a simple transformation). For example, I need to connect to an external server (using REST), get data, run many steps, and finally write to different files. This requires a good amount of logic, which makes the Data Factory pipeline look very complex and difficult to read. Is there a better alternative where I can code all of this instead of using the predefined Azure Data Factory activities?

I can write a Python/Java program, so instead of using those predefined boxes I could write my own custom code. Maybe Synapse, Databricks, or something else could be used. Which is the better alternative (at a similar cost)?

Thanks,
Raja

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. MartinJaffer-MSFT 26,031 Reputation points
    2021-09-14T19:12:03.553+00:00

    Hello @Raja and welcome to Microsoft Q&A.

    In addition to @Kuldeep Chitrakar's suggestion of Logic Apps and Azure Automate, there is also Azure Functions, which is purely code.

    In Azure Data Factory there is the Custom Activity, where Azure Batch is leveraged to run custom code you provide.

    For big data, Azure Databricks, Azure Synapse Analytics, and HDInsight are better suited. HDInsight is heavyweight and always-on, so probably the most expensive of the three.
    Databricks runs on Spark. There are notebooks where you can write in PySpark, Scala, and SQL.
    Azure Synapse is the most diverse. Azure Synapse is the confluence of Data Factory, Spark (like Databricks), SQL (both on-demand, and dedicated), and much more than I can reliably remember.
    I think both Databricks and Synapse also support Java. I know they accept .jar files.

    Does this help?


2 additional answers

  1. Kuldeep Chitrakar 1 Reputation point
    2021-09-14T11:12:07.18+00:00

    You could consider orchestrating the pipelines using Logic Apps or Azure Automate.


  2. Raja 81 Reputation points
    2021-09-20T09:27:37.98+00:00

    Hi @Kuldeep Chitrakar ,
    Thanks for the prompt response. As far as I understand, Data Factory is good when you want a simple ETL job and don't want to code, but would rather use a simple drag-and-drop UI.

    Out of all the options you have provided, it seems that using Azure Functions is a good choice, as I don't need to run a cluster.
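
For readers wondering what the "purely code" route might look like for the flow described in the question (call a REST endpoint, transform the data, write several files), here is a minimal sketch in plain Python using only the standard library. Everything specific in it is an assumption for illustration: the endpoint URL, the `id` and `name` fields, the output file names, and the function names are all hypothetical. It is not tied to any particular Azure service; logic like this could be called from a function handler or any other code-based host, with the hosting-specific wiring left out.

```python
import csv
import json
import urllib.request
from pathlib import Path


def fetch_records(url, opener=urllib.request.urlopen):
    """Fetch a JSON array of records from a REST endpoint.

    `opener` is injectable so the network call can be replaced in tests.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def transform(records):
    """Hypothetical transformation: drop rows without an id, tidy the name."""
    cleaned = []
    for row in records:
        if row.get("id") is None:
            continue  # skip incomplete records
        name = str(row.get("name", "")).strip().title()
        cleaned.append({"id": row["id"], "name": name})
    return cleaned


def write_outputs(rows, out_dir):
    """Write the same rows to a CSV file and a JSON file; return both paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / "records.csv"
    json_path = out_dir / "records.json"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)
    json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return csv_path, json_path


def run_pipeline(url, out_dir, opener=urllib.request.urlopen):
    """Chain the three steps: fetch, transform, write."""
    return write_outputs(transform(fetch_records(url, opener)), out_dir)
```

Keeping each step as a small function means the multi-step logic reads top to bottom and each step can be tested on its own, which is the readability concern raised in the question.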
