How do I create a dynamic SQL query for Cosmos DB using Azure Data Factory?

Andrew Hopkinson 6 Reputation points
2024-04-30T13:18:23.7733333+00:00

I am doing a simple data flow where I have a source from a cosmos db container, I want to get the latest data that is greater than a _ts value that I store in a variable (SELECT * FROM c WHERE c_ts > @variables('LastRunDateTS')). How can I pass this to query in the source for the cosmos query? I see there are parameters, but I don't want to hard code the values, I would like these to come from a variable and it gives errors when I try and reference a variable inside the query. Please help, thanks.

Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,463 questions
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,684 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Andrew Hopkinson 6 Reputation points
    2024-05-02T15:54:53.0133333+00:00

    This is what I had to do get it to work, the suggestions provided in this post do not work for me,First I created a variable the LastRunDateString

    Then I created a parameter "parameter2" for the "Data Flow" in this value I put the following pipeline express:

    @concat('SELECT * FROM c WHERE c._ts > ',variables('LastRunTimeString'))
    

    User's image

    This is a little wierd to get to work because there is also a parameter you need to add to your actual "dataflow" task (which is confusing). Then in the expression builder for the CosmosDB source SQL stateme, I put $parameter2 and it seems to work.

    User's image


  2. phemanth 6,475 Reputation points Microsoft Vendor
    2024-05-02T16:38:13.01+00:00

    @Andrew Hopkinson

    Welcome to Microsoft Q&A platform and thanks for posting your question.I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others "I'll repost your solution in case you'd like to accept the answer.

    Ask: I am doing a simple data flow where I have a source from a cosmos db container, I want to get the latest data that is greater than a _ts value that I store in a variable (SELECT * FROM c WHERE c_ts > @variables('LastRunDateTS')). How can I pass this to query in the source for the cosmos query? I see there are parameters, but I don't want to hard code the values, I would like these to come from a variable and it gives errors when I try and reference a variable inside the query. Please help, thanks.

    Solution: First I created a variable the LastRunDateString

    Then I created a parameter "parameter2" for the "Data Flow" in this value I put the following pipeline express:

    @concat('SELECT * FROM c WHERE c._ts > ',variables('LastRunTimeString'))
    

    User's image

    This is a little wierd to get to work because there is also a parameter you need to add to your actual "dataflow" task (which is confusing). Then in the expression builder for the CosmosDB source SQL stateme, I put $parameter2 and it seems to work.

    User's image

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.