Using cached data from one data flow activity as the source for another data flow activity

David Lang 20 Reputation points
2024-05-15T14:57:15.3366667+00:00

On this reference page...

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink

...MS says, "Write to activity output: The cached sink can optionally write your output data to the input of the next pipeline activity. This will allow you to quickly and easily pass data out of your data flow activity without needing to persist the data in a data store."

This is exactly what I want to do, but I cannot figure out how to reference the cached data set from the first data flow in the second data flow.

I have been told to add a source transformation in the second data flow and select "Cached" as the source type (see the Q&A Assist answer), but when I do this there is no "Cached" option in the UI:

User's image

I have also been told to use a Lookup activity, but when I add this to the second data flow and then reference the Cached lookup section in the expression builder, I get no options...

User's image

Can someone give me very specific steps on how this is done?

Azure Data Factory

1 answer

  1. Q&A Assist
    2024-05-15T14:57:15.3566667+00:00
    AI generated content. This question contains an answer created with AI from Q&A Assist.

    To reference cached data from one data flow activity as the source for another data flow activity, you can follow these steps:

    1. In the first data flow activity, add a sink transformation and select "Cache" as the sink type.
    2. In the second data flow activity, add a source transformation and select "Cached" as the source type.
    3. In the "Cached" source transformation, select the first data flow activity as the source data flow.
    4. Map the columns from the cached data to the input of the second data flow activity.

    This will allow you to use the output of the first data flow activity as the input for the second data flow activity without persisting the data in a data store.
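
One way to picture the "Write to activity output" option quoted in the question: as far as I understand the Data Flow activity documentation, the cached sink's rows surface in the pipeline under the activity's output, through an expression along the lines of `@activity('<data flow activity>').output.runStatus.output.<sinkName>.value`, and can then be handed to the next activity, for example as a data flow parameter. The Python sketch below only mimics that lookup against a made-up payload. The sink name `sink1`, the column names, and the helper `cached_sink_rows` are hypothetical, and the payload shape is an assumption that should be checked against the output of a real run.

```python
from typing import Any, Dict, List

# Hypothetical activity output, shaped the way a Data Flow activity is
# ASSUMED to report a cache sink with "Write to activity output" enabled.
activity_output: Dict[str, Any] = {
    "runStatus": {
        "output": {
            "sink1": {  # hypothetical cache sink name
                "value": [
                    {"CustomerId": 1, "Region": "West"},
                    {"CustomerId": 2, "Region": "East"},
                ]
            }
        }
    }
}


def cached_sink_rows(output: Dict[str, Any], sink_name: str) -> List[Dict[str, Any]]:
    """Mimic the pipeline expression
    @activity('<name>').output.runStatus.output.<sink_name>.value
    by walking the same path through a plain dictionary.
    """
    try:
        return output["runStatus"]["output"][sink_name]["value"]
    except KeyError as exc:
        # Raised when the sink name is wrong or the sink did not write
        # to the activity output at all.
        raise KeyError(f"No cached output found for sink {sink_name!r}") from exc


rows = cached_sink_rows(activity_output, "sink1")
print(rows[0]["Region"])
```

If the real output follows this shape, the second data flow would receive the rows through a pipeline expression rather than through a source transformation, which may explain why no "Cached" source type appears in the UI.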
