question

ewinkiser asked HimanshuSinha-MSFT commented

What is the best way to Divide up a Table for access to ADF due to Look Up Limitation

Hello!
@nasreen-akter @MartinJaffer-MSFT @KranthiPakala-MSFT

I have a DB table of the size shown below. We must load ALL employees, but my Lookup activity will only take about 3,000 rows (the table has a huge number of columns). What is the best way to break this table of 8,452 rows into batches of 3,000 rows so I can run my pipelines?
Thanks!
Mike
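For reference, the numbers above work out to three batches. A quick sketch (the row count and batch size are taken from the question; the split is just the ceiling division):

```python
import math

# Numbers from the question: 8,452 total rows,
# Lookup limited to roughly 3,000 rows per call.
total_rows, batch_size = 8452, 3000
n_batches = math.ceil(total_rows / batch_size)  # = 3 (3000 + 3000 + 2452)
```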

102598-image.png


azure-data-factory

What is your objective? You seem to be doing an operation that would likely work better inside a data flow, not a pipeline lookup.


Thanks!

The objective is to get the DB Table reduced to where the Lookup Activity doesn't fail due to its limitations.
Mike


I meant, what is the goal of your pipeline? If you are looking up thousands of wide rows, you should look at performing that operation inside a data flow, which uses Spark compute and does not have those limitations. The Lookup pipeline activity is not intended to work on large datasets in that way.


1 Answer

HimanshuSinha-MSFT answered HimanshuSinha-MSFT commented

Hello @ewinkiser,
Thanks for the ask and for using the Microsoft Q&A platform.
As @MarkKromer-MSFT suggested, you can use an ADF mapping data flow, but if you want to use an ADF pipeline, you can try the watermark logic:

  1. Create a watermark table at the source. Give it a field such as an employeeid or a date (depending on which field you want to track).

  2. Read the value from the watermark table using a Lookup activity.

  3. In your pipeline, add a loop.

  4. Inside the loop, add a Lookup activity with a dynamic SELECT statement, something along the lines of: SELECT TOP 5000 <fieldnames> FROM <table> WHERE <watermarkfield> > LookupActivityInStep2.value

  5. Your Lookup will now return up to 5,000 records, and you can perform your logic on them here. If you intend to use a nested loop (which is not supported), use an Execute Pipeline activity and implement the ForEach in the child pipeline.

  6. Get the max value of the watermark field from the batch and update the watermark table.

  7. On the next iteration, the dynamic expression will pick up only the rows greater than the value stored in the watermark table.
    Hope this helps
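The steps above can be sketched in Python-like form. All table and field names here (employeeid, a starting watermark of 0, a batch size of 5,000) are hypothetical placeholders; in ADF each step maps to a pipeline activity (Lookup, loop, and a stored procedure or copy to update the watermark), not to running Python:

```python
# Sketch of the watermark batching logic described in steps 1-7 above.
BATCH_SIZE = 5000

def fetch_batch(rows, watermark, batch_size=BATCH_SIZE):
    """Step 4: SELECT TOP batch_size ... WHERE employeeid > watermark."""
    selected = sorted(
        (r for r in rows if r["employeeid"] > watermark),
        key=lambda r: r["employeeid"],
    )
    return selected[:batch_size]

def process_all(rows):
    watermark = 0                 # Step 2: read the initial watermark value
    batches = []
    while True:                   # Step 3: loop until no rows remain
        batch = fetch_batch(rows, watermark)   # Steps 4-5: fetch one batch
        if not batch:
            break
        batches.append(batch)     # Step 5: perform your logic on the batch
        # Step 6: update the watermark to the max value seen in this batch
        watermark = max(r["employeeid"] for r in batch)
    return batches                # Step 7: each pass picks only newer rows

# With 8,452 rows and batches of 5,000, this yields 2 batches (5000 + 3452).
rows = [{"employeeid": i} for i in range(1, 8453)]
batches = process_all(rows)
```

The key design point is step 6: because the watermark advances to the max id of each batch, every iteration's WHERE clause naturally excludes already-processed rows, so no row is read twice.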

Please do let me know how it goes .
Thanks
Himanshu
Please do consider clicking on "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members


Just to be clear ... this is a "pipeline" approach vs. a "data flow" approach. BOTH are ADF features.


Thanks @MarkKromer-2402 for correcting me on that. I have updated the initial post.

Thanks
Himanshu


Hello @ewinkiser,
We haven't heard from you on the last response and were just checking back to see if you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, please respond with more details and we will try to help.
Thanks
Himanshu
