MaryiaT-3017 asked SaurabhSharma-msft commented

DataFactory: Activities stuck in the queued state

We have a number of Data Factory pipelines that use a self-hosted integration runtime (IR) to look up data from Azure SQL/SQL Server and to copy data from various sources based on the lookup responses. Normally, a lookup takes 1-5 seconds (a small SQL query on a table of <1000 records), and a copy takes anywhere from minutes to a couple of hours. This setup has been operating for over a year now with the same number of jobs and the same volume of data.

Recently, the following issue has been arising on an almost daily basis: all triggered activities are marked as 'Queued', whereas there are 0 activities in progress and IR CPU utilization is 0%:

131595-image.png

Cancelling the queued pipelines doesn't help; the queue length usually stays the same. This state can last for days (at least the 48 hours of a weekend), basically blocking our normal operations. Restarting the IR helps: afterwards, some of the queued activities move to "in progress" and everything starts operating as normal.

Obviously, restarting the IR manually multiple times a day is not an option. We would also like to understand why this is happening and how to prevent this kind of "deadlock" from happening in the future. Thanks in advance.
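Until the root cause is known, the symptom itself (activities queued for a long time while nothing is in progress) could at least be detected automatically. Below is a minimal Python sketch of such a check. It is only an illustration under stated assumptions: the activity runs are assumed to have already been fetched from Data Factory monitoring and converted to plain dicts, the keys `status` and `queued_at` are hypothetical names, the status strings `"Queued"` and `"InProgress"` are assumed to match what monitoring reports, and the 15-minute threshold is arbitrary.

```python
from datetime import datetime, timedelta

# Hypothetical record shape (an assumption, not an official schema):
#   {"activity_run_id": str, "status": str, "queued_at": datetime}

DEFAULT_MAX_QUEUE_TIME = timedelta(minutes=15)  # arbitrary threshold


def find_stuck_queued(activity_runs, now, max_queue_time=DEFAULT_MAX_QUEUE_TIME):
    """Return the runs that are 'Queued' and have waited longer than max_queue_time."""
    stuck = []
    for run in activity_runs:
        if run["status"] != "Queued":
            continue
        waited = now - run["queued_at"]
        if waited > max_queue_time:
            stuck.append(run)
    return stuck


def looks_deadlocked(activity_runs, now, max_queue_time=DEFAULT_MAX_QUEUE_TIME):
    """Heuristic for the state described above: nothing is in progress,
    yet at least one activity has been queued for too long."""
    any_in_progress = any(run["status"] == "InProgress" for run in activity_runs)
    stuck = find_stuck_queued(activity_runs, now, max_queue_time)
    return (not any_in_progress) and len(stuck) > 0
```

A scheduled job could call `looks_deadlocked` with the current time and raise an alert when it returns `True`. This only detects the condition; it does not explain or fix it.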

azure-data-factory

Hi @maryiatsiaseika-3017,

Thanks for using Microsoft Q&A!!
It looks like the self-hosted IR might be running at its full capacity and can't accept new jobs. Can you please try scaling your IR as described in the documentation (Scale considerations)?
Also, could you please share your PipelineRunId and ActivityId so that I can check the backend logs for any other clues?

Thanks
Saurabh


Hi Saurabh, thanks for your prompt reply.

I would agree with you if CPU utilization were really high (but it's 0%) or if the number of concurrently running jobs were equal to the limit (but again, it's 0 according to the IR monitor). Sometimes it can be 1 or 2, but that's still far from the limit, while the queue just keeps growing.

Here is a quick example of an activity being stuck in the queue for a long time: PipelineRunId == '4cc0e772-e0fd-4deb-a1a6-b5f30a612a52', ActivityRunId == '94157e45-251e-408f-a69b-28a8a13e0d4a'. I can provide more if required.

Thanks a lot!
Maria


Hi @maryiat-3017,

Thanks for sharing, and yes, you are right: I checked the backend logs and it doesn't look like a capacity issue. I think you may need to create a support ticket so that an engineer can look into the environment setup and the SHIR logs to troubleshoot this properly. If you have any limitations on opening a support ticket, please let me know and I will help by providing a one-time free support ticket.

Thanks
Saurabh


0 Answers