Pipeline does not run new data

Wozniak, Joanna 1 Reputation point

Hi -
I created and published a pipeline that pulls data from an Azure SQL table, processes and models it, and then appends the output to an Azure SQL table. The Azure SQL table is updated with new data every day or two. In my script, I want to model only the data that was added two days before today, using the following code:

from datetime import date, timedelta

# Date two days before today; str(date) is already in YYYY-MM-DD form,
# so no strftime call is needed.
target_date = date.today() - timedelta(days=2)
print(target_date)

# Keep only the rows from two days ago.
# `data` (a DataFrame) and `run` are defined earlier in the script.
data_prior = data[data['MatterOpenDate'] == str(target_date)]
print(data_prior.head())

# Stop the run early when there is nothing to model.
if data_prior.empty:
    print('Empty dataset')
    run.complete()
    exit()
else:
    print('Continue Process')
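
For readers hitting the same filter step, here is a minimal sketch of the date filter written as a function. The names `rows_opened_on`, `days_back`, and `today` are hypothetical, and it assumes `MatterOpenDate` holds dates or timestamps in a single consistent format that pandas can parse. It does not explain why a scheduled run might repeat earlier results; it only shows the target date being computed when the function is called and compared against the column after dropping any time-of-day part.

```python
from datetime import date, timedelta
from typing import Optional

import pandas as pd


def rows_opened_on(data: pd.DataFrame, days_back: int = 2,
                   today: Optional[date] = None) -> pd.DataFrame:
    """Return rows whose MatterOpenDate is exactly `days_back` days before `today`."""
    # Resolve "today" at call time so every run works from a fresh date.
    # (`today` is a hypothetical parameter, added so the function can be tested.)
    today = today or date.today()
    target = today - timedelta(days=days_back)
    # Parse the column and strip the time of day, so timestamps such as
    # 2021-10-02 14:30 still match the calendar date 2021-10-02.
    opened = pd.to_datetime(data["MatterOpenDate"]).dt.normalize()
    return data[opened == pd.Timestamp(target)]


# Made-up sample rows, used only to exercise the function.
sample = pd.DataFrame({
    "MatterOpenDate": pd.to_datetime(
        ["2021-10-01 09:00", "2021-10-02 14:30", "2021-10-03 08:15"]
    ),
    "MatterId": [1, 2, 3],
})
result = rows_opened_on(sample, days_back=2, today=date(2021, 10, 4))
print(result)
```

Passing `today` explicitly is only for testing; in a scheduled script the default lets `date.today()` be evaluated on each run.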

When I first ran my pipeline it worked great. I then published the experiment and created a recurring schedule to run it once a day.

However, each scheduled run processes exactly the same data as the original run, even though new data is being uploaded. Why is this happening, and what do I need to change so that the script runs 'naturally' as written?

Thank you

MartinJaffer-MSFT 26,031 Reputation points

2021-10-04T21:14:32.933+00:00

Hello @Wozniak, Joanna and welcome to Microsoft Q&A.

To better assist you, could you clarify which service or compute you are using?
The question is currently tagged with Data Factory and Machine Learning. Is your pipeline a Machine Learning pipeline or a Data Factory pipeline?
Is this code being run in a Synapse notebook, a Databricks notebook, Batch compute, a Machine Learning service, or something else?

MartinJaffer-MSFT 26,031 Reputation points

2021-10-08T22:58:20.32+00:00

@Wozniak, Joanna I have not heard back from you. Are you still facing the issue?

1 answer

Wozniak, Joanna 1 Reputation point

2021-10-09T14:54:22.36+00:00

I solved it, thank you.