question

AdrianAnticoTEKsystemsInc-1526 asked GiftA-MSFT answered

Designer Python script to save Dataset data as CSV to compute instance directories

I have a set of custom Python and R scripts that execute a machine learning pipeline for analysis purposes. The steps are as follows:

  1. With Python, convert Datastore data to CSV and save it in the compute instance directory

  2. With R, pull in the CSV data, run some feature engineering, build a machine learning model, and save the scoring data to the directory as CSV

  3. Push the CSV data back to the Datastore
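For step 3, a minimal sketch of pushing a local CSV back to the workspace's default datastore. The helper name `push_csv_to_datastore` and the target folder `ml-output` are hypothetical placeholders, not part of the original pipeline:

```python
def push_csv_to_datastore(csv_path, target_path='ml-output'):
    """Upload a local CSV from the compute instance back to the
    workspace's default datastore (hypothetical helper)."""
    # Imported inside the function so the sketch can be read
    # without the Azure ML SDK installed.
    from azureml.core import Run

    # Reuse the authenticated context of the running pipeline
    run = Run.get_context(allow_offline=True)
    datastore = run.experiment.workspace.get_default_datastore()
    datastore.upload_files(files=[csv_path],
                           target_path=target_path,
                           overwrite=True)
```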

The first issue I'm encountering in Designer is step 1: saving Datastore data as a CSV to the compute instance directory. I created a function called azureml_main() that internally pulls in the Datastore data and saves it as a CSV to the directory. The code inside the function runs fine on its own, but when I run it in the Python script node in Designer it fails.

Error message:

AmlExceptionMessage:User program failed with FailedToEvaluateScriptError: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Got exception when invoking script at line 22 in function azureml_main: 'AuthenticationException: Unknown error occurred during authentication. Error detail: Unexpected polling state code_expired'.
---------- End of error message from Python interpreter ----------

ModuleExceptionMessage:FailedToEvaluateScript: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Got exception when invoking script at line 22 in function azureml_main: 'AuthenticationException: Unknown error occurred during authentication. Error detail: Unexpected polling state code_expired'.
---------- End of error message from Python interpreter ----------


# Python script inside the Python node in Designer.
# The script MUST contain a function named azureml_main,
# which is the entry point for this module.

import pandas as pd

# The entry point function MUST have two input arguments.
# If the input port is not connected, the corresponding
# dataframe argument will be None.
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame

def azureml_main(dataframe1=None, dataframe2=None):
    # Azure management
    from azureml.core import Workspace, Dataset

    # MetaData
    subscription_id = '09b5fdb3-165d-4e2b-8ca0-34f998d176d5'
    resource_group = 'xCloudData'
    workspace_name = 'xCloudML'

    # Create workspace 
    workspace = Workspace(subscription_id, resource_group, workspace_name)

    # 1. Retention_Engagement_CombinedData
    dataset = Dataset.get_by_name(workspace, name='retention-engagement-combineddata')

    # Save data to file
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/RetentionEngagement_CombinedData.csv')

    # 2. TitleNameJoin
    dataset = Dataset.get_by_name(workspace, name='TitleForJoiningInR')

    # Save data to file
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/TitleNameJoin.csv')

    # Designer invokes azureml_main itself and expects a pandas DataFrame back
    return df,




azure-machine-learning


1 Answer

GiftA-MSFT answered

Hi, thanks for reaching out. There's no need to re-authenticate inside the Execute Python Script module; instead, include the following:

     from azureml.core import Run
     run = Run.get_context(allow_offline=True)
     # access the current workspace
     ws = run.experiment.workspace
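Putting this together with the script from the question, the corrected entry point might look like the sketch below. The dataset names and output paths are taken from the question; replacing the `Workspace(...)` constructor with the run context avoids the interactive authentication that produced the `code_expired` error:

```python
def azureml_main(dataframe1=None, dataframe2=None):
    # Imported inside the function so the sketch can be read
    # without the Azure ML SDK installed.
    from azureml.core import Run, Dataset

    # Reuse the authenticated context of the running pipeline instead
    # of constructing a Workspace, which triggers interactive auth.
    run = Run.get_context(allow_offline=True)
    workspace = run.experiment.workspace

    # 1. Retention_Engagement_CombinedData
    dataset = Dataset.get_by_name(workspace, name='retention-engagement-combineddata')
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/RetentionEngagement_CombinedData.csv')

    # 2. TitleNameJoin
    dataset = Dataset.get_by_name(workspace, name='TitleForJoiningInR')
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/TitleNameJoin.csv')

    # Designer expects the entry point to return a pandas DataFrame
    return df,
```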

Hope this helps!


