I have a set of custom Python and R scripts to execute a machine learning pipeline for analysis purposes. The steps are as follows:
1. With Python, convert Datastore data to CSV and save it in the compute instance directory.
2. With R, pull in the CSV data, run some feature engineering, build a machine learning model, and save the scoring data to the directory as CSV.
3. Push the CSV data back to the Datastore.
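For reference, the last step (pushing the CSV files back) could be sketched roughly as follows. This is only a sketch under assumptions: the helper names list_csv_files and push_csvs_to_datastore are hypothetical, the workspace's default datastore is assumed to be the target, and scoring/ is a placeholder target path.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the sorted paths (as strings) of all .csv files directly inside directory."""
    return sorted(str(p) for p in Path(directory).glob("*.csv"))


def push_csvs_to_datastore(workspace, directory, target_path="scoring/"):
    """Upload every CSV in directory to the workspace's default datastore.

    Assumptions: the default datastore is the intended target and
    target_path is a placeholder folder name.
    """
    datastore = workspace.get_default_datastore()
    files = list_csv_files(directory)
    datastore.upload_files(files=files, target_path=target_path, overwrite=True)
    return files
```

The workspace object is passed in rather than created inside the function, so the same helper works regardless of how the workspace was obtained.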
The first issue I'm encountering in Designer is with the first step: saving Datastore data as CSV to the compute instance directory. I created a function called azureml_main() that pulls in the Datastore data and saves it as CSV to the directory. I have run the code inside the function many times, but when I run it in the Python script node in Designer it fails.
Error message:
AmlExceptionMessage:User program failed with FailedToEvaluateScriptError: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Got exception when invoking script at line 22 in function azureml_main: 'AuthenticationException: Unknown error occurred during authentication. Error detail: Unexpected polling state code_expired'.
---------- End of error message from Python interpreter ----------
ModuleExceptionMessage:FailedToEvaluateScript: The following error occurred during script evaluation, please view the output log for more information:
---------- Start of error message from Python interpreter ----------
Got exception when invoking script at line 22 in function azureml_main: 'AuthenticationException: Unknown error occurred during authentication. Error detail: Unexpected polling state code_expired'.
---------- End of error message from Python interpreter ----------
# Python script inside Python node in Designer.
# The script MUST contain a function named azureml_main
# which is the entry point for this module.
import pandas as pd

# The entry point function MUST have two input arguments.
# If the input port is not connected, the corresponding
# dataframe argument will be None.
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    # Azure management
    from azureml.core import Workspace, Dataset

    # MetaData
    subscription_id = '09b5fdb3-165d-4e2b-8ca0-34f998d176d5'
    resource_group = 'xCloudData'
    workspace_name = 'xCloudML'

    # Create workspace
    workspace = Workspace(subscription_id, resource_group, workspace_name)

    # 1. Retention_Engagement_CombinedData
    dataset = Dataset.get_by_name(workspace, name='retention-engagement-combineddata')
    # Save data to file
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/RetentionEngagement_CombinedData.csv')

    # 2. TitleNameJoin
    dataset = Dataset.get_by_name(workspace, name='TitleForJoiningInR')
    # Save data to file
    df = dataset.to_pandas_dataframe()
    df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/TitleNameJoin.csv')

azureml_main()
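A possible direction, offered as an assumption rather than a confirmed diagnosis: the code_expired polling state suggests that Workspace(...) is attempting an interactive login, which nobody can complete on the remote compute that runs the Designer node. A minimal sketch that instead takes the workspace from the current run context is shown below. The helper name workspace_from_run is hypothetical, only the first dataset is shown, and the sketch returns the dataframe through the node's output port rather than writing to a local path.

```python
def workspace_from_run(run):
    """Return the workspace attached to a submitted run, or None if there is none.

    Hypothetical helper: it only relies on the run exposing
    run.experiment.workspace, which submitted Azure ML runs do.
    """
    experiment = getattr(run, "experiment", None)
    return getattr(experiment, "workspace", None)


def azureml_main(dataframe1=None, dataframe2=None):
    # Imported lazily so the module can be loaded where azureml is not installed.
    from azureml.core import Dataset, Run

    # Reuse the credentials of the running job instead of logging in interactively.
    workspace = workspace_from_run(Run.get_context())
    if workspace is None:
        raise RuntimeError("Not running inside a submitted Azure ML run")

    dataset = Dataset.get_by_name(workspace, name="retention-engagement-combineddata")
    df = dataset.to_pandas_dataframe()

    # Designer expects a sequence of dataframes as the return value.
    return df,
```

Designer calls azureml_main itself, so the sketch has no trailing call to the function.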