I posted a similar question last week but haven't received a response yet, so I'm posting another one now.
The code below is what I use to pull data from the Datastore into the compute instance and save it to my directory as a CSV. The data originates from a SCOPE script and is transferred from Cosmos to the Datastore via Azure Data Factory.
Once the CSV is in the directory, I use R to load it into an RStudio session and run various tasks that create new data sets, which I also save to the compute instance directory as CSVs. These new data sets are the ones I'd like to push back to the Datastore so they can be transferred elsewhere via Azure Data Factory and later consumed by a Power BI app we plan to build.
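A minimal sketch of one possible way to do the reverse of the download code, assuming the azureml-core (v1) SDK used in this post and an Azure Blob datastore, which provides `upload_files`. The function names `upload_csvs_to_datastore` and `push_r_outputs`, the datastore name `my_datastore`, and the `r_outputs` folders are hypothetical placeholders. The datastore is passed in as a parameter so the file-selection logic can be checked without an Azure connection.

```python
import glob
import os


def upload_csvs_to_datastore(datastore, local_dir, target_path, overwrite=True):
    """Upload every .csv file directly inside local_dir to target_path on the datastore.

    `datastore` is assumed to be an azureml.core Datastore backed by Azure Blob
    storage (its upload_files method is used); any object with that method works.
    Returns the names of the uploaded files.
    """
    files = sorted(glob.glob(os.path.join(local_dir, "*.csv")))
    if not files:
        raise FileNotFoundError(f"No .csv files found in {local_dir}")
    # relative_root strips the local directory prefix, so each file lands
    # directly under target_path in the datastore.
    datastore.upload_files(
        files=files,
        relative_root=local_dir,
        target_path=target_path,
        overwrite=overwrite,
        show_progress=True,
    )
    return [os.path.basename(path) for path in files]


def push_r_outputs():
    """Hypothetical entry point to run on the compute instance (not called here)."""
    from azureml.core import Datastore, Workspace  # needs azureml-core installed

    # Assumes a workspace config.json is available; otherwise build the
    # Workspace object with the constructor used in the download code.
    workspace = Workspace.from_config()
    # 'my_datastore' and both 'r_outputs' paths are hypothetical names.
    datastore = Datastore.get(workspace, "my_datastore")
    return upload_csvs_to_datastore(
        datastore,
        "/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/r_outputs",
        "r_outputs",
    )
```

Having the R scripts write their CSVs into a dedicated subfolder means only those outputs get uploaded, not the downloaded inputs that share the `code` directory. Azure Data Factory should then be able to read them from that path in the storage account behind the datastore.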
I tried using Designer, but the job ran for 4 days without completing, so I cancelled it and started looking for an alternative. I don't know whether it would eventually have completed or whether it hit memory issues without ever failing. Pulling data from the Datastore into the compute instance takes a few minutes at most, so I'm not sure why Designer would need multiple days for the reverse operation.
I've looked through a lot of documentation and can't find anything explaining how to transfer data from the compute instance back to the Datastore, other than Designer, which is either too slow or unable to handle the job.
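Another option in the same v1 SDK, sketched here under the same assumptions, is `Dataset.File.upload_directory`, which uploads a local folder to a `(datastore, path)` target and returns a `FileDataset` that can then be registered in the workspace. The method is marked experimental in azureml-core. The function name `upload_folder_as_dataset` is hypothetical, and the factory parameter exists only so the sketch can be exercised without Azure.

```python
def upload_folder_as_dataset(workspace, datastore, local_dir, target_path, name,
                             file_dataset_factory=None):
    """Upload the .csv files in local_dir to the datastore and register them.

    By default azureml.core's Dataset.File factory is used; passing another
    object with an upload_directory method lets the logic run without Azure.
    """
    if file_dataset_factory is None:
        from azureml.core import Dataset  # imported lazily: needs azureml-core
        file_dataset_factory = Dataset.File
    dataset = file_dataset_factory.upload_directory(
        src_dir=local_dir,
        target=(datastore, target_path),  # (Datastore, path) tuple
        pattern="*.csv",  # only upload the CSV outputs
        overwrite=True,
        show_progress=True,
    )
    # Registering makes the uploaded files retrievable by name, the same way
    # the download code fetches datasets with Dataset.get_by_name.
    return dataset.register(workspace, name=name, create_new_version=True)
```

Registration is optional if Azure Data Factory reads the storage path directly, but it keeps the outputs visible in the workspace next to the input datasets.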
This seems like it should be an obvious use case and a major selling point of Azure Machine Learning, so I'm a bit dumbfounded that it's this hard to figure out and that the documentation doesn't clearly show users how to do it, assuming it's even possible. If it isn't possible, I need to find a whole new system to get my work done, and the Azure Machine Learning team should enable this functionality as soon as possible.
```python
# Azure management
from azureml.core import Workspace, Dataset

# MetaData
subscription_id = '09b5fdb3-165d-4e2b-8ca0-34f998d176d5'
resource_group = 'xCloudData'
workspace_name = 'xCloudML'

# Connect to the existing workspace
workspace = Workspace(subscription_id, resource_group, workspace_name)

# 1. Retention_Engagement_CombinedData
dataset = Dataset.get_by_name(workspace, name='retention-engagement-combineddata')
# Save data to file
df = dataset.to_pandas_dataframe()
df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/RetentionEngagement_CombinedData.csv')

# 2. TitleNameJoin
dataset = Dataset.get_by_name(workspace, name='TitleForJoiningInR')
# Save data to file
df = dataset.to_pandas_dataframe()
df.to_csv('/mnt/batch/tasks/shared/LS_root/mounts/clusters/v-aantico1/code/TitleNameJoin.csv')
```
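To check that an upload actually landed, the download pattern in the code above can be pointed at the uploaded path. `Dataset.Tabular.from_delimited_files` accepts a `(datastore, path)` tuple, and the resulting dataset supports the same `to_pandas_dataframe` call. This is a sketch: the function name `load_uploaded_csvs` and any `r_outputs/...` path are hypothetical, and the factory parameter exists only for testing without Azure.

```python
def load_uploaded_csvs(datastore, path_pattern, tabular_factory=None):
    """Read CSVs already on the datastore back into a pandas DataFrame.

    path_pattern is a datastore-relative path or glob, e.g. a hypothetical
    'r_outputs/*.csv'. By default azureml.core's Dataset.Tabular is used.
    """
    if tabular_factory is None:
        from azureml.core import Dataset  # imported lazily: needs azureml-core
        tabular_factory = Dataset.Tabular
    dataset = tabular_factory.from_delimited_files(path=(datastore, path_pattern))
    # Same conversion the download code applies to registered datasets.
    return dataset.to_pandas_dataframe()
```

All files matched by one pattern are read as a single table, so point it at one file or at files that share the same columns.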