No space left on device

Layla Bitar 5 Reputation points
2023-10-26T07:53:59.63+00:00

Hello,

I am very much a beginner with Azure and cloud computing.

Therefore, I have two questions:

Currently, I am aiming to train Whisper on the Common Voice dataset that is available on Hugging Face. The dataset is a DatasetDict object. I saved it and converted it to JSON, so the dataset is now a bunch of JSON files in a folder. I created a data asset for this URI folder and saved it in a datastore. However, I am having the hardest time accessing and opening the folder using the data asset path. Is there any way to access the folder through the Python SDK?

The other issue is that while I am preparing my data for training (resampling and extracting features), I get a "not enough storage" error. I am doing this through the Python SDK in a Jupyter notebook. How can I overcome this?

I would appreciate any assistance.

Thanks!

Layla

Azure Machine Learning
An Azure machine learning service for building and deploying models.
Azure Data Science Virtual Machines
Azure Virtual Machine images that are pre-installed, configured, and tested with several commonly used tools for data analytics, machine learning, and artificial intelligence training.

1 answer

  1. romungi-MSFT 42,191 Reputation points Microsoft Employee
    2023-10-26T11:42:53.6133333+00:00

    @Layla Bitar I think you can follow the guidance from this page to load your data asset and mount it before using it in your job. You should be able to achieve this with a snippet like the one below. Example:

    from azure.ai.ml import command, Input, Output, MLClient
    from azure.ai.ml.constants import AssetTypes, InputOutputModes
    from azure.identity import DefaultAzureCredential

    # Set your subscription, resource group and workspace name:
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"

    # Connect to the AzureML workspace
    ml_client = MLClient(
        DefaultAzureCredential(), subscription_id, resource_group, workspace
    )

    data_type = AssetTypes.URI_FOLDER
    # Mount the input read-only so it is not copied to local disk
    input_mode = InputOutputModes.RO_MOUNT
    # Mount the output read-write (this was undefined in the original snippet)
    output_mode = InputOutputModes.RW_MOUNT

    # A registered data asset can be referenced instead of a datastore path,
    # e.g. input_path = "azureml:<DATA_ASSET_NAME>:<VERSION>"
    input_path = "azureml://datastores/workspaceblobstore/paths/input-folder/"
    output_path = "azureml://datastores/workspaceblobstore/paths/output-folder/"

    inputs = {
        "input_data": Input(type=data_type, path=input_path, mode=input_mode)
    }

    outputs = {
        "output_data": Output(type=data_type, path=output_path, mode=output_mode)
    }

    job = command(
        # -r is needed because the input is a folder, not a single file
        command="cp -r ${{inputs.input_data}}/. ${{outputs.output_data}}",
        inputs=inputs,
        outputs=outputs,
        environment="azureml://registries/azureml/environments/sklearn-1.1/versions/4",
        compute="cpu-cluster",
    )

    # Submit the command
    ml_client.jobs.create_or_update(job)
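
    Inside the job, ${{inputs.input_data}} is replaced with a local path where the folder is mounted, so a script can read the JSON files with ordinary file APIs. Below is a minimal sketch of such a script, not an official sample. It assumes a hypothetical prep.py that the job runs as "python prep.py --input_data ${{inputs.input_data}}", and it assumes each file is in JSON Lines form (one record per line). The argument name and the helpers list_json_files and iter_records are illustrative names, not part of the SDK.

```python
import argparse
import json
from pathlib import Path
from typing import Iterator, List


def list_json_files(folder: str) -> List[Path]:
    """Return all .json / .jsonl files under the mounted folder, sorted by path."""
    root = Path(folder)
    return sorted(p for p in root.rglob("*") if p.suffix in {".json", ".jsonl"})


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one record at a time, assuming JSON Lines (one object per line)."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def main(argv=None) -> None:
    """Print the number of records found in each JSON file of the input folder."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_data", required=True)
    args = parser.parse_args(argv)
    for path in list_json_files(args.input_data):
        count = sum(1 for _ in iter_records(path))
        print(f"{path.name}: {count} records")


# In prep.py, finish with a call to main() so the job's arguments are parsed.
```

    Reading records one at a time keeps memory use flat and, with a mounted input, avoids copying the whole folder to local disk first.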
    
    

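    With respect to the space issue, I think you might be using the input mode DOWNLOAD instead of MOUNT. In download mode the whole folder is copied to the compute's local disk before your code runs, whereas a mount reads files on demand. Try the mount option (RO_MOUNT, as in the snippet above) and check if it goes through.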
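
    Before rerunning the preprocessing, it may also help to confirm which disk is actually filling up. Below is a minimal sketch that uses only the Python standard library. The locations it checks are assumptions (the home directory, the temp directory and the current working directory are common places for caches and intermediate files), and free_space_report is a hypothetical helper name.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def free_space_report(paths: Iterable[str]) -> Dict[str, float]:
    """Map each existing path to the free space (in GiB) of the disk it lives on."""
    report = {}
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.exists():
            continue  # skip locations that are not present on this compute
        usage = shutil.disk_usage(path)
        report[str(path)] = usage.free / 1024**3
    return report


# Assumed locations; adjust to wherever your notebook writes data.
candidates = ["~", tempfile.gettempdir(), "."]
for location, free_gib in free_space_report(candidates).items():
    print(f"{location}: {free_gib:.1f} GiB free")
```

    If the disk behind the working or cache directory is nearly full while the data is being downloaded rather than mounted, that would be consistent with the guess above.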

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
