schwarze-7702 asked

UserScriptFilledDisk - what disk?

I am using Azure Machine Learning and have been running Python scripts on VMs to train MNIST models and output some summary statistics on the trained networks. It worked fine for the first few jobs, but when I submitted a few more, all of them failed with a UserScriptFilledDisk error:

"UserError: AzureMLCompute job failed. UserScriptFilledDisk: User script filled the disk. Consider using VM SKU with larger disk size. If the issue persists contact Azure Support."

I am using nodes with only 7 GB of disk space, but it still does not make sense to me that I could have exceeded that just by mounting MNIST and writing less than 1 MB of NumPy arrays to './outputs/'. The problem does not seem to be specific to one or a few nodes on my cluster: I created a new cluster and ran my scripts on it, and they still throw the same error. So how can I find out which disk I have filled up, how do I fix it, and how do I keep it from happening again?
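
One way to narrow down which filesystem is filling up is to log free space for a few candidate paths from inside script.py, once at the start of the run and once at the end. The following is a minimal sketch using only the standard library. The function name report_disk_usage is hypothetical, and the paths checked (the filesystem root, the system temp directory, the working directory, and ./outputs) are assumptions: the thread does not say where Azure ML places mounts, downloads, or Docker images on the node.

```python
import os
import shutil
import tempfile


def report_disk_usage(paths):
    """Print and return {path: (total_gb, used_gb, free_gb)} for each existing path.

    Paths that do not exist are skipped, so the same call works locally and on a node.
    """
    gb = 1024 ** 3
    report = {}
    for path in paths:
        if not os.path.exists(path):
            continue
        usage = shutil.disk_usage(path)  # named tuple with total, used, free in bytes
        report[path] = (usage.total / gb, usage.used / gb, usage.free / gb)
        print(f"{path}: total={usage.total / gb:.2f} GB, "
              f"used={usage.used / gb:.2f} GB, free={usage.free / gb:.2f} GB")
    return report


if __name__ == "__main__":
    # Candidate locations (an assumption, not taken from the thread)
    candidates = ["/", tempfile.gettempdir(), os.getcwd(), "./outputs"]
    report_disk_usage(candidates)
```

Comparing the start and end reports in the run's logs shows which mount point lost space during the job, even when the job itself later fails.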

Thanks in advance!

More details:

I created an Azure Machine Learning compute cluster:

 import os
 from azureml.core.compute import AmlCompute, ComputeTarget

 # ws is an existing azureml.core.Workspace object (defined elsewhere in the script)
 compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "cpu-main1")
 # Environment variables are strings, so cast the node counts to int
 compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
 compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 100))
 # Default SKU; these are the nodes with about 7 GB of disk mentioned above
 vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", "STANDARD_DS1_V2")

 if compute_name in ws.compute_targets:
     # Reuse the existing cluster
     compute_target = ws.compute_targets[compute_name]
 else:
     provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                 min_nodes=compute_min_nodes,
                                                                 max_nodes=compute_max_nodes)
     compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
     compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

I added a dataset:

 from azureml.core import Dataset, Workspace

 ws = Workspace(subscription_id, resource_group, workspace_name)
 # Fetch the registered dataset and download a local copy to the current directory
 dataset = Dataset.get_by_name(ws, name='mnist_zip')
 dataset.download(target_path='.', overwrite=True)

 dataset = dataset.register(workspace=ws,
                            name='mnist_zip',
                            description='zip file with preprocessed mnist data set',
                            create_new_version=False)
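
To check whether the dataset itself (for example, an extracted copy in the directory it was downloaded to) or something else in the working directory is taking up the space, a sketch like the following lists the largest files under a directory. It uses only the standard library; the function name largest_files and the default of 10 results are hypothetical choices for illustration, not part of the Azure ML SDK.

```python
import heapq
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_bytes, path) pairs for the largest files under root, largest first."""
    sizes = []
    # os.walk does not descend into symlinked directories by default
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than the file it points to
                size = os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            sizes.append((size, path))
    return heapq.nlargest(top_n, sizes)
```

For example, looping over `largest_files('.')` at the end of script.py and printing each size and path shows whether anything besides the expected small NumPy arrays ended up in the run's working directory.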

I submitted jobs to the cluster:

 from azureml.core import ScriptRunConfig

 # exp (Experiment), env (Environment) and script_folder are defined elsewhere
 runs = []
 for i in range(30):
     # Also tried dataset.as_download(); it did not seem to make a difference
     args = ['--dataset', dataset.as_mount(), '--id', i]
     src = ScriptRunConfig(source_directory=script_folder,
                           script='script.py',
                           arguments=args,
                           compute_target=compute_target,
                           environment=env)
     runs.append(exp.submit(config=src))




@schwarze-7702 Thanks for the question. Can you please add more details about the region of the training compute targets you are using? Also, please add details about the compute attached to your workspace: are you reusing it for multiple jobs?



Thanks for your comment! I added some more detail to my original post.


Also, I am using Azure ML compute clusters. I think the region is WestUS.

ramr-msft answered

@schwarze-7702 Thanks for the details. We recommend raising an Azure support ticket from the Help + support blade in the Azure portal for your resource. This lets you share the details securely and work with an engineer who can investigate the issue further and check whether it can be reproduced.

SJ-1376 answered

Is there any update on this question? I am experiencing the same issue: I am trying to register the dataset, and the UserScriptFilledDisk error occurs during the operation without any apparent reason.
