How to delete the local copy of an uploaded blob in Python on Azure Batch

Sampat, Varun 231 Reputation points
2020-12-03T21:37:15.19+00:00

Hello.

I recently followed the attached tutorial, and it worked perfectly with the provided code. (link: https://learn.microsoft.com/en-us/azure/batch/tutorial-run-python-batch-azure-data-factory)

I realised that pandas stores a local copy on the VMs used as part of the Batch Service, so I thought it might be good practice to delete the file just uploaded, and tried the following piece of code:

container_client = blob_service_client.get_container_client(containerName)  
with open(output_file_name, "rb") as data:  
    blob_client = container_client.upload_blob(name = output_file_name,   
    data=data)  
    if os.path.exists(output_file_name):  
        os.remove(output_file_name)  
    else:  
        pass  

However, my batch output error file (stderr.txt) returned the following error:

Traceback (most recent call last):
  File "main.py", line 44, in <module>
    os.remove(output_file_name)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'iris_setosa.csv'

What should I do? Is there a way to delete files? If not, what is the best alternative? Or am I just confused about how Azure Batch Services work? Does the file get deleted automatically?

If I am wrong, I apologise in advance, since it's my first time working with Batch Services.

Thank you in advance!

Azure Batch
Azure Data Factory

Accepted answer
  1. Winston 2,761 Reputation points
    2020-12-07T19:43:01.663+00:00

    Hi @Sampat, Varun ,

    Thanks for the well-written question. The error occurs because the file is still open when os.remove runs: the call sits inside the with block, so your own script still holds the file handle, and Windows will not delete a file that is in use. Amend your script to remove the file once the upload has finished with it, by moving the removal outside the with block.

     # Upload the file; the handle is closed when the with block exits.
     container_client = blob_service_client.get_container_client(containerName)
     with open(output_file_name, "rb") as data:
         blob_client = container_client.upload_blob(name=output_file_name, data=data)

     # The file is no longer open at this point, so it can be deleted.
     if os.path.exists(output_file_name):
         os.remove(output_file_name)
    

    Alternatively, it would be easier to remove the file from Blob storage using Azure Storage Explorer, since you will already have set that up as part of this exercise.
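
The upload-then-delete pattern described in this answer can be tried locally without an Azure account. The following is only a minimal sketch: upload_then_delete and fake_upload are hypothetical names introduced for illustration, and the upload argument stands in for whatever call actually sends the data (in the question's script, container_client.upload_blob).

```python
import os
import tempfile


def upload_then_delete(path, upload):
    """Upload the file at `path`, then delete the local copy.

    `upload` is a hypothetical callable taking (name, data); it is an
    assumption of this sketch, not part of any Azure SDK.
    """
    # The file handle is open only inside this block.
    with open(path, "rb") as data:
        upload(os.path.basename(path), data)

    # The handle is closed here, so the delete is no longer blocked
    # by our own open file.
    if os.path.exists(path):
        os.remove(path)


if __name__ == "__main__":
    # Stand-in for the real upload: keep the bytes in a dict.
    uploaded = {}

    def fake_upload(name, data):
        uploaded[name] = data.read()

    # Create a throwaway CSV to play the role of the script's output file.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w") as f:
        f.write("sepal_length,species\n5.1,setosa\n")

    upload_then_delete(tmp_path, fake_upload)
    print("local copy still present:", os.path.exists(tmp_path))
```

With the real client, the same helper could presumably be called as upload_then_delete(output_file_name, lambda name, data: container_client.upload_blob(name=name, data=data)), assuming container_client is set up as in the question.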
