Can we append data to an existing CSV file stored in Azure Blob Storage?

Senthil Murugan RAMACHANDRAN 21 Reputation points
2021-03-11T04:46:29+00:00

I have a machine learning model deployed through the Azure Machine Learning designer. I need to retrain it every day with new data through Python code. I need to keep the existing CSV data in Blob Storage and also add more rows to that CSV before retraining. If I retrain the model with only the new data, the old data is lost, so I need to retrain by appending the new data to the existing data. Is there any way to do this through Python?

I have also researched append blobs, but they add data only at the end of the blob. The documentation mentions that we cannot update or add to an existing blob.

Any help is appreciated. Thanks a lot.
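(For reference: adding only at the end of an append blob is actually sufficient for a CSV, since new training records are just new rows. A minimal sketch using the azure-storage-blob v12 `BlobClient`; the helper names and the idea of encoding rows first are illustrative, not from any official sample:)

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Encode a list of rows as CSV bytes, ready to append to a blob."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


def append_rows(blob_client, rows):
    """Append new CSV rows to an append blob, creating it on first use.

    `blob_client` is an azure.storage.blob.BlobClient pointing at an
    append blob (e.g. obtained via BlobServiceClient.get_blob_client).
    """
    if not blob_client.exists():
        blob_client.create_append_blob()
    blob_client.append_block(rows_to_csv_bytes(rows))
```

Note this only works if the blob was created as an append blob; an existing block blob cannot be converted in place.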

Azure Machine Learning
An Azure machine learning service for building and deploying models.

1 answer

  1. romungi-MSFT 41,866 Reputation points Microsoft Employee
    2021-03-11T10:11:06.027+00:00

    @Senthil Murugan RAMACHANDRAN The best practice with Azure Machine Learning is to register your dataset and version it when you want to retrain and create a new model. You can in fact keep multiple CSV files in your storage and create a single tabular dataset from all of them. For example:

    Here we are using files from a blob container that were uploaded at different times, and registering the dataset with versioning. If you would like to add more data, simply upload additional CSV files to the web path and register a new version, or use the older versions again if required.

    # create a TabularDataset from Titanic training data
    from azureml.core import Dataset

    web_paths = ['https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
                 'https://dprepdata.blob.core.windows.net/demo/Titanic2.csv']
    titanic_ds = Dataset.Tabular.from_delimited_files(path=web_paths)

    # register a new version of titanic_ds
    # (`workspace` is your Workspace object, e.g. Workspace.from_config())
    titanic_ds = titanic_ds.register(workspace=workspace,
                                     name='titanic_ds',
                                     description='new titanic training data',
                                     create_new_version=True)
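
    To retrain against any snapshot later, you can fetch a registered version back by name. A minimal sketch assuming the azureml-core SDK and a workspace `config.json` in the working directory (the version number shown is illustrative):

    ```python
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()  # assumes config.json for your workspace

    # latest version by default; pass an explicit version for an older snapshot
    titanic_latest = Dataset.get_by_name(ws, name='titanic_ds')
    titanic_v1 = Dataset.get_by_name(ws, name='titanic_ds', version=1)

    # materialize for training
    df = titanic_latest.to_pandas_dataframe()
    ```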