question

SenthilMuruganRAMACHANDRAN-7389 asked

Can we append data to an existing CSV file stored in Azure Blob Storage?

I have a machine learning model deployed from the Azure Machine Learning designer. I need to retrain it every day with new data using Python code. I need to keep the existing CSV data in blob storage, add the new data to that existing CSV, and retrain on the combined data. If I retrain the model with only the new data, the old data is lost, so I need to retrain on the existing data with the new data appended. Is there any way to do this in Python?

I have also looked into append blobs, but they only allow data to be added at the end of the blob, and the documentation mentions that existing blocks cannot be updated.

Any help is appreciated. Thanks a lot.

azure-machine-learning

1 Answer

romungi-MSFT answered

@SenthilMuruganRAMACHANDRAN-7389 The recommended practice in Azure Machine Learning is to register your dataset and version it, so that each retraining run that produces a new model uses a specific dataset version. You can in fact keep multiple CSV files in your storage and create a single tabular dataset from all of them. For example:

Here we use files from a blob container that were added at different times, and register the dataset with versioning. To add more data, upload additional CSV files, add their paths to the web_paths list, and register a new version; older versions remain available if you need them again.

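    from azureml.core import Workspace, Dataset

    # connect to the workspace (reads the config.json downloaded from the portal)
    workspace = Workspace.from_config()

    # create one TabularDataset from Titanic training data split across two CSV files
    web_paths = ['https://dprepdata.blob.core.windows.net/demo/Titanic.csv',
                 'https://dprepdata.blob.core.windows.net/demo/Titanic2.csv']
    titanic_ds = Dataset.Tabular.from_delimited_files(path=web_paths)

    # register it; create_new_version=True adds a new version if 'titanic_ds' already exists
    titanic_ds = titanic_ds.register(workspace=workspace,
                                     name='titanic_ds',
                                     description='new titanic training data',
                                     create_new_version=True)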
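A minimal sketch of the daily routine described above, assuming the azure-storage-blob (v12) and azureml-core (SDK v1) packages. Each day's new rows are uploaded as a new, date-stamped CSV instead of being edited into the existing file, and a new dataset version is registered over an explicit list of the daily files. The container, folder and datastore names and all helper functions are hypothetical. The sketch also assumes the datastore is registered against that container and that every daily CSV shares the same header row. The Azure imports sit inside the functions so the naming helpers work without the SDKs installed.

```python
from datetime import date
from typing import Iterable, List

# Hypothetical names used only for illustration; replace with your own.
CONTAINER = "training-data"        # blob container holding the daily CSV files
FOLDER = "daily"                   # virtual folder inside that container
DATASTORE = "training_datastore"   # AML datastore assumed to point at CONTAINER
DATASET = "titanic_ds"             # registered dataset name from the answer


def daily_blob_name(day: date, folder: str = FOLDER) -> str:
    """Blob name for one day's new rows, e.g. 'daily/2021-03-05.csv'."""
    return f"{folder}/{day.isoformat()}.csv"


def blob_names_for(days: Iterable[date]) -> List[str]:
    """Explicit blob names for a set of days, deduplicated and oldest first."""
    return [daily_blob_name(d) for d in sorted(set(days))]


def upload_daily_csv(conn_str: str, csv_text: str, day: date) -> str:
    """Write one day's rows to their own blob; other days' files are not touched.

    Re-running on the same day replaces that day's file (overwrite=True).
    """
    from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

    service = BlobServiceClient.from_connection_string(conn_str)
    blob = service.get_blob_client(container=CONTAINER, blob=daily_blob_name(day))
    blob.upload_blob(csv_text.encode("utf-8"), overwrite=True)
    return blob.url


def register_new_version(workspace, days: Iterable[date]):
    """Register a new dataset version over an explicit list of daily files."""
    from azureml.core import Dataset, Datastore  # pip install azureml-core

    datastore = Datastore.get(workspace, DATASTORE)
    paths = [(datastore, name) for name in blob_names_for(days)]
    ds = Dataset.Tabular.from_delimited_files(path=paths)
    return ds.register(workspace=workspace, name=DATASET,
                       description="training data up to the latest daily file",
                       create_new_version=True)


def load_version(workspace, version="latest"):
    """Load a registered version (the latest or an older one) as a pandas DataFrame."""
    from azureml.core import Dataset

    return Dataset.get_by_name(workspace, name=DATASET, version=version).to_pandas_dataframe()
```

A daily job might call upload_daily_csv(conn_str, new_rows_csv, date.today()), then register_new_version(workspace, all_days), and train on load_version(workspace). The explicit file list, rather than a wildcard such as daily/*.csv, is a deliberate choice. As far as I understand, a dataset version references the files instead of copying them, so a wildcard saved in an older version would also pick up files added later.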



@SenthilMuruganRAMACHANDRAN-7389 Did you get a chance to review the above response and check if versioning your datasets works for your experiments?
