How to delete specific data from files stored in Azure Data Lake

DCAK
2020-09-26T08:25:22.507+00:00

I'm migrating data sets to ADLS. After a certain period, I want to delete the data within these data sets that is more than 3 years old. How can I do that? I do not want to delete the files themselves, only specific data within them.

Azure Data Lake Storage
Azure Data Lake Analytics

Accepted answer
  HarithaMaddi-MSFT
    2020-09-28T10:42:37.977+00:00

    Hi @DCAK,

    Welcome to the Microsoft Q&A platform, and thanks for posting your question.

    Azure Data Lake Storage stores files, but it does not let you edit records inside a file in place. Any modification to the contents of these files has to be done with an ETL tool, a script, or custom code. One approach is to use Azure's ETL service, Azure Data Factory, to build a pipeline that implements this. A data flow with a Filter transformation can filter the data dynamically and write the result back to the same file. The GIF below shows this for a single CSV file, where I filtered out the rows in which "col6" is "test1". For your requirement, the condition can be changed to one based on the date, for example keeping only the rows from the last 3 years.

    [Animated GIF: filtering a CSV file with an Azure Data Factory data flow (28772-datafilteringadf.gif)]
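    If you would rather script the change than build a pipeline, the same filter-and-rewrite idea can be sketched in Python. This is a minimal sketch that is not part of the original answer, and it makes several assumptions: the file is a CSV with a header row, it has a hypothetical ISO-8601 date column (called `event_date` here), and "3 years" is approximated as 1,095 days. `filter_recent_rows` and `purge_old_rows_in_adls` are hypothetical helper names. The ADLS round-trip uses the `azure-storage-file-datalake` package (`DataLakeServiceClient`, `download_file`, `upload_data`). The account URL, file system, path, and credential (for example a `DefaultAzureCredential` from `azure-identity`) are placeholders you would supply.

```python
"""Sketch: drop rows older than a retention window from a CSV and rewrite it.

Assumptions (hypothetical, not from the original answer): the file is CSV with
a header row and an ISO-8601 date column whose name is passed in.
"""
import csv
import io
from datetime import date, timedelta

RETENTION_DAYS = 3 * 365  # approx. 3 years; ignores leap days


def filter_recent_rows(csv_text: str, date_column: str, today: date,
                       retention_days: int = RETENTION_DAYS) -> str:
    """Return CSV text keeping only rows whose date is within the retention window."""
    cutoff = today - timedelta(days=retention_days)
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep the row only if its date is on or after the cutoff date.
        if date.fromisoformat(row[date_column]) >= cutoff:
            writer.writerow(row)
    return out.getvalue()


def purge_old_rows_in_adls(account_url, file_system, path, date_column, credential):
    """Download a file from ADLS Gen2, filter it, and overwrite the same file."""
    # Imported here so the filtering logic above runs without the Azure SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    file_client = service.get_file_system_client(file_system).get_file_client(path)
    original = file_client.download_file().readall().decode("utf-8")
    filtered = filter_recent_rows(original, date_column, date.today())
    # overwrite=True replaces the file's contents; the file itself is kept.
    file_client.upload_data(filtered.encode("utf-8"), overwrite=True)
```

    For example, `filter_recent_rows("id,event_date\n1,2019-01-15\n2,2023-06-01\n", "event_date", date(2024, 1, 1))` keeps only the 2023 row. Because the whole file is read into memory, this suits small files; for large data sets, the Data Factory data flow approach described above (or Spark) scales better.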

    Hope this helps! Please let us know if this does not match your requirement or if you have further questions; we will be glad to assist.

