Disaster recovery guidance for data in Azure Data Lake Storage Gen1

Azure Data Lake Storage Gen1 provides locally redundant storage (LRS), so the data in your Data Lake Storage Gen1 account is resilient to transient hardware failures within a datacenter through automated replicas. This replication ensures durability and high availability, meeting the Data Lake Storage Gen1 SLA. This article provides guidance on how to further protect your data from rare region-wide outages and from accidental deletions.

Disaster recovery guidance

It is critical for every customer to prepare their own disaster recovery plan. Use the guidance in this article as a starting point for building yours.

Best practices

We recommend that you copy your critical data to another Data Lake Storage Gen1 account in another region, at a frequency aligned with the needs of your disaster recovery plan. There are several ways to copy data, including ADLCopy, Azure PowerShell, and Azure Data Factory. Azure Data Factory is a useful service for creating and deploying data movement pipelines on a recurring basis.
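As a rough illustration of what a recurring copy job has to decide, the sketch below compares a listing of the primary account with a listing of the secondary account and returns the paths that still need to be copied. It is not tied to ADLCopy, Azure PowerShell, or Azure Data Factory. The `FileInfo` type, the `plan_copy` function, and the idea of representing each listing as a dictionary keyed by path are all assumptions made for this example.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical per-file metadata taken from an account listing."""
    size: int        # file length in bytes
    modified: float  # last-modified time, seconds since the epoch


def plan_copy(primary: Dict[str, FileInfo],
              secondary: Dict[str, FileInfo]) -> List[str]:
    """Return primary paths that are missing or stale in the secondary."""
    to_copy = []
    for path, info in sorted(primary.items()):
        replica = secondary.get(path)
        if (replica is None                        # never copied
                or replica.size != info.size       # length differs
                or replica.modified < info.modified):  # replica is older
            to_copy.append(path)
    return to_copy
```

Running a comparison like this on each scheduled pass keeps the copy incremental, which makes it easier to run at the frequency your recovery plan calls for.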

If a regional outage occurs, you can then access your data in the region to which it was copied. You can monitor the Azure Service Health Dashboard to check Azure service status across the globe.

Data corruption or accidental deletion recovery guidance

While Data Lake Storage Gen1 provides data resiliency through automated replicas, replication does not prevent your application (or its developers and users) from corrupting data or accidentally deleting it.

Best practices

To prevent accidental deletion, we recommend that you first set the correct access policies for your Data Lake Storage Gen1 account. This includes applying Azure resource locks to lock down important resources, as well as applying account-level and file-level access control using the available Data Lake Storage Gen1 security features. We also recommend that you routinely create copies of your critical data in another Data Lake Storage Gen1 account, folder, or Azure subscription using ADLCopy, Azure PowerShell, or Azure Data Factory. You can use these copies to recover from a data corruption or deletion incident. Azure Data Factory is a useful service for creating and deploying data movement pipelines on a recurring basis.
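When several routine copies exist, recovery usually starts from the most recent copy taken before the corruption or deletion happened. The following minimal sketch shows that selection. The function name `latest_clean_snapshot` and the assumption that you record a timestamp for each copy are hypothetical and are not part of any Data Lake Storage Gen1 feature.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_clean_snapshot(snapshot_times: Iterable[datetime],
                          incident_time: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken strictly before the incident.

    Returns None when every snapshot was taken at or after the incident,
    meaning no known-good copy is available.
    """
    candidates = [t for t in snapshot_times if t < incident_time]
    return max(candidates) if candidates else None
```

A `None` result is a signal that the copy frequency or retention period is too short for the kinds of incidents you need to recover from.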

Organizations can also enable diagnostic logging for their Data Lake Storage Gen1 account to collect data access audit trails. These audit trails provide information about who might have deleted or updated a file.
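Once audit logs are collected, they can be searched for operations against a specific file. The sketch below assumes the logs are JSON documents with a top-level `records` array whose entries carry `time`, `operationName`, `identity`, and a `properties.StreamName` field holding the file URI. Treat those field names as assumptions and verify them against your own logs. The operation names to search for are passed in by the caller because they are not specified here, and the function name `find_modifiers` is hypothetical.

```python
import json
from typing import Dict, Iterable, List


def find_modifiers(log_text: str, target_path: str,
                   operations: Iterable[str]) -> List[Dict[str, str]]:
    """List audit records for the given operations on one file, oldest first.

    Assumed log shape: {"records": [{"time", "operationName", "identity",
    "properties": {"StreamName": "<file URI>"}}, ...]}
    """
    wanted = {op.lower() for op in operations}
    hits = []
    for record in json.loads(log_text).get("records", []):
        operation = record.get("operationName", "")
        stream = record.get("properties", {}).get("StreamName", "")
        if operation.lower() in wanted and stream.endswith(target_path):
            hits.append({
                "time": record.get("time", ""),
                "identity": record.get("identity", ""),
                "operation": operation,
            })
    # ISO 8601 timestamps in a single format sort correctly as strings.
    return sorted(hits, key=lambda hit: hit["time"])
```

The `identity` value in each result can then be matched against your directory to identify the user or application that made the change.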

Next steps