Use .NET to manage directories and files in Azure Data Lake Storage Gen2

This article shows you how to use .NET to create and manage directories and files in storage accounts that have a hierarchical namespace.

To learn how to get, set, and update the access control lists (ACLs) of directories and files, see Use .NET to manage ACLs in Azure Data Lake Storage Gen2.


Prerequisites

  • An Azure subscription. See Get Azure free trial.

  • A storage account that has hierarchical namespace enabled. Follow these instructions to create one.

Set up your project

To get started, install the Azure.Storage.Files.DataLake NuGet package.

For more information about how to install NuGet packages, see Install and manage packages in Visual Studio using the NuGet Package Manager.

Then, add these using statements to the top of your code file.

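using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure;
using Azure.Core;
using Azure.Identity;
using Azure.Storage;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;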

Connect to the account

To use the snippets in this article, you'll need to create a DataLakeServiceClient instance that represents the storage account.

Connect by using an account key

This is the easiest way to connect to an account.

This example creates a DataLakeServiceClient instance by using an account key.

public static void GetDataLakeServiceClient(ref DataLakeServiceClient dataLakeServiceClient,
    string accountName, string accountKey)
{
    StorageSharedKeyCredential sharedKeyCredential =
        new StorageSharedKeyCredential(accountName, accountKey);

    string dfsUri = "https://" + accountName + ".dfs.core.windows.net";

    dataLakeServiceClient = new DataLakeServiceClient
        (new Uri(dfsUri), sharedKeyCredential);
}

Connect by using Azure Active Directory (Azure AD)

You can use the Azure identity client library for .NET (the Azure.Identity NuGet package) to authenticate your application with Azure AD.

This example creates a DataLakeServiceClient instance by using a client ID, a client secret, and a tenant ID. To get these values, see Acquire a token from Azure AD for authorizing requests from a client application.

public static void GetDataLakeServiceClient(ref DataLakeServiceClient dataLakeServiceClient,
    string accountName, string clientID, string clientSecret, string tenantID)
{
    // Authenticate as the Azure AD application identified by the client ID.
    TokenCredential credential = new ClientSecretCredential(
        tenantID, clientID, clientSecret, new TokenCredentialOptions());

    string dfsUri = "https://" + accountName + ".dfs.core.windows.net";

    dataLakeServiceClient = new DataLakeServiceClient(new Uri(dfsUri), credential);
}

Note

For more examples, see the Azure identity client library for .NET documentation.

Create a container

A container acts as a file system for your files. You can create one by calling the DataLakeServiceClient.CreateFileSystemAsync method.

This example creates a container named my-file-system.

public async Task<DataLakeFileSystemClient> CreateFileSystem
    (DataLakeServiceClient serviceClient)
{
    return await serviceClient.CreateFileSystemAsync("my-file-system");
}
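
CreateFileSystemAsync fails if a container with the same name already exists. The following is a minimal sketch of one way to make the step repeatable, assuming the DataLakeFileSystemClient.CreateIfNotExistsAsync method of the same library. The class and method names (ContainerSetup, GetOrCreateFileSystemAsync) are hypothetical and exist only for illustration.

```csharp
using System.Threading.Tasks;
using Azure.Storage.Files.DataLake;

// Hypothetical helper class; not part of the Azure SDK.
public static class ContainerSetup
{
    public static async Task<DataLakeFileSystemClient> GetOrCreateFileSystemAsync(
        DataLakeServiceClient serviceClient, string fileSystemName)
    {
        // Getting a client only builds a local reference to the container.
        DataLakeFileSystemClient fileSystemClient =
            serviceClient.GetFileSystemClient(fileSystemName);

        // Creates the container only if the service reports that it is missing.
        await fileSystemClient.CreateIfNotExistsAsync();

        return fileSystemClient;
    }
}
```

The returned client can be passed to the other examples in this article that take a DataLakeFileSystemClient parameter.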

Create a directory

Create a directory reference by calling the DataLakeFileSystemClient.CreateDirectoryAsync method.

This example adds a directory named my-directory to a container, and then adds a sub-directory named my-subdirectory.

public async Task<DataLakeDirectoryClient> CreateDirectory
    (DataLakeServiceClient serviceClient, string fileSystemName)
{
    DataLakeFileSystemClient fileSystemClient =
        serviceClient.GetFileSystemClient(fileSystemName);

    DataLakeDirectoryClient directoryClient =
        await fileSystemClient.CreateDirectoryAsync("my-directory");

    return await directoryClient.CreateSubDirectoryAsync("my-subdirectory");
}

Rename or move a directory

Rename or move a directory by calling the DataLakeDirectoryClient.RenameAsync method. Pass the path of the desired directory as a parameter.

This example renames a sub-directory to the name my-subdirectory-renamed.

public async Task<DataLakeDirectoryClient>
    RenameDirectory(DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient("my-directory/my-subdirectory");

    return await directoryClient.RenameAsync("my-directory/my-subdirectory-renamed");
}

This example moves a directory named my-subdirectory-renamed to a sub-directory of a directory named my-directory-2.

public async Task<DataLakeDirectoryClient> MoveDirectory
    (DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
         fileSystemClient.GetDirectoryClient("my-directory/my-subdirectory-renamed");

    return await directoryClient.RenameAsync("my-directory-2/my-subdirectory-renamed");
}

Delete a directory

Delete a directory by calling the DataLakeDirectoryClient.Delete method.

This example deletes a directory named my-directory.

public void DeleteDirectory(DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient("my-directory");

    directoryClient.Delete();
}

Upload a file to a directory

First, create a file reference in the target directory by creating an instance of the DataLakeFileClient class. Upload a file by calling the DataLakeFileClient.AppendAsync method. Make sure to complete the upload by calling the DataLakeFileClient.FlushAsync method.

This example uploads a text file to a directory named my-directory.

public async Task UploadFile(DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient("my-directory");

    DataLakeFileClient fileClient = await directoryClient.CreateFileAsync("uploaded-file.txt");

    // Dispose the local file handle when the upload finishes or fails.
    using (FileStream fileStream =
        File.OpenRead("C:\\Users\\contoso\\Temp\\file-to-upload.txt"))
    {
        long fileSize = fileStream.Length;

        // Stage the data at offset 0, then commit it by flushing at the final length.
        await fileClient.AppendAsync(fileStream, offset: 0);

        await fileClient.FlushAsync(position: fileSize);
    }
}
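
When the data is sent in several pieces, each append needs the offset at which its piece belongs, and the final flush needs the total length. The following sketch shows only that bookkeeping, under stated assumptions: the class and method names (ChunkedUpload, UploadInChunksAsync) are hypothetical, the 4 MiB default chunk size is arbitrary, and the append and flush operations are passed in as delegates so that the helper depends on nothing but the base class library.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class; not part of the Azure SDK.
public static class ChunkedUpload
{
    // Reads source in chunks of at most chunkSize bytes, passes each chunk to
    // appendAsync together with the offset at which it belongs, and finally
    // calls flushAsync with the total number of bytes sent.
    public static async Task<long> UploadInChunksAsync(
        Stream source,
        Func<Stream, long, Task> appendAsync,
        Func<long, Task> flushAsync,
        int chunkSize = 4 * 1024 * 1024)
    {
        byte[] buffer = new byte[chunkSize];
        long offset = 0;
        int read;

        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            // Wrap only the bytes that were actually read.
            using (var chunk = new MemoryStream(buffer, 0, read, writable: false))
            {
                await appendAsync(chunk, offset);
            }

            offset += read;
        }

        // Appended data is not committed until it is flushed.
        await flushAsync(offset);

        return offset;
    }
}
```

With a DataLakeFileClient named fileClient, the two delegates would typically be (chunk, offset) => fileClient.AppendAsync(chunk, offset) and position => fileClient.FlushAsync(position).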

Tip

If your file is large, your code will have to make multiple calls to the DataLakeFileClient.AppendAsync method. Consider using the DataLakeFileClient.UploadAsync method instead. That way, you can upload the entire file in a single call.

See the next section for an example.

Upload a large file to a directory

Use the DataLakeFileClient.UploadAsync method to upload large files without having to make multiple calls to the DataLakeFileClient.AppendAsync method.

public async Task UploadFileBulk(DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient("my-directory");

    DataLakeFileClient fileClient = directoryClient.GetFileClient("uploaded-file.txt");

    // Dispose the local file handle when the upload finishes or fails.
    using (FileStream fileStream =
        File.OpenRead("C:\\Users\\contoso\\file-to-upload.txt"))
    {
        await fileClient.UploadAsync(fileStream);
    }
}

Download from a directory

First, create a DataLakeFileClient instance that represents the file that you want to download. Use the DataLakeFileClient.ReadAsync method, and parse the return value to obtain a Stream object. Use any .NET file processing API to save bytes from the stream to a file.

This example uses a BinaryReader and a FileStream to save bytes to a file.

public async Task DownloadFile(DataLakeFileSystemClient fileSystemClient)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient("my-directory");

    DataLakeFileClient fileClient =
        directoryClient.GetFileClient("my-image.png");

    Response<FileDownloadInfo> downloadResponse = await fileClient.ReadAsync();

    // File.Create truncates an existing file; File.OpenWrite would leave stale
    // trailing bytes if the old file were longer than the download.
    using (BinaryReader reader = new BinaryReader(downloadResponse.Value.Content))
    using (FileStream fileStream =
        File.Create("C:\\Users\\contoso\\my-image-downloaded.png"))
    {
        byte[] buffer = new byte[4096];
        int count;

        // Copy until the stream reports no more data.
        while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
        {
            fileStream.Write(buffer, 0, count);
        }

        await fileStream.FlushAsync();
    }
}
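
Because the downloaded content is an ordinary Stream, the manual read loop is optional. The following sketch shows a shorter alternative that relies on Stream.CopyToAsync; the class and method names (DownloadHelpers, SaveStreamToFileAsync) are hypothetical, and the helper uses only the base class library.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class; not part of the Azure SDK.
public static class DownloadHelpers
{
    // Copies every remaining byte of content to destinationPath and returns
    // the number of bytes now in the file.
    public static async Task<long> SaveStreamToFileAsync(
        Stream content, string destinationPath)
    {
        // File.Create overwrites (truncates) any existing file at the path.
        using (FileStream fileStream = File.Create(destinationPath))
        {
            await content.CopyToAsync(fileStream);
            return fileStream.Length;
        }
    }
}
```

In the download example, the stream to pass as content would be downloadResponse.Value.Content.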

List directory contents

List directory contents by calling the DataLakeFileSystemClient.GetPathsAsync method, and then enumerating through the results.

This example prints the name of each path that is located in a directory named my-directory.

public async Task ListFilesInDirectory(DataLakeFileSystemClient fileSystemClient)
{
    // await foreach (C# 8.0 or later) disposes the enumerator and handles
    // an empty directory without a null check.
    await foreach (PathItem item in fileSystemClient.GetPathsAsync("my-directory"))
    {
        Console.WriteLine(item.Name);
    }
}
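
By default, GetPathsAsync returns only the immediate children of the directory. The following is a minimal sketch of a listing that also descends into sub-directories, assuming the recursive parameter of GetPathsAsync and the IsDirectory property of PathItem. The class and method names (PathListing, ListAllPathsAsync) are hypothetical.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

// Hypothetical helper class; not part of the Azure SDK.
public static class PathListing
{
    public static async Task ListAllPathsAsync(DataLakeFileSystemClient fileSystemClient)
    {
        // recursive: true also returns the contents of nested directories.
        await foreach (PathItem item in
            fileSystemClient.GetPathsAsync("my-directory", recursive: true))
        {
            // IsDirectory is nullable, so compare against true explicitly.
            string kind = item.IsDirectory == true ? "dir " : "file";
            Console.WriteLine($"{kind} {item.Name}");
        }
    }
}
```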
