Filesystem operations on Azure Data Lake Store using Python

In this article, you learn how to use the Python SDK to perform filesystem operations on Azure Data Lake Store. For instructions on how to perform account management operations on Data Lake Store using Python, see Account management operations on Data Lake Store using Python.

Prerequisites

Install the modules

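To work with Data Lake Store using Python, you need to install three modules: azure-mgmt-resource (common Azure resource management), azure-mgmt-datalake-store (Data Lake Store account management), and azure-datalake-store (Data Lake Store filesystem operations).

Use the following commands to install the modules.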

pip install azure-mgmt-resource
pip install azure-mgmt-datalake-store
pip install azure-datalake-store
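
After installation, it can help to confirm that the three packages are importable from the interpreter you plan to use. The following is a minimal sketch that relies only on the standard library; the helper name module_available is hypothetical, and the import names are the ones used in the import statements later in this article.

```python
import importlib.util

# Import names that correspond to the three pip packages above.
REQUIRED_MODULES = [
    "azure.mgmt.resource",
    "azure.mgmt.datalake.store",
    "azure.datalake.store",
]


def module_available(name):
    """Return True if the module `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (for example, `azure`) raises ImportError.
        return False


for name in REQUIRED_MODULES:
    print(name, "OK" if module_available(name) else "MISSING")
```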

Create a new Python application

  1. In the IDE of your choice, create a new Python application, for example, mysample.py.

  2. Add the following lines to import the required modules:

    ## Use this only for Azure AD service-to-service authentication
    from azure.common.credentials import ServicePrincipalCredentials
    
    ## Use this only for Azure AD end-user authentication
    from azure.common.credentials import UserPassCredentials
    
    ## Use this only for Azure AD multi-factor authentication
    from msrestazure.azure_active_directory import AADTokenCredentials
    
    ## Required for Azure Data Lake Store account management
    from azure.mgmt.datalake.store import DataLakeStoreAccountManagementClient
    from azure.mgmt.datalake.store.models import DataLakeStoreAccount
    
    ## Required for Azure Data Lake Store filesystem management
    from azure.datalake.store import core, lib, multithread
    
    ## Common Azure imports
    from azure.mgmt.resource.resources import ResourceManagementClient
    from azure.mgmt.resource.resources.models import ResourceGroup
    
    ## Use these as needed for your application
    import logging, getpass, pprint, uuid, time
    
  3. Save changes to mysample.py.

Authentication

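This section covers the different ways to authenticate with Azure AD. The imports added earlier correspond to the available options: service-to-service authentication, end-user authentication, and multi-factor authentication.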
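
As an illustration of one option, the sketch below obtains credentials for service-to-service authentication through lib.auth from the azure-datalake-store package. The function names read_auth_settings and get_service_credentials, and the environment variable names, are hypothetical; the tenant ID, client ID, and client secret come from your own Azure AD application registration. The package import is placed inside the function so that the snippet loads even where the package is not yet installed.

```python
import os


def read_auth_settings(env=None):
    """Collect service principal settings from environment variables.

    The variable names below are hypothetical; use whatever your
    deployment defines.
    """
    env = os.environ if env is None else env
    names = {
        "tenant_id": "ADLS_TENANT_ID",
        "client_id": "ADLS_CLIENT_ID",
        "client_secret": "ADLS_CLIENT_SECRET",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}


def get_service_credentials(settings):
    """Request a token for the Data Lake Store resource endpoint."""
    from azure.datalake.store import lib  # imported lazily on purpose

    return lib.auth(
        tenant_id=settings["tenant_id"],
        client_id=settings["client_id"],
        client_secret=settings["client_secret"],
        resource="https://datalake.azure.net/",
    )
```

The object returned by get_service_credentials(read_auth_settings()) can then be used as adlCreds when creating the filesystem client.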

Create filesystem client

The following snippet declares the subscription ID and the Data Lake Store account name, and then creates a filesystem client object. It assumes that adlCreds holds the credentials object obtained during authentication.

## Declare variables
subscriptionId = 'FILL-IN-HERE'
adlsAccountName = 'FILL-IN-HERE'

## Create a filesystem client object
adlsFileSystemClient = core.AzureDLFileSystem(adlCreds, store_name=adlsAccountName)
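
One way to confirm that the client works is to list the root of the store. The sketch below assumes the ls method of AzureDLFileSystem with detail=True, which is expected to return one dictionary per entry with name, type, and length keys. The helper name summarize_listing is hypothetical, and it accepts any object that exposes a compatible ls method.

```python
def summarize_listing(client, path="/"):
    """Return (name, type, length) tuples for the entries directly under `path`."""
    entries = client.ls(path, detail=True)
    return [(entry["name"], entry["type"], entry["length"]) for entry in entries]


# Example call, assuming the client created above:
# for name, kind, length in summarize_listing(adlsFileSystemClient):
#     print(kind, length, name)
```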

Create a directory

## Create a directory
adlsFileSystemClient.mkdir('/mysampledirectory')
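
When a script may run more than once, it can be useful to create the directory only if it is not already there. This is a minimal sketch that assumes the exists and mkdir methods of the filesystem client; the helper name ensure_directory is hypothetical.

```python
def ensure_directory(client, path):
    """Create `path` if it is absent; return True when a directory was created."""
    if client.exists(path):
        return False
    client.mkdir(path)
    return True


# Example call, assuming the client created earlier:
# ensure_directory(adlsFileSystemClient, '/mysampledirectory')
```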

Upload a file

## Upload a file
multithread.ADLUploader(
    adlsFileSystemClient,
    lpath='C:\\data\\mysamplefile.txt',
    rpath='/mysampledirectory/mysamplefile.txt',
    nthreads=64,
    overwrite=True,
    buffersize=4194304,
    blocksize=4194304,
)
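
The value 4194304 passed as buffersize and blocksize is 4 MiB (4 × 1024 × 1024 bytes). The sketch below builds the same keyword arguments from a size given in MiB so the intent is easier to read; the helper name transfer_options is hypothetical, and the keyword names are the ones used in the call above.

```python
def transfer_options(chunk_mib=4, nthreads=64, overwrite=True):
    """Build keyword arguments for ADLUploader or ADLDownloader."""
    size_in_bytes = chunk_mib * 1024 * 1024
    return {
        "nthreads": nthreads,
        "overwrite": overwrite,
        "buffersize": size_in_bytes,
        "blocksize": size_in_bytes,
    }


# Example call, assuming the client and imports from earlier sections:
# multithread.ADLUploader(
#     adlsFileSystemClient,
#     lpath='C:\\data\\mysamplefile.txt',
#     rpath='/mysampledirectory/mysamplefile.txt',
#     **transfer_options()
# )
```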

Download a file

## Download a file
multithread.ADLDownloader(
    adlsFileSystemClient,
    lpath='C:\\data\\mysamplefile.txt.out',
    rpath='/mysampledirectory/mysamplefile.txt',
    nthreads=64,
    overwrite=True,
    buffersize=4194304,
    blocksize=4194304,
)
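
After a download, a simple sanity check is to compare the size of the local file with the size reported by the store. This sketch assumes that the info method of the filesystem client returns a dictionary with a length key holding the size in bytes; the helper name download_is_complete is hypothetical. A size match does not prove the contents are identical, so treat it as a quick check only.

```python
import os


def download_is_complete(client, rpath, lpath):
    """Return True if the local file exists and matches the remote size."""
    remote_size = client.info(rpath)["length"]
    return os.path.isfile(lpath) and os.path.getsize(lpath) == remote_size


# Example call, assuming the client created earlier:
# download_is_complete(
#     adlsFileSystemClient,
#     '/mysampledirectory/mysamplefile.txt',
#     'C:\\data\\mysamplefile.txt.out',
# )
```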

Delete a directory

## Delete a directory
adlsFileSystemClient.rm('/mysampledirectory', recursive=True)
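
Because a recursive delete removes everything under the given path, a small guard can reduce the risk of deleting more than intended. The sketch below assumes the exists and rm methods of the filesystem client; the helper name remove_directory and the rule of refusing the root path are illustrative choices, not part of the SDK.

```python
def remove_directory(client, path):
    """Recursively delete `path`; return True if something was removed."""
    if path.strip("/") == "":
        raise ValueError("Refusing to delete the root of the store")
    if not client.exists(path):
        return False
    client.rm(path, recursive=True)
    return True


# Example call, assuming the client created earlier:
# remove_directory(adlsFileSystemClient, '/mysampledirectory')
```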
