Get started with the Batch SDK for Python

Learn the basics of Azure Batch and the Batch Python client as we discuss a small Batch application written in Python. We look at how two sample scripts use the Batch service to process a parallel workload on Linux virtual machines in the cloud, and how they interact with Azure Storage for file staging and retrieval. You'll learn a common Batch application workflow and gain a base understanding of the major components of Batch such as jobs, tasks, pools, and compute nodes.

Batch solution workflow (basic)


This article assumes that you have a working knowledge of Python and familiarity with Linux. It also assumes that you're able to satisfy the account creation requirements that are specified below for Azure and the Batch and Storage services.


Code sample

The Python tutorial code sample is one of the many Batch code samples found in the azure-batch-samples repository on GitHub. You can download all the samples by clicking Clone or download > Download ZIP on the repository home page, or by clicking the direct download link. Once you've extracted the contents of the ZIP file, the two scripts for this tutorial are found in the article_samples directory.


Python environment

To run the sample script on your local workstation, you need a Python 2.7 or 3.3+ interpreter. The script has been tested on both Linux and Windows.
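If you're unsure which interpreter you have, you can check the version requirement from Python itself. This is a minimal sketch for illustration; the sample scripts don't perform this check:

```python
import sys

def supported(version_info):
    """Return True if the interpreter meets the sample's requirement:
    Python 2.7, or Python 3.3 and later."""
    major_minor = tuple(version_info[:2])
    return major_minor == (2, 7) or major_minor >= (3, 3)

print(supported(sys.version_info))
```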

cryptography dependencies

You must install the dependencies for the cryptography library, required by the azure-batch and azure-storage Python packages. Perform one of the following operations appropriate for your platform, or refer to the cryptography installation details for more information:

  • Ubuntu

    apt-get update && apt-get install -y build-essential libssl-dev libffi-dev libpython-dev python-dev

  • CentOS

    yum update && yum install -y gcc openssl-devel libffi-devel python-devel

  • OpenSUSE

    zypper ref && zypper -n in libopenssl-dev libffi48-devel python-devel

  • Windows

    pip install cryptography


If installing for Python 3.3+ on Linux, use the python3 equivalents for the Python dependencies. For example, on Ubuntu: apt-get update && apt-get install -y build-essential libssl-dev libffi-dev libpython3-dev python3-dev

Azure packages

Next, install the Azure Batch and Azure Storage Python packages. You can install both packages by using pip and the requirements.txt file included with the code sample:


Issue the following pip command to install the Batch and Storage packages:

pip install -r requirements.txt

Or, you can install the azure-batch and azure-storage Python packages manually:

pip install azure-batch
pip install azure-storage


If you are using an unprivileged account, you may need to prefix your commands with sudo. For example, sudo pip install -r requirements.txt. For more information on installing Python packages, see Installing Packages in the Python Packaging User Guide.

Batch Python tutorial code sample

The Batch Python tutorial code sample consists of two Python scripts and a few data files.

  • The client script: Interacts with the Batch and Storage services to execute a parallel workload on compute nodes (virtual machines). This script runs on your local workstation.
  • The task script: Runs on compute nodes in Azure to perform the actual work. It parses the text in a file downloaded from Azure Storage (the input file), then produces a text file (the output file) that contains a list of the top three words that appear in the input file. After it creates the output file, the task script uploads the file to Azure Storage, making it available for download to the client script running on your workstation. This script runs in parallel on multiple compute nodes in the Batch service.
  • ./data/taskdata*.txt: These three text files provide the input for the tasks that run on the compute nodes.
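The heart of the task script's work, counting word frequencies and keeping the top N, can be sketched with the standard library. This is a simplified illustration, not the sample's actual code:

```python
import collections
import re

def top_n_words(text, n):
    """Return the n most common words in text as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return collections.Counter(words).most_common(n)

print(top_n_words('the cat and the hat and the bat', 2))
```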

The following diagram illustrates the primary operations that are performed by the client and task scripts. This basic workflow is typical of many compute solutions that are created with Batch. While it does not demonstrate every feature available in the Batch service, nearly every Batch scenario includes portions of this workflow.

Batch example workflow

Step 1. Create containers in Azure Blob Storage.
Step 2. Upload task script and input files to containers.
Step 3. Create a Batch pool.
    3a. The pool StartTask downloads the task script to the nodes as they join the pool.
Step 4. Create a Batch job.
Step 5. Add tasks to the job.
    5a. The tasks are scheduled to execute on nodes.
    5b. Each task downloads its input data from Azure Storage, then begins execution.
Step 6. Monitor tasks.
    6a. As tasks are completed, they upload their output data to Azure Storage.
Step 7. Download task output from Storage.

As mentioned, not every Batch solution performs these exact steps, and may include many more, but this sample demonstrates common processes found in a Batch solution.

Prepare client script

Before you run the sample, add your Batch and Storage account credentials to the client script. If you have not done so already, open the script in your favorite editor and update the following lines with your credentials.

# Update the Batch and Storage account credential strings below with the values
# unique to your accounts. These are used when constructing connection strings
# for the Batch and Storage client objects.

# Batch account credentials

# Storage account credentials

You can find your Batch and Storage account credentials within the account blade of each service in the Azure portal:

Batch credentials in the portal Storage credentials in the portal

In the following sections, we analyze the steps used by the scripts to process a workload in the Batch service. We encourage you to refer regularly to the scripts in your editor while you work your way through the rest of the article.

Navigate to the following line in the client script to start with Step 1:

if __name__ == '__main__':

Step 1: Create Storage containers

Create containers in Azure Storage

Batch includes built-in support for interacting with Azure Storage. Containers in your Storage account will provide the files needed by the tasks that run in your Batch account. The containers also provide a place to store the output data that the tasks produce. The first thing the script does is create three containers in Azure Blob Storage:

  • application: This container will store the Python script run by the tasks.
  • input: Tasks will download the data files to process from the input container.
  • output: When tasks complete input file processing, they will upload the results to the output container.

In order to interact with a Storage account and create containers, we use the azure-storage package to create a BlockBlobService object--the "blob client." We then create three containers in the Storage account using the blob client.

import azure.storage.blob as azureblob

# Create the blob client, for use in obtaining references to
# blob storage containers and uploading files to containers.
# STORAGE_ACCOUNT_NAME and STORAGE_ACCOUNT_KEY stand in for the Storage
# credential strings you set earlier.
blob_client = azureblob.BlockBlobService(
    account_name=STORAGE_ACCOUNT_NAME,
    account_key=STORAGE_ACCOUNT_KEY)

# Use the blob client to create the containers in Azure Storage if they
# don't yet exist.
APP_CONTAINER_NAME = 'application'
INPUT_CONTAINER_NAME = 'input'
OUTPUT_CONTAINER_NAME = 'output'
blob_client.create_container(APP_CONTAINER_NAME, fail_on_exist=False)
blob_client.create_container(INPUT_CONTAINER_NAME, fail_on_exist=False)
blob_client.create_container(OUTPUT_CONTAINER_NAME, fail_on_exist=False)

Once the containers have been created, the application can now upload the files that will be used by the tasks.


How to use Azure Blob storage from Python provides a good overview of working with Azure Storage containers and blobs. It should be near the top of your reading list as you start working with Batch.

Step 2: Upload task script and data files

Upload task application and input (data) files to containers

In the file upload operation, the client script first defines collections of application and input file paths as they exist on the local machine. Then it uploads these files to the containers that you created in the previous step.

# Paths to the task script. This script will be executed by the tasks that
# run on the compute nodes.
application_file_paths = [os.path.realpath('')]

# The collection of data files that are to be processed by the tasks.
input_file_paths = [os.path.realpath('./data/taskdata1.txt'),
                    os.path.realpath('./data/taskdata2.txt'),
                    os.path.realpath('./data/taskdata3.txt')]

# Upload the application script to Azure Storage. This is the script that
# will process the data files, and is executed by each of the tasks on the
# compute nodes.
application_files = [
    upload_file_to_container(blob_client, APP_CONTAINER_NAME, file_path)
    for file_path in application_file_paths]

# Upload the data files. This is the data that will be processed by each of
# the tasks executed on the compute nodes in the pool.
input_files = [
    upload_file_to_container(blob_client, INPUT_CONTAINER_NAME, file_path)
    for file_path in input_file_paths]

Using list comprehension, the upload_file_to_container function is called for each file in the collections, and two ResourceFile collections are populated. The upload_file_to_container function appears below:

import datetime
import os

import azure.storage.blob as azureblob
import azure.batch.models as batchmodels


def upload_file_to_container(block_blob_client, container_name, file_path):
    """
    Uploads a local file to an Azure Blob storage container.

    :param block_blob_client: A blob service client.
    :type block_blob_client: `azure.storage.blob.BlockBlobService`
    :param str container_name: The name of the Azure Blob storage container.
    :param str file_path: The local path to the file.
    :rtype: `azure.batch.models.ResourceFile`
    :return: A ResourceFile initialized with a SAS URL appropriate for Batch
    tasks.
    """
    blob_name = os.path.basename(file_path)

    print('Uploading file {} to container [{}]...'.format(file_path,
                                                          container_name))

    block_blob_client.create_blob_from_path(container_name,
                                            blob_name,
                                            file_path)

    # Obtain a read-only SAS token for the blob, valid for two hours.
    sas_token = block_blob_client.generate_blob_shared_access_signature(
        container_name,
        blob_name,
        permission=azureblob.BlobPermissions.READ,
        expiry=datetime.datetime.utcnow() + datetime.timedelta(hours=2))

    sas_url = block_blob_client.make_blob_url(container_name,
                                              blob_name,
                                              sas_token=sas_token)

    return batchmodels.ResourceFile(file_path=blob_name,
                                    blob_source=sas_url)

A ResourceFile provides tasks in Batch with the URL to a file in Azure Storage that is downloaded to a compute node before that task is run. The ResourceFile.blob_source property specifies the full URL of the file as it exists in Azure Storage. The URL may also include a shared access signature (SAS) that provides secure access to the file. Most task types in Batch include a resource_files property, including CloudTask, StartTask, JobPreparationTask, and JobReleaseTask.

This sample does not use the JobPreparationTask or JobReleaseTask task types, but you can read more about them in Run job preparation and completion tasks on Azure Batch compute nodes.

Shared access signature (SAS)

Shared access signatures are strings that provide secure access to containers and blobs in Azure Storage. The client script uses both blob and container shared access signatures, and demonstrates how to obtain these shared access signature strings from the Storage service.

  • Blob shared access signatures: The pool's StartTask uses blob shared access signatures when it downloads the task script and input data files from Storage (see Step #3 below). The upload_file_to_container function in the client script contains the code that obtains each blob's shared access signature. It does so by calling BlockBlobService.make_blob_url in the Storage module.
  • Container shared access signature: As each task finishes its work on the compute node, it uploads its output file to the output container in Azure Storage. To do so, the task script uses a container shared access signature that provides write access to the container. The get_container_sas_token function in the client script obtains the container's shared access signature, which is then passed as a command-line argument to the tasks. Step #5, Add tasks to a job, discusses the usage of the container SAS.

Check out the two-part series on shared access signatures, Part 1: Understanding the SAS model and Part 2: Create and use a SAS with the Blob service, to learn more about providing secure access to data in your Storage account.
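Conceptually, a SAS URL is simply the blob's URL with the SAS token appended as a query string, which is what BlockBlobService.make_blob_url assembles for you. The following sketch is purely illustrative (the host name shown is the standard public-cloud blob endpoint; real tokens are generated by the Storage service):

```python
def make_sas_url(account_name, container_name, blob_name, sas_token):
    """Append a SAS token to a blob URL (illustration only; the
    azure-storage package's make_blob_url does this for you)."""
    return 'https://{}.blob.core.windows.net/{}/{}?{}'.format(
        account_name, container_name, blob_name, sas_token)

print(make_sas_url('mystorage', 'input', 'taskdata1.txt', 'sv=2015-04-05&sig=...'))
```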

Step 3: Create Batch pool

Create a Batch pool

A Batch pool is a collection of compute nodes (virtual machines) on which Batch executes a job's tasks.

After it uploads the task script and data files to the Storage account, the client script starts its interaction with the Batch service by using the Batch Python module. To do so, it creates a BatchServiceClient:

import azure.batch as batch
import azure.batch.batch_auth as batchauth

# Create a Batch service client. We'll now be interacting with the Batch
# service in addition to Storage. BATCH_ACCOUNT_KEY and BATCH_ACCOUNT_URL
# stand in for the Batch credential strings you set earlier.
credentials = batchauth.SharedKeyCredentials(BATCH_ACCOUNT_NAME,
                                             BATCH_ACCOUNT_KEY)

batch_client = batch.BatchServiceClient(
    credentials,
    base_url=BATCH_ACCOUNT_URL)

Next, a pool of compute nodes is created in the Batch account with a call to create_pool.

def create_pool(batch_service_client, pool_id,
                resource_files, publisher, offer, sku):
    """
    Creates a pool of compute nodes with the specified OS settings.

    :param batch_service_client: A Batch service client.
    :type batch_service_client: `azure.batch.BatchServiceClient`
    :param str pool_id: An ID for the new pool.
    :param list resource_files: A collection of resource files for the pool's
    start task.
    :param str publisher: Marketplace image publisher
    :param str offer: Marketplace image offer
    :param str sku: Marketplace image sku
    """
    print('Creating pool [{}]...'.format(pool_id))

    # Create a new pool of Linux compute nodes using an Azure Virtual Machines
    # Marketplace image. For more information about creating pools of Linux
    # nodes, see Provision Linux compute nodes in Azure Batch pools.

    # Specify the commands for the pool's start task. The start task is run
    # on each node as it joins the pool, and when it's rebooted or re-imaged.
    # We use the start task to prep the node for running our task script.
    task_commands = [
        # Copy the task script to the "shared" directory that all tasks
        # that run on the node have access to. TASK_SCRIPT_NAME is a
        # constant holding the script's file name, defined elsewhere in
        # the script.
        'cp -p $AZ_BATCH_TASK_WORKING_DIR/{} $AZ_BATCH_NODE_SHARED_DIR'.format(
            TASK_SCRIPT_NAME),
        # Install pip and the dependencies for cryptography
        'apt-get update',
        'apt-get -y install python-pip',
        'apt-get -y install build-essential libssl-dev libffi-dev python-dev',
        # Install the azure-storage module so that the task script can access
        # Azure Blob storage
        'pip install azure-storage']

    # Get the node agent SKU and image reference for the virtual machine
    # configuration.
    sku_to_use, image_ref_to_use = \
        common.helpers.select_latest_verified_vm_image_with_node_agent_sku(
            batch_service_client, publisher, offer, sku)

    # POOL_VM_SIZE and POOL_NODE_COUNT are constants defined elsewhere in
    # the script.
    new_pool = batch.models.PoolAddParameter(
        id=pool_id,
        virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
            image_reference=image_ref_to_use,
            node_agent_sku_id=sku_to_use),
        vm_size=POOL_VM_SIZE,
        target_dedicated=POOL_NODE_COUNT,
        start_task=batch.models.StartTask(
            command_line=
            common.helpers.wrap_commands_in_shell('linux', task_commands),
            run_elevated=True,
            wait_for_success=True,
            resource_files=resource_files))

    try:
        batch_service_client.pool.add(new_pool)
    except batchmodels.batch_error.BatchErrorException as err:
        print('Error creating pool: {}'.format(err))
        raise
When you create a pool, you define a PoolAddParameter that specifies several properties for the pool:

  • ID of the pool (id - required)

    As with most entities in Batch, your new pool must have a unique ID within your Batch account. Your code refers to this pool using its ID, and it's how you identify the pool in the Azure portal.

  • Number of compute nodes (target_dedicated - required)

    This property specifies how many VMs should be deployed in the pool. It is important to note that all Batch accounts have a default quota that limits the number of cores (and thus, compute nodes) in a Batch account. You can find the default quotas and instructions on how to increase a quota (such as the maximum number of cores in your Batch account) in Quotas and limits for the Azure Batch service. If you find yourself asking "Why won't my pool reach more than X nodes?" this core quota may be the cause.

  • Operating system for nodes (virtual_machine_configuration or cloud_service_configuration - required)

    In the client script, we create a pool of Linux nodes using a VirtualMachineConfiguration. The select_latest_verified_vm_image_with_node_agent_sku function in common.helpers simplifies working with Azure Virtual Machines Marketplace images. See Provision Linux compute nodes in Azure Batch pools for more information about using Marketplace images.

  • Size of compute nodes (vm_size - required)

    Since we're specifying Linux nodes for our VirtualMachineConfiguration, we specify a VM size (STANDARD_A1 in this sample) from Sizes for virtual machines in Azure. Again, see Provision Linux compute nodes in Azure Batch pools for more information.

  • Start task (start_task - not required)

    Along with the above physical node properties, you may also specify a StartTask for the pool (it is not required). The StartTask executes on each node as that node joins the pool, and each time a node is restarted. The StartTask is especially useful for preparing compute nodes for the execution of tasks, such as installing the applications that your tasks run.

    In this sample application, the StartTask copies the files that it downloads from Storage (which are specified by using the StartTask's resource_files property) from the StartTask working directory to the shared directory that all tasks running on the node can access. Essentially, this copies the task script to the shared directory on each node as the node joins the pool, so that any tasks that run on the node can access it.

You may notice the call to the wrap_commands_in_shell helper function. This function takes a collection of separate commands and creates a single command line appropriate for a task's command_line property.
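A simplified version of such a helper might look like the following. This is a sketch of the behavior for Linux only, not the actual common.helpers implementation:

```python
def wrap_commands_in_shell(os_type, commands):
    """Combine a list of commands into one shell command line suitable
    for a task's command_line property (simplified sketch)."""
    if os_type.lower() == 'linux':
        # Run the commands under /bin/sh; 'set -e' stops at the first failure.
        return "/bin/sh -c 'set -e; {}'".format('; '.join(commands))
    raise ValueError('Unsupported OS type: {}'.format(os_type))

print(wrap_commands_in_shell('linux', ['apt-get update', 'pip install azure-storage']))
```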

Also notable in the code snippet above is the use of two environment variables in the command_line property of the StartTask: AZ_BATCH_TASK_WORKING_DIR and AZ_BATCH_NODE_SHARED_DIR. Each compute node within a Batch pool is automatically configured with several environment variables that are specific to Batch. Any process that is executed by a task has access to these environment variables.


To find out more about the environment variables that are available on compute nodes in a Batch pool, as well as information on task working directories, see Environment settings for tasks and Files and directories in the overview of Azure Batch features.
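Inside a task's process, these variables are read like any other environment variable. A minimal sketch follows; the fallback value here exists only so the snippet runs outside of Batch, where the variables are not set:

```python
import os

def batch_env(name, fallback):
    """Read a Batch-provided environment variable, using a fallback when
    the code is not running on a Batch compute node."""
    return os.environ.get(name, fallback)

shared_dir = batch_env('AZ_BATCH_NODE_SHARED_DIR', '/tmp/shared')
print(shared_dir)
```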

Step 4: Create Batch job

Create Batch job

A Batch job is a collection of tasks, and is associated with a pool of compute nodes. The tasks in a job execute on the associated pool's compute nodes.

You can use a job not only for organizing and tracking tasks in related workloads, but also for imposing certain constraints--such as the maximum runtime for the job (and by extension, its tasks) and job priority in relation to other jobs in the Batch account. In this example, however, the job is associated only with the pool that was created in step #3. No additional properties are configured.

All Batch jobs are associated with a specific pool. This association indicates which nodes the job's tasks execute on. You specify the pool by using the PoolInformation property, as shown in the code snippet below.

def create_job(batch_service_client, job_id, pool_id):
    """
    Creates a job with the specified ID, associated with the specified pool.

    :param batch_service_client: A Batch service client.
    :type batch_service_client: `azure.batch.BatchServiceClient`
    :param str job_id: The ID for the job.
    :param str pool_id: The ID for the pool.
    """
    print('Creating job [{}]...'.format(job_id))

    job = batch.models.JobAddParameter(
        id=job_id,
        pool_info=batch.models.PoolInformation(pool_id=pool_id))

    try:
        batch_service_client.job.add(job)
    except batchmodels.batch_error.BatchErrorException as err:
        print('Error creating job: {}'.format(err))
        raise

Now that a job has been created, tasks are added to perform the work.

Step 5: Add tasks to job

Add tasks to job
(1) Tasks are added to the job, (2) the tasks are scheduled to run on nodes, and (3) the tasks download the data files to process

Batch tasks are the individual units of work that execute on the compute nodes. A task has a command line and runs the scripts or executables that you specify in that command line.

To actually perform work, tasks must be added to a job. Each CloudTask is configured with a command-line property and ResourceFiles (as with the pool's StartTask) that the task downloads to the node before its command line is automatically executed. In the sample, each task processes only one file. Thus, its ResourceFiles collection contains a single element.

def add_tasks(batch_service_client, job_id, input_files,
              output_container_name, output_container_sas_token):
    """
    Adds a task for each input file in the collection to the specified job.

    :param batch_service_client: A Batch service client.
    :type batch_service_client: `azure.batch.BatchServiceClient`
    :param str job_id: The ID of the job to which to add the tasks.
    :param list input_files: A collection of input files. One task will be
     created for each input file.
    :param output_container_name: The ID of an Azure Blob storage container to
    which the tasks will upload their results.
    :param output_container_sas_token: A SAS token granting write access to
    the specified Azure Blob storage container.
    """
    print('Adding {} tasks to job [{}]...'.format(len(input_files), job_id))

    tasks = list()

    for idx, input_file in enumerate(input_files):

        # TASK_SCRIPT_NAME and STORAGE_ACCOUNT_NAME are constants defined
        # elsewhere in the script.
        command = ['python $AZ_BATCH_NODE_SHARED_DIR/{} '
                   '--filepath {} --numwords {} --storageaccount {} '
                   '--storagecontainer {} --sastoken "{}"'.format(
                       TASK_SCRIPT_NAME,
                       input_file.file_path,
                       '3',
                       STORAGE_ACCOUNT_NAME,
                       output_container_name,
                       output_container_sas_token)]

        tasks.append(batch.models.TaskAddParameter(
            id='topNtask{}'.format(idx),
            command_line=wrap_commands_in_shell('linux', command),
            resource_files=[input_file]))

    batch_service_client.task.add_collection(job_id, tasks)

Task command lines must invoke the shell explicitly when they access environment variables such as $AZ_BATCH_NODE_SHARED_DIR, or execute an application that is not in the node's PATH--for example, /bin/sh -c MyTaskApplication $MY_ENV_VAR. This is unnecessary if your tasks execute an application in the node's PATH and do not reference any environment variables.

Within the for loop in the code snippet above, you can see that the command line for the task is constructed with five command-line arguments that are passed to the task script:

  1. filepath: This is the local path to the file as it exists on the node. When the ResourceFile object in upload_file_to_container was created in Step 2 above, the file name was used for this property (the file_path parameter in the ResourceFile constructor). This indicates that the file can be found in the same directory on the node as the task script.
  2. numwords: The number of top-frequency words (N) to write to the output file.
  3. storageaccount: The name of the Storage account that owns the container to which the task output should be uploaded.
  4. storagecontainer: The name of the Storage container to which the output files should be uploaded.
  5. sastoken: The shared access signature (SAS) that provides write access to the output container in Azure Storage. The task script uses this shared access signature when it creates its BlockBlobService reference. This provides write access to the container without requiring an access key for the storage account.
# NOTE: Taken from the task script

# Create the blob client using the container's SAS token.
# This allows us to create a client that provides write
# access only to the container.
blob_client = azureblob.BlockBlobService(account_name=args.storageaccount,
                                         sas_token=args.sastoken)
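On the node, the task script would typically parse these five arguments with argparse before using them. The parser below is an assumption about the task script, built from the argument names shown in the command line above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--filepath', required=True)
parser.add_argument('--numwords', type=int, required=True)
parser.add_argument('--storageaccount', required=True)
parser.add_argument('--storagecontainer', required=True)
parser.add_argument('--sastoken', required=True)

# Parse an example command line like the one the client script constructs.
args = parser.parse_args(['--filepath', 'taskdata1.txt', '--numwords', '3',
                          '--storageaccount', 'mystorage',
                          '--storagecontainer', 'output',
                          '--sastoken', 'sv=...'])
print(args.numwords)
```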

Step 6: Monitor tasks

Monitor tasks
The script (1) monitors the tasks for completion status, and (2) the tasks upload result data to Azure Storage

When tasks are added to a job, they are automatically queued and scheduled for execution on compute nodes within the pool associated with the job. Based on the settings you specify, Batch handles all task queuing, scheduling, retrying, and other task administration duties for you.

There are many approaches to monitoring task execution. The wait_for_tasks_to_complete function in the client script provides a simple example of monitoring tasks for a certain state--in this case, the completed state.

def wait_for_tasks_to_complete(batch_service_client, job_id, timeout):
    """
    Returns when all tasks in the specified job reach the Completed state.

    :param batch_service_client: A Batch service client.
    :type batch_service_client: `azure.batch.BatchServiceClient`
    :param str job_id: The id of the job whose tasks should be monitored.
    :param timedelta timeout: The duration to wait for task completion. If all
    tasks in the specified job do not reach Completed state within this time
    period, an exception will be raised.
    """
    timeout_expiration = datetime.datetime.now() + timeout

    print("Monitoring all tasks for 'Completed' state, timeout in {}..."
          .format(timeout), end='')

    while datetime.datetime.now() < timeout_expiration:
        print('.', end='')
        sys.stdout.flush()
        tasks = batch_service_client.task.list(job_id)

        incomplete_tasks = [task for task in tasks if
                            task.state != batchmodels.TaskState.completed]
        if not incomplete_tasks:
            return True
        else:
            time.sleep(1)

    raise RuntimeError("ERROR: Tasks did not reach 'Completed' state within "
                       "timeout period of " + str(timeout))

Step 7: Download task output

Download task output from Storage

Now that the job is completed, the output from the tasks can be downloaded from Azure Storage. This is done with a call to download_blobs_from_container in the client script:

def download_blobs_from_container(block_blob_client,
                                  container_name, directory_path):
    """
    Downloads all blobs from the specified Azure Blob storage container.

    :param block_blob_client: A blob service client.
    :type block_blob_client: `azure.storage.blob.BlockBlobService`
    :param container_name: The Azure Blob storage container from which to
     download files.
    :param directory_path: The local directory to which to download the files.
    """
    print('Downloading all files from container [{}]...'.format(
        container_name))

    container_blobs = block_blob_client.list_blobs(container_name)

    for blob in container_blobs.items:
        destination_file_path = os.path.join(directory_path, blob.name)

        block_blob_client.get_blob_to_path(container_name,
                                           blob.name,
                                           destination_file_path)

        print('  Downloaded blob [{}] from container [{}] to {}'.format(
            blob.name,
            container_name,
            destination_file_path))

    print('  Download complete!')

The call to download_blobs_from_container in the client script specifies that the files should be downloaded to your home directory. Feel free to modify this output location.

Step 8: Delete containers

Because you are charged for data that resides in Azure Storage, it is always a good idea to remove any blobs that are no longer needed for your Batch jobs. In the client script, this is done with three calls to BlockBlobService.delete_container:

# Clean up storage resources
print('Deleting containers...')
blob_client.delete_container(APP_CONTAINER_NAME)
blob_client.delete_container(INPUT_CONTAINER_NAME)
blob_client.delete_container(OUTPUT_CONTAINER_NAME)

Step 9: Delete the job and the pool

In the final step, you are prompted to delete the job and the pool that were created by the script. Although you are not charged for jobs and tasks themselves, you are charged for compute nodes. Thus, we recommend that you allocate nodes only as needed. Deleting unused pools can be part of your maintenance process.

The BatchServiceClient's JobOperations and PoolOperations both have corresponding deletion methods, which are called if you confirm deletion:

# Clean up Batch resources (if the user so chooses). job_id and pool_id
# hold the IDs of the job and pool created earlier.
if query_yes_no('Delete job?') == 'yes':
    batch_client.job.delete(job_id)

if query_yes_no('Delete pool?') == 'yes':
    batch_client.pool.delete(pool_id)

Keep in mind that you are charged for compute resources--deleting unused pools will minimize cost. Also, be aware that deleting a pool deletes all compute nodes within that pool, and that any data on the nodes will be unrecoverable after the pool is deleted.

Run the sample script

When you run the script from the tutorial code sample, the console output is similar to the following. There is a pause at Monitoring all tasks for 'Completed' state, timeout in 0:20:00... while the pool's compute nodes are created, started, and the commands in the pool's start task are executed. Use the Azure portal to monitor your pool, compute nodes, job, and tasks during and after execution. Use the Azure portal or the Microsoft Azure Storage Explorer to view the Storage resources (containers and blobs) that are created by the application.


Run the script from within the azure-batch-samples/Python/Batch/article_samples directory. It uses a relative path for the common.helpers module import, so you might see ImportError: No module named 'common' if you don't run the script from within this directory.

Typical execution time is approximately 5-7 minutes when you run the sample in its default configuration.

Sample start: 2016-05-20 22:47:10

Uploading file /home/user/py_tutorial/ to container [application]...
Uploading file /home/user/py_tutorial/data/taskdata1.txt to container [input]...
Uploading file /home/user/py_tutorial/data/taskdata2.txt to container [input]...
Uploading file /home/user/py_tutorial/data/taskdata3.txt to container [input]...
Creating pool [PythonTutorialPool]...
Creating job [PythonTutorialJob]...
Adding 3 tasks to job [PythonTutorialJob]...
Monitoring all tasks for 'Completed' state, timeout in 0:20:00..........................................................................
  Success! All tasks reached the 'Completed' state within the specified timeout period.
Downloading all files from container [output]...
  Downloaded blob [taskdata1_OUTPUT.txt] from container [output] to /home/user/taskdata1_OUTPUT.txt
  Downloaded blob [taskdata2_OUTPUT.txt] from container [output] to /home/user/taskdata2_OUTPUT.txt
  Downloaded blob [taskdata3_OUTPUT.txt] from container [output] to /home/user/taskdata3_OUTPUT.txt
  Download complete!
Deleting containers...

Sample end: 2016-05-20 22:53:12
Elapsed time: 0:06:02

Delete job? [Y/n]
Delete pool? [Y/n]

Press ENTER to exit...

Next steps

Feel free to make changes to the client and task scripts to experiment with different compute scenarios. For example, try adding an execution delay to the task script to simulate long-running tasks and monitor them in the portal. Try adding more tasks or adjusting the number of compute nodes. Add logic to check for and allow the use of an existing pool to speed execution time.

Now that you're familiar with the basic workflow of a Batch solution, it's time to dig in to the additional features of the Batch service.

  • Review the Overview of Azure Batch features article, which we recommend if you're new to the service.
  • Start on the other Batch development articles under Development in-depth in the Batch learning path.
  • Check out a different implementation of processing the "top N words" workload with Batch in the TopNWords sample.