Instance Pools API 2.0

The Instance Pools API allows you to create, edit, delete, get, and list instance pools.

An instance pool reduces cluster start and auto-scaling times by maintaining a set of idle, ready-to-use cloud instances. When a cluster attached to a pool needs an instance, it first attempts to allocate one of the pool’s idle instances. If the pool has no idle instances, the pool expands by allocating a new instance from the instance provider to accommodate the cluster’s request. When a cluster releases an instance, the instance returns to the pool and is free for another cluster to use. Only clusters attached to a pool can use that pool’s idle instances.
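The allocation behavior described above can be illustrated with a toy in-memory model. This is only a sketch for reasoning about the semantics, not how Azure Databricks implements pools; the class name `PoolModel` and its methods are hypothetical, and the `max_capacity` check anticipates the request field of the same name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PoolModel:
    """Toy model of an instance pool (hypothetical, for illustration only)."""
    idle: int = 0                       # idle, ready-to-use instances
    used: int = 0                       # instances held by attached clusters
    max_capacity: Optional[int] = None  # None means no upper bound

    def acquire(self) -> str:
        """Give one instance to an attached cluster; report where it came from."""
        if self.idle > 0:
            # An idle instance is available, so the cluster gets it immediately.
            self.idle -= 1
            self.used += 1
            return "idle"
        if self.max_capacity is not None and self.used >= self.max_capacity:
            raise RuntimeError("pool is at max_capacity")
        # No idle instance: the pool expands via the instance provider.
        self.used += 1
        return "provider"

    def release(self) -> None:
        """A cluster returns an instance; it becomes idle and reusable."""
        if self.used == 0:
            raise RuntimeError("no instance is in use")
        self.used -= 1
        self.idle += 1


pool = PoolModel(idle=1)
print(pool.acquire())        # idle
print(pool.acquire())        # provider
pool.release()
print(pool.idle, pool.used)  # 1 1
```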

Azure Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply. See pricing.

Requirements

Important

To access Databricks REST APIs, you must authenticate.

Create

Endpoint HTTP Method
2.0/instance-pools/create POST

Create an instance pool. Use the returned instance_pool_id to query the status of the instance pool, which includes the number of instances currently allocated by the instance pool. If you provide the min_idle_instances parameter, instances are provisioned in the background and are ready to use once the idle_count in the InstancePoolStats equals the requested minimum.
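A minimal sketch of the readiness check described above: poll the pool's information until `stats.idle_count` reaches the requested minimum. The helper names `pool_is_warm` and `wait_until_warm` are hypothetical, and `fetch` stands in for whatever function you use to call the Get endpoint; no HTTP request is made here.

```python
import time
from typing import Callable


def pool_is_warm(pool_info: dict, min_idle_instances: int) -> bool:
    """True once stats.idle_count has reached the requested minimum."""
    idle = pool_info.get("stats", {}).get("idle_count", 0)
    # ">=" rather than "==": idle instances in excess of the minimum can
    # exist until they are autoterminated.
    return idle >= min_idle_instances


def wait_until_warm(fetch: Callable[[], dict], min_idle_instances: int,
                    attempts: int = 10, delay_s: float = 0.0) -> bool:
    """Poll fetch() up to `attempts` times; return whether the pool warmed up."""
    for _ in range(attempts):
        if pool_is_warm(fetch(), min_idle_instances):
            return True
        time.sleep(delay_s)
    return False


# Canned responses standing in for successive instance-pools/get calls.
responses = iter([
    {"stats": {"idle_count": 3, "pending_idle_count": 7}},
    {"stats": {"idle_count": 10, "pending_idle_count": 0}},
])
print(wait_until_warm(lambda: next(responses), min_idle_instances=10))  # True
```

In real use you would pass a nonzero `delay_s` and bound `attempts`, since some requested idle instances may never be acquired.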

If your account has Databricks Container Services enabled and the instance pool is created with preloaded_docker_images, you can use the instance pool to launch clusters with a Docker image. The Docker image in the instance pool doesn’t have to match the Docker image in the cluster. However, the container environment of the cluster created on the pool must align with the container environment of the instance pool: you cannot use an instance pool created with preloaded_docker_images to launch a cluster without a Docker image, and you cannot use an instance pool created without preloaded_docker_images to launch a cluster with a Docker image.
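The alignment rule above reduces to a simple check: the pool and the cluster must either both use a Docker image or both not use one. The sketch below assumes the cluster specification carries its image under a `docker_image` key, which is not defined in this article, and the image URLs are illustrative values only.

```python
def container_env_compatible(pool: dict, cluster: dict) -> bool:
    """True when pool and cluster agree on whether a Docker image is used."""
    pool_has_docker = bool(pool.get("preloaded_docker_images"))
    # "docker_image" is an assumed key on the cluster specification.
    cluster_has_docker = bool(cluster.get("docker_image"))
    # The images themselves do not have to match, only their presence.
    return pool_has_docker == cluster_has_docker


docker_pool = {"preloaded_docker_images": [{"url": "example/image-a:latest"}]}
plain_pool = {}

print(container_env_compatible(docker_pool, {"docker_image": {"url": "example/image-b:latest"}}))  # True
print(container_env_compatible(docker_pool, {}))                                                  # False
print(container_env_compatible(plain_pool, {"docker_image": {"url": "example/image-b:latest"}}))  # False
```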

Note

Azure Databricks may not be able to acquire some of the requested idle instances due to instance provider limitations or transient network issues. Clusters can still attach to the instance pool, but may not start as quickly.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/instance-pools/create \
--data @create-instance-pool.json

create-instance-pool.json:

{
  "instance_pool_name": "my-pool",
  "node_type_id": "Standard_D3_v2",
  "min_idle_instances": 10,
  "custom_tags": [
    {
      "key": "my-key",
      "value": "my-value"
    }
  ]
}

{ "instance_pool_id": "1234-567890-fetch12-pool-A3BcdEFg" }

Request structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List node types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags.

Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with at most one runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster as they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime versions API call.
preloaded_docker_images An array of DockerImage A list with at most one Docker image the pool installs on each instance. Pool clusters that use a preloaded Docker image start faster as they do not have to wait for the image to download. Available only if your account has Databricks Container Services enabled.
azure_attributes InstancePoolAzureAttributes Defines the instance availability type (such as spot or on-demand) and max bid price.
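Several of the constraints in the table above can be checked on the client before sending a create request. The following is a sketch only: the function name `validate_create_request` is hypothetical, it covers just the limits stated in this table, and it cannot check name uniqueness or anything else that only the service knows.

```python
from typing import List


def validate_create_request(req: dict) -> List[str]:
    """Return a list of problems found in a create request body (empty if none)."""
    problems = []

    name = req.get("instance_pool_name") or ""
    if not 0 < len(name) < 100:
        problems.append("instance_pool_name must be non-empty and under 100 characters")

    minutes = req.get("idle_instance_autotermination_minutes")
    if minutes is not None and not 0 <= minutes <= 10000:
        problems.append("idle_instance_autotermination_minutes must be between 0 and 10000")

    if len(req.get("custom_tags", [])) > 41:
        problems.append("at most 41 custom tags are allowed")

    for key in ("preloaded_spark_versions", "preloaded_docker_images"):
        if len(req.get(key, [])) > 1:
            problems.append(f"{key} accepts at most one entry")

    return problems


request = {
    "instance_pool_name": "my-pool",
    "node_type_id": "Standard_D3_v2",
    "min_idle_instances": 10,
    "custom_tags": [{"key": "my-key", "value": "my-value"}],
}
print(validate_create_request(request))  # []
```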

Response structure

Field Name Type Description
instance_pool_id STRING The ID of the created instance pool.

Edit

Endpoint HTTP Method
2.0/instance-pools/edit POST

Edit an instance pool. This modifies the configuration of an existing instance pool.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/instance-pools/edit \
--data @edit-instance-pool.json

edit-instance-pool.json:

{
  "instance_pool_id": "1234-567890-fetch12-pool-A3BcdEFg",
  "instance_pool_name": "my-edited-pool",
  "node_type_id": "Standard_D3_v2",
  "min_idle_instances": 5,
  "max_capacity": 200,
  "idle_instance_autotermination_minutes": 30
}

{}

Request structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to edit. This field is required.
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List node types API call.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
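A common pattern is to fetch a pool, change one setting, and send the result back to the edit endpoint. Because the Get response contains fields that are not part of the edit request (for example `stats` and `default_tags`), the sketch below keeps only the fields listed in the table above. The helper name `build_edit_request` is hypothetical, and whether the service rejects extra fields is not stated in this article.

```python
EDIT_FIELDS = (
    "instance_pool_id",
    "instance_pool_name",
    "min_idle_instances",
    "max_capacity",
    "node_type_id",
    "idle_instance_autotermination_minutes",
)


def build_edit_request(current: dict, **changes) -> dict:
    """Build an edit body from a Get response plus the fields to change."""
    unknown = set(changes) - set(EDIT_FIELDS)
    if unknown:
        raise ValueError(f"not editable via this sketch: {sorted(unknown)}")
    # Keep only the fields the edit request structure lists.
    body = {k: current[k] for k in EDIT_FIELDS if k in current}
    body.update(changes)
    if "instance_pool_id" not in body:
        raise ValueError("instance_pool_id is required")
    return body


current = {
    "instance_pool_id": "1234-567890-fetch12-pool-A3BcdEFg",
    "instance_pool_name": "my-pool",
    "node_type_id": "Standard_D3_v2",
    "state": "ACTIVE",
    "stats": {"used_count": 0, "idle_count": 10},
}
print(build_edit_request(current, min_idle_instances=5, max_capacity=200))
```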

Delete

Endpoint HTTP Method
2.0/instance-pools/delete POST

Delete an instance pool. This permanently deletes the instance pool. The idle instances in the pool are terminated asynchronously. New clusters cannot attach to the pool. Running clusters attached to the pool continue to run but cannot autoscale up. Terminated clusters attached to the pool will fail to start until they are edited to no longer use the pool.

Example

curl --netrc -X POST \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/instance-pools/delete \
--data '{ "instance_pool_id": "1234-567890-fetch12-pool-A3BcdEFg" }'
{}

Request structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to delete.

Get

Endpoint HTTP Method
2.0/instance-pools/get GET

Retrieve the information for an instance pool given its identifier.

Example

curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/instance-pools/get \
--data '{ "instance_pool_id": "1234-567890-fetch12-pool-A3BcdEFg" }'
{
  "instance_pool_name": "mypool",
  "node_type_id": "Standard_D3_v2",
  "custom_tags": {
    "my-key": "my-value"
  },
  "idle_instance_autotermination_minutes": 60,
  "enable_elastic_disk": false,
  "preloaded_spark_versions": [
    "5.4.x-scala2.11"
  ],
  "instance_pool_id": "101-120000-brick1-pool-ABCD1234",
  "default_tags": {
    "Vendor": "Databricks",
    "DatabricksInstancePoolCreatorId": "100125",
    "DatabricksInstancePoolId": "101-120000-brick1-pool-ABCD1234"
  },
  "state": "ACTIVE",
  "stats": {
    "used_count": 10,
    "idle_count": 5,
    "pending_used_count": 5,
    "pending_idle_count": 5
  },
  "status": {}
}

Request structure

Field Name Type Description
instance_pool_id STRING The instance pool about which to retrieve information.

Response structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List node types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags.

Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with the runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster as they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag Tags that are added by Azure Databricks regardless of any custom_tags, including:

* Vendor: Databricks
* DatabricksInstancePoolCreatorId: <create_user_id>
* DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.
status InstancePoolStatus Status about failed pending instances in the pool.

List

Endpoint HTTP Method
2.0/instance-pools/list GET

List information for all instance pools.

Example

curl --netrc -X GET \
https://adb-1234567890123456.7.azuredatabricks.net/api/2.0/instance-pools/list
{
  "instance_pools": [
    {
      "instance_pool_name": "mypool",
      "node_type_id": "Standard_D3_v2",
      "idle_instance_autotermination_minutes": 60,
      "enable_elastic_disk": false,
      "preloaded_spark_versions": [
        "5.4.x-scala2.11"
      ],
      "instance_pool_id": "101-120000-brick1-pool-ABCD1234",
      "default_tags": {
        "Vendor": "Databricks",
        "DatabricksInstancePoolCreatorId": "100125",
        "DatabricksInstancePoolId": "101-120000-brick1-pool-ABCD1234"
      },
      "state": "ACTIVE",
      "stats": {
        "used_count": 10,
        "idle_count": 5,
        "pending_used_count": 5,
        "pending_idle_count": 5
      },
      "status": {}
    },
    ...
  ]
}

Response structure

Field Name Type Description
instance_pools An array of InstancePoolAndStats A list of instance pools with their statistics included.
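As an illustration of consuming the list response, the sketch below indexes pools by ID and keeps the name, state, and idle count of each. The function name `index_pools` is hypothetical, and the input is an abbreviated copy of the example response above rather than output from a live workspace.

```python
def index_pools(list_response: dict) -> dict:
    """Map instance_pool_id -> (instance_pool_name, state, idle_count)."""
    index = {}
    for pool in list_response.get("instance_pools", []):
        idle = pool.get("stats", {}).get("idle_count", 0)
        index[pool["instance_pool_id"]] = (
            pool.get("instance_pool_name"),
            pool.get("state"),
            idle,
        )
    return index


response = {
    "instance_pools": [
        {
            "instance_pool_name": "mypool",
            "instance_pool_id": "101-120000-brick1-pool-ABCD1234",
            "state": "ACTIVE",
            "stats": {"used_count": 10, "idle_count": 5},
        }
    ]
}
print(index_pools(response))
# {'101-120000-brick1-pool-ABCD1234': ('mypool', 'ACTIVE', 5)}
```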

Data structures

InstancePoolState

The state of an instance pool. The current allowable state transitions are:

* ACTIVE -> DELETED

Name Description
ACTIVE Indicates an instance pool is active. Clusters can attach to it.
DELETED Indicates the instance pool has been deleted and is no longer accessible.
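The two states and the single allowed transition can be modeled on the client side as follows. This is a sketch; the names `ALLOWED_TRANSITIONS` and `can_transition` are hypothetical and are not part of the API.

```python
from enum import Enum


class InstancePoolState(Enum):
    ACTIVE = "ACTIVE"
    DELETED = "DELETED"


# The only transition listed above is ACTIVE -> DELETED.
ALLOWED_TRANSITIONS = {
    InstancePoolState.ACTIVE: {InstancePoolState.DELETED},
    InstancePoolState.DELETED: set(),
}


def can_transition(src: InstancePoolState, dst: InstancePoolState) -> bool:
    """True if moving from src to dst is a listed state transition."""
    return dst in ALLOWED_TRANSITIONS[src]


print(can_transition(InstancePoolState.ACTIVE, InstancePoolState.DELETED))  # True
print(can_transition(InstancePoolState.DELETED, InstancePoolState.ACTIVE))  # False
```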

InstancePoolStats

Statistics about the usage of the instance pool.

Field Name Type Description
used_count INT32 Number of active instances that are in use by a cluster.
idle_count INT32 Number of active instances that are not in use by a cluster.
pending_used_count INT32 Number of pending instances that are assigned to a cluster.
pending_idle_count INT32 Number of pending instances that are not assigned to a cluster.
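The four counters can be combined to estimate how many instances a pool currently holds and how much room remains under max_capacity. The sketch below assumes that pending instances count toward max_capacity, which this article does not state explicitly; the function names are hypothetical.

```python
def total_instances(stats: dict) -> int:
    """Sum of active and pending instances, whether in use or idle."""
    keys = ("used_count", "idle_count", "pending_used_count", "pending_idle_count")
    return sum(stats.get(k, 0) for k in keys)


def headroom(max_capacity: int, stats: dict) -> int:
    """Instances that could still be added, assuming pending ones count."""
    return max(max_capacity - total_instances(stats), 0)


stats = {
    "used_count": 10,
    "idle_count": 5,
    "pending_used_count": 5,
    "pending_idle_count": 5,
}
print(total_instances(stats))  # 25
print(headroom(200, stats))    # 175
```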

InstancePoolStatus

Status about failed pending instances in the pool.

Field Name Type Description
pending_instance_errors An array of PendingInstanceError List of error messages for the failed pending instances.

PendingInstanceError

Error message of a failed pending instance.

Field Name Type Description
instance_id STRING ID of the failed instance.
message STRING Message describing the cause of the failure.
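A small sketch of how the nested status structure might be flattened for logging. The helper name `pending_errors` is hypothetical, and the instance ID and message in the sample are made-up values, not real service output.

```python
from typing import List


def pending_errors(pool_info: dict) -> List[str]:
    """Format each PendingInstanceError in a pool's status as 'id: message'."""
    errors = pool_info.get("status", {}).get("pending_instance_errors", [])
    return [f"{e.get('instance_id', '?')}: {e.get('message', '')}" for e in errors]


# Hypothetical sample; real IDs and messages come from the service.
pool_info = {
    "status": {
        "pending_instance_errors": [
            {"instance_id": "example-instance-1", "message": "example failure reason"}
        ]
    }
}
print(pending_errors(pool_info))         # ['example-instance-1: example failure reason']
print(pending_errors({"status": {}}))    # []
```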

DiskSpec

Describes the initial set of disks to attach to each instance. For example, if there are 3 instances and each instance is configured to start with 2 disks, 100 GiB each, then Azure Databricks creates a total of 6 disks, 100 GiB each, for these instances.

Field Name Type Description
disk_type DiskType The type of disks to attach.
disk_count INT32 The number of disks to attach to each instance:

* This feature is only enabled for supported node types.
* Users can choose up to the limit of the disks supported by the node type.
* For node types with no local disk, at least one disk needs to be specified.
disk_size INT32 The size of each disk (in GiB) to attach. Values must fall into the supported range for a particular instance type:

* Premium LRS (SSD): 1 - 1023 GiB
* Standard LRS (HDD): 1 - 1023 GiB
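The arithmetic in the DiskSpec description (3 instances with 2 disks of 100 GiB each gives 6 disks) and the disk size range can be sketched as below. The function names are hypothetical, and the per-node-type limit on disk_count is not modeled because it depends on the node type.

```python
def validate_disk_size(disk_spec: dict) -> None:
    """Raise ValueError if disk_size is outside the documented 1 - 1023 GiB range."""
    size = disk_spec["disk_size"]
    if not 1 <= size <= 1023:
        raise ValueError(f"disk_size {size} GiB is outside 1 - 1023 GiB")


def total_storage(num_instances: int, disk_spec: dict) -> tuple:
    """Return (number of disks, total GiB) created for num_instances instances."""
    disks = num_instances * disk_spec["disk_count"]
    return disks, disks * disk_spec["disk_size"]


spec = {
    "disk_type": {"azure_disk_volume_type": "PREMIUM_LRS"},
    "disk_count": 2,
    "disk_size": 100,
}
validate_disk_size(spec)
print(total_storage(3, spec))  # (6, 600)
```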

DiskType

Describes the type of disk.

Field Name Type Description
azure_disk_volume_type AzureDiskVolumeType The type of Azure disk to use.

InstancePoolAndStats

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List node types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags.

Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with the runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster as they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag Tags that are added by Azure Databricks regardless of any custom_tags, including:

* Vendor: Databricks
* DatabricksInstancePoolCreatorId: <create_user_id>
* DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.

AzureDiskVolumeType

All Azure Disk types that Azure Databricks supports. See https://docs.microsoft.com/azure/virtual-machines/linux/disks-types

Name Description
PREMIUM_LRS Premium storage tier, backed by SSDs.
STANDARD_LRS Standard storage tier, backed by HDDs.

InstancePoolAzureAttributes

Azure-related attributes set during instance pool creation.

Field Name Type Description
availability AzureAvailability Availability type used for all subsequent nodes.
spot_bid_max_price DOUBLE The max bid price used for Azure spot instances. You can set this to greater than or equal to the current spot price. You can also set this to -1 (the default), which specifies that the instance cannot be evicted on the basis of price. The price for the instance will be the current price for spot instances or the price for a standard instance. You can view historical pricing and eviction rates in the Azure portal.
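The rule for spot_bid_max_price can be expressed as a short check: the value is either -1 or at least the current spot price. This is a sketch of the rule as worded above, not of the service's actual validation; the function name is hypothetical and the prices are illustrative numbers.

```python
def spot_bid_is_acceptable(bid: float, current_spot_price: float) -> bool:
    """True if bid is -1 (never evict on price) or at least the current spot price."""
    if bid == -1:
        return True
    return bid >= current_spot_price


print(spot_bid_is_acceptable(-1, 0.40))    # True
print(spot_bid_is_acceptable(0.50, 0.40))  # True
print(spot_bid_is_acceptable(0.30, 0.40))  # False
```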