Instance Pools API

The Instance Pools API allows you to create, edit, delete, get, and list instance pools.

An instance pool reduces cluster start and auto-scaling times by maintaining a set of idle, ready-to-use cloud instances. When a cluster attached to a pool needs an instance, it first attempts to allocate one of the pool’s idle instances. If the pool has no idle instances, it expands by allocating a new instance from the instance provider in order to accommodate the cluster’s request. When a cluster releases an instance, it returns to the pool and is free for another cluster to use. Only clusters attached to a pool can use that pool’s idle instances.

Azure Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply; see pricing.

Create

Endpoint HTTP Method
2.0/instance-pools/create POST

Create an instance pool. Use the returned instance_pool_id to query the status of the instance pool, which includes the number of instances currently allocated by the pool. If you provide the min_idle_instances parameter, instances are provisioned in the background and are ready to use once the idle_count in the InstancePoolStats equals the requested minimum.

Note

Azure Databricks may not be able to acquire some of the requested idle instances due to instance provider limitations or transient network issues. Clusters can still attach to the instance pool, but may not start as quickly.
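The readiness condition described above can be sketched as a small check on the pool information returned by the Get call. This is an illustrative sketch, not part of the API: the helper name pool_is_warm is hypothetical, and it uses >= rather than strict equality on the assumption that a pool may briefly hold more idle instances than the requested minimum.

```python
def pool_is_warm(pool_info: dict, min_idle_instances: int) -> bool:
    """Return True once the pool reports at least the requested idle instances."""
    # "stats" follows the InstancePoolStats structure; missing values count as 0.
    stats = pool_info.get("stats", {})
    return stats.get("idle_count", 0) >= min_idle_instances


# Illustrative pool snapshots shaped like the Get response.
warming = {"stats": {"idle_count": 4, "pending_idle_count": 6}}
ready = {"stats": {"idle_count": 10, "pending_idle_count": 0}}

print(pool_is_warm(warming, 10))  # False
print(pool_is_warm(ready, 10))    # True
```

Because some idle instances may never be acquired, a caller that polls with this check would typically stop after a timeout instead of waiting indefinitely.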

An example request:

{
  "instance_pool_name": "my-pool",
  "node_type_id": "Standard_D3_v2",
  "min_idle_instances": 10
}

And response:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234"
}
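As a sketch, the example request could be prepared from Python using only the standard library. Several details here are assumptions that do not appear in this section: the workspace URL and token are placeholders, the /api/ path prefix and the bearer-token Authorization header are assumed, and build_create_request is a hypothetical helper name. The request is built but not sent.

```python
import json
import urllib.request


def build_create_request(host: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to 2.0/instance-pools/create."""
    # Assumption: endpoints are served under <host>/api/.
    url = f"{host.rstrip('/')}/api/2.0/instance-pools/create"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: authentication uses a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request(
    "https://example.azuredatabricks.net",  # placeholder workspace URL
    "dapi-EXAMPLE",                         # placeholder token
    {
        "instance_pool_name": "my-pool",
        "node_type_id": "Standard_D3_v2",
        "min_idle_instances": 10,
    },
)

# Sending the request (not done here) would look like:
# with urllib.request.urlopen(req) as resp:
#     instance_pool_id = json.load(resp)["instance_pool_id"]
```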

Request Structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags. Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with the runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster because they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime Versions API call.
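Some of the constraints in the table above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: validate_pool_request is a hypothetical helper, the 0 to 10000 range is treated as inclusive, and name uniqueness is not checked because it can only be verified by the service.

```python
def validate_pool_request(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    name = payload.get("instance_pool_name", "")
    if not name:
        problems.append("instance_pool_name must be non-empty")
    elif len(name) >= 100:
        problems.append("instance_pool_name must be less than 100 characters")

    minutes = payload.get("idle_instance_autotermination_minutes")
    # Assumption: both bounds of the documented range are allowed.
    if minutes is not None and not 0 <= minutes <= 10000:
        problems.append("idle_instance_autotermination_minutes must be between 0 and 10000")

    if len(payload.get("custom_tags", [])) > 41:
        problems.append("at most 41 custom tags are allowed")

    return problems


print(validate_pool_request({"instance_pool_name": "my-pool"}))  # []
```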

Response Structure

Field Name Type Description
instance_pool_id STRING The ID of the created instance pool.

Edit

Endpoint HTTP Method
2.0/instance-pools/edit POST

Edit an instance pool. This modifies the configuration of an existing instance pool.

Note

  • You can edit only the following fields: instance_pool_name, min_idle_instances, max_capacity, and idle_instance_autotermination_minutes.
  • You must supply an instance_pool_name.
  • You must supply a node_type_id and it must match the original node_type_id.

An example request:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
  "instance_pool_name": "my-edited-pool",
  "node_type_id": "Standard_D3_v2",
  "min_idle_instances": 5,
  "max_capacity": 200,
  "idle_instance_autotermination_minutes": 30
}
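An edit request like the one above can be derived from the pool's current configuration so that instance_pool_name and the unchanged node_type_id are always supplied. This is a sketch only: build_edit_payload is a hypothetical helper, and carrying over the current values of editable fields is a precaution, since this section does not state how omitted fields are treated.

```python
EDITABLE_FIELDS = {
    "instance_pool_name",
    "min_idle_instances",
    "max_capacity",
    "idle_instance_autotermination_minutes",
}


def build_edit_payload(current: dict, **changes) -> dict:
    """Build an edit request body from a Get response plus the desired changes."""
    not_editable = set(changes) - EDITABLE_FIELDS
    if not_editable:
        raise ValueError(f"fields cannot be edited: {sorted(not_editable)}")

    # node_type_id must be supplied and must match the original value.
    payload = {
        "instance_pool_id": current["instance_pool_id"],
        "node_type_id": current["node_type_id"],
    }
    # Carry over current values of editable fields, then apply the changes.
    for field in EDITABLE_FIELDS:
        if field in current:
            payload[field] = current[field]
    payload.update(changes)

    if "instance_pool_name" not in payload:
        raise ValueError("instance_pool_name must be supplied")
    return payload


current = {
    "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
    "instance_pool_name": "my-pool",
    "node_type_id": "Standard_D3_v2",
    "min_idle_instances": 10,
    "idle_instance_autotermination_minutes": 60,
    "stats": {"used_count": 10, "idle_count": 5},
}
payload = build_edit_payload(
    current, instance_pool_name="my-edited-pool", min_idle_instances=5, max_capacity=200
)
```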

Request Structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to edit. This field is required.
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.

Delete

Endpoint HTTP Method
2.0/instance-pools/delete POST

Delete an instance pool. This permanently deletes the instance pool. The idle instances in the pool are terminated asynchronously. New clusters cannot attach to the pool. Running clusters attached to the pool continue to run but cannot autoscale up. Terminated clusters attached to the pool will fail to start until they are edited to no longer use the pool.

An example request:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234"
}

Request Structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to delete.

Get

Endpoint HTTP Method
2.0/instance-pools/get GET

Retrieve the information for an instance pool given its identifier.

An example request:

/instance-pools/get?instance_pool_id=0101-120000-brick1-pool-ABCD1234
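Because Get takes its parameter in the query string, the URL can be assembled with standard URL encoding. In this sketch the workspace URL is a placeholder, the /api/ path prefix is an assumption, and build_get_url is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_get_url(host: str, instance_pool_id: str) -> str:
    """Return the URL for 2.0/instance-pools/get with an encoded query string."""
    query = urlencode({"instance_pool_id": instance_pool_id})
    # Assumption: endpoints are served under <host>/api/.
    return f"{host.rstrip('/')}/api/2.0/instance-pools/get?{query}"


url = build_get_url(
    "https://example.azuredatabricks.net",  # placeholder workspace URL
    "0101-120000-brick1-pool-ABCD1234",
)
print(url)
```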

An example response:

{
  "instance_pool_name": "mypool",
  "node_type_id": "Standard_D3_v2",
  "idle_instance_autotermination_minutes": 60,
  "enable_elastic_disk": false,
  "preloaded_spark_versions": [
    "5.4.x-scala2.11"
  ],
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
  "default_tags": {
    "Vendor": "Databricks",
    "DatabricksInstancePoolCreatorId": "100125",
    "DatabricksInstancePoolId": "0101-120000-brick1-pool-ABCD1234"
  },
  "state": "ACTIVE",
  "stats": {
    "used_count": 10,
    "idle_count": 5,
    "pending_used_count": 5,
    "pending_idle_count": 5
  },
  "status": {}
}

Request Structure

Field Name Type Description
instance_pool_id STRING The instance pool about which to retrieve information.

Response Structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags. Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with the runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster because they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime Versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag Tags that are added by Azure Databricks regardless of any custom_tags, including:

* Vendor: Databricks
* DatabricksInstancePoolCreatorId: <create_user_id>
* DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.
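The max_capacity and stats fields can be combined to estimate how many more instances a pool can provide before clusters can no longer be created or scaled up. This is a sketch with one notable assumption: pending instances are counted toward capacity, which this section does not state explicitly. The helper name remaining_capacity is hypothetical.

```python
def remaining_capacity(pool: dict):
    """Return the number of instances left before max_capacity, or None if no limit is set."""
    max_capacity = pool.get("max_capacity")
    if max_capacity is None:
        return None

    stats = pool.get("stats", {})
    # Assumption: pending instances count toward the pool's capacity.
    allocated = sum(
        stats.get(key, 0)
        for key in ("used_count", "idle_count", "pending_used_count", "pending_idle_count")
    )
    return max(max_capacity - allocated, 0)


pool = {
    "max_capacity": 200,
    "stats": {
        "used_count": 10,
        "idle_count": 5,
        "pending_used_count": 5,
        "pending_idle_count": 5,
    },
}
print(remaining_capacity(pool))  # 175
```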

List

Endpoint HTTP Method
2.0/instance-pools/list GET

List information for all instance pools.

An example response:

{
  "instance_pools": [
    {
      "instance_pool_name": "my-pool",
      "min_idle_instances": 10,
      "node_type_id": "Standard_D3_v2",
      "idle_instance_autotermination_minutes": 60,
      "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
      "default_tags": {
        "DatabricksInstancePoolCreatorId": "1234",
        "DatabricksInstancePoolId": "0101-120000-brick1-pool-ABCD1234"
      },
      "stats": {
        "used_count": 10,
        "idle_count": 5,
        "pending_used_count": 5,
        "pending_idle_count": 5
      }
    }
  ]
}

Response Structure

Field Name Type Description
instance_pools An array of InstancePoolAndStats A list of instance pools with their statistics included.
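Since pool names must be unique, a List response can be turned into a name-to-ID lookup. The sketch below assumes a response shaped like the example above; pool_ids_by_name is a hypothetical helper.

```python
def pool_ids_by_name(list_response: dict) -> dict:
    """Map each instance_pool_name in a List response to its instance_pool_id."""
    return {
        pool["instance_pool_name"]: pool["instance_pool_id"]
        for pool in list_response.get("instance_pools", [])
    }


list_response = {
    "instance_pools": [
        {
            "instance_pool_name": "my-pool",
            "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
            "node_type_id": "Standard_D3_v2",
        }
    ]
}
print(pool_ids_by_name(list_response)["my-pool"])  # 0101-120000-brick1-pool-ABCD1234
```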

Data Structures

InstancePoolState

The state of an instance pool. The current allowable state transitions are as follows:

  • ACTIVE -> DELETED

Name Description
ACTIVE Indicates an instance pool is active. Clusters can attach to it.
DELETED Indicates the instance pool has been deleted and is no longer accessible.

InstancePoolStats

Statistics about the usage of the instance pool.

Field Name Type Description
used_count INT32 Number of active instances that are in use by a cluster.
idle_count INT32 Number of active instances that are not in use by a cluster.
pending_used_count INT32 Number of pending instances that are assigned to a cluster.
pending_idle_count INT32 Number of pending instances that are not assigned to a cluster.

DiskSpec

Describes the initial set of disks to attach to each instance. For example, if there are 3 instances and each instance is configured to start with 2 disks, 100 GiB each, then Azure Databricks creates a total of 6 disks, 100 GiB each, for these instances.

Field Name Type Description
disk_type DiskType The type of disks to attach.
disk_count INT32 The number of disks to attach to each instance:

* This feature is only enabled for supported node types.
* Users can choose up to the limit of the disks supported by the node type.
* For node types with no local disk, at least one disk needs to be specified.
disk_size INT32 The size of each disk (in GiB) to attach. Values must fall into the supported range for a particular instance type:

* Premium LRS (SSD): 1 - 1023 GiB
* Standard LRS (HDD): 1 - 1023 GiB
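The arithmetic in the DiskSpec description (3 instances with 2 disks of 100 GiB each) can be sketched as follows. The helper name total_disk_footprint is hypothetical, and the size check simply applies the 1 - 1023 GiB range listed above to both disk types.

```python
def total_disk_footprint(instance_count: int, disk_spec: dict) -> tuple:
    """Return (total number of disks, total GiB) created for the given instances."""
    size = disk_spec["disk_size"]
    if not 1 <= size <= 1023:
        raise ValueError("disk_size must be between 1 and 1023 GiB")

    disks = instance_count * disk_spec["disk_count"]
    return disks, disks * size


spec = {
    "disk_type": {"azure_disk_volume_type": "PREMIUM_LRS"},
    "disk_count": 2,
    "disk_size": 100,
}
print(total_disk_footprint(3, spec))  # (6, 600)
```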

DiskType

Describes the type of disk.

Field Name Type Description
azure_disk_volume_type AzureDiskVolumeType The type of Azure disk to use.

InstancePoolAndStats

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag Additional tags for instance pool resources. Azure Databricks tags all pool resources (e.g. VM disk volumes) with these tags in addition to default_tags. Azure Databricks allows up to 41 custom tags.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list with the runtime version the pool installs on each instance. Pool clusters that use a preloaded runtime version start faster because they do not have to wait for the image to download. You can retrieve a list of available runtime versions by using the Runtime Versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag Tags that are added by Azure Databricks regardless of any custom_tags, including:

* Vendor: Databricks
* DatabricksInstancePoolCreatorId: <create_user_id>
* DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.

AzureDiskVolumeType

All Azure Disk types that Azure Databricks supports. See https://docs.microsoft.com/azure/virtual-machines/linux/disks-types

Name Description
PREMIUM_LRS Premium storage tier, backed by SSDs.
STANDARD_LRS Standard storage tier, backed by HDDs.