HDInsightCompute Class

Manages an HDInsight cluster compute target in Azure Machine Learning.

Azure HDInsight is a popular platform for big-data analytics. The platform provides Apache Spark, which can be used to train your model. For more information, see What are compute targets in Azure Machine Learning?

The constructor is inherited from ComputeTarget. It retrieves a cloud representation of the Compute object associated with the provided workspace and returns an instance of the child class corresponding to the specific type of the retrieved Compute object.

Inheritance
HDInsightCompute

Constructor

HDInsightCompute(workspace, name)

Parameters

Name Description
workspace
Required

The workspace object containing the HDInsightCompute object to retrieve.

name
Required
str

The name of the HDInsightCompute object to retrieve.

Remarks

The following sample shows how to attach an existing Spark for HDInsight cluster in Azure to a workspace.


   from azureml.core import Workspace
   from azureml.core.compute import ComputeTarget, HDInsightCompute
   from azureml.exceptions import ComputeTargetException
   import os

   # Load the workspace. This assumes a workspace config.json file is available locally.
   ws = Workspace.from_config()

   # To connect with an SSH key instead of a username/password, provide the
   # private_key_file and private_key_passphrase parameters.
   #
   # Attaching an HDInsight cluster using its public address is no longer supported.
   # Instead, use the resourceId of the HDInsight cluster, which has the following format:
   # /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>
   # You can also pass subscription_id, resource_group, and cluster_name without constructing the resourceId.
   try:
       attach_config = HDInsightCompute.attach_configuration(resource_id='<resource_id>',
                                                             ssh_port=22,
                                                             username=os.environ.get('hdiusername', '<ssh_username>'),
                                                             password=os.environ.get('hdipassword', '<my_password>'))

       hdi_compute = ComputeTarget.attach(workspace=ws,
                                          name='myhdi',
                                          attach_configuration=attach_config)

       # Wait inside the try block: hdi_compute is only defined if the attach call succeeded.
       hdi_compute.wait_for_completion(show_output=True)

   except ComputeTargetException as e:
       print("Caught = {}".format(e.message))

Full sample is available from https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb

Methods

attach

DEPRECATED. Use the attach_configuration method instead.

Associate an existing HDI resource with the provided workspace.

attach_configuration

Create a configuration object for attaching an HDInsight compute target.

Attaching an HDInsight cluster using its public address is no longer supported. Instead, use the resourceId of the HDInsight cluster. The resourceId can be constructed using the following string format: "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>".

You can also pass subscription_id, resource_group, and cluster_name without constructing the resourceId. For more details, see https://aka.ms/azureml-compute-hdi

delete

Delete is not supported for HDInsightCompute objects. Use detach instead.

deserialize

Convert a JSON object into an HDInsightCompute object.

detach

Detaches the HDInsightCompute object from its associated workspace.

Underlying cloud objects are not deleted; only the association is removed.

get_credentials

Retrieve the credentials for the HDInsightCompute target.

refresh_state

Perform an in-place update of the properties of the object.

This method updates the properties based on the current state of the corresponding cloud object. This is primarily used for manual polling of compute state.

serialize

Convert this HDInsightCompute object into a JSON serialized dictionary.

attach

DEPRECATED. Use the attach_configuration method instead.

Associate an existing HDI resource with the provided workspace.

static attach(workspace, name, username, address, ssh_port='22', password='', private_key_file='', private_key_passphrase='')

Parameters

Name Description
workspace
Required

The workspace object to associate the compute resource with.

name
Required
str

The name to associate with the compute resource inside the provided workspace. Does not have to match the name of the compute resource to be attached.

username
Required
str

The username needed to access the resource.

address
Required
str

The address of the resource to be attached.

ssh_port
int

The exposed port for the resource. Defaults to 22.

default value: 22
password
str

The password needed to access the resource.

default value: ''
private_key_file
str

The path to a file containing the private key for the resource.

default value: ''
private_key_passphrase
str

The private key passphrase needed to access the resource.

default value: ''

Returns

Type Description

An HDInsightCompute object representation of the compute object.

Exceptions

Type Description

attach_configuration

Create a configuration object for attaching an HDInsight compute target.

Attaching an HDInsight cluster using its public address is no longer supported. Instead, use the resourceId of the HDInsight cluster. The resourceId can be constructed using the following string format: "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>".

You can also pass subscription_id, resource_group, and cluster_name without constructing the resourceId. For more details, see https://aka.ms/azureml-compute-hdi
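The resourceId format described above can be assembled with a small helper. This is a minimal sketch, not part of the SDK: the function name build_hdinsight_resource_id and its validation rules are assumptions made for illustration, and only the string format itself comes from this page.

```python
def build_hdinsight_resource_id(subscription_id: str,
                                resource_group: str,
                                cluster_name: str) -> str:
    """Return the ARM resource ID string for an HDInsight cluster.

    Hypothetical helper: it only formats the string and does not
    check that the cluster actually exists in Azure.
    """
    parts = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "cluster_name": cluster_name,
    }
    for label, value in parts.items():
        # Reject empty segments and embedded slashes, which would
        # silently produce a malformed resource ID.
        if not value or "/" in value:
            raise ValueError("invalid {}: {!r}".format(label, value))

    return (
        "/subscriptions/{}"
        "/resourceGroups/{}"
        "/providers/Microsoft.HDInsight/clusters/{}"
    ).format(subscription_id, resource_group, cluster_name)


# Placeholder values; substitute your own identifiers.
resource_id = build_hdinsight_resource_id(
    "<subscription_id>", "<resource_group>", "<cluster_name>"
)
```

The resulting string can be passed as the resource_id argument of attach_configuration. Passing subscription_id, resource_group, and cluster_name directly avoids the need for a helper like this.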

static attach_configuration(username, subscription_id=None, resource_group=None, cluster_name=None, resource_id=None, address=None, ssh_port='22', password='', private_key_file='', private_key_passphrase='')

Parameters

Name Description
username
Required
str

The username needed to access the resource.

subscription_id
str

The Azure subscription ID.

default value: None
resource_group
str

The name of the resource group in which the HDI cluster is located.

default value: None
cluster_name
str

The HDI cluster name.

default value: None
resource_id
str

The Azure Resource Manager (ARM) resource ID for the resource to be attached.

default value: None
address
str

The address for the resource to be attached.

default value: None
ssh_port
int

The exposed port for the resource. Defaults to 22.

default value: 22
password
str

The password needed to access the resource.

default value: ''
private_key_file
str

The path to a file containing the private key for the resource.

default value: ''
private_key_passphrase
str

The private key passphrase needed to access the resource.

default value: ''

Returns

Type Description

A configuration object to be used when attaching a Compute object.

Exceptions

Type Description
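Because the credentials can be supplied either as a password or as an SSH private key, it can help to collect the keyword arguments in one place before calling attach_configuration. The sketch below is an illustration under stated assumptions: build_attach_kwargs is a hypothetical helper, and the hdikeypassphrase environment variable is invented for this example (hdiusername and hdipassword follow the sample in the Remarks section).

```python
import os
from typing import Dict, Mapping, Optional


def build_attach_kwargs(resource_id: str,
                        env: Optional[Mapping[str, str]] = None,
                        private_key_file: Optional[str] = None) -> Dict[str, object]:
    """Collect keyword arguments for HDInsightCompute.attach_configuration.

    Hypothetical helper: uses SSH key credentials when private_key_file
    is given, otherwise falls back to username/password.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, object] = {
        "resource_id": resource_id,
        "ssh_port": 22,
        "username": env.get("hdiusername", "<ssh_username>"),
    }
    if private_key_file:
        kwargs["private_key_file"] = private_key_file
        # 'hdikeypassphrase' is an assumed variable name, not an SDK convention.
        kwargs["private_key_passphrase"] = env.get("hdikeypassphrase", "")
    else:
        kwargs["password"] = env.get("hdipassword", "<my_password>")
    return kwargs


# Password-based credentials read from the environment, with placeholders as fallback.
attach_kwargs = build_attach_kwargs("<resource_id>")
```

With the SDK installed, the dictionary can then be expanded into the call, for example HDInsightCompute.attach_configuration(**attach_kwargs).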

delete

Delete is not supported for HDInsightCompute objects. Use detach instead.

delete()

Exceptions

Type Description

deserialize

Convert a JSON object into an HDInsightCompute object.

static deserialize(workspace, object_dict)

Parameters

Name Description
workspace
Required

The workspace object the HDInsightCompute object is associated with.

object_dict
Required

A JSON object to convert to an HDInsightCompute object.

Returns

Type Description

The HDInsightCompute representation of the provided JSON object.

Exceptions

Type Description

Remarks

Raises a ComputeTargetException if the provided workspace is not the workspace the Compute is associated with.

detach

Detaches the HDInsightCompute object from its associated workspace.

Underlying cloud objects are not deleted; only the association is removed.

detach()

Exceptions

Type Description

get_credentials

Retrieve the credentials for the HDInsightCompute target.

get_credentials()

Returns

Type Description

The credentials for the HDInsightCompute target.

Exceptions

Type Description

refresh_state

Perform an in-place update of the properties of the object.

This method updates the properties based on the current state of the corresponding cloud object. This is primarily used for manual polling of compute state.

refresh_state()

Exceptions

Type Description
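Manual polling with refresh_state might look like the loop below. This is a sketch under assumptions: wait_for_state is a hypothetical helper, it assumes the compute object exposes a provisioning_state attribute, and the terminal state names are illustrative defaults that you should adjust to the values your workspace reports.

```python
import time


def wait_for_state(compute,
                   terminal_states=("Succeeded", "Failed", "Canceled"),
                   interval_seconds=5.0,
                   timeout_seconds=300.0,
                   sleep=time.sleep):
    """Poll compute.refresh_state() until a terminal state is reached.

    Hypothetical helper. 'compute' is any object with a refresh_state()
    method and a provisioning_state attribute (assumed here).
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Update the local object in place from the cloud object.
        compute.refresh_state()
        state = getattr(compute, "provisioning_state", None)
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "compute still in state {!r} after {} seconds".format(
                    state, timeout_seconds))
        sleep(interval_seconds)
```

For example, wait_for_state(hdi_compute, interval_seconds=10) would block until the attached target reports one of the listed states. When progress output is all you need, wait_for_completion(show_output=True) from the Remarks sample is the simpler choice.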

serialize

Convert this HDInsightCompute object into a JSON serialized dictionary.

serialize()

Returns

Type Description

The JSON representation of this HDInsightCompute object.

Exceptions

Type Description
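Since serialize returns a dictionary, its output can be written out as JSON text for logging or inspection. The following is a minimal sketch: dump_compute_snapshot is a hypothetical helper, and it makes no assumption about which keys the dictionary contains.

```python
import json


def dump_compute_snapshot(compute, indent=2):
    """Return the serialized compute target as a JSON string.

    Hypothetical helper. 'compute' is any object whose serialize()
    method returns a dictionary, such as an HDInsightCompute object.
    """
    payload = compute.serialize()
    # default=str guards against values that json cannot encode natively.
    return json.dumps(payload, indent=indent, sort_keys=True, default=str)
```

A dictionary recovered with json.loads from this text could later be passed to deserialize together with the owning workspace, provided no values were altered by the default=str fallback.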