HDInsightCompute Class

Manages an HDInsight cluster compute target in Azure Machine Learning.

Azure HDInsight is a popular platform for big-data analytics. The platform provides Apache Spark, which can be used to train your model. For more information, see "What are compute targets in Azure Machine Learning?"

The constructor is inherited from the ComputeTarget class.

It retrieves a cloud representation of a Compute object associated with the provided workspace and returns an instance of the child class corresponding to the specific type of the retrieved Compute object.

Inheritance
ComputeTarget
HDInsightCompute

Constructor

HDInsightCompute(workspace, name)

Parameters

workspace
Workspace
Required

The workspace object containing the HDInsightCompute object to retrieve.

name
str
Required

The name of the HDInsightCompute object to retrieve.

Remarks

The following sample shows how to attach an existing HDInsight Spark cluster to a workspace as a compute target.


   import os

   from azureml.core import Workspace
   from azureml.core.compute import ComputeTarget, HDInsightCompute
   from azureml.exceptions import ComputeTargetException

   # Load the workspace described by a local config.json file.
   ws = Workspace.from_config()

   try:
       # Attaching an HDInsight cluster by its public address is no longer supported.
       # Use the resource ID of the cluster instead, in the following format:
       # /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>
       # Alternatively, pass subscription_id, resource_group and cluster_name instead of resource_id.
       # To authenticate with an SSH key instead of a username and password,
       # pass private_key_file and private_key_passphrase instead of password.
       attach_config = HDInsightCompute.attach_configuration(resource_id='<resource_id>',
                                                             ssh_port=22,
                                                             username=os.environ.get('hdiusername', '<ssh_username>'),
                                                             password=os.environ.get('hdipassword', '<my_password>'))

       hdi_compute = ComputeTarget.attach(workspace=ws,
                                          name='myhdi',
                                          attach_configuration=attach_config)

       # Block until the attach operation finishes, printing progress.
       # This runs only if the attach call above succeeded.
       hdi_compute.wait_for_completion(show_output=True)

   except ComputeTargetException as e:
       print("Caught = {}".format(e))

The full sample is available at https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training/train-in-spark/train-in-spark.ipynb
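The comments in the sample describe the resource ID format that replaces attaching by public address. As a minimal sketch, the hypothetical helper `hdinsight_resource_id` below assembles that string from its three parts; the function name and the validation rule (no empty parts, no "/" inside a part) are assumptions of this example, not part of the SDK.

```python
def hdinsight_resource_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Hypothetical helper: build the ARM resource ID of an HDInsight cluster.

    Uses the documented format:
    /subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>
    """
    for label, value in (("subscription_id", subscription_id),
                         ("resource_group", resource_group),
                         ("cluster_name", cluster_name)):
        # Assumed rule for this sketch: reject empty parts and parts that
        # contain "/", since either would corrupt the resource path.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return (f"/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.HDInsight/clusters/{cluster_name}")


print(hdinsight_resource_id("00000000-0000-0000-0000-000000000000", "my-rg", "myhdi"))
# prints /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-rg/providers/Microsoft.HDInsight/clusters/myhdi
```

The resulting string can then be passed as `resource_id='...'` in place of the `'<resource_id>'` placeholder in the sample.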

Methods

attach

DEPRECATED. Use the attach_configuration method instead.

Associate an existing HDI resource with the provided workspace.

attach_configuration

Create a configuration object for attaching an HDInsight compute target.

Attaching an HDInsight cluster by its public address is no longer supported. Instead, use the resource ID of the HDInsight cluster, which has the following format: "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>".

You can also pass subscription_id, resource_group and cluster_name instead of constructing the resource ID. For more details, see https://aka.ms/azureml-compute-hdi

delete

Delete is not supported for an HDInsightCompute object. Use detach instead.

deserialize

Convert a JSON object into an HDInsightCompute object.

detach

Detach the HDInsightCompute object from its associated workspace.

Underlying cloud objects are not deleted; only the association is removed.

get_credentials

Retrieve the credentials for the HDInsightCompute target.

refresh_state

Perform an in-place update of the properties of the object.

This method updates the properties based on the current state of the corresponding cloud object. This is primarily used for manual polling of compute state.

serialize

Convert this HDInsightCompute object into a JSON serialized dictionary.

attach

DEPRECATED. Use the attach_configuration method instead.

Associate an existing HDI resource with the provided workspace.

static attach(workspace, name, username, address, ssh_port='22', password='', private_key_file='', private_key_passphrase='')

Parameters

workspace
Workspace
Required

The workspace object to associate the compute resource with.

name
str
Required

The name to associate with the compute resource inside the provided workspace. Does not have to match the name of the compute resource to be attached.

username
str
Required

The username needed to access the resource.

address
str
Required

The address of the resource to be attached.

ssh_port
int
default value: 22

The exposed port for the resource. Defaults to 22.

password
str
default value: ''

The password needed to access the resource.

private_key_file
str
default value: ''

The path to a file containing the private key for the resource.

private_key_passphrase
str
default value: ''

The passphrase for the private key.

Returns

An HDInsightCompute object representation of the compute object.

Return type

HDInsightCompute

Exceptions

attach_configuration

Create a configuration object for attaching an HDInsight compute target.

Attaching an HDInsight cluster by its public address is no longer supported. Instead, use the resource ID of the HDInsight cluster, which has the following format: "/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.HDInsight/clusters/<cluster_name>".

You can also pass subscription_id, resource_group and cluster_name instead of constructing the resource ID. For more details, see https://aka.ms/azureml-compute-hdi

static attach_configuration(username, subscription_id=None, resource_group=None, cluster_name=None, resource_id=None, address=None, ssh_port='22', password='', private_key_file='', private_key_passphrase='')

Parameters

username
str
Required

The username needed to access the resource.

subscription_id
str
default value: None

The Azure subscription ID.

resource_group
str
default value: None

The name of the resource group in which the HDI cluster is located.

cluster_name
str
default value: None

The name of the HDI cluster.

resource_id
str
default value: None

The Azure Resource Manager (ARM) resource ID for the resource to be attached.

address
str
default value: None

The address of the resource to be attached. Attaching by public address is no longer supported; use resource_id instead.

ssh_port
int
default value: 22

The exposed port for the resource. Defaults to 22.

password
str
default value: ''

The password needed to access the resource.

private_key_file
str
default value: ''

The path to a file containing the private key for the resource.

private_key_passphrase
str
default value: ''

The passphrase for the private key.

Returns

A configuration object to be used when attaching a Compute object.

Return type

HDInsightAttachConfiguration

Exceptions
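attach_configuration accepts either resource_id or the trio subscription_id, resource_group and cluster_name, and either a password or an SSH private key. The sketch below is a hypothetical helper, `build_hdi_attach_kwargs`, that enforces one choice of each before the call. Its rules (rejecting resource_id combined with the trio, and requiring exactly one of password or private_key_file) are this example's own stricter policy, assumed for illustration; the SDK itself may be more permissive.

```python
from typing import Optional


def build_hdi_attach_kwargs(username: str,
                            resource_id: Optional[str] = None,
                            subscription_id: Optional[str] = None,
                            resource_group: Optional[str] = None,
                            cluster_name: Optional[str] = None,
                            password: str = "",
                            private_key_file: str = "",
                            private_key_passphrase: str = "") -> dict:
    """Hypothetical helper: assemble keyword arguments for attach_configuration."""
    parts = (subscription_id, resource_group, cluster_name)
    # Assumed policy: identify the cluster one way only.
    if resource_id and any(parts):
        raise ValueError("pass resource_id or subscription_id/resource_group/cluster_name, not both")
    if not resource_id and not all(parts):
        raise ValueError("pass resource_id or all of subscription_id, resource_group and cluster_name")
    # Assumed policy: authenticate with exactly one of password or SSH key.
    if bool(password) == bool(private_key_file):
        raise ValueError("provide exactly one of password or private_key_file")

    kwargs = {"username": username}
    if resource_id:
        kwargs["resource_id"] = resource_id
    else:
        kwargs.update(subscription_id=subscription_id,
                      resource_group=resource_group,
                      cluster_name=cluster_name)
    if password:
        kwargs["password"] = password
    else:
        kwargs["private_key_file"] = private_key_file
        kwargs["private_key_passphrase"] = private_key_passphrase
    return kwargs
```

With the SDK installed, the result would be unpacked as `HDInsightCompute.attach_configuration(**kwargs)`; ssh_port is left out, so the SDK's default of 22 applies.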

delete

Delete is not supported for an HDInsightCompute object. Use detach instead.

delete()

Exceptions

deserialize

Convert a JSON object into an HDInsightCompute object.

static deserialize(workspace, object_dict)

Parameters

workspace
Workspace
Required

The workspace object the HDInsightCompute object is associated with.

object_dict
dict
Required

A JSON object to convert to an HDInsightCompute object.

Returns

The HDInsightCompute representation of the provided JSON object.

Return type

HDInsightCompute

Exceptions

ComputeTargetException

Remarks

Raises a ComputeTargetException if the provided workspace is not the workspace that the Compute object is associated with.

detach

Detach the HDInsightCompute object from its associated workspace.

Underlying cloud objects are not deleted; only the association is removed.

detach()

Exceptions

get_credentials

Retrieve the credentials for the HDInsightCompute target.

get_credentials()

Returns

The credentials for the HDInsightCompute target.

Return type

Exceptions

refresh_state

Perform an in-place update of the properties of the object.

This method updates the properties based on the current state of the corresponding cloud object. This is primarily used for manual polling of compute state.

refresh_state()

Exceptions
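refresh_state is described as the building block for manual polling. A minimal polling loop might look like the sketch below. It assumes the compute object exposes a `provisioning_state` attribute after each refresh and that "Succeeded", "Failed" and "Canceled" are the terminal states; both are assumptions, and `wait_for_provisioning` is a hypothetical name. For most cases, the built-in `wait_for_completion(show_output=True)` used in the sample above is the simpler choice.

```python
import time


def wait_for_provisioning(compute, timeout_s: float = 600.0, interval_s: float = 15.0) -> str:
    """Hypothetical helper: poll a compute target until it reaches a terminal state.

    Assumes `compute` has refresh_state() and a provisioning_state attribute,
    and that the terminal state names below are correct.
    """
    terminal = {"Succeeded", "Failed", "Canceled"}  # assumed state names
    deadline = time.monotonic() + timeout_s
    while True:
        # In-place update of the object's properties from the cloud object.
        compute.refresh_state()
        state = compute.provisioning_state
        if state in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s} seconds")
        time.sleep(interval_s)
```

Returning the terminal state, rather than raising on "Failed", leaves it to the caller to decide how to report a failed attach.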

serialize

Convert this HDInsightCompute object into a JSON serialized dictionary.

serialize()

Returns

The JSON representation of this HDInsightCompute object.

Return type

dict

Exceptions
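serialize and deserialize convert between an HDInsightCompute object and a JSON-serializable dictionary. As a sketch, the hypothetical helpers below persist that dictionary to a file and rebuild the object from it. They only rely on `compute.serialize()` and `compute_cls.deserialize(workspace, object_dict)` as documented above; the file-based storage is an assumption of this example.

```python
import json
from pathlib import Path


def save_compute_definition(compute, path) -> None:
    """Hypothetical helper: write the dictionary from compute.serialize() to a JSON file."""
    Path(path).write_text(json.dumps(compute.serialize(), indent=2))


def load_compute_definition(compute_cls, workspace, path):
    """Hypothetical helper: rebuild an object via compute_cls.deserialize(workspace, object_dict)."""
    object_dict = json.loads(Path(path).read_text())
    return compute_cls.deserialize(workspace, object_dict)
```

With the SDK, usage would be along the lines of `save_compute_definition(hdi_compute, "myhdi.json")` followed by `load_compute_definition(HDInsightCompute, ws, "myhdi.json")`. Per the deserialize remarks, passing a workspace other than the one the compute is associated with raises a ComputeTargetException.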