SydneyD-4380 asked romungi-MSFT commented

Local compute not found error when running a hyperparameter search

I am new to Azure and am trying to run a hyperparameter search on my neural network. I can run my code fine when I'm submitting a single job to examine a parameter, but when I run a hyperparameter search with the same configurations I get the following error:

"ComputeTargetNotFound: Compute Target with name local not found in provided workspace"

Any help would be appreciated!

    from azureml.core import Workspace
    from azureml.core import Experiment
    from azureml.core import Environment
    from azureml.core import ScriptRunConfig
    from azureml.core.environment import CondaDependencies
    from azureml.train.hyperdrive import HyperDriveConfig
    from azureml.train.hyperdrive import choice
    from azureml.train.hyperdrive import RandomParameterSampling, BanditPolicy, uniform, PrimaryMetricGoal
    from azureml.core.compute import ComputeTarget

    ws = Workspace.from_config()
    env = Environment.get(workspace=ws, name="AzureML-tensorflow-2.5-ubuntu20.04-py38-cuda11-gpu")
    curated_clone1 = env.clone("customize_curated")
    # add_conda_package() updates the object in place rather than returning it,
    # so build the dependencies first and add the package on its own line
    conda_dep = CondaDependencies()
    conda_dep.add_conda_package("scikit-learn")
    curated_clone1.python.conda_dependencies = conda_dep

    curated_clone1.register(ws)

    param_sampling = RandomParameterSampling({
        'learning_rate': choice(0.001, 0.0001, 0.00001),
    })

    early_termination_policy = BanditPolicy(slack_factor=0.15, evaluation_interval=1, delay_evaluation=10)

    src = ScriptRunConfig(source_directory='./', script='loadv1.py', environment=curated_clone1)

    hd_config = HyperDriveConfig(run_config=src,
                                 hyperparameter_sampling=param_sampling,
                                 policy=early_termination_policy,
                                 primary_metric_name="loss",
                                 primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                 max_total_runs=100,
                                 max_concurrent_runs=4)

    experiment = Experiment(workspace=ws, name='day3-experiment-data')
    #run = experiment.submit(src)
    hyperdrive_run = experiment.submit(hd_config)
azure-machine-learning

@SydneyD-4380 Did you get a chance to check if adding a compute target worked?

1 Answer

romungi-MSFT answered

@SydneyD-4380 This experiment setup looks similar to one in a thread I recently commented on. The error there was a missing sklearn module, and I'm not sure whether the suggested steps resolved it for that user.

In both this case and the other thread, no compute target is defined. You need to pass one to ScriptRunConfig(), for example:


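    # Assumes ws, curated_clone1, param_sampling, early_termination_policy and
    # experiment are defined as in the question's script.
    from azureml.core import ScriptRunConfig
    from azureml.core.compute import ComputeTarget, AmlCompute
    from azureml.core.compute_target import ComputeTargetException
    from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

    # choose a name for your cluster
    cluster_name = "hd-cluster"

    try:
        # reuse the cluster if it already exists in the workspace
        compute_target = ComputeTarget(workspace=ws, name=cluster_name)
        print('Found existing compute target.')
    except ComputeTargetException:
        print('Creating a new compute target...')
        compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                               max_nodes=4)
        # create the cluster and block until provisioning finishes
        compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
        compute_target.wait_for_completion(show_output=True)

    # use get_status() to get a detailed status for the current cluster
    print(compute_target.get_status().serialize())

    # pass the cluster explicitly instead of relying on the "local" default
    src = ScriptRunConfig(source_directory='./', script='loadv1.py',
                          compute_target=compute_target, environment=curated_clone1)

    # rebuild hd_config so it picks up the new src, then resubmit
    hd_config = HyperDriveConfig(run_config=src,
                                 hyperparameter_sampling=param_sampling,
                                 policy=early_termination_policy,
                                 primary_metric_name="loss",
                                 primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
                                 max_total_runs=100,
                                 max_concurrent_runs=4)
    hyperdrive_run = experiment.submit(hd_config)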


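ScriptRunConfig() uses "local" as its compute target when none is given. A single run submitted with experiment.submit(src) can execute on your own machine, which is why that case works. The HyperDrive experiment, however, is submitted to your workspace, and because no compute target named "local" exists there, the submission fails with this error.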
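As a rough illustration, assuming the service resolves the compute target by name in the workspace (which is what the error message suggests), the failure can be modeled in a few lines of plain Python. ComputeTargetNotFound here is a local stand-in class and lookup_workspace_target is a hypothetical helper; neither is part of the azureml SDK, and only the error text comes from the question.

```python
from __future__ import annotations

from collections.abc import Collection


class ComputeTargetNotFound(Exception):
    """Hypothetical stand-in for the service error quoted in the question."""


def lookup_workspace_target(name: str, workspace_targets: Collection[str]) -> str:
    """Hypothetical name lookup: succeed only if the workspace has a target called `name`."""
    if name not in workspace_targets:
        raise ComputeTargetNotFound(
            f"Compute Target with name {name} not found in provided workspace"
        )
    return name


# Hypothetical workspace that only contains the cluster created in the answer.
workspace_targets = {"hd-cluster"}

print(lookup_workspace_target("hd-cluster", workspace_targets))  # hd-cluster
try:
    lookup_workspace_target("local", workspace_targets)  # the ScriptRunConfig default
except ComputeTargetNotFound as err:
    print(err)  # Compute Target with name local not found in provided workspace
```

In this model, passing the cluster (or its name) succeeds, while the implicit "local" default has nothing to match in the workspace.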

The ScriptRunConfig documentation describes the compute_target parameter as follows:

"The compute target where training will happen. This can either be a ComputeTarget object, the name of an existing ComputeTarget, or the string "local". If no compute target is specified, your local machine will be used."
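The documented behavior amounts to a small normalization step. The sketch below is a hypothetical model of it, not the SDK's implementation: FakeComputeTarget and compute_target_name are invented names, and only the accepted forms and the "local" fallback come from the quoted documentation. It shows why the question's ScriptRunConfig, which omits compute_target, ends up pointing at "local".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class FakeComputeTarget:
    """Hypothetical stand-in for an azureml ComputeTarget object (only its name is used)."""
    name: str


def compute_target_name(target: Optional[Union[FakeComputeTarget, str]] = None) -> str:
    """Map the documented forms of compute_target to a single target name."""
    if target is None:
        # "If no compute target is specified, your local machine will be used."
        return "local"
    if isinstance(target, FakeComputeTarget):
        return target.name
    if isinstance(target, str):
        # either the name of an existing ComputeTarget or the string "local"
        return target
    raise TypeError(f"unsupported compute_target: {target!r}")


print(compute_target_name())                                 # local (question's src)
print(compute_target_name(FakeComputeTarget("hd-cluster")))  # hd-cluster (answer's src)
print(compute_target_name("hd-cluster"))                     # hd-cluster
```

Passing either the ComputeTarget object or its name avoids the implicit "local" value.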