Troubleshoot automated ML experiments

APPLIES TO: Azure CLI ml extension v2 (current) Python SDK azure-ai-ml v2 (current)

In this guide, learn how to identify and resolve issues in your automated machine learning experiments.

Troubleshoot automated ML for Images and NLP in Studio

If an Automated ML job for Images or NLP fails, use the following steps to understand the error.

  1. In the studio UI, the AutoML job should have a failure message indicating the reason for failure.
  2. For more details, go to the child job of this AutoML job. This child job is a HyperDrive job.
  3. In the Trials tab, you can check all the trials run for this HyperDrive job.
  4. Go to the failed trial job.
  5. These jobs should have an error message in the Status section of the Overview tab indicating the reason for failure. Select See more details to get more details about the failure.
  6. You can also open std_log.txt in the Outputs + Logs tab to see detailed logs and exception traces.
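
The same drill-down can be scripted with the Python SDK (azure-ai-ml v2). The following is a minimal sketch, not an official procedure. It assumes that the HyperDrive job and its trial jobs are returned by `ml_client.jobs.list(parent_job_name=...)` and that a failed job reports the status string "Failed". The helper names `find_failed_leaf_jobs` and `download_failed_trial_logs` and the `./failed_jobs` folder are hypothetical and chosen for illustration.

```python
from typing import Callable, Iterable, List


def find_failed_leaf_jobs(job_name: str, list_children: Callable[[str], Iterable]) -> List[str]:
    """Return the names of the deepest failed jobs beneath job_name.

    list_children(name) must yield objects that expose .name and .status.
    A failed job is reported only if none of its own children failed,
    which mirrors walking from the AutoML job down to the failed trial.
    """
    leaves: List[str] = []
    for child in list_children(job_name):
        if child.status != "Failed":  # assumption: failed jobs report "Failed"
            continue
        deeper = find_failed_leaf_jobs(child.name, list_children)
        leaves.extend(deeper or [child.name])
    return leaves


def download_failed_trial_logs(ml_client, automl_job_name: str,
                               target_dir: str = "./failed_jobs") -> List[str]:
    """Download all artifacts, including logs, for each failed leaf job.

    ml_client is expected to be an azure.ai.ml.MLClient (not created here,
    so this module can be imported without the SDK installed).
    """
    failed = find_failed_leaf_jobs(
        automl_job_name,
        lambda parent: ml_client.jobs.list(parent_job_name=parent),
    )
    for name in failed:
        # all=True requests every output and log of the job, not only the default output.
        ml_client.jobs.download(name=name, download_path=f"{target_dir}/{name}", all=True)
    return failed
```

To use it, create the client with `MLClient(DefaultAzureCredential(), "<subscription-id>", "<resource-group>", "<workspace-name>")` and call `download_failed_trial_logs(ml_client, "<automl-job-name>")`. The returned list contains the names of the failed jobs whose files were downloaded.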

If your Automated ML runs use pipeline runs for trials, follow these steps to understand the error.

  1. Follow steps 1-4 above to identify the failed trial job.
  2. This job shows the pipeline run, with the failed nodes in the pipeline marked in red.
  3. Select the failed node in the pipeline.
  4. These jobs should have an error message in the Status section of the Overview tab indicating the reason for failure. Select See more details to get more details about the failure.
  5. You can also open std_log.txt in the Outputs + Logs tab to see detailed logs and exception traces.
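
Once the logs are available locally, the exception traces in std_log.txt can be pulled out with a short script. The sketch below assumes the log files were downloaded to a local folder and that Python tracebacks appear in them unprefixed (no timestamp or log-level prefix on each line). The function names `extract_tracebacks` and `collect_tracebacks` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

TRACEBACK_HEADER = "Traceback (most recent call last):"


def extract_tracebacks(log_text: str) -> List[str]:
    """Return each Python traceback block found in log_text.

    A block starts at the traceback header, continues through the indented
    stack-frame lines, and ends at the first unindented line, which is the
    exception type and message.
    """
    blocks: List[str] = []
    current = None
    for line in log_text.splitlines():
        if line.startswith(TRACEBACK_HEADER):
            current = [line]
        elif current is not None:
            current.append(line)
            if not line.startswith((" ", "\t")):
                blocks.append("\n".join(current))
                current = None
    if current is not None:
        # The log ended in the middle of a traceback; keep what is there.
        blocks.append("\n".join(current))
    return blocks


def collect_tracebacks(download_dir: str) -> Dict[str, List[str]]:
    """Map each std_log.txt under download_dir to the tracebacks it contains."""
    results: Dict[str, List[str]] = {}
    for log_path in Path(download_dir).rglob("std_log.txt"):
        found = extract_tracebacks(log_path.read_text(errors="replace"))
        if found:
            results[str(log_path)] = found
    return results
```

Searching recursively with `rglob` avoids depending on the exact folder layout of the downloaded files. The last line of each returned block is the exception message, which is usually the quickest pointer to the cause of the failure.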

Next steps