What is the Azure Machine Learning SDK for Python?

Data scientists and AI developers use the Azure Machine Learning SDK for Python to build and run machine learning workflows with the Azure Machine Learning service. You can interact with the service in any Python environment, including Jupyter Notebooks, Visual Studio Code, or your favorite Python IDE.

Key areas of the SDK include:

  • Explore, prepare, and manage the lifecycle of the datasets used in your machine learning experiments.
  • Manage cloud resources for monitoring, logging, and organizing your machine learning experiments.
  • Train models either locally or by using cloud resources, including GPU-accelerated model training.
  • Use automated machine learning, which accepts configuration parameters and training data. It automatically iterates through algorithms and hyperparameter settings to find the best model for running predictions.
  • Deploy web services to convert your trained models into RESTful services that can be consumed in any application.

For a step-by-step walkthrough of how to get started, try the quickstart.

The following sections are overviews of some of the most important classes in the SDK, and common design patterns for using them. To get the SDK, see the installation guide.


Namespace: azureml.core.workspace.Workspace

The Workspace class is a foundational resource in the cloud that you use to experiment, train, and deploy machine learning models. It ties your Azure subscription and resource group to an easily consumed object.

Import the class and create a new workspace by using the following code. Set create_resource_group to False if you have a previously existing Azure resource group that you want to use for the workspace. Some functions might prompt for Azure authentication credentials.

from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2'
                      )

Use the same workspace in multiple environments by first writing it to a configuration JSON file. This saves your subscription, resource, and workspace name data.

ws.write_config(path="./file-path", file_name="ws_config.json")

Load your workspace by reading the configuration file.

from azureml.core import Workspace
ws_other_environment = Workspace.from_config(path="./file-path/ws_config.json")

Alternatively, use the static get() method to load an existing workspace without using configuration files.

from azureml.core import Workspace
ws = Workspace.get(name="myworkspace", subscription_id='<azure-subscription-id>', resource_group='myresourcegroup')

The variable ws represents a Workspace object in the following code examples.


Namespace: azureml.core.experiment.Experiment

The Experiment class is another foundational cloud resource that represents a collection of trials (individual model runs). The following code fetches an Experiment object from within Workspace by name, or it creates a new Experiment object if the name doesn't exist.

from azureml.core.experiment import Experiment
experiment = Experiment(workspace=ws, name='test-experiment')

Run the following code to get a list of all Experiment objects contained in Workspace.

list_experiments = Experiment.list(ws)

Use the get_runs function to retrieve a list of Run objects (trials) from Experiment. The following code retrieves the runs and prints each run ID.

list_runs = experiment.get_runs()
for run in list_runs:
    print(run.id)

There are two ways to execute an experiment trial. If you're interactively experimenting in a Jupyter notebook, use the start_logging function. If you're submitting an experiment from a standard Python environment, use the submit function. Both functions return a Run object. The experiment variable represents an Experiment object in the following code examples.
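For example, an interactive trial in a notebook session might look like the following sketch (the metric name and value are illustrative, not part of any real experiment):

```python
# Interactive run in a notebook: start logging, record a metric, finish the run.
run = experiment.start_logging()
run.log("accuracy", 0.91)  # logs the metric to the Run History service
run.complete()             # marks the interactive run as finished
```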


Namespace: azureml.core.run.Run

A run represents a single trial of an experiment. Run is the object that you use to monitor the asynchronous execution of a trial, store the output of the trial, analyze results, and access generated artifacts. You use Run inside your experimentation code to log metrics and artifacts to the Run History service. Functionality includes:

  • Storing and retrieving metrics and data.
  • Using tags and the child hierarchy for easy lookup of past runs.
  • Registering stored model files for deployment.
  • Storing, modifying, and retrieving properties of a run.

Create a Run object by submitting an Experiment object with a run configuration object. Use the tags parameter to attach custom categories and labels to your runs. You can easily find and retrieve them later from Experiment.

tags = {"prod": "phase-1-model-tests"}
run = experiment.submit(config=your_config_object, tags=tags)

Use the static list function to get a list of all Run objects from Experiment. Specify the tags parameter to filter by your previously created tag.

from azureml.core.run import Run
filtered_list_runs = Run.list(experiment, tags=tags)

Use the get_details function to retrieve the detailed output for the run.

run_details = run.get_details()

Output for this function is a dictionary that includes:

  • Run ID
  • Status
  • Start and end time
  • Compute target (local versus cloud)
  • Dependencies and versions used in the run
  • Training-specific data (differs depending on model type)

For more examples of how to configure and monitor runs, see the how-to.


Namespace: azureml.core.model.Model

The Model class is used for working with cloud representations of machine learning models. Methods help you transfer models between local development environments and the Workspace object in the cloud.

You can use model registration to store and version your models in the Azure cloud, in your workspace. Registered models are identified by name and version. Each time you register a model with the same name as an existing one, the registry increments the version. The Azure Machine Learning service supports any model that can be loaded through Python 3, not just Azure Machine Learning service models.

The following example shows how to build a simple local classification model with scikit-learn, register the model in Workspace, and download the model from the cloud.

Create a simple classifier, clf, to predict customer churn based on their age. Then dump the model to a .pkl file in the same directory.

from sklearn import svm
from sklearn.externals import joblib
import numpy as np

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

joblib.dump(value=clf, filename="churn-model.pkl")
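Before registering the model, you can sanity-check the serialized file by loading it back and scoring a sample age. This sketch uses the standalone joblib package (sklearn.externals.joblib is deprecated and removed in newer scikit-learn versions); the sample age of 45 is illustrative.

```python
import numpy as np
import joblib  # standalone package; sklearn.externals.joblib is removed in newer scikit-learn
from sklearn import svm

# Rebuild the same toy churn model as above
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62]).reshape(-1, 1)
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
joblib.dump(value=clf, filename="churn-model.pkl")

# Round trip: load the pickle and predict churn for a 45-year-old customer
restored = joblib.load("churn-model.pkl")
prediction = restored.predict(np.array([[45]]))[0]  # "yes" or "no"
```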

Use the register function to register the model in your workspace. Specify the local model path and the model name. Registering the same name more than once will create a new version.

from azureml.core.model import Model

model = Model.register(workspace=ws, model_path="churn-model.pkl", model_name="churn-model-test")

Now that the model is registered in your workspace, it's easy to manage, download, and organize your models. To retrieve a Model object from Workspace (for example, in another environment), use the class constructor and specify the model name and any optional parameters. Then, use the download function to download the model, including the cloud folder structure.

from azureml.core.model import Model
import os

model = Model(workspace=ws, name="churn-model-test")
model.download(target_dir=os.getcwd())

Use the delete function to remove the model from Workspace.


After you have a registered model, deploying it as a web service is a straightforward process. First you create and register an image. This step configures the Python environment and its dependencies, along with a script to define the web service request and response formats. After you create an image, you build a deploy configuration that sets the CPU cores and memory parameters for the compute target. You then attach your image.

ComputeTarget, RunConfiguration, and ScriptRunConfig

Namespace: azureml.core.compute.ComputeTarget
Namespace: azureml.core.runconfig.RunConfiguration
Namespace: azureml.core.script_run_config.ScriptRunConfig

The ComputeTarget class is the abstract parent class for creating and managing compute targets. A compute target represents a variety of resources where you can train your machine learning models. A compute target can be either a local machine or a cloud resource, such as Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine.

Use compute targets to take advantage of powerful virtual machines for model training, and set up either persistent compute targets or temporary runtime-invoked targets. For a comprehensive guide on setting up and managing compute targets, see the how-to.

The following code shows a simple example of setting up an AmlCompute (child class of ComputeTarget) target. This target creates a runtime remote compute resource in your Workspace object. The resource scales automatically when a job is submitted. It's deleted automatically when the run finishes.

Reuse the simple scikit-learn churn model and build it into its own file, train.py, in the current directory. At the end of the file, create a new directory called outputs. This step creates a directory in the cloud (your workspace) to store your trained model that joblib.dump() serialized.

# train.py

from sklearn import svm
from sklearn.externals import joblib
import numpy as np
import os

# customer ages
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62])
X_train = X_train.reshape(-1, 1)
# churn y/n
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)

os.makedirs("outputs", exist_ok=True)
joblib.dump(value=clf, filename="outputs/churn-model.pkl")

Next you create the compute target by instantiating a RunConfiguration object and setting the type and size. This example uses the smallest resource size (1 CPU core, 3.5 GB of memory). The list_vms variable contains a list of supported virtual machines and their sizes.

from azureml.core.runconfig import RunConfiguration
from azureml.core.compute import AmlCompute
list_vms = AmlCompute.supported_vmsizes(workspace=ws)

compute_config = RunConfiguration()
compute_config.target = "amlcompute"
compute_config.amlcompute.vm_size = "STANDARD_D1_V2"

Create dependencies for the remote compute resource's Python environment by using the CondaDependencies class. The train.py file is using scikit-learn and numpy, which need to be installed in the environment. You can also specify versions of dependencies. Use the dependencies object to set the environment in compute_config.

from azureml.core.conda_dependencies import CondaDependencies

dependencies = CondaDependencies()
dependencies.add_pip_package("scikit-learn")
dependencies.add_pip_package("numpy")
compute_config.environment.python.conda_dependencies = dependencies

Now you're ready to submit the experiment. Use the ScriptRunConfig class to attach the compute target configuration, and to specify the path/file to the training script train.py. Submit the experiment by specifying the config parameter of the submit() function. Call wait_for_completion on the resulting run to see asynchronous run output as the environment is initialized and the model is trained.

from azureml.core.experiment import Experiment
from azureml.core import ScriptRunConfig

script_run_config = ScriptRunConfig(source_directory=os.getcwd(), script="train.py", run_config=compute_config)
experiment = Experiment(workspace=ws, name="compute_target_test")
run = experiment.submit(config=script_run_config)

After the run finishes, the trained model file churn-model.pkl is available in your workspace.
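If you want the trained model file on your local machine as well, you can download it from the run's outputs. A sketch, assuming the run variable from above and the paths written by train.py:

```python
# Download the serialized model from the run's outputs folder
run.download_file(name="outputs/churn-model.pkl", output_file_path="churn-model.pkl")
```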


Namespace: azureml.train.automl.automlconfig.AutoMLConfig

Use the AutoMLConfig class to configure parameters for automated machine learning training. Automated machine learning iterates over many combinations of machine learning algorithms and hyperparameter settings. It then finds the best-fit model based on your chosen accuracy metric. Configuration allows for specifying:

  • Task type (classification, regression, forecasting)
  • Number of algorithm iterations and maximum time per iteration
  • Accuracy metric to optimize
  • Algorithms to blacklist/whitelist
  • Number of cross-validations
  • Compute targets
  • Training data


To use automated machine learning, install the SDK with the automl extra: pip install azureml-sdk[automl].

For detailed guides and examples of setting up automated machine learning experiments, see the tutorial and how-to.

The following code illustrates building an automated machine learning configuration object for a classification model, and using it when you're submitting an experiment.

from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(task="classification",
                             X=your_training_features,
                             y=your_training_labels,
                             iterations=30,
                             iteration_timeout_minutes=5,
                             primary_metric="AUC_weighted",
                             n_cross_validations=5
                             )

Use the automl_config object to submit an experiment.

from azureml.core.experiment import Experiment

experiment = Experiment(ws, "automl_test_experiment")
run = experiment.submit(config=automl_config, show_output=True)

After you submit the experiment, output shows the training accuracy for each iteration as it finishes. After the run is finished, an AutoMLRun object (which extends the Run class) is returned. Use the get_output() function to return the best run and the corresponding fitted model.

best_run, fitted_model = run.get_output()
y_predict = fitted_model.predict(X_test)

Image and Webservice

Namespace: azureml.core.image.image.Image
Namespace: azureml.core.webservice.webservice.Webservice

Image is an abstract parent class for packaging models into container images that include the runtime environment and dependencies. Models must be built into an image before you deploy them as a web service. Webservice is the abstract parent class for creating and deploying web services for your models. For a detailed guide on building images and deploying web services, follow the how-to.

The following code shows an abstract example of creating an Image class and using it to deploy a web service. The ContainerImage class extends Image and creates a Docker image.

from azureml.core.image import ContainerImage

image_config = ContainerImage.image_configuration(execution_script="score.py",
                                                  runtime="python",
                                                  conda_file="myenv.yml")

In this example, score.py processes the request/response for the web service. The script defines two methods: init() and run(). The init() method loads a previously registered model once when the Docker container starts. The run() method should accept input data as a parameter. It uses the model to run predictions and return the response. The conda environment file myenv.yml defines dependencies. Register the image by using the following code.

image = ContainerImage.create(name="test-image",
                              models=[model],
                              image_config=image_config,
                              workspace=ws)

The models parameter accepts a list of Model objects. It represents the models that will be available in the image. To deploy the image as a web service, first build a deployment configuration.

from azureml.core.webservice import AciWebservice

deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

Use the deployment configuration to create the web service.

from azureml.core.webservice import Webservice

service = Webservice.deploy_from_image(deployment_config=deploy_config,
                                       image=image,
                                       name=service_name,
                                       workspace=ws)

This example creates an Azure Container Instances web service, which is best for small-scale testing and quick deployments. To deploy your model as a production-scale web service, use Azure Kubernetes Service (AKS). See AksCompute class.
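The entry script (score.py) described earlier typically follows this shape. The sketch below is self-contained for illustration: it trains and serializes the toy churn model inline, whereas a real score.py would resolve the registered model's path with Model.get_model_path() inside init(); the JSON input schema is an assumption.

```python
import json
import numpy as np
import joblib
from sklearn import svm

# For illustration only: create and serialize the toy churn model locally.
# In a real deployment, the registered model is already baked into the image.
X_train = np.array([50, 17, 35, 23, 28, 40, 31, 29, 19, 62]).reshape(-1, 1)
y_train = ["yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]
clf = svm.SVC(gamma=0.001, C=100.)
clf.fit(X_train, y_train)
joblib.dump(value=clf, filename="churn-model.pkl")

model = None

def init():
    # Runs once when the container starts: load the serialized model.
    # In Azure ML, you'd resolve the path with Model.get_model_path("churn-model-test").
    global model
    model = joblib.load("churn-model.pkl")

def run(raw_data):
    # Called per request; raw_data is a JSON string such as '{"data": [[45]]}'.
    data = np.array(json.loads(raw_data)["data"])
    result = model.predict(data)
    return json.dumps({"result": result.tolist()})

init()
response = run('{"data": [[45]]}')  # a JSON string holding the predicted labels
```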


Namespace: azureml.dataprep.dataflow

Dataflow is an abstract parent class that represents a series of lazily evaluated, immutable operations on data. It is only an execution plan. No data is loaded from the source until you get data from the Dataflow by using one of the head, to_pandas_dataframe, get_profile, or write methods.

The azureml-dataprep package helps data scientists explore, cleanse and transform data for machine learning workflows in any Python environment. The package offers an intelligent and scalable experience for essential data preparation scenarios, while maintaining interoperability with common data analysis libraries.

To get help or ask questions, please email: askamldataprep@microsoft.com

Functionality included

This package includes classes and methods that support:

  • Automatic file type detection for any of the supported file types. You don’t need to use special file readers for formats like CSV, text, or Excel, or to specify delimiter, header, or encoding parameters.

  • Quick generation of summary statistics for a dataflow with a single line of code.

  • Intelligent, time-saving transformations, such as assertions: create assertion rules to ensure that values in the specified columns satisfy a provided expression.
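The first two capabilities above can be sketched as follows (assumes the azureml-dataprep package is installed; the file path is illustrative):

```python
import azureml.dataprep as dprep

# Automatic file type detection: no reader, delimiter, or encoding parameters needed
dflow = dprep.auto_read_file(path="data/file.csv")

# Summary statistics for the dataflow in a single line
profile = dflow.get_profile()
```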

To see detailed examples and code for each preparation step, follow these how-to guides:

  1. Data prep tutorial: prepare data for regression modeling by using NYC taxi data, and use automated machine learning to build the model
  2. How to load data in various formats
  3. How to transform data into a more usable structure
  4. How to write data to a location accessible to your models
  5. Explore more with these sample Jupyter notebooks

Key benefits

  • Cross-platform functionality. You can interact with the package in any Python environment alongside familiar libraries. Run it on Windows, macOS, or Linux (Red Hat Enterprise Linux, CentOS and Ubuntu).

  • Intelligent transformations powered by AI, including grouping similar values to their canonical form and deriving columns by examples without custom code.

  • Capability to work with large files of different schema.

  • Scalability on a single machine by streaming data during processing rather than loading into memory.

  • Seamless integration with other Azure Machine Learning services. You can simply pass your prepared data file into an AutoMLConfig object for automated machine learning training.

Data caching

A Dataflow can be cached as a file on your disk during a local run by calling dflow_cached = dflow.cache(directory_path). This runs all the steps in the Dataflow, dflow, and saves the cached data to the specified directory_path. The returned Dataflow, dflow_cached, has a Caching Step added at the end. Any subsequent runs on the Dataflow dflow_cached will reuse the cached data, and the steps before the Caching Step will not be run again.

Caching avoids running transforms multiple times, which can make local runs more efficient. Here are common places to use Caching:

  • after reading data from remote
  • after expensive transforms, such as Sort
  • after transforms that change the shape of data, such as Sampling, Filter and Summarize

The Caching Step is ignored during a scale-out run invoked by to_spark_dataframe().


Namespace: azureml.core.dataset.Dataset

The Dataset class is a foundational resource for managing data within Azure Machine Learning. When you’re ready to use the data for training, you can save the dataset to your AML workspace to get versioning and reproducibility capabilities. By creating a dataset, you create a reference to the data source location, along with a copy of its metadata. The data remains in its existing location, so no extra storage cost is incurred.


Some Dataset classes (preview) have dependencies on the azureml-dataprep package (GA). For Linux users, these classes are only supported on the following distributions: Red Hat Enterprise Linux, CentOS, Fedora, and Ubuntu.

Dataset types

Datasets are categorized into types based on how users consume them in training. Currently, we support tabular datasets, which represent data in a tabular format by parsing the provided file or list of files. The TabularDataset class provides functionality such as the ability to materialize the data into a pandas DataFrame.

Import the class and create a new Dataset by using the following code. Datasets can work with local data or data in the cloud.

from azureml.core import Datastore, Dataset
dstore = Datastore.get(ws, datastore_name)
datapath = dstore.path('data/file.csv')
dataset = Dataset.Tabular.from_delimited_files(datapath)

You can convert your dataset to a Pandas dataframe by calling the to_pandas_dataframe method if you need to prepare your data.

To use this dataset with your models in machine learning pipelines and experiments, register it in your workspace. Some functions might prompt for Azure authentication credentials.

dataset = dataset.register(workspace = ws,
                           name = 'test-dataset',
                           description = 'Training data')

Once you’ve registered a dataset, you can easily access it from the workspace and use it in a script or Python notebook.

dataset = Dataset.get_by_name(workspace=ws, name='test-dataset')

Next steps

Try these next steps to learn how to use the Azure Machine Learning SDK for Python:

  • Follow the tutorial to learn how to build, train, and deploy a model in Python.

  • Look up classes and modules in the reference documentation on this site by using the table of contents on the left.