Create and explore Azure Machine Learning dataset with labels

In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats, such as a pandas dataframe, for data exploration.

What are datasets with labels

Azure Machine Learning datasets with labels are referred to as labeled datasets. These datasets are TabularDatasets with a dedicated label column, and they are created only as an output of Azure Machine Learning data labeling projects. Create a data labeling project for image labeling or text labeling. Azure Machine Learning supports data labeling projects for image classification, either multi-label or multi-class, and for object identification with bounding boxes.

Prerequisites

Export data labels

After you complete a data labeling project, you can export its label data. Exporting captures both the reference to the data and its labels, either in COCO format or as an Azure Machine Learning dataset.

Use the Export button on the Project details page of your labeling project.

Export button in studio UI

COCO

The COCO file is created in the default blob store of the Azure Machine Learning workspace in a folder within export/coco.
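Once the file is downloaded from the blob store, it can be inspected with the Python standard library. The following is a minimal sketch, assuming the export follows the standard COCO layout with top-level "images", "categories", and "annotations" lists. The sample document and the count_labels helper are hypothetical and only illustrate the idea.

```python
import json
from collections import Counter

# Hypothetical miniature COCO document. For a real export, read the downloaded
# file instead, for example: coco = json.load(open(path_to_downloaded_file))
coco_text = """
{
  "images": [
    {"id": 1, "file_name": "cat_01.jpg"},
    {"id": 2, "file_name": "dog_01.jpg"}
  ],
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1, "bbox": [0.1, 0.2, 0.3, 0.4]},
    {"id": 2, "image_id": 2, "category_id": 2, "bbox": [0.5, 0.5, 0.2, 0.2]},
    {"id": 3, "image_id": 2, "category_id": 1, "bbox": [0.0, 0.0, 0.1, 0.1]}
  ]
}
"""


def count_labels(coco):
    """Return a Counter mapping each category name to its annotation count."""
    # Build a lookup from category id to category name.
    names = {category["id"]: category["name"] for category in coco["categories"]}
    # Count annotations per category name.
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])


coco = json.loads(coco_text)
label_counts = count_labels(coco)
print(label_counts)
```

A quick count like this is a cheap way to check for class imbalance before training.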

Note

In object detection projects, the exported "bbox": [x, y, width, height] values in the COCO file are normalized, that is, scaled to the range 0 to 1. For example, a bounding box at location (10, 10) that is 30 pixels wide and 60 pixels high in a 640x480 pixel image is annotated as (0.015625, 0.02083, 0.046875, 0.125). Because the coordinates are normalized, the "width" and "height" fields show as '0.0' for all images. You can obtain the actual width and height with a Python library such as OpenCV or Pillow (PIL).
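To work in pixel units again, multiply each normalized value by the matching image dimension. The following is a minimal sketch that reproduces the example from this note. The denormalize_bbox helper is a hypothetical name, and the image size is passed in explicitly. In practice you would read it from the image file, for example with Pillow's Image.open(path).size, which returns (width, height).

```python
def denormalize_bbox(bbox, image_width, image_height):
    """Convert a normalized [x, y, width, height] box to pixel units."""
    x, y, w, h = bbox
    # x and width scale with the image width; y and height with the image height.
    return (x * image_width, y * image_height, w * image_width, h * image_height)


# The example from the note: a box at (10, 10), 30 pixels wide and 60 pixels
# high, in a 640x480 image.
normalized = (10 / 640, 10 / 480, 30 / 640, 60 / 480)
pixels = denormalize_bbox(normalized, image_width=640, image_height=480)

# Round to whole pixels to absorb floating-point error.
pixel_box = [round(value) for value in pixels]
print(pixel_box)
```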

Azure Machine Learning dataset

You can access the exported Azure Machine Learning dataset in the Datasets section of your Azure Machine Learning studio. The dataset Details page also provides sample code to access your labels from Python.

Exported dataset

Tip

After you export your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on that data. Learn more at Set up AutoML to train computer vision models with Python.

Explore labeled datasets via pandas dataframe

Load your labeled datasets into a pandas dataframe with the to_pandas_dataframe() method, so that you can use popular open-source libraries for data exploration. The method relies on the azureml-dataprep package.

Install the package with the following shell command:

pip install azureml-dataprep

In the following code, the animal_labels dataset is the output from a labeling project previously saved to the workspace. The exported dataset is a TabularDataset.

APPLIES TO: Python SDK azureml v1

import azureml.core
from azureml.core import Dataset, Workspace

# connect to the workspace (assumes a config.json file for your workspace is available locally)
workspace = Workspace.from_config()

# get animal_labels dataset from the workspace
animal_labels = Dataset.get_by_name(workspace, 'animal_labels')
animal_pd = animal_labels.to_pandas_dataframe()

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# read the first image from the dataset and display it
img = mpimg.imread(animal_pd['image_url'].iloc[0].open())
imgplot = plt.imshow(img)
plt.show()
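Once the labels are in a dataframe, ordinary pandas operations apply. The following is a minimal sketch of a label distribution check. It uses a small hypothetical dataframe named sample_pd in place of animal_pd, and assumes a multi-class image classification project whose label column is named "label" and holds one string per row. Check the column names of your own exported dataset, since they might differ.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe returned by to_pandas_dataframe().
# The column names and values are assumptions for illustration only.
sample_pd = pd.DataFrame(
    {
        "image_url": ["cat_01.jpg", "dog_01.jpg", "cat_02.jpg", "bird_01.jpg"],
        "label": ["cat", "dog", "cat", "bird"],
    }
)

# Count how many rows carry each label.
label_counts = sample_pd["label"].value_counts()
print(label_counts)

# Select only the rows with a given label.
cats = sample_pd[sample_pd["label"] == "cat"]
print(len(cats))
```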

Next steps