data package

Contains modules supporting data representation for Datastore and Dataset in Azure Machine Learning.

This package contains core functionality supporting the Datastore and Dataset classes in the core package. A Datastore object holds the connection information for an Azure storage service, so you can refer to the service by name instead of handling or hard-coding connection information in scripts. Datastore supports a number of services, each represented by a class in this package, including AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeDatastore. For a full list of supported storage services, see the Datastore class.
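
As a minimal sketch of referring to a datastore by name, the helper below looks up a registered datastore with Datastore.get from the core package. The helper name get_datastore_by_name is hypothetical, and the default name "workspaceblobstore" is an assumption about the workspace; substitute the name of a datastore registered in your own workspace.

```python
def get_datastore_by_name(workspace, datastore_name="workspaceblobstore"):
    """Return the datastore registered under datastore_name.

    workspace is an existing azureml.core.Workspace object.
    The default name is an assumption, not a guarantee for every workspace.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core import Datastore

    # No connection strings or keys appear in the script: the workspace
    # resolves the stored connection information from the name alone.
    return Datastore.get(workspace, datastore_name)
```

A script would typically obtain the workspace first (for example with Workspace.from_config()) and pass it to this helper.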

While a Datastore acts as a container for your data files, you can think of a Dataset as a reference or pointer to specific data in your datastore. The following Dataset types are supported:

  • TabularDataset represents data in a tabular format created by parsing the provided file or list of files.

  • FileDataset references single or multiple files in your datastores or public URLs.

For more information, see the article Add & register datasets. To get started working with datasets, see https://aka.ms/tabulardataset-samplenotebook and https://aka.ms/filedataset-samplenotebook.

Modules

abstract_dataset

Abstract Dataset class.

abstract_datastore

Contains the base functionality for datastores that save connection information to Azure storage services.

azure_data_lake_datastore

Contains the base functionality for datastores that save connection information to Azure Data Lake Storage.

azure_my_sql_datastore

Contains the base functionality for datastores that save connection information to Azure Database for MySQL.

azure_postgre_sql_datastore

Contains the base functionality for datastores that save connection information to Azure Database for PostgreSQL.

azure_sql_database_datastore

Contains the base functionality for datastores that save connection information to Azure SQL Database.

azure_storage_datastore

Contains functionality for datastores that save connection information to Azure Blob and Azure File storage.

constants

Constants used in the azureml.data package. Internal use only.

context_managers

Contains functionality to manage data context of datastores and datasets. Internal use only.

data_reference

Contains functionality that defines how to create references to data in datastores.

datapath

Contains functionality to create references to data in datastores.

This module contains the DataPath class, which represents the location of data, and the DataPathComputeBinding class, which represents how the data is made available on the compute targets.
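
The two classes can be sketched together as follows. This is an illustrative example, not a prescribed pattern: the helper name build_data_path and the default path "training/data.csv" are hypothetical, and the keyword arguments shown (datastore, path_on_datastore, mode) are assumed to match the installed SDK version.

```python
def build_data_path(datastore, path_on_datastore="training/data.csv"):
    """Pair a data location with the way it is exposed on the compute target.

    datastore is an existing datastore object; the default path is a
    hypothetical placeholder.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.data.datapath import DataPath, DataPathComputeBinding

    # Where the data lives: a datastore plus a path inside it.
    data_path = DataPath(datastore=datastore, path_on_datastore=path_on_datastore)

    # How the data is made available on the compute target ("mount" here).
    binding = DataPathComputeBinding(mode="mount")
    return data_path, binding
```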

dataset_action_run

Contains functionality that manages the execution of Dataset actions.

This module provides convenience methods for creating Dataset actions and getting their results after completion.

dataset_consumption_config

Contains functionality for Dataset consumption configuration.

dataset_definition

Contains functionality to manage dataset definition and its operations.

dataset_error_handling

Module for dataset error handling in Azure Machine Learning service.

dataset_factory

Contains functionality to create datasets for Azure Machine Learning service.

dataset_snapshot

Contains functionality to manage Dataset snapshot operations.

dataset_type_definitions

Contains enumeration values used with Dataset.

datastore_client

Internal use only.

dbfs_datastore

Contains functionality for datastores that save connection information to Databricks File System (DBFS).

file_dataset

Contains functionality for referencing single or multiple files in datastores or public URLs.

For more information, see the article Add & register datasets. To get started working with a file dataset, see https://aka.ms/filedataset-samplenotebook.

sql_data_reference

Contains functionality for creating references to data in datastores that save connection information to SQL databases.

stored_procedure_parameter

Contains the stored procedure parameter class.

tabular_dataset

Contains functionality for representing data in a tabular format by parsing the provided file or list of files.

For more information, see the article Add & register datasets. To get started working with a tabular dataset, see https://aka.ms/tabulardataset-samplenotebook.

Classes

DataType

Configures column data types for a dataset created in Azure Machine Learning service.

DataType methods are used in the TabularDatasetFactory class from_* methods, which are used to create new TabularDataset objects.
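
As a sketch of how DataType values might be passed to a from_* method, the helper below builds a set_column_types mapping and hands it to from_delimited_files. The helper name load_typed_table and the column names ("id", "price", "sold_at") are hypothetical, and the date format is an assumption about the data.

```python
def load_typed_table(datastore, relative_path):
    """Create a TabularDataset with explicit column types.

    The column names below are hypothetical; replace them with the
    columns present in your own files.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core import Dataset
    from azureml.data import DataType

    column_types = {
        "id": DataType.to_long(),                      # whole numbers
        "price": DataType.to_float(),                  # decimal numbers
        "sold_at": DataType.to_datetime("%Y-%m-%d"),   # assumed date format
    }

    # Dataset.Tabular exposes the TabularDatasetFactory from_* methods.
    return Dataset.Tabular.from_delimited_files(
        path=[(datastore, relative_path)],
        set_column_types=column_types,
    )
```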

FileDataset

Represents a collection of file references in datastores or public URLs to use in Azure Machine Learning.

A FileDataset defines a series of lazily-evaluated, immutable operations to load data from the data source into file streams. Data is not loaded from the source until FileDataset is asked to deliver data.
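
The idea of lazily-evaluated, immutable operations can be illustrated with a toy model. The LazyFiles class below is hypothetical and uses only the Python standard library; it is not part of azureml.data and says nothing about how the SDK is implemented. It only shows the behavior described above: each operation returns a new object, and the source is not read until data is requested.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class LazyFiles:
    """Toy stand-in for a lazily evaluated, immutable file collection."""

    source: Callable[[], Iterable[str]]   # called only when data is delivered
    steps: Tuple[Callable, ...] = ()      # recorded operations, never mutated

    def _with(self, step):
        # Return a new object; the original keeps its own list of steps.
        return LazyFiles(self.source, self.steps + (step,))

    def skip(self, count):
        return self._with(lambda items: islice(items, count, None))

    def take(self, count):
        return self._with(lambda items: islice(items, count))

    def to_list(self):
        # Delivery point: read the source, then replay the recorded steps.
        stream = iter(self.source())
        for step in self.steps:
            stream = step(stream)
        return list(stream)


calls = []


def list_paths():
    calls.append("listed")  # records each time the source is actually read
    return ["a.csv", "b.csv", "c.csv", "d.csv"]


all_files = LazyFiles(list_paths)
subset = all_files.skip(1).take(2)    # nothing has been read yet
reads_before_delivery = len(calls)
result = subset.to_list()             # the source is read here
```

After this runs, result holds the second and third paths, the source has been read exactly once, and all_files still has no recorded steps.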

A FileDataset is created using the from_files(path, validate=True) method of the FileDatasetFactory class.
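
A minimal sketch of that call, assuming an existing datastore object: the helper name create_file_dataset and the default pattern "images/*.png" are hypothetical placeholders.

```python
def create_file_dataset(datastore, pattern="images/*.png"):
    """Create a FileDataset that references files matching pattern.

    datastore is an existing datastore object; the default pattern is a
    hypothetical placeholder for a path inside that datastore.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core import Dataset

    # Dataset.File exposes the FileDatasetFactory methods. With
    # validate=True, the call checks that data can be loaded from the path.
    return Dataset.File.from_files(path=(datastore, pattern), validate=True)
```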

For more information, see the article Add & register datasets. To get started working with a file dataset, see https://aka.ms/filedataset-samplenotebook.

TabularDataset

Represents a tabular dataset to use in Azure Machine Learning service.

A TabularDataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation. Data is not loaded from the source until TabularDataset is asked to deliver data.

A TabularDataset is created using methods such as from_delimited_files(path, validate=True, include_path=False, infer_column_types=True, set_column_types=None, separator=',', header=True, partition_format=None) from the TabularDatasetFactory class.
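
As a hedged example of from_delimited_files, the helper below parses semicolon-separated files and leaves the remaining parameters at the defaults listed above. The helper name load_delimited and the default separator ";" are illustrative choices, not requirements.

```python
def load_delimited(datastore, relative_path, separator=";"):
    """Create a TabularDataset from delimited text files.

    relative_path is a path (optionally with wildcards) inside datastore.
    """
    # Imported here so the helper can be defined without the SDK installed.
    from azureml.core import Dataset

    # Dataset.Tabular exposes the TabularDatasetFactory methods. Column
    # types are inferred because infer_column_types is left enabled.
    return Dataset.Tabular.from_delimited_files(
        path=[(datastore, relative_path)],
        separator=separator,
        infer_column_types=True,
    )
```

Because evaluation is lazy, creating the dataset does not load all of the data; it is loaded when the dataset is asked to deliver it, for example by calling to_pandas_dataframe() on the returned object.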

For more information, see the article Add & register datasets. To get started working with a tabular dataset, see https://aka.ms/tabulardataset-samplenotebook.