TransformedDataContext Class

The user-provided data with transformations applied.

If no featurization is performed, this is the same as the RawDataContext. This class also holds the transformers that were used.

Construct the TransformedDataContext class.

Inheritance
TransformedDataContext

Constructor

TransformedDataContext(X, y=None, X_valid=None, y_valid=None, sample_weight=None, sample_weight_valid=None, x_raw_column_names=None, cv_splits_indices=None, num_cv_folds=None, n_step=None, validation_size=None, timeseries=False, timeseries_param_dict=None, cache_store=None, logger=<Logger azureml.automl.runtime.data_context (INFO)>, task_type=None, X_raw_cleaned=None, y_raw_cleaned=None, X_valid_raw_cleaned=None, y_valid_raw_cleaned=None, data_snapshot_str='', data_snapshot_str_with_quantiles='', output_data_snapshot_str_with_quantiles='')

Parameters

X
DataFrame
Required

Input training data.

y
ndarray or DataFrame
default value: None

Input training labels.

X_valid
DataFrame
default value: None

Validation data.

y_valid
ndarray or DataFrame
default value: None

Validation labels.

sample_weight
ndarray or DataFrame
default value: None

Sample weights for training data.

sample_weight_valid
ndarray or DataFrame
default value: None

Validation set sample weights.

cv_splits_indices
ndarray or DataFrame
default value: None

Custom indices by which to split the data when running cross validation.

num_cv_folds
int
default value: None

Number of cross validation folds.

n_step
int
default value: None

Step size of cross validation in forecasting.

validation_size
float
default value: None

Fraction of data to be held out for validation.

cache_store
CacheStore
default value: None

Cache store to use for caching transformed data. None means don't cache.

logger
Logger
default value: <Logger azureml.automl.runtime.data_context (INFO)>

Module logger.

X_raw_cleaned
ndarray or DataFrame
default value: None

Cleaned input training data.

y_raw_cleaned
ndarray or DataFrame
default value: None

Cleaned input training labels.

X_valid_raw_cleaned
ndarray or DataFrame
default value: None

Cleaned input validation data.

y_valid_raw_cleaned
ndarray or DataFrame
default value: None

Cleaned input validation labels.

data_snapshot_str
str
default value: ''

The input data snapshot string.

data_snapshot_str_with_quantiles
str
default value: ''

The input data snapshot string with quantiles.

output_data_snapshot_str_with_quantiles
str
default value: ''

The output data snapshot string with quantiles columns.

x_raw_column_names
default value: None

timeseries
default value: False

timeseries_param_dict
default value: None

task_type
default value: None
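
The validation-related parameters above (validation_size, num_cv_folds, cv_splits_indices) describe different ways of carving validation rows out of the training data. The sketch below illustrates the general idea in plain Python. It is not the library's implementation: the helper names holdout_split and kfold_splits are hypothetical, and the list of (train, validation) index pairs is only an assumed shape for custom splits, not a confirmed format for cv_splits_indices.

```python
from typing import List, Tuple

Split = Tuple[List[int], List[int]]


def holdout_split(n_rows: int, validation_size: float) -> Split:
    """Hold out the last `validation_size` fraction of row indices.

    Hypothetical helper: shows what "fraction of data to be held out
    for validation" means, not how the library picks the rows.
    """
    if not 0.0 < validation_size < 1.0:
        raise ValueError("validation_size must be strictly between 0 and 1")
    n_valid = max(1, int(round(n_rows * validation_size)))
    cut = n_rows - n_valid
    indices = list(range(n_rows))
    return indices[:cut], indices[cut:]


def kfold_splits(n_rows: int, num_cv_folds: int) -> List[Split]:
    """Build (train_indices, valid_indices) pairs for contiguous folds.

    Hypothetical helper: fold sizes differ by at most one row, and each
    row lands in exactly one validation fold.
    """
    if not 2 <= num_cv_folds <= n_rows:
        raise ValueError("num_cv_folds must be between 2 and n_rows")
    indices = list(range(n_rows))
    base, extra = divmod(n_rows, num_cv_folds)
    splits: List[Split] = []
    start = 0
    for fold in range(num_cv_folds):
        # The first `extra` folds take one additional row each.
        stop = start + base + (1 if fold < extra else 0)
        valid = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, valid))
        start = stop
    return splits


train_idx, valid_idx = holdout_split(n_rows=10, validation_size=0.2)
custom_splits = kfold_splits(n_rows=5, num_cv_folds=2)
```

Here holdout_split(10, 0.2) keeps rows 0 to 7 for training and rows 8 and 9 for validation, while kfold_splits(5, 2) yields two folds of three and two validation rows.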

Methods

cleanup

Clean up the cache.

cleanup() -> None
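
Because cleanup() releases cached data, one reasonable pattern is to call it in a finally block so the cache is cleared even if later steps fail. The sketch below uses a hypothetical stand-in class (StandInContext) rather than the real TransformedDataContext, since constructing the real class requires input data and the SDK. Only the cleanup() -> None signature is taken from this page.

```python
from typing import Any, Callable, Dict


class StandInContext:
    """Hypothetical stand-in, not the real TransformedDataContext."""

    def __init__(self) -> None:
        # A plain dict plays the role of the cache store here.
        self.cache: Dict[str, Any] = {"featurized_train_test_valid": object()}

    def cleanup(self) -> None:
        """Drop everything held in the stand-in cache."""
        self.cache.clear()


def run_with_cleanup(context: Any, work: Callable[[Any], Any]) -> Any:
    """Run work(context) and always call context.cleanup() afterwards."""
    try:
        return work(context)
    finally:
        context.cleanup()


ctx = StandInContext()
cached_entries = run_with_cleanup(ctx, lambda c: len(c.cache))
```

After run_with_cleanup returns, the stand-in cache is empty whether or not the work function raised.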

Attributes

FEATURIZED_CV_SPLIT_KEY_INITIALS

FEATURIZED_CV_SPLIT_KEY_INITIALS = 'featurized_cv_split_'

FEATURIZED_TRAIN_TEST_VALID_KEY_INITIALS

FEATURIZED_TRAIN_TEST_VALID_KEY_INITIALS = 'featurized_train_test_valid'
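
The two attributes above are string prefixes, presumably used to name entries in the cache store. The sketch below shows one way such prefixes could be turned into keys. The constant values are copied from this page, but the helper names and the scheme of appending a split index to the prefix are assumptions for illustration, not confirmed library behavior.

```python
FEATURIZED_CV_SPLIT_KEY_INITIALS = "featurized_cv_split_"
FEATURIZED_TRAIN_TEST_VALID_KEY_INITIALS = "featurized_train_test_valid"


def cv_split_key(split_index: int) -> str:
    """Hypothetical helper: name the cache entry for one CV split.

    Appending the split index to the prefix is an assumed scheme.
    """
    return f"{FEATURIZED_CV_SPLIT_KEY_INITIALS}{split_index}"


def is_featurized_key(key: str) -> bool:
    """Hypothetical helper: check whether a key uses either prefix."""
    return key.startswith(
        (FEATURIZED_CV_SPLIT_KEY_INITIALS, FEATURIZED_TRAIN_TEST_VALID_KEY_INITIALS)
    )


keys = [cv_split_key(i) for i in range(3)]
```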