ColumnTypesBuilder class

Definition

Interactive object that can be used to infer column types and type conversion attributes.

ColumnTypesBuilder(dataflow: azureml.dataprep.api.dataflow.Dataflow, engine_api: azureml.dataprep.api.engineapi.api.EngineAPI)

Inheritance

builtins.object
ColumnTypesBuilder

Methods

ambiguous_date_conversions_drop() -> None

Resolves ambiguous date conversion candidates by removing them from the conversion dictionary.

Note

Resolving ambiguity this way will ensure that such columns remain unchanged.

ambiguous_date_conversions_keep_day_month() -> None

Resolves ambiguous date conversion candidates by only keeping date formats where day comes before month.

ambiguous_date_conversions_keep_month_day() -> None

Resolves ambiguous date conversion candidates by only keeping date formats where month comes before day.
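The ambiguity these two methods resolve can be illustrated with the standard library alone. The sketch below is not the library's implementation: day_before_month is a hypothetical helper, and the candidate list is made up for illustration. It shows a value that parses under both orderings, and how keeping only one ordering leaves a single format.

```python
from datetime import datetime

# Two candidate formats that both parse the same string successfully.
candidates = ["%d-%m-%Y", "%m-%d-%Y"]
value = "03-04-2021"

for fmt in candidates:
    parsed = datetime.strptime(value, fmt)
    print(fmt, "->", parsed.date().isoformat())
# %d-%m-%Y -> 2021-04-03
# %m-%d-%Y -> 2021-03-04


def day_before_month(fmt: str) -> bool:
    """Hypothetical helper: True when %d appears before %m in the format."""
    return fmt.index("%d") < fmt.index("%m")


# Mimics the idea behind keep_day_month / keep_month_day on a plain list.
keep_day_month = [f for f in candidates if day_before_month(f)]
keep_month_day = [f for f in candidates if not day_before_month(f)]
```

Because both formats parse without error, the data alone cannot settle the ordering unless some value has a day greater than 12, which is why an explicit choice is needed.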

learn(inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None) -> None

Performs a pull on the data and populates conversion_candidates with automatically inferred conversion candidates for each column.

Parameters

inference_arguments

(Optional) Arguments that force automatic resolution of date format ambiguity for all columns.
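A minimal sketch of passing inference_arguments follows. It assumes that InferenceArguments is exposed as dprep.InferenceArguments and accepts a day_first flag; neither detail is documented on this page, so verify both against the installed SDK. The function name learn_with_day_first is hypothetical.

```python
def learn_with_day_first(path: str, day_first: bool = True):
    """Run inference on a CSV file and return the populated builder."""
    # Imported lazily so that defining this function does not require the package.
    import azureml.dataprep as dprep

    dataflow = dprep.read_csv(path=path)
    builder = dataflow.builders.set_column_types()

    # Assumption: InferenceArguments takes a day_first flag that selects
    # day-before-month formats for every ambiguous date column.
    args = dprep.InferenceArguments(day_first=day_first)
    builder.learn(inference_arguments=args)
    return builder
```

If the assumption holds, ambiguous_date_columns should be empty after this call, since the ambiguity is resolved during inference.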

to_dataflow() -> azureml.dataprep.api.dataflow.Dataflow

Uses the current state of this object to add a 'set_column_types' step to the original Dataflow.

Note

This call will fail if there are any unresolved date format ambiguities remaining.

Returns

The modified Dataflow.

Attributes

ambiguous_date_columns

List of columns where ambiguous date formats were detected.

Returns

List of columns where ambiguous date formats were detected.

Remarks

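Each of the ambiguous date columns must be resolved before calling to_dataflow(). There are 3 ways to resolve ambiguity: override the column's entry in conversion_candidates with the desired date format, drop the conversions for all ambiguous date columns by calling ambiguous_date_conversions_drop(), or resolve all ambiguous date columns at once by calling ambiguous_date_conversions_keep_day_month() or ambiguous_date_conversions_keep_month_day().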
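The resolution step might be sketched as follows, using only the calls named on this page. The function name build_typed_dataflow and its prefer_month_first parameter are hypothetical, chosen for illustration.

```python
def build_typed_dataflow(path: str, prefer_month_first: bool = True):
    """Infer column types for a CSV file and return the resulting Dataflow."""
    # Imported lazily so that defining this function does not require the package.
    import azureml.dataprep as dprep

    dataflow = dprep.read_csv(path=path)
    builder = dataflow.builders.set_column_types()
    builder.learn()  # populates builder.conversion_candidates

    # Resolve any remaining ambiguity first; to_dataflow() fails otherwise.
    if builder.ambiguous_date_columns:
        if prefer_month_first:
            builder.ambiguous_date_conversions_keep_month_day()
        else:
            builder.ambiguous_date_conversions_keep_day_month()

    return builder.to_dataflow()
```

Calling ambiguous_date_conversions_drop() instead would leave the ambiguous columns unconverted rather than choosing an ordering for them.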

conversion_candidates

Current dictionary of conversion candidates, where each key is a column name and each value is the conversion candidates for that column.

Remarks

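The values in the conversion_candidates dictionary could be of several types. The example shows three of them: a FieldType value, a tuple of FieldType.DATE and a list of format strings, and a converter object such as DateTimeConverter.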


   import azureml.dataprep as dprep

   dataflow = dprep.read_csv(path='./some/path')
   builder = dataflow.builders.set_column_types()
   builder.conversion_candidates['MyNumericColumn'] = dprep.FieldType.DECIMAL    # force conversion to decimal
   builder.conversion_candidates['MyBoolColumn'] = dprep.FieldType.BOOLEAN       # force conversion to bool
   builder.conversion_candidates['MyDateColumnWithFormat'] = (dprep.FieldType.DATE, ['%m-%d-%Y'])  # force conversion to date with month before day
   builder.conversion_candidates['MyOtherDateColumn'] = dprep.DateTimeConverter(['%d-%m-%Y'])      # force conversion to date with day before month (alternative way)

Note

This will be populated automatically with inferred conversion candidates when learn() is called.

Any modifications made to this dictionary will be discarded any time learn() is called.
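Because learn() discards manual changes, overrides are safest when applied after inference. The sketch below illustrates that ordering using only the calls named on this page; infer_with_overrides and its overrides parameter are hypothetical names.

```python
def infer_with_overrides(path: str, overrides: dict):
    """Run inference, then apply per-column overrides, then build the Dataflow."""
    # Imported lazily so that defining this function does not require the package.
    import azureml.dataprep as dprep

    dataflow = dprep.read_csv(path=path)
    builder = dataflow.builders.set_column_types()

    # Run inference first: calling learn() later would discard the overrides.
    builder.learn()

    # Each value may be a FieldType, a (FieldType.DATE, [formats]) tuple,
    # or a converter object, as in the example above.
    for column, candidate in overrides.items():
        builder.conversion_candidates[column] = candidate

    # Fails if any ambiguous date column is still unresolved at this point.
    return builder.to_dataflow()
```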