dataprep package

Main Data Prep module that contains tools to load, analyze, and manipulate data. To learn more about the advantages, key functionality, and supported platforms of Data Prep, see https://aka.ms/data-prep-sdk.

Classes

Dataflow

A Dataflow represents a series of lazily-evaluated, immutable operations on data. It is only an execution plan. No data is loaded from the source until you get data from the Dataflow using one of head, to_pandas_dataframe, get_profile or the write methods.

InferenceArguments

Class to control data type inference behavior.

SkipMode

Defines a strategy to skip rows when reading files.

PromoteHeadersMode

Defines a strategy to promote headers when reading files.

FileEncoding

An enumeration.

JoinType

Describes different possible types of join.

FileFormatArguments

Defines and stores the arguments which can affect learning on a 'FileFormatBuilder'.

ImputeColumnArguments

Defines and stores the arguments which can affect learning on an 'ImputeMissingValuesBuilder'.

ColumnSelector

Matches a set of columns by name according to a search term.

LocalDataSource

Describes a source of data that is available from local disk.

BlobDataSource

Describes a source of data that is available from Azure Blob Storage.

DatabaseAuthType

An enumeration.

MSSQLDataSource

Represents a datasource that points to a Microsoft SQL Database.

PostgreSQLDataSource

Represents a datasource that points to a PostgreSQL Database.

LocalFileOutput

Describes a local target to write file(s) to.

BlobFileOutput

Describes an Azure Blob Storage target to write file(s) to.

HttpDataSource

Describes a source of data that is available over HTTP or HTTPS.

ReplaceValueFunction

An enumeration.

StringMissingReplacementOption

An enumeration.

ReplacementsValue

The values to replace and their replacements.

ColumnRelationship

An enumeration.

DecimalMark

An enumeration.

SummaryFunction

Enum SummaryFunction.

TrimType

An enumeration.

MismatchAsOption

An enumeration.

AssertPolicy

An enumeration.

TypeConverter

Basic type converter.

FieldType

An enumeration.

DateTimeConverter

Converter to DateTime.

CandidateDateTimeConverter

Specialized result of type inference used by DataPrep to suggest DateTime conversion.

CandidateConverter

Result of type inference returned by DataPrep to suggest a potential type conversion.

InferenceInfo

Result of running type inference on a specific column.

ColumnProfile

A ColumnProfile collects summary statistics on a particular column of data produced by a Dataflow.

DataProfile

A DataProfile collects summary statistics on the data produced by a Dataflow.

HistogramBucket
ValueCountEntry
TypeCountEntry
BoxAndWhiskerInspector
HistogramInspector
ColumnStatsInspector
ScatterPlotInspector
ValueCountInspector
ParseDelimitedProperties

Describes and stores the properties required to parse a delimited text file.

ParseFixedWidthProperties

Describes and stores the properties required to parse a fixed-width text file.

ParseLinesProperties

Describes and stores the properties required to parse a text file containing raw lines.

ParseParquetProperties

Describes and stores the properties required to read a Parquet File.

ReadExcelProperties

Describes and stores the properties required to read an Excel file.

ReadJsonProperties

Describes and stores the properties required to read a JSON file.

ExternalReference

A reference to a Dataflow that is saved to a file.

Expression
Secret
ExecutionError

Exception raised when dataflow execution fails.

UnexpectedError

Unexpected error.

DataPrepImportError
Step

Single operation to be applied to data as part of the Dataflow.

SType

Defines supported semantic types.

STypeCountEntry
HistogramCompareMethod

An enumeration.

RegEx

The RegEx class makes it possible to create expressions that leverage regular expressions.

Functions

read_csv(path: FilePath, separator: str = ',', header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.CONSTANTGROUPED: 3>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, quoting: bool = False, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, comment: str = None, include_path: bool = False, archive_options: azureml.dataprep.api._archiveoption.ArchiveOptions = None, infer_column_types: bool = False, verify_exists: bool = True, partition_size: typing.Union[int, NoneType] = None) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read and parse CSV and other delimited text files (TSV, custom delimiters like semicolon, colon etc.).

read_csv(path: FilePath, separator: str = ',', header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.CONSTANTGROUPED: 3>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, quoting: bool = False, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, comment: str = None, include_path: bool = False, archive_options: azureml.dataprep.api._archiveoption.ArchiveOptions = None, infer_column_types: bool = False, verify_exists: bool = True, partition_size: typing.Union[int, NoneType] = None) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

separator

The separator character to use to split columns.

header

The mode in which headers are promoted. The options are: PromoteHeadersMode.CONSTANTGROUPED, PromoteHeadersMode.GROUPED, PromoteHeadersMode.NONE, PromoteHeadersMode.UNGROUPED. The default is PromoteHeadersMode.CONSTANTGROUPED, which assumes all files have the same schema: it promotes the first row of the first file as the header and drops the first row of the remaining files. PromoteHeadersMode.GROUPED promotes the first row of each file as a header and aggregates the result. PromoteHeadersMode.NONE does not promote headers. PromoteHeadersMode.UNGROUPED promotes only the first row of the first file as the header.

encoding

The encoding of the files being read.

quoting

Whether to handle new line characters within quotes. The default is to interpret the new line characters as starting new rows, irrespective of whether the characters are within quotes or not. If set to True, new line characters inside quotes will not result in new rows, and file reading speed will slow down.

inference_arguments

(Deprecated, use infer_column_types instead) Arguments that determine how data types are inferred. For example, to deal with an ambiguous date format, you can specify inference_arguments = dprep.InferenceArguments(day_first = False). Date values will then be read as MM/DD. Note that Data Prep will also attempt to infer and convert other column types.

skip_rows

How many rows to skip in the file(s) being read.

skip_mode

The mode in which rows are skipped. The options are: SkipMode.NONE, SkipMode.UNGROUPED, SkipMode.GROUPED. SkipMode.NONE (default) does not skip lines; note that if skip_rows is provided, this is ignored and SkipMode.UNGROUPED is used instead. SkipMode.UNGROUPED skips rows only for the first file. SkipMode.GROUPED skips rows for every file.

comment

The character used to indicate that a line is a comment instead of data in the files being read. The comment character must be the first character of the row for the line to be treated as a comment.

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

archive_options

Options for reading archive files, including the archive type and entry glob pattern. Only ZIP is supported as an archive type at the moment. For example, by specifying archive_options = ArchiveOptions(archive_type = ArchiveType.ZIP, entry_glob = '*10-20.csv'), Data Prep will read all entries in the ZIP whose names end with "10-20.csv".

infer_column_types
bool

Attempt to infer column types based on the data, and apply column type conversions accordingly.

verify_exists

Checks that the file referenced exists and can be accessed by the current context. You can set this to False when creating Dataflows in an environment that does not have access to the data but that will be executed in an environment that does.

partition_size

The desired partition size in bytes. Text readers parallelize their work by splitting the input into partitions which can be worked on independently. This parameter makes it possible to customize the size of those partitions. The minimum accepted value is 4 MB (4 * 1024 * 1024).

Returns

A new Dataflow.
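
For example, a minimal sketch of reading semicolon-delimited files with column type inference; the glob './data/sales*.csv' is a hypothetical placeholder path:

import azureml.dataprep as dprep

# Build the execution plan; no data is read until a method such as head() is called.
dataflow = dprep.read_csv(path='./data/sales*.csv',
                          separator=';',
                          infer_column_types=True,
                          include_path=True)
print(dataflow.head(5))  # pull only the first five rows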

read_fwf(path: FilePath, offsets: typing.List[int], header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.CONSTANTGROUPED: 3>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, include_path: bool = False, infer_column_types: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read and parse fixed-width data.

read_fwf(path: FilePath, offsets: typing.List[int], header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.CONSTANTGROUPED: 3>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, include_path: bool = False, infer_column_types: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

offsets

The offsets at which to split columns. The first column is always assumed to start at offset 0. For example, assuming a row contains "WAPostal98004", setting offsets = [2, 8] will split the row into "WA", "Postal", and "98004".

header

The mode in which headers are promoted. The options are: PromoteHeadersMode.CONSTANTGROUPED, PromoteHeadersMode.GROUPED, PromoteHeadersMode.NONE, PromoteHeadersMode.UNGROUPED. The default is PromoteHeadersMode.CONSTANTGROUPED, which assumes all files have the same schema: it promotes the first row of the first file as the header and drops the first row of the remaining files. PromoteHeadersMode.GROUPED promotes the first row of each file as a header and aggregates the result. PromoteHeadersMode.NONE does not promote headers. PromoteHeadersMode.UNGROUPED promotes only the first row of the first file as the header.

encoding

The encoding of the files being read.

inference_arguments

(Deprecated, use infer_column_types instead) Arguments that determine how data types are inferred. For example, to deal with an ambiguous date format, you can specify inference_arguments = dprep.InferenceArguments(day_first = False). Date values will then be read as MM/DD. Note that Data Prep will also attempt to infer and convert other column types.

skip_rows

How many rows to skip in the file(s) being read.

skip_mode

The mode in which rows are skipped. The options are: SkipMode.NONE, SkipMode.UNGROUPED, SkipMode.GROUPED. SkipMode.NONE (default) does not skip lines; note that if skip_rows is provided, this is ignored and SkipMode.UNGROUPED is used instead. SkipMode.UNGROUPED skips rows only for the first file. SkipMode.GROUPED skips rows for every file.

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

infer_column_types
bool

Attempt to infer column types based on the data, and apply column type conversions accordingly.

verify_exists

Checks that the file referenced exists and can be accessed by the current context. You can set this to False when creating Dataflows in an environment that does not have access to the data but that will be executed in an environment that does.

Returns

A new Dataflow.
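
A minimal sketch that parses rows like the "WAPostal98004" example above; the path is a hypothetical placeholder:

import azureml.dataprep as dprep

# Columns start at offsets 0, 2, and 8; the files have no header row.
dataflow = dprep.read_fwf(path='./data/regions*.txt',
                          offsets=[2, 8],
                          header=dprep.PromoteHeadersMode.NONE)
df = dataflow.to_pandas_dataframe()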

read_excel(path: FilePath, sheet_name: str = None, use_column_headers: bool = False, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, include_path: bool = False, infer_column_types: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read Excel files.

read_excel(path: FilePath, sheet_name: str = None, use_column_headers: bool = False, inference_arguments: azureml.dataprep.api.builders.InferenceArguments = None, skip_rows: int = 0, include_path: bool = False, infer_column_types: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

sheet_name

The name of the Excel sheet to load. The default is to read the first sheet from each Excel file.

use_column_headers

Whether to use the first row as column headers.

inference_arguments

(Deprecated, use infer_column_types instead) Arguments that determine how data types are inferred. For example, to deal with an ambiguous date format, you can specify inference_arguments = dprep.InferenceArguments(day_first = False). Date values will then be read as MM/DD. Note that Data Prep will also attempt to infer and convert other column types.

skip_rows

How many rows to skip in the file(s) being read.

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

infer_column_types
bool

Attempt to infer column types based on the data, and apply column type conversions accordingly.

verify_exists

Checks that the file referenced exists and can be accessed by the current context. You can set this to False when creating Dataflows in an environment that does not have access to the data but that will be executed in an environment that does.

Returns

A new Dataflow.
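
A minimal sketch of reading a single sheet; the path and sheet name are hypothetical placeholders:

import azureml.dataprep as dprep

# Skip one leading row, then treat the next row as column headers.
dataflow = dprep.read_excel(path='./data/report.xlsx',
                            sheet_name='Sheet1',
                            use_column_headers=True,
                            skip_rows=1,
                            infer_column_types=True)
df = dataflow.to_pandas_dataframe()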

read_lines(path: FilePath, header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.NONE: 0>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, comment: str = None, include_path: bool = False, verify_exists: bool = True, partition_size: typing.Union[int, NoneType] = None) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read text files and split them into lines.

read_lines(path: FilePath, header: azureml.dataprep.api.dataflow.PromoteHeadersMode = <PromoteHeadersMode.NONE: 0>, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, skip_rows: int = 0, skip_mode: azureml.dataprep.api.dataflow.SkipMode = <SkipMode.NONE: 0>, comment: str = None, include_path: bool = False, verify_exists: bool = True, partition_size: typing.Union[int, NoneType] = None) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

header

The mode in which headers are promoted. The options are: PromoteHeadersMode.CONSTANTGROUPED, PromoteHeadersMode.GROUPED, PromoteHeadersMode.NONE, PromoteHeadersMode.UNGROUPED. The default is PromoteHeadersMode.NONE, which does not promote headers. PromoteHeadersMode.CONSTANTGROUPED assumes all files have the same schema: it promotes the first row of the first file as the header and drops the first row of the remaining files. PromoteHeadersMode.GROUPED promotes the first row of each file as a header and aggregates the result. PromoteHeadersMode.UNGROUPED promotes only the first row of the first file as the header.

encoding

The encoding of the files being read.

skip_rows

How many rows to skip in the file(s) being read.

skip_mode

The mode in which rows are skipped. The options are: SkipMode.NONE, SkipMode.UNGROUPED, SkipMode.GROUPED. SkipMode.NONE (default) does not skip lines; note that if skip_rows is provided, this is ignored and SkipMode.UNGROUPED is used instead. SkipMode.UNGROUPED skips rows only for the first file. SkipMode.GROUPED skips rows for every file.

comment

The character used to indicate that a line is a comment instead of data in the files being read. The comment character must be the first character of the row for the line to be treated as a comment.

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

verify_exists

Checks that the file referenced exists and can be accessed by the current context. You can set this to False when creating Dataflows in an environment that does not have access to the data but that will be executed in an environment that does.

partition_size

The desired partition size in bytes. Text readers parallelize their work by splitting the input into partitions which can be worked on independently. This parameter makes it possible to customize the size of those partitions. The minimum accepted value is 4 MB (4 * 1024 * 1024).

Returns

A new Dataflow.
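
A minimal sketch of reading raw log lines; the path and comment character are hypothetical placeholders:

import azureml.dataprep as dprep

# Each row of the resulting Dataflow holds one line of text.
dataflow = dprep.read_lines(path='./logs/app*.log',
                            skip_rows=2,
                            comment='#',
                            include_path=True)
print(dataflow.head(10))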

read_sql(data_source: DatabaseSource, query: str) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow that can read data from a Microsoft SQL or Azure SQL database by executing the query specified.

read_sql(data_source: DatabaseSource, query: str) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

data_source

The details of the Microsoft SQL or Azure SQL database.

query

The query to execute to read data.

Returns

A new Dataflow.
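
A minimal sketch, assuming an MSSQLDataSource built from a server, database, user name, and a registered Secret as the password; the connection details are placeholders and the exact MSSQLDataSource parameter names may differ, so check its reference page:

import azureml.dataprep as dprep

# Register the password as a Secret so its value is not persisted with the Dataflow.
secret = dprep.register_secret(value='<password>', id='sql-password')
data_source = dprep.MSSQLDataSource(server_name='myserver.database.windows.net',  # assumed parameter names
                                    database_name='mydb',
                                    user_name='reader',
                                    password=secret)
dataflow = dprep.read_sql(data_source, 'SELECT TOP 100 * FROM dbo.Sales')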

read_postgresql(data_source: DatabaseSource, query: str) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow that can read data from a PostgreSQL database by executing the query specified.

read_postgresql(data_source: DatabaseSource, query: str) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

data_source

The details of the PostgreSQL database.

query

The query to execute to read data.

Returns

A new Dataflow.

read_parquet_file(path: FilePath, include_path: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read Parquet files.

read_parquet_file(path: FilePath, include_path: bool = False, verify_exists: bool = True) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

verify_exists

Checks that the file referenced exists and can be accessed by the current context. You can set this to False when creating Dataflows in an environment that does not have access to the data but that will be executed in an environment that does.

Returns

A new Dataflow.
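
A minimal sketch; the path is a hypothetical placeholder:

import azureml.dataprep as dprep

dataflow = dprep.read_parquet_file(path='./data/part-*.parquet',
                                   include_path=True)
profile = dataflow.get_profile()  # summary statistics for each column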

read_parquet_dataset(path: FilePath, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read Parquet Datasets.

read_parquet_dataset(path: FilePath, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

Returns

A new Dataflow.

Remarks

A Parquet Dataset is different from a Parquet file in that it can be a folder containing a number of Parquet files. It can also have a hierarchical structure that partitions the data by the value of a column. These more complex forms of Parquet data are commonly produced by Spark/Hive. read_parquet_dataset reads these more complex datasets using pyarrow, which handles complex Parquet layouts well. It also handles single Parquet files, or folders containing only single Parquet files, though these are better read using read_parquet_file, which does not use pyarrow for reading and should be significantly faster.

read_json(path: FilePath, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, flatten_nested_arrays: bool = False, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Creates a new Dataflow with the operations required to read JSON files.

read_json(path: FilePath, encoding: azureml.dataprep.api.engineapi.typedefinitions.FileEncoding = <FileEncoding.UTF8: 0>, flatten_nested_arrays: bool = False, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

encoding

The encoding of the files being read.

flatten_nested_arrays

Controls how nested arrays are handled. Flattening nested JSON arrays can result in a much larger number of rows.

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

Returns

A new Dataflow.
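
A minimal sketch; the path is a hypothetical placeholder:

import azureml.dataprep as dprep

# Leaving flatten_nested_arrays as False avoids multiplying the row count.
dataflow = dprep.read_json(path='./data/events*.json',
                           flatten_nested_arrays=False,
                           include_path=True)
df = dataflow.to_pandas_dataframe()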

detect_file_format(path: FilePath) -> azureml.dataprep.api.builders.FileFormatBuilder

Analyzes the file(s) at the specified path and attempts to determine the type of file and the arguments required to read and parse it. The result is a FileFormatBuilder that contains the results of the analysis. This method may fail for unsupported file formats, so you should always inspect the returned builder to ensure that the detected format is as expected.

detect_file_format(path: FilePath) -> azureml.dataprep.api.builders.FileFormatBuilder

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url.

Returns

A FileFormatBuilder. It can be modified and used as the input to a new Dataflow.
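
A minimal sketch, assuming the returned FileFormatBuilder exposes a file_format attribute and a to_dataflow method (check the FileFormatBuilder reference for the exact API); the path is a hypothetical placeholder:

import azureml.dataprep as dprep

builder = dprep.detect_file_format('./data/unknown_file')
print(builder.file_format)        # inspect the detected format and arguments (assumed attribute)
dataflow = builder.to_dataflow()  # assumed method that builds a Dataflow from the detected format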

auto_read_file(path: FilePath, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Analyzes the file(s) at the specified path and returns a new Dataflow containing the operations required to read and parse them. The type of the file and the arguments required to read it are inferred automatically. If this method fails or produces unexpected results, consider using detect_file_format(path: FilePath) -> azureml.dataprep.api.builders.FileFormatBuilder or one of the read methods for a specific file type.

auto_read_file(path: FilePath, include_path: bool = False) -> azureml.dataprep.api.dataflow.Dataflow

Parameters

path

The path to the file(s) or folder(s) that you want to load and parse. It can either be a local path or an Azure Blob url. Globbing is supported. For example, you can use path = "./data*" to read all files with name starting with "data".

include_path

Whether to include a column containing the path from which the data was read. This is useful when you are reading multiple files and might want to know which file a particular record originated from, or to keep useful information that is in the file path.

Returns

A new Dataflow.
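
A minimal sketch; the path is a hypothetical placeholder:

import azureml.dataprep as dprep

# Let Data Prep infer the file type and parse arguments, then inspect the result.
dataflow = dprep.auto_read_file(path='./data/unknown_file', include_path=True)
print(dataflow.head(5))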

read_pandas_dataframe

Creates a new Dataflow based on the contents of a given pandas DataFrame.

Parameters

df

pandas DataFrame to be parsed and cached at 'temp_folder'.

temp_folder

The path to the folder that the contents of 'df' will be written to.

overwrite_ok

If temp_folder exists, whether to allow its contents to be replaced.

in_memory

Whether to read the DataFrame from memory instead of persisting to disk.

Returns

A Dataflow that uses the contents of 'temp_folder' as its data source.

Remarks

If 'in_memory' is False, the contents of 'df' will be written to 'temp_folder' as a DataPrep DataSet. This folder must be accessible both from the calling script and from any environment where the Dataflow is executed.

If the Dataflow is guaranteed to be executed in the same context as the source DataFrame, the 'in_memory' argument can be set to True. In this case, the DataFrame does not need to be written out. This mode will usually result in better performance.

Note

The column names in the passed DataFrame must be unicode strings (or bytes). It is possible to end up with integer column names after transposing a DataFrame. These can be converted to strings using the command:

df.columns = df.columns.astype(str)
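
A minimal sketch; the temp folder is a hypothetical placeholder:

import pandas as pd
import azureml.dataprep as dprep

df = pd.DataFrame({'id': [1, 2, 3], 'city': ['Seattle', 'Bellevue', 'Redmond']})
df.columns = df.columns.astype(str)  # ensure string column names, as noted above

# The folder must be reachable from wherever the Dataflow is later executed.
dataflow = dprep.read_pandas_dataframe(df,
                                       temp_folder='./dprep_cache',
                                       overwrite_ok=True)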

login()

login()

col(name: StrExpressionLike, record: azureml.dataprep.api.expressions.Expression = None) -> azureml.dataprep.api.expressions.RecordFieldExpression

Creates an expression that retrieves the value in the specified column from a record.

col(name: StrExpressionLike, record: azureml.dataprep.api.expressions.Expression = None) -> azureml.dataprep.api.expressions.RecordFieldExpression

Parameters

name

The name of the column.

Returns

An expression.
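
A minimal sketch that uses the expression to filter rows, assuming Dataflow exposes a filter method that accepts such an expression; the file path and the 'score' column are hypothetical:

import azureml.dataprep as dprep

dataflow = dprep.read_csv('./data/scores.csv', infer_column_types=True)
# Keep only rows where the 'score' column exceeds 90.
high_scores = dataflow.filter(dprep.col('score') > 90)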

f_not(expression: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Negates the specified expression.

f_not(expression: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Parameters

expression

An expression.

Returns

The negated expression.

f_and(*expressions: typing.List[azureml.dataprep.api.expressions.Expression]) -> azureml.dataprep.api.expressions.Expression

Returns an expression that evaluates to true if all expressions are true; false otherwise. This expression supports short-circuit evaluation.

f_and(*expressions: typing.List[azureml.dataprep.api.expressions.Expression]) -> azureml.dataprep.api.expressions.Expression

Parameters

expressions

A list of expressions; at least two expressions are required.

Returns

An expression that results in a boolean value.

f_or(*expressions: typing.List[azureml.dataprep.api.expressions.Expression]) -> azureml.dataprep.api.expressions.Expression

Returns an expression that evaluates to true if any expression is true; false otherwise. This expression supports short-circuit evaluation.

f_or(*expressions: typing.List[azureml.dataprep.api.expressions.Expression]) -> azureml.dataprep.api.expressions.Expression

Parameters

expressions

A list of expressions; at least two expressions are required.

Returns

An expression that results in a boolean value.

cond(condition: azureml.dataprep.api.expressions.Expression, if_true: typing.Any, or_else: typing.Any) -> azureml.dataprep.api.expressions.Expression

Returns a conditional expression that will evaluate an input expression and return one value/expression if it evaluates to true or a different one if it doesn't.

cond(condition: azureml.dataprep.api.expressions.Expression, if_true: typing.Any, or_else: typing.Any) -> azureml.dataprep.api.expressions.Expression

Parameters

condition

The expression to evaluate.

if_true

The value/expression to use if the expression evaluates to True.

or_else

The value/expression to use if the expression evaluates to False.

Returns

A conditional expression.
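
A minimal sketch that labels rows based on two conditions, assuming Dataflow exposes an add_column method taking an expression, a new column name, and a prior column; the file path and column names are hypothetical:

import azureml.dataprep as dprep

dataflow = dprep.read_csv('./data/scores.csv', infer_column_types=True)
passed = dprep.f_and(dprep.col('score') >= 60, dprep.col('attendance') >= 0.8)
# Add a 'result' column that is 'pass' when both conditions hold, 'fail' otherwise.
labeled = dataflow.add_column(expression=dprep.cond(passed, 'pass', 'fail'),  # assumed method signature
                              new_column_name='result',
                              prior_column='score')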

round(value: azureml.dataprep.api.expressions.Expression, decimal_places: IntExpressionLike) -> azureml.dataprep.api.expressions.Expression

Creates an expression that will round the result of the expression specified to the desired number of decimal places.

round(value: azureml.dataprep.api.expressions.Expression, decimal_places: IntExpressionLike) -> azureml.dataprep.api.expressions.Expression

Parameters

value

An expression that returns the value to round.

decimal_places

The number of desired decimal places. Can be a value or an expression.

Returns

An expression that results in the rounded number.

trim_string(value: azureml.dataprep.api.expressions.Expression, trim_left: BoolExpressionLike = True, trim_right: BoolExpressionLike = True) -> azureml.dataprep.api.expressions.Expression

Creates an expression that will trim the string resulting from the expression specified.

trim_string(value: azureml.dataprep.api.expressions.Expression, trim_left: BoolExpressionLike = True, trim_right: BoolExpressionLike = True) -> azureml.dataprep.api.expressions.Expression

Parameters

value

An expression that returns the value to trim.

trim_left

Whether to trim from the beginning. Can be a value or an expression.

trim_right

Whether to trim from the end. Can be a value or an expression.

Returns

An expression that results in a trimmed string.

register_secrets(secrets: typing.Dict[str, str]) -> typing.List[azureml.dataprep.api.engineapi.typedefinitions.Secret]

Registers a set of secrets to be used during execution.

register_secrets(secrets: typing.Dict[str, str]) -> typing.List[azureml.dataprep.api.engineapi.typedefinitions.Secret]

Parameters

secrets

Dictionary of secret id to secret value.

register_secret(value: str, id: str = None) -> azureml.dataprep.api.engineapi.typedefinitions.Secret

Registers a secret to be used during execution.

register_secret(value: str, id: str = None) -> azureml.dataprep.api.engineapi.typedefinitions.Secret

Parameters

value

Value to keep secret. This won't be persisted with the package.

id

(Optional) Secret id to use. This will be persisted in the package. The default value is a new GUID.

create_secret(id: str) -> azureml.dataprep.api.engineapi.typedefinitions.Secret

Creates a Secret. Secrets are used in remote data sources like MSSQLDataSource.

create_secret(id: str) -> azureml.dataprep.api.engineapi.typedefinitions.Secret

Parameters

id

Secret id to use. This will be persisted in the package.
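
A minimal sketch showing the difference between registering a secret value and creating a secret placeholder by id; the id and value are hypothetical:

import azureml.dataprep as dprep

# Register id/value pairs for this session; only the ids are persisted with a saved Dataflow.
dprep.register_secrets({'sql-password': '<password>'})

# Elsewhere, reference the secret by id when defining a data source.
password = dprep.create_secret('sql-password')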

set_diagnostics_collection(send_diagnostics=True)

set_diagnostics_collection(send_diagnostics=True)

Parameters

send_diagnostics
default value: True

create_datetime(*values: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Creates an expression that returns a datetime from the given list of date parts. The input values should be in this order: year, month, day, hour, minute, second. The values can be of string or numeric type. For example: create_datetime(2019), create_datetime(2019, 2).

create_datetime(*values: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Parameters

values

Date parts.

Returns

Created datetime.
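
A minimal sketch that builds a datetime from separate year, month, and day columns, assuming Dataflow exposes an add_column method that accepts the resulting expression; the file path and column names are hypothetical:

import azureml.dataprep as dprep

dataflow = dprep.read_csv('./data/orders.csv', infer_column_types=True)
order_date = dprep.create_datetime(dprep.col('year'), dprep.col('month'), dprep.col('day'))
with_date = dataflow.add_column(expression=order_date,  # assumed method signature
                                new_column_name='order_date',
                                prior_column='day')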

get_stream_properties(value: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Creates an expression that returns a set of properties (such as last modified time) of the stream. The properties can vary depending on the type of the stream.

get_stream_properties(value: azureml.dataprep.api.expressions.Expression) -> azureml.dataprep.api.expressions.Expression

Parameters

value

An expression that returns a stream.

Returns

A record containing the stream's properties.

get_stream_info(value: azureml.dataprep.api.expressions.Expression, workspace: <built-in function any>) -> azureml.dataprep.api.expressions.Expression

get_stream_info(value: azureml.dataprep.api.expressions.Expression, workspace: <built-in function any>) -> azureml.dataprep.api.expressions.Expression