DataProfile class

Definition

A DataProfile collects summary statistics on the data produced by a Dataflow.

DataProfile()
Inheritance
builtins.object
DataProfile

Variables

columns
Dict[str, ColumnProfile]
Profile information for each result column.

Methods

compare(other_profile, include_columns=None, exclude_columns=None, histogram_compare_method=<HistogramCompareMethod.WASSERSTEIN: 0>)

Compares the current profile with other_profile. Except for the histogram difference, every difference is computed by subtraction ('-'). The histogram difference is a statistical distance in the range [0, ∞). If there are no histograms, the histogram difference defaults to None.

get_columns()
to_pandas_dataframe

compare(other_profile, include_columns=None, exclude_columns=None, histogram_compare_method=<HistogramCompareMethod.WASSERSTEIN: 0>)

Compares the current profile with other_profile. Except for the histogram difference, every difference is computed by subtraction ('-'). The histogram difference is a statistical distance in the range [0, ∞). If there are no histograms, the histogram difference defaults to None.

Parameters

other_profile
DataProfile

Another data profile for comparison.

include_columns
List[str]

List of column names to include in the comparison.

default value: None

exclude_columns
List[str]

List of column names to exclude from the comparison.

default value: None

histogram_compare_method
HistogramCompareMethod

Enum specifying the method used to compare histograms.

default value: HistogramCompareMethod.WASSERSTEIN
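The interaction between include_columns and exclude_columns can be pictured with a small sketch. The reference does not say how the two lists combine, so the helper below, select_columns, is hypothetical and rests on assumed semantics: None means "no filter", the include list is applied first, and the exclude list is applied last.

```python
from typing import Iterable, List, Optional


def select_columns(
    available: Iterable[str],
    include_columns: Optional[List[str]] = None,
    exclude_columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: resolve which columns take part in a comparison.

    Assumed semantics (not confirmed by the reference): None means
    "no filter", includes are applied first, excludes are applied last.
    """
    selected = list(available)
    if include_columns is not None:
        wanted = set(include_columns)
        selected = [name for name in selected if name in wanted]
    if exclude_columns is not None:
        unwanted = set(exclude_columns)
        selected = [name for name in selected if name not in unwanted]
    return selected


columns = ["age", "income", "zip_code"]
print(select_columns(columns))
print(select_columns(columns, include_columns=["age", "income"]))
print(select_columns(columns, exclude_columns=["zip_code"]))
```

Under these assumptions, the first call keeps all three columns and the other two calls both keep only age and income.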

Returns

Difference of the profiles.

Return type

DataProfileDifference
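To give a feel for why the histogram difference falls in [0, ∞), here is a minimal sketch of the one-dimensional Wasserstein distance between two histograms. It is not the library's implementation: the function name wasserstein_from_histograms is hypothetical, and it assumes both histograms share the same equally spaced bins, with each bin's mass placed at the bin centre.

```python
from itertools import accumulate
from typing import Sequence


def wasserstein_from_histograms(
    counts_a: Sequence[float],
    counts_b: Sequence[float],
    bin_width: float = 1.0,
) -> float:
    """1-D Wasserstein (earth mover's) distance between two histograms.

    Assumes both histograms use the same equally spaced bins. The distance
    is the area between the two cumulative distributions.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must contain at least one value")
    # Normalise counts to probabilities and accumulate them into CDFs.
    cdf_a = accumulate(c / total_a for c in counts_a)
    cdf_b = accumulate(c / total_b for c in counts_b)
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


same = wasserstein_from_histograms([5, 3, 2], [5, 3, 2])
shifted = wasserstein_from_histograms([10, 0, 0], [0, 0, 10])
print(same, shifted)
```

Identical histograms give a distance of 0, and the distance grows without an upper bound as mass has to move across more bins.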

get_columns()

to_pandas_dataframe

Attributes

dtypes

Column data types.

Returns

A dictionary that maps each column name to its FieldType.

row_count

Count of rows in this DataProfile.

Returns

Count of rows.

Return type

int

shape

Shape of the data produced by the Dataflow.

Returns

Tuple of row count and column count.
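The relationship between dtypes, row_count, and shape can be sketched with a stand-in class. ProfileSketch is hypothetical and is not the library's DataProfile. The type names are placeholder strings rather than real FieldType values.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ProfileSketch:
    """Hypothetical stand-in for a profile; not the library's class."""

    row_count: int
    # Column name -> placeholder type name (stands in for FieldType).
    dtypes: Dict[str, str] = field(default_factory=dict)

    @property
    def shape(self) -> Tuple[int, int]:
        # Rows first, then columns, matching the order stated above.
        return (self.row_count, len(self.dtypes))


profile = ProfileSketch(row_count=3, dtypes={"age": "INTEGER", "name": "STRING"})
print(profile.shape)  # (3, 2)
```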

stype_counts

The columns in which semantic types were found, each mapped to a list of the semantic types found in that column.

Returns

A dictionary that maps each column name to a list of STypeCountEntry.

Remarks

Only columns in which semantic types were found are included in the dictionary, so the lists are never empty. Each list is sorted in descending order by the number of values that matched the semantic type.
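The two guarantees above (no empty lists, descending order by count) can be illustrated with a small sketch. STypeCountSketch is a hypothetical stand-in for STypeCountEntry, whose real field names are not given here, and build_stype_counts is an illustrative helper rather than part of the library.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class STypeCountSketch(NamedTuple):
    """Hypothetical stand-in for STypeCountEntry; field names are assumptions."""

    stype: str
    count: int


def build_stype_counts(
    matches: Dict[str, List[str]],
) -> Dict[str, List[STypeCountSketch]]:
    """Map each column to its semantic types, most frequent first.

    `matches` holds, per column, one semantic-type label for every value
    that matched a type. Columns without any match are left out.
    """
    result: Dict[str, List[STypeCountSketch]] = {}
    for column, labels in matches.items():
        if not labels:
            continue  # no semantic type found: omit the column entirely
        ranked = Counter(labels).most_common()  # sorted by count, descending
        result[column] = [STypeCountSketch(stype, count) for stype, count in ranked]
    return result


matches = {
    "contact": ["email", "email", "phone", "email"],
    "notes": [],
}
stype_counts = build_stype_counts(matches)
print(stype_counts)
```

In this sketch the notes column is absent from the result, and the contact column lists email (3 matches) before phone (1 match).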