ColumnProfile class

Definition

A ColumnProfile collects summary statistics on a particular column of data produced by a Dataflow.

ColumnProfile(values: typing.Dict[str, typing.Any] = None)

Inheritance

builtins.object
azureml.dataprep.api.engineapi.typedefinitions.ProfileResult
ColumnProfile

Variables

column_name
str
Name of column
type
FieldType
Type of values in column
min
Any
Minimum value
max
Any
Maximum value
count
int
Count of rows
missing_count
int
Count of rows with a missing value
not_missing_count
int
Count of rows with a value
error_count
int
Count of rows with an error value
percent_missing
float
Percent of the values that are missing
empty_count
int
Count of rows with an empty string value
lower_quartile
float
Estimated 25th-percentile value
median
float
Estimated median value
upper_quartile
float
Estimated 75th-percentile value
mean
float
Mean
std
float
Standard deviation
variance
float
Variance
skewness
float
Skewness
kurtosis
float
Kurtosis
quantiles
Dict[float, float]
Dictionary mapping each quantile to its value
value_counts
List[ValueCountEntry]
Counts of discrete values in the data; None if too many values.
type_counts
List[TypeCountEntry]
Counts of discrete types in the data.
histogram
List[HistogramBucket]
Histogram buckets showing the distribution of the data; None if data is non-numeric.
stype_counts
List[STypeCountEntry]
List of semantic type names and counts of values that matched. None if the profile did not contain semantic type counts. Can be an empty list when there were no matches.
whisker_top
float
Top whisker value
whisker_bottom
float
Bottom whisker value
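
To make the distribution fields above more concrete, the following sketch computes comparable quantities for a small list using only Python's standard library. It is an illustration, not the library's implementation: this page describes the quartiles and median as estimates and does not state the estimation method, the whisker rule, or whether std and variance are sample or population values. The inclusive quantile method, Tukey's 1.5 * IQR whisker convention, and the sample formulas used here are therefore assumptions, and describe_numeric is a hypothetical helper that is not part of azureml.dataprep.

```python
import statistics


def describe_numeric(values):
    """Hypothetical helper: summary statistics for a list of numbers.

    Assumptions (not taken from azureml.dataprep): quartiles use the
    'inclusive' method, whiskers follow Tukey's 1.5 * IQR rule, and
    std/variance use the sample formulas.
    """
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers: the most extreme data points that still lie inside the fences.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": data[0],
        "max": data[-1],
        "count": len(data),
        "lower_quartile": q1,
        "median": q2,
        "upper_quartile": q3,
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),
        "variance": statistics.variance(data),
        "whisker_bottom": min(x for x in data if x >= low_fence),
        "whisker_top": max(x for x in data if x <= high_fence),
    }


stats = describe_numeric([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats["lower_quartile"], stats["median"], stats["upper_quartile"])
print(stats["whisker_bottom"], stats["whisker_top"], stats["max"])
```

In this sample the value 100 lies outside the upper fence, so under the assumed convention the top whisker stops at 8 while max remains 100.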

Methods

get_stats()

Return column stats.
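
This page does not specify the shape of the value returned by get_stats(). Where a plain dictionary of the documented variables is needed, one option is to read the attributes by name, as in the minimal sketch below. stats_as_dict and STAT_FIELDS are hypothetical names introduced for illustration, and a SimpleNamespace stands in for a real ColumnProfile so the example runs without azureml.dataprep installed.

```python
from types import SimpleNamespace

# Attribute names taken from the Variables list on this page.
STAT_FIELDS = (
    "min", "max", "count", "missing_count", "not_missing_count",
    "error_count", "empty_count", "mean", "std", "variance",
)


def stats_as_dict(profile, fields=STAT_FIELDS):
    """Hypothetical helper: copy the named attributes into a plain dict.

    Attributes that the object does not have are reported as None.
    """
    return {name: getattr(profile, name, None) for name in fields}


# Stand-in object (an assumption, not a real ColumnProfile).
fake_profile = SimpleNamespace(
    column_name="age", min=18, max=90, count=5,
    missing_count=1, not_missing_count=4,
    error_count=0, empty_count=0, mean=41.5,
)
summary = stats_as_dict(fake_profile)
print(summary)
```

Reading by attribute name keeps the helper independent of how the profile object is constructed.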

Attributes

histogram

The histogram for values in the column.

kurtosis

The kurtosis value for the column.

lower_quartile

The lower quartile value for the column.

max

The max value in the column.

mean

The mean value for the column.

median

The median value for the column.

min

The min value in the column.

name

(Deprecated. Use column_name instead.)

quantiles

The quantile values for the column.

skewness

The skewness value for the column.

std

The standard deviation for the column.

stype_counts

The count of each semantic type in the column.

type_counts

The count of each type in the column.

upper_quartile

The upper quartile value for the column.

value_counts

The count of each value in the column.

variance

The variance value for the column.
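
According to the Variables list, value_counts is None when there are too many values, histogram is None for non-numeric data, and stype_counts can be either None or an empty list. Code that consumes these attributes should therefore separate "not available" from "empty" before iterating. The sketch below shows one way to do that. optional_field_status is a hypothetical helper, and the SimpleNamespace object with tuple entries is a stand-in for a real ColumnProfile and its entry types.

```python
from types import SimpleNamespace


def optional_field_status(profile):
    """Hypothetical helper: report which optional list fields are populated."""
    status = {}
    for name in ("value_counts", "histogram", "stype_counts"):
        entries = getattr(profile, name, None)
        if entries is None:
            # Documented case: the field was not produced for this column.
            status[name] = "not available"
        elif len(entries) == 0:
            # Documented for stype_counts: a list with no matches.
            status[name] = "empty"
        else:
            status[name] = f"{len(entries)} entries"
    return status


# Stand-in for a text column (an assumption, not real profile output).
text_column = SimpleNamespace(
    value_counts=[("a", 3), ("b", 2)],
    histogram=None,
    stype_counts=[],
)
status = optional_field_status(text_column)
print(status)
```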