ExtensionsCatalog.IndicateMissingValues Method

Definition

Overloads

IndicateMissingValues(TransformsCatalog, InputOutputColumnPair[])

Create a MissingValueIndicatorEstimator, which scans the data from each column specified in InputColumnName and fills the corresponding new column specified in OutputColumnName with a vector of bools, where the i-th bool is true if the i-th element of the input data is missing and false otherwise.

IndicateMissingValues(TransformsCatalog, String, String)

Create a MissingValueIndicatorEstimator, which scans the data from the column specified in inputColumnName and fills the new column specified in outputColumnName with a vector of bools, where the i-th bool is true if the i-th element of the input data is missing and false otherwise.

IndicateMissingValues(TransformsCatalog, InputOutputColumnPair[])

Create a MissingValueIndicatorEstimator, which scans the data from each column specified in InputColumnName and fills the corresponding new column specified in OutputColumnName with a vector of bools, where the i-th bool is true if the i-th element of the input data is missing and false otherwise.

public static Microsoft.ML.Transforms.MissingValueIndicatorEstimator IndicateMissingValues (this Microsoft.ML.TransformsCatalog catalog, Microsoft.ML.InputOutputColumnPair[] columns);
static member IndicateMissingValues : Microsoft.ML.TransformsCatalog * Microsoft.ML.InputOutputColumnPair[] -> Microsoft.ML.Transforms.MissingValueIndicatorEstimator
<Extension()>
Public Function IndicateMissingValues (catalog As TransformsCatalog, columns As InputOutputColumnPair()) As MissingValueIndicatorEstimator

Parameters

catalog
TransformsCatalog

The transform's catalog.

columns
InputOutputColumnPair[]

The pairs of input and output columns. This estimator operates over scalar or vector data of type Single or Double.

Returns

MissingValueIndicatorEstimator

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    public static class IndicateMissingValuesMultiColumn
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, 
            // as well as the source of randomness.
            var mlContext = new MLContext();

            // Get a small dataset as an IEnumerable and convert it to an IDataView.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features1 = new float[3] {1, 1, 0}, Features2 = new float[2] {1, 1} },
                new DataPoint(){ Features1 = new float[3] {0, float.NaN, 1}, Features2 = new float[2] {float.NaN, 1} },
                new DataPoint(){ Features1 = new float[3] {-1, float.NaN, -3}, Features2 = new float[2] {1, float.PositiveInfinity} },
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // IndicateMissingValues creates a vector of booleans containing 'true' wherever the corresponding
            // value in the input column is missing. For floats and doubles, missing values are NaN;
            // infinities are not treated as missing.
            // We can use an array of InputOutputColumnPair to apply the MissingValueIndicatorEstimator
            // to multiple columns in one pass over the data.
            var pipeline = mlContext.Transforms.IndicateMissingValues(new[] {
                new InputOutputColumnPair("MissingIndicator1", "Features1"),
                new InputOutputColumnPair("MissingIndicator2", "Features2")
            });

            // Now we can transform the data and look at the output to confirm the behavior of the estimator.
            // This operation doesn't actually evaluate data until we read the data below.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // We can extract the newly created column as an IEnumerable of SampleDataTransformed, the class we define below.
            var rowEnumerable = mlContext.Data.CreateEnumerable<SampleDataTransformed>(transformedData, reuseRowObject: false);

            // And finally, we can write out the rows of the dataset, looking at the columns of interest.
            foreach (var row in rowEnumerable)
                Console.WriteLine($"Features1: [{string.Join(", ", row.Features1)}]\t MissingIndicator1: [{string.Join(", ", row.MissingIndicator1)}]\t " +
                    $"Features2: [{string.Join(", ", row.Features2)}]\t MissingIndicator2: [{string.Join(", ", row.MissingIndicator2)}]");

            // Expected output:
            // Features1: [1, 1, 0]     MissingIndicator1: [False, False, False]        Features2: [1, 1]       MissingIndicator2: [False, False]
            // Features1: [0, NaN, 1]   MissingIndicator1: [False, True, False]         Features2: [NaN, 1]     MissingIndicator2: [True, False]
            // Features1: [-1, NaN, -3]         MissingIndicator1: [False, True, False]         Features2: [1, ∞]       MissingIndicator2: [False, False]
        }

        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features1 { get; set; }
            [VectorType(2)]
            public float[] Features2 { get; set; }
        }

        private sealed class SampleDataTransformed : DataPoint
        {
            public bool[] MissingIndicator1 { get; set; }
            public bool[] MissingIndicator2 { get; set; }

        }
    }
}

Remarks

This transform can operate over several columns.

IndicateMissingValues(TransformsCatalog, String, String)

Create a MissingValueIndicatorEstimator, which scans the data from the column specified in inputColumnName and fills the new column specified in outputColumnName with a vector of bools, where the i-th bool is true if the i-th element of the input data is missing and false otherwise.

public static Microsoft.ML.Transforms.MissingValueIndicatorEstimator IndicateMissingValues (this Microsoft.ML.TransformsCatalog catalog, string outputColumnName, string inputColumnName = null);
static member IndicateMissingValues : Microsoft.ML.TransformsCatalog * string * string -> Microsoft.ML.Transforms.MissingValueIndicatorEstimator
<Extension()>
Public Function IndicateMissingValues (catalog As TransformsCatalog, outputColumnName As String, Optional inputColumnName As String = Nothing) As MissingValueIndicatorEstimator

Parameters

catalog
TransformsCatalog

The transform's catalog.

outputColumnName
String

Name of the column resulting from the transformation of inputColumnName. This column's data type will be Boolean for a scalar input, or a vector of Boolean for a vector input.

inputColumnName
String

Name of the column to scan for missing values. If set to null, the value of outputColumnName is used as the source. This estimator operates over scalar or vector data of type Single or Double.

Returns

MissingValueIndicatorEstimator

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    public static class IndicateMissingValues
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging, 
            // as well as the source of randomness.
            var mlContext = new MLContext();

            // Get a small dataset as an IEnumerable and convert it to an IDataView.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {1, 1, 0} },
                new DataPoint(){ Features = new float[3] {0, float.NaN, 1} },
                new DataPoint(){ Features = new float[3] {-1, float.NaN, -3} },
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // IndicateMissingValues creates a vector of booleans containing 'true' wherever the corresponding
            // value in the input column is missing. For floats and doubles, missing values are represented as NaN.
            var pipeline = mlContext.Transforms.IndicateMissingValues("MissingIndicator", "Features");

            // Now we can transform the data and look at the output to confirm the behavior of the estimator.
            // This operation doesn't actually evaluate data until we read the data below.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // We can extract the newly created column as an IEnumerable of SampleDataTransformed, the class we define below.
            var rowEnumerable = mlContext.Data.CreateEnumerable<SampleDataTransformed>(transformedData, reuseRowObject: false);

            // And finally, we can write out the rows of the dataset, looking at the columns of interest.
            foreach (var row in rowEnumerable)
                Console.WriteLine($"Features: [{string.Join(", ", row.Features)}]\t MissingIndicator: [{string.Join(", ", row.MissingIndicator)}]");

            // Expected output:
            // Features: [1, 1, 0]      MissingIndicator: [False, False, False]
            // Features: [0, NaN, 1]    MissingIndicator: [False, True, False]
            // Features: [-1, NaN, -3]  MissingIndicator: [False, True, False]
        }

        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        private sealed class SampleDataTransformed : DataPoint
        {
            public bool[] MissingIndicator { get; set; }
        }
    }
}
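
The parameter descriptions state that the estimator also accepts scalar columns, while the example above only exercises a vector column. The following is a minimal sketch of the scalar case, not an official sample. The class names Reading and ReadingTransformed and the column names Temperature and TemperatureMissing are hypothetical, and the sketch assumes that a scalar input column produces a single Boolean per row rather than a vector.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class IndicateMissingValuesScalar
    {
        public static void Example()
        {
            var mlContext = new MLContext();

            // Hypothetical scalar column "Temperature"; NaN marks a missing reading.
            var samples = new List<Reading>()
            {
                new Reading(){ Temperature = 21.5f },
                new Reading(){ Temperature = float.NaN },
                new Reading(){ Temperature = float.NegativeInfinity },
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Same catalog call as in the vector example, applied to a scalar float column.
            var pipeline = mlContext.Transforms.IndicateMissingValues("TemperatureMissing", "Temperature");
            var transformedData = pipeline.Fit(data).Transform(data);

            // Assumption: the output column maps to a single bool because the input is scalar.
            var rows = mlContext.Data.CreateEnumerable<ReadingTransformed>(transformedData, reuseRowObject: false);

            // Only the NaN row should be flagged; infinities are not treated as missing.
            foreach (var row in rows)
                Console.WriteLine($"Temperature: {row.Temperature}\t TemperatureMissing: {row.TemperatureMissing}");
        }

        private class Reading
        {
            public float Temperature { get; set; }
        }

        private sealed class ReadingTransformed : Reading
        {
            public bool TemperatureMissing { get; set; }
        }
    }
}
```

Keeping the indicator as its own column lets a later step replace the NaN values without losing the information that they were originally missing.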

Applies to