PcaCatalog.RandomizedPca Method

Definition

Overloads

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)

Creates a RandomizedPcaTrainer with advanced options. The trainer fits an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)

Creates a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, RandomizedPcaTrainer+Options)

Creates a RandomizedPcaTrainer with advanced options. The trainer fits an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.

public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, Microsoft.ML.Trainers.RandomizedPcaTrainer.Options options);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * Microsoft.ML.Trainers.RandomizedPcaTrainer.Options -> Microsoft.ML.Trainers.RandomizedPcaTrainer

Parameters

catalog
AnomalyDetectionCatalog.AnomalyDetectionTrainers

The anomaly detection catalog trainer object.

options
RandomizedPcaTrainer.Options

Advanced options for the algorithm.

Returns

RandomizedPcaTrainer

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSampleWithOptions
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations,
            // and as the source of randomness. The seed is fixed in this example
            // to make the outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {1, 0, 0} }
            };

            // Convert the List<DataPoint> to IDataView, a format consumable by
            // ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);

            var options = new Microsoft.ML.Trainers.RandomizedPcaTrainer.Options()
            {
                FeatureColumnName = nameof(DataPoint.Features),
                Rank = 1,
                Seed = 10,
            };

            // Create an anomaly detector. Its underlying algorithm is randomized
            // PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                options);

            // Train the anomaly detector.
            var model = pipeline.Fit(data);

            // Apply the trained model on the training data.
            var transformed = model.Transform(data);

            // Read ML.NET predictions into IEnumerable<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();

            // Let's go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];

                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);

                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an outlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an inlier with a score of being outlier {2}",
                        i, featuresInText, result.Score);
            }
            // Lines printed out should be
            // The 0-th example with features [0,2,1] is an inlier with a score of being outlier 0.2264826
            // The 1-th example with features [0,2,3] is an inlier with a score of being outlier 0.1739471
            // The 2-th example with features [0,2,4] is an inlier with a score of being outlier 0.05711612
            // The 3-th example with features [0,2,1] is an inlier with a score of being outlier 0.2264826
            // The 4-th example with features [0,2,2] is an inlier with a score of being outlier 0.3868995
            // The 5-th example with features [0,2,3] is an inlier with a score of being outlier 0.1739471
            // The 6-th example with features [0,2,4] is an inlier with a score of being outlier 0.05711612
            // The 7-th example with features [1,0,0] is an outlier with a score of being outlier 0.6260795
        }

        // Example with 3 feature values. A training data set is a collection of
        // such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        // Class used to capture prediction of DataPoint.
        private class Result
        {
            // True for an outlier, false for an inlier.
            public bool PredictedLabel { get; set; }
            // Inliers get smaller scores. The score is between 0 and 1.
            public float Score { get; set; }
        }
    }
}

Remarks

By default, the threshold used to determine the label of a data point from its predicted score is 0.5. Scores range from 0 to 1. A data point with a predicted score higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
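The following is a minimal sketch of changing the threshold, not an official sample. It assumes the Microsoft.ML package is referenced and reuses the training data and options from the example above. The class name RandomizedPcaThresholdSketch, the helper CountOutliers, and the threshold value 0.3 are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaThresholdSketch
    {
        public static void Example()
        {
            // 0.3f is an arbitrary threshold chosen for illustration only.
            Console.WriteLine("Outliers at the default threshold: {0}",
                CountOutliers(null));
            Console.WriteLine("Outliers at threshold 0.3: {0}",
                CountOutliers(0.3f));
        }

        // Hypothetical helper: trains on the sample data and returns how many
        // rows are flagged as outliers. Passing null keeps the default
        // threshold.
        public static int CountOutliers(float? threshold)
        {
            var mlContext = new MLContext(seed: 0);

            // Same training data as in the example above.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 3} },
                new DataPoint(){ Features = new float[3] {0, 2, 4} },
                new DataPoint(){ Features = new float[3] {1, 0, 0} }
            };
            var data = mlContext.Data.LoadFromEnumerable(samples);

            var options = new RandomizedPcaTrainer.Options()
            {
                FeatureColumnName = nameof(DataPoint.Features),
                Rank = 1,
                Seed = 10,
            };

            // Fit returns an AnomalyPredictionTransformer<PcaModelParameters>.
            var model = mlContext.AnomalyDetection.Trainers
                .RandomizedPca(options).Fit(data);

            // Replace the default threshold when one was requested.
            if (threshold.HasValue)
                model = mlContext.AnomalyDetection.ChangeModelThreshold(
                    model, threshold.Value);

            var results = mlContext.Data.CreateEnumerable<Result>(
                model.Transform(data), reuseRowObject: false);

            return results.Count(r => r.PredictedLabel);
        }

        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        private class Result
        {
            public bool PredictedLabel { get; set; }
            public float Score { get; set; }
        }
    }
}
```

Lowering the threshold can only keep or increase the number of flagged rows. If the scores match those listed in the example output above, a threshold of 0.3 would additionally flag the example with score 0.3868995.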

RandomizedPca(AnomalyDetectionCatalog+AnomalyDetectionTrainers, String, String, Int32, Int32, Boolean, Nullable<Int32>)

Creates a RandomizedPcaTrainer, which trains an approximate principal component analysis (PCA) model using a randomized singular value decomposition (SVD) algorithm.

public static Microsoft.ML.Trainers.RandomizedPcaTrainer RandomizedPca (this Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers catalog, string featureColumnName = "Features", string exampleWeightColumnName = null, int rank = 20, int oversampling = 20, bool ensureZeroMean = true, Nullable<int> seed = null);
static member RandomizedPca : Microsoft.ML.AnomalyDetectionCatalog.AnomalyDetectionTrainers * string * string * int * int * bool * Nullable<int> -> Microsoft.ML.Trainers.RandomizedPcaTrainer
<Extension()>
Public Function RandomizedPca (catalog As AnomalyDetectionCatalog.AnomalyDetectionTrainers, Optional featureColumnName As String = "Features", Optional exampleWeightColumnName As String = null, Optional rank As Integer = 20, Optional oversampling As Integer = 20, Optional ensureZeroMean As Boolean = true, Optional seed As Nullable(Of Integer) = null) As RandomizedPcaTrainer

Parameters

catalog
AnomalyDetectionCatalog.AnomalyDetectionTrainers

The anomaly detection catalog trainer object.

featureColumnName
String

The name of the feature column. The column data must be a known-sized vector of Single.

exampleWeightColumnName
String

The name of the example weight column (optional). To use the weight column, the column data must be of type Single.

rank
Int32

The number of components in the PCA.

oversampling
Int32

Oversampling parameter for randomized PCA training.

ensureZeroMean
Boolean

If enabled, data is centered to be zero mean.

seed
Nullable<Int32>

The seed for random number generation.

Returns

RandomizedPcaTrainer

Examples

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class RandomizedPcaSample
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations,
            // and as the source of randomness. The seed is fixed in this example
            // to make the outputs deterministic.
            var mlContext = new MLContext(seed: 0);

            // Training data.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {0, 1, 2} },
                new DataPoint(){ Features = new float[3] {0, 2, 1} },
                new DataPoint(){ Features = new float[3] {2, 0, 0} }
            };

            // Convert the List<DataPoint> to IDataView, a format consumable by
            // ML.NET functions.
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Create an anomaly detector. Its underlying algorithm is randomized
            // PCA.
            var pipeline = mlContext.AnomalyDetection.Trainers.RandomizedPca(
                featureColumnName: nameof(DataPoint.Features), rank: 1,
                ensureZeroMean: false);

            // Train the anomaly detector.
            var model = pipeline.Fit(data);

            // Apply the trained model on the training data.
            var transformed = model.Transform(data);

            // Read ML.NET predictions into IEnumerable<Result>.
            var results = mlContext.Data.CreateEnumerable<Result>(transformed,
                reuseRowObject: false).ToList();

            // Let's go through all predictions.
            for (int i = 0; i < samples.Count; ++i)
            {
                // The i-th example's prediction result.
                var result = results[i];

                // The i-th example's feature vector in text format.
                var featuresInText = string.Join(',', samples[i].Features);

                if (result.PredictedLabel)
                    // The i-th sample is predicted as an outlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an outlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
                else
                    // The i-th sample is predicted as an inlier.
                    Console.WriteLine("The {0}-th example with features [{1}] is " +
                        "an inlier with a score of being outlier {2}", i,
                        featuresInText, result.Score);
            }
            // Lines printed out should be
            // The 0-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 1-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 2-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 3-th example with features [0,1,2] is an outlier with a score of being outlier 0.5082728
            // The 4-th example with features [0,2,1] is an inlier with a score of being outlier 0.1101028
            // The 5-th example with features [2,0,0] is an outlier with a score of being outlier 1
        }

        // Example with 3 feature values. A training data set is a collection of
        // such examples.
        private class DataPoint
        {
            [VectorType(3)]
            public float[] Features { get; set; }
        }

        // Class used to capture prediction of DataPoint.
        private class Result
        {
            // True for an outlier, false for an inlier.
            public bool PredictedLabel { get; set; }
            // Inliers get smaller scores. The score is between 0 and 1.
            public float Score { get; set; }
        }
    }
}

Remarks

By default, the threshold used to determine the label of a data point from its predicted score is 0.5. Scores range from 0 to 1. A data point with a predicted score higher than 0.5 is considered an outlier. Use ChangeModelThreshold<TModel>(AnomalyPredictionTransformer<TModel>, Single) to change this threshold.
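The decision rule described here can be sketched in plain C# without ML.NET. The helper IsOutlier and the alternative threshold 0.6 are hypothetical and exist only for illustration. Reading "higher than" as a strict comparison is an assumption. The scores are the ones listed in the example output above.

```csharp
using System;

namespace Samples.Dynamic.Trainers.AnomalyDetection
{
    public static class ThresholdRuleSketch
    {
        // Hypothetical helper, not part of ML.NET: a data point is treated as
        // an outlier when its score is strictly higher than the threshold.
        public static bool IsOutlier(float score, float threshold = 0.5f)
        {
            return score > threshold;
        }

        public static void Example()
        {
            // Distinct scores from the example output above.
            var scores = new float[] { 0.1101028f, 0.5082728f, 1f };

            foreach (var score in scores)
            {
                // Compare the default threshold with an illustrative
                // alternative of 0.6.
                Console.WriteLine(
                    "Score {0}: outlier at 0.5 = {1}, outlier at 0.6 = {2}",
                    score, IsOutlier(score), IsOutlier(score, 0.6f));
            }
        }
    }
}
```

With the default threshold, the score 0.5082728 sits just above 0.5 and is labeled an outlier. Raising the threshold to 0.6 would relabel it as an inlier while the score itself stays the same.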
