DataOperationsCatalog.FilterRowsByKeyColumnFraction 方法

參考

定義

命名空間:: Microsoft.ML

組件:: Microsoft.ML.Data.dll

套件:: Microsoft.ML v3.0.1

套件:: Microsoft.ML v1.0.0

套件:: Microsoft.ML v1.1.0

套件:: Microsoft.ML v1.2.0

套件:: Microsoft.ML v1.3.1

套件:: Microsoft.ML v1.4.0

套件:: Microsoft.ML v1.5.5

套件:: Microsoft.ML v1.6.0

套件:: Microsoft.ML v1.7.0

套件:: Microsoft.ML v2.0.0

重要

部分資訊涉及發行前產品，在發行之前可能會有大幅修改。 Microsoft 對此處提供的資訊，不做任何明確或隱含的瑕疵擔保。

依資料行的值 KeyDataViewType 篩選資料集。

public Microsoft.ML.IDataView FilterRowsByKeyColumnFraction (Microsoft.ML.IDataView input, string columnName, double lowerBound = 0, double upperBound = 1);

member this.FilterRowsByKeyColumnFraction : Microsoft.ML.IDataView * string * double * double -> Microsoft.ML.IDataView

Public Function FilterRowsByKeyColumnFraction (input As IDataView, columnName As String, Optional lowerBound As Double = 0, Optional upperBound As Double = 1) As IDataView

參數

input: IDataView

輸入資料。

columnName: String

要用於篩選的資料行名稱。

lowerBound: Double

內含下限。

upperBound: Double

獨佔上限。

傳回

IDataView

範例

using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    /// <summary>
    /// Sample class showing how to use FilterRowsByKeyColumnFraction.
    /// </summary>
    public static class FilterRowsByKeyColumnFraction
    {
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, 
            // as a catalog of available operations and as the source of randomness.
            var mlContext = new MLContext();

            // Create a small dataset as an IEnumerable.
            var samples = new List<DataPoint>()
            {
                new DataPoint(){ Age = 21 },
                new DataPoint(){ Age = 40 },
                new DataPoint(){ Age = 38 },
                new DataPoint(){ Age = 22 },
                new DataPoint(){ Age = 40 },
                new DataPoint(){ Age = 40 },
                new DataPoint(){ Age = 22 },
                new DataPoint(){ Age = 21 }
            };

            // Convert training data to IDataView.
            var data = mlContext.Data.LoadFromEnumerable(samples);

            // Convert the uint values to KeyDataViewData
            var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Age");
            var transformedData = pipeline.Fit(data).Transform(data);

            // Before we apply a filter, examine all the records in the dataset.
            var enumerable = mlContext.Data
                .CreateEnumerable<DataPoint>(transformedData, reuseRowObject: true);

            Console.WriteLine($"Age");
            foreach (var row in enumerable)
            {
                Console.WriteLine($"{row.Age}");
            }

            // Age
            // 
            //  1
            //  2
            //  3
            //  4
            //  2
            //  2
            //  4
            //  1

            // Now filter down to half the keys, choosing the lower half of values. 
            // For the keys we have the sorted values: 1 1 2 2 2 3 4 4.
            // Projected in the [0, 1[ interval as per: (key - 0.5)/(Count of Keys)
            // the values of the keys for our data would be:
            // 0.125 0.125 0.375 0.375 0.375 0.625 0.875 0.875
            // so the keys resulting from filtering in the [0, 0.5 [ interval are
            // the ones with normalized values 0.125 and 0.375, respectively keys 
            // with values 1 and 2.
            var filteredHalfData = mlContext.Data
                .FilterRowsByKeyColumnFraction(transformedData, columnName: "Age",
                lowerBound: 0, upperBound: 0.5);

            var filteredHalfEnumerable = mlContext.Data
                .CreateEnumerable<DataPoint>(filteredHalfData,
                reuseRowObject: true);

            Console.WriteLine($"Age");
            foreach (var row in filteredHalfEnumerable)
            {
                Console.WriteLine($"{row.Age}");
            }

            // Age
            // 
            //  1
            //  2
            //  2
            //  2
            //  1

            // As mentioned above, the normalized keys are:
            // 0.125 0.125 0.375 0.375 0.375 0.625 0.875 0.875
            // so the keys resulting from filtering in the [0.3, 0.6 [ interval are
            // the ones with normalized value 0.375, respectively key with
            // value = 2.
            var filteredMiddleData = mlContext.Data
                .FilterRowsByKeyColumnFraction(transformedData, columnName: "Age",
                lowerBound: 0.3, upperBound: 0.6);

            // Look at the data and observe that values above 2 have been filtered
            // out
            var filteredMiddleEnumerable = mlContext.Data
                .CreateEnumerable<DataPoint>(filteredMiddleData,
                reuseRowObject: true);

            Console.WriteLine($"Age");
            foreach (var row in filteredMiddleEnumerable)
            {
                Console.WriteLine($"{row.Age}");
            }

            // Age
            //
            //  2
            //  2
            //  2
        }

        private class DataPoint
        {
            public uint Age { get; set; }
        }
    }
}

備註

只保留滿足範圍條件的資料列：索引鍵資料行的值 (視為整個索引鍵 columnName 範圍的分數，) 必須介於 (內含) 與 upperBound (獨佔) 之間 lowerBound 。如果 columnName 是某些「穩定隨機化」取得的索引鍵資料行，例如雜湊，此篩選就很有用。

適用於

DataOperationsCatalog.FilterRowsByKeyColumnFraction 方法

定義

參數

傳回

範例

備註

適用於

意見反應

其他資源