ConversionsExtensionsCatalog.Hash 方法

定義

多載

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])

建立 HashingEstimator ,它會將輸入資料行的資料類型 InputColumnName 雜湊至新的資料行: Name

Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)

建立 HashingEstimator ,它會將資料從 中指定的 inputColumnName 資料行雜湊至新的資料行: outputColumnName

Hash(TransformsCatalog+ConversionTransforms, HashingEstimator+ColumnOptions[])

建立 HashingEstimator ,它會將輸入資料行的資料類型 InputColumnName 雜湊至新的資料行: Name

public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, params Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] columns);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * Microsoft.ML.Transforms.HashingEstimator.ColumnOptions[] -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, ParamArray columns As HashingEstimator.ColumnOptions()) As HashingEstimator

參數

catalog
TransformsCatalog.ConversionTransforms

轉換的目錄。

columns
HashingEstimator.ColumnOptions[]

包含輸入和輸出資料行名稱之估算器的進階選項。 此估算器會透過文字、數值、布林值、索引鍵和 DataViewRowId 資料類型運作。 新資料行的資料類型將是 的 UInt32 向量,或 UInt32 根據輸入資料行資料類型是向量或純量。

傳回

範例

using System;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types by using Hash transform's 
    // advanced options API.
    public static class HashWithOptions
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameters to -1 will preserve the
            // full map. If that parameter is left to the default 0 value, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash(
                    new[]
                    {
                            new HashingEstimator.ColumnOptions(
                                "CategoryHashed",
                                "Category",
                                16,
                                useOrderedHashing: false,
                                maximumNumberOfInverts: -1),

                            new HashingEstimator.ColumnOptions(
                                "AgeHashed",
                                "Age",
                                8,
                                useOrderedHashing: false)
                    });

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the post transformation from the IDataView format to an
            // IEnumerable <TransformedData> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories, and their
            // correspondence with the generated hash values is preserved in the
            // Annotations in the format of indices and values.the indices array
            // will have the hashed values, and the corresponding element,
            // position -wise, in the values array will contain the original value. 
            //
            // See below for an example on how to retrieve the mapping. 
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}

備註

此轉換可以透過數個數據行運作。

適用於

Hash(TransformsCatalog+ConversionTransforms, String, String, Int32, Int32)

建立 HashingEstimator ,它會將資料從 中指定的 inputColumnName 資料行雜湊至新的資料行: outputColumnName

public static Microsoft.ML.Transforms.HashingEstimator Hash (this Microsoft.ML.TransformsCatalog.ConversionTransforms catalog, string outputColumnName, string inputColumnName = default, int numberOfBits = 31, int maximumNumberOfInverts = 0);
static member Hash : Microsoft.ML.TransformsCatalog.ConversionTransforms * string * string * int * int -> Microsoft.ML.Transforms.HashingEstimator
<Extension()>
Public Function Hash (catalog As TransformsCatalog.ConversionTransforms, outputColumnName As String, Optional inputColumnName As String = Nothing, Optional numberOfBits As Integer = 31, Optional maximumNumberOfInverts As Integer = 0) As HashingEstimator

參數

catalog
TransformsCatalog.ConversionTransforms

轉換的目錄。

outputColumnName
String

轉換所產生的 inputColumnName 資料行名稱。 此資料行的資料類型會是索引鍵的向量,或根據輸入資料行資料類型是向量或純量索引鍵的純量。

inputColumnName
String

要雜湊其資料的資料行名稱。 如果設定為 null ,則會將 的值 outputColumnName 當做來源使用。 此估算器會透過文字、數值、布林值、索引鍵或資料類型的向量或 DataViewRowId 純量運作。

numberOfBits
Int32

要雜湊到的位數。 必須介於 1 到 31 之間,包含。

maximumNumberOfInverts
Int32

在雜湊處理期間,我們會建構原始值與所產生雜湊值之間的對應。 原始值的文字表示會儲存在新資料行之批註的位置名稱中。因此,雜湊可以將許多初始值對應至一個。 maximumNumberOfInverts指定對應至應保留之雜湊的相異輸入值數目上限。 0 不會保留任何輸入值。 -1 會保留與每個雜湊對應的所有輸入值。

傳回

範例

using System;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    // This example demonstrates hashing of categorical string and integer data types.
    public static class Hash
    {
        public static void Example()
        {
            // Create a new ML context, for ML.NET operations. It can be used for
            // exception tracking and logging, as well as the source of randomness.
            var mlContext = new MLContext(seed: 1);

            // Get a small dataset as an IEnumerable.
            var rawData = new[] {
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "NFL" , Age = 14 },
                new DataPoint() { Category = "NFL" , Age = 15 },
                new DataPoint() { Category = "MLB" , Age = 18 },
                new DataPoint() { Category = "MLS" , Age = 14 },
            };

            var data = mlContext.Data.LoadFromEnumerable(rawData);

            // Construct the pipeline that would hash the two columns and store the
            // results in new columns. The first transform hashes the string column
            // and the second transform hashes the integer column.
            //
            // Hashing is not a reversible operation, so there is no way to retrieve
            // the original value from the hashed value. Sometimes, for debugging,
            // or model explainability, users will need to know what values in the
            // original columns generated the values in the hashed columns, since
            // the algorithms will mostly use the hashed values for further
            // computations. The Hash method will preserve the mapping from the
            // original values to the hashed values in the Annotations of the newly
            // created column (column populated with the hashed values). 
            //
            // Setting the maximumNumberOfInverts parameters to -1 will preserve the
            // full map. If that parameter is left to the default 0 value, the
            // mapping is not preserved.
            var pipeline = mlContext.Transforms.Conversion.Hash("CategoryHashed",
                "Category", numberOfBits: 16, maximumNumberOfInverts: -1)
                .Append(mlContext.Transforms.Conversion.Hash("AgeHashed", "Age",
                numberOfBits: 8));

            // Let's fit our pipeline, and then apply it to the same data.
            var transformer = pipeline.Fit(data);
            var transformedData = transformer.Transform(data);

            // Convert the post transformation from the IDataView format to an
            // IEnumerable <TransformedData> for easy consumption.
            var convertedData = mlContext.Data.CreateEnumerable<
                TransformedDataPoint>(transformedData, true);

            Console.WriteLine("Category CategoryHashed\t Age\t AgeHashed");
            foreach (var item in convertedData)
                Console.WriteLine($"{item.Category}\t {item.CategoryHashed}\t\t  " +
                    $"{item.Age}\t {item.AgeHashed}");

            // Expected data after the transformation.
            //
            // Category CategoryHashed   Age     AgeHashed
            // MLB      36206            18      127
            // NFL      19015            14      62
            // NFL      19015            15      43
            // MLB      36206            18      127
            // MLS      6013             14      62

            // For the Category column, where we set the maximumNumberOfInverts
            // parameter, the names of the original categories, and their
            // correspondence with the generated hash values is preserved in the
            // Annotations in the format of indices and values.the indices array
            // will have the hashed values, and the corresponding element,
            // position -wise, in the values array will contain the original value. 
            //
            // See below for an example on how to retrieve the mapping. 
            var slotNames = new VBuffer<ReadOnlyMemory<char>>();
            transformedData.Schema["CategoryHashed"].Annotations.GetValue(
                "KeyValues", ref slotNames);

            var indices = slotNames.GetIndices();
            var categoryNames = slotNames.GetValues();

            for (int i = 0; i < indices.Length; i++)
                Console.WriteLine($"The original value of the {indices[i]} " +
                    $"category is {categoryNames[i]}");

            // Output Data
            // 
            // The original value of the 6012 category is MLS
            // The original value of the 19014 category is NFL
            // The original value of the 36205 category is MLB
        }

        public class DataPoint
        {
            public string Category { get; set; }
            public uint Age { get; set; }
        }

        public class TransformedDataPoint : DataPoint
        {
            public uint CategoryHashed { get; set; }
            public uint AgeHashed { get; set; }
        }

    }
}

適用於