DataOperationsCatalog.LoadFromEnumerable Method

Definition

Overloads

LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition)

Create a new IDataView over an enumerable of the items of user-defined type. The user maintains ownership of the data and the resulting data view will never alter the contents of the data. Since IDataView is assumed to be immutable, the user is expected to support multiple enumerations of the data that would return the same results, unless the user knows that the data will only be cursored once.

One typical usage for streaming data view could be: create the data view that lazily loads data as needed, then apply pre-trained transformations to it and cursor through it for transformation results.

LoadFromEnumerable<TRow>(IEnumerable<TRow>, DataViewSchema)

Create a new IDataView over an enumerable of the items of user-defined type using the provided DataViewSchema, which might contain more information about the schema than the type can capture.

LoadFromEnumerable<TRow>(IEnumerable<TRow>, SchemaDefinition)

Create a new IDataView over an enumerable of the items of user-defined type. The user maintains ownership of the data and the resulting data view will never alter the contents of the data. Since IDataView is assumed to be immutable, the user is expected to support multiple enumerations of the data that would return the same results, unless the user knows that the data will only be cursored once.

One typical usage for streaming data view could be: create the data view that lazily loads data as needed, then apply pre-trained transformations to it and cursor through it for transformation results.

public Microsoft.ML.IDataView LoadFromEnumerable<TRow> (System.Collections.Generic.IEnumerable<TRow> data, Microsoft.ML.Data.SchemaDefinition schemaDefinition = null) where TRow : class;
member this.LoadFromEnumerable : seq<'Row (requires 'Row : null)> * Microsoft.ML.Data.SchemaDefinition -> Microsoft.ML.IDataView (requires 'Row : null)

Type Parameters

TRow

The user-defined item type.

Parameters

data
IEnumerable<TRow>

The enumerable data containing type TRow to convert to anIDataView.

schemaDefinition
SchemaDefinition

The optional schema definition of the data view to create. If null, the schema definition is inferred from TRow.

Returns

The constructed IDataView.

Examples

using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

namespace Samples.Dynamic
{
    public static class LoadFromEnumerable
    {
        // Creating IDataView from IEnumerable, and setting the size of the vector
        // at runtime. When the data model is defined through types, setting the
        // size of the vector is done through the VectorType annotation. When the
        // size of the data is not known at compile time, the Schema can be directly
        // modified at runtime and the size of the vector set there. This is
        // important, because most of the ML.NET trainers require the Features
        // vector to be of known size. 
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging, as a catalog of available operations
            // and as the source of randomness.
            var mlContext = new MLContext();

            // Get a small dataset as an IEnumerable.
            IEnumerable<DataPointVector> enumerableKnownSize = new DataPointVector[]
            {
               new DataPointVector{ Features = new float[]{ 1.2f, 3.4f, 4.5f, 3.2f,
                   7,5f } },

               new DataPointVector{ Features = new float[]{ 4.2f, 3.4f, 14.65f,
                   3.2f, 3,5f } },

               new DataPointVector{ Features = new float[]{ 1.6f, 3.5f, 4.5f, 6.2f,
                   3,5f } },

            };

            // Load dataset into an IDataView. 
            IDataView data = mlContext.Data.LoadFromEnumerable(enumerableKnownSize);
            var featureColumn = data.Schema["Features"].Type as VectorDataViewType;
            // Inspecting the schema
            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");

            // Preview
            //
            // Is the size of the Features column known? True.
            // Size: 5.

            // If the size of the vector is unknown at compile time, it can be set 
            // at runtime.
            IEnumerable<DataPoint> enumerableUnknownSize = new DataPoint[]
            {
               new DataPoint{ Features = new float[]{ 1.2f, 3.4f, 4.5f } },
               new DataPoint{ Features = new float[]{ 4.2f, 3.4f, 1.6f } },
               new DataPoint{ Features = new float[]{ 1.6f, 3.5f, 4.5f } },
            };

            // The feature dimension (typically this will be the Count of the array 
            // of the features vector known at runtime).
            int featureDimension = 3;
            var definedSchema = SchemaDefinition.Create(typeof(DataPoint));
            featureColumn = definedSchema["Features"]
                .ColumnType as VectorDataViewType;

            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");

            // Preview
            //
            // Is the size of the Features column known? False.
            // Size: 0.

            // Set the column type to be a known-size vector.
            var vectorItemType = ((VectorDataViewType)definedSchema[0].ColumnType)
                .ItemType;
            definedSchema[0].ColumnType = new VectorDataViewType(vectorItemType,
                featureDimension);

            // Read the data into an IDataView with the modified schema supplied in
            IDataView data2 = mlContext.Data
                .LoadFromEnumerable(enumerableUnknownSize, definedSchema);

            featureColumn = data2.Schema["Features"].Type as VectorDataViewType;
            // Inspecting the schema
            Console.WriteLine($"Is the size of the Features column known: " +
                $"{featureColumn.IsKnownSize}.\nSize: {featureColumn.Size}");

            // Preview
            //
            // Is the size of the Features column known? True. 
            // Size: 3.
        }
    }

    public class DataPoint
    {
        public float[] Features { get; set; }
    }

    public class DataPointVector
    {
        [VectorType(5)]
        public float[] Features { get; set; }
    }
}
using System;
using System.Collections.Generic;
using Microsoft.ML;

namespace Samples.Dynamic
{
    public static class DataViewEnumerable
    {
        // A simple case of creating IDataView from
        //IEnumerable.
        public static void Example()
        {
            // Create a new context for ML.NET operations. It can be used for
            // exception tracking and logging,
            // as a catalog of available operations and as the source of randomness.
            var mlContext = new MLContext();

            // Get a small dataset as an IEnumerable.
            IEnumerable<SampleTemperatureData> enumerableOfData =
                GetSampleTemperatureData(5);

            // Load dataset into an IDataView. 
            IDataView data = mlContext.Data.LoadFromEnumerable(enumerableOfData);

            // We can now examine the records in the IDataView. We first create an
        // enumerable of rows in the IDataView.
            var rowEnumerable = mlContext.Data
                .CreateEnumerable<SampleTemperatureData>(data,
                reuseRowObject: true);

            // SampleTemperatureDataWithLatitude has the definition of a Latitude
        // column of type float. We can use the parameter ignoreMissingColumns
        // to true to ignore any missing columns in the IDataView. The produced
        // enumerable will have the Latitude field set to the default for the
        // data type, in this case 0. 
            var rowEnumerableIgnoreMissing = mlContext.Data
                .CreateEnumerable<SampleTemperatureDataWithLatitude>(data, 
                reuseRowObject: true, ignoreMissingColumns: true);

            Console.WriteLine($"Date\tTemperature");
            foreach (var row in rowEnumerable)
                Console.WriteLine(
                    $"{row.Date.ToString("d")}\t{row.Temperature}");

            // Expected output:
            //  Date    Temperature
            //  1/2/2012        36
            //  1/3/2012        36
            //  1/4/2012        34
            //  1/5/2012        35
            //  1/6/2012        35

            Console.WriteLine($"Date\tTemperature\tLatitude");
            foreach (var row in rowEnumerableIgnoreMissing)
                Console.WriteLine($"{row.Date.ToString("d")}\t{row.Temperature}" 
                    + $"\t{row.Latitude}");

            // Expected output:
            //  Date    Temperature     Latitude
            //  1/2/2012        36      0
            //  1/3/2012        36      0
            //  1/4/2012        34      0
            //  1/5/2012        35      0
            //  1/6/2012        35      0
        }

        private class SampleTemperatureData
        {
            public DateTime Date { get; set; }
            public float Temperature { get; set; }
        }
        
        private class SampleTemperatureDataWithLatitude
        {
            public float Latitude { get; set; }
            public DateTime Date { get; set; }
            public float Temperature { get; set; }
        }
        
        /// <summary>
        /// Get a fake temperature dataset.
        /// </summary>
        /// <param name="exampleCount">The number of examples to return.</param>
        /// <returns>An enumerable of <see cref="SampleTemperatureData"/>.</returns>
        private static IEnumerable<SampleTemperatureData> GetSampleTemperatureData(
            int exampleCount)

        {
            var rng = new Random(1234321);
            var date = new DateTime(2012, 1, 1);
            float temperature = 39.0f;

            for (int i = 0; i < exampleCount; i++)
            {
                date = date.AddDays(1);
                temperature += rng.Next(-5, 5);
                yield return new SampleTemperatureData { Date = date, Temperature = 
                    temperature };

            }
        }
    }
}

LoadFromEnumerable<TRow>(IEnumerable<TRow>, DataViewSchema)

Create a new IDataView over an enumerable of the items of user-defined type using the provided DataViewSchema, which might contain more information about the schema than the type can capture.

public Microsoft.ML.IDataView LoadFromEnumerable<TRow> (System.Collections.Generic.IEnumerable<TRow> data, Microsoft.ML.DataViewSchema schema) where TRow : class;
member this.LoadFromEnumerable : seq<'Row (requires 'Row : null)> * Microsoft.ML.DataViewSchema -> Microsoft.ML.IDataView (requires 'Row : null)
Public Function LoadFromEnumerable(Of TRow As Class) (data As IEnumerable(Of TRow), schema As DataViewSchema) As IDataView

Type Parameters

TRow

The user-defined item type.

Parameters

data
IEnumerable<TRow>

The enumerable data containing type TRow to convert to an IDataView.

schema
DataViewSchema

The schema of the returned IDataView.

Returns

An IDataView with the given schema.

Remarks

The user maintains ownership of the data and the resulting data view will never alter the contents of the data. Since IDataView is assumed to be immutable, the user is expected to support multiple enumerations of the data that would return the same results, unless the user knows that the data will only be cursored once. One typical usage for streaming data view could be: create the data view that lazily loads data as needed, then apply pre-trained transformations to it and cursor through it for transformation results. One practical usage of this would be to supply the feature column names through the DataViewSchema.Annotations.

Applies to