DataViewRow Class

Definition

A logical row of data. May be a row of an IDataView or a stand-alone row.

public abstract class DataViewRow : IDisposable

type DataViewRow = class
    interface IDisposable

Public MustInherit Class DataViewRow
    Implements IDisposable
Inheritance
Object → DataViewRow
Derived
DataViewRowCursor
Implements
IDisposable

Constructors

DataViewRow()

Properties

Batch

This provides a means of reconciling multiple rows that were produced, generally, by GetRowCursorSet(IEnumerable<DataViewSchema.Column>, Int32, Random). When a cursor set is requested, processing may proceed in parallel, but it should always remain possible to recover the original order. Whether a particular application cares about that order is a separate matter (in practice most callers of this method do not, otherwise they would not call it), but in principle it should be possible to reconstruct the order that an identically configured GetRowCursor(IEnumerable<DataViewSchema.Column>, Random) would produce. Therefore, within any cursor implementation, batch numbers should be non-decreasing. Furthermore, any given batch number should appear in only one of the cursors returned by GetRowCursorSet(IEnumerable<DataViewSchema.Column>, Int32, Random). In this way, order is determined by batch number. An operation that reconciles these cursors into a single consistent cursoring can do so by repeatedly drawing from whichever cursor in the set has the smallest batch number available.

Note that the batch assigned to a particular entry need not be consistent from one cursoring to the next; only the resulting overall ordering must be the same. The same entry could have different batch numbers in different cursorings. There is also no requirement that any given batch number appear at all. Batch numbers are merely a mechanism for recovering ordering from a possibly arbitrary partitioning of the data. It follows that treating the batch as a property of the data is invalid.
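The reconciliation rule described above (always draw from the cursor whose next row has the smallest batch number) is essentially a k-way merge. The sketch below is plain C# (top-level statements, C# 9 or later) with no ML.NET dependency: each cursor of a hypothetical cursor set is modeled as an in-memory list of (Batch, Value) pairs that obeys the two stated invariants. It is an illustration of the rule under those assumptions, not ML.NET's own implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical output of a cursor set: each inner list is one cursor's rows as
// (Batch, Value) pairs. Batch numbers are non-decreasing within a cursor, and
// each batch number appears in only one cursor.
var cursors = new List<List<(long Batch, string Value)>>
{
    new() { (0, "r0"), (0, "r1"), (3, "r6") },
    new() { (1, "r2"), (1, "r3"), (4, "r7") },
    new() { (2, "r4"), (2, "r5") },
};

// Reconcile: repeatedly take the next row from whichever cursor currently
// exposes the smallest batch number.
var heads = new int[cursors.Count];   // index of the next unread row per cursor
var merged = new List<string>();
while (true)
{
    int best = -1;
    for (int c = 0; c < cursors.Count; c++)
    {
        if (heads[c] >= cursors[c].Count) continue;   // this cursor is exhausted
        if (best < 0 || cursors[c][heads[c]].Batch < cursors[best][heads[best]].Batch)
            best = c;
    }
    if (best < 0) break;                               // every cursor is exhausted
    merged.Add(cursors[best][heads[best]].Value);
    heads[best]++;
}

Console.WriteLine(string.Join(" ", merged)); // r0 r1 r2 r3 r4 r5 r6 r7
```

Because a batch number never appears in two cursors, ties between cursors cannot occur, so the merged order is unambiguous. A production version would more likely use a priority queue keyed on batch number; a linear scan keeps the sketch short.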

Position

This is incremented when the underlying contents change, giving clients a way to detect change. It should be -1 when the object is in a state where values cannot be fetched. In particular, for a DataViewRowCursor, it is -1 before MoveNext() is called for the first time, and after MoveNext() has returned false.

Note that this is not the position within the underlying data but the position of this cursor only. For example, if one opened a set of parallel streaming cursors, or a shuffled cursor, each such cursor's first valid entry would have position 0.
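The Position lifecycle for a cursor can be traced with a toy model. This is a minimal sketch in plain C# (top-level statements, C# 9 or later), not ML.NET code: the hypothetical cursor state lives in local variables captured by a local MoveNext() function, standing in for a real DataViewRowCursor subclass. Position is a long, starting at -1, advancing by one per row, and returning to -1 once the cursor is exhausted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy cursor over an in-memory array; not ML.NET's implementation.
int[] rows = { 10, 20, 30 };
long position = -1;   // -1: no row is available yet
int index = -1;
bool done = false;

bool MoveNext()
{
    if (done) return false;
    index++;
    if (index < rows.Length)
    {
        position++;   // advances by one for each row the cursor lands on
        return true;
    }
    done = true;
    position = -1;    // exhausted: values can no longer be fetched
    return false;
}

var seen = new List<long> { position };   // before the first MoveNext()
while (MoveNext())
    seen.Add(position);
seen.Add(position);                        // after MoveNext() returned false
Console.WriteLine(string.Join(", ", seen)); // -1, 0, 1, 2, -1
```

If rows were shuffled before iteration, the sequence of positions would be identical: Position counts steps of this cursor, not indices into the source data.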

Schema

Gets the DataViewSchema, which provides name and type information for the variables (i.e., columns in ML.NET's type system) stored in this row.

Methods

Dispose()

Implements IDisposable.Dispose() by calling Dispose(Boolean) with true.

Dispose(Boolean)

The overridable disposal method of the standard dispose pattern. The default implementation does nothing.

GetGetter<TValue>(DataViewSchema.Column)

Returns a value-getter delegate that fetches the value of the given column from this row. This throws if the column is not active in this row or if the type TValue differs from the column's type.
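The getter contract (validate the column once, then return a cheap delegate that reads the row's current value on every call) can be sketched without ML.NET. In the sketch below the row is a hypothetical dictionary from column name to (active flag, boxed value), and the local functions IsColumnActive and GetGetter only mimic the documented behavior. For brevity it returns Func<TValue>; the real ML.NET method returns a ValueGetter<TValue> delegate that writes into a ref parameter so callers can reuse buffers.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory row: column name -> (is the column active?, current value).
// A stand-in for illustration only, not ML.NET's implementation.
var row = new Dictionary<string, (bool Active, object Value)>
{
    ["Age"] = (true, 42f),
    ["Name"] = (false, "Ada"),
};

bool IsColumnActive(string name) => row[name].Active;

// Mimics the GetGetter contract: check activity and type once, up front,
// then return a delegate that reads the row's *current* value on each call.
Func<TValue> GetGetter<TValue>(string name)
{
    if (!IsColumnActive(name))
        throw new InvalidOperationException($"Column '{name}' is not active in this row.");
    Type actual = row[name].Value.GetType();
    if (actual != typeof(TValue))
        throw new InvalidOperationException(
            $"Column '{name}' holds {actual.Name}, not {typeof(TValue).Name}.");
    return () => (TValue)row[name].Value;
}

Func<float> getAge = GetGetter<float>("Age");
Console.WriteLine(getAge());   // 42
row["Age"] = (true, 43f);      // the row's contents change (e.g. the cursor moves)
Console.WriteLine(getAge());   // 43: same getter, new value
```

Fetching the getter once outside the row loop and invoking it per row is the intended usage pattern; requesting "Name" (inactive) or asking for "Age" as an int would throw.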

GetIdGetter()

Returns a getter for a 128-bit ID value. It is common for an object to serve multiple DataViewRow instances that iterate over what is supposed to be the same data. For example, in an IDataView, a cursor set produces the same data as a serial cursor, just partitioned, and a shuffled cursor produces the same data as a serial cursor or any other shuffled cursor, only in a different order. The ID exists for applications that need to reconcile which entry is actually which. Ideally this ID should be unique, but for practical reasons it suffices for collisions to be extremely improbable.

Note that this ID, while it must be consistent across multiple streams according to the semantics above, is not considered part of the data per se. To take the example of a data view, a single data view must render consistent IDs across all cursorings, but there is no suggestion that if the "same" data were presented in a different data view (for instance, after being transformed, cached, or saved), the IDs in the two data views would have any discernible relationship.
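To make the reconciliation use case concrete, the sketch below matches entries from a serial and a shuffled cursoring of the same data purely by ID. It is plain C# with no ML.NET dependency; the 128-bit ID is modeled as a (Hi, Lo) pair of ulongs, and IdOf is a hypothetical ID scheme derived from each row's source index. Any scheme would do, provided it is consistent across cursorings of one view and collisions are vanishingly unlikely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] data = { "a", "b", "c", "d", "e" };

// Hypothetical ID scheme: a 128-bit ID as two 64-bit halves, derived from the
// row's index in the source, so it is identical in every cursoring of this view.
(ulong Hi, ulong Lo) IdOf(int sourceIndex) => (0xD1CEUL, (ulong)sourceIndex);

// Two cursorings over the same data: one serial, one shuffled.
var serial = data.Select((v, i) => (Id: IdOf(i), Value: v)).ToList();
var rng = new Random(7);
var shuffled = serial.OrderBy(_ => rng.Next()).ToList();

// Reconcile the shuffled stream against the serial one by ID alone.
var valueById = serial.ToDictionary(r => r.Id, r => r.Value);
bool consistent = shuffled.Count == serial.Count
    && shuffled.All(r => valueById[r.Id] == r.Value);
Console.WriteLine(consistent); // True
```

Position would not work for this purpose, since each cursor's first entry has position 0 regardless of which source row it is; the ID is what ties entries together across cursorings.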

IsColumnActive(DataViewSchema.Column)

Returns whether the given column is active in this row.
