RecommendationCatalog.CrossValidate(IDataView, IEstimator<ITransformer>, Int32, String, String, Nullable<Int32>) Method

Definition

Runs cross-validation over numberOfFolds folds of data by fitting estimator on each training split, respecting samplingKeyColumnName if provided. Each resulting sub-model is then evaluated against labelColumnName, and the per-fold metrics are returned.

public System.Collections.Generic.IReadOnlyList<Microsoft.ML.TrainCatalogBase.CrossValidationResult<Microsoft.ML.Data.RegressionMetrics>> CrossValidate (Microsoft.ML.IDataView data, Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> estimator, int numberOfFolds = 5, string labelColumnName = "Label", string samplingKeyColumnName = null, Nullable<int> seed = null);
member this.CrossValidate : Microsoft.ML.IDataView * Microsoft.ML.IEstimator<Microsoft.ML.ITransformer> * int * string * string * Nullable<int> -> System.Collections.Generic.IReadOnlyList<Microsoft.ML.TrainCatalogBase.CrossValidationResult<Microsoft.ML.Data.RegressionMetrics>>
Public Function CrossValidate (data As IDataView, estimator As IEstimator(Of ITransformer), Optional numberOfFolds As Integer = 5, Optional labelColumnName As String = "Label", Optional samplingKeyColumnName As String = Nothing, Optional seed As Nullable(Of Integer) = Nothing) As IReadOnlyList(Of TrainCatalogBase.CrossValidationResult(Of RegressionMetrics))

Parameters

data
IDataView

The data to run cross-validation on.

estimator
IEstimator<ITransformer>

The estimator to fit.

numberOfFolds
Int32

Number of cross-validation folds.

labelColumnName
String

The label column (for evaluation).

samplingKeyColumnName
String

Optional name of the column to use as a stratification column. If two examples share the same value of samplingKeyColumnName, they are guaranteed to appear in the same subset (train or test). Use this to make sure there is no label leakage from the train set to the test set. If this optional parameter is not provided, a stratification column will be generated, and its values will be random numbers.

seed
Nullable<Int32>

Optional parameter used in combination with samplingKeyColumnName. If samplingKeyColumnName is not provided, the random numbers generated to create the stratification column use this seed. If seed is also not provided, the default value is used.
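
The grouping guarantee described for samplingKeyColumnName can be illustrated with a small conceptual sketch. This is not ML.NET's implementation: the class GroupedFoldSketch, the method AssignFolds, and the shuffle-then-deal strategy are all assumptions made for illustration. It only shows the property that matters, namely that rows sharing a key value always land in the same fold, and that a fixed seed makes the assignment repeatable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Conceptual sketch only; names and strategy are hypothetical, not ML.NET internals.
public static class GroupedFoldSketch
{
    // Maps every distinct key to one fold in [0, numberOfFolds).
    // Because the fold is chosen per key, rows that share a key share a fold.
    public static Dictionary<string, int> AssignFolds(
        IEnumerable<string> keys, int numberOfFolds, int seed)
    {
        var rng = new Random(seed);

        // Sort first so the starting order does not depend on input order.
        var distinct = keys.Distinct()
                           .OrderBy(k => k, StringComparer.Ordinal)
                           .ToList();

        // Fisher-Yates shuffle driven by the seeded generator.
        for (int i = distinct.Count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            (distinct[i], distinct[j]) = (distinct[j], distinct[i]);
        }

        // Deal the shuffled keys out round-robin across the folds.
        var folds = new Dictionary<string, int>();
        for (int i = 0; i < distinct.Count; i++)
        {
            folds[distinct[i]] = i % numberOfFolds;
        }
        return folds;
    }

    public static void Main()
    {
        // Six rows, four distinct users; repeated users must stay together.
        var rowKeys = new[] { "userA", "userB", "userA", "userC", "userB", "userD" };
        var folds = AssignFolds(rowKeys, numberOfFolds: 2, seed: 1);

        foreach (var key in rowKeys)
        {
            Console.WriteLine($"{key} -> fold {folds[key]}");
        }
    }
}
```

Both rows for userA resolve to the same fold, as do both rows for userB, so no key value is split between a training set and its matching test set.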

Returns

Per-fold results: metrics, models, scored datasets.
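
A call to this method might look like the following minimal sketch. It assumes the Microsoft.ML and Microsoft.ML.Recommender NuGet packages are referenced. The Rating class, the UserCount and ItemCount constants, and the synthetic data are hypothetical and exist only to give the trainer something to fit; the matrix factorization trainer is used as one possible estimator, not the only one this method accepts.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class CrossValidateSketch
{
    // Hypothetical matrix dimensions for the synthetic data below.
    private const int UserCount = 20;
    private const int ItemCount = 10;

    // Hypothetical input row. Matrix factorization expects key-typed index columns.
    public class Rating
    {
        [KeyType(UserCount)]
        public uint UserId { get; set; }

        [KeyType(ItemCount)]
        public uint ItemId { get; set; }

        public float Label { get; set; }
    }

    public static IReadOnlyList<TrainCatalogBase.CrossValidationResult<RegressionMetrics>> Run()
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic ratings; key values are 1-based because 0 denotes a missing key.
        var rows = new List<Rating>();
        for (uint u = 1; u <= UserCount; u++)
        {
            for (uint i = 1; i <= ItemCount; i++)
            {
                rows.Add(new Rating { UserId = u, ItemId = i, Label = (u + i) % 5 });
            }
        }
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // The estimator that is refit on the training portion of each fold.
        var estimator = mlContext.Recommendation().Trainers.MatrixFactorization(
            labelColumnName: nameof(Rating.Label),
            matrixColumnIndexColumnName: nameof(Rating.UserId),
            matrixRowIndexColumnName: nameof(Rating.ItemId));

        // samplingKeyColumnName is left at its default (null), so rows are
        // assigned to folds using generated random values driven by the seed.
        return mlContext.Recommendation().CrossValidate(
            data,
            estimator,
            numberOfFolds: 3,
            labelColumnName: nameof(Rating.Label),
            seed: 1);
    }

    public static void Main()
    {
        // Each result carries the fold index, the fitted model, the metrics,
        // and the scored hold-out data for that fold.
        foreach (var fold in Run())
        {
            Console.WriteLine(
                $"Fold {fold.Fold}: RMSE = {fold.Metrics.RootMeanSquaredError:F3}");
        }
    }
}
```

The returned list has one entry per fold, so averaging a metric such as RootMeanSquaredError across entries gives a single summary figure for the estimator.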

Applies to