Using "cv_splits_indices" in AutoMLConfig

Brian Barbieri 1 Reputation point
2021-03-04T08:20:03.803+00:00

When training a regression model with AutoMLConfig and n_cross_validations set to a plain int, I have no problems.

Now I want to use TimeSeriesSplit as the cross-validation method for training a model with AutoMLConfig. For this there is a "cv_splits_indices" argument, to which I pass a list of lists of indices like the following (with n_splits=5 in TimeSeriesSplit):

array([[array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
        array([11, 12, 13, 14])],
       [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14]),
        array([15, 16, 17, 18])],
       [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18]),
        array([19, 20, 21, 22])],
       [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22]),
        array([23, 24, 25, 26])],
       [array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26]),
        array([27, 28, 29, 30])]], dtype=object)

Unfortunately, when I run the following cell:

automl_settings = {
    "iteration_timeout_minutes": 15,
    "experiment_timeout_hours": 0.3,
    "max_cores_per_iteration" : -1,
    "enable_early_stopping": True,
    "primary_metric": 'normalized_root_mean_squared_error',
    "featurization": 'auto',
    "verbosity": logging.INFO,
    "cv_splits_indices": idxs
}

automl_config = AutoMLConfig(task='regression',
                             debug_log='automated_ml_errors_.log',
                             training_data=train,
                             validation_data=train,
                             label_column_name=y_var,
                             **automl_settings)

I receive the following error:

ConfigException: ConfigException:
 Message: cv_splits_indices should be a List of List[numpy.ndarray]. Each List[numpy.ndarray] corresponds to a CV fold and should have just 2 elements: The indices for training set and for the validation set.
 InnerException: None
 ErrorResponse 
{
    "error": {
        "code": "UserError",
        "message": "cv_splits_indices should be a List of List[numpy.ndarray]. Each List[numpy.ndarray] corresponds to a CV fold and should have just 2 elements: The indices for training set and for the validation set.",
        "details_uri": "https://aka.ms/AutoMLConfig",
        "target": "cv_splits_indices",
        "inner_error": {
            "code": "BadArgument",
            "inner_error": {
                "code": "ArgumentInvalid"
            }
        },
        "reference_code": "XXXXXXREDACTEDXXXX"
    }
}

What is going wrong here? My input looks correct to me.

Thank you

Azure Machine Learning

1 answer

  1. Ramr-msft 17,616 Reputation points
    2021-03-04T14:48:46.463+00:00

    @Brian Barbieri Thanks for the question. Could you please share which version of the Azure ML SDK you are using?
    Here is the doc on cross-validation data folds.
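
One thing that may be worth checking in the meantime: the error message asks for a List of List[numpy.ndarray], while the value shown in the question is a single NumPy array with dtype=object. The sketch below builds the folds as a plain Python list of [train, validation] pairs instead. It is an illustration, not a confirmed fix: it assumes the check rejects an object array where a list is expected, and the row count of 31 and test_size=4 (which needs scikit-learn 0.24 or later) are assumptions chosen only because they reproduce the fold sizes in the question. The helper name to_list_of_lists is hypothetical.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Assumption: 31 rows and test_size=4, picked to match the folds in the question.
n_rows = 31
tscv = TimeSeriesSplit(n_splits=5, test_size=4)

# Build a plain Python list of [train_indices, validation_indices] pairs,
# rather than wrapping all folds in one NumPy object array.
idxs = [
    [train_idx, valid_idx]
    for train_idx, valid_idx in tscv.split(np.zeros((n_rows, 1)))
]


def to_list_of_lists(splits):
    """Convert existing folds (e.g. an object array of shape (n_folds, 2))
    into a list of [train, validation] lists of integer arrays.

    Hypothetical helper for illustration; not part of the Azure ML SDK.
    """
    return [
        [np.asarray(train, dtype=int), np.asarray(valid, dtype=int)]
        for train, valid in splits
    ]
```

If the folds already exist as an object array, passing that array through to_list_of_lists and using the result as "cv_splits_indices" would give the same list-of-lists shape.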
