ForecastingJob Class

Reference

Configuration for AutoML Forecasting Task.

Initialize a new AutoML Forecasting task.

Inheritance: azure.ai.ml.entities._job.automl.tabular.automl_tabular.AutoMLTabular

ForecastingJob

Constructor

ForecastingJob(*, primary_metric: str | None = None, forecasting_settings: ForecastingSettings | None = None, **kwargs: Any)

Parameters

Name	Description
primary_metric	Optional[str] The primary metric to use for model selection. Keyword-only; defaults to None.
forecasting_settings	Optional[ForecastingSettings] The settings for the forecasting task. Keyword-only; defaults to None.
kwargs	Dict[str, Any] Job-specific arguments.

Methods

dump	Dumps the job content into a file in YAML format.
set_data	Define data configuration.
set_featurization	Define feature engineering configuration.
set_forecast_settings	Manage parameters used by forecasting tasks.
set_limits	Set limits for the job.
set_training	The method to configure forecast training related settings.

dump

Dumps the job content into a file in YAML format.

dump(dest: str | PathLike | IO, **kwargs: Any) -> None

Parameters

Name	Description
dest Required	Union[PathLike, str, IO[AnyStr]] The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.

Keyword-Only Parameters

Name	Description
kwargs	dict Additional arguments to pass to the YAML serializer.

Exceptions

Type	Description
FileExistsError	Raised if dest is a file path and the file already exists.
IOError	Raised if dest is an open file and the file is not writable.

set_data

Define data configuration.

set_data(*, training_data: Input, target_column_name: str, weight_column_name: str | None = None, validation_data: Input | None = None, validation_data_size: float | None = None, n_cross_validations: str | int | None = None, cv_split_column_names: List[str] | None = None, test_data: Input | None = None, test_data_size: float | None = None) -> None

Keyword-Only Parameters

Name	Description
training_data	Input Training data.
target_column_name	str Column name of the target column.
weight_column_name	Optional[str] Weight column name, defaults to None
validation_data	Optional[Input] Validation data, defaults to None
validation_data_size	Optional[float] Validation data size, defaults to None
n_cross_validations	Optional[Union[str, int]] Number of cross-validation folds, given as an integer or a string. Defaults to None.
cv_split_column_names	Optional[List[str]] Names of the columns that define the cross-validation splits. Defaults to None.
test_data	Optional[Input] Test data, defaults to None
test_data_size	Optional[float] Test data size, defaults to None

set_featurization

Define feature engineering configuration.

set_featurization(*, blocked_transformers: List[BlockedTransformers | str] | None = None, column_name_and_types: Dict[str, str] | None = None, dataset_language: str | None = None, transformer_params: Dict[str, List[ColumnTransformer]] | None = None, mode: str | None = None, enable_dnn_featurization: bool | None = None) -> None

Keyword-Only Parameters

Name	Description
blocked_transformers	Optional[List[Union[BlockedTransformers, str]]] A list of transformer names to be blocked during featurization, defaults to None
column_name_and_types	Optional[Dict[str, str]] A dictionary of column names and feature types used to update column purpose, defaults to None
dataset_language	Optional[str] Three character ISO 639-3 code for the language(s) contained in the dataset. Languages other than English are only supported if you use GPU-enabled compute. The language_code 'mul' should be used if the dataset contains multiple languages. To find ISO 639-3 codes for different languages, please refer to https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes, defaults to None
transformer_params	Optional[Dict[str, List[ColumnTransformer]]] A dictionary of transformer and corresponding customization parameters, defaults to None
mode	Optional[str] Featurization mode, either "off" or "auto". Defaults to None, in which case "auto" is used.
enable_dnn_featurization	Optional[bool] Whether to include DNN based feature engineering methods, defaults to None

set_forecast_settings

Manage parameters used by forecasting tasks.

set_forecast_settings(*, time_column_name: str | None = None, forecast_horizon: str | int | None = None, time_series_id_column_names: str | List[str] | None = None, target_lags: str | int | List[int] | None = None, feature_lags: str | None = None, target_rolling_window_size: str | int | None = None, country_or_region_for_holidays: str | None = None, use_stl: str | None = None, seasonality: str | int | None = None, short_series_handling_config: str | None = None, frequency: str | None = None, target_aggregate_function: str | None = None, cv_step_size: int | None = None, features_unknown_at_forecast_time: str | List[str] | None = None) -> None

Keyword-Only Parameters

Name	Description
time_column_name	Optional[str] The name of the time column. This parameter is required when forecasting to specify the datetime column in the input data used for building the time series and inferring its frequency.
forecast_horizon	Optional[Union[str, int]] The desired maximum forecast horizon in units of time-series frequency. The default value is 1. Units are based on the time interval of your training data (for example, monthly or weekly) that the forecaster should predict out. When the task type is forecasting, this parameter is required. For more information on setting forecasting parameters, see Auto-train a time-series forecast model.
time_series_id_column_names	Optional[Union[str, List[str]]] The names of columns used to group a time series. It can be used to create multiple series. If time series id column names is not defined or the identifier columns specified do not identify all the series in the dataset, the time series identifiers will be automatically created for your data set.
target_lags	Optional[Union[str, int, List[int]]] The number of past periods to lag from the target column. By default the lags are turned off. When forecasting, this parameter represents the number of rows to lag the target values based on the frequency of the data. It is given as a single integer or a list of integers. Use lags when the relationship between the independent variables and the dependent variable does not line up in time by default. For example, when forecasting demand for a product, the demand in any month may depend on the price of specific commodities 3 months prior. In this example, you may want to lag the target (demand) negatively by 3 months so that the model trains on the correct relationship. For more information, see Auto-train a time-series forecast model.
Note on auto detection of target lags and rolling window size (see also target_rolling_window_size). The following algorithm is used to detect the optimal target lag and rolling window size:
1. Estimate the maximum lag order for the look-back feature selection. This is the number of periods until the next date frequency granularity: if the frequency is daily, it is a week (7); if it is weekly, it is a month (4). That value multiplied by two is the largest possible value of lags/rolling windows. In these examples, the maximum lag order is 14 and 8 respectively.
2. Create a de-seasonalized series by adding the trend and residual components. This is used in the next step.
3. Estimate the PACF (Partial Auto Correlation Function) on the data from step 2 and search for points where the auto correlation is significant, i.e. its absolute value is more than 1.96/square_root(maximal lag value), which corresponds to a significance of 95%.
4. If all points are significant, this is considered strong seasonality and no look-back features are created.
5. Scan the PACF values from the beginning; the value before the first insignificant auto correlation designates the lag. If the first significant element (the value correlates with itself) is followed by an insignificant one, the lag is 0 and look-back features are not used.
feature_lags	Optional[str] Flag for generating lags for the numeric features with 'auto' or None.
target_rolling_window_size	Optional[Union[str, int]] The number of past periods used to create a rolling window average of the target column. When forecasting, this parameter represents n historical periods to use to generate forecasted values, <= training set size. If omitted, n is the full training set size. Specify this parameter when you only want to consider a certain amount of history when training the model. If set to 'auto', the rolling window is estimated as the last value where the PACF is more than the significance threshold. See the target_lags section for details.
country_or_region_for_holidays	Optional[str] The country/region used to generate holiday features. These should be ISO 3166 two-letter country/region codes, for example 'US' or 'GB'.
use_stl	Optional[str] Configure STL Decomposition of the time-series target column. use_stl can take three values: None (default) - no STL decomposition, 'season' - only generate the season component, and 'season_trend' - generate both season and trend components.
seasonality	Optional[Union[int, str]] Set time series seasonality as an integer multiple of the series frequency. If seasonality is set to 'auto', it will be inferred. If set to None, the time series is assumed non-seasonal, which is equivalent to seasonality=1.
short_series_handling_config	Optional[str] The parameter defining how AutoML should handle short time series. Possible values: 'auto' (default), 'pad', 'drop' and None.
- auto: short series are padded if there are no long series; otherwise short series are dropped.
- pad: all the short series are padded.
- drop: all the short series are dropped.
- None: the short series are not modified.
If set to 'pad', the table is padded with zeroes and empty values for the regressors and with random values for the target, with the mean equal to the target value median for the given time series id. If the median is greater than or equal to zero, the minimal padded value is clipped at zero.
Input:
Date	numeric_value	string	target
2020-01-01	23	green	55
Output, assuming the minimal number of values is four:
Date	numeric_value	string	target
2019-12-29	0	NA	55.1
2019-12-30	0	NA	55.6
2019-12-31	0	NA	54.5
2020-01-01	23	green	55
Note: There are two parameters, short_series_handling_configuration and the legacy short_series_handling. When both parameters are set, they are synchronized as shown in the table below (for brevity, short_series_handling_configuration and short_series_handling are written as handling_configuration and handling respectively).
handling	handling_configuration	resulting handling	resulting handling_configuration
True	auto	True	auto
True	pad	True	auto
True	drop	True	auto
True	None	False	None
False	auto	False	None
False	pad	False	None
False	drop	False	None
False	None	False	None
frequency	Forecast frequency. When forecasting, this parameter represents the period with which the forecast is desired, for example daily, weekly, yearly, etc. The forecast frequency is dataset frequency by default. You can optionally set it to greater (but not lesser) than dataset frequency. We'll aggregate the data and generate the results at forecast frequency. For example, for daily data, you can set the frequency to be daily, weekly or monthly, but not hourly. The frequency needs to be a pandas offset alias. Please refer to pandas documentation for more information: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#dateoffset-objects
target_aggregate_function	Optional[str] The function to be used to aggregate the time series target column to conform to a user-specified frequency. If target_aggregate_function is set but the frequency parameter is not set, an error is raised. The possible target aggregation functions are: "sum", "max", "min" and "mean". The target column values are aggregated based on the specified operation. Typically, sum is appropriate for most scenarios. Numerical predictor columns in your data are aggregated by sum, mean, minimum value, and maximum value. As a result, automated ML generates new columns suffixed with the aggregation function name and applies the selected aggregate operation. For categorical predictor columns, the data is aggregated by mode, the most prominent category in the window. Date predictor columns are aggregated by minimum value, maximum value and mode.
frequency	target_aggregate_function	Data regularity fixing mechanism
None (Default)	None (Default)	The aggregation is not applied. If a valid frequency cannot be determined, an error is raised.
Some value	None (Default)	The aggregation is not applied. If the number of data points compliant to the given frequency grid is less than 90%, these points will be removed; otherwise an error will be raised.
None (Default)	Aggregation function	An error about the missing frequency parameter is raised.
Some value	Aggregation function	Aggregate to the frequency using the provided aggregation function.
cv_step_size	Optional[int] Number of periods between the origin_time of one CV fold and the next fold. For example, if cv_step_size = 3 for daily data, the origin time for each fold will be three days apart.
features_unknown_at_forecast_time	Optional[Union[str, List[str]]] The feature columns that are available for training but unknown at the time of forecast/inference. If features_unknown_at_forecast_time is set to an empty list, it is assumed that all the feature columns in the dataset are known at inference time. If this parameter is not set the support for future features is not enabled.
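
The lag auto-detection steps described under target_lags can be sketched in plain Python. This is a minimal illustration, not the AutoML implementation: the function names are hypothetical, the PACF values are assumed to be already computed on the de-seasonalized series, and pacf[k] is assumed to hold the partial autocorrelation at lag k (so pacf[0] is 1.0).

```python
import math
from typing import Sequence


def max_lag_order(periods_to_next_granularity: int) -> int:
    """Twice the number of periods until the next date granularity (7 for daily, 4 for weekly)."""
    return 2 * periods_to_next_granularity


def significance_threshold(max_lag: int) -> float:
    """95% significance bound applied to the absolute PACF values."""
    return 1.96 / math.sqrt(max_lag)


def detect_target_lag(pacf: Sequence[float], max_lag: int) -> int:
    """Return the lag just before the first insignificant PACF value.

    Returns 0 when no look-back features should be created, which covers
    both an insignificant value at lag 1 and the case where every point is
    significant (treated as strong seasonality).
    """
    threshold = significance_threshold(max_lag)
    for lag, value in enumerate(pacf[: max_lag + 1]):
        if abs(value) <= threshold:
            return max(lag - 1, 0)
    return 0  # every point significant: strong seasonality


# Daily data: 7 periods until the next granularity, so the maximum lag order is 14.
daily_max_lag = max_lag_order(7)
pacf_values = [1.0, 0.82, 0.61, 0.10, 0.04, 0.02]  # made-up values for illustration
detected_lag = detect_target_lag(pacf_values, daily_max_lag)
```

With these made-up values the threshold is about 0.52, the first insignificant value sits at lag 3, and the detected lag is therefore 2.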

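The 'pad' behavior described for short_series_handling_config can be approximated as follows. This is only a sketch under stated assumptions: the function name, the daily frequency, the Gaussian noise and its scale are hypothetical choices made for illustration, and None stands in for the empty (NA) values. The source only states that regressors are padded with zeroes and empty values, and that the target is padded with random values whose mean equals the series median, clipped at zero when the median is non-negative.

```python
import random
from datetime import date, timedelta
from statistics import median
from typing import Any, Dict, List

Row = Dict[str, Any]


def pad_short_series(rows: List[Row], min_length: int,
                     noise_scale: float = 0.5, seed: int = 0) -> List[Row]:
    """Prepend padded rows until the series has min_length rows (daily data assumed)."""
    missing = min_length - len(rows)
    if missing <= 0:
        return list(rows)
    rng = random.Random(seed)
    target_median = median(row["target"] for row in rows)
    first_date = rows[0]["Date"]
    padding = []
    for days_back in range(missing, 0, -1):
        # Random target centered on the median (noise model is an assumption).
        target = rng.gauss(target_median, noise_scale)
        if target_median >= 0:
            target = max(target, 0.0)  # clip at zero for non-negative medians
        padding.append({
            "Date": first_date - timedelta(days=days_back),
            "numeric_value": 0,   # numeric regressors are padded with zeroes
            "string": None,       # non-numeric regressors are left empty
            "target": target,
        })
    return padding + list(rows)


series = [{"Date": date(2020, 1, 1), "numeric_value": 23, "string": "green", "target": 55}]
padded = pad_short_series(series, min_length=4)
```

The result has four rows dated 2019-12-29 through 2020-01-01, with the original row last, mirroring the input/output example in the parameter description.
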
set_limits

Set limits for the job.

set_limits(*, enable_early_termination: bool | None = None, exit_score: float | None = None, max_concurrent_trials: int | None = None, max_cores_per_trial: int | None = None, max_nodes: int | None = None, max_trials: int | None = None, timeout_minutes: int | None = None, trial_timeout_minutes: int | None = None) -> None

Keyword-Only Parameters

Name	Description
enable_early_termination	Optional[bool] Whether to enable early termination if the score is not improving in the short term, defaults to None. Early stopping logic:
- No early stopping for the first 20 iterations (landmarks).
- The early stopping window starts on the 21st iteration and looks for early_stopping_n_iters iterations (currently set to 10). This means that the first iteration where stopping can occur is the 31st.
- AutoML still schedules 2 ensemble iterations AFTER early stopping, which might result in higher scores.
- Early stopping is triggered if the absolute value of the best score calculated is the same for the past early_stopping_n_iters iterations, that is, if there is no improvement in score for early_stopping_n_iters iterations.
exit_score	Optional[float] Target score for the experiment. The experiment terminates after this score is reached. If not specified (no criteria), the experiment runs until no further progress is made on the primary metric. For more information on exit criteria, see this article. Defaults to None.
max_concurrent_trials	Optional[int] The maximum number of iterations that would be executed in parallel. The default value is 1.
- AmlCompute clusters support one iteration running per node. For multiple AutoML experiment parent runs executed in parallel on a single AmlCompute cluster, the sum of the `max_concurrent_trials` values for all experiments should be less than or equal to the maximum number of nodes. Otherwise, runs will be queued until nodes are available.
- DSVM supports multiple iterations per node. `max_concurrent_trials` should be less than or equal to the number of cores on the DSVM. For multiple experiments run in parallel on a single DSVM, the sum of the `max_concurrent_trials` values for all experiments should be less than or equal to the maximum number of nodes.
- Databricks: `max_concurrent_trials` should be less than or equal to the number of worker nodes on Databricks.
`max_concurrent_trials` does not apply to local runs. Formerly, this parameter was named `concurrent_iterations`.
max_cores_per_trial	Optional[int] The maximum number of threads to use for a given training iteration. Acceptable values:
- Greater than 1 and less than or equal to the maximum number of cores on the compute target.
- Equal to -1, which means to use all the possible cores per iteration per child-run.
- Equal to 1, the default.
max_nodes	Optional[int] [Experimental] The maximum number of nodes to use for distributed training. For forecasting, each model is trained using max(2, int(max_nodes / max_concurrent_trials)) nodes. For classification/regression, each model is trained using max_nodes nodes. Note: This parameter is in public preview and might change in the future.
max_trials	Optional[int] The total number of different algorithm and parameter combinations to test during an automated ML experiment. If not specified, the default is 1000 iterations.
timeout_minutes	Optional[int] Maximum amount of time in minutes that all iterations combined can take before the experiment terminates. If not specified, the default experiment timeout is 6 days. To specify a timeout less than or equal to 1 hour, make sure your dataset's size is not greater than 10,000,000 (rows times columns) or an error results, defaults to None
trial_timeout_minutes	Optional[int] Maximum time in minutes that each iteration can run for before it terminates. If not specified, a value of 1 month or 43200 minutes is used, defaults to None
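
The early stopping rule described for enable_early_termination can be read as the following check. This is a sketch of one interpretation, not the service's code: the function name is hypothetical, the scores are assumed to be one primary-metric value per completed iteration, and "no improvement" is taken to mean that the best score after the window equals the best score before it.

```python
from typing import List, Optional

LANDMARK_ITERATIONS = 20      # no early stopping during these iterations
EARLY_STOPPING_N_ITERS = 10   # window length stated in the description


def first_stopping_iteration(scores: List[float],
                             higher_is_better: bool = True) -> Optional[int]:
    """Return the 1-based iteration at which early stopping would trigger, or None."""
    # best_so_far[k - 1] is the best score observed after iteration k.
    best_so_far: List[float] = []
    for score in scores:
        if not best_so_far:
            best_so_far.append(score)
        elif higher_is_better:
            best_so_far.append(max(best_so_far[-1], score))
        else:
            best_so_far.append(min(best_so_far[-1], score))

    first_candidate = LANDMARK_ITERATIONS + EARLY_STOPPING_N_ITERS + 1  # the 31st
    for iteration in range(first_candidate, len(scores) + 2):
        after_window = best_so_far[iteration - 2]
        before_window = best_so_far[iteration - 2 - EARLY_STOPPING_N_ITERS]
        if after_window == before_window:
            return iteration
    return None


# 20 improving landmark iterations followed by 10 iterations with no improvement.
flat_after_landmarks = [0.50 + 0.01 * i for i in range(20)] + [0.60] * 10
stop_at = first_stopping_iteration(flat_after_landmarks)
```

Here the window covering iterations 21 to 30 brings no improvement, so stopping would occur at the 31st iteration, which matches the earliest point named in the description.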

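The node arithmetic stated for max_nodes and the cluster sizing guidance for max_concurrent_trials can be checked with two small helpers. The helper names are hypothetical and exist only for this illustration; the formula itself is the one quoted in the max_nodes description.

```python
from typing import Iterable


def nodes_per_forecasting_model(max_nodes: int, max_concurrent_trials: int) -> int:
    """Nodes used to train each forecasting model in distributed training."""
    return max(2, int(max_nodes / max_concurrent_trials))


def fits_without_queueing(trials_per_experiment: Iterable[int],
                          cluster_max_nodes: int) -> bool:
    """True if the summed max_concurrent_trials fits the AmlCompute cluster."""
    return sum(trials_per_experiment) <= cluster_max_nodes


nodes_for_8_by_2 = nodes_per_forecasting_model(max_nodes=8, max_concurrent_trials=2)
nodes_for_4_by_4 = nodes_per_forecasting_model(max_nodes=4, max_concurrent_trials=4)
```

Note that the lower bound of 2 applies even when max_nodes divided by max_concurrent_trials is smaller, as in the second call.
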
set_training

The method to configure forecast training related settings.

set_training(*, enable_onnx_compatible_models: bool | None = None, enable_dnn_training: bool | None = None, enable_model_explainability: bool | None = None, enable_stack_ensemble: bool | None = None, enable_vote_ensemble: bool | None = None, stack_ensemble_settings: StackEnsembleSettings | None = None, ensemble_model_download_timeout: int | None = None, allowed_training_algorithms: List[str] | None = None, blocked_training_algorithms: List[str] | None = None, training_mode: str | TrainingMode | None = None) -> None

Keyword-Only Parameters

Name	Description
enable_onnx_compatible_models	Optional[bool] Whether to enable or disable enforcing the ONNX-compatible models. The default is False. For more information about Open Neural Network Exchange (ONNX) and Azure Machine Learning, see this article.
enable_dnn_training	Optional[bool] Whether to include DNN based models during model selection. The default is True for DNN NLP tasks and False for all other AutoML tasks.
enable_model_explainability	Optional[bool] Whether to enable explaining the best AutoML model at the end of all AutoML training iterations. For more information, see Interpretability: model explanations in automated machine learning. Defaults to None.
enable_stack_ensemble	Optional[bool] Whether to enable/disable the StackEnsemble iteration. If the enable_onnx_compatible_models flag is set, the StackEnsemble iteration is disabled. Similarly, for time series tasks, the StackEnsemble iteration is disabled by default, to avoid the risk of overfitting due to the small training set used in fitting the meta learner. For more information about ensembles, see Ensemble configuration. Defaults to None.
enable_vote_ensemble	Optional[bool] Whether to enable/disable the VotingEnsemble iteration. For more information about ensembles, see Ensemble configuration. Defaults to None.
stack_ensemble_settings	Optional[StackEnsembleSettings] Settings for StackEnsemble iteration, defaults to None
ensemble_model_download_timeout	Optional[int] During VotingEnsemble and StackEnsemble model generation, multiple fitted models from the previous child runs are downloaded. Configure this parameter with a higher value than 300 secs, if more time is needed, defaults to None
allowed_training_algorithms	Optional[List[str]] A list of model names to search for an experiment. If not specified, then all models supported for the task are used minus any specified in `blocked_training_algorithms` or deprecated TensorFlow models, defaults to None
blocked_training_algorithms	Optional[List[str]] A list of algorithms to ignore for an experiment, defaults to None
training_mode	Optional[Union[str, TrainingMode]] [Experimental] The training mode to use. The possible values are:
- distributed: enables distributed training for supported algorithms.
- non_distributed: disables distributed training.
- auto: currently the same as non_distributed. This might change in the future.
Note: This parameter is in public preview and may change in the future.
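
The interaction between allowed_training_algorithms and blocked_training_algorithms can be sketched as a simple filter. This is an illustration, not the SDK's logic: the function name is hypothetical, the model names are illustrative rather than an authoritative list, the exclusion of deprecated TensorFlow models is left out, and applying the block list on top of an explicit allow list is an assumption.

```python
from typing import List, Optional, Sequence


def resolve_training_algorithms(supported: Sequence[str],
                                allowed: Optional[Sequence[str]] = None,
                                blocked: Optional[Sequence[str]] = None) -> List[str]:
    """Return the algorithms that would be searched, preserving order."""
    blocked_set = set(blocked or [])
    # Without an allow list, start from every model supported for the task.
    candidates = allowed if allowed is not None else supported
    return [name for name in candidates if name not in blocked_set]


# Illustrative model names only.
supported_models = ["AutoArima", "Prophet", "ExponentialSmoothing", "Naive"]
without_prophet = resolve_training_algorithms(supported_models, blocked=["Prophet"])
only_allowed = resolve_training_algorithms(supported_models, allowed=["AutoArima", "Naive"])
```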

Attributes

base_path

The base path of the resource.

Returns

Type	Description
str	The base path of the resource.

creation_context

The creation context of the resource.

Returns

Type	Description
Optional[SystemData]	The creation metadata for the resource.

featurization

Get the tabular featurization settings for the AutoML job.

Returns

Type	Description
TabularFeaturizationSettings	Tabular featurization settings for the AutoML job

forecasting_settings

Return the forecast settings.

Returns

Type	Description
ForecastingSettings	forecast settings.

id

The resource ID.

Returns

Type	Description
Optional[str]	The global ID of the resource, an Azure Resource Manager (ARM) ID.

inputs

limits

Get the tabular limits for the AutoML job.

Returns

Type	Description
TabularLimitSettings	Tabular limits for the AutoML job

log_files

Job output files.

Returns

Type	Description
Optional[Dict[str, str]]	The dictionary of log names and URLs.

log_verbosity

Get the log verbosity for the AutoML job.

Returns

Type	Description
LogVerbosity	Log verbosity for the AutoML job

outputs

primary_metric

Return the primary metric to use for model selection.

Returns

Type	Description
Optional[str]	The primary metric for model selection.

status

The status of the job.

Common values returned include "Running", "Completed", and "Failed". All possible values are:

NotStarted - This is a temporary state that client-side Run objects are in before cloud submission.
Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
Provisioning - On-demand compute is being created for a given job submission.
Preparing - The run environment is being prepared and is in one of two stages:
- Docker image build
- conda environment setup
Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
Running - The job has started to run on the compute target.
Finalizing - User code execution has completed, and the run is in post-processing stages.
CancelRequested - Cancellation has been requested for the job.
Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
Failed - The run failed. Usually the Error property on a run will provide details as to why.
Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
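
When polling a job, it can help to separate the status values above into finished and still-in-progress states. The grouping below is an assumption drawn from the descriptions in this list (Completed, Failed and Canceled are treated as final), and the helper name is hypothetical.

```python
from typing import Optional

# Status strings as listed in this reference.
ALL_STATUSES = (
    "NotStarted", "Starting", "Provisioning", "Preparing", "Queued", "Running",
    "Finalizing", "CancelRequested", "Completed", "Failed", "Canceled",
    "NotResponding",
)

# Assumption: these three are the states after which the status no longer changes.
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def is_finished(status: Optional[str]) -> bool:
    """True if the job status indicates that the job will not progress further."""
    return status in TERMINAL_STATUSES


in_progress = [s for s in ALL_STATUSES if not is_finished(s)]
```

The status attribute is Optional[str], so the helper also accepts None and reports it as not finished.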

Returns

Type	Description
Optional[str]	Status of the job.

studio_url

Azure ML studio endpoint.

Returns

Type	Description
Optional[str]	The URL to the job details page.

task_type

Get task type.

Returns

Type	Description
str	The type of task to run. Possible values include: "classification", "regression", "forecasting".

test_data

Get test data.

Returns

Type	Description
Input	Test data input

training

Return the forecast training settings.

Returns

Type	Description
azure.ai.ml.automl.ForecastingTrainingSettings	Training settings.

training_data

Get training data.

Returns

Type	Description
Input	Training data input

type

The type of the job.

Returns

Type	Description
Optional[str]	The type of the job.

validation_data

Get validation data.

Returns

Type	Description
Input	Validation data input
