
Tune hyperparameters for your model with Azure Machine Learning

Efficiently tune hyperparameters for your model using Azure Machine Learning. Hyperparameter tuning includes the following steps:

  • Define the parameter search space
  • Specify a primary metric to optimize
  • Specify early termination criteria for poorly performing runs
  • Allocate resources for hyperparameter tuning
  • Launch an experiment with the above configuration
  • Visualize the training runs
  • Select the best performing configuration for your model

What are hyperparameters?

Hyperparameters are adjustable parameters you choose to train a model that govern the training process itself. For example, to train a deep neural network, you decide the number of hidden layers in the network and the number of nodes in each layer prior to training the model. These values usually stay constant during the training process.

In deep learning and machine learning scenarios, model performance depends heavily on the hyperparameter values selected. The goal of hyperparameter exploration is to search across various hyperparameter configurations to find the configuration that results in the best performance. Typically, the hyperparameter exploration process is painstakingly manual, given that the search space is vast and evaluating each configuration can be expensive.

Azure Machine Learning allows you to automate hyperparameter exploration in an efficient manner, saving you significant time and resources. You specify the range of hyperparameter values and a maximum number of training runs. The system then automatically launches multiple simultaneous runs with different parameter configurations and finds the configuration that results in the best performance, measured by the metric you choose. Poorly performing training runs are automatically terminated early, reducing wastage of compute resources. These resources are instead used to explore other hyperparameter configurations.

Define search space

Automatically tune hyperparameters by exploring the range of values defined for each hyperparameter.

Types of hyperparameters

Each hyperparameter can either be discrete or continuous, and has a distribution of values described by a parameter expression.

Discrete hyperparameters

Discrete hyperparameters are specified as a choice among discrete values. choice can be:

  • one or more comma-separated values
  • a range object
  • any arbitrary list object

    {
        "batch_size": choice(16, 32, 64, 128),
        "number_of_hidden_layers": choice(range(1,5))
    }

In this case, batch_size takes on one of the values [16, 32, 64, 128] and number_of_hidden_layers takes on one of the values [1, 2, 3, 4].
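The four-value expansion follows Python's half-open range convention, which is easy to check:

```python
# range(1, 5) excludes the upper bound, so choice(range(1, 5))
# draws from exactly four values:
values = list(range(1, 5))
print(values)  # [1, 2, 3, 4]
```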

Advanced discrete hyperparameters can also be specified using a distribution. The following distributions are supported:

  • quniform(low, high, q) - Returns a value like round(uniform(low, high) / q) * q
  • qloguniform(low, high, q) - Returns a value like round(exp(uniform(low, high)) / q) * q
  • qnormal(mu, sigma, q) - Returns a value like round(normal(mu, sigma) / q) * q
  • qlognormal(mu, sigma, q) - Returns a value like round(exp(normal(mu, sigma)) / q) * q
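For intuition, the four formulas above can be reproduced in plain Python with the standard library (a sketch, not the Azure ML implementation):

```python
import math
import random

def quniform(low, high, q):
    # round(uniform(low, high) / q) * q
    return round(random.uniform(low, high) / q) * q

def qloguniform(low, high, q):
    # round(exp(uniform(low, high)) / q) * q
    return round(math.exp(random.uniform(low, high)) / q) * q

def qnormal(mu, sigma, q):
    # round(normal(mu, sigma) / q) * q
    return round(random.normalvariate(mu, sigma) / q) * q

def qlognormal(mu, sigma, q):
    # round(exp(normal(mu, sigma)) / q) * q
    return round(math.exp(random.normalvariate(mu, sigma)) / q) * q

# Every sampled value snaps to a multiple of q:
print(quniform(1, 100, 5) % 5)  # 0
```

The `q` parameter is what makes these distributions discrete: the division, rounding, and multiplication quantize a continuous draw onto a grid of step `q`.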

Continuous hyperparameters

Continuous hyperparameters are specified as a distribution over a continuous range of values. Supported distributions include:

  • uniform(low, high) - Returns a value uniformly distributed between low and high
  • loguniform(low, high) - Returns a value drawn according to exp(uniform(low, high)) so that the logarithm of the return value is uniformly distributed
  • normal(mu, sigma) - Returns a real value that's normally distributed with mean mu and standard deviation sigma
  • lognormal(mu, sigma) - Returns a value drawn according to exp(normal(mu, sigma)) so that the logarithm of the return value is normally distributed
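The log-scaled variants are just exponentials of the base distributions, which a short plain-Python sketch makes concrete:

```python
import math
import random

def loguniform(low, high):
    # exp(uniform(low, high)): the log of the result is uniform in [low, high]
    return math.exp(random.uniform(low, high))

def lognormal(mu, sigma):
    # exp(normal(mu, sigma)): the log of the result is normally distributed
    return math.exp(random.normalvariate(mu, sigma))

# A log-uniform draw over [exp(-3), exp(0)], the kind of range often used
# for scale-type hyperparameters such as learning rates:
v = loguniform(-3, 0)
assert -3 <= math.log(v) <= 0
```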

An example of a parameter space definition:

    {
        "learning_rate": normal(10, 3),
        "keep_probability": uniform(0.05, 0.1)
    }

This code defines a search space with two parameters - learning_rate and keep_probability. learning_rate has a normal distribution with mean value 10 and a standard deviation of 3. keep_probability has a uniform distribution with a minimum value of 0.05 and a maximum value of 0.1.

Sampling the hyperparameter space

You can also specify the parameter sampling method to use over the hyperparameter space definition. Azure Machine Learning supports random sampling, grid sampling, and Bayesian sampling.

Random sampling

In random sampling, hyperparameter values are randomly selected from the defined search space. Random sampling allows the search space to include both discrete and continuous hyperparameters.

from azureml.train.hyperdrive import RandomParameterSampling
from azureml.train.hyperdrive import normal, uniform, choice  # parameter expressions
param_sampling = RandomParameterSampling( {
        "learning_rate": normal(10, 3),
        "keep_probability": uniform(0.05, 0.1),
        "batch_size": choice(16, 32, 64, 128)
    }
)

Grid sampling

Grid sampling performs a simple grid search over all feasible values in the defined search space. It can only be used with hyperparameters specified using choice. For example, the following space has a total of six samples:

from azureml.train.hyperdrive import GridParameterSampling
from azureml.train.hyperdrive import choice  # parameter expression
param_sampling = GridParameterSampling( {
        "num_hidden_layers": choice(1, 2, 3),
        "batch_size": choice(16, 32)
    }
)

Bayesian sampling

Bayesian sampling is based on the Bayesian optimization algorithm and makes intelligent choices about which hyperparameter values to sample next. It picks the next sample based on how the previous samples performed, such that the new sample improves the reported primary metric.

When you use Bayesian sampling, the number of concurrent runs has an impact on the effectiveness of the tuning process. Typically, a smaller number of concurrent runs can lead to better sampling convergence, since the smaller degree of parallelism increases the number of runs that benefit from previously completed runs.

Bayesian sampling only supports choice, uniform, and quniform distributions over the search space.

from azureml.train.hyperdrive import BayesianParameterSampling
from azureml.train.hyperdrive import uniform, choice  # parameter expressions
param_sampling = BayesianParameterSampling( {
        "learning_rate": uniform(0.05, 0.1),
        "batch_size": choice(16, 32, 64, 128)
    }
)

Note

Bayesian sampling does not support any early termination policy (see Specify an early termination policy). When using Bayesian parameter sampling, set early_termination_policy = None, or leave off the early_termination_policy parameter.

Specify primary metric

Specify the primary metric you want the hyperparameter tuning experiment to optimize. Each training run is evaluated for the primary metric. Poorly performing runs (where the primary metric does not meet criteria set by the early termination policy) will be terminated. In addition to the primary metric name, you also specify the goal of the optimization - whether to maximize or minimize the primary metric.

  • primary_metric_name: The name of the primary metric to optimize. The name of the primary metric needs to exactly match the name of the metric logged by the training script. See Log metrics for hyperparameter tuning.
  • primary_metric_goal: It can be either PrimaryMetricGoal.MAXIMIZE or PrimaryMetricGoal.MINIMIZE and determines whether the primary metric will be maximized or minimized when evaluating the runs.

primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE

Optimize the runs to maximize "accuracy". Make sure to log this value in your training script.

Log metrics for hyperparameter tuning

The training script for your model must log the relevant metrics during model training. When you configure the hyperparameter tuning, you specify the primary metric to use for evaluating run performance. (See Specify a primary metric to optimize.) In your training script, you must log this metric so it is available to the hyperparameter tuning process.

Log this metric in your training script with the following sample snippet:

from azureml.core.run import Run
run_logger = Run.get_context()
run_logger.log("accuracy", float(val_accuracy))

The training script calculates val_accuracy and logs it as "accuracy", which is used as the primary metric. Each time the metric is logged, it is received by the hyperparameter tuning service. It is up to the model developer to determine how frequently to report this metric.

Specify early termination policy

Terminate poorly performing runs automatically with an early termination policy. Termination reduces wastage of resources and instead uses these resources for exploring other parameter configurations.

When using an early termination policy, you can configure the following parameters that control when the policy is applied:

  • evaluation_interval: the frequency for applying the policy. Each time the training script logs the primary metric counts as one interval. Thus an evaluation_interval of 1 applies the policy every time the training script reports the primary metric, and an evaluation_interval of 2 applies it every other time. If not specified, evaluation_interval is set to 1 by default.
  • delay_evaluation: delays the first policy evaluation for a specified number of intervals. This optional parameter allows all configurations to run for an initial minimum number of intervals, avoiding premature termination of training runs. If specified, the policy applies at every multiple of evaluation_interval that is greater than or equal to delay_evaluation.
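Taken together, the two parameters define a simple evaluation schedule. As a plain-Python sketch (policy_applies is a hypothetical helper, not part of the SDK):

```python
def policy_applies(interval, evaluation_interval=1, delay_evaluation=0):
    """Return True if the early termination policy is evaluated at this interval.

    The policy applies at every multiple of evaluation_interval that is
    greater than or equal to delay_evaluation.
    """
    return interval % evaluation_interval == 0 and interval >= delay_evaluation

# With evaluation_interval=2 and delay_evaluation=5, the policy first
# applies at interval 6, then 8, 10, ...
applied = [i for i in range(1, 11) if policy_applies(i, 2, 5)]
print(applied)  # [6, 8, 10]
```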

Azure Machine Learning supports the following early termination policies.

Bandit policy

Bandit is a termination policy based on slack factor/slack amount and evaluation interval. The policy terminates any run where the primary metric is not within the specified slack factor/slack amount with respect to the best performing training run. It takes the following configuration parameters:

  • slack_factor or slack_amount: the slack allowed with respect to the best performing training run. slack_factor specifies the allowable slack as a ratio. slack_amount specifies the allowable slack as an absolute amount, instead of a ratio.

    For example, consider a Bandit policy being applied at interval 10. Assume that the best performing run at interval 10 reported a primary metric of 0.8, with a goal of maximizing the primary metric. If the policy was specified with a slack_factor of 0.2, any training run whose best metric at interval 10 is less than 0.66 (0.8/(1+slack_factor)) will be terminated. If instead the policy was specified with a slack_amount of 0.2, any training run whose best metric at interval 10 is less than 0.6 (0.8 - slack_amount) will be terminated.

  • evaluation_interval: the frequency for applying the policy (optional parameter).

  • delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

from azureml.train.hyperdrive import BanditPolicy
early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=5)

In this example, the early termination policy is applied at every interval when metrics are reported, starting at evaluation interval 5. Any run whose best metric is less than 1/(1+0.1) (roughly 91%) of the best performing run's metric will be terminated.
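The slack arithmetic from the Bandit examples above can be checked directly in plain Python (a sketch for a maximization goal, not the SDK implementation):

```python
def bandit_threshold(best_metric, slack_factor=None, slack_amount=None):
    # Metric value below which a run is cancelled, for a maximization goal.
    if slack_factor is not None:
        return best_metric / (1 + slack_factor)
    return best_metric - slack_amount

# Reproducing the worked example: best metric 0.8 at interval 10.
print(round(bandit_threshold(0.8, slack_factor=0.2), 3))  # 0.667
print(round(bandit_threshold(0.8, slack_amount=0.2), 3))  # 0.6
```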

Median stopping policy

Median stopping is an early termination policy based on running averages of primary metrics reported by the runs. This policy computes running averages across all training runs and terminates runs whose performance is worse than the median of the running averages. It takes the following configuration parameters:

  • evaluation_interval: the frequency for applying the policy (optional parameter).
  • delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

from azureml.train.hyperdrive import MedianStoppingPolicy
early_termination_policy = MedianStoppingPolicy(evaluation_interval=1, delay_evaluation=5)

In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A run is terminated at interval 5 if its best primary metric is worse than the median of the running averages over intervals 1:5 across all training runs.
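The comparison the policy makes can be sketched in a few lines of plain Python (should_stop is a hypothetical helper illustrating the rule, not the SDK implementation):

```python
from statistics import median

def should_stop(run_metrics, all_runs_metrics):
    """Sketch of median stopping for a maximization goal.

    run_metrics: metrics reported so far by the candidate run.
    all_runs_metrics: metric histories, one list per training run.
    """
    running_averages = [sum(m) / len(m) for m in all_runs_metrics]
    return max(run_metrics) < median(running_averages)

histories = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.6], [0.2, 0.2, 0.3]]
# Running averages are 0.6, 0.5 and ~0.23; their median is 0.5.
# The third run's best metric (0.3) falls below 0.5, so it would be stopped.
print(should_stop([0.2, 0.2, 0.3], histories))  # True
```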

Truncation selection policy

Truncation selection cancels a given percentage of the lowest performing runs at each evaluation interval. Runs are compared based on their performance on the primary metric, and the lowest X% are terminated. It takes the following configuration parameters:

  • truncation_percentage: the percentage of lowest performing runs to terminate at each evaluation interval. Specify an integer value between 1 and 99.
  • evaluation_interval: the frequency for applying the policy (optional parameter).
  • delay_evaluation: delays the first policy evaluation for a specified number of intervals (optional parameter).

from azureml.train.hyperdrive import TruncationSelectionPolicy
early_termination_policy = TruncationSelectionPolicy(evaluation_interval=1, truncation_percentage=20, delay_evaluation=5)

In this example, the early termination policy is applied at every interval starting at evaluation interval 5. A run is terminated at interval 5 if its performance at that interval is in the lowest 20% of performance of all runs at interval 5.
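The selection rule is simple to sketch in plain Python (truncate is a hypothetical helper for a maximization goal, not the SDK implementation):

```python
def truncate(runs, truncation_percentage):
    """Return the ids of the lowest-performing runs to cancel.

    runs: mapping of run id -> primary metric at the current interval.
    """
    n_to_cut = int(len(runs) * truncation_percentage / 100)
    ranked = sorted(runs, key=runs.get)  # worst performers first
    return ranked[:n_to_cut]

metrics = {"run1": 0.9, "run2": 0.85, "run3": 0.6, "run4": 0.75, "run5": 0.8}
# 20% of 5 runs is 1 run: only the worst performer is cancelled.
print(truncate(metrics, 20))  # ['run3']
```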

No termination policy

If you want all training runs to run to completion, set the policy to None. This has the effect of not applying any early termination policy.

policy=None

Default policy

If no policy is specified, the hyperparameter tuning service lets all training runs execute to completion.

Note

If you are looking for a conservative policy that provides savings without terminating promising jobs, you can use a Median Stopping Policy with evaluation_interval 1 and delay_evaluation 5. These are conservative settings that can provide approximately 25%-35% savings with no loss on the primary metric (based on our evaluation data).

Allocate resources

Control your resource budget for your hyperparameter tuning experiment by specifying the maximum total number of training runs. Optionally specify the maximum duration for your hyperparameter tuning experiment.

  • max_total_runs: Maximum total number of training runs that will be created. This is an upper bound - there may be fewer runs, for instance, if the hyperparameter space is finite and has fewer samples. Must be a number between 1 and 1000.
  • max_duration_minutes: Maximum duration in minutes of the hyperparameter tuning experiment. This parameter is optional; if specified, any runs that would be running after this duration are automatically canceled.

Note

If both max_total_runs and max_duration_minutes are specified, the hyperparameter tuning experiment terminates when the first of these two thresholds is reached.

Additionally, specify the maximum number of training runs to run concurrently during your hyperparameter tuning search.

  • max_concurrent_runs: Maximum number of runs to run concurrently at any given moment. If not specified, all max_total_runs will be launched in parallel. If specified, must be a number between 1 and 100.

Note

The number of concurrent runs is gated on the resources available in the specified compute target. Hence, you need to ensure that the compute target has the available resources for the desired concurrency.

Allocate resources for hyperparameter tuning:

max_total_runs=20,
max_concurrent_runs=4

This code configures the hyperparameter tuning experiment to use a maximum of 20 total runs, running four configurations at a time.

Configure experiment

Configure your hyperparameter tuning experiment using the defined hyperparameter search space, early termination policy, primary metric, and resource allocation from the sections above. Additionally, provide an estimator that will be called with the sampled hyperparameters. The estimator describes the training script you run, the resources per job (single or multi-GPU), and the compute target to use. Since concurrency for your hyperparameter tuning experiment is gated on the resources available, ensure that the compute target specified in the estimator has sufficient resources for your desired concurrency. (For more information on estimators, see how to train models.)

Configure your hyperparameter tuning experiment:

from azureml.train.hyperdrive import HyperDriveConfig
hyperdrive_run_config = HyperDriveConfig(estimator=estimator,
                          hyperparameter_sampling=param_sampling, 
                          policy=early_termination_policy,
                          primary_metric_name="accuracy", 
                          primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                          max_total_runs=100,
                          max_concurrent_runs=4)

Submit experiment

Once you define your hyperparameter tuning configuration, submit an experiment:

from azureml.core.experiment import Experiment
experiment = Experiment(workspace, experiment_name)
hyperdrive_run = experiment.submit(hyperdrive_run_config)

experiment_name is the name you assign to your hyperparameter tuning experiment, and workspace is the workspace in which you want to create the experiment. (For more information on experiments, see How does Azure Machine Learning work?)

Visualize experiment

The Azure Machine Learning SDK provides a Notebook widget that visualizes the progress of your training runs. The following snippet visualizes all your hyperparameter tuning runs in one place in a Jupyter notebook:

from azureml.widgets import RunDetails
RunDetails(hyperdrive_run).show()

This code displays a table with details about the training runs for each of the hyperparameter configurations.

[Hyperparameter tuning table]

You can also visualize the performance of each of the runs as training progresses.

[Hyperparameter tuning plot]

Additionally, you can visually identify the correlation between performance and values of individual hyperparameters using a Parallel Coordinates Plot.

[Hyperparameter tuning parallel coordinates plot]

You can visualize all your hyperparameter tuning runs in the Azure web portal as well. For more information on how to view an experiment in the web portal, see how to track experiments.

Find the best model

Once all of the hyperparameter tuning runs have completed, identify the best performing configuration and the corresponding hyperparameter values:

best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['Arguments']

print('Best Run Id: ', best_run.id)
print('\n Accuracy:', best_run_metrics['accuracy'])
print('\n learning rate:',parameter_values[3])
print('\n keep probability:',parameter_values[5])
print('\n batch size:',parameter_values[7])
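The fixed indices parameter_values[3], [5], [7] assume the argument list alternates flag/value pairs. A small hypothetical helper (with illustrative argument names) makes the lookup clearer under that same assumption:

```python
def args_to_dict(arguments):
    """Turn a flat ['--flag', 'value', ...] list into a dict.

    Assumes every flag is followed by exactly one value, which is the same
    shape the fixed indices in the snippet above rely on.
    """
    return {arguments[i].lstrip("-"): arguments[i + 1]
            for i in range(0, len(arguments), 2)}

# Illustrative argument list, as it might appear in the run details:
arguments = ["--data_dir", "data", "--learning_rate", "0.01",
             "--keep_probability", "0.9", "--batch_size", "64"]
params = args_to_dict(arguments)
print(params["learning_rate"])  # 0.01
```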

Sample notebook

Refer to the train-hyperparameter-* notebooks in this folder:

Learn how to run notebooks by following the article Use Jupyter notebooks to explore this service.

Next steps