Adjust the machine learning algorithm

The algorithm clause lets advanced users experiment with changing the machine learning algorithms and tuning parameters used by the training engine. However, the engine treats algorithm statements as hints, not requirements. The engine may ignore a hint if it determines that the requested algorithm or parameters are inappropriate or obsolete. Hints apply only to the concept for which they are specified.

Supported algorithm identifiers

| Token | Name | Used for |
| --- | --- | --- |
| APEX | Distributed Deep Q Network | nominal action spaces |
| PPO | Proximal Policy Optimization | continuous and discrete ordinal action spaces |
| SAC | Soft Actor Critic | continuous and discrete ordinal action spaces |

Example

graph (input: GameState): Action {
  concept balance(input): Action {
    curriculum {
      source MySimulator

      algorithm {
        Algorithm: "APEX",
        HiddenLayers: [
          {
            Size: 64,
            Activation: "tanh"
          },
          {
            Size: 64,
            Activation: "tanh"
          }
        ]
      }
    }
  }
}

Memory mode configuration

Use the MemoryMode parameter to configure whether the concept's action can learn to depend on past states or actions. Supported values:

| Value | Description |
| --- | --- |
| default | The training engine chooses the memory mode automatically. This is the behavior if MemoryMode is not specified. |
| none | No memory. Learned actions can depend only on the current state. |
| state | Memory of past states. Learned actions can depend on the current state as well as previous states in the episode. |
| state and action | Memory of past states and past actions. Learned actions can depend on previous states and actions in the episode. |

MemoryMode can be set without specifying an algorithm. For example:

algorithm {
    MemoryMode: "state"
}
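Because MemoryMode is an ordinary parameter of the algorithm clause, it should also be possible to combine it with an explicit algorithm selection. A hedged sketch (the engine may still treat the entire clause as a hint):

```
algorithm {
    Algorithm: "PPO",
    MemoryMode: "state"
}
```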

Layer specification

Several algorithms allow hidden layers of the neural network(s) to be configured. Hidden layers are defined as an array of structures that define the size and (optionally) the activation function for each hidden layer. Sizes must be positive integers and activation functions must be one of: linear, tanh, relu, logistic, softmax, elu, or default. For example:

HiddenLayers: [
  {
    Size: 400,
    Activation: "relu"
  },
  {
    Size: 300,
    Activation: "tanh"
  }
]
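Since the activation function is optional, a layer entry may omit it; presumably the engine then applies its default activation, equivalent to writing Activation: "default". A minimal sketch:

```
HiddenLayers: [
  {
    Size: 128
  }
]
```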

APEX-DQN Parameters

APEX-DQN supports the following parameters.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| QLearningRate | number | The learning rate for training the Q network | QLearningRate: 0.0001 |
| HiddenLayers | LayerSpec[] | An array of layer specifications | See Layer specification |

The default configuration for APEX-DQN:

algorithm {
    Algorithm: "APEX",
    QLearningRate: 5e-4,
    HiddenLayers: [
        {
            Size: 256,
            Activation: "tanh"
        },
        {
            Size: 256,
            Activation: "tanh"
        }
    ]
}

PPO Parameters

PPO supports the following parameters.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| BatchSize | Number.UInt32 | Batch size (1000-2000000) for aggregating assessment data | BatchSize: 8000 |
| PolicyLearningRate | number | The learning rate for training the policy network | PolicyLearningRate: 0.0001 |
| HiddenLayers | LayerSpec[] | An array of layer specifications | See Layer specification |

Proximal Policy Optimization (PPO) works by gathering a large number of complete trajectories and analyzing them in aggregate to estimate the probability that a given change to the policy will improve performance. With a sufficiently large batch size, PPO yields monotonic policy improvement: because each change is evaluated against a large amount of data, the algorithm can be confident that the changes it makes will produce real improvements. The BatchSize parameter determines how much data is aggregated to make this decision.

Smaller batches lead to faster convergence because updates are computed more frequently, but small-batch updates are less reliable, and the policy may become unstable if the batch size is too small.
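To trade convergence speed for stability, the batch size can be raised above the default toward the upper end of the allowed range. A sketch (the specific value here is illustrative, not a recommendation):

```
algorithm {
    Algorithm: "PPO",
    BatchSize: 50000
}
```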

The default configuration for PPO:

algorithm {
    Algorithm: "PPO",
    BatchSize: 6000,
    PolicyLearningRate: 5e-5,
    HiddenLayers: [
        {
            Size: 256,
            Activation: "tanh"
        },
        {
            Size: 256,
            Activation: "tanh"
        }
    ]
}

SAC Parameters

SAC supports the following parameters.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| QHiddenLayers | LayerSpec[] | An array of layer specifications for the Q value network | See Layer specification |
| PolicyHiddenLayers | LayerSpec[] | An array of layer specifications for the policy network | See Layer specification |

The default configuration for SAC:

algorithm {
    Algorithm: "SAC",
    QHiddenLayers: [
        {
            Size: 256,
            Activation: "relu"
        },
        {
            Size: 256,
            Activation: "relu"
        }
    ],
    PolicyHiddenLayers: [
        {
            Size: 256,
            Activation: "relu"
        },
        {
            Size: 256,
            Activation: "relu"
        }
    ]
}
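Because the Q value and policy networks are configured independently, they need not use the same layer sizes. A hedged sketch giving the Q network more capacity than the policy network (the sizes are illustrative, not recommendations):

```
algorithm {
    Algorithm: "SAC",
    QHiddenLayers: [
        {
            Size: 400,
            Activation: "relu"
        },
        {
            Size: 300,
            Activation: "relu"
        }
    ],
    PolicyHiddenLayers: [
        {
            Size: 256,
            Activation: "relu"
        }
    ]
}
```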