# Adjust the machine learning algorithm
The algorithm clause lets advanced users experiment with alternative machine
learning algorithms and tuning parameters for the training engine. However,
the engine treats algorithm statements as hints, not requirements: it may
ignore a statement if it determines that the requested algorithm or
parameters are inappropriate or obsolete. A hint applies only to the concept
for which it is specified.
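Because hints are scoped per concept, a graph with multiple concepts can supply a different algorithm hint for each one. The sketch below illustrates the idea; the concept names, action types, and simulator name are invented for illustration, and the engine may still override either hint:

```
graph (input: GameState): Action {
    concept Plan(input): Action {
        curriculum {
            source MySimulator
            algorithm {
                Algorithm: "PPO"
            }
        }
    }
    concept Refine(input, Plan): Action {
        curriculum {
            source MySimulator
            algorithm {
                Algorithm: "SAC"
            }
        }
    }
    output Refine
}
```

Each algorithm block affects only the concept whose curriculum contains it; neither hint leaks into the other concept's training.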
## Supported algorithm identifiers
| Token | Name | Used for |
|---|---|---|
| APEX | Distributed Deep Q Network | nominal action spaces |
| PPO | Proximal Policy Optimization | continuous and discrete ordinal action spaces |
| SAC | Soft Actor Critic | continuous and discrete ordinal action spaces |
## Example
```
graph (input: GameState): Action {
    concept balance(input): Action {
        curriculum {
            source MySimulator
            algorithm {
                Algorithm: "APEX",
                HiddenLayers: [
                    {
                        Size: 64,
                        Activation: "tanh"
                    },
                    {
                        Size: 64,
                        Activation: "tanh"
                    }
                ]
            }
        }
    }
}
```
## Memory mode configuration
Use the MemoryMode parameter to configure whether the concept's action can learn to depend on past states or actions. Supported values:
| Value | Description |
|---|---|
| default | The training engine chooses the memory mode automatically. This is the default behavior if MemoryMode is not specified. |
| none | No memory. Learned actions can depend only on the current state. |
| state | Memory of past states. Learned actions can depend on the current state as well as previous states in the episode. |
| state and action | Memory of past states and actions. Learned actions can depend on previous states and actions in the episode. |
MemoryMode can be set without specifying an algorithm. For example:
```
algorithm {
    MemoryMode: "state"
}
```
## Layer specification
Several algorithms allow the hidden layers of their neural networks to be
configured. Hidden layers are specified as an array of structures, each
defining the size and (optionally) the activation function of one hidden
layer. Sizes must be positive integers, and activation functions must be one
of: linear, tanh, relu, logistic, softmax, elu, or default. For example:
```
HiddenLayers: [
    {
        Size: 400,
        Activation: "relu"
    },
    {
        Size: 300,
        Activation: "tanh"
    }
]
```
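Since the activation function is optional, a layer can be specified by its size alone. A minimal sketch, assuming (as the description above suggests) that an omitted Activation behaves like "default" and lets the engine pick:

```
HiddenLayers: [
    {
        Size: 128
    },
    {
        Size: 128,
        Activation: "default"
    }
]
```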
## APEX-DQN Parameters
APEX-DQN supports the following parameters.
| Parameter | Type | Description | Example |
|---|---|---|---|
| QLearningRate | number | The learning rate for training the Q network | QLearningRate: 0.0001 |
| HiddenLayers | LayerSpec[] | An array of layer specifications | See Layer specification |
The default configuration for APEX-DQN:
```
algorithm {
    Algorithm: "APEX",
    QLearningRate: 5e-4,
    HiddenLayers: [
        {
            Size: 256,
            Activation: "tanh"
        },
        {
            Size: 256,
            Activation: "tanh"
        }
    ]
}
```
## PPO Parameters
PPO supports the following parameters.
| Parameter | Type | Description | Example |
|---|---|---|---|
| BatchSize | Number.UInt32 | Batch size (1000-2000000) for aggregating assessment data | BatchSize: 8000 |
| PolicyLearningRate | number | The learning rate for training the policy network | PolicyLearningRate: 0.0001 |
| HiddenLayers | LayerSpec[] | An array of layer specifications | See Layer specification |
Proximal Policy Optimization (PPO) works by gathering a large number of
complete trajectories and analyzing them in aggregate to obtain a
confidence metric for the probability that a given change to the policy
will improve performance. With a sufficiently large batch size, PPO yields
approximately monotonic policy improvement: because each update is based on a
large amount of data, the algorithm can be confident that the changes it makes
will lead to real improvements. The BatchSize parameter determines how much
data is aggregated before each such decision.
Smaller batches lead to faster convergence because updates are computed more frequently, but small-batch updates are less reliable, and the policy may become unstable if the batch size is too small.
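For example, to trade some update reliability for faster iteration, the BatchSize hint can be lowered toward the bottom of its documented range. The value below is purely illustrative, and the engine may still ignore or adjust it:

```
algorithm {
    Algorithm: "PPO",
    BatchSize: 2000
}
```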
The default configuration for PPO:
```
algorithm {
    Algorithm: "PPO",
    BatchSize: 6000,
    PolicyLearningRate: 5e-5,
    HiddenLayers: [
        {
            Size: 256,
            Activation: "tanh"
        },
        {
            Size: 256,
            Activation: "tanh"
        }
    ]
}
```
## SAC Parameters
SAC supports the following parameters.
| Parameter | Type | Description | Example |
|---|---|---|---|
| QHiddenLayers | LayerSpec[] | An array of layer specifications for the Q value network | See Layer specification |
| PolicyHiddenLayers | LayerSpec[] | An array of layer specifications for the policy network | See Layer specification |
The default configuration for SAC:
```
algorithm {
    Algorithm: "SAC",
    QHiddenLayers: [
        {
            Size: 256,
            Activation: "relu"
        },
        {
            Size: 256,
            Activation: "relu"
        }
    ],
    PolicyHiddenLayers: [
        {
            Size: 256,
            Activation: "relu"
        },
        {
            Size: 256,
            Activation: "relu"
        }
    ]
}
```