Keyword: Goal
Goals are a high-level specification of what you want the AI to learn. Use goals to let the training engine automatically determine appropriate reward functions and conditions for early termination.
A goal-based curriculum lets Bonsai report on training progress in terms of the concrete objectives you specify rather than abstract reward scores.
Usage
To use goals, include the Goal namespace at the beginning of your Inkling
file and include the goal statement (with one or more objectives) in your
curriculum definition.
```
using Goal
...
curriculum {
    ...
    goal (state: SimState, action: Action) {
        avoid Fall: Math.Abs(state.angle) in Goal.RangeAbove(MaxAngle)
        drive SmallAngle: Math.Abs(state.angle) in Goal.Range(0, MaxAngle/6)
        drive StayCentered: Math.Abs(state.position) in Goal.Range(0, MaxPosition/10)
        minimize EnergyUsed: action.force**2 in Goal.RangeBelow(TargetMeanSquaredForce)
    }
    training {
        EpisodeIterationLimit: MaxIterationCount
    }
}
```
The state parameter to the goal statement is required. The action parameter is optional
and refers to the action that brought the environment to the current state.
The order in which you specify your objectives does not matter. The training engine will try to satisfy the success criteria for all of them.
Important
You cannot use goals and define explicit reward and terminal functions in the same curriculum.
Supported objectives
Goals support the following objectives:
- avoid: Avoid a defined region.
- drive: Get to a target range as quickly as possible and try to stay in that range.
- maximize: Push the target value as high as possible.
- minimize: Push the target value as low as possible.
- reach: Get to a target range as quickly as possible.
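The usage example earlier in this article shows avoid, drive, and minimize objectives. The reach and maximize objectives follow the same pattern. The sketch below is illustrative only and assumes hypothetical state fields (distance, throughput) and a hypothetical Tolerance constant:

```
goal (state: SimState) {
    # succeed as soon as the distance drops into the target range
    reach GetToTarget: state.distance in Goal.RangeBelow(Tolerance)
    # push throughput as high as possible within the target range
    maximize Throughput: state.throughput in Goal.RangeAbove(0)
}
```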
The avoid objective tells the training engine that the AI should keep the
simulated environment from entering a particular state.
An avoid statement includes the following elements:
- objectiveName: an objective name.
- testValue: an expression representing the value to check for the objective.
- avoidRange: an expression that defines the set of values to be avoided.

```
avoid objectiveName: testValue in avoidRange
```
For example:
```
# if the angle ever goes outside [-MaxAngle, MaxAngle], the
# episode terminates and is marked a failure
avoid Fall: Math.Abs(S.Angle) in Goal.RangeAbove(MaxAngle)
```
The avoid objective succeeds if the test value never enters the prescribed
range during the episode. It fails if the test value enters the range at
any point during the episode.
The training engine provides the following details for avoid objectives:
- Success rate: the fraction of episodes in an assessment where the AI achieves the objective.
- Goal satisfaction rate: the average progress toward satisfying the objective, across test episodes in an assessment. A satisfaction of 100% means the AI successfully completed the objective.
- Goal robustness: how robust the learned policy is to noise and perturbation. For avoid objectives, robustness is proportional to the minimal distance between the test value and the range to avoid. Negative robustness means the objective failed.
Early episode termination
A goal-based curriculum supports early episode termination: ending a training episode before the episode iteration limit is reached. Early termination happens under the following conditions:
- Early episode termination because an objective failed. Happens when any of the following conditions occurs:
  - An avoid objective triggers.
  - A drive objective with a within k clause fails to be in the target region for more than k iterations.
- Early episode termination because all objectives succeeded. Happens when all of the following conditions are met:
  - The list of objectives does include a reach objective.
  - The list of objectives does not include a drive objective.
  - All reach objectives succeed.
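For example, a goal composed only of reach and avoid objectives can terminate early with success once every reach objective is satisfied. This sketch is illustrative and assumes hypothetical state fields (position, target) and a hypothetical Tolerance constant:

```
goal (state: SimState) {
    # the episode terminates with failure if the cart leaves the track
    avoid OffTrack: Math.Abs(state.position) in Goal.RangeAbove(MaxPosition)
    # with no drive objectives present, the episode terminates with
    # success as soon as this reach objective is satisfied
    reach Target: Math.Abs(state.position - state.target) in Goal.RangeBelow(Tolerance)
}
```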
Define goal ranges
The Goal namespace provides functions for defining your goal ranges.
Currently, the namespace includes functions for numeric thresholds and
N-dimensional spaces.
| Shorthand | Defined range |
|---|---|
| Goal.Range(X, Y) | Values between X and Y, inclusive |
| Goal.RangeAbove(X) | Values greater than or equal to X |
| Goal.RangeBelow(X) | Values less than or equal to X |
| Goal.Box([m1, M1], [m2, M2]) | Values within the rectangle with sides [m1, M1] and [m2, M2] |
| Goal.Box([m1, M1], [m2, M2], [m3, M3]) | Values within the 3D box with sides [m1, M1], [m2, M2], [m3, M3] |
| Goal.Box([m1, M1], [m2, M2], ..., [m8, M8]) | Values within the n-rectangle with sides [m_i, M_i] in each dimension i (up to eight dimensions) |
| Goal.Sphere(X, R) | Values within a 1D sphere (line) with radius R, centered on the point (X) |
| Goal.Sphere([X, Y], R) | Values within a 2D sphere (circle) with radius R, centered on the point (X, Y) |
| Goal.Sphere([X, Y, Z], R) | Values within a 3D sphere with radius R, centered on the point (X, Y, Z) |
The dimension of the objective test value must match the dimension of the provided range. For example, if the objective is defined as a 3D sphere, the test value must correspond to a position in three-dimensional space:
```
reach [a, b, c] in Goal.Sphere([x, y, z], r)
```
Tip
If the range can vary between episodes, ensure that the range parameters are part of the observable state.
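For instance, if the target location changes from episode to episode, the goal range can reference observable state fields directly, so the AI can learn a policy that generalizes across targets. This sketch assumes hypothetical state fields (x, y, goal_x, goal_y) and a hypothetical TargetRadius constant:

```
goal (state: SimState) {
    # the sphere's center comes from the observable state, so the
    # per-episode target is visible to the AI during training
    reach Target: [state.x, state.y] in Goal.Sphere([state.goal_x, state.goal_y], TargetRadius)
}
```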
Objective weights
The Bonsai teaching engine starts training by instructing the AI to satisfy all goal objectives. When the learned policy satisfies all objectives, the teaching engine encourages the policy to be more robust to noise and perturbations. The AI is encouraged to behave so that the test value for each objective is further into the objective's success range. This table shows what increasing robustness means for each type of objective:
| Objective type | Increasing robustness |
|---|---|
| avoid objectiveName: testValue in avoidRange | testValue further from avoidRange |
| drive objectiveName: testValue in targetRange | testValue closer to the middle of targetRange |
| maximize objectiveName: testValue in targetRange | testValue higher in targetRange |
| minimize objectiveName: testValue in targetRange | testValue lower in targetRange |
| reach objectiveName: testValue in targetRange | testValue closer to the middle of targetRange |
By default, all objectives are treated as equally important. To specify relative objective importance, use the weight keyword:
```
goal (S: SimState) {
    # make avoiding a fall more important than getting close to the target
    avoid Fall weight 4: Math.Abs(S.Angle) in Goal.RangeAbove(MaxAngle)
    # use the default weight of 1
    minimize CloseToTarget: Math.Abs(S.position - S.target_position)
        in Goal.RangeBelow(TargetDistance)
}
```
The above code sets the weight of the avoid Fall objective to four times that of the minimize CloseToTarget objective. Relative to the default of equal weights, this encourages the AI to keep S.Angle smaller, even if that increases Math.Abs(S.position - S.target_position).
Weights default to 1 and can be set to any positive number; constant expressions can be used as well. Only relative weights matter: with four objectives, weights of 1, 1, 2, 4 are equivalent to weights of 10, 10, 20, 40.
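Since constant expressions are allowed as weights, a weight can be derived from a named constant instead of a literal. The sketch below is illustrative and assumes a hypothetical FallPenalty constant:

```
const FallPenalty = 2

goal (S: SimState) {
    # weight given as a constant expression (2 * FallPenalty = 4)
    avoid Fall weight 2 * FallPenalty: Math.Abs(S.Angle) in Goal.RangeAbove(MaxAngle)
    minimize CloseToTarget: Math.Abs(S.position - S.target_position)
        in Goal.RangeBelow(TargetDistance)
}
```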
Goal training parameters
You can adjust the following training parameters for the goal with the
training clause:
| Parameter | Values | Default | Description |
|---|---|---|---|
| EpisodeIterationLimit | Number.UInt32 | 1000 | Total iterations allowed per training episode. |
For example:
```
concept MyConcept(input: SimState): BrainAction {
    curriculum {
        training {
            EpisodeIterationLimit: 250,
            TotalIterationLimit: 100000,
            LessonSuccessThreshold: 0.7
        }
    }
}
```
EpisodeIterationLimit
The training engine terminates the training episode and begins a new one after
EpisodeIterationLimit iterations if no terminal condition has been reached.