Reward and terminal functions in Inkling

Caution

Reward and terminal functions are considered advanced usage for special cases. Most users should use Goals instead.

Reward functions

Reward functions take one or two input parameters. The first parameter provides the new state that was returned from the simulator. The second parameter (which can be omitted) is the action provided by the AI during a training episode. The reward function must return a numeric value indicating the reward associated with that state and action.

The reward function is specified within the curriculum statement using the reward keyword, followed by either the name of a globally defined function or an inline function definition.

concept Balance(input: SimState): Action {
    curriculum {
        source MySimulator
        reward GetReward
    }
}

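# Returns true when the angle is strictly between 85 and 95.
function IsUpright(Angle: number) {
    return Angle > 85 and Angle < 95
}

# Rewards upright states and applies a small penalty otherwise.
function GetReward(State: SimState) {
    if IsUpright(State.Angle) {
        return 1.0
    }
    return -0.01
}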
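
The example above uses only the state. As a minimal sketch of the two-parameter form, the function below also receives the action and subtracts a small penalty for large commands. It assumes that the Action type has a numeric field named Command. That field, the name GetShapedReward, and the penalty weight of 0.01 are hypothetical and chosen only for illustration.

```inkling
# Assumption: Action has a numeric field named Command (hypothetical).
function GetShapedReward(State: SimState, Act: Action) {
    # Penalty grows with the square of the command (illustrative weight).
    var Effort: number = Act.Command * Act.Command
    if IsUpright(State.Angle) {
        return 1.0 - 0.01 * Effort
    }
    return -0.01 - 0.01 * Effort
}
```

To use it, the curriculum would reference it by name with reward GetShapedReward in place of reward GetReward.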

Warning

Reward functions cannot be used in conjunction with goals. Including both will generate an error.

Terminal functions

As with reward functions, terminal functions can be written within Inkling. When you specify a terminal function in Inkling, the training engine ignores any terminal values passed from the simulator. Terminal functions cannot be used in conjunction with goals.

An Inkling terminal function takes one or two input parameters. The first is the new state that was returned from the simulator. The second parameter (which can be omitted) is the action that was provided by the model. The terminal function must return true (1) if the state is a terminal state or false (0) otherwise.

The terminal function is specified within the curriculum statement using the terminal keyword, followed by either the name of a globally defined function or an inline function definition.

concept Balance(input: SimState): Action {
    curriculum {
        source MySimulator
        terminal function (State: SimState) {
            return State.Position > 20
        }
    }
}
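
The example above uses an inline function. The same check could also be written as a globally named function and referenced by name, as sketched below. The name IsTerminal and the lower bound of -20 are assumptions added for illustration. Only the upper bound appears in the original example.

```inkling
concept Balance(input: SimState): Action {
    curriculum {
        source MySimulator
        terminal IsTerminal
    }
}

# Hypothetical named terminal function.
# Returns true when the position leaves the assumed range of -20 to 20.
function IsTerminal(State: SimState) {
    return State.Position > 20 or State.Position < -20
}
```

A named function can be reused across several concepts, while an inline function keeps a short condition next to the curriculum that uses it.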

Terminal functions are not required when using reward functions. If no terminal function is provided, the training episode automatically terminates once the configured iteration limit (EpisodeIterationLimit) is reached.
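
As a rough sketch of that case, the curriculum below defines a reward function but no terminal function, so each episode ends only when the iteration limit is reached. It assumes that EpisodeIterationLimit is set in a training clause inside the curriculum. The value of 200 is arbitrary and chosen only for illustration.

```inkling
concept Balance(input: SimState): Action {
    curriculum {
        source MySimulator
        training {
            # Assumed value for illustration only.
            EpisodeIterationLimit: 200
        }
        # No terminal function: episodes end at the iteration limit.
        reward GetReward
    }
}
```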

Warning

Terminal functions cannot be used in conjunction with goals. Including both will generate an error.