Validate your simulation for machine teaching
The Bonsai training engine uses reinforcement learning (RL) to structure machine teaching for brains. Simulations support reinforcement learning by providing data and an interactive environment for iterative training. But not all simulations are compatible with machine teaching. Verifying the readiness of your simulation is critical to successfully training a brain with Bonsai.
The steps below represent the basic, best practices for simulation validation. Ultimately, developing a comprehensive, domain-relevant validation process requires situation specific experimentation and comparisons to real world data. The goal of any validation plan should be to identify and address critical gaps between your simulated environment and the real environment in which your brain will be deployed.
Before you start
- Make sure you can interact with your simulation programmatically. You can use the programming language of your choice.
- Make sure that Bonsai supports your simulation platform (or language).
Step 1: Verify your stepping function
To support reinforcement learning, your simulator must respond to iterative information changes. To handle iterative changes, your simulation must include a stepping function. The stepping function accepts input (action) variables and returns output (state) values unless a terminal condition occurs.
- Input variables represent actions taken by your Bonsai brain during the previous iteration.
- Output variables provide outcome information the brain will use during the current iteration.
For example, consider an apiary HVAC simulation that controls temperature by manipulating circulation vents. The number and position of open vents are the input variables. The internal temperature of the apiary is the output state. To work with machine teaching, the simulation must allow iterative changes to the position and number of open vents and return the new temperature on every iteration.
To verify your stepping function:
- Confirm the set of expected input control actions matches, or is a superset of, the set of actions your brain will train with.
- Confirm your stepping function handles every input deterministically. In other words, all possible inputs must map to a corresponding output state or terminal condition.
- Confirm the output state information uses correct units. For example, if the brain expects temperature in Celsius, confirm your simulation is not sending temperature information in kelvins.
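As a concrete sketch, a stepping function for the apiary HVAC example might look like the following. The variable names, cutoff values, and the simplified thermal model are all illustrative assumptions, not part of any Bonsai API:

```python
# Hypothetical stepping function for the apiary HVAC example.
# The thermal model and all constants are illustrative assumptions.

AMBIENT_TEMP_C = 18.0     # outside temperature (Celsius)
TOTAL_VENTS = 4
MIN_VIABLE_TEMP_C = 10.0  # below this, the colony cannot recover

def step(state, open_vents):
    """Advance the simulation one iteration.

    Input (action): the number of open circulation vents.
    Output: the new state, plus a flag for the terminal condition.
    """
    if not 0 <= open_vents <= TOTAL_VENTS:
        raise ValueError("action outside the supported range")

    # More open vents pull the internal temperature toward ambient.
    cooling_rate = 0.1 * open_vents
    new_temp = state["temp_c"] + cooling_rate * (AMBIENT_TEMP_C - state["temp_c"])

    terminal = new_temp < MIN_VIABLE_TEMP_C
    return {"temp_c": new_temp}, terminal
```

Given the same state and action, the function always returns the same output, so it is easy to check the determinism requirement above.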
Step 2: Verify your terminal conditions and reset function
An important part of iterative simulation is knowing when to stop and reset the process. Terminal conditions let the simulation know when the brain has reached a point where recovery is no longer possible. For example, the temperature inside an apiary has reached a point where the bees cannot survive.
When a terminal condition occurs, your reset function should set the environment in your simulation back to the expected start state so your brain can try again.
To verify your terminal conditions and reset function:
- Confirm your terminal conditions can actually occur as part of the simulation output. For example, if your apiary simulation is set in a tropical environment and your simulation only terminates when the temperature drops below freezing, it will never terminate.
- Confirm your reset function is exposed to users so it can be called by the Bonsai platform.
- Confirm you are handling edge cases correctly. For example, if it is possible for temperatures to change by more than 1 degree in an iteration, check for values greater than a cutoff value, rather than equality.
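The checks above can be sketched as a pair of helper functions for the apiary example. The cutoff and start values are illustrative assumptions; note the comparison against a cutoff rather than a test for equality:

```python
# Hypothetical terminal-condition and reset helpers for the apiary example.
# The cutoff and start temperatures are illustrative assumptions.

MAX_SAFE_TEMP_C = 45.0  # above this, the colony is lost
START_TEMP_C = 30.0

def is_terminal(temp_c):
    # Temperatures can change by more than one degree per iteration,
    # so compare against the cutoff rather than testing for equality.
    return temp_c >= MAX_SAFE_TEMP_C

def reset():
    """Return the simulation environment to its expected start state."""
    return {"temp_c": START_TEMP_C}
```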
Step 3: Verify your configuration variables
Your simulator should support initializing environment variables so your Bonsai brain can train against a variety of scenarios. For example, an apiary HVAC simulation should support any reasonable starting temperature.
To verify your configuration variables:
- Confirm the set of available configuration variables covers all relevant starting state information.
- Confirm the configuration variables can be set by your reset function.
- Confirm the units between your simulation and brain match or are converted appropriately.
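One way to satisfy these checks is to have the reset function accept a configuration, as in the following sketch. The configuration names, defaults, and range check are illustrative assumptions:

```python
# Hypothetical reset function that accepts configuration variables so the
# brain can train against varied starting scenarios. Names are illustrative.

def reset(config=None):
    defaults = {"start_temp_c": 30.0, "open_vents": 0, "total_vents": 4}
    config = {**defaults, **(config or {})}
    # Reject configurations outside the range the simulation can model.
    if not -10.0 <= config["start_temp_c"] <= 50.0:
        raise ValueError("start_temp_c outside the supported range")
    return {"temp_c": config["start_temp_c"],
            "open_vents": config["open_vents"],
            "total_vents": config["total_vents"]}
```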
Step 4: Design a base test protocol
A base test protocol establishes a simple (base) case and runs your simulation steps in a loop for that case for a predefined number of iterations. Use the generated output data for closer analysis to confirm your simulator is behaving as expected.
Your base test protocol depends on two metrics:
- Control frequency (CF): how often the brain takes action to control the simulated environment, measured as events per unit of your control metric and typically noted in Hertz (Hz). The control window should reflect the real-world measure (time, distance) you expect the trained brain to use when evaluating and taking action to control the real-world environment.
- Simulation time (ST): the amount of real-world clock time it takes to simulate the desired control window. The ST for a given iteration will vary based on the complexity of the simulation.
To determine an appropriate control frequency, calculate the number of control events your brain should handle in a meaningful unit of your control metric. Time-based control systems typically calculate CF in events per second while distance-based control systems typically calculate CF in events per meter.
For example, assume you want your apiary HVAC brain to evaluate the state of the apiary and take action every 100ms when it is deployed. To effectively mimic your production environment, every training iteration should simulate a 100ms window with one control action per iteration.
1 event every 100 ms → 1 event / 100 ms
1000 ms per second → (1 event / 100 ms) × (1000 ms / 1 second) → target frequency is 10 control events per second
1 Hz = 1 event per second
So, to confirm your simulation is replicating the environment appropriately, your base test protocol should use a control frequency of 10 Hz.
Defining your control frequency in Hertz is typical, but not required. Ultimately, what you need to determine is how many control events you want for a given measurement. Once you know how often your control events should happen, you can determine the rest of the test variables.
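The control-frequency arithmetic above can be captured in a small helper. This is a sketch for the time-based case, with a hypothetical function name:

```python
# Hypothetical helper for the control-frequency calculation above.

def control_frequency_hz(events, window_ms):
    """Events per second (Hz) for `events` control events every `window_ms`."""
    return events / (window_ms / 1000.0)
```

For the apiary example, one control event every 100 ms gives a control frequency of 10 Hz.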
Once you calculate the control frequency, you can use the following steps to design a general base protocol you can adapt to the specifics of your simulation:
- Run your simulation with a default configuration to generate a typical log file.
- Based on your CF and the log file, determine the maximum number of iterations you want to allow for training episodes.
- Make sure you can calculate and log the following information for each iteration of the test:
- All output states provided by the simulator.
- The actual CF value achieved by the simulation.
- The ST value for the iteration.
- Analyze the relationship between the inputs and outputs of your log file to identify a simple configuration (test) scenario.
- Define at least one fixed testing scenario. Select an appropriate starting configuration and determine the expected control actions (policy responses) for each iteration of the test scenario.
- Define at least one random testing scenario. For an arbitrary starting configuration, write test code that randomly selects one of the available control actions for each iteration of the test scenario.
- Write test code to call the stepping function with the desired input (fixed or random) for each iteration and repeat the process until it reaches a terminal condition or the maximum number of iterations you determined previously.
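The steps above can be sketched as a single test loop. The `step()` and `reset()` interfaces, the action set, the iteration limit, and the logging fields are all illustrative assumptions:

```python
import random
import time

# A minimal base-test loop following the steps above. The step()/reset()
# interfaces, action set, and logged fields are illustrative assumptions.

ACTIONS = [0, 1, 2, 3, 4]   # number of open vents
MAX_ITERATIONS = 500        # determined from your CF and a typical log file

def run_episode(step, reset, policy, rng=None):
    """Run one episode, logging state, action, and per-iteration sim time (ST)."""
    rng = rng or random.Random(0)
    state, log = reset(), []
    for i in range(MAX_ITERATIONS):
        action = policy(state, rng)
        start = time.perf_counter()
        state, terminal = step(state, action)
        log.append({"iteration": i, "action": action,
                    "state": dict(state),
                    "st_seconds": time.perf_counter() - start})
        if terminal:
            break
    return log

def fixed_policy(state, rng):
    # A predetermined, well-understood response for the fixed scenario.
    return 2

def random_policy(state, rng):
    # Randomly selects one of the available control actions.
    return rng.choice(ACTIONS)
```

The same loop serves both scenarios: pass `fixed_policy` for the fixed test and `random_policy` for the random test.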
Step 5: Run a base test protocol
Simulation validation is an iterative process. You should expect to run your test protocol multiple times with different starting configurations and rerun the protocol as you make adjustments to the simulation.
For best results, you should have multiple configurations defined for your fixed and random policies.
For each testing run:
- Set the simulator with the initial state for your test scenario.
- Run the simulation with varying inputs for the fixed testing scenario.
- Run the simulation with varying inputs for the random testing scenario.
- Run the simulation for a large number of iterations so a reset can occur.
- Run the simulation and force a reset at a random point.
- Run the simulation with an initial configuration that forces a reset.
Step 6: Analyze the results of your test
There is no right way to analyze the results of a simulation test. Some best practices are noted below, but you should also rely on your creativity and subject matter expertise to evaluate the behavior of your simulation. Plotting your results as inputs versus outputs can make it easier to analyze simulator behavior and communicate the results to others.
The reliability of your simulation relates directly to the input-output (action-state) relationship in your base protocol. Running the same scenario with a fixed control policy and a random control policy makes reliability easier to determine:
- The fixed policy should result in a well-understood outcome.
- The random policy generates unexpected conditions in the simulator.
The fixed policy helps you verify good behavior while the random policy helps identify edge cases where the simulation is slow, stuck, or hits other issues.
For example, in the apiary HVAC simulation, opening a vent should lower the temperature in the apiary. If the temperature rises instead, it indicates that the simulation is buggy.
The flexibility of your simulation relates directly to how well it handles varied configurations. Review the logs for your fixed and random control policies. For each of the starting configurations, determine if the corresponding changes in the simulated environment make sense.
For example, changing the size and number of circulation vents in the apiary should have a corresponding change in how quickly or slowly the temperature changes when the vents are opened.
A critical part of machine teaching is the ability to reset the training environment when needed. Check your testing logs for all the places a terminal condition occurred and the environment reset. Note the simulator state, the resulting actions when the reset occurred, and if the behavior makes sense in that context.
If you notice a false positive or false negative terminal condition, it indicates that your simulation is buggy.
For example, did the apiary HVAC simulation reset when the internal temperature reached a point where the bees could no longer survive? Did the new internal temperature after the reset make sense as a starting temperature?
Evaluate the simulated control frequency
If you plan to deploy your brain on hardware rather than installing it as software, your simulated control frequency should be equal to, or a factor of, the real-world control frequency you expect to see in production. That way, your stepping function can loop as many times as needed to keep the simulation CF in line with the expected hardware CF.
For example, if the hardware for your brain has a CF of 100 Hz, then your simulation must allow inputs at every 10 ms of real world simulation.
1 Hz = 1 event per second → 100 Hz = 100 events per second
1000 ms per second → 1000 ms / 100 events = 10 ms per event
Simulated control windows of 1 ms, 2 ms, or 5 ms will also work. In each case, the simulator can loop through the stepping function additional times to achieve the desired control frequency.
|Simulated control window|Required loops|Simulated CF|
|---|---|---|
|10 ms|1|100 Hz|
|5 ms|2|100 Hz|
|2 ms|5|100 Hz|
|1 ms|10|100 Hz|
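A looping wrapper for this case might look like the following sketch. The wrapper name, constants, and the `step()` interface are illustrative assumptions:

```python
# Hypothetical wrapper that loops a finer-grained stepping function enough
# times to cover one hardware control window. Names are assumptions.

HARDWARE_CF_HZ = 100   # one control event every 10 ms in production
SIM_WINDOW_MS = 2      # the simulator steps in 2 ms increments

def controlled_step(step, state, action):
    """Apply one control action across a full hardware control window."""
    loops = int(1000 / HARDWARE_CF_HZ / SIM_WINDOW_MS)  # 10 ms / 2 ms = 5
    terminal = False
    for _ in range(loops):
        state, terminal = step(state, action)
        if terminal:
            break
    return state, terminal
```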
Evaluate the simulation time
Calculate the average ST value for all the iterations, across all test runs. To work reliably with machine teaching and Bonsai, the average ST for your simulation must be ≤ 20 seconds.
If your average ST is above the supported threshold, your simulator is considered slow and cannot provide your brain with a realistic model for training.
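The threshold check can be sketched as follows, using the illustrative per-iteration log format assumed earlier (a `st_seconds` field per logged iteration):

```python
# Hypothetical check of average simulation time (ST) against the 20 second
# threshold. The log format (a st_seconds field per iteration) is assumed.

MAX_AVG_ST_SECONDS = 20.0

def average_st(logs):
    """Mean per-iteration ST across all iterations of all test runs."""
    sts = [entry["st_seconds"] for log in logs for entry in log]
    return sum(sts) / len(sts)

def is_fast_enough(logs):
    return average_st(logs) <= MAX_AVG_ST_SECONDS
```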
Once you have validated your simulator, try running it locally against a basic Inkling file to start integrating with Bonsai.