A feature that allows a trajectory to contain multiple separate conversation histories. Used for training agents with non-linear conversation flows, preserving special tokens across turns, or handling sub-agent interactions. See Additional Histories for details.
The programmatic environment that the agent interacts with. This includes all the tools available to the agent, the data it can query, and any other external aspects of the system the agent is operating in.
The scenarios that the agent will run through during training. Adding new training scenarios that represent edge cases on which the agent is currently underperforming will help it correct is behavior.
A single step in the training loop. During a training step, the agent completes a set of training scenarios and has its performance assessed and weights updated to improve its performance.
Validation scenarios are the scenarios that the agent is evaluated on. These scenarios are used to assess the agentβs performance and determine whether it has improved.