Learn how to use RULER to automatically reward your agents.
RULER performance across multiple tasks at launch. In 3 out of 4 tasks, models trained with RULER slightly outperform those trained with hand-crafted reward functions. See the full launch announcement for details.
art.Trajectory
objects, you can use the lower-level ruler
function:
debug=True
to see the judge’s reasoning, which helps identify scoring patterns.
gather_trajectory_groups
helper with an after_each
callback:
swallow_exceptions=True
parameter is recommended in production to handle judge API failures gracefully - groups that fail to be judged are simply filtered out rather than crashing the training loop.
ruler_score_group
: