> ## Documentation Index
> Fetch the complete documentation index at: https://art.openpipe.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Tracking Metrics

> See what ART logs automatically and how to add your own metrics and costs.

ART writes a metrics row every time you call `model.log(...)`. Those rows go to
`history.jsonl` in the run directory and, if W\&B logging is enabled, to W\&B.

Use this page for three things:

* understand the metrics ART emits automatically
* add task-specific metrics from your own rollout code
* track external judge and API spend alongside training metrics

## What ART logs automatically

When you call `await model.train(...)` or `await model.log(train_groups, split="train")`,
ART already logs most of the metrics you need to monitor a run.

| Type   | Examples                                                                                                                    |
| ------ | --------------------------------------------------------------------------------------------------------------------------- |
| Reward | `train/reward`, `train/reward_std_dev`, `train/exception_rate`, `val/reward`                                                |
| Loss   | `loss/train`, `loss/entropy`, `loss/kl_div`, `loss/grad_norm`, `loss/learning_rate`                                         |
| Data   | `data/step_num_scenarios`, `data/step_num_trajectories`, `data/step_num_groups_submitted`, `data/step_num_groups_trainable` |
| Time   | `time/wall_clock_sec`, `time/step_wall_s`, `time/step_trainer_s`                                                            |
| Cost   | `costs/gpu` on `LocalBackend` when GPU pricing is known                                                                     |

If ART has the inputs it needs, it also derives:

* cumulative metrics such as `time/cum/trainer_s`, `data/cum/num_unique_scenarios`, and `costs/cum/all`
* cost rollups such as `costs/train`, `costs/eval`, and `costs/all`
* throughput metrics such as `throughput/avg_trainer_tok_per_s` and `throughput/avg_actor_tok_per_s`

<Note>
  Some metrics only appear when the backend or your code provides the underlying
  inputs. For example, `throughput/avg_actor_tok_per_s` requires both
  `data/step_actor_tokens` and `time/step_actor_s`.
</Note>

## Add task-specific outcome metrics

Attach metrics directly to each `Trajectory` when your rollout code knows whether
an attempt succeeded, how many tools it called, or any other task-specific
signal.

```python theme={null}
async def rollout(model: art.Model, scenario: Scenario) -> art.Trajectory:
    trajectory = art.Trajectory(
        messages_and_choices=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": scenario.prompt},
        ],
        metadata={"scenario_id": scenario.id},
    )

    completion = await model.openai_client().chat.completions.create(
        model=model.get_inference_name(),
        messages=trajectory.messages(),
    )
    trajectory.messages_and_choices.append(completion.choices[0])

    trajectory.reward = score_reward(trajectory)
    trajectory.metrics["correct"] = float(is_correct(trajectory))
    trajectory.metrics["tool_calls"] = float(count_tool_calls(trajectory))
    return trajectory
```

On train steps, ART averages those rollout metrics and logs them under the
`train/` namespace, such as `train/correct` and `train/tool_calls`.

If you want to record one value per `TrajectoryGroup` instead of one per
trajectory, pass `metrics={...}` when you build the group. ART logs those once
per group, using keys like `train/group_difficulty` on train steps.

## Add step-level metrics ART cannot infer

Use `model.metrics_builder()` for metrics that live outside individual
trajectories, such as actor-side timing, token counts, or idle time.

```python theme={null}
builder = model.metrics_builder()

with builder.measure("time/step_actor_s"):
    result = await run_rollouts()

builder.add_data(
    step_num_scenarios=result.num_scenarios,
    step_actor_tokens=result.actor_tokens,
    scenario_ids=result.scenario_ids,
)
builder.add_idle_times(step_actor_idle_s=result.actor_idle_s)

await model.log(result.train_groups, split="train", step=result.step)
```

A few useful patterns:

* log `scenario_ids` to unlock `data/cum/num_unique_scenarios`
* log both `data/step_actor_tokens` and `time/step_actor_s` to unlock actor throughput metrics
* log `time/step_eval_s` when eval runs happen outside the backend
* use fully qualified keys like `time/step_actor_s` or `data/step_actor_tokens` for builder-managed metrics

ART flushes builder-managed metrics on the next `model.log(...)` or
`model.train(...)` call.

## Track judge and API costs

Use `@track_api_cost` when a function returns a provider response object with
token usage. Wrap the relevant part of your code in a metrics context so ART
knows whether the spend belongs to training or evaluation.

```python theme={null}
from art.metrics import track_api_cost

@track_api_cost(
    source="llm_judge/correctness",
    provider="openai",
    model_name="openai/gpt-4.1",
)
async def run_judge(client, messages):
    return await client.chat.completions.create(
        model="gpt-4.1",
        messages=messages,
    )

with model.metrics_builder("train").activate_context():
    await run_judge(judge_client, train_messages)

with model.metrics_builder("eval").activate_context():
    await run_judge(judge_client, eval_messages)
```

The next metrics row will include:

* `costs/train/llm_judge/correctness` or `costs/eval/llm_judge/correctness`
* rollups such as `costs/train`, `costs/eval`, and `costs/all`
* cumulative totals such as `costs/cum/all`

ART can price OpenAI and Anthropic responses from their usage fields. You must
pass both `provider` and `model_name` to `@track_api_cost`.

For custom pricing or unsupported models, register pricing on the builder:

```python theme={null}
builder = model.metrics_builder()
builder.register_model_pricing(
    "anthropic/my-custom-judge",
    prompt_per_million=1.2,
    completion_per_million=4.8,
)
```

## Track GPU cost on LocalBackend

`LocalBackend` can log `costs/gpu` automatically on train steps. ART currently
auto-detects H200 pricing at `$3/hour` per GPU. For other hardware, pass an
explicit override:

```python theme={null}
backend = LocalBackend(gpu_cost_per_hour_usd=2.25)
```

This lets ART include GPU spend in the same metrics stream as rewards, losses,
and judge/API costs.
