Skip to main content
ART divides the logic for training an agent into two distinct abstractions. The client is responsible for interfacing with the environment in which the agent runs and for sending inference and training requests to the backend. The backend is responsible for generating tokens at inference time, updating the agent’s weights based on past performance, and managing GPU memory as it switches from inference to training mode. This separation of concerns simplifies the process of teaching an agent to improve its performance using RL. While the backend’s training and inference settings are highly configurable, they’re also set up to use intelligent defaults that save beginners time while getting started. However, there are a few important considerations to take before running your first training job.

ServerlessBackend

LocalBackend

Managed or local training

ART provides two backend classes:
  • ServerlessBackend - train remotely on autoscaling GPUs
  • LocalBackend - run your agent and training code on the same machine
If your agent is already set up on a machine equipped with an advanced GPU and you want to run training on the same machine, use LocalBackend. If your agent is running on a machine without an advanced GPU (this includes most personal computers and production servers), use ServerlessBackend instead. ServerlessBackend optimizes speed and cost by autoscaling across managed clusters.

ServerlessBackend

Setting up ServerlessBackend requires a W&B API key. Once you have one, you can provide it to ServerlessBackend either as an environment variable or initialization argument.
from art.serverless.backend import ServerlessBackend

backend = ServerlessBackend(
  api_key="my-api-key",
  # or set WANDB_API_KEY in the environment
)
As your training job progresses, ServerlessBackend automatically saves your LoRA checkpoints as W&B Artifacts and deploys them for production inference on W&B Inference.

LocalBackend

The LocalBackend class runs a vLLM server and either an Unsloth or torchtune instance on whatever machine your agent itself is executing. This is a good fit if you’re already running your agent on a machine with a GPU. To declare a LocalBackend instance, follow the code sample below:
from art.local import LocalBackend

backend = LocalBackend(
    # set to True if you want your backend to shut down automatically
    # when your client process ends
    in_process: False,
    # local path where the backend will store trajectory logs and model weights
    path: './.art',
)
If you’re using PipelineTrainer, LocalBackend is currently supported only in dedicated mode, where training and inference run on separate GPUs.
from art import TrainableModel
from art.dev import InternalModelConfig
from art.local import LocalBackend

backend = LocalBackend(path="./.art")
model = TrainableModel(
    name="pipeline-localbackend",
    project="my-project",
    base_model="Qwen/Qwen3-0.6B",
    _internal_config=InternalModelConfig(
        trainer_gpu_ids=[0],
        inference_gpu_ids=[1],
    ),
)
Shared LocalBackend still pauses inference during training, so ART rejects that configuration for PipelineTrainer. In dedicated mode, a new checkpoint becomes the default inference target only after its LoRA has been reloaded into vLLM. That checkpoint publication flow is backend-specific, so save_checkpoint does not have identical semantics across every ART backend. Requests that are already in flight keep using the adapter they started with; the reload only affects subsequent routing to the latest served step.

Using a backend

Once initialized, a backend can be used in the same way regardless of whether it runs locally or remotely.
BACKEND_TYPE = "serverless"

if BACKEND_TYPE == "serverless":
    from art.serverless.backend import ServerlessBackend
    backend = await ServerlessBackend()
else:
    from art.local import LocalBackend
    backend = LocalBackend()

model = art.TrainableModel(...)

await model.register(backend)

# ...training code...
To see LocalBackend and ServerlessBackend in action, try the examples below.

2048 Notebook

Use ServerlessBackend to train an agent to play 2048.

Summarizer

Use LocalBackend to train a SOTA summarizing agent.