One of ART’s primary goals is to minimize the amount of setup necessary to begin benefiting from RL within an existing codebase. The ART client is a lightweight object that lets you run inference and train models against either local or remote backends. That means you can run your agent anywhere, including on a laptop without a powerful GPU, and still get all the performance benefits of training and generating tokens on an H100. Pretty cool!

If you’re curious about how ART allows you to run training and inference either remotely or locally depending on your development machine, check out the backend docs below. Otherwise, let’s dive deeper into the client!

Initializing the client

The client that you’ll use to generate tokens and train your model is initialized through the art.TrainableModel class.

import art

model = art.TrainableModel(
    # the name of your model as it will appear in W&B
    # and other observability platforms
    name="agent-001",
    # keep your project name constant between all the models you train
    # for a given task to consistently group metrics
    project="my-agentic-task",
    # the model that you want to train from
    base_model="Qwen/Qwen2.5-14B-Instruct",
)

Once you’ve initialized your backend, you can register it with your model. This sets up all the wiring to run inference and training.

# local training
backend = art.LocalBackend()

# remote training
from art.skypilot import SkyPilotBackend

backend = await SkyPilotBackend.initialize_cluster(
    cluster_name="art", gpu="H100"
)

await model.register(backend)

You’re now ready to start training your agent.

Running inference

Your model will generate inference tokens by making requests to a vLLM server running on whichever backend you previously registered. To route inference requests to this backend, follow the code sample below.

openai_client = model.openai_client()

messages: art.Messages = [
    {
        "role": "system",
        "content": "...",
    },
    {
        "role": "user",
        "content": "..."
    }
]
chat_completion = await openai_client.chat.completions.create(
    messages=messages,
    model=model.name,
    max_tokens=100,
    timeout=100,
    tools=[...]
)
print(chat_completion.choices[0].message.tool_calls)

As your model becomes more capable at the task, its weights will update, and each new LoRA checkpoint will be automatically loaded onto the vLLM server running on your backend. The registration and inference flow shown above ensures that your requests are always routed to the latest version of the model, sparing you a lot of plumbing!
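To make this concrete, here’s a minimal sketch reusing the openai_client and messages defined above (get_step is the same helper used in the training loop later in this guide). The inference call itself never changes between training steps:

# check which training step the model is currently on
print("current step:", await model.get_step())

# this exact call transparently targets the newest LoRA checkpoint,
# no matter how many training steps have completed
chat_completion = await openai_client.chat.completions.create(
    messages=messages,
    model=model.name,
)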

Training the model

Before training your model, you need to provide a few scenarios that your agent should learn from. While completing these scenarios, its weights will update to avoid past mistakes and reproduce successes. It’s best to provide at least 10 scenarios that adequately represent the real scenarios your agent will handle after it’s deployed.

from dataclasses import dataclass

@dataclass
class Scenario:
    # add whatever fields differ from one real-world scenario to another
    field_1: str
    field_2: float

scenarios = [
    Scenario(field_1="hello", field_2=0.0),
    Scenario(field_1="world!", field_2=1.0),
]

Define a rollout function that runs the agent through an individual scenario.

# define a rollout function that puts the model through its paces for a given scenario
async def rollout(model: art.Model, scenario: Scenario) -> art.Trajectory:
    openai_client = model.openai_client()

    trajectory = art.Trajectory(
        messages_and_choices=[
            {
                "role": "system",
                "content": "...",
            },
            {
                "role": "user",
                "content": "...",
            },
        ]
    )

    # generate a completion using the client
    chat_completion = await openai_client.chat.completions.create(
        messages=trajectory.messages(), model=model.name
    )
    choice = chat_completion.choices[0]
    trajectory.messages_and_choices.append(choice)

    # determine how well the agent did during this particular run
    agent_performance_score: float = ...
    trajectory.reward = agent_performance_score

    return trajectory
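How you compute agent_performance_score is entirely up to you and your task; anything that maps a finished trajectory to a float works as a reward. As a purely hypothetical illustration (the expected_substring field below is an assumption for this sketch, not part of the Scenario class defined earlier), a simple string-match reward could look like this:

# hypothetical scoring helper -- assumes the scenario carries an
# `expected_substring` field, which is NOT defined in the Scenario class above
def score_response(response_text: str, expected_substring: str) -> float:
    # binary reward: 1.0 if the agent produced the expected content, else 0.0
    return 1.0 if expected_substring in response_text else 0.0

# inside rollout, you would then assign:
# trajectory.reward = score_response(
#     choice.message.content, scenario.expected_substring
# )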

Now that your training scenarios and rollout function are declared, training the model is straightforward. The following code trains the model for 50 steps, giving the agent 8 attempts at each training scenario during each step. Because each Trajectory returned by the rollout function carries a reward, the agent will learn to produce completions similar to those that earned high rewards in the past, and to shy away from behavior that earned low rewards.

async def train():
    # train for 50 steps, starting from the model's last completed step
    for _ in range(await model.get_step(), 50):
        # Trajectories produced using the same training scenario are automatically grouped
        train_groups = await art.gather_trajectory_groups(
            (
                art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
                for scenario in scenarios
            ),
            pbar_desc="gather",
        )

        print("num train groups:", len(train_groups))
        # num train groups: 2

        print("length of each train group:", len(train_groups[0]))
        # length of each train group: 8

        # send the grouped trajectories to the backend and wait until training finishes
        await model.train(
            train_groups,
            config=art.TrainConfig(learning_rate=1e-5),
        )
        # once model.train finishes for the current step
        # the backend updates the LoRA weights for inference and
        # the training loop continues until 50 steps have completed
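Finally, you need an entry point to run the loop. In a notebook you can simply await train(); in a standalone script, one way to wire everything together (assuming the local backend from earlier) is:

import asyncio

async def main():
    backend = art.LocalBackend()
    await model.register(backend)
    await train()

asyncio.run(main())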

To see the ART client and backend working together, check out our Summarizer tutorial or one of the notebooks! If you have questions about integrating the ART client into your own codebase, please ask in the Discord!