One of ART’s primary goals is to minimize the setup required to start benefiting from RL in an existing codebase. The ART client is a lightweight object that lets you run inference and train models against either local or remote backends. That means you can run your agent anywhere, including on a laptop without a powerful GPU, and still get the full performance benefits of training and generating tokens on an H100. Pretty cool! If you’re curious about how ART runs training and inference either remotely or locally depending on your development machine, check out the backend docs below. Otherwise, let’s dive deeper into the client!
The client that you’ll use to generate tokens and train your model is initialized through the art.TrainableModel class.
```python
import art

model = art.TrainableModel(
    # the name of your model as it will appear in W&B
    # and other observability platforms
    name="agent-001",
    # keep your project name constant between all the models you train
    # for a given task to consistently group metrics
    project="my-agentic-task",
    # the model that you want to train from
    base_model="Qwen/Qwen2.5-14B-Instruct",
)
```
Once you’ve initialized your backend, you can register it with your model. This sets up all the wiring to run inference and training.
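As a minimal sketch, registration looks roughly like the following. The `LocalBackend` class and its import path are assumptions here; the exact backend you construct depends on whether you’re training locally or remotely, so check the backend docs for the options available to you.

```python
# a minimal sketch: register a backend with the model before running
# inference or training (assumes a local GPU backend; see the backend docs
# for remote alternatives)
from art.local import LocalBackend

backend = LocalBackend()

# wires the model up to the backend for inference and training
await model.register(backend)
```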
If you’ve already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as base_model when creating your TrainableModel.

Why do this?

- Warm-start from task-aligned weights to reduce training steps and GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may otherwise receive near-zero rewards at the start of RL.
```python
import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agentic-task",
    # Point to the local SFT LoRA adapter directory
    # (e.g., contains adapter_config.json and adapter_model.bin/safetensors)
    base_model="/path/to/my_sft_lora_adapter",
)
```
ART will load the adapter as the initial checkpoint and proceed with RL updates from there. You’re now ready to start training your agent.
Your model will generate inference tokens by making requests to a vLLM server running on whichever backend you previously registered. To route inference requests to this backend, follow the code sample below.
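Here is a minimal sketch of that routing, using the same `model.openai_client()` helper the rollout example further down relies on; the system and user messages are placeholders.

```python
# a minimal sketch: route a chat completion through the registered backend
openai_client = model.openai_client()

chat_completion = await openai_client.chat.completions.create(
    # passing the model's name routes the request to its latest checkpoint
    model=model.name,
    messages=[
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Hello!"},
    ],
)
print(chat_completion.choices[0].message.content)
```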
As your model learns to become more capable at the task, its weights will update and each new LoRA instance will be automatically loaded onto the vLLM server running on your backend. The registration and inference process shown above will ensure that your inference requests are always routed to the latest version of the model, saving you a lot of complexity!
Before training your model, you need to provide a few scenarios that your agent should learn from. While completing these scenarios, its weights will update to avoid past mistakes and reproduce successes. It’s best to provide at least 10 scenarios that adequately represent the real scenarios your agent will handle after it’s deployed.
```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # add whatever fields differ from one real-world scenario to another
    field_1: str
    field_2: float


scenarios = [
    Scenario(field_1="hello", field_2=0.0),
    Scenario(field_1="world!", field_2=1.0),
]
```
Define a rollout function that runs the agent through an individual scenario.
```python
# define a rollout function that puts the model through its paces for a given scenario
async def rollout(model: art.Model, scenario: Scenario) -> art.Trajectory:
    openai_client = model.openai_client()
    trajectory = art.Trajectory(
        messages_and_choices=[
            {"role": "system", "content": "..."},
            {"role": "user", "content": "..."},
        ]
    )
    # generate a completion using the client
    chat_completion = await openai_client.chat.completions.create(
        messages=trajectory.messages(),
        model=model.name,
    )
    choice = chat_completion.choices[0]
    trajectory.messages_and_choices.append(choice)
    # determine how well the agent did during this particular run
    agent_performance_score: float = ...
    trajectory.reward = agent_performance_score
    return trajectory
```
Now that your training scenarios and rollout function are defined, training the model is straightforward. The following code trains the model for 50 steps, giving the agent 8 attempts at each training scenario during each step. Since a reward is assigned to each Trajectory that the rollout function returns, the agent learns to produce completions similar to those that earned high rewards in the past, and to shy away from behavior that resulted in low rewards.
```python
async def train():
    # train for 50 steps
    for _ in range(await model.get_step(), 50):
        # Trajectories produced using the same training scenario are automatically grouped
        train_groups = await art.gather_trajectory_groups(
            (
                art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
                for scenario in scenarios
            ),
            pbar_desc="gather",
        )
        print("num train groups:", len(train_groups))  # num train groups: 2
        print("length of each train group:", len(train_groups[0]))  # length of each train group: 8
        # send the grouped trajectories to the backend and wait until training finishes
        await model.train(
            train_groups,
            config=art.TrainConfig(learning_rate=1e-5),
        )
        # once model.train finishes for the current step,
        # the backend updates the LoRA weights for inference and
        # the training loop continues until 50 steps have completed
```
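To kick the loop off from a script, you can run the coroutine with asyncio; this is a standard-library detail rather than anything ART-specific.

```python
import asyncio

if __name__ == "__main__":
    # run the async training loop defined above
    asyncio.run(train())
```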
To see the ART client and backend working together, check out our Summarizer tutorial or one of the notebooks! If you have questions about integrating the ART client into your own codebase, ask in the Discord!