- **Inference**
  - Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).
  - Completion requests are routed to the ART backend, which runs the model's latest LoRA in vLLM.
  - As the agent executes, each `system`, `user`, and `assistant` message is stored in a Trajectory.
  - After your rollouts finish, your code assigns a `reward` to each Trajectory, with higher rewards indicating better performance (see the rollout sketch after this list).
- **Training**
  - When all rollouts have finished, Trajectories are grouped and sent to the backend. Inference is blocked while training executes.
  - The backend trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).
  - The backend saves the newly trained LoRA to a local directory and loads it into vLLM.
  - Inference is unblocked and the loop resumes at step 1 (see the training sketch after this list).
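To make the inference side concrete, here is a minimal rollout sketch in the style of ART's quickstart. It assumes the `art.TrainableModel`, `model.openai_client()`, and `art.Trajectory` APIs shown in the project README; the project name, base model, prompts, and reward rule below are placeholders, and exact signatures may differ across ART versions.

```python
import art

# Placeholder setup in the style of the ART quickstart; the project name and
# base model are assumptions, not requirements.
model = art.TrainableModel(
    name="agent-001",
    project="loop-demo",
    base_model="Qwen/Qwen2.5-7B-Instruct",
)


async def rollout(model: art.TrainableModel) -> art.Trajectory:
    # Steps 1-2: the OpenAI-compatible client routes completion requests to
    # the ART backend, which serves the model's latest LoRA in vLLM.
    client = model.openai_client()

    # Step 3: every system/user/assistant message is stored in the Trajectory.
    trajectory = art.Trajectory(
        messages_and_choices=[
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": "Say DONE when you are finished."},
        ],
        reward=0.0,
    )
    completion = await client.chat.completions.create(
        model=model.name,
        messages=trajectory.messages(),
    )
    choice = completion.choices[0]
    trajectory.messages_and_choices.append(choice)

    # Step 4: after the rollout finishes, assign a reward
    # (placeholder rule; higher means better).
    trajectory.reward = 1.0 if "DONE" in (choice.message.content or "") else 0.0
    return trajectory
```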
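The training side can then be driven by a loop like the one below, continuing the sketch above (it reuses `model` and `rollout`). Again a hedged sketch: it assumes `art.LocalBackend`, `art.TrajectoryGroup`, `art.gather_trajectory_groups`, and `model.train` as in the README; the group size and step count are arbitrary.

```python
import asyncio

import art


async def main() -> None:
    # The backend trains and serves locally; registering the model wires its
    # inference and training endpoints together.
    backend = art.LocalBackend()
    await model.register(backend)

    for step in range(10):  # arbitrary number of iterations
        # Gather one group of parallel rollouts. GRPO compares rewards
        # within a group, so trajectories in a group should share a task.
        groups = await art.gather_trajectory_groups(
            [art.TrajectoryGroup(rollout(model) for _ in range(8))]
        )
        # Trajectories are sent to the backend; inference is blocked while
        # GRPO training runs, then the new LoRA is saved and loaded back
        # into vLLM before the loop resumes.
        await model.train(groups)


asyncio.run(main())
```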
`PipelineTrainer` can also run with `LocalBackend` in dedicated mode, where training and inference stay on separate GPUs and the latest served step advances only after vLLM reloads the new LoRA.
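The exact dedicated-mode configuration is version-specific, so the sketch below is conceptual only: `trainer`, `server`, `train_step`, and `reload_lora` are hypothetical names used to illustrate the hand-off, not ART API.

```python
# Conceptual sketch, not ART internals. In dedicated mode, training runs on
# its own GPUs while rollouts keep hitting the currently served LoRA; the
# served step advances only once vLLM has reloaded the new adapter.


async def train_and_swap(trainer, server, step: int) -> int:
    checkpoint = await trainer.train_step(step)  # hypothetical: training GPUs
    await server.reload_lora(checkpoint)         # hypothetical: vLLM reload
    return step  # rollouts observe the new step only after the reload
```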
Both training and inference use the ART client and backend. Learn more by following the links below!
**ART Client**
The client is responsible for interfacing between your code and the ART backend.

**ART Backend**
The backend is responsible for generating tokens and training your models.