Learn how to train language models to effectively use Model Context Protocol (MCP) servers using ART
MCP•RL is a specialized application of ART that teaches language models to effectively use Model Context Protocol (MCP) servers. This approach enables you to train agents that can seamlessly interact with any MCP-compatible tool or service.
- **Model Context Protocol (MCP)**: A standard for connecting AI assistants to external tools and data sources
- **ART (Agent Reinforcement Trainer)**: OpenPipe's framework for training better AI agents using reinforcement learning
The result is a training pipeline that can automatically teach any language model to use MCP servers effectively, without requiring manually labeled training data.
```python
# Generate diverse training scenarios automatically
from art.mcp import generate_scenarios

scenario_collection = await generate_scenarios(
    tools=tools_list,
    num_scenarios=24,
    show_preview=True,
    generator_model="openai/gpt-4.1-mini",
    generator_api_key="your_openrouter_key",
    generator_base_url="https://openrouter.ai/api/v1",
)
```
ART automatically generates diverse training scenarios that exercise different aspects of the MCP server: simple single-tool usage, complex multi-step workflows, edge cases and error handling, and creative combinations of available tools.
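The `tools_list` passed to `generate_scenarios` above is assumed to come from the target MCP server. As a rough sketch (not an ART helper), here is one way you might fetch it with the official `mcp` Python SDK; the filesystem server command is just a placeholder, and the exact shape `generate_scenarios` expects for tools is an assumption:

```python
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder: swap in whichever MCP server you are training against.
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as session:
        await session.initialize()
        # list_tools() returns the server's tool definitions
        tools_list = (await session.list_tools()).tools
```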
```python
from art.rewards import ruler_score_group

# RULER evaluates responses without labeled data
scored_group = await ruler_score_group(
    group,
    judge_model="openai/o4-mini",
)
```
Instead of requiring human-labeled examples, RULER judges response quality by analyzing whether the agent accomplished the intended task, how well it used the available tools, how efficient its approach was, and how it handled errors.
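Assuming the scored group exposes its trajectories and the rewards RULER assigned (the attribute names below reflect ART's trajectory model but are not guaranteed by this page), you can inspect the judgments directly:

```python
# Inspect RULER's relative scores for each candidate response.
for trajectory in scored_group.trajectories:
    print(trajectory.reward)
```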
```python
# Train using RULER feedback
groups = await gather_trajectory_groups(
    trajectory_groups_generator,
    pbar_desc="train gather step",
)
scored_groups = [
    await ruler_score_group(
        group,
        judge_model="openai/o4-mini",
    )
    for group in groups
]
await model.train(
    scored_groups,
    config=art.TrainConfig(learning_rate=1e-5),
)
```
The model learns from RULER feedback using reinforcement learning, improving its ability to select appropriate tools, use correct parameters, chain tools effectively, and handle failures gracefully.
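The `trajectory_groups_generator` in the training snippet is assumed to yield one group of rollouts per scenario, so RULER can rank sibling attempts against each other. A minimal sketch of that shape, with a hypothetical `rollout` function you would implement against your MCP server:

```python
import art

async def rollout(model: art.Model, scenario) -> art.Trajectory:
    # Hypothetical: run the agent on one scenario, recording each
    # MCP tool call and its result in the trajectory's message history.
    ...

# Several rollouts per scenario give RULER siblings to compare.
trajectory_groups_generator = (
    art.TrajectoryGroup(rollout(model, scenario) for _ in range(4))
    for scenario in scenario_collection
)
```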
Join our Discord to discuss MCP•RL with the community
MCP•RL is particularly effective because RULER can judge response quality purely from the agent's final output, with no labeled data required. This makes it possible to train high-quality MCP agents with minimal manual intervention.