MCP•RL is a specialized application of ART that teaches language models to effectively use Model Context Protocol (MCP) servers. This approach enables you to train agents that can seamlessly interact with any MCP-compatible tool or service.

What is MCP•RL?

MCP•RL combines two powerful technologies:
  • Model Context Protocol (MCP): A standard for connecting AI assistants to external tools and data sources
  • ART (Agent Reinforcement Trainer): OpenPipe’s framework for training better AI agents using reinforcement learning
The result is a training pipeline that can automatically teach any language model to use MCP servers effectively, without requiring manually labeled training data.

How MCP•RL Works

The training process follows these key steps:

1. Server Discovery

# Query the MCP server to understand available tools
tools_list = await mcp_client.list_tools()
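
The mcp_client above is whichever client session you use to talk to your server. As a rough sketch, a session could be opened with the official MCP Python SDK over stdio (the server command and arguments below are placeholders for your own server):

# Minimal sketch using the official MCP Python SDK over stdio;
# "python my_mcp_server.py" stands in for your own server command
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["my_mcp_server.py"])

async with stdio_client(server_params) as (read, write):
    async with ClientSession(read, write) as mcp_client:
        await mcp_client.initialize()
        tools_list = await mcp_client.list_tools()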

2. Scenario Generation

# Generate diverse training scenarios automatically
from art.mcp import generate_scenarios

scenario_collection = await generate_scenarios(
    tools=tools_list,
    num_scenarios=24,
    show_preview=True,
    generator_model="openai/gpt-4.1-mini",
    generator_api_key="your_openrouter_key",
    generator_base_url="https://openrouter.ai/api/v1",
)
ART automatically generates diverse training scenarios that exercise different aspects of the MCP server: simple single-tool usage, complex multi-step workflows, edge cases and error handling, and creative combinations of available tools.
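
Before moving on, it can help to eyeball what was generated. The exact fields on each scenario depend on the collection returned by generate_scenarios, so this sketch simply prints each one:

# Preview the generated scenarios before training
for i, scenario in enumerate(scenario_collection):
    print(f"{i}: {scenario}")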

3. RULER Evaluation

from art.rewards import ruler_score_group

# RULER evaluates responses without labeled data
scored_group = await ruler_score_group(
    group,
    judge_model="openai/o4-mini",
)
Instead of requiring human-labeled examples, RULER judges response quality by analyzing whether the agent accomplished the intended task, the quality of its tool usage, the efficiency of its approach, and how well it handled errors.
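
The group being scored is a set of trajectories collected for the same scenario. As a sketch of producing one, gather a handful of rollouts for a single scenario and score them against each other (rollout() is assumed to be your own async function that runs the agent on a scenario and returns an art.Trajectory, as in the full example below):

# Gather several rollouts of one scenario into a group, then score it with RULER
groups = await gather_trajectory_groups(
    (art.TrajectoryGroup(rollout(model, scenario, False) for _ in range(4)),),
    pbar_desc="gather one group",
)
scored_group = await ruler_score_group(
    groups[0],
    judge_model="openai/o4-mini",
)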

4. Reinforcement Learning

# Train using RULER feedback
groups = await gather_trajectory_groups(
    trajectory_groups_generator,
    pbar_desc="train gather step",
)

scored_groups = [
    await ruler_score_group(
        group,
        judge_model="openai/o4-mini",
    )
    for group in groups
]

await model.train(
    scored_groups,
    config=art.TrainConfig(learning_rate=1e-5),
)
The model learns from RULER feedback using reinforcement learning, improving its ability to select appropriate tools, use correct parameters, chain tools effectively, and handle failures gracefully.
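
In practice, the gather, score, and train cycle is repeated for several RL steps so the model keeps improving on fresh rollouts. A hedged sketch of that loop, reusing the pieces above (the step count and four rollouts per group are illustrative choices, not ART requirements):

# Repeat the gather -> score -> train cycle for several RL steps
for step in range(10):
    groups = await gather_trajectory_groups(
        (
            art.TrajectoryGroup(rollout(model, scenario, False) for _ in range(4))
            for scenario in scenario_collection
        ),
        pbar_desc=f"train gather step {step}",
    )
    scored_groups = [
        await ruler_score_group(group, judge_model="openai/o4-mini")
        for group in groups
    ]
    await model.train(
        scored_groups,
        config=art.TrainConfig(learning_rate=1e-5),
    )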

Getting Started

Optimizing against an MCP server can be surprisingly straightforward!

Prerequisites

  • Access to an MCP server you want to train on
  • OpenRouter (or other LLM provider) API key for scenario generation and RULER judging
  • Python environment with ART installed

Basic Training Pipeline

Here’s a simplified example of training a model to use an MCP server:

import art
from art.local import LocalBackend
from art.mcp import generate_scenarios
from art.rewards import ruler_score_group
from art import gather_trajectory_groups

# Initialize the model to train and register it with a backend
# (this follows ART's TrainableModel / LocalBackend pattern; adjust names to your setup)
model = art.TrainableModel(
    name="mcp-agent",
    project="mcp-rl-demo",
    base_model="Qwen/Qwen2.5-3B-Instruct",
)
backend = LocalBackend()
await model.register(backend)

# Generate training scenarios automatically
# (tools_list and resources_list come from querying the MCP server, as in step 1)
scenario_collection = await generate_scenarios(
    tools=tools_list,
    resources=resources_list,
    num_scenarios=100,
    show_preview=False,
    generator_model="openai/gpt-4.1-mini",
    generator_api_key="your_openrouter_key",
    generator_base_url="https://openrouter.ai/api/v1",
)

# Gather trajectory groups
# (rollout() is your own async function that runs the agent on a scenario
#  against the MCP server and returns an art.Trajectory)
groups = await gather_trajectory_groups(
    (
        art.TrajectoryGroup(
            rollout(model, scenario, False)
            for _ in range(4)  # rollouts per group
        )
        for scenario in scenario_collection
    ),
    pbar_desc="train gather step",
)

# Score groups using RULER
scored_groups = [
    await ruler_score_group(
        group,
        judge_model="gpt-4o-mini",
        debug=True,
        swallow_exceptions=True
    )
    for group in groups
]

# Train the model
await model.train(
    scored_groups,
    config=art.TrainConfig(learning_rate=1e-5),
)

Example Use Cases

  • Database Agent: Train a model to query databases, understand schemas, and generate appropriate SQL commands via an MCP database server.
  • File Management Agent: Teach an agent to navigate file systems, read/write files, and perform complex file operations through an MCP file server.
  • API Integration Agent: Train models to interact with REST APIs, handle authentication, and process responses via MCP API wrappers.
  • Development Tools Agent: Create agents that can use development tools like Git, package managers, or testing frameworks through MCP servers.

What MCP•RL is Good At

MCP•RL excels at training agents to effectively use MCP servers by:
  • Tool Usage: Teaching when and how to use specific tools with appropriate parameters
  • Multi-Step Workflows: Chaining tool calls and interpreting outputs to build complex workflows
  • Domain Adaptation: Learning specialized terminology and conventions for different server types

Best Practices

  • 📈 Iterative Training - Use checkpoint forking to experiment with different training approaches and parameters.
  • 🔍 Monitor RULER Scores - Pay attention to RULER evaluation metrics to understand where your agent excels and where it needs improvement.
  • 🧪 Test Thoroughly - Validate your trained agent on held-out scenarios that weren’t used during training (see the split sketch after this list).
  • 📊 Use Diverse Scenarios - Ensure your training data covers the full range of tasks your agent will encounter in production.
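
A simple way to create such a held-out set is to split the generated scenarios before training (this assumes the scenario collection can be materialized as a plain list; the 80/20 split is arbitrary):

# Hold out a slice of the generated scenarios for post-training evaluation
scenarios = list(scenario_collection)
split = int(len(scenarios) * 0.8)
train_scenarios, eval_scenarios = scenarios[:split], scenarios[split:]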

Troubleshooting

Common Issues

Low RULER Scores:
  • Check if your MCP server is responding correctly (see the sanity-check sketch after this list)
  • Verify that generated scenarios are appropriate for your use case
  • Consider adjusting training parameters
Tool Selection Errors:
  • Ensure the model has seen diverse examples of when to use each tool
  • Add more training scenarios that require careful tool selection
Parameter Issues:
  • Include scenarios that demonstrate correct parameter usage
  • Consider adding validation examples to your training data
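
For the server-response check above, a quick way to rule out server problems is to re-run discovery and exercise one tool by hand before training (the tool name and arguments below are placeholders for whatever your server exposes):

# Sanity-check the MCP server: list its tools and try one call manually
tools_list = await mcp_client.list_tools()
print(tools_list)

result = await mcp_client.call_tool("some_tool_name", {"some_arg": "value"})
print(result)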

Next Steps

MCP•RL is particularly effective because RULER can judge the quality of the agent’s trajectories directly, with no labeled data required. This makes it possible to train high-quality MCP agents with minimal manual intervention.