What is MCP•RL?
MCP•RL combines two powerful technologies:
- Model Context Protocol (MCP): A standard for connecting AI assistants to external tools and data sources
- ART (Agent Reinforcement Trainer): OpenPipe’s framework for training better AI agents using reinforcement learning
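For a concrete sense of what an MCP server exposes, the snippet below connects to a server over stdio and lists its tools. It assumes the official MCP Python SDK (the `mcp` package); the filesystem server command is only an example, so swap in whichever server you actually want to train against.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Example MCP server launched over stdio; replace the command with the
# server you care about.
server_params = StdioServerParameters(
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
)


async def list_server_tools() -> None:
    # Spawn the server and open a client session against it.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, ":", tool.description)


asyncio.run(list_server_tools())
```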
How MCP•RL Works
The training process follows these key steps:
1. Server Discovery: MCP•RL connects to the MCP server and enumerates the tools it exposes
2. Scenario Generation: representative training tasks are synthesized from those tool definitions
3. RULER Evaluation: RULER scores each rollout’s final output, with no labeled data required
4. Reinforcement Learning: the model is updated to favor the higher-scoring rollouts
Getting Started
Optimizing against an MCP server can be surprisingly straightforward!
Prerequisites
- Access to an MCP server you want to train on
- OpenRouter API key for training
- Python environment with ART installed
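A quick sanity check of the last two prerequisites might look like this, assuming ART is installed as the `openpipe-art` package (which imports as `art`) and your key lives in the `OPENROUTER_API_KEY` environment variable:

```python
import os

import art  # noqa: F401  (raises ImportError if ART isn't installed)

# Training calls out to OpenRouter, so make sure the key is set first.
assert os.environ.get("OPENROUTER_API_KEY"), "Set OPENROUTER_API_KEY before training"
```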
Basic Training Pipeline
Here’s a simplified example of training a model to use an MCP server:
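The sketch below shows the overall shape of such a pipeline and how it maps onto the four steps above. Treat it as illustrative rather than authoritative: `generate_scenarios` and `rollout` are placeholder stubs standing in for the MCP-specific helpers, the base model, server command, and judge model are arbitrary examples, and the ART names used (`TrainableModel`, `LocalBackend`, `TrajectoryGroup`, `ruler_score_group`, `model.train`) follow ART’s published examples but may differ between versions. The complete MCP•RL notebook linked under Next Steps has the canonical version.

```python
import asyncio

import art
from art.local import LocalBackend
from art.rewards import ruler_score_group  # assumed RULER entry point


async def generate_scenarios(server_command: list[str]) -> list[str]:
    """Placeholder for MCP•RL's scenario generation step.

    The real pipeline discovers the server's tools and has an LLM write
    tasks that exercise them; here we just return a canned task.
    """
    return ["List the files in /tmp and summarize what you find."]


async def rollout(model: art.TrainableModel, scenario: str) -> art.Trajectory:
    """Placeholder for an agent rollout against the MCP server.

    The real rollout lets the model call MCP tools in a loop and records
    the full message history plus the final answer.
    """
    return art.Trajectory(
        messages_and_choices=[
            {"role": "user", "content": scenario},
            {"role": "assistant", "content": "(placeholder answer)"},
        ],
        reward=0.0,
    )


async def main() -> None:
    # Register the model we want to train.
    model = art.TrainableModel(
        name="mcp-agent-001",
        project="mcp-rl-demo",
        base_model="Qwen/Qwen2.5-14B-Instruct",  # example base model
    )
    await model.register(LocalBackend())

    # Steps 1 + 2: Server Discovery and Scenario Generation.
    scenarios = await generate_scenarios(
        ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    )

    for _ in range(10):  # a few RL iterations, purely for illustration
        groups = []
        for scenario in scenarios:
            # Sample several rollouts per scenario so they can be compared.
            trajectories = [await rollout(model, scenario) for _ in range(4)]
            # Step 3: RULER Evaluation judges the rollouts' final outputs.
            group = art.TrajectoryGroup(trajectories)
            groups.append(await ruler_score_group(group, "openai/o4-mini"))
        # Step 4: Reinforcement Learning on the scored trajectory groups.
        await model.train(groups)


asyncio.run(main())
```

Sampling several rollouts per scenario matters because RULER assigns rewards by comparing trajectories for the same task against one another rather than against a fixed reference answer.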
Example Use Cases
- Database Agent: Train a model to query databases, understand schemas, and generate appropriate SQL commands via an MCP database server (sample scenarios below).
- File Management Agent: Teach an agent to navigate file systems, read/write files, and perform complex file operations through an MCP file server.
- API Integration Agent: Train models to interact with REST APIs, handle authentication, and process responses via MCP API wrappers.
- Development Tools Agent: Create agents that can use development tools like Git, package managers, or testing frameworks through MCP servers.
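To make the first use case concrete, training scenarios are just natural-language tasks that exercise the server’s tools. The examples below are made up for illustration; in practice they come out of the Scenario Generation step.

```python
# Hypothetical training scenarios for an MCP database server; in practice
# these are generated automatically during Scenario Generation.
database_scenarios = [
    "List all tables in the database and describe their columns.",
    "How many orders were placed in March, grouped by customer region?",
    "Find the five products with the highest return rate and explain your query.",
]
```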
What MCP•RL is Good At
MCP•RL excels at training agents to use MCP servers effectively by:
- Tool Usage: Teaching when and how to use specific tools with appropriate parameters
- Multi-Step Workflows: Chaining tool calls and interpreting outputs to build complex workflows (see the example trajectory after this list)
- Domain Adaptation: Learning specialized terminology and conventions for different server types
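To illustrate the multi-step point, a rollout is ultimately a chat transcript in which tool calls and tool results alternate until the model gives a final answer. The transcript below is hypothetical (OpenAI-style tool-call messages against an imagined database MCP server with `describe_table` and `run_query` tools), but it is the kind of trajectory RULER ends up judging.

```python
# A hypothetical multi-step trajectory: the agent chains two tool calls
# (inspect the schema, then run a query) before answering the user.
trajectory_messages = [
    {"role": "user", "content": "How many customers signed up last week?"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "describe_table", "arguments": '{"table": "customers"}'},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "columns: id, name, signup_date, region"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_2",
            "type": "function",
            "function": {
                "name": "run_query",
                "arguments": '{"sql": "SELECT COUNT(*) FROM customers WHERE signup_date >= CURRENT_DATE - 7"}',
            },
        }],
    },
    {"role": "tool", "tool_call_id": "call_2", "content": '[{"count": 42}]'},
    {"role": "assistant", "content": "42 customers signed up in the last seven days."},
]
```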
Best Practices
- 📈 Iterative Training - Use checkpoint forking to experiment with different training approaches and parameters.
- 🔍 Monitor RULER Scores - Pay attention to RULER evaluation metrics to understand where your agent excels and where it needs improvement.
- 🧪 Test Thoroughly - Validate your trained agent on held-out scenarios that weren’t used during training.
- 📊 Use Diverse Scenarios - Ensure your training data covers the full range of tasks your agent will encounter in production.
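For the last two practices, one lightweight approach is to carve a held-out validation slice off your generated scenarios before training, along the lines of this sketch:

```python
import random

# Hypothetical pool of generated scenarios (plain-text task descriptions).
scenarios = [f"scenario {i}" for i in range(100)]

# Shuffle once, then reserve ~20% as a held-out validation set that is
# never used for training updates.
random.seed(42)
random.shuffle(scenarios)
split = int(0.8 * len(scenarios))
train_scenarios, val_scenarios = scenarios[:split], scenarios[split:]
```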
Troubleshooting
Common Issues
Low RULER Scores:
- Check if your MCP server is responding correctly
- Verify that generated scenarios are appropriate for your use case
- Consider adjusting training parameters
Inconsistent Tool Selection:
- Ensure the model has seen diverse examples of when to use each tool
- Add more training scenarios that require careful tool selection
Incorrect Parameter Usage:
- Include scenarios that demonstrate correct parameter usage
- Consider adding validation examples to your training data
Next Steps
- Explore the complete MCP•RL notebook
- Learn more about RULER evaluation
- Check out checkpoint forking for iterative training
- Join our Discord to discuss MCP•RL with the community
MCP•RL is particularly effective because RULER can judge response quality
purely from the agent’s final output—no labeled data required! This makes it
possible to train high-quality MCP agents with minimal manual intervention.
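As a minimal sketch of what that looks like in code, assuming the `ruler_score_group` helper from `art.rewards` (the entry point used in ART’s RULER examples; check the RULER docs linked above for the exact signature) and an `openai/o4-mini` judge:

```python
import asyncio

import art
from art.rewards import ruler_score_group  # assumed entry point; see the RULER docs


async def score_rollouts() -> None:
    # Two hypothetical rollouts of the same task, one clearly better than
    # the other; RULER only sees the transcripts, not any reference answer.
    good = art.Trajectory(
        messages_and_choices=[
            {"role": "user", "content": "How many customers signed up last week?"},
            {"role": "assistant", "content": "42 customers signed up in the last seven days."},
        ],
        reward=0.0,
    )
    bad = art.Trajectory(
        messages_and_choices=[
            {"role": "user", "content": "How many customers signed up last week?"},
            {"role": "assistant", "content": "I don't have access to that information."},
        ],
        reward=0.0,
    )

    # The judge model ranks the trajectories against each other and writes
    # relative rewards back onto the group.
    scored = await ruler_score_group(art.TrajectoryGroup([good, bad]), "openai/o4-mini")
    for traj in scored.trajectories:
        print(traj.reward)


asyncio.run(score_rollouts())
```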