Supervised fine-tuning (SFT) trains a model on labeled chat examples rather than through trial-and-error with rewards. It’s useful for distillation (training a smaller model on outputs from a larger teacher model), teaching a specific output style or format, and warming up a model before RL training so it starts from a stronger baseline. ART supports SFT on both LocalBackend and ServerlessBackend.

Data format

SFT training data is a JSONL file where each line is a JSON object with messages and optionally tools. Here’s a simple example:
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "What is the capital of Tasmania?" },
    { "role": "assistant", "content": "Hobart" }
  ]
}
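If you're assembling examples in Python, each record is just a dict serialized with json.dumps, one object per line. A minimal sketch (the data/train.jsonl path simply matches the training example further down; adjust as needed):

import json
from pathlib import Path

examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "What is the capital of Tasmania?"},
            {"role": "assistant", "content": "Hobart"},
        ]
    },
    # ... more examples
]

# Write one JSON object per line (JSONL)
out_path = Path("data/train.jsonl")
out_path.parent.mkdir(parents=True, exist_ok=True)
with out_path.open("w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")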
To train on tool-call conversations, include a tools array and tool_calls in the assistant message:
{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "What's the weather in Hobart?" },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_1",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\": \"Hobart\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_1",
      "content": "15Β°C, partly cloudy"
    },
    {
      "role": "assistant",
      "content": "It's currently 15Β°C and partly cloudy in Hobart."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
          "type": "object",
          "properties": { "location": { "type": "string" } }
        }
      }
    }
  ]
}
Each line must follow these rules:
  • messages (required): a non-empty list of chat messages. Each message has a role (system, user, assistant, or tool) and content. The last message must have the assistant role.
  • tools (optional): a list of tool/function definitions, following the OpenAI tool format.
Messages follow the OpenAI chat format, including support for tool_calls in assistant messages.
Only the assistant’s response tokens contribute to the training loss. Instruction and user tokens are automatically masked so the model learns to produce better responses without memorizing prompts.
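Before kicking off a run, it can be worth sanity-checking the file against these rules. The validator below is purely illustrative (ART does not require or ship it), but it catches the most common formatting mistakes:

import json

VALID_ROLES = {"system", "user", "assistant", "tool"}

def check_line(line: str) -> None:
    record = json.loads(line)
    messages = record.get("messages")
    assert messages, "messages must be a non-empty list"
    for message in messages:
        assert message.get("role") in VALID_ROLES, f"unknown role: {message.get('role')}"
    assert messages[-1]["role"] == "assistant", "last message must be from the assistant"
    # tools is optional, but should be a list of tool definitions when present
    assert isinstance(record.get("tools", []), list), "tools must be a list"

with open("data/train.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        try:
            check_line(line)
        except (AssertionError, KeyError, json.JSONDecodeError) as error:
            print(f"line {line_number}: {error}")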

Training from a JSONL file

For large datasets, use train_sft_from_file. It handles batching and applies a learning rate schedule automatically.
import asyncio
import art
from art.local import LocalBackend
# from art.serverless.backend import ServerlessBackend
from art.utils.sft import train_sft_from_file

async def main():
    backend = LocalBackend()
    # backend = ServerlessBackend()  # or use serverless for managed GPUs
    model = art.TrainableModel(
        name="my-sft-model",
        project="sft-project",
        base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    )
    await model.register(backend)

    await train_sft_from_file(
        model=model,
        file_path="data/train.jsonl",
        epochs=3,
        batch_size=2,
        peak_lr=2e-4,
        schedule_type="cosine",
        warmup_ratio=0.1,
        verbose=True,
    )

asyncio.run(main())
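To get a feel for the scheduling parameters: peak_lr is the maximum learning rate, warmup_ratio is the fraction of optimizer steps spent linearly ramping up to it, and schedule_type="cosine" decays it back down over the remaining steps. A rough back-of-the-envelope for the run above, assuming one optimizer step per batch and a hypothetical dataset size (the exact accounting inside ART may differ):

num_examples = 1_000                            # hypothetical dataset size
epochs = 3
batch_size = 2

steps_per_epoch = num_examples // batch_size    # 500
total_steps = steps_per_epoch * epochs          # 1500
warmup_steps = int(total_steps * 0.1)           # 150 steps ramping up to peak_lr
decay_steps = total_steps - warmup_steps        # 1350 steps of cosine decay
print(total_steps, warmup_steps, decay_steps)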

Distillation

Distillation trains a smaller model on completions from a larger teacher model. Generate responses from the teacher, wrap them as trajectories, and fine-tune:
import asyncio
from openai import AsyncOpenAI
import art
from art.local import LocalBackend
# from art.serverless.backend import ServerlessBackend
from art.utils.sft import create_sft_dataset_iterator

TEACHER_MODEL = "z-ai/glm-5"

async def main():
    teacher_client = AsyncOpenAI(
        api_key="your-api-key",
        base_url="https://openrouter.ai/api/v1",
    )
    # Small models often produce malformed JSON or miss fields.
    # Distilling from a larger model teaches consistent structured extraction.
    system_prompt = "Extract {name, role, company} as JSON from the text. Return only valid JSON."
    inputs = [
        "Hi, I'm Sarah Chen, VP of Engineering at Acme Corp.",
        "David Park here β€” senior data scientist at Globex.",
        "I'm Maria Lopez. I lead product at Initech.",
        "Hey, this is James Wu from Umbrella Corp, working as a DevOps engineer.",
        "My name is Aisha Patel and I'm a research lead at DeepMind.",
        # ... more inputs
    ]

    trajectories = []
    for text in inputs:
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ]
        completion = await teacher_client.chat.completions.create(
            model=TEACHER_MODEL,
            messages=messages,
        )
        trajectories.append(art.Trajectory(
            messages_and_choices=[
                *messages,
                {"role": "assistant", "content": completion.choices[0].message.content},
            ],
        ))

    # Train student model on teacher outputs
    backend = LocalBackend()
    # backend = ServerlessBackend()  # or use serverless for managed GPUs
    student = art.TrainableModel(
        name="distillation-001",
        project="sft-distillation",
        base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    )
    await student.register(backend)

    # create_sft_dataset_iterator computes the LR schedule (warmup + decay) over
    # the full dataset, then slices it correctly across chunks. Each
    # chunk's train_sft call logs its own metrics, giving you granular
    # loss curves instead of a single aggregated number.
    for chunk in create_sft_dataset_iterator(trajectories, peak_lr=2e-4):
        await student.train_sft(chunk.trajectories, chunk.config)

asyncio.run(main())
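If you'd rather decouple generation from training (or keep the teacher outputs around for inspection), you can also write the completions out in the JSONL format described above and train with train_sft_from_file instead of building trajectories in memory. A minimal sketch, with placeholder pairs standing in for the teacher responses:

import json

system_prompt = "Extract {name, role, company} as JSON from the text. Return only valid JSON."

# Hypothetical (user_text, teacher_response) pairs collected from the teacher model
pairs = [
    (
        "Hi, I'm Sarah Chen, VP of Engineering at Acme Corp.",
        '{"name": "Sarah Chen", "role": "VP of Engineering", "company": "Acme Corp"}',
    ),
    # ... more pairs
]

with open("data/distillation.jsonl", "w") as f:
    for user_text, teacher_response in pairs:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_text},
                {"role": "assistant", "content": teacher_response},
            ]
        }
        f.write(json.dumps(record) + "\n")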

SFT as warmup before RL

A common pattern is to run SFT first to give the model a head start, then switch to RL for further improvement. ART supports switching between SFT and RL training seamlessly within the same run:
import asyncio
import art
from art.local import LocalBackend
# from art.serverless.backend import ServerlessBackend
from art.utils.sft import train_sft_from_file

async def main():
    backend = LocalBackend()
    # backend = ServerlessBackend()  # or use serverless for managed GPUs
    model = art.TrainableModel(
        name="warmup-then-rl",
        project="my-project",
        base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    )
    await model.register(backend)

    # Phase 1: SFT warmup from a dataset
    await train_sft_from_file(
        model=model,
        file_path="data/train.jsonl",
        epochs=3,
    )

    # Phase 2: RL training picks up from the SFT checkpoint
    from my_project import rollout, scenarios
    for step in range(await model.get_step(), 50):
        train_groups = await art.gather_trajectory_groups(
            [
                art.TrajectoryGroup(rollout(model, scenario) for _ in range(8))
                for scenario in scenarios
            ]
        )
        await model.train(train_groups)

asyncio.run(main())
This works because both SFT and RL train the same LoRA adapter. After SFT completes, RL continues from the updated weights.
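That is also why the RL loop above starts from model.get_step() rather than 0: the step counter continues across phases. If you want to confirm where the SFT phase left off, a quick check inside main(), between the two phases, might look like this (illustrative only; after SFT the counter should already reflect the warmup updates):

    # Between Phase 1 and Phase 2: inspect the current training step
    sft_step = await model.get_step()
    print(f"SFT finished at step {sft_step}; RL continues from here")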

Local vs Serverless

Both backends support SFT with the same API. The key differences are in how training executes:
  • Execution: LocalBackend trains on your local GPU; ServerlessBackend sends data to remote managed GPUs.
  • Checkpoints: LocalBackend saves LoRA adapters in .art/; ServerlessBackend stores them as W&B Artifacts.
  • Inference: with LocalBackend you deploy the LoRA adapter yourself; ServerlessBackend provides a production-ready inference endpoint out of the box.
  • Best for: LocalBackend suits development, iteration, and full control; ServerlessBackend suits production, teams without a local GPU, and large-scale training.
The ServerlessBackend requires a W&B API key. See the backend docs for setup instructions.
# Serverless: same API, training runs remotely
import art
from art.serverless.backend import ServerlessBackend

backend = ServerlessBackend()  # uses WANDB_API_KEY env var
model = art.TrainableModel(
    name="my-sft-model",
    project="sft-project",
    base_model="Qwen/Qwen3-30B-A3B-Instruct-2507",
)
await model.register(backend)

await model.train_sft(trajectories, config=art.TrainSFTConfig(learning_rate=5e-5))