- ServerlessBackend: Run training and inference on autoscaling GPUs.
- SkyPilotBackend: Run training and inference on a separate ephemeral machine.
- LocalBackend: Run training and inference on your local machine.
Initializing the client
The client that you’ll use to generate tokens and train your model is initialized through the art.TrainableModel class.
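For example, here is a minimal sketch of creating a model and registering it with a LocalBackend. The name and project arguments, the art.local import path, and the register call are assumptions based on common ART usage and may differ in your version; base_model can be any supported base model identifier.

```python
import asyncio

import art
from art.local import LocalBackend  # assumed import path for LocalBackend


async def main():
    # Hypothetical model/project names; base_model is the model you want to train.
    model = art.TrainableModel(
        name="agent-001",
        project="my-rl-project",
        base_model="Qwen/Qwen2.5-7B-Instruct",
    )

    # Register the model with a backend that will run training and inference.
    backend = LocalBackend()
    await model.register(backend)


if __name__ == "__main__":
    asyncio.run(main())
```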
Initializing from an existing SFT LoRA
If you’ve already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as base_model when creating your TrainableModel.
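A minimal sketch, assuming a local PEFT-style adapter directory at ./checkpoints/sft-lora; the path and the name/project values are illustrative, and only the base_model argument is confirmed by this page.

```python
import art

# Illustrative path to an existing Hugging Face-style LoRA adapter directory
# produced by your SFT run (e.g., with Unsloth or PEFT).
sft_adapter_dir = "./checkpoints/sft-lora"

# Warm-start RL training from the SFT LoRA by pointing base_model
# at the adapter directory instead of a base model id.
model = art.TrainableModel(
    name="agent-sft-warmstart",  # hypothetical name
    project="my-rl-project",     # hypothetical project
    base_model=sft_adapter_dir,
)
```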
Why this?
- Warm-start from task-aligned weights to reduce the number of RL steps and overall GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may receive near-zero rewards at the start of RL.