Train a summarizer model to outperform Sonnet 4 and GPT-4.1.
Install uv by following the instructions here. Then install the project dependencies by running uv sync.
SkyPilotBackend is used to manage the GPU that your model will be trained on. You’ll need to install ART with the SkyPilot optional dependency.
In the .env file at the root of the repository (see .env.example), set the following optional environment variables:
WANDB_API_KEY - Enables metric logging to Weights & Biases.
OPENPIPE_API_KEY - Enables chat completion logging to OpenPipe.
OPENAI_API_KEY - Will be necessary for later comparison benchmarks, but not used for training.

The following AWS variables are needed by the benchmarks directory, but not for training itself. If you don’t already have AWS credentials with create/read/write permissions for S3 buckets, follow the instructions here.
AWS_ACCESS_KEY_ID - Your AWS access key ID, which should have create/read/write permissions for S3 buckets.
AWS_SECRET_ACCESS_KEY - Your matching secret access key.
AWS_REGION - The region of the S3 bucket.
BACKUP_BUCKET - The name of the S3 bucket in which to store model checkpoints and logging data. Can be a new bucket or an existing one.

The benchmark_models.py script compares the performance of the trained model to gpt-4o, gpt-4.1, o4-mini, and gemini-2.5-pro-preview.
Before running the benchmark script, make sure you’ve provided a valid OPENROUTER_API_KEY and the AWS credentials detailed in step 3. These credentials are necessary for the script to upload the benchmark results to S3.
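A quick preflight check can catch a missing credential before the benchmark starts. The sketch below is not part of the repository: the names BENCHMARK_VARS and missing_vars are hypothetical, and it assumes the variables have already been loaded into the process environment (it does not read the .env file itself).

```python
import os

# Variables this guide lists as needed before running the benchmark script.
# The list name is hypothetical; it is not defined anywhere in the repository.
BENCHMARK_VARS = [
    "OPENROUTER_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "BACKUP_BUCKET",
]


def missing_vars(names, env=None):
    """Return the names that are unset or blank in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(BENCHMARK_VARS)
    if missing:
        print("Missing benchmark variables: " + ", ".join(missing))
    else:
        print("All benchmark variables are set.")
```

Passing a plain dictionary as env makes the check easy to try without touching real credentials. A variable that is present but blank is treated as missing.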
View the results by opening benchmarks/display_benchmarks.ipynb and running the cells. After running all the cells, you should see something like the following: