GSPO was introduced by the Qwen team and used to train state-of-the-art models including Qwen3-235B-A22B-Instruct-2507. It can improve training stability and efficiency for Mixture-of-Experts (MoE) models, though it may have limited or no impact on dense models.
GSPO’s core innovation is its sequence-level optimization objective. Instead of weighting updates by individual token likelihood ratios, GSPO defines the importance ratio from the full sequence likelihood, with length normalization to reduce variance. The algorithm then clips and optimizes this sequence-level ratio.
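As a sketch of the idea, following the notation of the GSPO paper (group size G, clipping range ε, group-relative advantage Â; this is the paper's formulation, not necessarily ART's exact implementation), the length-normalized sequence importance ratio and the clipped objective are:

$$
s_i(\theta) = \left( \frac{\pi_\theta(y_i \mid x)}{\pi_{\theta_{\text{old}}}(y_i \mid x)} \right)^{1/|y_i|}
$$

$$
\mathcal{J}_{\text{GSPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\Big( s_i(\theta)\, \hat{A}_i,\; \operatorname{clip}\big(s_i(\theta),\, 1-\varepsilon,\, 1+\varepsilon\big)\, \hat{A}_i \Big) \right]
$$

Because $s_i(\theta)$ depends on the whole sequence rather than any single token, clipping acts on entire responses, which is what reduces the per-token variance that destabilizes MoE training.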
GSPO can be configured using the `importance_sampling_level` parameter when training with ART:
```python
from art import PolicyOptimizer

# Initialize with GSPO
optimizer = PolicyOptimizer(
    algorithm="gspo",
    importance_sampling_level=0.8,  # Adjust based on your needs
)
```
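To make the sequence-level ratio concrete, here is a minimal standalone sketch of how it can be computed from per-token log-probabilities. The helper name and inputs are hypothetical illustrations, not part of ART's API:

```python
import math

def sequence_importance_ratio(new_logprobs, old_logprobs):
    """Length-normalized sequence-level importance ratio, GSPO-style.

    Computes exp((1/|y|) * sum_t(log pi_new(y_t) - log pi_old(y_t))),
    i.e. the sequence likelihood ratio raised to the 1/|y| power.
    """
    assert len(new_logprobs) == len(old_logprobs) and len(new_logprobs) > 0
    log_ratio = sum(new - old for new, old in zip(new_logprobs, old_logprobs))
    return math.exp(log_ratio / len(new_logprobs))

# Identical policies yield a ratio of exactly 1.0, regardless of length
print(sequence_importance_ratio([-1.0, -2.0, -0.5], [-1.0, -2.0, -0.5]))
```

Working in log space and dividing by the sequence length before exponentiating keeps the ratio numerically stable even for long responses, which is the practical point of GSPO's length normalization.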