- Configurable SFT regularization to prevent model degradation and forgetting
- Reference-free preference optimization that reduces VRAM and compute overhead
- DeepSpeed ZeRO-3 integration for scaling up to 70B-parameter models
- Specialized workflows for base, instruct, and reasoning-intensive models
- Superior performance over DPO on major alignment benchmarks
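To make the reference-free point concrete, here is a minimal sketch of one such objective, in the style of a length-normalized preference margin. The function name, signature, and default hyperparameters are illustrative assumptions, not this project's actual API; the key property shown is that only the policy model's log-probabilities are needed, so no reference model has to be kept in memory.

```python
import math

def ref_free_pref_loss(chosen_logps, rejected_logps, beta=2.0, gamma=0.5):
    """Illustrative reference-free preference loss (length-normalized margin).

    chosen_logps / rejected_logps: per-token log-probabilities of the chosen
    and rejected responses under the policy model only. Because no reference
    model forward pass is required, VRAM and compute overhead drop compared
    with reference-based methods such as standard DPO.
    """
    # Length-normalize so long responses are not rewarded for length alone.
    avg_chosen = sum(chosen_logps) / len(chosen_logps)
    avg_rejected = sum(rejected_logps) / len(rejected_logps)
    # Scaled reward margin with a target margin gamma.
    margin = beta * (avg_chosen - avg_rejected) - gamma
    # Negative log-sigmoid of the margin: small when chosen >> rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a training loop, this scalar would typically be averaged over a batch and, per the SFT-regularization feature above, combined with a weighted SFT loss on the chosen responses to curb degradation.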