- Configurable SFT regularization to prevent model degradation and forgetting
- Reference-free preference optimization that reduces VRAM and compute overhead
- DeepSpeed ZeRO-3 integration for scaling up to 70B-parameter models
- Specialized workflows for base, instruct, and reasoning-intensive models
- Superior performance over DPO on major alignment benchmarks
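To make the reference-free point concrete, here is a minimal sketch of one such objective, in the style of a length-normalized preference margin. The function name, signature, and default hyperparameters are illustrative assumptions, not this project's actual API; the key property shown is that only the policy model's log-probabilities are needed, so no reference model has to be kept in memory.

```python
import math

def ref_free_pref_loss(chosen_logps, rejected_logps, beta=2.0, gamma=0.5):
    """Illustrative reference-free preference loss (length-normalized margin).

    chosen_logps / rejected_logps: per-token log-probabilities of the chosen
    and rejected responses under the policy model only. Because no reference
    model forward pass is required, VRAM and compute overhead drop compared
    with reference-based methods such as standard DPO.
    """
    # Length-normalize so long responses are not rewarded for length alone.
    avg_chosen = sum(chosen_logps) / len(chosen_logps)
    avg_rejected = sum(rejected_logps) / len(rejected_logps)
    # Scaled reward margin with a target margin gamma.
    margin = beta * (avg_chosen - avg_rejected) - gamma
    # Negative log-sigmoid of the margin: small when chosen >> rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

In a training loop, this scalar would typically be averaged over a batch and, per the SFT-regularization feature above, combined with a weighted SFT loss on the chosen responses to curb degradation.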