Direct Preference Optimization (DPO) for simplified model alignment (see the loss sketch after this list)
Pre-configured checklists and workflows for post-training scenarios
Memory-efficient online reinforcement learning with GRPO (see the advantage sketch after this list)
Full RLHF pipelines including Reward Model training and PPO
Supervised Fine-Tuning (SFT) for instruction-based training
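
For the DPO item above, here is a minimal PyTorch sketch of the standard DPO objective, included only as a point of reference. It is not taken from this project's codebase: the function name `dpo_loss`, the `beta=0.1` default, and the assumption that callers pass pre-computed per-sequence log-probabilities are all illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO loss from summed sequence log-probabilities.

    Each argument has shape (batch,): the log-probability of the
    chosen / rejected response under the trainable policy or the
    frozen reference model. `beta` scales the implicit KL penalty
    toward the reference model.
    """
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = chosen_logratio - rejected_logratio
    # -log sigmoid(beta * margin) == softplus(-beta * margin)
    return F.softplus(-beta * margin).mean()
```

This is what "simplified alignment" refers to: the preference signal is optimized directly with a classification-style loss over chosen/rejected pairs, with no reward model and no RL rollout loop.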
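For the GRPO item, the memory saving comes from dropping PPO's learned value network and using a group-relative baseline instead: each completion's reward is normalized against the other completions sampled for the same prompt. A hedged sketch of that advantage computation follows; the function name, tensor layout, and `eps` value are assumptions for illustration, not this project's API.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    """Group-relative advantages in the style of GRPO.

    `rewards` has shape (num_prompts, group_size): the scalar reward
    of each completion sampled for the same prompt. Normalizing
    within the group stands in for the value function PPO would
    otherwise have to train and keep in memory.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

The resulting advantages then feed a PPO-style clipped policy update, so only the policy (and optionally a reference model) needs to be held in memory during online training.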