Introduction
This Claude Code skill provides a comprehensive framework for adapting pre-trained Large Language Models (LLMs) to follow specific instructions through Supervised Fine-Tuning (SFT). By leveraging the Unsloth library together with TRL's SFTTrainer, it enables up to 2x faster training and significantly lower memory overhead, making it well suited to local GPU and Jupyter environments. The skill includes specialized patterns for training thinking or reasoning models, handles dataset formatting across different chat templates, and provides a clear path for exporting fine-tuned models to GGUF for Ollama deployment.
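The end-to-end workflow the skill wraps looks roughly like the sketch below. It follows the pattern from Unsloth's public example notebooks; the model checkpoint, dataset, hyperparameters, and prompt format are illustrative placeholders, and exact argument names can vary across TRL versions.

```python
# Minimal SFT sketch with Unsloth + TRL. All names below (checkpoint,
# dataset, hyperparameters) are placeholder choices, not skill defaults.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load a 4-bit quantized base model through Unsloth's patched loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # placeholder checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small fraction of weights are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

def to_text(example):
    # Collapse each instruction/response pair into one training string.
    # Real runs should apply the target model's chat template instead.
    return {"text": f"### Instruction:\n{example['instruction']}\n\n"
                    f"### Response:\n{example['output']}"}

dataset = load_dataset("yahma/alpaca-cleaned", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",   # column holding the formatted prompts
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=60,            # short demo run; raise for real training
        learning_rate=2e-4,
        output_dir="outputs",
    ),
)
trainer.train()

# Export to GGUF for Ollama; q4_k_m is a common quantization choice.
model.save_pretrained_gguf("model_gguf", tokenizer,
                           quantization_method="q4_k_m")
```

The same shape applies to reasoning-model training; the main differences are the dataset formatting step, which must preserve the thinking traces in the prompt format, and the chat template applied in place of the simple instruction/response string above.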