Overview
This skill provides specialized guidance for implementing QLoRA (Quantized Low-Rank Adaptation) to fine-tune large language models on consumer-grade hardware. By combining 4-bit NormalFloat (NF4) quantization, double quantization, and paged optimizers, QLoRA makes it possible to fine-tune models of up to 65B parameters on a single 48 GB GPU. It serves as an advanced extension of the LoRA skill, focusing on extreme VRAM reduction while maintaining performance parity with 16-bit fine-tuning methods.
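As a concrete illustration of the three techniques named above, here is a minimal configuration sketch using the Hugging Face `transformers`, `peft`, and `bitsandbytes` stack. This is one common way to set up QLoRA, not the only one; the model name, LoRA rank, and target module list are illustrative placeholders and should be adapted to your model and hardware.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization enabled.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,      # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Model identifier is a placeholder; any causal LM supported by bitsandbytes works.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters are trained in higher precision on top of the frozen 4-bit base.
# Rank and target modules below are example values, not prescriptions.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
```

For the paged-optimizer piece, passing `optim="paged_adamw_32bit"` to `transformers.TrainingArguments` selects an optimizer whose state is paged between GPU and CPU memory, which absorbs the memory spikes that otherwise cause out-of-memory errors during gradient checkpointing.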