About
HQQ (Half-Quadratic Quantization) is a high-performance model optimization skill designed for rapid LLM compression. Unlike calibration-based methods such as GPTQ or AWQ, HQQ is calibration-free: developers can quantize models to ultra-low bit-widths (down to 1-bit) in minutes rather than hours, with no external calibration dataset required. It integrates natively with HuggingFace Transformers and vLLM, supports multiple optimized CUDA backends such as Marlin and BitBlas, and remains compatible with PEFT/LoRA fine-tuning for efficient model adaptation on consumer-grade hardware.