- 384 GitHub stars
- Calibration-free quantization requiring no sample datasets (see the data-free quantization sketch after this list)
- Full compatibility with PEFT and LoRA for fine-tuning quantized models (see the PEFT example below)
- Support for extreme compression, from 8-bit down to 1-bit precision
- Optimized inference backends, including Marlin, BitBLAS, and TorchAO
- Seamless integration with Hugging Face Transformers and vLLM (see the vLLM example below)
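
The calibration-free and low-bit bullets above describe the core idea; the sketch below illustrates what data-free quantization means in practice: scales and zero-points are computed from the weight tensor alone, with no forward passes over sample data. This is a minimal round-to-nearest, group-wise scheme written for illustration, not the library's actual algorithm; the bit width, group size, and tensor shapes are assumptions. The same routine covers the 8-bit-to-1-bit range simply by changing `n_bits`.

```python
import torch

def quantize_data_free(weight: torch.Tensor, n_bits: int = 4, group_size: int = 64):
    """Round-to-nearest, group-wise asymmetric quantization.

    Calibration-free: the scale and zero-point for each group come only
    from the weight values themselves -- no sample inputs, no activation
    statistics, no optimization loop.
    """
    out_features, in_features = weight.shape
    w = weight.reshape(-1, group_size)                # one (scale, zero) per group
    w_min = w.amin(dim=1, keepdim=True)
    w_max = w.amax(dim=1, keepdim=True)
    q_max = 2 ** n_bits - 1                           # e.g. 15 for 4-bit, 1 for 1-bit
    scale = (w_max - w_min).clamp(min=1e-8) / q_max
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, q_max)  # integer codes
    w_hat = (q - zero) * scale                                 # dequantized weights
    q = q.reshape(out_features, in_features).to(torch.uint8)
    w_hat = w_hat.reshape(out_features, in_features)
    return q, scale, zero, w_hat

# Illustrative usage on a random "linear layer" weight matrix.
weight = torch.randn(256, 512)
q, scale, zero, w_hat = quantize_data_free(weight, n_bits=4, group_size=64)
print("mean abs reconstruction error:", (weight - w_hat).abs().mean().item())
```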
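
For the PEFT and LoRA bullet, the list does not spell out a workflow, so the following is a generic sketch of attaching LoRA adapters to a Transformers causal LM with the `peft` package. The model name `facebook/opt-125m` and the target module names are illustrative stand-ins; with a quantized model, the base weights would instead be loaded through the library's quantization config and stay frozen while only the adapters train.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Stand-in base model; in the quantized-fine-tuning case the base model
# would be loaded with the library's quantization config so that LoRA
# adapters train on top of frozen, quantized weights.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

lora_config = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # OPT attention projections (illustrative)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```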
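
Similarly, the Transformers and vLLM bullet only claims integration; below is a hedged sketch of serving a quantized checkpoint with vLLM, assuming the library exports a model directory whose config carries the quantization settings (the path is a placeholder, not a real checkpoint). For supported formats, vLLM picks the quantization method up from the checkpoint's config at load time.

```python
from vllm import LLM, SamplingParams

# Placeholder path to a checkpoint exported by the quantization library.
llm = LLM(model="path/to/quantized-model")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain weight-only quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```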