4-bit weight quantization with minimal (<5%) accuracy degradation
Seamless integration with vLLM, AutoAWQ, and HuggingFace Transformers (see the sketch after this list)
Up to 3x inference speedup compared to standard FP16 models
Automated calibration workflows for custom domain-specific datasets
Support for Marlin kernels providing a 2x speedup on Ampere and Hopper GPUs
3,983 GitHub stars
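
To illustrate the vLLM and AutoAWQ integration referenced above, here is a minimal sketch of quantizing a model to 4-bit AWQ and serving the result with vLLM. The model name, output path, and quantization settings are illustrative assumptions, not defaults from this project:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# Illustrative paths and settings (assumptions, not project defaults)
model_path = "mistralai/Mistral-7B-Instruct-v0.2"
quant_path = "mistral-7b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bits
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized checkpoint so vLLM / Transformers can load it
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Serve the quantized checkpoint with vLLM
llm = LLM(model=quant_path, quantization="awq")
outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

The same quantized directory can also be loaded through HuggingFace Transformers' `from_pretrained`, since AWQ checkpoints are saved in a standard format.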