- Memory-efficient 4-bit quantization for training 70B models on 24GB GPUs (see the first sketch below)
- Integration patterns for TRL, Axolotl, and vLLM inference (TRL sketch below)
- Supports 25+ adapter methods, including LoRA, QLoRA, IA3, and Prefix Tuning (config sketch below)
- Standardized configuration for attention and MLP layer targeting across architectures (covered in the same config sketch)
- Dynamic multi-adapter management and runtime switching (final sketch below)
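A minimal sketch of the 4-bit training setup, assuming the standard bitsandbytes/QLoRA recipe via Hugging Face `transformers` and `peft` (whether this project wraps these libraries directly is an assumption); the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# NF4 4-bit weights, double quantization, and bf16 compute: the usual
# QLoRA recipe for fitting a large model onto a single consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
# Freezes base weights and prepares norm/embedding layers for stable k-bit training.
model = prepare_model_for_kbit_training(model)
```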
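For the TRL integration pattern, a sketch assuming a recent TRL release (where `SFTConfig` carries the dataset settings); the model id and dataset are placeholders:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("imdb", split="train")  # placeholder dataset with a "text" field

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # placeholder model id
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-out", dataset_text_field="text", max_steps=10),
    # Passing a PEFT config makes the trainer wrap the model in a LoRA adapter.
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```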
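The adapter methods and attention/MLP targeting map onto PEFT-style config objects. A LoRA sketch assuming Llama-style module names (other architectures name the same projections differently, which is what a standardized targeting config abstracts over):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Attention and MLP projections, using Llama-style module names.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Swapping `LoraConfig` for `IA3Config` or `PrefixTuningConfig` follows the same pattern with method-specific fields.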
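Runtime adapter switching, sketched with the PEFT multi-adapter API; the adapter paths and names are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

# Attach two adapters to the same frozen base model; paths are placeholders.
model = PeftModel.from_pretrained(base, "adapters/summarize", adapter_name="summarize")
model.load_adapter("adapters/chat", adapter_name="chat")

# Switch the active adapter at runtime; the base weights are loaded only once.
model.set_adapter("chat")
```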