Introduction
Provides expert guidance and implementation patterns for AI model quantization, focusing on conversion to the GGUF format for llama.cpp. It enables developers to deploy high-performance LLMs on consumer-grade hardware by balancing quality/performance tradeoffs, reducing memory footprints, and conducting rigorous quality benchmarking. The skill verifies model integrity through checksums while targeting specific CPU/GPU memory constraints and keeping perplexity degradation minimal.