Overview
This skill equips developers with an exhaustive reference for the llama.cpp C API, enabling the integration of state-of-the-art local AI inference into C and C++ applications. It covers the entire lifecycle of LLM interaction, from backend initialization and GGUF model loading through advanced batching, KV cache management, and complex sampling strategies. By providing curated workflows, non-deprecated function lookups, and troubleshooting guidance, it simplifies building efficient, low-dependency AI tools on diverse hardware.
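
To make that lifecycle concrete, here is a minimal sketch of a single prompt-and-sample round trip, assuming a recent llama.cpp revision: function names such as `llama_model_load_from_file`, `llama_init_from_model`, and the `llama_sampler_*` chain API have changed across versions, and `"model.gguf"` is a placeholder path, so consult the `llama.h` in your checkout before relying on the exact signatures shown.

```c
// Minimal llama.cpp lifecycle sketch: init -> load -> tokenize -> decode -> sample -> free.
// Error handling is abbreviated; real code should check every return value.
#include "llama.h"
#include <stdio.h>
#include <string.h>

int main(void) {
    llama_backend_init();

    // 1. Load a GGUF model from disk (path is a placeholder).
    struct llama_model_params mparams = llama_model_default_params();
    struct llama_model * model = llama_model_load_from_file("model.gguf", mparams);
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }
    const struct llama_vocab * vocab = llama_model_get_vocab(model);

    // 2. Create an inference context; this owns the KV cache.
    struct llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 2048;
    struct llama_context * ctx = llama_init_from_model(model, cparams);

    // 3. Tokenize the prompt and submit it as one batch.
    const char * prompt = "Hello";
    llama_token tokens[64];
    int n = llama_tokenize(vocab, prompt, (int) strlen(prompt),
                           tokens, 64, /*add_special=*/true, /*parse_special=*/false);
    struct llama_batch batch = llama_batch_get_one(tokens, n);
    llama_decode(ctx, batch);

    // 4. Sample one token greedily via the sampler-chain API
    //    (idx = -1 selects the logits of the last decoded token).
    struct llama_sampler * chain =
        llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(chain, llama_sampler_init_greedy());
    llama_token next = llama_sampler_sample(chain, ctx, -1);

    // 5. Convert the token back to text and print it.
    char piece[128];
    int len = llama_token_to_piece(vocab, next, piece, sizeof(piece), 0, false);
    if (len > 0) printf("%.*s\n", len, piece);

    // 6. Tear down in reverse order of creation.
    llama_sampler_free(chain);
    llama_free(ctx);
    llama_model_free(model);
    llama_backend_free();
    return 0;
}
```

How this compiles and links (for example, `cc main.c -lllama`) depends on how llama.cpp was built and installed on your system.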