About
This skill provides expert guidance for optimizing vector indexes in production LLM and RAG applications. It helps developers navigate the trade-offs among search latency, recall, and memory consumption by providing implementation patterns for HNSW tuning, scalar and product quantization, and infrastructure scaling. Whether you are managing ten thousand vectors or one hundred million, this skill offers the benchmarking templates and configuration guidance needed to build efficient, high-performance retrieval systems.
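As a minimal sketch of the HNSW tuning trade-offs described above, the example below uses the hnswlib library; the library choice, vector dimensionality, and parameter values are illustrative assumptions rather than recommendations from this skill.

```python
# Minimal HNSW tuning sketch (assumptions: hnswlib, 768-dim vectors,
# cosine distance, and the parameter values shown below).
import hnswlib
import numpy as np

dim = 768               # embedding dimensionality (assumed)
num_elements = 100_000  # corpus size for this sketch

# M and ef_construction trade index memory and build time for recall:
# larger values yield a denser graph with higher recall but more RAM.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, M=16, ef_construction=200)

vectors = np.random.rand(num_elements, dim).astype(np.float32)
index.add_items(vectors, ids=np.arange(num_elements))

# ef (query-time beam width) trades search latency for recall and can be
# re-tuned at any time without rebuilding the index.
index.set_ef(64)
labels, distances = index.knn_query(vectors[:5], k=10)
```

In practice, ef is typically swept against a ground-truth result set until the target recall is reached at an acceptable tail latency, while M and ef_construction are fixed at build time based on the memory budget.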