About
This skill provides a framework for building high-performance retrieval-augmented generation (RAG) and semantic search systems. It guides developers through selecting an embedding model, from lightweight local options to premium APIs, and through implementing context-aware chunking strategies for document types such as code, legal texts, and technical manuals. Combined with multi-tier caching and batch processing, these practices help developers reduce API costs by up to 90% and improve retrieval quality and system throughput.
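As a rough illustration of how multi-tier caching and batch processing can cut embedding API calls, here is a minimal sketch. Everything in it is an assumption made for illustration and is not part of the skill itself: the `TwoTierEmbeddingCache` class, the `fake_embed_batch` stand-in for a real embedding API, and the plain dict used as the second tier (which in practice might be a disk or network store).

```python
import hashlib
from collections import OrderedDict
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def fake_embed_batch(texts: Sequence[str]) -> List[Vector]:
    """Hypothetical stand-in for a real batched embedding API call."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in texts]


class TwoTierEmbeddingCache:
    """Hypothetical two-tier cache: a small in-memory LRU (tier 1) in front of
    a larger store (tier 2; a dict here, standing in for disk or a remote cache)."""

    def __init__(self, embed_batch: Callable[[Sequence[str]], List[Vector]],
                 memory_size: int = 1024, batch_size: int = 32) -> None:
        self.embed_batch = embed_batch
        self.memory_size = memory_size
        self.batch_size = batch_size
        self.memory: "OrderedDict[str, Vector]" = OrderedDict()
        self.store: Dict[str, Vector] = {}
        self.api_calls = 0  # number of batched calls actually made

    @staticmethod
    def _key(text: str) -> str:
        # Hash the text so cache keys have a fixed size.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def _remember(self, key: str, vec: Vector) -> None:
        # Insert into tier 1 and evict the least recently used entries.
        self.memory[key] = vec
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_size:
            self.memory.popitem(last=False)

    def get_many(self, texts: Sequence[str]) -> List[Vector]:
        keys = [self._key(t) for t in texts]
        found: Dict[str, Vector] = {}
        missing: Dict[str, str] = {}  # key -> text, deduplicated
        for key, text in zip(keys, texts):
            if key in self.memory:            # tier 1 hit
                self.memory.move_to_end(key)
                found[key] = self.memory[key]
            elif key in self.store:           # tier 2 hit, promote to tier 1
                found[key] = self.store[key]
                self._remember(key, found[key])
            else:
                missing.setdefault(key, text)
        # Embed only the cache misses, grouped into batches.
        pending = list(missing.items())
        for start in range(0, len(pending), self.batch_size):
            chunk = pending[start:start + self.batch_size]
            vectors = self.embed_batch([text for _, text in chunk])
            self.api_calls += 1
            for (key, _), vec in zip(chunk, vectors):
                self.store[key] = vec
                self._remember(key, vec)
                found[key] = vec
        return [found[k] for k in keys]


cache = TwoTierEmbeddingCache(fake_embed_batch, memory_size=2, batch_size=2)
cache.get_many(["alpha", "beta", "alpha", "gamma"])  # misses are embedded in batches
cache.get_many(["alpha", "beta", "gamma"])           # served entirely from the cache
print(cache.api_calls)
```

In this sketch, duplicate and previously seen texts never reach the embedding function, which is where the savings come from. The actual reduction depends on how repetitive the workload is.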