- Low-latency mode for faster initial and follow-up responses
- Grounded answers with explicit citations (RAG)
- Per-session conversation summary caching to reduce prompt context
- Multi-provider LLM support (OpenAI, Gemini, or auto-fallback)
- Reliable retrieval-only operation when no LLM API keys are configured
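The last two features can be sketched together: try each configured provider in order, and drop to retrieval-only output when none is available or all fail. This is a minimal illustration, not the project's actual API; the provider names, callables, and `retrieve` helper are assumptions.

```python
from typing import Callable, Dict, Optional

def answer_with_fallback(
    prompt: str,
    providers: Dict[str, Optional[Callable[[str], str]]],
    retrieve: Callable[[str], str],
) -> str:
    """Hypothetical fallback chain: OpenAI, then Gemini, then retrieval-only.

    Each provider entry is a callable that takes a prompt and returns a
    completion, or None when no API key is configured for it.
    """
    for name in ("openai", "gemini"):
        call = providers.get(name)
        if call is None:
            continue  # provider not configured; skip it
        try:
            return call(prompt)
        except Exception:
            continue  # provider errored; try the next one
    # No provider succeeded: return grounded passages without generation,
    # so the system still answers from the retrieval index alone.
    return retrieve(prompt)
```

With no keys configured (`{"openai": None, "gemini": None}`), the function returns the retrieved passages directly, matching the retrieval-only mode described above.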