01 Automated tool usage auditing to detect redundant searches and missing results
02 Detailed transcript generation including entity IDs and graph traversal paths
03 Diagnostic script for identifying RAG-specific failure modes and proposing fixes
04 Headless batch processing with consistent thread-id context tracking
05 Structured LLM-as-judge scoring based on a 5-point performance rubric
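
As an illustration of the first feature, tool usage auditing could be sketched roughly as follows. This is a minimal sketch, not the tool's actual implementation: the function name `audit_tool_calls` and the record keys `tool`, `query`, and `results` are assumptions made for the example.

```python
from collections import Counter


def audit_tool_calls(calls):
    """Flag repeated searches and calls that returned nothing.

    `calls` is a list of dicts with hypothetical keys:
    'tool' (str), 'query' (str), 'results' (list).
    """
    seen = Counter()
    redundant = []        # indices of calls that repeat an earlier query
    missing_results = []  # indices of calls with an empty result list

    for i, call in enumerate(calls):
        # Normalize case and whitespace so trivially different
        # spellings of the same query count as one search.
        key = (call["tool"], " ".join(call["query"].lower().split()))
        seen[key] += 1
        if seen[key] > 1:
            redundant.append(i)
        if not call.get("results"):
            missing_results.append(i)

    return {"redundant": redundant, "missing_results": missing_results}


calls = [
    {"tool": "search", "query": "Acme Corp", "results": ["e1"]},
    {"tool": "search", "query": "acme  corp", "results": ["e1"]},
    {"tool": "search", "query": "Zeta", "results": []},
]
report = audit_tool_calls(calls)
```

Here the second call is reported as redundant because it normalizes to the same query as the first, and the third is reported as missing results.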