What are the key metrics for retrieval quality in this framework?

The primary retrieval metrics are Context Recall (C-Rec), Context Precision, and Multi-hop Coverage. Together they indicate whether the system is navigating the graph effectively.
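As a rough illustration of what these three metrics measure, here is a minimal sketch using common set-based definitions. The function names, the use of string IDs for context items, and the representation of graph edges as (head, relation, tail) tuples are assumptions made for this example. The skill's own formulas may differ.

```python
from typing import Sequence, Set, Tuple

# Hypothetical edge representation: (head entity, relation, tail entity).
Edge = Tuple[str, str, str]


def context_recall(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant items that appear in the retrieved context."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def context_precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the distinct retrieved items that are relevant."""
    unique = set(retrieved)
    if not unique:
        return 0.0
    return len(unique & relevant) / len(unique)


def multi_hop_coverage(retrieved_edges: Sequence[Edge], gold_path: Sequence[Edge]) -> float:
    """Fraction of the edges on the gold reasoning path that were retrieved."""
    if not gold_path:
        return 0.0
    found = set(retrieved_edges)
    return sum(1 for edge in gold_path if edge in found) / len(gold_path)
```

For instance, if a two-hop question needs the edges A→B and B→C but only A→B is retrieved, multi_hop_coverage returns 0.5, which flags the missing hop even when recall over individual passages looks acceptable.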

Can I use this skill to compare different RAG architectures?

Yes, the skill includes a specific workflow for baseline comparison, allowing you to run identical test sets against GraphRAG, pure vector RAG, and LLM-only configurations.
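The idea of running one test set against several configurations can be sketched as a small harness. Everything here is hypothetical: the compare_baselines function, the exact-match scoring rule, and the representation of each system as a callable that maps a question to an answer are assumptions for illustration, not the skill's actual workflow.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each system under test is a function from question to answer.
System = Callable[[str], str]


def exact_match(prediction: str, expected: str) -> bool:
    """Case-insensitive, whitespace-trimmed comparison (a deliberately simple scorer)."""
    return prediction.strip().lower() == expected.strip().lower()


def compare_baselines(
    systems: Dict[str, System],
    test_set: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Run the same (question, expected answer) pairs through every system."""
    scores: Dict[str, float] = {}
    for name, answer in systems.items():
        if not test_set:
            scores[name] = 0.0
            continue
        correct = sum(exact_match(answer(q), gold) for q, gold in test_set)
        scores[name] = correct / len(test_set)
    return scores
```

In practice the dictionary would hold wrappers around the GraphRAG, pure vector RAG, and LLM-only pipelines, and the scorer would likely be replaced by a rubric or model-based judge. The point of the sketch is only that all configurations see identical inputs.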

What makes GraphRAG evaluation different from standard RAG evaluation?

GraphRAG evaluation must assess dimensions that standard vector RAG evaluation does not cover: knowledge graph completeness, multi-hop retrieval paths, and the accuracy of step-by-step reasoning chains.

How does this skill measure AI hallucinations?

It uses grounding rate metrics to quantify both intrinsic hallucinations (contradicting retrieved evidence) and extrinsic hallucinations (claims not supported by any source in the knowledge graph).
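One plausible way to turn this into numbers is to label each claim in an answer and compute rates over the labels. The sketch below assumes the labeling has already been done, by a human reviewer or a judge model. The label names and the grounding_report function are hypothetical and not taken from the skill itself.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical per-claim labels.
SUPPORTED = "supported"   # backed by retrieved evidence
INTRINSIC = "intrinsic"   # contradicts retrieved evidence
EXTRINSIC = "extrinsic"   # not supported by any source in the knowledge graph
VALID_LABELS = {SUPPORTED, INTRINSIC, EXTRINSIC}


def grounding_report(labels: Iterable[str]) -> Dict[str, float]:
    """Compute the grounding rate and both hallucination rates over labeled claims."""
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown claim labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return {"grounding_rate": 0.0, "intrinsic_rate": 0.0, "extrinsic_rate": 0.0}
    return {
        "grounding_rate": counts[SUPPORTED] / total,
        "intrinsic_rate": counts[INTRINSIC] / total,
        "extrinsic_rate": counts[EXTRINSIC] / total,
    }
```

Reporting the two hallucination rates separately matters because they point to different fixes: intrinsic errors suggest the generator is misreading its context, while extrinsic errors suggest gaps in retrieval or in the graph itself.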

GraphRAG Evaluation & Benchmarking

Name: GraphRAG Evaluation & Benchmarking
Author: lyndonkl

Data Science & ML

Evaluates GraphRAG system performance across knowledge graph quality, retrieval accuracy, multi-step reasoning, and hallucination prevention.

The GraphRAG Evaluation skill provides a specialized framework for measuring the effectiveness of systems that combine knowledge graphs with retrieval-augmented generation. It guides developers through a systematic process of assessing knowledge graph completeness, measuring multi-hop retrieval precision, and validating complex reasoning chains. By offering standardized metrics and reporting templates, this skill helps identify specific system weaknesses, quantify grounding rates to reduce hallucinations, and conduct rigorous baseline comparisons against traditional vector-based RAG implementations.

Key Features

01 Standardized evaluation reporting with automated rubric scoring and baseline comparisons.

02 Step-by-step reasoning verification to identify error propagation in complex queries.

03 Quantitative hallucination measurement using intrinsic and extrinsic grounding rates.

04 Multi-hop retrieval analysis including context recall and precision scoring.

05 32 GitHub stars

06 Comprehensive metrics for knowledge graph quality and schema consistency.

Use Cases

01 Benchmarking a GraphRAG implementation against pure vector RAG to justify graph overhead.

02 Auditing an AI system's factual accuracy and multi-step reasoning capabilities before production.

03 Identifying specific gaps in entity coverage or relation completeness within a knowledge graph.

Install with 🐟 Skill.Fish

npx skillfish add lyndonkl/claude graphrag-evaluation

For use in Claude.ai and ChatGPT
