What is a golden dataset in the context of Claude Code?

A golden dataset is a curated collection of high-quality 'ground truth' data used to benchmark and evaluate the performance of AI models and RAG systems.

Does it handle duplicate content?

Yes, the skill includes a duplicate check that prevents adding content with more than 80% similarity to existing entries in your dataset.
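A minimal sketch of an 80% duplicate check follows. The listing does not say which similarity metric the skill uses, so `difflib.SequenceMatcher` over word tokens is an assumed stand-in, and the names `similarity` and `is_duplicate` are hypothetical.

```python
from difflib import SequenceMatcher

# From the listing: content with more than 80% similarity is treated as a duplicate.
SIMILARITY_THRESHOLD = 0.80


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] over lowercase word tokens.

    Assumption: SequenceMatcher stands in for whatever metric the skill really uses.
    """
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def is_duplicate(candidate: str, existing: list[str]) -> bool:
    """True if the candidate exceeds the threshold against any existing entry."""
    return any(similarity(candidate, entry) > SIMILARITY_THRESHOLD for entry in existing)
```

With an empty dataset nothing can be a duplicate, so `is_duplicate(text, [])` returns `False`.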

What content types are supported for extraction?

The skill can detect and extract structure from blog articles, step-by-step tutorials, API documentation, and academic research papers.

How does the quality scoring system function?

It uses a weighted scoring system across four dimensions: Accuracy (0.25), Coherence (0.20), Depth (0.25), and Relevance (0.30). Documents scoring above 0.75 are included automatically, while those between 0.55 and 0.75 require review.
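The weights and thresholds above can be sketched as a small function. The weights and the 0.75 and 0.55 cut-offs come from the answer; the function names, the assumption that each dimension is scored in [0, 1], the handling of a score of exactly 0.75, and the label for scores below 0.55 are illustrative, since the listing does not specify them.

```python
# Weights as stated in the listing; they sum to 1.0.
WEIGHTS = {"accuracy": 0.25, "coherence": 0.20, "depth": 0.25, "relevance": 0.30}
AUTO_INCLUDE = 0.75  # scores above this are included automatically
REVIEW_FLOOR = 0.55  # scores from here up to AUTO_INCLUDE require review


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed to lie in [0, 1]) into one value."""
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)


def decide(scores: dict[str, float]) -> str:
    """Map a document's scores to a curation decision."""
    total = weighted_score(scores)
    if total > AUTO_INCLUDE:
        return "include"
    if total >= REVIEW_FLOOR:
        return "review"
    # Assumption: the listing does not say what happens below 0.55.
    return "below_review_band"
```

For example, scores of 0.9 (accuracy), 0.7 (coherence), 0.8 (depth), and 0.6 (relevance) combine to roughly 0.745, which falls in the review band.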

How does the multi-agent validation work?

The skill employs four parallel agents: one for code quality review and three 'Explore' agents that handle difficulty classification, domain tagging, and test query generation.
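The fan-out pattern described above can be illustrated with a thread pool. This is only a sketch of running four tasks in parallel and merging their results; the real skill dispatches Claude Code agents, and every function name and return value below is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the four agents; each returns a placeholder result.
def review_code_quality(doc: str) -> dict:
    return {"code_quality": "not evaluated"}


def classify_difficulty(doc: str) -> dict:
    return {"difficulty": "not evaluated"}


def tag_domains(doc: str) -> dict:
    return {"domains": []}


def generate_test_queries(doc: str) -> dict:
    return {"test_queries": []}


def validate(doc: str) -> dict:
    """Run all four stand-in agents in parallel and merge their results."""
    agents = [review_code_quality, classify_difficulty, tag_domains, generate_test_queries]
    merged: dict = {}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(agent, doc) for agent in agents]
        for future in futures:
            merged.update(future.result())
    return merged
```

Each stand-in writes to its own key, so the merge order does not affect the result.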

Golden Dataset Curator

Author: yonatangross

Security and Testing

Curates and validates high-quality documents for golden datasets using a multi-agent evaluation workflow.

The add-golden skill automates the curation of evaluation datasets for LLMs and RAG systems. It fetches web content, classifies document types, and runs parallel multi-agent analysis to score quality on dimensions such as accuracy and relevance. Automated duplicate detection, schema validation, and difficulty classification protect dataset integrity, giving developers a pipeline for managing ground-truth data within Claude Code.

Key Features

01 Multi-agent validation with four specialized analysis agents

02 Automatic content extraction and difficulty classification

03 Quality-gate scoring based on accuracy, coherence, depth, and relevance

04 Seamless integration with JSON fixture files for evaluation

05 29 GitHub stars

06 Automated duplicate detection and 80% similarity checking

Use Cases

01 Curating high-quality training examples from technical documentation

02 Automating the expansion of test suites with verified web content

03 Building ground-truth datasets for RAG system evaluation

Install with 🐟 Skill.Fish

npx skillfish add yonatangross/skillforge-claude-plugin add-golden