关于
The Golden Dataset Curation skill streamlines the creation of high-quality benchmarks for AI evaluation by implementing a rigorous multi-agent validation workflow. It automatically classifies content types, assigns semantic difficulty levels, and generates targeted test queries to ensure dataset diversity and rigor. By utilizing a weighted consensus model and integrating directly with Langfuse for observability, this skill provides a production-ready framework for maintaining the 'ground truth' data necessary for RAG systems, LLM fine-tuning, and performance testing.