About
This skill streamlines the creation of benchmark test cases for natural language models and agents within the CAC evaluation framework. It generates the standardized directory structure and required files for each test case (metadata, prompt, and reference answer) and enforces the framework's numbering rules and formatting guidelines. It is aimed at researchers and developers who want to expand an evaluation dataset with consistent, machine-readable test questions that carry scoring indicators and difficulty levels.
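As a rough illustration of the kind of scaffolding described above, here is a minimal Python sketch. Everything named in it is an assumption made for the example and is not taken from the CAC framework: the `case_NNN` directory naming, the three-digit zero-padded numbering, the file names `meta.json`, `prompt.md`, and `reference_answer.md`, and the metadata keys are all hypothetical stand-ins for whatever the framework actually prescribes.

```python
import json
import re
from pathlib import Path

# Hypothetical naming rule: directories are called case_001, case_002, ...
CASE_PATTERN = re.compile(r"^case_(\d+)$")


def next_case_id(root: Path, width: int = 3) -> str:
    """Return the next free zero-padded case number under root."""
    used = []
    for entry in root.iterdir():
        match = CASE_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            used.append(int(match.group(1)))
    return str(max(used, default=0) + 1).zfill(width)


def scaffold_case(root: Path, prompt: str, answer: str,
                  difficulty: str, indicators: list[str]) -> Path:
    """Create one test-case directory with metadata, prompt, and reference answer."""
    root.mkdir(parents=True, exist_ok=True)
    case_dir = root / f"case_{next_case_id(root)}"
    case_dir.mkdir()

    # Hypothetical metadata layout: id, difficulty level, scoring indicators.
    meta = {
        "id": case_dir.name,
        "difficulty": difficulty,
        "scoring_indicators": indicators,
    }
    (case_dir / "meta.json").write_text(
        json.dumps(meta, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    (case_dir / "prompt.md").write_text(prompt, encoding="utf-8")
    (case_dir / "reference_answer.md").write_text(answer, encoding="utf-8")
    return case_dir
```

Under these assumptions, calling `scaffold_case(Path("benchmarks"), "Summarize the text.", "A short summary.", "easy", ["accuracy"])` twice would create two sibling directories with consecutive numbers, each holding the same three files. Deriving the number from the existing directories, instead of passing it in, is one simple way to keep numbering consistent as the dataset grows.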