Can I compare different Claude models using this skill?

Yes. Promptfoo lets you configure multiple providers, such as Claude 3.5 Sonnet and Claude Opus, and run the same test cases against each one so their outputs can be compared directly.
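As a rough illustration of a multi-provider setup, the sketch below builds a small configuration as a Python dictionary and writes it out as JSON. The top-level keys (`prompts`, `providers`, `tests`) follow Promptfoo's documented config layout, but the provider IDs, the prompt text, the `promptfooconfig.json` filename, and the `write_config` helper are assumptions chosen for illustration; check them against the Promptfoo version and models you actually use.

```python
import json

# Hypothetical config: one prompt, two Claude providers, one shared test case.
# The provider IDs below are assumptions; substitute the IDs your account supports.
config = {
    "prompts": ["Summarize in one sentence: {{text}}"],
    "providers": [
        "anthropic:messages:claude-3-5-sonnet-20241022",
        "anthropic:messages:claude-3-opus-20240229",
    ],
    "tests": [
        {
            "vars": {"text": "Promptfoo runs the same test cases against several models."},
            # Case-insensitive substring check applied to every provider's output.
            "assert": [{"type": "icontains", "value": "promptfoo"}],
        }
    ],
}


def write_config(path: str = "promptfooconfig.json") -> str:
    """Write the config dictionary to disk as JSON and return the path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path
```

Calling `write_config()` produces a file that can then be passed to the Promptfoo CLI, for example with `npx promptfoo eval -c promptfooconfig.json`, assuming a JSON config is acceptable in your setup.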

What is Promptfoo used for in Claude Code?

Promptfoo is used to systematically test and compare LLM outputs against specific criteria to ensure accuracy and consistency across different prompts and models.

Does Promptfoo support custom grading logic?

Yes, you can write custom assertions using JavaScript or Python, or use the 'llm-rubric' feature to have an LLM grade the output based on a text-based rubric.
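A minimal sketch of a custom Python assertion might look like the following. It assumes Promptfoo's convention of calling a function named `get_assert(output, context)` in the assertion file and accepting a dictionary with `pass`, `score`, and `reason` keys as the result; the specific rule checked here (a JSON object with an `answer` key) is a made-up example, not part of this skill.

```python
import json
from typing import Any, Dict


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    """Pass only if the output is a JSON object that contains an 'answer' key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return {"pass": False, "score": 0.0, "reason": "output is not valid JSON"}

    # Valid JSON but the wrong shape gets partial credit in this sketch.
    if not isinstance(parsed, dict) or "answer" not in parsed:
        return {"pass": False, "score": 0.5, "reason": "missing 'answer' key"}

    return {"pass": True, "score": 1.0, "reason": "valid JSON with 'answer' key"}
```

Returning a dictionary rather than a bare boolean lets the grader attach a score and a human-readable reason, which is useful when reviewing failures.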

What are 'assertions' in Promptfoo?

Assertions are rules that check if an output is correct. They can range from simple text matching and JSON validation to advanced semantic similarity checks.
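To make the idea concrete, here is a conceptual sketch of three simple assertion kinds (substring match, JSON validity, regex match) written as plain Python functions. This is not Promptfoo's implementation; the function names and the `run_checks` helper are hypothetical and exist only to show what "a rule that checks an output" means in practice.

```python
import json
import re
from typing import Callable, Dict


def check_contains(output: str, expected: str) -> bool:
    """Simple text matching: the expected substring must appear in the output."""
    return expected in output


def check_is_json(output: str) -> bool:
    """JSON validation: the output must parse as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def check_regex(output: str, pattern: str) -> bool:
    """Regex matching: the pattern must match somewhere in the output."""
    return re.search(pattern, output) is not None


def run_checks(output: str, checks: Dict[str, Callable[[str], bool]]) -> Dict[str, bool]:
    """Apply each named check to the output and collect pass/fail results."""
    return {name: check(output) for name, check in checks.items()}
```

For example, `run_checks('{"id": 7}', {"json": check_is_json, "has_id": lambda o: check_contains(o, '"id"')})` reports a result per rule, which mirrors how a test case can carry several assertions at once. Semantic similarity checks are omitted here because they depend on an embedding model.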

Promptfoo LLM Testing

Name: Promptfoo LLM Testing
Author: bendrucker

Security & Testing

Evaluates and compares LLM outputs using an automated testing framework to ensure prompt reliability and quality.

Promptfoo for Claude Code provides an evaluation framework for testing, benchmarking, and debugging LLM outputs. Developers can define structured test cases, run assertions (including semantic similarity checks and LLM-as-judge rubrics), and compare results across different Claude model versions side by side. With this skill, users can automate prompt evaluation, catch regressions early, and maintain quality standards for AI-driven application logic directly from the CLI.

Key Features

01 0 GitHub stars

02 Support for Anthropic Claude 3.5 and 4.5 model families

03 Automated LLM output testing and side-by-side comparison

04 LLM-as-judge capabilities via llm-rubric for qualitative grading

05 Extensive assertion library including regex, JSON, and semantic similarity

06 Integrated web UI for visualizing evaluation results and metrics

Use Cases

01 Validating structured JSON data and schema consistency in model responses

02 Benchmarking different system prompts to find the most effective instructions

03 Regression testing prompts after upgrading to newer model versions

Install with 🐟 Skill.Fish

npx skillfish add bendrucker/route-agent promptfoo