Queries the PinchBench leaderboard to provide real-time performance, cost, and speed data for AI coding models.
The Who-Wins skill gives Claude direct access to the PinchBench AI agent leaderboard, providing objective, real-world benchmark data for LLMs performing OpenClaw coding tasks. Instead of relying on outdated training data, this skill allows users to fetch live rankings to compare model scores, execution costs, and processing speeds. It is an essential tool for developers and architects who need to decide which model offers the best balance of performance and value for automated coding workflows.
Key Features
- Provides data-driven insights into AI model efficiency and value
- Fetches real-time AI agent rankings from the PinchBench leaderboard
- Sorts models by performance score, API cost, or execution time (see the sketch after this list)
- Compares specific models like Claude, GPT, and Gemini side-by-side
- Filters leaderboard results to find the best models for specific use cases
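To make the sorting and comparison behavior concrete, here is a minimal Python sketch of what a leaderboard query might look like. The endpoint URL and the response fields (`model`, `score`, `cost`, `latency`) are assumptions for illustration; the skill's actual API is not documented here.

```python
import requests

# Assumed endpoint and response shape -- placeholders, not the skill's real API.
PINCHBENCH_URL = "https://pinchbench.example/api/leaderboard"

def fetch_leaderboard(sort_by: str = "score") -> list[dict]:
    """Fetch live rankings and sort by 'score', 'cost', or 'latency'."""
    resp = requests.get(PINCHBENCH_URL, timeout=10)
    resp.raise_for_status()
    # Assumed shape: [{"model": ..., "score": ..., "cost": ..., "latency": ...}, ...]
    entries = resp.json()
    # Higher scores are better; lower cost and latency are better.
    return sorted(entries, key=lambda e: e[sort_by], reverse=(sort_by == "score"))

# Example: the five cheapest models per task.
for entry in fetch_leaderboard("cost")[:5]:
    print(f"{entry['model']}: ${entry['cost']:.4f}/task, score {entry['score']}")
```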
Use Cases
- Deciding which LLM is the most cost-effective for high-volume coding tasks (see the value-ranking sketch after this list)
- Finding the fastest-responding AI agent for real-time developer tools
- Benchmarking new model releases against existing industry leaders
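For the cost-effectiveness use case, one simple approach is to rank models by score per dollar. This sketch reuses the assumed response shape from the example above; `rank_by_value` and `compare` are hypothetical helpers, not part of the skill.

```python
def rank_by_value(entries: list[dict]) -> list[dict]:
    """Order models by benchmark score per dollar of execution cost, best value first."""
    return sorted(entries, key=lambda e: e["score"] / e["cost"], reverse=True)

def compare(entries: list[dict], families=("claude", "gpt", "gemini")) -> list[dict]:
    """Filter the leaderboard to specific model families for a side-by-side view."""
    return [e for e in entries if any(f in e["model"].lower() for f in families)]
```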