Why should I use Python instead of Shell for Taiga API requests?

In shell environments, environment-variable expansion, quoting, and piping can strip or mangle the cookie values that Taiga authentication requires. Building the request with Python's urllib passes the Cookie header as a plain string, so the value reaches the server unchanged.
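
As a minimal sketch of the approach described above, the snippet below builds a request with the cookie passed straight into the headers. The base URL is taken from the host named in this FAQ, but the exact API prefix, the helper names `build_request` and `fetch_json`, and the JSON response format are assumptions for illustration, not the skill's actual code.

```python
import json
import urllib.request

# Assumption: the API is served from this host; the real base path may differ.
BASE_URL = "https://taiga.ant.dev"


def build_request(path, cookie):
    """Build a GET request that carries the session cookie verbatim."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Cookie": cookie, "Accept": "application/json"},
        method="GET",
    )


def fetch_json(path, cookie, timeout=30):
    """Send the request and decode a JSON body (performs network I/O)."""
    with urllib.request.urlopen(build_request(path, cookie), timeout=timeout) as resp:
        return json.load(resp)
```

Because the cookie never passes through shell quoting or a pipe, characters such as `=`, `;`, and quotes inside it are sent exactly as stored.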

How do I handle authentication failures with the Taiga API?

Authentication uses session cookies, which expire. If a request fails with an authentication error, refresh your session in the browser, open DevTools > Network, and copy the 'Cookie' header from any request to taiga.ant.dev into your local environment file.
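
One way to read the copied cookie back out of a local environment file is sketched below. The `KEY=value` file layout, the variable name `TAIGA_COOKIE`, and the helper `load_cookie` are hypothetical; adjust them to whatever your environment file actually uses.

```python
def load_cookie(env_path, key="TAIGA_COOKIE"):
    """Return the cookie string stored under `key` in a KEY=value env file.

    Both the file layout and the default key name are assumptions.
    """
    with open(env_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            # Split on the first "=" only, since cookie values contain "=".
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                value = value.strip()
                # Drop one pair of matching surrounding quotes, if present.
                if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                    value = value[1:-1]
                return value
    raise KeyError(f"{key} not found in {env_path}")
```

Splitting on the first `=` only matters here: a cookie header usually holds several `name=value` pairs, and a naive split would truncate it.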

Which model is recommended for submitting jobs via this skill?

To ensure consistent and high-quality results, this skill is configured to always use 'claude-opus-4-5-20251101' for job submissions unless you explicitly request a different model.
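
The default-with-override behavior can be pictured with a small sketch like the one below. The function name `resolve_model` is hypothetical, and how the skill itself applies the default is not shown in this listing.

```python
# Model identifier quoted in the answer above.
DEFAULT_MODEL = "claude-opus-4-5-20251101"


def resolve_model(requested=None):
    """Return the explicitly requested model, else the default."""
    if requested and requested.strip():
        return requested.strip()
    return DEFAULT_MODEL
```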

Can I check the pass rates of a specific job ID?

Yes, you can use the '/jobs/{job_id}/problems' endpoint to retrieve detailed scores for every problem in a run, which the skill can then aggregate into a pass rate percentage.
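
A rough sketch of the aggregation step follows. It assumes the endpoint returns a list of problem records that each carry a numeric `score` field, and that a score at or above 1.0 counts as a pass; both the field name and the threshold are assumptions, so check them against a real response.

```python
def pass_rate(problems, threshold=1.0):
    """Percentage of problems whose score meets the pass threshold.

    Assumes each record is a dict with a numeric "score" key; missing or
    null scores are counted as failures.
    """
    if not problems:
        return 0.0
    passed = sum(
        1 for p in problems if float(p.get("score") or 0.0) >= threshold
    )
    return 100.0 * passed / len(problems)


# Example with made-up records, not real Taiga output:
sample = [{"score": 1.0}, {"score": 0.0}, {"score": 1.0}, {"score": None}]
print(f"{pass_rate(sample):.1f}%")  # prints "50.0%"
```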

Taiga Evaluation Manager

Name: Taiga Evaluation Manager
Author: atondwal

Analytics and Monitoring

Automates interactions with the Taiga API to track job results, analyze pass rates, and manage model evaluation runs.

This skill provides an interface to the Taiga evaluation platform API, tailored for Claude Code. It lets developers query job statuses, aggregate problem pass rates, and retrieve detailed execution transcripts through Python-based API requests. It defaults to the Claude 4.5 Opus model for submissions and relies on cookie-based authentication, which streamlines the workflow for evaluating AI model performance across environments and problem sets.

Key Features

01 Automate job creation using standardized model configurations like Claude 4.5 Opus

02 Query and filter jobs based on environment IDs, status, or problem sets

03 Retrieve real-time job results and pass rates for specific evaluation runs

04 Access detailed execution transcripts and container logs for deep-dive debugging

05 Python-native implementation to ensure stable authentication and header handling

Use Cases

01 Debugging failed problem runs by streaming logs and analyzing execution transcripts

02 Monitoring the progress and success rates of large-scale model evaluation batches

03 Automating the submission of new model evaluation jobs to the Taiga platform

Install with 🐟 Skill.Fish

npx skillfish add atondwal/config taiga-api
