Is Docker required to use this skill?

Docker is required for 'Gap 4' testing, which involves executing AI-generated code within a secure sandbox environment. Other tiers can run with mock implementations.
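As a rough illustration of the Docker requirement above, the following sketch probes for a working Docker installation and falls back to mock execution when none is found. The function names and the mode labels `docker-sandbox` and `mock` are hypothetical and not taken from the skill itself; only the `docker info` CLI call is standard.

```python
import shutil
import subprocess


def docker_available(timeout: float = 5.0) -> bool:
    """Return True if the docker CLI is on PATH and the daemon responds."""
    if shutil.which("docker") is None:
        return False
    try:
        # `docker info` exits non-zero when the daemon cannot be reached.
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def execution_mode(has_docker: bool) -> str:
    """Pick a (hypothetical) execution mode label for generated code."""
    # Assumption: sandboxed runs need Docker; everything else can use mocks.
    return "docker-sandbox" if has_docker else "mock"


if __name__ == "__main__":
    print(execution_mode(docker_available()))
```

A check like this could run before the sandboxed tier so that a missing daemon produces a clear skip rather than a failure partway through the suite.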

What models are supported for local testing?

The skill is optimized for Ollama-hosted models, specifically Qwen3:4b for fast iteration and DeepSeek-R1:8b for complex reasoning tasks.

How do I switch between Anthropic and local models?

You can use the 'provider switch' trigger or specify the provider directly in the test command, such as './scripts/run-tests.sh e2e anthropic'.
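The command form above can be assembled programmatically, for example from a CI script. This is a minimal sketch: only the invocation `./scripts/run-tests.sh e2e anthropic` appears in the listing, so the other tier and provider tokens below (and the assumption that the provider argument is optional) are guesses that should be checked against the script itself.

```python
# Assumed argument values; only "e2e" and "anthropic" are confirmed by the listing.
TIERS = ("sanity", "smoke", "e2e", "production")
PROVIDERS = ("anthropic", "openai", "ollama")


def build_test_command(tier, provider=None):
    """Build the argv list for the test runner script."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    command = ["./scripts/run-tests.sh", tier]
    if provider is not None:
        command.append(provider)
    return command


if __name__ == "__main__":
    print(" ".join(build_test_command("e2e", "anthropic")))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting.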

What does the 'Sanity' test tier cover?

Sanity tests are quick (~30s) validations that check basic imports, configuration loading, and mock workflows to ensure the environment is functional.
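To make the idea of a sanity tier concrete, here is a small sketch of the kinds of checks described above: verifying that modules can be found, that a configuration parses and has the expected keys, and that the whole run stays within a time budget. All function names, the JSON configuration format, and the `provider` key are illustrative assumptions, not the skill's actual test code.

```python
import importlib.util
import json
import time


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_config(text, required_keys):
    """Parse a JSON config string and verify the required keys exist."""
    config = json.loads(text)
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(", ".join(missing))
    return config


def run_sanity(modules, config_text, required_keys, budget_seconds=30.0):
    """Run the checks and return a list of problem descriptions."""
    start = time.monotonic()
    problems = [f"missing module: {name}" for name in missing_modules(modules)]
    try:
        load_config(config_text, required_keys)
    except (ValueError, KeyError) as exc:  # JSONDecodeError is a ValueError
        problems.append(f"config error: {exc}")
    if time.monotonic() - start > budget_seconds:
        problems.append("sanity run exceeded its time budget")
    return problems


if __name__ == "__main__":
    print(run_sanity(["json"], '{"provider": "ollama"}', ["provider"]))
```

An empty result means the environment passed; each entry otherwise names one failed check.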

Kosmos E2E Testing

Author: jimmc414

Security and Testing

Automates comprehensive end-to-end testing for the Kosmos autonomous AI scientist using local models, external APIs, and Docker sandboxes.

This skill provides a robust testing framework specifically designed for the Kosmos project, an implementation of the AI Scientist autonomous discovery system. It enables developers to run tiered test suites—ranging from quick sanity checks to full research workflows—across multiple environments including Ollama-powered local LLMs, Anthropic/OpenAI APIs, and secure Docker execution environments. By automating environment health checks and provider switching, it ensures the reliability of complex AI-driven research workflows and scientific discovery pipelines within the Claude Code environment.

Key Features

01 Multi-tier testing support including Sanity, Smoke, E2E, and Production suites

02 Performance benchmarking for comparing local versus API-based model reasoning capabilities

03 Automatic provider detection and switching between local Ollama models and external APIs

04 Docker sandbox orchestration for secure execution of AI-generated code

05 324 GitHub stars

06 Integrated health checks for environment readiness across models, Docker, and databases
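The automatic provider detection listed above could work roughly as follows. This sketch assumes detection is driven by the conventional `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` environment variables, with local Ollama as the fallback; the priority order and the function name are assumptions, and the skill's real logic may differ.

```python
import os


def detect_provider(env=None):
    """Choose a provider name from API keys found in the environment."""
    env = os.environ if env is None else env
    # Assumed priority: Anthropic, then OpenAI, then a local Ollama fallback.
    if env.get("ANTHROPIC_API_KEY"):
        return "anthropic"
    if env.get("OPENAI_API_KEY"):
        return "openai"
    return "ollama"


if __name__ == "__main__":
    print(detect_provider())
```

Accepting the environment as a parameter lets the function be exercised in tests without touching real credentials.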

Use Cases

01 Setting up isolated Docker environments for safe code execution during autonomous AI discovery

02 Validating autonomous research workflows using local reasoning models like DeepSeek-R1

03 Running regression tests across different LLM providers to ensure cross-model stability

Install with 🐟 Skill.Fish

npx skillfish add jimmc414/kosmos kosmos-e2e-testing
