How do I switch between Anthropic and local models?

You can use the provider-switch trigger or specify the provider directly in the shell command, such as './scripts/run-tests.sh e2e anthropic'.
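As a rough illustration of the direct form, the wrapper below validates a tier and provider before printing the command line. Only the `e2e anthropic` pair is confirmed by the example above. Passing the other tier names (`sanity`, `smoke`, `production`) and providers (`ollama`, `openai`) as arguments is an assumption based on the names used in this listing, and `kosmos_test_cmd` is a hypothetical helper, not part of the skill.

```shell
# Hypothetical helper: prints the run-tests.sh command for a tier and provider.
# Only "e2e anthropic" is confirmed by the listing; other values are assumed.
kosmos_test_cmd() {
  case "$1" in
    sanity|smoke|e2e|production) ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
  case "$2" in
    ollama|anthropic|openai) ;;
    *) echo "unknown provider: $2" >&2; return 1 ;;
  esac
  # Print rather than execute, so the command can be inspected first.
  printf '%s\n' "./scripts/run-tests.sh $1 $2"
}

kosmos_test_cmd e2e anthropic
```

Printing the command instead of running it keeps the sketch safe to try outside a Kosmos checkout.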

What is the fastest way to verify my Kosmos environment?

Run the sanity test tier (~30s) or use the health-check script to verify available providers and dependencies.
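The health-check script itself is not shown in this listing, so the following is only a stand-in sketch of the kind of check it might perform: it reports whether some tools are on `PATH` and whether an API key variable is set. The function names and the choice of `ollama`, `docker`, and `ANTHROPIC_API_KEY` as signals are assumptions made for illustration.

```shell
# Stand-in sketch, not the skill's own health-check script.
# have_tool NAME: succeeds if NAME resolves to a command on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# report_env: prints one status line per assumed dependency.
report_env() {
  for tool in ollama docker; do
    if have_tool "$tool"; then
      echo "$tool: found"
    else
      echo "$tool: missing"
    fi
  done
  # Assumed signal for the cloud provider: a non-empty API key variable.
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "anthropic key: set"
  else
    echo "anthropic key: not set"
  fi
}

report_env
```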

What models are supported for local testing?

The skill natively supports Ollama models, specifically recommending qwen3:4b for fast sanity checks and deepseek-r1:8b for complex reasoning tasks.
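The recommendation above can be captured as a small lookup. This is a sketch: `pick_model` and the task labels `sanity` and `reasoning` are hypothetical names, and only the two model tags come from the listing.

```shell
# Hypothetical lookup from task type to the recommended Ollama model tag.
pick_model() {
  case "$1" in
    sanity)    echo "qwen3:4b" ;;
    reasoning) echo "deepseek-r1:8b" ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
}

pick_model sanity
```

The result could then be passed to the Ollama CLI, for example `ollama pull "$(pick_model sanity)"`, to fetch the model before a run.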

Do I need Docker to use this skill?

Docker is required for Gap 4 testing, which involves actual code execution in a sandboxed environment. Other tiers can run without it using mock implementations.

Kosmos E2E Testing Suite

Name: Kosmos E2E Testing Suite
Author: Zehong-Wang

Security & Testing

Automates comprehensive end-to-end testing for the Kosmos autonomous AI scientist project across local and cloud LLM providers.

This skill provides a robust testing infrastructure for the Kosmos project, enabling developers to validate autonomous research workflows using various models like Ollama (local) or Anthropic (cloud). It features a multi-tiered testing strategy ranging from 30-second sanity checks to 20-minute full suite runs, integrated Docker sandboxing for secure code execution, and automated provider detection to ensure the most efficient model is used for specific validation tasks. It is designed to bridge the gap between model reasoning and practical code execution in a controlled environment.

Key Features

01 Multi-tiered testing strategy including Sanity, Smoke, E2E, and Production levels

02 0 GitHub stars

03 Automated Docker sandbox setup for secure research code execution

04 Cross-provider support for Ollama (local), Anthropic, and OpenAI models

05 Built-in environment health checks and provider auto-detection capabilities

06 Integrated benchmarking to compare local model vs. cloud API performance

Use Cases

01 Benchmarking local Ollama models like DeepSeek-R1 against cloud APIs for cost optimization

02 Validating autonomous research workflows before deploying complex AI agent experiments

03 Ensuring secure code execution environments using automated Docker sandbox configurations

Install with 🐟 Skill.Fish

npx skillfish add zehong-wang/kosmos kosmos-e2e-testing
