Provides a production-grade A/B testing framework with three levels of scientific rigor for comparing large language model (LLM) outputs.
Akab is an A/B testing framework built specifically for evaluating LLM outputs at varying degrees of scientific rigor. It offers a unified workflow that spans quick, unblinded comparisons for debugging and rapid iteration through to fully blinded scientific experiments that require statistical significance. The framework is production-grade, making real API calls and reporting real results; it applies sound scientific methodology, including statistical analysis and reproducibility; and it supports dynamic success criteria so tests can be customized to the task at hand. Intelligent assistance streamlines the testing process, making Akab the definitive component for robust model comparisons within the Atlas system.
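To make the blinded, statistically tested comparison concrete, below is a minimal Python sketch of the general idea: outputs from two models are judged in shuffled, label-free order against a success criterion, and the resulting win rates are compared with a two-proportion z-test. All names here (`run_blinded_ab`, `ABResult`, and so on) are illustrative assumptions, not Akab's actual API.

```python
# Hypothetical sketch of a blinded A/B comparison with a significance test.
# Function and class names are illustrative, not part of Akab's real interface.
import math
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ABResult:
    wins_a: int
    wins_b: int
    trials_per_arm: int
    p_value: float


def two_proportion_z_test(wins_a: int, wins_b: int, n: int) -> float:
    """Two-sided test that model A's success rate differs from model B's."""
    p_a, p_b = wins_a / n, wins_b / n
    pooled = (wins_a + wins_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))


def run_blinded_ab(
    outputs_a: Sequence[str],
    outputs_b: Sequence[str],
    criterion: Callable[[str], bool],
    seed: int = 0,
) -> ABResult:
    """Judge outputs in shuffled order; the criterion only ever sees the text,
    never which model produced it (a simple form of blinding)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    labeled = [("A", o) for o in outputs_a] + [("B", o) for o in outputs_b]
    rng.shuffle(labeled)  # hide original ordering before judging
    wins = {"A": 0, "B": 0}
    for label, text in labeled:
        if criterion(text):  # dynamic success criterion supplied by the caller
            wins[label] += 1
    n = len(outputs_a)
    p = two_proportion_z_test(wins["A"], wins["B"], n)
    return ABResult(wins["A"], wins["B"], n, p)


if __name__ == "__main__":
    # Toy usage with a length-based success criterion.
    a = ["short"] * 40 + ["a much longer, more detailed answer"] * 60
    b = ["short"] * 55 + ["a much longer, more detailed answer"] * 45
    print(run_blinded_ab(a, b, criterion=lambda s: len(s) > 10))
```

In practice the outputs would come from real API calls and the success criterion would encode the experiment's own definition of a "win"; the sketch only illustrates how blinding and a significance test fit together.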