Evaluates code completion tools' ability to accurately capture a developer's intent and suggest appropriate code snippets in diverse contexts.
Codev-Bench is a comprehensive evaluation framework designed to assess code completion tools in real-world, repository-level, developer-centric scenarios. It moves beyond traditional benchmarks that focus solely on generating functions from comments by covering the diverse sub-scenarios encountered in daily IDE-based coding, such as contextual completion of logical blocks, function parameter lists, and ordinary statements. Using unit tests and AST parsing, Codev-Bench evaluates the quality of code generated by various Large Language Models (LLMs) across a range of completion scenarios, including full block, incomplete suffix, inner block, and Retrieval-Augmented Generation (RAG)-based completion.
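
The sketch below illustrates the general idea behind this kind of evaluation, not Codev-Bench's actual implementation: it uses Python's built-in `ast` module to carve an "inner block" completion target out of a source file (prefix, masked block, suffix), and shows how a candidate completion could be checked by running unit tests. The sample source, file name, and `tests/test_clamp.py` path are hypothetical placeholders.

```python
import ast
import subprocess
import textwrap

SOURCE = textwrap.dedent("""
    def clamp(value, lo, hi):
        if value < lo:
            return lo
        if value > hi:
            return hi
        return value
""").strip()


def mask_inner_block(source: str) -> tuple[str, str, str]:
    """Split source into (prefix, ground-truth block, suffix) around the
    body of the first `if` statement found by AST parsing."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            start = node.body[0].lineno - 1   # first line of the block (0-indexed)
            end = node.body[-1].end_lineno    # last line of the block
            prefix = "".join(lines[:start])
            block = "".join(lines[start:end])
            suffix = "".join(lines[end:])
            return prefix, block, suffix
    raise ValueError("no if-block found")


def passes_unit_tests(candidate_source: str) -> bool:
    """Write the completed file and run the repository's tests on it
    (hypothetical test path; assumes pytest is installed)."""
    with open("completed_module.py", "w") as f:
        f.write(candidate_source)
    result = subprocess.run(["pytest", "tests/test_clamp.py", "-q"],
                            capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    prefix, ground_truth, suffix = mask_inner_block(SOURCE)
    print("--- prefix given to the model ---")
    print(prefix, end="")
    print("--- block the model must complete ---")
    print(ground_truth, end="")
```

A completion is then judged by splicing the model's output between the prefix and suffix and checking that the repository's unit tests still pass, rather than by surface-level string matching alone.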