Codev-Bench
Evaluates code completion tools' ability to accurately capture a developer's intent and suggest appropriate code snippets in diverse contexts.
About
Codev-Bench is a comprehensive evaluation framework that assesses code completion tools in real-world, repository-level, developer-centric scenarios. It moves beyond traditional benchmarks that focus solely on generating functions from comments by covering the diverse sub-scenes of daily IDE-based coding, such as contextual completion of logical blocks, function parameter lists, and ordinary statements. Using unit tests and AST parsing, Codev-Bench accurately evaluates the code generated by Large Language Models (LLMs) across a range of completion scenarios, including full block, incomplete suffix, inner block, and Retrieval-Augmented Generation (RAG)-based completion.
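For illustration, here is a minimal Python sketch of how a "full block" sub-scene might be carved out of a source file with AST parsing. The function name and the prefix/ground-truth/suffix split are assumptions made for this example, not Codev-Bench's actual extraction code.

```python
import ast

def make_block_completion_task(source: str, target_func: str):
    """Split `source` into (prefix, ground_truth, suffix) around the body
    of `target_func`, mimicking a 'full block' completion sub-scene."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == target_func:
            body_start = node.body[0].lineno - 1   # 0-indexed first body line
            body_end = node.body[-1].end_lineno    # 1-indexed last body line
            prefix = "".join(lines[:body_start])          # code before the masked block
            ground_truth = "".join(lines[body_start:body_end])  # the block to complete
            suffix = "".join(lines[body_end:])            # code after the masked block
            return prefix, ground_truth, suffix
    raise ValueError(f"function {target_func!r} not found")
```

A model is then prompted with the prefix (and, in suffix-aware settings, the suffix), and its completion is compared against the masked ground truth or validated by running the repository's unit tests.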
Key Features
- Fine-grained evaluation of code completion tools.
- Real-world, repository-level benchmark.
- Developer-centric scenarios.
- Unit test-based evaluation (see the sketch after this list).
- Supports diverse completion sub-scenes.
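As a rough illustration of the unit test-based evaluation mentioned above, the following Python sketch splices a candidate completion into a repository file and accepts it only if the project's test suite passes. The pytest invocation, paths, and timeout are assumptions for this example, not the benchmark's actual harness.

```python
import subprocess
from pathlib import Path

def passes_unit_tests(repo_dir: str, file_path: str,
                      prefix: str, completion: str, suffix: str) -> bool:
    """Write prefix + completion + suffix into the target file, run the
    repository's tests, and report pass/fail. Restores the file afterwards."""
    target = Path(repo_dir) / file_path
    original = target.read_text()
    try:
        target.write_text(prefix + completion + suffix)  # splice in the completion
        try:
            result = subprocess.run(
                ["python", "-m", "pytest", "-q"],  # run the repo's own tests
                cwd=repo_dir, capture_output=True, timeout=300,
            )
        except subprocess.TimeoutExpired:
            return False  # hung completions count as failures
        return result.returncode == 0
    finally:
        target.write_text(original)  # restore the file for the next candidate
```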
Use Cases
- Developing and improving code completion models to better align with developer needs.
- Evaluating the performance of code completion models.
- Identifying strengths and weaknesses of code completion tools in different scenarios.