Dingo
Detects data quality issues in datasets automatically using built-in rules and model evaluation methods.
About
Dingo is a data quality evaluation tool that automatically detects quality issues in datasets. It provides built-in rules and model-based evaluation methods, and it also supports custom evaluation methods. Dingo handles commonly used text and multimodal datasets, including pre-training, fine-tuning, and evaluation datasets. It can be used through a local CLI or an SDK, which makes it easy to integrate into evaluation platforms.
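To make the idea of rule-based checks with custom extensions concrete, here is a minimal sketch in plain Python. It does not use Dingo's actual API: every name in it (`RULES`, `Report`, `evaluate`, and the individual rule functions) is hypothetical and chosen only for illustration, as are the three example rules and the 10-character threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical names throughout: this illustrates the concept, not Dingo's API.
# A rule takes a text record and returns True when the record violates it.
Rule = Callable[[str], bool]


def rule_empty(text: str) -> bool:
    """Flag records with no visible content."""
    return not text.strip()


def rule_replacement_char(text: str) -> bool:
    """Flag records containing U+FFFD, a common sign of decoding errors."""
    return "\ufffd" in text


def rule_too_short(text: str, min_chars: int = 10) -> bool:
    """Flag records shorter than min_chars after trimming whitespace."""
    return len(text.strip()) < min_chars


# Registry of "built-in" rules for this sketch.
RULES: Dict[str, Rule] = {
    "empty": rule_empty,
    "replacement_char": rule_replacement_char,
    "too_short": rule_too_short,
}


@dataclass
class Report:
    total: int
    flagged: Dict[int, List[str]]  # record index -> names of violated rules

    @property
    def pass_rate(self) -> float:
        """Fraction of records that violated no rule (1.0 for empty input)."""
        return 1.0 if self.total == 0 else 1 - len(self.flagged) / self.total


def evaluate(records: Iterable[str],
             rules: Optional[Dict[str, Rule]] = None) -> Report:
    """Apply every rule to every record and collect the violations."""
    rules = RULES if rules is None else rules
    flagged: Dict[int, List[str]] = {}
    total = 0
    for i, text in enumerate(records):
        total += 1
        hits = [name for name, rule in rules.items() if rule(text)]
        if hits:
            flagged[i] = hits
    return Report(total=total, flagged=flagged)
```

In this sketch, a custom check is just another entry in the rule dictionary, for example `evaluate(records, dict(RULES, all_caps=lambda t: t.isupper()))`. A real tool would typically add per-rule configuration, model-based scoring, and support for non-text modalities on top of this pattern.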
Key Features
- Provides built-in rules for data quality checks
- Supports LLM-based evaluation using models such as OpenAI models and Llama3
- Offers CLI and SDK for flexible integration
- Supports text and image data modalities
- Provides a GUI for visualizing evaluation results
Use Cases
- Evaluating the quality of pre-training datasets
- Assessing the quality of fine-tuning datasets
- Identifying data quality issues in text and image datasets