Dataset Onboarding
Automates dataset onboarding and cataloging via a FastAPI-based server, integrating with Google Drive for input and storage.
About
The MCP Dataset Onboarding Server automates the workflow of turning raw CSV and Excel files into cataloged, documented datasets. Built on FastAPI, it uses Google Drive both as the input source for new files and as a mock catalog for processed outputs. Credentials are managed explicitly rather than implicitly. For each file, the server extracts metadata, suggests data quality rules, and generates a detailed Excel contract. It exposes several interfaces: a fully automated daemon, a REST API, a Model Context Protocol (MCP) server for LLM integration, and CLI tools, along with a real-time monitoring dashboard.
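To make the metadata extraction and rule suggestion steps concrete, here is a minimal sketch using only the Python standard library. It is not the server's actual implementation: the function names (`infer_type`, `profile_csv`, `suggest_rules`) and the rule labels (`not_null`, `unique`) are assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Return 'integer', 'float', or 'string' for a list of non-empty strings."""
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "string"


def profile_csv(text):
    """Collect simple per-column metadata from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for col in reader.fieldnames or []:
        raw = [row[col] for row in rows]
        # Treat empty strings and missing trailing fields as nulls.
        present = [v for v in raw if v not in ("", None)]
        profile[col] = {
            "type": infer_type(present) if present else "unknown",
            "nulls": len(raw) - len(present),
            "unique": len(set(present)),
            "rows": len(raw),
        }
    return profile


def suggest_rules(profile):
    """Suggest hypothetical quality rules from the observed column metadata."""
    rules = []
    for col, meta in profile.items():
        if meta["rows"] == 0:
            continue  # nothing observed, so nothing to suggest
        if meta["nulls"] == 0:
            rules.append(f"{col}: not_null")
        if meta["unique"] == meta["rows"]:
            rules.append(f"{col}: unique")
    return rules


sample = "id,name,score\n1,Ann,3.5\n2,Bob,\n3,Ann,4\n"
for rule in suggest_rules(profile_csv(sample)):
    print(rule)
```

Rules derived this way are only suggestions based on the rows seen so far; a column that happens to have no nulls in one file may still allow them by design.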
Key Features
- Google Drive Integration
- Data Quality Rules
- Metadata Extraction
- Contract Generation
- Automated Dataset Processing
Use Cases
- Automated ingestion and processing of datasets from Google Drive for 'set-and-forget' workflows.
- Enabling LLMs (e.g., Claude Desktop) to process and analyze data using natural language commands.
- Integrating programmatic dataset onboarding into existing data pipelines via a REST API.
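As a rough illustration of the REST API use case, a pipeline step might submit a Google Drive file for onboarding as sketched below. The endpoint path `/process`, the `file_id` payload field, and the function names are assumptions, not the server's documented API; check the running server's FastAPI-generated docs for the real routes.

```python
import json
import urllib.request


def build_onboard_request(base_url, file_id):
    """Build a POST request for a hypothetical '/process' endpoint."""
    payload = json.dumps({"file_id": file_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/process",  # assumed route
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not contact the server; call submit(req) to send it.
req = build_onboard_request("http://localhost:8000", "example-drive-file-id")
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the pipeline step easy to test without a live server.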