01 Reproducible data pipelines using dbt models and uv-managed virtual environments.
02 Intelligent content classification for DuckDB storage or RAG-optimized text extraction.
03 Automated image extraction and vision-based analysis for charts, diagrams, and tables.
04 Multi-format support for spreadsheets (Excel .xlsx and .csv) and Word (.docx) documents.
05 Advanced parsing for complex layouts, including merged cells and transposed tables.
06 2 GitHub stars
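
The merged-cell and transposed-table handling named in item 05 could be approximated as in the sketch below. It is only an illustration, not the project's actual implementation: the helper names `fill_down` and `transpose` are hypothetical, and the sketch assumes a merged range arrives as one value followed by empty (`None`) cells, which is how many spreadsheet readers expose it.

```python
from typing import List, Optional

Row = List[Optional[str]]


def fill_down(rows: List[Row]) -> List[Row]:
    """Copy the value from the row above into empty (None) cells.

    Assumption: a vertically merged range shows up as a value in its first
    row and None in the rows below. A genuinely empty cell would also be
    filled, so real code would need the sheet's merge metadata to tell
    the two cases apart.
    """
    filled: List[Row] = []
    previous: Optional[Row] = None
    for row in rows:
        if previous is None:
            new_row = list(row)
        else:
            new_row = [
                cell if cell is not None else above
                for cell, above in zip(row, previous)
            ]
        filled.append(new_row)
        previous = new_row
    return filled


def transpose(rows: List[Row]) -> List[Row]:
    """Turn a table whose records run across columns into row-per-record form."""
    return [list(column) for column in zip(*rows)]


# A "Region" label merged across two rows.
merged = [
    ["Region", "Quarter", "Sales"],
    ["North", "Q1", "10"],
    [None, "Q2", "12"],
    ["South", "Q1", "7"],
]
normalized = fill_down(merged)

# A transposed table: field names run down the first column.
sideways = [
    ["Field", "A", "B"],
    ["Price", "1", "2"],
]
upright = transpose(sideways)
```

After these two steps every row carries its own labels, so the table can be loaded into a columnar store such as DuckDB without losing the grouping that the merged cells expressed visually.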