Streamlines high-performance CSV data processing through a standardized workflow for cleaning, transforming, and analyzing large datasets.
This skill integrates the blazing-fast qsv toolkit into Claude Code, providing a structured methodology for managing CSV data at scale. It offers a comprehensive suite of tools for data discovery, profiling, and transformation, letting users run complex SQL queries, joins, and aggregations without a traditional database. A rigorous 8-step workflow, from indexing through AI-powered documentation, preserves data integrity and operational efficiency while handling multi-gigabyte files with optimized memory management.
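The early steps of that workflow can be sketched with real qsv subcommands. This is a minimal illustration, not the skill's exact step list: the file `sales.csv` is a made-up example, and the ordering (index, count, discover, profile) is an assumption based on the "from indexing" description above.

```shell
# Create a tiny example file (hypothetical data for illustration).
cat > sales.csv <<'EOF'
region,product,amount
East,widget,120
West,widget,80
East,gadget,200
EOF

qsv index sales.csv               # build an index for fast random access
qsv count sales.csv               # row count (near-instant once indexed)
qsv headers sales.csv             # discover column names
qsv stats --everything sales.csv  # profile: types, cardinality, min/max, …
qsv frequency sales.csv           # per-column value distributions
```

Indexing first is what makes the later profiling steps fast on multi-gigabyte files, since commands like `stats` and `frequency` can parallelize over an indexed CSV.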
Key Features
1. Comprehensive data profiling including cardinality, statistics, and value distributions
2. Automated data documentation and dictionary generation using AI-powered describegpt
3. Standardized 8-step workflow for data discovery, profiling, and transformation
4. High-performance SQL querying directly on CSV and Parquet files using sqlp
5. Multi-format conversion support for Parquet, JSONL, Excel, SQLite, and Postgres
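The sqlp querying and format conversion above can be sketched as follows. The file name `sales.csv` and the output targets are illustrative; `_t_1` is the table alias qsv's sqlp assigns to the first input file.

```shell
# Hypothetical example file.
cat > sales.csv <<'EOF'
region,amount
East,120
West,80
East,200
EOF

# Run SQL directly on the CSV; _t_1 refers to the first input file.
qsv sqlp sales.csv "SELECT region, SUM(amount) AS total FROM _t_1 GROUP BY region"

# Convert to other formats with `qsv to`.
qsv to sqlite sales.db sales.csv
```

Because sqlp is backed by the Polars SQL engine, the query runs columnar and in parallel, with no database server or schema setup required first.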
Use Cases
1. Performing complex multi-file joins and aggregations on disparate data sources
2. Automating data integrity audits and generating standardized data dictionaries
3. Cleaning and deduplicating massive datasets for machine learning or database migration
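A multi-file join followed by deduplication, as in the first and third use cases, might look like this. The files `orders.csv` and `customers.csv` and the `cust_id` key are hypothetical; `join` and `dedup` are real qsv subcommands.

```shell
# Two hypothetical source files sharing a customer-id key.
cat > orders.csv <<'EOF'
order_id,cust_id,amount
1,10,99.50
2,11,14.00
3,10,25.25
EOF
cat > customers.csv <<'EOF'
cust_id,name
10,Acme
11,Globex
EOF

# Inner join on cust_id (named per file), then remove duplicate rows.
qsv join cust_id orders.csv cust_id customers.csv -o joined.csv
qsv dedup joined.csv -o joined_clean.csv
```

For datasets too large for an in-memory sort, qsv also ships an external-memory variant (extdedup) suited to the "massive datasets" case above.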