Streamlines high-performance CSV data processing through a standardized workflow for cleaning, transforming, and analyzing large datasets.
This skill integrates the blazing-fast qsv toolkit into Claude Code, providing a structured methodology for managing CSV data at scale. It offers a comprehensive suite of tools for data discovery, profiling, and transformation, letting users run complex SQL queries, joins, and aggregations without a traditional database. A rigorous 8-step workflow, from indexing through AI-powered documentation, preserves data integrity and operational efficiency while handling multi-gigabyte files with optimized memory management.
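The early steps of that workflow can be sketched with real qsv subcommands. This is a minimal illustration, not the skill's exact step list: the file `sales.csv` is a made-up example, and the ordering (index, count, discover, profile) is an assumption based on the "from indexing" description above.

```shell
# Create a tiny example file (hypothetical data for illustration).
cat > sales.csv <<'EOF'
region,product,amount
East,widget,120
West,widget,80
East,gadget,200
EOF

qsv index sales.csv               # build an index for fast random access
qsv count sales.csv               # row count (near-instant once indexed)
qsv headers sales.csv             # discover column names
qsv stats --everything sales.csv  # profile: types, cardinality, min/max, …
qsv frequency sales.csv           # per-column value distributions
```

Indexing first is what makes the later profiling steps fast on multi-gigabyte files, since commands like `stats` and `frequency` can parallelize over an indexed CSV.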
Key Features
1. Comprehensive data profiling including cardinality, statistics, and value distributions
2. Automated data documentation and dictionary generation using AI-powered describegpt
3. Standardized 8-step workflow for data discovery, profiling, and transformation
4. High-performance SQL querying directly on CSV and Parquet files using sqlp
5. Multi-format conversion support for Parquet, JSONL, Excel, SQLite, and Postgres
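The sqlp querying and format conversion above can be sketched as follows. The file name `sales.csv` and the output targets are illustrative; `_t_1` is the table alias qsv's sqlp assigns to the first input file.

```shell
# Hypothetical example file.
cat > sales.csv <<'EOF'
region,amount
East,120
West,80
East,200
EOF

# Run SQL directly on the CSV; _t_1 refers to the first input file.
qsv sqlp sales.csv "SELECT region, SUM(amount) AS total FROM _t_1 GROUP BY region"

# Convert to other formats with `qsv to`.
qsv to sqlite sales.db sales.csv
```

Because sqlp is backed by the Polars SQL engine, the query runs columnar and in parallel, with no database server or schema setup required first.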
Use Cases
1. Performing complex multi-file joins and aggregations on disparate data sources
2. Automating data integrity audits and generating standardized data dictionaries
3. Cleaning and deduplicating massive datasets for machine learning or database migration
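A multi-file join followed by deduplication, as in the first and third use cases, might look like this. The files `orders.csv` and `customers.csv` and the `cust_id` key are hypothetical; `join` and `dedup` are real qsv subcommands.

```shell
# Two hypothetical source files sharing a customer-id key.
cat > orders.csv <<'EOF'
order_id,cust_id,amount
1,10,99.50
2,11,14.00
3,10,25.25
EOF
cat > customers.csv <<'EOF'
cust_id,name
10,Acme
11,Globex
EOF

# Inner join on cust_id (named per file), then remove duplicate rows.
qsv join cust_id orders.csv cust_id customers.csv -o joined.csv
qsv dedup joined.csv -o joined_clean.csv
```

For datasets too large for an in-memory sort, qsv also ships an external-memory variant (extdedup) suited to the "massive datasets" case above.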