Accelerates CSV data wrangling by applying indexing, stats caching, and performance-aware engine selection for large datasets.
This skill empowers Claude to efficiently handle massive CSV datasets by applying advanced performance optimizations for the qsv toolkit. It guides the agent in choosing between streaming, Polars-based, and memory-intensive commands while managing index files and stats caches to ensure near-instant operations even on gigabyte-scale files. By utilizing the Large File Decision Tree and Parquet acceleration, it enables seamless processing of data that would otherwise exceed standard memory limits.
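The "Large File Decision Tree" mentioned above can be sketched as a simple routing function: pick a processing strategy from file size relative to available RAM. The thresholds and strategy names below are illustrative assumptions for this sketch, not values defined by qsv itself.

```python
# Hypothetical sketch of the Large File Decision Tree: route a CSV
# operation to a processing strategy based on file size vs. available RAM.
# Thresholds are illustrative assumptions, not qsv-defined constants.
STREAMING = "streaming"    # constant-memory, row-at-a-time commands
POLARS = "polars"          # vectorized columnar engine commands
IN_MEMORY = "in-memory"    # commands that load the whole file

def choose_strategy(file_size_bytes: int, available_ram_bytes: int) -> str:
    """Pick a processing strategy for a CSV of the given size."""
    if file_size_bytes < 0.25 * available_ram_bytes:
        return IN_MEMORY   # file fits comfortably: any command is safe
    if file_size_bytes < 2 * available_ram_bytes:
        return POLARS      # columnar engine handles larger-than-RAM files
    return STREAMING       # fall back to constant-memory streaming

# Example: a 20 GB file on a machine with 8 GB of RAM
print(choose_strategy(20 * 2**30, 8 * 2**30))  # -> streaming
```

The key design point is that the routing decision happens before any data is read, so a memory-intensive command is never launched against a file that would exhaust RAM.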
Key Features
1. Memory-Aware Routing: Categorizes commands into streaming vs. memory-intensive types to prevent system crashes.
2. Automated Indexing Management: Handles .csv.idx files for O(1) lookups and multithreaded processing support.
3. Stats Cache Optimization: Leverages cardinality data to skip redundant calculations and optimize complex join orders.
4. Parquet Acceleration: Facilitates conversion to Parquet for optimized, repeated SQL queries via DuckDB integration.
5. Polars Engine Integration: Utilizes vectorized columnar processing for high-speed SQL, joins, and pivots on large files.
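The join-order optimization in the feature list can be illustrated with a minimal sketch: given join-key cardinalities read from a stats cache, join the lowest-cardinality tables first so intermediate results stay small. The file names, numbers, and cache shape here are illustrative assumptions, not qsv's actual cache schema.

```python
# Hypothetical sketch of stats-cache-driven join ordering: join tables
# with the fewest distinct join-key values first to keep intermediate
# results small. Cardinalities below are illustrative, not real data.
def order_joins(cardinalities: dict[str, int]) -> list[str]:
    """Return table names sorted by ascending join-key cardinality."""
    return sorted(cardinalities, key=cardinalities.get)

# Cardinalities as a stats cache might report them (made-up numbers):
cardinality = {
    "orders.csv": 1_000_000,
    "regions.csv": 12,
    "customers.csv": 50_000,
}
print(order_joins(cardinality))
# -> ['regions.csv', 'customers.csv', 'orders.csv']
```

Because the cardinalities come from a precomputed cache, this ordering decision costs nothing at query time, which is the point of maintaining the stats cache in the first place.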
Use Cases
1. Speeding up repeated analytical SQL queries on large datasets using Parquet and indexing optimizations.
2. Processing multi-gigabyte CSV files that exceed available system RAM through streaming and Polars engines.
3. Automating complex data joins and pivots on 100MB+ files while maintaining high throughput and low memory usage.
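Whether Parquet acceleration pays off for repeated queries is an amortization question: the one-time conversion cost must be recovered by the per-query savings. The timings below are illustrative assumptions; real numbers depend on the data, qsv, and DuckDB.

```python
# Back-of-the-envelope sketch of when one-time Parquet conversion pays
# off for repeated SQL queries. All costs are illustrative assumptions.
def parquet_pays_off(convert_cost_s: float, csv_query_s: float,
                     parquet_query_s: float, num_queries: int) -> bool:
    """True if converting to Parquet is cheaper over num_queries runs."""
    parquet_total = convert_cost_s + num_queries * parquet_query_s
    csv_total = num_queries * csv_query_s
    return parquet_total < csv_total

# e.g. 30 s conversion, 12 s per CSV query vs. 2 s per Parquet query:
print(parquet_pays_off(30.0, 12.0, 2.0, 2))  # -> False (34 s vs. 24 s)
print(parquet_pays_off(30.0, 12.0, 2.0, 5))  # -> True  (40 s vs. 60 s)
```

This is why the skill reserves Parquet conversion for repeated analytical queries: a one-off query is usually faster straight from the CSV.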