Accelerates CSV data wrangling by applying indexing, stats caching, and performance-aware engine selection for large datasets.
This skill empowers Claude to efficiently handle massive CSV datasets by applying advanced performance optimizations for the qsv toolkit. It guides the agent in choosing between streaming, Polars-based, and memory-intensive commands while managing index files and stats caches to ensure near-instant operations even on gigabyte-scale files. By utilizing the Large File Decision Tree and Parquet acceleration, it enables seamless processing of data that would otherwise exceed standard memory limits.
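The "Large File Decision Tree" mentioned above can be sketched as a simple routing function: pick a processing strategy from file size relative to available RAM. The thresholds and strategy names below are illustrative assumptions for this sketch, not values defined by qsv itself.

```python
# Hypothetical sketch of the Large File Decision Tree: route a CSV
# operation to a processing strategy based on file size vs. available RAM.
# Thresholds are illustrative assumptions, not qsv-defined constants.
STREAMING = "streaming"    # constant-memory, row-at-a-time commands
POLARS = "polars"          # vectorized columnar engine commands
IN_MEMORY = "in-memory"    # commands that load the whole file

def choose_strategy(file_size_bytes: int, available_ram_bytes: int) -> str:
    """Pick a processing strategy for a CSV of the given size."""
    if file_size_bytes < 0.25 * available_ram_bytes:
        return IN_MEMORY   # file fits comfortably: any command is safe
    if file_size_bytes < 2 * available_ram_bytes:
        return POLARS      # columnar engine handles larger-than-RAM files
    return STREAMING       # fall back to constant-memory streaming

# Example: a 20 GB file on a machine with 8 GB of RAM
print(choose_strategy(20 * 2**30, 8 * 2**30))  # -> streaming
```

The key design point is that the routing decision happens before any data is read, so a memory-intensive command is never launched against a file that would exhaust RAM.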
Key Features
1. Memory-Aware Routing: Categorizes commands into streaming vs. memory-intensive types to prevent system crashes.
2. Automated Indexing Management: Handles .csv.idx files for O(1) lookups and multithreaded processing support.
3. Stats Cache Optimization: Leverages cardinality data to skip redundant calculations and optimize complex join orders.
4. Parquet Acceleration: Facilitates conversion to Parquet for optimized, repeated SQL queries via DuckDB integration.
5. Polars Engine Integration: Utilizes vectorized columnar processing for high-speed SQL, joins, and pivots on large files.
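The join-order optimization in the feature list can be illustrated with a minimal sketch: given join-key cardinalities read from a stats cache, join the lowest-cardinality tables first so intermediate results stay small. The file names, numbers, and cache shape here are illustrative assumptions, not qsv's actual cache schema.

```python
# Hypothetical sketch of stats-cache-driven join ordering: join tables
# with the fewest distinct join-key values first to keep intermediate
# results small. Cardinalities below are illustrative, not real data.
def order_joins(cardinalities: dict[str, int]) -> list[str]:
    """Return table names sorted by ascending join-key cardinality."""
    return sorted(cardinalities, key=cardinalities.get)

# Cardinalities as a stats cache might report them (made-up numbers):
cardinality = {
    "orders.csv": 1_000_000,
    "regions.csv": 12,
    "customers.csv": 50_000,
}
print(order_joins(cardinality))
# -> ['regions.csv', 'customers.csv', 'orders.csv']
```

Because the cardinalities come from a precomputed cache, this ordering decision costs nothing at query time, which is the point of maintaining the stats cache in the first place.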
Use Cases
1. Speeding up repeated analytical SQL queries on large datasets using Parquet and indexing optimizations.
2. Processing multi-gigabyte CSV files that exceed available system RAM through streaming and Polars engines.
3. Automating complex data joins and pivots on 100MB+ files while maintaining high throughput and low memory usage.
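Whether Parquet acceleration pays off for repeated queries is an amortization question: the one-time conversion cost must be recovered by the per-query savings. The timings below are illustrative assumptions; real numbers depend on the data, qsv, and DuckDB.

```python
# Back-of-the-envelope sketch of when one-time Parquet conversion pays
# off for repeated SQL queries. All costs are illustrative assumptions.
def parquet_pays_off(convert_cost_s: float, csv_query_s: float,
                     parquet_query_s: float, num_queries: int) -> bool:
    """True if converting to Parquet is cheaper over num_queries runs."""
    parquet_total = convert_cost_s + num_queries * parquet_query_s
    csv_total = num_queries * csv_query_s
    return parquet_total < csv_total

# e.g. 30 s conversion, 12 s per CSV query vs. 2 s per Parquet query:
print(parquet_pays_off(30.0, 12.0, 2.0, 2))  # -> False (34 s vs. 24 s)
print(parquet_pays_off(30.0, 12.0, 2.0, 5))  # -> True  (40 s vs. 60 s)
```

This is why the skill reserves Parquet conversion for repeated analytical queries: a one-off query is usually faster straight from the CSV.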