Optimizes Parquet file operations in Rust to improve query performance, reduce storage costs, and prevent memory issues.
This skill provides domain-specific expertise for Rust developers working with Apache Parquet files, specifically targeting the arrow-rs ecosystem (the arrow and parquet crates). It proactively identifies inefficient reading and writing patterns—such as missing compression, suboptimal row group sizing, and lack of column projection—and suggests high-performance alternatives. Whether you are building a data lake on S3 or a local analytics engine, this skill ensures your Parquet implementation applies best practices like ZSTD compression, dictionary encoding, and memory-efficient streaming to maximize throughput and minimize resource consumption.
Key Features
- Row group sizing recommendations tailored for cloud storage (S3) scanning
- Automated analysis of WriterProperties for compression and encoding settings
- Memory-efficient streaming patterns to prevent OOM errors in large datasets
- Column projection and predicate pushdown optimization for faster data retrieval
- Column-specific encoding suggestions based on data cardinality
Use Cases
- Troubleshooting slow analytical queries in Rust-based data engines like DataFusion
- Building production-grade data lakes on AWS S3 with optimized ZSTD storage
- Implementing memory-safe Parquet readers for high-throughput data pipelines