Optimizes Parquet file operations in Rust to improve query performance, reduce storage costs, and prevent memory issues.
This skill provides domain-specific expertise for Rust developers working with Apache Parquet files, specifically targeting the arrow-rs ecosystem (the `arrow` and `parquet` crates). It proactively identifies inefficient reading and writing patterns—such as missing compression, suboptimal row group sizing, and lack of column projection—and suggests high-performance alternatives. Whether you are building a data lake on S3 or a local analytics engine, this skill helps your Parquet implementation apply best practices like ZSTD compression, dictionary encoding, and memory-efficient streaming to maximize throughput and minimize resource consumption.
Key Features
1. Row group sizing recommendations tailored for cloud storage (S3) scanning
2. Automated analysis of WriterProperties for compression and encoding settings
3. Memory-efficient streaming patterns to prevent OOM errors in large datasets
4. Column projection and predicate pushdown optimization for faster data retrieval
5. Column-specific encoding suggestions based on data cardinality
Use Cases
1. Troubleshooting slow analytical queries in Rust-based data engines like DataFusion
2. Building production-grade data lakes on AWS S3 with optimized ZSTD storage
3. Implementing memory-safe Parquet readers for high-throughput data pipelines