01 Automated train/validation/test splitting with custom rule-based logic
02 Multi-stage temporal encoding and feature transformation pipelines
03 Centralized vocabulary management for multi-source feature sets
04 Extensible builder pattern for creating custom transform and split functions
05 0 GitHub stars
06 Native integration with HuggingFace DatasetDict and Parquet storage
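
As an illustration of the rule-based splitting and builder-pattern items above, here is a minimal sketch in plain Python. It is not the project's actual API: `make_date_split`, `split_records`, and the `date` field name are hypothetical and chosen only for this example. The builder returns a split function, and that function assigns each record to a train, validation, or test split by its date.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
SplitRule = Callable[[Record], str]

SPLIT_NAMES = ("train", "validation", "test")


def make_date_split(train_end: date, validation_end: date,
                    field: str = "date") -> SplitRule:
    """Hypothetical builder: returns a rule that picks a split from a record's date."""
    if train_end >= validation_end:
        raise ValueError("train_end must be earlier than validation_end")

    def rule(record: Record) -> str:
        day = record[field]
        if day <= train_end:        # boundary dates are inclusive
            return "train"
        if day <= validation_end:
            return "validation"
        return "test"

    return rule


def split_records(records: Iterable[Record],
                  rule: SplitRule) -> Dict[str, List[Record]]:
    """Apply a split rule to every record and group the records by split name."""
    splits: Dict[str, List[Record]] = {name: [] for name in SPLIT_NAMES}
    for record in records:
        name = rule(record)
        if name not in splits:
            raise ValueError(f"rule returned unknown split name: {name!r}")
        splits[name].append(record)
    return splits


# Example data, invented for this sketch.
records = [
    {"id": 1, "date": date(2023, 1, 15)},
    {"id": 2, "date": date(2023, 6, 30)},
    {"id": 3, "date": date(2023, 8, 1)},
    {"id": 4, "date": date(2023, 11, 20)},
]

rule = make_date_split(train_end=date(2023, 6, 30),
                       validation_end=date(2023, 9, 30))
splits = split_records(records, rule)
```

Assuming the `datasets` package is installed, each list could then be wrapped with `datasets.Dataset.from_list`, collected into a `datasets.DatasetDict`, and written out with `Dataset.to_parquet`. How the project itself wires these steps together is not shown in the list above.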