01Automatic format detection for Alpaca, ShareGPT, ChatML, and Raw text
02Chinchilla optimality calculations for specific model sizes and LoRA ranks
03Seamless context handoff to training workflows like funsloth-train
044 GitHub stars
05Comprehensive token statistics including mean, median, and outlier detection
06Schema validation with actionable suggestions for data fixes