Advanced deduplication via hash, simhash, and embeddings
2 GitHub stars
Multi-provider LLM support including OpenAI, Azure, and local APIs
Automated data validation for JSON schemas and conversation formats
Interactive Q&A for rapid pipeline prototyping
Multi-turn conversation and function-calling dataset generation
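
As a rough illustration of the deduplication feature listed above, the sketch below chains an exact-hash check with a simhash near-duplicate check. It is not this project's implementation: the function names (`exact_key`, `simhash`, `dedupe`), the 64-bit fingerprint size, and the `max_distance` threshold are all assumptions chosen for the example, and the embedding-based stage is left out because it would require a specific embedding model.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumed for this sketch)


def tokens(text: str) -> list[str]:
    # Lowercase and split into word tokens, dropping punctuation and extra spaces.
    return re.findall(r"\w+", text.lower())


def exact_key(text: str) -> str:
    # Exact-duplicate key: SHA-256 of the normalized token sequence.
    return hashlib.sha256(" ".join(tokens(text)).encode("utf-8")).hexdigest()


def simhash(text: str) -> int:
    # Each token votes +1/-1 on every bit position; the sign of the total
    # decides the corresponding bit of the fingerprint.
    counts = [0] * BITS
    for tok in tokens(text):
        h = int.from_bytes(hashlib.md5(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    # Number of bit positions where the two fingerprints differ.
    return bin(a ^ b).count("1")


def dedupe(records: list[str], max_distance: int = 3) -> list[str]:
    # Keep the first occurrence of each record; drop exact duplicates and
    # records whose simhash is within max_distance bits of a kept record.
    seen_keys: set[str] = set()
    kept_fingerprints: list[int] = []
    kept: list[str] = []
    for rec in records:
        key = exact_key(rec)
        if key in seen_keys:
            continue
        fp = simhash(rec)
        if any(hamming(fp, other) <= max_distance for other in kept_fingerprints):
            continue
        seen_keys.add(key)
        kept_fingerprints.append(fp)
        kept.append(rec)
    return kept
```

The pairwise fingerprint comparison is quadratic in the number of kept records, which is fine for small datasets; a larger pipeline would typically bucket fingerprints by bit bands before comparing.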