- Hybrid analysis mode that automatically balances speed and precision based on dataset size
- Multiple similarity algorithms, including TF-IDF, BM25, and Tree Edit Distance (APTED)
- Configurable similarity thresholds to filter results and reduce noise
- Flexible output formats, including similarity matrices, sorted lists, and clusters
- Advanced file filtering using glob patterns and extension-specific includes/excludes
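The feature list does not say how the hybrid mode chooses between speed and precision. One common pattern is sketched below as an assumption, not as this tool's method: score every pair with a precise (slow) measure when the number of pairs is small, and otherwise use a cheap measure to shortlist candidate pairs first. The names `hybrid_pairs`, `max_exact_pairs`, and `prefilter` are hypothetical, and both cutoffs are illustrative.

```python
from itertools import combinations
from typing import Callable, Sequence

# A scorer maps two documents to a similarity in [0, 1] (assumed convention).
Scorer = Callable[[str, str], float]


def hybrid_pairs(
    docs: Sequence[str],
    fast: Scorer,
    precise: Scorer,
    max_exact_pairs: int = 1000,  # hypothetical cutoff for "small" datasets
    prefilter: float = 0.3,       # hypothetical shortlist threshold for the fast scorer
) -> dict[tuple[int, int], float]:
    """Return precise scores for either all pairs or a fast-prefiltered subset."""
    all_pairs = list(combinations(range(len(docs)), 2))
    if len(all_pairs) <= max_exact_pairs:
        # Small dataset: precision is affordable, so score every pair precisely.
        candidates = all_pairs
    else:
        # Large dataset: keep only pairs the cheap scorer considers plausible.
        candidates = [
            (i, j) for i, j in all_pairs if fast(docs[i], docs[j]) >= prefilter
        ]
    return {(i, j): precise(docs[i], docs[j]) for i, j in candidates}
```

The trade-off is that pairs rejected by the fast scorer are never seen by the precise one, so the prefilter threshold controls how much recall is exchanged for speed.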
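TF-IDF similarity usually means weighting each token by its frequency in a file times its rarity across all files, then comparing files by the cosine of their weight vectors. The sketch below builds such a similarity matrix in pure Python. It assumes simple lowercase word tokenization and uses the smoothed IDF formula that scikit-learn applies by default; the tool's actual tokenizer and weighting may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; a code-aware tool may tokenize differently.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many files each token appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so no weight is exactly zero.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    # Raw term count times IDF for every token in each file.
    return [{t: cnt * idf[t] for t, cnt in Counter(toks).items()} for toks in tokenized]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def similarity_matrix(docs: list[str]) -> list[list[float]]:
    vecs = tfidf_vectors(docs)
    return [[cosine(u, v) for v in vecs] for u in vecs]
```

For example, `similarity_matrix(["def add(a, b): return a + b", "def add(x, y): return x + y", "class Stack: pass"])` gives a positive score for the two `add` functions and 0.0 between either of them and the class, which shares no tokens.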
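BM25 scores a document against a query by summing, over the query terms, an IDF weight times a saturated term frequency that is normalized by document length. The sketch below implements standard Okapi BM25 with the common defaults `k1 = 1.5` and `b = 0.75` and the non-negative IDF variant used by Lucene. How the tool turns this asymmetric query-versus-document score into a pairwise file similarity is not stated, so treating one file as the query is an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens, as in the TF-IDF sketch.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of `query` against each document in `docs`."""
    corpus = [tokenize(d) for d in docs]
    n = len(corpus)
    avgdl = sum(len(toks) for toks in corpus) / n if n else 0.0
    df = Counter()
    for toks in corpus:
        df.update(set(toks))
    q_terms = tokenize(query)
    scores = []
    for toks in corpus:
        tf = Counter(toks)
        dl = len(toks)
        score = 0.0
        for term in q_terms:
            f = tf.get(term, 0)
            if f == 0:
                continue  # terms absent from this document contribute nothing
            # Non-negative IDF variant: log((n - df + 0.5) / (df + 0.5) + 1)
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) with length normalization (b).
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(score)
    return scores
```

Because BM25 is asymmetric, a pairwise matrix built this way would generally need to be symmetrized (for example by averaging both directions) before thresholding.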
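APTED computes the ordered tree edit distance: the minimum number of node deletions, insertions, and relabelings needed to turn one labeled tree (for example, a syntax tree) into another. APTED itself is an optimized algorithm; the sketch below is a deliberately naive, memoized forest recursion that computes the same unit-cost distance for small trees. It is not APTED. Trees are represented here, as an assumption, by `(label, children)` tuples.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees; tuples keep everything hashable for memoization.


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> int:
    if not f and not g:
        return 0
    if not f:
        # Only insertions remain: insert the rightmost root of g.
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    if not g:
        # Only deletions remain: delete the rightmost root of f.
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    (lv, kv), (lw, kw) = f[-1], g[-1]
    return min(
        forest_dist(f[:-1] + kv, g) + 1,  # delete root v, promoting its children
        forest_dist(f, g[:-1] + kw) + 1,  # insert root w, promoting its children
        # Map v to w: match their subtrees, then the remaining forests.
        forest_dist(kv, kw) + forest_dist(f[:-1], g[:-1]) + int(lv != lw),
    )


def tree_edit_distance(t1: tuple, t2: tuple) -> int:
    return forest_dist((t1,), (t2,))
```

For instance, `f(a, b)` and `f(a, c)` are one relabeling apart, so `tree_edit_distance(("f", (("a", ()), ("b", ()))), ("f", (("a", ()), ("c", ()))))` returns 1. The memo table can grow quickly on large trees, which is exactly the cost APTED is designed to avoid.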
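Thresholds, sorted lists, and clusters can all be derived from one similarity matrix. The sketch below keeps only pairs at or above a threshold, sorts them by descending score, and groups files into clusters as connected components of the "similar enough" graph using union-find (single-linkage clustering). The function names and the choice of single linkage are assumptions for illustration; the tool may cluster differently.

```python
def filter_pairs(names: list[str], matrix: list[list[float]], threshold: float):
    """Pairs (name_i, name_j, score) with score >= threshold, highest first."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle: each pair once
            if matrix[i][j] >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


def cluster(names: list[str], matrix: list[list[float]], threshold: float) -> list[list[str]]:
    """Single-linkage clusters: connected components of above-threshold pairs."""
    parent = list(range(len(names)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components
    groups: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    # Singletons are not reported as clusters.
    return [g for g in groups.values() if len(g) > 1]
```

Note that single linkage chains: with a threshold of 0.8, files A and C land in the same cluster if A~B and B~C both pass, even when A and C alone would not.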
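File selection with extension includes and glob includes/excludes can be expressed as a short chain of checks. The sketch below uses Python's standard `fnmatch` and `pathlib`; the parameter names `include_ext`, `include_globs`, and `exclude_globs` are hypothetical and not the tool's actual options. Be aware that `fnmatch`'s `*` also matches `/`, so `build/*` matches files in nested subdirectories too.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, Optional


def select_files(
    paths: Iterable[str],
    include_ext: Optional[set[str]] = None,  # e.g. {".py"}; None means any extension
    include_globs: Iterable[str] = (),       # if given, a path must match at least one
    exclude_globs: Iterable[str] = (),       # a path matching any of these is dropped
) -> list[str]:
    include_globs = list(include_globs)
    exclude_globs = list(exclude_globs)
    selected = []
    for p in paths:
        if include_ext and PurePosixPath(p).suffix not in include_ext:
            continue  # wrong extension
        if include_globs and not any(fnmatch(p, g) for g in include_globs):
            continue  # not in any included location
        if any(fnmatch(p, g) for g in exclude_globs):
            continue  # explicitly excluded; excludes win over includes
        selected.append(p)
    return selected
```

For example, `select_files(["src/app.py", "src/util.js", "tests/test_app.py", "build/gen.py"], include_ext={".py"}, exclude_globs=["build/*", "tests/*"])` keeps only `src/app.py`.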