Does this skill work with any programming language?

Yes. Although it includes specific optimizations for languages like MoonBit and TypeScript, its core similarity strategies work on most text-based source files.

When should I use explore versus plan refactor?

Use 'explore' for a quick initial scan to find clusters of similar code; once duplicates are identified, use 'plan refactor' for more actionable, granular details on how to fix them.

Can I exclude specific files like configuration or build artifacts?

Yes, you can use the --exclude flag with glob patterns to filter out noise like 'node_modules', 'moon.pkg', or other config files that might skew similarity scores.
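To make the idea of glob-based exclusion concrete, here is a minimal Python sketch of how a list of patterns can filter a file list. It illustrates the general technique only and is not indexion-explore's implementation; the helper names `is_excluded` and `filter_paths` are hypothetical, and the tool's actual matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the whole path or any single path component matches a pattern.

    Assumption: a bare name such as 'node_modules' should exclude everything
    beneath that directory. The real tool may interpret patterns differently.
    """
    parts = PurePosixPath(path).parts
    return any(
        fnmatch(path, pattern) or any(fnmatch(part, pattern) for part in parts)
        for pattern in patterns
    )


def filter_paths(paths: list[str], patterns: list[str]) -> list[str]:
    """Keep only the paths that no exclusion pattern matches."""
    return [p for p in paths if not is_excluded(p, patterns)]


if __name__ == "__main__":
    candidates = [
        "src/main.mbt",
        "node_modules/lib/index.js",
        "src/moon.pkg",
        "src/util.ts",
    ]
    print(filter_paths(candidates, ["node_modules", "moon.pkg"]))
```

Filtering before scoring keeps generated or vendored files out of the comparison entirely, so they cannot inflate similarity scores.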

How can I find exact duplicates in my repository?

Run the command with the --format=list option and set a high --threshold (e.g., 0.95 or 1.0) to focus on identical or near-identical files.
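The effect of a high threshold can be sketched in a few lines of Python. The example below uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity score; this is an assumption made for illustration and not one of the strategies the skill lists. The function name `similar_pairs` and the sample file names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(files: dict[str, str], threshold: float = 0.95):
    """Return (name_a, name_b, score) for every pair scoring at or above threshold.

    The score is difflib's ratio in [0, 1], used here only as a placeholder
    for whatever similarity strategy the real tool applies.
    """
    pairs = []
    for (name_a, text_a), (name_b, text_b) in combinations(files.items(), 2):
        score = SequenceMatcher(None, text_a, text_b).ratio()
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Sort so the most similar pairs come first, as a sorted list would.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


if __name__ == "__main__":
    sample = {
        "a.ts": "export const x = 1;\n",
        "b.ts": "export const x = 1;\n",
        "c.ts": "function unrelated() { return 42; }\n",
    }
    for name_a, name_b, score in similar_pairs(sample, threshold=1.0):
        print(f"{name_a} <-> {name_b}: {score:.2f}")
```

With the threshold at 1.0 only byte-identical content survives the filter; lowering it toward 0.95 admits near-identical files as well.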

What algorithms does indexion-explore use for analysis?

It supports a variety of strategies including TF-IDF, BM25, Jensen-Shannon Divergence, and precise function-level Tree Edit Distance (APTED/TSED).
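As a rough illustration of the token-based end of that range, the following Python sketch computes TF-IDF vectors and compares them with cosine similarity. It is a simplified sketch under stated assumptions, not the tool's implementation: the identifier-only tokenizer, the smoothed IDF formula, and the function names `tokenize`, `tfidf_vectors`, and `cosine` are all choices made here for illustration.

```python
import math
import re
from collections import Counter

# Assumption: identifiers and keywords are the only tokens that matter.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(text: str) -> list[str]:
    """Split source text into identifier-like tokens."""
    return TOKEN.findall(text)


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total = len(docs)
    doc_freq = Counter()
    for counter in counts.values():
        doc_freq.update(counter.keys())
    # Smoothed IDF variant, so a term present in every document keeps a nonzero weight.
    idf = {term: math.log((1 + total) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return {
        name: {term: tf * idf[term] for term, tf in counter.items()}
        for name, counter in counts.items()
    }


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    vectors = tfidf_vectors({
        "a": "let total = price * count",
        "b": "let total = price * count",
        "c": "fn render(view) { draw(view) }",
    })
    print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

Token-weighting approaches like this ignore code structure, which is why a tree edit distance strategy is the more precise option at function level.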

Code Similarity Explorer

Name: Code Similarity Explorer
Author: trkbt10

Developer Tools

Analyzes codebases to detect duplicate files, find similar logic, and visualize code overlap for refactoring.

indexion-explore is a codebase analysis tool that identifies redundancy and structural similarity within a project. Using strategies that range from fast TF-IDF token matching to precise tree edit distance algorithms, it helps developers pinpoint copy-pasted code and related modules before refactoring. Its output formats, including similarity matrices and clusters, make it easier to inspect architectural overlap and to keep code clean and maintainable.
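One way to picture how pairwise scores turn into clusters is the Python sketch below, which groups files whose similarity meets a threshold by using a union-find structure. Treating clusters as connected components (single linkage) is an assumption made here for illustration; the tool's actual clustering method is not described in this listing, and the name `cluster` is hypothetical.

```python
def cluster(names: list[str], pairs: list[tuple[str, str, float]], threshold: float):
    """Group names into clusters connected by pairs scoring at or above threshold.

    Assumption: single-linkage grouping, so a chain a~b, b~c puts a, b, and c
    in one cluster even if a and c were never compared directly.
    """
    parent = {name: name for name in names}

    def find(item: str) -> str:
        # Walk up to the root, shortening the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for name_a, name_b, score in pairs:
        if score >= threshold:
            root_a, root_b = find(name_a), find(name_b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Singletons are not clusters of similar code, so they are dropped.
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    scores = [("a", "b", 0.9), ("b", "c", 0.85), ("c", "d", 0.2)]
    print(cluster(["a", "b", "c", "d"], scores, threshold=0.8))
```

Raising the threshold splits large groups into tighter ones, which is a useful knob when a first pass lumps loosely related modules together.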

Key Features

01 Hybrid analysis mode that automatically balances speed and precision based on dataset size

02 Multiple similarity algorithms including TF-IDF, BM25, and Tree Edit Distance (APTED)

03 Configurable similarity thresholds to filter results and reduce noise

04 Flexible output formats including similarity matrices, sorted lists, and clusters

05 Advanced file filtering using glob patterns and extension-specific includes/excludes

Use Cases

01 Auditing a codebase for copy-pasted logic during large-scale refactoring sessions

02 Identifying and merging duplicated utility functions or components across large projects

03 Visualizing architectural relationships and structural overlap between different modules

Install with 🐟 Skill.Fish

npx skillfish add trkbt10/indexion-skills indexion-explore

For use in Claude.ai and ChatGPT
