Normalizes and merges duplicate data from multiple sources using reputation scoring and semantic hash-based grouping.
This skill handles overlapping records in multi-source environments such as news aggregators, product catalogs, and event feeds. Rather than matching on URLs alone, it groups records by semantic similarity, scores each source's reputation, and selects a canonical version for each group. Hash-based grouping and customizable preference logic let your application present the most authoritative and complete version of each record, and the skill reports metrics on how much the data was reduced.
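As a rough illustration of hash-based grouping, the sketch below derives a grouping key from a normalized title and buckets records that share it. The listing does not show the skill's real key function, so `SourceRecord`, `normalizeTokens`, `fnv1a32`, `contentKey`, and `groupByContentKey` are all hypothetical names. The choice of FNV-1a and of the title as the only input is an assumption. This only approximates semantic grouping: records match when their normalized token sets are identical.

```typescript
// Hypothetical record shape; the skill's real types are not shown in the listing.
interface SourceRecord {
  id: string;
  source: string;
  title: string;
}

// Lowercase, drop punctuation, and return the sorted set of unique tokens so that
// case, spacing, punctuation, and word order do not affect the key.
// ASCII-only for brevity: non-ASCII characters are treated as separators.
function normalizeTokens(text: string): string[] {
  const tokens = text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0);
  return Array.from(new Set(tokens)).sort();
}

// 32-bit FNV-1a over the string's UTF-16 code units, returned as 8 hex digits.
// Assumed hash choice; any stable hash would serve the same purpose.
function fnv1a32(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Grouping key for one record: hash of its normalized title tokens.
function contentKey(record: SourceRecord): string {
  return fnv1a32(normalizeTokens(record.title).join(" "));
}

// Bucket records by content key; each bucket holds likely duplicates.
function groupByContentKey(records: SourceRecord[]): Map<string, SourceRecord[]> {
  const groups = new Map<string, SourceRecord[]>();
  records.forEach((record) => {
    const key = contentKey(record);
    const bucket = groups.get(key);
    if (bucket) {
      bucket.push(record);
    } else {
      groups.set(key, [record]);
    }
  });
  return groups;
}
```

Sorting and de-duplicating tokens before hashing makes the key insensitive to word order, at the cost of occasionally merging titles that differ only in order. A production system would likely hash more fields or use a similarity measure instead of exact key equality.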
Key Features
01 585 GitHub stars
02 ID-based conflict resolution with customizable preference logic
03 Automated deduplication metrics including reduction percentage tracking
04 Tiered source reputation scoring for authoritative canonical selection
05 Flexible TypeScript implementation for complex multi-source data aggregation
06 Content-based semantic grouping via hash-based key generation
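The reputation, preference, and metrics features listed above could fit together roughly as follows. This is a minimal sketch, not the skill's actual API: the tier names and values, the `CandidateRecord` shape, and the functions `reputationOf`, `completeness`, `selectCanonical`, and `dedupMetrics` are all assumptions made for illustration.

```typescript
// Hypothetical record shape; not taken from the skill itself.
interface CandidateRecord {
  id: string;
  source: string;
  fields: Record<string, string | undefined>;
}

// Assumed reputation tiers: a higher number means a more authoritative source.
const REPUTATION_TIERS = new Map<string, number>([
  ["wire-service", 3],
  ["national-outlet", 2],
  ["blog", 1],
]);

// Unknown sources fall back to the lowest score, 0.
function reputationOf(source: string): number {
  return REPUTATION_TIERS.get(source) ?? 0;
}

// Count fields that carry a non-empty value.
function completeness(record: CandidateRecord): number {
  return Object.keys(record.fields).filter((key) => {
    const value = record.fields[key];
    return value !== undefined && value.trim() !== "";
  }).length;
}

// A preference is a comparator: a negative result means `a` is preferred over `b`.
type Preference = (a: CandidateRecord, b: CandidateRecord) => number;

// Default order: reputation first, then completeness, then id as a stable tie-break.
const defaultPreference: Preference = (a, b) =>
  reputationOf(b.source) - reputationOf(a.source) ||
  completeness(b) - completeness(a) ||
  a.id.localeCompare(b.id);

// Pick the canonical record from one group of duplicates.
function selectCanonical(
  group: CandidateRecord[],
  prefer: Preference = defaultPreference
): CandidateRecord {
  if (group.length === 0) {
    throw new Error("cannot select a canonical record from an empty group");
  }
  return group.reduce((best, current) => (prefer(current, best) < 0 ? current : best));
}

interface DedupMetrics {
  inputCount: number;
  outputCount: number;
  duplicatesRemoved: number;
  reductionPercent: number;
}

// Summarize how much the record set shrank after deduplication.
function dedupMetrics(inputCount: number, outputCount: number): DedupMetrics {
  const duplicatesRemoved = inputCount - outputCount;
  const reductionPercent = inputCount === 0 ? 0 : (duplicatesRemoved / inputCount) * 100;
  return { inputCount, outputCount, duplicatesRemoved, reductionPercent };
}
```

Passing a different comparator as `prefer` is one way to express custom preference logic, for example ranking completeness ahead of reputation for catalogs where missing fields matter more than the source.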
Use Cases
01 Aggregating news stories from different outlets to display a single authoritative article
02 Cleaning event data streams where multiple sensors or APIs report the same incident
03 Merging product listings from multiple e-commerce vendors into a unified catalog