About
The LoCoMo Benchmark skill evaluates long-term conversational memory within the cc-soul ecosystem. It automates the ingestion of multi-session conversation data, extracting observations and speaker facts, and then runs QA tests against the resulting memory. It measures retrieval accuracy across multi-hop, temporal, and adversarial question categories and reports a standardized F1 score that developers can compare against human baselines and state-of-the-art models like GPT-4.
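To make the per-category F1 score concrete, here is a minimal sketch of how such a score could be computed. It assumes a token-overlap F1 in the style commonly used for QA benchmarks; the skill's actual scoring rules may differ. The function names (`normalize`, `token_f1`, `f1_by_category`) and the `(category, prediction, reference)` tuple layout are hypothetical and not taken from the skill itself.

```python
import re
import string
from collections import Counter, defaultdict


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def f1_by_category(results):
    """Average F1 per category from (category, prediction, reference) tuples."""
    buckets = defaultdict(list)
    for category, prediction, reference in results:
        buckets[category].append(token_f1(prediction, reference))
    return {cat: sum(scores) / len(scores) for cat, scores in buckets.items()}
```

Reporting one average per category, rather than a single pooled number, keeps a weak category (for example, temporal questions) from being hidden by strong results elsewhere.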