Twitter Screenshot Archive organizes and analyzes personal collections of Twitter screenshots. It runs OCR on each image to extract searchable text, then stores and indexes that text in PostgreSQL. An optional Flask web UI lets users explore the archive with fuzzy search, stemming, and similarity-based browsing. For deeper analysis, an MCP server exposes the archive to LLMs through specialized tools for semantic search, topic clustering, and discourse tracing, turning unstructured screenshots into a queryable knowledge base.
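As a rough illustration of what semantic search over an archive involves, the sketch below ranks stored tweets by cosine similarity between embedding vectors. It is not the project's code: the function names `cosine_similarity` and `rank_by_similarity`, the in-memory `archive` list, and the three-dimensional toy vectors are all assumptions made for the example. In practice the vectors would come from an embedding model and would be much longer.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, archive, top_k=3):
    """Return the top_k (score, text) pairs, most similar first.

    `archive` is a hypothetical list of (text, embedding) pairs.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in archive]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy data: hand-written vectors standing in for real embeddings.
archive = [
    ("thread about database indexing", [0.9, 0.1, 0.0]),
    ("joke about cats", [0.0, 0.2, 0.9]),
    ("post on query planners", [0.8, 0.3, 0.1]),
]
results = rank_by_similarity([1.0, 0.0, 0.0], archive, top_k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A linear scan like this is adequate for small personal archives; larger collections would usually push the nearest-neighbor search into the database or a dedicated vector index.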
Key Features
01 OCR-based ingestion of Twitter screenshots
02 Full-text search with fuzzy matching, stemming, and boolean syntax
03 Semantic search powered by LLM embeddings (via LM Studio)
04 Topic clustering and discourse tracing (PCA, HDBSCAN)
05 Related tweet discovery using MinHash/LSH for lexical similarity
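The MinHash/LSH technique named in the feature list can be sketched with the standard library alone. This is a minimal illustration of the general method, not the project's implementation: the helper names (`shingles`, `minhash_signature`, `estimated_jaccard`, `lsh_buckets`), the 64-hash signature length, the 3-word shingles, and the 16-band split are all assumptions chosen for the example.

```python
import hashlib
import re

NUM_HASHES = 64  # assumed signature length; longer signatures give better estimates


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text (whole text if shorter than k words)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """64-bit hash of a token; the seed is used as a salt to get independent hash functions."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    tokens = shingles(text)
    if not tokens:
        return [0] * num_hashes  # placeholder signature for empty text
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which approximates the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split a signature into bands; texts sharing any (band, rows) key become candidates."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


a = minhash_signature("the quick brown fox jumps over the lazy dog near the river bank")
b = minhash_signature("the quick brown fox jumps over the lazy dog near the river shore")
c = minhash_signature("completely unrelated words about postgres indexes and query plans")
print(estimated_jaccard(a, b), estimated_jaccard(a, c))
```

The band keys from `lsh_buckets` could be stored in an indexed table so that candidate lookups avoid comparing every pair of tweets; the exact similarity estimate is then computed only for candidates that share a bucket.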