Why use content hashes instead of file paths for caching?

Because the cache key is derived from the file's bytes rather than its location, renaming or moving a file leaves its cache entry valid. Any change to the content produces a different key, so the stale entry is simply never looked up again.
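A minimal sketch of the idea, assuming the key is the hex SHA-256 digest of the whole file (the helper name `content_key` and the file names are hypothetical). It reads the file in one call for brevity; large files would be hashed in chunks instead:

```python
import hashlib
import os
import tempfile


def content_key(path: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of a file's bytes, used as the cache key."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


# Demonstrate path independence and invalidation inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.pdf")
    with open(original, "wb") as f:
        f.write(b"%PDF-1.4 example bytes")
    key_before = content_key(original)

    # Moving/renaming does not touch the bytes, so the key is unchanged.
    moved = os.path.join(tmp, "archive-report.pdf")
    os.rename(original, moved)
    key_after_move = content_key(moved)

    # Editing the content yields a new key, so the old entry is never hit again.
    with open(moved, "ab") as f:
        f.write(b" edited")
    key_after_edit = content_key(moved)

print(key_before == key_after_move)  # True
print(key_before == key_after_edit)  # False
```

Invalidation needs no explicit delete step: an outdated entry just stops being addressed by any current file.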

What happens if a cache file becomes corrupted?

The pattern follows a graceful degradation strategy where corrupted or unreadable JSON cache files are treated as cache misses, triggering a fresh processing run.
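A minimal sketch of that fallback, assuming one JSON file per cache entry (the function names and the entry file name are hypothetical). Any read or parse failure returns `None`, which the caller treats as a miss. The write-then-`os.replace` step is an extra precaution added here, not something the description states:

```python
import json
import os
import tempfile
from typing import Any, Callable, Optional


def load_cached(cache_path: str) -> Optional[Any]:
    """Return the parsed entry, or None (a miss) if it is missing or unreadable."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # OSError: missing/unreadable file. ValueError covers json.JSONDecodeError
        # and UnicodeDecodeError, i.e. truncated or corrupted contents.
        return None


def get_or_compute(cache_path: str, compute: Callable[[], Any]) -> Any:
    cached = load_cached(cache_path)
    if cached is not None:  # note: a stored JSON null would also read as a miss
        return cached
    result = compute()
    # Write to a temp file, then swap it in atomically so a crash mid-write
    # cannot leave a half-written entry behind.
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(result, f)
    os.replace(tmp_path, cache_path)
    return result


calls = []


def extract_text() -> dict:
    """Hypothetical expensive step; records each real invocation."""
    calls.append(1)
    return {"text": "fresh"}


with tempfile.TemporaryDirectory() as tmp:
    entry = os.path.join(tmp, "3f2a.json")
    with open(entry, "w", encoding="utf-8") as f:
        f.write('{"text": "trunc')  # simulate a truncated, corrupted cache file
    first = get_or_compute(entry, extract_text)   # corrupted -> miss -> recompute
    second = get_or_compute(entry, extract_text)  # valid entry -> hit

print(first, second, len(calls))  # {'text': 'fresh'} {'text': 'fresh'} 1
```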

How does this skill handle large files?

The pattern reads files in fixed-size chunks (typically 64 KB) while hashing, so memory use stays bounded even for very large files.
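A minimal sketch of chunked hashing with the standard `hashlib` module, assuming 64 KB reads (`hash_file` and `CHUNK_SIZE` are hypothetical names). The check at the end confirms the chunked digest matches a one-shot digest of the same bytes:

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # 64 KB per read


def hash_file(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """SHA-256 of a file, read in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Several full chunks plus a partial one must hash the same as a one-shot digest.
data = b"x" * (3 * CHUNK_SIZE + 123)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name
try:
    chunked = hash_file(path)
finally:
    os.remove(path)

print(chunked == hashlib.sha256(data).hexdigest())  # True
```

On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` performs the same buffered reading for an open binary file.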

Is this pattern compatible with existing codebases?

Yes, it uses a service layer wrapper that allows you to add caching to existing pure functions without modifying their internal logic.
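A minimal sketch of such a wrapper, assuming JSON-serializable results and one cache file per content hash. `CachedProcessor` and `extract_word_count` are hypothetical names; the point is that the processing function contains no caching code at all:

```python
import hashlib
import json
import os
import tempfile
from typing import Any, Callable


def extract_word_count(path: str) -> dict:
    """Hypothetical pure processing function: knows nothing about caching."""
    with open(path, "r", encoding="utf-8") as f:
        return {"words": len(f.read().split())}


class CachedProcessor:
    """Hypothetical service layer adding content-hash caching around a pure function."""

    def __init__(self, func: Callable[[str], Any], cache_dir: str) -> None:
        self.func = func
        self.cache_dir = cache_dir
        self.misses = 0  # how many times the wrapped function actually ran

    def _key(self, path: str) -> str:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(64 * 1024), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def process(self, path: str) -> Any:
        cache_path = os.path.join(self.cache_dir, self._key(path) + ".json")
        try:
            with open(cache_path, "r", encoding="utf-8") as f:
                return json.load(f)
        except (OSError, ValueError):
            pass  # missing or corrupted entry: fall through and recompute
        result = self.func(path)
        self.misses += 1
        os.makedirs(self.cache_dir, exist_ok=True)  # create the cache dir on demand
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(result, f)
        return result


with tempfile.TemporaryDirectory() as tmp:
    doc = os.path.join(tmp, "notes.txt")
    with open(doc, "w", encoding="utf-8") as f:
        f.write("content hash caching demo")
    service = CachedProcessor(extract_word_count, os.path.join(tmp, "cache"))
    first = service.process(doc)
    second = service.process(doc)  # served from the cache

print(first, second, service.misses)  # {'words': 4} {'words': 4} 1
```

Because the wrapper only needs a callable that takes a path, existing functions can be passed in unchanged.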

Content-Hash File Caching

Name: Content-Hash File Caching
Author: flatrick

Category: Developer Tools

Implements high-performance, content-aware file caching to optimize expensive data processing and extraction tasks.

This skill provides a robust architecture for caching the results of computationally intensive file operations such as PDF parsing, OCR, and image analysis. By using SHA-256 hashes of file contents rather than file paths as cache keys, it ensures that moved or renamed files do not trigger unnecessary re-processing, while any change to a file's contents automatically invalidates its cached result. Caching is handled in a separate service layer, so the core processing functions stay pure and the overall architecture remains maintainable and scalable.

Key Features

01 Path-independent caching using SHA-256 content hashes

02 Automatic cache invalidation on file modification

03 O(1) lookup performance without centralized index files

04 Service layer pattern for clean separation of concerns

05 Graceful corruption handling and lazy directory management

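The lookup and lazy-directory features can be sketched as follows, assuming each entry is stored at a path computed directly from its hash. The two-character shard directory is an assumption added here (a common way to keep directories small), and `HashDirCache` is a hypothetical name. A lookup is one path computation plus one file open, so its cost does not grow with the number of cached entries; hashing the input file itself is still proportional to its size:

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


class HashDirCache:
    """Hypothetical store: each entry lives at a path derived from its hash, no index file."""

    def __init__(self, root: Path) -> None:
        self.root = root  # not created here; directories appear on first write

    def _entry_path(self, key: str) -> Path:
        # Shard by the first two hex characters (an assumption) to keep directories small.
        return self.root / key[:2] / f"{key}.json"

    def get(self, key: str) -> Optional[Any]:
        try:
            return json.loads(self._entry_path(key).read_text(encoding="utf-8"))
        except (OSError, ValueError):
            return None  # absent or corrupted entry -> miss

    def put(self, key: str, value: Any) -> None:
        path = self._entry_path(key)
        path.parent.mkdir(parents=True, exist_ok=True)  # lazy directory creation
        path.write_text(json.dumps(value), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = HashDirCache(Path(tmp) / "cache")
    existed_before = cache.root.exists()
    key = hashlib.sha256(b"file bytes").hexdigest()
    miss = cache.get(key)
    cache.put(key, {"pages": 3})
    hit = cache.get(key)
    existed_after = cache.root.exists()

print(existed_before, miss, hit, existed_after)  # False None {'pages': 3} True
```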

Use Cases

01 Optimizing PDF and text extraction pipelines in AI applications

02 Reducing API costs and latency in batch file processing workflows

03 Speeding up image analysis and metadata generation tools