Does this pattern support large files?

Yes. The pattern reads files in chunks when computing SHA-256 hashes, so even very large files are never loaded into memory in full.
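
As a rough illustration of the chunked reading described above, here is a minimal Python sketch. The function name `compute_file_hash` and the 64 KiB chunk size are assumptions made for this example; the listing does not specify either.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # assumed chunk size; the listing does not specify one


def compute_file_hash(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the SHA-256 hex digest of a file, reading it in fixed-size chunks."""
    sha256 = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()
```

Memory use is bounded by the chunk size rather than the file size, and the digest is identical to hashing the whole file at once.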

Why use content hashes instead of file paths for caching?

Content hashes keep cache entries valid when files are moved or renamed, because the key depends only on the bytes. Any change to the content produces a different hash, so the old entry is no longer matched and stale results are never served.
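
The difference from path-based keys can be seen in a small experiment. This is only a sketch: `content_key` is a hypothetical helper, and it reads the whole file at once to keep the demo short, which is suitable for tiny files only.

```python
import hashlib
import tempfile
from pathlib import Path


def content_key(path: Path) -> str:
    # Whole-file read keeps the demo short; fine for tiny files only.
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "report.txt"
    original.write_text("quarterly numbers")
    key_before = content_key(original)

    # Renaming changes the path but not the bytes, so the key is unchanged.
    renamed = original.rename(Path(tmp) / "report-final.txt")
    key_after_rename = content_key(renamed)

    # Editing the content yields a different key, so the old entry is not matched.
    renamed.write_text("quarterly numbers, revised")
    key_after_edit = content_key(renamed)

print(key_before == key_after_rename)  # True
print(key_before == key_after_edit)    # False
```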

Is it difficult to integrate with existing code?

No. A service-layer wrapper adds caching around existing pure functions without changing their internal logic, in keeping with the Single Responsibility Principle.
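
One way such a wrapper could look is sketched below. All names here (`word_count`, `process_with_cache`, the `<hash>.json` file layout) are assumptions for illustration, not the skill's actual API. The point is that the processing function stays unaware of the cache.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def word_count(path: Path) -> dict:
    """Hypothetical pure processing function: knows nothing about caching."""
    return {"words": len(path.read_text().split())}


def process_with_cache(
    path: Path, cache_dir: Path, process: Callable[[Path], dict]
) -> dict:
    """Service-layer wrapper: check the cache, else call `process` and store."""
    # Whole-file read for brevity; a real wrapper would hash in chunks.
    key = hashlib.sha256(path.read_bytes()).hexdigest()
    cache_file = cache_dir / f"{key}.json"  # assumed hash-named layout
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    result = process(path)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(result))
    return result


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "notes.txt"
    doc.write_text("one two three")
    cache = Path(tmp) / "cache"
    first = process_with_cache(doc, cache, word_count)   # computed, then stored
    second = process_with_cache(doc, cache, word_count)  # served from the cache
    cached_files = len(list(cache.iterdir()))
```

Because the wrapper takes the processing function as an argument, the same caching code can serve any function that maps a file to a JSON-serializable result.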

What happens if a cache file becomes corrupted?

The pattern handles corruption gracefully: an invalid or unreadable JSON cache file is treated as a cache miss, and the source file is processed again.
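
A tolerant cache read along these lines might be written as follows. The helper name `read_cache` and the rule that a valid entry must be a JSON object are assumptions made for this sketch.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def read_cache(cache_file: Path) -> Optional[dict]:
    """Return the cached result, or None when the entry is missing or unusable."""
    try:
        data = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError: missing or unreadable file.
        # ValueError: covers json.JSONDecodeError and UnicodeDecodeError.
        return None
    # Assumption: valid entries are JSON objects; anything else is a miss.
    return data if isinstance(data, dict) else None


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.json"
    good.write_text('{"pages": 12}', encoding="utf-8")
    bad = Path(tmp) / "bad.json"
    bad.write_text('{"pages": 1', encoding="utf-8")  # truncated JSON

    good_result = read_cache(good)
    bad_result = read_cache(bad)
    missing_result = read_cache(Path(tmp) / "absent.json")
```

A caller that receives `None` simply re-runs the processing function and overwrites the entry, so a damaged cache file costs one extra computation rather than an error.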

Content-Hash File Caching

Author: flatrick

Category: Developer Tools

Optimizes file processing performance by caching results based on SHA-256 content hashes rather than file paths.

The Content-Hash Cache Pattern caches the results of expensive file processing, such as PDF extraction or image analysis, under the SHA-256 hash of each file's content. Unlike path-based caching, moving, renaming, or duplicating a file does not trigger reprocessing, and any change to the content invalidates the cached result automatically. A service-layer architecture keeps the core processing logic pure and separate from the caching mechanism, which makes workflows for Claude-powered agents and CLI tools faster, more reliable, and cheaper to run.

Key Features

01 Memory-efficient chunked hashing for large file processing

02 SHA-256 content-based identity for path-independent caching

03 Automatic cache invalidation triggered by content changes

04 Service layer separation to maintain pure processing functions

05 High-performance O(1) lookup using hash-named storage

Use Cases

01 Adding persistent caching to CLI tools and data processing scripts

02 Accelerating PDF, OCR, and document extraction pipelines

03 Reducing API costs for repetitive image and text analysis tasks

Install with 🐟 Skill.Fish

npx skillfish add flatrick/mdt content-hash-cache-pattern

For use in Claude.ai and ChatGPT
