Optimizes text parsing workflows by combining efficient Regex patterns with LLM-based validation for high-accuracy, cost-effective data extraction.
This skill provides a decision framework and a hybrid architecture for parsing structured text such as invoices, quizzes, and forms. It advocates a 'Regex-first' approach: deterministic patterns handle the consistently formatted majority of inputs, which cuts API costs because those documents never reach a model. A confidence-scoring layer then flags likely edge cases programmatically and routes only those to a lightweight LLM for validation, keeping accuracy high without paying for full LLM processing of every document. It suits developers building scalable data pipelines where speed and cost efficiency matter as much as reliability.
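The Regex-first flow with a confidence gate can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's own code. The invoice fields, regex patterns, penalty weights, the 0.8 threshold, and the `llm_validate` callable (a stand-in for whatever model client you use) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical invoice fields; each pattern captures the value in group 1.
PATTERNS = {
    "invoice_number": re.compile(r"\bInvoice\s*(?:No\.?|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE),
    "date": re.compile(r"\bDate\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


@dataclass
class ParseResult:
    fields: dict[str, Optional[str]]
    confidence: float
    issues: list[str] = field(default_factory=list)
    source: str = "regex"


def regex_extract(text: str) -> ParseResult:
    """Deterministic first pass: one regex per field, then a confidence score."""
    fields: dict[str, Optional[str]] = {}
    issues: list[str] = []
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        fields[name] = matches[0] if matches else None
        if not matches:
            issues.append(f"{name}: missing")
        elif len(matches) > 1:
            issues.append(f"{name}: ambiguous ({len(matches)} matches)")
    # Illustrative scoring: start at 1.0 and subtract a penalty per anomaly.
    confidence = 1.0
    for issue in issues:
        confidence -= 0.5 if issue.endswith("missing") else 0.25
    return ParseResult(fields, max(confidence, 0.0), issues)


def parse(text: str,
          llm_validate: Callable[[str, ParseResult], ParseResult],
          threshold: float = 0.8) -> ParseResult:
    """Return the regex result if confident; otherwise hand the draft to the LLM."""
    draft = regex_extract(text)
    if draft.confidence >= threshold:
        return draft
    return llm_validate(text, draft)


# Hypothetical stub in place of a real model call: it only relabels the draft.
def stub_llm(text: str, draft: ParseResult) -> ParseResult:
    return ParseResult(draft.fields, 1.0, draft.issues, source="llm")


clean = "Invoice #INV-001\nDate: 2024-03-01\nTotal: $1,250.00"
messy = "Invoice #INV-002\nTotal: $99.00"
print(parse(clean, stub_llm).source)  # regex
print(parse(messy, stub_llm).source)  # llm
```

Keeping the model behind a single callable means the deterministic path stays testable offline, and the threshold becomes the one knob that trades API cost against accuracy.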
Key Features
01. 323 GitHub stars
02. Python implementation patterns for reusable structured-data parsers
03. Programmatic confidence scoring to detect extraction anomalies
04. Decision framework for selecting between Regex and LLM methods
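The reusable-parser and decision-framework items in this list can be sketched together. Everything here is an illustrative assumption rather than the skill's API: `RegexParser`, `choose_strategy`, the quiz patterns, and the 0.95 / 0.6 cut-offs are hypothetical, and "coverage" (the share of fields a regex fills) is only a crude stand-in for a fuller confidence score.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegexParser:
    """Reusable parser: one compiled pattern per field; group 1 holds the value."""
    patterns: dict[str, re.Pattern[str]]

    @classmethod
    def from_spec(cls, spec: dict[str, str]) -> RegexParser:
        if not spec:
            raise ValueError("spec must define at least one field")
        flags = re.IGNORECASE | re.MULTILINE
        return cls({name: re.compile(p, flags) for name, p in spec.items()})

    def parse(self, text: str) -> dict[str, Optional[str]]:
        out: dict[str, Optional[str]] = {}
        for name, pattern in self.patterns.items():
            match = pattern.search(text)
            out[name] = match.group(1) if match else None
        return out

    def coverage(self, text: str) -> float:
        """Share of fields the regex pass filled: a crude confidence proxy."""
        values = self.parse(text).values()
        return sum(v is not None for v in values) / len(self.patterns)


def choose_strategy(parser: RegexParser, samples: list[str],
                    regex_only_at: float = 0.95,
                    hybrid_at: float = 0.6) -> str:
    """Pick a method from how often regex fully parses a representative sample."""
    if not samples:
        raise ValueError("need at least one sample document")
    fully_parsed = sum(parser.coverage(s) == 1.0 for s in samples) / len(samples)
    if fully_parsed >= regex_only_at:
        return "regex-only"
    if fully_parsed >= hybrid_at:
        return "hybrid"     # regex first, LLM only for low-confidence cases
    return "llm-first"      # formats too varied for patterns to pay off


quiz = RegexParser.from_spec({
    "question": r"^Q\d+[.:]\s*(.+)$",
    "answer": r"^Answer:\s*([A-D])\b",
})
samples = [
    "Q1. What is 2+2?\nA) 3\nB) 4\nAnswer: B",
    "Q2: Capital of France?\nAnswer: C",
    "Question two, free text with no answer key",
]
print(choose_strategy(quiz, samples))  # hybrid (2 of 3 fully parsed)
```

Measuring the full-parse rate on a representative sample before committing keeps the Regex-versus-LLM choice empirical; the cut-offs are placeholders to tune against your own error budget.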