How does this handle API rate limits?

The skill implements patterns for exponential backoff with jitter and logic to monitor rate limit headers, allowing the script to pause or record the current state to resume after the reset period.
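As a rough illustration of the retry behaviour described above, the following minimal sketch combines "full jitter" exponential backoff with a check of rate limit headers. It is not the skill's actual code. The header names `X-RateLimit-Remaining` and `X-RateLimit-Reset` are a common convention rather than a standard, `Retry-After` is assumed to carry seconds (it may also be an HTTP date), and `fetch` is a hypothetical callable returning `(status, headers, body)`.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: a random value in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def seconds_until_reset(headers, now=None):
    """Return how long the server asks us to wait, or None if it does not say.

    Assumes Retry-After is given in seconds and X-RateLimit-Reset is a
    Unix timestamp; both are assumptions about the target API.
    """
    now = time.time() if now is None else now
    if "Retry-After" in headers:
        return max(0.0, float(headers["Retry-After"]))
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is not None and reset is not None and int(remaining) == 0:
        return max(0.0, float(reset) - now)
    return None


def fetch_with_retry(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between retryable failures."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status == 429 or status >= 500:
            # Prefer the server's own hint; fall back to jittered backoff.
            wait = seconds_until_reset(headers)
            if wait is None:
                wait = backoff_delay(attempt)
            sleep(wait)
            continue
        raise RuntimeError(f"non-retryable status {status}")
    raise RuntimeError("gave up after max_attempts")
```

Passing `sleep` and `rng` as parameters keeps the timing logic testable without real waiting. A long-running pipeline might persist its state instead of sleeping when the reset is far away.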

Is this skill suitable for real-time data streaming?

No, this skill is specifically designed for batch-oriented, paginated REST API ingestion. For real-time needs, you should use websocket or streaming connection patterns instead.

What is the Two Watermarks pattern?

The pattern uses two separate pointers, newest_id and oldest_id, to track the boundaries of the data fetched so far. A forward pass requests only records newer than newest_id, and a backward pass requests only records older than oldest_id, so a pipeline can pick up new data and backfill historical data without creating gaps or overlaps.
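A minimal sketch of the two pointers follows, assuming integer IDs that increase over time. The `Watermarks` class, the helper functions, and the `since_id` (exclusive) and `max_id` (inclusive) parameter names are hypothetical illustrations, not the skill's actual interface.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Watermarks:
    newest_id: Optional[int] = None  # upper boundary of what has been fetched
    oldest_id: Optional[int] = None  # lower boundary of what has been fetched


def forward_params(wm: Watermarks) -> dict:
    """Query parameters for fetching records newer than anything seen so far."""
    return {} if wm.newest_id is None else {"since_id": wm.newest_id}


def backward_params(wm: Watermarks) -> dict:
    """Query parameters for backfilling records older than anything seen."""
    return {} if wm.oldest_id is None else {"max_id": wm.oldest_id - 1}


def advance(wm: Watermarks, fetched_ids: Iterable[int]) -> Watermarks:
    """Return new watermarks widened to cover the IDs just fetched."""
    ids = list(fetched_ids)
    if not ids:
        return wm
    newest = max(ids) if wm.newest_id is None else max(wm.newest_id, max(ids))
    oldest = min(ids) if wm.oldest_id is None else min(wm.oldest_id, min(ids))
    return Watermarks(newest, oldest)
```

For example, after a first fetch of IDs 105, 104, and 103, the forward pass would ask for `since_id=105` and the backward pass for `max_id=102`, so the two directions never request the same record.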

Why are watermarks only saved at the end of a run?

While data records are saved after every page to ensure resilience, the watermarks (the 'source of truth' for progress) are only updated after a successful run to prevent the system from thinking it has completed a fetch that actually failed midway.
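The ordering described above can be sketched as follows: records are appended to disk as each page arrives, while the watermark file is written only once the whole run has finished. The file layout (JSON Lines for records, one JSON file for state) and the `run_fetch` function are assumptions made for illustration, not the skill's actual implementation.

```python
import json
import os


def run_fetch(pages, records_path, state_path):
    """Persist every page immediately; commit watermarks only on success.

    `pages` is any iterable of lists of dicts with an integer "id" key.
    If iterating it raises (network error, rate limit), the exception
    propagates and the state file is left untouched.
    """
    newest = oldest = None
    for page in pages:
        # Save the page right away so an interruption loses no fetched data.
        with open(records_path, "a", encoding="utf-8") as f:
            for record in page:
                f.write(json.dumps(record) + "\n")
        ids = [record["id"] for record in page]
        if ids:
            newest = max(ids) if newest is None else max(newest, max(ids))
            oldest = min(ids) if oldest is None else min(oldest, min(ids))

    # Reached only when every page succeeded. Write to a temp file and
    # rename, so a crash here cannot leave a half-written state file.
    tmp_path = state_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"newest_id": newest, "oldest_id": oldest}, f)
    os.replace(tmp_path, state_path)
```

In this sketch, a run that fails midway leaves its saved records in place but no updated watermarks, so the next run requests the same range again. The stored records would then need to be de-duplicated by ID, which is one plausible way to reconcile the two.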

Incremental API Fetcher

Name: Incremental API Fetcher
Author: shipshitdev

Web Scraping & Data Collection

Builds resilient data ingestion pipelines that handle paginated API results with state tracking and historical backfills.

Incremental Fetch is a specialized skill designed to automate the creation of robust data pipelines that never lose progress or duplicate records. By implementing the 'Two Watermarks' pattern, it enables Claude to track both the newest and oldest records, allowing for seamless forward updates and backward historical backfills. It prioritizes resilience by saving data records page-by-page while deferring watermark updates until successful completion, making it ideal for large-scale data ingestion from platforms like X (Twitter), financial exchanges, and complex REST APIs where rate limits and connectivity issues are common.

Key Features

01. Resilient page-by-page data persistence to prevent loss on interruption

02. Support for ID-based, cursor-based, and timestamp-based pagination types

03. 10 GitHub stars

04. State management logic for resuming interrupted downloads without duplicates

05. Two-watermark pattern for managing forward updates and historical backfills

06. Configurable retry mechanisms with exponential backoff and jitter

Use Cases

01. Building a continuous sync pipeline for social media mentions or posts

02. Creating reliable ingestion scripts for third-party SaaS platform data

03. Performing massive historical data backfills from financial market APIs
