Builds automated, AI-powered data collection agents that scrape, enrich, and store data from any public source for free.
This skill enables the creation of robust, production-ready data monitoring systems using a completely free infrastructure stack. It automates the entire lifecycle of data collection—from scraping public websites and APIs using BeautifulSoup or Playwright to enriching results with Gemini Flash for relevance scoring and summarization. By leveraging GitHub Actions for scheduling and a feedback-driven learning system, it creates an autonomous agent that improves its accuracy over time while syncing results directly to Notion, Google Sheets, or Supabase.
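The batch enrichment step with model fallback can be sketched as follows. This is a minimal illustration, not the skill's actual code: the model names, the `call_model` callable, and the result shape are all assumptions.

```python
"""Sketch of the enrichment step: group items into batches so one LLM
call scores many items, and fall back to a secondary model when the
free-tier primary is rate-limited. Model names and the call_model
helper are illustrative, not the skill's actual API."""

MODELS = ["gemini-2.0-flash", "gemini-1.5-flash"]  # primary, then fallback

def batched(items, size):
    """Yield fixed-size batches to stay within LLM rate limits."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def enrich(items, call_model, batch_size=10):
    """Score items in batches, trying each model in order."""
    results = []
    for batch in batched(items, batch_size):
        for model in MODELS:
            try:
                results.extend(call_model(model, batch))
                break
            except RuntimeError:  # e.g. a rate-limit error from the API
                continue
        else:
            # Every model failed: keep items unscored rather than drop them.
            results.extend({"item": it, "score": None} for it in batch)
    return results
```

Batching trades latency for throughput: one prompt carrying ten items consumes one request against the free-tier quota instead of ten.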
Key Features
1. Scheduled execution via GitHub Actions for 100% free hosting
2. Automated scraping for HTML, JS-rendered sites, APIs, and RSS feeds
3. AI enrichment using free-tier Gemini Flash with automatic model fallback
4. Batch processing architecture to maximize LLM rate limits and efficiency
5. Feedback-loop system that learns and improves scoring from user decisions
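The free scheduled hosting works by running the scraper as a cron-triggered GitHub Actions workflow. A minimal sketch, assuming the entry point is `scraper.py` and the Gemini key is stored as a repository secret (both file name and secret name are illustrative):

```yaml
name: scheduled-scrape
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC
  workflow_dispatch: {}    # allow manual runs from the Actions tab
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python scraper.py
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
```

Note that GitHub runs scheduled workflows on a best-effort basis, so cron triggers can be delayed during peak load.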
Use Cases
1. Automated job board monitoring with relevance scoring based on a resume
2. Summarizing and classifying news feeds or GitHub repository updates
3. Product price tracking and competitive intelligence alerts
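The feedback-driven learning mentioned above can be illustrated with a toy keyword-weighting scheme: terms from items the user accepts gain weight, so similar items score higher later. The field names and scoring rule here are assumptions for illustration, not the skill's actual model.

```python
"""Toy sketch of a feedback loop: accepted items nudge their keywords'
weights up, rejected items nudge them down, and relevance is the sum
of learned weights over an item's keywords."""

from collections import Counter

def update_weights(weights, item_text, accepted, step=1):
    """Adjust keyword weights from one user accept/reject decision."""
    for word in set(item_text.lower().split()):
        weights[word] += step if accepted else -step
    return weights

def score(weights, item_text):
    """Relevance = sum of learned weights for the item's keywords."""
    return sum(weights[w] for w in set(item_text.lower().split()))

weights = Counter()
update_weights(weights, "remote python backend role", accepted=True)
update_weights(weights, "onsite sales role", accepted=False)
# After these two decisions, "python" items outscore "sales" items.
```

A production version would persist the weights between scheduled runs (e.g. in the same Notion or Supabase store as the results) so the agent keeps improving across executions.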