Crawlforge is an open-source, LLM-native alternative to proprietary SEO spiders for running detailed technical SEO audits. It crawls websites, extracts critical on-page data, and evaluates 269 technical SEO rules across 18 categories. All crawl data is stored as Parquet files in a queryable DuckDB database, enabling custom SQL analysis, crawl diffs, and straightforward integration with data warehouses. A "rules as code" philosophy keeps the checks transparent and extensible. Its key differentiator is a native MCP server that lets AI coding assistants such as Claude Code, Codex, or Cursor drive the crawler directly: running queries, summarizing findings, and generating client-ready reports from natural-language instructions.
Key Features
- Open-source technical SEO spider with 269 built-in rules across 18 categories
- LLM-native design with an included Model Context Protocol (MCP) server for AI integration
- Columnar storage of crawl data in DuckDB + Parquet for SQL querying and custom analysis
- Extensible "rules as code" architecture for adding custom SEO checks with unit tests
- Comprehensive technical SEO audit coverage spanning response codes, metadata, structured data, hreflang, and more
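Crawlforge's actual rule interface isn't shown here, but the "rules as code with unit tests" idea can be sketched with stdlib Python alone. The `Page`, `Issue`, and `check_title_length` names below are hypothetical, chosen only to illustrate the pattern:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical shapes; Crawlforge's real rule API may differ.
@dataclass
class Page:
    url: str
    title: Optional[str]

@dataclass
class Issue:
    rule_id: str
    url: str
    message: str

def check_title_length(page: Page, max_len: int = 60) -> Optional[Issue]:
    """Flag pages whose <title> is missing or longer than max_len characters."""
    if not page.title:
        return Issue("title-missing", page.url, "Page has no <title>")
    if len(page.title) > max_len:
        return Issue("title-too-long", page.url,
                     f"Title is {len(page.title)} chars (limit {max_len})")
    return None

# Unit-test style assertions, in the spirit of rules shipping with tests.
assert check_title_length(Page("https://example.com/", "Home")) is None
assert check_title_length(Page("https://example.com/x", None)).rule_id == "title-missing"
assert check_title_length(Page("https://example.com/y", "T" * 80)).rule_id == "title-too-long"
```

Keeping each rule as a small pure function like this is what makes a rule set transparent, diffable in code review, and coverable by ordinary unit tests.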
Use Cases
- Automating SEO analysis, reporting, and issue identification through AI assistants using natural language
- Conducting deep technical SEO audits without relying on a GUI or SaaS platform
- Performing custom analysis, joining, and diffing of crawl data with SQL queries