Crawl4AI RAG
Created by coleam00
Empowers AI agents and coding assistants with web crawling and retrieval-augmented generation (RAG) capabilities.
About
Crawl4AI RAG gives AI agents web crawling and RAG capabilities through the Model Context Protocol (MCP). Agents can crawl websites, store the crawled content in a vector database (Supabase), and run RAG queries over it. The server detects the type of each URL, crawls websites recursively, and processes pages in parallel, which makes it well suited to building knowledge bases for AI coding assistants.
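To illustrate what handling different URL types might involve, here is a minimal sketch in Python. The function name `classify_url` and the suffix-based rules are assumptions made for illustration, not the project's actual implementation.

```python
from urllib.parse import urlparse


def classify_url(url: str) -> str:
    """Guess how a URL should be crawled (hypothetical heuristic)."""
    path = urlparse(url).path.lower()
    # Assumption: sitemaps are XML files with "sitemap" in the path.
    if path.endswith(".xml") and "sitemap" in path:
        return "sitemap"
    # Assumption: plain-text files are identified by a .txt suffix.
    if path.endswith(".txt"):
        return "text_file"
    # Everything else is treated as a regular webpage.
    return "webpage"
```

A crawler could use the returned label to choose a strategy: expand a sitemap into its listed pages, fetch a text file directly, or follow internal links from a webpage.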
Key Features
- Vector Search: Performs RAG over crawled content with optional source filtering.
- Content Chunking: Intelligently splits content by headers and size for better processing.
- Smart URL Detection: Automatically detects and handles different URL types (webpages, sitemaps, text files).
- Parallel Processing: Efficiently crawls multiple pages simultaneously.
- Recursive Crawling: Follows internal links to discover content.
- 451 GitHub stars
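The content chunking feature above can be sketched as follows, assuming the crawled content is markdown. The function `chunk_markdown` and its `max_chars` parameter are hypothetical names chosen for this example; the project's own chunking logic may differ.

```python
import re


def chunk_markdown(text: str, max_chars: int = 1000) -> list[str]:
    """Split markdown at header lines, then cap each chunk at max_chars."""
    # Zero-width split: cut just before every line starting with 1-6 '#'.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        # Sections longer than max_chars are cut into fixed-size pieces.
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Splitting on headers first keeps each chunk topically coherent; the size cap keeps chunks small enough to embed and store in a vector database.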
Use Cases
- Enable AI coding assistants to build AI agents with web crawling capabilities.
- Provide AI agents with up-to-date information by crawling and indexing web content.
- Enhance the knowledge base of AI models by performing RAG over crawled web data.