What is semantic focus in the context of crawling?

When you pass the --instructions flag, the crawler uses AI to extract only the most relevant content chunks from each page rather than the full text, which reduces token usage and noise.

Can I filter which parts of a site are crawled?

Absolutely. You can use --select-paths and --exclude-paths with regex patterns to focus the crawl on specific sections like /api/ or /guides/.
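As a rough illustration of how select and exclude regex patterns narrow a set of URLs, here is a minimal Python sketch. It is not the crawler's own matching logic: the patterns, the example URLs, and the `is_selected` helper are all hypothetical, and the rule that an exclude match wins over a select match is an assumption made for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical patterns, written the way one might pass them to
# --select-paths and --exclude-paths.
SELECT_PATTERNS = [r"^/api/.*", r"^/guides/.*"]
EXCLUDE_PATTERNS = [r"^/api/v1/.*"]


def is_selected(url, select=SELECT_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if the URL path matches a select pattern and no exclude pattern."""
    path = urlparse(url).path
    # Assumption for this sketch: an exclude match always wins.
    if any(re.search(pattern, path) for pattern in exclude):
        return False
    return any(re.search(pattern, path) for pattern in select)


urls = [
    "https://docs.example.com/api/v2/search",
    "https://docs.example.com/api/v1/search",
    "https://docs.example.com/guides/quickstart",
    "https://docs.example.com/blog/launch",
]
kept = [u for u in urls if is_selected(u)]
print(kept)
```

Testing patterns against a handful of known URLs like this before starting a crawl is a cheap way to catch a regex that is too broad or too narrow.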

Can I limit the scale of a website crawl?

Yes. The --limit, --max-depth, and --max-breadth flags cap the total number of pages processed, how many link levels the crawler descends from the starting URL, and how many links it follows from each page.
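To see how the three limits interact, the following Python sketch walks a small in-memory link graph breadth-first. It is an illustration under stated assumptions, not the crawler's implementation: the `LINKS` graph and `bounded_crawl` function are hypothetical, and the choices to count the start page toward the limit and to apply the breadth cap per page are assumptions of the sketch.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for a real site.
LINKS = {
    "/": ["/docs", "/blog", "/pricing"],
    "/docs": ["/docs/a", "/docs/b", "/docs/c"],
    "/blog": ["/blog/1"],
    "/pricing": [],
    "/docs/a": ["/docs/a/deep"],
    "/docs/b": [],
    "/docs/c": [],
    "/blog/1": [],
    "/docs/a/deep": [],
}


def bounded_crawl(start, limit, max_depth, max_breadth):
    """Breadth-first walk bounded by page count, link depth, and links per page."""
    visited = [start]          # pages in the order they were discovered
    seen = {start}
    queue = deque([(start, 0)])
    while queue and len(visited) < limit:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue           # do not follow links beyond the depth cap
        for link in LINKS.get(page, [])[:max_breadth]:
            if link in seen:
                continue
            if len(visited) >= limit:
                break          # total page cap reached
            seen.add(link)
            visited.append(link)
            queue.append((link, depth + 1))
    return visited


print(bounded_crawl("/", limit=10, max_depth=1, max_breadth=2))
print(bounded_crawl("/", limit=4, max_depth=3, max_breadth=3))
```

In the first call the depth and breadth caps do the work and the page limit is never reached. In the second the page limit stops the walk before the depth cap matters.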

How do I install the Tavily CLI for this skill?

You can install the required tool by running 'curl -fsSL https://cli.tavily.com/install.sh | bash' and then authenticating with 'tvly login'.

Can I save the crawled pages locally?

Yes, the --output-dir flag allows you to specify a local directory where every crawled page will be saved as an individual .md file.
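The one-file-per-page layout can be pictured with a short Python sketch. The file naming scheme here is purely hypothetical (the names the tool actually produces may differ), as are the `page_filename` and `save_pages` helpers and the sample pages.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def page_filename(url):
    """Hypothetical naming scheme: flatten a URL path into a .md filename."""
    path = urlparse(url).path.strip("/")
    slug = re.sub(r"[^A-Za-z0-9]+", "-", path).strip("-") or "index"
    return f"{slug}.md"


def save_pages(pages, output_dir):
    """Write each url -> markdown entry to its own file under output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for url, markdown in pages.items():
        target = out / page_filename(url)
        target.write_text(markdown, encoding="utf-8")
        written.append(target.name)
    return sorted(written)


# Sample pages standing in for crawl results.
pages = {
    "https://docs.example.com/": "# Home\n",
    "https://docs.example.com/guides/quickstart": "# Quickstart\n",
}
with tempfile.TemporaryDirectory() as tmp:
    names = save_pages(pages, tmp)
print(names)
```

A flat directory of markdown files like this is straightforward to grep, diff, or load into a RAG indexing step.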

Tavily Web Crawler

Name: Tavily Web Crawler
Author: tavily-ai

Category: Web Scraping and Data Collection

Crawls websites and extracts content from multiple pages into structured JSON or local markdown files.

Tavily Web Crawler is a specialized Claude Code skill designed for bulk content extraction and deep web research. It enables users to crawl entire domains or specific sub-paths, automatically converting web pages into clean, local markdown files for offline use or AI training. With advanced controls for depth, breadth, and regex-based path filtering, it allows developers to download full documentation sets, extract semantic context for LLMs, and automate data collection tasks without manual scraping configuration.

Key Features

01 Configurable multi-page crawling with depth and breadth controls

02 Bulk documentation downloading for offline reference and RAG workflows

03 213 GitHub stars

04 Advanced path filtering using include and exclude regex patterns

05 Semantic extraction that prioritizes relevant content chunks for AI context

06 Automatic conversion of web content to formatted local markdown files

Use Cases

01 Bulk data collection for research and competitive analysis across multiple domains

02 Extracting specific API guides from a domain to feed into an LLM context window

03 Downloading an entire documentation site (e.g., /docs) for local reference
