Discover Claude Skills for web scraping & data collection. Explore 16 skills and find the right capabilities for your AI workflows.
Optimizes online research by applying structured query patterns and advanced search techniques for precise information retrieval.
Integrates the Perplexity API to conduct deep web research, capture real-time data, and generate structured reports with verifiable citations.
Performs real-time web and local searches using the Brave Search API directly via curl commands to retrieve current information and technical solutions.
Converts PDF documents into LLM-friendly Markdown while preserving complex structures like tables, headers, and lists.
Overcomes web access restrictions and rate limits by performing federated searches and intelligent content extraction from blocked or challenging URLs.
Scrapes websites, extracts structured data, and automates web data collection pipelines using the Crawl4AI library.
Extracts and structures metadata from PDF form fields into JSON format to facilitate automated document processing and form filling.
Extracts Twitter posts and comments to organize viewpoints and generate professional narration scripts for content production.
Extracts specific data from JSON files efficiently to minimize token usage and improve processing speed.
Extracts and validates structured data from scientific literature collections to create analysis-ready datasets for systematic reviews and meta-analyses.
Conducts deep, iterative web research to generate comprehensive reports with verified citations and source tracking.
Extracts YouTube video transcripts, metadata, and chapters into formatted Markdown files for knowledge management systems.
Executes comprehensive web searches using the Gemini command to gather real-time data and detailed information.
Performs intelligent web searches using a prioritized MCP strategy to find the most relevant documentation and live technical data.
Adds and configures Instagram accounts and web aggregators to local media event tracking systems.
Converts batches of images and scanned documents into structured markdown files using local DeepSeek-OCR models via Ollama.
Orchestrates a multi-source image pipeline to download, validate, and normalize fighter photos from Wikimedia, Sherdog, and Bing.
Converts websites into LLM-ready markdown or structured data using the Firecrawl v2 API.
Implements ethical, resilient, and legally compliant web scraping strategies to extract high-quality data while avoiding bot detection.
Deploys a local, privacy-respecting metasearch engine to aggregate web, package repository, and code results in structured JSON.
Conducts deep technical research by gathering multi-source evidence, analyzing GitHub repositories, and documenting implementation options.
Orchestrates the extraction, validation, and database loading of comprehensive fighter data from UFCStats.com using Scrapy spiders.
Extracts and analyzes large PDF documents locally with semantic chunking to minimize token usage and maximize context efficiency.
Extracts and organizes technical trading articles and documentation from mql5.com for research and training data collection.
Extracts event data from Instagram, Facebook, and web aggregators to power local media newsletters.
Empowers Claude with AI-powered semantic search to find web content, research papers, and code repositories by meaning rather than keywords.
Discovers related web content, articles, and research papers using AI-powered similarity matching via Exa.ai.
Automates the periodic search and refresh of Exa.ai websets to keep your data collections continuously updated.
Generates fact-based answers and structured data from the web using AI-powered search and synthesis.
Conducts complex, multi-step asynchronous research and deep analysis using Exa's AI-driven search engine.