Discover Claude skills for web scraping & data collection. Explore 16 skills and find the right capabilities for your AI workflow.
Overcomes web access restrictions and rate limits by performing federated searches and intelligent content extraction from blocked or challenging URLs.
Converts PDF documents into LLM-friendly Markdown while preserving complex structures like tables, headers, and lists.
Scrapes websites, extracts structured data, and automates web data collection pipelines using the Crawl4AI library.
Extracts and structures metadata from PDF form fields into JSON format to facilitate automated document processing and form filling.
Extracts YouTube video transcripts, metadata, and chapters into formatted Markdown files for knowledge management systems.
Extracts Twitter posts and comments to organize viewpoints and generate professional narration scripts for content production.
Performs intelligent web searches using a prioritized MCP strategy to find the most relevant documentation and live technical data.
Extracts and validates structured data from scientific literature collections to create analysis-ready datasets for systematic reviews and meta-analyses.
Executes comprehensive web searches using the Gemini command to gather real-time data and detailed information.
Extracts specific data from JSON files efficiently to minimize token usage and improve processing speed.
Conducts deep, iterative web research to generate comprehensive reports with verified citations and source tracking.
Automates the periodic search and refresh of Exa.ai websets to keep your data collections continuously updated.
Conducts complex, multi-step asynchronous research and deep analysis using Exa's AI-driven search engine.
Manages automated web searches, structured data enrichment, and entity-based collection building using the Exa.ai engine.
Extracts and organizes technical trading articles and documentation from mql5.com for research and training data collection.
Extracts structured data and AI-generated summaries from any URL with high token efficiency and live crawling.
Adds and configures Instagram accounts and web aggregators to local media event tracking systems.
Deploys a local, privacy-respecting metasearch engine to aggregate web, package repository, and code results in structured JSON.
Extracts event data from Instagram, Facebook, and web aggregators to power local media newsletters.
Implements ethical, resilient, and legally compliant web scraping strategies to extract high-quality data while avoiding bot detection.
Crawls global AI news sources to generate deduplicated, Chinese-language summaries in a structured JSON format.
Converts websites into LLM-ready markdown or structured data using the Firecrawl v2 API.
Orchestrates a multi-source image pipeline to download, validate, and normalize fighter photos from Wikimedia, Sherdog, and Bing.
Conducts deep technical research by gathering multi-source evidence, analyzing GitHub repositories, and documenting implementation options.
Orchestrates the extraction, validation, and database loading of comprehensive fighter data from UFCStats.com using Scrapy spiders.
Converts batches of images and scanned documents into structured markdown files using local DeepSeek-OCR models via Ollama.
Empowers Claude with AI-powered semantic search to find web content, research papers, and code repositories by meaning rather than keywords.
Extracts and analyzes large PDF documents locally with semantic chunking to minimize token usage and maximize context efficiency.
Discovers related web content, articles, and research papers using AI-powered similarity matching via Exa.ai.
Generates fact-based answers and structured data from the web using AI-powered search and synthesis.