How does the metadata enrichment feature work?

If you have a list of arXiv IDs but are missing details such as abstracts or categories, the enrichment flag queries the arXiv API by ID and backfills the missing fields automatically.
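The listing does not show how the skill implements enrichment. The following is a minimal sketch, assuming the public arXiv API's `id_list` parameter and its Atom response format. The function names (`fetch_atom`, `parse_entries`, `enrich`, `base_id`) and the record field names (`arxiv_id`, `abstract`, `authors`, `categories`) are hypothetical. The sketch only fills fields that are empty, so any values already in your list are preserved.

```python
import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
API_URL = "http://export.arxiv.org/api/query"


def base_id(raw):
    """Strip an abs/ URL prefix and a trailing version: .../abs/2101.00001v2 -> 2101.00001."""
    return re.sub(r"v\d+$", "", raw.strip().split("/abs/")[-1])


def fetch_atom(arxiv_ids):
    """Fetch the Atom feed for a batch of IDs via the id_list parameter (needs network)."""
    params = urllib.parse.urlencode({"id_list": ",".join(arxiv_ids), "max_results": len(arxiv_ids)})
    with urllib.request.urlopen(f"{API_URL}?{params}", timeout=30) as resp:
        return resp.read().decode("utf-8")


def parse_entries(atom_xml):
    """Map each entry's base arXiv ID to a metadata dict."""
    root = ET.fromstring(atom_xml)
    metadata = {}
    for entry in root.findall(f"{ATOM}entry"):
        arxiv_id = base_id(entry.findtext(f"{ATOM}id", ""))
        metadata[arxiv_id] = {
            # Collapse the line breaks arXiv inserts into titles and abstracts.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            "authors": [a.findtext(f"{ATOM}name", "").strip() for a in entry.findall(f"{ATOM}author")],
            "categories": [c.get("term") for c in entry.findall(f"{ATOM}category")],
        }
    return metadata


def enrich(records, metadata):
    """Fill fields that are missing or empty; never overwrite values already present."""
    for rec in records:
        found = metadata.get(base_id(rec["arxiv_id"]), {})
        for key, value in found.items():
            if not rec.get(key):
                rec[key] = value
    return records
```

A typical call would be `enrich(records, parse_entries(fetch_atom([r["arxiv_id"] for r in records])))`. For large lists, you would batch the IDs and throttle requests, as arXiv's API terms of use ask.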

Can I filter results by specific years or keywords?

Yes, the skill supports detailed filtering through a 'queries.md' file where you can define keywords, exclusions, and specific time windows.
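The layout of `queries.md` is not documented here. The sketch below assumes the keywords, exclusions, and date window have already been read from that file. It shows one way those settings could map onto the arXiv API's `search_query` syntax: field prefixes such as `all:` or `ti:`, the `OR`/`AND`/`ANDNOT` operators, and a `submittedDate:[... TO ...]` range. The functions `term`, `build_search_query`, and `search_url` are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def term(phrase, field="all"):
    """Field-prefixed term; multi-word phrases are quoted so they match as a unit."""
    return f'{field}:"{phrase}"' if " " in phrase else f"{field}:{phrase}"


def build_search_query(keywords, exclude=(), start=None, end=None, field="all"):
    """Compose an arXiv API search_query: any keyword, within the window, minus exclusions."""
    query = "(" + " OR ".join(term(k, field) for k in keywords) + ")"
    if start or end:
        # submittedDate bounds use YYYYMMDDHHMM; open ends default to wide limits.
        lo = (start or date(1991, 1, 1)).strftime("%Y%m%d") + "0000"
        hi = (end or date.today()).strftime("%Y%m%d") + "2359"
        query = f"{query} AND submittedDate:[{lo} TO {hi}]"
    for phrase in exclude:
        # Parenthesize so each exclusion applies to the whole expression built so far.
        query = f"({query}) ANDNOT {term(phrase, field)}"
    return query


def search_url(query, max_results=100):
    """Full request URL for the public arXiv API query endpoint."""
    params = {"search_query": query, "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)
```

The explicit parentheses avoid relying on how the API ranks `AND` against `ANDNOT`. Each exclusion subtracts from the full keyword-and-date expression.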

What file formats does the arXiv Search skill output?

The skill generates a structured 'papers_raw.jsonl' file (one record per line) and a convenience 'papers_raw.csv' file for easy viewing.
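The exact schema of `papers_raw.jsonl` is not given in this listing. The sketch below assumes a hypothetical field list (`FIELDS`) and a hypothetical `write_outputs` helper to illustrate how the two files relate. The JSONL file keeps list values such as authors as real JSON arrays. The CSV copy flattens them into `"; "`-joined strings so the data stays readable in a spreadsheet.

```python
import csv
import json

# Hypothetical column order for the CSV view; the skill's real schema may differ.
FIELDS = ["arxiv_id", "title", "authors", "categories", "published", "abstract", "pdf_url"]


def write_outputs(records, jsonl_path="papers_raw.jsonl", csv_path="papers_raw.csv"):
    # JSONL: one JSON object per line, list fields kept as JSON arrays.
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    # CSV: list fields joined with "; "; missing fields become empty cells,
    # and keys outside FIELDS are ignored.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        for rec in records:
            writer.writerow({k: "; ".join(v) if isinstance(v, list) else v for k, v in rec.items()})
```

Treating the JSONL file as the source of truth and the CSV file as a lossy view keeps downstream tools from having to re-split author strings.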

Does this skill require an active internet connection?

Online retrieval via the arXiv API requires internet access; however, the skill features an offline mode for importing and normalizing local CSV or JSON exports.
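How the offline mode maps foreign column names is not described. The following minimal sketch assumes a hypothetical alias table (`ALIASES`) and hypothetical helpers (`load_records`, `clean_id`, `normalize`). It reads a CSV, JSON, or JSONL export, renames known columns to canonical fields, and strips `arXiv:` prefixes, `abs/` URLs, and version suffixes from IDs. Splitting authors on `;` or ` and ` is an assumption. Comma-separated author strings are left whole, because commas also separate surnames from given names.

```python
import csv
import io
import json
import re
from pathlib import Path

# Hypothetical alias table: lower-cased source column -> canonical field name.
ALIASES = {
    "id": "arxiv_id", "arxiv_id": "arxiv_id", "eprint": "arxiv_id",
    "title": "title",
    "abstract": "abstract", "summary": "abstract",
    "author": "authors", "authors": "authors",
    "category": "categories", "categories": "categories",
    "published": "published", "year": "year",
}


def load_records(path):
    """Read a CSV, JSON (array of objects), or JSONL export into a list of dicts."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".csv":
        # StringIO keeps quoted multi-line fields intact.
        return list(csv.DictReader(io.StringIO(text)))
    if suffix == ".jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return json.loads(text)


def clean_id(raw):
    """'arXiv:2101.00001v3' or 'https://arxiv.org/abs/2101.00001v3' -> '2101.00001'."""
    raw = str(raw).strip().split("/abs/")[-1]
    raw = re.sub(r"^arxiv:", "", raw, flags=re.IGNORECASE)
    return re.sub(r"v\d+$", "", raw)


def normalize(raw):
    """Rename known columns, drop unknown or empty ones, and split list-like strings."""
    rec = {}
    for key, value in raw.items():
        field = ALIASES.get(str(key).strip().lower())
        if field is None or value in (None, ""):
            continue
        if field == "authors" and isinstance(value, str):
            value = [a.strip() for a in re.split(r";|\s+and\s+", value) if a.strip()]
        elif field == "categories" and isinstance(value, str):
            value = [c for c in re.split(r"[\s,;]+", value) if c]
        elif field == "arxiv_id":
            value = clean_id(value)
        rec[field] = value
    return rec
```

Normalized records from an offline import could then be passed through the same enrichment and output steps as records fetched online.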

arXiv Search & Metadata Retrieval

Name: arXiv Search & Metadata Retrieval
Author: WILLOSCAR

Category: Web Scraping & Data Collection
GitHub stars: 236

Automates the retrieval and normalization of academic paper metadata from arXiv to support research pipelines and literature reviews.

The arXiv Search skill is a specialized tool for researchers and developers building automated discovery pipelines. It streamlines the first phase of a literature survey by fetching paper metadata (abstracts, authors, categories, and PDF URLs) directly from the arXiv API, or by importing existing exports offline. By standardizing paper data into structured JSONL and CSV files, it provides an evidence-first foundation for downstream tasks such as ranking, taxonomy building, and citation generation. This helps keep AI-driven research grounded in verifiable source material.

Key Features

01 Support for complex queries, including keywords, exclusions, and specific time windows.

02 Structured output (JSONL/CSV) designed for integration into research pipelines.

03 Metadata enrichment that backfills missing abstracts and author details using arXiv IDs.

04 Offline import mode that normalizes existing CSV, JSON, or JSONL datasets.

05 Automated arXiv API integration for real-time retrieval of paper metadata.

Use Cases

01 Performing systematic literature reviews and state-of-the-art surveys on technical topics.

02 Building structured research databases from specific arXiv categories or search terms.

03 Converting disparate paper exports into a standardized format for data analysis or RAG systems.
