Captures diverse content into structured Markdown notes and LLM training-ready JSONL.
Tidbit is a command-line tool that turns content such as URLs, PDFs, ebooks, or clipboard data into consistently structured Markdown notes and JSONL records ready for machine learning training. You define custom YAML schemas, so captured material such as research papers or articles follows a uniform structure that is easy to search and to build a knowledge base from. Every capture also produces a (content, structured output) pair, and together these pairs form a dataset suited to evaluation, retrieval, or fine-tuning of domain-specific LLMs. No database or server is required.
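
To make the (content, structured output) idea concrete, here is a minimal Python sketch of what one capture could yield: a Markdown note and one JSONL line. It is not Tidbit's implementation. The field names (`title`, `authors`, `summary`), the `input`/`output` record keys, and every function name are assumptions chosen for illustration, and a plain Python list stands in for a YAML schema.

```python
import json
from pathlib import Path

# Hypothetical schema: these field names are illustrative only,
# not Tidbit's actual schema format.
SCHEMA_FIELDS = ["title", "authors", "summary"]


def check_fields(structured: dict) -> None:
    """Raise ValueError if any schema field is missing from the output."""
    missing = [f for f in SCHEMA_FIELDS if f not in structured]
    if missing:
        raise ValueError(f"missing schema fields: {missing}")


def to_markdown(structured: dict) -> str:
    """Render the structured fields as a Markdown note, one section per field."""
    check_fields(structured)
    lines = [f"# {structured['title']}", ""]
    for field in SCHEMA_FIELDS:
        if field == "title":
            continue
        value = structured[field]
        if isinstance(value, list):
            value = ", ".join(str(v) for v in value)
        lines += [f"## {field.capitalize()}", "", str(value), ""]
    return "\n".join(lines)


def to_jsonl_record(content: str, structured: dict) -> str:
    """Serialize one (content, structured output) pair as a single JSON line."""
    check_fields(structured)
    # The "input"/"output" keys are an assumed layout for the pair.
    return json.dumps({"input": content, "output": structured}, ensure_ascii=False)


def append_record(path: str, content: str, structured: dict) -> None:
    """Append one record to a JSONL file; a plain file, no database needed."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(to_jsonl_record(content, structured) + "\n")
```

Because each record is one self-contained line appended to a plain file, the dataset grows with every capture and can be read back line by line with `json.loads`.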
