Does it support special characters and accents?

Yes, it includes a decode phase that translates LaTeX-specific accents (like those used in names) into standard Unicode characters.
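As a rough illustration of what such a decode phase could look like, the sketch below maps a handful of LaTeX accent commands onto Unicode combining marks and then composes them with NFC normalization. The names `COMBINING`, `ACCENT`, and `decode_accents` are hypothetical and chosen for this example; the tool's actual implementation and accent coverage may differ.

```python
import re
import unicodedata

# Illustrative subset of LaTeX accent commands -> Unicode combining marks.
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
}

# Matches \'e as well as \'{e}; the letter lands in group 2 or group 3.
ACCENT = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def decode_accents(text: str) -> str:
    """Replace supported accent commands with precomposed Unicode letters."""
    def repl(match: re.Match) -> str:
        letter = match.group(2) or match.group(3)
        # NFC composes base letter + combining mark into a single code point.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])

    return ACCENT.sub(repl, text)
```

For example, `decode_accents(r"Caf\'{e}")` yields `Café`. Letter-named accents such as `\c{c}` or `\H{o}` are deliberately left out of this sketch.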

Can I process an entire .tex file at once?

Absolutely. You can use the --file option to read a document and the --output option to save the cleaned text directly to a new file.
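The file-to-file flow can be pictured with a small argparse sketch. Only the `--file` and `--output` option names come from the description above; `build_parser`, `run`, and the pass-through `strip` callback are hypothetical stand-ins, not the tool's real code.

```python
import argparse
from typing import Callable, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tex-strip")
    parser.add_argument("--file", required=True, help="path of the .tex file to read")
    parser.add_argument("--output", help="path to write the cleaned text to")
    return parser


def run(argv: List[str], strip: Callable[[str], str] = lambda s: s) -> str:
    """Read --file, apply `strip` (a placeholder here), write or print the result."""
    args = build_parser().parse_args(argv)
    with open(args.file, encoding="utf-8") as src:
        cleaned = strip(src.read())
    if args.output:
        with open(args.output, "w", encoding="utf-8") as dst:
            dst.write(cleaned)
    else:
        print(cleaned)
    return cleaned
```

In this sketch, omitting `--output` prints the cleaned text to standard output instead of writing a file; whether the real tool behaves the same way is an assumption.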

How does this skill handle nested LaTeX commands?

The tool strips commands in repeated passes: each pass removes a level of formatting, and the process continues until no nested commands are left and only the plain text remains.
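One way to picture this is a loop that unwraps the innermost command on each pass and stops when a pass changes nothing. The sketch below is a simplified illustration under that assumption: `INNERMOST` and `strip_commands` are hypothetical names, and only commands with a single brace argument are handled.

```python
import re

# A command whose single brace argument contains no further braces,
# i.e. the innermost level of nesting still present in the text.
INNERMOST = re.compile(r"\\[A-Za-z]+\*?\{([^{}]*)\}")


def strip_commands(text: str) -> str:
    """Unwrap innermost commands repeatedly until a pass makes no change."""
    while True:
        stripped = INNERMOST.sub(r"\1", text)
        if stripped == text:  # fixed point reached: nothing left to unwrap
            return stripped
        text = stripped
```

Given `\textbf{bold \emph{nested \underline{deep}} text}`, the first pass unwraps `\underline`, the second `\emph`, and the third `\textbf`, leaving `bold nested deep text`.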

What happens to the whitespace and document structure?

The stripper collapses runs of whitespace into single spaces. Pass the --keep-structure flag to preserve paragraph breaks.
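The two modes might be sketched as follows. This assumes paragraphs are separated by blank lines, as in LaTeX source; `normalize_whitespace` and its `keep_structure` parameter are hypothetical names mirroring the flag, not the tool's own API.

```python
import re


def normalize_whitespace(text: str, keep_structure: bool = False) -> str:
    """Collapse whitespace runs; optionally keep blank-line paragraph breaks."""
    if not keep_structure:
        return re.sub(r"\s+", " ", text).strip()
    # Split on blank lines, clean each paragraph, then rejoin with one blank line.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Without the flag the whole document becomes a single line of text; with it, each paragraph becomes one line and paragraphs stay separated by a blank line.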

LaTeX Text Extractor

Name: LaTeX Text Extractor
Author: Mearman

Content Management

Extracts clean, plain text from LaTeX documents by stripping formatting commands and decoding special characters into Unicode.

The LaTeX Text Extractor (tex-strip) is a specialized utility designed to transform complex LaTeX source files into readable plain text. It recursively removes formatting commands, font styles, and nested tags while preserving the underlying content. Unlike basic strippers, it features a decoding phase that converts LaTeX-specific accents and ligatures into standard Unicode characters, making it an essential tool for preparing academic papers or technical documentation for content analysis, LLM processing, or simplified reading.

Key Features

01 Normalization of whitespace and preservation of paragraph breaks

02 Automatic conversion of escaped characters like \& and \% to standard text

03 Recursive removal of nested LaTeX commands and formatting blocks

04 2 GitHub stars

05 Unicode decoding for LaTeX accents and special ligatures

06 Support for file-to-file batch processing and command-line input

Use Cases

01 Cleaning LaTeX source code to perform accurate word counts and readability analysis

02 Migrating content from .tex documents into web-based content management systems

03 Converting academic LaTeX papers into plain text for LLM summarization
