Overview
This skill integrates the HuggingFace Tokenizers library into your workflow, enabling very fast text processing — the library's Rust core can tokenize on the order of 1 GB of text in under 20 seconds on a server CPU. It supports the industry-standard BPE, WordPiece, and Unigram algorithms, lets you train custom vocabularies, and tracks alignment (character offsets) between tokens and the original text. Ideal for building production NLP pipelines, training domain-specific models, or performing complex text normalization, this skill bridges the gap between Python's ease of use and Rust's raw performance.
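As a minimal sketch of the workflow described above, the snippet below trains a small BPE tokenizer from an in-memory corpus and then inspects the token-to-text alignment via character offsets. The tiny corpus, vocabulary size, and special-token list are illustrative choices, not fixed parts of the skill; the API calls (`Tokenizer`, `BPE`, `BpeTrainer`, `train_from_iterator`, `Encoding.offsets`) come from the HuggingFace Tokenizers library.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Build a BPE tokenizer with an unknown-token fallback.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Train a custom vocabulary on a toy corpus (normally you would
# stream files or a dataset iterator here).
corpus = [
    "Tokenizers turn raw text into model-ready tokens.",
    "Fast tokenization matters for production NLP pipelines.",
]
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train_from_iterator(corpus, trainer)

# Encode a sentence and use offsets to map each token back to
# the exact span of the original string.
text = "Tokenizers are fast."
encoding = tokenizer.encode(text)
for token, (start, end) in zip(encoding.tokens, encoding.offsets):
    print(f"{token!r} <- text[{start}:{end}] = {text[start:end]!r}")
```

Because no normalizer is configured here, each offset pair slices the original string to exactly the token's surface form, which is what makes downstream tasks like span labeling or highlighting straightforward.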