Tiktoken

Created by OpenAI

Tokenizes text quickly for use with OpenAI's language models using Byte Pair Encoding (BPE).

About

Tiktoken is a fast, open-source BPE tokenizer for use with OpenAI's models. It converts text into tokens, the integer IDs that language models take as input. It is reported to be 3-6x faster than comparable open-source tokenizers. The encoding is reversible and lossless, so tokens can be converted back into the original text, and it compresses the input, so the token sequence is shorter than the bytes of the original text. These properties make it useful for developers working with large language models.

Key Features

  • Fast BPE tokenization
  • Extensible for custom encodings
  • Reversible and lossless tokenization
  • Support for OpenAI models
  • Educational submodule for learning BPE
  • 14,124 GitHub stars
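
To make the "reversible and lossless" and "learning BPE" points concrete, here is a minimal byte-level BPE sketch in pure Python. It is an illustration only, not Tiktoken's implementation or its educational submodule. The function names (train_bpe, merge_pair, encode, decode) are hypothetical, and the sketch assumes merges are learned greedily from the most frequent adjacent pair.

```python
from collections import Counter


def merge_pair(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> dict[tuple[int, int], int]:
    """Learn up to `num_merges` merge rules from the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges: dict[tuple[int, int], int] = {}
    next_id = 256  # ids 0..255 are reserved for the raw byte values
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # merging a pair that occurs once gives no compression
        merges[pair] = next_id
        ids = merge_pair(ids, pair, next_id)
        next_id += 1
    return merges


def encode(text: str, merges: dict[tuple[int, int], int]) -> list[int]:
    """Turn text into token ids by applying the merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge_pair(ids, pair, new_id)
    return ids


def decode(ids: list[int], merges: dict[tuple[int, int], int]) -> str:
    """Turn token ids back into text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")
```

Because every merged token expands back to the exact bytes it replaced, decode(encode(text, merges), merges) returns the original text, including text that was not in the training sample. The encoded sequence is shorter than the raw bytes whenever at least one learned pair occurs in the input.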

Use Cases

  • Training new BPE tokenizers
  • Tokenizing text for OpenAI API models
  • Counting tokens in text
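
The token-counting use case can be sketched as below. This assumes the tiktoken package is installed and that the "cl100k_base" encoding is the right one for your model, which you should verify. The helper name count_tokens and its optional encoding argument are hypothetical, added so the function also works with any object that exposes an encode method.

```python
def count_tokens(text: str, encoding=None, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens `text` encodes to.

    `encoding` may be any object with an `encode(str)` method that returns
    a sequence of tokens. When it is omitted, the named encoding is loaded
    through tiktoken (assumed installed; the first load may need network
    access to fetch the vocabulary).
    """
    if encoding is None:
        import tiktoken  # third-party package: pip install tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))
```

Calling count_tokens("some prompt text") loads the encoding through tiktoken and returns the length of the encoded list. Passing a pre-loaded encoding avoids repeating the lookup when counting many strings.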