Tokenization
Also known as: text tokenization, subword tokenization, BPE tokenization, byte-pair encoding
Before a language model can process text, that text has to be converted into a sequence of numbers. Tokenization is the step that does this. The text gets split into small units called tokens, each mapped to an ID in the model's vocabulary. Those IDs are what the model actually sees. Most modern models use subword tokenization: common words stay intact as a single token, while rare or long words get split into two or more fragments. The word "tokenization" itself, for example, might become two tokens.
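To make the split-then-map idea concrete, here is a minimal sketch of a subword tokenizer. It uses greedy longest-match against a tiny, made-up vocabulary; the names VOCAB, tokenize, and encode are illustrative assumptions, not part of any real library. Production tokenizers such as BPE learn their vocabulary from data and apply merge rules rather than this simple lookup, so treat this only as an illustration of how one word can become several tokens and then several IDs.

```python
# Toy vocabulary: hypothetical pieces and IDs, not taken from any real model.
VOCAB = {"token": 0, "ization": 1, "the": 2, " ": 3, "model": 4}


def tokenize(text, vocab):
    """Split text into vocabulary pieces using greedy longest-match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No piece matched; a real tokenizer would fall back to bytes
            # or single characters instead of failing.
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return tokens


def encode(text, vocab):
    """Map text to the list of integer IDs a model would receive."""
    return [vocab[piece] for piece in tokenize(text, vocab)]


print(tokenize("tokenization", VOCAB))  # ['token', 'ization']
print(encode("tokenization", VOCAB))    # [0, 1]
```

With this toy vocabulary, "tokenization" is not a single entry, so it is split into the two fragments that are.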
The practical reason builders care about tokenization: tokens are the unit of measurement for context windows, latency, and pricing. At a rule of thumb of about 0.75 English words per token, a model with a 128K-token context window can hold roughly 90–100K words of English before it runs out of room. API costs are typically quoted per million input and output tokens. A prompt that feels short might still contain far more tokens than expected if it includes code, URLs, or non-English text, which tend to tokenize less efficiently than plain English prose.
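The arithmetic behind those figures can be sketched in a few lines. The words-per-token ratio is an assumed rule of thumb for English prose, 128K is taken to mean 128,000 tokens, and the prices in the usage lines are hypothetical placeholders rather than any provider's actual rates.

```python
# Assumed rule of thumb for English prose; real ratios vary by text and tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_words(context_tokens, words_per_token=WORDS_PER_TOKEN):
    """Rough number of English words that fit in a context window."""
    return int(context_tokens * words_per_token)


def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Cost of one request when prices are quoted per million tokens."""
    return (input_tokens / 1_000_000 * input_price_per_million
            + output_tokens / 1_000_000 * output_price_per_million)


print(estimate_words(128_000))  # 96000

# Hypothetical prices: 3.00 per million input tokens, 15.00 per million output.
print(estimate_cost(2_000, 500, 3.00, 15.00))
```

Input and output tokens are priced separately here because the two rates usually differ.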
Different model families use different tokenizers, so the same text can produce a different token count depending on the model. OpenAI's tiktoken library and tools like platform.openai.com/tokenizer let you count tokens before sending requests to OpenAI models; for other model families, use the tokenizer or token-counting tool that the provider supplies. When you're optimizing prompts for cost or fitting a long document into a context window, knowing your token counts is the first practical step.
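As a sketch of counting tokens before sending a request, the helper below uses tiktoken when it is installed and its encoding data can be loaded. The function name count_tokens, the default encoding "cl100k_base", and the fallback of about four characters per token are assumptions made for illustration; choose the encoding that matches the model you are actually calling.

```python
import math


def count_tokens(text, encoding_name="cl100k_base"):
    """Return (token_count, is_exact) for the given text.

    Uses tiktoken if available. Otherwise falls back to a rough estimate
    of about 4 characters per token, which is an assumption that only
    loosely holds for English prose.
    """
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        return len(encoding.encode(text)), True
    except Exception:
        # tiktoken is missing, its encoding data could not be loaded,
        # or the text could not be encoded with default settings.
        return math.ceil(len(text) / 4), False


count, is_exact = count_tokens("Tokenization turns text into numbers.")
label = "exact" if is_exact else "estimated"
print(f"{count} tokens ({label})")
```

The second return value makes it explicit when a number is only an estimate, which matters if you are budgeting close to a context limit.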