LLM Tokens Playground

A playground that makes LLM tokens easier to understand.

When you send a message to an LLM, it doesn't work with your text directly. It breaks it up into smaller pieces, called tokens.

Enter some text to see how it's tokenized:
The LLM encodes your text as a sequence of tokens: small text fragments.
["How"," are"," you"," today","?"]
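The splitting step can be sketched as a greedy longest-match against a tiny, hypothetical vocabulary. Real tokenizers are trained (for example with byte-pair encoding) and have vocabularies of tens of thousands of entries, so this is only an illustration of the idea:

```python
# Toy tokenizer: greedy longest-match against a tiny, hypothetical vocabulary.
# Real LLM tokenizers are trained (e.g. byte-pair encoding) over much larger
# vocabularies; this sketch only illustrates the splitting step.
def tokenize(text, vocab):
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i,
        # falling back to a single character if nothing matches.
        match = max(
            (t for t in vocab if text.startswith(t, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("How are you today?", ["How", " are", " you", " today", "?"]))
# ['How', ' are', ' you', ' today', '?']
```

Note that the spaces stay attached to the tokens: " are" and "are" are different tokens.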


The LLM's tokenizer has a fixed set of tokens - a token vocabulary, if you will.
It represents these tokens as integers, and the integers are called token ids.

This is your text represented as a sequence of token ids:
[5299,553,481,4044,30]
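The lookup itself is a simple table. Here is a minimal sketch using the example ids shown above; the actual integers are fixed when a tokenizer's vocabulary is built and vary from model to model:

```python
# Hypothetical token-to-id table, using the example ids shown above.
# Each model's tokenizer fixes its own integers when its vocabulary is built.
TOKEN_TO_ID = {"How": 5299, " are": 553, " you": 481, " today": 4044, "?": 30}

def encode(tokens):
    # Replace each token string with its integer id.
    return [TOKEN_TO_ID[t] for t in tokens]

print(encode(["How", " are", " you", " today", "?"]))
# [5299, 553, 481, 4044, 30]
```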

Each token id represents a fragment of your text: a word, part of a word, a syllable, or a single character.

The table below shows how each token id maps to its text:
Index  Token ID  Token Text
0      5299      "How"
1      553       " are"
2      481       " you"
3      4044      " today"
4      30        "?"
Total tokens: 5

Joining the token strings together reconstructs the original text:
How are you today?
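Decoding is just the reverse lookup: map each id back to its token string and concatenate. A sketch, using the example ids from the table above:

```python
# Reverse lookup: id -> token string, using the example ids from the table above.
ID_TO_TOKEN = {5299: "How", 553: " are", 481: " you", 4044: " today", 30: "?"}

def decode(ids):
    # Look up each id and join the fragments back into text.
    return "".join(ID_TO_TOKEN[i] for i in ids)

print(decode([5299, 553, 481, 4044, 30]))
# How are you today?
```

Because the leading spaces are stored inside the tokens themselves, plain concatenation is enough to restore the original text.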

The LLM does not use your message text. It works with token ids internally. And when it generates a reply, it uses token ids too.

Token vocabularies differ between LLMs - each vendor chooses its own tokenizer on a per-model basis.
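To see why this matters, here is the same sentence split by two made-up vocabularies. The token counts differ, just as they do between real models:

```python
# Two hypothetical vocabularies splitting the same text differently.
def tokenize(text, vocab):
    # Greedy longest-match; falls back to single characters.
    tokens, i = [], 0
    while i < len(text):
        match = max(
            (t for t in vocab if text.startswith(t, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

VOCAB_A = ["How", " are", " you", " today", "?"]
VOCAB_B = ["How ", "are ", "you ", "to", "day", "?"]

print(len(tokenize("How are you today?", VOCAB_A)))  # 5
print(len(tokenize("How are you today?", VOCAB_B)))  # 6
```

This is why the same prompt can cost a different number of tokens on different models.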