LLM Tokens Playground
A playground that makes LLM tokens easier to understand.
When you send a message to an LLM, it doesn't work with your text directly. Instead, it breaks the text up into smaller pieces, called tokens.
Enter some text to see how it's tokenized:
The LLM encodes your text as a sequence of tokens: small text fragments.
["How", " are", " you", " today", "?"]
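The splitting above can be sketched with a toy longest-match tokenizer. This is a simplified illustration only: real LLM tokenizers use algorithms such as byte-pair encoding (BPE) with vocabularies of tens of thousands of entries, and the fragments below are a hypothetical vocabulary chosen to mirror the example.

```python
# Hypothetical mini-vocabulary mirroring the example above.
VOCAB = {"How": 5299, " are": 553, " you": 481, " today": 4044, "?": 30}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest known fragment at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate fragments first.
        for frag in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(frag, i):
                match = frag
                break
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("How are you today?"))
# → ['How', ' are', ' you', ' today', '?']
```

Note that the leading spaces belong to the tokens themselves, which is why joining them later reproduces the original text exactly.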
The LLM's tokenizer has a fixed set of tokens - a token vocabulary, if you will.
It represents these tokens as integers, and the integers are called token ids.
This is your text represented as a sequence of token ids:
[5299, 553, 481, 4044, 30]
Each token id represents a fragment in your text - a word, a fraction of a word, a syllable, or a character.
The table below shows how each token id maps to its text:
| Index | Token ID | Token Text |
|---|---|---|
| 0 | 5299 | How |
| 1 | 553 | are |
| 2 | 481 | you |
| 3 | 4044 | today |
| 4 | 30 | ? |

Total tokens: 5
Joining the token strings together reconstructs the original text:
How are you today?
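The decoding step above can be sketched in a few lines, using a hypothetical id-to-text mapping that mirrors the table (real vocabularies are far larger):

```python
# Hypothetical mapping from token ids back to their text fragments.
ID_TO_TEXT = {5299: "How", 553: " are", 481: " you", 4044: " today", 30: "?"}

token_ids = [5299, 553, 481, 4044, 30]
# Look up each id and concatenate the fragments.
text = "".join(ID_TO_TEXT[tid] for tid in token_ids)
print(text)  # → How are you today?
```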
The LLM never uses your message text directly. It works with token ids internally, and when it generates a reply, it produces token ids too.
The token vocabulary may differ from one LLM to another: each vendor chooses its own tokenizer, often on a per-model basis.
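To see what this means in practice, here is a sketch with two hypothetical vocabularies for the same sentence. Both vocabularies and all ids are invented for illustration; the point is only that different tokenizers split the same string into different fragments and different ids.

```python
# Two hypothetical vocabularies from two imaginary vendors.
vocab_a = {"How": 5299, " are": 553, " you": 481, " today": 4044, "?": 30}
vocab_b = {"How ": 17, "are ": 902, "you ": 88, "to": 41, "day": 600, "?": 5}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Encode text to token ids by greedy longest-match over the vocabulary."""
    ids, i = [], 0
    while i < len(text):
        # Pick the longest fragment that matches at position i.
        frag = max((f for f in vocab if text.startswith(f, i)), key=len)
        ids.append(vocab[frag])
        i += len(frag)
    return ids

print(encode("How are you today?", vocab_a))  # → [5299, 553, 481, 4044, 30]
print(encode("How are you today?", vocab_b))  # → [17, 902, 88, 41, 600, 5]
```

The same text yields five tokens under one vocabulary and six under the other, so token counts (and therefore costs and context-window usage) vary between models.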