LLM Tokens Playground
A playground that makes LLM tokens easier to understand.
When you send a message to an LLM, it doesn't work with your text directly. Instead, it breaks the text up into smaller pieces, called tokens.
Enter some text to see how it's tokenized:
The LLM encodes your text as a sequence of tokens: small text fragments.
["How", " are", " you", " today", "?"]
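The splitting above can be sketched with a toy longest-match tokenizer. This is a simplified illustration only: real LLM tokenizers use algorithms such as byte-pair encoding (BPE) with vocabularies of tens of thousands of entries, and the fragments below are a hypothetical vocabulary chosen to mirror the example.

```python
# Hypothetical mini-vocabulary mirroring the example above.
VOCAB = {"How": 5299, " are": 553, " you": 481, " today": 4044, "?": 30}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest known fragment at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate fragments first.
        for frag in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(frag, i):
                match = frag
                break
        if match is None:
            match = text[i]  # fall back to a single character
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("How are you today?"))
# → ['How', ' are', ' you', ' today', '?']
```

Note that the leading spaces belong to the tokens themselves, which is why joining them later reproduces the original text exactly.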
The LLM's tokenizer has a fixed set of tokens - a token vocabulary, if you will.
It represents these tokens as integers, and the integers are called token ids.
This is your text represented as a sequence of token ids:
[5299, 553, 481, 4044, 30]
Each token id represents a fragment in your text - a word, a fraction of a word, a syllable, or a character.
The table below shows how each token id maps to its text:
| Index | Token ID | Token Text |
|---|---|---|
| 0 | 5299 | How |
| 1 | 553 | are |
| 2 | 481 | you |
| 3 | 4044 | today |
| 4 | 30 | ? |

Total tokens: 5
Joining the token strings together reconstructs the original text:
How are you today?
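The decoding step above can be sketched in a few lines, using a hypothetical id-to-text mapping that mirrors the table (real vocabularies are far larger):

```python
# Hypothetical mapping from token ids back to their text fragments.
ID_TO_TEXT = {5299: "How", 553: " are", 481: " you", 4044: " today", 30: "?"}

token_ids = [5299, 553, 481, 4044, 30]
# Look up each id and concatenate the fragments.
text = "".join(ID_TO_TEXT[tid] for tid in token_ids)
print(text)  # → How are you today?
```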
The LLM never uses your message text directly. It works with token ids internally, and when it generates a reply, it produces token ids too.
The token vocabulary may differ from one LLM to another: each vendor chooses its own tokenizer, often on a per-model basis.
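To see what this means in practice, here is a sketch with two hypothetical vocabularies for the same sentence. Both vocabularies and all ids are invented for illustration; the point is only that different tokenizers split the same string into different fragments and different ids.

```python
# Two hypothetical vocabularies from two imaginary vendors.
vocab_a = {"How": 5299, " are": 553, " you": 481, " today": 4044, "?": 30}
vocab_b = {"How ": 17, "are ": 902, "you ": 88, "to": 41, "day": 600, "?": 5}

def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Encode text to token ids by greedy longest-match over the vocabulary."""
    ids, i = [], 0
    while i < len(text):
        # Pick the longest fragment that matches at position i.
        frag = max((f for f in vocab if text.startswith(f, i)), key=len)
        ids.append(vocab[frag])
        i += len(frag)
    return ids

print(encode("How are you today?", vocab_a))  # → [5299, 553, 481, 4044, 30]
print(encode("How are you today?", vocab_b))  # → [17, 902, 88, 41, 600, 5]
```

The same text yields five tokens under one vocabulary and six under the other, so token counts (and therefore costs and context-window usage) vary between models.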