The format token component (`bp-format-token`) visualizes how text is split into tokens for language models. It supports several tokenization strategies: BPE, WordPiece, SentencePiece, LLaMA, character-level, and whitespace.
Example
Hello world! This is a test of token segmentation.
code
<script type="module">
import '@blueprintui/components/include/format-token.js';
</script>
<bp-format-token>Hello world! This is a test of token segmentation.</bp-format-token>
BPE (GPT-style)
Hello world! This is a test.
WordPiece (BERT-style)
Hello world! This is a test.
SentencePiece
Hello world! This is a test.
LLaMA
Hello world! This is a test.
Character-level
Hello world!
Whitespace
Hello world! This is a test.
code
<script type="module">
import '@blueprintui/components/include/format-token.js';
</script>
<div bp-layout="block gap:md">
<div>
<h4>BPE (GPT-style)</h4>
<bp-format-token format="bpe">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>WordPiece (BERT-style)</h4>
<bp-format-token format="word-piece">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>SentencePiece</h4>
<bp-format-token format="sentence-piece">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>LLaMA</h4>
<bp-format-token format="llama">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>Character-level</h4>
<bp-format-token format="character">Hello world!</bp-format-token>
</div>
<div>
<h4>Whitespace</h4>
<bp-format-token format="whitespace">Hello world! This is a test.</bp-format-token>
</div>
</div>
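The `format` values used in the markup above can also be switched from script. The sketch below is not part of the component's API: `TOKEN_FORMATS`, `isTokenFormat`, and `setTokenFormat` are hypothetical helper names, and it assumes that changing the `format` attribute on an already rendered element causes it to re-tokenize its text.

```javascript
// Format values taken from the markup examples above.
const TOKEN_FORMATS = ['bpe', 'word-piece', 'sentence-piece', 'llama', 'character', 'whitespace'];

// Hypothetical helper: true when `value` is one of the documented formats.
function isTokenFormat(value) {
  return TOKEN_FORMATS.includes(value);
}

// Hypothetical helper: set the `format` attribute on a <bp-format-token> element.
// Assumption: the component re-tokenizes when the attribute changes.
function setTokenFormat(element, format) {
  if (!isTokenFormat(format)) {
    throw new RangeError(`Unknown token format: ${format}`);
  }
  element.setAttribute('format', format);
  return element;
}

// Browser usage, once the component script has been loaded:
// setTokenFormat(document.querySelector('bp-format-token'), 'word-piece');
```

Rejecting unknown values up front catches typos such as `wordpiece`, which would otherwise be passed to the element unchecked.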
Install
NPM
import '@blueprintui/components/include/format-token.js';
CDN
<script type="module">
import 'https://cdn.jsdelivr.net/npm/@blueprintui/components/include/format-token.js/+esm';
</script>
Properties
| Name | Types | Description |
|---|---|---|
| format | 'bpe' \| 'word-piece' \| 'sentence-piece' \| 'llama' \| 'character' \| 'whitespace' | Tokenization format/strategy to use |
| tokens | string[] | |
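The `tokens` property is listed without a description. As a rough sketch, assuming it exposes the token strings the component produced for its current text (the table does not confirm this), a small helper can summarize them. `summarizeTokens` is a hypothetical name, not something the component provides.

```javascript
// Hypothetical helper: count the tokens and find the longest one.
// Falls back to an empty list when `tokens` is not an array.
function summarizeTokens(tokens) {
  const list = Array.isArray(tokens) ? tokens : [];
  const longest = list.reduce((best, token) => (token.length > best.length ? token : best), '');
  return { count: list.length, longest };
}

// Browser usage (assumption: `tokens` is populated after the element renders):
// const el = document.querySelector('bp-format-token');
// console.log(summarizeTokens(el.tokens));
```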
Attributes
| Name | Types | Description |
|---|---|---|
| format | 'bpe' \| 'word-piece' \| 'sentence-piece' \| 'llama' \| 'character' \| 'whitespace' | Tokenization format/strategy to use |
CSS Properties
| Name | Types | Description |
|---|---|---|
| --padding | | |
| --border-radius | | |
| --border | | |
| --font-family | | |
| --line-height | | |
| --gap | | |
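These custom properties can be set in a stylesheet or from script. The following is a minimal sketch using the standard `style.setProperty` DOM method. The helper name `applyTokenStyles` and the example values are illustrative assumptions, since the table does not list accepted types or defaults.

```javascript
// Custom property names from the CSS Properties table.
const TOKEN_CSS_PROPERTIES = [
  '--padding',
  '--border-radius',
  '--border',
  '--font-family',
  '--line-height',
  '--gap',
];

// Hypothetical helper: apply a map of custom properties to an element,
// skipping names the table does not list. Returns the names that were applied.
function applyTokenStyles(element, styles) {
  const applied = [];
  for (const [name, value] of Object.entries(styles)) {
    if (TOKEN_CSS_PROPERTIES.includes(name)) {
      element.style.setProperty(name, value);
      applied.push(name);
    }
  }
  return applied;
}

// Browser usage (values are examples only):
// applyTokenStyles(document.querySelector('bp-format-token'), { '--gap': '4px', '--border-radius': '2px' });
```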
Slots
| Name | Types | Description |
|---|---|---|
| default | | Provide text content to be tokenized |