The format token component visualizes how language models tokenize text. It displays the text split into tokens using one of several strategies: BPE, WordPiece, SentencePiece, LLaMA, character-level, or whitespace.
Example
Renders the provided text split into individual tokens.
<script type="module">
import '@blueprintui/components/include/format-token.js';
</script>
<bp-format-token>Hello world! This is a test of token segmentation.</bp-format-token>
Shows each available tokenization format, selected with the `format` attribute.
<script type="module">
import '@blueprintui/components/include/format-token.js';
</script>
<div bp-layout="block gap:md">
<div>
<h4>BPE (GPT-style)</h4>
<bp-format-token format="bpe">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>WordPiece (BERT-style)</h4>
<bp-format-token format="word-piece">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>SentencePiece</h4>
<bp-format-token format="sentence-piece">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>LLaMA</h4>
<bp-format-token format="llama">Hello world! This is a test.</bp-format-token>
</div>
<div>
<h4>Character-level</h4>
<bp-format-token format="character">Hello world!</bp-format-token>
</div>
<div>
<h4>Whitespace</h4>
<bp-format-token format="whitespace">Hello world! This is a test.</bp-format-token>
</div>
</div>
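The same comparison can be generated from script instead of being written by hand. The following is a minimal sketch, not part of the library: `FORMAT_LABELS` and `buildFormatComparison` are hypothetical names, and the helper assumes it receives a `document`-like object with a `createElement` method. It uses the same sample text for every format, which differs slightly from the markup above.

```javascript
// Labels and format values copied from the example markup above.
const FORMAT_LABELS = [
  ['BPE (GPT-style)', 'bpe'],
  ['WordPiece (BERT-style)', 'word-piece'],
  ['SentencePiece', 'sentence-piece'],
  ['LLaMA', 'llama'],
  ['Character-level', 'character'],
  ['Whitespace', 'whitespace'],
];

// Hypothetical helper: builds one heading plus one <bp-format-token>
// per format and returns the wrapping container element.
function buildFormatComparison(doc, text) {
  const container = doc.createElement('div');
  container.setAttribute('bp-layout', 'block gap:md');

  for (const [label, format] of FORMAT_LABELS) {
    const section = doc.createElement('div');

    const heading = doc.createElement('h4');
    heading.textContent = label;

    const token = doc.createElement('bp-format-token');
    token.setAttribute('format', format);
    token.textContent = text;

    section.append(heading, token);
    container.append(section);
  }
  return container;
}
```

In a browser, after the component import has run, the result could be attached with `document.body.append(buildFormatComparison(document, 'Hello world! This is a test.'))`.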
Install
NPM
import '@blueprintui/components/include/format-token.js';
CDN
<script type="module">
import 'https://cdn.jsdelivr.net/npm/@blueprintui/components/include/format-token.js/+esm';
</script>
Properties
| Name | Types | Description |
|---|---|---|
| format | 'bpe' \| 'word-piece' \| 'sentence-piece' \| 'llama' \| 'character' \| 'whitespace' | Specifies the tokenization strategy used to split text into tokens for language model visualization |
| tokens | string[] | |
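Because `format` is listed as a property as well as an attribute, it can presumably be set from script. The sketch below is illustrative only: `TOKEN_FORMATS` and `setTokenFormat` are hypothetical names, not library exports, and the validation step is an added convenience rather than documented component behavior.

```javascript
// Format values listed in the Properties table.
const TOKEN_FORMATS = [
  'bpe',
  'word-piece',
  'sentence-piece',
  'llama',
  'character',
  'whitespace',
];

// Hypothetical helper: rejects values outside the documented union,
// then assigns the `format` property on the given element.
function setTokenFormat(element, format) {
  if (!TOKEN_FORMATS.includes(format)) {
    throw new RangeError(`Unsupported token format: ${format}`);
  }
  element.format = format;
  return element;
}
```

In a page that has loaded the component, a call might look like `setTokenFormat(document.querySelector('bp-format-token'), 'llama')`.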
Attributes
| Name | Types | Description |
|---|---|---|
| format | 'bpe' \| 'word-piece' \| 'sentence-piece' \| 'llama' \| 'character' \| 'whitespace' | Specifies the tokenization strategy used to split text into tokens for language model visualization |
CSS Properties
| Name | Types | Description |
|---|---|---|
| --padding | | |
| --border-radius | | |
| --border | | |
| --font-family | | |
| --line-height | | |
| --gap | | |
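The custom properties above can be set in a stylesheet or from script. The sketch below takes the script route using the standard `style.setProperty` DOM method. `TOKEN_CSS_PROPERTIES` and `applyTokenStyles` are hypothetical names, and because the table does not describe what each property affects, any values passed in are illustrative guesses.

```javascript
// Custom property names listed in the CSS Properties table.
const TOKEN_CSS_PROPERTIES = [
  '--padding',
  '--border-radius',
  '--border',
  '--font-family',
  '--line-height',
  '--gap',
];

// Hypothetical helper: sets only the documented custom properties
// on the element's inline style and rejects any other name.
function applyTokenStyles(element, styles) {
  for (const [name, value] of Object.entries(styles)) {
    if (!TOKEN_CSS_PROPERTIES.includes(name)) {
      throw new RangeError(`Unknown custom property: ${name}`);
    }
    element.style.setProperty(name, value);
  }
  return element;
}
```

For example, `applyTokenStyles(document.querySelector('bp-format-token'), { '--gap': '4px', '--border-radius': '2px' })` would set two of the properties inline on a single element.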
Slots
| Name | Types | Description |
|---|---|---|
| default | | Provide text content to be tokenized |