Format Token

npm 2.0.0

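The format token component (`bp-format-token`) visualizes how text is tokenized for language models. It shows how a string is split into tokens under a chosen strategy: BPE (GPT-style), WordPiece (BERT-style), SentencePiece, or LLaMA-style subword tokenization, or simple character-level and whitespace splitting.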
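To make one of the subword strategies concrete, here is a minimal sketch of WordPiece's greedy longest-match-first segmentation of a single word. Pieces that continue a word carry a `##` prefix, as in BERT. The vocabulary `VOCAB` and the function `wordPiece` are hypothetical and exist only for illustration. Real models learn vocabularies of tens of thousands of pieces, and this is not how `bp-format-token` computes its output.

```javascript
// Tiny hypothetical vocabulary; "##" marks a piece that continues a word.
const VOCAB = new Set(['token', '##iz', '##ation', '##s', 'seg', '##ment', 'test']);

// Greedy longest-match-first WordPiece segmentation of one word.
// Returns ['[UNK]'] if any part of the word cannot be matched.
function wordPiece(word, vocab = VOCAB) {
  const pieces = [];
  let start = 0;
  while (start < word.length) {
    let end = word.length;
    let match = null;
    // Shrink the candidate from the right until it is in the vocabulary.
    while (start < end) {
      const sub = (start > 0 ? '##' : '') + word.slice(start, end);
      if (vocab.has(sub)) {
        match = sub;
        break;
      }
      end -= 1;
    }
    if (match === null) return ['[UNK]'];
    pieces.push(match);
    start = end;
  }
  return pieces;
}

console.log(wordPiece('tokenization')); // → ['token', '##iz', '##ation']
console.log(wordPiece('segments'));     // → ['seg', '##ment', '##s']
```

Always taking the longest matching piece keeps the segmentation deterministic. It also explains why WordPiece output often shows a familiar stem followed by short `##` suffixes.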

Example

Renders the slotted text as a sequence of tokens.

code

<script type="module">
  import '@blueprintui/components/include/format-token.js';
</script>

<bp-format-token>Hello world! This is a test of token segmentation.</bp-format-token>

Formats

Shows the same text tokenized with each supported `format` value.

code

<script type="module">
  import '@blueprintui/components/include/format-token.js';
</script>

<div bp-layout="block gap:md">
  <div>
    <h4>BPE (GPT-style)</h4>
    <bp-format-token format="bpe">Hello world! This is a test.</bp-format-token>
  </div>

  <div>
    <h4>WordPiece (BERT-style)</h4>
    <bp-format-token format="word-piece">Hello world! This is a test.</bp-format-token>
  </div>

  <div>
    <h4>SentencePiece</h4>
    <bp-format-token format="sentence-piece">Hello world! This is a test.</bp-format-token>
  </div>

  <div>
    <h4>LLaMA</h4>
    <bp-format-token format="llama">Hello world! This is a test.</bp-format-token>
  </div>

  <div>
    <h4>Character-level</h4>
    <bp-format-token format="character">Hello world!</bp-format-token>
  </div>

  <div>
    <h4>Whitespace</h4>
    <bp-format-token format="whitespace">Hello world! This is a test.</bp-format-token>
  </div>
</div>
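For intuition about the simpler formats, the sketch below splits text per character and on whitespace. It also shows the SentencePiece convention of marking spaces with U+2581 ("▁") before segmentation. The helpers `characterTokens`, `whitespaceTokens`, and `sentencePieceMarkSpaces` are hypothetical and are not the component's implementation. `bp-format-token` may treat punctuation and whitespace differently.

```javascript
// Hypothetical helpers for intuition only; not the bp-format-token implementation.

// Character-level: one token per Unicode code point.
// Array.from iterates by code point, so surrogate pairs (e.g. emoji) stay whole.
function characterTokens(text) {
  return Array.from(text);
}

// Whitespace: split on runs of whitespace, dropping the empty strings
// that leading or trailing whitespace would produce.
function whitespaceTokens(text) {
  return text.split(/\s+/).filter((t) => t.length > 0);
}

// SentencePiece-style pre-processing: spaces become U+2581 ("▁") and a
// leading "▁" is added, so word boundaries survive inside the pieces.
// A trained model would then segment this string into subword pieces.
function sentencePieceMarkSpaces(text) {
  return '\u2581' + text.replace(/ /g, '\u2581');
}

console.log(whitespaceTokens('Hello world! This is a test.'));
// → ['Hello', 'world!', 'This', 'is', 'a', 'test.']
console.log(characterTokens('Hello world!').length); // → 12
console.log(sentencePieceMarkSpaces('Hello world!')); // → ▁Hello▁world!
```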

Install

NPM

// npm package
import '@blueprintui/components/include/format-token.js';

CDN

<script type="module">
  import 'https://cdn.jsdelivr.net/npm/@blueprintui/components/include/format-token.js/+esm';
</script>

bp-format-token

Properties

Name	Types	Description
`format`	`'bpe' | 'word-piece' | 'sentence-piece' | 'llama' | 'character' | 'whitespace'`	Tokenization strategy used to split the slotted text into tokens
`tokens`	`string[]`

Attributes

Name	Types	Description
`format`	`'bpe' | 'word-piece' | 'sentence-piece' | 'llama' | 'character' | 'whitespace'`	Tokenization strategy used to split the slotted text into tokens
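Because `format` is exposed as both a property and an attribute, it can also be switched from script. The sketch below uses a hypothetical `setTokenFormat` helper that checks the value against the union listed in the Properties and Attributes tables before assigning the property. The `FORMATS` list is copied from those tables. The helper is not part of the package.

```javascript
// Allowed values, copied from the `format` union in the Properties/Attributes tables.
const FORMATS = ['bpe', 'word-piece', 'sentence-piece', 'llama', 'character', 'whitespace'];

// Hypothetical helper: validate, then set the `format` property on a
// bp-format-token element (or any object with a `format` field).
function setTokenFormat(el, format) {
  if (!FORMATS.includes(format)) {
    throw new RangeError(`Unsupported format "${format}"; expected one of ${FORMATS.join(', ')}`);
  }
  el.format = format;
  return el;
}

// In a browser, after importing format-token.js:
// setTokenFormat(document.querySelector('bp-format-token'), 'llama');
```

Rejecting unknown strings early surfaces typos such as `wordpiece` instead of `word-piece` at the call site, rather than leaving the component to handle an unsupported value.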

CSS Properties

Name	Types	Description
`--padding`
`--border-radius`
`--border`
`--font-family`
`--line-height`
`--gap`

Slots

Name	Types	Description
`default`		Provide text content to be tokenized