A polyfill for Intl.Segmenter.
ECMA-402 Spec Compliance#
This package is fully compliant with the ECMA-402 specification for Intl.Segmenter.
Specification Details#
- TC39 Proposal: Intl.Segmenter
- Stage: Stage 4 (Finalized)
- Spec: ECMA-402 Intl.Segmenter
✅ Implemented Features#
Core Methods#
- segment(string) - Returns an iterable Segments object for the string
- resolvedOptions() - Returns resolved options
- supportedLocalesOf(locales) - Returns supported locales
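As a rough illustration of the two methods that the later examples do not exercise, the sketch below calls resolvedOptions() and supportedLocalesOf(). It assumes Intl.Segmenter is already available globally, either natively or because the polyfill has been imported; the locale list and variable names are only sample input.

```javascript
// Assumes Intl.Segmenter exists globally (native or polyfilled).
const coreSegmenter = new Intl.Segmenter('en', {granularity: 'word'})

// resolvedOptions() reports the locale and granularity actually in use.
const coreOptions = coreSegmenter.resolvedOptions()
console.log(coreOptions.locale, coreOptions.granularity)

// supportedLocalesOf() is a static method: it filters the requested
// locales down to those the implementation supports.
const coreSupported = Intl.Segmenter.supportedLocalesOf(['en', 'th'])
console.log(coreSupported)
```

Which locales come back from supportedLocalesOf() depends on the locale data available in the runtime, so the result is best treated as environment-specific.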
Granularity Options#
All 3 segmentation granularities are supported:
- 'grapheme' - Grapheme cluster boundaries (user-perceived characters)
  - Handles combining marks, emoji, etc.
  - Example: "👨‍👩‍👧‍👦" is one grapheme
- 'word' - Word boundaries with word/punctuation classification
  - Identifies words, spaces, punctuation
  - Provides the isWordLike property
- 'sentence' - Sentence boundaries
  - Handles abbreviations, numbers, quotes
  - Locale-aware sentence breaks
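To make the difference between the three granularities concrete, here is a minimal sketch that runs the same string through each one. It assumes Intl.Segmenter is available globally (native or polyfilled); segmentTexts is a hypothetical helper written for this example, not part of the package.

```javascript
// Assumes Intl.Segmenter exists globally (native or polyfilled).
// segmentTexts is a hypothetical helper, not part of the package.
function segmentTexts(text, granularity) {
  const segmenter = new Intl.Segmenter('en', {granularity})
  // Keep only the text of each segment.
  return Array.from(segmenter.segment(text), s => s.segment)
}

const sample = 'Hi there. Bye!'

console.log(segmentTexts(sample, 'grapheme').length)
// 14
console.log(segmentTexts(sample, 'word'))
// ['Hi', ' ', 'there', '.', ' ', 'Bye', '!']
console.log(segmentTexts(sample, 'sentence'))
// ['Hi there. ', 'Bye!']
```

Joining the segments of any granularity reproduces the input, since segmentation only places boundaries and never drops text.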
Segments Object#
The Segments object returned by segment() is:
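- Iterable - Can be used with for...of loops
- Searchable by index - The containing(index) method returns the segment that covers a given code unit index, or undefined when the index is out of range. It is not array-like: bracket access such as segments[0] is not supported.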
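The following sketch shows both ways of reading a Segments object: iterating it, and looking up a position with containing(). It assumes Intl.Segmenter is available globally (native or polyfilled), and the variable names are illustrative only.

```javascript
// Assumes Intl.Segmenter exists globally (native or polyfilled).
const demoSegments = new Intl.Segmenter('en', {granularity: 'word'}).segment(
  'Hello, world!'
)

// Iteration: each step yields one segment data object.
for (const {segment, index} of demoSegments) {
  console.log(index, JSON.stringify(segment))
}

// containing() accepts any code unit index inside a segment,
// not only the segment's first index.
const hit = demoSegments.containing(9)
console.log(hit.segment, hit.index) // world 7

// An index outside the string yields undefined.
console.log(demoSegments.containing(100)) // undefined

// Spread into a real array when length or positional access is needed.
const demoArray = [...demoSegments]
console.log(demoArray.length) // 5
```

A Segments object can be iterated more than once, because each for...of loop or spread requests a fresh iterator.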
Segment Object Properties#
Each segment has:
- segment - The text of the segment
- index - Start index in the original string
- input - The original input string
- isWordLike - (word granularity only) Whether the segment is a word
Example Usage#
Global import#
import '@formatjs/intl-segmenter/polyfill.js'
// Grapheme segmentation (user-perceived characters)
const graphemeSegmenter = new Intl.Segmenter('en', {granularity: 'grapheme'})
const graphemes = [...graphemeSegmenter.segment('Hello👋')]
// [
// {segment: 'H', index: 0, input: 'Hello👋'},
// {segment: 'e', index: 1, input: 'Hello👋'},
// {segment: 'l', index: 2, input: 'Hello👋'},
// {segment: 'l', index: 3, input: 'Hello👋'},
// {segment: 'o', index: 4, input: 'Hello👋'},
// {segment: '👋', index: 5, input: 'Hello👋'} // Emoji as one grapheme
// ]
// Word segmentation
const wordSegmenter = new Intl.Segmenter('en', {granularity: 'word'})
const words = [...wordSegmenter.segment('Hello, world!')]
// [
// {segment: 'Hello', index: 0, isWordLike: true},
// {segment: ',', index: 5, isWordLike: false},
// {segment: ' ', index: 6, isWordLike: false},
// {segment: 'world', index: 7, isWordLike: true},
// {segment: '!', index: 12, isWordLike: false}
// ]
// Filter to only word-like segments
const onlyWords = words.filter(s => s.isWordLike)
// [{segment: 'Hello', ...}, {segment: 'world', ...}]
// Sentence segmentation
const sentenceSegmenter = new Intl.Segmenter('en', {granularity: 'sentence'})
const sentences = [
  ...sentenceSegmenter.segment('Hello! How are you? I am fine.'),
]
// [
// {segment: 'Hello! ', index: 0, input: '...'},
// {segment: 'How are you? ', index: 7, input: '...'},
// {segment: 'I am fine.', index: 20, input: '...'}
// ]
// containing() method - find segment at specific index
const segments = wordSegmenter.segment('Hello, world!')
segments.containing(7)
// {segment: 'world', index: 7, isWordLike: true}
// Locale-aware segmentation
const thaiSegmenter = new Intl.Segmenter('th', {granularity: 'word'})
const thaiWords = [...thaiSegmenter.segment('สวัสดีครับ')]
// Correctly segments Thai text without spaces
// Complex emoji handling
const emojiSegmenter = new Intl.Segmenter('en', {granularity: 'grapheme'})
const emojis = [...emojiSegmenter.segment('👨👩👧👦🏴')]
// [
// {segment: '👨👩👧👦', index: 0}, // Family emoji (ZWJ sequence)
// {segment: '🏴', index: ...} // Scotland flag
// ]
Info
The global import does not include TypeScript type declarations. For TypeScript projects, we recommend using ES module imports instead.
If you use the global import anyway, you must manually include the corresponding type declaration files (.d.ts) in your project to avoid type errors.
ES Modules#
import {Segmenter} from '@formatjs/intl-segmenter'
// Grapheme segmentation (user-perceived characters)
const graphemeSegmenter = new Segmenter('en', {granularity: 'grapheme'})
const graphemes = [...graphemeSegmenter.segment('Hello👋')]
Use Cases#
- Character counting: Get accurate character count including complex emoji
- Word counting: Count words across different scripts and languages
- Text truncation: Safely truncate at grapheme boundaries
- Syntax highlighting: Break code into word segments
- Search indexing: Segment text for full-text search
- Text analysis: Analyze sentence structure
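A few of the use cases above can be sketched in a handful of lines. The helpers below (countGraphemes, countWords, truncateGraphemes) are hypothetical names chosen for this example rather than anything exported by the package, and the code assumes Intl.Segmenter is available globally (native or polyfilled). The family emoji is written with escapes so its zero-width joiners are visible.

```javascript
// Assumes Intl.Segmenter exists globally (native or polyfilled).
// All helper names below are hypothetical, for illustration only.
const byGrapheme = new Intl.Segmenter('en', {granularity: 'grapheme'})
const byWord = new Intl.Segmenter('en', {granularity: 'word'})

// Character counting: one count per user-perceived character.
function countGraphemes(text) {
  let count = 0
  for (const _ of byGrapheme.segment(text)) count++
  return count
}

// Word counting: skip spaces and punctuation via isWordLike.
function countWords(text) {
  let count = 0
  for (const s of byWord.segment(text)) if (s.isWordLike) count++
  return count
}

// Text truncation: keep at most `max` graphemes, never splitting one.
function truncateGraphemes(text, max) {
  let out = ''
  let taken = 0
  for (const {segment} of byGrapheme.segment(text)) {
    if (taken === max) break
    out += segment
    taken++
  }
  return out
}

// Man, woman, girl, boy joined by zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}'
console.log(family.length) // 11 (UTF-16 code units)
console.log(countGraphemes(family)) // 1
console.log(countWords('Hello, world!')) // 2
console.log(truncateGraphemes('Hello👋 world', 6)) // Hello👋
```

Truncating with String.prototype.slice at the same position could cut an emoji or a combining sequence in half, which is the problem the grapheme-based version avoids.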
Installation#
npm i @formatjs/intl-segmenter
Features#
Everything in the Intl.Segmenter proposal.
Usage#
Simple#
import '@formatjs/intl-segmenter/polyfill.js'
Dynamic import + capability detection#
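import {shouldPolyfill} from '@formatjs/intl-segmenter/should-polyfill.js'

async function polyfill(locale: string) {
  // Load the polyfill only when the runtime lacks a usable Intl.Segmenter
  if (shouldPolyfill()) {
    await import('@formatjs/intl-segmenter/polyfill-force.js')
  }
}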