Overview#
A high-performance Rust implementation of the ICU MessageFormat parser, providing native parsing capabilities for Rust applications.
Features#
- 🚀 High Performance: 2.6-3.7x faster than the JavaScript parser (optimized build)
- 🌐 Full ICU MessageFormat Support: Complete syntax support including plurals, selects, and formatting
- 🔒 Type Safe: Strongly-typed AST with comprehensive error handling
- 🎯 Zero-Copy Parsing: Efficient parsing with minimal allocations where possible
- 🧪 Well Tested: Comprehensive test suite ensuring correctness
Installation#
Add to your Cargo.toml:
[dependencies]
formatjs_icu_messageformat_parser = "0.2.4"
Usage#
use formatjs_icu_messageformat_parser::{Parser, ParserOptions};

fn main() {
    let message = "Hello {name}! You have {count, plural, one{# message} other{# messages}}.";
    let options = ParserOptions::default();
    let parser = Parser::new(message, options);

    match parser.parse() {
        Ok(ast) => {
            println!("Parsed AST: {:?}", ast);
        }
        Err(e) => {
            eprintln!("Parse error: {}", e);
        }
    }
}
API Reference#
Parser::new(message: &str, options: ParserOptions) -> Parser#
Creates a new parser instance.
Parameters:
- message: The ICU MessageFormat string to parse
- options: Parser configuration options
Parser::parse(&self) -> Result<Vec<MessageFormatElement>, ParserError>#
Parses the message and returns an AST.
Returns:
- Ok(Vec<MessageFormatElement>): Parsed AST on success
- Err(ParserError): Detailed error information on failure
ParserOptions#
Configuration options for the parser:
pub struct ParserOptions {
    pub ignore_tag: bool,             // Treat HTML-like tags as literals
    pub should_parse_skeletons: bool, // Parse number/date skeletons
    pub capture_location: bool,       // Include location info in AST
}
AST Types#
The parser produces a strongly-typed AST with the following element types:
- LiteralElement: Plain text
- ArgumentElement: Simple variable reference ({name})
- NumberElement: Number formatting ({price, number, ::currency/USD})
- DateElement: Date formatting ({today, date, short})
- TimeElement: Time formatting ({now, time, short})
- PluralElement: Plural rules ({count, plural, one{#} other{#}})
- SelectElement: Select choices ({gender, select, male{he} female{she}})
- TagElement: HTML-like tags (<b>text</b>)
- PoundElement: Pound symbol # (placeholder in plural rules)
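Because the AST is a typed tree, most consumers end up walking it with a recursive `match`. The sketch below illustrates that pattern by collecting every variable name a message references. It assumes a deliberately simplified stand-in enum (`Element`, with only four variants) rather than the crate's real `MessageFormatElement`, whose exact fields are not shown in this README; `collect_argument_names`, `visit`, `push_unique`, and `sample_ast` are hypothetical helper names.

```rust
// Simplified stand-in for the parser's AST. These are NOT the crate's
// real definitions; they only mirror the shape described in the list above.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Literal(String),
    Argument(String),
    Plural {
        value: String,
        options: Vec<(String, Vec<Element>)>,
    },
    Pound,
}

/// Collects every variable name referenced in the AST,
/// in first-seen order and without duplicates.
fn collect_argument_names(ast: &[Element]) -> Vec<String> {
    let mut names = Vec::new();
    visit(ast, &mut names);
    names
}

/// Recursively walks the elements, descending into plural branches.
fn visit(ast: &[Element], names: &mut Vec<String>) {
    for el in ast {
        match el {
            Element::Literal(_) | Element::Pound => {}
            Element::Argument(name) => push_unique(names, name),
            Element::Plural { value, options } => {
                push_unique(names, value);
                for (_, branch) in options {
                    visit(branch, names);
                }
            }
        }
    }
}

fn push_unique(names: &mut Vec<String>, name: &str) {
    if !names.iter().any(|n| n.as_str() == name) {
        names.push(name.to_string());
    }
}

/// Hand-built AST for:
/// "Hello {name}! You have {count, plural, one{# message} other{# messages}}."
fn sample_ast() -> Vec<Element> {
    vec![
        Element::Literal("Hello ".into()),
        Element::Argument("name".into()),
        Element::Literal("! You have ".into()),
        Element::Plural {
            value: "count".into(),
            options: vec![
                ("one".into(), vec![Element::Pound, Element::Literal(" message".into())]),
                ("other".into(), vec![Element::Pound, Element::Literal(" messages".into())]),
            ],
        },
        Element::Literal(".".into()),
    ]
}
```

Calling `collect_argument_names(&sample_ast())` yields `name` followed by `count`. With the real crate, the same traversal would match on the actual element variants and would also need to descend into select and tag elements.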
Utility Functions#
AST Manipulation#
use formatjs_icu_messageformat_parser::{hoist_selectors, is_structurally_same};

// Hoist nested selectors to top level
let hoisted_ast = hoist_selectors(ast);

// Compare two ASTs for structural equivalence
let are_same = is_structurally_same(
    &source_ast,
    &target_ast,
    "message.id".to_string()
);
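To make "hoisting" concrete: the idea is that a selector embedded in a sentence is moved to the top level, and the surrounding text is copied into each of its branches so every branch becomes a complete sentence. The sketch below shows that transformation on a simplified stand-in AST. It is an illustration of the concept only, not the crate's `hoist_selectors` implementation; the `Element` enum and the `hoist` function are hypothetical, and details such as tag handling are left out.

```rust
// Simplified stand-in AST. These are NOT the crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Literal(String),
    Select {
        value: String,
        options: Vec<(String, Vec<Element>)>,
    },
}

/// Moves the first selector to the top level by copying its sibling
/// elements into every branch, then repeats the process inside each branch.
fn hoist(ast: Vec<Element>) -> Vec<Element> {
    // Nothing to do when the message contains no selector at this level.
    let Some(pos) = ast.iter().position(|el| matches!(el, Element::Select { .. })) else {
        return ast;
    };
    let Element::Select { value, options } = ast[pos].clone() else {
        unreachable!("position() only matches Select elements");
    };

    let hoisted_options = options
        .into_iter()
        .map(|(key, branch)| {
            // prefix + branch content + suffix
            let mut combined = ast[..pos].to_vec();
            combined.extend(branch);
            combined.extend_from_slice(&ast[pos + 1..]);
            (key, hoist(combined))
        })
        .collect();

    vec![Element::Select {
        value,
        options: hoisted_options,
    }]
}
```

For a stand-in AST of "I have {count, select, one{a dog} other{many dogs}}.", the result is a single top-level selector whose `one` branch holds "I have ", "a dog", "." and whose `other` branch holds "I have ", "many dogs", ".". This sketch leaves adjacent literals unmerged.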
AST Printing#
use formatjs_icu_messageformat_parser::print_ast;
// Convert AST back to ICU MessageFormat string
let message = print_ast(&ast);
println!("{}", message);
Examples#
Parsing Plurals#
use formatjs_icu_messageformat_parser::{Parser, ParserOptions, MessageFormatElement};

let message = "I have {count, plural, one{# dog} other{# dogs}}.";
let parser = Parser::new(message, ParserOptions::default());
let ast = parser.parse().unwrap();

// Process plural element
if let MessageFormatElement::Plural(plural) = &ast[1] {
    println!("Variable: {}", plural.value);
    println!("Options: {:?}", plural.options.keys());
}
Parsing with Skeletons#
use formatjs_icu_messageformat_parser::{Parser, ParserOptions};

let message = "Price: {price, number, ::currency/USD}";
let options = ParserOptions {
    should_parse_skeletons: true,
    ..Default::default()
};
let parser = Parser::new(message, options);
let ast = parser.parse().unwrap();
Error Handling#
use formatjs_icu_messageformat_parser::{Parser, ParserOptions, ErrorKind};

let invalid_message = "{unclosed";
let parser = Parser::new(invalid_message, ParserOptions::default());

match parser.parse() {
    Ok(_) => println!("Parsed successfully"),
    Err(e) => {
        println!("Error: {}", e);
        println!("Location: {:?}", e.location);
        match e.kind {
            ErrorKind::Expect(expected) => {
                println!("Expected: {}", expected);
            }
            _ => {}
        }
    }
}
Performance#
The Rust parser (optimized build) provides significant performance improvements over JavaScript implementations:
Benchmark Results#
Run with: bazel run -c opt //crates/icu_messageformat_parser:comparison_bench
| Message Type | Rust Parser | JavaScript | Speedup | SWC Parser | vs SWC |
|---|---|---|---|---|---|
| complex_msg | 9.22 µs | 23.85 µs | 2.59x faster | 10.3 µs | 1.11x faster |
| normal_msg | 1.14 µs | 3.27 µs | 2.87x faster | 1.25 µs | 1.10x faster |
| simple_msg | 163 ns | 600 ns | 3.68x faster | 184 ns | 1.13x faster |
| string_msg | 118 ns | 320 ns | 2.71x faster | 126 ns | 1.07x faster |
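The Speedup column is the JavaScript time divided by the Rust time for the same message. As a quick sanity check, the following sketch recomputes it from the figures in the table; the `speedup` and `report` helpers and the `RESULTS` constant are illustrative names, with all times converted to nanoseconds.

```rust
/// Speedup of a candidate over a baseline, both measured in the same unit.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}

/// (message type, Rust time in ns, JavaScript time in ns),
/// copied from the benchmark table above.
const RESULTS: [(&str, f64, f64); 4] = [
    ("complex_msg", 9_220.0, 23_850.0),
    ("normal_msg", 1_140.0, 3_270.0),
    ("simple_msg", 163.0, 600.0),
    ("string_msg", 118.0, 320.0),
];

/// Formats one "name: N.NNx" line per benchmark row.
fn report() -> Vec<String> {
    RESULTS
        .iter()
        .map(|(name, rust_ns, js_ns)| format!("{name}: {:.2}x", speedup(*js_ns, *rust_ns)))
        .collect()
}
```

The recomputed ratios are 2.59x, 2.87x, 3.68x, and 2.71x, which matches the table and the 2.6-3.7x range quoted in the feature list.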
Key Performance Characteristics#
- Parsing Speed: 2.6-3.7x faster than the JavaScript parser
- Memory Usage: Lower memory footprint due to efficient allocation
- Optimizations: Branch prediction hints, pre-allocated vectors, zero-copy parsing
- Note: Always use -c opt for release builds to enable optimizations
Building from Source#
# Run tests
bazel test //crates/icu_messageformat_parser:icu_messageformat_parser_test
# Run benchmarks (always use -c opt for accurate results)
bazel run -c opt //crates/icu_messageformat_parser:parser_bench
# Run comparison benchmark vs other parsers
bazel run -c opt //crates/icu_messageformat_parser:comparison_bench
Documentation#
Related Packages#
- formatjs_cli - Native Rust CLI for message extraction
- formatjs_icu_skeleton_parser - Number and date skeleton parser
- @formatjs/icu-messageformat-parser - JavaScript implementation