My Favorite Bugs: Invalid Surrogate Pairs • George Mandis
In which I revisit one of my favorite bugs, the invalid surrogate pair.
The modern answer
If you're doing string manipulation in JavaScript and you care about not corrupting characters, use Intl.Segmenter:
const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const segments = [...seg.segment("👩\u200D🚀A👍")].map((s) => s.segment);
// → ['👩\u200D🚀', 'A', '👍'] (three graphemes; the first is 👩 + zero-width joiner U+200D + 🚀)
This splits by grapheme clusters rather than code units. No orphaned surrogates, no split emoji. It's what .slice() should have been doing all along, but of course JavaScript's UTF-16 strings predate emoji in Unicode by more than a decade.
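As a rough sketch of what a grapheme-aware slice could look like, here is a small truncation helper built on Intl.Segmenter. The name truncateGraphemes and its signature are made up for illustration and are not part of any standard API; the sample strings use escapes so the invisible zero-width joiner is explicit.

```javascript
// Hypothetical helper (not a built-in): keep at most `max` grapheme clusters.
function truncateGraphemes(str, max) {
  const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of seg.segment(str)) {
    if (count >= max) break; // stop on a cluster boundary, never mid-pair
    out += segment;
    count += 1;
  }
  return out;
}

// Woman astronaut: U+1F469 + zero-width joiner U+200D + U+1F680.
const astronaut = "\u{1F469}\u200D\u{1F680}";
const sample = astronaut + "A\u{1F44D}";

console.log(truncateGraphemes(sample, 2)); // astronaut followed by "A"
console.log(sample.slice(0, 1) === "\uD83D"); // true: .slice() cut the pair in half
```

Creating a segmenter on every call keeps the sketch short; code that truncates many strings would likely want to build it once and reuse it.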
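Once you know about this bug, you start seeing it in the wild. Any code that does str.slice(0, 1) or str[0] to get "the first character" is potentially broken: for a character outside the Basic Multilingual Plane, it returns a lone high surrogate instead.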
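To make the failure concrete, the sketch below shows what str[0] yields for an emoji and one way to detect the resulting orphan. hasLoneSurrogate is a hypothetical helper written for this example, not a built-in.

```javascript
// Hypothetical helper: true if the string contains an unpaired surrogate.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = str.charCodeAt(i + 1); // NaN at the end of the string
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip its second half
        continue;
      }
      return true;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) return true; // low surrogate with no lead
  }
  return false;
}

const thumbsUp = "\u{1F44D}"; // stored as the pair \uD83D \uDC4D

console.log(thumbsUp.length); // 2
console.log(thumbsUp[0] === "\uD83D"); // true: only half of the pair
console.log(hasLoneSurrogate(thumbsUp[0])); // true
console.log(hasLoneSurrogate(thumbsUp)); // false
```

Newer engines also provide String.prototype.isWellFormed(), which answers the same question without a hand-rolled loop.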
June 8, 2026 at 1:44:37 PM EDT