My Favorite Bugs: Invalid Surrogate Pairs • George Mandis

In which I revisit one of my favorite bugs, the invalid surrogate pair.

The modern answer
If you're doing string manipulation in JavaScript and you care about not corrupting characters, use Intl.Segmenter:

const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const segments = [...seg.segment("👩‍🚀A👍")].map((s) => s.segment);
// → ['👩‍🚀', 'A', '👍']
This splits by grapheme clusters rather than UTF-16 code units, so there are no orphaned surrogates and no split emoji. It's what .slice() should have been doing all along, but of course JavaScript's UTF-16 string model predates emoji in Unicode by more than a decade.
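
As a rough sketch of how the same idea could replace a length-based .slice(), the helper below keeps the first N grapheme clusters of a string. The name truncateGraphemes and its parameters are hypothetical, not something from the original post, and the exact cluster boundaries depend on the Unicode data shipped with the JavaScript runtime.

```javascript
// Hypothetical helper (not from the original post): keep at most
// `maxGraphemes` user-perceived characters instead of UTF-16 code units.
function truncateGraphemes(text, maxGraphemes) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let result = "";
  let count = 0;
  // segment() yields objects whose `segment` property is one grapheme cluster.
  for (const { segment } of segmenter.segment(text)) {
    if (count >= maxGraphemes) break;
    result += segment;
    count += 1;
  }
  return result;
}

// "👍" is two code units, so slice(0, 1) would cut it in half;
// counting graphemes keeps the pair intact.
console.log(truncateGraphemes("👍AB", 1));
console.log(truncateGraphemes("👍AB", 2));
```

Counting segments rather than code units means the limit is expressed in what a reader would call characters, at the cost of walking the string through the segmenter.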

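Once you know about this bug, you start seeing it in the wild. Any code that does str.slice(0, 1) or str[0] to get "the first character" is potentially broken, because both operate on UTF-16 code units and return half of a surrogate pair when the string starts with an emoji.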
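
A small illustration of that failure mode, assuming a sample string that starts with an emoji. The variable names and the loneSurrogate pattern are illustrative choices rather than code from the original post.

```javascript
const text = "👍 nice";

// Code-unit access: both of these return only the high surrogate of 👍.
const byIndex = text[0];
const bySlice = text.slice(0, 1);

// Iterating the string walks code points, so the surrogate pair stays together.
const firstCodePoint = [...text][0];

// Matches a surrogate that is not part of a valid high+low pair.
// No `u` flag on purpose: the pattern must see individual code units.
const loneSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

console.log(loneSurrogate.test(byIndex));        // true: orphaned high surrogate
console.log(loneSurrogate.test(bySlice));        // true: same broken value
console.log(loneSurrogate.test(firstCodePoint)); // false: a complete pair
```

Code-point iteration avoids orphaned surrogates, but it still splits multi-code-point emoji such as ZWJ sequences, which is why grapheme segmentation is the safer default.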

https://george.mand.is/2026/05/my-favorite-bugs-invalid-surrogate-pairs/
June 8, 2026 at 1:44:37 PM EDT
javascript unicode