
A valid sequence of code points can begin with one or more combining marks, which then form a grapheme cluster that has no base character.

I'm unsure how that should be handled, if at all.

For example, consider a string consisting solely of an accent, with no "letter" attached. Is that grapheme cluster valid? Should it be "sanitized" somehow, for example by prefixing it with U+FFFD?

More specifically, I am worried about string concatenation. If I append that "string consisting solely of an accent" to another string, I do not necessarily want the accent to become part of the other string's last grapheme.
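To make the concern concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It shows a lone combining accent attaching to the previous string's last letter on plain concatenation, and one possible guard that inserts a placeholder base first. The helper names `starts_with_combining_mark` and `guarded_concat` are hypothetical, and the choice of U+00A0 NO-BREAK SPACE as the default placeholder is an assumption for illustration (U+FFFD could be passed instead); this is not something the sketch claims Unicode mandates.

```python
import unicodedata


def starts_with_combining_mark(s: str) -> bool:
    """True if s is non-empty and its first code point has a Mark category (Mn, Mc, Me)."""
    return bool(s) and unicodedata.category(s[0]).startswith("M")


def guarded_concat(left: str, right: str, base: str = "\u00A0") -> str:
    """Concatenate left and right; if right starts with a combining mark,
    insert a placeholder base (an assumption, not a Unicode requirement)
    so the mark does not attach to the end of left."""
    if starts_with_combining_mark(right):
        return left + base + right
    return left + right


accent = "\u0301"  # COMBINING ACUTE ACCENT, a string with no base character

# Plain concatenation: the accent joins the preceding "e".
plain = "e" + accent
print(unicodedata.normalize("NFC", plain) == "\u00e9")  # precomposed "é"

# Guarded concatenation: the accent sits on the placeholder instead.
guarded = guarded_concat("e", accent)
print([f"U+{ord(c):04X}" for c in guarded])
```

Whether such a guard is desirable depends on the application: it changes the content of the string, so it may only make sense at display or user-input boundaries rather than in a general-purpose concatenation routine.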

Does Unicode specify anything about how to handle these cases?

Wes

0 Answers