
On Oct 23, 2019, at 18:59, Christopher Barker <pythonchb@gmail.com> wrote:
Since I'm doing this, the three that aren't are:
U+180E MONGOLIAN VOWEL SEPARATOR
U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE
The Mongolian vowel separator makes some sense (not knowing Mongolian in the least). Though I wonder what the point of a zero-width space is if it's NOT going to be a separator?
It’s a Cf (format character) because it’s not used for spacing; it’s used for controlling higher-level formatting like soft line breaks. Put another way, it’s a bit more like a soft hyphen than it is like a space. It’s a weird distinction, but not as weird as, say, U+2028 and U+2029, which are also used for controlling formatting but literally have “separator” in their names, so each one ended up with its own special category (Zl and Zp), which lets them be Z but not Zs.
Anyway, some of the answers the Unicode committee came up with are odd, but they’re the right answers by definition. Plus, even if I had a time machine and an unlimited life span, I’m pretty sure I wouldn’t want to participate in those arguments.
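If you want to check the categories yourself, here is a minimal sketch using the stdlib unicodedata module. The describe() helper and the CODE_POINTS list are just names made up for this example, and I've added U+00AD SOFT HYPHEN only for comparison. The results depend on the version of the Unicode database bundled with your interpreter.

```python
import unicodedata

# Code points discussed in this thread, plus U+00AD SOFT HYPHEN for comparison.
CODE_POINTS = [0x180E, 0x200B, 0xFEFF, 0x00AD, 0x2028, 0x2029]


def describe(cp):
    """Return (name, general category, str.isspace() result) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch), ch.isspace()


for cp in CODE_POINTS:
    name, category, is_space = describe(cp)
    print(f"U+{cp:04X} {category} isspace={is_space} {name}")

# str.split() with no arguments splits on whatever str.isspace() accepts,
# so a Cf character stays inside the token while a Zl character splits it.
print("a\u200bb".split())
print("a\u2028b".split())
```

On a recent Python 3 this should report Cf for the zero-width characters and the soft hyphen, and Zl and Zp for U+2028 and U+2029, with only the last two counting as whitespace for isspace() and split().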