D'uh! stupid bug:
Are these the same code points identified by `str.isspace`?
I haven't checked -- so I will:
and the answer is no:
wrong, the answer is yes:
$ python weird_spaces.py
x x x xx x x x x x x x x x x xx x x xx
['x', 'x', 'x', 'x\u180ex', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x', 'x\u200bx', 'x', 'x', 'x\ufeffx']
out of 20, 17 were used as split chars
out of 20, 17 were True according to .isspace
That makes far more sense.
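For anyone who wants to reproduce the comparison, here is a minimal sketch of such a check. It is not the actual weird_spaces.py, and the list of 20 candidate code points is an assumption: the 17 code points in category Zs plus U+180E, U+200B and U+FEFF, picked because that combination is consistent with the output above. The names `CANDIDATES` and `compare` are made up for illustration.

```python
# Sketch only: the candidate list is an assumption, not taken from
# the original weird_spaces.py.
CANDIDATES = (
    [0x0020, 0x00A0, 0x1680, 0x180E]
    + list(range(0x2000, 0x200C))        # U+2000 .. U+200B inclusive
    + [0x202F, 0x205F, 0x3000, 0xFEFF]
)


def compare(code_points):
    """Count how many candidates split a string vs. how many are .isspace()."""
    seps = [chr(cp) for cp in code_points]
    # Build "x<sep>x<sep>...x": one 'x' between every pair of separators.
    text = "x" + "".join(sep + "x" for sep in seps)
    pieces = text.split()                # no argument: split on whitespace
    # The text starts and ends with 'x', so each split adds exactly one piece.
    n_split = len(pieces) - 1
    n_isspace = sum(sep.isspace() for sep in seps)
    return pieces, n_split, n_isspace


if __name__ == "__main__":
    pieces, n_split, n_isspace = compare(CANDIDATES)
    total = len(CANDIDATES)
    print(pieces)
    print(f"out of {total}, {n_split} were used as split chars")
    print(f"out of {total}, {n_isspace} were True according to .isspace")
```

The two counts should agree on any Python 3 whose Unicode database is 6.3 or later; older databases still classified U+180E as a space, so the numbers may differ there.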
Since I'm doing this, the three that aren't are:
U+180E MONGOLIAN VOWEL SEPARATOR
U+200B ZERO WIDTH SPACE
U+FEFF ZERO WIDTH NO-BREAK SPACE
The Mongolian vowel separator makes some sense (not that I know Mongolian in the least), though I wonder what the point of a zero-width space is if it's NOT going to be a separator?
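One way to dig a little further is to ask the `unicodedata` module how these three are classified. As far as I understand the documentation, `str.isspace` treats a character as whitespace when its general category is Zs or its bidirectional class is WS, B, or S, so printing those two properties should show why the three fall outside it. This is just a sketch; `describe` is a made-up helper name.

```python
import unicodedata


def describe(cp):
    """Return (name, general category, bidi class, isspace) for a code point."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),        # e.g. 'Zs' for space separators
        unicodedata.bidirectional(ch),   # e.g. 'WS' for whitespace
        ch.isspace(),
    )


if __name__ == "__main__":
    # The three code points that were not treated as whitespace,
    # plus an ordinary space for comparison.
    for cp in (0x180E, 0x200B, 0xFEFF, 0x0020):
        name, category, bidi, is_space = describe(cp)
        print(f"U+{cp:04X} {name}: category={category} bidi={bidi} isspace={is_space}")
```

With a recent Unicode database all three should come back as format characters rather than space separators, which at least matches what `split` and `.isspace` do with them.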