[I18n-sig] Unicode surrogates: just say no!
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Wed, 27 Jun 2001 22:53:11 +0200
> That's a separate question. On wide interpreters, surrogate pairs
> "shouldn't" exist if the app plays by the rules. But they're easily
> created of course! What should ord(u'\uD800\uDC00') mean on a wide
> interpreter? I think it's nice if you support this. Of course, if a
> length-two Unicode string is anything else than a high surrogate
> followed by a low surrogate, ord() should be illegal.
But then, on a wide interpreter, you get
unichr(ord(u'\uD800\uDC00')) <> u'\uD800\uDC00', since unichr returns
a single character there. Is that acceptable?
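
The round-trip problem can be sketched as follows. This is only an
illustration in present-day Python 3, whose strings behave like those of
a wide interpreter; pair_ord is a hypothetical helper standing in for the
proposed pair-aware ord(), not an existing function.

```python
def pair_ord(s):
    """Hypothetical ord() that also accepts a high+low surrogate pair."""
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        hi, lo = ord(s[0]), ord(s[1])
        # Only a high surrogate followed by a low surrogate is accepted.
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    raise TypeError("expected one character or a surrogate pair")

pair = '\ud800\udc00'           # two code points: a surrogate pair
code_point = pair_ord(pair)     # 0x10000
round_trip = chr(code_point)    # one character, not the original pair

print(hex(code_point), len(pair), len(round_trip), round_trip == pair)
# prints: 0x10000 2 1 False
```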
I'd prefer that ord not work on surrogate pairs. It means that code
may behave differently on narrow and wide interpreters, but that is no
surprise: len(u'\U00102030') already varies depending on the width of
the Unicode type.
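
A rough way to see the two lengths side by side, assuming present-day
Python 3 (3.3 or later, which no longer has narrow builds): len() gives
the wide-interpreter answer, and counting UTF-16 code units emulates what
a narrow interpreter would report.

```python
s = '\U00102030'

# Wide view: one code point.
code_points = len(s)

# Narrow view, emulated: each UTF-16 code unit is two bytes, and a
# character outside the BMP needs a surrogate pair, i.e. two units.
utf16_units = len(s.encode('utf-16-le')) // 2

print(code_points, utf16_units)
# prints: 1 2
```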
Regards,
Martin