[I18n-sig] Unicode surrogates: just say no!
Guido van Rossum
guido@digicool.com
Wed, 27 Jun 2001 15:57:12 -0400
> Guido van Rossum wrote:
> >
> >...
> >
> > Oooh, hadn't thought of that, but yes, it makes sense!
> >
> > Not yet implemented, but I think it should. Makes for a nice pair
> > of invariants:
> >
> > unichr(ord('\Udddddddd')) == '\Udddddddd'
> > ord(unichr(0xdddddddd)) == 0xdddddddd
> >
> > regardless of whether we're using UCS-2 or UCS-4 storage.
>
> I'm going to presume that ord should accept surrogate pairs on both
> narrow and wide interpreters.
That's a separate question. On wide interpreters, surrogate pairs
"shouldn't" exist if the app plays by the rules. But they're easily
created of course! What should ord(u'\uD800\uDC00') mean on a wide
interpreter? I think it would be nice to support this. Of course, if a
length-two Unicode string is anything other than a high surrogate
followed by a low surrogate, calling ord() on it should be an error.
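
To make the rule above concrete, here is a minimal sketch of a pair-aware
ord(), written for present-day Python 3. The name pair_aware_ord is
hypothetical (it is not a built-in), and raising TypeError for bad input is
an assumption, not something decided in this thread.

```python
def pair_aware_ord(s):
    """Hypothetical helper: like ord(), but also accepts one surrogate pair.

    A length-two string is accepted only if it is a high surrogate
    (U+D800..U+DBFF) followed by a low surrogate (U+DC00..U+DFFF).
    """
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        hi, lo = ord(s[0]), ord(s[1])
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            # Standard UTF-16 decoding of a surrogate pair.
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    # Assumption: reject everything else with TypeError.
    raise TypeError("expected one character or a high+low surrogate pair")


print(hex(pair_aware_ord('\ud800\udc00')))
```

With this definition the pair u'\uD800\uDC00' maps to 0x10000, while a
reversed pair or any other length-two string is rejected.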
--Guido van Rossum (home page: http://www.python.org/~guido/)