[Python-ideas] Processing surrogates in

Stephen J. Turnbull stephen at xemacs.org
Tue May 5 12:46:41 CEST 2015


Andrew Barnert writes:

 > (I'm not sure if we actually have a UCS-2 codec, but if not, it's
 > trivial to write--it's just UTF-16 without surrogates.)

The PEP 393 machinery already knows when an astral (non-BMP) character
is introduced, because it has to widen the string's internal
representation to 4 bytes per character.  That might be a more
convenient place to raise an exception on non-BMP characters.
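
For illustration only, here is a rough pure-Python sketch of the
behaviour being discussed: a UCS-2 style encoder that is just UTF-16
without surrogate pairs and that raises on any non-BMP character.  The
name encode_ucs2 and the "ucs-2" label are hypothetical, and the
explicit loop only stands in for the check that the PEP 393 code could
make when it widens a string; it is not how CPython does it.

```python
def encode_ucs2(text):
    """Encode text as little-endian UCS-2 (hypothetical helper).

    UCS-2 is treated here as UTF-16 restricted to the BMP, so any
    code point above U+FFFF is rejected instead of being encoded
    as a surrogate pair.
    """
    for index, char in enumerate(text):
        if ord(char) > 0xFFFF:
            # Assumption: report the problem the way a codec would.
            raise UnicodeEncodeError(
                "ucs-2", text, index, index + 1, "non-BMP character"
            )
    # For BMP-only text, UTF-16 and UCS-2 produce the same bytes.
    return text.encode("utf-16-le")
```

With this sketch, encode_ucs2("abc") returns the same bytes as
"abc".encode("utf-16-le"), while a string containing something like
"\U0001F600" raises UnicodeEncodeError at that character's index.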


