[I18n-sig] Unicode surrogates: just say no!

Paul Prescod paulp@ActiveState.com
Wed, 27 Jun 2001 18:41:17 -0700

Guido van Rossum wrote:
> Yes, the longer I think about this the less I like it.  Unfortunately,
> the surrogate-creating behavior of \U is present in 2.0 and 2.1, so I
> think we can't reasonably remove this from narrow Python 2.2, and 

I'm having a hard time caring much about backwards compatibility here.
And I can't square it with your enthusiasm for ripping the guts out of
poor old xrange. <wink>

We're talking about a certain kind of *literal*, right? Even ASCII
literals are rare in my code. Unicode literals are extremely rare. Now
consider that we're talking about Unicode literals for characters so
obscure that they were passed over by the first three versions of
Unicode. And so new that most people don't even know that they are part
of Unicode.

Let's just put a deprecation warning in for \U where you've asked for a
character whose code point doesn't fit in your build's code unit. And if
there is a need, someone, somewhere will write a beautiful surrogates
library that takes care of all the details.
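
To give a feel for how small the core of such a library could be, here is
a minimal sketch of the UTF-16 surrogate arithmetic. The function names
to_surrogates and from_surrogates are made up for illustration and are
not part of any existing module; a real library would also need to deal
with slicing, indexing and lone surrogates.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %#x" % code_point)
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %#x" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %#x" % low)
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, to_surrogates(0x1D11E) gives (0xD834, 0xDD1E), and
from_surrogates(0xD834, 0xDD1E) gives back 0x1D11E.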

> Then sys.maxunicode should be the largest value that unichr() will
> accept.  This could be 0xffff (narrow Python), 0x10ffff (wide Python
> with strict unichr()), or 0xffffffffL (wide Python with liberal
> unichr()).  The latter is an open PEP issue.
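
A rough sketch of how code might branch on the three values listed in
the quote above. The function name build_kind and the labels it returns
are invented for illustration, and the "liberal" case is only the
proposal from the quoted text, not settled behavior.

```python
import sys


def build_kind(maxunicode=None):
    """Classify a build by the largest code point it claims to accept."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if maxunicode <= 0xFFFF:
        return "narrow"            # code units are 16 bits wide
    if maxunicode == 0x10FFFF:
        return "wide-strict"       # full Unicode range, nothing beyond it
    return "wide-liberal"          # hypothetical: accepts values past 0x10ffff
```

Calling build_kind() with no argument reports on the running
interpreter; passing a value lets you check the other cases.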
