[Python-Dev] Proposal: from __future__ import unicode_string_literals

Eric Smith eric+python-dev at trueblade.com
Fri Mar 21 19:06:23 CET 2008


Christian Heimes wrote:
> Eric Smith schrieb:
>> It's not implementable because the work has to occur in ast.c (see
>> Py_UnicodeFlag).  It can't occur later, because you need to skip the 
>> encoding being done in parsestr().  But the __future__ import can only 
>> be interpreted after the AST is built, at which time the encoding has 
>> already been applied.  There are some radical things you could do to 
>> work around this, but it would be a gigantic change.
> 
> So this basically comes down to "Either spend lots of time (and money)
> to rewrite the tokenizer and AST generator or keep the current behavior"? :/

Pretty much.  And even if it were possible, I don't see the point in 
doing it.

>> For this particular issue, just use u'' in 2.6 and let 2to3 deal with 
>> it.  If you have some 2.6 code that you want to run in 3.0 (by way of 
>> 2to3), I think all of your string literals should either be b'' or u''. 
>>   Don't use plain ''.
> 
> For this particular issue one could probably come up with a
> fast fixer quite easily. A simple regexp should cover 99% of all
> occurrences of u'' and u"".

2to3 already does this.
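
For illustration only, the regexp idea might look roughly like the sketch below. The name strip_u_prefix is made up for this example, and this is not how 2to3 does it: 2to3's fixer works on the parse tree rather than on raw text. A plain regexp will also rewrite a u' sequence that happens to sit inside another string or a comment, which is the remaining 1%.

```python
import re

# Hypothetical helper, not part of 2to3: drop a u/U string prefix that is
# not part of a longer identifier and is followed by an optional r/R and
# an opening quote.
_U_PREFIX = re.compile(r"(?<![A-Za-z0-9_])[uU](?=[rR]?['\"])")


def strip_u_prefix(source):
    """Return source with u'' / u"" prefixes removed (text-level only)."""
    return _U_PREFIX.sub("", source)


if __name__ == "__main__":
    print(strip_u_prefix("name = u'abc' + U\"def\""))
```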

My current thinking is that only b'' and u'' strings should be in 2.6 
code that you want to move to 3.0.  Maybe -3 should warn about regular 
string literals?
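
A minimal sketch of what that convention looks like in practice (the variable names are only illustrative): every literal says up front whether it is text or binary data, so nothing depends on what a plain '' means in a given version.

```python
# Text: unicode in 2.6; after 2to3 the u prefix is dropped and it is a 3.0 str.
text = u'caf\xe9'

# Binary data: b'' is just str in 2.6, and bytes in 3.0.
data = b'\x89PNG'

# Crossing between the two is always an explicit encode/decode step.
encoded = text.encode('utf-8')
decoded = encoded.decode('utf-8')
```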


