[Python-Dev] Proposal: from __future__ import unicode_string_literals

Brett Cannon brett at python.org
Fri Mar 21 21:54:52 CET 2008


On Fri, Mar 21, 2008 at 11:06 AM, Eric Smith
<eric+python-dev at trueblade.com> wrote:
> Christian Heimes wrote:
>  > Eric Smith wrote:
>  >> It's not implementable because the work has to occur in ast.c (see
>  >> Py_UnicodeFlag).  It can't occur later, because you need to skip the
>  >> encoding being done in parsestr().  But the __future__ import can only
>  >> be interpreted after the AST is built, at which time the encoding has
>  >> already been applied.  There are some radical things you could do to
>  >> work around this, but it would be a gigantic change.
>  >
>  > So this basically comes down to "Either spend lots of time (and money)
>  > to rewrite the tokenizer and AST generator or keep the current behavior"? :/
>
>  Pretty much.  And even if it were possible, I don't see the point in
>  doing it.
>
>
>  >> For this particular issue, just use u'' in 2.6 and let 2to3 deal with
>  >> it.  If you have some 2.6 code that you want to run in 3.0 (by way of
>  >> 2to3), I think all of your string literals should either be b'' or u''.
>  >>   Don't use plain ''.
>  >
>  > For this particular issue one could probably come up with a fast
>  > fixer quite easily. A simple regexp should cover 99% of all
>  > occurrences of u'' and u"".
>
>  2to3 already does this.
>
>  My current thinking is that only b'' and u'' strings should be in 2.6
>  code that you want to move to 3.0.  Maybe -3 should warn about regular
>  string literals?

That's a possibility. It might also help to have a 3to2 fixer that
goes through a module and adds the needed prefixes so one doesn't have
to go through manually to tack them on.
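
As a rough illustration of such a prefix-adding pass, here is a minimal
sketch. It is not an actual 2to3 or 3to2 fixer: it uses the standard
tokenize module, assumes it runs under a Python 3 interpreter whose
tokenizer accepts the input, and add_unicode_prefixes is a hypothetical
name. It also cannot tell which literals were meant to be bytes, so it
marks every unprefixed literal (docstrings included) as u'' and leaves
the b'' decisions to the author.

```python
import io
import tokenize


def add_unicode_prefixes(source):
    """Return source with 'u' added to every unprefixed string literal.

    Hypothetical helper for illustration only. Literals that already
    carry a prefix (u, b, r, ...) are left alone.
    """
    # Split lines the same way the tokenizer's readline will see them.
    lines = io.StringIO(source).readlines()

    # Collect the (row, col) start of each string literal that begins
    # directly with a quote character, i.e. has no prefix at all.
    inserts = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[0] in "'\"":
            inserts.append(tok.start)  # row is 1-based, col is 0-based

    # Insert right-to-left so earlier columns on a line stay valid.
    for row, col in sorted(inserts, reverse=True):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "u" + line[col:]

    return "".join(lines)
```

For example, add_unicode_prefixes("s = 'abc'\n") returns "s = u'abc'\n",
while u'' and b'' literals and quotes inside comments pass through
unchanged.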

-Brett

