On Fri, Mar 21, 2008 at 11:06 AM, Eric Smith
Christian Heimes wrote:
Eric Smith schrieb:
It's not implementable because the work has to occur in ast.c (see Py_UnicodeFlag). It can't occur later, because you need to skip the encoding being done in parsestr(). But the __future__ import can only be interpreted after the AST is built, at which time the encoding has already been applied. There are some radical things you could do to work around this, but it would be a gigantic change.
So this basically comes down to "Either spend lots of time (and money) to rewrite the tokenizer and AST generator or keep the current behavior"? :/
Pretty much. And even if it were possible, I don't see the point in doing it.
For this particular issue, just use u'' in 2.6 and let 2to3 deal with it. If you have some 2.6 code that you want to run in 3.0 (by way of 2to3), I think all of your string literals should either be b'' or u''. Don't use plain ''.
For this particular issue one could probably and easily come up with a fast fixer. A simple regexp should be cover 99% of all occurrences of u'' and u"".
2to3 already does this.
My current thinking is that only b'' and u'' strings should be in 2.6 code that you want to move to 3.0. Maybe -3 should warn about regular string literals?
That's a possibility. It might also help to have a 3to2 fixer that goes through a module and adds the needed prefixes so one doesn't have to go through manually to tack them on. -Brett