[Python-Dev] Raw string syntax inconsistency
Terry Reedy
tjreedy at udel.edu
Mon Jun 18 07:59:39 CEST 2012
On 6/17/2012 9:07 PM, Guido van Rossum wrote:
> On Sun, Jun 17, 2012 at 4:55 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> So, perhaps the answer is to leave this as is, and try to make 2to3
> smart enough to detect such escapes and replace them with their
> properly encoded (according to the source code encoding) Unicode
> equivalent?
>
>
> But the whole point of the reintroduction of u"..." is to support code
> that isn't run through 2to3.
People writing 2&3 code sometimes use 2to3 once (or a few times) on
their 2.6/7 version during development to find things they must pay
attention to. So Nick's idea could be helpful to people who do not want
to use 2to3 routinely either in development or deployment.
> Frankly, I don't care how it's done, but
> I'd say it's important not to silently have different behavior for the
> same notation in the two versions.
The fundamental problem was giving the 'u' prefix two different meanings
in 2.x: 'change the storage type from bytes to unicode', and 'change the
contents by partially cooking the literal even when raw processing is
requested'*. The only way to silently have the same behavior is to
re-introduce the second meaning of partial cooking. (But I would rather
make it unnecessary.) But that would freeze the 'u' prefix, or at least
'ur' ('un-raw') forever. It would be better to introduce a new, separate
'p' prefix, to mean partially raw, partially cooked. (But I am opposed to
*I think this non-orthogonal interaction effect was a design mistake and
that it would have been better to have the re module do all the cooking needed by
also interpreting \u and \U sequences. I also think we should add this
now for 3.3 if possible, to make partial cooking at the parsing stage
unnecessary. Putting the processing in re makes it work for all strings,
not just those given as literals.
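For what it is worth, the re module in Python 3.3 and later does recognize \u and \U escapes in patterns. A minimal sketch of the footnote's point, assuming Python 3.3+: because the regex engine rather than the parser expands the escape, the same pattern text works whether it is written as a raw literal or assembled at runtime.

```python
import re

# Raw literal: the parser leaves the six characters \u03b3 alone, and the
# re module (Python 3.3+) expands the escape to U+03B3 GREEK SMALL LETTER GAMMA.
literal_pattern = re.compile(r"\u03b3+")

# Pattern text assembled at runtime (it could equally come from a file or
# user input): no literal cooking happens, yet re still honours the escape.
runtime_text = "\\" + "u03b3" + "+"
runtime_pattern = re.compile(runtime_text)

sample = "alpha \u03b3\u03b3 beta"
found_literal = literal_pattern.search(sample).group()
found_runtime = runtime_pattern.search(sample).group()
print(found_literal == found_runtime == "\u03b3\u03b3")  # True
```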
> If that means we have to add an extra
> step to the compiler to reject r"\u03b3", so be it.
I do not get this. Surely you cannot mean to suddenly start rejecting,
in 3.3, a large set of perfectly legal and sensible 6- and 10-character
sequences when embedded in literals?
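To make the sequences concrete: in Python 3 a raw literal keeps such an escape as ordinary characters, so r"\u03b3" is 6 characters long and r"\U000003b3" is 10, while the cooked forms both denote the single character U+03B3. A quick check, valid in any Python 3:

```python
raw_short = r"\u03b3"        # backslash + 'u' + four hex digits
raw_long = r"\U000003b3"     # backslash + 'U' + eight hex digits
cooked = "\u03b3"            # the parser cooks this to one character

print(len(raw_short), len(raw_long), len(cooked))  # 6 10 1
print(raw_short.startswith("\\"))                  # True
print(cooked == "\U000003b3")                      # True
```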
> Hm. I still encounter enough environments that don't know how to display
> such characters that I would prefer to have a rock solid \u escape
> mechanism. I can think of two ways to support "expanded" unicode
> characters in raw strings a la Python 2;
> (a) let the re module interpret the escapes (like it does for \r and \n);
As said above, I favor this. The 2.x partial cooking (with the 'ur' prefix)
was primarily a substitute for this.
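For comparison, here is a rough emulation of the 2.x partial cooking in Python 3. partially_cook is a hypothetical helper, not an existing API; it assumes Latin-1-encodable input and relies on the standard raw_unicode_escape codec, whose decoder expands only \uXXXX and \UXXXXXXXX and copies other backslash sequences through unchanged.

```python
def partially_cook(raw: str) -> str:
    r"""Hypothetical helper mimicking a Python 2 ur"..." literal.

    Expands \uXXXX and \UXXXXXXXX escapes and leaves every other backslash
    sequence (\d, \n, \\ and so on) untouched. Assumes `raw` contains only
    Latin-1 characters, because it round-trips through bytes.
    """
    # raw_unicode_escape: decoding treats only \u and \U as escapes.
    return raw.encode("latin-1").decode("raw_unicode_escape")


pattern_text = partially_cook(r"\d+\u03b3")
print(len(pattern_text))             # 4
print(pattern_text == "\\d+\u03b3")  # True
```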
> (b) the user can write r"someblah" "\u03b3" r"moreblah".
This is somewhat orthogonal to (a). Users can do this whenever they want
partial processing of backslashes without doubling those they want left
as is. A generic example is r'someraw' 'somecooked' r'moreraw'
'morecooked'.
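Option (b) needs no new machinery: adjacent literals are concatenated at compile time, and raw and cooked pieces may be mixed freely. A short illustration (the pattern is an arbitrary example, not taken from the thread):

```python
import re

# Adjacent literals are joined at compile time: the raw pieces keep their
# backslashes for the regex engine, while the cooked middle piece lets the
# parser turn \u03b3 into the actual character.
pattern = re.compile(r"\d+\s*" "\u03b3")

match = pattern.search("mass 42 \u03b3 units")
print(pattern.pattern == "\\d+\\s*\u03b3")  # True
print(match.group() == "42 \u03b3")         # True
```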
--
Terry Jan Reedy