[Python-Dev] Raw string syntax inconsistency

Sun Jun 17 22:11:27 CEST 2012

On Mon, Jun 18, 2012 at 3:54 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> The premise of the discussion of adding 'u', and of Guido's acceptance, was
> that "it's about as harmless as they come". I do not remember any discussion
> of 'ur' and what it really means in 2.x, and that supporting it meant adding
> back 2.x's interaction effect. Indeed, Nick's version goes on to say "This
> PEP was originally written by Armin Ronacher, and Guido's approval was given
> based on that version." Armin's original version (and subsequent edit) only
> proposed adding 'u' (and 'U') and made no mention of 'ur'. Nick's seemingly
> innocuous addition of also adding 'ur' came after Guido's approval, and as
> discovered, is not so innocuous.

Right, that matches my recollection as well - we (or at least I) thought
mapping "ur" to the Python 3 "r" prefix was sufficient, but it turns
out doing so means there are some 2.x string literals that will
silently behave differently in 3.x.
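
As a rough illustration of the difference (a sketch to be run under
Python 3 only; the euro sign and the variable names are arbitrary choices
for the example, not anything taken from the PEP): in 2.x, ur"\u20ac" is
a single character, whereas mapping it straight to the 3.x "r" prefix
yields six characters.

```python
# Python 3 only. Under Python 2, ur"\u20ac" produced the single
# character U+20AC, because \u and \U escapes were still processed
# inside raw Unicode literals.
raw = r"\u20ac"      # Python 3 raw string: no escape processing at all
cooked = "\u20ac"    # ordinary literal: the \u escape is processed

print(len(raw))      # backslash, 'u', '2', '0', 'a', 'c'
print(len(cooked))   # a single code point

# One forward-compatible spelling of the 2.x literal ur"\d+\u20ac":
# an ordinary literal with the remaining backslashes doubled.
pattern = "\\d+\u20ac"
print(len(pattern))
```

The last literal shows the kind of manual conversion that affected 2.x
code would need: keep the "\u" escape, and escape every other backslash
by hand.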

Martin's right that that part of the PEP should definitely be amended
(along with the relevant section in What's New).

> I do not think he needs to discuss adding and deleting support, but merely
> state that 'ur' support is not added because 'ur' has a special meaning that
> would require changing literal handling. The sentence about supporting 'ur'
> could be negated and moved after the sentence about not changing Unicode
> handling. A possibility:
>
> "Combination of the unicode prefix with the raw string prefix will not be
> supported because in Python 2, the combination 'ur' has a special meaning
> that would require changing the handling of unicode literals"

In addition to changing the proposal section to only cover "u" and
"U", I'll actually add a new subsection along the lines of the
following:

Exclusion of Raw Unicode Strings
--------------------------------

Python 2.x includes a concept of "raw Unicode" strings. These are
partially raw string literals that still support the "\u" and "\U"
escape codes for Unicode character entry, but otherwise treat "\" as a
literal backslash character. As 3.x has no such concept of a partially
raw string literal, explicit raw Unicode literals are still not
supported. Such literals in Python 2 code will need to be converted to
ordinary Unicode literals for forward compatibility with Python 3.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia