[Python-Dev] Raw string syntax inconsistency

Mon Jun 18 03:13:31 CEST 2012

On 18/06/2012 00:55, Nick Coghlan wrote:
> On Mon, Jun 18, 2012 at 6:41 AM, Guido van Rossum <guido at python.org> wrote:
>> Would it make sense to detect and reject these in 3.3 if the 2.7 syntax is
>> used?
>
> Possibly - I'm trying not to actually *change* any of the internals of
> the string literal processing, though. (If I recall the way we
> implemented the change correctly, by the time we get to processing the
> string contents, we've forgotten which specific prefix was used.)
>
> However, this question did remind me of another detail I wanted to
> check after realising this discrepancy existed: it turns out this
> semantic inconsistency already arises if you use "from __future__
> import unicode_literals" to get supposedly "Python 3 style" string
> literals in 2.x:
>
> Python 2.7.3 (default, May 29 2012, 14:54:22)
> >>> from __future__ import unicode_literals
> >>> print(r"\u03b3")
> γ
> >>> print("\u03b3")
> γ
>
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
> >>> print(r"\u03b3")
> \u03b3
> >>> print("\u03b3")
> γ
>
> So, perhaps the answer is to leave this as is, and try to make 2to3
> smart enough to detect such escapes and replace them with their
> properly encoded (according to the source code encoding) Unicode
> equivalent?

What if that character can't be encoded in the source encoding? I
suppose the literal could be expanded into a string expression so that
a non-raw string literal carries the escape, possibly using implicit
concatenation, parenthesised if necessary (or always?).
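
Something like this, as a rough sketch in 3.x syntax. It assumes the
2.x literal was ur"\u03b3\n", where 2.x decodes the \u escape but keeps
the backslash-n as two characters. The split point and the names are
only for illustration; nothing in 2to3 does this today.

```python
# What ur"\u03b3\n" means in Python 2: gamma, then a literal backslash and "n".
expected = "\u03b3" + "\\" + "n"

# Possible Python 3 rewrite: a non-raw piece carries the escape, a raw piece
# carries the rest, joined by implicit concatenation and parenthesised.
converted = ("\u03b3" r"\n")

# Left as a single raw literal, 3.x keeps the escape as six literal characters.
unconverted = r"\u03b3\n"

print(converted == expected)
print(len(converted), len(unconverted))
```

The converted source stays pure ASCII, so it doesn't matter whether the
source encoding can represent the character.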

> After all, that's already the way to include such characters in a
> forward compatible way when using the future import:
>
> Python 2.7.3 (default, May 29 2012, 14:54:22)
> >>> from __future__ import unicode_literals
> >>> print("γ")
> γ
> >>> print(r"γ\n")
> γ\n
>
> Python 3.2.1 (default, Jul 11 2011, 18:54:42)
> >>> print("γ")
> γ
> >>> print(r"γ\n")
> γ\n
>
> So, rather than going ahead with reverting "ur" support as I first
> suggested (since it turns out that's not a *new* problem, but just a
> different way of spelling an *existing* problem), how about I do the
> following:
>
> 1. Add a note to PEP 414 and the Py3k porting guide regarding the
> discrepancy in escaping semantics for raw Unicode strings between 2.x
> and 3.x
> 2. Reject the tracker issue for reverting the ur support (the semantic
> problem already exists, and any solution we come up with for
> __future__.unicode_literals should handle the ur prefix as well)
> 3. Create a new feature request for 2to3 to see if it can
> automatically handle the problem of translating "\u" and "\U" escapes
> into properly encoded Unicode characters
>
> The scope of the problem is really quite small: you have to be using a
> raw Unicode string in 2.x (either via the string prefix, or the future
> import) *and* using a "\u" or "\U" escape within that string.
>
[snip]
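
To see how small that scope is, a check along these lines could flag the
affected literals. It is only a sketch: needs_attention and its
unicode_literals flag are hypothetical names, it looks at the source
text of one literal at a time, and it relies on the 2.x rule that a \u
or \U escape in a raw Unicode string is only decoded after an odd
number of backslashes.

```python
import re

# "ur" in any case, or a bare "r" (raw Unicode only with the future import).
RAW_PREFIX = re.compile(r"^[uU]?[rR](?=[\"'])")

# An odd run of backslashes followed by uXXXX or UXXXXXXXX.
ODD_BACKSLASH_ESCAPE = re.compile(
    r"(?<!\\)(?:\\\\)*\\(?:u[0-9a-fA-F]{4}|U[0-9a-fA-F]{8})"
)


def needs_attention(literal, unicode_literals=False):
    # Hypothetical helper: True if this 2.x literal is a raw Unicode string
    # whose escapes 2.x would decode but 3.x would leave alone.
    match = RAW_PREFIX.match(literal)
    if match is None:
        return False
    if match.group(0).lower() == "r" and not unicode_literals:
        return False  # plain r"..." is a byte string without the import
    return ODD_BACKSLASH_ESCAPE.search(literal[match.end():]) is not None


print(needs_attention(r'ur"\u03b3"'))                        # True
print(needs_attention(r'r"\u03b3"'))                         # False
print(needs_attention(r'r"\u03b3"', unicode_literals=True))  # True
print(needs_attention(r'ur"\\u03b3"'))                       # False
```

Anything it flags would still need a person, or 2to3, to decide how to
rewrite it.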