Re: [Python-Dev] \u and \U escapes in raw unicode string literals

13 May 2007

...
* without the Unicode escapes, the only way to put non-ASCII
  code points into a raw Unicode string is via a source code encoding
  of say UTF-8 or UTF-16, pretty much defeating the original
  requirement of writing ASCII code only

That's no problem, though - just don't put the Unicode character
into a raw string. Use plain strings if you have a need to include
Unicode characters, and are not willing to leave ASCII.

For Python 3, the default source encoding is UTF-8, so it is
much easier to use non-ASCII characters in the source code.
The original requirement may not be as strong anymore as it
used to be.
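
To illustrate the difference, here is a minimal sketch. It assumes Python 3
semantics in which raw string literals do not interpret \u at all; the
variable names are made up for the example.

```python
# Plain string literal: \u20ac is interpreted as one code point (EURO SIGN).
plain = "\u20ac"

# Raw string literal: the same six characters are kept literally
# (backslash, 'u', '2', '0', 'a', 'c').
raw = r"\u20ac"

# With a UTF-8 source encoding, the character can simply be typed into
# a raw string, at the price of leaving ASCII in the source file.
raw_with_literal = r"€\d"

print(len(plain))             # 1
print(len(raw))               # 6
print(len(raw_with_literal))  # 3: '€', backslash, 'd'
```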
...
* non-ASCII code points in text are not uncommon, they occur
  in most European scripts, all Asian scripts,
  many scientific texts and also in texts meant for the web
  (just have a look at the HTML entities, or think of Word
  exports using quotes)

And you are seriously telling me that people who commonly
use non-ASCII code points in their source code are willing
to refer to them by Unicode ordinal number (which, of course,
they all know by heart, from 1 to 65536)?
...
* adding Unicode escapes to the re module will break code
  already using "...\u..." in the regular expressions for
  other purposes; writing conversion tools that detect this
  usage is going to be hard

It's unlikely to occur in code today - \u just means the same
as u (so \u1234 matches u1234); if you want a backslash
followed by u in your regular expression, you should write
\\u.

It would be possible to future-warn about \u in 2.6, catching
these cases. Authors then would either have to remove the
backslash, or duplicate it, depending on what they want to
express.
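
The two choices could look like the sketch below. It assumes a Python
version in which the re module itself understands \uXXXX (to my knowledge
3.3 and later, which postdates this message); the sample text and variable
names are invented for the example.

```python
import re

# Sample text containing a literal backslash-u sequence and a bare "u1234".
text = r"path\u1234 and u1234"

# Backslash duplicated: match a literal backslash followed by "u1234".
literal_hits = re.findall(r"\\u1234", text)

# Backslash removed: match the plain characters "u1234" wherever they occur,
# including the ones that follow the literal backslash.
plain_hits = re.findall(r"u1234", text)

# Single backslash: treated by re as an escape for the code point U+1234
# (assumption: Python 3.3+ behaviour).
escape_hit = re.fullmatch(r"\u1234", "\u1234")

print(len(literal_hits))       # 1
print(len(plain_hits))         # 2
print(escape_hit is not None)  # True
```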

Regards,
Martin

"Martin v. Löwis"