[Python-ideas] Support Unicode code point notation

MRAB python at mrabarnett.plus.com
Fri Aug 2 03:46:40 CEST 2013


On 02/08/2013 02:08, Alexander Belopolsky wrote:
>
> On Sat, Jul 27, 2013 at 6:01 AM, Steven D'Aprano <steve at pearwood.info>
> wrote:
>
>     Why do we need yet another way of writing escape sequences?
>     -----------------------------------------------------------
>
>     We don't need another one, we need a better one. U+xxxx is the
>     standard Unicode notation, while existing Python escapes have
>     various problems.
>
>
> The current situation with \u and \U escapes can hardly qualify as an
> obvious way to do it.  There is nothing obvious about either the \u
> limit of four digits or the \U requirement of eight.  (I remember
> discovering that after first trying something like \u1FFFF, then
> \U1FFFF, and then checking the reference manual to discover \U0001FFFF.
> I don't think my experience was unique.)
>
> I have a counter-proposal that may improve the situation: allow 4, 5, 6
> or 8 hex digits after \U, optionally surrounded by braces. When used
> without braces, the maximal munch rule applies: the escape sequence ends
> at the first non-hex-digit.  I would allow only upper-case A-F in 4- to
> 6-digit escapes to minimize the need for braces.
>
Perl has \x{...}.

Ruby has \u{...}.

Python would have \U{...}.

We could follow Perl or Ruby, or both of them, or even allow braces
with any of the hex escapes.
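
To make the brace form concrete, here is a rough sketch of what \U{...}
could mean. This is not existing Python syntax; the helper name
expand_braced_escapes and the choice of 1 to 6 hex digits inside the
braces are assumptions made only for illustration, and the input has to
be a raw string because today's compiler rejects \U{ in a normal literal.

```python
import re

# Hypothetical pattern: \U{...} with 1 to 6 hex digits inside the braces.
# Python itself does not accept this syntax in string literals.
_BRACED_ESCAPE = re.compile(r"\\U\{([0-9A-Fa-f]{1,6})\}")


def expand_braced_escapes(text):
    """Replace each \\U{XXXX} in text with the code point it names."""
    def replace(match):
        code_point = int(match.group(1), 16)
        if code_point > 0x10FFFF:
            # Six hex digits can exceed the Unicode range, so check it.
            raise ValueError("code point out of range: U+%X" % code_point)
        return chr(code_point)
    return _BRACED_ESCAPE.sub(replace, text)


# The braced form needs no zero padding, unlike the existing \U0001FFFF.
assert expand_braced_escapes(r"\U{1FFFF}") == "\U0001FFFF"
# The closing brace ends the escape, so following hex-like text is safe.
assert expand_braced_escapes(r"\U{E9}cafe") == "\u00e9cafe"
```

The closing brace is what removes the ambiguity: without it, some rule
such as fixed width or maximal munch has to decide where the digits end.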


