[Python-ideas] Support Unicode code point notation

Sun Jul 28 11:02:26 CEST 2013

On 28 July 2013 18:21, Steven D'Aprano <steve at pearwood.info> wrote:
> On 28/07/13 17:41, Stephen J. Turnbull wrote:
>>
>>   > (Sorry, I have forgotten who made that suggestion originally.) That
>>   > could be extended to allow multiple space-separated code points:
>>   >
>>   > \N{U+xxxx U+yyyy U+zzzzz}
>>   >
>>   > or
>>   >
>>   > \N{U+xxxx yyyy zzzzz}
>>
>> This is a modal encoding, which has proved to be a really bad idea in
>> its past incarnations.  I hope that extension is never added to
>> Python.
>
>
> Could you elaborate please? What do you mean "modal encoding", and what past
> incarnations are you referring to?

I believe what Stephen means is that it changes the \N{} notation from
a relatively straightforward key lookup (where everything inside the
"{}" refers to a single code point) to a two-level parser, where the
contents of the "{}" need to be further parsed to see if they refer to
one code point or many. It doesn't bother me that much personally,
especially if it were a general comma-delimited capability that also
worked with other code point names, but my inclination is to call
YAGNI on the additional complexity.
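
As a rough sketch of the difference (the helper names lookup_single
and lookup_multi are hypothetical, and the space-separated "U+xxxx"
form is only the proposal under discussion, not syntax that Python
accepts), the first function treats the braced text as one key, while
the second has to parse it a second time:

```python
import unicodedata


def lookup_single(name):
    # One-level lookup: the whole braced text is a single key.
    return unicodedata.lookup(name)


def lookup_multi(contents):
    # Hypothetical two-level parse: split the braced text into tokens,
    # then interpret each token as a hexadecimal code point, with or
    # without a leading "U+".
    chars = []
    for token in contents.split():
        digits = token[2:] if token.upper().startswith("U+") else token
        chars.append(chr(int(digits, 16)))
    return "".join(chars)
```

The extra loop and the per-token prefix check are the additional
complexity in question.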

Using "modal encoding" to refer to that change isn't really valid
though - Python string syntax is already modal, since "\N{" switches
modes to "any characters until the next '}' are part of a code point
name rather than part of the string contents", and similar statements
can be made about the other escape sequences (especially the other
Unicode-related ones).
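
A small illustration of that point, using only escapes that Python 3
already supports (the dictionary name and its keys are just labels for
this example): each prefix switches the tokenizer into a different
mode for the characters that follow it, yet all four spell the same
one-character string.

```python
# Each escape prefix changes how the following characters are read:
# "\N{" reads a name up to "}", "\x" reads 2 hex digits, "\u" reads 4,
# and "\U" reads 8.
samples = {
    "named": "\N{LATIN CAPITAL LETTER A}",
    "hex2": "\x41",
    "hex4": "\u0041",
    "hex8": "\U00000041",
}

# All four forms produce the single character "A".
assert all(value == "A" for value in samples.values())
```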

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia