[Python-ideas] Support Unicode code point notation
Nick Coghlan
ncoghlan at gmail.com
Sun Jul 28 11:47:08 CEST 2013
On 28 July 2013 19:05, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Steven D'Aprano writes:
> > On 28/07/13 17:41, Stephen J. Turnbull wrote:
> > > > (Sorry, I have forgotten who made that suggestion originally.) That
> > > > could be extended to allow multiple space-separated code points:
> > > >
> > > > \N{U+xxxx U+yyyy U+zzzzz}
> > > >
> > > > or
> > > >
> > > > \N{U+xxxx yyyy zzzzz}
> > >
> > > This is a modal encoding, which has proved to be a really bad idea in
> > > its past incarnations. I hope that extension is never added to
> > > Python.
> >
> > Could you elaborate please? What do you mean "modal encoding", and
> > what past incarnations are you referring to?
>
> A "modal encoding" is one in which the same combination of code units
> (here, ASCII characters) is interpreted differently depending on
> arbitrarily distant context.
Ah, I had missed the "arbitrarily distant" sense you intended for
modal encoding. Agreed, the fact that Unicode escapes (including \N{})
are each limited to a single code point is a definite win in that
regard.
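To make the "single code point" property concrete, here is a minimal sketch, assuming Python 3. The variable name `escapes` is purely illustrative. Each escape form is self-delimiting and yields exactly one code point, so nothing outside the escape changes how it is read:

```python
import unicodedata

# Four spellings of the same character; each escape stands for one code point.
escapes = [
    "\xe9",                                 # two hex digits
    "\u00e9",                               # four hex digits
    "\U000000e9",                           # eight hex digits
    "\N{LATIN SMALL LETTER E WITH ACUTE}",  # Unicode character name
]

for s in escapes:
    # Every literal is a one-character string, regardless of escape form.
    assert len(s) == 1
    print(f"U+{ord(s):04X}", unicodedata.name(s))
```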
Cheers,
Nick.
P.S. It occurs to me that the str.format mini-language has no such
limitation, though:
>>> def hexchr(x):
...     return chr(int(x, 16))
...
>>> def hex2str(s):
...     return "".join(hexchr(x) for x in s.split())
...
>>> class chrformat:
...     def __format__(self, fmt):
...         return hex2str(fmt)
...
>>> "{:40 60 1234 e9}".format(chrformat())
'@`ሴé'
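The same idea as a standalone script, as a rough sketch assuming Python 3. The class name `HexChars` is hypothetical. It also shows why this works: the text after the colon is passed verbatim to the object's `__format__`, so its meaning depends entirely on the object being formatted. A built-in type such as `int` rejects the same spec:

```python
class HexChars:
    """Hypothetical helper: reads the format spec as space-separated hex code points."""

    def __format__(self, spec):
        return "".join(chr(int(token, 16)) for token in spec.split())


spec = "40 60 1234 e9"

# format() and str.format() both hand the spec to __format__ unchanged.
text = format(HexChars(), spec)
assert text == "{:40 60 1234 e9}".format(HexChars())
print([f"U+{ord(c):04X}" for c in text])

# The identical spec is meaningless to int, which raises ValueError.
try:
    format(1, spec)
except ValueError as exc:
    int_error = exc
    print("int rejects the spec:", exc)
```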
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia