[Python-ideas] Support Unicode code point notation

Stephen J. Turnbull stephen at xemacs.org
Sun Jul 28 09:41:45 CEST 2013


Steven D'Aprano writes:

 > if you're writing Ethiopian text, I don't think it is actually very
 > likely that you will want to follow ETHIOPIC SYLLABLE SEE by a
 > Latin digit 5 with no separator between them. Yes, it "might"
 > happen,

If you're writing Ethiopic text, I doubt you'll be using escape
sequences to denote Ethiopic characters in the first place.  I think
it's hard to predict how these sequences are going to be used in the
future.  What I would worry about it not whether writers would "want"
to use such sequences, but whether they'll bother to clean them up if
they occur in the first place.  The writer knows what she wants; it's
the reader who has to parse the resulting mess.

 > (Sorry, I have forgotten who made that suggestion originally.) That
 > could be extended to allow multiple space-separated code points:
 > 
 > \N{U+xxxx U+yyyy U+zzzzz}
 > 
 > or
 > 
 > \N{U+xxxx yyyy zzzzz}

This is a modal encoding, which has proved to be a really bad idea in
its past incarnations.  I hope that extension is never added to
Python.



More information about the Python-ideas mailing list