[Python-Dev] Unicode character names
M.-A. Lemburg
mal@lemburg.com
Thu, 23 Mar 2000 22:07:35 +0100
"Andrew M. Kuchling" wrote:
>
> Paul Prescod writes:
> >The new \N escape interpolates named characters within strings. For
> >example, "Hi! \N{WHITE SMILING FACE}" evaluates to a string with a
> >unicode smiley face at the end.
>
> Cute idea, and it certainly means you can avoid looking up Unicode
> numbers. (You can look up names instead. :) ) Note that this means the
> Unicode database is no longer optional if this is done; it has to be
> around at code-parsing time. Python could import it automatically, as
> exceptions.py is imported. Christian's work on compressing
> unicodedatabase.c is therefore really important. (Is Perl5.6 actually
> dragging around the Unicode database in the binary, or is it read out
> of some external file or data structure?)
Sorry to disappoint you guys, but the Unicode names and comments
are *not* included in the unicodedatabase.c file Christian
is currently working on. The reason is simple: it would add
huge amounts of string data to the file. So this is a no-no
for the core distribution...
Still, the above could easily be done by inventing a new
encoding, say unicode-with-smileys, which reads in
a file containing the Unicode names and applies the necessary
magic to decode/encode data as Paul described above.
It would probably make a cool fun project for someone who wants
to dive into writing codecs.
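A codec along these lines might look roughly like the following minimal sketch. It is written against the modern Python 3 `codecs` API, not the codec machinery being designed in this thread. It assumes the names file uses the `UnicodeData.txt` line format (`HEX;NAME;...`). The helpers `load_names`, `make_codec` and `register` are hypothetical, and so is the exact escaping policy. Decoding expands `\N{NAME}` escapes found in UTF-8 input. Encoding turns named non-ASCII characters back into escapes.

```python
import codecs
import re

# Codec name taken from the post; all helpers below are hypothetical.
CODEC_NAME = "unicode_with_smileys"

# Matches escapes of the form \N{SOME CHARACTER NAME}.
ESCAPE = re.compile(r"\\N\{([^}]*)\}")


def load_names(lines):
    """Build a NAME -> character table from lines shaped like UnicodeData.txt
    ("263A;WHITE SMILING FACE;So;..."). The file format is an assumption."""
    table = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or not fields[1] or fields[1].startswith("<"):
            continue  # skip blank lines and placeholders such as <control>
        table[fields[1]] = chr(int(fields[0], 16))
    return table


def make_codec(names):
    """Return a CodecInfo that expands \\N{NAME} escapes on decode and
    turns named non-ASCII characters back into escapes on encode."""
    chars = {ch: name for name, ch in names.items()}

    def decode(data, errors="strict"):
        text = bytes(data).decode("utf-8", errors)

        def expand(match):
            name = match.group(1)
            if name in names:
                return names[name]
            if errors == "strict":
                raise UnicodeError("unknown character name: %r" % name)
            return match.group(0)  # leave unknown escapes as written

        return ESCAPE.sub(expand, text), len(data)

    def encode(text, errors="strict"):
        out = []
        for ch in text:
            if ord(ch) > 127 and ch in chars:
                out.append("\\N{%s}" % chars[ch])
            else:
                out.append(ch)
        return "".join(out).encode("utf-8", errors), len(text)

    return codecs.CodecInfo(encode, decode, name=CODEC_NAME)


def register(names):
    """Let codecs.lookup("unicode-with-smileys") find this codec."""
    info = make_codec(names)

    def search(encoding):
        # Newer Pythons turn hyphens into underscores before calling us.
        return info if encoding.replace("-", "_") == CODEC_NAME else None

    codecs.register(search)


if __name__ == "__main__":
    # One line in UnicodeData.txt format; a real setup would pass open(path).
    register(load_names(["263A;WHITE SMILING FACE;So;0;ON;;;;;N;;;;;"]))
    text = b"Hi! \\N{WHITE SMILING FACE}".decode("unicode-with-smileys")
    print(ascii(text))  # 'Hi! \u263a'
    print("Hi! \u263a".encode("unicode-with-smileys"))  # b'Hi! \\N{WHITE SMILING FACE}'
```

Keeping the names table in a separate file that is only loaded when someone asks for this encoding is what keeps the string data out of unicodedatabase.c. The sketch does not handle input that already contains a literal `\N{...}` which should survive a round trip. A real codec would need an escaping rule for that.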
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/