[Python-Dev] Unicode character names

Andy Robinson andy@reportlab.com
Fri, 24 Mar 2000 10:14:44 GMT


On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote:

>Sorry to disappoint you guys, but the Unicode name and comments
>are *not* included in the unicodedatabase.c file Christian
>is currently working on. The reason is simple: it would add
>huge amounts of string data to the file. So this is a no-no
>for the core distribution...


You're right about what is compiled into the core.  I have to keep
reminding myself to distinguish three places functionality can live:

1. What is compiled into the Python core
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python,
to provide extra functionality for (e.g.) Asian encodings.

The Unicode database, the mapping tables and the other files at
unicode.org are clearly a great resource, but they could easily be
placed in (2) or (3), along with scripts to unpack them.  It probably
makes sense for the i18n-sig to kick off a separate 'CodecKit' project
for now, and we can see what good emerges from it before deciding what
should go into the library.
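
To give a feel for what an "unpacking" script might involve, here is a
minimal sketch in Python 3.  The sample data is a hypothetical excerpt
written in the style of a unicode.org mapping table (byte value, Unicode
code point, comment), and the names parse_mapping and decode_bytes are
made up for illustration; nothing here is an actual CodecKit design.

```python
# Hypothetical excerpt in the style of a unicode.org mapping table:
# column 1 is the byte value, column 2 the Unicode code point,
# and everything after '#' is a comment.
SAMPLE_MAPPING = """\
# byte\tunicode\tname
0x41\t0x0041\t# LATIN CAPITAL LETTER A
0x80\t0x20AC\t# EURO SIGN
0x81\t      \t# UNDEFINED
"""


def parse_mapping(text):
    """Return a dict mapping byte values to one-character strings."""
    table = {}
    for line in text.splitlines():
        # Drop the trailing comment, then split on whitespace.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 2:
            continue  # comment-only, blank, or undefined entry
        table[int(fields[0], 16)] = chr(int(fields[1], 16))
    return table


def decode_bytes(data, table):
    """Decode a bytes object one byte at a time using the table."""
    return "".join(table[byte] for byte in data)


if __name__ == "__main__":
    table = parse_mapping(SAMPLE_MAPPING)
    print(ascii(decode_bytes(b"A\x80", table)))
```

A real script would read the table from a downloaded file and would need
a policy for bytes that have no entry; this sketch simply raises
KeyError for them.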

>Still, the above is easily possible by inventing a new
>encoding, say unicode-with-smileys, which then reads in
>a file containing the Unicode names and applies the necessary
>magic to decode/encode data as Paul described above.
>Would probably make a cool fun-project for someone who wants
>to dive into writing codecs.

Yup.  Prime candidate for CodecKit.
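
For the curious, a name-based encoding along those lines could be
sketched roughly as below, in Python 3.  The table is a hypothetical
excerpt in the semicolon-separated style of the Unicode database,
trimmed to the first two fields (code point and name), and the function
names and the \N{...} escape syntax are assumptions chosen for the
example.  It is not registered as a real codec.

```python
import re

# Hypothetical excerpt, trimmed to "code point;character name".
SAMPLE_TABLE = """\
00E9;LATIN SMALL LETTER E WITH ACUTE
20AC;EURO SIGN
263A;WHITE SMILING FACE
"""


def load_names(text):
    """Build char->name and name->char maps from 'hex;name' lines."""
    by_char, by_name = {}, {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        char = chr(int(fields[0], 16))
        by_char[char] = fields[1]
        by_name[fields[1]] = char
    return by_char, by_name


BY_CHAR, BY_NAME = load_names(SAMPLE_TABLE)
_ESCAPE = re.compile(r"\\N\{([^}]+)\}")


def encode_names(text):
    """Encode to ASCII bytes, spelling non-ASCII characters by name."""
    out = []
    for char in text:
        if ord(char) < 128:
            out.append(char)
        elif char in BY_CHAR:
            out.append("\\N{%s}" % BY_CHAR[char])
        else:
            raise ValueError("no name known for U+%04X" % ord(char))
    return "".join(out).encode("ascii")


def decode_names(data):
    """Reverse encode_names; unknown names raise KeyError."""
    text = data.decode("ascii")
    return _ESCAPE.sub(lambda match: BY_NAME[match.group(1)], text)


if __name__ == "__main__":
    encoded = encode_names("caf\u00e9 \u263a")
    print(encoded)
    print(ascii(decode_names(encoded)))
```

Note that this sketch does not escape a literal "\N{" already present in
the input, so such text would not round-trip; a real codec would need a
rule for that.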


- Andy