On Thu, 23 Mar 2000 21:49:13 -0500 (EST), you wrote:
> Sorry to disappoint you guys, but the Unicode name and comments are *not* included in the unicodedatabase.c file Christian is currently working on. The reason is simple: it would add huge amounts of string data to the file. So this is a no-no for the core distribution...
You're right about what is compiled into the core. I have to keep reminding myself to distinguish three places where functionality can live:

1. What is compiled into the Python core.
2. What is in the standard Python library relating to encodings.
3. Completely separate add-on packages, maintained outside of Python, to provide extra functionality for (e.g.) Asian encodings.

It is clear that both the Unicode database and the mapping tables and other files at unicode.org are a great resource; but they could easily be placed in (2) or (3), along with scripts to unpack them. It probably makes sense for the i18n-sig to kick off a separate 'CodecKit' project for now, and we can see what good emerges from it before thinking about what should go into the library.
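For illustration, an unpacking script of that kind might look something like the minimal Python sketch below. It is not part of any existing package, and the function name `parse_unicode_data` is hypothetical. It assumes the semicolon-separated layout of unicode.org's UnicodeData.txt, with the code point in hex in the first field and the character name in the second. Entries whose name is in angle brackets (such as `<control>` or the range markers for CJK ideographs) carry no usable name and are skipped.

```python
def parse_unicode_data(lines):
    """Build a {code point: name} table from UnicodeData.txt-style lines.

    Hypothetical helper: only fields 0 (hex code point) and 1 (name) are used.
    """
    names = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        fields = line.split(";")
        code, name = fields[0], fields[1]
        if name.startswith("<"):
            continue  # "<control>", "<CJK Ideograph, First>", etc.: no real name
        names[int(code, 16)] = name
    return names


# A few sample lines in the UnicodeData.txt format.
sample = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "263A;WHITE SMILING FACE;So;0;ON;;;;;N;;;;;",
]
names = parse_unicode_data(sample)
print(names[0x263A])  # prints WHITE SMILING FACE
```

In practice the lines would come from reading the downloaded file, which keeps the large name table out of the core, as discussed above.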
> Still, the above is easily possible by inventing a new encoding, say unicode-with-smileys, which then reads in a file containing the Unicode names and applies the necessary magic to decode/encode data as Paul described above. Would probably make a cool fun project for someone who wants to dive into writing codecs.

Yup. Prime candidate for CodecKit.
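As a rough idea of what such a codec could look like, here is a minimal sketch using Python's `codecs.register` and `codecs.CodecInfo`. Paul's exact scheme isn't quoted here, so the behaviour is an assumption: characters are written as `{NAME}` in ASCII output and turned back into characters on decoding. The two-entry name table is hard-coded to keep the example self-contained; a real CodecKit codec would load it from a file of Unicode names instead.

```python
import codecs
import re

# Hypothetical name table; a real codec would read this from a names file.
NAMES = {
    "WHITE SMILING FACE": "\u263a",
    "BLACK SMILING FACE": "\u263b",
}
CHARS = {ch: name for name, ch in NAMES.items()}
NAME_RE = re.compile(r"\{([A-Z0-9 -]+)\}")


def encode(text, errors="strict"):
    # Replace each known character with "{NAME}", then encode as ASCII.
    out = "".join("{%s}" % CHARS[ch] if ch in CHARS else ch for ch in text)
    return out.encode("ascii", errors), len(text)


def decode(data, errors="strict"):
    # bytes.decode may hand us a memoryview, so copy to bytes first.
    text = bytes(data).decode("ascii", errors)
    # Turn "{NAME}" back into the character; leave unknown names untouched.
    out = NAME_RE.sub(lambda m: NAMES.get(m.group(1), m.group(0)), text)
    return out, len(data)


def search(name):
    # Normalise hyphens/spaces so "unicode-with-smileys" is found on any version.
    if name.lower().replace("-", "_").replace(" ", "_") == "unicode_with_smileys":
        return codecs.CodecInfo(encode, decode, name="unicode-with-smileys")
    return None


codecs.register(search)

print("smile \u263a".encode("unicode-with-smileys"))  # b'smile {WHITE SMILING FACE}'
```

One design wrinkle to settle in a real codec: literal text such as `{WHITE SMILING FACE}` in the input is indistinguishable from an encoded character, so some escaping convention would be needed for a lossless round trip.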
- Andy
andy@reportlab.com