[I18n-sig] thinking of CJK codec, some questions
Brian Takashi Hooper
Mon, 13 Mar 2000 21:05:50 +0900
Hi there i18n-siggers -
First of all, thank you very, very much, Marc-Andre (and Fredrik Lundh
for the original implementation), for all your hard work. I checked out
the CVS checkin yesterday, played with it a little, and took a printout
of the source home with me. It seems really well thought out.
I scrutinized the code base thinking about issues for a CJK codec, and
came up with a few questions:
1. Should the CJK ideograms also be included in the unicodehelpers
numeric converters? From my perspective, I'd really like to see them go
in, and think that it would make sense, too - any opinions?
2. Same question for double-width alphanumeric characters - I assume
these should also be included in the lowercase / uppercase helpers? Or
will there be a way to add to these lists through the codec API? For
those worried about data from unused codecs clogging up their character
type helpers, that might be a good option to have; I, by contrast,
would like to be able to exclude all the extra Latin 1 stuff that I
don't need, hmm.
3. Same thing for whitespace - I think there are a number of
double-width whitespace characters around also.
4. Are there any conventions for how non-standard codecs should be
installed? Should they be added to Python's encodings directory, or
should they just be added to site-packages or site-python like other
third-party modules?
5. Are there any existing tools for converting from Unicode mapping
files to a C source file that can be handily made into a dynamic
library, or am I on my own there?
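To make question 5 concrete, here is a rough sketch of the kind of
converter I have in mind. It assumes the mapping file uses the two-column
layout of the Unicode Consortium's mapping tables (encoded value, Unicode
value, then a "#" comment), and the names parse_mapping, emit_c_table,
map_entry and cjk_decode_map are hypothetical, made up for illustration
rather than taken from any existing tool. Entries are assumed to fit in
16 bits.

```python
import re

# One mapping line is assumed to look like:
#   0x8140<TAB>0x3000<TAB># IDEOGRAPHIC SPACE
_LINE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]+)")


def parse_mapping(text):
    """Return sorted (encoded, unicode) integer pairs from mapping-file text."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        m = _LINE.match(line)
        if m:  # blank lines and entries with no Unicode value are skipped
            pairs.append((int(m.group(1), 16), int(m.group(2), 16)))
    return sorted(pairs)


def emit_c_table(pairs, name="cjk_decode_map"):
    """Render the pairs as C source: a sorted array of {enc, uni} structs."""
    lines = [
        "typedef struct { unsigned short enc; unsigned short uni; } map_entry;",
        "static const map_entry %s[] = {" % name,
    ]
    for enc, uni in pairs:
        lines.append("    {0x%04X, 0x%04X}," % (enc, uni))
    lines.append("};")
    lines.append("static const int %s_len = %d;" % (name, len(pairs)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two Shift-JIS style entries, used here only as sample input.
    sample = (
        "0x82A0\t0x3042\t# HIRAGANA LETTER A\n"
        "0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n"
    )
    print(emit_c_table(parse_mapping(sample)))
```

Keeping the emitted table sorted by encoded value means the C side could
look entries up with a plain binary search; a real codec would probably
want a denser two-level table, which this sketch does not attempt.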
Anyone who has any opinions on the above please chime in, I'm trying to
start a discussion :-) !
Also, while I was reading the code, I found a few typos and spelling
mistakes (for example the notoriously often misspelled 'occurrence').
I doubt this is a very high priority, but judging from the checkins
list, Guido apparently accepts spelling patches. I have a big context
diff; who should I send it to?