[Tutor] encoding problem (asian text)

Tue Apr 29 14:29:01 2003

> I work at a printing company and we handle typesetting in 20 different
> languages here, and I'm dying to have codecs for Arabic, Chinese
> (Simplified and Traditional), and Korean. Python obviously has great
> support for Latin-based languages and Japanese, but the above three
> ("Four, sire!") give me endless headaches.

Hi Robert,

The i18n-sig folks could probably help us out here; we may want to chat
with them for more details.  Korean codecs can be found here:

    http://sourceforge.net/projects/koco

And Chinese codecs are part of the python-codecs project:

    http://sourceforge.net/projects/python-codecs/

(Actually, it looks like the KoreanCodecs have been folded into the
python-codecs tree too...)  Although there isn't a formal release yet,
their CVS tree does have source code:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python-codecs/practicecodecs/ChineseCodecs/
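
Once one of these packages is installed, a quick way to check whether
Python can actually see a codec is codecs.lookup(), which raises
LookupError for a name it doesn't know.  Here's a minimal sketch; the
helper codec_available() is my own invention, and the codec names in the
loop are only guesses at what the packages register, so check their
documentation for the real ones:

```python
import codecs

def codec_available(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# These names are guesses for illustration only; use whatever names
# the installed codec package actually documents.
for name in ["latin-1", "euc-kr", "gb2312", "big5"]:
    print(name, codec_available(name))
```

If a name comes back False, the package either isn't on the Python path
or registers its codecs under a different name.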

> What I'd like to know is, how do I go about getting other codecs, or if
> they're not available, how can I build them? I don't need to actually
> display anything, but I need to be able to pre-process the text for
> import into Frame, Quark, Pagemaker, etc.
>
> Also, I've actually had encoding errors with French and the standard
> "latin-1" codec. (We have to convert between Mac and Windows text for
> the applications we're using.) I'll have to look it up, but I think one
> of the offending character codes was decimal 159. In the end, it was so
> common that I had to abandon that script. How would one look into fixing
> something like this?
>
> Sigh... so close and yet so far.

Unfortunately, I don't think many of us on Tutor have much experience
with encodings and localization yet.  Because of that, I'd recommend
asking the i18n group: the folks there have been grappling with Unicode
and codec questions, and can probably help you with both the missing
codecs and the latin-1 errors.

    http://www.python.org/sigs/i18n-sig/
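
In the meantime, one guess about the French problem: I can't tell which
direction the conversion was failing in, but byte 159 (0x9F) is one of
the spots where the encodings disagree.  It's a "u" with an umlaut in Mac
Roman, a "Y" with an umlaut in Windows cp1252, and a control character in
latin-1.  So it may be that latin-1 is simply the wrong codec for Mac
text.  Here's a rough sketch of going through Unicode instead, assuming
the Mac files are in Mac Roman and the Windows applications want cp1252
(both of those are assumptions on my part, and mac_to_windows() is just
a name I made up):

```python
def mac_to_windows(data):
    """Re-encode Mac Roman bytes as Windows cp1252 bytes.

    Assumes the input really is Mac Roman.  Characters that have no
    cp1252 equivalent are replaced with "?" rather than raising an error.
    """
    text = data.decode("mac_roman")          # bytes -> Unicode
    return text.encode("cp1252", "replace")  # Unicode -> Windows bytes

# "deja vu" with its accents, as Mac Roman bytes (0x8E and 0x88).
sample = b"d\x8ej\x88 vu"
converted = mac_to_windows(sample)
print(repr(converted))
```

Swapping the two codec names gives the Windows-to-Mac direction.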

Good luck to you!