algorithm to autodetect (japanese) encodings..

Andy Robinson andy at reportlab.com
Fri Mar 14 16:28:34 EST 2003


On 12 Mar 2003 23:28:00 +0100, gabor <gabor at z10n.net> wrote:
>my question:
>does anyone have a working algo to find the correct encoding between the
>3 jap. encodings?

Nearly -)  Get hold of Tamito Kajiyama's excellent Japanese codecs,
which plug into Pythoin's unicode conversion system.  You can
just try converting from each (Shift-JIS, EUC, UTF8) to Unicode in
turn, ad see which raise exceptions.

It SOUNDS clunky but I think it's the correct approach.  And
since you're running mostly C code, you can try a decent sample
in much less time than you could apply Python-coded heuristics.

Ganbatte kudasai,

Andy robinson




More information about the Python-list mailing list