algorithm to autodetect (japanese) encodings..
andy at reportlab.com
Fri Mar 14 22:28:34 CET 2003
On 12 Mar 2003 23:28:00 +0100, gabor <gabor at z10n.net> wrote:
>does anyone have a working algo to find the correct encoding between the
>3 jap. encodings?
Nearly :-) Get hold of Tamito Kajiyama's excellent Japanese codecs,
which plug into Python's Unicode conversion system. You can
just try converting from each (Shift-JIS, EUC-JP, UTF-8) to Unicode in
turn, and see which ones raise exceptions.
It SOUNDS clunky but I think it's the correct approach. And
since you're running mostly C code, you can try a decent sample
in much less time than you could apply Python-coded heuristics.
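
A minimal sketch of the try-each-codec idea, assuming a current Python 3
where the shift_jis, euc_jp and utf-8 codecs ship with the standard
library (the post itself refers to the separately installed Japanese
codecs package). The function name plausible_encodings and the candidate
order are hypothetical, chosen only for illustration:

```python
# Candidate encodings to try, in order. The order is an assumption;
# the original post does not specify one.
CANDIDATES = ("utf-8", "euc_jp", "shift_jis")


def plausible_encodings(data, candidates=CANDIDATES):
    """Return the candidate encodings that decode `data` without error."""
    matches = []
    for name in candidates:
        try:
            data.decode(name)  # strict mode: invalid bytes raise
        except UnicodeDecodeError:
            continue
        matches.append(name)
    return matches


if __name__ == "__main__":
    sample = "日本語".encode("shift_jis")
    print(plausible_encodings(sample))
```

More than one codec can accept the same bytes (pure ASCII decodes under
all three), so the sketch returns every survivor rather than a single
answer. A longer sample should usually rule out more candidates.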