Detect character encoding
Diez B. Roggisch
deets at nospam.web.de
Sun Dec 4 16:24:03 CET 2005
> Is there any way to detect string encoding in Python?
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc). I want to detect it, and
> encode it to utf-8 (with the string method encode).
You can only guess, e.g. by looking for words that contain characters
such as umlauts. Recode might be of help here; AFAIK it has such
heuristics built in.
But there is _no_ way to be absolutely sure. 8 bits are 8 bits, so each
file is "legal" in pretty much every single-byte encoding.
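
To illustrate the guessing, here is a minimal sketch (Python 3) that tries
a list of candidate encodings in order and keeps the first one that decodes
without error. The function name guess_decode and the candidate list are
made up for this example; this is not what Recode does, just the crudest
possible heuristic.

```python
def guess_decode(data, candidates=("utf-8", "cp1250", "iso-8859-2")):
    """Return (text, encoding) for the first candidate that decodes cleanly.

    'candidates' is an assumed, ordered list of guesses. A clean decode only
    means the bytes are legal in that encoding, not that the guess is right.
    """
    for enc in candidates:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # bytes are not legal in this encoding, try the next
    raise ValueError("none of the candidate encodings fit")


def to_utf8(data):
    """Re-encode raw bytes as UTF-8 using the guessed source encoding."""
    text, _enc = guess_decode(data)
    return text.encode("utf-8")
```

Putting utf-8 first is deliberate: it is strict enough that other 8-bit
text rarely passes as valid UTF-8. The single-byte encodings are a different
story. Bytes written as iso-8859-2 will usually decode under cp1250 too,
just with some wrong characters, so the order of the list decides the
answer, and that is exactly why you cannot be sure.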