Detect character encoding

Diez B. Roggisch deets at
Sun Dec 4 16:24:03 CET 2005

Michal wrote:
> Hello,
> is there any way to detect a string's encoding in Python?
> I need to process several files. Each of them could be encoded in a
> different charset (iso-8859-2, cp1250, etc.). I want to detect it and
> encode it to utf-8 (with the string method encode).

You can only guess, e.g. by looking for words that contain non-ASCII
characters such as umlauts and checking which candidate encoding makes
them come out right. Recode might be of help here; it has such
heuristics built in, AFAIK.
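
A minimal sketch of that kind of guessing, written for Python 3 (where
file contents are bytes). This is not how Recode works; the function
name guess_encoding, the candidate list and the set of "expected"
characters (Czech letters here) are all assumptions made up for
illustration, and you would have to adapt them to the languages your
files really contain:

```python
# Hypothetical heuristic: decode with each candidate encoding and keep the
# one whose result contains the most characters we expect in the language.
EXPECTED = "áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"  # assumption: Czech text


def guess_encoding(data, candidates=("utf-8", "iso-8859-2", "cp1250"),
                   expected=EXPECTED):
    """Return the best-scoring candidate name, or None if none decodes."""
    best_name = None
    best_score = -1
    for name in candidates:
        try:
            text = data.decode(name)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding at all; skip it.
            continue
        # Score = number of decoded characters that look like real letters
        # of the expected language.
        score = sum(1 for ch in text if ch in expected)
        if score > best_score:  # on a tie the earlier candidate wins
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    sample = "žluťoučký kůň".encode("cp1250")
    name = guess_encoding(sample)
    # Re-encode to UTF-8 once a guess has been made.
    utf8_bytes = sample.decode(name).encode("utf-8")
    print(name, utf8_bytes)
```

The result is still only a guess: short files, or files with few
non-ASCII characters, can easily score the same under several
candidates.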

But there is _no_ way to be absolutely sure. 8 bits are 8 bits, so
practically every file is "legal" in every single-byte encoding.
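
To illustrate, here is a small Python 3 sketch showing the same byte
decoding without error under two of the charsets mentioned above, but
giving different text each time. The helper name decode_all is made up
for this example. Multi-byte encodings such as utf-8 are stricter and do
reject some byte sequences, which is why it is included for contrast:

```python
def decode_all(data, encodings=("iso-8859-2", "cp1250", "utf-8")):
    """Map each encoding name to the decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # One byte, several "legal" readings.
    for name, text in decode_all(b"\xb9").items():
        print(name, repr(text))
```

Both single-byte codecs accept the byte, so nothing in the data itself
tells you which reading was intended.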


More information about the Python-list mailing list