Detect character encoding

Mike Meyer mwm at
Sun Dec 4 20:31:54 CET 2005

"Diez B. Roggisch" <deets at> writes:
> Michal wrote:
>> is there any way how to detect string encoding in Python?
>> I need to process several files. Each of them could be encoded in
>> different charset (iso-8859-2, cp1250, etc). I want to detect it,
>> and encode it to utf-8 (with string function encode).
> But there is _no_ way to be absolutely sure. 8bit are 8bit, so each
> file is "legal" in all encodings.

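Not quite. Some encodings don't assign a character to every possible
8-bit byte value, so if you encounter a byte that an encoding leaves
undefined, you can eliminate that encoding from the list of possible
encodings. This doesn't really help much by itself, though: many
encodings assign something to every byte value, so they can never be
ruled out this way.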
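A minimal Python 3 sketch of the elimination idea, assuming the file has been read as raw bytes and the list of candidate encodings is known in advance. The names `possible_encodings`, `to_utf8` and `CANDIDATES` are hypothetical, chosen for illustration. The sketch relies on strict decoding raising `UnicodeDecodeError` when a byte is undefined. In Python's codecs, for instance, cp1250 leaves byte 0x81 undefined while iso-8859-2 decodes every byte value.

```python
from __future__ import annotations

# Hypothetical candidate list; the thread mentions iso-8859-2 and cp1250.
CANDIDATES = ("utf-8", "cp1250", "iso-8859-2")


def possible_encodings(data: bytes, candidates=CANDIDATES) -> list[str]:
    """Return the candidates that can decode `data` without error.

    An encoding is eliminated as soon as the data contains a byte (or
    byte sequence) that the encoding leaves undefined.
    """
    survivors = []
    for enc in candidates:
        try:
            data.decode(enc)  # errors="strict" is the default
        except UnicodeDecodeError:
            continue  # some byte is invalid here: rule this encoding out
        survivors.append(enc)
    return survivors


def to_utf8(data: bytes, candidates=CANDIDATES) -> bytes:
    """Re-encode `data` as UTF-8, but only if exactly one candidate survives."""
    survivors = possible_encodings(data, candidates)
    if len(survivors) != 1:
        raise ValueError(f"cannot pick an encoding, survivors: {survivors}")
    # Decode bytes to text with the surviving encoding, then encode as UTF-8.
    return data.decode(survivors[0]).encode("utf-8")
```

For example, `possible_encodings(b"\x81abc")` leaves only `iso-8859-2`, but `possible_encodings(b"\xe8")` keeps both `cp1250` and `iso-8859-2` (each maps 0xE8 to "č"). That is why elimination alone rarely settles the question, and `to_utf8` refuses to guess between survivors.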

Mike Meyer <mwm at>
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

More information about the Python-list mailing list