Detect character encoding
mwm at mired.org
Sun Dec 4 20:31:54 CET 2005
"Diez B. Roggisch" <deets at nospam.web.de> writes:
> Michal wrote:
>> Is there any way to detect string encoding in Python?
>> I need to process several files. Each of them could be encoded in a
>> different charset (iso-8859-2, cp1250, etc). I want to detect it,
>> and encode it to utf-8 (with string function encode).
> But there is _no_ way to be absolutely sure. 8 bits are 8 bits, so each
> file is "legal" in all encodings.
Not quite. Some encodings leave some of the 8-bit values unassigned, so
if you encounter a byte that is not defined in an encoding, you can
eliminate that encoding from the list of possible encodings. This doesn't really help much by
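
The elimination idea can be sketched roughly as follows. This is only an
illustration, assuming Python 3 and raw bytes read from the file; the
function name possible_encodings and the default candidate list are made
up for this example and are not from the original thread. It tries each
candidate codec and drops the ones that reject some byte in the data.

```python
def possible_encodings(data, candidates=("utf-8", "iso-8859-2", "cp1250")):
    """Return the candidate encodings that can decode `data` without error.

    `data` is a bytes object. The candidate list is an assumption for this
    sketch; pass whatever encodings your files might plausibly use.
    """
    survivors = []
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            # At least one byte (or byte sequence) is not valid in this
            # encoding, so it can be ruled out.
            continue
        survivors.append(name)
    return survivors


# Pure ASCII is valid in every candidate, so nothing is eliminated.
print(possible_encodings(b"abc"))

# Byte 0x81 is unassigned in Python's cp1250 codec and is not valid UTF-8
# on its own, but Python's iso-8859-2 codec maps it to a control character.
print(possible_encodings(b"\x81"))
```

Note that surviving the check only means the bytes are legal in that
encoding, not that the encoding is the right one. Codecs that assign all
256 byte values, such as Python's iso-8859-2, will never be eliminated
this way.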
Mike Meyer <mwm at mired.org> http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.