Detect character encoding
The new guy
not at interesting.com
Tue Dec 6 05:59:49 CET 2005
> is there any way how to detect string encoding in Python?
> I need to proccess several files. Each of them could be encoded in
> different charset (iso-8859-2, cp1250, etc). I want to detect it, and
> encode it to utf-8 (with string function encode).
Well, about how to detect it in Python, I can't help. My first guess,
though, would be to have a look at the source code of the "file" utility.
This is an example of what it does:
# file *
de.i18n: ISO-8859 text, with very long lines
en.i18n: ISO-8859 English text, with very long lines
More information about the Python-list