Determining the encoding of a text file

J.R. j.r.gao at
Tue Mar 2 03:20:21 CET 2004

"Rajorshi" <rajorshi at> wrote in message
news:85b5e3f8.0403010224.939e8f8 at
> Hello!
>  How do I determine the encoding of a text file ? That is,
> given a text file I want to know the encoding it is in
> UTF8 or UTF16 or Latin etc. It would be very helpful if
> you could tell me how to do this in python on Linux. But
> just the method is acceptable.
> Thanks in advance!

The Python integrated development environment IDLE, which is distributed
along with Python, shows one approach to decoding a string. You can find
it under $PYTHON/lib/idlelib/; look for decode().
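
As a rough illustration of the general idea (this is not IDLE's code), here
is a minimal sketch that looks for a byte order mark and then tries a list
of candidate encodings in order. The function name guess_encoding, the
candidate list, and the Latin-1 fallback are my own assumptions. Any byte
sequence decodes as Latin-1, so the result is only ever a guess.

```python
import codecs

# Byte order marks, 4-byte ones first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data, candidates=("ascii", "utf-8", "gb2312", "big5")):
    """Return a plausible codec name for the bytes in `data`.

    Hypothetical helper: the candidate order is an assumption, and a
    successful decode does not prove the guess is right.
    """
    # 1. A BOM at the start is the strongest hint available.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 2. Otherwise take the first candidate that decodes without error.
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    # 3. Latin-1 maps every byte to a character, so it never fails.
    return "latin-1"
```

For a file, read it in binary mode and pass the bytes, for example
guess_encoding(open(path, "rb").read()). Put the strictest encodings first
in the candidate list, since a permissive one will accept almost anything.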

But it's not perfect; you could integrate it with Skip's example, writing your
Additionally, if you want to guess a Chinese encoding, the perl lib
may serve as a reference; it supports GB2312-80, Hz, Big5, UTF-8, etc.

