Determining the encoding of a text file

David Opstad opstad at batnet.com
Mon Mar 1 10:47:23 EST 2004


In article <85b5e3f8.0403010224.939e8f8 at posting.google.com>,
 rajorshi at fastmail.fm (Rajorshi) wrote:

>  How do I determine the encoding of a text file ? That is,
> given a text file I want to know the encoding it is in
> UTF8 or UTF16 or Latin etc. It would be very helpful if
> you could tell me how to do this in python on Linux. But
> just the method is acceptable.

If the first byte in the file is 0xFE and the second is 0xFF, then the
file is likely encoded in big-endian UTF-16. If the first byte is 0xFF
and the second is 0xFE, then it's likely little-endian UTF-16. (Those
two bytes are the byte order mark, U+FEFF, which some programs write at
the start of a file.)
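
Here is a rough Python sketch of that two-byte check, in case it helps.
The function names are made up for illustration; only the BOM constants
come from the standard codecs module. The answer is a guess rather than
a certainty: a UTF-32 little-endian file also begins with 0xFF 0xFE, and
plenty of UTF-16 files carry no byte order mark at all.

```python
import codecs


def guess_utf16_from_bytes(head):
    """Guess a UTF-16 byte order from the leading bytes of a file.

    Returns 'utf-16-be', 'utf-16-le', or None when no UTF-16 byte
    order mark is present. (Hypothetical helper, not a library API.)
    """
    if head[:2] == codecs.BOM_UTF16_BE:  # bytes 0xFE 0xFF
        return "utf-16-be"
    if head[:2] == codecs.BOM_UTF16_LE:  # bytes 0xFF 0xFE
        return "utf-16-le"
    return None


def guess_utf16_from_file(path):
    """Read the first two bytes of the file at path and apply the check."""
    with open(path, "rb") as f:  # binary mode, so no decoding happens
        return guess_utf16_from_bytes(f.read(2))
```

A result of None only means the file did not start with either mark, so
nothing has been learned about its encoding beyond that.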

Once you've eliminated those possibilities, then it gets trickier...

Dave


