how to detect the encoding used for a specific text data ?

Jussi Piitulainen jpiitula at ling.helsinki.fi
Thu Dec 20 15:10:08 CET 2012


iMath writes:

> which package to use ?

Read the text in as a "bytes object" (bytes); it has a .decode method
that you can experiment with. Strings (str) are Unicode and have an
.encode method. Both methods let you specify the desired encoding and
what to do when there are errors.

help(bytes.decode)
help(str.encode)
help(open)
<http://docs.python.org/3.3/library/stdtypes.html>
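
Here is a rough sketch of that kind of experimenting in Python 3. The
helper name try_decodings and the list of candidate encodings are my
own picks for illustration, not anything standard. Note that a decode
that succeeds does not prove the guess is right: latin-1, for one,
accepts every possible byte, so it never fails.

```python
def try_decodings(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is an assumption, so adjust
    it to the encodings you actually expect to meet.
    """
    for name in candidates:
        try:
            # errors="strict" is the default: bad input raises an exception
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to a lossy decode
    return None, data.decode("utf-8", errors="replace")


# Four bytes, with a-umlaut encoded as the single byte 0xE4
raw = "n\u00e4in".encode("latin-1")

encoding, text = try_decodings(raw)

# The errors argument decides what happens to undecodable input
lossy = raw.decode("utf-8", errors="replace")

# The same argument works in the other direction on str.encode
ascii_bytes = "n\u00e4in".encode("ascii", errors="replace")
```

With these bytes the utf-8 attempt fails and cp1252 is the first
candidate that succeeds, but that only shows the bytes are valid
cp1252, not that the author used it.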

In Python 2.7 and before, strings (str) do double duty as byte strings
and as text, and have both the .encode and .decode methods, so the
Python version matters here.
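
If in doubt about which methods your version has, a quick check like
this one tells you. It is only a sketch, written for Python 3; the
variable names are mine.

```python
# In Python 3 the two directions live on different types
bytes_can_decode = hasattr(bytes, "decode")  # bytes -> str
bytes_can_encode = hasattr(bytes, "encode")
str_can_encode = hasattr(str, "encode")      # str -> bytes
str_can_decode = hasattr(str, "decode")

print(bytes_can_decode, bytes_can_encode, str_can_encode, str_can_decode)
```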


