how to detect the encoding used for a specific text data ?
jpiitula at ling.helsinki.fi
Thu Dec 20 15:10:08 CET 2012
> which package to use ?
Read the text in as a "bytes object" (bytes); it has a .decode
method that you can experiment with. Strings (str) are Unicode and
have an .encode method. Both methods let you specify the desired
encoding and what to do when there are errors.
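A minimal sketch of that kind of experimenting in Python 3 follows. The helper name try_decodings, the list of candidate encodings and the sample bytes are all made up for illustration; only bytes.decode, str.encode and their errors argument come from Python itself. Note that a decode that succeeds does not prove the guess is right: latin-1, for instance, accepts any byte sequence.

```python
# Hypothetical helper (not from any package): try candidate encodings in order.
def try_decodings(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            # errors="strict" is the default: bad bytes raise UnicodeDecodeError
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue
    # No candidate worked: decode anyway, marking bad bytes with U+FFFD
    return None, data.decode("utf-8", errors="replace")

# Stand-in for bytes read from a file opened in binary mode ("rb")
raw = "na\u00efve caf\u00e9".encode("latin-1")

encoding, text = try_decodings(raw)

# The errors argument controls what happens to undecodable bytes
lossy = raw.decode("ascii", errors="replace")
```

Here UTF-8 rejects the sample bytes, so the helper moves on to the next candidate. Put the strictest or most likely encodings first, since permissive ones will accept almost anything.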
In Python 2.7 and before, strings (str) are byte strings that do double
duty and have both the .encode and .decode methods, while Unicode text
is a separate type (unicode), so the Python version matters here.