[Tutor] String encoding

Jerry Hill malaclypse2 at gmail.com
Fri Aug 26 17:23:14 CEST 2011

On Thu, Aug 25, 2011 at 7:07 PM, Prasad, Ramit
<ramit.prasad at jpmorgan.com> wrote:
> Nice catch! Yeah, I am stuck on the encoding mechanism as well. I know how to encode/decode...but not what encoding to use. Is there a reference that I can look up to find what encoding that would correspond to? I know what the character looks like if that helps. I know that Python does display the correct character sometimes, but not sure when or why.

In this case, the encoding is almost certainly "latin-1".  I know that
from playing around at the interactive interpreter, like this:

>>> s = 'M\xc9XICO'
>>> print s.decode('latin-1')
MÉXICO
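The "playing around" above can be turned into a small trial-and-error helper. This is a minimal sketch for Python 3, where byte strings are written `b'...'`. `try_decodings` is a hypothetical helper, not a library function, and the candidate list is an assumption you would adjust to whatever encodings your data plausibly uses:

```python
# Python 3 sketch; try_decodings is a hypothetical helper, not a library API.
def try_decodings(data, candidates=("utf-8", "latin-1")):
    """Return (encoding, text) for the first candidate that decodes `data` cleanly."""
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    raise ValueError("none of the candidate encodings could decode the data")


raw = b"M\xc9XICO"  # the same bytes as in the interpreter session
enc, text = try_decodings(raw)
print(enc, text)
```

The order matters: strict UTF-8 rejects these bytes (0xC9 must be followed by a continuation byte, and "X" is not one), while Latin-1 assigns a character to every byte value and so never fails. That is why it goes last, and also why a "successful" decode only shows the bytes are *consistent* with an encoding, not that it is the right one (cp1252 would decode these bytes to the same text, for example).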

If you want to see charts of various encodings, Wikipedia has a bunch.
For instance, the Latin-1 encoding is here:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1 and UTF-8 is here:
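Since you know what the character looks like, you can also work the chart backwards: ask Python which bytes each candidate encoding uses for that character, then look for those bytes in your data. A minimal Python 3 sketch; `encoded_forms` is a hypothetical helper and the list of encodings is just an illustrative assumption:

```python
import unicodedata


def encoded_forms(char, encodings=("latin-1", "cp1252", "utf-8", "utf-16-be")):
    """Map each encoding name to the bytes it uses for `char`."""
    return {enc: char.encode(enc) for enc in encodings}


char = "\u00c9"  # É, the character in MÉXICO
print(unicodedata.name(char), "U+%04X" % ord(char))
for enc, data in encoded_forms(char).items():
    print(f"{enc:10} {data!r}")
```

A lone `\xc9` byte in your input points toward Latin-1 (or cp1252, which agrees with it here), whereas the pair `\xc3\x89` would point toward UTF-8.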

As the other respondents have said, it's really hard to figure this
out just in code.  The chardet module mentioned by Steven D'Aprano is
probably the best bet if you really *have* to guess the encoding of an
arbitrary sequence of bytes, but it is much, much better to actually know
the encoding of your inputs.
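Knowing the encoding pays off because you can declare it once, where the bytes enter your program, and work with decoded text everywhere else. A minimal Python 3 sketch, assuming you were told the input file is Latin-1; the file name and the `read_known` helper are hypothetical, and the file is written on the fly only so the example is self-contained:

```python
import os
import tempfile


def read_known(path, encoding):
    """Read a text file using the encoding we were told, rather than a guess."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical input file, created here so the example runs on its own.
path = os.path.join(tempfile.mkdtemp(), "countries.txt")
with open(path, "wb") as f:
    f.write(b"M\xc9XICO\n")

print(read_known(path, "latin-1").strip())

# Declaring the wrong encoding fails loudly instead of silently producing garbage.
try:
    read_known(path, "utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```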

Good luck!

