Binary strings, unicode and encodings

Laurent Therond google at
Fri Jan 16 00:29:21 CET 2004

I used the interpreter on my system:

>>> import sys
>>> sys.getdefaultencoding()


>>> from cStringIO import StringIO
>>> b = StringIO()
>>> b.write('%d:%s' % (len('string'), 'string'))
>>> print b.getvalue()
6:string

>>> c = StringIO()
>>> c.write('%d:%s' % (len('stringé'), 'stringé'))
>>> print c.getvalue()
7:stringé

Did StringIO just recognize Extended ASCII?
Did StringIO just recognize ISO 8859-1?
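As far as I know, cStringIO in Python 2 applies no encoding at all. It stores the bytes of a str object verbatim, so what comes back is exactly what the console handed to the interpreter. The Python 3 sketch below uses io.BytesIO as a stand-in for cStringIO (an assumption: the two APIs are not identical) and a hypothetical helper, `length_prefixed`, to mirror the '%d:%s' format. It also shows that the length prefix counts bytes, so it depends on which encoding produced them.

```python
import io

def length_prefixed(data: bytes) -> bytes:
    """Hypothetical helper: the byte-level equivalent of '%d:%s' % (len(s), s)."""
    return str(len(data)).encode("ascii") + b":" + data

text = "stringé"  # 7 characters

for encoding in ("cp850", "latin-1", "utf-8"):
    buf = io.BytesIO()
    # BytesIO keeps exactly the bytes it is given; it never inspects an encoding.
    buf.write(length_prefixed(text.encode(encoding)))
    print(encoding, buf.getvalue())

# Expected output:
# cp850 b'7:string\x82'
# latin-1 b'7:string\xe9'
# utf-8 b'8:string\xc3\xa9'
```

In the Python 2.3 session above, 'stringé' typed at the console is already a byte string in the console's code page, which is why its length came out as 7.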

é belongs to Extended ASCII AND ISO 8859-1, although not necessarily at the same byte value.
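For reference, é has no single byte value across these character sets. ISO 8859-1 (and Windows-1252) store it as 0xE9, while the DOS code pages 437 and 850, the usual defaults of the Windows console in US and Western European locales, store it as 0x82, the byte reported in the tracebacks below. Which code page this particular console used is my assumption; the Python 3 sketch simply prints the bytes produced by a few standard-library codecs.

```python
# Show how the single character é is encoded by several standard codecs.
char = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

for codec in ("latin-1", "cp1252", "cp437", "cp850", "utf-8"):
    encoded = char.encode(codec)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"{codec:8} {hex_bytes}")

# Prints e9, e9, 82, 82 and c3 a9 respectively.
```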

>>> print c.getvalue().decode('US-ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 8: ordinal not in range(128)
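The ASCII codec accepts only bytes 0x00 to 0x7F, so any byte from 0x80 upward is rejected, whatever character it was meant to be. A minimal Python 3 reproduction, assuming the buffer held b'7:string\x82' (inferred from the error message, which reports byte 0x82 at position 8); `first_bad_byte` is a hypothetical helper:

```python
def first_bad_byte(data: bytes, codec: str):
    """Hypothetical helper: return (offset, byte) of the first byte `codec`
    cannot decode, or None if the whole input decodes cleanly."""
    try:
        data.decode(codec)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte.
        return err.start, data[err.start]
    return None

data = b"7:string\x82"  # assumed buffer contents, inferred from the traceback
print(first_bad_byte(data, "ascii"))    # (8, 130), i.e. byte 0x82 at offset 8
print(first_bad_byte(data, "latin-1"))  # None: ISO 8859-1 decodes every byte
```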

>>> print c.getvalue().decode('ISO-8859-1')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python23\lib\encodings\", line 18, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\x82' in position 8: character maps to <undefined>
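Note that this second traceback is a UnicodeEncodeError raised from charmap_encode, not a decoding error. ISO 8859-1 maps every byte 0x00 to 0xFF to a character, so decode('ISO-8859-1') itself cannot fail: it turns 0x82 into u'\x82', a C1 control character rather than é, and it is printing that character through the console's charmap codec that fails. The Python 3 sketch below reproduces this, assuming the console used code page 850 (cp437 behaves the same for this byte).

```python
data = b"7:string\x82"  # assumed buffer contents, as in the traceback

as_latin1 = data.decode("iso-8859-1")  # never fails: bytes map to U+0000..U+00FF
print(repr(as_latin1))                 # '7:string\x82', a control character

try:
    as_latin1.encode("cp850")  # roughly what print does on a cp850 console
    reason = None
except UnicodeEncodeError as err:
    reason = err.reason
print(reason)  # character maps to <undefined>

# Decoding with the code page that produced the bytes recovers é.
recovered = data.decode("cp850")
print(recovered == "7:string\u00e9")  # True
```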


It must have been Extended ASCII, then.

I will have to run more tests.

More information about the Python-list mailing list