A 'raw' codec for binary "strings" in Python?

Jeff Epler jepler at unpythonic.net
Mon Mar 1 22:51:26 CET 2004

You have to understand the difference between
    "\xc0".encode('US-ASCII', 'replace')
    u"\xc0".encode('US-ASCII', 'replace')
The latter returns the string '?'; the former probably raises an
error, assuming that your default encoding is 'ascii'.
That's because calling .encode(...) on a byte string is really the same as
calling .decode(sys.getdefaultencoding()).encode(...) on it.  The error is
raised in the decode step, before the 'replace' handler is ever consulted.
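
For readers on Python 3, where the implicit decode no longer exists, the two
steps can be spelled out by hand. This is only a rough sketch: the helper
name encode_byte_string and its default_encoding parameter are made up for
illustration and are not part of any library.

```python
def encode_byte_string(raw, encoding, errors="strict", default_encoding="ascii"):
    """Mimic the old behaviour: decode with the default encoding, then encode."""
    # Step 1: the hidden decode. A byte such as b"\xc0" fails here.
    text = raw.decode(default_encoding)
    # Step 2: the encode that was actually asked for.
    return text.encode(encoding, errors)


# Text input: only the encode step runs, so 'replace' gets to do its job.
from_text = "\xc0".encode("US-ASCII", "replace")

# Byte input: the decode step fails before 'replace' is ever looked at.
try:
    encode_byte_string(b"\xc0", "US-ASCII", "replace")
    failed_step = None
except UnicodeDecodeError:
    failed_step = "decode"
```

Pure-ASCII input passes through both steps unchanged, which is why the
problem only shows up once a byte above 127 appears.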

You could use
    "\xc0".decode("iso-8859-1").encode('US-ASCII', 'replace')
or you could use ''.translate:
    import string
    s = ''.join([chr(x) for x in range(128, 256)])  # every non-ASCII byte
    t = '?' * 128                                   # one '?' for each of them
    replace_map = string.maketrans(s, t)

>>> "abc\xc0\xff".translate(replace_map)
'abc??'
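
Both workarounds carry over to Python 3 bytes objects. The following is a
sketch under that assumption, not the code from the post itself: it uses
bytes.maketrans in place of string.maketrans, and the variable names
via_latin1, high_bytes and via_translate are chosen here for illustration.

```python
data = b"abc\xc0\xff"

# Workaround 1: ISO-8859-1 maps every byte to a code point, so the decode
# cannot fail; the 'replace' handler then turns non-ASCII characters into '?'.
via_latin1 = data.decode("iso-8859-1").encode("US-ASCII", "replace")

# Workaround 2: a translation table that maps bytes 128..255 to '?' and
# leaves bytes 0..127 untouched. No decoding is involved at all.
high_bytes = bytes(range(128, 256))
replace_map = bytes.maketrans(high_bytes, b"?" * 128)
via_translate = data.translate(replace_map)
```

The translate version never leaves the byte domain, so it does not depend
on any codec or on the default encoding.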

