A 'raw' codec for binary "strings" in Python?

Jeff Epler jepler at unpythonic.net
Mon Mar 1 16:51:26 EST 2004


You have to understand the difference between
    "\xc0".encode('US-ASCII', 'replace')
and
    u"\xc0".encode('US-ASCII', 'replace')
The latter returns the string '?'; the former probably raises a
UnicodeDecodeError, assuming that your default encoding is 'ascii'.
That's because, for a byte string, ''.encode(...) is really the same as
''.decode(sys.getdefaultencoding()).encode(...).  It's in the decode step
that the error is raised.
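
For readers on Python 3, where byte strings no longer have an .encode()
method, the implicit decode step can be spelled out by hand. This is only a
sketch: py2_style_encode is a hypothetical helper written for illustration,
and 'ascii' as the default encoding is the same assumption made above.

```python
# Python 3 sketch that models the implicit decode step of Python 2's
# str.encode(). py2_style_encode is a hypothetical name, not a real API.
def py2_style_encode(raw, default_encoding="ascii"):
    # Step 1: decode the bytes with the default codec (this can fail).
    text = raw.decode(default_encoding)
    # Step 2: encode the resulting text, replacing unencodable characters.
    return text.encode("ascii", "replace")

# A text (unicode) string encodes directly; U+00C0 is replaced by '?'.
unicode_result = "\xc0".encode("ascii", "replace")

# A byte string has to be decoded first, and byte 0xC0 is not valid ASCII,
# so the failure comes from the decode step, never from the encode step.
try:
    py2_style_encode(b"\xc0")
    failed_with = None
except UnicodeDecodeError as exc:
    failed_with = type(exc).__name__

print(unicode_result, failed_with)
```

Pure 7-bit input such as b"abc" passes through the helper unchanged, which
is why the problem only shows up once a high byte appears.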

You could use
    "\xc0".decode("iso-8859-1").encode('US-ASCII', 'replace')
or you could use ''.translate:
    import string
    s = ''.join([chr(x) for x in range(128,256)])  # every byte 128..255
    t = '?' * 128                                  # one '?' per high byte
    replace_map = string.maketrans(s, t)

>>> "abc\xc0\xff".translate(replace_map)
'abc??'
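
Both workarounds carry over to Python 3 with small changes, since
string.maketrans is gone there and bytes.maketrans takes its place. The
following is a rough sketch under that assumption; the variable names are
illustrative only.

```python
# Python 3 sketch of both workarounds applied to the same input.
raw = b"abc\xc0\xff"

# 1. ISO-8859-1 maps every byte 0..255 to a code point, so the decode
#    cannot fail; encoding to ASCII then replaces the high characters.
via_latin1 = raw.decode("iso-8859-1").encode("ascii", "replace")

# 2. Build a translation table that maps bytes 128..255 to '?' and
#    leaves bytes 0..127 untouched.
high_bytes = bytes(range(128, 256))
replace_map = bytes.maketrans(high_bytes, b"?" * len(high_bytes))
via_translate = raw.translate(replace_map)

print(via_latin1)
print(via_translate)
```

The translate version never leaves the byte domain, so it may be the better
fit when the data is genuinely binary rather than text.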

Jeff



