A 'raw' codec for binary "strings" in Python?
Jeff Epler
jepler at unpythonic.net
Mon Mar 1 16:51:26 EST 2004
You have to understand the difference between
"\xc0".encode('US-ASCII', 'replace')
and
u"\xc0".encode('US-ASCII', 'replace')
.. the latter returns the string '?', the former probably raises an
error, assuming that your default encoding is 'ascii'.
That's because ''.encode(...) is really the same as
''.decode(sys.getdefaultencoding()).encode(...).  It's in the decode step
that the error is raised, before the 'replace' handler ever gets a chance to run.
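
As a rough illustration of where the failure comes from, here is a small
sketch written for Python 3, where the decode step has to be spelled out
explicitly (Python 2 did it implicitly with the default encoding). It
assumes 'ascii' stands in for that default; the variable names are only
for illustration.

```python
# Python 3 sketch: byte strings and unicode strings are separate types,
# so the decode that Python 2 performed implicitly is written out here.
raw = b"\xc0"

try:
    raw.decode("ascii")      # the step that fails: 0xc0 is not valid ASCII
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# A unicode string needs no decode step, so 'replace' applies directly.
encoded = u"\xc0".encode("US-ASCII", "replace")

print(decode_failed)
print(encoded)
```

The 'replace' argument only governs the encode step, which is why it
does not help when the decode step is the one that fails.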
You could decode explicitly first, with an encoding that accepts every byte value:
"\xc0".decode("iso-8859-1").encode('US-ASCII', 'replace')
or you could use ''.translate:
import string
s = ''.join([chr(x) for x in range(128,256)])
t = '?' * 128
replace_map = string.maketrans(s, t)
>>> "abc\xc0\xff".translate(replace_map)
'abc??'
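
For comparison, both approaches can be sketched side by side in Python 3,
where string.maketrans no longer exists and bytes.maketrans takes its
place. The helper names replace_via_latin1 and replace_via_translate are
made up for this example.

```python
# Python 3 sketch; the function and constant names are hypothetical.
HIGH_BYTES = bytes(range(128, 256))
REPLACE_MAP = bytes.maketrans(HIGH_BYTES, b"?" * 128)

def replace_via_latin1(data):
    # ISO-8859-1 maps every byte 0-255 to a code point, so decode cannot
    # fail; the encode step then replaces everything outside ASCII.
    return data.decode("iso-8859-1").encode("US-ASCII", "replace")

def replace_via_translate(data):
    # Byte-for-byte table lookup, with no decode or encode involved.
    return data.translate(REPLACE_MAP)

sample = b"abc\xc0\xff"
print(replace_via_latin1(sample))
print(replace_via_translate(sample))
```

Both calls should give the same result for any byte string; the translate
version skips the round trip through unicode.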
Jeff