unicode-to-ascii: replace with space, not "?"

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Oct 15 00:04:36 EDT 2009


On Wed, 14 Oct 2009 23:08:53 -0300, Allen Fowler <allen.fowler at yahoo.com>  
wrote:

> I've been using "data.encode('ascii','replace')" to force an ASCII  
> string out of Unicode data, with "?" in the place of non-ASCII letters.
>
> However, now I want to use a blank space (or maybe a dash) instead of a  
> question mark.

Use a custom error handler, registered with codecs.register_error:

import codecs

def replace_spc_error_handler(error):
     # error is a UnicodeEncodeError/UnicodeDecodeError instance
     # with these attributes:
     #   object = the object being encoded (or decoded)
     #   start:end = slice of object that caused the error
     #   reason = error message
     # Must return a tuple (replacement unicode object,
     #   index into object at which to resume encoding)
     # or raise the same or another exception.
     # Here: one space per offending character, resume right after them.
     return (u' ' * (error.end-error.start), error.end)

codecs.register_error("replace_spc", replace_spc_error_handler)

print u"¡añá membuí!".encode("ascii", "replace_spc")
print "¡añá membuí!".decode("ascii", "replace_spc")
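
Since the original question also mentions a dash, the same idea can be
generalized. The following is only a sketch: make_replace_handler and the
handler name "replace_dash" are made-up names for illustration, not part of
the codecs module. It is written so that it should also run on Python 3,
where encode() returns a bytes object. The replacement itself must be
encodable in the target encoding, otherwise the codec raises again.

```python
import codecs

def make_replace_handler(replacement):
    """Build an error handler that emits `replacement` once per bad character.

    Hypothetical helper, not a codecs API.
    """
    def handler(error):
        # Only encode/decode errors carry a usable start:end slice here;
        # anything else is re-raised unchanged.
        if not isinstance(error, (UnicodeEncodeError, UnicodeDecodeError)):
            raise error
        # Same contract as above: (replacement text, index to resume at).
        return (replacement * (error.end - error.start), error.end)
    return handler

# "replace_dash" is an arbitrary name chosen for this example.
codecs.register_error("replace_dash", make_replace_handler(u"-"))

print(u"¡añá membuí!".encode("ascii", "replace_dash"))
```

With this handler every non-ASCII letter in the sample string becomes one
dash, so the encoded text should read -a-- membu-! (shown as a bytes literal
on Python 3).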


-- 
Gabriel Genellina



