Customizing character set conversions with an error handler
Jukka Aho
jukka.aho at iki.fi
Tue Mar 14 09:36:49 EST 2006
Serge Orlov wrote:
>> # So the question becomes: how can I make this work
>> # in a graceful manner?
> change the return statement with this code:
>
> return (substitution.encode(error.encoding,"practical").decode(
> error.encoding), error.start+1)
Thanks, that was quite a neat recursive solution. :) I wouldn't have
thought of that.
I ended up doing it without the recursion, by testing the individual
problematic code points with .encode() within the handler, and catching
the possible exceptions:
--- 8< ---
# This is our original problematic code point:
c = error.object[error.start]

while True:
    # Search for a substitute code point in
    # our table:
    c = table.get(c)

    # If a substitute wasn't found, convert the original code
    # point into a hexadecimal string representation of itself
    # and exit the loop.
    if c is None:
        c = u"[U+%04x]" % ord(error.object[error.start])
        break

    # A substitute was found, but we're not sure if it is OK
    # for our target encoding. Let's check:
    try:
        c.encode(error.encoding, 'strict')
        # No exception; everything was OK, we
        # can break off from the loop now
        break
    except UnicodeEncodeError:
        # The mapping that was found in the table was not
        # OK for the target encoding. Let's loop and try
        # again; there might be a better (more generic)
        # substitution in the chain waiting for us.
        pass
--- 8< ---
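
For anyone reading this in the archive, here is roughly how the loop
above might sit inside a complete, registered handler. This is only a
sketch: the contents of TABLE are made up for illustration, the function
name practical_handler is my own, and the only thing taken from the
thread is the handler name "practical" and the loop itself.

```python
import codecs

# Hypothetical substitution chain (illustration only): each entry maps a
# code point to a more generic fallback, which may itself have an entry.
TABLE = {
    u"\u201c": u"\u00ab",  # left double quotation mark -> guillemet
    u"\u00ab": u'"',       # guillemet -> plain ASCII quote
    u"\u2026": u"...",     # horizontal ellipsis -> three full stops
}


def practical_handler(error):
    # Only encoding errors are handled here; re-raise anything else.
    if not isinstance(error, UnicodeEncodeError):
        raise error

    original = error.object[error.start]
    c = original
    while True:
        # Follow the chain one step.
        c = TABLE.get(c)
        if c is None:
            # End of the chain: fall back to a hexadecimal
            # representation of the original code point.
            c = u"[U+%04x]" % ord(original)
            break
        try:
            # Check whether this substitute fits the target encoding.
            c.encode(error.encoding, "strict")
            break
        except UnicodeEncodeError:
            # Not encodable; try the next, more generic substitute.
            pass

    # Replace one code point and resume right after it.
    return (c, error.start + 1)


codecs.register_error("practical", practical_handler)
```

With that table, u"\u201chi\u201d".encode("ascii", "practical") gives
b'"hi[U+201d]' on Python 3, while the same string encoded to "latin-1"
keeps the guillemet and gives b'\xabhi[U+201d]'. Note that a table
containing a cycle would make the loop spin forever, so the chains have
to end somewhere.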
--
znark