Is this the right way to write a codec error handler?
Serhiy Storchaka
storchaka at gmail.com
Sat Jan 20 05:57:45 EST 2018
20.01.18 10:32, Steven D'Aprano wrote:
> I want an error handler that falls back on Latin-1 for anything which
> cannot be decoded. Is this the right way to write it?
>
>
> def latin1_fallback(exception):
>     assert isinstance(exception, UnicodeError)
>     start, end = exception.start, exception.end
>     obj = exception.object
>     if isinstance(exception, UnicodeDecodeError):
>         return (obj[start:end].decode('latin1'), end+1)
>     elif isinstance(exception, UnicodeEncodeError):
>         return (obj[start:end].encode('latin1'), end+1)
>     else:
>         raise
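Use just `end` instead of `end+1`: the second item of the returned tuple
is the position at which processing resumes, and `exception.end` already
points just past the offending data.
And it is safer to use `bytes.decode(obj[start:end], 'latin1')` or
`str(obj[start:end], 'latin1')` instead of
`obj[start:end].decode('latin1')`, just in case obj has an
overridden decode() method.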
Otherwise LGTM.
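
For reference, here is a minimal sketch of the handler with both
suggestions applied. It is one possible reading of the advice above and
not a definitive implementation. The registered name 'latin1_fallback',
the sample inputs, the use of `str.encode(...)` on the encode side (by
analogy with the decode advice) and the explicit `raise exception` in the
last branch are my own assumptions; none of them come from the original
code.

```python
import codecs


def latin1_fallback(exception):
    """Error handler that falls back on Latin-1 for the offending data."""
    start, end = exception.start, exception.end
    obj = exception.object
    if isinstance(exception, UnicodeDecodeError):
        # str(..., 'latin1') does not go through obj's own decode() method.
        # Resume at `end`, which already points just past the bad bytes.
        return (str(obj[start:end], 'latin1'), end)
    elif isinstance(exception, UnicodeEncodeError):
        # Assumption: call str.encode explicitly, mirroring the decode case.
        return (str.encode(obj[start:end], 'latin1'), end)
    else:
        # Assumption: re-raise the exception object that was passed in.
        raise exception


# The name under which the handler is registered is hypothetical.
codecs.register_error('latin1_fallback', latin1_fallback)

# b'\xe9' is not valid UTF-8 here, so it is decoded as Latin-1 instead.
decoded = b'caf\xe9 ok'.decode('utf-8', errors='latin1_fallback')

# U+00E9 cannot be encoded as ASCII, so it is encoded as Latin-1 instead.
encoded = 'caf\xe9'.encode('ascii', errors='latin1_fallback')
```

Note that on the encode side the fallback itself fails for characters
above U+00FF, since Latin-1 cannot represent them.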