[I18n-sig] XML and codecs
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Sat, 2 Jun 2001 00:17:32 +0200
> Until we've found a backward compatible way to fix this, how
> about adding a new error handling scheme which at least gives
> the caller enough information to do some smart processing on the
> input and output, e.g.
>
> errors="break":
>
> raise a UnicodeBreakError with argument
> (reason, error_position_in_input, work_done_so_far)
That is good enough, IMO, so let's do it. I think we also need a few
well-defined reasons, in particular:
UnicodeBreakError.CannotConvert  # character not supported in target character set
UnicodeBreakError.OutOfData      # input string stops in the middle of a character
The latter case deals with the nasty problem of UTF-8 input, which
breaks if your file.read() call happens to split a multi-byte UTF-8 sequence.
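Here is a minimal sketch, in present-day Python 3, of how a caller might use the proposed (reason, position, work_done) tuple to survive a split read. `UnicodeBreakError`, `decode_utf8_break` and `read_utf8` are hypothetical names for illustration, not an existing API. The "break" decoder is emulated with the standard library's incremental UTF-8 decoder (`codecs.getincrementaldecoder`), whose `getstate()` exposes the undecoded tail. Positions are byte offsets into the input.

```python
import codecs


class UnicodeBreakError(ValueError):
    """Hypothetical exception for the proposed errors="break" scheme.

    args == (reason, error_position_in_input, work_done_so_far)
    """

    # Well-known reasons (hypothetical constants named after the proposal).
    CannotConvert = "CannotConvert"  # character not supported in target set
    OutOfData = "OutOfData"          # input stops in the middle of a character

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason
        self.position = position
        self.work_done = work_done


def decode_utf8_break(data: bytes) -> str:
    """Hypothetical errors="break" UTF-8 decoder.

    Raises UnicodeBreakError(OutOfData, pos, text) when data ends in the
    middle of a multi-byte sequence; pos is a byte offset into data.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    # With final=False the decoder holds back an incomplete trailing
    # sequence instead of failing; truly invalid bytes still raise
    # UnicodeDecodeError.
    text = decoder.decode(data, final=False)
    pending, _flags = decoder.getstate()
    if pending:
        raise UnicodeBreakError(UnicodeBreakError.OutOfData,
                                len(data) - len(pending), text)
    return text


def read_utf8(chunks) -> str:
    """Decode successive byte chunks (e.g. results of file.read(n)),
    carrying a split UTF-8 sequence over to the next chunk."""
    parts = []
    carry = b""
    for chunk in chunks:
        data = carry + chunk
        try:
            parts.append(decode_utf8_break(data))
            carry = b""
        except UnicodeBreakError as e:
            if e.reason != UnicodeBreakError.OutOfData:
                raise
            parts.append(e.work_done)   # keep what was decoded so far
            carry = data[e.position:]   # retry the tail with the next chunk
    if carry:
        raise ValueError("input ends in the middle of a UTF-8 sequence")
    return "".join(parts)
```

For example, `read_utf8([b"price: \xe2\x82", b"\xac 5"])` returns `"price: \u20ac 5"`, even though the three bytes of the euro sign straddle the two reads.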
Of course, the well-known reasons could also be subclasses of UnicodeBreakError.
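With subclasses, a caller can catch only the reason it knows how to handle. The sketch below (Python 3; all class and function names are hypothetical) keeps the same (reason, position, work_done) argument tuple, raises `CannotConvert` from an encoder, and uses the position and partial output to emit an XML numeric character reference for each unencodable character. The XML use is an assumed illustration of the "smart processing" mentioned above, not part of the proposal.

```python
class UnicodeBreakError(ValueError):
    """Hypothetical base class; args == (reason, position, work_done)."""

    def __init__(self, position, work_done):
        # The subclass name doubles as the well-known reason.
        super().__init__(type(self).__name__, position, work_done)
        self.position = position
        self.work_done = work_done


class CannotConvert(UnicodeBreakError):
    """Character not supported in the target character set."""


class OutOfData(UnicodeBreakError):
    """Input stops in the middle of a character (raised by decoders)."""


def encode_break(text: str, encoding: str) -> bytes:
    """Hypothetical errors="break" encoder: raises CannotConvert at the
    first unencodable character, carrying the output produced so far."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as e:
        raise CannotConvert(e.start, text[:e.start].encode(encoding)) from e


def encode_for_xml(text: str, encoding: str) -> bytes:
    """Encode text for an XML document, replacing each character the
    target encoding cannot represent with a reference such as &#8364;."""
    out = []
    while True:
        try:
            out.append(encode_break(text, encoding))
            return b"".join(out)
        except CannotConvert as e:
            out.append(e.work_done)
            out.append(b"&#%d;" % ord(text[e.position]))
            text = text[e.position + 1:]  # resume after the bad character
```

Because `OutOfData` is not caught here, a decoding problem would still propagate to a caller that does know how to handle it.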
Regards,
Martin