[I18n-sig] XML and codecs

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Sat, 2 Jun 2001 00:17:32 +0200


> Until we've found a backward compatible way to fix this, how
> about adding a new error handling scheme which at least gives
> the caller enough information to do some smart processing on the
> input and output, e.g.
> 
> errors="break":
> 
>         raise a UnicodeBreakError with argument
>         (reason, error_position_in_input, work_done_so_far)

That is good enough, IMO, so let's do it. I think we also need a few
well-defined reasons, in particular:

UnicodeBreakError.CannotConvert # character not supported in target
                                # character set
UnicodeBreakError.OutOfData     # input string stops in the middle
                                # of a character

The latter case deals with the nasty problem of UTF-8 input, where
decoding breaks if your file.read() call happens to split a multi-byte
UTF-8 sequence across two reads. Of course, the well-known reasons
could be subclasses of UnicodeBreakError, too.
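
To make the idea concrete, here is a minimal sketch of how such a scheme
might behave, written against present-day Python. Everything named
UnicodeBreakError, CannotConvert, OutOfData, encode_break, decode_break
and decode_chunks is hypothetical and exists only for this illustration;
none of it is part of the standard library. The sketch takes the subclass
route for the reasons and relies on the standard library's incremental
decoder to detect input that stops in the middle of a character.

```python
import codecs


class UnicodeBreakError(ValueError):
    """Hypothetical error carrying (reason, position, work_done)."""

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason
        self.position = position    # index into the input where work stopped
        self.work_done = work_done  # output produced before that index


class CannotConvert(UnicodeBreakError):
    """Character not supported in the target character set."""


class OutOfData(UnicodeBreakError):
    """Input stops in the middle of a character."""


def encode_break(text, encoding):
    """Encode text; on an unencodable character, report the work done so far."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        done = text[:exc.start].encode(encoding)
        raise CannotConvert(exc.reason, exc.start, done) from None


def decode_break(data, encoding="utf-8"):
    """Decode bytes; raise OutOfData if the input ends inside a character.

    Invalid byte sequences still raise the stock UnicodeDecodeError.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    text = decoder.decode(data, final=False)
    pending, _ = decoder.getstate()  # bytes buffered but not yet decoded
    if pending:
        position = len(data) - len(pending)
        raise OutOfData("input ends inside a character", position, text)
    return text


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks, carrying split sequences over."""
    pieces = []
    carry = b""
    consumed = 0  # total bytes successfully decoded so far
    for chunk in chunks:
        data = carry + chunk
        try:
            pieces.append(decode_break(data, encoding))
            consumed += len(data)
            carry = b""
        except OutOfData as exc:
            # Keep the finished part and retry the tail with the next chunk.
            pieces.append(exc.work_done)
            consumed += exc.position
            carry = data[exc.position:]
    if carry:
        raise OutOfData("stream ends inside a character",
                        consumed, "".join(pieces))
    return "".join(pieces)


if __name__ == "__main__":
    raw = "a\u20acb".encode("utf-8")  # the euro sign takes three bytes
    # Split the stream in the middle of the euro sign, as read() might.
    print(decode_chunks([raw[:2], raw[2:]]) == "a\u20acb")
```

The caller in decode_chunks needs only the position and the work done so
far to resume, which is the information the proposed error would carry.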

Regards,
Martin