[I18n-sig] XML and codecs
M.-A. Lemburg
mal@lemburg.com
Sat, 02 Jun 2001 13:26:14 +0200
"Martin v. Loewis" wrote:
>
> > Until we've found a backward compatible way to fix this, how
> > about adding a new error handling scheme which at least gives
> > the caller enough information to do some smart processing on the
> > input and output, e.g.
> >
> > errors="break":
> >
> > raise a UnicodeBreakError with argument
> > (reason, error_position_in_input, work_done_so_far)
>
> That is good enough, IMO, so let's do it.
Ok.
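A minimal sketch in modern Python 3 of how a caller might use the proposed errors="break" scheme. UnicodeBreakError and errors="break" are only proposals from this thread and do not exist in Python's codec machinery. The helpers encode_latin1 and encode_for_xml are hypothetical, and the choice of Latin-1 as target charset is an illustrative assumption. The "smart processing" shown is one plausible use for XML output: on a CannotConvert break, the caller keeps work_done_so_far, writes an XML character reference for the offending character and resumes encoding after it.

```python
# Hypothetical sketch: errors="break" and UnicodeBreakError were only
# proposed on this list; they are not part of Python's codec machinery.


class UnicodeBreakError(ValueError):
    """Raised by a codec called with errors="break"."""

    # Well-known reasons, modelled here as plain string constants.
    CannotConvert = "CannotConvert"  # character not supported in target charset
    OutOfData = "OutOfData"          # input stops in the middle of a character

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason
        self.position = position    # index of the offending character in the input
        self.work_done = work_done  # output produced before the break


def encode_latin1(text, errors="strict"):
    """Toy Latin-1 encoder: errors="break" raises UnicodeBreakError,
    any other value behaves like "strict"."""
    out = bytearray()
    for i, ch in enumerate(text):
        if ord(ch) > 0xFF:
            if errors == "break":
                raise UnicodeBreakError(UnicodeBreakError.CannotConvert, i, bytes(out))
            raise UnicodeEncodeError("latin-1", text, i, i + 1,
                                     "ordinal not in range(256)")
        out.append(ord(ch))
    return bytes(out)


def encode_for_xml(text):
    """Caller-side processing: replace each unencodable character with an
    XML character reference, then resume encoding after it."""
    result = bytearray()
    while True:
        try:
            result += encode_latin1(text, errors="break")
            return bytes(result)
        except UnicodeBreakError as exc:
            result += exc.work_done
            result += f"&#{ord(text[exc.position])};".encode("ascii")
            text = text[exc.position + 1:]


print(encode_for_xml("caf\u00e9 \u20ac5"))  # b'caf\xe9 &#8364;5'
```

Because the exception carries the input position and the output produced so far, the caller never has to re-encode the part that already succeeded; it only restarts on the remaining slice.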
> I think we also need a few
> well-defined reasons, in particular
>
> UnicodeBreakError.CannotConvert # character not supported in target
> # character set
> UnicodeBreakError.OutOfData # input string stops in the middle
> # of a character
>
> The latter case deals with the nasty problem of UTF-8 input which
> breaks if your file.read() call happens to split a UTF-8 sequence.
> Of course, the well-known reasons could be subclasses, too.
Fine.
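A minimal sketch in modern Python 3 of the OutOfData case, using the "reasons as subclasses" variant mentioned above. All names here (UnicodeBreakError, CannotConvert, OutOfData, decode_utf8_break, read_text) are hypothetical illustrations of the proposal, not an existing API. The assumption is that a decoder which hits the end of its input mid-sequence reports the byte offset and the text decoded so far, so a reader can carry the split bytes over to the next file.read() chunk instead of failing.

```python
import io

# Hypothetical sketch: UnicodeBreakError and its subclasses were only
# proposed on this list; they are not part of Python's codec machinery.


class UnicodeBreakError(ValueError):
    """Base class; the subclass itself identifies the reason."""

    def __init__(self, position, work_done):
        super().__init__(position, work_done)
        self.position = position    # byte offset where decoding stopped
        self.work_done = work_done  # text decoded before that offset


class CannotConvert(UnicodeBreakError):
    """Character not supported in the target character set."""


class OutOfData(UnicodeBreakError):
    """Input stops in the middle of a character."""


# Make the reasons reachable under the names used in the thread.
UnicodeBreakError.CannotConvert = CannotConvert
UnicodeBreakError.OutOfData = OutOfData


def utf8_sequence_length(lead):
    """Sequence length implied by a UTF-8 lead byte (1 for invalid leads,
    so the strict decode below reports them)."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 1


def decode_utf8_break(data):
    """Decode UTF-8, raising OutOfData if data ends mid-sequence.

    Malformed (as opposed to truncated) input still raises the usual
    UnicodeDecodeError; a truncated tail is not validated further."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        if i + n > len(data):
            raise OutOfData(i, "".join(chars))
        chars.append(data[i:i + n].decode("utf-8"))
        i += n
    return "".join(chars)


def read_text(stream, chunk_size):
    """Decode a binary stream chunk by chunk, carrying split sequences over."""
    parts = []
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        data = pending + chunk
        if not chunk:
            # At EOF a leftover truncated sequence raises UnicodeDecodeError.
            data.decode("utf-8")
            return "".join(parts)
        try:
            parts.append(decode_utf8_break(data))
            pending = b""
        except UnicodeBreakError.OutOfData as exc:
            parts.append(exc.work_done)
            pending = data[exc.position:]


try:
    decode_utf8_break(b"na\xc3")
except UnicodeBreakError.OutOfData as exc:
    print(exc.position, exc.work_done)  # 2 na

raw = "na\u00efve \u20ac".encode("utf-8")
print(read_text(io.BytesIO(raw), chunk_size=3) == "na\u00efve \u20ac")  # True
```

With chunk_size=3 the reads split both the two-byte "\u00ef" and the three-byte "\u20ac", which is exactly the file.read() problem described above; only a sequence still incomplete at end of file is treated as an error.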
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/