[I18n-sig] XML and codecs

M.-A. Lemburg mal@lemburg.com
Sat, 02 Jun 2001 13:26:14 +0200


"Martin v. Loewis" wrote:
> 
> > Until we've found a backward compatible way to fix this, how
> > about adding a new error handling scheme which at least gives
> > the caller enough information to do some smart processing on the
> > input and output, e.g.
> >
> > errors="break":
> >
> >       raise a UnicodeBreakError with argument
> >         (reason, error_position_in_input, work_done_so_far)
> 
> That is good enough, IMO, so let's do it. 

Ok.
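
To make the proposal concrete, here is a rough sketch of how a caller
might use the (reason, error_position_in_input, work_done_so_far)
triple. It is only an illustration: the class layout and the helper
names encode_break() and encode_xml() are made up for this example, the
"break" behaviour is emulated on top of the ordinary strict encoder
rather than implemented inside a codec, and re-encoding the prefix
assumes a stateless target encoding.

```python
class UnicodeBreakError(ValueError):
    """Hypothetical exception for the proposed errors="break" scheme."""

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason        # why the codec stopped
        self.position = position    # index of the offending input item
        self.work_done = work_done  # output produced before the break


def encode_break(text, encoding):
    """Emulate errors="break": encode, or report how far we got."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Everything before exc.start was encodable, so re-encode
        # that prefix to obtain the work done so far.
        done = text[:exc.start].encode(encoding)
        raise UnicodeBreakError(exc.reason, exc.start, done) from None


def encode_xml(text, encoding):
    """Caller-side recovery (assumed use case): write a character
    reference for each character the target encoding lacks."""
    out = []
    while text:
        try:
            out.append(encode_break(text, encoding))
            break
        except UnicodeBreakError as err:
            out.append(err.work_done)
            ref = "&#%d;" % ord(text[err.position])
            out.append(ref.encode(encoding))
            # Resume right after the character that caused the break.
            text = text[err.position + 1:]
    return b"".join(out)
```

The point of the design is that the codec itself stays simple, while
the caller decides what "smart processing" means for its format.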

> I think we also need a few
> well-defined reasons, in particular
> 
> UnicodeBreakError.CannotConvert # character not supported in target
>                                 # character set
> UnicodeBreakError.OutOfData     # input string stops in the middle
>                                 # of a character
> 
> The latter case deals with the nasty problem of UTF-8 input which
> breaks if your file.read() call happens to split a UTF-8 sequence.
> Of course, the well-known reasons could be subclasses, too.

Fine.
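
For the OutOfData case, something along these lines could work. Again
this is just a sketch under assumptions: the reasons are modelled as
subclasses, decode_break() and decode_chunks() are hypothetical helper
names, and the truncation test leans on the incremental decoder
interface of the codecs module rather than on anything a codec would
report by itself.

```python
import codecs


class UnicodeBreakError(ValueError):
    """Hypothetical base class: (reason, position, work_done)."""

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason
        self.position = position
        self.work_done = work_done


class CannotConvert(UnicodeBreakError):
    """Character not supported in the target character set
    (encoding side; not raised in this decoding sketch)."""


class OutOfData(UnicodeBreakError):
    """Input stops in the middle of a character."""


def decode_break(data, encoding="utf-8"):
    """Emulate errors="break" for decoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        done = data[:exc.start].decode(encoding)
        tail = data[exc.start:]
        try:
            # A non-final incremental decode accepts a truncated but
            # otherwise valid sequence and rejects malformed bytes.
            codecs.getincrementaldecoder(encoding)().decode(tail, final=False)
        except UnicodeDecodeError:
            raise UnicodeBreakError(exc.reason, exc.start, done) from None
        raise OutOfData(exc.reason, exc.start, done) from None


def decode_chunks(chunks, encoding="utf-8"):
    """Decode byte chunks, e.g. from successive file.read() calls."""
    pending = b""
    out = []
    for chunk in chunks:
        data = pending + chunk
        try:
            out.append(decode_break(data, encoding))
            pending = b""
        except OutOfData as err:
            # Keep the decoded part and retry the split sequence
            # once the next chunk has arrived.
            out.append(err.work_done)
            pending = data[err.position:]
    if pending:
        raise OutOfData("input ended inside a character", 0, "")
    return "".join(out)
```

With the reasons as subclasses, the reader loop can catch OutOfData
alone and let every other break propagate to the caller.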

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/