[I18n-sig] XML and codecs

Walter Doerwald walter@livinglogic.de
Tue, 05 Jun 2001 10:39:04 +0200

On 01.06.01 at 23:23 M.-A. Lemburg wrote:

> "Martin v. Loewis" wrote:
> > 
> > As for XML and encodings, having a convenient mechanism to extend
> > existing codecs to encode unknown characters as character entities is
> > much more important, IMO, since that is very difficult to achieve with
> > the existing API.
> Until we've found a backward compatible way to fix this, how
> about adding a new error handling scheme which at least gives
> the caller enough information to do some smart processing on the
> input and output, e.g.
> errors="break":
> 	raise a UnicodeBreakError with argument
>         (reason, error_position_in_input, work_done_so_far)
> The caller could then use the information returned
> by the codec to fix the input data and reuse the already
> encoded/decoded data to avoid duplicate work.

How would UTF-16 be handled? I guess without additional
code multiple BOMs would be generated for a string that
contains unencodable characters.
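
A small sketch of what I mean, written against present-day Python rather than the API under discussion. It assumes that restarting after an error amounts to a fresh call to the stateless "utf-16" encoder for each piece, and it uses codecs.getincrementalencoder (a later addition to the codecs module) to show how keeping encoder state avoids the problem. The function names are made up for illustration.

```python
import codecs

BOM = codecs.BOM_UTF16  # native-order BOM, as written by the "utf-16" codec


def encode_in_pieces(pieces):
    """Naive restart: every piece is a fresh encode call and gets its own BOM."""
    return b"".join(piece.encode("utf-16") for piece in pieces)


def encode_with_one_bom(pieces):
    """Keep encoder state across pieces so the BOM is written only once."""
    encoder = codecs.getincrementalencoder("utf-16")()
    return b"".join(encoder.encode(piece) for piece in pieces)


pieces = ["abc", "def"]
naive = encode_in_pieces(pieces)
stateful = encode_with_one_bom(pieces)
print(naive.count(BOM), stateful.count(BOM))
```

In the naive result the second BOM ends up in the middle of the data, where a decoder treats it as an ordinary U+FEFF character.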

> This scheme is very simple, but also very effective, since
> it allows complex error processing to be done in the
> namespace where the data is being processed (rather than
> in a callback which wouldn't have access to this namespace).

A callback could be a class instance with a __call__ method
and so can have as much state information as it needs.
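
As a sketch of such a callback, again in present-day Python: it assumes the codecs.register_error hook, which did not exist in the API being discussed here, and the handler name "charref-sketch" and the class CharRefHandler are made up for illustration. The instance records every character it had to replace, so the state stays available to the caller after encoding.

```python
import codecs


class CharRefHandler:
    """Callable error handler that keeps its own state between calls."""

    def __init__(self):
        self.replaced = []  # unencodable characters seen so far

    def __call__(self, exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        bad = exc.object[exc.start:exc.end]
        self.replaced.extend(bad)
        refs = "".join("&#%d;" % ord(ch) for ch in bad)
        # Return the replacement text and the position to resume at.
        return refs, exc.end


handler = CharRefHandler()
codecs.register_error("charref-sketch", handler)  # hypothetical handler name

encoded = "a\u20acb".encode("ascii", errors="charref-sketch")
print(encoded, handler.replaced)
```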

   Walter Dörwald

Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·