[I18n-sig] XML and codecs

M.-A. Lemburg mal@lemburg.com
Fri, 01 Jun 2001 23:23:02 +0200


"Martin v. Loewis" wrote:
> 
> As for XML and encodings, having a convenient mechanism to extend
> existing codecs to encode unknown characters as character entities is
> much more important, IMO, since that is very difficult to achieve with
> the existing API.

Until we've found a backward-compatible way to fix this, how
about adding a new error handling scheme that at least gives
the caller enough information to do some smart processing on the
input and output, e.g.

errors="break":

        raise a UnicodeBreakError with the arguments
        (reason, error_position_in_input, work_done_so_far)

The caller could then use the information carried by the
exception to fix the input data and reuse the already
encoded/decoded data, avoiding duplicate work.

This scheme is very simple, but also very effective, since
it allows complex error processing to be done in the
namespace where the data is being processed (rather than
in a callback which wouldn't have access to this namespace).
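
To make the idea concrete, here is a rough sketch of how a caller
might use such a scheme to emit XML character references. Nothing
here is existing API: UnicodeBreakError, encode_break() and
encode_with_charrefs() are hypothetical names, and errors="break"
is only emulated on top of the strict handler, using the start and
reason attributes that current Python 3 puts on UnicodeEncodeError.
The sketch also assumes a stateless, ASCII-compatible target encoding.

```python
class UnicodeBreakError(ValueError):
    """Hypothetical exception modelled on the proposal; not part of Python."""

    def __init__(self, reason, position, work_done):
        super().__init__(reason, position, work_done)
        self.reason = reason        # why the codec stopped
        self.position = position    # index of the offending input character
        self.work_done = work_done  # output produced before the error


def encode_break(text, encoding):
    """Emulate errors="break": stop at the first unencodable character."""
    try:
        return text.encode(encoding, "strict")
    except UnicodeEncodeError as exc:
        # Everything before exc.start is encodable. A real codec would hand
        # back what it already produced; this emulation re-encodes the prefix.
        done = text[:exc.start].encode(encoding, "strict")
        raise UnicodeBreakError(exc.reason, exc.start, done) from None


def encode_with_charrefs(text, encoding):
    """Caller-side loop: patch each failure with an XML character reference."""
    chunks = []
    pos = 0
    while pos < len(text):
        try:
            chunks.append(encode_break(text[pos:], encoding))
            break
        except UnicodeBreakError as err:
            chunks.append(err.work_done)              # reuse finished work
            bad = text[pos + err.position]
            # Assumes the reference itself is encodable (ASCII-compatible).
            chunks.append(("&#%d;" % ord(bad)).encode(encoding))
            pos += err.position + 1                   # resume after bad char
    return b"".join(chunks)


print(encode_with_charrefs("a\u20acb", "ascii"))      # b'a&#8364;b'
```

The error handling sits in an ordinary loop in the caller's own
namespace, so it can consult whatever local state it likes when
deciding how to patch the input.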

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/