[I18n-sig] Proposal: Extended error handling for unicode.encode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 20 Dec 2000 21:54:56 +0100


> The problem with this is that the error handler will usually
> have to have access to the internal data structure of the codec
> to be able to process the error, e.g. <char> in your example
> could be a single character, a UTF-16 sequence, etc. 

Please note that in his encoding, char is a Unicode string
(specifically, a single character), so it can't be a UTF-16 sequence.
Which *encoder* that you know of needs to have internal state?

Anyway, if you think that state should be accessible to the error
handling function, it won't be hard to pass state to the callback.
E.g. you could pass the string being encoded, the current position,
and optionally a Codec instance (many codecs would pass None, as they
don't keep any state).
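
To make the callback signature concrete, here is a minimal sketch in
present-day Python. The names encode_with_callback and question_mark are
hypothetical and not part of any library. Encoding character by character
is only an illustration and assumes a stateless encoding such as ASCII
or Latin-1.

```python
def encode_with_callback(text, encoding, on_error, codec=None):
    """Encode text one character at a time (illustration only).

    on_error is called as on_error(text, pos, codec) and must return a
    replacement string that the target encoding can represent.
    codec is None here, since a stateless encoder has nothing to pass.
    """
    out = []
    for pos, ch in enumerate(text):
        try:
            out.append(ch.encode(encoding))
        except UnicodeEncodeError:
            # Hand the whole string, the failing position and the
            # (optional) codec instance to the callback.
            replacement = on_error(text, pos, codec)
            out.append(replacement.encode(encoding))
    return b"".join(out)


def question_mark(text, pos, codec):
    # Simplest possible handler: ignore all arguments.
    return "?"


result = encode_with_callback("a\u20acb", "ascii", question_mark)
```

A handler that needs context can look at text[pos] or at the neighbouring
characters; one that doesn't can ignore the arguments, as above.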

> The codec in general knows better what to do in case of an error

In the demonstrated use case, it doesn't know. It should create an XML
character entity, but doesn't know anything about XML character
entities.
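
As an illustration of that use case, here is a sketch of an error
callback that produces XML character references. It assumes the
codecs.register_error API found in later Python versions, which is not
the interface under discussion in this thread. The handler name
"sketch-xmlcharref" and the function xml_charref are made up for this
example.

```python
import codecs


def xml_charref(exc):
    """Replace unencodable characters with XML numeric references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    # exc.object is the string being encoded; start/end delimit the
    # characters the codec could not handle.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement and the position at which to resume.
    return replacement, exc.end


codecs.register_error("sketch-xmlcharref", xml_charref)

encoded = "caf\u00e9 \u20ac".encode("ascii", "sketch-xmlcharref")
```

The codec itself stays ignorant of XML; all the knowledge about
character references lives in the callback.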

> Since your main problem is locating the character causing the
> error, one possibility would be to extend the error instance
> to reference the position of the error as error instance
> attribute, e.g. unierror.position.

That would work as well, but it would require re-encoding everything
up to that position. The callback solution is more general.
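
To show the cost of the position-based approach, here is a rough sketch.
It assumes the exception carries the error position, as
UnicodeEncodeError does in later Python versions through its start and
end attributes. The function name encode_by_position is hypothetical.
Every retry starts over from the beginning of the string, so the prefix
before each error is encoded again.

```python
def encode_by_position(text, encoding):
    """Encode text, patching one error per attempt (illustration only).

    Returns the encoded bytes and the number of encode attempts made.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return text.encode(encoding), attempts
        except UnicodeEncodeError as exc:
            # Use the reported position to substitute XML character
            # references, then retry from the start of the string.
            bad = text[exc.start:exc.end]
            patch = "".join("&#%d;" % ord(ch) for ch in bad)
            text = text[:exc.start] + patch + text[exc.end:]


data, attempts = encode_by_position("caf\u00e9 \u20ac", "ascii")
```

With two separate unencodable characters this takes three attempts,
whereas a callback lets the codec continue from where it stopped.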

Regards,
Martin