[I18n-sig] Proposal: Extended error handling for unicode.encode

M.-A. Lemburg mal@lemburg.com
Fri, 22 Dec 2000 19:15:38 +0100

"Walter Dörwald" wrote:
> On 21.12.00 at 18:30 M.-A. Lemburg wrote:
> > [about state in encoders and error handlers]
> But I don't see how this internal encoder state should influence
> what the error handler does. There are two layers involved: The
> character encoding layer and the "unencodable character escape
> mechanism". Both layers are completely independent, even in your
> "Unicode compression" example, where the "unencodable character
> escape mechanism" is XML character entities.

This is true for your XML entity escape example, but error
resolution in general will likely need to know about the
current state of the encoder, e.g. to be able to write data
corresponding to the current page in the Unicode compression
example, or to force a switch from the current page to a
different one.

I know that error handling could be more generic, but passing
a callable object instead of the error parameter is not an
option since the internal APIs all use a const char * parameter
for error. Besides, I consider such an approach a hack and not
a solution.

Instead of trying to tweak the implementation into providing
some kind of new error scheme, let's focus on finding a generic
framework which could provide a solution for the general case
while not breaking the existing applications.

> > Writing your own function helpers which then apply all the necessary
> > magic is simple and doesn't warrant changing APIs in the core.
> It is not as simple as the error handler, but I could live with that.
> The big problem is that it effectively kills the speed of your
> application. Every XML application written in Python, no matter
> what it does internally, will in the end have to produce an output
> bytestring. Normally the output encoding should be one that produces
> no unencodable characters, but you have to be prepared to handle
> them. With the error handler the complete encoding will be done
> in C code (with very infrequent calls to the error handler), so
> this scheme gives the best speed possible.

It would give even better performance if the codec provided
this hook in some way at C level. Note that almost all codecs
have their own error handlers written in C already.
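
For reference, the helper-function approach discussed above could
be sketched roughly as follows. This is only an illustration under
assumptions: it is written in present-day Python syntax and relies
on the start/end attributes of UnicodeEncodeError, the name
encode_with_charrefs is hypothetical, and it only works for
stateless, ASCII-compatible encodings.

```python
def encode_with_charrefs(text, encoding):
    """Encode text, writing unencodable characters as XML character references.

    Hypothetical helper, not part of any library. Assumes a stateless,
    ASCII-compatible target encoding, because the output is assembled
    from independently encoded chunks.
    """
    chunks = []
    pos = 0
    while pos < len(text):
        try:
            # Fast path: the rest of the string is encoded in one call.
            chunks.append(text[pos:].encode(encoding))
            break
        except UnicodeEncodeError as exc:
            # exc.start and exc.end are relative to the slice that was encoded.
            start, end = pos + exc.start, pos + exc.end
            # Encode the part before the failure, then escape the failing run.
            chunks.append(text[pos:start].encode(encoding))
            for ch in text[start:end]:
                chunks.append(("&#%d;" % ord(ch)).encode(encoding))
            pos = end
    return b"".join(chunks)
```

Every unencodable run costs an exception, a slice and a fresh call
into the codec from Python code, which is the overhead a hook inside
the codec would avoid. The chunking also shows where encoder state
matters: a stateful codec could not be restarted at arbitrary
positions like this.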

> > Since the error handling is extensible by adding new options
> > such as 'callback',
> I would prefer a more object oriented way of extending the error
> handling.

Sure, but we have to ensure backward compatibility as well.

> > the existing codecs could be extended to
> > provide this functionality as well. We'd only need a way to
> > pass the callback to the codecs in some way, e.g. by using
> > a keyword argument on the constructor or by subclassing it
> > and providing a new method for the error handling in question.
> There is no need for a string argument 'callback' and
> an additional callback function/method that is passed to the
> encoder. When the error argument is a string, the old mechanism
> can be used, when it is a callable object the new will be used.

This is bad style and also causes problems in the core
implementation (have a look at unicodeobject.c).
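
To make the alternative concrete, here is a minimal sketch of the
'callback' option combined with a keyword argument or an overridable
method. All names in it (CallbackEncoder, CharRefEncoder,
handle_error, the callback keyword) are hypothetical and not existing
codec APIs, and it uses present-day Python syntax. The point is only
that the errors argument stays a plain string, so existing values
such as 'strict' or 'replace' behave as before.

```python
class CallbackEncoder:
    """Hypothetical encoder wrapper; not an existing codecs class."""

    def __init__(self, encoding, errors="strict", callback=None):
        self.encoding = encoding
        self.errors = errors      # still a plain string, as in the existing API
        self.callback = callback  # only consulted when errors == "callback"

    def handle_error(self, text, start, end):
        """Return replacement text for text[start:end]; subclasses may override."""
        if self.callback is None:
            raise ValueError("errors='callback' needs a callback or an override")
        return self.callback(text, start, end)

    def encode(self, text):
        if self.errors != "callback":
            # Existing error schemes are passed through unchanged.
            return text.encode(self.encoding, self.errors)
        out = []
        pos = 0
        while pos < len(text):
            try:
                out.append(text[pos:].encode(self.encoding))
                break
            except UnicodeEncodeError as exc:
                # Positions in the exception are relative to the encoded slice.
                start, end = pos + exc.start, pos + exc.end
                out.append(text[pos:start].encode(self.encoding))
                replacement = self.handle_error(text, start, end)
                out.append(replacement.encode(self.encoding))
                pos = end
        return b"".join(out)


class CharRefEncoder(CallbackEncoder):
    """Subclassing variant: the error handling lives in a method."""

    def handle_error(self, text, start, end):
        return "".join("&#%d;" % ord(ch) for ch in text[start:end])
```

Usage would look like CharRefEncoder("ascii", "callback").encode(u)
for the subclass, or CallbackEncoder("ascii", "callback",
callback=func) for the keyword form. As with any chunk-wise sketch,
this assumes a stateless encoding.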

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/