[Python-Dev] PEP 293, Codec Error Handling Callbacks

M.-A. Lemburg mal@lemburg.com
Mon, 12 Aug 2002 18:03:14 +0200

Martin v. Loewis wrote:
> Walter Dörwald <walter@livinglogic.de> writes:
>> > - charmap_encoding_error, which insists on implementing known error
>> >   handling algorithms inline,
>>This is done for performance reasons.
> Is that really worth it? Such errors are rare, and when they occur,
> they usually cause an exception as the result of the "strict" error
> handling.
> I'd strongly encourage you to avoid duplication of code, and use
> Python wherever possible.

See below: this is not always possible, for much the same reason
that exceptions are implemented in C as well.
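For reference, the standard error modes that a charmap codec has to support can be exercised from Python. This is a small sketch, assuming present-day Python 3; cp437 is used only as an example of a charmap-based codec ("€" has no slot in it), and the helper name encode_all_modes is hypothetical.

```python
import codecs

# Example text: "\u20ac" (euro sign) is not representable in cp437.
TEXT = "price: 5\u20ac"

def encode_all_modes(text, encoding="cp437"):
    """Hypothetical helper: encode `text` once per error mode, collect results."""
    results = {}
    for mode in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace"):
        results[mode] = text.encode(encoding, mode)
    try:
        text.encode(encoding, "strict")
    except UnicodeEncodeError as exc:
        # Under "strict" the codec raises and reports the failing slice.
        results["strict"] = (exc.start, exc.end, exc.reason)
    return results

for mode, value in encode_all_modes(TEXT).items():
    print(f"{mode:18} {value!r}")
```

Only "strict" ends in an exception; the other modes resolve the error and let encoding continue.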

>>The PyCodec_XMLCharRefReplaceErrors functionality is
>>independent of the rest, so moving this to Python
>>won't reduce complexity that much. And it will
>>slow down "xmlcharrefreplace" handling for those
>>codecs that don't implement it inline.
> Sure it will. But how much does that matter in the overall context of
> generating HTML/XML?
>> > - the UnicodeError exception methods (which could be omitted, IMO).
>>Those methods were implemented so that we can easily
>>move to new style exceptions.
> What are new-style exceptions?

Exceptions that are built as subclassable types.
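In today's Python this is simply how exceptions work: built-in exception types such as UnicodeEncodeError are classes that can be subclassed like any other type. A minimal sketch in Python 3; TaggedEncodeError and its tag attribute are hypothetical names used for illustration.

```python
class TaggedEncodeError(UnicodeEncodeError):
    """Hypothetical subclass carrying an extra, class-level tag."""
    tag = "xml-escape"

def fail():
    # The same five arguments UnicodeEncodeError itself takes:
    # encoding, object, start, end, reason.
    raise TaggedEncodeError("ascii", "caf\u00e9", 3, 4, "ordinal not in range(128)")

try:
    fail()
except UnicodeEncodeError as exc:  # still caught via the base class
    caught = exc

print(type(caught).__name__, caught.tag, caught.start, caught.end)
```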

>>The exception attributes can then be members of the C struct and the
>>accessor functions can be simple macros.
> Again, I sense premature optimization.

There's nothing premature here. By moving exception handling to
C level, you get *much* better performance than at Python level.
Remember that applications like e.g. escaping chars in an XML
document can cause lots of these exceptions to be generated.
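To make the trade-off concrete, here is a sketch of a Python-level callback that mimics the built-in "xmlcharrefreplace" mode, written against today's Python 3 codecs.register_error API. The handler name py_xmlcharrefreplace is hypothetical. The counter shows that the codec calls back into Python again for every stretch of unencodable text, which is the per-exception cost under discussion.

```python
import codecs

calls = 0  # how many times the codec invoked the callback

def py_xmlcharrefreplace(exc):
    """Hypothetical Python stand-in for the built-in "xmlcharrefreplace"."""
    global calls
    calls += 1
    if not isinstance(exc, UnicodeEncodeError):
        raise exc  # this sketch only handles encoding errors
    bad = exc.object[exc.start:exc.end]
    # Replace each unencodable character by a decimal character reference
    # and tell the codec to resume right after the failing slice.
    return "".join(f"&#{ord(ch)};" for ch in bad), exc.end

codecs.register_error("py_xmlcharrefreplace", py_xmlcharrefreplace)

doc = "<p>caf\u00e9</p><p>5\u20ac</p>"
escaped = doc.encode("ascii", "py_xmlcharrefreplace")
print(escaped, calls)
```

The output matches the built-in mode; the difference is only where the replacement is computed.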

>>1. For each error handler two Python function objects are created:
>>one in the registry and a different one in the codecs module. This
>>means that, e.g.,
>>codecs.lookup_error("replace") != codecs.replace_errors
> Why would this be a problem?
>>We can fix that by making the name of the Python function object
>>globally visible, by changing the codecs init function to do a
>>lookup and use the result, or simply by removing codecs.replace_errors.
> I recommend to fix this by implementing the registry in Python.

This doesn't work, as I've explained before. The predefined
error handling modes of the builtin codecs must work without relying
on the Python import mechanism.
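The identity question from point 1 can be checked directly in present-day Python 3. A sketch; bracket_tag is a hypothetical handler name. In current CPython the codecs module appears to bind replace_errors and its siblings by calling lookup_error, so the two names agree; treat that as an observation about today's implementation, not about the patch under discussion.

```python
import codecs

def bracket_tag(exc):
    """Hypothetical handler: replace the failing slice with '[?]'."""
    return "[?]", exc.end

codecs.register_error("bracket_tag", bracket_tag)

# The registry hands back the very object that was registered.
same_custom = codecs.lookup_error("bracket_tag") is bracket_tag
# For a built-in mode, compare the registry entry with the module-level name.
same_builtin = codecs.lookup_error("replace") is codecs.replace_errors

print(same_custom, same_builtin)
print("a\u00e9b".encode("ascii", "bracket_tag"))
```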

>>4. Assigning to an attribute of an exception object does not
>>change the appropriate entry in the args attribute. Is this
>>worth changing?
> No. Exception objects should be treated as immutable (even if they
> aren't). If somebody complains, we can fix it; until then, it suffices
> if this is documented.

What? That exceptions are immutable? I think it's a big win that
exceptions are in fact mutable -- they are great for transporting
extra information up the chain...

except Exception, obj:
    obj.been_there = 1
    raise
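The same idiom in present-day Python 3 syntax, which also illustrates point 4: assigning to an attribute leaves args untouched. A sketch; decode_config and load are hypothetical helpers.

```python
def decode_config(raw):
    """Hypothetical inner layer: fails on non-ASCII input."""
    return raw.decode("ascii")

def load(raw):
    try:
        return decode_config(raw)
    except UnicodeDecodeError as exc:
        exc.been_there = 1  # annotate the exception in flight ...
        raise               # ... and re-raise the same object

try:
    load(b"caf\xe9")
except UnicodeDecodeError as exc:
    caught = exc

print(caught.been_there)             # the annotation survived the trip up
print(caught.start, caught.args[2])  # both reflect the constructor arguments

# Point 4: changing the attribute does not update the args tuple.
caught.start = 0
print(caught.start, caught.args[2])
```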

>>5. UTF-7 decoding does not yet take full advantage of the machinery:
>>When an unterminated shift sequence is encountered (e.g. "+xxx")
>>the faulty byte sequence has already been emitted.
> It would be OK if it worked as well as it did in 2.2. UTF-7 is rarely
> used; if it is used, it is machine-generated, so there shouldn't be
> any errors.


Marc-Andre Lemburg
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/