Re: [Python-Dev] PEP 293, Codec Error Handling Callbacks

Aug. 12, 2002


Walter Dörwald <walter@livinglogic.de> writes:
...
- charmap_encoding_error, which insists on implementing known error
  handling algorithms inline,
This is done for performance reasons.
Is that really worth it? Such errors are rare, and when they occur,
they usually cause an exception as the result of the "strict" error
handling.

I'd strongly encourage you to avoid duplication of code, and use
Python wherever possible.
...
The PyCodec_XMLCharRefReplaceErrors functionality is
independent of the rest, so moving this to Python
won't reduce complexity that much. And it will
slow down "xmlcharrefreplace" handling for those
codecs that don't implement it inline.
Sure it will. But how much does that matter in the overall context of
generating HTML/XML?
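
To make the trade-off concrete, here is a minimal sketch of what an "xmlcharrefreplace" handler written in Python could look like. It assumes the callback interface as it exists in current Python 3 (codecs.register_error, and an exception object carrying object, start and end); the handler name "py_xmlcharrefreplace" is made up for this illustration and is not part of the patch under discussion.

```python
import codecs


def py_xmlcharrefreplace(exc):
    """Pure-Python stand-in for a C-level xmlcharrefreplace handler."""
    # Only encoding errors can be turned into character references.
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("don't know how to handle %s" % type(exc).__name__)
    # exc.object is the input string; start/end delimit the unencodable run.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, exc.end


# "py_xmlcharrefreplace" is a hypothetical name chosen for this sketch.
codecs.register_error("py_xmlcharrefreplace", py_xmlcharrefreplace)

text = "D\u00f6rwald \u20ac"
encoded = text.encode("ascii", "py_xmlcharrefreplace")
print(encoded)
```

The handler body is a handful of lines, which is the maintenance argument; whether the per-error call overhead matters would have to be measured against the cost of producing the rest of the document.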
...
- the UnicodeError exception methods (which could be omitted, IMO).
Those methods were implemented so that we can easily
move to new style exceptions.
What are new-style exceptions?
...
The exception attributes can then be members of the C struct and the
accessor functions can be simple macros.
Again, I sense premature optimization.
...
1. For each error handler two Python function objects are created:
One in the registry and a different one in the codecs module. This
means that e.g.
codecs.lookup_error("replace") != codecs.replace_errors
Why would this be a problem?
...
We can fix that by making the name of the Python function object
globally visible or by changing the codecs init function to do a
lookup and use the result or simply by removing codecs.replace_errors
I recommend fixing this by implementing the registry in Python.
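
As a rough sketch of that suggestion: if the registry is an ordinary Python dictionary and the module-level handler is the very object stored in it, the identity mismatch cannot arise. Everything below is hypothetical; the names mirror the ones in this thread, but the "replace" handler is simplified to encoding errors only and this is not the code from the patch.

```python
# Hypothetical pure-Python registry, for illustration only.
_error_registry = {}


def register_error(name, handler):
    """Store a handler under the given name."""
    if not callable(handler):
        raise TypeError("handler must be callable")
    _error_registry[name] = handler


def lookup_error(name):
    """Return the handler registered under name, or raise LookupError."""
    try:
        return _error_registry[name]
    except KeyError:
        raise LookupError("unknown error handler name %r" % name) from None


def replace_errors(exc):
    """Simplified 'replace' handler: one '?' per unencodable character."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("this sketch only handles encoding errors")
    return "?" * (exc.end - exc.start), exc.end


# The function defined above is the object that goes into the registry,
# so the lookup and the module-level name refer to the same object.
register_error("replace", replace_errors)
same_object = lookup_error("replace") is replace_errors
print(same_object)
```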
...
4. Assigning to an attribute of an exception object does not
change the appropriate entry in the args attribute. Is this
worth changing?
No. Exception objects should be treated as immutable (even if they
aren't). If somebody complains, we can fix it; until then, it suffices
if this is documented.
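
For anyone who wants to see the behaviour in question, a short check follows. It assumes the exception class and constructor signature found in current Python 3 (UnicodeEncodeError taking encoding, object, start, end, reason) rather than the 2002 patch, and simply prints both values so the difference, if any, is visible on the interpreter at hand.

```python
# Build an encoding error by hand; the arguments are
# (encoding, object, start, end, reason).
exc = UnicodeEncodeError("ascii", "abc\u00e4", 3, 4, "ordinal not in range(128)")
original_args = exc.args

# Assign to an attribute, then compare it with the matching args entry.
exc.reason = "changed"
print(exc.reason)
print(exc.args[4])
```

Code that treats the exception as immutable, as recommended above, never observes the discrepancy.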
...
5. UTF-7 decoding does not yet take full advantage of the machinery:
When an unterminated shift sequence is encountered (e.g. "+xxx")
the faulty byte sequence has already been emitted.
It would be ok if it works as well as it did in 2.2. UTF-7 is rarely
used; if it is used, it is machine-generated, so there shouldn't be
any errors.

Regards,
Martin

martin＠v.loewis.de