[Python-Dev] PEP 293, Codec Error Handling Callbacks
Mon, 5 Aug 2002 14:46:36 +0300
On Mon, Aug 05, 2002 at 10:12:30AM +0200, M.-A. Lemburg wrote:
> I'd like to put the following PEP up for pronouncement. Walter
> is currently on vacation, but he asked me to already go ahead
> with the process.
> I like the patch a lot and the implementation strategy is very
> interesting as well (just wish that classes were new types --
> then things could run a tad faster and the patch would be
Here's another implementation strategy:
Charmap entries can currently be None, an integer or a unicode string. I
suggest adding another option: a function or other callable. The function
will be called with the input string and current position as arguments and
will return a 2-tuple of the replacement string and the number of characters
consumed. This will make it very easy to take the decoding charmap of an
existing codec and patch it with a special case for one character like '&'
to generate character references, for example.
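
A rough pure-Python sketch of what such a charmap decoder could look like
(Python 3 syntax). The names charmap_decode and decode_charref are made up
for illustration, and the '&' handler here parses numeric references on
input; the real change would live in the C charmap codec.

```python
import re

# Numeric character reference such as b"&#8364;"
_CHARREF = re.compile(rb"&#([0-9]+);")

def decode_charref(data, pos):
    """Hypothetical callable entry: turn '&#NNN;' at pos into one character."""
    m = _CHARREF.match(data, pos)
    if m is None:
        return "&", 1  # a bare ampersand passes through unchanged
    return chr(int(m.group(1))), m.end() - pos

def charmap_decode(data, charmap):
    """Decode bytes with a charmap whose entries may be None, int, str or callable."""
    out = []
    pos = 0
    while pos < len(data):
        entry = charmap.get(data[pos])
        if callable(entry):
            # Proposed extension: callable gets (input, position) and
            # returns (replacement string, number of items consumed).
            text, consumed = entry(data, pos)
        elif entry is None:
            raise UnicodeDecodeError("charmap", data, pos, pos + 1,
                                     "character maps to <undefined>")
        elif isinstance(entry, int):
            text, consumed = chr(entry), 1
        else:
            text, consumed = entry, 1
        out.append(text)
        pos += consumed
    return "".join(out)

# Take an existing charmap (here: plain ASCII identity) and patch one entry.
charmap = {i: i for i in range(128)}
charmap[ord("&")] = decode_charref

result = charmap_decode(b"a &#8364; b &", charmap)  # 'a \u20ac b &'
```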
The function may raise an exception. The error strategy argument will
not be overloaded with new functionality - it will just determine whether
this exception will be ignored or passed on.
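
In code, the role of the error strategy might look roughly like this. The
helper call_entry is hypothetical, and what "ignored" means for the input
position (here: emit nothing and skip one item) is my assumption, not
something specified above.

```python
def call_entry(entry, data, pos, errors="strict"):
    """Call a callable charmap entry, applying the error strategy to exceptions."""
    try:
        return entry(data, pos)
    except Exception:
        if errors == "ignore":
            # Assumption: drop the offending item and move on.
            return "", 1
        # "strict": pass the exception on to the caller.
        raise

def failing_entry(data, pos):
    """Example entry that always refuses to map its input."""
    raise ValueError("no mapping at position %d" % pos)

ignored = call_entry(failing_entry, b"x", 0, errors="ignore")  # ('', 1)
```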
An existing encoding charmap (usually a dictionary) can also be patched for
special characters like <, > and &. A special entry with a None key will be
the default entry used on a KeyError and will usually be mapped to a
function. If no None key is present the charmap will behave exactly the way
it does now.
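
The encoding side could be sketched the same way (Python 3 syntax, pure
Python). charmap_encode and xml_charref are illustrative names only, and the
byte strings used for <, > and & assume an ASCII-compatible target encoding.

```python
def xml_charref(text, pos):
    """Hypothetical default entry: emit a numeric character reference."""
    return b"&#%d;" % ord(text[pos]), 1

def charmap_encode(text, charmap):
    """Encode str with a charmap; a None key, if present, is the default entry."""
    out = []
    pos = 0
    while pos < len(text):
        try:
            entry = charmap[ord(text[pos])]
        except KeyError:
            # Proposed extension: fall back to the entry stored under None.
            # Without a None key this ends up as the usual encode error.
            entry = charmap.get(None)
        if callable(entry):
            data, consumed = entry(text, pos)
        elif entry is None:
            raise UnicodeEncodeError("charmap", text, pos, pos + 1,
                                     "character maps to <undefined>")
        elif isinstance(entry, int):
            data, consumed = bytes([entry]), 1
        else:
            data, consumed = entry, 1
        out.append(data)
        pos += consumed
    return b"".join(out)

# Patch an existing charmap (here: plain ASCII identity).
charmap = {i: i for i in range(128)}
charmap[ord("<")] = b"&lt;"
charmap[ord(">")] = b"&gt;"
charmap[ord("&")] = b"&amp;"
charmap[None] = xml_charref  # default entry used on a KeyError

result = charmap_encode("a<b & \u20ac", charmap)  # b'a&lt;b &amp; &#8364;'
```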
Tying it all together:
A codec that does both charmap and entity reference translations may be
dynamically generated. A function will be registered that intercepts
any codec name that looks like 'xmlcharref.CODECNAME', imports that codec,
creates patched charmaps and returns the (enc, dec, reader, writer) tuple.
The dynamically created entry will be cached for later use.