[Python-Dev] PEP 293, Codec Error Handling Callbacks

Oren Tirosh oren-py-d@hishome.net
Mon, 5 Aug 2002 14:46:36 +0300

On Mon, Aug 05, 2002 at 10:12:30AM +0200, M.-A. Lemburg wrote:
> I'd like to put the following PEP up for pronouncement. Walter
> is currently on vacation, but he asked me to already go ahead
> with the process.
> 	http://www.python.org/peps/pep-0293.html
> I like the patch a lot and the implementation strategy is very
> interesting as well (just wish that classes were new types --
> then things could run a tad faster and the patch would be
> simpler).

Here's another implementation strategy:

Charmap entries can currently be None, an integer or a unicode string. I
suggest adding another option: a function or other callable. The function
will be called with the input string and the current position as arguments
and will return a 2-tuple of the replacement string and the number of
characters consumed.  This will make it very easy to take the decoding
charmap of an existing codec and patch it with a special case for one
character like '&' to generate character references, for example.
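
To make the calling convention concrete, here is a rough sketch of a
charmap decode loop that accepts callable entries. It is written in
present-day Python 3 (bytes in, str out), not against the actual charmap
codec. The names charmap_decode_sketch and decode_charref are made up for
illustration, and I am assuming that the '&' entry on the decoding side
parses a numeric reference such as "&#8364;".

```python
import re

_CHARREF = re.compile(rb"&#([0-9]+);")


def decode_charref(data, pos):
    """Hypothetical callable entry for '&': parse a numeric reference."""
    match = _CHARREF.match(data, pos)
    if match is None:
        # Not a reference: pass the ampersand through, consume one byte.
        return "&", 1
    return chr(int(match.group(1))), match.end() - pos


def charmap_decode_sketch(data, charmap):
    """Decode bytes with a charmap whose entries may be callables."""
    out = []
    pos = 0
    while pos < len(data):
        entry = charmap.get(data[pos])
        if callable(entry):
            # Proposed extension: called with (input, position), returns
            # (replacement string, number of input items consumed).
            replacement, consumed = entry(data, pos)
        elif isinstance(entry, int):
            replacement, consumed = chr(entry), 1
        elif isinstance(entry, str):
            replacement, consumed = entry, 1
        else:
            # None or a missing key means "undefined", as it does today.
            raise UnicodeDecodeError("charmap-sketch", data, pos, pos + 1,
                                     "character maps to <undefined>")
        out.append(replacement)
        pos += consumed
    return "".join(out)


# Take an existing mapping (identity, i.e. Latin-1) and patch one entry.
charmap = {i: i for i in range(256)}
charmap[ord("&")] = decode_charref
```

With this patched map, charmap_decode_sketch(b"&#8364;5", charmap) consumes
the seven bytes of the reference in a single step.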

The function may raise an exception.  The error strategy argument will 
not be overloaded with new functionality - it will just determine whether 
this exception will be ignored or passed on.
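
A minimal sketch of that rule, again in present-day Python 3. The helper
name call_entry is hypothetical, and I am assuming that "ignored" means
dropping one input item and carrying on, which the proposal above does
not spell out.

```python
def call_entry(entry, data, pos, errors="strict"):
    """Call a callable charmap entry, honouring the error strategy."""
    try:
        return entry(data, pos)
    except Exception:
        if errors == "ignore":
            # Assumption: skip one input item and emit nothing.
            return "", 1
        # Any other strategy ("strict" included) passes the exception on.
        raise


def always_fails(data, pos):
    """Example entry that rejects whatever it is given."""
    raise ValueError("cannot handle input at position %d" % pos)
```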

An existing encoding charmap (usually a dictionary) can also be patched for
special characters like <, > and &.  A special entry with a None key will be
the default entry used on a KeyError and will usually be mapped to a
function.  If no None key is present, the charmap will behave exactly the
way it does now.
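
The encoding side could look roughly like this. It is a sketch in
present-day Python 3 (str in, bytes out); charmap_encode_sketch and
charref are hypothetical names, and the ASCII base map stands in for
whatever charmap an existing codec provides.

```python
def charref(text, pos):
    """Hypothetical default entry: emit a numeric character reference."""
    return ("&#%d;" % ord(text[pos])).encode("ascii"), 1


def charmap_encode_sketch(text, charmap):
    """Encode str with a charmap that may hold callables and a None key."""
    out = []
    pos = 0
    while pos < len(text):
        try:
            entry = charmap[ord(text[pos])]
        except KeyError:
            # Proposed extension: the None key is the default entry.
            # Without it, behaviour is unchanged and the lookup fails.
            entry = charmap.get(None)
        if callable(entry):
            replacement, consumed = entry(text, pos)
        elif isinstance(entry, int):
            replacement, consumed = bytes([entry]), 1
        elif isinstance(entry, bytes):
            replacement, consumed = entry, 1
        else:
            raise UnicodeEncodeError("charmap-sketch", text, pos, pos + 1,
                                     "character maps to <undefined>")
        out.append(replacement)
        pos += consumed
    return b"".join(out)


# Start from an ASCII map, patch the markup characters, add a default.
charmap = {i: i for i in range(128)}
charmap[ord("<")] = b"&lt;"
charmap[ord(">")] = b"&gt;"
charmap[ord("&")] = b"&amp;"
charmap[None] = charref
```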

Tying it all together:

A codec that does both charmap and entity reference translations may be
dynamically generated.  A function will be registered that intercepts
any codec name that looks like 'xmlcharref.CODECNAME', imports that codec,
creates patched charmaps and returns the (enc, dec, reader, writer) tuple.
The dynamically created entry will be cached for later use.