[Python-Dev] PEP 293, Codec Error Handling Callbacks

M.-A. Lemburg mal@lemburg.com
Tue, 06 Aug 2002 10:06:13 +0200


Oren Tirosh wrote:
> On Mon, Aug 05, 2002 at 11:06:25PM +0200, Martin v. Loewis wrote:
> 
>>If you look at the patch, you see that it precisely does what you
>>propose to do: add a callback to the charmap codec:
> 
> But it's NOT an error. It's new encoding functionality.  What if the new 
> functionality you've added this way has an error of its own? Perhaps you
> would like to have a flag to tell it whether to ignore error or raise an
> exception?  Sorry, that argument has been taken over for another purpose.  
> 
> The real problem was some missing functionality in codecs. Here are two 
> approaches to solve the problem:
> 
> 1. Add the missing functionality.
> 
> 2. Keep the old, limited functionality, let it fail, catch the error,
> re-use an argument originally intended for an error handling strategy to 
> shoehorn a callback that can implement the missing functionality, add a new 
> name-based registry to overcome the fact that the argument must be a string.
> Since this approach is conceptually stuck on treating it as an error it 
> actually creates and discards a new exception object for each character 
> converted via this path.
> 
> Ummm... <scratches head>, tough choice.

Oren, if you just want a codec which encodes and decodes
HTML entities, then this can be done easily by writing a codec
which works on Unicode only and is stacked on top of the other
existing codecs. For example, if you first encode all
non-printable and non-ASCII code points using entity escapes
and then pass the resulting Unicode string to one of the other
codecs, you have a solution to your problem.
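
A minimal sketch of that stacking idea, written against current
Python 3 str/bytes semantics. The helper names escape_non_ascii
and encode_html are made up for illustration, and the choice to
leave tab, newline and carriage return alone is an assumption. A
real entity codec would also have to escape "&" so that decoding
is unambiguous.

```python
def escape_non_ascii(text):
    """Unicode-to-Unicode step: replace non-printable and non-ASCII
    code points with numeric character references (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Assumption: keep common whitespace controls as they are.
        if cp > 126 or (cp < 32 and ch not in "\t\n\r"):
            out.append("&#%d;" % cp)
        else:
            out.append(ch)
    return "".join(out)


def encode_html(text, encoding="ascii"):
    """Stacked encoding: escape first, then hand the result to an
    existing codec (hypothetical helper)."""
    return escape_non_ascii(text).encode(encoding)


result = encode_html("caf\u00e9")
```

Because the escaping step runs before the byte codec, every
non-ASCII character is escaped, even when the target encoding
could have represented it directly.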

Note that this is different from trying to provide a
work-around for Unicode code points which have no
corresponding mapping in a given encoding. These situations
would normally result in an exception. HTML and XML offer
you the possibility to use special escapes for such code
points, so that you can still encode the complete Unicode
set into e.g. ASCII, but only on the premise that the
encoded data is HTML or XML text.
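
For comparison, a sketch of the error-callback route for
unmappable code points only, using the codecs.register_error
interface and the "xmlcharrefreplace" handler as they exist in
current Python. The handler name "demo-entityescape" and the
function entity_escape are hypothetical and only serve the
example.

```python
import codecs


def entity_escape(exc):
    """Error callback: replace only the unencodable slice with numeric
    character references and resume after it (hypothetical handler)."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    return replacement, exc.end


# "demo-entityescape" is a made-up name for this sketch.
codecs.register_error("demo-entityescape", entity_escape)

text = "caf\u00e9 \u20ac"
as_ascii = text.encode("ascii", "demo-entityescape")
as_latin1 = text.encode("latin-1", "demo-entityescape")
builtin = text.encode("ascii", "xmlcharrefreplace")
```

Here the callback fires only where the target encoding has no
mapping, so with Latin-1 the e-acute is encoded directly and
only the euro sign is escaped.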

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/