[I18n-sig] Proposal: Extended error handling for unicode.encode

M.-A. Lemburg mal@lemburg.com
Sat, 06 Jan 2001 16:32:10 +0100

"Martin v. Loewis" wrote:
> > The codec design is supposed to cover the general case of
> > encoding/decoding arbitrary data from and to arbitrary formats.
>
> Where is it documented as such? I believe it is wishful thinking to
> assume they cover some general case, although I have to acknowledge
> that *your* wish is more relevant than other people's wishes.

Please see Misc/unicode.txt for details. I tried to design the
interface with a larger application range in mind and that's
what I will continue to argue for, obviously ;-)

> [ranting about the codec design being useless for other applications]

I don't see the point in trying to argue for uselessness of
an existing design. If you want your own design, then nobody 
will stop you from rolling your own.

> > > > If we were to provide a callback as optional method to
> > > > StreamReaders/Writers, the task could be done either statically
> > > > by subclassing the existing codec StreamReaders/Writers or
> > > > dynamically by asking the codec registry to return the StreamReader/
> > > > Writer classes.
> > >
> > > So how would the implementation of charmap_encode invoke this method?
> > > It currently doesn't even get hold of the codec object.
> >
> > Through the extended API I proposed earlier on: the extra context
> > object would allow providing a callback mechanism. Alternatively,
> > the StreamReader/Writer classes could use their own specific
> > C coding functions.
>
> Was there some detailed proposal of an API? I don't recall that; could
> you kindly point me to the message in the archives which elaborates
> that proposal?

There wasn't a detailed proposal, only a design idea...

For the general case, I'd rather add new PyUnicode_EncodeEx()
and PyUnicode_DecodeEx() APIs which then take a Python
context object as extra argument. The error treatment string
would then define how to use this context object, e.g. 'callback'
could be made to apply processing similar to what Walter proposed.

The xxxEx() APIs will have to take special precautions to also
work with pre-2.1 codecs though, since the codec API definition
does not include the extra context object.
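
To make the idea above a bit more concrete, here is a rough Python-level
sketch of how a 'callback' error treatment could use a context object.
Nothing here is an existing API: encode_ex() is a hypothetical stand-in
for the proposed PyUnicode_EncodeEx(), and treating the context object as
a plain callable is only an assumption for illustration. Encoding one
character at a time is also a simplification that only holds for
stateless encodings such as ASCII or Latin-1.

```python
def encode_ex(text, encoding, errors="strict", context=None):
    """Hypothetical sketch of an extended encode call (not a real API)."""
    # Any other error string is handed to the codec unchanged, so the
    # existing 'strict', 'ignore' and 'replace' treatments keep working.
    if errors != "callback":
        return text.encode(encoding, errors)
    if not callable(context):
        raise TypeError("'callback' error treatment needs a callable context")
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(encoding, "strict"))
        except UnicodeEncodeError:
            # Assumption: the context returns replacement text that the
            # target encoding can represent.
            pieces.append(context(ch).encode(encoding, "strict"))
    return b"".join(pieces)


def name_codepoint(ch):
    """Example context object: spell out the code point."""
    return "[U+%04X]" % ord(ch)


print(encode_ex("caf\u00e9", "ascii", "callback", name_codepoint))
```

The error treatment stays a string in this sketch; only the extra
argument is new, so existing callers would not be affected.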

> Specifically, as an author of an application that wants to extend
> existing codecs, could you post some Python code that shows how to
> create the context objects (including an implementation of the codec
> object's class), and how to pass it to Unicodeobject.encode?

Sure, but only *after* the context object design has been implemented...
otherwise there wouldn't be a point ;-)

> > Exactly. There is a set of error strings which the codec
> > must accept, but it is free to also implement other schemes
> > as well.
>
> Ok, the guaranteed error strings being 'strict','ignore' and
> 'replace'.

> > I chose strings to simplify the implementation. Back when the
> > design was discussed, we figured that the codec should take
> > care of the error handling. Python's codec design is one of
> > the few which does allow setting error handling behaviour --
> > other implementations tend to simply raise an exception and leave
> > the user in the dark.
> >
> > It's too late to *change* the design. We can only extend it.
>
> It's too late to change the *API*, the design of it can be changed as
> long as the current API still emerges as a special case. That's what
> Walter's proposal does: The API is extended to allow callable objects
> as the error parameter, and three well-known constants are
> provided (codecs.{STRICT|IGNORE|REPLACE}).

No, it does not: the error string parameter is defined as "const char*".
You can't change that to PyObject* in the C API and for the Python API
I wouldn't want to introduce "switch semantics on type" variables.
Extending APIs is OK, changing them is not.

I'll write a patch which implements the 'xml-escape' error
treatment. Hopefully that will buy us some time to think of
a design extension -- provided you play along :-)
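
For illustration only, this is roughly the behaviour I would expect from
an 'xml-escape' treatment, written as a plain Python function rather
than as the patch itself. The helper name xml_escape_encode() is made up
for this sketch, and it assumes an ASCII-compatible, stateless target
encoding: characters the encoding cannot represent are written as
numeric character references.

```python
def xml_escape_encode(text, encoding="ascii"):
    """Hypothetical sketch: encode text, escaping what does not fit."""
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(encoding, "strict"))
        except UnicodeEncodeError:
            # Emit a decimal character reference such as &#8364;
            # (assumes the target encoding is a superset of ASCII).
            pieces.append(("&#%d;" % ord(ch)).encode("ascii"))
    return b"".join(pieces)


print(xml_escape_encode("price: 10 \u20ac"))
```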

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/