[I18n-sig] Proposal: Extended error handling for unicode.encode

M.-A. Lemburg mal@lemburg.com
Mon, 08 Jan 2001 16:52:14 +0100


What is the point of these endless discussions about use-cases
(which you seem esp. fond of ;), design vs. API, Walter's proposal
and whether or not the codec design covers more general cases than
just encoding and decoding from and to Unicode ?

These discussions don't get us anywhere.

To summarize:

* the codec design was discussed at length early last year
* the design was chosen after many useful suggestions from people
  who know what codecs have to deal with (e.g. Andy, Fredrik
  (from the PIL-perspective BTW)) and others
* the design is written down in Misc/unicode.txt
* extending the design is OK, breaking APIs is not
* extending the design by adding parameters is OK, extending
  the design by switching on parameter type is not
* I have no problem with extending the design
* Walter's proposal breaks the Unicode C API in intolerable ways;
  I agree that the general idea is worth pursuing though and
  Walter's proposal has some good ideas in that direction

So where are we heading ?

* I will start to code a new error treatment option 'xml-escape'
  which can then also be used as basis for other escape techniques
  which might be of general use (e.g. 'unicode-escape')
* we should start thinking of ways to extend the existing C API
  to allow providing a context object to the encoder/decoder. I've
  already made a few suggestions in that direction; more are to
  come once I find more time to work on this; other suggestions
  are, of course, welcome too
* the new error handler extensions will be a post-2.1 feature
* a PEP is needed for the design (most people don't read endless 
  threads like these to catch up)
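
To make the 'xml-escape' idea concrete, here is a minimal sketch of the
behaviour such an error treatment might have: characters the target
encoding cannot represent are written as numeric XML character
references. The helper name xml_escape_encode is hypothetical and is not
part of any existing codec API; the sketch is written in present-day
Python 3 so that it runs as shown, and it assumes a stateless target
encoding such as ASCII or Latin-1.

```python
def xml_escape_encode(text, encoding="ascii"):
    """Encode text, writing unencodable characters as &#NNN; references.

    Hypothetical helper for illustration only. It encodes one character
    at a time, so it is only suitable for stateless encodings that do
    not emit a byte order mark (e.g. ASCII, Latin-1).
    """
    chunks = []
    for ch in text:
        try:
            chunks.append(ch.encode(encoding))
        except UnicodeEncodeError:
            # Fall back to a decimal XML character reference.
            chunks.append(("&#%d;" % ord(ch)).encode("ascii"))
    return b"".join(chunks)


# The euro sign (U+20AC) is not representable in ASCII.
print(xml_escape_encode("price: 10\u20ac"))
```

The same loop could emit a different escape form (for instance a
backslash escape) by changing only the fallback line, which is why one
escape treatment can serve as a basis for others.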

What the PEP should include:

* a proposal for extending the Unicode C API to provide an
  extra context object to the encoder/decoder functions (which
  are otherwise stateless)
* a hook for StreamWriters/Readers to use as standard error
  handler in case 'callback' is used as error handling option
* the Python APIs .encode() and unicode() should be extended
  by a third optional argument: the context object
* all builtin codecs should be extended to handle the new
* Codec.encode and .decode APIs should allow a context object as
  additional optional argument; default should be None
* the changes must be 100% backward compatible, both at C
  and at Python level
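
The Codec.encode point above can be sketched roughly as follows. This is
only one possible reading, not a settled design: the class name
SketchAsciiCodec is hypothetical, and treating the context object as a
callable taking (input, position) and returning a replacement string is
an assumption made for the example. Existing two-argument calls keep
working because the new argument defaults to None. The sketch uses
present-day Python 3 so that it runs as shown.

```python
class SketchAsciiCodec:
    """Hypothetical ASCII codec with an optional context argument."""

    def encode(self, input, errors="strict", context=None):
        out = []
        for pos, ch in enumerate(input):
            if ord(ch) < 128:
                out.append(ch)
            elif errors == "callback" and context is not None:
                # Assumption: the context object is a callable that
                # returns an ASCII replacement string.
                out.append(context(input, pos))
            elif errors == "ignore":
                continue
            else:
                raise UnicodeEncodeError(
                    "ascii", input, pos, pos + 1,
                    "ordinal not in range(128)")
        # Return (encoded bytes, number of input characters consumed).
        return "".join(out).encode("ascii"), len(input)


codec = SketchAsciiCodec()

# Old-style call: unchanged behaviour, no context needed.
print(codec.encode("abc"))

# New-style call: the context object decides what to emit.
print(codec.encode("a\u20ac", "callback",
                   lambda s, i: "&#%d;" % ord(s[i])))
```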

Marc-Andre Lemburg
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/