[I18n-sig] XML and codecs

M.-A. Lemburg mal@lemburg.com
Wed, 06 Jun 2001 17:57:54 +0200


Walter Doerwald wrote:
> 
> > > > Sure, but it breaks the current API completely. The above
> > > > mechanism is different in that the communication in the error
> > > > case is done by means of an exception. While this is not as
> > > > fast as a callback it does have some advantages:
> > > >
> > > > * you can write the error handling code in the context using
> > > >   the codec
> > > >
> > > > * it enables you to write error handling code at higher levels
> > > >   in the calling stack
> > >
> > > But this means that you would have to allow the encoder to keep
> > > state between calls. That's no issue with a callback, because there
> > > is only one call.
> >
> > Well, either the codec keeps state or your application;
> > here's some pseudo code to illustrate the first situation:
> >
> > import codecs
> > import cStringIO
> >
> > def do_something(data):
> >
> >     # Entry 3 of the lookup tuple is the StreamWriter factory
> >     StreamWriter = codecs.lookup('myencoding')[3]
> >     output = cStringIO.StringIO()
> >     # 'break' is the proposed error mode, not an existing one
> >     writer = StreamWriter(output, 'break')
> >     while 1:
> >         try:
> >             writer.write(data)
> >         except UnicodeBreakError, (reason, position, work):
> >             # Write data converted so far
> >             output.write(work)
> >             # Roll back 10 chars in the input and retry
> >             data = data[position - 10:]
> >         else:
> >             break
> >     return output.getvalue()
> 
> Apart from the fact that I have to use a StreamWriter
> (which I would probably have to do anyway, since only one BOM
> is required at the start of an output file), this looks usable.
> 
> The big question is: Is 'break' a temporary workaround
> that will go away as soon as we have error handling
> callbacks? 

No.

> Do we want error handling callbacks?

I think we should still keep them on the TODO list.
 
> And finally: How fast is it?

Since errors will always cause extra cycles to be used,
I think the small overhead of using an exception for
the notification is reasonable.

Written in C, you probably won't notice much of a slowdown
compared to a callback solution, since exceptions are cheaper
there than in Python code (the exception objects are created
lazily).
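
To make the exception route concrete, here is a minimal sketch of
error handling done entirely in the calling code. It does not use
the proposed 'break' mode; it assumes instead that the exception
carries the failing positions, as the start and end attributes of
UnicodeEncodeError do in later Python versions. The function name
encode_with_fallback and the character-reference fallback are made
up for illustration.

```python
import io


def encode_with_fallback(text, encoding="ascii"):
    """Encode text, handling unencodable characters in the caller."""
    output = io.BytesIO()
    while text:
        try:
            output.write(text.encode(encoding))
        except UnicodeEncodeError as exc:
            # Write the part that converted cleanly
            output.write(text[:exc.start].encode(encoding))
            # Handle the offending run here, in application code
            for ch in text[exc.start:exc.end]:
                output.write(("&#%d;" % ord(ch)).encode(encoding))
            # Continue with whatever follows the offending run
            text = text[exc.end:]
        else:
            break
    return output.getvalue()
```

The error policy lives in the except clause, so a caller further up
the stack could just as well catch the exception and decide there.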
 
> > > > * it fits in with the current API
> > >
> > > That's right. Unfortunately there are a lot of functions that
> > > would have to be changed.
> >
> > That's why I prefer small steps rather than replacing the
> > complete codec suite with new interfaces.
> 
> The type of one argument changes in all the functions, i.e.
> there's a new set of *Ex functions, where
>         const char *errors
> becomes
>         PyObject *errors

... plus all the callback logic which goes with it, changes
to the way errors are handled by the codecs, etc. It is doable,
but certainly a lot of work.
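
For comparison, a callback-style handler might look roughly like
the sketch below. It is written at the Python level against
codecs.register_error, which arrived in later Python versions, and
not against the *Ex functions discussed above. The handler name
"charref-sketch" and the function charref_handler are made up for
illustration.

```python
import codecs


def charref_handler(exc):
    """Error callback: replace unencodable characters with &#N; references."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only knows how to repair encoding errors
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position to resume encoding at
    return replacement, exc.end


# "charref-sketch" is a hypothetical handler name chosen for this example
codecs.register_error("charref-sketch", charref_handler)

encoded = "a\u00fcb".encode("ascii", "charref-sketch")
```

Here the codec makes a single pass and calls back into the handler
at each error, so no state has to be kept between calls.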

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/