[I18n-sig] XML and codecs
M.-A. Lemburg
mal@lemburg.com
Wed, 06 Jun 2001 17:57:54 +0200
Walter Doerwald wrote:
>
> > > > Sure, but it breaks the current API completely. The above
> > > > mechanism is different in that the communication in the error
> > > > case is done by means of an exception. While this is not as
> > > > fast as a callback it does have some advantages:
> > > >
> > > > * you can write the error handling code in the context using
> > > > the codec
> > > >
> > > > * it enables you to write error handling code at higher levels
> > > > in the calling stack
> > >
> > > But this means that you would have to allow the encoder to keep
> > > state between calls. That's no issue with a callback, because there
> > > is only one call.
> >
> > Well, either the codec keeps state or your application;
> > here's some pseudo code to illustrate the first situation:
> >
> > def do_something(data):
> >
> >     StreamWriter = codecs.lookup('myencoding')[3]
> >     output = cStringIO(data)
> >     writer = StreamWriter(output, 'break')
> >     while 1:
> >         try:
> >             writer.write(data)
> >         except UnicodeBreakError, (reason, position, work):
> >             # Write data converted so far
> >             output.write(work)
> >             # Roll back 10 chars in the input and retry
> >             data = data[position - 10:]
> >         else:
> >             break
> >     return output.getvalue()
>
> Apart from the fact that I have to use a StreamWriter
> (I would probably have to anyway, since only one BOM is
> required at the start of an output file), this looks usable.
>
> The big question is: Is 'break' a temporary workaround
> that will go away as soon as we have error handling
> callbacks?
No.
> Do we want error handling callbacks?
I think we should still keep them on the TODO list.
> And finally: How fast is it?
Since errors will always cause extra cycles to be used,
I think the small overhead of using an exception for
the notification is reasonable.
Written in C, you probably won't notice much of a slowdown
compared to a callback solution, since exceptions are cheaper
at the C level than in Python code (the exception objects are
created lazily there).
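
As a rough illustration of handling the error in the calling context:
the sketch below uses plain 'strict' encoding and the start/end
attributes that UnicodeEncodeError carries in later Python versions as a
stand-in for the proposed 'break' mode (which is hypothetical here). The
function name and the choice of XML character references as the
fallback are assumptions made for the example, and it assumes an
ASCII-compatible target encoding.

```python
def encode_with_charrefs(text, encoding="ascii"):
    """Encode text; on failure, emit the work done so far, substitute
    XML character references for the offending run, and resume."""
    chunks = []
    pos = 0
    while pos < len(text):
        try:
            chunks.append(text[pos:].encode(encoding))
            break
        except UnicodeEncodeError as exc:
            # exc.start and exc.end are offsets into the string that was
            # passed to encode(), i.e. relative to text[pos:].
            chunks.append(text[pos:pos + exc.start].encode(encoding))
            bad = text[pos + exc.start:pos + exc.end]
            refs = "".join("&#%d;" % ord(ch) for ch in bad)
            chunks.append(refs.encode(encoding))
            pos += exc.end
    return b"".join(chunks)


result = encode_with_charrefs("a\u20acb")
```

Each failure restarts the encoder on the remaining input, so the cost
grows with the number of errors; error-free input takes a single pass.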
> > > > * it fits in with the current API
> > >
> > > That's right. Unfortunately there are a lot of functions that
> > > would have to be changed.
> >
> > That's why I prefer small steps rather than replacing the
> > complete codec suite with new interfaces.
>
> The type of one argument changes in all the functions, i.e.
> there's a new set of *Ex functions, where
> const char *errors
> becomes
> PyObject *errors
... plus all the callback logic which goes with it, changes
to the way errors are handled by the codecs, etc. It is doable,
but certainly a lot of work.
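
For comparison, the callback style could look roughly like the sketch
below. It relies on codecs.register_error, which is available in later
Python versions and is not part of the API discussed above; the handler
name "demo-charref" and the function name are made up for the example.

```python
import codecs


def charref_handler(exc):
    """Hypothetical error callback: replace the unencodable run with
    XML character references and tell the codec where to resume."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the input position to continue at.
    return replacement, exc.end


# Register the callback under an illustrative name, then select it
# through the ordinary `errors` argument.
codecs.register_error("demo-charref", charref_handler)

encoded = "a\u20acb".encode("ascii", errors="demo-charref")
```

Here the codec calls the handler itself and keeps its position, so the
caller needs no retry loop and keeps no state between calls.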
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/