[I18n-sig] XML and codecs
Walter Doerwald
walter@livinglogic.de
Wed, 06 Jun 2001 17:51:10 +0200
On 06.06.01 at 17:26 M.-A. Lemburg wrote:
> Walter Doerwald wrote:
> >
> > On 05.06.01 at 11:02 M.-A. Lemburg wrote:
> >
> > > [...]
> > >
> > > Sure, but it breaks the current API completely. The above
> > > mechanism is different in that the communication in the error
> > > case is done by means of an exception. While this is not as
> > > fast as a callback it does have some advantages:
> > >
> > > * you can write the error handling code in the context using
> > > the codec
> > >
> > > * it enables you to write error handling code at higher levels
> > > in the calling stack
> >
> > But this means that you would have to allow the encoder to keep
> > state between calls. That's no issue with a callback, because there
> > is only one call.
>
> Well, either the codec keeps state or your application;
> here's some pseudo code to illustrate the first situation:
>
> def do_something(data):
>
>     StreamWriter = codec.lookup('myencoding')[3]
>     output = cStringIO(data)
>     writer = StreamWriter(output, 'break')
>     while 1:
>         try:
>             writer.write(data)
>         except UnicodeBreakError, (reason, position, work):
>             # Write data converted so far
>             output.write(work)
>             # Roll back 10 chars in the input and retry
>             data = data[position - 10:]
>         else:
>             break
>     return output.getvalue()
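
For comparison, the second situation (the application keeps the state, not the codec) might look roughly like the sketch below. Everything in it is an assumption made for illustration: UnicodeBreakError and the 'break' behaviour do not exist in the codec machinery, so a toy encode_break() helper fakes them on top of the ordinary strict encoder, and the recovery strategy (emit an XML character reference, then continue) is just one possible choice.

```python
class UnicodeBreakError(Exception):
    """Hypothetical exception carrying reason, position and the work done so far."""

    def __init__(self, reason, position, work):
        super().__init__(reason, position, work)
        self.reason = reason
        self.position = position
        self.work = work


def encode_break(text, encoding="ascii"):
    """Toy stand-in for a 'break' error mode: stop at the first unencodable char."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        # Encode only the part that is known to be fine and hand it back.
        work = text[:exc.start].encode(encoding)
        raise UnicodeBreakError(exc.reason, exc.start, work) from None


def encode_with_charrefs(text, encoding="ascii"):
    """The caller keeps the state: it tracks what is left and retries."""
    out = []
    while text:
        try:
            out.append(encode_break(text, encoding))
            break
        except UnicodeBreakError as err:
            # Keep the data converted so far.
            out.append(err.work)
            # Handle the offending character ourselves, then resume after it.
            out.append(b"&#%d;" % ord(text[err.position]))
            text = text[err.position + 1:]
    return b"".join(out)


result = encode_with_charrefs("D\u00f6rwald \u20ac")
```

Note that the encoder is re-entered once per unencodable character, which is where the speed question comes from.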

Apart from the fact that I have to use a StreamWriter
(which I probably would have to do anyway, since only one BOM
is required at the start of an output file), this looks usable.

The big question is: Is 'break' a temporary workaround
that will go away as soon as we have error handling
callbacks? Do we want error handling callbacks?
And finally: How fast is it?

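
To make the callback alternative concrete, here is a minimal sketch of what an error handling callback could look like. It is written against codecs.register_error() as found in present-day Python, not against anything available in the codec API under discussion; the handler name "sketch.xmlcharref" and the function xml_charref() are made up for the example.

```python
import codecs


def xml_charref(exc):
    """Hypothetical handler: replace unencodable text with XML character references."""
    if not isinstance(exc, UnicodeEncodeError):
        # This sketch only deals with encoding errors.
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume encoding.
    return replacement, exc.end


# The name is arbitrary; it is what callers pass as the errors argument.
codecs.register_error("sketch.xmlcharref", xml_charref)

encoded = "D\u00f6rwald \u20ac".encode("ascii", errors="sketch.xmlcharref")
```

With a callback there is only one call into the encoder, so neither the codec nor the application has to keep state between calls.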
> > > * it fits in with the current API
> >
> > That's right. Unfortunately there are a lot of functions that
> > would have to be changed.
>
> That's why I prefer small steps rather than replacing the
> complete codec suite with new interfaces.

The type of one argument changes in all the functions, i.e.
there's a new set of *Ex functions, where

    const char *errors

becomes

    PyObject *errors

Bye,
Walter Dörwald
--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·
www.livinglogic.de