[I18n-sig] XML and codecs
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Wed, 6 Jun 2001 20:33:07 +0200
> > Well, either the codec keeps state or your application;
> > here's some pseudo code to illustrate the first situation:
> >
> > def do_something(data):
> >
> >     StreamWriter = codec.lookup('myencoding')[3]
> >     output = cStringIO(data)
> >     writer = StreamWriter(output, 'break')
> >     while 1:
> >         try:
> >             writer.write(data)
> >         except UnicodeBreakError, (reason, position, work):
> >             # Write data converted so far
> >             output.write(work)
> >             # Roll back 10 chars in the input and retry
> >             data = data[position - 10:]
> >         else:
> >             break
> >     return output.getvalue()
I had missed Marc's posting of this code fragment. How can rolling back
10 characters possibly be the right thing? Couldn't this cause data to
be written twice to the stream?
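A small illustration of the concern, under the hypothetical assumption that write() has already emitted every character before the unencodable one when it raises. The 10-character rollback then hands those same characters to the writer a second time:

```python
# Hypothetical illustration: assume write() has already emitted every
# character before the unencodable one at `position` when it raises.
data = "0123456789abcdefghij\u20ac tail"   # U+20AC is not encodable in ASCII
position = data.index("\u20ac")            # index of the offending character

already_written = data[:position]          # what the stream holds after the error
retried = data[position - 10:]             # input after the quoted 10-char rollback

# The overlap: these characters would reach the stream a second time.
overlap = retried[:10]
print(overlap)
```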
I would expect that, when calling .write(), all data up to the first
unencodable character is encoded and written to the stream, and that
position points to that first character that cannot be encoded.
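For contrast, a minimal Python 3 sketch of these semantics, assuming a stateless encoding such as ASCII or Latin-1. `UnicodeBreakError` is borrowed from the quoted pseudo code and `write_until_break` is a hypothetical helper; neither is a real codecs API. The position comes from the real `UnicodeEncodeError.start` attribute. Because everything before `position` is already on the stream, the caller resumes right after the offending character and nothing is written twice. Replacing that character with an XML character reference is just one illustrative choice.

```python
import io


class UnicodeBreakError(Exception):
    """Hypothetical exception modelled on the one in the quoted pseudo code.

    position is the index of the first character that could not be encoded;
    everything before it has already been written to the stream.
    """

    def __init__(self, reason, position):
        super().__init__(reason, position)
        self.reason = reason
        self.position = position


def write_until_break(stream, data, encoding):
    """Hypothetical writer: encode and write as much of data as possible.

    Assumes a stateless encoding, so encoding a prefix separately is safe.
    """
    try:
        stream.write(data.encode(encoding))
    except UnicodeEncodeError as exc:
        # Everything before exc.start is encodable, so write it first.
        stream.write(data[:exc.start].encode(encoding))
        raise UnicodeBreakError(exc.reason, exc.start) from None


def do_something(data, encoding="ascii"):
    output = io.BytesIO()
    while True:
        try:
            write_until_break(output, data, encoding)
        except UnicodeBreakError as err:
            # No rollback: output already holds data[:err.position].
            # Emit a character reference for the offending character
            # and resume immediately after it.
            bad = data[err.position]
            output.write(b"&#%d;" % ord(bad))
            data = data[err.position + 1:]
        else:
            break
    return output.getvalue()
```

For example, `do_something("caf\u00e9 ok")` yields `b"caf&#233; ok"`, with each character reaching the stream exactly once.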
Regards,
Martin