[I18n-sig] XML and codecs

M.-A. Lemburg mal@lemburg.com
Wed, 06 Jun 2001 21:24:28 +0200


"Martin v. Loewis" wrote:
> 
> > > Well, either the codec keeps state or your application;
> > > here's some pseudo code to illustrate the first situation:
> > >
> > > def do_something(data):
> > >
> > >     StreamWriter = codecs.lookup('myencoding')[3]
> > >     output = cStringIO.StringIO()
> > >     writer = StreamWriter(output, 'break')
> > >     while 1:
> > >         try:
> > >             writer.write(data)
> > >         except UnicodeBreakError, (reason, position, work):
> > >             # Write data converted so far
> > >             output.write(work)
> > >             # Roll back 10 chars in the input and retry
> > >             data = data[position - 10:]
> > >         else:
> > >             break
> > >     return output.getvalue()
> 
> I've missed Marc's posting of this code fragment: How can rolling back
> 10 characters possibly be the right thing? Couldn't this cause data to
> be written twice to the stream?

This depends on how the codec and the encoding work. The above is
just an example of how you could use the 'break' mechanism
to apply a customized action in case of an error.
 
> I would expect that, when calling .write(), all correctly encoded data
> is written to the stream and that position points to the first
> character that cannot be encoded.

I think it's better not to write any information to the
stream unless you are absolutely sure that no error occurred.
Remember that you cannot take back characters which were written
to the stream.

With the information carried by the exception (reason, position
and work), the caller can make all decisions needed to ensure
that the data written to the output stream is correct.

The codec will place the work done so far into the third
element of the exception's argument tuple and the position
which caused the failure into the second. The first element,
reason, can be used to provide additional information to the
caller.
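
To make the idea concrete, here is a minimal sketch in present-day
Python 3 of the "write nothing until you are sure" approach. It is
only an illustration under stated assumptions: UnicodeBreakError is
the hypothetical exception from the pseudo code above, and
encode_or_break() and write_all_or_nothing() are made-up helper names
which emulate the proposed 'break' behaviour on top of the standard
UnicodeEncodeError. It also assumes a stateless encoding such as
ASCII or Latin-1.

```python
import io


class UnicodeBreakError(Exception):
    """Hypothetical exception; args are (reason, position, work)."""


def encode_or_break(text, encoding):
    """Encode text, or raise UnicodeBreakError at the first bad character.

    Emulates the proposed 'break' mode: 'work' holds the bytes encoded
    before the failure and 'position' is the index of the offending
    character. Assumes a stateless encoding.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        work = text[:exc.start].encode(encoding)
        raise UnicodeBreakError(exc.reason, exc.start, work) from None


def write_all_or_nothing(stream, text, encoding, replacement="?"):
    """Buffer all converted data and write it to the stream only once.

    The customized action taken here is to substitute 'replacement'
    for each character that cannot be encoded.
    """
    pending = []
    while text:
        try:
            pending.append(encode_or_break(text, encoding))
            break
        except UnicodeBreakError as exc:
            reason, position, work = exc.args
            # Keep the data converted so far, but do not write it yet.
            pending.append(work)
            pending.append(replacement.encode(encoding))
            # Continue after the character which caused the failure.
            text = text[position + 1:]
    # Nothing has reached the stream until this single write.
    stream.write(b"".join(pending))


output = io.BytesIO()
write_all_or_nothing(output, "abc\u20acdef", "ascii")
print(output.getvalue())
```

Because the caller holds the converted data itself, it can also decide
to discard everything and write nothing at all, which is not possible
once a codec has already written part of the data to the stream.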

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/