[I18n-sig] XML and codecs

M.-A. Lemburg mal@lemburg.com
Tue, 05 Jun 2001 22:23:30 +0200

"Martin v. Loewis" wrote:
> > > What do you mean: "provided it's a StreamReader/Writer". What if I
> > > invoke the encode method found in codec lookup, and get an exception?
> >
> > The encoders/decoders returned in the lookup tuple are not
> > supposed to store state. If you want to or need to store state,
> > then you should use the factory functions (StreamWriter and
> > -Reader) to first create an instance which can store state
> > and then use its .encode()/.decode() methods.
> To create one of these, I need a file object. I just want a stateful
> encoder, not a stream. So if I don't have a file object, how do I
> create an encoder?

Simple: use cStringIO!
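
A minimal sketch of that suggestion, assuming a current Python 3
interpreter, where io.BytesIO stands in for the cStringIO module named
above. The in-memory buffer acts as the file object, and the
StreamWriter instance keeps the codec state between calls. The choice
of UTF-16 and the variable names are for illustration only.

```python
import codecs
import io

# Assumption: io.BytesIO plays the role of cStringIO on Python 3.
buffer = io.BytesIO()

# codecs.getwriter() returns the StreamWriter factory for the encoding;
# calling it with a file-like object creates a stateful writer instance.
writer = codecs.getwriter("utf-16")(buffer)

writer.write("a")   # first call: the UTF-16 writer emits the byte order mark
writer.write("b")   # later calls: no second BOM, the instance remembers

# The buffer now holds one BOM followed by both characters.
encoded = buffer.getvalue()
print(encoded.decode("utf-16"))
```

No real file is involved; the encoded bytes are read back from the
buffer with getvalue().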

> Plus, if I cannot use the functions returned from codecs.lookup in
> stateful encodings, what are they good for, anyways?

For simple stateless encodings.
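
As an illustration of the stateless case, here is a short sketch that
uses the first two entries of the lookup tuple directly. Latin-1 is
picked only as an example of an encoding without shift states, and the
variable names are assumptions.

```python
import codecs

# The first two entries of the lookup result are the stateless
# encoder and decoder functions.
encode, decode = codecs.lookup("latin-1")[:2]

# Each call returns a (result, length_consumed) pair and depends only
# on its arguments, so the functions can be reused for unrelated inputs.
data, consumed = encode("caf\xe9")
text, _ = decode(data)
```

Because nothing is remembered between calls, the same two functions
can safely be shared by every caller.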

> > > So I think the sentence in the documentation saying "expected to work"
> > > is an error.
> >
> > This is by design and not a mistake.
> Ok, so it is an error in the design, not only in the documentation.

Oh please...

> > If encoders/decoders (the first two items in the
> > lookup tuple) stored state, then you would have serious problems
> > when reusing them for different inputs. I'm not even talking about
> > threading problems here.
> What specific problems would you have? I.e. is there anything in the
> standard library that gets into serious problems if codecs.lookup
> returns a stateful object?

Please reread what I wrote and then think this over again... by
reusing a stateful encoder multiple times you would carry over
state from one usage to the next, e.g. carry over the shift
state from one data set to the next (which may not even use this
shift state).
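
The carry-over can be sketched with a toy encoder. Everything in this
example is hypothetical (the ToyShiftEncoder class and its two marker
bytes are not part of the codecs module); it only shows how a reused
stateful object leaks its shift state into the next data set.

```python
SHIFT_OUT = b"\x0e"  # hypothetical marker: switch to the "upper" state
SHIFT_IN = b"\x0f"   # hypothetical marker: switch back to the "lower" state


class ToyShiftEncoder:
    """Hypothetical stateful encoder; not part of the codecs module."""

    def __init__(self):
        self.upper = False  # current shift state

    def encode(self, text):
        out = bytearray()
        for ch in text:
            if ch.isupper() != self.upper:
                # Emit a marker only when the shift state changes.
                out += SHIFT_OUT if ch.isupper() else SHIFT_IN
                self.upper = ch.isupper()
            out += ch.lower().encode("ascii")
        return bytes(out)


shared = ToyShiftEncoder()
first = shared.encode("abC")    # ends in the "upper" state
second = shared.encode("D")     # reused: no marker, state carried over

fresh = ToyShiftEncoder().encode("D")  # a new instance emits the marker
```

Here second lacks the shift marker that fresh contains, so the second
data set cannot be decoded correctly on its own.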

> > The other two entries were designed to provide stateful codec
> > interfaces, so your JIS codec would have to use those in order
> > to store shift states etc. or do more complex work on the data.
> First, as I said, I cannot use them as-is, since I need a file.
> Furthermore, are you saying that I can use codecs.lookup(enc)[:2] only
> for some encodings, not for others? That sounds like a huge design
> flaw.

These two APIs are exposed to simplify the interface for simple,
stateless encodings. Since most encodings work just fine with
these APIs, they are indeed very useful.

> > The encoder/decoder functions should only provide very basic
> > encoding/decoding facilities which do not require keeping
> > state (e.g. they might have additional keyword arguments to
> > parameterize them to work in different shift states).
> Arghh. Whether the facilities are basic or not depends on the
> encoding.
> So again I consider this broken, and the best fix is to allow the
> callable objects returned in codecs.lookup(enc)[:2] to maintain state
> if they want.
> Users must then either look them up again if they want to reuse them
> for different input, or they can recycle them if they happen to know
> that no state is maintained.

Again, this decision was by design: the codec registry lookup
mechanism caches the lookup tuples. With your proposal, the cache
would be rendered useless.
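
The caching can be observed with a small check. This is a sketch that
assumes a current CPython interpreter, where repeated lookups of the
same name are answered from the registry cache; other implementations
may behave differently.

```python
import codecs

# Two lookups of the same encoding name.
first = codecs.lookup("utf-8")
second = codecs.lookup("utf-8")

# On CPython the second lookup is served from the registry cache,
# so both names refer to the same tuple object.
same_object = first is second
print(same_object)
```

Since every caller receives the same cached object, any state stored
on its encoder or decoder would be shared by all of them.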

Marc-Andre Lemburg
CEO eGenix.com Software GmbH
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/