[Python-Dev] Codecs and StreamCodecs

Guido van Rossum guido@CNRI.Reston.VA.US
Tue, 16 Nov 1999 13:09:43 -0500


> > I have some questions about the constructor.  You seem to imply
> > that instantiating the class without arguments creates a codec without
> > state.  That's fine.  When given a stream argument, shouldn't the
> > direction of the stream be given as an additional argument, so the
> > proper state for encoding or decoding can be set up?  I can see that
> > for an implementation it might be more convenient to have separate
> > classes for encoders and decoders -- certainly the state being kept is
> > very different.
> 
> Wouldn't it be possible to have the read/write methods set up
> the state when called for the first time ?

Hm, I'd rather be explicit.  We don't do this for files either.

> Note that I wrote ".read() and/or .write() methods" in the proposal
> on purpose: you can of course implement Codecs which only implement
> one of them, i.e. Readers and Writers. The registry doesn't care
> about them anyway :-)
> 
> Then, if you use a Reader for writing, it will result in an
> AttributeError...
>  
> > Also, I don't want to ignore the alternative interface that was
> > suggested by /F.  It uses feed() similar to htmllib c.s.  This has
> > some advantages (although we might want to define some compatibility
> > so it can also feed directly into a file).
> 
> AFAIK, .feed() and .finalize() (or .close() etc.) have a different
> background: you add data in chunks and then process it at some
> final stage rather than for each feed. This is often more
> efficient.
> 
> With respect to codecs, this would mean that you buffer the
> output in memory, first doing only preliminary operations on
> the feeds and then apply some final logic to the buffer at
> the time .finalize() is called.

This is part of the purpose, yes.

> We could define a StreamCodec subclass for this kind of operation.

The difference is that to decode from a file, your proposed interface
is to call read() on the codec which will in turn call read() on the
stream.  In /F's version, I call read() on the stream (getting multibyte
encoded data), feed() that to the codec, which in turn calls feed() to
some other back end -- perhaps another codec which in turn feed()s its
converted data to another file, perhaps an XML parser.
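
To make the contrast concrete, here is a minimal sketch of the two
styles.  The class names PullDecoder, FeedDecoder and Collector are
hypothetical and not part of either proposal, and the sketch leans on
codecs.getincrementaldecoder() from today's standard library purely as
a convenient stateful decoder; neither interface under discussion
prescribes it.

```python
import codecs
import io


class PullDecoder:
    """Pull style: the caller calls read() on the codec,
    which in turn calls read() on the underlying stream."""

    def __init__(self, stream, encoding):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)()

    def read(self, size=-1):
        while True:
            data = self.stream.read(size)
            # An empty read means end of stream: flush decoder state.
            text = self.decoder.decode(data, final=not data)
            # Keep reading if we only got part of a multibyte character.
            if text or not data:
                return text


class FeedDecoder:
    """Push style: the caller reads the stream and feed()s the bytes in;
    decoded text is feed()ed on to some other back end."""

    def __init__(self, backend, encoding):
        self.backend = backend  # any object with a feed() method
        self.decoder = codecs.getincrementaldecoder(encoding)()

    def feed(self, data):
        text = self.decoder.decode(data)
        if text:
            self.backend.feed(text)

    def close(self):
        # Flush whatever the decoder is still holding.
        text = self.decoder.decode(b"", final=True)
        if text:
            self.backend.feed(text)


class Collector:
    """Hypothetical back end (stand-in for a file or an XML parser)."""

    def __init__(self):
        self.chunks = []

    def feed(self, text):
        self.chunks.append(text)


raw = "naïve café".encode("utf-8")

# Pull: the codec drives the stream.
pull = PullDecoder(io.BytesIO(raw), "utf-8")
pulled = ""
while True:
    chunk = pull.read(3)
    if not chunk:
        break
    pulled += chunk

# Push: the caller drives the stream and feeds the codec.
sink = Collector()
push = FeedDecoder(sink, "utf-8")
stream = io.BytesIO(raw)
while True:
    data = stream.read(3)
    if not data:
        break
    push.feed(data)
push.close()
pushed = "".join(sink.chunks)
```

In the push version the back end could just as well be another
FeedDecoder, which is what makes chaining natural in that style.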

--Guido van Rossum (home page: http://www.python.org/~guido/)