[Python-Dev] Codecs and StreamCodecs

Tue, 16 Nov 1999 18:53:46 +0100

Guido van Rossum wrote:
> 
> > It is not required by the unicodec.register() API to provide a
> > subclass of these base classes; only the given methods must be present.
> > This allows writing Codecs as extension types.  All Codecs must
> > provide the .encode()/.decode() methods. Codecs having the .read()
> > and/or .write() methods are considered to be StreamCodecs.
> >
> > The Unicode implementation will by itself only use the
> > stateless .encode() and .decode() methods.
> >
> > All other conversions have to be done by explicitly instantiating
> > the appropriate [Stream]Codec.
> 
> Looks okay, although I'd like someone to implement a simple
> shift-state-based stream codec to check this out further.
> 
> I have some questions about the constructor.  You seem to imply
> that instantiating the class without arguments creates a codec without
> state.  That's fine.  When given a stream argument, shouldn't the
> direction of the stream be given as an additional argument, so the
> proper state for encoding or decoding can be set up?  I can see that
> for an implementation it might be more convenient to have separate
> classes for encoders and decoders -- certainly the state being kept is
> very different.

Wouldn't it be possible to have the .read()/.write() methods set up
the encoding or decoding state when called for the first time?
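
A minimal sketch of that idea in present-day Python, which also doubles
as the kind of simple shift-state stream codec Guido asks for. The toy
encoding, the class name ToyShiftStreamCodec and the SO/SI byte values
are assumptions made for illustration, not part of the proposal: ASCII
passes through unchanged, U+0100..U+017F are written as the byte
(code point - 0x100) after an SO byte, and SI shifts back. The
constructor takes only the stream; read() and write() each create their
own state on first use, and that state carries over between calls.

```python
import io

SO, SI = 0x0E, 0x0F  # shift-out / shift-in bytes of the (hypothetical) toy encoding


class ToyShiftStreamCodec:
    """Hypothetical shift-state StreamCodec.

    No direction argument is needed: read() sets up the decoding state
    and write() sets up the encoding state the first time each is called.
    """

    def __init__(self, stream):
        self.stream = stream
        self._read_shifted = None   # decoding state, created on first read()
        self._write_shifted = None  # encoding state, created on first write()

    def read(self, size=-1):
        if self._read_shifted is None:
            self._read_shifted = False  # first call: start unshifted
        out = []
        for b in self.stream.read(size):
            if b == SO:
                self._read_shifted = True
            elif b == SI:
                self._read_shifted = False
            elif b >= 0x80:
                raise ValueError(f"invalid byte 0x{b:02X}")
            else:
                out.append(chr(0x100 + b if self._read_shifted else b))
        return "".join(out)

    def write(self, text):
        if self._write_shifted is None:
            self._write_shifted = False  # first call: start unshifted
        out = bytearray()
        for ch in text:
            cp = ord(ch)
            if cp < 0x80:
                shifted, b = False, cp
            elif 0x100 <= cp < 0x180:
                shifted, b = True, cp - 0x100
            else:
                raise ValueError(f"U+{cp:04X} is not encodable")
            if b in (SO, SI):
                raise ValueError("payload byte collides with SO/SI")
            if shifted != self._write_shifted:
                out.append(SO if shifted else SI)  # emit a shift only on change
                self._write_shifted = shifted
            out.append(b)
        self.stream.write(bytes(out))
```

Writing "ab" and then "\u0141c" produces b"ab\x0eA\x0fc". Reading that
back with read(3) and then read() yields "ab" and "\u0141c": the SO byte
consumed by the first call leaves the reader shifted for the second.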

Note that I wrote ".read() and/or .write() methods" in the proposal
on purpose: you can of course implement Codecs which only implement
one of them, i.e. Readers and Writers. The registry doesn't care
about them anyway :-)

Then, if you use a Reader for writing, it will result in an
AttributeError...
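
A sketch of this duck typing, assuming simplified encode(text) and
decode(bytes) signatures (the exact method signatures are not spelled
out here). The helpers is_codec() and is_stream_codec() and the class
Latin1Reader are hypothetical; the helpers only illustrate the
classification, since the registry itself does not check for .read()
or .write().

```python
import io


def is_codec(obj):
    """Hypothetical check: every Codec must provide .encode() and .decode()."""
    return callable(getattr(obj, "encode", None)) and callable(
        getattr(obj, "decode", None)
    )


def is_stream_codec(obj):
    """Hypothetical check: a Codec with .read() and/or .write() is a StreamCodec."""
    return is_codec(obj) and (hasattr(obj, "read") or hasattr(obj, "write"))


class Latin1Reader:
    """Hypothetical Reader: a StreamCodec that implements only .read()."""

    def __init__(self, stream=None):
        self.stream = stream

    def encode(self, text):
        return text.encode("latin-1")

    def decode(self, data):
        return data.decode("latin-1")

    def read(self, size=-1):
        # Decode whatever the underlying byte stream returns.
        return self.decode(self.stream.read(size))
```

Latin1Reader(io.BytesIO(b"caf\xe9")).read() returns "caf\u00e9", while
calling .write() on the same object raises AttributeError, simply
because the method does not exist.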

> Also, I don't want to ignore the alternative interface that was
> suggested by /F.  It uses feed() similar to htmllib c.s.  This has
> some advantages (although we might want to define some compatibility
> so it can also feed directly into a file).

AFAIK, .feed() and .finalize() (or .close() etc.) have a different
background: you add data in chunks and then process it at some
final stage rather than for each feed. This is often more
efficient.

With respect to codecs, this would mean that you buffer the
output in memory, doing only preliminary operations on each
feed and applying the final logic to the buffer when
.finalize() is called.

We could define a StreamCodec subclass for this kind of operation.
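
A minimal sketch of such a subclass, assuming UTF-8 input and
simplified encode()/decode() signatures. Both StreamCodec here (a
stand-in for the proposed base class) and FeedDecoder are hypothetical.
feed() only buffers the chunk; finalize() decodes the joined buffer in
one go and, if a stream was given, also writes the result to it.

```python
import io


class StreamCodec:
    """Hypothetical minimal stand-in for the proposed base class."""

    def __init__(self, stream=None, encoding="utf-8"):
        self.stream = stream
        self.encoding = encoding

    def encode(self, text):
        return text.encode(self.encoding)

    def decode(self, data):
        return data.decode(self.encoding)


class FeedDecoder(StreamCodec):
    """Hypothetical feed()/finalize() variant of a StreamCodec."""

    def __init__(self, stream=None, encoding="utf-8"):
        super().__init__(stream, encoding)
        self._chunks = []

    def feed(self, data):
        # Preliminary work only: remember the chunk.
        self._chunks.append(bytes(data))

    def finalize(self):
        # Final logic over the whole buffer, then reset it.
        text = self.decode(b"".join(self._chunks))
        self._chunks = []
        if self.stream is not None:
            self.stream.write(text)  # optionally feed straight into a file
        return text
```

Because decoding is deferred, a character split across two feed()
calls (b"caf\xc3" followed by b"\xa9!") comes out correctly as
"caf\u00e9!" from finalize(); decoding b"caf\xc3" on its own would raise
UnicodeDecodeError.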

> Perhaps someone should go ahead and implement prototype codecs using
> either paradigm and then write some simple apps, so we can make a
> better decision.
> 
> In any case I think the specs for the codec registry API aren't on the
> critical path, integration of /F's basic unicode object is the first
> thing we need.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    45 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/