[Python-Dev] Codecs and StreamCodecs

Guido van Rossum guido@CNRI.Reston.VA.US
Tue, 16 Nov 1999 11:20:28 -0500


> It is not required by the unicodec.register() API to provide a
> subclass of these base classes; only the given methods must be present;
> this allows writing Codecs as extension types.  All Codecs must
> provide the .encode()/.decode() methods. Codecs having the .read()
> and/or .write() methods are considered to be StreamCodecs.
> 
> The Unicode implementation will by itself only use the
> stateless .encode() and .decode() methods.
> 
> All other conversions have to be done by explicitly instantiating
> the appropriate [Stream]Codec.

Looks okay, although I'd like someone to implement a simple
shift-state-based stream codec to check this out further.
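As a starting point for such a shift-state stream codec, here is a toy sketch. It rests on a made-up assumption: the SO (0x0E) and SI (0x0F) bytes switch between plain 7-bit ASCII and a hypothetical "shifted" repertoire in which byte b stands for U+0400 + b. The mapping and the class names are illustrative only. The shift state is kept on the instance, so it carries over between separate write() or read() calls.

```python
import io

SO, SI = 0x0E, 0x0F   # shift-out / shift-in control bytes
OFFSET = 0x0400       # toy mapping: shifted byte b <-> U+0400 + b

class ShiftStreamWriter:
    """Encodes text onto a byte stream. The shift state lives on the
    instance, so SO/SI are emitted only when the state actually changes,
    even across separate write() calls."""

    def __init__(self, stream):
        self.stream = stream
        self.shifted = False

    def write(self, text):
        out = bytearray()
        for ch in text:
            cp = ord(ch)
            if cp < 0x80 and cp not in (SO, SI):
                need_shift, byte = False, cp
            elif 0x20 <= cp - OFFSET < 0x80:
                need_shift, byte = True, cp - OFFSET
            else:
                raise UnicodeError("toy codec cannot encode %r" % ch)
            if need_shift != self.shifted:
                out.append(SO if need_shift else SI)
                self.shifted = need_shift
            out.append(byte)
        self.stream.write(bytes(out))

    def close(self):
        # Return to the initial (unshifted) state at the end of the output.
        if self.shifted:
            self.stream.write(bytes([SI]))
            self.shifted = False

class ShiftStreamReader:
    """Decodes a byte stream; an SO seen in one read() call still
    applies to the bytes returned by the next one."""

    def __init__(self, stream):
        self.stream = stream
        self.shifted = False

    def read(self, size=-1):
        chars = []
        for byte in self.stream.read(size):
            if byte == SO:
                self.shifted = True
            elif byte == SI:
                self.shifted = False
            else:
                chars.append(chr(byte + OFFSET) if self.shifted else chr(byte))
        return "".join(chars)

buf = io.BytesIO()
w = ShiftStreamWriter(buf)
w.write("ab\u0441")     # switches to the shifted set mid-call
w.write("\u0442c")      # still shifted on entry, so no second SO
w.close()
print(buf.getvalue())   # b'ab\x0eAB\x0fc'

buf.seek(0)
r = ShiftStreamReader(buf)
parts = [r.read(3) for _ in range(3)]   # chunk boundaries fall after SO and SI
print(parts == ["ab", "\u0441\u0442", "c"])   # True
```

The encoder and decoder carry different state here: the writer tracks which set it last emitted, while the reader tracks which set the last control byte selected.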

I have some questions about the constructor.  You seem to imply
that instantiating the class without arguments creates a codec without
state.  That's fine.  When given a stream argument, shouldn't the
direction of the stream be given as an additional argument, so the
proper state for encoding or decoding can be set up?  I can see that
for an implementation it might be more convenient to have separate
classes for encoders and decoders -- certainly the state being kept is
very different.
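Here is one possible answer to the constructor question, sketched with hypothetical names. `StreamCodec()` with no arguments is a stateless codec. `StreamCodec(stream, direction)` sets up state for one direction only. For illustration, UTF-8 stands in for the encoding, and modern Python's `codecs.getincrementaldecoder` supplies the decoding state, which is needed when a multi-byte sequence is split across two read() calls.

```python
import codecs
import io

class StreamCodec:
    """Hypothetical sketch: StreamCodec() is stateless;
    StreamCodec(stream, direction) is stateful for one direction."""

    def __init__(self, stream=None, direction=None, encoding="utf-8"):
        if (stream is None) != (direction is None) or direction not in (None, "r", "w"):
            raise ValueError("pass both a stream and a direction ('r' or 'w'), or neither")
        self.stream = stream
        self.direction = direction
        self.encoding = encoding
        # Only decoding carries state in this sketch: a multi-byte UTF-8
        # sequence may be split across two read() calls.
        if direction == "r":
            self._decoder = codecs.getincrementaldecoder(encoding)()
        else:
            self._decoder = None

    # Stateless methods, usable with or without a stream.
    def encode(self, text):
        return text.encode(self.encoding)

    def decode(self, data):
        return data.decode(self.encoding)

    # Stream methods, valid only in the direction chosen at construction.
    def read(self, size=-1):
        if self.direction != "r":
            raise io.UnsupportedOperation("not opened for reading")
        data = self.stream.read(size)
        # Flush pending bytes at EOF or when the whole stream was read.
        return self._decoder.decode(data, final=(size < 0 or not data))

    def write(self, text):
        if self.direction != "w":
            raise io.UnsupportedOperation("not opened for writing")
        self.stream.write(self.encode(text))

# Usage: "\u00e9" is two bytes in UTF-8 and is split across two reads.
reader = StreamCodec(io.BytesIO(b"caf\xc3\xa9"), "r")
first, second = reader.read(4), reader.read(4)
print(repr(first), repr(second))   # 'caf' 'é'

out = io.BytesIO()
StreamCodec(out, "w").write("caf\u00e9")
print(out.getvalue())              # b'caf\xc3\xa9'
```

In this sketch the encoding side happens to need no state, so one class with a direction flag stays small. For a shift-state codec, both directions keep different state, and that is the case where separate encoder and decoder classes start to look more convenient.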

Also, I don't want to ignore the alternative interface that was
suggested by /F.  It uses feed(), similar to htmllib and friends.  This has
some advantages (although we might want to define some compatibility
glue so that it can also feed directly into a file).
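The push-style alternative can be sketched as follows, in the spirit of the feed() method on htmllib's parsers. The class name and the consumer-callback design are assumptions for illustration and are not /F's actual proposal. Passing a file's `write` method as the consumer is one simple form of the compatibility that would let it feed directly into a file.

```python
import codecs
import io

class FeedDecoder:
    """Hypothetical push-style decoder: bytes go in through feed(),
    decoded text is handed to `consumer`, any callable taking a str."""

    def __init__(self, consumer, encoding="utf-8"):
        self._decoder = codecs.getincrementaldecoder(encoding)()
        self._consumer = consumer

    def feed(self, data):
        # Incomplete multi-byte sequences are held back until more data arrives.
        text = self._decoder.decode(data)
        if text:
            self._consumer(text)

    def close(self):
        # Flush; raises UnicodeDecodeError if input stopped mid-character.
        text = self._decoder.decode(b"", final=True)
        if text:
            self._consumer(text)

# Collecting pieces in a list...
pieces = []
d = FeedDecoder(pieces.append)
d.feed(b"caf\xc3")
d.feed(b"\xa9!")
d.close()
print(pieces)          # ['caf', 'é!']

# ...or feeding straight into a file-like object via its write method.
out = io.StringIO()
d = FeedDecoder(out.write)
for chunk in (b"na\xc3", b"\xafve"):
    d.feed(chunk)
d.close()
print(out.getvalue())  # naïve
```

With this design, the caller pushes data in whenever it arrives, and the codec never needs to know what kind of object, if any, sits on the other end.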

Perhaps someone should go ahead and implement prototype codecs using
either paradigm and then write some simple apps, so we can make a
better decision.

In any case, I think the specs for the codec registry API aren't on the
critical path; integration of /F's basic Unicode object is the first
thing we need.

--Guido van Rossum (home page: http://www.python.org/~guido/)