[Python-Dev] Some thoughts on the codecs...

Andy Robinson andy@robanal.demon.co.uk
Tue, 16 Nov 1999 00:09:28 GMT


On Mon, 15 Nov 1999 23:54:38 +0100, you wrote:

>[I'll get back on this tomorrow, just some quick notes here...]
>The Codecs provide implementations for encoding and decoding,
>they are not intended as complete wrappers for e.g. files or
>sockets.
>
>The unicodec module will define a generic stream wrapper
>(which is yet to be defined) for dealing with files, sockets,
>etc. It will use the codec registry to do the actual codec
>work.
> 
>XXX unicodec.file(<filename>,<mode>,<encname>) could be provided as
>    short-hand for unicodec.file(open(<filename>,<mode>),<encname>) which
>    also assures that <mode> contains the 'b' character when needed.
>
>The Codec interface defines two pairs of methods
>on purpose: one which works internally (ie. directly between
>strings and Unicode objects), and one which works externally
>(directly between a stream and Unicode objects).

That's the problem Guido and I are worried about.  Your present API is
not enough to build stream encoders.  The 'slurp it into a unicode
string in one go' approach fails for big files or for network
connections.  And you just cannot build a generic stream reader/writer
by slicing the data into strings and converting each slice on its own:
a slice boundary can fall in the middle of a multi-byte character or a
shift sequence.  The solution must be specific to the codec - only it
knows how much to buffer, when to flip states, etc.
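
To illustrate the slicing problem, here is a minimal sketch. It assumes a
present-day Python 3 and its standard codecs module, not the API under
discussion in this thread; the two function names are made up for the
example. It decodes the same byte chunks twice: once chunk by chunk with no
shared state, and once through the codec's own incremental decoder, which
holds back incomplete byte sequences until the rest arrives.

```python
import codecs


def decode_chunks_naively(chunks, encoding="utf-8"):
    # Hypothetical helper: decode every chunk independently, the way a
    # generic "slice it into strings" reader would have to.
    return "".join(chunk.decode(encoding, errors="replace") for chunk in chunks)


def decode_chunks_statefully(chunks, encoding="utf-8"):
    # Hypothetical helper: let the codec keep its own buffer between chunks.
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes the decoder and reports any dangling partial input.
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


text = "na\u00efve \u20ac"          # ends with a 3-byte UTF-8 character
data = text.encode("utf-8")
chunks = [data[:-1], data[-1:]]     # split inside the last character

naive = decode_chunks_naively(chunks)
stateful = decode_chunks_statefully(chunks)
```

With the split placed inside the final character, the naive version cannot
reproduce the original text, while the codec-driven version can, because
only the codec knows that the first chunk ended mid-character.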

So the codec should provide proper stream reading and writing
services.  

Unicodec can then wrap those up in labour-saving ways - I'm not fussy
which but I like the one-line file-open utility.
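
A rough sketch of what such a one-line file-open utility could look like,
again assuming a present-day Python 3 codecs module instead of the proposed
unicodec module. The name open_encoded and its argument order are
hypothetical, modelled on the unicodec.file(<filename>,<mode>,<encname>)
shorthand quoted above. The stream reader and writer classes come from the
codec itself, so each codec decides how to buffer and when to change state.

```python
import codecs
import os
import tempfile


def open_encoded(filename, mode, encname):
    # Hypothetical convenience wrapper; not an existing library function.
    # Force binary mode so the codec sees the raw bytes.
    if "b" not in mode:
        mode += "b"
    raw = open(filename, mode)
    info = codecs.lookup(encname)   # registry lookup returns a CodecInfo
    if "r" in mode and "+" not in mode:
        return info.streamreader(raw)
    return info.streamwriter(raw)


path = os.path.join(tempfile.mkdtemp(), "demo.txt")

writer = open_encoded(path, "w", "utf-16")
writer.write("caf\u00e9")           # first write emits the byte order mark
writer.write(" ol\u00e9")           # later writes must not repeat it
writer.close()

reader = open_encoded(path, "r", "utf-16")
text = reader.read()
reader.close()

with open(path, "rb") as f:
    raw_bytes = f.read()
```

UTF-16 is used here because its writer has to remember whether the byte
order mark has already been written, a small case of the state that only
the codec can track across calls.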


- Andy