[Python-Dev] Stateful codecs [Was: str object going in Py3K]

Walter Dörwald walter at livinglogic.de
Fri Feb 17 15:38:24 CET 2006


M.-A. Lemburg wrote:

> Walter Dörwald wrote:
>> Guido van Rossum wrote:
>>
>>> [...]
>>> Years ago I wrote a prototype; check out sandbox/sio/.
>> However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
>> for encodings that need state (e.g. when reading/writing UTF-16).
>> Switching to stateful encoders/decoders isn't so easy, because the
>> stateful codecs require a stream-API, which brings in a whole bunch of
>> other functionality (readline() etc.), which we'd probably like to keep
>> separate. I have a patch (http://bugs.python.org/1101097) that should
>> fix this problem (at least for all codecs derived from
>> codecs.StreamReader/codecs.StreamWriter). Additionally it would make
>> stateful codecs more useful in the context of iterators/generators.
>>
>> I'd like this patch to go into 2.5.
> 
> The patch as-is won't go into 2.5. It's simply the wrong approach:
> StreamReaders and -Writers work on streams (hence the name). It
> doesn't make sense adding functionality to side-step this behavior,
> since it undermines the design.

I agree that using a StreamWriter without a stream somehow feels wrong.

> Like I suggested in the patch discussion, such functionality could
> be factored out of the implementations of StreamReaders/Writers
> and put into new StatefulEncoder/Decoder classes, the objects of
> which then get used by StreamReader/Writer.
> 
> In addition to that we could extend the codec registry to also
> maintain slots for the stateful encoders and decoders, if needed.

We *have* to do it like this; otherwise there would be no way to get a
StatefulEncoder/Decoder from an encoding name.
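
To make the proposed split concrete, here is a minimal sketch of a
stream-free decoder object and a reader that delegates to it. It is not
the patch and not an existing API: the class names StatefulDecoder and
Reader, their signatures, and the choice of UTF-16-LE are assumptions
made for illustration, and the code is written for Python 3 so that it
runs as-is.

```python
import codecs
import io


class StatefulDecoder:
    """Hypothetical decoder that keeps undecoded bytes between calls."""

    def __init__(self, errors="strict"):
        self.errors = errors
        self.buffer = b""  # incomplete trailing bytes from the last call

    def decode(self, data, final=False):
        data = self.buffer + data
        # utf_16_le_decode returns (text, number_of_bytes_consumed); with
        # final=False an incomplete trailing code unit is left unconsumed.
        text, consumed = codecs.utf_16_le_decode(data, self.errors, final)
        self.buffer = data[consumed:]
        return text


class Reader:
    """Hypothetical stream wrapper; all codec state lives in the decoder."""

    def __init__(self, stream, decoder):
        self.stream = stream
        self.decoder = decoder

    def read(self, size=-1):
        data = self.stream.read(size)
        # Treat "read everything" and end-of-stream as the final chunk.
        return self.decoder.decode(data, final=size < 0 or not data)


# Reading in 3-byte chunks splits 2-byte code units across calls.
stream = io.BytesIO("héllo".encode("utf-16-le"))
reader = Reader(stream, StatefulDecoder())
pieces = [reader.read(3) for _ in range(5)]
```

The decoder object can also be fed directly from an iterator or
generator of byte chunks, without any stream involved.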

Does this mean that codecs.lookup() would have to return a 6-tuple? But 
this would break if someone uses codecs.lookup("foo")[-1]. So maybe 
codecs.lookup() should return an instance of a subclass of tuple which 
has the StatefulEncoder/Decoder as attributes. But then codecs.lookup() 
must be able to handle old 4-tuples returned by old search functions and 
update those to the new 6-tuples. (We could drop this compatibility code
again after several releases, once all third-party codecs are updated.)
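
A rough sketch of the tuple-subclass idea, under the assumption that the
result keeps behaving as a 4-tuple for indexing and unpacking, with the
two new entries available only as attributes. The names CodecTuple and
upgrade and the attribute names are hypothetical; the code is written
for Python 3 so that it runs as-is.

```python
import codecs


class CodecTuple(tuple):
    """Hypothetical lookup result: a 4-tuple with extra attributes."""

    def __new__(cls, encode, decode, streamreader, streamwriter,
                statefulencoder=None, statefuldecoder=None):
        # Only the four classic entries are tuple items, so existing
        # code such as codecs.lookup("foo")[-1] keeps working.
        self = super().__new__(
            cls, (encode, decode, streamreader, streamwriter))
        self.encode = encode
        self.decode = decode
        self.streamreader = streamreader
        self.streamwriter = streamwriter
        self.statefulencoder = statefulencoder
        self.statefuldecoder = statefuldecoder
        return self


def upgrade(result):
    """Wrap a plain 4-tuple from an old search function (hypothetical)."""
    if isinstance(result, CodecTuple):
        return result
    if not isinstance(result, tuple) or len(result) != 4:
        raise TypeError("codec search functions must return 4-tuples")
    # Old codecs provide no stateful classes; the attributes stay None.
    return CodecTuple(*result)


# A plain 4-tuple, as an old-style search function would return it.
old_style = (codecs.utf_8_encode, codecs.utf_8_decode,
             codecs.getreader("utf-8"), codecs.getwriter("utf-8"))
info = upgrade(old_style)
```

Callers that need a stateful codec would then check whether
info.statefulencoder is None before using it.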

Bye,
    Walter Dörwald


