[Python-Dev] Stateful codecs [Was: str object going in Py3K]

M.-A. Lemburg mal at egenix.com
Fri Feb 17 16:04:09 CET 2006

Walter Dörwald wrote:
> M.-A. Lemburg wrote:
>> Walter Dörwald wrote:
>>> Guido van Rossum wrote:
>>>> [...]
>>>> Years ago I wrote a prototype; checkout sandbox/sio/.
>>> However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
>>> for encodings that need state (e.g. when reading/writing UTF-16).
>>> Switching to stateful encoders/decoders isn't so easy, because the
>>> stateful codecs require a stream-API, which brings in a whole bunch of
>>> other functionality (readline() etc.), which we'd probably like to keep
>>> separate. I have a patch (http://bugs.python.org/1101097) that should
>>> fix this problem (at least for all codecs derived from
>>> codecs.StreamReader/codecs.StreamWriter). Additionally it would make
>>> stateful codecs more useful in the context of iterators/generators.
>>> I'd like this patch to go into 2.5.
>> The patch as-is won't go into 2.5. It's simply the wrong approach:
>> StreamReaders and -Writers work on streams (hence the name). It
>> doesn't make sense to add functionality to side-step this behavior,
>> since it undermines the design.
> I agree that using a StreamWriter without a stream somehow feels wrong.
>> Like I suggested in the patch discussion, such functionality could
>> be factored out of the implementations of StreamReaders/Writers
>> and put into new StatefulEncoder/Decoder classes, the objects of
>> which then get used by StreamReader/Writer.
>> In addition to that we could extend the codec registry to also
>> maintain slots for the stateful encoders and decoders, if needed.
> We *have* to do it like this; otherwise there would be no way to get a
> StatefulEncoder/Decoder from an encoding name.
> Does this mean that codecs.lookup() would have to return a 6-tuple? 
> But this would break if someone uses codecs.lookup("foo")[-1].

Right; though I'd much rather see people use the direct
codecs module lookup APIs:

getencoder(), getdecoder(), getreader() and getwriter()

instead of using codecs.lookup() directly.
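A minimal sketch of those direct lookup functions, written against a modern Python 3 `codecs` module (the 2006 API differed in details such as the separate str/unicode types). Each function returns one piece of what `codecs.lookup()` hands back positionally; the sample text and variable names are illustrative only:

```python
import codecs
import io

text = "h\u00e9llo"

# Stateless functions: each call returns (result, length_consumed).
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")
data, consumed = encode(text)      # data == b"h\xc3\xa9llo", consumed == 5
roundtrip, used = decode(data)     # roundtrip == text, used == 6 bytes

# Stream factories: classes that wrap a file-like byte stream.
Writer = codecs.getwriter("utf-8")
buf = io.BytesIO()
Writer(buf).write(text)

Reader = codecs.getreader("utf-8")
read_back = Reader(io.BytesIO(buf.getvalue())).read()
```

Code written this way never indexes into the tuple, so it does not care how many slots `codecs.lookup()` returns, unlike code such as `codecs.lookup("foo")[-1]`.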

> So maybe
> codecs.lookup() should return an instance of a subclass of tuple which
> has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
> must be able to handle old 4-tuples returned by old search functions and
> update those to the new 6-tuples. (But we could drop this again after
> several releases, once all third party codecs are updated).

This was a design error: I should not have made
codecs.lookup() a documented function.

I'd suggest we keep codecs.lookup() the way it is and
instead add new functions to the codecs module, e.g.
codecs.getencoderobject() and codecs.getdecoderobject().
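For illustration only: Python 2.5 later added stateful codec objects along these lines, under the names `codecs.getincrementalencoder()` and `codecs.getincrementaldecoder()` rather than the `getencoderobject()`/`getdecoderobject()` names proposed here, and made `codecs.lookup()` return a `CodecInfo` tuple subclass. A minimal sketch using that later API (Python 3 syntax) shows why UTF-16 needs state when input arrives in arbitrary chunks; the chunk boundaries are arbitrary choices:

```python
import codecs

data = "hello".encode("utf-16")           # BOM (2 bytes) + 5 code units (10 bytes)
chunks = [data[:3], data[3:7], data[7:]]  # both splits fall mid code unit

# The stateless decoder treats each call as complete input, so an
# odd-length chunk is reported as truncated data.
stateless = codecs.getdecoder("utf-16")
try:
    stateless(chunks[0])
    stateless_failed = False
except UnicodeDecodeError:
    stateless_failed = True

# An incremental decoder remembers the byte order taken from the BOM and
# any leftover byte between calls; final=True flushes and checks the end.
decoder = codecs.getincrementaldecoder("utf-16")()
pieces = [decoder.decode(chunk) for chunk in chunks]
pieces.append(decoder.decode(b"", final=True))
text = "".join(pieces)

# lookup() still behaves like a 4-tuple; the new factories are attributes only.
info = codecs.lookup("utf-16")
```

Exposing the new factories as attributes rather than extra tuple slots keeps `codecs.lookup(...)[-1]` pointing at the stream writer, which is the compatibility concern raised above.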

Changing the codec registration is not much of a problem:
we could simply allow 6-tuples to be passed into the

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Feb 17 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
