[Python-Dev] Stateful codecs [Was: str object going in Py3K]

Fri Feb 17 22:11:35 CET 2006

Walter Dörwald wrote:
>>>> I'd suggest we keep codecs.lookup() the way it is and
>>>> instead add new functions to the codecs module, e.g.
>>>> codecs.getencoderobject() and codecs.getdecoderobject().
>>>>
>>>> Changing the codec registration is not much of a problem:
>>>> we could simply allow 6-tuples to be passed into the
>>>> registry.
>>> OK, so codecs.lookup() returns 4-tuples, but the registry stores
>>> 6-tuples and the search functions must return 6-tuples. And we add
>>> codecs.getencoderobject() and codecs.getdecoderobject() as well as new
>>> classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
>>> old search functions that return 4-tuples?
>>
>> The registry should then simply set the missing entries to None
>> and the getencoderobject()/getdecoderobject() would then have
>> to raise an error.
> 
> Sounds simple enough and we don't lose backwards compatibility.
> 
>> Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
> 
> +1, but I'd like to have a replacement for this, i.e. a function that
> returns all info the registry has about an encoding:
> 
> 1. Name
> 2. Encoder function
> 3. Decoder function
> 4. Stateful encoder factory
> 5. Stateful decoder factory
> 6. Stream writer factory
> 7. Stream reader factory
> 
> and if this is an object with attributes, we won't have any problems if
> we extend it in the future.

Shouldn't be a problem: just expose the registry dictionary
via the _codecs module.

The rest can then be done in a Python function defined in
codecs.py using a CodecInfo class.
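
As a rough sketch of that idea (not the actual codecs.py code), the
following shows how a registry entry could be wrapped in an object with
attributes. The names CodecInfoSketch and getencoderobject_sketch(), the
attribute names, and the order of the six tuple slots are all
assumptions made for illustration. Old-style 4-tuples are padded with
None, and asking for a missing stateful encoder raises an error.

```python
import codecs


class CodecInfoSketch:
    """Hypothetical attribute-based view of one registry entry."""

    def __init__(self, name, entry):
        entry = tuple(entry)
        if len(entry) not in (4, 6):
            raise ValueError("registry entries must be 4- or 6-tuples")
        # Pad an old-style 4-tuple so the stateful slots become None.
        entry = entry + (None,) * (6 - len(entry))
        # Assumed slot order; the thread does not fix one.
        (self.encoder, self.decoder,
         self.streamreader, self.streamwriter,
         self.statefulencoder, self.statefuldecoder) = entry
        self.name = name


def getencoderobject_sketch(info):
    """Return a new stateful encoder, or raise if the codec has none."""
    if info.statefulencoder is None:
        raise LookupError("codec %r has no stateful encoder" % info.name)
    return info.statefulencoder()


# An old search function would hand back only four entries.
legacy = CodecInfoSketch(
    "latin-1",
    (codecs.latin_1_encode, codecs.latin_1_decode, None, None),
)
```

Because callers read attributes rather than unpack a tuple, further
entries could be added later without breaking existing code.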

> BTW, if we change the API, can we fix the return value of the stateless
> functions? As the stateless function always encodes/decodes the complete
> string, returning the length of the string doesn't make sense.
> codecs.getencoder() and codecs.getdecoder() would have to continue to
> return the old variant of the functions, but
> codecs.getinfo("latin-1").encoder would be the new encoding function.

No: you can still write stateless encoders or decoders that do
not process the whole input string. Just because we don't have
any of those in Python doesn't mean that they can't be written
and used. A stateless codec might want to leave it to the caller
to buffer any bytes at the end of the input data which cannot
yet be processed. It is also possible to write stateful codecs
on top of such stateless encoding and decoding functions.
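
To illustrate why the returned length can matter, here is a toy
example, not taken from any existing codec. hexpair_decode() is a
hypothetical stateless decoder that turns pairs of hex digits into
bytes and leaves a trailing unpaired digit unconsumed.
BufferingHexDecoder is an equally hypothetical stateful wrapper that
uses the returned length to keep the leftover for the next call.

```python
def hexpair_decode(text):
    """Toy stateless decoder: pairs of hex digits -> bytes.

    Returns (output, consumed). A trailing unpaired digit is not
    consumed, so consumed may be smaller than len(text).
    """
    usable = len(text) - (len(text) % 2)
    return bytes.fromhex(text[:usable]), usable


class BufferingHexDecoder:
    """Toy stateful decoder built on the stateless function above."""

    def __init__(self):
        self.pending = ""  # input not yet consumed by hexpair_decode()

    def decode(self, text):
        data = self.pending + text
        output, consumed = hexpair_decode(data)
        # Keep whatever the stateless function did not process.
        self.pending = data[consumed:]
        return output


# Stateless use: the caller sees that only 4 of 5 characters were used.
output, consumed = hexpair_decode("48656")

# Stateful use: the wrapper buffers the odd digit between calls.
decoder = BufferingHexDecoder()
first = decoder.decode("486")
second = decoder.decode("5")
```

If the stateless function returned only the output, the wrapper would
have no way to tell how much of the input is still outstanding.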

-- 
Marc-Andre Lemburg
eGenix.com
