[Python-Dev] Add transform() and untransform() methods

Nick Coghlan ncoghlan at gmail.com
Fri Nov 15 00:03:37 CET 2013

On 15 Nov 2013 08:34, "Victor Stinner" <victor.stinner at gmail.com> wrote:
> Hi,
> I saw that Nick Coghlan documented codecs.encode() and
> codecs.decode(), and changed the exception raised when codecs like
> rot_13 are used on bytes.decode() and str.encode().
> I don't like the functions codecs.encode() and codecs.decode() because
> the type of the result depends on the encoding (second parameter). We
> try to avoid this in Python.

The type signature of those functions is just object -> object (similar to
the way the 2.x convenience methods were actually basestring -> basestring).
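The object -> object signature is easy to see in practice. Below is a minimal sketch in which the same function returns bytes, str or bytes depending only on which codec is named. It uses the full codec module names `rot_13` and `zlib_codec` rather than short aliases.

```python
import codecs

# codecs.encode()/codecs.decode() are object -> object: the result type
# depends on the codec that is named, not on the function being called.
text_to_bytes = codecs.encode("café", "utf-8")           # str -> bytes
str_to_str = codecs.encode("hello", "rot_13")            # str -> str
bytes_to_bytes = codecs.encode(b"hello", "zlib_codec")   # bytes -> bytes

for value in (text_to_bytes, str_to_str, bytes_to_bytes):
    print(type(value).__name__, repr(value))

# codecs.decode() reverses each transform with the same codec name.
assert codecs.decode(text_to_bytes, "utf-8") == "café"
assert codecs.decode(str_to_str, "rot_13") == "hello"
assert codecs.decode(bytes_to_bytes, "zlib_codec") == b"hello"
```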

> I would prefer to split the registry of codecs to have 3 registries:
> - "encoding" (a better name can be found): encode str=>bytes, decode bytes=>str
> - bytes: encode bytes=>bytes, decode bytes=>bytes
> - str:  encode str=>str, decode str=>str

You have to get it out of your head that codecs are just about text and
binary data. They're not: they're arbitrary type transforms, and MAL
deliberately wrote the module that way.

> And add transform() and untransform() methods to bytes and str types.
> In practice, it might be the same codecs registry for all codecs, just with
> a new attribute.

This is completely the wrong approach. There's zero justification for
adding new builtin methods for this use case - encoding and decoding are
generic operations, they should use functions not methods.

What could be useful is allowing CodecInfo objects to supply an "expected
input type" and an "expected output type" (ABCs and instance check
overrides make that quite flexible).

> Examples:
> - utf8: encoding
> - zlib: bytes
> - rot13: str
> The result type of bytes.transform/untransform would be bytes, and the
> result type of str.transform/untransform would be str.
> I don't know which exception should be raised when a codec is used in
> the wrong method. LookupError? TypeError "codec xxx cannot be used
> with method xxx.xx"? Something else?
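The proposed semantics (result type equals input type) can be approximated without any new builtin methods. The sketch below is hypothetical: `transform()` and `untransform()` are plain functions built on `codecs.encode()`/`codecs.decode()`, not methods on `bytes` or `str`, and raising `TypeError` on a mismatch is just one of the options listed above, not settled behaviour.

```python
import codecs

def _check_same_type(obj, result, encoding, op):
    # Reject codecs that change the type, e.g. "utf-8" turning str into bytes.
    if type(result) is not type(obj):
        raise TypeError("codec %r cannot be used with %s.%s()"
                        % (encoding, type(obj).__name__, op))
    return result

def transform(obj, encoding):
    """Hypothetical same-type transform: bytes -> bytes or str -> str."""
    return _check_same_type(obj, codecs.encode(obj, encoding), encoding, "transform")

def untransform(obj, encoding):
    """Hypothetical inverse of transform(), with the same type check."""
    return _check_same_type(obj, codecs.decode(obj, encoding), encoding, "untransform")
```

For example, `transform("hello", "rot_13")` gives `"uryyb"`, while `transform("hello", "utf-8")` raises `TypeError` because that codec produces bytes from str.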

We already do this check in the existing convenience methods - it raises

> codecs.encode/decode() documentation should be removed. The functions
> should be kept, just in case someone uses them.

No. They're part of the regression test suite, and have been since Python
2.4. They embody MAL's intended "arbitrary type transform library"
approach. They provide a source-compatible mechanism for using binary
codecs in single-code-base Python 2/3 projects.
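As a rough illustration of the single-code-base point, the sketch below is written to run unchanged on Python 2.6+ and Python 3, since `codecs.encode()`/`codecs.decode()` exist in both, whereas the `str.encode("hex")` convenience method is Python 2 only. The sample data is arbitrary.

```python
import codecs

# Same source on Python 2.6+ and Python 3: b"" literals and the
# module-level codecs functions are available in both.
raw = b"\x00\xffPy"

hexed = codecs.encode(raw, "hex_codec")      # hexadecimal digits as bytes
packed = codecs.encode(raw, "zlib_codec")    # zlib-compressed bytes
b64 = codecs.encode(raw, "base64_codec")     # base64 bytes (with trailing newline)

# Each binary codec round-trips through codecs.decode().
assert codecs.decode(hexed, "hex_codec") == raw
assert codecs.decode(packed, "zlib_codec") == raw
assert codecs.decode(b64, "base64_codec") == raw
print(hexed)
```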

At this point, the only person that can get me to revert this clarification
of MAL's original vision for the codecs module is Guido, since anything
else completely fails to address the Python 3 adoption barrier posed by the
current state of Python 3's binary codec support.

Note that the only behavioural changes in the commits so far were to
exception handling - everything else was just docs.

The next planned commit (to restore the binary codec aliases) *is* a
behavioural change - that's why I posted to the list about it (it received
only two responses, both +1).
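Whether the short binary codec aliases are available can be probed at runtime. The sketch below is illustrative: `alias_available()` and the `PAIRS` list are hypothetical names, and the check simply compares what `codecs.lookup()` returns for the alias and for the full codec module name. The aliases ship in Python 3.4 and later; on earlier Python 3 releases the alias lookup raises `LookupError`.

```python
import codecs

# Hypothetical list of (short alias, full codec module name) pairs to probe.
PAIRS = [
    ("base64", "base64_codec"),
    ("hex", "hex_codec"),
    ("zlib", "zlib_codec"),
    ("rot13", "rot_13"),
]

def alias_available(alias, full_name):
    """Return True if `alias` resolves to the same codec as `full_name`."""
    try:
        return codecs.lookup(alias).name == codecs.lookup(full_name).name
    except LookupError:
        # Interpreters without the short aliases cannot resolve them.
        return False

for alias, full_name in PAIRS:
    print(alias, "->", full_name, alias_available(alias, full_name))
```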


> Victor
