[Python-Dev] transform() and untransform() methods, and the codec registry

R. David Murray rdmurray at bitdance.com
Fri Dec 3 16:11:29 CET 2010

On Fri, 03 Dec 2010 10:16:04 +0100, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> On Thursday 02 December 2010 19:06:51 georg.brandl wrote:
> > Author: georg.brandl
> > Date: Thu Dec  2 19:06:51 2010
> > New Revision: 86934
> >
> > Log:
> > #7475: add (un)transform method to bytes/bytearray and str, add back codecs
> > that can be used with them from Python 2.
> Oh no, someone did it. Was it really necessary to reintroduce rot13 and friends?
> I'm not strongly opposed to .transform()/.untransform() if they can be completely
> separated from text encodings (ascii, latin9, utf-8 & co.). But str.encode() and
> bytes.decode() do accept transform codec names and raise strange error
> messages. Quote from Martin von Löwis (#7475):
> "If the codecs are restored, one half of them becomes available to
> .encode/.decode methods, since the codec registry cannot tell which
> ones implement real character encodings, and which ones are other
> conversion methods. So adding them would be really confusing."
> >>> 'abc'.transform('hex')
> TypeError: 'str' does not support the buffer interface
> >>> b'abc'.transform('rot13')
> TypeError: expected an object with the buffer interface

I find these 'buffer interface' error messages to be the most confusing
error messages I get out of Python 3, no matter what context they show up
in.  I have no idea what they are telling me.  That issue is more
general than transform/untransform, but perhaps it could be fixed
for transform/untransform in particular.

> >>> b'abcd'.decode('hex')
> TypeError: decoder did not return a str object (type=bytes)
> >>> 'abc'.encode('rot13')
> TypeError: encoder did not return a bytes object (type=str)

These error messages make perfect sense to me.  I think it
is called "duck typing" :)
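
To make the point concrete, here is a minimal sketch of how str.encode()
behaves when it is handed a codec that is not a text encoding. The helper
name try_str_encode is hypothetical, and the exact exception type is an
assumption that depends on the interpreter version (the messages quoted
above are TypeErrors; later Python 3 releases are assumed to raise
LookupError instead), so the sketch accepts either:

```python
import codecs


def try_str_encode(text, codec_name):
    """Return the encoded bytes, or the exception raised by str.encode().

    Hypothetical helper for illustration only. Which exception type is
    raised for a non-text codec varies by Python version, so both
    TypeError and LookupError are caught.
    """
    try:
        return text.encode(codec_name)
    except (TypeError, LookupError) as exc:
        return exc


# A real character encoding works as expected.
print(try_str_encode("abc", "utf-8"))

# rot_13 maps str to str, so str.encode() refuses it one way or another.
print(repr(try_str_encode("abc", "rot_13")))

# The codecs module functions accept arbitrary codecs.
print(codecs.encode("abc", "rot_13"))
```

Either way the method call fails cleanly, while the same codec stays
reachable through the codecs module.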

> I don't like transform() and untransform() because I think that we should not
> add too many operations to the base types (bytes and str), and they do
> implicit module import. I prefer explicit module import (e.g. import binascii;
> binascii.hexlify(b'to hex')). It reminds me of PHP and its ugly namespace with
> 5000+ functions. I prefer Python because it uses smaller and more numerous
> namespaces which are more specific and well defined. If we add email and
> compression functions to bytes, why not add a web browser to str?

As MAL says, the codec machinery is a general purpose tool.  I think
it, and the transform methods, are a useful level of abstraction over
a general class of problems.
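
As a rough illustration of the two styles being compared, the sketch
below does the same hex conversion once through an explicit binascii
call and once through the codec registry. It uses codecs.encode() and
codecs.decode() rather than the transform()/untransform() methods, since
those methods are not assumed to exist in the interpreter you are
running. The codec names "hex_codec" and "rot_13" are the long-form
names; whether the short aliases "hex" and "rot13" are registered is
version dependent. The function names are hypothetical:

```python
import binascii
import codecs


def to_hex_explicit(data):
    """Hex-encode bytes with an explicit module import, as Victor prefers."""
    return binascii.hexlify(data)


def to_hex_codec(data):
    """Hex-encode bytes by looking the transform up in the codec registry."""
    return codecs.encode(data, "hex_codec")


def rot13(text):
    """Apply the str-to-str rot_13 codec via the same registry lookup."""
    return codecs.encode(text, "rot_13")


# Both spellings produce the same bytes-to-bytes result.
print(to_hex_explicit(b"to hex"))
print(to_hex_codec(b"to hex"))

# The codec form is reversible through the same generic interface.
print(codecs.decode(to_hex_codec(b"abc"), "hex_codec"))

# rot_13 is its own inverse.
print(rot13(rot13("abc")))
```

The codec form buys a single uniform entry point keyed by name; the
explicit form keeps the dependency on binascii visible at the call site.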

Please also recall that transform/untransform was discussed before
the release of Python 3.0 and was approved at the time, but it just
did not get implemented before the 3.0 release.

R. David Murray                                      www.bitdance.com
