[Python-Dev] transform() and untransform() methods, and the codec registry

Fri Dec 3 10:16:04 CET 2010

On Thursday 02 December 2010 19:06:51 georg.brandl wrote:
> Author: georg.brandl
> Date: Thu Dec  2 19:06:51 2010
> New Revision: 86934
> 
> Log:
> #7475: add (un)transform method to bytes/bytearray and str, add back codecs
> that can be used with them from Python 2.

Oh no, someone did it. Was it really necessary to reintroduce rot13 and friends?

I'm not strongly opposed to .transform()/.untransform() if they can be
completely separated from text encodings (ascii, latin9, utf-8 and co.). But
str.encode() and bytes.decode() do accept transform codec names and raise
strange error messages. To quote Martin von Löwis (#7475):

"If the codecs are restored, one half of them becomes available to
.encode/.decode methods, since the codec registry cannot tell which
ones implement real character encodings, and which ones are other
conversion methods. So adding them would be really confusing."

>>> 'abc'.transform('hex')
TypeError: 'str' does not support the buffer interface
>>> b'abc'.transform('rot13')
TypeError: expected an object with the buffer interface

>>> b'abcd'.decode('hex')
TypeError: decoder did not return a str object (type=bytes)
>>> 'abc'.encode('rot13')
TypeError: encoder did not return a bytes object (type=str)
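
Martin's point about the registry can be sketched as follows. This is only an
illustration, assuming an interpreter where the transform codecs are registered
under the names 'rot_13' and 'hex_codec'; it is not taken from the commit
itself. The lookup goes through one shared namespace, so nothing in the call
tells a character encoding apart from a generic transform:

```python
import codecs

# Both names are resolved by the same registry function and both
# come back as the same kind of object (codecs.CodecInfo).
text_codec = codecs.lookup('utf-8')
transform_codec = codecs.lookup('rot_13')   # assumes rot_13 is registered
same_kind = type(text_codec) is type(transform_codec)

# The module-level functions accept any registered codec and do not
# constrain the input or output types.
rot = codecs.encode('abc', 'rot_13')          # str -> str
hexed = codecs.encode(b'abc', 'hex_codec')    # bytes -> bytes

print(same_kind, rot, hexed)
```

Because the registry has a single namespace, any separation between text
encodings and other transforms has to be added on top of the lookup.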

I don't like transform() and untransform() because I think that we should not
add too many operations to the base types (bytes and str), and because they do
an implicit module import. I prefer an explicit module import (e.g. import
binascii; binascii.hexlify(b'to hex')). The transform approach reminds me of
PHP and its ugly namespace with 5000+ functions. I prefer Python because it
uses smaller and more numerous namespaces that are more specific and well
defined. If we add email and compression functions to bytes, why not add a web
browser to str?
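
A minimal sketch of the explicit style, using only standard library modules
(binascii, base64 and zlib are chosen here as examples; the variable names are
just for illustration):

```python
import base64
import binascii
import zlib

data = b'to hex'

# Each conversion lives in its own small, well-defined namespace,
# and the import makes the dependency visible to the reader.
hexed = binascii.hexlify(data)
b64 = base64.b64encode(data)
packed = zlib.compress(data)

# The reverse operations are just as explicit.
restored = (
    binascii.unhexlify(hexed),
    base64.b64decode(b64),
    zlib.decompress(packed),
)
print(hexed, b64, restored)
```

Nothing here needs a new method on bytes or str, and no module is imported
behind the caller's back.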

Victor