[Python-Dev] Add transform() and untransform() methods

Nick Coghlan ncoghlan at gmail.com
Fri Nov 15 14:50:23 CET 2013


On 15 November 2013 22:24, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15 November 2013 12:07, Victor Stinner <victor.stinner at gmail.com> wrote:
>>> A new API for binary transforms is potentially an academically
>>> interesting concept, but it solves zero current real world problems.
>>
>> I would like to reply the same for these codecs: they are not solving
>> any real world problem :-)
>
> As Nick is only documenting long-existing functions, I fail to see the
> issue here.
>
> If someone were to propose new methods, builtins, or module functions,
> then I could see a reason for debate. But surely simply documenting
> existing functions is not worth all this pushback?

There's a bit more to it than that (and that's why I started the other
thread about the codec aliases before proceeding to the final step).

One of the changes Victor is concerned about is that when you use an
incorrect codec in one of the Unicode-encoding-only convenience
methods, the recent exception updates explicitly push users towards
using the module-level codecs.encode() and codecs.decode() functions
instead:

>>> import codecs
>>> "no good".encode("rot_13")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'rot_13' encoder returned 'str' instead of 'bytes'; use
codecs.encode() to encode to arbitrary types
>>> codecs.encode("just fine", "rot_13")
'whfg svar'

>>> b"no good".decode("quopri_codec")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'quopri_codec' decoder returned 'bytes' instead of 'str';
use codecs.decode() to decode to arbitrary types
>>> codecs.decode(b"just fine", "quopri_codec")
b'just fine'

My perspective is that, in current Python, that *is* the right thing
for people to do, and any hypothetical new API proposed for Python 3.5
would do nothing to change what's right for Python 3.4 code (or Python
2/3 compatible code). I also find it bizarre that several of those
arguing that this is too niche a feature to be worth refining are
simultaneously in favour of a proposal to add new *methods on builtin
types* for the same niche feature.

The other part is the fact that I updated the What's New document to
highlight these tweaks:
http://docs.python.org/dev/whatsnew/3.4.html#improvements-to-handling-of-non-unicode-codecs

As noted earlier in the thread, Armin Ronacher has been the most vocal
of the users of this feature in Python 2 who lamented its absence in
Python 3 (see, for example,
http://lucumr.pocoo.org/2012/8/11/codec-confusion/), but I've also
received plenty of subsequent feedback along the lines of "what he
said!" (such as http://bugs.python.org/issue7475#msg187630).

Many of the proposed solutions from the people affected by the change
haven't been usable (since they've often been based on a
misunderstanding of why the method behaviour changed in Python 3 in
the first place), but the pain they experience is genuine, and it can
unnecessarily sour their whole experience of the transition. I
consider documenting the existing module level functions and nudging
users towards them when they try to use the affected codecs to be an
expedient way to say "yes, this is still available if you really want
to use it, but the required spelling is different".

However, the one thing I'm *not* going to do at this point is restore
the shorthand aliases, so those opposing the lowering of this barrier
to transition can take comfort in the fact they have succeeded in
ensuring that the out-of-the-box experience for users of this feature
migrating from Python 2 remains the unfriendly:

>>> b"abcdef".decode("hex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: hex

Rather than the more useful:

>>> b"abcdef".decode("hex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'hex' decoder returned 'bytes' instead of 'str'; use
codecs.decode() to decode to arbitrary types

Which would then lead them to the working (and still Python 2 compatible) code:

>>> codecs.decode(b"abcdef", "hex")
b'\xab\xcd\xef'
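
As a minimal sketch of the same pattern without relying on the
shorthand aliases, the module-level functions can be called with the
long-form codec names. This assumes the standard library codecs
"hex_codec", "zlib_codec" and "rot_13" are registered, and the variable
names are illustrative only:

```python
import codecs

# Binary (bytes-to-bytes) transforms go through the module-level
# functions, since bytes.decode()/str.encode() are text-encoding only.
hex_decoded = codecs.decode(b"abcdef", "hex_codec")
hex_encoded = codecs.encode(b"\xab\xcd\xef", "hex_codec")

# A text (str-to-str) transform uses the same entry points.
rot13 = codecs.encode("just fine", "rot_13")

# Round trip through a compression codec.
payload = b"spam " * 20
compressed = codecs.encode(payload, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

print(hex_decoded)
print(hex_encoded)
print(rot13)
print(restored == payload)
```

The long-form names are the conservative choice here because they do
not depend on whether the short aliases such as "hex" are available.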

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

