[Python-Dev] Restoring the aliases for the non-Unicode codecs

Nick Coghlan ncoghlan at gmail.com
Wed Nov 13 15:29:51 CET 2013


Back in Python 3.2, the non-Unicode codecs were restored to the
standard library, but without the associated aliases (mostly due to
some thoroughly confusing error messages when they were mistakenly
used with the Unicode encoding convenience methods).

The long gory history in http://bugs.python.org/issue7475 took a
different turn earlier this year when I noticed
(http://bugs.python.org/issue7475#msg187698) that the codecs module
already *had* type neutral helper functions in the form of
codecs.encode and codecs.decode, and has had them since Python 2.4.
These were covered in the test suite, but not in the documentation.

That realisation substantially changed my perspective on the issue,
since it was no longer a matter of adding a new API, but of better
documenting and facilitating the use of one we already had (and was
supported all the way back to Python 2.4, make it easy to use in
single-source Python 2/3 projects as well). Since then, three key
supporting issues have been addressed for Python 3.4:

* codecs.encode() and codecs.decode() have been documented in 2.7, 3.3
and default (http://bugs.python.org/issue17827)
* codec errors have been updated to incorporate the name of the codec
whenever feasible, and output type errors in Unicode encoding
convenience methods refer uses to the type neutral object<->object
convenience functions (http://bugs.python.org/issue17839)
* the especially confusing errors from the base64 codecs have been
eliminated by updating the base64 module to use memoryview to process
binary input rather than explicit typechecks on builtin types
(http://bugs.python.org/issue17839)

So, with those underlying issues resolved, I would now like to restore
the aliases for the non-Unicode codecs that were removed in
http://bugs.python.org/issue10807 (aliases) and
http://bugs.python.org/issue17841 (docs).

I also looked into the possibility of providing appropriate 2to3
fixers, but the data driven nature of the problem makes that
impractical. However, the presence of codecs.decode and codecs.encode
in Python 2.4+ makes a new Py3k warning in 2.7.7 a viable option:
http://bugs.python.org/issue19543

Regards,
Nick.

P.S. Until the next docs rebuild, you can see a summary of the
non-Unicode codec handling improvements in the What's New diff here:
http://hg.python.org/cpython/rev/854a2cea31b9


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


More information about the Python-Dev mailing list