[issue5902] Stricter codec names

Thu Feb 24 05:00:54 CET 2011

Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:

Ezio and I discussed the implementation of alias lookup on IRC, and neither of us was able to point to the function that strips non-alphanumeric characters from encoding names.

It turns out that there are three "normalize" functions that are successively applied to the encoding name during evaluation of str.encode/str.decode.

1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c
3. normalize_encoding() in encodings/__init__.py

Each performs a slightly different transformation and only the last one strips non-alphanumeric characters.
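To make the last step concrete, here is a minimal sketch that calls the Python-level normalize_encoding() from encodings/__init__.py directly and compares it with a full codecs.lookup(). It assumes a current CPython 3 interpreter, whose behaviour may differ in detail from the code discussed in this issue. The two C-level functions are not callable from Python, so only their combined effect is visible through codecs.lookup(). The names "spellings", "normalized" and "resolved" are purely illustrative.

```python
import codecs
from encodings import normalize_encoding

# Three spellings of the same encoding name (illustrative input only).
spellings = ["latin-1", "Latin_1", "LATIN-1"]

# Step 3 in isolation: punctuation such as "-" becomes "_".
# Case is left alone here; lowercasing happens in the earlier C-level steps.
normalized = [normalize_encoding(s) for s in spellings]

# The whole chain: every spelling should resolve to one and the same codec.
resolved = {codecs.lookup(s).name for s in spellings}

print(normalized)
print(resolved)
```

If the assumptions hold, all three spellings end up at a single codec even though normalize_encoding() alone returns three strings that differ in case.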

The complexity of codec lookup is comparable with that of the import mechanism!

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5902>
_______________________________________