[issue5902] Stricter codec names
Alexander Belopolsky
report at bugs.python.org
Thu Feb 24 05:00:54 CET 2011
Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
Ezio and I discussed on IRC the implementation of alias lookup and neither of us was able to point out to the function that strips non-alphanumeric characters from encoding names.
It turns out that there are three "normalize" functions that are successively applied to the encoding name during evaluation of str.encode/str.decode.
1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c
3. normalize_encoding() in encodings/__init__.py
Each performs a slightly different transformation and only the last one strips non-alphanumeric characters.
The complexity of codec lookup is comparable with that of the import mechanism!
----------
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5902>
_______________________________________
More information about the Python-bugs-list
mailing list