[Python-Dev] Codecs lookup order
Mon, 09 Sep 2002 09:59:10 -0400
Happily playing with codecs (using Python 2.2.1), I found out that one should
be careful _not_ to name a module after the encoding it implements, when
closely following the documentation in the Library Reference manual.
Here is what I guess is happening. `codecs.register()' appends the search
function from the new codec module at the end of the existing search functions.
`codecs.lookup()' tries the search functions in the same order in which they
were registered. Consequently, `encodings.lookup()' is tried first.
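The registration order described above can be observed with a small sketch. It runs on a current Python 3 rather than the 2.2.1 discussed here, and the encoding name `demo_order_codec', the helper `make_search' and the labels "first" and "second" are made up for illustration; both search functions simply reuse the UTF-8 codec.

```python
import codecs

# Hypothetical encoding name, chosen so that no bundled codec matches it.
HYPOTHETICAL_NAME = "demo_order_codec"
_utf8 = codecs.lookup("utf-8")
tried = []  # labels of the search functions that were actually consulted


def make_search(label):
    """Build a search function that answers only for HYPOTHETICAL_NAME."""
    def search(name):
        if name != HYPOTHETICAL_NAME:
            return None
        tried.append(label)
        # Reuse the UTF-8 encoder/decoder; only the reported name differs.
        return codecs.CodecInfo(_utf8.encode, _utf8.decode, name=label)
    return search


# Register two competing search functions, in this order.
codecs.register(make_search("first"))
codecs.register(make_search("second"))

info = codecs.lookup(HYPOTHETICAL_NAME)
print(info.name, tried)
```

The function registered first answers the lookup, and the second one is never consulted for that name.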
If the encoding does not exist in the cache, `encodings.lookup()' tries to
import a module by the name of the encoding, slightly transformed, and will
indeed import the new user codec module, because that module has the name of
the encoding, and is on the module search path.
But now, `encodings.lookup()' expects a `getregentry' function in that module,
does not find it, and raises a CodecRegistryError, leaving no chance for
subsequent codec search functions to be tried. On the user side, merely
renaming the user module holding the new codec solves the problem.
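The effect of an early search function raising instead of returning None can be sketched as follows, again on a current Python 3. The name `demo_shadowed_codec' and both search functions are hypothetical, and the RuntimeError is only a stand-in for the error raised when `getregentry' is missing.

```python
import codecs

# Hypothetical encoding name used only for this demonstration.
HYPOTHETICAL_NAME = "demo_shadowed_codec"
_utf8 = codecs.lookup("utf-8")
reached = []  # records whether the later search function was ever called


def failing_search(name):
    # Stand-in for a search function that finds a module by that name
    # but cannot get a registry entry from it, and raises.
    if name == HYPOTHETICAL_NAME:
        raise RuntimeError("module lacks getregentry()")
    return None


def user_search(name):
    # The user's own search function, which could have served the codec.
    if name != HYPOTHETICAL_NAME:
        return None
    reached.append(name)
    return codecs.CodecInfo(_utf8.encode, _utf8.decode, name=name)


codecs.register(failing_search)
codecs.register(user_search)

try:
    codecs.lookup(HYPOTHETICAL_NAME)
except RuntimeError as exc:
    outcome = "lookup failed: %s" % exc
else:
    outcome = "lookup succeeded"
print(outcome, reached)
```

The exception from the earlier function propagates out of `codecs.lookup()', so the user's search function is never reached.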
I'm not sure what should best be done. The documentation might be modified to
explain the limitation, so other users do not trip up on it.
`encodings.lookup()' might merely return None in case `getregentry' is not
defined in the imported module, or else, it could make sure that it imports
modules exclusively from within the `encodings' package.
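A minimal sketch of the first suggestion, a search function that returns None when the imported module has no `getregentry', might look like this. It is not the code of the `encodings' package: `tolerant_search' and the stand-in module `demo_user_codec' are invented for illustration, and the name transformation is reduced to replacing hyphens.

```python
import importlib
import sys
import types


def tolerant_search(encoding):
    """Return a registry entry, or None if this function cannot serve it."""
    # Simplified name transformation (assumption): hyphens to underscores.
    modname = encoding.replace("-", "_")
    try:
        mod = importlib.import_module(modname)
    except ImportError:
        return None  # no such module: let later search functions try
    getregentry = getattr(mod, "getregentry", None)
    if getregentry is None:
        return None  # module exists but is not a codec module: same answer
    return getregentry()


# Stand-in for a user module that happens to be named after the encoding
# but does not define getregentry().
sys.modules["demo_user_codec"] = types.ModuleType("demo_user_codec")

shadowed = tolerant_search("demo-user-codec")
missing = tolerant_search("no-such-module-for-this-demo")
print(shadowed, missing)
```

In both cases the function declines quietly, which would leave the decision to whichever search function comes next.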
The best and simplest might be to look up the codec search functions in reverse
order of their registration. `encodings.lookup()' would be called last instead
of first. It would be easier for the user to override an encoding bundled
with the Python distribution, if there is a need to do so. Because the Python
Library Reference does not yet specify in which order codec search functions
are tried, the order is not frozen yet and it might be easier to change it.
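The reverse-order proposal can be modelled with a toy registry. This is only a model, not the `codecs' module: `ReverseRegistry', `bundled_search' and `user_search' are hypothetical, and plain strings stand in for real codec entries.

```python
class ReverseRegistry:
    """Toy registry that consults the most recently registered function first."""

    def __init__(self):
        self._search_functions = []

    def register(self, search_function):
        # Registration still appends, as today.
        self._search_functions.append(search_function)

    def lookup(self, name):
        # Only the traversal order changes: newest first, oldest last.
        for search_function in reversed(self._search_functions):
            entry = search_function(name)
            if entry is not None:
                return entry
        raise LookupError("unknown encoding: %s" % name)


def bundled_search(name):
    # Stand-in for the search function shipped with the distribution.
    return "bundled latin_1" if name == "latin_1" else None


def user_search(name):
    # Stand-in for a user's replacement for the same encoding.
    return "user latin_1" if name == "latin_1" else None


registry = ReverseRegistry()
registry.register(bundled_search)  # registered first, consulted last
registry.register(user_search)     # registered last, consulted first

chosen = registry.lookup("latin_1")
print(chosen)
```

With this ordering the user's function wins for an encoding both claim, and the bundled one only serves names the user did not take over.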
François Pinard http://www.iro.umontreal.ca/~pinard