
Hi, people.

While happily playing with codecs (using Python 2.2.1), I found out that one should be careful _not_ to name a module after the encoding name when closely following the documentation in the Library Reference manual.

Here is what I guess is happening. `codecs.register()' appends the search function from the new codec module at the end of the existing search functions. `codecs.lookup()' tries the search functions in the same order in which they were registered. Consequently, `encodings.lookup()' is tried first. If the encoding does not exist in the cache, `encodings.lookup()' tries to import a module by the name of the encoding, slightly transformed, and will indeed import the new user codec module, because that module has the name of the encoding and is on the module search path. But now, `encodings.lookup()' expects a `getregentry' function in that module, does not find it, and raises a CodecRegistryError, leaving no chance for subsequent codec search functions to be used.

On the user side, merely renaming the user module holding the new codec solves the problem.

I'm not sure what should best be done. The documentation might be modified to explain the limitation, so other users do not trip up on it. `encodings.lookup()' might merely return None in case `getregentry' is not defined in the imported module, or else, it could make sure that it imports modules exclusively from within the `encodings' package.

The best and simplest might be to look up the codec search functions in reverse order of their registration. `encodings.lookup()' would then be called last instead of first. It would be easier for the user to override an encoding bundled with the Python distribution, if there is a need to do so. Because the Python Library Reference does not yet specify in which order codec search functions are tried, the order is not frozen and it might still be easy to change it.

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard
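
The lookup chain described in the message above can be tried out with a small sketch. It is written for current Python 3 rather than the 2.2.1 discussed in the thread, so the `codecs.CodecInfo` return value is an assumption of this sketch, and the encoding name `x_demo_reversed` and the helper names are hypothetical. The point it illustrates is that a search function returns None for names it does not own, which is what gives the other registered search functions a chance to run.

```python
import codecs

# Hypothetical encoding name, chosen so that it does not clash with any
# bundled codec or with an importable module.
ENCODING_NAME = "x_demo_reversed"


def _encode(text, errors="strict"):
    # Toy codec: reverse the text, then encode it as ASCII.
    # Returns (encoded bytes, number of input characters consumed).
    return text[::-1].encode("ascii", errors), len(text)


def _decode(data, errors="strict"):
    # Inverse of _encode.
    # Returns (decoded text, number of input bytes consumed).
    raw = bytes(data)
    return raw.decode("ascii", errors)[::-1], len(raw)


def search_function(name):
    # Return None for every name this function does not handle, so that
    # codecs.lookup() goes on to the next registered search function.
    if name != ENCODING_NAME:
        return None
    return codecs.CodecInfo(encode=_encode, decode=_decode, name=ENCODING_NAME)


# Appended after the search functions that are already registered,
# including the one provided by the bundled encodings package.
codecs.register(search_function)

encoded = "abc".encode(ENCODING_NAME)
decoded = encoded.decode(ENCODING_NAME)
```

Because the bundled search function is consulted first, this only works for names that the `encodings` package does not already resolve.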

pinard@iro.umontreal.ca (François Pinard) writes:

> I'm not sure what should best be done. The documentation might be modified to explain the limitation, so other users do not trip up on it. `encodings.lookup()' might merely return None in case `getregentry' is not defined in the imported module, or else, it could make sure that it imports modules exclusively from within the `encodings' package.

This is what Python 2.3, and Python 2.2.2 will do.

Regards,
Martin

[Martin v. Loewis]

> pinard@iro.umontreal.ca (François Pinard) writes:

> > I'm not sure what should best be done. 1) The documentation might be modified to explain the limitation, so other users do not trip up on it. 2) `encodings.lookup()' might merely return None in case `getregentry' is not defined in the imported module, or else, 3) it could make sure that it imports modules exclusively from within the `encodings' package.

> This is what Python 2.3, and Python 2.2.2 will do.

Hi, Martin.

I added "1)", "2)" and "3)" to the original text for clarity. Will Python 2.2.2 and 2.3 do "3)", or all of "1)", "2)" and "3)"?

If the codec search order is not changed, how does one proceed to override a bundled codec with another one that has the same encoding name?

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard

pinard@iro.umontreal.ca (François Pinard) writes:

> > > I'm not sure what should best be done. 1) The documentation might be modified to explain the limitation, so other users do not trip up on it. 2) `encodings.lookup()' might merely return None in case `getregentry' is not defined in the imported module, or else, 3) it could make sure that it imports modules exclusively from within the `encodings' package.

> > This is what Python 2.3, and Python 2.2.2 will do.

> Hi, Martin.

> I added "1)", "2)" and "3)" to the original text for clarity. Will Python 2.2.2 and 2.3 do "3)", or all of "1)", "2)" and "3)"?

Oops, it's 2) that Python 2.3 will do.

Regards,
Martin
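
Option 2) from the quoted list might look roughly like the following sketch. It is not the actual code of the `encodings' package: `tolerant_search` is a hypothetical name, the sketch targets current Python 3 and uses `importlib`, and it additionally restricts imports to the `encodings' package in the spirit of option 3).

```python
import importlib


def tolerant_search(name):
    """Hypothetical search function illustrating options 2) and 3)."""
    # Only accept plain identifiers, so a name cannot reach outside
    # the encodings package through dots.
    if not name.isidentifier():
        return None
    try:
        # Option 3): import only from within the encodings package.
        module = importlib.import_module("encodings." + name)
    except ImportError:
        return None
    # Option 2): a module without getregentry is not a codec module,
    # so return None instead of raising an error. Later search
    # functions then still get their turn.
    getregentry = getattr(module, "getregentry", None)
    if not callable(getregentry):
        return None
    return getregentry()
```

For instance, `tolerant_search("aliases")` finds the `encodings.aliases` module, sees that it defines no `getregentry`, and returns None rather than raising.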