RE: [Python-Dev] Fix import errors to have data
Guido van Rossum wrote:
Then you're proposing a way for a highly knowledgable user to anticipate, and partially worm around, that Python leaves behind insane module objects in sys.modules. Wouldn't it be better to change Python to stop leaving insane module objects in sys.modules to begin with? That's harder, but seems to me it would do a lot more good for a lot more people.
+1
OK - the problem as I see it is that a given module that exists, but raises ImportError, only raises ImportError *once*, whereas it really should raise ImportError every time i.e. currently doing the following: # a.py raise ImportError # b.py try: import a except ImportError: import a print 'OK' prints when it really should raise ImportError. Additionally, a module could have messed with sys.modules directly, so just putting None in for the broken module wouldn't be sufficient (you could possibly get the broken module via another name). Perhaps the import machinery could keep a (weak?) mapping from module *object* to the ImportError it raised. If the module that would be returned by the import is in the mapping, raise the corresponding ImportError instead. The appropriate line, etc in the module would thus be shown each time the module was imported. I believe this would work properly for reload() as well. Tim Delaney
[Delaney, Timothy C (Timothy)]
OK - the problem as I see it is that a given module that exists, but raises ImportError, only raises ImportError *once*, whereas it really should raise ImportError every time
Jim's after something different, while the problem you're thinking about is more general than just ImportError. When an import of an existing module fails for *any* reason, subsequent attempts to import the same module succeed. For example, C:\Code>type a.py 1/0 C:\Code>\python23\python.exe Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.
import a Traceback (most recent call last): File "<stdin>", line 1, in ? File "a.py", line 1, in ? 1/0 ZeroDivisionError: integer division or modulo by zero import a
This is Bad, because an uncaught exception in module initialization means the module probably can't fulfill its contract, yet subsequent importers get no clue that the module is in a damaged state (until the module fails to do its job, which may or may not be obvious when it occurs). A module failing to import because it suffers an ImportError itself is once instance of this, but the same applies to a module failing to import for any other reason: in all these cases, subsequent imports don't get the exception the initial importer saw, they get a module object in an arbitrarily screwed-up state.
[Tim]
When an import of an existing module fails for *any* reason, subsequent attempts to import the same module succeed. For example,
C:\Code>type a.py 1/0
C:\Code>\python23\python.exe Python 2.3.4 (#53, May 25 2004, 21:17:02) [MSC v.1200 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information.
import a Traceback (most recent call last): File "<stdin>", line 1, in ? File "a.py", line 1, in ? 1/0 ZeroDivisionError: integer division or modulo by zero import a
This is Bad, because an uncaught exception in module initialization means the module probably can't fulfill its contract, yet subsequent importers get no clue that the module is in a damaged state (until the module fails to do its job, which may or may not be obvious when it occurs). A module failing to import because it suffers an ImportError itself is once instance of this, but the same applies to a module failing to import for any other reason: in all these cases, subsequent imports don't get the exception the initial importer saw, they get a module object in an arbitrarily screwed-up state.
So let's try to devise new semantics to cover this and other cases. One requirement is that mutual imports must work. Another is that when "import a" fails once, it must fail again if it is retried after catching the first failure. Perhaps it is as simple as deleting the module from sys.modules when the code in import.c executing the module's body gets any kind of exception from the execution? This would seem to be a relatively small change to PyImport_ExecCodeModuleEx(): unify all the error exits and there delete the module from sys.modules. What should it do if the module already existed (e.g. when used by reload())? Strawman answer: leave it there -- the reload() semantics and common use cases are best served by that. --Guido van Rossum (home page: http://www.python.org/~guido/)
I quickly whipped up this (procastinating on my OSCON keynote slides :-): Index: import.c =================================================================== RCS file: /cvsroot/python/python/dist/src/Python/import.c,v retrieving revision 2.233 diff -c -u -r2.233 import.c --- import.c 27 Jun 2004 16:51:46 -0000 2.233 +++ import.c 28 Jul 2004 18:57:15 -0000 @@ -572,17 +572,23 @@ { PyObject *modules = PyImport_GetModuleDict(); PyObject *m, *d, *v; + int pre_existing = 0; - m = PyImport_AddModule(name); - if (m == NULL) - return NULL; + if ((m = PyDict_GetItemString(modules, name)) != NULL && + PyModule_Check(m)) + pre_existing = 1; + else { + m = PyImport_AddModule(name); + if (m == NULL) + return NULL; + } /* If the module is being reloaded, we get the old module back and re-use its dict to exec the new code. */ d = PyModule_GetDict(m); if (PyDict_GetItemString(d, "__builtins__") == NULL) { if (PyDict_SetItemString(d, "__builtins__", PyEval_GetBuiltins()) != 0) - return NULL; + goto error; } /* Remember the filename as the __file__ attribute */ v = NULL; @@ -601,7 +607,7 @@ v = PyEval_EvalCode((PyCodeObject *)co, d, d); if (v == NULL) - return NULL; + goto error; Py_DECREF(v); if ((m = PyDict_GetItemString(modules, name)) == NULL) { @@ -614,6 +620,11 @@ Py_INCREF(m); return m; + + error: + if (!pre_existing) + PyDict_DelItemString(modules, name); /* If it fails, too bad */ + return NULL; } Superficial testing suggests it works fine, except for one problem in test_pkgimport which is testing specifically for the broken behavior. BTW, building CVS Python gives these warnings for me: /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c:120: array size missing in `designations' /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c: In function `iso2022processesc': /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c:306: warning: `esclen' might be used uninitialized in this function /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c: At top level: /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c:1053: warning: excess elements in array initializer /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c:1053: warning: (near initialization for `iso2022_kr_config.designations') /home/guido/projects/python/dist/src/Modules/cjkcodecs/_codecs_iso2022.c:1058: warning: excess elements in array initializer [and many more similar] Red Hat 7.3 using gcc 2.96. --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido, patches PyImport_ExecCodeModuleEx to remove a non-pre-existing module from sys.modules if executing the module code raises an exception] It fixes at least one spurious -r test failure on Windows. Before: $ regrtest.py -uall test_sundry test___all__ test_sundry test___all__ test test___all__ failed -- Traceback (most recent call last): File "C:\Code\python\lib\test\test___all__.py", line 150, in test_all self.check_all("rlcompleter") File "C:\Code\python\lib\test\test___all__.py", line 40, in check_all "%s has no __all__ attribute" % modname) File "C:\Code\python\lib\test\test_support.py", line 208, in verify raise TestFailed(reason) TestFailed: rlcompleter has no __all__ attribute After: $ regrtest.py -uall test_sundry test___all__ test_sundry test___all__ All 2 tests OK. It does not fix at least one other one, because a second attempt to import curses still "works" on Windows:
import curses Traceback (most recent call last): File "<stdin>", line 1, in ? File "C:\Code\python\lib\curses\__init__.py", line 15, in ? from _curses import * ImportError: No module named _curses import curses curses <module 'curses' from 'C:\Code\python\lib\curses\__init__.pyc'>
So whatever this patch does <wink>, it breaks down somehow for non-importable packages. I guess that any path thru import.c calling PyImport_AddModule() before calling PyImport_ExecCodeModuleEx() will suffer similarly (because the former convinces the latter then that the module "is pre-existing", even if it really wasn't, and the latter won't remove it from sys.modules then despite that running its code fails). It breaks this Zope3 doctest twice, for an obvious reason: def test_bad_import(): """ >>> c = config.ConfigurationContext() >>> c.resolve('zope.configuration.tests.victim.x') Traceback (most recent call last): ... ConfigurationError: Couldn't import zope.configuration.tests.victim,""" \ """ No module named bad_to_the_bone Cleanup: >>> del sys.modules['zope.configuration.tests.victim'] >>> del sys.modules['zope.configuration.tests.bad'] """ That is, this test's cleanup currently requires that insane modules get left behind. It also breaks this Zope3 test, which uses a custom import hook: ERROR: test_reporting_exception_during_import (zope.importtool.tests.test_hook.HookTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Code\python\lib\unittest.py", line 260, in run testMethod() File "C:\Code\ZopeX3\src\zope\importtool\tests\test_hook.py", line 198, in test_reporting_exception_during_import import error File "C:\Code\ZopeX3\src\zope\importtool\hook.py", line 72, in importhook v = self.previous(name, globals, locals, fromlist) KeyError: 'zope.importtool.tests.error' self.previous there appears to be bound to __builtin__.__import__. I don't understand this code, but expect it's shallow to someone who does. Overall, I'm +1, but think it needs more work to plug the "curses on Windows" kind of hole.
Attached patch builds on Guido's: - refactored - load_package() now also removes stuff from sys.modules if an error occurs and the stuff wasn't in sys.modules to begin with - patches test_pkgimport so it passes again (it was relying on junk remaining in sys.modules in one spot) - regrtest -uall passes on Windows with this patch (247 tests OK, 37 tests skipped). - oops -- fixes obsolute stuff in a docstring too (the LaTeX docs had already been corrected) - does not remove insane workarounds in .py modules (contortions trying to worm around that non-working modules were left behind by failing imports) "curses on Windows" no longer screws up. Before: $ regrtest.py -uall test_sundry test___all__ test_curses test_sundry test___all__ test test___all__ failed -- Traceback (most recent call last): File "C:\Code\python\lib\test\test___all__.py", line 150, in test_all self.check_all("rlcompleter") File "C:\Code\python\lib\test\test___all__.py", line 40, in check_all "%s has no __all__ attribute" % modname) File "C:\Code\python\lib\test\test_support.py", line 208, in verify raise TestFailed(reason) TestFailed: rlcompleter has no __all__ attribute test_curses test test_curses crashed -- exceptions.AttributeError: 'module' object has no attribute 'endwin' 1 test OK. 2 tests failed: test___all__ test_curses After: $ regrtest.py -uall test_sundry test___all__ test_curses test_sundry test___all__ test_curses test_curses skipped -- No module named _curses 2 tests OK. 1 test skipped: test_curses Those skips are all expected on win32.
[Tim]
... I guess that any path thru import.c calling PyImport_AddModule() before calling PyImport_ExecCodeModuleEx() will suffer similarly (because the former convinces the latter then that the module "is pre-existing", even if it really wasn't, and the latter won't remove it from sys.modules then despite that running its code fails).
And I posted a patch later that "fixed this" in one case involving packages. I don't know much about import details, but I believe there are other cases -- just search for PyImport_AddModule. Some look harmless, but some don't: - PyImport_ImportFrozenModule. In case of a package, creates a module object first to set its __path__, before running the code. - zipimporter_load_module (in zipimport.c). Similarly, but also sets __loader__ before running the code, package or not. - Py_InitModule4 (in modsupport.c) leaves a damaged module behind if initialization fails. "A problem" is that PyImport_ExecCodeModuleEx seems to be the only real choke point, and is making a distinction between "old" and "new" modules that doesn't always match its callers' realities. This distinction came from here: [Guido]
What should it do if the module already existed (e.g. when used by reload())? Strawman answer: leave it there -- the reload() semantics and common use cases are best served by that.
I understand the point there for reload(), but don't know of other use cases (let alone common ones <wink>) for wanting to leave a module in sys.modules when its code fails. What are they? Maybe it would be better for PyImport_ExecCodeModuleEx to purge a broken module unconditionally, and change PyImport_ReloadModule to put it back?
[Tim]
... I understand the point there for reload(), but don't know of other use cases (let alone common ones <wink>) for wanting to leave a module in sys.modules when its code fails. What are they? Maybe it would be better for PyImport_ExecCodeModuleEx to purge a broken module unconditionally, and change PyImport_ReloadModule to put it back?
Attached is a patch that takes that approach. Code feels simpler/cleaner, since it makes a special case out of reload() instead of a special case out of everything except reload(). Still haven't thought of a case other than reload() that wants special treatment. The whole test suite still passes; it still fixes all the spurious test failures I know about on Windows when running the suite with -r. The internal _RemoveModule() is incidentally more careful in this version of the patch. The last patch's new _AddModuleEx() no longer exists.
Oops, attached a wrong patch. Correct patch attached.
And one more time. - regen'ed against current CVS import.c - removes some now-needless workarounds in .py files
And one more time.
- regen'ed against current CVS import.c - removes some now-needless workarounds in .py files
Looks good. I say, check it in. Any thoughts on where this affects docs? --Guido van Rossum (home page: http://www.python.org/~guido/)
[Guido]
Looks good. I say, check it in.
I can't argue with that <wink>.
Any thoughts on where this affects docs?
The only clear places I find are NEWS and C API's PyImport_ExecCodeModule docs. The latter doesn't mention sys.modules now, so I'll add stuff and note 2.4 changes.
participants (3)
-
Delaney, Timothy C (Timothy)
-
Guido van Rossum
-
Tim Peters