[py-dev] py.path import hook for getpymodule()
Hi all,

I tried to implement the idea Holger and I discussed some time ago on IRC: the method getpymodule() of py.path objects, formerly known as getmodule(), should be smarter about importing modules that contain relative imports, i.e. that import other modules from the same directory. This requires a custom import hook, because we might be talking about a py.path which is not a local path. For example, in a Subversion repository, I have two files:

x.py:
    def x1():
        return 5

test_x.py:
    import x
    def test_x1():
        assert x.x1() == 5

The goal is to make this work:

    mod = py.path.svnurl('svn+ssh://.../test_x.py').getpymodule()
    mod.test_x1()

This is done by putting in the __file__ attribute of the 'test_x' module a subclass of str, as suggested by Holger, which remembers the py.path. A custom import hook detects the "import" statements coming from modules with such a __file__, and tries to resolve the import locally, relative to the py.path attached to the __file__. If it succeeds, it calls getpymodule() again to import the new module. Otherwise, it falls back to the standard import hook.

The patch is attached for review (not checked in right now; it depends on whether we want this kind of hack or not). Note that it doesn't work if the modules being imported hack around sys.path themselves. It works, though, if the hacking is performed by py.magic.autopath(): when called from a module with a special __file__, autopath() will not actually hack sys.path but just record the information it deduced on the path object. Then the custom import hook notices this information... Not too clean.
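The str-subclass trick can be sketched in a few lines (a minimal, self-contained illustration; 'PathStr' and its 'path' attribute are invented names for this sketch, not the actual patch's API):

```python
class PathStr(str):
    """A string usable as a module's __file__ that also remembers the
    path object it came from, so an import hook can recognize it."""
    def __new__(cls, path):
        self = str.__new__(cls, str(path))
        self.path = path  # hypothetical attribute; the real patch may differ
        return self

f = PathStr("/checkout/test_x.py")
assert isinstance(f, str)            # still usable anywhere a str is expected
assert f == "/checkout/test_x.py"
assert f.path == "/checkout/test_x.py"
```

An import hook can then check `isinstance(mod.__file__, PathStr)` on the importing module and, if it matches, resolve the import relative to `mod.__file__.path` instead of sys.path.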
It does allow this kind of stuff to work, though:

stuff/__init__.py:
    # empty

stuff/x.py:
    def x1():
        return 5

stuff/test_x.py:
    import py; py.magic.autopath()
    import stuff.x   # or "from stuff import x", which is harder
    def test_x1():
        assert stuff.x.x1() == 5

Finally, note that (I'm not sure, but I believe) this custom import hook would not be needed with the new import hooks of Python 2.3, which would allow a similar result in a cleaner way. Of course, that would be 2.3-specific.

A bientôt,

Armin
Hi Armin! First of all, thanks for having the courage to tackle improvements to the import system. It's a complicated thing with a large "no-fun" risk. [Armin Rigo Mon, Nov 08, 2004 at 06:26:41PM +0000]
... description of remote imports from e.g. subversion ...
This is done by putting in the __file__ attribute of the 'test_x' module a subclass of str, as suggested by Holger, which remembers the py.path. A custom import hook detects the "import" statements coming from modules with such a __file__, and tries to resolve the import locally, relative to the py.path attached to the __file__. If it succeeds, it calls getpymodule() again to import the new module. Otherwise, it falls back to the standard import hook.
My worry is that "mixing" the standard and the custom import hook may lead to duplicate imports. The underlying reason seems to be that the standard hook uses "dotted module paths" as keys for the sys.modules cache, whereas we use py.path's as keys. This may result in importing a certain file twice, and hilarity ensues.

So how could we avoid double imports? I am not quite sure. In the end, and IIRC you suggested something like this, we might be able to reimplement enough of Python's import logic, which we need to do anyway for PyPy (and have done to a certain degree already). At the extreme end we would not invoke the original import hook at all. But this may be too disruptive, as we would break compatibility with other custom import hooks.

In the alternative, we may want to build a rather complete mapping of __file__->module for all successfully imported modules, no matter if they were imported by our hook, by the standard one, or even by some other import hook. We intercept all import statements anyway and could add successfully imported modules to the mapping. Of course __import__('a.b.c', ...) could implicitly import a chain of modules, but after the standard import hook succeeds, our custom one could look at 'a', 'a.b', 'a.b.c' and put their __file__s into the __file__->module mapping, so that we don't import the thing again when called with its direct filesystem location.

Like i said, i am not completely sure about all this, but i think a general __file__->module mapping might work well. At least it does from a theoretical viewpoint. Btw, note how "inspect.getmodule(obj)" resorts to building such a mapping, too. And it doesn't even consider that there may be multiple modules pointing to the same file!
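The proposed __file__->module mapping could be built by scanning sys.modules after each successful import; a rough, self-contained sketch (function name invented for illustration):

```python
import json  # only imported so the test below has a known module to look up
import os
import sys

def build_file_to_module_map():
    """Map each already-imported module's absolute __file__ to the module
    object itself, so a path-based importer can reuse it instead of
    importing the same file a second time."""
    mapping = {}
    for mod in list(sys.modules.values()):
        fn = getattr(mod, "__file__", None)  # builtins have no __file__
        if fn:
            mapping[os.path.abspath(fn)] = mod
    return mapping

filemap = build_file_to_module_map()
# 'json' was imported by the standard hook; its file now resolves back
# to the very same module object, so we would not import it again:
assert filemap[os.path.abspath(json.__file__)] is json
```

Note the caveat from the last paragraph: if two distinct module objects point at the same file, the mapping silently keeps only one of them.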
The patch is attached for review (not checked in right now; depends if we want this kind of hack or not). Note that it doesn't work if the modules being imported hack around sys.path themselves. It works, though, if the hack is being performed by py.magic.autopath():
Yes, and manipulating sys.path directly often leads to incompatibility with e.g. zip-imports anyway. So apart from sys.argv[0]-hacks, i only see interesting use cases when you want to insert paths relative to the package directory or your current location. For this, we might want to allow something like:

    extrapath = __file__.pkgdir.join('..', '..', 'something')
    __file__.searchpaths.append(extrapath)

Btw, i think it's fine to just put our extra attributes like 'pkgdir' and 'searchpaths' directly on the string sub-instance instead of using magic __names__. Overall the patch looks good and it's nicely small, but of course a couple of tests are missing :-) I'd love to have the double-import problem addressed somehow, though.
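The "plain attributes on the string sub-instance" idea might look like this minimal sketch (using os.path.join in place of py.path's join; 'SpecialFile' and the attribute names are only the suggestions from this mail, not an existing API):

```python
import os

class SpecialFile(str):
    """Sketch of a __file__ string carrying 'pkgdir' and 'searchpaths'
    as plain attributes, instead of magic __names__."""
    def __new__(cls, filename, pkgdir):
        self = str.__new__(cls, filename)
        self.pkgdir = pkgdir        # directory of the package itself
        self.searchpaths = []       # extra, per-module search locations
        return self

f = SpecialFile("/proj/mypkg/mod.py", "/proj/mypkg")
extrapath = os.path.join(f.pkgdir, "..", "..", "something")
f.searchpaths.append(extrapath)
assert f.searchpaths[0].endswith("something")
```

The import hook would consult `f.searchpaths` before falling back, so sys.path itself stays untouched.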
if called from a module with a special __file__, autopath() will not actually hack sys.path but just record the information it deduced on the path object. Then the custom import hook notices this information... Not too clean. It does allow this kind of stuff to work, though:
stuff/__init__.py:
    # empty

stuff/x.py:
    def x1():
        return 5

stuff/test_x.py:
    import py; py.magic.autopath()
    import stuff.x   # or "from stuff import x", which is harder
    def test_x1():
        assert stuff.x.x1() == 5
This makes sense to me. It is indeed not completely clean, but i don't see how you can easily get much cleaner than that.
Finally, note that (I'm not sure but I believe that) this custom import hook would not be needed with the new hooks of Python 2.3, which would allow a similar result in a cleaner way. Of course that would be 2.3-specific.
And i only believe it when i see it :-) ciao, holger
Hi Holger, On Tue, Nov 09, 2004 at 12:17:43AM +0100, holger krekel wrote:
So how could we avoid double imports? (...)
Like i said, i am not completely sure about all this, but i think a general __file__->module mapping might work well. At least it does from a theoretical viewpoint.
I'm afraid it works in one direction only: it would prevent getpymodule() from re-importing a module already imported by CPython. We'll probably need some additional magic to prevent CPython from importing a module already seen by getpymodule(). It looks quite difficult, given that Python only cares about dotted names, and not whether, say, two dotted names (like 'x' and 'package.x') refer to the same file relative to two different entries of sys.path.

For this reason I was thinking that it might be enough to document getpymodule() as completely separate from Python's sys.modules (and not use sys.modules as a cache but a custom dict). The user would have to be aware of that. For example, for py.test, it's fine (and probably better, actually): all the test mechanisms and standard library modules are "on the sys.modules side", and all the tests themselves are "on the getpymodule() side". It might be enough to add a few sanity checks and warn if the same __file__ appears to be imported on both "sides".

Does it look reasonable?

Armin
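A completely separate, path-keyed cache could be sketched like this (a toy stand-in for getpymodule(), with invented names, executing source passed in directly so the sketch needs no filesystem or svn access):

```python
import sys
import types

_pymodule_cache = {}   # path -> module; deliberately NOT sys.modules

def getpymodule_sketch(path, source):
    """Import 'source' as a module cached under 'path'. The cache key is
    the path, not a dotted name, and sys.modules is never touched."""
    try:
        return _pymodule_cache[path]
    except KeyError:
        mod = types.ModuleType("pymodule:" + path)
        mod.__file__ = path
        exec(compile(source, path, "exec"), mod.__dict__)
        _pymodule_cache[path] = mod
        return mod

m1 = getpymodule_sketch("/virtual/x.py", "def x1():\n    return 5\n")
m2 = getpymodule_sketch("/virtual/x.py", "def x1():\n    return 5\n")
assert m1 is m2        # cached: no double import on the getpymodule() side
assert m1.x1() == 5
assert "pymodule:/virtual/x.py" not in sys.modules   # the two sides stay apart
```

This only guarantees uniqueness within the getpymodule() side; the sanity checks mentioned above would still be needed to warn when the same __file__ shows up on both sides.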
Hi Armin, [Armin Rigo Tue, Nov 09, 2004 at 09:33:42AM +0000]
On Tue, Nov 09, 2004 at 12:17:43AM +0100, holger krekel wrote:
So how could we avoid double imports? (...)
Like i said, i am not completely sure about all this, but i think a general __file__->module mapping might work well. At least it does from a theoretical viewpoint.
I'm afraid it works in one direction only: it would prevent getpymodule() from re-importing a module already imported by CPython.
Yes.
We'll probably need some additional magic to prevent CPython from importing a module already seen by getpymodule(). It looks quite difficult, given that Python only cares about dotted names, and not whether, say, two dotted names (like 'x' and 'package.x') refer to the same file relative to two different entries of sys.path.
yes, unfortunately.
For this reason I was thinking that it might be enough to document getpymodule() as completely separate from Python's sys.modules (and not use sys.modules as a cache but a custom dict). The user would have to be aware of that. For example, for py.test, it's fine (and probably better, actually): all the test mechanisms and standard library modules are "on the sys.modules side", and all the tests themselves are "on the getpymodule() side".
Hum. But the py lib implementation files could be imported over getpymodule(), right? Right now, py/initpkg.py performs this task for bootstrapping reasons, but it could at least cooperate with getpymodule(). IOW, I'd actually like to apply getpymodule() semantics to a whole package, e.g. the py lib itself. For such cases we might even refuse to defer to the standard import mechanism, i.e. when one does

mypkg/__init__.py:
    from py.initpkg import initpkg
    initpkg(__name__, {...})

then we would eventually refuse to fall back to the CPython standard import hook for any import below 'mypkg.*'. Also, we could sanity-check that there is no CPython-imported module below the corresponding dirpath of the init's __file__. This way we can make sure that a package using the initpkg() mechanism avoids duplicate imports altogether. Of course, if you don't apply initpkg() then your above view of a separation between tests and the rest of your package should still hold.
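The sanity check suggested here -- no CPython-imported module below the package's own directory -- might be sketched as follows (the function name is illustrative; the real check would live inside initpkg()):

```python
import os
import sys

def cpython_modules_below(pkgdir):
    """Return the names of already-imported modules whose __file__ lies
    under pkgdir, i.e. modules that slipped in through the standard
    import mechanism even though the package is initpkg()-managed."""
    prefix = os.path.abspath(pkgdir) + os.sep
    offenders = []
    for name, mod in list(sys.modules.items()):
        fn = getattr(mod, "__file__", None)
        if fn and os.path.abspath(fn).startswith(prefix):
            offenders.append(name)
    return offenders

# Nothing already imported can live under a made-up package directory:
assert cpython_modules_below("/nonexistent/mypkg") == []
```

Running this once at initpkg() time, and warning about any offenders, would catch duplicate imports early without changing import behaviour.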
It might be enough to add a few sanity checks and warn if the same __file__ appears to be imported on both "sides".
Btw, even if such checks are somewhat expensive (like iterating through the whole of sys.modules), we might want to enable them for the next few months to detect problems early.
Does it look reasonable?
Yes, although the above unification of initpkg() and getpymodule() remains for the future, if you can agree to that. I do think that it can give us the best of both worlds in a separable way. We always wanted to have py.path-based imports, and this gives us the possibility to care about that without interfering with the rest of the imports. Does that make sense to you? holger
Hi Holger, On Tue, Nov 09, 2004 at 12:14:20PM +0100, holger krekel wrote:
IOW, I'd actually like to apply getpymodule() semantics to a whole package, e.g. the py lib itself. For such cases we might even refuse to defer to the standard import mechanism. i.e. when one does
mypkg/__init__.py:
    from py.initpkg import initpkg
    initpkg(__name__, {...})
then we would eventually refuse to fall back to the cpython standard import hook for any import below 'mypkg.*'.
Sounds pretty much reasonable.
Of course, if you don't apply initpkg() then your above view of separation between tests and the rest of your package should still be true.
Actually, in that view, the test files would contain import statements to import the package they test; such import statements would then be getpymodule()ed as well. So the current result with py.test is that the py lib and the stdlib are loaded using CPython's import, and all the tested user code is loaded with getpymodule(), which is nice too. Allowing the py lib itself to be loaded with getpymodule() is a separate issue, though it would be nice as well. Armin
Hi Armin, [Armin Rigo Tue, Nov 09, 2004 at 01:47:29PM +0000]
On Tue, Nov 09, 2004 at 12:14:20PM +0100, holger krekel wrote:
Of course, if you don't apply initpkg() then your above view of separation between tests and the rest of your package should still be true.
Actually, in that view, the test files would contain import statements to import the package they test; such import statements would then be getpymodule()ed as well, ...
Hehe, that's a funny but maybe also dangerous, or at least fragile, side effect. I realize that you are already trying to handle absolute imports if they relate to the pkgdir. Btw, i think your patch needs to look into pkgdir.dirpath(), because pkgdir itself really is the directory of the package and not the directory containing the package. But you would have caught that with an appropriate test :-) holger
Hi Holger, On Tue, Nov 09, 2004 at 03:19:00PM +0100, holger krekel wrote:
think your patch needs to look into pkgdir.dirpath() because pkgdir itself really is the directory of the package and not the directory containing the package. But you would have caught that with an appropriate test :-)
No, it's correct as it is. I did a test manually, because I don't have svnadmin installed on my machine, so the corresponding tests are skipped anyway -- but it works. It does, because relativeimport() expects the path to point to a file *inside* the directory -- usually it is __init__.py. This should be documented, I guess :-) Armin
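The convention being described -- the argument points at a file inside the package directory, typically __init__.py, so no extra dirpath() step is needed -- can be illustrated with a hypothetical stand-in for relativeimport() (posixpath is used so the example is deterministic across platforms):

```python
import posixpath

def resolve_sibling(anchor_file, modname):
    """Resolve a sibling module relative to a file INSIDE the directory.
    Because the anchor is inside the package, dirname() already yields
    the package directory itself -- no dirpath()-of-dirpath needed."""
    pkgdir = posixpath.dirname(anchor_file)
    return posixpath.join(pkgdir, modname + ".py")

assert resolve_sibling("/repo/stuff/__init__.py", "x") == "/repo/stuff/x.py"
```

With this convention, passing the package directory itself (rather than a file inside it) would be the bug, which is exactly the confusion the documentation should prevent.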
Hi Armin, [To py-dev@codespeak.net Tue, Nov 09, 2004 at 12:14:20PM +0100]
[Armin Rigo Tue, Nov 09, 2004 at 09:33:42AM +0000]
On Tue, Nov 09, 2004 at 12:17:43AM +0100, holger krekel wrote:

Hum. But the py lib implementation files could be imported over getpymodule(), right? Right now, py/initpkg.py performs this task for bootstrapping reasons, but it could at least cooperate with getpymodule().
Hum, it seems that after your recent commits they actually have to cooperate. For me, 'py.test' doesn't currently work, because your new custom import tries to import 'py/__init__.py', but this relies on a __path__ for its export definition to work, which is something we don't provide. If initpkg.py basically used the same mechanism as getpymodule(), this wouldn't be a problem. cheers, holger
participants (3):
- Armin Rigo
- holger krekel
- hpk@trillke.net