[Python-3000] Changing the import machinery; was: Re: [Python-Dev] setuptools in 2.5.

Fri Apr 21 15:55:09 CEST 2006

Guido van Rossum wrote:

> On 4/20/06, Walter Dörwald <walter at livinglogic.de> wrote:
>> Guido van Rossum wrote:
>>
>> > Sorry, there's so much here that seems poorly thought out that I don't know where to start.
>>
>> Consider it a collection of wild ideas.
>>
>> > Getting rid of the existing import syntax in favor of the incredibly verbose and ugly
>> >
>> >   foo = import("foo")
>> >
>> > just isn't acceptable.
>>
>> OK, then how about
>>
>> import foo
>> import foo from url("file:~guido/python-packages/foo")
>
> Sorry. I wasn't proposing that the import statement had to be extended to support the new functionality; only that the
> existing functionality should still be available by writing import statements. For the above, I'd much rather write
>
> foo = import_from_url("foo", "/home/guido/.../foo")

Then this function would need access to some kind of atomic import functionality. Maybe a version of exec where I can specify a
filename would be enough:
>>> d = {}
>>> exec "x = 1/0" in d as "foobar"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "foobar", line 1, in ?
ZeroDivisionError: integer division or modulo by zero

It would be nice if the traceback did show source code.
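
For comparison, a minimal sketch of how this can be approximated in present-day Python 3, where exec is a function: compile() accepts a filename, and registering the source text with linecache lets the traceback show the offending line. The helper name exec_with_filename is made up for this illustration, and writing to linecache.cache directly relies on that module's internal entry layout, which is an assumption rather than a documented API.

```python
import linecache
import traceback


def exec_with_filename(source, filename, namespace):
    """Run source in namespace, reporting errors as coming from filename.

    Hypothetical helper, not part of the standard library.
    """
    lines = source.splitlines(keepends=True)
    # Assumed entry layout: (size, mtime, lines, fullname); an mtime of None
    # keeps linecache from trying to stat a file that does not exist.
    linecache.cache[filename] = (len(source), None, lines, filename)
    code = compile(source, filename, "exec")
    exec(code, namespace)


d = {}
try:
    exec_with_filename("x = 1/0\n", "foobar", d)
except ZeroDivisionError:
    # The formatted traceback names "foobar" and includes the source line.
    report = traceback.format_exc()
```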

> I also don't like your proposal to create a "url" object (as you might have predicted from my resistance to the path PEP :-).

OK, then dispatching would have to be done based on something else.

>> How would this work with multiple imports?
>>
>> import foo, bar from url("file:~guido/python-packages/")
>>
>> How would it recognize whether url() refers to the module itself or to a package from which modules are imported?
>
> You tell me.

Maybe url("file:~guido/python-packages/foo/") could be a package and url("file:~guido/python-packages/foo") could be a module
(and if ~guido/python-packages/foo is a directory this would be the package module).
But this seems too unreliable and magic to me. Probably this shouldn't be used for mass imports, but only for importing
single special modules outside of the stdlib.

>> > Importing from remote URLs is a non-starter from a security POV; and using HTTPS would be too slow. For code that's known
>> > to reside remotely, a better approach is to use setuptools to install that code once and for all.
>>
>> I don't see how that changes anything from a security POV. You have to trust the source in both cases.
>
> With http, even if I trusted the source, I still shouldn't trust that the data I get from the URL actually came from the
> source. With HTTPS, at least man-in-the-middle attacks should be thwarted.

True, but the man-at-the-end attack still works (although it's less likely I guess) ;)

>> Performancewise you are right, it
>> wouldn't make sense to call such an import in a tight loop.
>>
>> > How would a module know its own name?
>>
>> It probably would only have a real name for a standard "import foo".
>
> That's a problem IMO.

Instances of classes in those modules wouldn't be picklable.
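
A small sketch of why that is, written for present-day Python 3: pickle stores a class by module name plus class name, so the module has to be findable under its __name__ when pickling and unpickling. The module name foo_from_file and the class Layout are invented for this illustration.

```python
import pickle
import sys
import types

# Hand-built module object, standing in for one loaded from an arbitrary file.
mod = types.ModuleType("foo_from_file")
exec("class Layout:\n    pass\n", mod.__dict__)
obj = mod.Layout()

# The module is not importable under its name, so pickle cannot look up the class.
try:
    pickle.dumps(obj)
    picklable_unregistered = True
except pickle.PicklingError:
    picklable_unregistered = False

# Once the module is registered under that name, the round trip works.
sys.modules["foo_from_file"] = mod
try:
    restored = pickle.loads(pickle.dumps(obj))
finally:
    del sys.modules["foo_from_file"]
```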

>> import foo from url("file:~guido/python-packages/foo")
>>
>> would create a module with foo.__origin__ => url("file:~guido/python-packages/foo")
>>
>> > How do you deal with packages
>> > (importing a module from a package implies importing/loading the package's __init__.py).
>>
>> I have no idea. This probably won't work (at least not in the sense that importing something imports all its parent
>> modules).
>>
>> But packages seem to be a problem for setuptools too (at least if parts of the package are distributed separately).
>
> So please do some research and find out what their problems are,

If the modules foo.bar and foo.baz are distributed as separate setuptools packages, foo/bar.py is installed as e.g.
foo-bar-0.1/foo/bar.py and foo/baz.py is installed as foo-baz-0.3/foo/baz.py. setuptools somehow manages to tie those together
into a single "virtual package" (according to http://peak.telecommunity.com/DevCenter/setuptools#namespace-packages by
__import__('pkg_resources').declare_namespace(__name__), which fiddles with __path__). However I must admit, I don't understand
what pkg_resources.declare_namespace() and pkg_resources._handle_ns() etc. really do.
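
The effect of a multi-entry __path__ can at least be demonstrated in isolation. The following sketch (present-day Python 3) builds the two directory layouts from above in a temporary directory and creates the package object by hand; it only illustrates what a __path__ with two entries does and makes no claim about what pkg_resources does internally.

```python
import importlib
import os
import sys
import tempfile
import types

root = tempfile.mkdtemp()
package_dirs = []
# Two separately "installed" distributions, each providing part of package foo.
for dist, name, value in [("foo-bar-0.1", "bar", 1), ("foo-baz-0.3", "baz", 2)]:
    pkg_dir = os.path.join(root, dist, "foo")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, name + ".py"), "w") as f:
        f.write("VALUE = %d\n" % value)
    package_dirs.append(pkg_dir)

# A hand-made package whose __path__ spans both directories.
foo = types.ModuleType("foo")
foo.__path__ = package_dirs
sys.modules["foo"] = foo

# Submodule lookup searches every entry of the parent's __path__.
bar = importlib.import_module("foo.bar")
baz = importlib.import_module("foo.baz")
```
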
> how your problems are similar, and what should be done about
> it.

My "load a module from a file" problem is actually totally unrelated to this, except that it gets complicated if packages come
into the picture (which I wouldn't need for the "load a module from a file" scenario).

> At this point I believe you're running out of quick wild ideas that are
> actually helpful.
>
>> Maybe for
>>
>> import os.path from url("file:~guido/python-packages/os/path")
>>
>> url("file:~guido/python-packages/os/path").parent() should return url("file:~guido/python-packages/os") which then gets
>> imported and url("file:~guido/python-packages") should return None. But this would mean that the number of dots on the left
>> side has to match the number of times calling parent() on the right side returns something not None. Not good. Maybe we
>> should leave the current import syntax alone and add a new one for importing from files etc..
>
> I think we should design a new OO API that captures the essence of the current import machinery (but cleaned up), create a
> new mapping from all current syntactic variants of the import statements to that API, and design a separate extension
> (through subclassing or whatever) to do imports from non-traditional sources.
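
As a rough illustration of such an extension by subclassing, here is a sketch using the importlib ABCs of present-day Python 3 (which did not exist when this was written). The class name DictSourceImporter, the module name layout_from_db and the SOURCES dict are made up; the dict merely stands in for a database or other non-traditional source.

```python
import importlib.abc
import importlib.util
import sys


class DictSourceImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that takes module source from a mapping (hypothetical)."""

    def __init__(self, sources):
        self.sources = sources  # maps fully qualified module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # let the other finders handle it
        return importlib.util.spec_from_loader(
            fullname, self, origin="dict:" + fullname)

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, module.__spec__.origin, "exec")
        exec(code, module.__dict__)


SOURCES = {
    "layout_from_db": (
        "def link(text, href):\n"
        "    return \"<a href='%s' class='foo'>%s</a>\" % (href, text)\n"
    ),
}
sys.meta_path.insert(0, DictSourceImporter(SOURCES))

# A plain import statement now finds the module and gives it a real name.
import layout_from_db

html = layout_from_db.link("Home", "/index.html")
```
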
>
>> > I suggest that instead of answering these questions from the
>> > perspective of the solution you're offering here, you tell us a bit more about the use cases that make you think of this
>> > solution. What are you trying to do? Why are you importing code from a specific file instead of configuring sys.path so
>> > the file will be found naturally?
>>
>> The Python files I'm importing define "layout classes" that are used for generating a bunch of static HTML files (actually
>> JSP files). I.e. something like
>>
>> foo.py:
>> def link(text, href):
>>    return "<a href='%s' class='foo'>%s</a>" % (href, text)
>>
>> bar.py:
>> foo = specialimport("foo.py")
>>
>> def linklist(links):
>>    return "<ul>%s</ul>" % "".join("<li>%s</li>" % foo.link(text, href)
>>                                   for (text, href) in links)
>>
>> This function linklist() is used for generating the HTML files. The interactive Python shell is used as a "make shell", so
>> when I change foo.py in an editor and do
>>    >>> project.build("install")
>> in the shell all the HTML files that depend on bar.py (which in turn depends on foo.py) must be rebuilt.
>
> You've nearly lost me, but it *seems* to me that what you're really doing is use an alternative import mechanism in order to
> solve the reload() problem for a set of interdependent modules.

Exactly, it's the same problem that Python-based web servers have with reloading changed modules (and those modules that depend
on the changed one).

> That's a good thing to attempt to solve more generally, but
> hardly a use case for a special import function.
>
> What does your specialimport() implementation do when the same module is requested twice? Does it load it twice?

Only if the module source code has been changed since the first import (or one of the modules used by this module has changed),
otherwise it's reused from a cache (see Repository._update() in http://styx.livinglogic.de/~walter/pythonimport/resload.py).
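
The caching idea can be sketched roughly as follows in present-day Python 3. This is not the code from resload.py: the class name ModuleCache is invented, change detection uses modification time plus file size as an assumed heuristic, and the tracking of dependent modules is left out entirely.

```python
import os
import tempfile
import types


class ModuleCache:
    """Load modules from files, re-executing them only when the file changed."""

    def __init__(self):
        self._cache = {}  # absolute path -> (stamp, module)

    def load(self, path):
        path = os.path.abspath(path)
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        cached = self._cache.get(path)
        if cached is not None and cached[0] == stamp:
            return cached[1]  # unchanged: reuse the cached module
        name = os.path.splitext(os.path.basename(path))[0]
        module = types.ModuleType(name)
        module.__file__ = path
        with open(path) as f:
            source = f.read()
        exec(compile(source, path, "exec"), module.__dict__)
        self._cache[path] = (stamp, module)
        return module


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "foo.py")
with open(path, "w") as f:
    f.write("VERSION = 1\n")

cache = ModuleCache()
first = cache.load(path)
again = cache.load(path)        # served from the cache
with open(path, "w") as f:
    f.write("VERSION = 22\n")   # size differs, so the change is noticed
reloaded = cache.load(path)     # a fresh module object
```
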
>> > I
>> > suspect that your module-from-a-database use case is actually intended as a building block for an import hook.
>>
>> Maybe, but AFAIK import hooks can't solve the dependency and reload problem.
>
> Only because nobody has bothered to use them to solve it; they have all the information available needed to solve it.
>
>> > I think we ought to redesign the import machinery in such a way that you could do things like importing from a specific
>> > file or from a database, but without changing the import statement syntax -- instead, we should change what import
>> > actually *does*.
>>
>> But there has to be a spot where I can actually specify from *where* I want to import, and IMHO this should be in the import
>> statement, not by putting some hooks somewhere.
>
> That may be your use case. Most people would prefer to be able to set PYTHONPATH once and then use regular import statements.

I do that for modules that change more slowly (but installing a project from 6 months ago takes a lot of work, because you have
to install all the correct versions of the packages. But pkg_resources.require() should be able to resolve that).

> On Windows, there's a mechanism to specify that a particular module must be loaded from a specific location. Maybe this would
> be helpful?

For Python, or are you talking about the Windows OS?

> It could be a dict whose keys are fully qualified module names, and whose values are pathnames (or URLs or
> whatever).
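
One way such a dict could be wired into the import system is sketched below for present-day Python 3, using importlib.util.spec_from_file_location. The class name PinnedLocationFinder and the module name pinned_demo are hypothetical, and only plain file paths are handled, not URLs.

```python
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class PinnedLocationFinder(importlib.abc.MetaPathFinder):
    """Meta path finder that maps module names to fixed file paths (hypothetical)."""

    def __init__(self, locations):
        self.locations = dict(locations)  # fully qualified name -> pathname

    def find_spec(self, fullname, path=None, target=None):
        location = self.locations.get(fullname)
        if location is None:
            return None  # not pinned: fall through to the normal sys.path search
        return importlib.util.spec_from_file_location(fullname, location)


# Create a module file outside of sys.path to demonstrate the lookup.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "pinned_demo.py")
with open(module_path, "w") as f:
    f.write("VALUE = 42\n")

sys.meta_path.insert(0, PinnedLocationFinder({"pinned_demo": module_path}))

# The regular import statement is unchanged; only the lookup differs.
import pinned_demo
```
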
>
>> > While we're at it, we
>> > should also fix the silliness that __import__("foo.bar") imports foo.bar but returns foo.
>>
>> +1
>
> This is an example of what I mentioned above -- redesign the machinery API and then remap the import statement.
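
The behaviour in question is easy to reproduce with a stdlib package. The sketch below uses present-day Python 3, where importlib.import_module() (added long after this thread) returns the submodule itself; passing a non-empty fromlist to __import__ is the older workaround.

```python
import importlib

# __import__ with a dotted name returns the top-level package.
top = __import__("xml.dom")

# importlib.import_module returns the module that was actually named.
sub = importlib.import_module("xml.dom")

# A non-empty fromlist also makes __import__ return the submodule.
via_fromlist = __import__("xml.dom", fromlist=["Node"])
```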

Servus,
   Walter