[Python-3000] Changing the import machinery; was: Re: [Python-Dev] setuptools in 2.5.

Thu Apr 20 17:27:02 CEST 2006

On 4/20/06, Walter Dörwald <walter at livinglogic.de> wrote:
> Guido van Rossum wrote:
>
> > Sorry, there's so much here that seems poorly thought out that I don't
> > know where to start.
>
> Consider it a collection of wild ideas.
>
> > Getting rid of the existing import syntax in favor of the incredibly
> > verbose and ugly
> >
> >   foo = import("foo")
> >
> > just isn't acceptable.
>
> OK, then how about
>
> import foo
> import foo from url("file:~guido/python-packages/foo")

Sorry. I wasn't proposing that the import statement had to be extended
to support the new functionality; only that the existing functionality
should still be available by writing import statements. For the above,
I'd much rather write

foo = import_from_url("foo", "/home/guido/.../foo")
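
A minimal sketch of such a function in present-day Python, assuming the
second argument is a plain filesystem path to a .py file. The name
import_from_path is hypothetical, and the importlib.util helpers it uses
were added to Python well after this message was written:

```python
import importlib.util
import sys


def import_from_path(name, path):
    """Load the source file at `path` and register it as module `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found by its own name.
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise
    return module
```

Because the module name is passed explicitly, the loaded module gets a
real __name__, and a later plain "import foo" finds it in sys.modules.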

I also don't like your proposal to create a "url" object (as you might
have predicted from my resistance to the path PEP :-).

> How would this work with multiple imports?
>
> import foo, bar from url("file:~guido/python-packages/")
>
> How would it recognize whether url() refers to the module itself or to a
> package from which modules are imported?

You tell me.

> > Importing from remote URLs is a non-starter from a security POV; and
> > using HTTPS would be too slow. For code that's known to reside
> > remotely, a better approach is to use setuptools to install that code
> > once and for all.
>
> I don't see how that changes anything from a security POV. You have to
> trust the source in both cases.

With HTTP, even if I trusted the source, I still shouldn't trust that
the data I get from the URL actually came from the source. With HTTPS,
at least man-in-the-middle attacks should be thwarted.

> Performance-wise you are right, it
> wouldn't make sense to call such an import in a tight loop.
>
> > How would a module know its own name?
>
> It probably would only have a real name for a standard "import foo".

That's a problem IMO.

> import foo from url("file:~guido/python-packages/foo")
>
> would create a module with foo.__origin__ ==
> url("file:~guido/python-packages/foo")
>
> > How do you deal with packages
> > (importing a module from a package implies importing/loading the
> > package's __init__.py).
>
> I have no idea. This probably won't work (at least not in the sense that
> importing something imports all its parent modules).
>
> But packages seem to be a problem for setuptools too (at least if parts
> of the package are distributed separately).

So please do some research and find out what their problems are, how
your problems are similar, and what should be done about it. At this
point I believe you're running out of quick wild ideas that are
actually helpful.

> Maybe for
>
> import os.path from url("file:~guido/python-packages/os/path")
>
> url("file:~guido/python-packages/os/path").parent() should return
> url("file:~guido/python-packages/os") which then gets imported and
> url("file:~guido/python-packages") should return None. But this would
> mean that the number of dots on the left side has to match the number of
> times calling parent() on the right side returns something not None. Not
> good. Maybe we should leave the current import syntax alone and add a
> new one for importing from files etc.

I think we should design a new OO API that captures the essence of the
current import machinery (but cleaned up), create a new mapping from
all current syntactic variants of the import statements to that API,
and design a separate extension (through subclassing or whatever) to
do imports from non-traditional sources.
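
One way to picture the "extension through subclassing" part is the
finder/loader protocol that importlib later provided. The sketch below is
only an illustration under that assumption, not the API proposed here:
DictImporter and greeting_demo are hypothetical names, and the dict stands
in for a database or any other non-traditional source.

```python
import importlib.abc
import importlib.util
import sys


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import modules whose source text lives in a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in self.sources:
            return None  # not ours; let the remaining finders try
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        source = self.sources[module.__name__]
        code = compile(source, f"<dict:{module.__name__}>", "exec")
        exec(code, module.__dict__)


sys.meta_path.append(
    DictImporter({"greeting_demo": "def hello():\n    return 'hi'\n"})
)

# The ordinary import statement now reaches the dict-backed source.
import greeting_demo
```

The import statement itself is unchanged; only what it does behind the
scenes is extended, and the module still knows its own name.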

> > I suggest that instead of answering these questions from the
> > perspective of the solution you're offering here, you tell us a bit
> > more about the use cases that make you think of this solution. What
> > are you trying to do? Why are you importing code from a specific file
> > instead of configuring sys.path so the file will be found naturally?
>
> The Python files I'm importing define "layout classes" that are used for
> generating a bunch of static HTML files (actually JSP files). I.e.
> something like
>
> foo.py:
> def link(text, href):
>    return "<a href='%s' class='foo'>%s</a>" % (href, text)
>
> bar.py:
> foo = specialimport("foo.py")
>
> def linklist(links):
>    return "<ul>%s</ul>" % "".join("<li>%s</li>" % foo.link(text, href)
> for (text, href) in links)
>
> This function linklist() is used for generating the HTML files. The
> interactive Python shell is used as a "make shell", so when I change
> foo.py in an editor and do
>    >>> project.build("install")
> in the shell all the HTML files that depend on bar.py (which in turn
> depends on foo.py) must be rebuilt.

You've nearly lost me, but it *seems* to me that what you're really
doing is using an alternative import mechanism in order to solve the
reload() problem for a set of interdependent modules. That's a good
thing to attempt to solve more generally, but hardly a use case for a
special import function.

What does your specialimport() implementation do when the same module
is requested twice? Does it load it twice?
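
To make the question concrete, here is one possible behaviour, sketched
under the assumption that specialimport() takes a file path. This is not
the actual implementation being discussed; the cache keyed on path and
modification time is an assumption made for illustration.

```python
import importlib.util
import os

_cache = {}  # absolute path -> (mtime in ns, module object)


def specialimport(path):
    """Return the module for `path`, re-executing it only if the file changed."""
    path = os.path.abspath(path)
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]  # same file, unchanged: hand back the same object
    name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    _cache[path] = (mtime, module)
    return module
```

With this sketch a second request returns the same module object unless
the file has been modified. It does nothing about modules that already
hold a reference to the old version, which is the harder half of the
reload() problem.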

> > I
> > suspect that your module-from-a-database use case is actually intended
> > as a building block for an import hook.
>
> Maybe, but AFAIK import hooks can't solve the dependency and reload problem.

Only because nobody has bothered to use them to solve it; they have
all the information needed to solve it.

> > I think we ought to redesign the import machinery in such a way that
> > you could do things like importing from a specific file or from a
> > database, but without changing the import statement syntax -- instead,
> > we should change what import actually *does*.
>
> But there has to be a spot where I can actually specify from *where* I
> want to import, and IMHO this should be in the import statement, not by
> putting some hooks somewhere.

That may be your use case. Most people would prefer to be able to set
PYTHONPATH once and then use regular import statements.

On Windows, there's a mechanism to specify that a particular module
must be loaded from a specific location. Maybe this would be helpful?
It could be a dict whose keys are fully qualified module names, and
whose values are pathnames (or URLs or whatever).
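
Such a dict could be consulted before the normal path search. The sketch
below assumes the values are filesystem paths to .py files and uses the
finder protocol that importlib later provided; PinnedLocationFinder is a
hypothetical name.

```python
import importlib.abc
import importlib.util
import sys


class PinnedLocationFinder(importlib.abc.MetaPathFinder):
    """Load selected modules from explicitly configured file locations."""

    def __init__(self, locations):
        # Keys are fully qualified module names, values are file paths.
        self.locations = dict(locations)

    def find_spec(self, fullname, path=None, target=None):
        location = self.locations.get(fullname)
        if location is None:
            return None  # not pinned; fall through to the normal search
        return importlib.util.spec_from_file_location(fullname, location)


# To give the pinned locations priority over sys.path, the finder would be
# installed at the front of the meta path, for example:
#     sys.meta_path.insert(0, PinnedLocationFinder({"foo": "/some/dir/foo.py"}))
```

Regular import statements keep working unchanged; only the modules named
in the dict are redirected.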

> > While we're at it, we
> > should also fix the silliness that __import__("foo.bar") imports
> > foo.bar but returns foo.
>
> +1

This is an example of what I mentioned above -- redesign the machinery
API and then remap the import statement.
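
The behaviour in question is easy to reproduce with any package from the
standard library; xml.dom is used here only as an example. The
importlib.import_module() call shown for contrast was added to Python
after this message was written.

```python
import importlib

# __import__ with a dotted name imports the submodule but returns the
# top-level package.
top = __import__("xml.dom")
print(top.__name__)  # xml

# importlib.import_module returns the module that was actually named.
sub = importlib.import_module("xml.dom")
print(sub.__name__)  # xml.dom

# A non-empty fromlist makes __import__ return the submodule as well.
also_sub = __import__("xml.dom", fromlist=["Node"])
print(also_sub.__name__)  # xml.dom
```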

--
--Guido van Rossum (home page: http://www.python.org/~guido/)