[Python-ideas] Packages and Import

Thu Feb 8 22:34:49 CET 2007

On 2/8/07, Ron Adam <rrr at ronadam.com> wrote:
> Brett Cannon wrote:
> > On 2/7/07, Ron Adam <rrr at ronadam.com> wrote:
> >> Brett Cannon wrote:
> >> > On 2/4/07, Ron Adam <rrr at ronadam.com> wrote:
>
>
> >> It would be nice if __path__ were set on all modules in packages no
> >> matter how they are started.
> >
> > There is a slight issue with that as the __path__ attribute represents
> > the top of a package and thus that it has an __init__ module.  It has
> > some significance in terms of how stuff works at the moment.
>
> Yes, and after some reading I found __path__ isn't exactly what I was thinking.
>
> It could be it's only a matter of getting that first initial import right.  An
> example of this is this recipe by Nick.
>
>      http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772
>

But Nick already rolled this stuff into 2.5 when package support was
added to runpy.
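
For reference, a minimal sketch of what running a module inside a package
through runpy looks like, assuming a current Python 3.  The package name
demo_runpkg and its tool module are hypothetical and are created in a
temporary directory only so the example is self-contained:

```python
import os
import runpy
import sys
import tempfile

# Hypothetical throwaway package, built on disk so the sketch is self-contained.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_runpkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "tool.py"), "w") as f:
    f.write("RAN_AS = __name__\n")

sys.path.insert(0, root)

# Locate the module by its dotted name through the import system, then
# execute its code with __name__ set to "__main__", as a script would be.
result = runpy.run_module("demo_runpkg.tool", run_name="__main__")
print(result["RAN_AS"])
print(result["__package__"])
```

run_module() returns the globals of the executed module, so the two prints
show the name the code saw and the package it was located in.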

>
>
> >>  The real name could be worked out by comparing __path__ and
> >> __file__ if someone needs that.  But I think it would be better to
> >> just go ahead and add a __realname__ attribute for when __name__ is
> >> "__main__".
> >>
> >> __name__ == "__main__" can stay the same and still serve its purpose
> >> to tell whether a script was started directly or imported.
> >
> > I think the whole __main__ thing is the wrong thing to be trying to
> > keep alive for this.  I know it would break things, but it is probably
> > better to come up with a better way for a module to know when it is
> > being executed or to denote what code should only be run when it is
> > executed.
>
> I was trying to suggest things that would do the least harm as far as changing
> things in the eyes of the users.  If not keeping the "__main__" name in python
> 3k is a real option then yes, then there may be more options.  Is it a real
> option?  Or is Guido set on keeping it?
>

Beats me.  Wouldn't be hard to have 2to3 change ``if __name__ ==
'__main__'`` to a function definition instead.

> If you remove the "__main__" name, then you will still need to have some
> attribute for python to determine the same thing.

Why?  There is nothing saying we can't follow most other languages and
just have a reserved function name that gets executed if the module is
executed.
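
A rough sketch of that idea, under stated assumptions: the helper name
run_module_main is made up for illustration, the reserved function is taken
to be __main__(), and the demo module is built in memory rather than loaded
from a file.  It is not how the interpreter works today:

```python
import importlib
import sys
import types


def run_module_main(module_name):
    """Import module_name, then call its __main__() function if it has one."""
    module = importlib.import_module(module_name)
    entry = getattr(module, "__main__", None)
    if callable(entry):
        return entry()
    return None


# Hypothetical in-memory module so the sketch needs no files on disk.
demo = types.ModuleType("demo_mod")
exec(
    "def __main__():\n"
    "    return 'ran %s' % __name__\n",
    demo.__dict__,
)
sys.modules["demo_mod"] = demo

result = run_module_main("demo_mod")
print(result)
```

Because the module is imported normally before the function is called,
__name__ inside it is the real module name and no name check is needed.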

>  What you would end up doing
> is just moving the [if __name__=="__main__": __main__()] line off the end of
> the program so that all programs have it automatically.  We just won't see it.  And
> instead of checking __name__, the interpreter would check some other attribute.
>
> So what and where would that other attribute be?
>

If a thing was done like that it would be in the global namespace of
the module just like __name__ is.

> Would it be exposed so we add if __ismain__: <body> to our programs for
> initialization purposes?
>
> Or you could just replace it with an __ismain__ attribute then we can name our
> main functions anything we want... like test().
>
> if __ismain__:
>     test()
>
> That is shorter and maybe less confusing than the __name__ check.
>
>
> >> >> (2)  import this_package.module
> >> >>       import this_package.sub_package
> >> >>
> >> >> If this_package is the same name as the current package, then do not
> >> >> look on
> >> >> sys.path. Use the location of this_package.
> >> >>
> >> >
> >> > Already does this (at least in my pure Python implementation).
> >> > Searches are done on __path__ when you are within a package.
> >>
> >> Cool! I don't think it's like that for the non-pure version, but it
> >> may do it that way if "from __future__ import absolute_import" is
> >> used.
> >
> > It does do it both ways, there is just a fallback on the classic
> > import semantics in terms of trying it both as a relative and absolute
> > import.  But I got the semantics from the current implementation so it
> > is not some great inspiration of mine.  =)
>
> I think there shouldn't be a fallback; that will just confuse things. Raise an
> exception here because most likely falling back is not what you want.
>

The fallback is the old way, so don't worry about it.

> If someone wants to import a module from outside the package that has the same
> name as the package (or modules in some other package with the same name), then
> there needs to be an explicit way to do that.  But I really don't think this will
> come up that often.
>
>
> <clipped general examples>
>
> > Or you could have copied the code I wrote for the filesystem
> > importer's find_module method that already does this classification.
> > =)
> >
> > Part of the problem of working backwards from path to dotted name is
> > that it might not import that way.
>
> Maybe it should work that way?  If someone wants other than that behavior, then
> maybe there can be other ways to get it?
>

That's my point; the "other way" needs to work and the default can be
based on the path.

> Here's an example of a situation where you might think it would be a problem,
> but it isn't:
>
>      pkg1:
>        __init__.py
>        m1.py
>        spkg1:
>           __init__.py
>           m3.py
>        dirA:
>           m4.py
>           pkg2:
>              __init__.py
>              m5.py
>
> You might think it wouldn't work for pkg2.m5, but that's actually ok.  pkg2 is a
> package just being stored in dirA which just happens to be located inside
> another package.
>
> Running m5.py directly will run it as a submodule of pkg2, which is what you
> want.  It's not in a sub-package of pkg1.  And m4.py is just a regular module.
>
> Or are you thinking of other relationships?
>

I am thinking of a package's __path__ being set to a specific
directory based on the platform or something.  That totally changes
the search order for the package, so the search no longer corresponds
to the package's directory location.
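
A small sketch of that situation, assuming a current Python 3.  The names
demo_platpkg, plat_specific and impl are hypothetical, and the "platform"
directory is just a second temporary directory:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_platpkg")
plat_dir = os.path.join(root, "elsewhere", "plat_specific")
os.makedirs(pkg_dir)
os.makedirs(plat_dir)

# The package's __init__ points __path__ at a directory outside the package.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("__path__ = [%r]\n" % plat_dir)

# The submodule lives only in the redirected directory.
with open(os.path.join(plat_dir, "impl.py"), "w") as f:
    f.write("VALUE = 42\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

impl = importlib.import_module("demo_platpkg.impl")
print(impl.__name__)
print(os.path.dirname(impl.__file__))
```

The file sits under elsewhere/plat_specific, yet it imports as
demo_platpkg.impl, so deriving the dotted name from the file's location
alone would give the wrong answer here.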

>
> >__path__ can be tweaked, importers
> > and loaders can be written to interpret the directory structure or
> > file names differently, etc.
>
> Yes, and they will need a basic set of well defined default behaviors to build
> on.  After that, it's up to them to be sure their interpretation does what they
> want.
>
>
> > Plus what about different file types
> > like .ptl files from Quixote?
>
> This is really a matter of using a corresponding file reader to get at its
> contents or its real (relative to python) type... Ie, is it really a module, a
> package, or a module in a package, or some other thing ... living inside of a
> zip, or some other device (or file) like container?
>
>
>
> >> >> (4)  import module
> >> >>       import package
> >> >>
> >> >> Module and package are not in a package, so don't look in any
> >> >> packages, even
> >> >> this one or sys.path locations inside of packages.
> >> >>
> >> >
> >> > This is already done.  Absolute imports would cause this to do a
> >> > shallow check on sys.path for the module or package name.
> >>
> >> Great! 2 down.  Almost half way there.  :-)
> >>
> >> But will it check the current directory if you run a module directly?
> >> Currently it doesn't know if it's part of a package.  Is that correct?
> >
> > Absolute import semantics go straight to sys.path, period.
>
> Which includes the current directory.  So in effect it will fall back to a
> relative type of behavior if a module with the same name as the one being imported
> exists in the current directory (inside this package), *if* you execute the module
> directly.
>
> I think this should also give an error, it is the inverse of the situation
> above. (#2) In most cases (if not all) it's not what you want.
>
> You wanted a module that is not part of this module's package, and got one that is.
>
>
> >> >> MOTIVATION
> >> >> ==========
> >> >>
> >> >> (A) Added reliability.
> >> >>
> >> >> There will be much less chance of errors (silent or otherwise) due to
> >> >> path/import conflicts which are sometimes difficult to diagnose.
> >> >>
> >> >
> >> > Probably, but I don't know if the implementation complexity warrants
> >> > worrying about this.  But then again how many people have actually
> >> > needed to implement the import machinery?  =)  I could be labeled as
> >> > jaded.
> >>
> >> Well, I know it's not an easy thing to do.  But it's not finding the
> >> paths and/or whether files are modules etc... that is hard.  From what
> >> I understand the hard part is making it work so it can be extended and
> >> customized.
> >>
> >> Is that correct?
> >
> > Yes.  I really think ditching this whole __main__ name thing is going
> > to be the only solid solution.  Defining a __main__() method for
> > modules that gets executed makes the most sense to me.  Just import
> > the module and then execute the function if it exists.  That allows
> > runpy to have the name be set properly and does away with import
> > problems without mucking with import semantics.  Still have the name
> > problem if you specify a file directly on the command line, though.
>
> I'll have to see more details of how this would work I think. Part of me says
> sounds good. And another part says, isn't this just moving stuff around? And what
> exactly does that solve?
>

It is moving things around, but so what?  Moving it keeps __name__
sane.  At worst a global could be set to the name of the module that
started the execution or have an alias in sys.modules for the
'__main__' key to the module being executed.
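
A sketch of the sys.modules alias variant, with assumptions called out: the
helper execute_as_main and the module name pkg.mod are invented for the
example, and the entry point is again assumed to be a __main__() function:

```python
import sys
import types


def execute_as_main(module):
    """Alias module under the '__main__' key while its __main__() runs."""
    previous = sys.modules.get("__main__")
    sys.modules["__main__"] = module
    try:
        entry = getattr(module, "__main__", None)
        return entry() if callable(entry) else None
    finally:
        # Put the original '__main__' entry back once execution is done.
        if previous is None:
            del sys.modules["__main__"]
        else:
            sys.modules["__main__"] = previous


# Hypothetical in-memory module standing in for a module inside a package.
pkg_mod = types.ModuleType("pkg.mod")
exec(
    "import sys\n"
    "def __main__():\n"
    "    return __name__, sys.modules['__main__'].__name__\n",
    pkg_mod.__dict__,
)

names = execute_as_main(pkg_mod)
print(names)
```

The module keeps its real __name__ the whole time; only the lookup through
the '__main__' key tells anyone which module started the execution.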

The point of the solution it provides is it doesn't muck with import
semantics.  It allows the execution stuff to be external to imports
and be its own thing.

Guido has rejected this idea before (see PEP 299:
http://www.python.org/dev/peps/pep-0299/ ), but then again there was
not this issue before.

Now I see why Nick said he wouldn't touch this in PEP 338.  =)

-Brett