[Python-ideas] Packages and Import

Fri Feb 9 18:31:16 CET 2007

Brett Cannon wrote:
> On 2/8/07, Ron Adam <rrr at ronadam.com> wrote:
>> Brett Cannon wrote:
>> > On 2/7/07, Ron Adam <rrr at ronadam.com> wrote:
>> >> Brett Cannon wrote:
>> >> > On 2/4/07, Ron Adam <rrr at ronadam.com> wrote:

>> >> It would be nice if __path__ were set on all modules in packages no
>> >> matter how they are started.
>> >
>> > There is a slight issue with that as the __path__ attribute represents
>> > the top of a package and thus that it has an __init__ module.  It has
>> > some significance in terms of how stuff works at the moment.
>>
>> Yes, and after some reading I found __path__ isn't exactly what I was 
>> thinking.
>>
>> It could be it's only a matter of getting that first initial import right.
>> An example of this is this recipe by Nick.
>>
>>      http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/307772
> 
> But Nick already rolled this stuff into 2.5 when package support was
> added to runpy.

I'll take a look at runpy today sometime.

>> If you remove the "__main__" name, then you will still need to have some
>> attribute for python to determine the same thing.
> 
> Why?  There is nothing saying we can't follow most other languages and
> just have a reserved function name that gets executed if the module is
> executed.

Yes, but this is where Python is different from other languages.  In a way,
Python's main *is* the whole module from top to bottom, so the '__main__' name
refers to the whole module and not just a function in it.

A more specific function name would be needed to get the context right.  Maybe
__script__() or __run__().

Or if you want to be consistent with classes, how about adding __call__() to
modules?  Then the main body of the module effectively works the same way as it
does in a class.  =)

Hey, I think that has some cool possibilities: it makes modules callable in
general.  So if I want to run a module's __call__(), AKA __main__() as you call
it, after importing it I would just do...

    import module
    module()

And it would just work.  ;-)
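
A minimal sketch of how callable modules can be emulated in present-day Python 3 (3.5+ permits assigning a module's __class__), assuming the module body defines a plain __call__() function with no self argument.  The names CallableModule and make_module are hypothetical, chosen for illustration only.  The subclass is needed because Python looks up special methods on the type, so a __call__ in the module's namespace alone does not make the module callable.

```python
import sys
import types


class CallableModule(types.ModuleType):
    """Hypothetical module type that forwards calls to a module-level __call__."""

    def __call__(self, *args, **kwargs):
        # Look up __call__ in the module's own namespace (its globals); it is a
        # plain function, so no self argument is passed along.
        func = self.__dict__.get("__call__")
        if func is None:
            raise TypeError(f"module {self.__name__!r} defines no __call__()")
        return func(*args, **kwargs)


def make_module(name, source):
    """Hypothetical helper: build a callable module from source text."""
    mod = CallableModule(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod
    return mod
```

Inside a real module file the same effect could be had with `sys.modules[__name__].__class__ = CallableModule` near the top, after which `import module; module()` runs that module's __call__().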

>> What you would end up doing is just moving the
>> [if __name__=="__main__": __main__()] line off the end of the program so
>> that all programs have it automatically.  We just won't see it.  And
>> instead of checking __name__, the interpreter would check some other
>> attribute.
>>
>> So what and where would that other attribute be?
>>
> 
> If a thing was done like that it would be in the global namespace of
> the module just like __name__ is.

Forget this; I like the idea above much better!  It's fully consistent with
classes, so it would be easy to explain as well.  A step towards unification
of classes and modules.  The __name__ attribute isn't changed either. ;-)

>> If someone wants to import a module from outside a package that has the
>> same name as the package (or a module in some other package with the same
>> name), then there needs to be an explicit way to do that.  But I really
>> don't think this will come up that often.
>>
>>
>> <clipped general examples>
>>
>> > Or you could have copied the code I wrote for the filesystem
>> > importer's find_module method that already does this classification.
>> > =)
>> >
>> > Part of the problem of working backwards from path to dotted name is
>> > that it might not import that way.
>>
>> Maybe it should work that way?  If someone wants behavior other than that,
>> then maybe there can be other ways to get it?
>>
> 
> That's my point; the "other way" needs to work and the default can be
> based on the path.

We need to get much more specific on this, i.e., examples.  I don't think we
will get anywhere trying to generalize this point.

>> Here's an example of a situation where you might think it would be a
>> problem, but it isn't:
>>
>>      pkg1:
>>        __init__.py
>>        m1.py
>>        spkg1:
>>           __init__.py
>>           m3.py
>>        dirA:
>>           m4.py
>>           pkg2:
>>              __init__.py
>>              m5.py
>>
>> You might think it wouldn't work for pkg2.m5, but that's actually OK.
>> pkg2 is a package that is just being stored in dirA, which happens to be
>> located inside another package.
>>
>> Running m5.py directly will run it as a submodule of pkg2, which is what
>> you want.  It's not in a sub-package of pkg1.  And m4.py is just a regular
>> module.
>>
>> Or are you thinking of other relationships?
> 
> I am thinking of a package's __path__ being set to a specific
> directory based on the platform or something.  That totally changes
> the search order for the package that does not correspond to its
> directory location.
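
A minimal sketch of the "work backwards from the path" default discussed above, applied to the pkg1 layout quoted earlier in this message.  It assumes a directory belongs to a package only if it contains __init__.py, and it deliberately ignores any runtime changes a package makes to its own __path__ (the case Brett raises).  The function name dotted_name_for is hypothetical.

```python
import os


def dotted_name_for(path):
    """Hypothetical helper: derive a dotted module name from a file path.

    Walks up through directories that contain __init__.py.  Returns the
    dotted name and the first directory above the outermost package, which
    is the directory that would need to be on sys.path.
    """
    directory, filename = os.path.split(os.path.abspath(path))
    parts = [os.path.splitext(filename)[0]]
    while os.path.isfile(os.path.join(directory, "__init__.py")):
        parent, package = os.path.split(directory)
        if not package:  # reached the filesystem root; stop walking
            break
        parts.append(package)
        directory = parent
    parts.reverse()
    return ".".join(parts), directory
```

With that layout, pkg1/dirA/pkg2/m5.py maps to "pkg2.m5" rooted at pkg1/dirA (dirA has no __init__.py), pkg1/spkg1/m3.py maps to "pkg1.spkg1.m3", and pkg1/dirA/m4.py is just the top-level module "m4".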

In that case, I think the developer and anyone who tries to run the script in a 
way the developer did not intend will have to be on their own.

For example, if I add a directory to __path__ to include a module that normally
lives someplace else, that's OK.  If I execute any of 'my' modules in 'my'
package, it will import __init__.py and set __path__ accordingly, and
everything will still work.
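
A minimal sketch of the __path__ case, assuming a hypothetical package demo_pkg whose __init__.py appends a platform-specific subdirectory (the "plat_" + sys.platform naming is an assumption, not a convention Python defines).  Importing through the package picks up the extra directory, while running the module file directly would not, which is the situation described above.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Source for a hypothetical package __init__.py that extends its own __path__.
INIT_SOURCE = textwrap.dedent("""
    import os
    import sys

    _platform_dir = os.path.join(os.path.dirname(__file__), "plat_" + sys.platform)
    if os.path.isdir(_platform_dir):
        # Modules in this directory now import as demo_pkg.<name>.
        __path__.append(_platform_dir)
""")


def build_and_import(root):
    """Create demo_pkg under root, then import a module from its platform dir."""
    pkg_dir = os.path.join(root, "demo_pkg")
    plat_dir = os.path.join(pkg_dir, "plat_" + sys.platform)
    os.makedirs(plat_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write(INIT_SOURCE)
    with open(os.path.join(plat_dir, "impl.py"), "w") as f:
        f.write("WHERE = __file__\n")
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()  # make the new files visible to finders
        return importlib.import_module("demo_pkg.impl")
    finally:
        sys.path.remove(root)
```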

But if I execute the 'other' module directly, then Python needs to run it in
whatever context it normally lives in.  We shouldn't try to figure out what
'other' packages it may be used in, because it may be used in many packages.  So
the only thing to do is run it in the context where we find it, and not in this
'special' context we put it in.

For situations where we might have several subdirectories in our package that
may be chosen from depending on the platform (or other things), we may be able
to put a hint in the directory, such as a _init__.py file.  (Notice the single
leading underscore.)  Or some variation if that's too subtle.  The idea is that
it marks an inactive sub-package, and the main package's __init__ file could
activate a 'reserved' sub-package using some method like renaming _init__.py to
__init__.py (but I really don't like renaming as a way to do that).  It would
be better to have some other way.

Then we could possibly still do the search up to find the root package by 
including _init__.py files in our search in those cases as well.
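
A minimal sketch of that upward search, assuming the proposed _init__.py marker is treated like __init__.py while looking for the package root.  Python's import system does not recognize _init__.py; the marker tuple and the name find_package_root are hypothetical.

```python
import os

# Hypothetical marker set: "_init__.py" (inactive sub-package) is the proposal
# above and has no meaning to Python's real import machinery.
PACKAGE_MARKERS = ("__init__.py", "_init__.py")


def find_package_root(path, markers=PACKAGE_MARKERS):
    """Walk up from a module file while each directory holds a marker file.

    Returns the first directory above the outermost (active or inactive)
    package, i.e. the directory that would go on sys.path.
    """
    directory = os.path.dirname(os.path.abspath(path))
    while any(os.path.isfile(os.path.join(directory, m)) for m in markers):
        parent = os.path.dirname(directory)
        if parent == directory:  # filesystem root; nothing further up
            break
        directory = parent
    return directory
```

For pkg/plat_x/mod.py, where pkg has __init__.py and plat_x only has _init__.py, the default markers find the directory above pkg, while markers=("__init__.py",) stops at plat_x itself.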

>> >> >> MOTIVATION
>> >> >> ==========
>> >> >>
>> >> >> (A) Added reliability.
>> >> >>
>> >> >> There will be much less chance of errors (silent or otherwise) due to
>> >> >> path/import conflicts, which are sometimes difficult to diagnose.
>> >> >>
>> >> >
>> >> > Probably, but I don't know if the implementation complexity warrants
>> >> > worrying about this.  But then again how many people have actually
>> >> > needed to implement the import machinery.  =)  I could be labeled as
>> >> > jaded.
>> >>
>> >> Well, I know it's not an easy thing to do.  But it's not finding the
>> >> paths and/or determining whether files are modules etc. that is hard.
>> >> From what I understand, the hard part is making it work so it can be
>> >> extended and customized.
>> >>
>> >> Is that correct?
>> >
>> > Yes.  I really think ditching this whole __main__ name thing is going
>> > to be the only solid solution.  Defining a __main__() method for
>> > modules that gets executed makes the most sense to me.  Just import
>> > the module and then execute the function if it exists.  That allows
>> > runpy to have the name be set properly and does away with import
>> > problems without mucking with import semantics.  Still have the name
>> > problem if you specify a file directly on the command line, though.
>>
>> I'll have to see more details of how this would work, I think.  Part of me
>> says it sounds good.  And another part says, isn't this just moving stuff
>> around?  And what exactly does that solve?
> 
> It is moving things around, but so what?  Moving it keeps __name__
> sane.  At work a global could be set to the name of the module that
> started the execution or have an alias in sys.modules for the
> '__main__' key to the module being executed.
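
A minimal sketch of the runner described above (in the spirit of PEP 299), assuming a module opts in by defining a __main__() function.  The module keeps its real __name__, and sys.modules['__main__'] is aliased to it only while __main__() runs.  The name run_main is hypothetical; this is not how runpy behaves today.

```python
import sys
import types


def run_main(module):
    """Hypothetical runner: call module.__main__() if the module defines one."""
    main = getattr(module, "__main__", None)
    if not callable(main):
        return None
    saved = sys.modules.get("__main__")
    sys.modules["__main__"] = module  # alias, so __name__ itself stays sane
    try:
        return main()
    finally:
        # Restore whatever was registered as __main__ before.
        if saved is not None:
            sys.modules["__main__"] = saved
        else:
            del sys.modules["__main__"]
```

Because the module is imported normally before __main__() is called, its __name__ reflects its real dotted name rather than '__main__'.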

Or just use __call__().  It already behaves the way you want for classes, and I
think it could be reused for modules.  The only difference is it won't have a
self argument, which I think is not a problem.

> The point of the solution it provides is it doesn't muck with import
> semantics.  It allows the execution stuff to be external to imports
> and be its own thing.
> 
> Guido has rejected this idea before (see PEP 299 :
> http://www.python.org/dev/peps/pep-0299/ ), but then again there was
> not this issue before.
> 
> Now I see why Nick said he wouldn't touch this in PEP 338.  =)

I read the thread, and backwards compatibility as well as Guido just not liking 
it were the reasons it was rejected.  Backwards compatibility is less of a 
problem for py3k, but I also agree with his reasons for not liking it.  I think 
a reserved __call__() function for modules may be a little easier to sell.  It's 
already reserved in other situations for very much the same purpose as well.

Cheers,
    Ron