[Python-ideas] Packages and Import

Thu Feb 8 01:53:54 CET 2007

Brett Cannon wrote:
> On 2/4/07, Ron Adam <rrr at ronadam.com> wrote:
>>
>> After exploring this a bit further on comp.lang.python, I was able to organize
>> these ideas better.  The more I thought about it, the more '+'s I found, and
>> about the only '-'s I can think of is the work required to actually make a patch
>> to do it.
>>
>> It's also good to keep in mind that since most people still rely on the old
>> relative import behavior, most people have not run into some of the issues I
>> mention here.  But they will at some point.
>>
>> I did mean to keep this short, but clarity won out. (At least it's clear to me,
>> but that's an entirely subjective opinion on my part.)
>>
>> Maybe someone will adopt this and make a real PEP out of it.  :-)
>>
>> Cheers,
>>    Ron
>>
>>
>>
>> PROPOSAL
>> ========
>>
>> Make Python's concept of a package (currently an informal type) stronger
>> than that of the underlying file system search path and directory structure.
>>
> 
> So you mean make packages more of an official thing than just having a
> __path__ attribute on a module, right?

Currently, in Python 2.5, only imported packages have a __path__ attribute in
their namespaces.  Running a module directly doesn't set a __path__ attribute,
just the __file__ attribute.

It would be nice if __path__ were set on all modules in packages no matter how 
they are started.  The real name could be worked out by comparing __path__ and 
__file__ if someone needs that.  But I think it would be better to just go ahead 
and add a __realname__ attribute for when __name__ is "__main__".

__name__ == "__main__" can stay the same and still serve its purpose, which is
to tell whether a script was started directly or imported.
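
As a rough illustration of working out the real name by comparing __path__ and
__file__, here is a minimal sketch. It assumes __path__ holds the base package
location as a single directory string, which is what this proposal suggests and
not what Python 2.5 does today. The function name real_name and both parameter
names are made up for the example.

```python
import os

def real_name(base_package_path, module_file):
    """Derive a dotted module name from a base package location and a file.

    Both arguments are hypothetical stand-ins: base_package_path for the
    proposed __path__ (base package location), module_file for __file__.
    """
    # Names are taken relative to the directory *containing* the base package.
    top = os.path.dirname(os.path.abspath(base_package_path))
    rel = os.path.relpath(os.path.abspath(module_file), top)
    rel, _ = os.path.splitext(rel)      # drop '.py', '.pyc', ...
    parts = rel.split(os.sep)
    if parts[-1] == '__init__':         # a package's own __init__ file
        parts.pop()
    return '.'.join(parts)
```

Something like this would only be a fallback; a __realname__ attribute set by
the interpreter would avoid the computation entirely.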

>> Where the following hold true in python 3.X, or when absolute_import behavior is
>> imported from __future__ in python 2.X:
>>
>>
>> (1) Python first determines if a module or package is part of a package and then
>> runs that module or package in the context of the package it belongs to. (see
>> items below)
>>
> 
> Don't quite follow this statement.  What do you mean by "runs" here?
> You mean when using runpy or something and having the name set to
> '__main__'?

Yes

>> (2)  import this_package.module
>>       import this_package.sub_package
>>
>> If this_package is the same name as the current package, then do not look on
>> sys.path. Use the location of this_package.
>>
> 
> Already does this (at least in my pure Python implementation).
> Searches are done on __path__ when you are within a package.

Cool! I don't think it's like that for the non-pure version, but it may do it
that way if "from __future__ import absolute_import" is used.

Are you setting __path__ for each module imported in a package too?

>> (3)  import other_package.module
>>       import other_package.sub_package
>>
>> If other_package is a different name from the current package (this_package),
>> then do not look in this_package and exclude searches in sys.path locations that
>> are inside this_package including the current directory.
>
> 
> This change would require importers to do more.  Since the absolute
> import semantics automatically make this kind of import start at the
> top-level (i.e., sys.path), each import for an entry on sys.path would
> need to be told what package it is currently in, check if it handles
> that package, and then skip it if it does have it.

I don't think it will be as hard as this.  See below.

> That seems like a lot of work that I know I don't want to have to
> implement for every importer I ever write.

Only getting the correct package location for the first module executed in the 
package will be a bit of work. (But not that much.) After that, it can be passed 
around.
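
To make rule (3) a little more concrete, here is a minimal sketch of the
filtering step, assuming the current package's root location is already known
and is handed in by the caller. The names is_inside and
search_path_for_other_packages are invented for this example and are not part
of any existing import machinery.

```python
import os

def is_inside(path, root):
    """True if path is root itself or lies somewhere below it."""
    path = os.path.abspath(path or os.curdir)   # '' on sys.path means cwd
    root = os.path.abspath(root)
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)

def search_path_for_other_packages(sys_path, package_root):
    """Drop sys.path entries located inside the current package (rule 3)."""
    return [entry for entry in sys_path if not is_inside(entry, package_root)]
```

The filtered list could be computed once, when the first module of the package
is executed, and then passed along, so individual importers would not each
have to repeat the check.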

Here's something I used recently to get the full dotted name without importing. 
It could also return the base package path as well.  You probably don't need the 
cache.  These could be combined and shortened further for just finding a root 
package location.

import os

def path_type(path):
    """ Determine what kind of thing path is.

        Returns  ->  'module'|'package'|'dir'| None
    """
    # A module is a regular file with one of the known Python extensions.
    if os.path.isfile(path) and path.endswith(
            ('.py', '.pyw', '.pyc', '.pyd', '.pyo')):
        return 'module'
    if os.path.isdir(path):
        # A directory counts as a package if it holds any __init__ variant.
        for end in ['', 'w', 'c', 'o']:
            if os.path.isfile(os.path.join(path, '__init__.py' + end)):
                return 'package'
        return 'dir'
    return None

def dotted_name(path, cache={}):
    """ Get a full dotted module or package name from a path name.

        Returns  ->  fully qualified (dotted) name | None
    """
    # The mutable default argument is deliberate: it acts as a simple cache.
    if path in cache:
        return cache[path]
    if path_type(path) in ('package', 'module'):
        parent, name = os.path.split(path)
        name, _ = os.path.splitext(name)
        # Walk upward, prepending the name of each enclosing package.
        while path_type(parent) == 'package':
            parent, nextname = os.path.split(parent)
            name = '.'.join([nextname, name])
        cache[path] = name
        return name
    return None

Let's see... (untested)

def package_path(path):
    """ Get the package location of a module.
    """
    package = None
    if path_type(path) in ('package', 'module'):
        parent = os.path.dirname(path)
        # Climb while the parent is still a package; the last package
        # directory seen is the root package location.
        while path_type(parent) == 'package':
            package = parent
            parent = os.path.dirname(parent)
    return package
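
For what combining and shortening these helpers might look like, here is one
possible sketch that returns both the root package location and the dotted
name in a single upward walk. It is an illustration only: the names
INIT_FILES, is_package and root_package are made up here, the cache is left
out, and it does not check that the starting path is really a module.

```python
import os

INIT_FILES = ('__init__.py', '__init__.pyw', '__init__.pyc', '__init__.pyo')

def is_package(path):
    """True if path is a directory containing an __init__ file."""
    return os.path.isdir(path) and any(
        os.path.isfile(os.path.join(path, init)) for init in INIT_FILES)

def root_package(path):
    """Return (root package directory, dotted name) for a path.

    The root is None when the path is not inside any package.
    """
    path = os.path.abspath(path)
    parent, name = os.path.split(path)
    name, _ = os.path.splitext(name)
    root = path if is_package(path) else None
    # Climb until the parent directory stops being a package.
    while is_package(parent):
        root = parent
        parent, nextname = os.path.split(parent)
        name = nextname + '.' + name
    return root, name
```

Given a file such as this_package/sub_package/module.py, where both
directories contain an __init__.py, this returns the this_package directory
together with the name 'this_package.sub_package.module'.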

>> (4)  import module
>>       import package
>>
>> Module and package are not in a package, so don't look in any packages, even
>> this one or sys.path locations inside of packages.
>>
> 
> This is already done.  Absolute imports would cause this to do a
> shallow check on sys.path for the module or package name.

Great! 2 down.  Almost half way there.  :-)

But will it still check the current directory if you run a module directly,
since it currently doesn't know whether the module is part of a package?  Is
that correct?

>> (5) For behaviors other than these, like when you do actually want to run a
>> module belonging to a package in a different context, a mechanism such as a
>> command line switch, or a settable import attribute should be used.
>>
>>
>> MOTIVATION
>> ==========
>>
>> (A) Added reliability.
>>
>> There will be much less chance of errors (silent or otherwise) due to
>> path/import conflicts which are sometimes difficult to diagnose.
>>
> 
> Probably, but I don't know if the implementation complexity warrants
> worrying about this.  But then again how many people have actually
> needed to implement the import machinery.  =)  I could be labeled as
> jaded.

Well, I know it's not an easy thing to do.  But it's not finding the paths, or
working out whether files are modules, etc., that is hard.  From what I
understand, the hard part is making it work so it can be extended and customized.

Is that correct?

>> There may also be some added security benefits as well because it would be much
>> harder for someone to create a same named module or package and insert it by
>> putting it on the path. Or by altering sys.path to do the same. [*]
>>
>> [* - If this can happen there are probably more serious security issues, but not
>> everyone has the most secure setup, so this point is still probably a good
>> point. General reliable execution of modules is the first concern, this may be a
>> side benefit of that.]
>>
>>
>> (B) Reduce the need for special checks and editing sys.path.
>>
>> Currently some authors have to edit sys.path or do special if os.path.exists()
>> checks to ensure proper operations in some situations such as running tests.
>> These suggestions would reduce the need for such special testing and
>> modifications.
>>
> 
> This might minimize some sys.path hacks in some instances, but it also
> complicates imports overall in terms of implementation and semantics.

I'm not sure why it would make it so much more complicated.  The context in
which imports are done has to be determined in any case for package imports,
relative package imports, and plain modules.  It's just a matter of determining
which one to use from the start.  I guess I need to look into how Python's
imports work in a little more detail.

> Where is point C?

Whoops... I could make one up if you really want one.  ;-)

(It was moved elsewhere and I forgot to reletter.)

>> (D) Easier editing and testing.
>>
>> While you are editing modules in a package, you could then run the module
>> directly (as you can with old style relative imports) and still get the correct
>> package-relative behavior instead of something else (like an exception or wrong
>> output). Many editors support running the file being edited, including IDLE.
>> It can also be difficult to write scripts for the editors to determine the
>> correct context to run a module in.
>>
> 
> How is this directly solved, though?  You mentioned "running" a module
> as if it is in a package, but there is no direct explanation of how
> you would want to change the import machinery to pull this off.
> Basically you need a way to have either modules with the name __main__
> be able to get the canonical name for import purposes.  Or you need to
> leave __name__ alone and set some other global or something to flag
> that it is the __main__ module.

Leave __name__ alone, yes.  Add a __path__ attribute for all modules that is set
to the base package location. Add a __realname__ attribute only to modules whose
__name__ is set to '__main__'.

The import machinery could then use those to determine how to handle imports in 
that module.

Is that clearer?

If __path__ exists, then it's a module in a package.
If __realname__ exists, then it was run as a script, but here's the actual name
anyway.

If __name__ is '__main__' then do what scripts do when __name__ == '__main__'.
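
Spelled out as code, those three checks might look something like the sketch
below. It is only an illustration of the rules: __realname__ is the attribute
proposed in this thread and does not exist in any Python release, __path__ is
assumed to be set on every module inside a package, and the function name
import_context is made up.

```python
def import_context(module_globals):
    """Describe how imports in a module would be treated under the proposal.

    __realname__ is hypothetical, and __path__ is assumed to be set on
    every module that lives inside a package.
    """
    in_package = '__path__' in module_globals
    run_as_script = module_globals.get('__name__') == '__main__'
    # Prefer the real name when the module was started as a script.
    name = module_globals.get('__realname__', module_globals.get('__name__'))
    return {
        'in_package': in_package,
        'run_as_script': run_as_script,
        'import_name': name,
        'package_location': module_globals.get('__path__'),
    }
```

A module run directly from inside this_package would then report
run_as_script as true while still carrying 'this_package.module' as its
import name, so package-relative lookups could use the real name.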

> Regardless, I am not seeing how you are proposing to go about solving
> this problem.

Discussing it is a good start to doing that,  isn't it?   ;-)

> I understand the desire to fix this __main__ issue with absolute
> imports and I totally support it, but I just need a more concrete
> solution in front of me (assuming I am not totally blind and it is
> actually in this doc).
> 
> -Brett

I only outlined the behavioral rules that probably need to be agreed on first.
After that it's a matter of writing it.  As I said above, these behaviors are not
the hard part; making it extensible in a nice clean way is.

Cheers,
Ron