Python packages - problems, pitfalls.

Prabhu Ramachandran prabhu at aero.iitm.ernet.in
Mon Nov 5 03:13:24 EST 2001


hi,

>>>>> "RR" == Robert Roy <rjroy at takingcontrol.com> writes:

Thanks for the response.

    >> package.  Inside this directory I have a sub-package, called
    >> sub.  So it's something like:
    >> 
    >> pkg/
    >>   __init__.py
    >>   a.py
    >>   sub/
    >>     __init__.py
    >>     b.py
    >> 
    >> Now, from b I'd expect to be able to import 'a' straight away.
    >> 

    RR> Why would you expect that? a.py has no relationship to
    RR> b.py. It is not a sibling module, it is not a parent.

I beg to differ.  It depends on how you look at it.  a.py is certainly
in the same package: a.py is part of pkg, and b.py is also part of pkg.
At least, it should be possible to access a from b by using something
like os.pardir, without having to know which package a is part of.
Apparently ni[1] used to allow this, but it was removed later.
Allowing for this makes sense, IMHO.  The syntax was a little
dirty, I'll admit, but the idea is sound.
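
A minimal sketch of the situation, assuming a current Python 3 and a
throwaway directory (the `write` helper and the extra module c.py are
made up for illustration).  It builds the pkg layout above, shows that
the bare `import a` in b.py fails, and shows that an explicit relative
import, a form the language gained some years after this thread,
reaches a.py without naming pkg:

```python
import importlib
import os
import sys
import tempfile


def write(path, text=""):
    """Hypothetical helper: create a file and any missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
write(os.path.join(root, "pkg", "__init__.py"))
write(os.path.join(root, "pkg", "a.py"), "VALUE = 'from a'\n")
write(os.path.join(root, "pkg", "sub", "__init__.py"))
# b.py uses the bare import discussed in this thread.
write(os.path.join(root, "pkg", "sub", "b.py"), "import a\n")
# c.py (not part of the original example) uses an explicit relative import.
write(os.path.join(root, "pkg", "sub", "c.py"), "from .. import a\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

try:
    import pkg.sub.b
    bare_import_worked = True
except ImportError:
    bare_import_worked = False

import pkg.sub.c
relative_import_value = pkg.sub.c.a.VALUE

print(bare_import_worked)
print(relative_import_value)
```

The `from .. import a` line is close in spirit to the os.pardir idea:
it names the parent package by position rather than by name.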

Let me put it this way.  How are names looked up in normal Python
code?  First you check in the local namespace, then you keep going
higher and higher up the tree until you are out of all local
namespaces, then you try with the global namespace.  This makes sense
and is consistent.  So, why isn't the same thing implemented for
packages?
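
For reference, a tiny sketch of that lookup order, assuming a Python
with nested scopes enabled (the function and variable names are made
up):

```python
name = "module level"


def outer():
    name = "enclosing function"

    def inner():
        # No local 'name' here, so the lookup moves outward to outer().
        return name

    return inner()


def no_local():
    # No local or enclosing 'name', so the module-level global is used.
    return name


print(outer())
print(no_local())
```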

    >> Say in b.py I do
    >> 
    >> import a
    >> 
    >> This will not work!  I know why it happens but shouldn't Python
    >> be smart enough to avoid such problems?  Why I consider this a
    >> major problem is that the importing of a sub-package depends on
    >> where its parent package is!!

    RR> Its a good thing it doesn't. It would be a maintenance
    RR> nightmare.

Please elaborate.  At least provide one example.  It's easy to say
"it's bad" - please explain how it is bad.

    RR> See Tim Peter's rules of python programming: Explicit is
    RR> better than implicit

Hmm, this is violated when you speak of 'sibling' modules.  As per
the above rule, why should the following be valid?

par_pkg/
  __init__.py
  sub/
    __init__.py
    a.py
    b.py

b.py:
    import a

This is accepted, but it is _definitely_ not explicit.  Technically, by
wanting explicit names you are saying that b.py should say:

import par_pkg.sub.a

and not

import a

(And in the fully qualified path lies madness, since if par_pkg is
re-nested you have to refactor everything.)

IMHO this completely violates the above rule.  If you suggest using
explicit names, do it all the time.  Why is it correct for modules
in the same directory to access each other without using explicit
names??  If that is allowed, then for consistency it certainly makes
sense to check higher up the package structure as well.

    RR> Frankly if I had a situation as you describe above, I would
    RR> take a good look at why I had structured my modules that
    RR> way. Most likely I would find that the module structure had
    RR> outlived its usefulness and needed to be redefined/refactored.

Well, in my reply to Chris Gonnerman[2] I gave another example of a
problem and mentioned a real world project that had problems with this
issue.  Let me add -- the problem with SciPy[3] was that several
packages that were developed standalone were being put together into
another bigger package.  Each package had a perfectly valid package
structure.  The problem arose simply because the packages were put
inside another larger package.

So this problem is not merely an issue with the correctness of a
particular package's structure but with the handling of packages and
names as a whole.
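
The re-nesting problem can be reproduced in a few lines.  This is only
a sketch under assumptions: the layout is the pkg example from earlier
in this thread rather than anything taken from SciPy, the `write`
helper is hypothetical, and it targets a current Python 3.

```python
import importlib
import os
import shutil
import sys
import tempfile


def write(path, text=""):
    """Hypothetical helper: create a file and any missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
write(os.path.join(root, "pkg", "__init__.py"))
write(os.path.join(root, "pkg", "a.py"), "VALUE = 42\n")
write(os.path.join(root, "pkg", "sub", "__init__.py"))
# b.py refers to its parent package by its full, explicit name.
write(os.path.join(root, "pkg", "sub", "b.py"),
      "import pkg.a\nVALUE = pkg.a.VALUE\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

# Standalone: pkg is a top-level package and everything resolves.
import pkg.sub.b
standalone_value = pkg.sub.b.VALUE

# Forget the loaded modules, then move pkg unchanged into BigPkg.
for name in [m for m in sys.modules if m == "pkg" or m.startswith("pkg.")]:
    del sys.modules[name]
write(os.path.join(root, "BigPkg", "__init__.py"))
shutil.move(os.path.join(root, "pkg"), os.path.join(root, "BigPkg", "pkg"))
importlib.invalidate_caches()

# Nested: b.py still says "import pkg.a", which no longer exists at top level.
try:
    import BigPkg.pkg.sub.b
    nested_import_worked = True
except ImportError:
    nested_import_worked = False

print(standalone_value)
print(nested_import_worked)
```

Nothing inside pkg changed between the two imports; only its position
in the tree did.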

    >> So if pkg is no longer the root of the package and say its put
    >> inside another BigPkg then to make pkg's sub packages work
    >> you'd have to edit *all* the sub packages and change every
    >> reference to pkg.a to BigPkg.pkg.a.  This is a huge pain.

    RR> Use a ".pth" file to tell it where the package resides. Say
    RR> you re-nest pkg into BigPkg, place a file called pkg.pth in
    RR> the root directory of your distribution and put the location
    RR> of pkg in it. See the documentation on module "site".

Of course, and this will truly be a maintenance nightmare.  Doing this
means that you have exposed all the names inside BigPkg to the outside
world as top-level names.  Hiding those names might be the very reason
you put pkg and friends inside BigPkg in the first place!
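
A rough sketch of that exposure, assuming the .pth entry points at
Lib/BigPkg (the directory that has to be on the path for a plain
`import pkg` to work; RR's untested example lists Lib/BigPkg/pkg) and
assuming a hypothetical sibling package called private_helper.
`site.addsitedir` is used here to process the .pth file from a
temporary directory instead of a real installation:

```python
import importlib
import os
import site
import sys
import tempfile


def write(path, text=""):
    """Hypothetical helper: create a file and any missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
lib = os.path.join(root, "Lib")
write(os.path.join(lib, "BigPkg", "__init__.py"))
write(os.path.join(lib, "BigPkg", "pkg", "__init__.py"))
# A made-up sibling that was meant to be reachable only as BigPkg.private_helper.
write(os.path.join(lib, "BigPkg", "private_helper", "__init__.py"))
# The .pth file holds one path per line, relative to the directory it sits in.
write(os.path.join(root, "pkg.pth"), os.path.join("Lib", "BigPkg") + "\n")

# Process the .pth file the way the site module does for site directories.
site.addsitedir(root)
importlib.invalidate_caches()

import pkg             # the old name works again ...
import private_helper  # ... but so does every other name inside BigPkg

exposed_names = [pkg.__name__, private_helper.__name__]
print(exposed_names)
```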

In fact this is something like simply doing the following inside the
__init__.py of my pkg example:

import os, sys
sys.path.append(os.path.dirname(__file__))

I did not even mention this as a solution since this is definitely
incorrect and bad practice.
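
To show why, here is a small sketch (temporary directory, hypothetical
`write` helper, current Python 3 assumed).  With the path hack in
pkg/__init__.py the bare `import a` in b.py succeeds, but a.py is then
loaded a second time as a separate top-level module:

```python
import importlib
import os
import sys
import tempfile


def write(path, text=""):
    """Hypothetical helper: create a file and any missing parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
# pkg/__init__.py applies the path hack described above.
write(os.path.join(root, "pkg", "__init__.py"),
      "import os, sys\nsys.path.append(os.path.dirname(__file__))\n")
write(os.path.join(root, "pkg", "a.py"), "STATE = []\n")
write(os.path.join(root, "pkg", "sub", "__init__.py"))
write(os.path.join(root, "pkg", "sub", "b.py"), "import a\n")

sys.path.insert(0, root)
importlib.invalidate_caches()

import pkg.a      # a.py loaded under the name "pkg.a"
import pkg.sub.b  # b.py's "import a" loads a.py again under the name "a"

same_module = pkg.sub.b.a is pkg.a
loaded_names = sorted(n for n in sys.modules if n in ("a", "pkg.a"))

print(same_module)
print(loaded_names)
```

Any state kept in a.py now exists twice, once in `pkg.a` and once in
`a`.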

    RR> for example (not tested!):

    RR> Python root/
    RR>   pkg.pth    (contents would be Lib/BigPkg/pkg)
    RR>   Lib/
    RR>     BigPkg/
    RR>       __init__.py
    RR>       pkg/
    RR>         __init__.py

    RR> This will allow old pkg references to continue working.

Sure, but this does not solve the problem I mention in [2], and it has
the problems that I mentioned above.

    >>  Are there better ways to get around this packaging problem?
    >> Is this a known problem that folks are working on??  Why is it
    >> that Python does not deal with this issue more sensibly?

    RR> Python does, it is not terribly well documented though...

Sorry, I don't agree.  Python does not do it the way I'd expect.
Whether my expectation is reasonable is a separate question, but I
see no reason why it is not.

Thanks again for the response,

prabhu

References:
 1. http://www.python.org/doc/essays/packages.html
 2. http://mail.python.org/pipermail/python-list/2001-November/070766.html
 3. http://www.scipy.org

-- 
Prabhu Ramachandran			  MayaVi Data Visualizer
http://www.aero.iitm.ernet.in/~prabhu     http://mayavi.sf.net

Where there's no emotion, there's no motive for violence.
		-- Spock, "Dagger of the Mind", stardate 2715.1



