Module/package hierarchy and its separation from file structure

Peter Schuller peter.schuller at infidyne.com
Wed Jan 23 04:49:56 EST 2008


Hello,

In writing some non-trivial amount of Python code I keep running into
an organizational issue. I will try to state the problem fairly
generally, and follow up with a (contrived) example.

The root cause of my difficulties is that by default, the relationship
between a module hierarchy and the structure of files on disk is too
strong for my taste. I want to separate the two as much as possible,
but I do not want to resort to non-conventional "hacks" to do it. I am
posting this in an attempt to present what I perceive to be a
practical problem, and to get suggestions for solutions, or opinions
on the most practical policy for how to deal with it.

Like I said, I would like a weaker relationship between file system
structure and module hierarchy. In particular there are two things I
would like:

  * Least importantly, I don't like jamming code into __init__.py,
    as a personal preference.
  * Most importantly, I do not like to jam large amounts of code
    into a single source file, just for the purpose of keeping
    the public interface in the same package.

A contrived but hopefully illustrative example:

We have an organization "Org", which has a library, and as part of
that library is code that relates to doing something with animals. As
a result, the interesting top-level package for this example is:

   org.lib.animal

Suppose now that I want an initial implementation of the most
important animal. I want to create the class (but see [1]):

   org.lib.animal.Monkey

The public interface consists of that class only (and possibly a small
handful of functions). The implementation is quite significant however
- it is 500 lines of code long.

At this point, we have had to jam those 500 lines of code into
__init__.py. Let's ignore my personal preference of not liking to put
code in __init__.py; the fact remains that we have 500 lines of code
in a single source file.

Now, we want to continue working on this library, adding ten
additional animals.

At this point, we have these choices (it seems to me):

  (1) Simply add these to __init__.py, resulting in
      __init__.py being 5000 lines long[2].

  (2) Put each animal into its own file, resulting in
      org.lib.animal.Monkey now becoming
      org.lib.animal.monkey.Monkey, and animal X becoming
      org.lib.animal.x.X.

The problem I have is that both of these solutions are, in my opinion,
very ugly:

* (1) is ugly from a source code management perspective, because jamming
  5000 lines of code for ten different animals into a single file
  is bad for obvious reasons.

* (2) is ugly because we introduce org.lib.animal.x.X for
  animal X, which is:
    (a) redundant in terms of naming
    (b) redundant in function, since we have a single module for
        each animal containing nothing but a single class of
        the same name

Clearly, (1) is bad due to file/source structure reasons, and (2) is
bad for module organizational reasons. So we are back to my original
wish - I want to separate the two, so that I can solve (1)
independently of (2).

Now, I realize that __init__.py can contain arbitrary code, and that
one can override __import__. However, I do not want to resort to
"hacks" just to solve this problem; I would prefer some established
convention in the community, or at least something that is elegant.

What are people's thoughts on this problem?

Let me just shoot down one possible suggestion right away, to show you
what I am trying to accomplish:

I do *not* want to simply break out X into org.lib.animal.x, and have
org.lib.animal import org.lib.animal.x.X as X. While this naively
solves the problem of being able to refer to X as org.lib.animal.X,
the solution is anything but consistent because the *identity* of X is
still org.lib.animal.x.X. Examples of ways this breaks things:

  * X().__class__.__module__ gives unexpected results.
  * Automatically generated documentation will document using the "real"
    package name.
  * Moving the *actual* classes around by way of this aliasing would
    break things like pickled data structures as a result of the change
    of actual identity, unless one *always* pre-emptively maintains
    this shadow hierarchy (which is a problem in and of itself).
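
To make the identity problem concrete, here is a minimal sketch that
builds a throwaway copy of the example layout in a temporary directory
and inspects the aliased class. It assumes Python 3 and that no other
top-level package named "org" is installed; the file name monkey.py and
the temporary directory are illustrative choices only.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Build a throwaway copy of the example layout on disk.
root = Path(tempfile.mkdtemp())
pkg = root / "org" / "lib" / "animal"
pkg.mkdir(parents=True)
(root / "org" / "__init__.py").write_text("")
(root / "org" / "lib" / "__init__.py").write_text("")

# The implementation lives in its own file (hypothetical name) ...
(pkg / "monkey.py").write_text("class Monkey:\n    pass\n")
# ... and the package re-exports it under the shorter name.
(pkg / "__init__.py").write_text("from org.lib.animal.monkey import Monkey\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
animal = importlib.import_module("org.lib.animal")

# The alias works for callers: org.lib.animal.Monkey resolves.
m = animal.Monkey()

# But the class still reports the module it was defined in.
defining_module = type(m).__module__
print(defining_module)

# pickle records that same defining module, so moving monkey.py later
# would break data pickled today.
payload = pickle.dumps(m)
print(b"org.lib.animal.monkey" in payload)
```

The class name itself is still "Monkey"; it is the module part of the
identity that leaks the file layout, and it is also what pickle stores.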

Thus, it's not clean. It breaks the module abstraction and as a result
has unintended consequences. I am looking for some kind of clean
solution. What do people do about this in practice?

[1] Optionally, we might introduce an "animals" package such that it
would become org.lib.animal.animals.Monkey, if we thought we were
going to have a lot of public API outside of the animals themselves.
This does not affect this discussion however, as the exact same thing
would apply to org.lib.animal.animals as applies to org.lib.animal in
the above example.

[2] Ignoring for now that it may not be realistic that every animal
implementation would be that long; in many cases a lot of code would
be in common. But feel free to substitute something else (a Zoo
say).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter.schuller at infidyne.com>'
Key retrieval: Send an E-Mail to getpgpkey at scode.org
E-Mail: peter.schuller at infidyne.com Web: http://www.scode.org



