[Distutils] Pondering multi-package packages

M.-A. Lemburg mal@lemburg.com
Wed, 31 May 2000 10:12:17 +0200


Greg Ward wrote:
> 
> On 27 May 2000, M.-A. Lemburg said:
> > I was referring to installing a (pre)built binary -- just before
> > copying the compiled files to their final install location and
> > right after that step is done.
> 
> Yes: this is the one place where the Distutils' extension mechanism
> *won't* work, because the Distutils aren't present (or at least, not in
> control) when installing a pre-built binary. 

Why not ? The RPMs could use the existing Python installation
which comes with a version of distutils (at least for 1.6)
or use a copy which gets installed together with the
package. The post-install script could then pass control
to distutils and let it apply its magic.

> Here, some other mechanism
> will be needed: pass a function, or a module, or a chunk of code to be
> eval'd, or something.  Still not sure what's best; we have to balance
> the needs of the developer writing the setup script with the facilities
> available at the time the hook is run, and how the hook will be run
> ("python hookscript.py"?).

python .../distutils/setup.py --post-install ?!
 
> > "install-from-source" would execute these hooks too: right after
> > having built the binaries.
> 
> Yes, *except* in the case where the installation is being done solely
> for the purpose of creating a built distribution.  I'm pretty sure this
> can be handled by adding a "fake install" flag to the "install" command:
> if true, don't run the {pre,post}-install hooks.

Ok.
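
For illustration, a minimal sketch of how such a flag might gate the
hooks. Nothing here is Distutils code: the toy 'install' class, the
'fake' argument and the hook callables are all hypothetical names.

```python
class install:
    """Toy install command; the 'fake' flag and hook arguments are hypothetical."""

    def __init__(self, fake=False, pre_install=None, post_install=None):
        self.fake = fake
        self.pre_install = pre_install
        self.post_install = post_install
        self.steps = []                    # record of what actually ran

    def copy_files(self):
        self.steps.append("copy")

    def run(self):
        # Hooks are skipped when the install only feeds a built distribution.
        run_hooks = not self.fake
        if run_hooks and self.pre_install is not None:
            self.pre_install(self)
        self.copy_files()
        if run_hooks and self.post_install is not None:
            self.post_install(self)


def pre(cmd):
    cmd.steps.append("pre-install hook")

def post(cmd):
    cmd.steps.append("post-install hook")

real = install(pre_install=pre, post_install=post)
real.run()
staged = install(fake=True, pre_install=pre, post_install=post)
staged.run()
print(real.steps)      # hooks wrapped around the copy step
print(staged.steps)    # copy step only
```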
 
> > Wouldn't a method interface be more reliable and provide
> > better means of extension using subclassing ?
> >
> > I usually wrap these attributes in .get_foobar(), .set_foobar()
> > methods -- this also makes it clear which attributes are
> > read-only, read-write or "better don't touch" :-)
> 
> "Yes, but..."
> 
> I have spent the weekend thinking hard about this problem, and I think I
> can explain the situation a little better now.  Distutils commands are
> rather odd beasts, and the usual rules and conventions of OO programming
> don't work very well with them.  Not only are they singletons (enforced
> by the Distribution method 'get_command_obj()'), but they have a
> prescribed life-cycle which is also enforced by the Distribution class.
> Until today, this life-cycle was strictly linear:
> 
>         non-existent
>         ---> preinitialized ---> initialized
>         ---> finalized
>         ---> running
>         ---> run
> 
> "Preinitialized" and "initialized" are on the same line because, to
> outsiders, they are indistinguishable: the transition happens entirely
> inside the Command constructor.  It works like this:
> 
>   * before we create any command objects, we find and parse all config
>     files, and parse the command line; the results are stored in a
>     dictionary 'command_options' belonging to the Distribution instance
>   * somebody somewhere calls Distribution.get_command_obj("foo"), which
>     notices that it hasn't yet instantiated the "foo" command (typically
>     implemented by the class 'foo' in the module distutils.command.foo)
>   * 'get_command_obj()' instantiates a 'foo' object; command classes do
>     not define constructors, so we go straight into Command.__init__
>   * Command.__init__ calls self.initialize_options(), which must
>     be provided by each individual command class
>   * 'initialize_options()' is typically a series of
>       self.this = None
>       self.that = None
>     assignments: ie. it "declares" the available "options" for this
>     command.  (The 'user_options' class attribute also "declares"
>     the command's options.  The two are redundant; every "foo-bar"
>     option in 'user_options' must be matched by a "self.foo_bar = None"
>     in 'initialize_options()', or it will all end in tears.)
>   * some time later (usually immediately), the command's
>     'finalize_options()' method is called.  The job of
>     'finalize_options()' is to make up the command's mind about
>     everything that will happen when the command runs.  Typical code
>     in 'finalize_options()' is:
>       if self.foo is None:
>          self.foo = default value
>       if self.bar is None:
>          self.bar = f(self.foo)
> 
>     Thus, we respect the user's value for 'foo', and have a sensible
>     default if the user didn't provide one.  And we respect the user's
>     value for 'bar', and have a sensible -- possibly complicated --
>     default to fall back on.
> 
>     The idea is to reduce the responsibilities of the 'run()' method,
>     and to ensure that "full disclosure" about the command's intentions
>     can be made before it is ever run.
> 
> To play along with this complicated dance, Distutils command classes
> have to provide 1) the 'user_options' class attribute, 2) the
> 'initialize_options()' method, and 3) the 'finalize_options()' method.
> (They also have to provide a 'run()' method, of course, but that has
> nothing to do with setting/getting option values.)
> 
> The payoff is that new command classes get all the Distutils user
> interface -- command-line parsing and config files, for now -- for free.
> The example "configure" command that I showed in a previous post, simply
> by virtue of having "foo-inc" and "foo-lib" in 'user_options' (and
> corresponding "self.xxx = None" statements in 'initialize_options()'),
> will automatically use the Distutils' config file and command-line
> parsing mechanism to set values for those options.  Only if the user
> doesn't supply the information do we have to poke around the target
> system to figure out where "foo" is installed.

Nice :-)
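
To make the life-cycle described above concrete, here is a minimal,
self-contained sketch. It does not import the Distutils: the 'Command'
base class, the 'configure' command and the default paths are stand-ins
invented for illustration. Only the three-part contract ('user_options',
'initialize_options()', 'finalize_options()') comes from the description
above.

```python
class Command:
    """Toy stand-in for the Distutils Command base class (an assumption)."""

    user_options = []          # list of (long name, short name, help text)

    def __init__(self, supplied=None):
        self.finalized = False
        self.initialize_options()          # "declare" every option as None
        # Apply values that came from config files or the command line.
        for name, value in (supplied or {}).items():
            setattr(self, name.replace("-", "_"), value)

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True


class configure(Command):
    user_options = [
        ("foo-inc=", None, "directory containing foo's header files"),
        ("foo-lib=", None, "directory containing foo's libraries"),
    ]

    def initialize_options(self):
        self.foo_inc = None
        self.foo_lib = None

    def finalize_options(self):
        # Respect user-supplied values; fill in defaults only for None.
        if self.foo_inc is None:
            self.foo_inc = "/usr/local/include"        # hypothetical default
        if self.foo_lib is None:
            # Derived default: computed from another option.
            self.foo_lib = self.foo_inc.replace("include", "lib")


cmd = configure({"foo-inc": "/opt/foo/include"})
cmd.ensure_finalized()
print(cmd.foo_inc, cmd.foo_lib)
```

The user's value for 'foo_inc' survives finalization, and 'foo_lib' is
derived from it because the user left it unset.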
 
> Anyways, the point of this long-winded discussion is this: certain
> attributes of command objects are public and fair game for anyone to set
> or modify.  However, there are well-defined points in the object's
> life-cycle *before* which it is meaningless to *get* option values, and
> *after* which it is pointless to *set* option values.  In particular,
> there's no point in getting an option value *before* finalization,
> because -- duh -- the options aren't finalized yet.  More subtly,
> attempting to set some option *after* finalization time might have no
> effect at all (if eg. that option is only used to derive other options
> from, like the 'build_base' option in the "build" command); or it might
> have complicated, undesirable effects.  I can see this happening in
> particular with the "install" command, which (necessarily) has a
> frighteningly complex finalization routine.

Hmm, I still don't see why you can't add attribute access methods
which check and possibly control the aforementioned problems.
A few .set_this() and .get_that() methods would make the interface
more transparent, add documentation (by virtue of __doc__ strings ;-)
and could add check assertions.

> If we go by the simple, linear state-transition diagram above, it turns
> out that setting option values for a particular command object is a
> dicey proposition: you simply don't know what state the command object
> is in, so you don't know what effect setting values on that command will
> have.  If you try to force them to have the right effect, by calling
> 'finalize_options()', it won't work: the way that method is typically
> written ("if self.foo is None: self.foo = default value", for as many
> values of "foo" as are needed), calling it a second time just won't
> work.

Why not let the .set_this() method take care of getting the
state right ? (or raise an exception if that's impossible)
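
One way a setter could "get the state right" is sketched below. The toy
'build' class, its '_user' dictionary and the 'set_build_base()' /
'get_build_lib()' names are assumptions made for illustration, loosely
modelled on the 'build_base' example quoted above. This is not how the
Distutils implement it.

```python
class build:
    """Toy command; not the real distutils 'build' class."""

    def __init__(self):
        self._user = {}            # values set explicitly by a caller
        self._reset()

    def _reset(self):
        self.build_base = None
        self.build_lib = None
        self.finalized = False

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            self.build_lib = self.build_base + "/lib"
        self.finalized = True

    def set_build_base(self, value):
        """Set 'build_base' and make sure derived options follow it."""
        if not isinstance(value, str):
            raise TypeError("build_base must be a string")
        self._user["build_base"] = value
        was_finalized = self.finalized
        # Start over, then re-apply everything callers set explicitly.
        self._reset()
        for name, val in self._user.items():
            setattr(self, name, val)
        if was_finalized:
            self.finalize_options()

    def get_build_lib(self):
        """Return 'build_lib'; only meaningful after finalization."""
        if not self.finalized:
            raise RuntimeError("build_lib requested before finalization")
        return self.build_lib


cmd = build()
cmd.finalize_options()          # build_lib derived from the default base
cmd.set_build_base("out")       # setter re-derives the dependent option
print(cmd.get_build_lib())
```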
 
> So today, I added a couple of new transitions to that state-transition
> diagram.  Now, you can go from any state to the "initialized" state
> using the 'reinitialize_command()' method provided by Distribution.  So
> it's now safe to do something like this, eg. in a "configure" command:
> 
>     build = self.reinitialize_command("build")
>     build.include_dirs.append(foo_inc)
>     build.library_dirs.append(foo_lib)
>     build.ensure_finalized()
> 
> ...and you know that any user-specified options to the "build" command
> will be preserved, and that all dependent-but-unspecified options will
> be recomputed.  (You don't need to call 'ensure_finalized()' here unless
> you will subsequently be getting some option values from the "build"
> object.)
> 
> Thus, it should now be possible to write a "configure" command that
> respects the bureaucracy of the Distutils *and* forces the "build"
> command to do The Right Thing.  This is a small change to the code, but
> a major change to the philosophy of option-passing in the Distutils,
> which until now was (theoretically) "pull only": it was not considered
> proper or safe to assign another command's option attributes; now it is,
> as long as you play by the above rules.  Cool!

Indeed :-)
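
The effect of the new transition can be sketched with a toy model. The
classes below only mimic the names used above ('get_command_obj()',
'reinitialize_command()', 'ensure_finalized()', 'command_options'); the
bodies, the '_set_user_options()' helper and the option names are
assumptions for illustration, not the actual Distutils code.

```python
class Command:
    """Toy base class; the real Distutils Command does much more."""

    def __init__(self, dist):
        self.distribution = dist
        self.finalized = False
        self.initialize_options()

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True


class build(Command):
    def initialize_options(self):
        self.build_base = None
        self.build_lib = None
        self.build_temp = None

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            self.build_lib = self.build_base + "/lib"
        if self.build_temp is None:
            self.build_temp = self.build_base + "/temp"


class Distribution:
    def __init__(self, command_options):
        # e.g. {"build": {"build_temp": "/tmp/t"}}, as parsed from config
        # files and the command line before any command object exists.
        self.command_options = command_options
        self.command_classes = {"build": build}
        self.command_obj = {}

    def _set_user_options(self, name, cmd):
        for option, value in self.command_options.get(name, {}).items():
            setattr(cmd, option, value)

    def get_command_obj(self, name):
        # Commands are singletons: create on first request, then reuse.
        if name not in self.command_obj:
            cmd = self.command_classes[name](self)
            self._set_user_options(name, cmd)
            self.command_obj[name] = cmd
        return self.command_obj[name]

    def reinitialize_command(self, name):
        # Return to "initialized" from any state, keeping user-supplied values.
        cmd = self.get_command_obj(name)
        cmd.initialize_options()
        cmd.finalized = False
        self._set_user_options(name, cmd)
        return cmd


dist = Distribution({"build": {"build_temp": "/tmp/t"}})
cmd = dist.get_command_obj("build")
cmd.ensure_finalized()                     # build_lib is now "build/lib"

cmd.build_base = "stage"                   # too late: build_lib already derived
cmd.finalize_options()
stale = cmd.build_lib

cmd = dist.reinitialize_command("build")
cmd.build_base = "stage"
cmd.ensure_finalized()
print(stale, cmd.build_lib, cmd.build_temp)
```

Calling 'finalize_options()' a second time leaves the derived value
untouched, while going through 'reinitialize_command()' recomputes it
and keeps the user's 'build_temp'.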
 
> BTW, I'm not opposed to the idea of 'get_foo()' and 'set_foo()' methods:
> they could add some value, but only if they are provided by the Command
> class, rather than each command having to implement a long list of
> near-identical accessor and modifier methods.  Probably 'get_foo()'
> should die if the object hasn't been finalized, and 'set_foo()' should
> die if it has been finalized (or hasn't been initialized).

Right. I'd say: go for it ;-)
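
A rough sketch of what base-class accessors along those lines might
look like. The '__getattr__' approach, the '_state' attribute and the
'_get()' / '_set()' helpers are assumptions chosen for brevity; the
Distutils define none of them.

```python
class Command:
    """Toy base class; the state names follow the life-cycle quoted above."""

    user_options = []

    def __init__(self):
        self._state = "preinitialized"
        self.initialize_options()
        self._state = "initialized"

    def ensure_finalized(self):
        if self._state == "initialized":
            self.finalize_options()
            self._state = "finalized"

    def _option_names(self):
        # "foo-inc=" in user_options corresponds to the attribute "foo_inc".
        return {opt[0].rstrip("=").replace("-", "_") for opt in self.user_options}

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, so this synthesizes
        # get_<option>() and set_<option>() without per-command code.
        kind, _, option = attr.partition("_")
        if kind in ("get", "set") and option in self._option_names():
            if kind == "get":
                return lambda: self._get(option)
            return lambda value: self._set(option, value)
        raise AttributeError(attr)

    def _get(self, option):
        if self._state != "finalized":
            raise RuntimeError("%s read before finalization" % option)
        return getattr(self, option)

    def _set(self, option, value):
        if self._state != "initialized":
            raise RuntimeError("%s set in state %r" % (option, self._state))
        setattr(self, option, value)


class configure(Command):
    user_options = [("foo-inc=", None, "directory containing foo's headers")]

    def initialize_options(self):
        self.foo_inc = None

    def finalize_options(self):
        if self.foo_inc is None:
            self.foo_inc = "/usr/include"      # hypothetical default


cmd = configure()
cmd.set_foo_inc("/opt/foo/include")    # allowed: initialized, not yet finalized
cmd.ensure_finalized()
print(cmd.get_foo_inc())               # allowed: finalized
```

Individual commands only list their options in 'user_options'; the
checks live in one place.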

In my experience, it's always better to define object access
via methods rather than attributes. This is especially true
when the project evolves over time: you simply forget about
the details, side-effects, assertions you made months ago
(and possibly forgot to document) about the specific
attributes.

Performance is an argument here, but in the end you pay for
the few percent of performance gain with a much larger
percentage in support costs...
 
-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/