[Distutils] Improving distutils vs redesigning it (was people want CPAN)

David Cournapeau david at ar.media.kyoto-u.ac.jp
Thu Nov 12 07:34:58 CET 2009

Glyph Lefkowitz wrote:
> On Nov 12, 2009, at 12:02 AM, David Cournapeau wrote:
>> Glyph Lefkowitz wrote:
>>> There are probably a dozen other ways that you *could* work on
>>> distutils and benefit more immediately from your efforts than the
>>> next Python release.  To think otherwise is simply a failure of
>>> imagination.  Now, if you think it's *too hard* to do that, it might
>>> be interesting to hear why you think that, and what exactly the
>>> effort would be; a nebulous assertion that it's just too hard and we
>>> should throw our hands up (while I can definitely understand the
>>> impulse to make such an assertion) serves only to discourage everyone.
>> I am trying to understand what is 'nebulous' about our claims. We have
>> given plenty of hard and concrete examples of things which are
>> problematic in distutils.
> I'm sorry if I gave the impression that I was contesting that
> particular assertion.  We all agree that distutils has deep problems.
> And, I don't think that everything that has been said is overgeneral
> or unhelpful.  Before I dive into more criticism, let me just say that
> I agree 100% with Robert Kern's message where he says:
>> In order to integrate this with setuptools' develop command (...) we
>> need to create a subclass of setuptool's develop command that will
>> reinitialize build_src with the appropriate option. Then we need to
>> conditionally place the develop command into the set of command
>> classes so as not to introduce a setuptools dependency on those
>> people who don't want to use it.
>> This is nuts.
> This is completely correct.  I've done stuff like this, we've all
> probably done stuff like this.  Conditional monkeypatching and dynamic
> subclassing is all over the place in distutils extension code, and it
> is *completely* nuts.
> Still, it would have been more helpful to point out how exactly this
> problem could be solved, and to present (for example) a description of
> similar objects politely interacting and delegating responsibility to
> one another to accomplish the same task.
> I would definitely characterize these assertions from Robert as
> "nebulous", given that the prior messages in the thread (as far as I
> can tell) do not describe the kind of massive-overhaul changes which
> would fix things, only the problems that currently exist:
>> In our considered opinion, piecemeal changes probably aren't going to
>> solve the significant problems that we face.
> Why not?  The whole of computer history is the story of piecemeal
> improvements of one kind or another; despite perennial claims that,
> for example, hierarchical filesystems or bit-mapped displays
> "fundamentally" cannot support one type of data or another, here we are.

I think Robert meant piecemeal changes from an implementation POV, not
that we should ignore any history or existing design solutions.
Actually, I think that distutils is a victim of a lot of NIH: it is
totally different from any other build system I have seen.

> Or this one, also from Robert:
>> Mostly because I'm entirely uninterested in helping you make
>> incremental improvements that are going to break all the hard work
>> we've already done just to get things working as it is.
> Why do incremental improvements have to break all the hard work that
> has already been done?  Surely this is what a compatibility policy is
> about.

Here is what *I* mean by distutils compatibility, so that we are sure to
talk about the same thing:
    - existing setup.py files should run without problems, and produce
the same software when installed/produced (bdist_wininst, sdist, etc...)
under the same conditions as with distutils (actually, setuptools, since
distribute is a fork of setuptools).
    - existing usage of the distutils API should remain compatible. I
asked in a previous email what is meant by the distutils API, and Tarek
answered: anything which does not start with an underscore. But what
does that mean? For example, in numscons, I rely on the build directory
having a certain structure (implementation-defined in distutils: you
cannot retrieve it through the public API). Is this part of the API? Is
using copied commands to get some characteristics considered part of the
API? Is the order of commands, or their attributes, considered public
(they all start without an underscore, but they are not documented
anywhere)?

During a more precise discussion, I think we have more or less agreed
with Tarek that build_ext needs a significant overhaul. Although we did
not discuss other commands concretely, I think the same kind of
argument applies to almost any command. There is then the issue of
communication between commands, through the Distribution class.

Let's assume for the sake of argument that we manage to convince the
community as a whole that both the command and distribution classes need
to be redesigned. At that point, what's different from a new
distribution tool, with a totally different API (but which could reuse
parts of the distutils implementation, of course)? Certainly, unless you
keep both the current code and the new one, you will break almost every
distutils API user out there.

Python distutils (the one included in python) has broken our extensions
countless times already, even though no significant feature has been
added. Setuptools itself already breaks a lot of them out there. That's
why I am not convinced that you can improve distutils without causing
the exact same issues as a new distribution system. I say this from my
experience of writing extensive extensions around it.

> "classes and objects" have been used in many high-performance systems.
>  Personally I find "classes and objects" fairly flexible as well.  In
> fact, if *I* were to make a nebulous claim about distutils' design
> structure, it would be that the parsimony with creating whole new
> classes and instantiating multiple objects is the problem; there
> should be more classes, more objects, less inheritance and fewer methods.

I agree this claim was vague. I was only talking about using classes and
objects for building, not objects in a general sense. The problem with
compilation is that you need almost total flexibility: you simply cannot
foresee how tools will be used, or how they will interact with the
launched commands. Neither make, nor waf, nor scons tries to do that.
Instead, they provide the fundamental abstraction source -> "action" ->
target, where action can really be anything, and should be decoupled
from any tool definition (what's common between a C compiler and a code
generator, for example?).
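
As a rough illustration of the source -> "action" -> target idea, here
is a minimal sketch in Python. It is not how make, waf or scons are
implemented; the names Rule and generate_header are hypothetical and
only meant to show that the action is an arbitrary callable which knows
nothing about compilers or any other tool class.

```python
import os


class Rule:
    """One build step: sources -> action -> target (hypothetical sketch)."""

    def __init__(self, target, sources, action):
        self.target = target
        self.sources = list(sources)
        # The action is any callable taking (target, sources); the rule
        # itself has no notion of "compiler" or "tool".
        self.action = action

    def is_stale(self):
        # Rebuild if the target is missing or older than any source.
        if not os.path.exists(self.target):
            return True
        target_mtime = os.path.getmtime(self.target)
        return any(os.path.getmtime(s) > target_mtime for s in self.sources)

    def run(self):
        # Returns True if the action was executed, False if up to date.
        if self.is_stale():
            self.action(self.target, self.sources)
            return True
        return False


def generate_header(target, sources):
    # A code-generator action: writes a C header from a text file.
    with open(sources[0]) as src:
        version = src.read().strip()
    with open(target, "w") as out:
        out.write('#define VERSION "%s"\n' % version)
```

A rule such as Rule("version.h", ["version.txt"], generate_header) and a
rule whose action spawns a C compiler would go through exactly the same
code path, which is the point of keeping the action decoupled.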

Concerning classes, I don't think you can have a hierarchy for
compilers: they behave so differently depending on the platforms that
they share very little. This is true for any tool, actually. What is
common between (most) C compilers is that they produce object files from
sources, and link object files together.

About the performance claim: scons has speed issues (the issue keeps
coming up on the user and dev ML), in part because it uses too many
objects. Waf manages to be much faster (it is basically as fast as make
for reasonably sized projects, and it does automatic dependency
handling). The waf developers manage this thanks to aggressive
optimizations: they care about the number of attributes of the
fundamental classes, and they compile string commands into functions to
avoid useless substitutions (another scons speed issue).
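
To make the "compile string commands into functions" idea concrete, here
is a small sketch. It is not waf's actual code: compile_command and the
${NAME} placeholder syntax are assumptions chosen for the example. The
template is parsed once, and the returned function only joins
pre-split pieces for each task instead of re-scanning the string.

```python
import re

# Matches placeholders of the (assumed) form ${NAME}.
_VAR = re.compile(r"\$\{(\w+)\}")


def compile_command(template):
    """Parse the template once and return a function of an env mapping."""
    # With one capturing group, split() alternates literal text and
    # variable names: [literal, name, literal, name, ..., literal].
    parts = _VAR.split(template)
    literals = parts[0::2]
    names = parts[1::2]

    def render(env):
        out = [literals[0]]
        for name, literal in zip(names, literals[1:]):
            out.append(str(env[name]))
            out.append(literal)
        return "".join(out)

    return render


# Parsed once, then reused for every object file to build.
compile_c = compile_command("cc ${CFLAGS} -c ${SRC} -o ${TGT}")
```

Calling compile_c({"CFLAGS": "-O2", "SRC": "foo.c", "TGT": "foo.o"})
then yields the command line for one file without parsing the template
again.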

> However, I feel compelled to repeat that it is a matter of historical
> fact and, I suspect, a corollary of the Church-Turing thesis that
> pretty much any software system *can* be changed into just about any
> other software system through a series of evolutionary steps where the
> system does something useful at each step; it is a question of whether
> you believe this approach requires an unreasonable amount of effort,
> and how big the steps need to be.

Yes, obviously you *can* go from distutils to a perfect system in
piecemeal changes. The question is how long it takes and how much effort
it requires. So when I say "I don't think you can significantly improve
distutils", it is to be understood as "it will take less time to go to a
better system without bothering with keeping the same architecture".

>  If you believe the effort required would be unreasonable, then let's
> see if we can find a radical, incompatible change to distutils that we
> all agree would be an improvement, and see if we also agree that the
> effort would be impractical.

A few examples of things which are a problem for us (us being
numpy/scipy/etc... developers/users here):
    - automatic dependency handling. If I change one header, only the
files which include this header should be rebuilt; if a Fortran compiler
flag is changed, only Fortran source files should be recompiled, etc...
    - package description so that simple packages could be built
automatically without running untrusted code
    - reliable parallel builds
    - integration with 3rd party tools

To be honest, the only one I really care about is the last one: if we
could find a solution so that I can build the C code from, say, make or
scons, and build/install in a distutils-compatible way through a simple
and stable API, it would solve all the other parts through 3rd party
code. Right now, numscons depends too much on distutils implementation
details, and cannot produce all the non-build goodies from distutils
(sdist, bdist_wininst, etc...).
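
Coming back to the first item in the list above (automatic dependency
handling), here is a minimal sketch of the header case. It is an
illustration only, not what scons or waf do internally: the function
names are hypothetical, files are given as an in-memory mapping of name
to content, and only quoted #include lines are scanned (no
preprocessor conditionals, no include paths).

```python
import re

# Matches lines such as:  #include "foo.h"
_INCLUDE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def direct_includes(text):
    """Headers included directly by a piece of C source text."""
    return set(_INCLUDE.findall(text))


def all_includes(name, files, seen=None):
    """Headers reachable from `name`, following includes transitively."""
    seen = set() if seen is None else seen
    for header in direct_includes(files.get(name, "")):
        if header not in seen:
            seen.add(header)
            all_includes(header, files, seen)
    return seen


def sources_to_rebuild(files, changed_header):
    """C sources which depend, directly or not, on the changed header."""
    return sorted(
        name
        for name in files
        if name.endswith(".c") and changed_header in all_includes(name, files)
    )
```

With such a dependency map, changing one header triggers a rebuild of
only the sources that reach it through their includes, instead of
everything.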
