[Distutils] thoughts on distutils 1 & 2
hengist.podd at virgin.net
Fri May 14 16:25:22 EDT 2004
[Deep breath, everyone; it's gonna get longer before it gets shorter...;]
Stefan Seefeld wrote:
>Example: compilation of extension modules.
>Scons is aiming at providing an abstraction layer for portable compilation.
>DU2 should at least allow to just delegate compilation of extension
>modules to scons.
>(and as I said previously, I think anything that doesn't allow to
>wrap traditional build systems based on 'make' and possibly the autotools
>is 'not good enough' as a general solution).
I'm far from expert on build systems (interpreted language weenie),
but do think it makes sense for DU to hand off compilation duties to
a third-party as quickly as it can. That third-party might be a
separate Python-based system, or makefile, or anything else; however
DU shouldn't need to know this.
Thoughts on how one would separate extension compilation from the
rest of the installation procedure...
Let's say we had a standard 'src' folder to contain everything needed
to produce a module's .so file(s), and we treat that folder basically
as a self-contained entity. How would we instruct it to compile and
deliver those .so files? The party requesting the compile operation
should not need to know anything about the compilation system used.
Presumably the easiest way to decouple the two is to have a standard
'compile.py' within the 'src' folder that is executed whenever
somebody wants .so files created. Whatever code that compile.py file
then executes is its own business, and if it needs any information
from the OS/Python installation then it's up to it to request that
information itself; ideally through existing Python APIs if possible,
or through a specific DU API if not.
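To make the idea concrete, here's a sketch of what a package's own 'src/compile.py' might look like, assuming the hypothetical convention above: the script pulls whatever it needs from the Python installation itself (here via the stdlib sysconfig module) and then drives whatever build tool it likes. All the names here are illustrative, not an existing DU API.

```python
# Hypothetical src/compile.py: the package's own build entry point.
# It asks Python for the build details it needs, then hands off to
# whatever build system the developer prefers.
import sysconfig

include_dir = sysconfig.get_paths()["include"]        # where Python.h lives
ext_suffix = sysconfig.get_config_var("EXT_SUFFIX")   # platform .so suffix

# From here the script could exec make, scons, or compile directly, e.g.:
# os.system("make PYTHON_INCLUDE=%s" % include_dir)
```

The party invoking compile.py never sees any of this; it just runs the script and waits.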
Once that's done, it should be easy for developers to select their
own build system from all those available to them. The Python-based
build system that's currently incorporated into DU could, of course,
be spun off as a peer to make, etc. - giving developers one more
option to choose from without forcing it upon them.
BTW, once .so compilation is decoupled from installation, it should
be possible/practical/easy? to defer .so compilation to import time
(as is currently done for .pyc files).
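The .pyc analogy suggests a simple staleness check at import time; here's a minimal sketch, assuming the 'src' folder convention above (nothing like this exists in DU today):

```python
# Decide whether the .so needs rebuilding, the way .pyc files are
# regenerated when their .py source is newer.
import os

def needs_rebuild(so_path, src_dir):
    """True if the .so is missing or older than any file under src_dir."""
    if not os.path.exists(so_path):
        return True
    so_mtime = os.path.getmtime(so_path)
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            if os.path.getmtime(os.path.join(root, name)) > so_mtime:
                return True
    return False
```

If this returned True on import, Python (or DU) could invoke src/compile.py before loading the extension.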
>>- Every Python module should be distributed, managed and used as a
>>single folder containing ALL resources relating to that module:
>>sub-modules, extensions, documentation (bundled, generated, etc.),
>>tests, examples, etc. (Note: this can be done without affecting
>>backwards-compatibility, which is important.) Similar idea to OS
>>X's package scheme, where all resources for [e.g.] an application
>>are bundled in a single folder, but less formal (no need to hide
>>package contents from user).
>are you really talking about 'package' here when you say 'module' ?
>I don't think that mandating modules to be self contained is a good
>idea. Often modules only 'make sense' in the context of the package
>that contains them. Also, are you talking about how to distribute
>packages, or about the layout of the installed files ?
>I don't think DU2 should mandate any particular layout for the target
>installation. It may well suggest layout of the files inside the
>(not yet installed) package.
(Like I say, rough notes; please keep pointing out where I'm not
being clear.)
Basically, what I'm proposing is that module developers stop
distributing 'naked' Python modules and use package format only (even
when there's only a single .py file involved). We then take all the
other stuff that's traditionally been bundled alongside the
module/package - documentation, unit tests, examples, etc. - and put
those into the package folder too.
The term 'package' would basically become redundant; you could just
describe everything as 'modules' and 'sub-modules'.
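As a concrete (and entirely hypothetical) illustration, a distributed FooLib might look like this, with every resource living inside the one package folder:

```
FooLib/                # the package IS the module IS the distribution
    __init__.py
    submodule.py
    src/               # everything needed to build the .so files
        compile.py
        foolib.c
    docs/
    tests/
    examples/
    metadata.txt       # a dedicated metadata file (name hypothetical)
```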
It's largely a philosophical shift from treating .py and .so files as
separate from source files, documentation, unit tests, examples, etc.
to treating _all_ of them equally: each being an integral component
of the module/package as a whole.
It won't require any modifications to Python itself, since Python's
import mechanism already supports the package format. For module
developers, it's really just a logistical shift from being able to
distribute 'bare' modules to always using package format. Module
developers should be happy with this, given that it's much more
accommodating towards documentation, unit tests, examples, etc. Stuff
that they already need to put somewhere, and where better than as
part of the module/package itself? And users should benefit too, as
they'll always know where to look for documentation, etc.
DU will benefit too in that the distributions will become much
simpler to create: in most cases the only thing the developer will
have to do is zip the package folder before uploading it, something
that won't even require DU to do. (That's what I'm hoping for,
anyway. In practice there might be some reason that I'm unaware of
why certain platforms would require all the extra shuffling that DU
currently does when installing packages - creating folders, copying
files, etc. I'm not a cross-platform expert. But I'd be kinda
surprised if that were the case.)
[Sidenote: in an ideal world, a Python end-user should _never_ need
to know whether FooLib exists in bare module or package form; the
transition from operating in a file-based namespace to
class-/object-based namespace would be seamless. Python's import
statement is a bit flawed here; e.g. import foo.bar can be used when
bar is a module/package within package foo, but not when it's an
attribute in module foo.]
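The asymmetry in the sidenote can be demonstrated with real stdlib modules: a dotted import only reaches submodules, while 'from ... import' reaches modules and plain attributes alike.

```python
# 'import a.b' requires b to be a module; 'from a import b' does not.
from math import pi      # pi is an attribute of module math: fine
import json.decoder      # decoder is a submodule of package json: fine

try:
    import math.pi       # dotted import of a plain attribute: fails
except ImportError:
    dotted_attribute_ok = False
else:
    dotted_attribute_ok = True
```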
>>- Question: is there any reason why modules should not be
>>installable via simple drag-n-drop (GUI) or mv (CLI)? A standard
>>policy of "the package IS the module" (see above) would allow a
>>good chunk of both existing and proposed DU "features" to be gotten
>>rid of completely without any loss of "functionality", greatly
>>simplifying both build and install procedures.
>Again, I don't think it is DU2's role to impose anything concerning
>the target layout. This is often platform dependent anyways.
Not quite sure if we're talking on the same wavelength here. Let me try
to clarify my previous point first, then maybe you can explain yours
to me (feel free to phrase it in terms even an idiot like me can
understand; I won't be offended;).
I'm talking of how a module/package gets put in a suitable Python
directory (e.g. site-packages), which I'm assuming (unless proven
otherwise) only requires that one knows which directory to put it in
and moving the module/package to it. I'm also assuming that DU should
not need to rearrange the contents of that package folder when
installing it (except perhaps in special cases where it must install
one of several platform-specific versions of a file, say; but that'll
be the exception rather than the rule, and packages that don't
require such special handling shouldn't need to go through the same
in-depth procedures to install).
I can't immediately see anything that DU adds to this process of
duplicating a package folder from A to B, apart from filling my
Terminal window with lots of technical-looking stuff about how it's
creating new directories in site-packages and copying files over to
them. Which looks impressive, but I'm not convinced is really
necessary given a single 'mv' command can shift the package directory
and all its contents over just fine from what I can tell. And if
95-100% of modules can be installed with just a simple mv, then let's
make that the default procedure for installing modules and squeeze DU
out of that part of the process too.
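In code, "installation" under this scheme would be little more than the following sketch (the destination and replace-on-reinstall policy are my assumptions, not current DU behaviour):

```python
# Install a self-contained package folder by copying it, whole, into
# the target directory - the programmatic equivalent of a single 'mv'.
import os
import shutil

def install(package_dir, site_packages):
    dest = os.path.join(site_packages, os.path.basename(package_dir))
    if os.path.exists(dest):
        shutil.rmtree(dest)          # replace any previous version
    shutil.copytree(package_dir, dest)
    return dest
```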
>--Replace current system where user must explicitly state what they
>want included with one where user need only state what they want
>That depends on how much control users want over the process. I believe
>both are equally valid, and should be supported (similar in spirit to
>the MANIFEST.in syntax 'include' and 'exclude')
Quick bit of background info so you know where I'm coming from...
I'm also big on the "There should be [preferably] only one way to do
it" philosophy (one of the things that attracts me to Python). This
is as much out of necessity as anything, mind: I'm absolutely awful
at absorbing and retaining technical information, especially compared
to 'real' programmers who seem to soak up knowledge like a sponge.
e.g. I admire Perl for its "hell, let's totally go for broke and put
in _everything_ we can possibly think of" approach and am glad
there's somebody out there doing it cos then other languages can look
at Perl to see what's worked and what hasn't and steal the best stuff
for themselves. But it's not a language I can really use; my brain
capacity is far too limited to accommodate more than a fraction of
Perl's vast featureset and rules, so I much prefer to stick to
tighter languages like Python where I can work at a decent clip
without having to look up some 1000-page reference book at every turn.
Thus I tend to set the bar for feature inclusion pretty high;
probably much higher than most other programmers who can happily cope
with a bit of API flab without any problem. Don't take my
feature-flaying tendencies as a religious thing. It's more a matter
of simple survival: I can't keep up with y'all otherwise. ;)
The problem I see is that manifests seem to be involved whether you
need/want them or not. If [as I'm assuming] the majority of
distributions are trivial to assemble, then manifests should be the
exception, not the rule. I dunno how other folks work, but in my Home
folder I have a PythonDev folder containing folders for each of my
module projects - FooDev, BarDev, etc. Within each of these I have a
folder named Distro, which contains all the files and folders that'll
go into my distribution.
For me, manifests are nothing but a menace: this folder setup already
makes clear what I want put into the distribution, and I can't see
why I should have to explain it twice to the stupid machine. There's
been several occasions where an error or omission in a manifest file
has gone unnoticed until I've received an email from a user to say
that the package they downloaded is missing some parts
(embarrassing). Right now I manually unzip and check distributions
before uploading, but this is kinda crazy; I shouldn't have to worry
that DU might have screwed up a build, seeing as one of the reasons
for automating the process should be to avoid making such mistakes.
Thus my conclusion: explicit inclusion is inherently unsafe; a single
mistake or forgetting to update the manifest file to keep it in sync
with changes to the package can easily result in a broken distribution.
A much more sensible approach is to include everything by default, and
leave it to the developer to exclude anything they don't want
included. The worst accident likely to occur here with any regularity
is that you forget to strip out a few .pyc files resulting in a
distribution that's a few KB bigger than it really needs to be. Plus
it adheres to the philosophy that the most common case should require
the least amount of work: in this case, the majority of modules won't
ever require a manifest file and can safely skip it.
[BTW, will check out the include/exclude feature which I wasn't
previously aware of. Though my argument would be that I shouldn't
need to know about such 'advanced' features just to produce a simple,
reliable distribution: the process should be as simple as falling off
a log to begin with.]
We can take this manifest issue quite a bit further, btw. Another big
frustration with the things is they're quite brain-dead. All I want
to say is "Package everything in Folder X for distribution except for
.pyc and .so files". Thus a more pragmatic approach might be to do
away with dumb manifest files completely, and leave the developer to
optionally supply a 'build.py' script that will be automatically
executed as part of the build process.
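The "include everything, exclude by pattern" rule is only a few lines of code; here's a sketch, where the default exclusion patterns are my assumption rather than any existing DU option:

```python
# Build a distribution by zipping the whole package folder, skipping
# anything that matches an exclusion pattern (.pyc and .so by default).
import fnmatch
import os
import zipfile

DEFAULT_EXCLUDES = ("*.pyc", "*.so")

def build_distribution(package_dir, zip_path, excludes=DEFAULT_EXCLUDES):
    base = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w") as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if any(fnmatch.fnmatch(name, pat) for pat in excludes):
                    continue
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, base))
    return zip_path
```

An optional build.py could simply extend the excludes list, or do something cleverer; the point is that the common case needs no manifest at all.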
>>-- In particular, removing most DU involvment from build procedures
>>would allow developers to use their own development/build systems
>>much more easily.
>yes !! Though that's more easily said than done: a minimum of collaboration
>between the two is required, at least the adherence to some conventions.
Of course (see earlier comments). Just how many... no, _few_
conventions would be needed?
>>- Installation and compilation should be separate procedures.
>As a starting point, the whole 'build_ext' mechanism should be re-evaluated.
>The current 'Extension' mechanism is by far not abstract enough. Either
>the build_ext or the Extension class should be made polymorphic to wrap
>any external build system that could be used (make, scons, jam, ...)
Or invert and decouple the process to put the [e.g.] 'src/compile.py'
script in control. In this case, I think we could greatly simplify the
extension building process if DU can say to the 'src' folder: "Build
me some .so files", then stand back and let it get on with it (while
being happy to lend any support if/when it's asked for).
[i.e. It's a mental trick I often try when trying to resolve an API
design: seeing if I can switch from a complex 'push' process to a
simpler 'pull' process (or from a complex 'pull' process to a simpler
'push' process). It can make quite a difference.]
>>- What else may setup.py scripts do apart from install modules (2)
>>and build extensions (3)?
>* building documentation (that, too, is highly domain specific. From
> Latex over Docbook to doxygen...)
Yup. So let's say we have a standard 'docs' folder within a package
that may optionally contain a 'format.py' script that will be called
whenever the documentation needs building.
>* running unit tests
Have a standard 'tests' folder containing an optional 'test.py'
script. (Hey, think I see a pattern evolving here...)
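That evolving pattern can be sketched as one generic hook runner: DU looks for an optional, conventionally named script in each standard sub-folder and runs it if present. The folder and script names below are the hypothetical conventions from this thread, nothing more.

```python
# Run a package's optional convention-named hook script, if it exists.
import os
import subprocess
import sys

HOOKS = {
    "build": ("src", "compile.py"),
    "docs": ("docs", "format.py"),
    "test": ("tests", "test.py"),
}

def run_hook(package_dir, action):
    folder, script = HOOKS[action]
    path = os.path.join(package_dir, folder, script)
    if not os.path.exists(path):
        return False  # optional hook absent: silently skip
    # The script runs in its own folder; what it does there is its business.
    subprocess.check_call([sys.executable, script],
                          cwd=os.path.dirname(path))
    return True
```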
>>- Remove metadata from setup.py and modules.
>I don't quite agree in general. What metadata are we talking about
>anyways ? There's metadata that is to be provided to the packager
>backends, i.e. a package description of some sort. Some of these
>can be generated automatically (such as MANIFEST.in -> MANIFEST,
>build / host platform, etc.), others have to be explicitly provided
>(maintainer address, package description).
I mean user-defined metadata (I'll assume those that generate
metadata automatically for their own consumption can be left to
handle that as best suits themselves):
1. A module may contain various bits of user-defined metadata, e.g.
__version__, __author__. This info is almost certainly recorded
elsewhere, so [afaik] shouldn't need to be duplicated here.
2. The setup.py script also contains module name, version, author,
etc... potentially quite a lot of metadata, in fact. All mooshed
together with code for building and installing packages. We should
move this data out of there into a separate, dedicated metadata file
that's included in each package.
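The dedicated metadata file could be as dumb as 'key: value' lines; here's a sketch of such a format and its parser (the filename 'metadata.txt' and field names are assumptions, not any existing spec):

```python
# Parse a hypothetical per-package metadata file of 'key: value' lines.
def parse_metadata(text):
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    return meta

example = """\
name: FooLib
version: 1.2
author: Jane Doe
backwards-compatible: 1.0
"""
```

Any client - DU, a package manager, a documentation tool - could read this without touching setup.py or importing the module.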
>Having 'all metadata' lumped together brings us back to the 'swiss
>army knife' syndrome.
Well, swiss armyness is always a concern. If it's really a problem
here, we'd just need to have more than one metadata file. But I don't
think it'll come to that.
Also, one great advantage of pulling metadata out of module and
setup.py files is that it'll make it much easier for other clients to
access it. Right now it's kinda locked away: the only folks who know
how to access and use it are Python (module metadata) and DU
(setup.py metadata).
>- Improve version control. Junk current "operators" scheme (=,
><, >, >=, <=) as both unnecessarily complex and inadequate (i.e.
>stating module X requires module Y (>= 1.0) is useless in practice
>as it's impossible to predict _future_ compatibility). Metadata
>should support 'Backwards Compatibility' (optional) value indicating
>earliest version of the module that current version is
>backwards-compatible with. Dependencies list should declare name and
>version of each required package (specifically, the version used as
>package was developed and released).
>Good idea, though this issue highly depends on the packager backend used.
Could you cite some examples to help me understand the issues involved?
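For what it's worth, the 'Backwards Compatibility' scheme quoted above boils down to a single check: a dependency declared against version R is satisfied by installed version V if V is at least R and V declares itself backwards-compatible down to some B <= R. A sketch, assuming simple tuple version numbers:

```python
# Check a dependency under the backwards-compatibility scheme.
def satisfies(installed, compatible_back_to, required):
    """All versions are tuples like (1, 2)."""
    return compatible_back_to <= required <= installed

# e.g. FooLib 2.1, compatible back to 1.0, satisfies a dependency
# declared against FooLib 1.2:
ok = satisfies((2, 1), (1, 0), (1, 2))        # True
broken = satisfies((3, 0), (3, 0), (1, 2))    # False: 3.0 broke compat
```

No operators to learn, and no guessing about future compatibility: the installed package itself states how far back it remains compatible.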
>- Make it easier to have multiple installed versions of a module.
>That, too, isn't really a DU2 issue, or is it ?
Not really; more a general packaging and Python import issue. But I
included it here as I think packaging issues have a big impact on DU's
design.
>>- Reject PEP 262 (installed packages database). Complex, fragile,
>>duplication of information, single point of failure reminiscent of
>>Windows Registry. Exploit the filesystem instead - any info a
>>separate db system would provide should already be available from
>>each module's metadata.
>I don't quite agree. I couldn't live without rpm these days.
Well, it's not to say that users can't build their own databases
listing all their installed gunk if they want to. Ensuring user
freedom in such areas is crucial. Perhaps it would be clearer to say
that the intention is sound (make information on installed modules
easy to retrieve; something I'm all for), but the way 262 proposes to
do it is not.
In fact, one of my main objections to 262 is that it could well
restrict user freedom: by creating lots of dependencies and
synchronisation issues, users could find themselves locked into using
a single 'official' Package Manager because it's the only one smart
enough to deal with all these complexities. Users who venture into
their site-packages folder by any other means will quickly find
themselves being punished by the PackMan Police for unlawful infractions.
This should be one of the benefits that comes from decoupling module
metadata from implementation as I've suggested above. There'll be no
need for a central authority (262's DB) to maintain metadata, because
each module already contains and looks after its own. And because
there's only one metadata instance in existence for each module,
there's no dependency/synchronisation issues to worry about. You can
still provide users with exactly the same API that the 262 DB would
have done for accessing this info, of course, so you still get all
the functionality 262 would have provided, but without any of the
drawbacks.
Funnily enough though, one of the possible DB implementations floated
for 262 is to put the metadata for each module into a separate file
on disk. So perhaps I should say that 262's idea of maintaining a
_separate_ database simply isn't necessary: all the info it would
have provided can already be retrieved from files in the filesystem;
the only difference is that each file is bundled in its package. The
module/file system _is_ the database, if you like. (After all, what's
a filesystem but a big ol' object database by any other name?;)
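Treating the filesystem as the database, a 262-style "what's installed?" query is just a directory scan; a sketch, again assuming the hypothetical per-package 'metadata.txt' convention:

```python
# Answer "what packages are installed?" by scanning site-packages for
# each package's own bundled metadata file - no separate DB required.
import os

def installed_packages(site_packages):
    """Yield (package_name, metadata_path) for each installed package."""
    for name in sorted(os.listdir(site_packages)):
        meta = os.path.join(site_packages, name, "metadata.txt")
        if os.path.exists(meta):
            yield name, meta
```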
p.s. If you're interested, you can see a module system I designed a
couple years back at applemods.sourceforge.net. It actually uses a
version of the "module = package = distribution with all batteries
included" concept I'm floating here. (Which I think was itself
influenced by Python's package system.)
p.p.s. Anything folk can do to help me understand the issues involved
in cross-platform and extension compilation lest I spout off too much
about things I know not will be much appreciated, ta. :)