[Python-Dev] Distribution tools: What I would like to see
Talin
talin at acm.org
Sun Nov 26 21:24:03 CET 2006
I've been looking once again over the docs for distutils and setuptools,
and thinking to myself "this seems a lot more complicated than it ought
to be".
Before I get into detail, however, I want to explain carefully the scope
of my critique - in particular, why I am talking about setuptools on the
python-dev list. You see, in my mind, the process of assembling,
distributing, and downloading a package is, or at least ought to be, a
unified process. It ought to be a fundamental part of the system, and
not split into separate tools with separate docs that have to be
mentally assembled in order to understand it.
Moreover, setuptools is the de facto standard these days - a novice
programmer who googles for 'python install tools' will encounter
setuptools long before they learn about distutils; and if you read the
various mailing lists and blogs, you'll sense a subtle aura of
deprecation and decay that surrounds distutils.
I would claim, then, that regardless of whether setuptools is officially
blessed or not, it is an intrinsic part of the "Python experience".
(I'd also like to put forward the disclaimer that there are probably
factual errors in this post, or errors of misunderstanding; all I can
claim as an excuse is that it's not for lack of trying, and corrections
are welcome as always.)
Think about the idea of module distribution from a pedagogical
standpoint - when does a newbie Python programmer start learning about
module distribution and what do they learn first? A novice Python user
will begin by writing scripts for themselves, and not thinking about
distribution at all. However, once they reach the point where they begin
to think about packaging up their module, the Python documentation ought
to be able to lead them, step by step, towards a goal of making a
distributable package:
-- It should teach them how to organize their code into packages and
modules
-- It should show them how to write the proper setup scripts
-- If there is C code involved, it should explain how that fits into
the picture.
-- It should explain how to write unit tests and where they should go.
So how does the current system fail in this regard? The docs for each
component - distutils, setuptools, unit test frameworks, and so on - only
talk about that specific component, not how it all fits together.
For example, the docs for distutils start by telling you how to build a
setup script. It never explains why you need a setup script, or why
Python programs need to be "installed" in the first place. [1]
The distutils docs never describe how your directory structure ought to
look. In fact, they never tell you how to *write* a distributable
package; rather, they seem to be more oriented towards taking an
already-working package and modifying it to be distributable.
The setuptools docs are even worse in this regard. If you look carefully
at the docs for setuptools, you'll notice that each subsection is
effectively a 'diff', describing how setuptools is different from
distutils. One section talks about the "new and changed keywords",
without explaining what the old keywords were or how to find them.
Thus, for the novice programmer, learning how to write a setup script
ends up being a process of flipping back and forth between the distutils
and setuptools docs, trying to hold in their minds enough of each to be
able to achieve some sort of understanding.
What we have now does a good job of explaining how the individual tools
work, but it doesn't do a good job of answering the question "Starting
from an empty directory, how do I create a distributable Python
package?" A novice programmer wants to know what to create first, what
to create next, and so on.
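As a rough illustration of what an "empty directory to package" walkthrough
might start from, here is a minimal sketch that creates one possible layout
in a temporary directory. Every name in it (the "example" package, core.py,
the tests directory, README.txt) is hypothetical and chosen only for
illustration; neither the distutils nor the setuptools docs prescribe this
layout.

```python
import tempfile
from pathlib import Path

# Hypothetical skeleton: relative path -> file contents.
# All names here are illustrative assumptions, not an official layout.
LAYOUT = {
    "README.txt": "Example package.\n",
    "setup.py": '''\
from setuptools import setup

setup(
    name="example",
    version="0.1",
    packages=["example"],
)
''',
    "example/__init__.py": "",
    "example/core.py": '''\
def greet(name):
    return "Hello, " + name
''',
    "tests/test_core.py": '''\
import unittest

from example.core import greet


class GreetTest(unittest.TestCase):
    def test_greet(self):
        self.assertEqual(greet("world"), "Hello, world")
''',
}


def create_skeleton(root):
    """Write every file in LAYOUT under root; return the relative paths."""
    root = Path(root)
    for relative, text in LAYOUT.items():
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    return sorted(LAYOUT)


root = Path(tempfile.mkdtemp())
created = create_skeleton(root)
for relative in created:
    print(relative)
```

The order of the dictionary roughly mirrors the "what to create first, what
to create next" question: a description, a setup script, the package itself,
and then its tests.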
This is especially true if the novice programmer is creating an
extension module. Suppose I have a C library that I need to wrap. In
order to even compile and test it, I'm going to need a setup script.
That means I need to understand distutils before I even think about
distribution, before I even begin writing the code!
(Sure, I could write a Makefile, but I'd only end up throwing it away
later -- so why not cut to the chase and *start* with a setup script?
Ans: Because it's too hard!)
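For comparison, a setup script for wrapping a C library can be fairly short
once you know what to write. The sketch below only writes such a script to a
temporary directory and checks that it parses; it does not compile anything.
The names "mywrap", "_mywrap.c" and "mylib" are made up for the example, and
it assumes the setuptools `Extension` class with its `sources` and
`libraries` arguments.

```python
import tempfile
from pathlib import Path

# Text of a hypothetical setup.py. "mywrap", "_mywrap.c" and "mylib"
# are placeholder names, not taken from any real project.
SETUP_PY = '''\
from setuptools import setup, Extension

setup(
    name="mywrap",
    version="0.1",
    ext_modules=[
        Extension(
            "_mywrap",               # name of the compiled module
            sources=["_mywrap.c"],   # C glue code that calls the library
            libraries=["mylib"],     # link against the wrapped C library
        ),
    ],
)
'''


def write_setup_script(project_dir):
    """Write the sketch above as setup.py inside project_dir."""
    path = Path(project_dir) / "setup.py"
    path.write_text(SETUP_PY)
    return path


project = tempfile.mkdtemp()
script = write_setup_script(project)

# Syntax check only: the script is parsed, never executed or built.
compile(script.read_text(), str(script), "exec")
print(script.name)
```

With a real `_mywrap.c` next to it, running `python setup.py build_ext
--inplace` would be the usual way to compile the module for local testing.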
But it isn't just the docs that are at fault here - otherwise, I'd be
posting this on a different mailing list. It seems like the whole
architecture is 'diff'-based, a series of patches on top of patches,
which are in need of some serious refactoring.
Except that nobody can do this refactoring, because there's no formal
list of requirements. I look at distutils, and while some parts are
obvious, there are other parts where I go "what problem were they trying
to solve here?" In my experience, you *don't* go mucking with someone's
code and trying to fix it unless you understand what problem they were
trying to solve - otherwise you'll botch it and make a mess. Since few
people ever bother to write down what problem they were trying to solve
(although they tend to be better at describing their clever solution),
usually this ends up being done through a process of reverse engineering
the requirements from the code, unless you are lucky enough to have
someone around who knows the history of the thing.
Admittedly, I'm somewhat in ignorance here. My perspective is that of an
'end-user developer', someone who uses these tools but does not write
them. I don't know the internals of these tools, nor do I particularly
want to - I've got bigger fish to fry.
I'm posting this here because what I'd like folks to think about is the
whole process of Python development, not just the documentation. What is
the smoothest path from empty directory to a finished package on PyPI?
What can be changed about the current standard libraries that will ease
this process?
[1] The answer, AFAICT, is that 'setup' is really a Makefile - in other
words, it's a platform-independent way of describing how to construct a
compiled module from sources, and making it available to all programs on
that system. Although this gets confusing when we start talking about
"pure python" modules that have no C component - because we have all
this language that talks about compiling and installing and such, when
all that is really going on underneath is a plain old file copy.
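To make the "plain old file copy" point concrete, here is a minimal sketch
that "installs" a pure-Python module by copying it into a directory on the
import path. It is a simplification under stated assumptions: the module name
"hello_demo" is hypothetical, the temporary install directory merely stands
in for site-packages, and real installers do additional bookkeeping that is
not shown here.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path

# Two scratch directories: one for the "source tree", one standing in
# for site-packages. Both are assumptions made for this illustration.
source_dir = Path(tempfile.mkdtemp())
install_dir = Path(tempfile.mkdtemp())

# A hypothetical pure-Python module with no C component.
module_src = source_dir / "hello_demo.py"
module_src.write_text("def greet():\n    return 'hello'\n")

# The "install" step: nothing is compiled, the file is just copied.
shutil.copy(module_src, install_dir / "hello_demo.py")

# Make the install directory importable, then import the copied module.
sys.path.insert(0, str(install_dir))
importlib.invalidate_caches()
hello_demo = importlib.import_module("hello_demo")

print(hello_demo.greet())
```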
-- Talin