[Python-Dev] Distribution tools: What I would like to see

Talin talin at acm.org
Sun Nov 26 21:24:03 CET 2006

I've been looking once again over the docs for distutils and setuptools, 
and thinking to myself "this seems a lot more complicated than it ought 
to be".

Before I get into detail, however, I want to explain carefully the scope 
of my critique - in particular, why I am talking about setuptools on the 
python-dev list. You see, in my mind, the process of assembling, 
distributing, and downloading a package is, or at least ought to be, a 
unified process. It ought to be a fundamental part of the system, and 
not split into separate tools with separate docs that have to be 
mentally assembled in order to understand it.

Moreover, setuptools is the de facto standard these days - a novice
programmer who googles for 'python install tools' will encounter 
setuptools long before they learn about distutils; and if you read the 
various mailing lists and blogs, you'll sense a subtle aura of 
deprecation and decay that surrounds distutils.

I would claim, then, that regardless of whether setuptools is officially 
blessed or not, it is an intrinsic part of the "Python experience".

(I'd also like to put forward the disclaimer that there are probably 
factual errors in this post, or errors of misunderstanding; all I can
claim as an excuse is that it's not for lack of trying, and corrections 
are welcome as always.)

Think about the idea of module distribution from a pedagogical 
standpoint - when does a newbie Python programmer start learning about 
module distribution and what do they learn first? A novice Python user 
will begin by writing scripts for themselves, and not thinking about 
distribution at all. However, once they reach the point where they begin 
to think about packaging up their module, the Python documentation ought 
to be able to lead them, step by step, towards a goal of making a 
distributable package:

  -- It should teach them how to organize their code into packages.
  -- It should show them how to write the proper setup scripts
  -- If there is C code involved, it should explain how that fits into 
the picture.
  -- It should explain how to write unit tests and where they should go.

So how does the current system fail in this regard? The docs for each
component (distutils, setuptools, unit test frameworks, and so on) only
talk about that specific component, not about how they all fit together.

For example, the docs for distutils start by telling you how to build a 
setup script. They never explain why you need a setup script, or why
Python programs need to be "installed" in the first place. [1]

The distutils docs never describe how your directory structure ought to 
look. In fact, they never tell you how to *write* a distributable 
package; rather, they seem to be oriented more towards taking an
already-working package and modifying it to be distributable.

The setuptools docs are even worse in this regard. If you look carefully 
at the docs for setuptools, you'll notice that each subsection is 
effectively a 'diff', describing how setuptools is different from
distutils. One section talks about the "new and changed keywords", 
without explaining what the old keywords were or how to find them.

Thus, for the novice programmer, learning how to write a setup script 
ends up being a process of flipping back and forth between the distutils 
and setuptools docs, trying to hold in their minds enough of each to be 
able to achieve some sort of understanding.

What we have now does a good job of explaining how the individual tools 
work, but it doesn't do a good job of answering the question "Starting 
from an empty directory, how do I create a distributable Python 
package?" A novice programmer wants to know what to create first, what 
to create next, and so on.
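To make that question concrete, here is a minimal sketch of one possible answer: a script that lays out a package, a unit test, and a setup script in an empty directory, in the order a newcomer might create them. The function make_skeleton, the package name "mypkg", and the file contents are hypothetical, chosen only for illustration; this is not a layout prescribed by the distutils or setuptools docs.

```python
import tempfile
from pathlib import Path


def make_skeleton(root: Path, name: str) -> list[Path]:
    """Create a minimal package layout under root (hypothetical helper).

    Returns the created files in the order they were written.
    """
    # Ordered "first, next, last" steps; contents are illustrative only.
    files = {
        # 1. The package itself: a directory containing __init__.py
        f"{name}/__init__.py": '__version__ = "0.1"\n',
        # 2. A unit test kept beside the package rather than inside it
        f"tests/test_{name}.py": (
            "import unittest\n"
            f"import {name}\n\n"
            "class SmokeTest(unittest.TestCase):\n"
            "    def test_version(self):\n"
            f"        self.assertEqual({name}.__version__, '0.1')\n"
        ),
        # 3. The setup script describing how to build and install it
        "setup.py": (
            "from setuptools import setup\n\n"
            f"setup(name={name!r}, version='0.1', packages=[{name!r}])\n"
        ),
    }
    created = []
    for rel, text in files.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)  # create dirs as needed
        path.write_text(text)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of the current one
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_skeleton(Path(tmp), "mypkg"):
            print(p.relative_to(tmp))
```

Running it prints the three relative paths. Putting tests/ beside the package rather than inside it is one common convention, not the only one.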

This is especially true if the novice programmer is creating an 
extension module. Suppose I have a C library that I need to wrap. In 
order to even compile and test it, I'm going to need a setup script. 
That means I need to understand distutils before I even think about 
distribution, before I even begin writing the code!

(Sure, I could write a Makefile, but I'd only end up throwing it away 
later -- so why not cut to the chase and *start* with a setup script? 
Ans: Because it's too hard!)

But it isn't just the docs that are at fault here - otherwise, I'd be 
posting this on a different mailing list. It seems like the whole 
architecture is 'diff'-based, a series of patches on top of patches, in
need of some serious refactoring.

Except that nobody can do this refactoring, because there's no formal 
list of requirements. I look at distutils, and while some parts are 
obvious, there are other parts where I go "what problem were they trying 
to solve here?" In my experience, you *don't* go mucking with someone's 
code and trying to fix it unless you understand what problem they were 
trying to solve - otherwise you'll botch it and make a mess. Since few 
people ever bother to write down what problem they were trying to solve 
(although they tend to be better at describing their clever solution), 
usually this ends up being done through a process of reverse engineering 
the requirements from the code, unless you are lucky enough to have 
someone around who knows the history of the thing.

Admittedly, I'm speaking from some ignorance here. My perspective is that of an
'end-user developer', someone who uses these tools but does not write 
them. I don't know the internals of these tools, nor do I particularly 
want to - I've got bigger fish to fry.

I'm posting this here because what I'd like folks to think about is the 
whole process of Python development, not just the documentation. What is 
the smoothest path from empty directory to a finished package on PyPI? 
What can be changed about the current standard libraries that will ease 
this process?

[1] The answer, AFAICT, is that 'setup' is really a Makefile - in other 
words, it's a platform-independent way of describing how to construct a
compiled module from sources, and making it available to all programs on 
that system. This gets confusing, though, when we start talking about
"pure python" modules that have no C component - because we have all 
this language that talks about compiling and installing and such, when 
all that is really going on underneath is a plain old file copy.
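The "plain old file copy" point can be sketched in a few lines. This is a deliberately simplified model, assuming a pure-Python package and using a temporary directory in place of the real site-packages; install_pure_python is a hypothetical name, and real installers also byte-compile the modules and record package metadata.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def install_pure_python(pkg_dir: Path, site_packages: Path) -> Path:
    """Copy a pure-Python package directory into site_packages (hypothetical).

    Simplified on purpose: no byte-compiling, no metadata, no uninstall record.
    """
    target = site_packages / pkg_dir.name
    shutil.copytree(pkg_dir, target)  # raises FileExistsError if target exists
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
        # Build a tiny package in a scratch "source" directory
        pkg = Path(src) / "mypkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("GREETING = 'hello'\n")

        install_pure_python(pkg, Path(dest))

        sys.path.insert(0, dest)  # dest stands in for site-packages
        import mypkg  # importable purely because the files were copied

        print(mypkg.GREETING)
```

The point of the sketch is that, for pure-Python code, "installing" reduces to putting the files somewhere on sys.path; the compile/build vocabulary only earns its keep once C extensions are involved.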

-- Talin
