Distribution tools: What I would like to see
I've been looking once again over the docs for distutils and setuptools, and thinking to myself "this seems a lot more complicated than it ought to be".

Before I get into detail, however, I want to explain carefully the scope of my critique - in particular, why I am talking about setuptools on the python-dev list. You see, in my mind, the process of assembling, distributing, and downloading a package is, or at least ought to be, a unified process. It ought to be a fundamental part of the system, and not split into separate tools with separate docs that have to be mentally assembled in order to understand it.

Moreover, setuptools is the de facto standard these days - a novice programmer who googles for 'python install tools' will encounter setuptools long before they learn about distutils; and if you read the various mailing lists and blogs, you'll sense a subtle aura of deprecation and decay that surrounds distutils. I would claim, then, that regardless of whether setuptools is officially blessed or not, it is an intrinsic part of the "Python experience".

(I'd also like to put forward the disclaimer that there are probably factual errors in this post, or errors of misunderstanding; all I can claim as an excuse is that it's not for lack of trying, and corrections are welcome as always.)

Think about the idea of module distribution from a pedagogical standpoint - when does a newbie Python programmer start learning about module distribution and what do they learn first? A novice Python user will begin by writing scripts for themselves, and not thinking about distribution at all.
However, once they reach the point where they begin to think about packaging up their module, the Python documentation ought to be able to lead them, step by step, towards a goal of making a distributable package:

-- It should teach them how to organize their code into packages and modules
-- It should show them how to write the proper setup scripts
-- If there is C code involved, it should explain how that fits into the picture.
-- It should explain how to write unit tests and where they should go.

So how does the current system fail in this regard? The docs for each component - distutils, setuptools, unit test frameworks, and so on - only talk about that specific module, not how it all fits together.

For example, the docs for distutils start by telling you how to build a setup script. They never explain why you need a setup script, or why Python programs need to be "installed" in the first place. [1]

The distutils docs never describe how your directory structure ought to look. In fact, they never tell you how to *write* a distributable package; rather, they seem to be more oriented towards taking an already-working package and modifying it to be distributable.

The setuptools docs are even worse in this regard. If you look carefully at the docs for setuptools, you'll notice that each subsection is effectively a 'diff', describing how setuptools is different from distutils. One section talks about the "new and changed keywords", without explaining what the old keywords were or how to find them.

Thus, for the novice programmer, learning how to write a setup script ends up being a process of flipping back and forth between the distutils and setuptools docs, trying to hold in their minds enough of each to be able to achieve some sort of understanding.

What we have now does a good job of explaining how the individual tools work, but it doesn't do a good job of answering the question "Starting from an empty directory, how do I create a distributable Python package?"
A novice programmer wants to know what to create first, what to create next, and so on.

This is especially true if the novice programmer is creating an extension module. Suppose I have a C library that I need to wrap. In order to even compile and test it, I'm going to need a setup script. That means I need to understand distutils before I even think about distribution, before I even begin writing the code! (Sure, I could write a Makefile, but I'd only end up throwing it away later -- so why not cut to the chase and *start* with a setup script? Ans: Because it's too hard!)

But it isn't just the docs that are at fault here - otherwise, I'd be posting this on a different mailing list. It seems like the whole architecture is 'diff'-based, a series of patches on top of patches, which are in need of some serious refactoring.

Except that nobody can do this refactoring, because there's no formal list of requirements. I look at distutils, and while some parts are obvious, there are other parts where I go "what problem were they trying to solve here?" In my experience, you *don't* go mucking with someone's code and trying to fix it unless you understand what problem they were trying to solve - otherwise you'll botch it and make a mess. Since few people ever bother to write down what problem they were trying to solve (although they tend to be better at describing their clever solution), usually this ends up being done through a process of reverse engineering the requirements from the code, unless you are lucky enough to have someone around who knows the history of the thing.

Admittedly, I'm somewhat in ignorance here. My perspective is that of an 'end-user developer', someone who uses these tools but does not write them. I don't know the internals of these tools, nor do I particularly want to - I've got bigger fish to fry.

I'm posting this here because what I'd like folks to think about is the whole process of Python development, not just the documentation.
What is the smoothest path from empty directory to a finished package on PyPI? What can be changed about the current standard libraries that will ease this process?

[1] The answer, AFAICT, is that 'setup' is really a Makefile - in other words, it's a platform-independent way of describing how to construct a compiled module from sources, and making it available to all programs on that system. This gets confusing, though, when we start talking about "pure python" modules that have no C component - because we have all this language that talks about compiling and installing and such, when all that is really going on underneath is a plain old file copy.

-- Talin
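
To make the footnote's "plain old file copy" point concrete, here is a minimal sketch using only the standard library. It is not what distutils or setuptools actually do internally; the module name `hello_dist_demo`, the directory name `fake-site-packages`, and the helper `install_pure_python` are all hypothetical, chosen purely for illustration. The sketch copies a single-file module into a directory, puts that directory on `sys.path`, and imports the module from there.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def install_pure_python(module_file, target_dir):
    """'Install' a single-file pure-Python module by copying it (illustration only)."""
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(module_file, target_dir))


def demo():
    """Write a tiny module, copy it to a stand-in site directory, and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical module name, not taken from the thread.
        src = Path(tmp) / "src" / "hello_dist_demo.py"
        src.parent.mkdir()
        src.write_text("def greet():\n    return 'hello'\n")

        # Stand-in for the real site-packages directory.
        site_dir = Path(tmp) / "fake-site-packages"
        install_pure_python(src, site_dir)

        sys.path.insert(0, str(site_dir))
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("hello_dist_demo")
            return module.greet()
        finally:
            # Undo the path and module-cache changes made for the demo.
            sys.path.remove(str(site_dir))
            sys.modules.pop("hello_dist_demo", None)


if __name__ == "__main__":
    print(demo())
```

A real installer also records metadata, handles packages with several files, and compiles extension code, none of which this sketch attempts.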
Talin wrote:
But it isn't just the docs that are at fault here - otherwise, I'd be posting this on a different mailing list. It seems like the whole architecture is 'diff'-based, a series of patches on top of patches, which are in need of some serious refactoring.
so to summarize, you want someone to rewrite the code and write new documentation, and since you didn't even have time to make your post shorter, that someone will obviously not be you ? </F>
Fredrik Lundh wrote:
Talin wrote:
But it isn't just the docs that are at fault here - otherwise, I'd be posting this on a different mailing list. It seems like the whole architecture is 'diff'-based, a series of patches on top of patches, which are in need of some serious refactoring.
so to summarize, you want someone to rewrite the code and write new documentation, and since you didn't even have time to make your post shorter, that someone will obviously not be you ?
Oh, it was a lot longer when I started :) As far as rewriting it goes - I can only rewrite things that I understand.
</F>
Talin schrieb:
As far as rewriting it goes - I can only rewrite things that I understand.
So if you want this to change, you obviously need to understand the entire distutils. It's possible to do that; some people have done it (the "understanding" part) - just go ahead and start reading source code. Regards, Martin
On 11/27/06, "Martin v. Löwis"
Talin schrieb:
As far as rewriting it goes - I can only rewrite things that I understand.
So if you want this to change, you obviously need to understand the entire distutils. It's possible to do that; some people have done it (the "understanding" part) - just go ahead and start reading source code.
You (and Fredrik) are being a little harsh on Talin. I understand the
need to encourage people to fix things themselves rather than just
complaining about stuff they don't like. But people don't have an
unlimited amount of time and expertise to work on several Python
projects simultaneously. Nevertheless, they should be able to offer
an "It would be good if..." suggestion without being stomped on. The
suggestion itself can be a contribution if it focuses people's
attention on a problem and a potential solution. Just because
somebody can't learn a big subsystem and write code or docs for it *at
this moment* doesn't mean they never will. And even if they don't,
it's possible to make contributions in one area of Python and
suggestions in another... or does the karma account not work that way?
I don't see Talin saying, "You should fix this for me." He's saying,
"I'd like this improved and I'm working on it, but it's a big job and
I need help, ideally from someone with more expertise in distutils."
Ultimately for Python the question isn't, "Does Talin want this done?"
but, "Does this dovetail with the direction Python generally wants to
go?" From what I've seen of setuptools/distutils evolution, yes, it's
consistent with what many people want for Python. So instead of
saying, "You (Talin) should take on this task alone because you want
it" as if nobody else did, it would be better to say, "Thank you,
Talin, for moving this important Python issue along."
I've privately offered Talin some (unfinished) material I've been
working on anyway that relates to his vision. When I get some other
projects cleared away I'd like to put together that TOC of links I
mentioned and perhaps collaborate on a Guide with whoever wants to.
But I also need to learn more about setuptools before I can do that.
As it happens I need the information anyway because I'm about to
package an egg....
--
Mike Orr
Mike Orr wrote:
On 11/27/06, "Martin v. Löwis"
wrote: Talin schrieb:
As far as rewriting it goes - I can only rewrite things that I understand. So if you want this to change, you obviously need to understand the entire distutils. It's possible to do that; some people have done it (the "understanding" part) - just go ahead and start reading source code.
You (and Fredrik) are being a little harsh on Talin. I understand the need to encourage people to fix things themselves rather than just complaining about stuff they don't like. But people don't have an unlimited amount of time and expertise to work on several Python projects simultaneously. Nevertheless, they should be able to offer an "It would be good if..." suggestion without being stomped on. The suggestion itself can be a contribution if it focuses people's attention on a problem and a potential solution. Just because somebody can't learn a big subsystem and write code or docs for it *at this moment* doesn't mean they never will. And even if they don't, it's possible to make contributions in one area of Python and suggestions in another... or does the karma account not work that way?
I don't see Talin saying, "You should fix this for me." He's saying, "I'd like this improved and I'm working on it, but it's a big job and I need help, ideally from someone with more expertise in distutils." Ultimately for Python the question isn't, "Does Talin want this done?" but, "Does this dovetail with the direction Python generally wants to go?" From what I've seen of setuptools/distutils evolution, yes, it's consistent with what many people want for Python. So instead of saying, "You (Talin) should take on this task alone because you want it" as if nobody else did, it would be better to say, "Thank you, Talin, for moving this important Python issue along."
I've privately offered Talin some (unfinished) material I've been working on anyway that relates to his vision. When I get some other projects cleared away I'd like to put together that TOC of links I mentioned and perhaps collaborate on a Guide with whoever wants to. But I also need to learn more about setuptools before I can do that. As it happens I need the information anyway because I'm about to package an egg....
What you are saying is basically correct, although I have a slightly different spin on it.

I've written a lot of documentation over the years, and I know that one of the hardest parts of writing documentation is trying to identify your own assumptions. To someone who already knows how the system works, it's hard to understand the mindset of someone who is just learning it. You tend to unconsciously assume knowledge of certain things which a new user might not know.

To that extent, it can be useful sometimes to have someone who is in the process of learning how to use the system, and who is willing to carefully analyze and write down their own experiences while doing so.

Most of the time people are too busy to do this - they want to get their immediate problem solved, and they aren't interested in how difficult it will be for the next person. This is especially true in cases where the problem that is holding them up is three levels down from the level where their real goal is - they want to be able to "pop the stack" of problems as quickly as possible, so that they can get back to solving their *real* problem.

So what I am offering, in this case, is my ignorance -- but a carefully described ignorance :) I don't demand that anyone do anything - I'm merely pointing out some things that people may or may not care about.

Now, in this particular case, I have actually used distutils before. But distutils is one of those systems (like Perl) which tends to leak out of your brain if you don't use it regularly - that is, if you only use it once every 6 months, at the end of 6 months you have forgotten most of what you have learned, and you have to start the learning curve all over again. And I am in the middle of that re-learning process right now.

What I am doing right now is creating a new extension project using setuptools, and keeping notes on what I do.
So for example, I start by creating the directory structure:

    mkdir myproject
    cd myproject
    mkdir src
    mkdir test

Next, create a minimal setup.py script. I won't include that here, but it's in the notes.

Next, create the myproject.c file for the module in src/, and write the 'init' function for the module. (Again, content omitted, but it's in my notes.)

Create a projectname_unittest.py file in test. Add both of these to the setup.py file.

At this point, you ought to be able to run "python setup.py test" and have it succeed. From here, you can start adding types and methods, with a unit test for each one, testing each one as it is added.

Now, I realize that all of this is "baby steps" to you folks, but it took me a day or so to figure out. And it's interesting that even these few steps cut across a number of tools and libraries - setuptools, distutils, unittest, the "extending Python" doc and the "Python C API" doc.

(BTW, I realized another thing that would be really handy is if the "extending Python" doc contained hyperlink references to the "Python C API" doc, so that when it talks about, say, PyArg_ParseTuple, you could go straight to the reference doc for it.)

-- Talin
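
The first few steps above can be sketched as a small scaffolding script. This is only an illustration under stated assumptions: the actual setup.py and C source from the notes are not reproduced in the post, so the `SETUP_PY` text below is a guess at a minimal script using the real `setuptools.setup` and `setuptools.Extension` calls, the `scaffold` helper and the file name `myproject_unittest.py` are hypothetical, and the wiring that makes "python setup.py test" work is deliberately left out because the post omits it.

```python
import tempfile
from pathlib import Path

# Assumed contents of a minimal setup.py; the original notes are not shown in the post.
SETUP_PY = '''\
from setuptools import setup, Extension

setup(
    name="myproject",
    version="0.1",
    # One C extension module built from src/myproject.c
    ext_modules=[Extension("myproject", sources=["src/myproject.c"])],
)
'''


def scaffold(root):
    """Create the myproject/src and myproject/test layout with placeholder files."""
    project = Path(root) / "myproject"
    (project / "src").mkdir(parents=True)
    (project / "test").mkdir()

    (project / "setup.py").write_text(SETUP_PY)
    # Placeholders only: the real module init code and tests are not given in the post.
    (project / "src" / "myproject.c").write_text("/* module init function goes here */\n")
    (project / "test" / "myproject_unittest.py").write_text("import unittest\n")

    # Return the created files as sorted, project-relative POSIX paths.
    return sorted(
        p.relative_to(project).as_posix() for p in project.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in scaffold(tmp):
            print(name)
```

The script only writes files; it does not import setuptools itself, so it runs even where setuptools is not installed.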
Talin schrieb:
To that extent, it can be useful sometimes to have someone who is in the process of learning how to use the system, and who is willing to carefully analyze and write down their own experiences while doing so.
I readily agree that the documentation can be improved, and applaud efforts to do so. And I have no doubts that distutils is difficult to learn for a beginner. In Talin's remarks, there was also the suggestion that distutils is "in need of some serious refactoring". It is such remarks that get me started: it seems useless to me to make such a statement if they are not accompanied with concrete proposals what specifically to change. It also gets me upset because it suggests that all prior contributors weren't serious. Regards, Martin
Martin v. Löwis wrote:
Talin schrieb:
To that extent, it can be useful sometimes to have someone who is in the process of learning how to use the system, and who is willing to carefully analyze and write down their own experiences while doing so.
I readily agree that the documentation can be improved, and applaud efforts to do so. And I have no doubts that distutils is difficult to learn for a beginner.
In Talin's remarks, there was also the suggestion that distutils is "in need of some serious refactoring". It is such remarks that get me started: it seems useless to me to make such a statement if they are not accompanied with concrete proposals what specifically to change. It also gets me upset because it suggests that all prior contributors weren't serious.
I'm sorry if I implied that distutils was 'misdesigned'; that wasn't what I meant. Refactoring is usually desirable when a body of code has accumulated a lot of additional baggage as a result of maintenance and feature additions, accompanied by the observation that if the baggage had been present when the system was originally created, the design of the system would have been substantially different. Refactoring is merely an attempt to discover what that original design might have been, if the requirements had been known at the time.

What I was reacting to, I think, is that it seemed like in some ways the 'diffness' of setuptools wasn't just in the documentation, but in the code itself, and if both setuptools and distutils had been co-developed, then distutils might have been somewhat different as a result. Also, I admit that some of this is hearsay, so maybe I should just back off on this one.
Regards, Martin
Talin wrote:
What I am doing right now is creating a new extension project using setuptools, and keeping notes on what I do. So for example, I start by creating the directory structure:

    mkdir myproject
    cd myproject
    mkdir src
    mkdir test
I'd forgotten about this until I was reminded in the python-dev summary (dang those summaries are useful.) Anyway, I've put my notes on the Wiki; you can find them here at: http://wiki.python.org/moin/ExtensionTutorial This is an extremely minimalist guide for people who want to write an extension module, starting from nothing but a bare interpreter prompt. If I made any mistakes, well - it's a wiki, you know what to do :) -- Talin
On 11/26/06, Talin
I've been looking once again over the docs for distutils and setuptools, and thinking to myself "this seems a lot more complicated than it ought to be".
Before I get into detail, however, I want to explain carefully the scope of my critique - in particular, why I am talking about setuptools on the python-dev list. You see, in my mind, the process of assembling, distributing, and downloading a package is, or at least ought to be, a unified process. It ought to be a fundamental part of the system, and not split into separate tools with separate docs that have to be mentally assembled in order to understand it.
Moreover, setuptools is the de facto standard these days - a novice programmer who googles for 'python install tools' will encounter setuptools long before they learn about distutils; and if you read the various mailing lists and blogs, you'll sense a subtle aura of deprecation and decay that surrounds distutils.
Look at the current situation as more of an evolutionary point than a
finished product. There's widespread support for integrating
setuptools into Python as you suggest. I've heard it discussed at
Pycon the past two years. The reason it hasn't been done yet is
technical, from what I've heard. Distutils is apparently difficult to
patch correctly and could stand a rewrite.

I'm currently studying the Pylons implementation and thus having to
learn more about entry points, resources, ini files used by eggs, etc.
This requires studying three different pages on the
peak.telecommunity.com site -- exactly the problem you're describing.

A comprehensive third-party manual that integrates the documentation
would be a good place to start. Even the outline of such a manual
would be good. That would give a common baseline of understanding
for package users, package developers, and core developers. I wonder
if one of the Python books already has this written down somewhere.

From the manual one could then distill a spec for "what's needed in a
package manager, what features a distutils upgrade would provide, and
what a package should/may contain". That would be a basis for one or
more PEPs.
The "diff" approach is understandable at the beginning, because that's
how the developers think of it, and how most users will approach it
initially. We also needed real-world experience to see if the
setuptools approach was even feasible large-scale or whether it needed
major changes. Now we have more experience, and more Pythoneers are
appearing who are unfamiliar with the "distutils-only" approach. So
requests like Talin's will become more frequent.
It's such a big job and Python 2.6 is slated as a "minimal features"
release, so it may be better to target this for Python 3 and backport
it if possible.
--
Mike Orr
participants (4)
- "Martin v. Löwis"
- Fredrik Lundh
- Mike Orr
- Talin