On Sun, 16 Aug 2009 14:16:29 -0600
Zooko Wilcox-O'Hearn
Folks:
I have a recurring problem with distutils/setuptools/Distribute, which is that I don't know how to extend the functionality of setup.py and make the new functionality be testable and modular. ... Here's one specific example that I currently have a lot of experience with: I'd like to generate a version number from revision control history. ...
The third is what I do in Tahoe-LAFS [6]. I moved the functionality in question into a separate Python package, in this case named "darcsver" [7], and used setuptools's plugin system to add a command named "darcsver" which initializes the distribution.metadata.version attribute correctly. Then I had to add a bunch of aliases to my setup.cfg [8] saying "If you're going to build, first darcsver, and if you're going to install, first darcsver, and ...". This sort of works, except that yesterday my programming partner Brian Warner informed me [9] that he expected the "python ./setup.py --version" command-line to output the version number. Argh! There is no way to configure in my setup.cfg "If you're going to --version, first darcsver.".
I think I have a plan for addressing this in the specific case of Tahoe. The following may be irrelevant to the extending-setuptools discussion, but it might also be informative.

The job of "setup.py darcsver" (as used by Tahoe) is to perform a relatively disk-intensive operation to compute the current source tree's version string and then record it in two places:

  src/allmydata/_version.py : for use by Tahoe itself, so it knows its own version at runtime, i.e. so that running a "tahoe --version" command works

  distribution.metadata.version : for use by setuptools, so that commands like "setup.py sdist" can name the tarball correctly

"distribution.metadata.version" is usually set by passing a version= argument to the main setup() call inside setup.py. Packages can use whatever mechanism they like to decide what to pass to version=. Tahoe currently never passes a version= to setup().

There are two basic scenarios that Tahoe's what-version-should-I-use code must contend with:

  1: running in a darcs checkout
  2: running from a source tarball, without _darcs/ metadata

We always use darcsver to populate src/allmydata/_version.py before generating a tarball, so scenario #2 should always find a _version.py file with the pre-computed version string.

darcsver has code to handle the situation where it cannot use darcs to compute a version string (either "darcs" is unrunnable or _darcs/ is missing) and *also* has an existing _version.py file. In this case, it reads _version.py, greps the version string out of it, and saves the result in distribution.metadata.version. This is the bit that's relevant to Zooko's main question: it doesn't attempt to "import allmydata._version", but instead does an out-of-band open/read/grep.

In setup.py and other build-time code, I'm generally opposed to using "import XYZ; XYZ.__version__" to determine somebody's version string, specifically because you can't unimport anything: once the build process has imported a module, that copy stays loaded for the rest of the run.
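The darcsver fallback described above can be sketched roughly as follows. This is not darcsver's actual code: `compute_version`, `grep_version_file` and `write_version_file` are hypothetical names, `run_darcs` is a hypothetical stand-in for whatever darcs query darcsver really performs, and the `verstr = "..."` line format of _version.py is an assumption. The point it illustrates is the out-of-band open/read/grep: the cached file is searched as text and never imported.

```python
import re
import subprocess
from pathlib import Path
from typing import Callable, Optional

# ASSUMPTION: the generated _version.py contains a line of the form
#     verstr = "1.5.0-r4040"
# Since we generate the file ourselves, we control this format.
VERSTR_LINE = re.compile(r'^verstr\s*=\s*"([^"]+)"', re.MULTILINE)


def grep_version_file(version_py: Path) -> Optional[str]:
    """Read _version.py as plain text and extract the version string.

    The file is opened and searched, never imported, so nothing is left
    behind in sys.modules for the rest of the build.
    """
    try:
        text = version_py.read_text(encoding="utf-8")
    except OSError:
        return None
    match = VERSTR_LINE.search(text)
    return match.group(1) if match else None


def write_version_file(version_py: Path, verstr: str) -> None:
    """Record a freshly computed version in the well-known format."""
    version_py.write_text('verstr = "%s"\n__version__ = verstr\n' % verstr,
                          encoding="utf-8")


def compute_version(tree: Path, version_py: Path,
                    run_darcs: Callable[[Path], str]) -> Optional[str]:
    """Prefer darcs history; fall back to the cached _version.py.

    The returned string is what would be stored in
    distribution.metadata.version. run_darcs is a HYPOTHETICAL stand-in for
    the real darcs query; it should raise OSError (e.g. FileNotFoundError)
    or a SubprocessError if darcs cannot be run.
    """
    if (tree / "_darcs").is_dir():  # scenario 1: a darcs checkout
        try:
            verstr = run_darcs(tree)
        except (OSError, subprocess.SubprocessError):
            pass  # darcs is unrunnable: fall through to the cached file
        else:
            write_version_file(version_py, verstr)
            return verstr
    # Scenario 2 (tarball without _darcs/), or darcs failed: grep the cache.
    return grep_version_file(version_py)
```

Because the cache is only ever read as text, a stale or broken _version.py can at worst yield a wrong string or None; it cannot leave a half-imported module behind in the build process.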
Rather than importing, I prefer to grep a version out of a file with a well-known format (we generate _version.py, so we control its format, making it safe to grep), or to run a subprocess which does "import XYZ; print XYZ.__version__" (which of course still assumes the __version__ convention).

Anyway, "setup.py --version" doesn't work because this read-_version.py-and-grep code only gets run when you invoke the "darcsver" command, and "setup.py --version" doesn't run any commands. So my proposal for Tahoe is:

  1: setup.py should always start by attempting to read src/allmydata/_version.py and grep the version out of it. If this works, pass a version= argument into the setup() call, which will populate distribution.metadata.version and make everything work (including "setup.py --version", "setup.py --fullname", and "setup.py sdist").
  2: If the "darcsver" command is run, it may regenerate _version.py and reset distribution.metadata.version to a new value.

In general, you only run "darcsver" when you want to rebuild _version.py. At all other times we should just use the most recently cached value.

Zooko modified the Tahoe setup.cfg file to forcibly invoke "darcsver" before most major commands for two reasons:

  1: to get a version string at all for commands like sdist (since without the always-read-_version.py approach proposed above, this was the only way to set distribution.metadata.version)
  2: to protect the user against confusion if they run darcsver, then pull a new patch or two (invalidating the version string), then run sdist: the tarball would be generated with the wrong version string. Basically, setup.py has no way to magically tell that the source tree has changed, so re-running darcsver all the time is the only way to make sure the version string is up-to-date.
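Step 1 of the proposal might look something like this near the top of setup.py. It is a minimal sketch, not Tahoe's actual setup.py: the helper name `cached_version_kwargs` is hypothetical, and the `verstr = "..."` line format of _version.py is assumed.

```python
import re
from pathlib import Path
from typing import Dict

# ASSUMPTION: _version.py contains a line of the form  verstr = "1.5.0"
VERSTR_LINE = re.compile(r'^verstr\s*=\s*"([^"]+)"', re.MULTILINE)


def cached_version_kwargs(base_dir: Path) -> Dict[str, str]:
    """Return {"version": ...} from the cached _version.py, or {} if unavailable.

    This runs unconditionally at the top of setup.py, so the version reaches
    setup() as a keyword argument and distribution.metadata.version is set
    before any command (or a bare option such as --version) is processed.
    """
    version_py = base_dir / "src" / "allmydata" / "_version.py"
    try:
        text = version_py.read_text(encoding="utf-8")
    except OSError:
        return {}  # no cache yet: setup() simply gets no version= argument
    match = VERSTR_LINE.search(text)
    return {"version": match.group(1)} if match else {}
```

In setup.py this would be used along the lines of `setup(name=..., **cached_version_kwargs(Path(__file__).resolve().parent))`. Step 2 is unchanged: when the darcsver command does run, it may rewrite _version.py and overwrite distribution.metadata.version with the new value.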
My personal preference would be to leave Tahoe's setup.cfg empty and instruct people to run "darcsver" before doing anything else, or to make setup.py test for the existence of _version.py and invoke darcsver if it is missing, because darcsver is rather disk-intensive and takes up to 10 seconds to run on my slow (FileVault-encrypted) OS X partition. But I appreciate Zooko's concern #2, and have been personally bitten by the out-of-date-version-string problem in the past. So tolerating slower setup.py commands is a reasonable tradeoff to make.

It may still be an interesting issue for setuptools how best to modularize this proposed "read _version.py and run darcsver if necessary" functionality. It seems to me that the setuptools plugin mechanism is a good way to provide new commands (like "darcsver"), but not a good way to persistently modify existing behavior, and that the latter will always require not-very-modular customizations to a package's setup.py.

cheers,
 -Brian