how do I add some functionality to my setup.py which is testable and modular?
Folks:
I have a recurring problem with distutils/setuptools/Distribute, which is that I don't know how to extend the functionality of setup.py and make the new functionality be testable and modular.
Here's one specific example that I currently have a lot of experience with: I'd like to generate a version number from revision control history. Don't get hung up on the desired functionality though -- if you think that generating version numbers is best done a different way, or if you don't care about generating version numbers, then please just mentally insert some other extension of your build functionality that you do care about.
I'm familiar with three different ways to implement an extension like this.
The first is what Nevow does [1], which is to write in setup.py "from nevow import __version__". Well, this works in the case that setup.py is being executed as the command-line script in a fresh, empty Python interpreter, but it fails in the case that the Python interpreter has already been running for a while, has already imported nevow (a *different* nevow from a different location on the local filesystem), and is now importing *this* nevow's setup module in order to build this nevow. This happens with setuptools and py2exe. Distribute v0.6 includes a patch [2] to fix this, but I'm not sure the patch is right (it involves 'for m in various_modules: del sys.modules[m]' and it doesn't seem to fix all cases). PJE says "Thou shalt not import yourself when trying to build yourself" [3]. Glyph says "Then how do I test this code?" [4]. I say "Ugh, I don't know. Let's look at the other two ways to do it."
The second is what I do in some of my packages such as pycryptopp [5]. Just take that functionality that you want to add to all of your packages and cut and paste the code into each of your setup.py's, then edit it a little to reflect the correct name of the current package. This sucks because you're cutting and pasting code, because your setup.py gets bigger and hairier the more functionality your build system has, and because, again, you can't test it.
The third is what I do in Tahoe-LAFS [6]. I moved the functionality in question into a separate Python package, in this case named "darcsver" [7], and used setuptools's plugin system to add a command named "darcsver" which initializes the distribution.metadata.version attribute correctly. Then I had to add a bunch of aliases to my setup.cfg [8] saying "If you're going to build, first darcsver, and if you're going to install, first darcsver, and ...". This sort of works, except that yesterday my programming partner Brian Warner informed me [9] that he expected the "python ./setup.py --version" command-line to output the version number. Argh! There is no way to configure in my setup.cfg "If you're going to --version, first darcsver.".
So it appears to me that none of these techniques are both modular/testable and compatible with distutils/setuptools/Distribute. What are we to do?
Regards,
Zooko
[1] http://www.divmod.org/trac/browser/trunk/Nevow/setup.py?rev=17531
[2] http://bugs.python.org/setuptools/issue20
[3] http://bugs.python.org/setuptools/msg139
[4] http://www.divmod.org/trac/ticket/2699#comment:20
[5] http://allmydata.org/trac/pycryptopp/browser/setup.py?rev=661#L181
[6] http://allmydata.org/trac/tahoe/browser/setup.py?rev=4036#L97
[7] http://allmydata.org/trac/darcsver
[8] http://allmydata.org/trac/tahoe/browser/setup.cfg?rev=3996#L46
[9] http://allmydata.org/trac/darcsver/ticket/6
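The failure described for the first approach comes from Python's module cache: once a name is in sys.modules, a later import returns the cached module, whatever sys.path points at by then. The following is only a minimal sketch of that effect. The package name `demo_pkg_zz` and the helpers `make_copy` and `import_version` are hypothetical, and two throwaway directories stand in for the "installed" nevow and the nevow being built.

```python
import importlib
import os
import sys
import tempfile

PKG = "demo_pkg_zz"  # hypothetical package name, stands in for "nevow"


def make_copy(root, version):
    """Create root/<PKG>/__init__.py declaring the given __version__."""
    pkg_dir = os.path.join(root, PKG)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("__version__ = %r\n" % version)
    return root


def import_version(search_dir):
    """Put search_dir first on sys.path, import PKG, return its __version__."""
    sys.path.insert(0, search_dir)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(PKG).__version__
    finally:
        sys.path.remove(search_dir)


installed = make_copy(tempfile.mkdtemp(), "0.9")  # copy imported earlier
checkout = make_copy(tempfile.mkdtemp(), "1.0")   # copy we are trying to build

first = import_version(installed)
second = import_version(checkout)  # served from sys.modules, not from checkout
```

Both `first` and `second` come from the copy that was imported first, which is the wrong answer for a setup.py that wants the version of the tree it is building.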
At 02:16 PM 8/16/2009 -0600, Zooko Wilcox-O'Hearn wrote:
So it appears to me that none of these techniques are both modular/testable and compatible with distutils/setuptools/Distribute. What are we to do?
We could be modular if there was a way to specify pre-setup.py dependencies. Unfortunately, there isn't such a thing at the moment, short of calling setup() twice, and forcing the first call to have no script arguments, just a setup_requires argument. Of course, that'd only work if setuptools were present, and it would also force an immediate download of the build dependencies in question. Something like:

    try:
        from setuptools import Distribution
    except ImportError:
        pass
    else:
        Distribution(dict(setup_requires=[...]))

If you want to get fancy, you could replace the "pass" with printing some user-readable instructions after attempting to see if your build-time dependencies are already present.
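The "fancy" variant might look roughly like the sketch below. It only checks whether each build-time dependency is importable and builds a message for the user. The helper names `missing_build_deps` and `instructions` are hypothetical, and the check is written for top-level module names only.

```python
import importlib.util


def missing_build_deps(module_names):
    """Return the top-level names from module_names that cannot be imported."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


def instructions(missing):
    """Build a user-readable message, or '' when nothing is missing."""
    if not missing:
        return ""
    return ("This package needs the following modules at build time: %s. "
            "Install them, then re-run setup.py." % ", ".join(missing))
```

In the `except ImportError:` branch, setup.py could then print `instructions(missing_build_deps(["darcsver"]))`, with "darcsver" used here purely as an example name. Because both helpers are plain functions, they can be tested without running setup() at all.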
On Sun, Aug 16, 2009 at 8:17 PM, P.J. Eby
At 02:16 PM 8/16/2009 -0600, Zooko Wilcox-O'Hearn wrote:
So it appears to me that none of these techniques are both modular/testable and compatible with distutils/setuptools/Distribute. What are we to do?
We could be modular if there was a way to specify pre-setup.py dependencies.
For a lot (although admittedly, not all) of the code in question, the only dependency is on some code that lives in the package itself which explicitly avoids depending on anything besides distutils. That's not to say pre-setup.py dependencies wouldn't be useful, but if we could formalize making that case work (as it would if we could depend on the simplistic environment that a distutils-only 'setup.py install' has), it would go a long way towards fixing the larger problem.
Unfortunately, there isn't such a thing at the moment, short of calling setup() twice, and forcing the first call to have no script arguments, just a setup_requires argument.
So, "modular" is a slippery word. Let me try to be a little more specific about what I personally want; Zooko can elaborate, and I'm flexible on some of it, but best to start with an ideal.
I have a development environment where sys.path is set up to point at the source code for a set of working branches. For the purposes of this discussion let's say I've got "Nevow", which contains "nevow/__init__.py", "Twisted", which contains "twisted/__init__.py", and "Tahoe", which... well, actually it contains "src/allmydata/__init__.py" but happily my setup can deal with that. My sys.path has ["Twisted", "Nevow", "Tahoe/src"] on the end of it. My $PATH (or %PATH%, as the case may be) has Twisted/bin, Nevow/bin, Tahoe/bin. I hope this convention is clear.
Now, here's the important point. I want to run 'trial twisted', which is to say, "~/.../Twisted/bin/trial twisted", and have it load the code from my pre-existing "Twisted/" sys.path entry. I want to load and examine the distribution metadata, which in the current context means running most of what usually goes in setup.py. I also want to be able to run *parts* of the distribution process, to unit-test them, without actually invoking the entire thing. There are lots of reasons to want this:
 1. It's much faster to skip installation, especially if you're rapidly iterating over changes to a small piece of the distribution setup process.
 2. It encourages splitting the distribution process up into smaller pieces ("modularizing" it) so that it can be re-used by other parts of the same project.
 3. It allows for independent testing of those same pieces so that when they are re-used, there is some existing expectation that they will behave as expected that isn't specific to installation of a particular package.
 4. By including it in the package, you allow dependencies of that package to use the packaging functionality as well, so that custom distribution stuff is done consistently across all parts of an ecosystem.
As some of Zooko's links suggest, the way I would *prefer* to do that is for the distribution metadata to live in a module in 'twisted/', which can be imported by setup.py as a normal python module, and to have setup.py itself look like

    from distutils.core import setup
    from twisted.python.distribution import metadata
    setup(**metadata)

or even better:

    from twisted.python.distribution import autosetup
    autosetup()

The buildbot, as it happens, has a similar setup. There are specific buildslaves that do a full system installation rather than just an 'svn up' before running the tests, to do whole-system integration testing for the installation procedure, but that process is much slower and more disk-intensive, it increases wear and tear on the testing machines, and it takes longer to provide feedback to developers who are sitting idle, so we don't want to have it set up that way everywhere.
Of course, that'd only work if setuptools were present, and it would also force an immediate download of the build dependencies in question. Something like:

    try:
        from setuptools import Distribution
    except ImportError:
        pass
    else:
        Distribution(dict(setup_requires=[...]))

What goes in the "..." is pretty important. For one thing, I don't quite understand the implications of this approach. For another, I really don't want to depend on setuptools, because we certainly need to keep supporting non-setuptools environments.
If you want to get fancy, you could replace the "pass" with printing some user-readable instructions after attempting to see if your build-time dependencies are already present.
This strikes me as very non-modular. If such a message is interesting or important, presumably it needs to be localized, displayed by installers, etc, and therefore belongs in a module somewhere. Even if that module needs to be bundled along with your application in order to make it work :). Thanks for reading :).
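To make the "metadata lives in an importable module" idea above a little more concrete, here is a minimal sketch. The function name `build_metadata` and its arguments are hypothetical and are not Twisted's actual API. The point is only that a plain function returning a dict can be unit-tested without invoking setup().

```python
def build_metadata(name, version, packages, scripts=()):
    """Return keyword arguments suitable for setup(**metadata)."""
    if not version:
        raise ValueError("a version string is required")
    metadata = {
        "name": name,
        "version": version,
        "packages": sorted(packages),
    }
    # Only include "scripts" when there is something to install.
    if scripts:
        metadata["scripts"] = list(scripts)
    return metadata
```

A setup.py built this way would reduce to `setup(**build_metadata(...))`. It still imports code from the package being built, which is exactly the point under dispute in this thread, so the sketch illustrates the testability argument and does not settle the import question.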
On Sun, 16 Aug 2009 14:16:29 -0600
Zooko Wilcox-O'Hearn
Folks:
I have a recurring problem with distutils/setuptools/Distribute, which is that I don't know how to extend the functionality of setup.py and make the new functionality be testable and modular. ... Here's one specific example that I currently have a lot of experience with: I'd like to generate a version number from revision control history. ...
The third is what I do in Tahoe-LAFS [6]. I moved the functionality in question into a separate Python package, in this case named "darcsver" [7], and used setuptools's plugin system to add a command named "darcsver" which initializes the distribution.metadata.version attribute correctly. Then I had to add a bunch of aliases to my setup.cfg [8] saying "If you're going to build, first darcsver, and if you're going to install, first darcsver, and ...". This sort of works, except that yesterday my programming partner Brian Warner informed me [9] that he expected the "python ./setup.py --version" command-line to output the version number. Argh! There is no way to configure in my setup.cfg "If you're going to --version, first darcsver.".
I think I have a plan for addressing this in the specific case of Tahoe. The following may be irrelevant to the extending-setuptools discussion, but it also might be informative.
The job of "setup.py darcsver" (as used by Tahoe) is to perform a relatively disk-intensive operation to compute the current source tree's version string and then record it in two places:
 src/allmydata/_version.py : for use by tahoe itself, so it knows its own version at runtime, i.e. so that running a "tahoe --version" command works
 distribution.metadata.version : for use by setuptools, so commands like "setup.py sdist" can name the tarball correctly
"distribution.metadata.version" is usually set by passing a version= argument to the main setup() call inside setup.py. Packages can use whatever mechanism they like to decide what to pass to version= . Tahoe currently never passes a version= to setup().
There are two basic scenarios that Tahoe's what-version-should-I-use code must contend with:
 1: running in a darcs checkout
 2: running from a source tarball, without _darcs/ metadata
We always use darcsver to populate src/allmydata/_version.py before generating a tarball, so the #2 scenario should always find a _version.py file with the pre-computed version string.
darcsver has code to handle the situation where it cannot use darcs to compute a version string (either "darcs" is unrunnable or _darcs/ is missing), and *also* has an existing _version.py file. In this case, it reads _version.py and greps the version string out of it, then saves the result in distribution.metadata.version . This is the bit that's relevant to Zooko's main question: it doesn't attempt to "import allmydata._version", but instead does an out-of-band open/read/grep.
In setup.py and other build-time code, I'm generally opposed to using "import XYZ; XYZ.__version__" to determine somebody's version string, specifically because you can't unimport anything. Instead, I prefer to grep a version out of a file with a well-known format (we generate _version.py so we control its format, making it safe to grep), or to run a subprocess which does "import XYZ; print XYZ.__version__" (which of course still assumes the __version__ convention).
Anyways, the problem with "setup.py --version" not working is that this read-_version.py-and-grep code only gets run when you invoke the "darcsver" command, and "setup.py --version" doesn't run any commands. So my proposal for Tahoe is:
 1: setup.py should always start by attempting to read src/allmydata/_version.py and grep the version out of it. If this works, pass a version= argument into the setup() call, which will populate distribution.metadata.version and make everything work (including "setup.py --version", "setup.py --fullname", and "setup.py sdist")
 2: if the "darcsver" command is run, that will possibly regenerate _version.py and reset distribution.metadata.version to a new value
In general, you only run "darcsver" when you want to rebuild _version.py . At all other times we should instead just use the most recently cached value.
Zooko modified the tahoe setup.cfg file to forcibly invoke "darcsver" before most major commands for two reasons:
 1: to get a version string at all for commands like sdist (since without the always-read-_version.py approach proposed above, this was the only way to set distribution.metadata.version)
 2: to protect the user against confusion if they run darcsver, then pull a new patch or two (invalidating the version string), then run sdist: the tarball would be generated with the wrong version string. Basically setup.py has no way to magically tell that the source tree has been changed, so re-running darcsver all the time is the only way to make sure the version string is up-to-date.
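Step 1 of the proposal (read _version.py as text, never import it, and pass version= to setup() when a value is found) could be sketched as follows. The helper names `read_cached_version` and `setup_kwargs` are hypothetical. The pattern assumes the generated file contains a line such as `verstr = "1.5.0"`, which is an assumption about the file format and not something stated in this thread.

```python
import re

# Assumed format of the generated file: a line such as  verstr = "1.5.0"
# (the variable name is an assumption; adjust the pattern to the real file).
VERSION_RE = re.compile(r"""^verstr\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def read_cached_version(path):
    """Return the version string stored in path, or None if unavailable.

    The file is read as plain text and is never imported.
    """
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        return None
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def setup_kwargs(version_file, **kwargs):
    """Add version= to the setup() keyword arguments when one is cached."""
    version = read_cached_version(version_file)
    if version is not None:
        kwargs["version"] = version
    return kwargs
```

setup.py would then call something like `setup(**setup_kwargs("src/allmydata/_version.py", name=...))`, so that "setup.py --version" works whenever a cached value exists, without any command having run first.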
My personal preference would be to leave tahoe.cfg empty and instruct people to run "darcsver" before doing anything else, or to make setup.py test for the existence of _version.py and invoke darcsver if it is missing, because darcsver is rather disk-intensive and takes up to 10 seconds to run on my slow (FileVault encrypted) OS-X partition. But I appreciate Zooko's concern #2, and have been personally bitten by the out-of-date-version-string problem in the past. So tolerating slower setup.py commands is a reasonable tradeoff to make.
It may still be an interesting-to-setuptools issue of how to best modularize this proposed "read _version.py and run darcsver if necessary" functionality. It seems to me that the setuptools plugin mechanism is a good way to provide new commands (like "darcsver"), but not a good way to persistently modify existing behavior, and that the latter will always require not-very-modular customizations to a package's setup.py .
cheers,
 -Brian
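The "test for the existence of _version.py and invoke darcsver if it is missing" option can be kept testable by passing the expensive step in as a callable. This is only a sketch: `ensure_version_file` is a hypothetical helper, and `regenerate` stands in for whatever wraps the darcsver logic.

```python
import os


def ensure_version_file(path, regenerate):
    """Call regenerate(path) only when path does not exist yet.

    regenerate is any callable that writes the file; here it is a
    hypothetical stand-in for the darcsver step. Returns True if it ran.
    """
    if os.path.exists(path):
        return False
    regenerate(path)
    if not os.path.exists(path):
        raise RuntimeError("regenerate() did not create %s" % path)
    return True
```

A unit test can pass a fake `regenerate` and assert that it runs exactly once. Note that this covers only the missing-file case; it does nothing about a stale version string after new patches are pulled, which is the second concern described above.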
Dear Tarek Ziadé and fellowship of the packaging:
Please pay attention to this issue. If we can extend our build processes in a modular, re-usable, and testable way then this could contribute to the success of distutils/setuptools/Distribute.
In particular, I want Nevow to be automatically installable as a dependency (using, for now, setuptools, but presumably the same will apply to Distribute), and one of the issues is still outstanding and Glyph refuses to accept my patch which fixes it because the fix isn't testable.
http://divmod.org/trac/ticket/2699
I want Distribute-compatible build extensions to be accepted by Glyph. This requires that they be testable. Please help me understand how to write build extensions which are testable.
Regards,
Zooko
On Sunday, 2009-08-16, at 14:16, Zooko Wilcox-O'Hearn wrote:
Folks:
I have a recurring problem with distutils/setuptools/Distribute, which is that I don't know how to extend the functionality of setup.py and make the new functionality be testable and modular.
Here's one specific example that I currently have a lot of experience with: I'd like to generate a version number from revision control history. Don't get hung up on the desired functionality though -- if you think that generating version numbers is best done a different way, or if you don't care about generating version numbers, then please just mentally insert some other extension of your build functionality that you do care about.
I'm familiar with three different ways to implement an extension like this.
The first is what Nevow does [1], which is to write in setup.py "from nevow import __version__". Well, this works in the case that setup.py is being executed as the command-line script in a fresh, empty Python interpreter, but it fails in the case that the Python interpreter has already been running for a while, has already imported nevow (a *different* nevow from a different location on the local filesystem), and is now importing *this* nevow's setup module in order to build this nevow. This happens with setuptools and py2exe. Distribute v0.6 includes a patch [2] to fix this, but I'm not sure the patch is right (it involves 'for m in various_modules: del sys.modules[m]' and it doesn't seem to fix all cases). PJE says "Thou shalt not import yourself when trying to build yourself" [3]. Glyph says "Then how do I test this code?" [4]. I say "Ugh, I don't know. Let's look at the other two ways to do it."
The second is what I do in some of my packages such as pycryptopp [5]. Just take that functionality that you want to add to all of your packages and cut and paste the code into each of your setup.py's, then edit it a little to reflect the correct name of the current package. This sucks because you're cutting and pasting code, because your setup.py gets bigger and hairier the more functionality your build system has, and because, again, you can't test it.
The third is what I do in Tahoe-LAFS [6]. I moved the functionality in question into a separate Python package, in this case named "darcsver" [7], and used setuptools's plugin system to add a command named "darcsver" which initializes the distribution.metadata.version attribute correctly. Then I had to add a bunch of aliases to my setup.cfg [8] saying "If you're going to build, first darcsver, and if you're going to install, first darcsver, and ...". This sort of works, except that yesterday my programming partner Brian Warner informed me [9] that he expected the "python ./setup.py --version" command-line to output the version number. Argh! There is no way to configure in my setup.cfg "If you're going to --version, first darcsver.".
So it appears to me that none of these techniques are both modular/testable and compatible with distutils/setuptools/Distribute. What are we to do?
Regards,
Zooko
[1] http://www.divmod.org/trac/browser/trunk/Nevow/setup.py?rev=17531
[2] http://bugs.python.org/setuptools/issue20
[3] http://bugs.python.org/setuptools/msg139
[4] http://www.divmod.org/trac/ticket/2699#comment:20
[5] http://allmydata.org/trac/pycryptopp/browser/setup.py?rev=661#L181
[6] http://allmydata.org/trac/tahoe/browser/setup.py?rev=4036#L97
[7] http://allmydata.org/trac/darcsver
[8] http://allmydata.org/trac/tahoe/browser/setup.cfg?rev=3996#L46
[9] http://allmydata.org/trac/darcsver/ticket/6
At 05:20 PM 8/18/2009 -0600, Zooko Wilcox-O'Hearn wrote:
Dear Tarek Ziadé and fellowship of the packaging:
Please pay attention to this issue.
Did you read the email where I explained how you could solve the problem today? http://mail.python.org/pipermail/distutils-sig/2009-August/012974.html (Specifically, the problem of fetching dependencies at the start of setup.py, so you can use modular extensions to the distutils or setuptools.)
On Tue, Aug 18, 2009 at 7:20 PM, Zooko Wilcox-O'Hearn
... and Glyph refuses to accept my patch which fixes it because the fix isn't testable ...
Hi, Zooko :). Since this message officially makes me That Jerk Who Is Forcing The Issue, I've finally joined distutils-sig. Thank you for writing up the whole thing with such a detailed bibliography. I'll reply to PJE's message shortly. (For what it's worth, I wouldn't quite characterize it like that. More like "Twisted and Nevow refuse to accept these patches because their policy requires testability." I didn't make a personal decision to exclude these patches; the contributors have achieved a consensus about testability. The fact that we all arrived at that consensus because I said so in the first place is irrelevant ;-)).
Glyph Lefkowitz
On Tue, Aug 18, 2009 at 7:20 PM, Zooko Wilcox-O'Hearn wrote:
... and Glyph refuses to accept my patch which fixes it because the fix isn't testable ...
Since this message officially makes me That Jerk Who Is Forcing The Issue,
I wouldn't characterise it like that :-)
I've finally joined distutils-sig.
Welcome!
(For what it's worth, I wouldn't quite characterize it like that. More like "Twisted and Nevow refuse to accept these patches because their policy requires testability."
I see no real difference between those two; and I find both describe an equally commendable action.
--
“I never forget a face, but in your case I'll be glad to make an exception.” -- Groucho Marx
Ben Finney
participants (5): Ben Finney, Brian Warner, Glyph Lefkowitz, P.J. Eby, Zooko Wilcox-O'Hearn