On Nov 7, 2009, at 3:20 AM, Ben Finney wrote:
Guido van Rossum writes:
On Fri, Nov 6, 2009 at 2:52 PM, David Lyon wrote:
I think buildbot-style test runs for PyPI packages would raise average package quality on PyPI.
Please excuse the cross-post, but I wanted to make sure this message reached all of the "CPAN for Python" discussions, and I've lost track of which part of the discussion happened on which list.

We are currently extending our distutils/Distribute test system to include installation of a broad range of packages as part of the pre-release process for a future release of Distribute and as part of our "smoke" test for distutils/Distribute. Eventually, the goal is to integrate this with our buildbot system, but that's a ways off. Our goal is to install a range of packages and, where practicable, actually run each package's own test suite and record any errors.

Right now, our "smoke" test only does Twisted and numpy. We've discussed how to collect test results from Twisted trial, and we'll be working on similar things for other test runners (nose et al.). For Twisted, we're going to install and test both the current release version and an svn checkout from trunk. It would be an extension of that concept to install and test *all* packages from PyPI, but it would obviously take considerable horsepower (and time) to run such an exhaustive test (especially if we're talking about 2.4?, 2.5, 2.6, 2.7, and 3.1+).

Right now I'm extending the configuration file for our smoke test to allow for various test runners (e.g. nose, Twisted trial, etc.) so we can "smoke out" more installation problems and/or failed tests after installation. For the first pass, I'm just focusing on Twisted and trial, then numpy, then finding packages that support nose, so that I can collect the data on what ran, what passed, and what didn't. I'm planning on collecting this all in a database and making a simple API so that it can be mined by very simple apps later.
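To make the plan above a little more concrete, here is a minimal sketch of what a per-package runner configuration and a results store could look like. Everything in it is an assumption for illustration: the names `RUNNERS`, `SMOKE_CONFIG`, `build_command`, `open_results_db`, `record_result`, and `pass_rates`, the table layout, and the choice of SQLite are all hypothetical and are not the actual Distribute smoke-test code or its configuration format.

```python
import sqlite3

# Hypothetical runner table: maps a runner name to a command template.
# These command lines are illustrative assumptions only.
RUNNERS = {
    "trial": ["trial", "{name}"],
    "nose": ["nosetests", "{name}"],
}

# Hypothetical smoke-test configuration: which packages to install and
# which test runner to use for each one after installation.
SMOKE_CONFIG = [
    {"package": "Twisted", "import_name": "twisted", "runner": "trial"},
    {"package": "numpy", "import_name": "numpy", "runner": "nose"},
]


def build_command(entry):
    """Return the argv list that would run the test suite for one entry."""
    template = RUNNERS[entry["runner"]]
    return [part.format(name=entry["import_name"]) for part in template]


def open_results_db(path=":memory:"):
    """Open (or create) the results database with an assumed schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS results (
               package        TEXT NOT NULL,
               python_version TEXT NOT NULL,
               runner         TEXT NOT NULL,
               tests_run      INTEGER NOT NULL,
               tests_passed   INTEGER NOT NULL,
               tests_failed   INTEGER NOT NULL
           )"""
    )
    return conn


def record_result(conn, package, python_version, runner,
                  tests_run, tests_passed, tests_failed):
    """Store what ran, what passed, and what didn't for one test run."""
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
        (package, python_version, runner,
         tests_run, tests_passed, tests_failed),
    )
    conn.commit()


def pass_rates(conn):
    """Tiny 'mining' query: fraction of tests passed, per package."""
    rows = conn.execute(
        "SELECT package, SUM(tests_passed), SUM(tests_run) "
        "FROM results GROUP BY package ORDER BY package"
    ).fetchall()
    return {pkg: (passed / run if run else 0.0) for pkg, passed, run in rows}


if __name__ == "__main__":
    conn = open_results_db()
    # Made-up sample numbers, only to exercise the API.
    record_result(conn, "Twisted", "2.6", "trial", 10, 9, 1)
    record_result(conn, "numpy", "2.6", "nose", 4, 4, 0)
    for entry in SMOKE_CONFIG:
        print(entry["package"], build_command(entry))
    print(pass_rates(conn))
```

The sketch deliberately stops short of installing anything or invoking the runners; it only shows how a runner-per-package configuration and a "what ran, what passed, what didn't" table could fit together so that simple apps can query the data later.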
At the point where that infrastructure is in place, we could pretty easily mine the data to do all kinds of crazy things people have mentioned, like:

* A ranking system of test coverage
* Complexity analysis
* Test coverage
* Run pylint, pyflakes, 2to3, whatever automated measurement tools over the code
* Send test failure messages to maintainers (maybe with opt-in in the new meta-data)
* Whatever!

We're actively working on this right now; anyone who wants to lend a hand is welcome to contact me off-list, and we can talk about what types of things we need and where we could use a hand.

All in all, I think this could be a big leap forward for the Python distribution ecosystem, whether or not we eventually write the PyPan I wished for as a new Perl refugee.

Thanks,
S