RE: [Distutils] Re: CPAN functionality for python - requirements

From: Sean Reifschneider [mailto:jafo@tummy.com]
On Tue, Feb 27, 2001 at 09:30:13AM -0700, Evelyn Mitchell wrote:
But it will also discover and resolve dependences in your perl site-packages, and automatically fetch them from your closest CPAN archive.
Not according to my tests the night before last. I did a test CPAN install of "News::Newsrc", which failed because the "make test" was failing. I then installed the "Set::BitSet" (? Something like that) module and tried News::Newsrc again and it worked...
Maybe this was just a fluke and News::Newsrc is the exception and/or isn't used enough that people have gotten the prereqs right yet. If anyone knows for sure, I'm curious.
There are basically a number of aspects to "CPAN", which need separating out.

MakeMaker
---------
This is a Perl module which implements a build process. You write a Makefile.PL, which calls MakeMaker defining the module's structure and metadata. Thanks to MakeMaker, the process for building a Perl module is (nearly always) simply

    perl Makefile.PL
    make
    make test     <-- optional, but pretty much standard - runs module unit tests
    make install  <-- installs the module

This is, in both concept and functionality, almost identical to Distutils. There are some areas in which Distutils is better (building of platform-specific installers, for instance) and some where MakeMaker is better (it hooks into a very standard structure for modules, generated by the h2xs program, which practically forces module authors to write test suites and documentation in a standard format), but these are details. We can consider this covered. (Although the distutils-sig could still usefully learn from MakeMaker.)

The system of FTP sites and mirrors
-----------------------------------
Frankly, this is nothing special. It has some nice features (automated uploads for module authors, plus quite nice indexing and server multiplexing features), but it isn't rocket science as far as I know. We could quite happily start with a simple FTP site for this (that's what CPAN did - the mirroring came later as popularity grew).

CPAN.pm
-------
This is a Perl module which automates the process of downloading and installing modules. I don't use this personally, for a number of reasons. Frankly, I find that manually downloading and running the four lines noted above is fine. For things like dependency tracking, it relies on metadata added into the Makefile.PL which is not necessary for the straight four-line build above. As such, the level to which that metadata is added by module authors is variable (to be polite).
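For comparison, the Distutils counterpart of a Makefile.PL is a setup.py. A minimal sketch (the "spam" module name and all metadata values below are placeholders, not from any real package):

```python
# Minimal Distutils setup.py -- the rough counterpart of a Makefile.PL.
# "spam" and the metadata values are placeholders for illustration only.
from distutils.core import setup

setup(name="spam",
      version="0.1",
      description="Example module",
      author="A. N. Author",
      py_modules=["spam"])
```

With this in place, "python setup.py build" and "python setup.py install" play the roles of "perl Makefile.PL; make" and "make install"; Distutils has no built-in equivalent of "make test".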
In practice, I wouldn't rely on it - accurate dependency data seems to be the exception rather than the rule. I *don't* regard CPAN.pm as important to the overall CPAN "phenomenon". But some people like it. Writing something "at least as good as" CPAN.pm shouldn't be hard in Python - not least because the standard library is rich enough that things like FTP client support are available out of the box (whereas CPAN.pm won't work until you manually install libnet, and possibly some other modules, I forget which...) But writing a "perfect" utility for automated download-and-install, with dependency tracking, etc. etc., is going to be VERY HARD. Don't get misled - Perl doesn't have such a beast. And we won't even have what Perl has if we focus on perfection rather than practicality.

The h2xs program
----------------
This is VERY important. The idea is that when you write a Perl module, either pure Perl or a C (XS) extension, you run h2xs first, to generate a template build directory. It automatically includes:

* The Perl module, with some basic template code and embedded POD documentation
* The XS extension, with template code (if requested)
* A Makefile.PL shell
* A basic test script - all it does is test that the module loads, but it includes a placeholder for your own tests

Essentially, h2xs forces a standard structure on all Perl modules. This is important for Perl, where modules have to conform to some standards in order to work at all. However, it brings HUGE benefits in standardisation of all the "other" parts of the process (documentation, tests, etc). Python is at a disadvantage here, precisely because writing a Python module involves so little in the way of a specific structure. So people will likely rebel against having a structure "imposed"...
A social structure
------------------
This is a bit of a chicken-and-egg issue, but Perl developers expect to write modules using h2xs and MakeMaker, they expect to write tests (even if they are minimal), they expect to fill in the sections in the POD documentation, and they expect to submit their modules to CPAN. So this all "just works". Interestingly, developers probably don't "expect" to have to include dependency information, and hence many don't - resulting in the problems you hit. But then again, Perl users don't "expect" to be totally shielded from dependency issues.

Python is VERY far behind here. This is a maturity issue - distutils is still (relatively) new, and so there are LOTS of packages which don't come with a setup.py yet. Often, adding one isn't hard, but it has yet to happen. And when you are distributing a pure Python module, as a single .py file, it's hard to see the benefit of changing that into a .tar.gz file containing the module plus a setup.py. (Once you start adding test suites and documentation, the point of the whole archive bit is clearer, but we're not even close to that stage yet.)

Things are looking better with Python 2.1, though. It looks like 2.1 will include a standard unit testing module, and a new "pydoc" package which will extract documentation from module docstrings. So the infrastructure will be there; it just becomes a case of getting developers to use it consistently. Distutils can help with this, by "just working" if people follow a standard structure.

Sorry, this wasn't meant to get so long...

Paul.
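The standard unit testing module mentioned above (unittest, derived from PyUnit) gives every module's tests the same shape, much as h2xs does for Perl. A minimal sketch (the test case itself is a trivial placeholder, like an h2xs "does it load" test):

```python
# Minimal example of the standard unit testing module ("unittest",
# derived from PyUnit) that shipped with Python 2.1.
import unittest

class SpamTest(unittest.TestCase):
    def test_upper(self):
        # A trivial placeholder assertion, analogous to the
        # "does the module load at all" test h2xs generates.
        self.assertEqual("spam".upper(), "SPAM")

result = unittest.TextTestRunner().run(
    unittest.TestLoader().loadTestsFromTestCase(SpamTest))
```

Because every package's tests subclass TestCase the same way, a tool (Distutils, say) can discover and run them without per-package knowledge.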

But writing a "perfect" utility for automated download-and-install, with dependency tracking, etc. etc., is going to be VERY HARD. Don't get misled - Perl doesn't have such a beast. And we won't even have what Perl has if we focus on perfection rather than practicality.
Agreed. There is an old saying about shooting for the stars to reach the moon. Some of the ideas and proposals for all this seem to be shooting for distant galaxies and are unable even to get off the ground. IMHO we should start with something simple that allows for some of the ultimate goals, and that can be built upon later to reach more of them. Here is what I think the simple version needs:

1. A standard place from which a file containing a list of mirrors, and maybe a bit of mirror meta-data, can be fetched.

2. A Python module to parse that mirror data into a meaningful data structure such as a dictionary or a list of instance objects of some type.

3. At every site in the mirrors file, a file containing meta-data about all the packages in the archive.

4. A Python module to parse the package meta-data into something the client code can easily use. (The meta-data format is probably XML, but hiding it like this means the tool developer doesn't need to know or care what format the file is in. Ditto for the mirror list.)

5. An automated way to extract at least most, if not all, of the package meta-data from a Distutils-based package, and to upload the meta-data, the sdist, and any bdists the developer has made to the archive. This could be a new command added to Distutils, with a bit of functionality on the server side for receiving the files, moving them into the "right place" in the archive, and updating the package meta-data file.

6. A file format, and supporting Python code, for tracking which packages and versions have been fetched from the network and installed. Again, this should probably be a function of Distutils so it will also catch the cases where you downloaded some package not in the archive network and ran "python setup.py install" yourself.

That's it. With just that bit in place, intelligent clients could be written as command-line or GUI tools, or even a web-based interface.
They simply fetch the list of mirrors, fetch the package meta-data from a desired mirror, and cache this info locally. Then the client tools can do things without having to talk to a server: list available packages, query dependencies, list packages you have installed, etc. When you want to fetch and install a package, the client tool can use the package meta-data and urllib to fetch the sdist and/or a desired bdist, and then optionally install it, either using the package's setup.py or with whatever command is necessary for the bdist.

--
Robin Dunn
Software Craftsman
robin@AllDunn.com
Java give you jitters? http://wxPython.org Relax with wxPython!
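Point 2 of the plan above is only a few lines of Python. Assuming a made-up mirror-file format of one "url region" pair per line (no actual format was agreed in this thread), a parser might look like:

```python
# Sketch of point 2 above: parse mirror data into a list of dictionaries.
# The one-"url region"-pair-per-line format is invented for illustration;
# no real format was decided on this thread.
def parse_mirrors(text):
    """Parse a mirror-list file's contents into a list of dicts."""
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        url, region = line.split(None, 1)
        mirrors.append({"url": url, "region": region})
    return mirrors
```

Because client tools only ever see the returned data structure, the on-disk format (XML or otherwise) could change without touching any client code, which is exactly the hiding point 4 argues for.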

On Wed, Feb 28, 2001 at 09:59:38AM -0000, Moore, Paul wrote:
MakeMaker ---------
Handled by distutils, as you mention.
The system of FTP sites and mirrors -----------------------------------
Depends on whether you want a single central location, or the ability for packages to reside on a dispersed set of machines... My setup can handle either.
CPAN.pm -------
This is what I'm working on, though its interface to the catalog is fairly stand-alone instead of being dumped in a bunch of FTP directories (which, as I understand it, is the way CPAN works).
enough that things like FTP client support is available out of the box (whereas CPAN.pm won't work until you manually install libnet, and possibly
Well, yes and no... Unfortunately, using urlretrieve on an ftp:// URL seems to be broken at best. In my prototyping work last night, I ended up just calling "lftp" via os.system() -- not optimal, but it was at least working. I need to dig into urllib some more, but at the least it seemed to need to run in passive mode and to have better handling of malformed URLs (ftp://ftp.tummy.com//foo/bar/ was causing problems because of the doubled "/" created by appending a base and mirror URL). There also seemed to be a problem with closing the session properly when talking to an anonfile server...
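The doubled-"/" problem comes from naive string concatenation of a base URL and a path; normalizing the path portion before handing the URL to urllib avoids it. A small sketch (`join_url` is a made-up helper, not part of urllib; the URL-splitting functions here are from the modern `urllib.parse`):

```python
# Collapse doubled slashes in the path part of a URL before fetching it,
# avoiding the ftp://host//foo/bar problem caused by naively appending a
# base URL and a path. join_url is a hypothetical helper, not stdlib.
import re
from urllib.parse import urlsplit, urlunsplit

def join_url(base, path):
    """Join a base URL and a path, collapsing duplicate slashes."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    parts = urlsplit(url)
    clean_path = re.sub("//+", "/", parts.path)  # only touch the path part
    return urlunsplit((parts.scheme, parts.netloc, clean_path,
                       parts.query, parts.fragment))
```

Note that only the path component is rewritten, so the "//" that legitimately follows the scheme (ftp://, http://) is left alone.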
But writing a "perfect" utility for automated download-and-install, with dependency tracking, etc etc, is going to be VERY HARD. Don't get misled -
Can't be that hard. I'm still working on the dependencies, but I've been able to get the rest of it working in my copious spare time before going to bed over the last two nights...
The h2xs program ----------------
That seems like a distutils tool to me.
A social structure ------------------
Based on my experimenting with Distutils last night, I don't know that this will be a problem. Distutils is totally cool -- I can't imagine going back.
Interestingly, developers probably don't "expect" to have to include dependency information, and hence many don't - resulting in the problems you
I think we can deal with that in an iterative manner. First get them to build distutils packages, then when it fails to install on some user's machine because they don't have foo.py we can educate them on the joys of listing third-party module requirements.
still (relatively) new, and so there are LOTS of packages which don't come with a setup.py yet. Often, adding one isn't hard, but it is still to
The biggest win of using distutils is that it makes it easier for the developer to run their software on multiple machines. That selfish reason is enough for me.
Things are looking better with Python 2.1, though. Included with 2.1, it looks like there will be a standard unit testing module, and a new "pydoc" package, which will extract documentation from module docstrings. So the
Should be sweet...

Sean
--
 That weapon will replace your tongue. You will learn to speak through it.
 And your poetry will now be written with blood. -- _Dead_Man_
Sean Reifschneider, Inimitably Superfluous <jafo@tummy.com>
tummy.com - Linux Consulting since 1995. Qmail, KRUD, Firewalls, Python

On Wed, 28 Feb 2001, Sean Reifschneider wrote:
On Wed, Feb 28, 2001 at 09:59:38AM -0000, Moore, Paul wrote: [...]
The h2xs program ----------------
That seems like a distutils tool to me.
A social structure ------------------
Based on my experimenting with Distutils last night, I don't know that this will be a problem. Distutils is totally cool -- I can't imagine going back.
Interestingly, developers probably don't "expect" to have to include dependency information, and hence many don't - resulting in the problems you
I think we can deal with that in an iterative manner. First get them to build distutils packages, then when it fails to install on some user's machine because they don't have foo.py we can educate them on the joys of listing third-party module requirements.
I think an iterative approach is a recipe for an archive full of packages that only very patchily take advantage of all the facilities on offer. The Perl approach of making 'stub' generation very easy for tests, docs, etc. is a very good idea -- it encourages you to actually put something more useful in the stubs (if not immediately, then later on). It's self-perpetuating: the stubs are always generated, and the standard instructions for uploading stuff presumably say you should include them, so they become widespread in the archive; people making new packages then see that everyone makes packages that way, so they'll do the same thing. I'm not suggesting everyone's about to start writing huge comprehensive manuals and test suites for everything, but a little encouragement and standardisation can go a long way, with a relatively small effort on the part of the developer. The same goes for dependencies, even more so probably.
still (relatively) new, and so there are LOTS of packages which don't come with a setup.py yet. Often, adding one isn't hard, but it is still to
The biggest win of using distutils is that it makes it easier for the developer to run their software on multiple machines. That selfish reason is enough for me.
And, as selfish users of other people's software, you can use that as a hook to get people to start making documentation and tests at the same time. Getting started is often the 'rate limiting step'.
Things are looking better with Python 2.1, though. Included with 2.1, it looks like there will be a standard unit testing module, and a new "pydoc" package, which will extract documentation from module docstrings. So the
Should be sweet... [...]
Especially if this and PyUnit, and whatever else, can all be tied together in Distutils. But *only* if there are actually some proper docs and good howto examples for Distutils itself!! What happened to the effort a while back on that? What remains to be done -- a todo list would be useful to spread the load, wouldn't it?

John

[...] But *only* if there are actually some proper docs and good howto examples for Distutils itself!! What happened to the effort a while back on that? What remains to be done -- a todo list would be useful to spread the load, wouldn't it?
I wrote some docs. They are online already in Fred Drake's development version: http://python.sourceforge.net/devel-docs/

Thomas

John Lee grumbled about a lack of documentation for distutils, and Thomas Heller replied:
I wrote some docs. They are online already in Fred Drake's development version: http://python.sourceforge.net/devel-docs/
Erm - all I can see there that sounds at all relevant are "Distributing Python Modules" and "Installing Python Modules", both credited to Greg Ward, and both in that <rant>infernally infuriating "one smidgeon of text per HTML page" format (which is worse than useless, so far as I'm concerned!)[1]</rant>. Could you give a clearer reference, please?

..[1] or am I missing some (undocumented) trick of the trade which will lead me to whole documents?

Tibs
--
Tony J Ibbs (Tibs) http://www.tibsnjoan.co.uk/
Give a pedant an inch and they'll take 25.4mm (once they've established you're talking a post-1959 inch, of course)
My views! Mine! Mine! (Unless Laser-Scan ask nicely to borrow them.)

Thomas Heller replied:
I wrote some docs. They are online already in Fred Drake's development version: http://python.sourceforge.net/devel-docs/
Tony J Ibbs wrote:
Erm - all I can see there that sound at all relevant are "Distributing Python Modules" and "Installing Python Modules", both credited to Greg Ward, and both in that <rant>infernally infuriating "one smidgeon of text per HTML page" format (which is worse than useless, so far as I'm concerned!)[1]</rant>.
Could you give a clearer reference, please?
This is the patch I posted: http://mail.python.org/pipermail/distutils-sig/2001-February/001969.html

Note that I didn't say that the docs are complete...

Thomas

Interestingly, developers probably don't "expect" to have to include dependency information, and hence many don't - resulting in the problems you
I think we can deal with that in an iterative manner. First get them to build distutils packages, then when it fails to install on some user's machine because they don't have foo.py we can educate them on the joys of listing third-party module requirements.
Has anyone written a program that looks at a .py and tries to determine the versions of Python it will work with? Maybe have a module that grabs info about the system the code is being developed on, then use some of that to generate default dependencies which the developer can modify (made less restrictive in most cases, I imagine) before submitting the package to the archive. Even if the developer doesn't modify the dependencies, the user will at least have some idea of what is required.

- Bruce
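A rough start on the dependency-guessing part of this: the standard library can already parse Python source and report which modules it imports, which could seed a default dependency list for the developer to edit. A sketch using the modern `ast` module (`find_imports` is a made-up helper name):

```python
# Sketch of an automatic dependency guesser: scan Python source for the
# top-level modules it imports. find_imports is a hypothetical helper;
# only imports of stdlib functionality that actually exists are used.
import ast

def find_imports(source):
    """Return the sorted set of top-level module names imported by source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # skip relative "from . import x" forms
                modules.add(node.module.split(".")[0])
    return sorted(modules)
```

This only catches static imports, of course; anything loaded dynamically at runtime would still need the developer to list it by hand, which matches Bruce's suggestion of generating *default* dependencies for the developer to refine.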
participants (7)

- Bruce Sass
- John J. Lee
- Moore, Paul
- Robin Dunn
- Sean Reifschneider
- Thomas Heller
- Tony J Ibbs (Tibs)