Hi all --
right up at the top of my list of things to do for the Distutils is "fix the dist command". The main problems are that the syntax for the MANIFEST file (where you specify the files to go in a source distribution) is a bit goofy, and there's poor feedback about which files will actually be included.
Part 1 of the fix is to rearrange how the "dist" command works. I've worked out a solution that appeals to me; give this a read and see if it sounds right to you.
Step 1 (optional): Module developer creates MANIFEST.in file. This is a concise-but-readable specification of the files to be included in the source distribution with wildcards, directory recursion, and exclusion. (Syntax to be worked out in a future post; it should have the same features as the current syntax, but be more readable.)
Step 2 (optional): Developer runs "dist" command with "--manifest-only" option. Distutils parses MANIFEST.in and spits out MANIFEST, containing a complete, explicit list of every file to be included in the source distribution.
If MANIFEST.in doesn't exist -- i.e. the developer skipped step 1 -- Distutils creates a MANIFEST using the "default fileset", mostly the pure Python modules and extension module source files mentioned in setup.py.
Possibly: print a warning for every file in the source tree that *won't* be included in the source distribution; or write this information to another file (with a "files were excluded: see ..." warning).
Step 3: Developer runs "dist" command. If MANIFEST doesn't exist or is out-of-date (i.e. developer skipped step 2, or edited MANIFEST.in since running step 2), it is regenerated. Distutils then creates a source distribution (tarball, zipfile, whatever) containing exactly the files listed in MANIFEST.
Possibly *this* is the step where the warning about excluded files should be generated.
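To make the "missing or out-of-date" test in step 3 concrete, here is a minimal sketch, not actual Distutils code. The function name manifest_is_stale and its "force" parameter are hypothetical, and it assumes that comparing file modification times is good enough to tell whether MANIFEST.in was edited after MANIFEST was written.

```python
import os


def manifest_is_stale(manifest="MANIFEST", template="MANIFEST.in", force=False):
    """Return True if MANIFEST should be regenerated (hypothetical helper).

    Assumption: modification times are a good enough signal that the
    template was edited after the manifest was last generated.
    """
    # A forced rebuild or a missing MANIFEST always means "regenerate".
    if force or not os.path.exists(manifest):
        return True
    # No template (step 1 was skipped): nothing for MANIFEST to lag behind.
    if not os.path.exists(template):
        return False
    # Regenerate only if the template is newer than the generated manifest.
    return os.path.getmtime(template) > os.path.getmtime(manifest)
```

Note what this check cannot see: files added to or removed from the source tree do not change either timestamp, so a stale MANIFEST can still pass.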
* Obviously we should regenerate MANIFEST whenever MANIFEST.in is updated. (I.e. MANIFEST is only auto-generated, never edited -- unless you like playing with fire.) But what about when the filesystem changes? If I add or remove files or directories, MANIFEST should be regenerated. I seem to recall that detecting this in a portable way is pretty much impossible, especially in the presence of network filesystems (NFS, SMB).
Two possible solutions: regenerate MANIFEST every time the "dist" command is run, or add a "--force-manifest" option. I prefer the latter because of the (presumed) expense of walking the directory tree.
* Should MANIFEST be included with the source distribution? What about MANIFEST.in? The only reason I can see for including MANIFEST is that it would enable trivial integrity checking; it buys you zero security-wise, but at least ensures the download wasn't truncated. I think this is a bogus argument; a truncated ZIP or .tar.gz file simply won't unpack without errors, so I don't see a big need for checking for the presence of all files. A less trivial integrity check could be done by adding MD5 signatures (or something) to MANIFEST, but that would all-but-require regenerating the damn thing every time the "dist" command is run.
Including MANIFEST.in is more defensible; it's one of the "source" files the developer uses to maintain the distribution, and third parties should be able to regenerate the author's source distribution if needed. (Just following the letter of the free software licences: forking should be an option, but hopefully not a commonly used one!)
* When should the developer be warned about files in his development tree that weren't picked up by the MANIFEST-generating scan? If we do it when the MANIFEST is generated (step 2 above), that's when the developer is most likely to be watching -- why would you bother doing an explicit separate MANIFEST generation if you're not going to watch it closely?
However, doing it as late as possible -- i.e. when the MANIFEST is read and the source distribution is generated -- maximizes the chance of spotting any late additions not caught in MANIFEST.in's net. At the very least, this will remind the developer to re-run "dist --force-manifest", and it might well remind him to edit his MANIFEST.in to catch the late additions.
* Speaking of warning about files not included: there should be a way to say, "Don't warn me about excluding X". X could be *~, *.bak, *.o, or whatever (depending on your platform, editor, compiler, development style, ...). One possibility: any file explicitly excluded by MANIFEST.in would be exempt from "not included" warnings. This would require you to explicitly exclude *~, *.o, etc. -- normally they would not be caught in MANIFEST.in's web at all, so there would otherwise be no need to exclude them. Alternatively, we could add a bit more syntax to MANIFEST.in that says "Don't necessarily include or exclude this file, but don't warn me if it happens to be excluded". I don't see any need for this, and it could be confusing. Thoughts?
* Should the "default fileset" be included even when you have a MANIFEST.in? This seems obvious to me, but others (hi Fred!) have disagreed in the past. Since you can always exclude files from the default set (a feature of the present syntax that will be included in any future syntax), I see no need to make the great utility of a default fileset disappear just because you need to distribute files *not* mentioned in setup.py.
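One way the last two points could fit together, as a rough sketch only: start from the default fileset, add whatever the include patterns pick up, drop whatever the exclude patterns match, and warn only about leftover files that were not explicitly excluded. The function resolve_fileset and its arguments are hypothetical, and plain shell-style patterns via the standard fnmatch module stand in for the MANIFEST.in syntax, which hasn't been designed yet.

```python
from fnmatch import fnmatch


def resolve_fileset(tree, default, include=(), exclude=()):
    """Return (manifest, warnings) for a source tree (hypothetical helper).

    tree    -- every file path found in the source tree
    default -- the "default fileset" derived from setup.py
    include -- shell-style patterns standing in for MANIFEST.in includes
    exclude -- shell-style patterns standing in for MANIFEST.in excludes
    """
    def matches(path, patterns):
        return any(fnmatch(path, pattern) for pattern in patterns)

    # The default fileset is always the starting point, MANIFEST.in or not.
    selected = set(default)
    selected.update(path for path in tree if matches(path, include))

    # Exclusions apply to everything, including the default fileset.
    manifest = sorted(path for path in selected if not matches(path, exclude))

    # Warn only about files that were left out without being explicitly
    # excluded; an explicit exclusion doubles as "don't warn me about this".
    listed = set(manifest)
    warnings = sorted(
        path for path in tree
        if path not in listed and not matches(path, exclude)
    )
    return manifest, warnings
```

With this arrangement, excluding "*~" keeps editor backups both out of the distribution and out of the warnings, while a stray notes.txt that no pattern mentions would still be reported.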
Coming soon: proposed new syntax for the MANIFEST.in file.