[Distutils] Proposed new manifest scheme
Greg Ward
gward@cnri.reston.va.us
Sun, 6 Feb 2000 10:55:00 -0500
Hi all --
right up at the top of my list of things to do for the Distutils is "fix
the dist command". The main problems are that the syntax for the
MANIFEST file (where you specify the files to go in a source
distribution) is a bit goofy, and there's poor feedback about which
files will actually be included.
Part 1 of the fix is to rearrange how the "dist" command works. I've
worked out a solution that appeals to me; give this a read and see if it
sounds right to you.
Step 1 (optional):
Module developer creates MANIFEST.in file. This is a concise-but-
readable specification of the files to be included in the source
distribution with wildcards, directory recursion, and exclusion.
(Syntax to be worked out in a future post; it should have the same
features as the current syntax, but be more readable.)
Step 2 (optional):
Developer runs "dist" command with "--manifest-only" option.
Distutils parses MANIFEST.in and spits out MANIFEST, containing a
complete, explicit list of every file to be included in the source
distribution.
If MANIFEST.in doesn't exist -- i.e. the developer skipped step 1 --
Distutils creates a MANIFEST using the "default fileset", mostly the
pure Python modules and extension module source files mentioned in
setup.py.
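To make the default-fileset fallback concrete, here is a minimal Python sketch of it. It assumes the module names and extension source files have already been pulled out of setup.py. The function names default_fileset() and write_manifest(), the one-path-per-line MANIFEST layout, and the inclusion of setup.py itself are illustrative assumptions, not existing Distutils code.

```python
def default_fileset(py_modules, ext_sources):
    """Return the 'default fileset' as a sorted list of relative paths.

    py_modules: dotted module names from setup.py, e.g. "pkg.mod".
    ext_sources: extension module source files, e.g. "src/foo.c".
    """
    files = set()
    for name in py_modules:
        # "pkg.mod" -> "pkg/mod.py" ('/'-separated paths assumed in MANIFEST)
        files.add("/".join(name.split(".")) + ".py")
    files.update(ext_sources)
    # Assumption: setup.py itself always belongs in a source distribution.
    files.add("setup.py")
    return sorted(files)


def write_manifest(path, files):
    """Write a complete, explicit file list: one relative path per line."""
    with open(path, "w") as f:
        for name in files:
            f.write(name + "\n")
```

For example, write_manifest("MANIFEST", default_fileset(["spam", "pkg.eggs"], ["src/hammy.c"])) would produce a MANIFEST listing pkg/eggs.py, setup.py, spam.py and src/hammy.c.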
Possibly: print a warning for every file in the source tree that
*won't* be included in the source distribution; or write this
information to another file (with a "files were excluded: see ..."
warning).
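The "won't be included" warning amounts to a set difference between what is on disk and what MANIFEST lists. A rough sketch, assuming MANIFEST holds '/'-separated relative paths; source_tree_files(), excluded_files() and warn_excluded() are hypothetical names.

```python
import os
import sys


def source_tree_files(root="."):
    """Return every file under root as a '/'-separated relative path."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def excluded_files(manifest_files, root="."):
    """Files present on disk that MANIFEST does not list, sorted."""
    return sorted(source_tree_files(root) - set(manifest_files))


def warn_excluded(manifest_files, root="."):
    """Print one warning per file that will be left out of the distribution."""
    for name in excluded_files(manifest_files, root):
        print("warning: %s not included in source distribution" % name,
              file=sys.stderr)
```

Note that MANIFEST and MANIFEST.in will themselves be reported unless they are listed, which ties into the question of whether to ship them.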
Step 3:
Developer runs "dist" command. If MANIFEST doesn't exist or is
out-of-date (i.e. developer skipped step 2, or edited MANIFEST.in since
running step 2), it is regenerated. Distutils then creates a source
distribution (tarball, zipfile, whatever) containing exactly the files
listed in MANIFEST.
Possibly *this* is the step where the warning about excluded files
should be generated.
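Step 3 boils down to a timestamp comparison followed by archiving exactly the listed files. The sketch below does this with Python's tarfile module; need_new_manifest(), read_manifest() and make_sdist() are hypothetical names, and the force argument stands in for the "--force-manifest" option discussed under the open issues. Actually regenerating MANIFEST from MANIFEST.in is left out, since that syntax is still to be worked out.

```python
import os
import tarfile


def need_new_manifest(manifest="MANIFEST", template="MANIFEST.in",
                      force=False):
    """True if MANIFEST must be regenerated: forced, missing, or older
    than MANIFEST.in."""
    if force or not os.path.exists(manifest):
        return True
    if os.path.exists(template):
        return os.path.getmtime(template) > os.path.getmtime(manifest)
    return False


def read_manifest(manifest="MANIFEST"):
    """Return the non-blank lines of MANIFEST as relative paths."""
    with open(manifest) as f:
        return [line.strip() for line in f if line.strip()]


def make_sdist(archive, prefix, manifest="MANIFEST", root="."):
    """Write a gzipped tarball holding exactly the files listed in MANIFEST,
    each stored under a top-level prefix/ directory (e.g. "spam-1.0/")."""
    with tarfile.open(archive, "w:gz") as tar:
        for name in read_manifest(manifest):
            tar.add(os.path.join(root, name), arcname=prefix + "/" + name)
    return archive
```

A timestamp check only notices edits to MANIFEST.in; files added to or removed from the tree go unnoticed, which is exactly the gap a force flag would cover.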
Open issues:
* Obviously we should regenerate MANIFEST whenever MANIFEST.in is
updated. (I.e. MANIFEST is only auto-generated, never edited --
unless you like playing with fire.) But what about when the
filesystem changes? If I add or remove files or directories,
MANIFEST should be regenerated. I seem to recall that detecting
this in a portable way is pretty much impossible, especially in
the presence of network filesystems (NFS, SMB).
Two possible solutions: regenerate MANIFEST every time the "dist"
command is run, or add a "--force-manifest" option. I prefer the
latter because of the (presumed) expense of walking the directory
tree.
* Should MANIFEST be included with the source distribution? What
about MANIFEST.in? The only reason I can see for including MANIFEST
is that it would enable trivial integrity checking; it buys you
zero security-wise, but at least ensures the download wasn't
truncated. I think this is a bogus argument; a truncated ZIP or
.tar.gz file simply won't unpack without errors, so I don't see
a big need for checking for the presence of all files. A less
trivial integrity check could be done by adding MD5 signatures (or
something) to MANIFEST, but that would all-but-require regenerating
the damn thing every time the "dist" command is run.
Including MANIFEST.in is more defensible; it's one of the "source"
files the developer uses to maintain the distribution, and third
parties should be able to regenerate the author's source
distribution if needed. (Just following the letter of the free
software licences: forking should be an option, but hopefully not a
commonly used one!)
* When should the developer be warned about files in his development
tree that weren't picked up by the MANIFEST-generating scan?
If we do it when the MANIFEST is generated (step 2 above), that's
when the developer is most likely to be watching -- why would
you bother doing an explicit separate MANIFEST generation if you're
not going to watch it closely?
However, doing it as late as possible -- i.e. when the MANIFEST is
read and the source distribution is generated -- maximizes the
chance of spotting any late additions not caught in MANIFEST.in's
net. At the very least, this will remind the developer to re-run
"dist --force-manifest", and it might well remind him to edit his
MANIFEST.in to catch the late additions.
* Speaking of warning about files not included: there should be a way
to say, "Don't warn me about excluding X". X could be *~, *.bak,
*.o, or whatever (depending on your platform, editor, compiler,
development style, ...). One possibility: any file explicitly
excluded by MANIFEST.in would be exempt from "not included"
warnings. This would require you to explicitly exclude *~, *.o,
etc. -- normally they would not be caught in MANIFEST.in's web at
all, so there would otherwise be no need to exclude them.
Alternatively, we could add a bit
more syntax to MANIFEST.in that says "Don't necessarily include or
exclude this file, but don't warn me if it happens to be excluded".
I don't see any need for this, and it could be confusing. Thoughts?
* Should the "default fileset" be included even when you have a
MANIFEST.in? This seems obvious to me, but others (hi Fred!) have
disagreed in the past. Since you can always exclude files from
the default set (a feature of the present syntax that will be
included in any future syntax), I see no need to make the great
utility of a default fileset disappear just because you need to
distribute files *not* mentioned in setup.py.
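For the idea of adding MD5 signatures to MANIFEST (second open issue), a minimal sketch follows. The "<digest>  <path>" line format and the function names are assumptions for illustration only. As noted in that issue, this guards against truncation or corruption, not tampering, and every digest changes whenever its file does, so MANIFEST would have to be regenerated on every "dist" run.

```python
import hashlib
import os


def md5_of(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_signed_manifest(manifest, files, root="."):
    """Hypothetical MANIFEST variant: one '<md5>  <path>' line per file."""
    with open(manifest, "w") as f:
        for name in files:
            f.write("%s  %s\n" % (md5_of(os.path.join(root, name)), name))


def verify_manifest(manifest, root="."):
    """Return the listed paths that are missing or whose digest differs."""
    bad = []
    with open(manifest) as f:
        for line in f:
            digest, name = line.rstrip("\n").split("  ", 1)
            path = os.path.join(root, name)
            if not os.path.isfile(path) or md5_of(path) != digest:
                bad.append(name)
    return bad
```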
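One way to implement the fourth open issue's rule that explicitly excluded files are exempt from "not included" warnings: treat MANIFEST.in's exclude patterns as shell-style globs and drop any matching file from the warning list. Since the MANIFEST.in syntax is not settled, the patterns are simply passed in as a list here; the glob semantics (fnmatch, matched against either the full relative path or the base name) and the function names are assumptions.

```python
import fnmatch
import posixpath


def is_explicitly_excluded(path, exclude_patterns):
    """True if any exclude pattern matches the '/'-separated relative path
    or its base name."""
    base = posixpath.basename(path)
    return any(fnmatch.fnmatch(path, pat) or fnmatch.fnmatch(base, pat)
               for pat in exclude_patterns)


def files_to_warn_about(tree_files, manifest_files, exclude_patterns):
    """Files left out of the distribution that nobody asked to exclude."""
    included = set(manifest_files)
    return sorted(f for f in tree_files
                  if f not in included
                  and not is_explicitly_excluded(f, exclude_patterns))
```

With tree files spam.py, spam.py~, build/spam.o and notes.txt, a MANIFEST listing only spam.py, and exclude patterns ["*~", "*.o"], only notes.txt is reported.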
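The last open issue, whether the default fileset survives when a MANIFEST.in is present, comes down to a few set operations. In this sketch the function name and arguments are hypothetical, and exclude patterns are again assumed to be fnmatch-style globs. use_default=True reflects the position argued above and use_default=False the opposing one; exclusions are applied last so they can trim the default set.

```python
import fnmatch


def assemble_fileset(default_files, extra_files, exclude_patterns,
                     use_default=True):
    """Default fileset (optionally) plus MANIFEST.in additions, minus
    anything matching an exclude pattern. Exclusions always win."""
    files = set(default_files) if use_default else set()
    files.update(extra_files)
    return sorted(f for f in files
                  if not any(fnmatch.fnmatch(f, pat)
                             for pat in exclude_patterns))
```

For instance, a default set of spam.py and test/test_spam.py, extra files README and doc/spam.txt, and the exclusion "test/*" yield README, doc/spam.txt and spam.py.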
Coming soon: proposed new syntax for the MANIFEST.in file.
Greg