[Distutils] MANIFEST destiny :)
Phillip J. Eby
pje at telecommunity.com
Wed Nov 16 04:26:48 CET 2005
I originally added CVS and Subversion support to setuptools in order to get
past the pain of the distutils' MANIFEST system. I used to use
MANIFEST.in, but it was a royal pain to get right, and I pretty much always
forgot to add stuff to it. The most common problem when I shipped a source
distribution was that the MANIFEST was screwed up, such that a CVS checkout
worked fine but a source distribution would break. Ugh.
So, the CVS/Subversion support for setuptools automatically makes your
MANIFEST include anything under revision control, whether you have a
MANIFEST.in or not. If you don't have a MANIFEST.in, the MANIFEST is built
every time you run an sdist, so it's always up-to-date. Ah, bliss!
But all is not happy in MANIFESTville. It turns out that the bdist_rpm
command expects to build an sdist, for reasons impenetrable to me. If you
build an RPM from a source checkout, everything is fine because setuptools
can auto-discover your files and build the MANIFEST for the new sdist. But
if you build an RPM *from* an sdist, it's a no-go: the revision control
metadata is gone, so the file list can't be rediscovered.
In addition, many folks have been asking for this autodetection to cover
package data files as well. Why, they reasonably ask, must I specify each
and every file to be included in a package, when the system already knows
what files I have in revision control, or which are covered by my MANIFEST.in?
I've been avoiding adding this feature, however, because of the first
issue: when you make an sdist, you lose the revision control metadata,
so it would become impossible to build *any* binary from an sdist, not just
RPMs. Until recently, that issue seemed insurmountable.
So today, after looking over the issue a bit, I think I have a plan for
dealing with MANIFEST:
* Change the MANIFEST format to be platform-independent (currently it
contains OS-specific path separators)
* Always, always, always build MANIFEST, and always include both the
MANIFEST file and MANIFEST.in (if present) in the source distribution.
* Disable all the options that allow user control over MANIFEST generation,
including pruning, defaults, changing the filenames, etc.
* Use the MANIFEST data (along with revision control info) not only for
producing source distributions, but also to determine what files should be
considered "package data", if the user passes an
'include_package_data=True' keyword to setup().
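
To make the first bullet concrete, here is a minimal sketch of what a platform-independent MANIFEST entry could look like, assuming "/" is chosen as the portable separator. The function names are made up for illustration and are not part of setuptools or the distutils:

```python
import os


def to_manifest_path(native_path, sep=os.sep):
    """Convert a native relative path to the portable '/'-separated form."""
    # On POSIX (sep == "/") this is a no-op, so a backslash that is
    # legitimately part of a file name is left alone.
    return native_path.replace(sep, "/")


def from_manifest_path(manifest_path, sep=os.sep):
    """Convert a '/'-separated MANIFEST entry back to a native path."""
    return sep.join(manifest_path.split("/"))
```

The separator is a parameter only so that the conversion can be exercised on any platform; a real implementation would presumably just use os.sep.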
The net result would be a single source for what constitutes "the
distribution contents", in the sense of files that are not directly part of
the distutils build process. For files that are built automatically in
some way but should be included in source distributions or as package data,
you would still have to put them in MANIFEST.in. But anything that was
under CVS or Subversion would be handled automatically, and you wouldn't
have to duplicate data between MANIFEST.in and setup(package_data={...}).
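
As a rough sketch of how "package data" might be derived from the MANIFEST rather than spelled out in package_data, assuming the MANIFEST entries are already "/"-separated and that anything inside a package directory that isn't a .py file counts as data (both are my assumptions, and the function is hypothetical):

```python
def package_data_from_manifest(manifest, packages):
    """Map each package to the non-.py MANIFEST files that live inside it.

    manifest: iterable of '/'-separated paths relative to the project root.
    packages: iterable of dotted package names, e.g. from find_packages().
    """
    # Try the longest package directory first, so a file in pkg/sub is
    # assigned to 'pkg.sub' rather than to 'pkg'.
    pkg_dirs = sorted(
        ((name.replace(".", "/"), name) for name in packages),
        key=lambda item: len(item[0]),
        reverse=True,
    )
    result = {}
    for path in manifest:
        if path.endswith(".py"):
            continue  # modules are handled by the normal build, not as data
        for pkg_dir, name in pkg_dirs:
            prefix = pkg_dir + "/"
            if path.startswith(prefix):
                result.setdefault(name, []).append(path[len(prefix):])
                break
    return result
```

Files outside any package (README.txt, setup.py and so on) simply don't show up in the result; they stay sdist-only contents.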
I'm also thinking that most of the MANIFEST logic could and should move to
the Distribution class, since the data will be used by multiple
commands. Thus, the sdist command could just ask the Distribution for the
MANIFEST and get it, as would the commands that copy package data files to
the build directory.
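
Roughly, the shape I have in mind is something like the following. This is only a sketch: SketchDistribution and the two helper functions are stand-ins invented for illustration, not the real Distribution class or real commands, and the file-finding callable is assumed to wrap whatever revision control and MANIFEST.in scanning is available.

```python
class SketchDistribution:
    """Hypothetical stand-in for a Distribution that owns the MANIFEST."""

    def __init__(self, find_files):
        # find_files: callable returning the current project files as
        # '/'-separated relative paths (an assumption of this sketch).
        self._find_files = find_files

    def get_manifest(self):
        # Rebuilt on every call, so no command ever sees a stale list.
        return sorted(set(self._find_files()))


def sdist_files(dist):
    """What an sdist-like command would archive."""
    return dist.get_manifest()


def package_data_files(dist, package_dir):
    """What a build-like command would copy for one package directory."""
    prefix = package_dir.rstrip("/") + "/"
    return [f for f in dist.get_manifest()
            if f.startswith(prefix) and not f.endswith(".py")]
```

The point is just that both consumers ask the same object, so there is exactly one answer to "what is in this distribution?".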
I suspect the most controversial parts of this idea are:
* Disabling all user control of MANIFEST
* Forcibly including MANIFEST and MANIFEST.in in source distributions
* Making MANIFEST be always platform-independent
When Googling the issues around MANIFEST, I noticed that the idea of having
MANIFEST or MANIFEST.in included automatically has been repeatedly shot
down here over the years. However, if I followed the logic put forth on
those occasions, I would never have implemented revision control support in
the first place, so I guess if I'm in for a penny, I might as well be in
for a pound, as they say.
I couldn't find any argument one way or the other about the
manifest-generation options, nor any reasons why MANIFEST needs to remain
platform-specific, so I presume the options are just YAGNI and the format
was just an implementation accident.
Likewise, as far as I can tell there is no reason for *not* regenerating a
MANIFEST whenever you need one, so the current behavior of building one
only when MANIFEST.in changes or you use --force-manifest seems like a
premature optimization. Or maybe it wasn't premature when the distutils
were created, but it's not as if it's going to save you much time today
compared to the actual archive building process.
I'm thinking that basically --force-manifest would become a no-op in
setuptools, in the sense that you won't be able to *stop* the MANIFEST from
being built every single time. --manifest-only would still be
possible. --manifest and --template would have to be rejected, however,
because the standard name is needed for MANIFEST to be re-read when you
build stuff from the produced sdist.
--no-defaults would be ignored, except for a warning. If you don't want
the defaults, you can always start your MANIFEST.in with an exclude pattern
to exclude absolutely everything already included. There shouldn't be two
ways to do the same thing, especially not one that you can use on the
command line to mess things up in a non-repeatable fashion! Likewise
--no-prune, because that's a similar recipe for disaster.
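
To summarize the option handling I'm describing, here is a minimal sketch of the policy. The function and the way options are passed as a dict are hypothetical, not how the sdist command actually receives them:

```python
import warnings

REJECTED = ("manifest", "template")    # would change the standard file names
IGNORED = ("no_defaults", "no_prune")  # accepted, but only produce a warning


def check_sdist_options(options):
    """Return the options that survive under the policy described above.

    options: dict of option name -> value, as a command might collect them.
    """
    for name in REJECTED:
        if options.get(name):
            raise ValueError("--%s is not supported" % name.replace("_", "-"))
    kept = dict(options)
    for name in IGNORED:
        if kept.pop(name, None):
            warnings.warn("--%s is ignored" % name.replace("_", "-"))
    # --force-manifest becomes a no-op: the MANIFEST is always rebuilt.
    kept.pop("force_manifest", None)
    return kept
```

So --manifest-only passes through untouched, --force-manifest is silently dropped, and the two "no" options cost you nothing but a warning.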
A lot of these ideas are potential backward compatibility problems, so
we'll have to see how they play out in setuptools before considering them
for addition to the distutils. My guess, however, is that most prolific
Python developers want to spend their time writing code, not writing and
debugging MANIFEST.in files, and that fact has been responsible for a lot
of setuptools uptake so far. I've been seeing a lot of projects that use
setuptools for no apparent reason other than it makes writing the setup
script a little easier, due to find_packages(), package_data, and the lack
of need for a MANIFEST when source control is involved. These are
qualities I'd like to extend further, even at the cost of some flexibility.
Heck, most of the distutils' flaws lie in their extreme versatility. You
can tell each individual command that it's using different build or
distribution directories, for example, and in the process completely foul
up your builds. What's more, every distutils tutorial may well end up
giving people different instructions as to the "best" way to lay out a
project directory. If there's ever a "distutils 2", it needs to become
dictator-ware and tell you exactly what the One Obvious Way is. If
everything *had* to be a particular way, then changing how the distutils
work would actually be possible, whereas now, it's bloody hard to even
figure out which of the nine billion ways to do it are actually in use.
Okay, off the soapbox now. :) Does anybody see any issues with this that
I'm missing, with respect to using the MANIFEST/FileList machinery to
control sdist and package data, or my implementation plans for doing
so? Thanks.