[Distutils] MANIFEST destiny :)

M.-A. Lemburg mal at egenix.com
Wed Nov 16 11:49:47 CET 2005

Hi Phillip,

you asked for it, so I'm giving you some flaming ;-) ...

Phillip J. Eby wrote:
> I originally added CVS and Subversion support to setuptools in order to get 
> past the pain of the distutils' MANIFEST system.  I used to use 
> MANIFEST.in, but it was a royal pain to get right, and I pretty much always 
> forgot to add stuff to it.  The most common problem when I shipped a source 
> distribution was that the MANIFEST was screwed up, such that a CVS checkout 
> worked fine but a source distribution would break.  Ugh.
> So, the CVS/Subversion support for setuptools automatically makes your 
> MANIFEST include anything under revision control, whether you have a 
> MANIFEST.in or not.  If you don't have a MANIFEST.in, the MANIFEST is built 
> every time you run an sdist, so it's always up-to-date.  Ah, bliss!
> But all is not happy in MANIFESTville.  It turns out that the bdist_rpm 
> command expects to build an sdist, for reasons impenetrable to me.  If you 
> build an RPM from a source checkout, everything is fine because setuptools 
> can auto-discover your files and build the MANIFEST for the new sdist.  But 
> if you build an RPM *from* an sdist, it's a no-go.

I don't understand what you mean by "no-go": the current
system works just fine if you include the MANIFEST file
in the sdist.
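To illustrate the kind of revision-control discovery Phillip describes above, here is a minimal sketch that lists CVS-controlled files by reading the CVS/Entries files in a checkout. It is not setuptools' own code: the function name is hypothetical, and it ignores details such as entries marked as removed and Subversion's .svn metadata.

```python
import os

def cvs_entries(directory):
    """Yield paths of files recorded in directory/CVS/Entries, recursing
    into the subdirectories that CVS lists there (hypothetical helper)."""
    entries_path = os.path.join(directory, "CVS", "Entries")
    if not os.path.isfile(entries_path):
        return  # not a CVS working directory
    with open(entries_path) as f:
        for line in f:
            fields = line.rstrip("\n").split("/")
            if line.startswith("/") and len(fields) > 1:
                # File entry: /name/revision/timestamp/options/tagdate
                yield os.path.join(directory, fields[1])
            elif line.startswith("D/") and len(fields) > 1:
                # Directory entry: D/name////
                yield from cvs_entries(os.path.join(directory, fields[1]))
```

Because such a list is only available in a checkout, an sdist unpacked elsewhere has nothing to read, which is the situation the "no-go" refers to.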

> In addition, many folks have been asking for this autodetection to cover 
> package data files as well.  Why, they reasonably ask, must I specify each 
> and every file to be included in a package, when the system already knows 
> what files I have in revision control, or which are covered by my MANIFEST.in?
> The reason I've been avoiding adding this feature, however, is because of 
> the first issue; when you make an sdist, you lose that additional metadata, 
> so it would become impossible to build *any* binary from an sdist, not just 
> RPMs.  Until recently, that issue seemed insurmountable.

That's simply not true. You have to include the MANIFEST file
in the sdist and then everything is fine.

> So today, after looking over the issue a bit, I think I have a plan for 
> dealing with MANIFEST:
> * Change the MANIFEST format to be platform-independent (currently it 
> contains OS-specific path separators)


You are missing an important point: MANIFEST files can be
built using tools outside distutils, and external package-building
tools may require these files to be platform-dependent.

Distutils itself is happy with POSIX-style separators
on all platforms.
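A minimal sketch of the round trip this implies: entries can be kept with "/" separators, which distutils accepts everywhere, and converted only when they are handed to an external tool that wants native separators. Both helper names are hypothetical.

```python
import posixpath

def to_posix(entry, sep):
    """Rewrite a MANIFEST entry written with separator `sep` into the
    '/'-separated form that distutils accepts on every platform."""
    return posixpath.normpath(entry.replace(sep, "/"))

def to_native(entry, sep):
    """Rewrite a '/'-separated entry for an external tool that expects
    the platform separator `sep`."""
    return entry.replace("/", sep)
```

For example, `to_native("pkg/data/logo.png", "\\")` gives the Windows-style form an external Windows packaging tool might require.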

> * Always, always, always build MANIFEST, and always include both the 
> MANIFEST file and MANIFEST.in (if present) in the source distribution.

-1 on always building MANIFEST.

This would miss the point of managing MANIFEST files
independently of your package files, e.g. using
Makefiles or other tools dealing with file dependencies,
checkouts, etc.

> * Disable all the options that allow user control over MANIFEST generation, 
> including pruning, defaults, changing the filenames, etc.


Again, you are forgetting that MANIFEST files serve a purpose
and are external to the distutils process for a reason. You
are free to have distutils build your MANIFEST files from
MANIFEST.in files, have a distutils command auto-generate them,
or use external programs triggered by Makefiles or similar
distribution-building processes to generate them.
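As an example of the external route, here is a sketch of a standalone script that a Makefile rule could run to produce MANIFEST from a plain directory, with no MANIFEST.in involved. The exclusion list and function name are assumptions made for illustration.

```python
import os

# Hypothetical exclusion list for this sketch; adjust per project.
EXCLUDED_DIRS = {"build", "dist", "CVS", ".svn"}

def write_manifest(root, manifest="MANIFEST"):
    """Walk `root`, write every file path (relative to `root`, '/'-separated)
    to root/<manifest>, and return the sorted list that was written."""
    paths = {manifest}  # list the MANIFEST itself so it ships in the sdist
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            paths.add(rel.replace(os.sep, "/"))
    paths = sorted(paths)
    with open(os.path.join(root, manifest), "w") as f:
        f.write("\n".join(paths) + "\n")
    return paths
```

Running it twice yields the same file, so it can safely be a dependency of the sdist target.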

In your world, everything is done within distutils, so it's
understandable that you'd like to get rid of the external
nature of MANIFEST files, but please keep in mind that
these features are being used and removing the logic would
seriously break things for packagers relying on other
mechanisms to build their MANIFEST files.

Simply overwriting the MANIFEST file every time you
run the sdist command would break such use.
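For comparison, here is a simplified model of the decision the distutils sdist command makes about MANIFEST, roughly following the cases it distinguishes. The function and its parameters are hypothetical; it takes modification times (or None for a missing file) so it can be checked without touching the disk.

```python
def should_regenerate(template_mtime, manifest_mtime, setup_mtime,
                      force=False):
    """Return True if MANIFEST should be rebuilt.

    Each *_mtime argument is a modification time, or None when that file
    (MANIFEST.in, MANIFEST, setup.py) does not exist.
    """
    if force:  # e.g. --force-manifest or --manifest-only
        return True
    if template_mtime is None:
        # No MANIFEST.in: an existing MANIFEST was made some other way
        # (by hand, a Makefile, ...) and is read as-is; with neither file
        # present, a defaults-only MANIFEST is generated.
        return manifest_mtime is None
    if manifest_mtime is None:
        return True
    # MANIFEST.in exists: rebuild only if it or setup.py is newer.
    return (template_mtime > manifest_mtime
            or (setup_mtime is not None and setup_mtime > manifest_mtime))
```

The branch without a template is the one that protects externally generated MANIFEST files; an unconditional rebuild would remove it.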

> * Use the MANIFEST data (along with revision control info) not only for 
> producing source distributions, but also to determine what files should be 
> considered "package data", if the user passes an 
> 'include_package_data=True' keyword to setup().

Isn't that already the case? I mean, you can put anything
you like into MANIFEST and it will be included in the sdist.
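A minimal sketch of that behaviour: whatever MANIFEST lists is copied into the release tree, regardless of file type. Both function names are hypothetical, and skipping blank lines and "#" comments is an assumption of this sketch.

```python
import os
import shutil

def read_manifest(path):
    """Return the entries of a MANIFEST file: one '/'-separated path per
    line. Skipping blank lines and '#' comments is an assumption here."""
    with open(path) as f:
        return [line.strip() for line in f
                if line.strip() and not line.lstrip().startswith("#")]

def make_release_tree(src_root, dest_root, manifest="MANIFEST"):
    """Copy every file listed in MANIFEST from src_root to dest_root,
    whatever kind of file it is, and return the entries copied."""
    entries = read_manifest(os.path.join(src_root, manifest))
    for entry in entries:
        rel = os.path.join(*entry.split("/"))
        target = os.path.join(dest_root, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(os.path.join(src_root, rel), target)
    return entries
```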

> The net result would be a single source for what constitutes "the 
> distribution contents", in the sense of files that are not directly part of 
> the distutils build process.  For files that are built automatically in 
> some way but should be included in source distributions or as package data, 
> you would still have to put them in MANIFEST.in.  But anything that was 
> under CVS or Subversion would be handled automatically, and you wouldn't 
> have to duplicate data between MANIFEST.in and setup(package_data={...}).

Again, not everybody is using distribution processes
built around CVS or Subversion. Leaving aside that there
are quite a few other SCM tools out there, there is also
the case where you create distributions from plain
directories (which is what MANIFEST.in and MANIFEST
are targeting).
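To make the proposal concrete: a sketch of deriving a package_data-style mapping from MANIFEST entries so the same list need not be repeated in setup(). It is hypothetical, not setuptools' include_package_data implementation, and assumes each package lives in a directory named after it (no package_dir remapping). Since it only consumes plain MANIFEST entries, it does not care whether they came from CVS, Subversion, or a plain directory walk.

```python
def package_data_from_manifest(entries, packages):
    """Map each package name to the non-.py MANIFEST entries that live
    under its directory, as paths relative to that directory."""
    # Assumption: package "a.b" lives in directory "a/b".
    pkg_dirs = {pkg: pkg.replace(".", "/") + "/" for pkg in packages}
    data = {}
    for entry in entries:
        if entry.endswith(".py"):
            continue
        owners = [pkg for pkg, d in pkg_dirs.items() if entry.startswith(d)]
        if owners:
            # The most deeply nested package directory claims the file.
            pkg = max(owners, key=lambda p: len(pkg_dirs[p]))
            data.setdefault(pkg, []).append(entry[len(pkg_dirs[pkg]):])
    return data
```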

> I'm also thinking that most of the MANIFEST logic could and should move to 
> the Distribution class, since the data will be used by multiple 
> commands.  Thus, the sdist command could just ask the Distribution for the 
> MANIFEST and get it, as would the commands that copy package data files to 
> the build directory.

Wait: MANIFEST defines what goes into the sdist - not an
arbitrary (binary) distribution.
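A sketch of the design Phillip proposes, where the file list is computed once on a Distribution-like object and shared by several commands. Every name here is hypothetical; note that the same list then also feeds a build step, which is the coupling objected to above.

```python
class DistributionSketch:
    """Hypothetical stand-in for a Distribution object that computes the
    file list once and hands the same list to every command."""

    def __init__(self, file_lister):
        self._file_lister = file_lister  # callable returning file paths
        self._manifest = None

    def get_manifest(self):
        if self._manifest is None:
            self._manifest = sorted(self._file_lister())
        return list(self._manifest)  # copy, so callers cannot mutate it

def sdist_file_list(dist):
    # An sdist-like command takes the whole list.
    return dist.get_manifest()

def package_data_for(dist, package_dir):
    # A build-like command takes only non-.py files under one package.
    prefix = package_dir.rstrip("/") + "/"
    return [p for p in dist.get_manifest()
            if p.startswith(prefix) and not p.endswith(".py")]
```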

> I suspect the most controversial parts of this idea are:
> * Disabling all user control of MANIFEST
> * Forcibly including MANIFEST and MANIFEST.in in source distributions
> * Making MANIFEST be always platform-independent
> When Googling the issues around MANIFEST, I noticed that the idea of having 
> MANIFEST or MANIFEST.in included automatically has been repeatedly shot 
> down here over the years.  However, if I followed the logic put forth on 
> those occasions, I would never have implemented revision control support in 
> the first place, so I guess if I'm in for a penny, I might as well be in 
> for a pound, as they say.

I'm not sure what you mean by "having MANIFEST[.in] included".

It would certainly make sense to have the MANIFEST[.in] files
automatically added as defaults in sdist.py, and I'd be
+1 on that (even though it never was an issue for me, as I
always include them in the MANIFEST file).
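Such a default could be as small as the following sketch. It is a hypothetical helper, not the actual add_defaults code in sdist.py; an sdist command might call something like it after collecting its usual default files.

```python
import os

def add_manifest_defaults(file_list, base_dir,
                          names=("MANIFEST.in", "MANIFEST")):
    """Append MANIFEST.in and MANIFEST to an sdist file list when they
    exist in base_dir and are not already listed."""
    for name in names:
        if (os.path.isfile(os.path.join(base_dir, name))
                and name not in file_list):
            file_list.append(name)
    return file_list
```

Because it skips names already present, it does no harm for packagers who, like me, list these files in MANIFEST explicitly.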

> I couldn't find any argument one way or the other about the 
> manifest-generation options, nor any reasons why MANIFEST needs to remain 
> platform-specific, so I presume the options are just YAGNI and the format 
> was just an implementation accident.

See above.

> Likewise, as far as I can tell there is no reason for *not* regenerating a 
> MANIFEST whenever you need one, so the current behavior of only building 
> one when MANIFEST.in changes or you use --force-manifest, seems like a 
> premature optimization.  Or maybe it wasn't an excessive optimization when 
> the distutils were created, but it's not as if it's going to save you much 
> time compared to the actual archive building process today.

See above.

> I'm thinking that basically --force-manifest would become a no-op in 
> setuptools, in the sense that you won't be able to *stop* the MANIFEST from 
> being built every single time.  --manifest-only would still be 
> possible.  --manifest and --template would have to be rejected, however, 
> because the standard name is needed for MANIFEST to be re-read when you 
> build stuff from the produced sdist.
> --no-defaults would be ignored, except for a warning.  If you don't want 
> the defaults, you can always start your MANIFEST.in with an exclude pattern 
> to exclude absolutely everything already included.  There shouldn't be two 
> ways to do the same thing, especially not one that you can use on the 
> command line to mess things up in a non-repeatable fashion!  Likewise 
> --no-prune, because that's a similar recipe for disaster.

These options are meant for people who don't have a
MANIFEST.in file to begin with or just quickly want to
build an sdist with parts of the whole distribution or
an extended version (e.g. for testing or upgrading).
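The interplay between defaults, the template, and pruning can be modeled in a few lines. This is a simplified sketch under stated assumptions: only `include` and `global-exclude` are modeled, pattern matching only approximates the real MANIFEST.in rules, and all names are hypothetical. It shows that starting a template with `global-exclude *` yields the same list as turning defaults off, while the `use_defaults` and `prune` switches still let someone without a MANIFEST.in adjust a one-off build without editing any file.

```python
import fnmatch
import posixpath

PRUNED_DIRS = ("CVS", ".svn")  # version-control directories dropped by pruning

def _match(path, pattern):
    # '*' never crosses a '/' here: compare segment by segment.
    p_parts, f_parts = pattern.split("/"), path.split("/")
    return len(p_parts) == len(f_parts) and all(
        fnmatch.fnmatchcase(f, p) for f, p in zip(f_parts, p_parts))

def build_file_list(all_files, defaults, template_lines,
                    use_defaults=True, prune=True, build_base="build"):
    """Defaults first, then template lines in order, then pruning."""
    files = list(defaults) if use_defaults else []
    for line in template_lines:
        words = line.split()
        if not words or words[0].startswith("#"):
            continue
        action, patterns = words[0], words[1:]
        if action == "include":
            for pat in patterns:
                files += [f for f in all_files
                          if _match(f, pat) and f not in files]
        elif action == "global-exclude":
            files = [f for f in files if not any(
                fnmatch.fnmatchcase(posixpath.basename(f), p)
                for p in patterns)]
        else:
            raise ValueError("action not modeled in this sketch: " + action)
    if prune:
        files = [f for f in files
                 if not f.startswith(build_base + "/")
                 and not any(part in PRUNED_DIRS for part in f.split("/"))]
    return files
```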

> A lot of these ideas are potential backward compatibility problems, so 
> we'll have to see how they play out in setuptools before considering them 
> for addition to the distutils.  My guess, however, is that most prolific 
> Python developers want to spend their time writing code, not writing and 
> debugging MANIFEST.in files, and that fact has been responsible for a lot 
> of setuptools uptake so far.  I've been seeing a lot of projects that use 
> setuptools for no apparent reason other than it makes writing the setup 
> script a little easier, due to find_packages(), package_data, and the lack 
> of need for a MANIFEST when source control is involved.  These are 
> qualities I'd like to extend further, even at the cost of some flexibility.
> Heck, most of the distutils' flaws lie in their extreme versatility. 

That comment is just silly: distutils is so powerful because
of its versatility. You wouldn't have been able to build setuptools
without this versatility.

Just because you don't like some of this flexibility doesn't
mean that distutils is broken in some way.

> You can tell each individual command that it's using different build or 
> distribution directories, for example, and in the process completely foul 
> up your builds.  What's more, every distutils tutorial may well end up 
> giving people different instructions as to the "best" way to lay out a 
> project directory.  If there's ever a "distutils 2", it needs to become 
> dictator-ware and tell you exactly what the One Obvious Way is.  If 
> everything *had* to be a particular way, then changing how the distutils 
> work would actually be possible, whereas now, it's bloody hard to even 
> figure out which of the nine billion ways to do it are actually in use.

That's your point of view - I've never had a hard time
adjusting distutils to whatever I wanted it to do. Once
you get used to the way things are handled in distutils,
extending it is often really easy, and it would be
much harder with your One Obvious Way to do it (unless
you had a time machine, zoomed to 2042, and then took
all possible ways to build distributions into account
on your way back to 2005 ;-).

You are free to develop setuptools into your own little
vision of distutils 2 - and that's one of distutils' strengths!

> Okay, off the soapbox now.  :)  Does anybody see any issues with this that 
> I'm missing, with respect to using the MANIFEST/FileList machinery to 
> control sdist and package data, or my implementation plans for doing 
> so?  Thanks.

I think I gave you some more hints as to why MANIFEST[.in]
works the way it does.

Adding these files as defaults to the set of sdist files sounds
like a good idea (I don't remember any discussions about this, so
I may be wrong).

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Nov 16 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
