[Distutils] MANIFEST destiny :)

Phillip J. Eby pje at telecommunity.com
Wed Nov 16 18:53:40 CET 2005

At 11:49 AM 11/16/2005 +0100, M.-A. Lemburg wrote:
>I don't understand what you mean with "no-go" - the current
>system works just fine if you include the MANIFEST file
>in the sdist.

But it's not included, and you have to know to include it - and people who 
previously requested here that MANIFEST and MANIFEST.in be included in the 
manifest were shot down under the claim that including these defaults would 
be "too much magic".

> > The reason I've been avoiding adding this feature, however, is because of
> > the first issue; when you make an sdist, you lose that additional metadata,
> > so it would become impossible to build *any* binary from an sdist, not just
> > RPMs.  Until recently, that issue seemed insurmountable.
>That's simply not true. You have to include the MANIFEST file
>in the sdist and then everything is fine.

I'm speaking about the context of setuptools, where the typical user has no 
MANIFEST.in, because they're using Subversion or CVS and therefore don't 
need one.

> > * Change the MANIFEST format to be platform-independent (currently it
> > contains OS-specific path separators)
>You are missing an important point: MANIFEST files can be
>built using tools outside distutils and external package
>building tools may require these to be platform dependent.

One reason I posted was to find out what specific uses people actually had 
for such things.  Do the uses actually exist?  What are they used for?  One 
important point: on POSIX-y platforms (i.e. virtually everything but 
Windows), there's no difference between distutils paths and system 
paths.  So, what external tools build or read MANIFEST files on Windows, or 
any other platform that doesn't accept '/' as a separator?

Also, '/' *is* a valid path separator on Windows, so if your MANIFEST 
processing is done in Python, a platform-independent format won't have any 
effect.  (Indeed, the only reason to actually *change* the format is so 
that sdist files built *on Windows* would be usable on other 
platforms!  Other platforms would never see a difference.)
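To make the separator point concrete, here is a minimal sketch of what a platform-independent MANIFEST would involve. The helper names `to_portable` and `to_local` are hypothetical and not part of the distutils; the second one is similar in spirit to what `distutils.util.convert_path` does for '/'-separated names. It only illustrates that entries written on Windows are the ones that change.

```python
import ntpath
import posixpath


def to_portable(entry, sep=ntpath.sep):
    """Rewrite one MANIFEST entry using '/' as the separator."""
    return entry.replace(sep, posixpath.sep)


def to_local(entry, sep):
    """Rewrite a '/'-separated entry using the given local separator."""
    return entry.replace(posixpath.sep, sep)


# A MANIFEST as it might be written on Windows today (illustrative names).
windows_manifest = ["setup.py", "pkg\\__init__.py", "pkg\\data\\defaults.cfg"]
portable = [to_portable(line) for line in windows_manifest]
```

Entries that already use '/' pass through `to_portable` unchanged, which is why POSIX-y platforms would never see a difference.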

I know that in principle, somebody somewhere may have some tool that would 
break if MANIFEST changed its format.  My question is, who and how 
many?  If there's some widely-used tool that does this, then it's 
reasonable to take it into account.  If it's just a theoretical 
possibility, it's not worth much concern.  If there are a handful of people 
with a hard-to-change setup, it's somewhere in between.

> > * Always, always, always build MANIFEST, and always include both the
> > MANIFEST file and MANIFEST.in (if present) in the source distribution.
>-1 on always building MANIFEST.
>This would miss the point of managing MANIFEST files
>independently of your package files, e.g. using
>Makefiles or other tools dealing with file dependencies,
>checkouts, etc.

Please point me to some examples of this, especially one that can't simply 
generate a MANIFEST.in instead.

> > * Disable all the options that allow user control over MANIFEST generation,
> > including pruning, defaults, changing the filenames, etc.
>Again, you are forgetting that MANIFEST files serve a purpose
>and are external to the distutils process for a reason. You
>are free to have distutils build your MANIFEST files from
>MANIFEST.in files, have distutils command auto-generate them,
>or use external programs triggered by Makefiles or similar
>distribution building processes to generate them.

Examples?  You keep saying that people *can* do these things, but that's 
not anything like the same as saying that any significant number of people 
*actually* do them.  Frankly, most people I've encountered who are doing 
Python software development don't know how to get a basic setup.py to work 
right, and feel the distutils are way too complicated, underdocumented, or 
just plain broken, because they don't feel like they can control them.

OSAF, for example, has some developers who are very smart about creating 
build processes.  Smart enough to be *able* to do the kinds of things 
you're describing.  But they sure as heck don't use the distutils to 
*actually* do them, because from their perspective the distutils is a big 
pile of broken undocumentedness.  If the distutils are so frustrating to 
such smart developers, there's something wrong.

Thus, I find these super-custom processes you're talking about highly 
implausible, because the only people who could implement them are the 
people with a strong knowledge of the distutils -- an incredibly rare breed 
of person, in other words.  Most people just want this stuff to work, and 
they don't want to have to learn *how* it works.  They have better things 
to do with their time.  They want a build *tool*, not a build library or 
build framework.

Most people don't know MANIFEST exists, until they get bitten by the need 
to have one, or by it being out of date.  Hell, look at how many packages 
on PyPI and the Vaults aren't even packaged with distutils at all!  For 
many people, it's clearly easier to just tarball your source directory than 
to have to learn about this MANIFEST stuff.

>In your world, everything is done within distutils, so it's
>understandable that you'd like to get rid of the external
>nature of MANIFEST files, but please keep in mind that
>these features are being used and removing the logic would
>seriously break things for packagers relying on other
>mechanisms to build their MANIFEST files.

Please point me to these developers, and show me one that couldn't just 
spend a few minutes making their tools generate a MANIFEST.in 
instead.  I'm suggesting that we present a very small number of 
highly-capable people with a truly minor inconvenience, in order to make an 
extremely large group of people happier by taking away something that 
invariably bites them.

> > * Use the MANIFEST data (along with revision control info) not only for
> > producing source distributions, but also to determine what files should be
> > considered "package data", if the user passes an
> > 'include_package_data=True' keyword to setup().
>Isn't that already the case ? I mean you can put anything
>you like into MANIFEST and it will be included in the sdist.

I'm talking about package data - a feature that was pioneered in setuptools 
and added to the distutils in Python 2.4.  The ability to specify data 
files that are *installed* in a package directory.

Specifically, I'm suggesting that users be able to replace the package_data 
setup keyword with a simple include_package_data flag, so that the MANIFEST 
data can be used to determine what data files to install.  This has nothing 
to do with sdist; I'm talking about being able to unify the distutils' idea 
of what files are part of the distribution, because for the simple cases 
that are 90% of projects, that's a very useful thing to have.
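As a rough sketch of the idea (not the actual setuptools implementation), the package data could be derived from the manifest along these lines. The function `package_data_from_manifest` is hypothetical, and it assumes '/'-separated manifest entries and package directories that sit at the project root.

```python
def package_data_from_manifest(manifest_entries, packages):
    """Map each package to the non-Python manifest files inside its directory.

    Hypothetical helper: entries use '/' separators and packages are dotted
    names whose directories sit at the project root.
    """
    # Check deeper packages first so a file goes to its innermost package.
    pkg_dirs = sorted(
        ((pkg.replace(".", "/"), pkg) for pkg in packages),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    result = {}
    for entry in manifest_entries:
        if entry.endswith(".py"):
            continue  # modules are handled by the normal build step
        for pkg_dir, pkg in pkg_dirs:
            if entry.startswith(pkg_dir + "/"):
                # Store the path relative to the package directory.
                result.setdefault(pkg, []).append(entry[len(pkg_dir) + 1:])
                break
    return result


# Illustrative manifest contents; the file names are made up.
manifest = [
    "setup.py",
    "README.txt",
    "pkg/__init__.py",
    "pkg/defaults.cfg",
    "pkg/sub/__init__.py",
    "pkg/sub/templates/page.html",
]
data = package_data_from_manifest(manifest, ["pkg", "pkg.sub"])
```

From the user's side, none of this would be visible: the setup script would simply pass include_package_data=True to setup() instead of spelling out package_data by hand.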

For every SciPy and mxODBC and Twisted, there are easily a hundred packages 
without anything like their level of complex build needs.  I'm talking 
about streamlining things for the simple packages, but there isn't anything 
in what I'm proposing that keeps anybody from doing complex things.  I'm 
just saying we should remove *multiple* ways to do the *same* complex 
thing.  We should pick One Obvious Way to *customize*, replacing multiple 
hooks with single hooks that allow the same degree of customizability.

For example, if you have tools that generate something, right now you could 
choose to generate either MANIFEST or MANIFEST.in.  I'm suggesting that we 
should choose for you, and say it's MANIFEST.in that you should generate, 
since it's the more expressive and flexible of the two formats.  Similarly, 
you can currently choose what filenames MANIFEST and MANIFEST.in have, and 
I'm suggesting that those be their only names.  If you need other files you 
certainly have the option of copying, renaming, and using other existing 
file manipulation tools.

These additional degrees of built-in "freedom" just give you *meaningless* 
choices.  They don't make any *real* difference to your ability to get the 
job done.  Instead, they raise a *barrier* to creating tools.  If somebody 
creates a tool to do something with manifest files, you can't use it unless 
you and they have already agreed to make the *same* choices about these 
superfluous options, or else the tool maker has to support all possible 
options - and sometimes, a useful tool that supports all the options just 
isn't *possible*.

See, it wouldn't matter if the arbitrary choice was that you have to 
generate MANIFEST instead of MANIFEST.in.  You'd still have the same 
flexibility.  The important thing in these arbitrary choices is that *we 
pick one*.  That's why we have a BDFL for the language - we all argue for a 
particular surface syntax, and then he picks one, and we all move on.  The 
distutils needs a BDFL to pick between all the mutually incompatible but 
semantically indistinguishable surface syntaxes for how to build and 
package something.

>Again, not everybody is using distribution processes
>built around CVS or Subversion. Left aside that there
>are quite a few other SCM tools out there, you also have
>the case where you create distributions from plain
>directories (which is what MANIFEST.in and MANIFEST
>are targeting).

The source control is a supplement to the "add_defaults" of the sdist 
process, not a replacement for MANIFEST.in.  Some users need to add 
non-source-controlled files, for example.  But *simple things should be 
simple*, and treating source control info as part of the defaults makes 
them work simply for most people.  And if you don't want the defaults, 
there should be only one way to turn them off - by excluding files in 
MANIFEST.in, not by a command line option.
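A deliberately simplified model of that ordering, assuming the defaults (including source-controlled files) are collected first and the MANIFEST.in rules are applied afterwards. The `apply_rules` function and its (action, pattern) tuples are hypothetical, and the matching uses plain `fnmatch` globbing rather than the real MANIFEST.in template semantics.

```python
from fnmatch import fnmatchcase


def apply_rules(defaults, all_files, rules):
    """Start from the default file set, then apply rules in order."""
    selected = set(defaults)
    for action, pattern in rules:
        matches = {f for f in all_files if fnmatchcase(f, pattern)}
        if action == "include":
            selected |= matches
        elif action == "exclude":
            selected -= matches
        else:
            raise ValueError("unknown action: %r" % (action,))
    return sorted(selected)


# Illustrative project contents; the file names are made up.
all_files = ["setup.py", "pkg/__init__.py", "pkg/notes.tmp", "build/generated.c"]
defaults = ["setup.py", "pkg/__init__.py", "pkg/notes.tmp"]  # e.g. from source control

# Keep the defaults, drop scratch files, add a generated file.
tweaked = apply_rules(
    defaults, all_files, [("exclude", "*.tmp"), ("include", "build/*.c")]
)

# Opt out of the defaults entirely, then list files explicitly.
explicit = apply_rules(
    defaults, all_files, [("exclude", "*"), ("include", "setup.py")]
)
```

The second call is the "exclude absolutely everything" case: the opt-out lives in the rules file, so it is repeatable, instead of depending on whoever typed the command line.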

> > I'm also thinking that most of the MANIFEST logic could and should move to
> > the Distribution class, since the data will be used by multiple
> > commands.  Thus, the sdist command could just ask the Distribution for the
> > MANIFEST and get it, as would the commands that copy package data files to
> > the build directory.
>Wait: MANIFEST defines what goes into the sdist - not an
>arbitrary (binary) distribution.

I'm talking about the include_package_data option.

>It would certainly make sense to have the MANIFEST[.in] files
>automatically be added as default in sdist.py and I'd be
>+1 on that (even though it never was an issue for me as I
>always include them in the MANIFEST file).

Interesting; I could've sworn that you were one of the people who told 
somebody it would be "too much magic" to include this.  But whatever the 
case, I'm glad you don't oppose it now.

> > --no-defaults would be ignored, except for a warning.  If you don't want
> > the defaults, you can always start your MANIFEST.in with an exclude pattern
> > to exclude absolutely everything already included.  There shouldn't be two
> > ways to do the same thing, especially not one that you can use on the
> > command line to mess things up in a non-repeatable fashion!  Likewise
> > --no-prune, because that's a similar recipe for disaster.
>These options are meant for people who don't have a
>MANIFEST.in file to begin with or just quickly want to
>build an sdist with parts of the whole distribution or
>an extended version (e.g. for testing or upgrading).

Sure - and there are plenty of things I can leave some room for play 
in.  For example, I could simply revert to old behaviors when you use any 
non-default options.  I could make a separate MANIFEST.setuptools file, 
etc.  But these things add complexity, so I want to know who *actually* 
needs them, and can't trivially work around their absence.  I'd rather 
briefly inconvenience distutils mavens like you, than continue to stump and 
frustrate the hundreds of people who just don't get why it's all so damn 

> > Heck, most of the distutils' flaws lie in their extreme versatility.
>That comment is just silly: distutils is so powerful because
>of its versatility. You wouldn't have been able to build setuptools
>without this versatility.

You're confusing a well-factored framework with user-level 
versatility.  Using variables instead of hardcoding filenames internally is 
a very good idea.  Exposing those variables for users to change (in the 
absence of concrete use cases), however, is just bad UI design and a lack 
of social awareness.

>That's your point of view - I've never had a hard time
>adjusting distutils to whatever I wanted it to do. After
>you get used to the way things are handled in distutils,
>extending it is often enough really easy and would be
>much harder in your One Obvious Way to do it (unless
>you had a time-machine, zoom to 2042 and then take
>all possible ways to build distributions into account
>on your way back to 2005 ;-).

And your point of view is missing the part where everybody else isn't a 
distutils expert like you or me, and unlike you or me, has *no interest 
whatsoever in becoming one*.  Simple things should be simple, and 
distributing most packages shouldn't be rocket science.  In particular, 
there should be a gentle learning curve from "distribute one module" to 
"complex distribution with autogenerated bits not in source control".  And, 
the path for *how* you do those things should be laid out.

There are plenty of things that *are* Obvious use cases, but for which the 
Distutils Way is not obvious.  It's always *possible* to customize via 
subclassing, and I'm not suggesting that be disallowed.  But it shouldn't 
be necessary for the Obvious Way, and should be *required* for any 
deviation from the Obvious Way.  If you're going to deviate, you should be 
*aware* that you're on your own, and parting ways with the larger 
community.  You should be aware that you are potentially isolating yourself 
from the use of community tools based on that Way.  Currently, you can 
never be sure, because there *is* no Way.  Everybody has their own, and the 
result is chaos.

Ironically, although the Perl community's language philosophy is "more than 
one way to do it", their build and distribution philosophy seems to be that 
there's not merely one obvious way to do it, there's *exactly* one way to 
do it.  And *that* is the real reason why Perl has always been ahead of 
Python in readily-available libraries.  The Perl distribution culture 
reflects the idea that build tools are for sharing software with the 
community, not a framework for creating private build systems.
