[Distutils] MANIFEST destiny :)
Phillip J. Eby
pje at telecommunity.com
Wed Nov 16 18:53:40 CET 2005
At 11:49 AM 11/16/2005 +0100, M.-A. Lemburg wrote:
>I don't understand what you mean with "no-go" - the current
>system works just fine if you include the MANIFEST file
>in the sdist.
But it's not included, and you have to know to include it - and people who
previously requested here that MANIFEST and MANIFEST.in be included in the
manifest were shot down under the claim that including these defaults would
be "too much magic".
> > The reason I've been avoiding adding this feature, however, is because of
> > the first issue; when you make an sdist, you lose that additional
> > so it would become impossible to build *any* binary from an sdist, not
> > RPMs. Until recently, that issue seemed insurmountable.
>That's simply not true. You have to include the MANIFEST file
>in the sdist and then everything is fine.
I'm speaking about the context of setuptools, where the typical user has no
MANIFEST.in, because they're using Subversion or CVS and therefore don't
> > * Change the MANIFEST format to be platform-independent (currently it
> > contains OS-specific path separators)
>You are missing an important point: MANIFEST files can be
>built using tools outside distutils and external package
>building tools may require these to be platform dependent.
One reason I posted was to find out what specific uses people actually had for
such things. Do the uses actually exist? What are they used for? One
important point: on POSIX-y platforms (i.e. virtually everything but
Windows), there's no difference between distutils paths and system
paths. So, what external tools build or read MANIFEST files on Windows, or
any other platform that doesn't accept '/' as a separator?
Also, '/' *is* a valid path separator on Windows, so if your MANIFEST
processing is done in Python, a platform-independent format won't have any
effect. (Indeed, the only reason to actually *change* the format is so
that sdist files built *on Windows* would be usable on other
platforms! Other platforms would never see a difference.)
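To make the separator point concrete, here is a minimal sketch of what a platform-independent MANIFEST could look like in practice. The function names are hypothetical (nothing here is an existing distutils API), and it assumes no filename legitimately contains a backslash:

```python
from pathlib import PureWindowsPath


def to_portable(path):
    """Rewrite one manifest entry so it uses '/' as its separator."""
    # PureWindowsPath treats both '\\' and '/' as separators, so entries
    # written on Windows and entries written on POSIX come out the same.
    # Assumption: no filename legitimately contains a backslash.
    return PureWindowsPath(path).as_posix()


def normalize_manifest(lines):
    """Return the manifest entries with '/' separators, skipping blank lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if line:
            entries.append(to_portable(line))
    return entries
```

A manifest written on a POSIX-y platform passes through unchanged; only entries written with Windows separators are rewritten.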
I know that in principle, somebody somewhere may have some tool that would
break if MANIFEST changed its format. My question is, who and how
many? If there's some widely-used tool that does this, then it's
reasonable to take it into account. If it's just a theoretical
possibility, it's not worth much concern. If there are a handful of people
with a hard-to-change setup, it's somewhere in between.
> > * Always, always, always build MANIFEST, and always include both the
> > MANIFEST file and MANIFEST.in (if present) in the source distribution.
>-1 on always building MANIFEST.
>This would miss the point of managing MANIFEST files
>independently of your package files, e.g. using
>Makefiles or other tools dealing with file dependencies,
Please point me to some examples of this, especially one that can't simply
generate a MANIFEST.in instead.
> > * Disable all the options that allow user control over MANIFEST
> > including pruning, defaults, changing the filenames, etc.
>Again, you are forgetting that MANIFEST files serve a purpose
>and are external to the distutils process for a reason. You
>are free to have distutils build your MANIFEST files from
>MANIFEST.in files, have distutils command auto-generate them,
>or use external programs triggered by Makefiles or similar
>distribution building processes to generate them.
Examples? You keep saying that people *can* do these things, but that's
not anything like the same as saying that any significant number of people
*actually* do them. Frankly, most people I've encountered who are doing
Python software development don't know how to get a basic setup.py to work
right, and feel the distutils are way too complicated, underdocumented, or
just plain broken, because they don't feel like they can control them.
OSAF, for example, has some developers who are very smart about creating
build processes. Smart enough to be *able* to do the kinds of things
you're describing. But they sure as heck don't use the distutils to
*actually* do them, because from their perspective the distutils is a big
pile of broken undocumentedness. If the distutils are so frustrating to
such smart developers, there's something wrong.
Thus, I find these super-custom processes you're talking about highly
implausible, because the only people who could implement them are the
people with a strong knowledge of the distutils -- an incredibly rare breed
of person, in other words. Most people just want this stuff to work, and
they don't want to have to learn *how* it works. They have better things
to do with their time. They want a build *tool*, not a build library or
Most people don't know MANIFEST exists, until they get bitten by the need
to have one, or by it being out of date. Hell, look at how many packages
on PyPI and the Vaults aren't even packaged with distutils at all! For
many people, it's clearly easier to just tarball your source directory than
to have to learn about this MANIFEST stuff.
>In your world, everything is done within distutils, so it's
>understandable that you'd like to get rid of the external
>nature of MANIFEST files, but please keep in mind that
>these features are being used and removing the logic would
>seriously break things for packagers relying on other
>mechanisms to build their MANIFEST files.
Please point me to these developers, and show me one that couldn't just
spend a few minutes making their tools generate a MANIFEST.in
instead. I'm suggesting that we present a very small number of
highly-capable people with a truly minor inconvenience, in order to make an
extremely large group of people happier by taking away something that
invariably bites them.
> > * Use the MANIFEST data (along with revision control info) not only for
> > producing source distributions, but also to determine what files should be
> > considered "package data", if the user passes an
> > 'include_package_data=True' keyword to setup().
>Isn't that already the case ? I mean you can put anything
>you like into MANIFEST and it will be included in the sdist.
I'm talking about package data - a feature that was pioneered in setuptools
and added to the distutils in Python 2.4. The ability to specify data
files that are *installed* in a package directory.
Specifically, I'm suggesting that users be able to replace the package_data
setup keyword with a simple include_package_data flag, so that the MANIFEST
data can be used to determine what data files to install. This has nothing
to do with sdist; I'm talking about being able to unify the distutils' idea
of what files are part of the distribution, because for the simple cases
that are 90% of projects, that's a very useful thing to have.
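As a rough illustration of the idea (not the actual implementation), deriving package data from the manifest could look something like the sketch below. The function name is hypothetical, and it assumes '/'-separated manifest entries and packages that live directly under the project root, with no package_dir remapping:

```python
import posixpath


def package_data_from_manifest(manifest_entries, packages):
    """Map each package name to the non-Python manifest files inside it."""
    # Package directory -> package name, e.g. "foo/bar" -> "foo.bar".
    # Assumption: packages sit directly under the project root.
    pkg_dirs = {pkg.replace(".", "/"): pkg for pkg in packages}
    result = {pkg: [] for pkg in packages}
    for entry in manifest_entries:
        if entry.endswith(".py"):
            continue  # modules are installed anyway; only data files matter
        directory = posixpath.dirname(entry)
        # Walk upward to the nearest enclosing package directory, if any.
        while directory and directory not in pkg_dirs:
            directory = posixpath.dirname(directory)
        if directory in pkg_dirs:
            # Store the path relative to the package directory.
            result[pkg_dirs[directory]].append(entry[len(directory) + 1:])
    return result
```

Files outside any package directory (a top-level README, say) are simply ignored here, since they are not package data.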
For every SciPy and mxODBC and Twisted, there are easily a hundred packages
without anything like their level of complex build needs. I'm talking
about streamlining things for the simple packages, but there isn't anything
in what I'm proposing that keeps anybody from doing complex things. I'm
just saying we should remove *multiple* ways to do the *same* complex
thing. We should pick One Obvious Way to *customize*, replacing multiple
hooks with single hooks that allow the same degree of customizability.
For example, if you have tools that generate something, right now you could
choose to generate either MANIFEST or MANIFEST.in. I'm suggesting that we
should choose for you, and say it's MANIFEST.in that you should generate,
since it's the more expressive and flexible of the two formats. Similarly,
you can currently choose what filenames MANIFEST and MANIFEST.in have, and
I'm suggesting that those be their only names. If you need other files you
certainly have the option of copying, renaming, and using other existing
file manipulation tools.
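For a tool that already knows which files it wants shipped, emitting MANIFEST.in rather than MANIFEST is a small change. A minimal sketch, assuming the tool has a plain list of relative paths; the function name is hypothetical, and only the standard "include" template command is used:

```python
def render_manifest_in(paths):
    """Render a MANIFEST.in body with one 'include' line per path."""
    # MANIFEST.in patterns use '/' separators, so normalize any backslashes.
    # Assumption: paths contain no whitespace, since 'include' splits its
    # arguments on whitespace.
    unique = sorted(set(p.replace("\\", "/") for p in paths))
    return "".join("include %s\n" % p for p in unique)
```

The result can be written straight to a file named MANIFEST.in next to setup.py before running sdist.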
These additional degrees of built-in "freedom" just give you *meaningless*
choices. They don't make any *real* difference to your ability to get the
job done. Instead, they raise a *barrier* to creating tools. If somebody
creates a tool to do something with manifest files, you can't use it unless
you and they have already agreed to make the *same* choices about these
superfluous options, or else the tool maker has to support all possible
options - and sometimes, a useful tool that supports all the options just
See, it wouldn't matter if the arbitrary choice was that you have to
generate MANIFEST instead of MANIFEST.in. You'd still have the same
flexibility. The important thing in these arbitrary choices is that *we
pick one*. That's why we have a BDFL for the language - we all argue for a
particular surface syntax, and then he picks one, and we all move on. The
distutils needs a BDFL to pick between all the mutually incompatible but
semantically indistinguishable surface syntaxes for how to build and
>Again, not everybody is using distribution processes
>built around CVS or Subversion. Left aside that there
>are quite a few other SCM tools out there, you also have
>the case where you create distributions from plain
>directories (which is what MANIFEST.in and MANIFEST
The source control is a supplement to the "add_defaults" of the sdist
process, not a replacement for MANIFEST.in. Some users need to add
non-source-controlled files, for example. But *simple things should be
simple*, and treating source control info as part of the defaults makes
them work simply for most people. And if you don't want the defaults,
there should be only one way to turn them off - by excluding files in
MANIFEST.in, not by a command line option.
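In other words, the model I have in mind is roughly the following. This is a deliberately simplified sketch with a hypothetical function name: it uses fnmatch-style patterns (where '*' also matches '/') rather than the real MANIFEST.in template language, and it applies all exclusions before any explicit inclusions instead of processing commands in order:

```python
from fnmatch import fnmatchcase


def build_file_list(defaults, scm_files, includes=(), exclude_patterns=()):
    """Combine default and source-controlled files, then apply exclusions."""
    # Source control supplements the defaults; it does not replace them.
    candidates = set(defaults) | set(scm_files)
    # Exclusion patterns are the single way to switch defaults off.
    kept = set(
        f for f in candidates
        if not any(fnmatchcase(f, pat) for pat in exclude_patterns)
    )
    # Explicit includes cover files that are not under source control.
    kept.update(includes)
    return sorted(kept)
```

With no patterns at all, the simple case stays simple: you get the defaults plus whatever is under source control.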
> > I'm also thinking that most of the MANIFEST logic could and should move to
> > the Distribution class, since the data will be used by multiple
> > commands. Thus, the sdist command could just ask the Distribution for the
> > MANIFEST and get it, as would the commands that copy package data files to
> > the build directory.
>Wait: MANIFEST defines what goes into the sdist - not an
>arbitrary (binary) distribution.
I'm talking about the include_package_data option.
>It would certainly make sense to have the MANIFEST[.in] files
>automatically be added as default in sdist.py and I'd be
>+1 on that (even though it never was an issue for me as I
>always include them in the MANIFEST file).
Interesting; I could've sworn that you were one of the people who told
somebody it would be "too much magic" to include this. But whatever the
case, I'm glad you don't oppose it now.
> > --no-defaults would be ignored, except for a warning. If you don't want
> > the defaults, you can always start your MANIFEST.in with an exclude
> > to exclude absolutely everything already included. There shouldn't be two
> > ways to do the same thing, especially not one that you can use on the
> > command line to mess things up in a non-repeatable fashion! Likewise
> > --no-prune, because that's a similar recipe for disaster.
>These options are meant for people who don't have a
>MANIFEST.in file to begin with or just quickly want to
>build an sdist with parts of the whole distribution or
>an extended version (e.g. for testing or upgrading).
Sure - and there are plenty of things I can leave some room for play
in. For example, I could simply revert to old behaviors when you use any
non-default options. I could make a separate MANIFEST.setuptools file,
etc. But these things add complexity, so I want to know who *actually*
needs them, and can't trivially work around their absence. I'd rather
briefly inconvenience distutils mavens like you, than continue to stump and
frustrate the hundreds of people who just don't get why it's all so damn
> > Heck, most of the distutils' flaws lie in their extreme versatility.
>That comment is just silly: distutils is so powerful because
>of its versatility. You wouldn't have been able to build setuptools
>without this versatility.
You're confusing a well-factored framework with user-level
versatility. Using variables instead of hardcoding filenames internally is
a very good idea. Exposing those variables for users to change (in the
absence of concrete use cases), however, is just bad UI design and a lack
of social awareness.
>That's your point of view - I've never had a hard time
>adjusting distutils to whatever I wanted it to do. After
>you get used to the way things are handled in distutils,
>extending it is often enough really easy and would be
>much harder in your One Obvious Way to do it (unless
>you had a time-machine, zoom to 2042 and then take
>all possible ways to build distributions into account
>on your way back to 2005 ;-).
And your point of view is missing the part where everybody else isn't a
distutils expert like you or me, and unlike you or me, has *no interest
whatsoever in becoming one*. Simple things should be simple, and
distributing most packages shouldn't be rocket science. In particular,
there should be a gentle learning curve from "distribute one module" to
"complex distribution with autogenerated bits not in source control". And,
the path for *how* you do those things should be laid out.
There are plenty of things that *are* Obvious use cases, but for which the
Distutils Way is not obvious. It's always *possible* to customize via
subclassing, and I'm not suggesting that be disallowed. But it shouldn't
be necessary for the Obvious Way, and should be *required* for any
deviation from the Obvious Way. If you're going to deviate, you should be
*aware* that you're on your own, and parting ways with the larger
community. You should be aware that you are potentially isolating yourself
from the use of community tools based on that Way. Currently, you can
never be sure, because there *is* no Way. Everybody has their own, and the
result is chaos.
Ironically, although the Perl community's language philosophy is "more than
one way to do it", their build and distribution philosophy seems to be that
there's not merely one obvious way to do it, there's *exactly* one way to
do it. And *that* is the real reason why Perl has always been ahead of
Python in readily-available libraries. The Perl distribution culture
reflects the idea that build tools are for sharing software with the
community, not a framework for creating private build systems.