[Distutils] MANIFEST destiny :)

M.-A. Lemburg mal at egenix.com
Thu Nov 17 00:19:07 CET 2005


Hi Phillip,

In general, I think your focus here is different from
what distutils is trying to be, and that's perfectly
OK - you can implement all these nice strategies and
automated decisions in your setuptools.

I just don't see a benefit in stripping down the
distutils framework itself.

If people choose setuptools as front-end to distutils
that's a perfectly good choice and one I'd like to
encourage.

Note that distutils would benefit a lot from more support
for e.g. InnoSetup, NSIS, native packages for Solaris,
HP-UX, Debian.

There are a few projects out there trying to add this
support, but few have stepped forward to suggest integration
with the core framework.

The issues around MANIFEST[.in] that you present are
really minor compared to not being able to build e.g.
Debian packages out of the box and without too much
user interaction.

Phillip J. Eby wrote:
> At 11:49 AM 11/16/2005 +0100, M.-A. Lemburg wrote:
> 
>> I don't understand what you mean by "no-go" - the current
>> system works just fine if you include the MANIFEST file
>> in the sdist.
> 
> 
> But it's not included, and you have to know to include it - and people
> who previously requested here that MANIFEST and MANIFEST.in be included
> in the manifest were shot down under the claim that including these
> defaults would be "too much magic".

I don't see that as a big problem. Why not include it by
default like README and the others ?!

>> > The reason I've been avoiding adding this feature, however, is
>> > because of the first issue; when you make an sdist, you lose that
>> > additional metadata, so it would become impossible to build *any*
>> > binary from an sdist, not just RPMs.  Until recently, that issue
>> > seemed insurmountable.
>>
>> That's simply not true. You have to include the MANIFEST file
>> in the sdist and then everything is fine.
> 
> 
> I'm speaking about the context of setuptools, where the typical user has
> no MANIFEST.in, because they're using Subversion or CVS and therefore
> don't need one.

Right, and that's a different context than the one needed
for the core framework distutils itself.

>> > * Change the MANIFEST format to be platform-independent (currently it
>> > contains OS-specific path separators)
>>
>> -0.5
>>
>> You are missing an important point: MANIFEST files can be
>> built using tools outside distutils and external package
>> building tools may require these to be platform dependent.
> 
> 
> One reason I posted to find out what specific uses people actually had
> for such things.  Do the uses actually exist?  What are they used for? 

eGenix for one uses its own file selection mechanism (mostly
for historical reasons because we had our own packaging system
before we switched to distutils).

The MANIFEST.in format is not everybody's favorite, so I expect
others to use more common tools such as e.g. Unix find, sed
or even just a plain text editor.

We also manage the MANIFEST files using Makefiles which
take care of the build process, do the checkouts, copies,
rsyncs, etc. needed for remote builds.
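
As a rough illustration of building a MANIFEST outside distutils, here is a minimal Python sketch of what a find-style file selection might look like. It is not the eGenix tooling: the function name write_manifest and the skip_dirs parameter are assumptions made up for this example, and it simply lists every file under a directory, one relative path per line.

```python
import os


def write_manifest(root, manifest_name="MANIFEST", skip_dirs=(".svn", "CVS")):
    """Write one relative path per line for every file found under root.

    Hypothetical helper: mimics a plain `find`-style selection, nothing more.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Do not list a MANIFEST left over from an earlier run.
            if rel != manifest_name:
                entries.append(rel)
    entries.sort()
    with open(os.path.join(root, manifest_name), "w") as fh:
        fh.write("\n".join(entries) + "\n")
    return entries
```

The paths are written with the separator of the platform the script runs on, which is the same platform dependence discussed in this thread.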

> One important point: on POSIX-y platforms (i.e. virtually everything but
> Windows), there's no difference between distutils paths and system
> paths.  So, what external tools build or read MANIFEST files on Windows,
> or any other platform that doesn't accept '/' as a separator?
> 
> Also, '/' *is* a valid path separator on Windows, so if your MANIFEST
> processing is done in Python, a platform-independent format won't have
> any effect.  (Indeed, the only reason to actually *change* the format is
> so that sdist files built *on Windows* would be usable on other
> platforms!  Other platforms would never see a difference.)
> 
> I know that in principle, somebody somewhere may have some tool that
> would break if MANIFEST changed its format.  My question is, who and
> how many?  If there's some widely-used tool that does this, then it's
> reasonable to take it into account.  If it's just a theoretical
> possibility, it's not worth much concern.  If there are a handful of
> people with a hard-to-change setup, it's somewhere in between.

The question should not be: how many setups can I break ?

It should be: what do we gain by using a single format,
e.g. the posix one and how can we avoid breakage ?

Note that distutils knows how to transform posix file
names into platform dependent ones.
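
The helper in question is presumably distutils.util.convert_path. Since distutils is not available on every Python installation, the following is a small stand-alone sketch of the same idea rather than the distutils code itself. The function name to_native and the pathmod parameter are assumptions added here so that both target platforms can be shown on one machine.

```python
import ntpath
import posixpath


def to_native(posix_name, pathmod):
    """Convert a relative '/'-separated name using the given path module.

    pathmod is ntpath or posixpath here; in real use it would be os.path.
    """
    if not posix_name:
        return posix_name
    if posix_name.startswith("/") or posix_name.endswith("/"):
        raise ValueError("expected a relative file name: %r" % posix_name)
    # Drop '.' components, then rejoin with the platform's separator.
    parts = [part for part in posix_name.split("/") if part != "."]
    if not parts:
        return pathmod.curdir
    return pathmod.join(*parts)


windows_name = to_native("docs/api/index.txt", ntpath)
posix_name = to_native("docs/api/index.txt", posixpath)
```

With a conversion like this applied when the file is read, a MANIFEST stored with posix separators could still be used on any platform.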

>> > * Always, always, always build MANIFEST, and always include both the
>> > MANIFEST file and MANIFEST.in (if present) in the source distribution.
>>
>> -1 on always building MANIFEST.
>>
>> This would miss the point of managing MANIFEST files
>> independently of your package files, e.g. using
>> Makefiles or other tools dealing with file dependencies,
>> checkouts, etc.
> 
> 
> Please point me to some examples of this, especially one that can't
> simply generate a MANIFEST.in instead.

See above.

Tools like "find" are simply much more complete in terms of file
selection. It is also sometimes necessary to massage the paths
a bit, using e.g. sed.

>> > * Disable all the options that allow user control over MANIFEST
>> > generation, including pruning, defaults, changing the filenames, etc.
>>
>> -1
>>
>> Again, you are forgetting that MANIFEST files serve a purpose
>> and are external to the distutils process for a reason. You
>> are free to have distutils build your MANIFEST files from
>> MANIFEST.in files, have distutils command auto-generate them,
>> or use external programs triggered by Makefiles or similar
>> distribution building processes to generate them.
> 
> 
> Examples?  You keep saying that people *can* do these things, but that's
> not anything like the same as saying that any significant number of
> people *actually* do them.  Frankly, most people I've encountered who
> are doing Python software development don't know how to get a basic
> setup.py to work right, and feel the distutils are way too complicated,
> underdocumented, or just plain broken, because they don't feel like they
> can control them.

It's underdocumented, yes, but getting a simple setup.py
to work is really not all that complicated - and this is
underlined by the fact that most Python packages nowadays
are distributed as distutils-based packages.

> OSAF, for example, has some developers who are very smart about creating
> build processes.  Smart enough to be *able* to do the kinds of things
> you're describing.  But they sure as heck don't use the distutils to
> *actually* do them, because from their perspective the distutils is a
> big pile of broken undocumentedness.  If the distutils are so
> frustrating to such smart developers, there's something wrong.

The code itself is well documented and easy to read. It should
be well within range of every average Python programmer.

Furthermore, you only need to dig into distutils if you
plan to extend or modify its default functionality in
some way. The casual user does not have to read the
sources.

> Thus, I find these super-custom processes you're talking about highly
> implausible, because the only people who could implement them are the
> people with a strong knowledge of the distutils -- an incredibly rare
> breed of person, in other words. 

I was only talking about special ways to build the MANIFEST
files, not "super-custom" processes. No idea where you
got that impression from.

> Most people just want this stuff to
> work, and they don't want to have to learn *how* it works.  They have
> better things to do with their time.  They want a build *tool*, not a
> build library or build framework.

distutils does work for these people. The many existing
packages using distutils are proof enough, IMHO.

Of course, you can always do better if you have more
specific requirements such as your CVS/Subversion
integration.

Those people should then use your setuptools front-end.
I don't see that as a problem.

> Most people don't know MANIFEST exists, until they get bitten by the
> need to have one, or by it being out of date.  Hell, look at how many
> packages on PyPI and the Vaults aren't even packaged with distutils at
> all! 

Not that many... :-)

> For many people, it's clearly easier to just tarball your source
> directory than to have to learn about this MANIFEST stuff.

I agree that this feature is underdocumented, but changing
the framework won't help with this: documentation patches
are what we *really* need !

BTW, not many people need to have these MANIFEST files
at all - distutils uses a built-in file finder based on these
defaults:

          - README or README.txt
          - setup.py
          - test/test*.py
          - all pure Python modules mentioned in setup script
          - all C sources listed as part of extensions or C libraries
            in the setup script (doesn't catch C headers!)
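
To make the list above concrete, here is a minimal sketch of such a default file finder. It is not the sdist implementation: the function name default_sdist_files is made up, the py_modules and c_sources arguments stand in for whatever the setup script declares, and taking only the first README variant found is an assumption.

```python
import glob
import os


def default_sdist_files(root, py_modules=(), c_sources=()):
    """Collect the default files listed above, relative to root."""
    found = []
    # Assumption: the first README variant that exists is used.
    for name in ("README", "README.txt"):
        if os.path.isfile(os.path.join(root, name)):
            found.append(name)
            break
    if os.path.isfile(os.path.join(root, "setup.py")):
        found.append("setup.py")
    # test/test*.py
    tests = glob.glob(os.path.join(root, "test", "test*.py"))
    found.extend(sorted(os.path.relpath(path, root) for path in tests))
    # Files named in the setup script; C headers are not picked up.
    found.extend(py_modules)
    found.extend(c_sources)
    return found
```

A project that needs nothing beyond this list can do without MANIFEST and MANIFEST.in entirely.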

>> In your world, everything is done within distutils, so it's
>> understandable that you'd like to get rid of the external
>> nature of MANIFEST files, but please keep in mind that
>> these features are being used and removing the logic would
>> seriously break things for packagers relying on other
>> mechanisms to build their MANIFEST files.
> 
> 
> Please point me to these developers, and show me one that couldn't just
> spend a few minutes  making their tools generate a MANIFEST.in instead. 
> I'm suggesting that we present a very small number of highly-capable
> people with a truly minor inconvenience, in order to make an extremely
> large group of people happier by taking away something that invariably
> bites them.

See above.

Carelessly overwriting hand-edited or otherwise generated
files in a build process is simply bad design.

If a MANIFEST file exists, it should be left untouched. If no such
file exists but there is a MANIFEST.in, the MANIFEST should be built
from it. If there's no MANIFEST.in either, use a set of sane defaults
determined by introspection of the setup.py details.

This is what distutils does.
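
The three rules above can be sketched as a small decision function. This only restates the rules as given in this message; the real sdist command has further options and checks that are not modeled here, and the function name and return strings are made up for illustration.

```python
import os


def manifest_strategy(root, manifest="MANIFEST", template="MANIFEST.in"):
    """Return which of the three rules applies to the directory root."""
    if os.path.isfile(os.path.join(root, manifest)):
        # A hand-edited or externally generated MANIFEST is never overwritten.
        return "use-existing"
    if os.path.isfile(os.path.join(root, template)):
        # No MANIFEST yet: build it from the template.
        return "build-from-template"
    # Neither file exists: fall back to defaults derived from setup.py.
    return "use-defaults"
```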

>> > * Use the MANIFEST data (along with revision control info) not only for
>> > producing source distributions, but also to determine what files
>> > should be considered "package data", if the user passes an
>> > 'include_package_data=True' keyword to setup().
>>
>> Isn't that already the case ? I mean you can put anything
>> you like into MANIFEST and it will be included in the sdist.
> 
> 
> I'm talking about package data - a feature that was pioneered in
> setuptools and added to the distutils in Python 2.4.  The ability to
> specify data files that are *installed* in a package directory.
> 
> Specifically, I'm suggesting that users be able to replace the
> package_data setup keyword with a simple include_package_data flag, so
> that the MANIFEST data can be used to determine what data files to
> install.  This has nothing to do with sdist; I'm talking about being
> able to unify the distutils' idea of what files are part of the
> distribution, because for the simple cases that are 90% of projects,
> that's a very useful thing to have.
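
A hypothetical sketch of the idea in the quoted proposal, namely deriving package data from the manifest entries, might look as follows. It is not how setuptools implements include_package_data; the function name is invented, and the rule "every non-.py manifest entry inside a package directory counts as package data" is an assumption made for this example.

```python
def package_data_from_manifest(manifest_entries, package_dirs):
    """Map each package directory to the non-.py manifest files inside it.

    Entries and directories are '/'-separated paths relative to setup.py.
    """
    result = {pkg: [] for pkg in package_dirs}
    # Check longer directories first so nested packages claim their own files.
    ordered = sorted(package_dirs, key=len, reverse=True)
    for entry in manifest_entries:
        if entry.endswith(".py"):
            continue  # modules are handled by the normal build step
        for pkg in ordered:
            prefix = pkg.rstrip("/") + "/"
            if entry.startswith(prefix):
                result[pkg].append(entry[len(prefix):])
                break
    return result
```

Files outside any package directory, such as a top-level README, are ignored, so the manifest can keep listing sdist-only files.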

Perhaps you should then enhance the sdist way of finding
suitable defaults - it currently does not take package_data
files into account.

MANIFEST is only used for source code distributions. I don't
see how you can use it for anything else. See e.g. the way
bdist_rpm works: it actually installs the package to find
out which files are actually installed and then records all
the files copied during that process - that's a very smart,
future proof and flexible design.

> For every SciPy and mxODBC and Twisted, there are easily a hundred
> packages without anything like their level of complex build needs.  I'm
> talking about streamlining things for the simple packages, but there
> isn't anything in what I'm proposing that keeps anybody from doing
> complex things.  I'm just saying we should remove *multiple* ways to do
> the *same* complex thing.  We should pick One Obvious Way to
> *customize*, replacing multiple hooks with single hooks that allow the
> same degree of customizability.

I don't buy this: on one hand you are talking about simple
packages (which don't need the MANIFEST files in the
first place), on the other about hooks to adjust distutils'
build process, something I'd group under more complex
setups.

> For example, if you have tools that generate something, right now you
> could choose to generate either MANIFEST or MANIFEST.in.  I'm suggesting
> that we should choose for you, and say it's MANIFEST.in that you should
> generate, since it's the more expressive and flexible of the two
> formats.  Similarly, you can currently choose what filenames MANIFEST
> and MANIFEST.in have, and I'm suggesting that those be their only
> names.  If you need other files you certainly have the option of
> copying, renaming, and using other existing file manipulation tools.
> 
> These additional degrees of built-in "freedom" just give you
> *meaningless* choices.  They don't make any *real* difference to your
> ability to get the job done.  Instead, they raise a *barrier* to
> creating tools.  If somebody creates a tool to do something with
> manifest files, you can't use it unless you and they have already agreed
> to make the *same* choices about these superfluous options, or else the
> tool maker has to support all possible options - and sometimes, a useful
> tool that supports all the options just isn't *possible*.

So your point is to make your life as setuptools author
easier ?

Why don't you just disable all these options in your
setuptools front-end and hard-code the MANIFEST file
names ?

> See, it wouldn't matter if the arbitrary choice was that you have to
> generate MANIFEST instead of MANIFEST.in.  You'd still have the same
> flexibility.  The important thing in these arbitrary choices is that *we
> pick one*.  That's why we have a BDFL for the language - we all argue
> for a particular surface syntax, and then he picks one, and we all move
> on.  The distutils needs a BDFL to pick between all the mutually
> incompatible but semantically indistinguishable surface syntaxes for how
> to build and package something.

distutils is a loosely coupled framework of components.
In such a framework, a basic design principle is to
be able to decouple and recouple existing components.
The only way to implement this is by making the components
suitably independent and this is what was done in
distutils.

Note that adding user options to change certain assumptions
or defaults does not count towards having "multiple
ways to get something done" - it just gives the user
a possibility to adapt the framework to a particular
need and on a case-by-case basis.

Also note that it's not hard for setuptools or
any other front-end to access these user options -
just ask the component for them.

>> Again, not everybody is using distribution processes
>> built around CVS or Subversion. Left aside that there
>> are quite a few other SCM tools out there, you also have
>> the case where you create distributions from plain
>> directories (which is what MANIFEST.in and MANIFEST
>> are targeting).
> 
> 
> The source control is a supplement to the "add_defaults" of the sdist
> process, not a replacement for MANIFEST.in.  Some users need to add
> non-source-controlled files, for example.  But *simple things should be
> simple*, and treating source control info as part of the defaults makes
> them work simply for most people.  And if you don't want the defaults,
> there should be only one way to turn them off - by excluding files in
> MANIFEST.in, not by a command line option.

If you don't want the defaults added, you are requesting
a change in the way distutils works. Such a change should
be done using the command line switch --no-defaults
(or added to setup.cfg).

MANIFEST.in OTOH is really only needed in case you plan
to add non-standard files to your source distribution.

You are not changing the way distutils itself works -
just tell it to add a few more things that you might
need or that you might not want in the distribution.

>> > I'm also thinking that most of the MANIFEST logic could and should
>> > move to the Distribution class, since the data will be used by
>> > multiple commands.  Thus, the sdist command could just ask the
>> > Distribution for the MANIFEST and get it, as would the commands that
>> > copy package data files to the build directory.
>>
>> Wait: MANIFEST defines what goes into the sdist - not an
>> arbitrary (binary) distribution.
> 
> 
> I'm talking about the include_package_data option.
> 
> 
>> It would certainly make sense to have the MANIFEST[.in] files
>> automatically be added as default in sdist.py and I'd be
>> +1 on that (even though it never was an issue for me as I
>> always include them in the MANIFEST file).
> 
> 
> Interesting; I could've sworn that you were one of the people who told
> somebody it would be "too much magic" to include this.  But whatever the
> case, I'm glad you don't oppose it now.
>
>> > --no-defaults would be ignored, except for a warning.  If you don't
>> > want the defaults, you can always start your MANIFEST.in with an
>> > exclude pattern to exclude absolutely everything already included.
>> > There shouldn't be two ways to do the same thing, especially not one
>> > that you can use on the command line to mess things up in a
>> > non-repeatable fashion!  Likewise --no-prune, because that's a
>> > similar recipe for disaster.
>>
>> These options are meant for people who don't have a
>> MANIFEST.in file to begin with or just quickly want to
>> build an sdist with parts of the whole distribution or
>> an extended version (e.g. for testing or upgrading).
> 
> 
> Sure - and there are plenty of things I can leave some room for play
> in.  For example, I could simply revert to old behaviors when you use
> any non-default options.  I could make a separate MANIFEST.setuptools
> file, etc.  But these things add complexity, so I want to know who
> *actually* needs them, and can't trivially work around their absence. 
> I'd rather briefly inconvenience distutils mavens like you, than
> continue to stump and frustrate the hundreds of people who just don't
> get why it's all so damn complicated.

Look, nobody stops you from removing all these features
in your front-end. distutils lets you do all this and that's
what's so great about it.

My point is that you shouldn't try to strip down distutils
itself just because you think it's hard work to support
all these features in setuptools. There's no need to strip
down distutils for this reason, as you can easily disable
these options for anyone using your setuptools.

As a result, both users of setuptools and straight
distutils are happy.

Cheers,
-- 
Marc-Andre Lemburg
eGenix.com
