[Distutils] distutils data_files and setuptools.pkg_resources are driving me crazy

Robin Bryce robinbryce at gmail.com
Fri Jul 14 01:56:00 CEST 2006


> Shove the data in a foopackagedata package
> which goes in a separate egg and throw a trivial __init__.py in there.

Yes, exactly. And I get the separate zip_safe as an added bonus.

> What compelling argument?

Ok, I should have said "I think there _may_ be a compelling argument."
I'll have a go:

In another thread, "best practices for creating eggs", Paul Moore said:

>I work offline sufficiently often that not having local documentation
>is frustrating. There's no standard for local docs, which is a
>nuisance, and makes for an inconsistent story between different
>packages, but I'd be concerned if setuptools made it more difficult to
> bundle local docs.

and Kevin Dangoor said:

>Being able to get at a package's documentation after it's
>installed would be nice.

Phil said:

> Note that I said above that I always put the documentation in an sdist
> form; to obtain a package's source distribution, use:
>
>    easy_install -e -b somedir arg...
<snip>
> A standard for how to install documentation would be great

I think the motivation to always package docs in the sdist and arrange for
egg and sdist downloads to appear together is really a symptom of a
deficiency in how data, eggs and pkg_resources interact. All things
considered, sdist is the easiest thing for the packager: it's a few runes
in MANIFEST.in followed by python setup.py sdist. But you lose the ability
to reference that data using pkg_resources.

If I put my docs, data, etc. in their own egg I get some of this back.
Something like ``package_dir = {'foopackage': './'}``
might be enough to fake me up a package to keep setuptools happy but
if not I'll create one. In this instance I don't care about having an
overly large distribution because the user has actively chosen to
install the docs.
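
A minimal sketch of what the two distributions might look like, assuming
the data lives in a package called foopackagedata (the name suggested in
the quoted reply) and that the code pulls it in through setuptools'
extras_require. The version numbers, file patterns, the "data" extra name
and the run_setup helper are all made up for illustration, and the
zip_safe values simply follow the suggestion in the quoted reply; swap
them if your data is the zip-safe half.

```python
# Sketch of two setup() configurations; every name and pattern below is an
# illustrative assumption, not taken from a real project.

# Data-only distribution: one package with a trivial __init__.py plus files.
DATA_EGG = dict(
    name="foopackagedata",
    version="0.1",
    packages=["foopackagedata"],
    package_data={"foopackagedata": ["docs/*.txt", "conf/*.conf"]},
    zip_safe=False,  # installed unpacked, as a directory tree
)

# Code distribution: would live in its own setup.py in practice.
CODE_EGG = dict(
    name="foopackage",
    version="0.1",
    packages=["foopackage"],
    zip_safe=True,
    # Optional dependency: only installed when the "data" extra is requested.
    extras_require={"data": ["foopackagedata"]},
)


def run_setup(kwargs):
    """Hand one of the configurations above to setuptools.

    A real setup.py would end with e.g. run_setup(DATA_EGG); it is not
    called here so the sketch can be imported without side effects.
    """
    from setuptools import setup

    setup(**kwargs)
```

With something like this in place, a user who wants the docs would ask for
the extra explicitly, e.g. easy_install "foopackage[data]".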

But all of that is less than ideal. AFAICT, if pkg_resources supported
egg-root-relative resource_names, I would be able to do anything I
want, up to and including writing egg format extensions that do more
specialist things with data packaged in *eggs*.
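
To make the distinction concrete, here is a rough standard-library sketch
of the two lookup styles. It does not use pkg_resources and makes no claim
about how pkg_resources treats a leading slash; read_resource,
build_demo_egg and the file names are hypothetical, and the "egg" is just
an in-memory zip file.

```python
import io
import posixpath
import zipfile


def build_demo_egg():
    """Return an in-memory zip laid out like a small egg (contents invented)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as egg:
        egg.writestr("foopackage/__init__.py", "")
        egg.writestr("foopackage/defaults.conf", "scope = package\n")
        egg.writestr("conf/foo.conf", "scope = egg root\n")
    buf.seek(0)
    return zipfile.ZipFile(buf)


def read_resource(egg, package, resource_name):
    """Hypothetical lookup: a leading '/' means 'relative to the egg root'.

    Without the leading slash, the name is resolved inside the package's
    own directory (the package-relative convention).
    """
    if resource_name.startswith("/"):
        member = resource_name.lstrip("/")
    else:
        member = posixpath.join(package.replace(".", "/"), resource_name)
    member = posixpath.normpath(member)
    # Refuse names that would climb out of the archive.
    if member == ".." or member.startswith("../"):
        raise ValueError("resource name escapes the egg: %r" % resource_name)
    return egg.read(member).decode("utf-8")


egg = build_demo_egg()
print(read_resource(egg, "foopackage", "defaults.conf"))   # package-relative
print(read_resource(egg, "foopackage", "/conf/foo.conf"))  # egg-root-relative
```

The only design decision in the sketch is that the leading slash selects
the egg root; everything else is ordinary zip member lookup.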


There is a lot I would *like* to do with eggs as a distribution
format, beyond packaging python source. I can't see a sane way to go
about this in the absence of a consistent way to reference *all* of
the data I have put in my egg.

I would really like to know why pkg_resources('foopackage',
'/conf/foo.conf') is not interpreted as relative to the egg root.


Robin

On 13/07/06, Bob Ippolito <bob at redivi.com> wrote:
>
> On Jul 13, 2006, at 12:52 PM, Robin Bryce wrote:
>
> > Hi,
> >
> > [using setuptools 0.6b4]
> >
> > Is it possible to have a separate 'zip_safe' decision for data files
> > versus python packages? I.e., a deployed egg with data files and
> > non-zip-safe packages would appear in site-packages (or wherever) as
> > both a zip archive for the zip-safe data AND a directory tree
> > containing the 'eager' resources?
>
> Make separate eggs. One for the data, one for the code. The code egg
> can be zip_safe and the data one not.
>
> > I would very much prefer the machinery for including data files
> > in a package to be orthogonal to the source building/packaging
> > machinery.
>
> I highly doubt that's what most people want to bother with.
>
> > I think there is a compelling argument that says complex data should
> > be explicitly packaged separately. I.e., if foopackage had non-trivial
> > data then I, as the package author, should create and distribute
> > foopackage.egg and foopackage.data.egg as separate things.
> > foopackage.egg would require foopackage.data and would, as an
> > additional benefit, be free to use existing setuptools machinery to
> > separate data versions from package versions. In fact, to argue
> > completely against the thrust of this mail I'm now thinking along the
> > lines of:
>
> What compelling argument?
>
> > - *never* package data in the same egg as the application or library
> > - *always* create a separate foopackage-data package, even if it has
> > no python source in it beyond setup.py and even if the data is
> > trivial.
> > - use the optional dependencies mechanism to pull data in as needed.
>
> That's not really convenient for most people, but if that's what you
> want to do then go ahead. Shove the data in a foopackagedata package
> which goes in a separate egg and throw a trivial __init__.py in there.
>
> -bob
>
>

