distutils data_files and setuptools.pkg_resources are driving me crazy
Hi,

[using setuptools 0.6b4]

I'm a setuptools user and greatly appreciative of it as well. I'd like to understand how to use it more appropriately with respect to bundling miscellaneous data files. "Just put them in python packages" is really not what I want, but perhaps it's time to refactor my tastes and accept this is the most appropriate thing to do.

I think the root of my confusion lies in how I perceive the 'pkg_' prefix of the setuptools pkg_resources API. I think of it in the general sense of the word package: data I have 'packaged' along with my library or application. Experience suggests it would be better read in the Python-specific sense of 'package': data that is contained in a file that is a descendant of a python package (the file must be in or under a directory that contains an __init__.py).

Which of these two views most accurately reflects the usage for which pkg_resources was designed?

Is it possible to have a separate 'zip_safe' decision for data files versus python packages? I.e., a deployed egg with data files and non zip safe packages would appear in site-packages (or wherever) as both a zip archive for the zip safe data AND a directory tree containing the 'eager' resources?

As context for the rest of this plea for help, a trivial layout that bundles a .conf file::

    setuptools-test
        lib/foopackage/__init__.py
        lib/foopackage/foo.py
        conf/foo.conf
        setup.py

Versions etc.: Ubuntu (dapper), Python 2.4.3, setuptools-0.6b4. My python is installed with the prefix /home/robin/devel/0root. It is a 'proper' install rather than a virtual-python.py setup.

Is there a specific reason why there isn't a find_data_files to complement find_packages in setuptools? E.g., ``data_files=[('/', find_data_files('*.conf'))]`` spells: recursively find all .conf files, starting in the directory containing my setup.py, and bundle them in my egg root.
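For illustration only, here is a minimal sketch of what such a helper might look like. ``find_data_files`` is hypothetical (setuptools does not provide it), and the signature, the ``exclude`` parameter and the choice to return ``(directory, [files])`` pairs rather than a flat list are all assumptions made here so that the relative layout (``conf/foo.conf``) survives. It is written for a current Python rather than the 2.4 used in this thread.

```python
import fnmatch
import os


def find_data_files(pattern, where=".", exclude=("build", "dist")):
    """Hypothetical helper: collect files matching `pattern` under `where`.

    Returns a list of (relative_dir, [relative_paths]) pairs, the shape
    that the distutils `data_files` keyword accepts. Paths are relative to
    `where`, so this is meant to be called from the setup.py directory.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(where):
        # Prune build output and hidden (e.g. VCS) directories in place.
        dirnames[:] = sorted(
            d for d in dirnames if d not in exclude and not d.startswith(".")
        )
        matched = sorted(fnmatch.filter(filenames, pattern))
        if matched:
            rel_dir = os.path.relpath(dirpath, where)
            if rel_dir == os.curdir:
                rel_dir = ""
            found.append((rel_dir, [os.path.join(rel_dir, n) for n in matched]))
    return found
```

With the layout above, ``data_files=find_data_files('*.conf')`` would then stand in for the hand-written ``[('conf', ['conf/foo.conf'])]``. Whether the result ends up at the egg root still depends on how the build command treats relative ``data_files`` targets.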
I would then expect ``unzip -l foopackage-VER-py2.4.egg`` to produce a tree like::

    foopackage/*.py, *.pyc
    conf/foo.conf
    EGG-INFO/ # usual suspects

Does/could setuptools overload the distutils keyword 'data_files' and change its meaning so that it can work with pkg_resources rather than being --prefix relative? (package_data, while a useful 2.4 addition, is not what I want here.)

In foopackage/foo.py, why are all of::

    pkg_resources(Requirement(__name__), '/conf/foo.conf')
    pkg_resources(Requirement(__name__), 'conf/foo.conf')
    pkg_resources(Requirement('foopackage'), '/conf/foo.conf')

interpreted as relative to the foopackage directory? And why does the resource_name '/' not refer to the top of the egg? Irrespective of whether I specify a relative or absolute path above, pkg_resources always looks under the topmost package directory. Is this by design?

Non-packaged data files are packaged as siblings of 'foopackage'. What is the most convenient way to package and access these files such that the references work for egg installs, normal 'setup.py install' installations and for ``python setup.py develop`` pseudo installs?

Extending my example with the following changes, I explore pkg_resources.resource_string and friends::

    file:setup.cfg
    [egg_info]
    egg_base=./ # because I guessed (incorrectly) that this would help.
file:setup.py::

    from setuptools import setup, find_packages
    setup(
        name='foopackage',
        packages=find_packages('lib'),
        package_dir={'': 'lib'},
        data_files=[('conf', ['conf/foo.conf'])],
        entry_points=dict(console_scripts=[
            'fooconf = foopackage.foo:run']))

file:foopackage/foo.py::

    from pkg_resources import resource_string
    def run():
        print __name__,__file__
        try:
            foo_config=resource_string(__name__,'/conf/foo.conf')
        except IOError, e:
            print str(e)
        else:
            print foo_config
    if __name__=='__main__':
        run()

running in place::

    $python setuptools-test/lib/foopackage/foo.py
    __main__ setuptools-test/lib/foopackage/foo.py
    [Errno 20] Not no such file or directory: 'setuptools-test/lib/foopackage/conf/foo.conf'

This is (almost) what I'd expect: I have not run setup.py yet, so setuptools/pkg_resources has no way of knowing anything about my weirdo preferences. Given that setuptools has not had a chance to see my egg_base setting, I would expect '/' to mean the directory *containing* the topmost package inferable from __file__. So I would have expected the path in the error to be 'setuptools-test/lib/conf/foo.conf'. But I don't care so much about the 'pre setup.py' scenario.

Make an egg::

    $python setup.py bdist_egg --keep-temp
    <snip>
    copying conf/foo.conf -> build/bdist.linux-i686/egg/conf
    creating 'dist/foopackage-0.0.0-py2.4.egg' and adding 'build/bdist.linux-i686/egg' to it
    $ls build/bdist.linux-i686/egg
    conf EGG-INFO foopackage
    $unzip -l dist/foopackage-0.0.0-py2.4.egg # paraphrasing the output
    foopackage/ *.py *.pyc
    conf/foo.conf
    EGG-INFO/

Woot! Exactly what I had hoped for.

Install the package using develop mode (note the explicit egg_base option above). First, manually clean up site-packages just to be sure (rm easy-install.pth; rm foopackage*)::

    $cd setuptools-test
    $python setup.py develop
    <snip>
    Installing fooconf script to /home/robin/devel/0root/bin
    $cd ..
    $fooconf
    Traceback (most recent call last):
    <snip>
    ImportError: No module named foopackage.foo
    $cat $PYSITE/foopackage.egg-link
    /home/robin/devel/setuptools-test
    $cat $PYSITE/easy-install.pth
    import sys; sys.__plen = len(sys.path)
    /home/robin/devel/setuptools-test
    <snip - its a fresh easy-install.pth file>

Rats. It seems like egg_base is taken both as the place to put my .egg-info directory AND as the means of deciding what should be placed on sys.path in order for my package to be importable. Remembering the package_dir option, I reach for the distutils docs. But a variety of package=[], package_dir=[] combinations have no effect on the easy-install.pth. Double rats.

Is there a way to have easy-install.pth, in the develop case, get entries of the form::

    '/path/to/source-root/my/python/packages/live/here'
     ^--------------------------^
     this is the bit we already have
     | assuming egg_base=./ (which it does not by default)

AND an independent way of directing pkg_resources to where my data files are rooted?

I look at the egg install case. Delete dist & build & do python setup.py bdist_egg, delete easy-install.pth and *.link files because I've been fiddling::

    $easy_install dist/foopackage-0.0.0-py2.4.egg
    Processing foopackage-0.0.0-py2.4.egg
    Copying foopackage-0.0.0-py2.4.egg to /home/robin/devel/0root/lib/python2.4/site-packages
    Adding foopackage 0.0.0 to easy-install.pth file
    Installed /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Processing dependencies for foopackage==0.0.0

Erm, what happened to my script? (using setuptools 0.6b4). Quoting the easy_install docs: "Whenever you install, upgrade, or change versions of a package, EasyInstall automatically installs the scripts for the selected package version, unless you tell it not to"

I fire up python::

    $pwd
    /home/robin/devel/setuptools-test
    $cd ..
    $which python
    $/home/robin/devel/0root/bin/python
    $python
    >>>from foopackage.foo import run
    >>>run()
    foopackage.foo /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/foopackage/foo.pyc
    [Errno 2] No such file or directory: '/home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/foopackage/conf/foo.conf'
    Ctrl-D
    $ls 0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg/conf/
    foo.conf

(extreme gnashing of teeth, followed by a visit to the refrigerator so I can throw some eggs at the wall)

I give up on egg_base, delete my setup.cfg, manually clean up my site-packages directory and delete my dist & build trees. I create a new egg, this time without the egg_base option::

    $cd setuptools-test
    $python setup.py bdist_egg
    $unzip -l dist/foopackage-0.0.0-py2.4.egg # paraphrasing the output
    foopackage/ *.py *.pyc
    conf/foo.conf
    EGG-INFO/

Again, exactly what I want, and it shows that egg_base has *no* effect on the internal layout of the egg. Let's install it::

    $easy_install dist/foopackage-0.0.0-py2.4.egg
    Processing foopackage-0.0.0-py2.4.egg
    creating /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Extracting foopackage-0.0.0-py2.4.egg to /home/robin/devel/0root/lib/python2.4/site-packages
    Adding foopackage 0.0.0 to easy-install.pth file
    Installing fooconf script to /home/robin/devel/0root/bin
    Installed /home/robin/devel/0root/lib/python2.4/site-packages/foopackage-0.0.0-py2.4.egg
    Processing dependencies for foopackage==0.0.0

What the frak, this time I get my script. What on earth does egg_base have to do with script generation? I try it out::

    $fooconf
    foopackage.foo PYSITE/foopackage-0.0.0-py2.4.egg/foopackage/foo.pyc
    [Errno 2] No such file or directory: 'PYSITE/foopackage-0.0.0-py2.4.egg/foopackage/conf/foo.conf'

Curses.
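The failing paths above are consistent with the resource name simply being split on '/' and joined onto the directory of the named package, so that a leading slash is swallowed rather than treated as "top of the egg". The following is only a simplified model of that observed behaviour, not a copy of the pkg_resources implementation; ``resolve_resource`` is a name invented for this sketch.

```python
import posixpath


def resolve_resource(package_dir, resource_name):
    """Model of the lookup seen in the transcripts (an assumption, not the
    real pkg_resources code): split on '/' and join under the package dir."""
    parts = resource_name.split("/")  # '/conf/foo.conf' -> ['', 'conf', 'foo.conf']
    # An empty first component does not reset the join to the root, so
    # absolute-looking and relative names land in the same place.
    return posixpath.join(package_dir, *parts)


egg = "PYSITE/foopackage-0.0.0-py2.4.egg"
print(resolve_resource(egg + "/foopackage", "/conf/foo.conf"))
print(resolve_resource(egg + "/foopackage", "conf/foo.conf"))
```

Both calls yield a path under ``foopackage/conf/``, which matches the ``[Errno 2]`` messages, while the file actually sits in the sibling ``conf/`` directory at the egg root.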
<gripe>The fact that distutils does not include files specified in MANIFEST.in for anything other than the sdist command (source distributions) is really tedious (and horribly confusing when first encountered).</gripe>

(minor nit) The include_package_data option is well named but poorly described: "Accept all data files and directories matched by MANIFEST.in or found in source control". This is a lie. Only those files that are descendants of a python package directory (a directory that has an __init__.py) are considered. "Accept all python package data files" would reduce confusion and pointless optimism, at least for those that know just enough to get themselves into trouble (like me).

I very much would prefer that the machinery for including data files in a package be orthogonal to the source building/packaging machinery.

I think there is a compelling argument that says complex data should be explicitly packaged separately. I.e., if foopackage had non-trivial data then I, as the package author, should create and distribute foopackage.egg and foopackage.data.egg as separate things. foopackage.egg would require foopackage.data and would, as an additional benefit, be free to use existing setuptools machinery to separate data versions from package versions. In fact, to argue completely against the thrust of this mail, I'm now thinking along the lines of:

- *never* package data in the same egg as the application or library
- *always* create a separate foopackage-data package, even if it has no python source in it beyond setup.py and even if the data is trivial.
- use the optional dependencies mechanism to pull data in as needed.

Anyhow. That's got that lot off my chest. I have no intention of giving up on setuptools, it is *far* too useful for that. I do want to hear from distutils folks that could help straighten me out :-)

Cheers,
Robin
On Jul 13, 2006, at 12:52 PM, Robin Bryce wrote:
[using setuptools 0.6b4]
I'm a setuptools user and greatly appreciative of it as well. I'd like to understand how to use it more appropriately with respect to bundling miscellaneous data files. "Just put them in python packages" is really not what I want, but perhaps it's time to refactor my tastes and accept this is the most appropriate thing to do.
Refactor your tastes, it is the most appropriate thing to do. -bob
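To make the "put it in the package" advice concrete, here is a small sketch of reading a bundled file relative to its package. It deliberately uses the standard library's ``pkgutil.get_data`` in place of ``pkg_resources.resource_string`` so that it runs without setuptools installed; the throwaway ``foopackage`` built in a temporary directory and the file contents are assumptions for the demo, not anything from the thread.

```python
import importlib
import os
import pkgutil
import sys
import tempfile


def make_demo_package(root):
    """Create root/foopackage/ with __init__.py and a bundled conf/foo.conf."""
    conf_dir = os.path.join(root, "foopackage", "conf")
    os.makedirs(conf_dir)
    with open(os.path.join(root, "foopackage", "__init__.py"), "wb") as f:
        f.write(b"")
    with open(os.path.join(conf_dir, "foo.conf"), "wb") as f:
        f.write(b"[foo]\nanswer = 42\n")


def load_conf(package="foopackage", resource="conf/foo.conf"):
    """Read a data file that lives *inside* the package directory."""
    data = pkgutil.get_data(package, resource)  # bytes, '/'-separated name
    return data.decode("utf-8")


root = tempfile.mkdtemp()
make_demo_package(root)
sys.path.insert(0, root)       # make the demo package importable
importlib.invalidate_caches()
conf_text = load_conf()
```

Because the lookup is relative to the package, the same call works whether the package is imported from a source tree, a plain install, or (via the loader) a zip archive.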
On Jul 13, 2006, at 12:52 PM, Robin Bryce wrote:
Hi,
[using setuptools 0.6b4]
Is it possible to have a separate 'zip_safe' decision for data files versus python packages? I.e., a deployed egg with data files and non zip safe packages would appear in site-packages (or wherever) as both a zip archive for the zip safe data AND a directory tree containing the 'eager' resources?
Make separate eggs. One for the data, one for the code. The code egg can be zip_safe and the data one not.
I very much would prefer that the machinery for including data files in a package be orthogonal to the source building/packaging machinery.
I highly doubt that's what most people want to bother with.
I think there is a compelling argument that says complex data should be explicitly packaged separately. I.e., if foopackage had non-trivial data then I, as the package author, should create and distribute foopackage.egg and foopackage.data.egg as separate things. foopackage.egg would require foopackage.data and would, as an additional benefit, be free to use existing setuptools machinery to separate data versions from package versions. In fact, to argue completely against the thrust of this mail, I'm now thinking along the lines of:
What compelling argument?
- *never* package data in the same egg as the application or library - *always* create a separate foopackage-data package, even if it has no python source in it beyond setup.py and even if the data is trivial. - use the optional dependencies mechanism to pull data in as needed.
That's not really convenient for most people, but if that's what you want to do then go ahead. Shove the data in a foopackagedata package which goes in a separate egg and throw a trivial __init__.py in there. -bob
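A rough sketch of that two-egg split, expressed as the keyword arguments each project's setup.py would pass to ``setuptools.setup(**kwargs)``. The project names, version numbers and the ``conf/*.conf`` glob are assumptions for illustration; the keywords themselves (``package_data``, ``zip_safe``, ``install_requires``, ``entry_points``) are ordinary setuptools options.

```python
# Data egg: a trivial package (just an __init__.py) that carries the files.
# Lives in its own project directory with its own setup.py.
DATA_EGG = dict(
    name="foopackagedata",
    version="0.0.0",
    packages=["foopackagedata"],
    package_data={"foopackagedata": ["conf/*.conf"]},
    zip_safe=False,  # installed unpacked, so files have real paths
)

# Code egg: depends on the data egg and can stay zipped.
CODE_EGG = dict(
    name="foopackage",
    version="0.0.0",
    packages=["foopackage"],
    package_dir={"": "lib"},
    install_requires=["foopackagedata"],
    zip_safe=True,
    entry_points={"console_scripts": ["fooconf = foopackage.foo:run"]},
)
```

The code would then look the file up against the data package, for example ``resource_string('foopackagedata', 'conf/foo.conf')``, which keeps the lookup package-relative.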
Shove the data in a foopackagedata package which goes in a separate egg and throw a trivial __init__.py in there.
Yes, exactly. And I get the separate zip_safe as an added bonus.
What compelling argument?
Ok, I should have said "I think there _may_ be a compelling argument". I'll have a go: In another thread, "best practices for creating eggs", Paul Moore said:
I work offline sufficiently often that not having local documentation is frustrating. There's no standard for local docs, which is a nuisance, and makes for an inconsistent story between different packages, but I'd be concerned if setuptools made it more difficult to bundle local docs.
and Kevin Dangoor said:
Being able to get at a package's documentation after it's installed would be nice.
Phil said:
Note that I said above that I always put the documentation in an sdist form; to obtain a package's source distribution, use:
easy_install -e -b somedir arg... <snip> A standard for how to install documentation would be great
I think the motivation to always package docs in sdist and arrange for egg and sdist downloads to appear together is really a deficiency in how data, eggs & pkg_resources interact. All things considered, sdist is the easiest thing for the packager: it's a few runes in MANIFEST.in followed by python setup.py sdist. But you lose the ability to reference that data using pkg_resources.

If I put my docs, data, etc. in their own egg I get some of this back. Something like ``package_dir = {'foopackage': './'}`` might be enough to fake me up a package to keep setuptools happy, but if not I'll create one. In this instance I don't care about having an overly large distribution because the user has actively chosen to install the docs.

But all of that is less than ideal. AFAICT, if pkg_resources supported egg root relative resource_names, I would be able to do anything I want, up to and including writing egg format extensions that do more specialist things with data packaged in *eggs*.

There is a lot I would *like* to do with eggs as a distribution format, beyond packaging python source. I can't see a sane way to go about this in the absence of a consistent way to reference *all* of the data I have put in my egg.

I would really like to know why pkg_resources('foopackage', '/conf/foo.conf') is not interpreted as relative to the egg root?

Robin
On Jul 13, 2006, at 4:56 PM, Robin Bryce wrote:
What compelling argument?
Ok, I should have said "I think there _may_ be a compelling argument". I'll have a go:
In another thread, "best practices for creating eggs" Paul Moore said:
I work offline sufficiently often that not having local documentation is frustrating. There's no standard for local docs, which is a nuisance, and makes for an inconsistent story between different packages, but I'd be concerned if setuptools made it more difficult to bundle local docs.
and Kevin Dangoor said:
Being able to get at a package's documentation after it's installed would be nice.
Phil said:
Note that I said above that I always put the documentation in an sdist form; to obtain a package's source distribution, use:
easy_install -e -b somedir arg... <snip> A standard for how to install documentation would be great
I think the motivation to always package docs in sdist and arrange for egg and sdist downloads to appear together is really a deficiency in how data, eggs & pkg_resources interact. All things considered, sdist is the easiest thing for the packager: it's a few runes in MANIFEST.in followed by python setup.py sdist. But you lose the ability to reference that data using pkg_resources.
If I put my docs, data, etc. in their own egg I get some of this back. Something like ``package_dir = {'foopackage': './'}`` might be enough to fake me up a package to keep setuptools happy, but if not I'll create one. In this instance I don't care about having an overly large distribution because the user has actively chosen to install the docs.
You said data before, not supplementary materials like docs and examples. Data that's consumed by the code absolutely belongs in eggs and shouldn't be strewn all over the place. One of the most important features of eggs is that they consolidate stuff. Without eggs you install a package and you get files thrown all over your disk without any way to keep track of what belongs to which package. It would be nice if there were a convention for installing supplementary stuff (docs, examples, py2app/py2exe generated tools, xcode templates, etc.). One problem with this is that you generally want those things to be somewhere easily accessible by the user, and something based on site-packages isn't (or sys.prefix at all in some cases, like Mac OS X). I'm not sure how much integration with distutils this stuff really deserves because it's not required to make anything work. For now, the author could just provide a zip of the supplementary materials for the users to download if they want to look at them offline. Another thing that setuptools is currently missing is support for packaging up dynamic libraries (e.g. pygame's SDL dlls) and/or headers (e.g. Numeric's API), but that's a bigger distutils problem.
But all of that is less than ideal. AFAICT, if pkg_resources supported egg root relative resource_names, I would be able to do anything I want, up to and including writing egg format extensions that do more specialist things with data packaged in *eggs*.
Egg format extensions are already provided for by entry points and the EGG-INFO metadata dir. Eggs were designed to facilitate plug-ins, after all. -bob
Data that's consumed by the code absolutely belongs in eggs and shouldn't be strewn all over the place
Totally agree, even if this means I've contradicted myself somewhere along the line.
the *reason* that pkg_resources doesn't support egg root relative resource names is because it won't work right with system packaging tools like RPM, Debian, etc.
ah. food for thought. thanks.
is currently missing is support for packaging up dynamic libraries
Oh, I've been too timid to even consider trying that. But mixed language development needs are something I very nearly mentioned in my last post. I had games development in mind[1]. I just can't see how far it is reasonable to expect 'stock' egg/setuptools to stretch. I mean, where would it all end: pydselect, a pje nervous breakdown?
Egg format extensions are already provided for by entry points and the EGG-INFO metadata dir
So if I decide I really care about referencing data in this way, then investigating adding setup keywords and files to EGG-INFO, and inventing my own rules, is the way to go? Hrm, time to pull down the setuptools trunk I think ;-)

Thanks,
Robin

[1] The small number of UK games developers I've worked for use Lua and C++, and wouldn't dream of using a packaging system they did not write themselves.
On Jul 14, 2006, at 1:56 AM, Robin Bryce wrote:
Shove the data in a foopackagedata package which goes in a separate egg and throw a trivial __init__.py in there.
Yes, exactly. And I get the separate zip_safe as an added bonus.
What compelling argument?
Ok, I should have said "I think there _may_ be a compelling argument". I'll have a go:
In another thread, "best practices for creating eggs" Paul Moore said:
I work offline sufficiently often that not having local documentation is frustrating. There's no standard for local docs, which is a nuisance, and makes for an inconsistent story between different packages, but I'd be concerned if setuptools made it more difficult to bundle local docs.
Documentation can be tweaked into the current setup, with some minor changes to IDEs. Someone just has to come up with a canonical way to do documentation eggs, then IDEs can be changed to use this. BTW, I'm using IDE very loosely here; there could just as easily be a script that updates an index.html that people have listed in their browser bookmarks.

Stuff for tools outside of the python world would be harder: PyObjC includes some templates for Xcode (Apple's IDE) and those must be installed in a specific location.

Ronald
Participants (3): Bob Ippolito, Robin Bryce, Ronald Oussoren