Setuptools download cache
Attached is a patch which adds download caching to setuptools. At TOPP (http://topp.openplans.org/), we use a system called fassembler to build our opencore stack. It creates approximately a dozen virtualenvs, each with their own lib/python, and then uses setuptools to install lots of libraries. Some of these libraries are common among multiple apps, but we install multiple copies for ease of development. And every time we rebuild, we start the whole process over again. The major slowdown in building is downloading a bunch of things which probably haven't changed since last time we downloaded them. This patch will let us maintain a cache of all downloads, and thus do builds much faster. Anyway, I hope you'll accept this patch.
At 04:40 PM 8/20/2008 -0400, David Turner wrote:
Attached is a patch which adds download caching to setuptools.
The process I'd suggest for this use case is to build the external libraries using:
easy_install -f cache_dir -zmaxd cache_dir lib1 lib2 ...
This command will NOT go to the web for new versions of libraries, unless you also use -U. But it will ensure that the specified libraries have suitable eggs in cache_dir.
Then, to install a given set of libraries to a virtualenv, use:
easy_install -f cache_dir lib1 lib2 ...
Or, if you really insist on multiple copies of the eggs (instead of just linking to them), use:
easy_install -af cache_dir lib1 lib2 ...
(which will copy the .egg files even if they could be used in place).
Unlike your caching proposal, this approach gives you finer control over which libraries to update, when. You can also update the cache without changing what's installed in a given virtualenv.
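For readers unfamiliar with the bundled short options in the commands above, here is a minimal sketch that spells them out using the long option names from the easy_install documentation. The expand() helper and its tables are hypothetical illustrations written for this note, not part of setuptools.

```python
# Short-to-long option names as documented for easy_install.
# The helper below is a hypothetical illustration, not a setuptools API.
LONG_NAMES = {
    "z": "--zip-ok",           # install as a zipped .egg
    "m": "--multi-version",    # do not add the egg to easy-install.pth
    "a": "--always-copy",      # copy eggs even if usable in place
    "x": "--exclude-scripts",  # do not install console scripts
    "d": "--install-dir",      # target directory (takes a value)
    "f": "--find-links",       # extra place to look for eggs (takes a value)
    "U": "--upgrade",          # look for newer versions
}
TAKES_VALUE = {"d", "f"}


def expand(bundle, value=None):
    """Expand a bundle such as '-zmaxd' into a list of long options."""
    letters = bundle.lstrip("-")
    expanded = []
    for position, letter in enumerate(letters):
        long_name = LONG_NAMES[letter]
        if letter in TAKES_VALUE:
            # An option that takes a value has to be last in the bundle.
            if position != len(letters) - 1 or value is None:
                raise ValueError("option -%s needs a value and must come last" % letter)
            expanded.append("%s=%s" % (long_name, value))
        else:
            expanded.append(long_name)
    return expanded


print(expand("-zmaxd", "cache_dir"))
print(expand("-af", "cache_dir"))
```

Read this way, the -zmaxd step only populates cache_dir: --install-dir points at the cache, --multi-version leaves the eggs inactive, and --exclude-scripts skips script installation. The separate install step is what makes packages usable in a virtualenv.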
On Wed, 2008-08-20 at 18:39 -0400, Phillip J. Eby wrote:
Unlike your caching proposal, this approach gives you finer control over which libraries to update, when. You can also update the cache without changing what's installed in a given virtualenv.
I'm having a very hard time getting it working, actually. I should note that I'm not using easy_install directly, but through setup.py. This is because I don't want to have to list all my dependencies twice, and setup.py passes options on to easy_install, as I understand it.
Here's a simple test case:
1. Create a virtualenv:
$ virtualenv.py /tmp/testve
2. Activate
$ cd /tmp/testve
$ . bin/activate
3. Check out Cabochon.
(testve)$ svn co https://svn.openplans.org/svn/Cabochon/trunk cabochon
4. Try to set up
(testve)$ mkdir /tmp/ec2 # the cache directory
(testve)$ cd cabochon
(testve)$ python setup.py develop -f /tmp/ec2 -zmaxd /tmp/ec2
running develop
Checking .pth file support in /tmp/ec2
/tmp/testve/bin/python -E -c pass
running egg_info
creating Cabochon.egg-info
writing requirements to Cabochon.egg-info/requires.txt
writing Cabochon.egg-info/PKG-INFO
writing top-level names to Cabochon.egg-info/top_level.txt
writing dependency_links to Cabochon.egg-info/dependency_links.txt
writing entry points to Cabochon.egg-info/entry_points.txt
writing manifest file 'Cabochon.egg-info/SOURCES.txt'
writing manifest file 'Cabochon.egg-info/SOURCES.txt'
running build_ext
Creating /tmp/ec2/Cabochon.egg-link (link to .)
Installed /tmp/testve/cabochon
Because this distribution was installed --multi-version, before you can import modules from this package in an application, you will need to 'import pkg_resources' and then use a 'require()' call similar to one of these examples, in order to select the desired version:
pkg_resources.require("Cabochon") # latest installed version
pkg_resources.require("Cabochon==0.2dev-r19871") # this exact version
pkg_resources.require("Cabochon>=0.2dev-r19871") # this version or higher
Note also that the installation directory must be on sys.path at runtime for this to work. (e.g. by being the application's script directory, by being on PYTHONPATH, or by being added to sys.path by your code.)
[many more lines of this as it installs all the requirements]
5. Try to run
(testve)$ paster
bash: paster: command not found
6. Hm, that's no good. Well, what if we just manually try to see if stuff is installed:
(testve)$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cabochon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cabochon/__init__.py", line 8, in <module>
    from cabochon.config.middleware import make_app
  File "cabochon/config/middleware.py", line 2, in <module>
    from paste import httpexceptions
ImportError: No module named paste
Nope.
At 03:20 PM 8/22/2008 -0400, David Turner wrote:
I'm having a very hard time getting it working, actually. I should note that I'm not using easy_install directly, but through setup.py. This is because I don't want to have to list all my dependencies twice, and setup.py passes options on to easy_install, as I understand it.
You don't have to specify dependencies twice. Just do:
easy_install -f cache_dir -zmaxd cache_dir path_to_checkout
It will then build an egg from the checkout and copy it and all the dependencies to the cache dir.
Here's a simple test case:
1. Create a virtualenv:
$ virtualenv.py /tmp/testve
2. Activate
$ cd /tmp/testve
$ . bin/activate
3. Check out Cabochon.
(testve)$ svn co https://svn.openplans.org/svn/Cabochon/trunk cabochon
4. Try to set up
(testve)$ mkdir /tmp/ec2 # the cache directory
Here, you should run:
easy_install -f /tmp/ec2 -zmaxd /tmp/ec2 cabochon
(testve)$ cd cabochon
(testve)$ python setup.py develop -f /tmp/ec2 -zmaxd /tmp/ec2
Then here, run:
python setup.py develop -af /tmp/ec2
This will then copy any dependency eggs from the cache dir to the virtualenv, and set up the checkout for development.
running develop
Checking .pth file support in /tmp/ec2
/tmp/testve/bin/python -E -c pass
running egg_info
creating Cabochon.egg-info
writing requirements to Cabochon.egg-info/requires.txt
writing Cabochon.egg-info/PKG-INFO
writing top-level names to Cabochon.egg-info/top_level.txt
writing dependency_links to Cabochon.egg-info/dependency_links.txt
writing entry points to Cabochon.egg-info/entry_points.txt
writing manifest file 'Cabochon.egg-info/SOURCES.txt'
writing manifest file 'Cabochon.egg-info/SOURCES.txt'
running build_ext
Creating /tmp/ec2/Cabochon.egg-link (link to .)
Installed /tmp/testve/cabochon
Because this distribution was installed --multi-version, before you can import modules from this package in an application, you will need to 'import pkg_resources' and then use a 'require()' call similar to one of these examples, in order to select the desired version:
pkg_resources.require("Cabochon") # latest installed version
pkg_resources.require("Cabochon==0.2dev-r19871") # this exact version
pkg_resources.require("Cabochon>=0.2dev-r19871") # this version or higher
Note also that the installation directory must be on sys.path at runtime for this to work. (e.g. by being the application's script directory, by being on PYTHONPATH, or by being added to sys.path by your code.)
[many more lines of this as it installs all the requirements]
5. Try to run
(testve)$ paster
bash: paster: command not found
6. Hm, that's no good. Well, what if we just manually try to see if stuff is installed:
(testve)$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cabochon
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cabochon/__init__.py", line 8, in <module>
    from cabochon.config.middleware import make_app
  File "cabochon/config/middleware.py", line 2, in <module>
    from paste import httpexceptions
ImportError: No module named paste
Nope.
This didn't work because you only did half of what I said; you have to do the -zmaxd step to load or update the cache, and the -af step to actually install your target to the virtualenv.
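The two steps described in this reply can be sketched as a small driver script. This is only an illustration under assumptions: the function names are hypothetical, the paths are the ones from the test case, and the script defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess
import sys


def cache_and_develop_commands(checkout, cache_dir):
    """Return the two command lines from the reply above as argv lists."""
    # Step 1: build an egg from the checkout and place it, plus its
    # dependencies, in the cache directory.
    fill_cache = ["easy_install", "-f", cache_dir, "-zmaxd", cache_dir, checkout]
    # Step 2: run inside the checkout; copies dependency eggs from the
    # cache into the active virtualenv and sets up the checkout itself.
    develop = [sys.executable, "setup.py", "develop", "-af", cache_dir]
    return fill_cache, develop


def run(checkout, cache_dir, dry_run=True):
    """Print the two steps, or execute them in order when dry_run is False."""
    fill_cache, develop = cache_and_develop_commands(checkout, cache_dir)
    for argv, cwd in ((fill_cache, None), (develop, checkout)):
        if dry_run:
            print(" ".join(argv))
        else:
            # check_call raises CalledProcessError if a step fails.
            subprocess.check_call(argv, cwd=cwd)


# Dry run using the paths from the test case in this thread.
run("cabochon", "/tmp/ec2")
```

Keeping the cache step and the install step as separate commands mirrors the point made above: the cache can be refreshed without touching what is installed in any given virtualenv.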
I get a really weird error when I try to follow these instructions -- but only sometimes. In order to reproduce the error, you need to start with a fresh virtualenv and cache directory (I think). I can't figure out why. And it seems that after I test it a few times with a given piece of software, the error goes away and I can't reproduce it again. I'm really baffled here. Of course, I can just re-run easy_install, and that always clears up the error (so far), but this is not really a good solution.
Processing dependencies for eyvind==0.1dev-r18052
Traceback (most recent call last):
  File "/tmp/test2/bin/easy_install", line 8, in <module>
    load_entry_point('setuptools==0.6c8', 'console_scripts', 'easy_install')()
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 1671, in main
    with_ei_usage(lambda:
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 1659, in with_ei_usage
    return f()
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 1675, in <lambda>
    distclass=DistributionWithoutHelpCommands, **kw
  File "/usr/lib/python2.5/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.5/distutils/dist.py", line 974, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.5/distutils/dist.py", line 994, in run_command
    cmd_obj.run()
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 211, in run
    self.easy_install(spec, not self.no_deps)
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 427, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 478, in install_item
    self.process_distribution(spec, dist, deps)
  File "/usr/lib/python2.5/site-packages/setuptools/command/easy_install.py", line 519, in process_distribution
    [requirement], self.local_index, self.easy_install
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 529, in resolve
    requirements.extend(dist.requires(req.extras)[::-1])
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2107, in requires
    dm = self._dep_map
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2099, in _dep_map
    for extra,reqs in split_sections(self._get_metadata(name)):
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2518, in split_sections
    for line in yield_lines(s):
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1813, in yield_lines
    for ss in strs:
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 2121, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1140, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1137, in get_metadata
    return self._get(self._fn(self.egg_info,name))
  File "/usr/lib/python2.5/site-packages/pkg_resources.py", line 1195, in _get
    return self.loader.get_data(path)
zipimport.ZipImportError: bad local file header in /tmp/ec5/eyvind-0.1dev_r18052-py2.5.egg
I have uploaded examples of these eggs to http://novalis.org/bugs/, in case they give any hints about what weird thing setuptools is doing.
Also, I can't easy_install lxml, even though python setup.py develop works fine:
[after checking lxml out from svn]
(test8)novalis@gentle:/tmp/test8$ easy_install -f /tmp/ec9 -zmaxd /tmp/ec9 lxml
Processing lxml
Running setup.py -q bdist_egg --dist-dir /tmp/test8/lxml/egg-dist-tmp-gzDfr3
Building lxml version 2.2.alpha1-57598.
Building with Cython 0.9.8.1.1.
Using build configuration of libxslt 1.1.22
Building against libxml2/libxslt in the following directory: /usr/lib
warning: no files found matching 'lxml.etree.c' under directory 'src/lxml'
warning: no files found matching 'lxml.objectify.c' under directory 'src/lxml'
warning: no files found matching 'lxml.etree.h' under directory 'src/lxml'
warning: no files found matching 'lxml.etree_api.h' under directory 'src/lxml'
warning: no files found matching '*.html' under directory 'doc'
gcc: src/lxml/lxml.etree.c: No such file or directory
gcc: no input files
error: Setup script exited with error: command 'gcc' failed with exit status 1
Any ideas on this one? I could, I suppose, hard-code that LXML should not be easy_installed, but this would be a hack.
On Sun, 2008-08-24 at 15:40 -0400, Phillip J. Eby wrote:
This didn't work because you only did half of what I said; you have to do the -zmaxd step to load or update the cache, and the -af step to actually install your target to the virtualenv.
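One way to narrow down the intermittent "bad local file header" error reported earlier in this thread is to check each cached egg with the standard zipfile module before installing from the cache. The sketch below is an assumption-laden diagnostic, not a fix and not something setuptools does itself: check_egg is a hypothetical helper, and it targets a current Python 3 rather than the Python 2.5 used in the thread.

```python
import zipfile
import zlib


def check_egg(path):
    """Return None if the zipped egg reads back cleanly, else a description."""
    # is_zipfile only confirms that an end-of-archive record can be found.
    if not zipfile.is_zipfile(path):
        return "not a zip archive"
    try:
        with zipfile.ZipFile(path) as egg:
            # testzip reads every member and returns the name of the
            # first one with a bad header or CRC, or None if all are fine.
            bad_member = egg.testzip()
    except (zipfile.BadZipFile, zlib.error) as exc:
        return "unreadable archive: %s" % exc
    if bad_member is not None:
        return "corrupt member: %s" % bad_member
    return None
```

Calling check_egg on every .egg file in the cache directory right after the -zmaxd step would show whether an egg is already damaged on disk at that point, or whether the failure only appears when it is read later.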