[Distutils] zc.buildout and site-packages

Gary Poster gary.poster at canonical.com
Fri Oct 23 18:01:53 CEST 2009


On Oct 22, 2009, at 11:08 PM, Kevin Teague wrote:

>
> On Oct 22, 2009, at 10:43 AM, Tres Seaver wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Martin Aspeli wrote:
>>> Hi,
>>>
>>> Is there a way (apart from putting buildout in a virtualenv with
>>> --no-site-packages) to tell buildout *not* to put site-packages as the
>>> first line in the mangled sys.path when it generates scripts?
>>>
>>> We have people doing horrid things to their global python, and we need
>>> the buildout to be safe and isolated in these environments.
>>
>> Using a --no-site-packages virtualenv to drive the buildout is a pretty
>> lightweight solution, and easier than the old standby of compiling your
>> own Python to get isolation from the global one, which is still highly
>> recommended:  I build my own Python, and then use a separate virtualenv
>> for each project.
>>
>
> The idea behind Gary's branch

To be clear, *an* idea.  You can also just make a "don't give me what  
is in site-packages" gesture.  (When you do that, in the current  
branch, the generated scripts still have the complexities you describe  
below, though.)

> (http://svn.zope.org/zc.buildout/branches/gary-4-include-site-packages)
> is that unlike the --no-site-packages option of virtualenv, which is an
> all-or-nothing proposition, you would be able to include site-package
> locations in Buildout's script generation,
> but care would be taken that if distributions are selected from a  
> site-package location to make sure that when site-package locations  
> are included on sys.path, those locations don't overshadow any other  
> paths pointing explicitly to already picked versions of  
> distributions. e.g. If I was using Apple's System Python on Leopard  
> (10.5), then site-packages includes zope.interface 3.3.0 and  
> bdist_mpkg 0.4.3. If I wanted to pick 'zope.interface == 3.3.0' and  
> 'bdist_mpkg == 0.4.4', then currently Buildout could generate a path  
> modification that looks like:
>
> sys.path[0:0] = [
>  '/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python',
>  '/Users/kteague/buildouts/shared/eggs/bdist_mpkg-0.4.4-py2.5.egg',
> ]
>
> Where that System path contains bdist_mpkg 0.4.3. Whether the
> site-package location is put before or after version-specific paths
> currently depends on the ordering in the install_requires field (so you
> get the correct versions importable if those distributions which are
> picked from site-packages are listed after the non-site-package picked
> versions!) - obviously this is just a side-effect of the current path
> manipulation implementation.

Not exactly.  I was going to go for that, but it was too hard/insane.   
(Do I need to update some docs on the branch?)

If you use this feature, then eggs from site-packages can be inserted  
cleanly along with other eggs.  They can be chosen individually,  
without masking other eggs.  Site-packages-like directories  
themselves--the directories that are not eggs, but collections of  
standard directory packages--always go at the end of the sys.path.   
Otherwise their contents might mask the eggs you chose.
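
A minimal sketch of that ordering rule, written for a current Python 3 rather than taken from the branch.  The function name ``order_paths`` and its two arguments are hypothetical, chosen only for illustration:

```python
def order_paths(egg_paths, site_dirs):
    """Return a sys.path-style list with chosen eggs first.

    Site-packages-like directories go last so that loose packages inside
    them cannot mask an explicitly chosen egg.
    """
    ordered = []
    for path in list(egg_paths) + list(site_dirs):
        # Keep the first occurrence only, so an egg listed twice (or a
        # directory passed in both lists) does not appear twice.
        if path not in ordered:
            ordered.append(path)
    return ordered


# Kevin's example: the chosen bdist_mpkg egg must come before the system
# directory that holds the older bdist_mpkg.
paths = order_paths(
    ['/Users/kteague/buildouts/shared/eggs/bdist_mpkg-0.4.4-py2.5.egg'],
    ['/System/Library/Frameworks/Python.framework/Versions/2.5/Extras/lib/python'],
)
```

With this ordering the import system reaches the chosen egg before it ever looks in the system directory.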

What we actually ended up using ourselves (Launchpad) is "don't use  
any eggs from site-packages, but let site-packages through at the end  
so we can get some of the non-egg things from it that our system is  
providing, like Postgres-Python bindings."

> One would assume that making this change is fairly easy. Just do a  
> diff between normal sys.path and the site-package free sys.path when  
> Python is launched with the -S flag. Which Gary's code does, but the  
> script generation in Gary's branch right now also accounts for the  
> fact that *.pth files have been processed, and that you are allowed  
> to have import statements executed when *.pth files are processed,  
> so he is generating scripts which also clean sys.modules, and then  
> re-add site-packages locations with site.addsitedir(location) so  
> that .pth files are properly re-processed. Which is pretty fancy,  
> and probably "Does the Right Thing (TM)", but also greatly clutters  
> up the generated scripts.

Mostly right, and granted that the scripts are bigger and more  
annoying than they are in trunk.
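
For reference, the "diff" step can be sketched roughly as follows.  This is an illustration for a current Python 3, not the code in the branch; the helper names ``interpreter_path`` and ``site_only_paths`` are made up for the example:

```python
import json
import subprocess
import sys


def interpreter_path(*flags):
    """Ask a fresh interpreter (started with the given flags) for its sys.path."""
    output = subprocess.check_output(
        [sys.executable, *flags, "-c",
         "import sys, json; print(json.dumps(sys.path))"]
    )
    return json.loads(output.decode())


def site_only_paths():
    """Return the entries that only appear when the site module is processed."""
    bare = set(interpreter_path("-S"))  # -S skips the site module
    return [path for path in interpreter_path() if path not in bare]
```

The result is the set of locations that site.py contributes, in their normal order; it can be empty if the interpreter has no site directories at all.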

FWIW, the "fancy" bits are not primarily because .pth files might  
import.  It's more because the setuptools approach to creating  
namespace packages in site-packages--that is, the approach that OS  
distributions typically use--creates fake modules for the namespace  
packages.  These mask any sys.path eggs in the same namespace  
packages, at least as of setuptools 0.6c9.  We have to clean the fake modules out,  
set up the sys.path, import pkg_resources because that magically does  
the right thing for any eggs on the sys.path, and *then* process .pth  
files.
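
Roughly, that sequence looks like the sketch below.  It is a simplified illustration in current Python 3, not the generated script itself; ``rebuild_path`` and its parameters are hypothetical names, and the import of pkg_resources is wrapped so the sketch still runs where setuptools is not installed:

```python
import site
import sys


def rebuild_path(egg_paths, site_dirs, namespaces):
    """Apply the four steps in order: clean, set path, import, process .pth."""
    # 1. Clean out the fake namespace modules (and anything imported
    #    beneath them) so they cannot mask eggs in the same namespace.
    prefixes = tuple(name + "." for name in namespaces)
    for name in list(sys.modules):
        if name in namespaces or name.startswith(prefixes):
            del sys.modules[name]

    # 2. Set up sys.path with the chosen eggs at the front.
    sys.path[0:0] = list(egg_paths)

    # 3. Import pkg_resources, which handles namespace packages for the
    #    eggs now on sys.path.  (Assumption: it may be absent, so tolerate that.)
    try:
        import pkg_resources  # noqa: F401
    except ImportError:
        pass

    # 4. Only now process the .pth files in the site-packages-like
    #    directories; site.addsitedir also appends each directory to sys.path.
    for directory in site_dirs:
        site.addsitedir(directory)
```

The ordering is the point: if the .pth files were processed before step 3, the fake namespace modules would be recreated first and mask the eggs again.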

(I hope that PEP 382 is accepted and helps.)

> I quite like having script generation generate scripts which are  
> still reasonably compact (I often open generated scripts to see what  
> Buildout is doing, or sometimes edit them to hand-pick a different  
> egg if I want to quickly try out a different working set)

Granted.

> and I also wonder how much overhead this additional processing adds  
> (I guess this depends upon how much you have in site-packages).

Any overhead is lost in the cost of importing pkg_resources.

Launchpad has a whole bunch of dependencies (~170 eggs last I checked).
It's trivial to generate both a
``PYTHONPATH=[...dependencies...] python`` and a faux Python interpreter
generated by buildout that does the tricks that you describe.  To make the
PYTHONPATH approach work with the namespace package problem I described
above, you have to hack site.py to import pkg_resources before it
processes the .pth files.

I compared the two approaches with Launchpad's ~170 dependencies using  
``time ${INTERPRETER_CHOICE} -c ''``.  They were equivalent in my  
tests.   (FWIW, they were both about 20 times slower than ``time  
python -c ''``).
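
If you want to repeat that kind of comparison without the shell, something like the following sketch would do.  It is written for a current Python 3; ``startup_seconds`` is a made-up helper name, and the commands you pass in are whatever interpreters you want to compare:

```python
import subprocess
import sys
import time


def startup_seconds(command, runs=5):
    """Return the best wall-clock time for ``<command> -c ''`` over several runs."""
    best = None
    for _ in range(runs):
        start = time.perf_counter()
        # Run the interpreter with an empty program, like ``python -c ''``.
        subprocess.check_call(list(command) + ["-c", ""])
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Baseline: the plain interpreter running this sketch.
baseline = startup_seconds([sys.executable], runs=3)
```

Calling it once per interpreter choice and comparing the results against ``baseline`` gives the same kind of ratio as the ``time`` runs above.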

> So perhaps if there was some option to still generate scripts using the
> existing style of script generation - maybe a
> "i-keep-my-site-packages-clean=true" option ... i dunno, perhaps the
> other way 'work-around-site-package-madness-in-script-generation=true'
> ... or just merge Buildout and VirtualEnv into one monolithic project so
> that you don't need to install two tools just to be able to use Buildout
> with a dirty Python!  (rawr!)

I can understand the desire to make it possible to have a simpler script
if you want to promise that you are going to have a clean site-packages.
I'm not super-excited to add this feature and the related tests, but if
that made it possible for my work to not be consigned to a branch forever,
I suppose I'd sign up.  Jim will be the arbiter there.

And, as usual and of course, there are other approaches possible than  
the one I chose.

> Anyways, for those distributions which are tough to install, I think
> some people will find this branch quite handy in that they can apt-get
> the tough to install distributions, and then safely include those
> distributions in working sets composed by Buildout.

Seems to be working for us.

Thanks for looking at the branch, and for writing about it.

Gary

