Fixing parallel installs with easy_install / setuptools
Hello distutils folks,

My group at Cisco uses easy_install / setuptools extensively for installing packages with our make system, and there are three issues with using it under parallel make:

1. Duplicate dependencies cause corruption when the same package is installed twice.
2. easy-install.pth is never updated according to the installed packages. Thus when we install 8+ packages and they all depend on python being installed
3. Cross-compilation is impossible (at least from what I've seen; either that, or the folks who hacked the sources before I got to it didn't understand that feeding in the correct variables would let things cross-compile). Then again, many Python things don't cross-compile from what I've seen :(... the interpreter itself is a prime example =\.

My goal is to fix these issues and contribute the fixes back, but I want to make sure I use the best technical solution for the problems I mentioned.

About problems 1 + 2: I'm short on time, so I'm going to implement a simple locking mechanism around easy-install.pth. I also realize that .pth files were chosen because they are flat files and are included simply from within Python with setup.py. Would it make more sense to use a backend database like pysqlite to store the package data, though? That would require some reworking of setup.py, but considering that sqlite3 is _already_ included with 2.5+ and that it's public-domain licensed OSS, would it make more sense to store packaging data in a stable system like SQLite, especially since it would make removal a trivial task? My knowledge of SQLite is limited, but I assume it is as endian-neutral as other technologies like BDB. Of course I'm going out on a huge limb in making that assumption, but the only document I found that flatly disproves it is http://groups.google.com/group/wview/browse_thread/thread/6fcc993dd548206c/f... -- please correct me if I'm totally off-base.
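A minimal sketch of the "simple locking mechanism around easy-install.pth" idea, assuming a POSIX system (fcntl.flock) and a hypothetical sidecar lock file named `<pth>.lock`; this is an illustration, not setuptools' actual code. The read and the append happen under one exclusive lock, which is what prevents two parallel installs of the same dependency from both deciding the entry is missing.

```python
import fcntl  # POSIX only; Windows would need a different locking primitive
import os
from contextlib import contextmanager


@contextmanager
def exclusive_lock(path):
    """Hold an exclusive advisory lock on a hypothetical sidecar '<path>.lock' file."""
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def add_pth_entry(pth_path, entry):
    """Append `entry` to a .pth file unless it is already listed.

    Returns True if the entry was added, False if it was already present.
    """
    with exclusive_lock(pth_path):
        text = ""
        if os.path.exists(pth_path):
            with open(pth_path) as f:
                text = f.read()
        if entry in text.splitlines():
            return False  # already registered; nothing to do
        with open(pth_path, "a") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # keep one entry per line
            f.write(entry + "\n")
        return True
```

The lock is advisory: it only helps if every installer touching the file goes through the same lock. A sidecar lock file is used so the lock survives even if the .pth file itself is later replaced by a rename.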
Also, please let me know whether or not you disagree with my proposal, and whether there's a different method that would maximize data integrity, ensure atomicity, and guarantee that the data is endian-neutral and thus portable across platforms.

About problem 3: Is the solution simple enough to solve using --build, --host, and/or --target, like configure, or does more gross work need to be done under the covers to make it all work?

Thanks!
-Garrett
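On the atomicity question, one common flat-file technique (named here as an alternative, not something proposed in the thread) is to write the new contents to a temporary file in the same directory and rename it over the old one, so readers see either the old file or the new one, never a half-written one. A plain-text .pth file is also endian-neutral by construction. The sketch below assumes Python 3 (`os.replace`); the `.tmp` suffix keeps the temporary file from being picked up as a .pth file.

```python
import os
import tempfile


def write_pth_atomically(pth_path, entries):
    """Replace pth_path with `entries`, one per line, via an atomic rename."""
    directory = os.path.dirname(os.path.abspath(pth_path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pth-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            for entry in entries:
                f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_path, pth_path)  # atomic on POSIX; Windows semantics differ
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
```

Atomic replacement protects readers, but two writers doing read-modify-write can still lose each other's update, so in a parallel make it would still need to sit inside a lock.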
Garrett Cooper wrote:
[...] would it make more sense to store packaging data with a stable system like SQLite, especially when it would make removal a trivial task?
Why would using SQLite make removal a trivial task? I don't see anything SQLite would solve for uninstall compared to a flat file. I think the main problem with uninstall is that there are many possible installation locations, and a single package may have multiple installations. I am afraid I don't see a reliable way to do this without Python's help (one 'registry' per Python installation). Also note that setuptools needs to support Python < 2.5 (a lot of platforms do not have Python > 2.4, for example).
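To make the trade-off concrete, here is a hedged sketch of what a per-installation SQLite registry might look like, using the stdlib `sqlite3` module (Python 2.5+). The table name, columns, and helper functions are hypothetical. The query for "which files belong to this package" is indeed easy, but it only knows about files recorded through this registry; copies installed under another prefix, by another Python, or by the OS package manager stay invisible, which is the multiple-locations problem described above.

```python
import sqlite3

# Hypothetical schema: one row per installed file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS installed_file (
    package TEXT NOT NULL,
    version TEXT NOT NULL,
    path    TEXT NOT NULL,
    PRIMARY KEY (package, path)
)
"""


def open_registry(db_path):
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    return conn


def record_install(conn, package, version, paths):
    with conn:  # one transaction: all rows are committed, or none are
        conn.executemany(
            "INSERT OR REPLACE INTO installed_file (package, version, path) "
            "VALUES (?, ?, ?)",
            [(package, version, p) for p in paths],
        )


def files_to_remove(conn, package):
    rows = conn.execute(
        "SELECT path FROM installed_file WHERE package = ? ORDER BY path",
        (package,),
    )
    return [row[0] for row in rows]


def forget(conn, package):
    with conn:
        conn.execute("DELETE FROM installed_file WHERE package = ?", (package,))
```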
About problem 3: Is the solution simple enough to solve using --build, --host, and/or --target, like configure, or does more gross work need to be done under the covers to make things all work?
If you want to do it as quickly as possible, then hacking something into distutils may be possible, but option handling is a bit of a pain in distutils (each command is independent and has its own option set).

If you want something reliable, the only way is to bypass the build part IMHO (the build_* commands). It is not meant as a general thing, but you may find my project numscons helpful. It adds a scons distutils command, so that you can build your extensions with scons instead of distutils; you then have a sane build system. It is far from perfect, but it can build non-trivial codebases on many platforms (Windows included), and you can control flags, change compilers, etc. in scons.

http://github.com/cournape/numscons/tree/master

The code to plug it into distutils is ugly and is here:

http://projects.scipy.org/numpy/browser/trunk/numpy/distutils/command/scons....)

cheers,
David
On Wed, Mar 11, 2009 at 10:36 PM, David Cournapeau wrote:
Thanks for the pointers. The only thing I'm concerned about is that while scons and waf are both Python-based, they're still not accepted packages in Python proper. That being said, neither is setuptools. Has there been talk of integrating a standard Python packaging tool into the interpreter suite?

Also, has any serious thought been put into taking the package name, producing a specific mnemonic-based .pth file for that package, and installing that way, e.g.:

pexpect -> pexpect.pth
nose -> nose.pth
etc.

I have seen some packages do this. Maybe this is the quicker / dirtier route, but from my point of view it's also the simplest way to avoid collisions between packages, and it's not incredibly complex at all. Furthermore, it kind of lends itself to other packaging methods like pkg_install (FreeBSD), pkgconfig, etc. This would be especially good because easy_install doesn't allow multiple versions by default... The only real loss is that the interpreter would have to open a number of .pth files, which could slow the machine down because of the extra I/O, but I'd think the number of Python packages even on a heavily populated system should be under 20~50, so the cost seems negligible.

Any more thoughts?
-Garrett
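A sketch of the one-.pth-file-per-package idea, under stated assumptions: the helper name `write_package_pth` and the egg directory layout are hypothetical. It relies on the stdlib `site` module, whose `site.addsitedir()` processes every .pth file in a directory and appends each listed path that exists to `sys.path`. Installs of different packages then never touch the same file; two installs of the same package still write the same file, so that case would still need a lock.

```python
import os


def write_package_pth(site_dir, package, egg_path):
    """Write <site_dir>/<package>.pth containing a single path entry (hypothetical helper)."""
    pth_path = os.path.join(site_dir, package + ".pth")
    tmp_path = pth_path + ".tmp"  # not a .pth name, so site.py ignores it
    with open(tmp_path, "w") as f:
        f.write(egg_path + "\n")
    os.replace(tmp_path, pth_path)  # readers never see a half-written file
    return pth_path
```

Usage: after `write_package_pth(site_dir, "nose", "/path/to/nose-egg")`, calling `site.addsitedir(site_dir)` (or simply starting an interpreter whose site-packages is `site_dir`) puts the egg directory on `sys.path`. Uninstalling the package then amounts to deleting `nose.pth` and the egg, without editing a shared file.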
Garrett Cooper wrote:
Thanks for the pointers. The only thing I'm concerned about is while scons and waf are both Python based, they're still not accepted packages in Python proper. That being said, neither is setuptools.
Yes, that's what I meant by "this is not meant as a general thing". I was not sure whether you wanted to solve the problem at large or only for your own application.
Has there been talk of a standard python packaging tool being integrated into the interpreter suite?
A lot, or not really, depending on what you mean by packaging tool, but this is a difficult topic. For one, the general rule for a package to be included in the Python stdlib is that it already has a relatively big user base, which is inherently difficult for a distribution tool. The only way to get a user base is to be backward compatible, but being backward compatible with distutils means recreating all its flaws (command-based design, option handling, mixing metadata and code, etc.). This is the main practical difficulty (besides implementing a good tool, of course; but those tools already exist or could be integrated relatively easily).

cheers,
David
On Wed, Mar 11, 2009 at 11:09:03PM -0700, Garrett Cooper wrote:
[...] The only real loss is that the interpreter would have to open up a number of .pth files which would potentially slow down the machine because of I/O access, but the number of python packages on a heavily populated system should be under 20~50 I'd think, so the number seems negligible
That's a rather optimistic assumption.

  $ ls /usr/lib/python2.5/site-packages/ | wc -l
  276

Admittedly, those are all installed with apt-get and not with easy_install. But here's a Zope-3-based Python app:

  $ wc -l /home/mg/src/schooltool/eggs/easy-install.pth
  114 /home/mg/src/schooltool/eggs/easy-install.pth

Marius Gedminas
--
We have an advanced scalable groupware communication environment (email)
-- Alan Cox
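Under the one-file-per-package scheme, startup cost scales with the number of .pth files `site.py` opens rather than with every entry in site-packages, so a more targeted measurement counts only those. A hedged sketch (the helper name `pth_stats` is hypothetical); it mirrors how `site.py` treats .pth lines: blank lines and `#` comments are skipped, and lines starting with `import` are executed rather than added as paths. By that measure, the 114-line easy-install.pth above would correspond to roughly one .pth file per line.

```python
import os


def pth_stats(site_dir):
    """Return (number of .pth files, number of path entries) in site_dir."""
    n_files = n_entries = 0
    for name in os.listdir(site_dir):
        if not name.endswith(".pth"):
            continue
        n_files += 1
        with open(os.path.join(site_dir, name)) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # site.py ignores blank lines and comments
                if line.startswith(("import ", "import\t")):
                    continue  # site.py executes these instead of adding a path
                n_entries += 1
    return n_files, n_entries
```

Usage: `for d in site.getsitepackages(): print(d, pth_stats(d))` reports the counts for each site-packages directory of the running interpreter.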
Participants (3): David Cournapeau, Garrett Cooper, Marius Gedminas