python_modules as default directory for dependencies in distutils
Currently, there are the following methods for installing dependencies:

· Use the distribution's packaging (ignored here; all further points refer to setup.py/distutils)
· Install them system-wide (default). This requires superuser rights and is basically guaranteed to conflict with some other application, especially if applications are choosy about the versions of Python packages they like.
· Install them user-wide (--user), with pretty much the same downsides, plus that the application is now bound to the user installing it.
· Manually invoke distutils with another path (error-prone and non-standard).
· Give up and use virtualenv. While this works fine, it's a little bit heavy-handed to modify one's shell just to launch a potentially trivial application.

Therefore, I'd like to suggest a new alternative location (--here = --root "./python_modules", intended to become the default in Python 5), modeled after node's packaging system (http://goo.gl/dMRTC).

The obvious advantage of installing all dependencies into a directory in the application root is that the application will work for every user, will never conflict with any other application, and it is easy both to package the dependencies (say, for an sftp-only rollout) and to delete all of them. Of course, this is not sufficient to replace virtualenv, but I believe a large majority of applications will (or at least should) run under any common Python interpreter.

Aside from the new flag in distutils, the site module should automatically look into ./python_modules, as if it were a second USER_SITE.

In node, this scheme works so well that virtually nobody bothers with system-wide installation, except when they want a binary to be available on the PATH for all users.

This suggestion seems so obvious that it has probably been discussed before, but my google-fu is too weak to find it. If it has, I'd be glad to get a link to the old discussion. Thanks!

- Philipp
On Tue, Nov 20, 2012 at 1:21 PM, Philipp Hagemeister <phihag@phihag.de>wrote:
Currently, there are the following methods for installing dependencies:
· Use the distribution's packaging (ignored here; all further points refer to setup.py/distutils)
· Install them system-wide (default). This requires superuser rights and is basically guaranteed to conflict with some other application, especially if applications are choosy about the versions of Python packages they like.
· Install them user-wide (--user), with pretty much the same downsides, plus that the application is now bound to the user installing it.
· Manually invoke distutils with another path (error-prone and non-standard).
· Give up and use virtualenv. While this works fine, it's a little bit heavy-handed to modify one's shell just to launch a potentially trivial application.
Therefore, I'd like to suggest a new alternative location (--here = --root "./python_modules", intended to become default in Python 5), modeled after node's packaging system (http://goo.gl/dMRTC).
The obvious advantage of installing all dependencies into a directory in the application root is that the application will work for every user, never conflict with any other application, and it is both easy to package dependencies (say, for an sftp-only rollout) and to delete all dependencies. Of course, this is not sufficient to replace virtualenv, but I believe a large majority of applications will (or at least should) run under any common python interpreter.
Aside from the new flag in distutils, the site module should automatically look into ./python_modules , as if it were a second USER_SITE.
In node, this scheme works so well that virtually nobody bothers to use system-wide installation, except when they want a binary to be available in the PATH for all users.
This suggestion seems so obvious that it probably has been discussed before, but my google-fu is too weak to find it. If it has, I'd be glad to get a link to the old discussion. Thanks!
- Philipp
You wouldn't need stdlib support to do this. I believe setuptools' pkg_resources can look in a directory full of eggs, adding the required ones to PYTHONPATH based on requirements as specified in a wrapper script. Gem uses a directory full of versioned packages like ~/.gem/ruby/1.8/gems/sinatra-1.3.3/. The feature is something like having a dynamic linker. It is a useful thing to have.
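[Editorial note: to make Daniel's suggestion concrete, a wrapper script could resolve requirements against a directory full of eggs using setuptools' pkg_resources along these lines. A hedged sketch; the function name is made up, and the exact wrapper shape would depend on the application.]

```python
import pkg_resources


def activate_from_egg_dir(egg_dir, requirements):
    """Resolve requirement strings (e.g. "lxml>=2.3") against a directory
    full of eggs and activate the matching distributions, similar to what
    a generated wrapper script around an application entry point might do.
    """
    # Scan only the given directory, not the system-wide locations.
    env = pkg_resources.Environment([egg_dir])
    needed = [pkg_resources.Requirement.parse(r) for r in requirements]
    for dist in pkg_resources.working_set.resolve(needed, env):
        # Adding a distribution puts dist.location on sys.path.
        pkg_resources.working_set.add(dist)
    return env
```

This is essentially the "dynamic linker" behaviour Daniel describes: the version actually imported is chosen at start-up from whatever the directory contains.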
On 11/20/12, Daniel Holth <dholth@gmail.com> wrote:
On Tue, Nov 20, 2012 at 1:21 PM, Philipp Hagemeister <phihag@phihag.de>wrote:
Currently, there are the following methods for installing dependencies: ... Therefore, I'd like to suggest a new alternative location (--here = --root "./python_modules", intended to become default in Python 5), modeled after node's packaging system (http://goo.gl/dMRTC).
If I'm understanding correctly, you just mean "install dependencies in the same place as the application that asked for them", or maybe in a magically named subdirectory. That does sound like a reasonable policy -- similar to the Windows or Java solution of packing everything into a single bundle.
Aside from the new flag in distutils, the site module should automatically look into ./python_modules , as if it were a second USER_SITE.
As opposed to just putting them a layer up, and looking into the application package's own directory for relative imports?
You wouldn't need stdlib support to do this. I believe setuptools' pkg_resources can look in a directory full of eggs, adding the required ones to PYTHONPATH based on requirements as specified in a wrapper script. Gem uses a directory full of versioned packages like ~/.gem/ruby/1.8/gems/sinatra-1.3.3/.
If I understand correctly, that just provides a way to include the version number when choosing the system-wide package location (and later, when importing). Also useful, but different from bundling the dependencies inside each application that requires them. Most notably, the bundle-inside solution will* find exactly the module it shipped with, including custom patches. The versioned-packages solution will have conflicts when more than one application provides for the same dependency, but will better support independent maintenance (or at least security patches) for the 4th-party modules. * Err, unless the module was loaded before the application, or modified locally, or something odd happened with import, or ... -jJ
On 11/20/2012 09:35 PM, Jim Jewett wrote:
Aside from the new flag in distutils, the site module should automatically look into ./python_modules , as if it were a second USER_SITE. As opposed to just putting them a layer up, and looking into the application package's own directory for relative imports?
Precisely, because that kind of clutters the application's root directory, especially when the number of dependencies reaches triple digits. Think of all the entries in .hgignore/.gitignore alone.
Most notably, the bundle-inside solution will* find exactly the module it shipped with, including custom patches. The versioned-packages solution will have conflicts when more than one application provides for the same dependency, but will better support independent maintenance (or at least security patches) for the 4th-party modules.
Yeah, not having automatic security updates is a definite downside of bundling into a local directory; but that's no different from the situation with a virtualenv (or user-specific packages).

- Philipp
On 11/20/2012 08:13 PM, Daniel Holth wrote:
You wouldn't need stdlib support to do this. I believe setuptools' pkg_resources can look in a directory full of eggs, adding the required ones to PYTHONPATH based on requirements as specified in a wrapper script.
I'm not quite sure which aspect you're referring to - changing distutils (= setup.py) to have the --here option, or changing site?

If it's the former, I still have to download the directory full of eggs to somewhere, don't I? And the point of this suggestion is that instead of *somewhere*, there's a dedicated "standard" location. And the point of the change to site would be that one doesn't need to do anything;

git clone http://example.org/app
python setup.py install --here
./app.py

would just work without modification to the application (and not disturb any other application).

My limited understanding of pkg_resources may impede me, though. Can you link me to or describe how I can use setuptools here?

Thanks, Philipp
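[Editorial note: the effect of the proposed --here flag can be approximated with options that already exist. A sketch, not run here; "somepackage" is a placeholder.]

```shell
# Both commands put the installed modules under ./python_modules
# instead of a system-wide or user-wide location.

# distutils / setup.py (the era of this thread):
python setup.py install --install-lib=./python_modules

# The same idea with pip's --target option:
python -m pip install --target=./python_modules somepackage
```

Neither spelling makes the interpreter look there automatically, which is exactly the gap the proposed change to site would close.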
On Wed, Nov 21, 2012 at 4:21 AM, Philipp Hagemeister <phihag@phihag.de>wrote:
Currently, there are the following methods for installing dependencies:
· Use the distribution's packaging (ignored here; all further points refer to setup.py/distutils)
· Install them system-wide (default). This requires superuser rights and is basically guaranteed to conflict with some other application, especially if applications are choosy about the versions of Python packages they like.
· Install them user-wide (--user), with pretty much the same downsides, plus that the application is now bound to the user installing it.
· Manually invoke distutils with another path (error-prone and non-standard).
· Give up and use virtualenv. While this works fine, it's a little bit heavy-handed to modify one's shell just to launch a potentially trivial application.
Or install them all in a single directory, add a __main__.py file to that directory and then just pass that directory name on the command line instead of a script name. The directory will be added as sys.path[0] and the __main__.py file will be executed as the main module. (If your additional application dependencies are all pure Python files, you can even zip up that directory and pass that on the command line instead.)

This approach has been supported since at least Python 2.6, but was missing from the original What's New, and nobody ever goes back to read the "using" documentation on the website, because they assume they already know how invoking the interpreter works.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
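[Editorial note: Nick's approach can be demonstrated end to end. The file names and the greeting are invented for the example; the mechanism — "python <dir>" runs <dir>/__main__.py with the directory as sys.path[0] — is the documented one.]

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway application directory containing a __main__.py plus
# one bundled "dependency", then execute the directory itself.
app_dir = tempfile.mkdtemp(prefix="runthis-")
with open(os.path.join(app_dir, "helper.py"), "w") as f:
    f.write("def greet():\n    return 'hello from the bundle'\n")
with open(os.path.join(app_dir, "__main__.py"), "w") as f:
    # __main__.py can import the bundled module directly, because the
    # directory being executed is prepended to sys.path.
    f.write("import helper\nprint(helper.greet())\n")

output = subprocess.check_output([sys.executable, app_dir])
print(output.decode().strip())  # hello from the bundle
```

Zipping app_dir and passing the zip file on the command line works the same way, with the C-extension caveat Nick mentions below.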
On 11/21/2012 04:27 AM, Nick Coghlan wrote:
> Or install them all in a single directory, add a __main__.py file to that directory and then just pass that directory name on the command line instead of a script name. The directory will be added as sys.path[0] and the __main__.py file will be executed as the main module (If your additional application dependencies are all pure Python files, you can even zip up that directory and pass that on the command line instead).

I'm well aware of that approach, but didn't apply it to dependencies, and am still not sure how to. Can you describe what a hypothetical helloworld application with one dependency would look like? And wouldn't one sacrifice the ability to seamlessly import from the application's code itself?
As far as I understand, you suggest a setup like

./main.py  (with content:
    import lxml.etree
    import myapp
    myapp.hello(lxml.etree.fromstring('<foo/>'))
)
./myapp/__init__.py
./python_modules/__main__.py -> ../main.py
./python_modules/myapp -> ../myapp   # Or a path fixup in main
./python_modules/lxml/...            # or an equivalent .pth
./myapp.sh  (chmod +x, with content: python -m python_modules)

which strikes me as really complex (and would still benefit from a --here option to distutils). And how would the setup.py in . have to look to set up all the symlinks?

- Philipp
On Wed, Nov 21, 2012 at 7:50 PM, Philipp Hagemeister <phihag@phihag.de>wrote:
On 11/21/2012 04:27 AM, Nick Coghlan wrote:
> Or install them all in a single directory, add a __main__.py file to that directory and then just pass that directory name on the command line instead of a script name. The directory will be added as sys.path[0] and the __main__.py file will be executed as the main module (If your additional application dependencies are all pure Python files, you can even zip up that directory and pass that on the command line instead).

I'm well aware of that approach, but didn't apply it to dependencies, and am still not sure how to. Can you describe what a hypothetical helloworld application with one dependency would look like? And wouldn't one sacrifice the ability to seamlessly import from the application's code itself?
One directory containing:

runthis/
    __main__.py  (with content as described for your main.py)
    lxml
    myapp

Execute "python runthis" (via a +x shell script if you prefer). Note the lack of -m: you're executing the directory contents, not a package. You can also bundle it all into a zip file, but that only works if you don't need C extension support (since zipimport can't handle the necessary step of extracting the shared libraries out to separate files so the OS can load them).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 11/21/2012 02:38 PM, Nick Coghlan wrote:
runthis/
    __main__.py  (with content as described for your main.py)
    lxml
    myapp
But how is that different from putting everything into the root directory of the application? In particular, assume that I have lxml001..lxml100 . Don't I still have to gitignore/hgignore all of them, and write a convoluted target to delete all of them? Plus, how would I install all these dependencies with distutils? - Philipp
On 11/21/12 2:38 PM, Nick Coghlan wrote:
On Wed, Nov 21, 2012 at 7:50 PM, Philipp Hagemeister <phihag@phihag.de <mailto:phihag@phihag.de>> wrote:
On 11/21/2012 04:27 AM, Nick Coghlan wrote:
> Or install them all in a single directory, add a __main__.py file to that
> directory and then just pass that directory name on the command line
> instead of a script name. The directory will be added as sys.path[0] and
> the __main__.py file will be executed as the main module (If your
> additional application dependencies are all pure Python files, you can even
> zip up that directory and pass that on the command line instead).

I'm well aware of that approach, but didn't apply it to dependencies, and am still not sure how to. Can you describe what a hypothetical helloworld application with one dependency would look like? And wouldn't one sacrifice the ability to seamlessly import from the application's code itself?
One directory containing:
runthis/
    __main__.py  (with content as described for your main.py)
    lxml
    myapp
Execute "python runthis" (via a +x shell script if you prefer). Note the lack of -m: you're executing the directory contents, not a package. You can also bundle it all into a zip file, but that only works if you don't need C extension support (since zipimport can't handle the necessary step of extracting the shared libraries out to separate files so the OS can load them).
Hi Nick,

This would actually be very nice if we could go this far! ;-) Maybe with some ramdisk support or something. A problem might be to handle RPATH issues on the fly. I actually learnt about this when working on the PySide setup; I was not aware of the problem before.

Do you think it would make sense for me to put time into this?

cheers - chris

-- Christian Tismer :^) <tismer@stackless.com> http://www.stackless.com/
On 11/21/12, Philipp Hagemeister <phihag@phihag.de> wrote:
On 11/21/2012 04:27 AM, Nick Coghlan wrote:
Or install them all in a single directory, add a __main__.py file to that directory and then just pass that directory name on the command line instead of a script name. The directory will be added as sys.path[0] and the __main__.py file will be executed as the main module
... And wouldn't one sacrifice the ability to seamlessly import from the application's code itself?
Do you mean from within the application, or from the supposedly independent libraries that you depend upon?
As far as I understand, you suggest a setup like ...
./python_modules/__main__.py -> ../main.py ./python_modules/myapp -> ../myapp # Or a path fixup in main
Skip those two ... if something inside python_modules is looking at your application, then it really shouldn't be segregated into a python_modules directory. (And if you need to anyhow, make those imports explicit, so that you don't end up with two copies of the "same" module.) That said, I think (but haven't tested) that import __main__ or import myprojutils will do the right thing, because of sys.path[0] being the root directory of myapp. -jJ
On 11/21/2012 11:21 PM, Jim Jewett wrote:
> That said, I think (but haven't tested) that import __main__ or import myprojutils will do the right thing, because of sys.path[0] being the root directory of myapp.

Nick already clarified what he meant in his other mail, archived at http://mail.python.org/pipermail/python-ideas/2012-November/017928.html
I just misunderstood his proposal - as far as my current understanding goes, he suggests putting the application, its __main__.py file, as well as the dependencies in one subdirectory. Cheers, Philipp
participants (5)
- Christian Tismer
- Daniel Holth
- Jim Jewett
- Nick Coghlan
- Philipp Hagemeister