setuptools for people behind a firewall
I've just had a look at the new documentation for setuptools. I've not read it all in detail yet, but one thing struck me regarding the "automatically download dependencies" feature. It isn't going to work for people (like me) stuck behind a firewall that Python doesn't support (Windows NTLM based firewall). Obviously, setuptools is never going to be able to resolve a situation like this, nor would I expect it to. But can I suggest two possible changes to make it easier for people with limited internet access?

1. A "manual download" mode, where setuptools lists the files which it wants you to obtain, and then leaves it to you how you get them. I'm not sure how plausible this would be, given the necessarily iterative process involved in resolving dependencies, but even a little help would be useful (a report of unresolved dependencies when run with a --no-download flag would be the most basic help).

2. A way of specifying an external command to use to download files over HTTP. This would (for example) allow me to use curl, which does support NTLM proxies, rather than relying on Python's built-in HTTP support, which doesn't.

Regards,
Paul.
One thing you could do is keep a bunch of eggs, .tar.gz's, exe's, whatever in a directory on a web server with directory indexes turned on and then add that page to the find_links option in your ~/.pydistutils.cfg file. Here's mine::

    [easy_install]
    find_links=http://lesscode.org/eggs/
        http://peak.telecommunity.com/dist/

Then, just evolve a process for placing things on the behind-the-firewall server and you should be good.

Ryan

On Jul 11, 2005, at 11:58 AM, Paul Moore wrote:
I've just had a look at the new documentation for setuptools. I've not read it all in detail yet, but one thing struck me regarding the "automatically download dependencies" feature.
It isn't going to work for people (like me) stuck behind a firewall that Python doesn't support (Windows NTLM based firewall). Obviously, setuptools is never going to be able to resolve a situation like this, nor would I expect it to. But can I suggest two possible changes to make it easier for people with limited internet access?
1. A "manual download" mode, where setuptools lists the files which it wants you to obtain, and then leaves it to you how you get them. I'm not sure how plausible this would be, given the necessarily iterative process involved in resolving dependencies, but even a little help would be useful (a report of unresolved dependencies when run with a --no-download flag would be the most basic help).
2. A way of specifying an external command to use to download files over HTTP. This would (for example) allow me to use curl, which does support NTLM proxies, rather than relying on Python's built-in HTTP support, which doesn't.
Regards, Paul.
Ryan Tomayko rtomayko@gmail.com http://naeblis.cx/rtomayko/
Ryan Tomayko wrote:
One thing you could do is keep a bunch of eggs, .tar.gz's, exe's, whatever in a directory on a web server with directory indexes turned on and then add that page to the find_links option in your ~/.pydistutils.cfg file.
That's pretty neat. It's obvious in retrospect, but I never thought of it...
Then, just evolve a process for placing things on the behind-the-firewall server and you should be good.
Ah, there's the difficulty, of course :-) That's where a utility to report the dependencies would help... (It may be pretty trivial, but I don't see immediately how to do this - it seems to me that the current setuptools documentation is more for *creators* of eggs, than for *users* of them...) Paul. -- The trouble with being punctual is that nobody's there to appreciate it. -- Franklin P. Jones
At 08:54 PM 7/11/2005 +0100, Paul Moore wrote:
Ah, there's the difficulty, of course :-) That's where a utility to report the dependencies would help... (It may be pretty trivial, but I don't see immediately how to do this - it seems to me that the current setuptools documentation is more for *creators* of eggs, than for *users* of them...)
Did you also look at these docs:

    http://peak.telecommunity.com/DevCenter/PythonEggs
    http://peak.telecommunity.com/DevCenter/EasyInstall

Of course, the main reason for little documentation devoted to "users" of eggs is that really eggs should be a mostly transparent thing.

As for dependency analysis, the current API to find eggs in a directory is pkg_resources.find_distributions(dirname_or_filename), which iterates over the egg(s), yielding Distribution objects. Distribution objects have a 'depends()' method which returns a list of Requirement objects describing what the distribution needs. With that API, you should easily be able to create a simple script to dump an egg's dependencies, although I think the API may change slightly in a future release. (e.g. depends() will probably have a different name before 1.0 rolls around.)
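A dependency dump along these lines can be sketched with the standard library alone. Note the assumptions: this uses importlib.metadata instead of the pkg_resources calls named above (whose depends() method was expected to be renamed anyway), the function name dump_dependencies is made up for illustration, and it only sees whatever .dist-info or .egg-info metadata importlib.metadata can discover in the given directory.

```python
import importlib.metadata as metadata


def dump_dependencies(directory):
    """Map 'name version' to the declared requirements found in one directory.

    This is a stand-in for find_distributions() + depends(): it asks
    importlib.metadata to discover distributions on a single path entry.
    """
    report = {}
    for dist in metadata.distributions(path=[str(directory)]):
        label = "%s %s" % (dist.metadata["Name"], dist.version)
        # dist.requires is None when no requirements are declared.
        report[label] = list(dist.requires or [])
    return report


if __name__ == "__main__":
    import sys

    # Usage: python dump_deps.py /path/to/site-packages
    for label, requirements in sorted(dump_dependencies(sys.argv[1]).items()):
        print(label)
        for requirement in requirements:
            print("    needs", requirement)
```

Run against a site-packages directory, this prints each distribution followed by the requirement strings it declares, which is roughly the "report the dependencies" utility asked for earlier in the thread.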
On 7/12/05, Phillip J. Eby wrote:
At 08:54 PM 7/11/2005 +0100, Paul Moore wrote:
Ah, there's the difficulty, of course :-) That's where a utility to report the dependencies would help... (It may be pretty trivial, but I don't see immediately how to do this - it seems to me that the current setuptools documentation is more for *creators* of eggs, than for *users* of them...)
Did you also look at these docs:
http://peak.telecommunity.com/DevCenter/PythonEggs
http://peak.telecommunity.com/DevCenter/EasyInstall
I'd forgotten about these, but found them last night. I've not had a chance to read them through yet, but will do so.
Of course, the main reason for little documentation devoted to "users" of eggs is that really eggs should be a mostly transparent thing.
Hmm, yes. It's that "site-packages should be managed by tools" attitude of mine which muddies the waters a bit. Maybe it's a Windows thing...
As for dependency analysis, the current API to find eggs in a directory is pkg_resources.find_distributions(dirname_or_filename), which iterates over the egg(s), yielding Distribution objects. Distribution objects have a 'depends()' method which returns a list of Requirement objects describing what the distribution needs. With that API, you should easily be able to create a simple script to dump an egg's dependencies, although I think the API may change slightly in a future release. (e.g. depends() will probably have a different name before 1.0 rolls around.)
Cool. I feel the need for such a script (and possibly some others) so I'll have a go at an egg_utils script/module which I'll contribute back when I'm happy with it. Paul.
On Jul 12, 2005, at 5:23 AM, Paul Moore wrote:
Hmm, yes. It's that "site-packages should be managed by tools" attitude of mine which muddies the waters a bit. Maybe it's a Windows thing...
It's not. In many cases, it will be impossible for users without root/wheel level permissions to modify the site-packages directory at all on *nix systems that have proper package management, which is just about everything now. Additionally, most Redhat/Fedora admins will tell you that having anything other than rpm touching the RPM managed site-packages directory is bad form (there are actually good reasons for this), even when you do have permissions (I'm not sure how prevalent this mindset is outside the Redhat/Fedora community).

I think that many people running systems built on package management will want to set up an alternative install_dir (/usr/local/lib/pythonX.X/site-packages or perhaps /opt/pythonX.X/site-packages). This is why I think the only-load-pth-from-site-packages issue is important. Then again, wrapper scripts make it less of an issue. teeter-totter-teeter-totter

I'm still waiting to see how people maintaining python packages for Linux/BSD distributions feel about eggs. So far I don't think they've had to think about it a whole lot.

Ryan Tomayko
rtomayko@gmail.com
http://naeblis.cx/rtomayko/
Ryan Tomayko wrote:
I'm still waiting to see how people maintaining python packages for Linux/BSD distributions feel about eggs. So far I don't think they've had to think about it a whole lot.
Hopefully this will make package generation much much easier -- the only issue I can see is the mismatch in the names of packages and dependencies (since generally package names are parameterized with "py-" or "python-" or somesuch). Actually putting the one file in its place should be easy, of course. Well... the other issue is that eggs support the installation of multiple versions simultaneously; for most packaging systems to support this the package name has to grow a version number too, which is rather awkward. -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
At 05:49 AM 7/12/2005 -0400, Ryan Tomayko wrote:
I think that many people running systems built on package management will want to set up an alternative install_dir (/usr/local/lib/pythonX.X/site-packages or perhaps /opt/pythonX.X/site-packages). This is why I think the only-load-pth-from-site-packages issue is important. Then again, wrapper scripts make it less of an issue. teeter-totter-teeter-totter
Keep in mind that you only have to create one .pth file per Python installation in order to enable this setup now. e.g., create an 'altinstall.pth' with this in it:

    import os, site; site.addsitedir(os.path.expanduser('~/lib/python2.X/site-packages'))
    import site; site.addsitedir('/opt/python2.X/site-packages')

Those lines will not only put the other directories on sys.path, but they will also load all the .pth files in those directories as well. Of course, EasyInstall won't create an easy-install.pth in the /opt directory or ~/lib directories, at least not until/unless it grows a --site-dirs option for you to use to specify alternate directories.

I could see an OS distro being set up like this, with 'altinstall.pth' in site-packages, and a 'distutils.cfg' file in the OS-installed distutils, containing something like:

    [easy_install]
    site_dirs = /opt/python2.X/site-packages, ~/lib/python2.X/site-packages

And this would basically make it so that you could install Python packages either site-wide or in your home directory, and still use easy-install.pth files.

Of course, there are problems with this setup for distros that have privileged programs running in Python; those programs will need to run with python -S (to prevent bringing in site-packages and friends) so that those programs can control their own environment more closely. There are always trade-offs.
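The effect of site.addsitedir() described above can be tried out from ordinary code before committing anything to a .pth file. This is only a sketch: the helper name add_site_dirs and the directory names in the usage comment are illustrative assumptions, and skipping directories that do not exist is a choice made here, not something the .pth lines above do.

```python
import os
import site


def add_site_dirs(candidates):
    """Register each existing directory as an extra site directory.

    site.addsitedir() appends the directory to sys.path and also processes
    any .pth files it finds there, which is the behaviour relied on above.
    Returns the list of directories that were actually added.
    """
    added = []
    for raw in candidates:
        path = os.path.abspath(os.path.expanduser(raw))
        if os.path.isdir(path):
            site.addsitedir(path)
            added.append(path)
    return added


# Example (paths are placeholders):
#     add_site_dirs(["~/lib/python2.X/site-packages",
#                    "/opt/python2.X/site-packages"])
```

Because the .pth files inside each added directory are processed too, an easy-install.pth placed there would take effect without further changes.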
On 7/12/05, Paul Moore wrote:
Hmm, yes. It's that "site-packages should be managed by tools" attitude of mine which muddies the waters a bit. Maybe it's a Windows thing...
As for dependency analysis, the current API to find eggs in a directory is pkg_resources.find_distributions(dirname_or_filename), which iterates over the egg(s), yielding Distribution objects. Distribution objects have a 'depends()' method which returns a list of Requirement objects describing what the distribution needs. With that API, you should easily be able to create a simple script to dump an egg's dependencies, although I think the API may change slightly in a future release. (e.g. depends() will probably have a different name before 1.0 rolls around.)
Cool. I feel the need for such a script (and possibly some others) so I'll have a go at an egg_utils script/module which I'll contribute back when I'm happy with it.
I'm playing around with writing some utility/management scripts at the moment. Listing installed distributions is easy, and I see how dependency analysis can be handled with the above APIs. I was looking at an uninstall utility, and hit a possible problem:

Given a Distribution object (maybe derived from a user's command line, maybe from selection off a GUI) I can "uninstall" the Distribution by simply removing the egg (file or directory). However, according to the documentation, before I delete files for "the currently installed version of a package", I need to run easy_install -m <package> to ensure that Python doesn't continue to search for it. So, three questions:

1. How can I tell from a Distribution instance if it is "the currently installed version"?

2. How can I do the equivalent of easy_install -m in code?

3. Can eggs be in site-packages, but not locatable via find_distributions?

I can't see anything in the Distribution API documentation, and I'm a little hazy on what happens in the face of multiple Distributions of the same package, all in site-packages at once but only one "installed". And I don't have a suitable set of eggs or a "clean" environment to try things out.

Thanks,
Paul.
At 03:00 PM 7/13/2005 +0100, Paul Moore wrote:
On 7/12/05, Paul Moore wrote:
Given a Distribution object (maybe derived from a user's command line, maybe from selection off a GUI) I can "uninstall" the Distribution by simply removing the egg (file or directory).
Make sure you *only* do this to a directory if it has a '.egg' extension; otherwise you could delete a package installed using "develop"!
However, according to the documentation, before I delete files for "the currently installed version of a package", I need to run easy_install -m <package> to ensure that Python doesn't continue to search for it. So, three questions:
1. How can I tell from a Distribution instance if it is "the currently installed version"?
If its .path attribute matches an entry in sys.path. But you'd probably be better off manipulating easy-install.pth directly, via PthDistributions.
2. How can I do the equivalent of easy_install -m in code?
There's a PthDistributions class in setuptools.command.easy_install; look at it and the code that uses it. Unfortunately, this code is targeted for refactoring when pkg_resources gets refactored, but hopefully its API won't change much.
3. Can eggs be in site-packages, but not locatable via find_distributions?
Um, only if you don't look for them. I'm not sure I understand the question.
I can't see anything in the Distribution API documentation, and I'm a little hazy on what happens in the face of multiple Distributions of the same package, all in site-packages at once but only one "installed".
You mean only one "activated" (they're all "installed"). What happens is that when you find_distributions('site-packages') they will all be listed. However, when you find_distributions() on the path entry that makes a particular one current, that one will show up again.
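The two checks suggested in this reply (only remove something that ends in '.egg', and treat a distribution as activated when its path matches a sys.path entry) can be sketched as plain path comparisons. The function names are hypothetical, and the location argument stands in for the .path attribute of a Distribution; nothing here touches easy-install.pth or PthDistributions.

```python
import os
import sys


def _normalize(path):
    # Compare paths after resolving symlinks, case and trailing separators.
    return os.path.normcase(os.path.realpath(path))


def is_activated(location, search_path=None):
    """True if 'location' matches an entry on the search path (default sys.path)."""
    entries = sys.path if search_path is None else search_path
    target = _normalize(location)
    return any(_normalize(entry) == target for entry in entries if entry)


def looks_like_egg(location):
    """True only for paths ending in '.egg', so a 'develop' checkout is left alone."""
    return location.rstrip("/\\").lower().endswith(".egg")
```

An uninstall script would presumably refuse to delete anything for which looks_like_egg() is false, and would deactivate the egg first whenever is_activated() is true.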
On 7/13/05, Phillip J. Eby wrote:
At 03:00 PM 7/13/2005 +0100, Paul Moore wrote:
On 7/12/05, Paul Moore wrote:
Given a Distribution object (maybe derived from a user's command line, maybe from selection off a GUI) I can "uninstall" the Distribution by simply removing the egg (file or directory).
Make sure you *only* do this to a directory if it has a '.egg' extension; otherwise you could delete a package installed using "develop"!
Hmm, I'm only looking on sys.path - I hadn't imagined that development software would be added to the *default* sys.path... But I take your point. Actually, as I'm only looking at these utilities from the POV of managing site-packages, maybe I should just strip out any entries from sys.path which aren't under that directory (although I don't know if I can find that directory on Unix - on Windows, it's under sys.(exec)prefix, but I don't know directory structures under Unix so well). I'll have to check the source of easy_install to see how it decides where to copy files *to* (in the absence of user overrides...) [... omitted some stuff about PthDistributions which I need to review ...]
3. Can eggs be in site-packages, but not locatable via find_distributions?
Um, only if you don't look for them. I'm not sure I understand the question.
Sorry - I'm doing find_distributions() on every entry in sys.path. What I was getting at is whether that process could miss any eggs which easy_install may have put into site-packages. Again, the key point I forgot to clarify is that I want to keep track of "things that easy_install could have added to site-packages" - as I'm trying to add tools to do the operations easy_install doesn't supply (list, uninstall are the key ones) so that there's no requirement for the user to manually work inside site-packages.
I can't see anything in the Distribution API documentation, and I'm a little hazy on what happens in the face of multiple Distributions of the same package, all in site-packages at once but only one "installed".
You mean only one "activated" (they're all "installed"). What happens is that when you find_distributions('site-packages') they will all be listed. However, when you find_distributions() on the path entry that makes a particular one current, that one will show up again.
Ah, OK. That helps me understand the reason for the doubled entries mentioned above as well. Thanks, Paul.
At 11:26 AM 7/14/2005 +0100, Paul Moore wrote:
On 7/13/05, Phillip J. Eby wrote:
At 03:00 PM 7/13/2005 +0100, Paul Moore wrote:
On 7/12/05, Paul Moore wrote:
Given a Distribution object (maybe derived from a user's command line, maybe from selection off a GUI) I can "uninstall" the Distribution by simply removing the egg (file or directory).
Make sure you *only* do this to a directory if it has a '.egg' extension; otherwise you could delete a package installed using "develop"!
Hmm, I'm only looking on sys.path - I hadn't imagined that development software would be added to the *default* sys.path...
By default, if you use "develop" on a package, it becomes part of the default sys.path; this makes sense on machines that are "development" machines. If you are developing on a machine that also has production software, you need to use a different staging area (--install-dir) for the develop command. This is easily set on a per-user or sitewide basis; in a shop making use of these tools, you'd probably do this (as root):

    setup.py setopt -g -c develop -o install_dir -s /somewhere/staging

to set the sitewide staging area. Hmm, actually, it would probably make more sense to have user-specific staging areas, by making the default something like ~/staging. I should probably think about adding expanduser() support to a lot of setuptools' options.
But I take your point. Actually, as I'm only looking at these utilities from the POV of managing site-packages, maybe I should just strip out any entries from sys.path which aren't under that directory (although I don't know if I can find that directory on Unix - on Windows, it's under sys.(exec)prefix, but I don't know directory structures under Unix so well). I'll have to check the source of easy_install to see how it decides where to copy files *to* (in the absence of user overrides...)
There's a get_python_lib() function in distutils that I use. However, I've recently discovered that the situation on Mac OS X is more complex; users have a ~/Library/python2.X/site-packages directory as well.
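On current Pythons, where distutils is no longer part of the standard library, similar information is available from sysconfig and site. The following is a sketch under that assumption; it is not what easy_install itself does, and the function name candidate_site_dirs is made up for illustration.

```python
import site
import sysconfig


def candidate_site_dirs():
    """Return the directories where packages are normally installed.

    Includes the interpreter-wide locations for pure-Python and
    platform-specific packages, plus the per-user site directory,
    which is how per-user locations like the Mac OS X one show up.
    """
    found = [
        sysconfig.get_path("purelib"),
        sysconfig.get_path("platlib"),
        site.getusersitepackages(),
    ]
    unique = []
    for directory in found:
        if directory and directory not in unique:
            unique.append(directory)
    return unique


if __name__ == "__main__":
    for directory in candidate_site_dirs():
        print(directory)
```

A management script could restrict itself to sys.path entries that fall under one of these directories.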
Sorry - I'm doing find_distributions() on every entry in sys.path. What I was getting at is whether that process could miss any eggs which easy_install may have put into site-packages.
No; find_distributions(site_packages) would list everything installed in that directory, whether active or not, plus distributions for anything that was installed as a development link in site-packages. That is, even if *all* you call find_distributions() on is site-packages, it's still going to include distributions whose .path is elsewhere, and that's fine because you want to know about those anyway; just list them with a "(Development)" status or something in your output.
On 7/14/05, Phillip J. Eby wrote:
At 11:26 AM 7/14/2005 +0100, Paul Moore wrote:
Hmm, I'm only looking on sys.path - I hadn't imagined that development software would be added to the *default* sys.path...
By default, if you use "develop" on a package, it becomes part of the default sys.path; this makes sense on machines that are "development" machines.
Ah! I don't work that way, so this hadn't occurred to me. No wonder I didn't follow the discussion on "develop" :-) I'll have to think about this.
This is easily set on a per-user or sitewide basis; in a shop making use of these tools, you'd probably do this (as root):
setup.py setopt -g -c develop -o install_dir -s /somewhere/staging
Ooh, these setup commands that do global stuff unrelated to the package whose setup.py is being used confuse me! More to think about...
There's a get_python_lib() function in distutils that I use. However, I've recently discovered that the situation on Mac OS X is more complex; users have a ~/Library/python2.X/site-packages directory as well.
Sorry - I'm doing find_distributions() on every entry in sys.path. What I was getting at is whether that process could miss any eggs which easy_install may have put into site-packages.
No; find_distributions(site_packages) would list everything installed in that directory, whether active or not, plus distributions for anything that was installed as a development link in site-packages. That is, even if *all* you call find_distributions() on is site-packages, it's still going to include distributions whose .path is elsewhere, and that's fine because you want to know about those anyway; just list them with a "(Development)" status or something in your output.
Sounds reasonable. There are lots of complexities here I hadn't considered, because I'm working purely from the POV of a *user* of eggs, not a *producer* of them. Maybe that's why bdist_wininst installers, simplistic though they are, suit a lot of my needs. (But that's leading onto another philosophical ramble, which I'll spare you... :-)) Paul.
Paul Moore wrote:
2. A way of specifying an external command to use to download files over HTTP. This would (for example) allow me to use curl, which does support NTLM proxies, rather than relying on Python's built-in HTTP support, which doesn't.
Someone mentioned curl to me in a ticket, but didn't mention specifically why it was "better". I assume now that NTLM was the issue. They submitted this patch that used pycurl: http://pythonpaste.org/trac/attachment/ticket/10/build-pkg.diff I imagine that would be an easy change to setuptools as well. Though it would be nice if setuptools could detect the case where urllib failed due to an NTLM firewall, so it could suggest (in the error message) that the user install pycurl. I don't know how easy pycurl is to install -- I wonder in practice if command-line interaction with curl would be easier to get people to install (pycurl doesn't appear to have up-to-date windows installers). -- Ian Bicking / ianb@colorstudy.com / http://blog.ianbicking.org
Ian Bicking wrote:
I don't know how easy pycurl is to install -- I wonder in practice if command-line interaction with curl would be easier to get people to install (pycurl doesn't appear to have up-to-date windows installers).
I was certainly just thinking of being able to supply a command line which, given a URL, would produce the file on stdout (in the case of curl, "curl %s" is the basic command, possibly with proxy-type options added). I don't see the need for using pycurl - and doing so would exclude use of other utilities like wget. Paul.
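The "command line template with %s" idea can be sketched in a few lines. Everything here is hypothetical: neither setuptools nor EasyInstall is known from this thread to offer such a hook, the function names are invented, and the template is split with POSIX shell rules, which may need adjusting for Windows paths.

```python
import shlex
import subprocess


def build_argv(template, url):
    """Split a template such as 'curl %s' and substitute the URL for %s."""
    return [part.replace("%s", url) for part in shlex.split(template)]


def fetch_via_command(template, url, timeout=60):
    """Run the external downloader and return whatever it wrote to stdout.

    No shell is involved, so the URL is passed as a single argument and
    cannot be reinterpreted as shell syntax.
    """
    completed = subprocess.run(
        build_argv(template, url),
        stdout=subprocess.PIPE,
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout


# Example templates (any proxy options are up to the user):
#     fetch_via_command("curl %s", url)
#     fetch_via_command("wget -q -O - %s", url)
```

Keeping the downloader as a plain template means curl, wget or any similar tool can be swapped in without code changes.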
At 04:58 PM 7/11/2005 +0100, Paul Moore wrote:
I've just had a look at the new documentation for setuptools. I've not read it all in detail yet, but one thing struck me regarding the "automatically download dependencies" feature.
It isn't going to work for people (like me) stuck behind a firewall that Python doesn't support (Windows NTLM based firewall). Obviously, setuptools is never going to be able to resolve a situation like this, nor would I expect it to.
Have you tried APS? (i.e., http://ntlmaps.sf.net/ ) Its pages seem to suggest it can authenticate to NTLM proxy servers like the one you're dealing with, and it sounds like a general-purpose solution to the proxying problem. The only issue is that you'd need to configure your system such that urllib considers the APS address to be the proxy to use, but then *all* Python apps (or any app that reads the same proxy configuration) will be able to get out past the firewall.
But can I suggest two possible changes to make it easier for people with limited internet access?
1. A "manual download" mode, where setuptools lists the files which it wants you to obtain, and then leaves it to you how you get them. I'm not sure how plausible this would be, given the necessarily iterative process involved in resolving dependencies, but even a little help would be useful (a report of unresolved dependencies when run with a --no-download flag would be the most basic help).
EasyInstall can't *find* the files without HTTP, since that's what it uses to talk to PyPI. So "manual download" mode would mean you'd effectively need to find all the files yourself! I'm not sure how much help EasyInstall could actually be in such a process.
2. A way of specifying an external command to use to download files over HTTP. This would (for example) allow me to use curl, which does support NTLM proxies, rather than relying on Python's built-in HTTP support, which doesn't.
This is potentially possible, but EasyInstall's PyPI searches and SourceForge download support need to be able to detect the MIME type of an HTTP response in order to decide whether it has an HTML page or not. If APS doesn't work for you, this might be an option, it's just a tricky one to implement in ez_setup.py, which wants to be small and simple and do no command-line parsing of its own, because it's also a module that gets imported by setup scripts. OTOH, perhaps an environment variable would be the way to go there, if we have to.
On 7/12/05, Phillip J. Eby wrote:
At 04:58 PM 7/11/2005 +0100, Paul Moore wrote:
I've just had a look at the new documentation for setuptools. I've not read it all in detail yet, but one thing struck me regarding the "automatically download dependencies" feature.
It isn't going to work for people (like me) stuck behind a firewall that Python doesn't support (Windows NTLM based firewall). Obviously, setuptools is never going to be able to resolve a situation like this, nor would I expect it to.
Have you tried APS? (i.e., http://ntlmaps.sf.net/ ) Its pages seem to suggest it can authenticate to NTLM proxy servers like the one you're dealing with, and it sounds like a general-purpose solution to the proxying problem. The only issue is that you'd need to configure your system such that urllib considers the APS address to be the proxy to use, but then *all* Python apps (or any app that reads the same proxy configuration) will be able to get out past the firewall.
Yes, I have used APS and it is a reasonably good workaround. However, there is a definite disadvantage, in that it isn't set up to be launched as a service on Windows, which means that I can't have it "always running" (actually, having a permanently running proxy probably isn't that good an idea - I'm not enough of a security expert to be sure I haven't left a hole by doing so). So it tends to be left around, to run "when needed", which in practice means that when I do need it (pretty infrequently) I have to remember where it is, how to start it, etc etc. All of the above is fixable, but I don't have the time to do so, and the project seems pretty static, so I don't expect this sort of usability improvement to come from the project. But yes, I'll keep it in mind as an option.
This is potentially possible, but EasyInstall's PyPI searches and SourceForge download support need to be able to detect the MIME type of an HTTP response in order to decide whether it has an HTML page or not.
curl -i includes the HTTP headers. But I understand that it's additional work for a very limited requirement, and there's a more general workaround available, so that's OK. Thanks for the explanation. Paul.
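To illustrate the point about "curl -i": once the headers are included in the output, the MIME type can be recovered by splitting the header block from the body. This is a rough sketch that assumes a single response with no redirects and no proxy preamble in front of it; the function names are invented for the example.

```python
from email.parser import BytesParser


def split_response(raw):
    """Split 'curl -i' style output into (status line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, _, header_block = head.partition(b"\r\n")
    # HTTP headers share their syntax with email headers, so reuse that parser.
    headers = BytesParser().parsebytes(header_block, headersonly=True)
    return status_line.decode("iso-8859-1"), headers, body


def is_html(raw):
    """True if the response declares a text/html Content-Type."""
    _, headers, _ = split_response(raw)
    return headers.get_content_type() == "text/html"
```

A caller could use is_html() to decide whether the downloaded bytes are a page to scan for links or a file to save.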
participants (4)
- Ian Bicking
- Paul Moore
- Phillip J. Eby
- Ryan Tomayko