how to easily consume just the parts of eggs that are good for you
Folks:

Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path.

This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages, and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)

This also preserves most of the value of eggs for many use cases. This is backward-compatible with most current use cases that rely on eggs. This is very likely forward-compatible with new schemes that are currently being cooked up and will be deployed in the future.

Regards,
Zooko
Has somebody made a list of the problems with eggs? Because I use them all the time and haven't encountered any problems whatsoever, myself... :) So I am a bit surprised at the various discussions about them.
zooko wrote:
Folks:
Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path.
This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages, and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)
Sorry if I'm out of the loop and there's some subtlety here that I'm disregarding, but it doesn't appear that either of the issues you mention is actually a problem with eggs. These are instead problems with how eggs get installed by easy_install (which uses a .pth file to extend sys.path). It's reasonable to put eggs on the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg')) instead of using easy_install to install them.

I don't think there would be any benefit to changing Python's import machinery to deal with them; they are essentially just directories (or zipfiles) that contain packages.

- C
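To make the "just directories (or zipfiles) that contain packages" point concrete, here is a minimal sketch. It builds a throwaway zipped egg in a temporary directory, appends it to sys.path by hand, and imports from it. The package name demo_pkg and the egg file name are invented for the illustration; nothing here uses easy_install, setuptools, or a .pth file.

```python
import os
import sys
import tempfile
import zipfile


def build_demo_egg(directory):
    """Write a tiny zipped 'egg' holding one package and return its path.

    The file name and the package name 'demo_pkg' are made up for this sketch.
    """
    egg_path = os.path.join(directory, "demo_pkg-0.1-py3.egg")
    with zipfile.ZipFile(egg_path, "w") as egg:
        # The importable package sits at the top level of the archive.
        egg.writestr("demo_pkg/__init__.py", "GREETING = 'hello from the egg'\n")
        # Metadata lives alongside it and plays no part in a plain import.
        egg.writestr("EGG-INFO/PKG-INFO", "Name: demo_pkg\nVersion: 0.1\n")
    return egg_path


egg_path = build_demo_egg(tempfile.mkdtemp())
sys.path.append(egg_path)  # the manual step, no installer involved

import demo_pkg  # resolved by Python's ordinary zip import support

print(demo_pkg.GREETING)
```

The EGG-INFO entry is only there to show that the metadata directory can sit next to the package without affecting the import.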
On Mar 26, 2008, at 7:34 PM, Chris McDonough wrote:
zooko wrote:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path. This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages, and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)
Sorry if I'm out of the loop and there's some subtlety here that I'm disregarding, but it doesn't appear that either of the issues you mention is actually a problem with eggs. These are instead problems with how eggs get installed by easy_install (which uses a .pth file to extend sys.path). It's reasonable to put eggs on the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg')) instead of using easy_install to install them.
Yes, you are missing something. While many programmers, such as yourself and Lennart Regebro (who posted to this thread), find the current eggs system to be perfectly convenient and to Just Work, many others, such as Glyph Lefkowitz (who posted to a related thread), find them to be so annoying that they actively ensure that no eggs are ever allowed to touch their system.

The reasons for this latter problem are two:

1. You can't conveniently install eggs into a non-system directory, such as ~/my-python-stuff.

2. If you allow even a single egg to be installed into your PYTHONPATH, it will change the semantics of your PYTHONPATH.

Both of these problems are directly caused by the need for eggs to hack your site.py. If Python automatically added eggs found in the PYTHONPATH to the sys.path, both of these problems would go away.

I am skeptical that the current proposals to define a new database for installed packages will fare any better than the current eggs scheme does in this respect.

This issue is important to me, because the benefits of eggs grow superlinearly with the number of good programmers who use them. They are a tool for re-using source code -- a tool for cooperation between programmers. To gain the greatest benefits at this point we do not need to add new features to eggs, we need to make them more palatable to more good programmers.

I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the "easy-install.pth" database file, and I don't see why a new database file would be any different.

My proposal makes the current benefits of eggs -- clean, easy code re-use among programmers -- more compatible with their current tools -- mv, rm, and PYTHONPATH. It is also forward-compatible with more sophisticated proposals to add features like uninstall and operating system integration.

By the way, since I posted my proposal two weeks ago I have pointed a couple of Python hackers who currently refuse to use eggs at the URL:

http://mail.python.org/pipermail/python-dev/2008-March/078243.html

They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".

Regards,
Zooko
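The proposal does not spell out an algorithm, so the following is only a rough sketch of what "insert them into the same location" could mean. The helper name expand_eggs is hypothetical, and so is the ordering rule (each directory keeps its position and its eggs follow it immediately, sorted by name). The point is that eggs found in a later PYTHONPATH directory never jump ahead of an earlier one.

```python
import os
import tempfile


def expand_eggs(path_entries):
    """Return a new path list with each directory's eggs placed right after it.

    Hypothetical helper: the ordering rule (directory first, then its eggs
    sorted by name) is an assumption made for this sketch.
    """
    expanded = []
    for entry in path_entries:
        expanded.append(entry)
        # Only scan real directories; an entry that is itself an egg is left alone.
        if not os.path.isdir(entry) or entry.endswith(".egg"):
            continue
        eggs = sorted(
            os.path.join(entry, name)
            for name in os.listdir(entry)
            if name.endswith(".egg")
        )
        for egg in eggs:
            if egg not in expanded:
                expanded.append(egg)
    return expanded


# Two stand-in PYTHONPATH directories; only the second one holds an egg.
first = tempfile.mkdtemp()
second = tempfile.mkdtemp()
os.mkdir(os.path.join(second, "foo-1.3.egg"))

result = expand_eggs([first, second])
print(result)
```

In this run the egg lands after both directories that precede it, so anything importable from the first directory would still win.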
At 10:01 AM 4/8/2008 -0700, zooko wrote:
On Mar 26, 2008, at 7:34 PM, Chris McDonough wrote:
zooko wrote:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path. This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages, and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)
Sorry if I'm out of the loop and there's some subtlety here that I'm disregarding, but it doesn't appear that either of the issues you mention is actually a problem with eggs. These are instead problems with how eggs get installed by easy_install (which uses a .pth file to extend sys.path). It's reasonable to put eggs on the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg')) instead of using easy_install to install them.
Yes, you are missing something. While many programmers, such as yourself and Lennart Regebro (who posted to this thread) find the current eggs system to be perfectly convenient and to Just Work, many others, such as Glyph Lefkowitz (who posted to a related thread) find them to be so annoying that they actively ensure that no eggs are ever allowed to touch their system.
The reasons for this latter problem are two:
1. You can't conveniently install eggs into a non-system directory, such as ~/my-python-stuff.
Wha?
2. If you allow even a single egg to be installed into your PYTHONPATH, it will change the semantics of your PYTHONPATH.
Only in the same way that manually putting an egg on the front of PYTHONPATH can be considered to "change the semantics" of your PYTHONPATH.
Both of these problems are directly caused by the need for eggs to hack your site.py. If Python automatically added eggs found in the PYTHONPATH to the sys.path, both of these problems would go away.
And add new ones.
I am skeptical that the current proposals to define a new database for installed packages will fare any better than the current eggs scheme does in this respect.
The purpose for the installation database is to allow easy_install to eschew the use of .egg files or directories for anything other than multi-version installs. Thus, no need to add those .egg files or directories to the head of the PYTHONPATH. Conflicts would be handled at install time rather than runtime, in other words.
I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the "easy-install.pth" database file, and I don't see why a new database file would be any different.
PEP 262 does not propose a database file -- it proposes the inclusion of a metadata file for each installed distribution.
My proposal makes the current benefits of eggs -- clean, easy code re-use among programmers -- more compatible with their current tools -- mv, rm, and PYTHONPATH. It is also forward-compatible with more sophisticated proposals to add features like uninstall and operating system integration.
Actually, your current proposal doesn't work, unless you at least have some way to indicate which *version* of an egg should be automatically added to sys.path -- and some way to change that. Otherwise, you might as well use the -m option to easy_install, and require() the eggs at runtime. (Which needs neither .pth files nor site.py hacking.)

Meanwhile, my understanding is that the people who dislike eggs, dislike them because when they install a setuptools-based package, it's installed as an egg by default. The installation database proposal (and by the way, people really should read and understand PEP 262, including the open issues, before trying to compete with it), will allow setuptools-based packages to install the "old-fashioned" way by default. That is, not as eggs.

Similarly, easy_install would be able to skip installing .eggs unless you wanted multi-version support. So, people who don't like eggs would never see them, since the only way you'd ever get them would be via easy_install -m, and they would never use it.
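The version objection can be illustrated with a small sketch. If a directory holds several eggs of the same project, automatic discovery needs some rule for choosing one and some way to override it. Everything below is hypothetical: the function names, the "highest version wins unless pinned" policy, and the simplified version comparison (numeric dotted parts only) are assumptions for illustration, not how setuptools or easy_install resolve versions. The file name pattern assumes the usual name-version-pyX.Y.egg layout with no hyphen inside the project name.

```python
import re

# Assumed egg file name layout: <name>-<version>[-<anything>].egg
EGG_NAME = re.compile(r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-.*)?\.egg$")


def version_key(version):
    """Simplified ordering: compare only the numeric parts of the version."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def pick_default_eggs(egg_filenames, pinned=None):
    """Choose one egg per project: the pinned version if given, else the highest.

    Hypothetical policy for this sketch; 'pinned' maps project name -> version.
    """
    pinned = pinned or {}
    chosen = {}
    for filename in egg_filenames:
        match = EGG_NAME.match(filename)
        if match is None:
            continue  # not an egg-style file name
        name, version = match.group("name"), match.group("version")
        if name in pinned and pinned[name] != version:
            continue  # the user asked for a different version
        current = chosen.get(name)
        if current is None or version_key(version) > version_key(current[0]):
            chosen[name] = (version, filename)
    return {name: filename for name, (_, filename) in chosen.items()}


eggs = ["foo-1.3-py2.5.egg", "foo-1.10-py2.5.egg", "bar-0.2-py2.5.egg"]
defaults = pick_default_eggs(eggs)
pinned_defaults = pick_default_eggs(eggs, pinned={"foo": "1.3"})
print(defaults)
print(pinned_defaults)
```

The pinned mapping stands in for "some way to change that"; where such a setting would live is exactly the kind of question the proposal leaves open.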
By the way, since I posted my proposal two weeks ago I have pointed a couple of Python hackers who currently refuse to use eggs at the URL:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
No, we're planning to make it possible for easy_install not to overwrite things that would break your system, and allow distutils and setuptools to uninstall what they installed. That's a considerably less ambitious goal, by far. :)
On 08/04/2008, zooko <zooko@zooko.com> wrote:
By the way, since I posted my proposal two weeks ago I have pointed a couple of Python hackers who currently refuse to use eggs at the URL:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
I do think that a simple solution like that has some merit. It has two significant downsides, however:

1. It requires that core Python "bless" the egg format to some extent - something Guido has said he is unwilling to do.

2. It ignores the issue of package management completely. Personally, I avoid anything that doesn't integrate with a unified package manager (whether that be the Windows add/remove feature, or an as-yet-to-be-built custom Python package manager is not important). Filesystem commands do not a package manager make...

Paul.
zooko wrote:
On Mar 26, 2008, at 7:34 PM, Chris McDonough wrote:
zooko wrote:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
Here is a simple proposal: make the standard Python "import" mechanism notice eggs on the PYTHONPATH and insert them (into the *same* location) on the sys.path. This eliminates the #1 problem with eggs -- that they don't easily work when installing them into places other than your site-packages, and that if you allow any of them to be installed on your system then they take precedence over your non-egg packages even if you explicitly put those other packages earlier in your PYTHONPATH. (That latter behavior is very disagreeable to more than a few programmers.)
Sorry if I'm out of the loop and there's some subtlety here that I'm disregarding, but it doesn't appear that either of the issues you mention is actually a problem with eggs. These are instead problems with how eggs get installed by easy_install (which uses a .pth file to extend sys.path). It's reasonable to put eggs on the PYTHONPATH manually (e.g. sys.path.append('/path/to/some.egg')) instead of using easy_install to install them.
Yes, you are missing something. While many programmers, such as yourself and Lennart Regebro (who posted to this thread) find the current eggs system to be perfectly convenient and to Just Work, many others, such as Glyph Lefkowitz (who posted to a related thread) find them to be so annoying that they actively ensure that no eggs are ever allowed to touch their system.
The reasons for this latter problem are two:
1. You can't conveniently install eggs into a non-system directory, such as ~/my-python-stuff.
That's just not true.

    $ export PYTHONPATH=/home/you/my-python-stuff/foo-1.3.egg
    $ python
    >>> import foo
Eggs are directories (or zipfiles) that contain packages. They happen to contain other metadata directories too, but these can be ignored if you just want to *use* them (as opposed to wanting to introspect metadata about them).
2. If you allow even a single egg to be installed into your PYTHONPATH, it will change the semantics of your PYTHONPATH.
I think you are mistaken. The use of the .pth file that changes sys.path is a feature of easy_install, not of eggs. You don't need to use any .pth file to put eggs on your PYTHONPATH. Note that zc.buildout is a framework that installs eggs, and it doesn't rely at all on .pth files to automatically hack sys.path. Instead, it munges console scripts to add each egg dir to sys.path. This is pretty nasty too, but it does prove the point. It is however true that you need to change sys.path to use an egg. Is that what you're objecting to fundamentally? You just don't want to have to change sys.path at all to use an egg? Maybe you're objecting to the notion that an egg can contain more than one package. There is functionally no difference between an egg and a directory full of packages.
Both of these problems are directly caused by the need for eggs to hack your site.py. If Python automatically added eggs found in the PYTHONPATH to the sys.path, both of these problems would go away.
I'm not sure what you mean. Eggs don't hack site.py. Eggs are just a packaging format. easy_install doesn't hack site.py either. It just puts a .pth file (the parsing of which is a feature of "core" Python itself, not any setuptools magic) in site packages and manages it. It seems like you're advocating adding magic that you can't turn off (magical detection of eggs in an already site.py-approved packages directory) to defeat magic that you can turn off (by not using easy_install or .pth files). At some level the magic you want to see built in to Python would almost certainly wind up doing what you hate by hacking sys.path unless you wanted to restrict eggs to containing a single package only. And you wouldn't be able to turn it off except through some obscure environment variable setting.
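The claim that .pth parsing belongs to core Python can be checked with the standard site module alone. The sketch below creates a scratch directory containing an empty directory-style egg and a one-line .pth file, then asks site.addsitedir() to process it. The directory, the egg name foo-1.3.egg, and the file name demo.pth are placeholders invented for the example; setuptools is not involved.

```python
import os
import site
import sys
import tempfile

# A scratch directory standing in for a site-packages directory.
site_dir = tempfile.mkdtemp()

# An (empty) directory-style egg inside it.
egg_dir = os.path.join(site_dir, "foo-1.3.egg")
os.mkdir(egg_dir)

# A .pth file lists paths, one per line, relative to the directory holding it.
with open(os.path.join(site_dir, "demo.pth"), "w") as pth:
    pth.write("foo-1.3.egg\n")

# The standard library reads the .pth file and appends existing paths to sys.path.
site.addsitedir(site_dir)

added = [p for p in sys.path if os.path.basename(p) == "foo-1.3.egg"]
print(added)
```

Note that this plain mechanism appends the listed path to the end of sys.path; deleting the .pth file is enough to turn it off for future interpreter runs.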
By the way, since I posted my proposal two weeks ago I have pointed a couple of Python hackers who currently refuse to use eggs at the URL:
http://mail.python.org/pipermail/python-dev/2008-March/078243.html
They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
I think changing the Python core is the worst possible answer to this problem. "Don't use easy_install" is currently the best, AFAICT. - C
On Tue, 2008-04-08 at 10:01 -0700, zooko wrote:
They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
When I wear my sysadmin hat, eggs become a nuisance. They are not listed in the system packages; if zipped they won't work when the apache user tries to import them; easy_install can produce unexpected upgrades. The system package manager (apt or yum) is much preferred. As a developer, eggs are great. If a python module is not already available from my system packagers, easy_install will find it, get it, and install it. I waste almost no time with system administration issues while developing. Fortunately, distutils includes tools like bdist_rpm so that python modules can be packaged for easy processing by the system package manager. So once I need to switch back to a sysadmin role, I can use the system tools to install and track packages. -- Lloyd Kvam Venix Corp DLSLUG/GNHLUG library http://www.librarything.com/catalog/dlslug http://www.librarything.com/profile/dlslug http://www.librarything.com/rsshtml/recent/dlslug
On Apr 8, 2008, at 11:27 AM, Lloyd Kvam wrote:
When I wear my sysadmin hat, eggs become a nuisance.
...
As a developer, eggs are great. ... Fortunately, distutils includes tools like bdist_rpm so that python modules can be packaged for easy processing by the system package manager. So once I need to switch back to a sysadmin role, I can use the system tools to install and track packages.
This is the same experience I have. I rely on setuptools and eggs extensively in developing our software, and I use setuptools and eggs as the primary method of giving our source code to other programmers. But no software is ever installed on our production servers unless that software is in .deb form in an apt-gettable repository, and this requirement is unlikely to change for the foreseeable future. Regards, Zooko
zooko <zooko@zooko.com> writes:
I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the "easy-install.pth" database file, and I don't see why a new database file would be any different.
Moreover, many of us already have a database of *all* packages on the system, not just Python-language ones: the package database of our operating system. Adding another, parallel, database which needs separate maintenance, and only applies to Python packages, is not a step forward in such a situation.
They both agreed that it made perfect sense. I told one of them about the alternate proposal to define a new database file to contain a list of installed packages, and he sighed and rolled his eyes and said "So they are planning to reinvent apt!".
That's pretty much my reaction, too.

--
"Contentment is a pearl of great price, and whosoever procures it at the expense of ten thousand desires makes a wise and happy purchase." -- J. Balguy
Ben Finney
On Wed, Apr 09, 2008 at 11:37:07AM +1000, Ben Finney wrote:
zooko <zooko@zooko.com> writes:
I am skeptical that programmers are going to be willing to use a new database format. They already have a database -- their filesystem -- and they already have the tools to control it -- mv, rm, and PYTHONPATH. Many of them already hate the existence of the "easy-install.pth" database file, and I don't see why a new database file would be any different.
Moreover, many of us already have a database of *all* packages on the system, not just Python-language ones: the package database of our operating system. Adding another, parallel, database which needs separate maintenance, and only applies to Python packages, is not a step forward in such a situation.
90% (at least) of the world does not have such a database. I, and probably you, have such a very nice database. It works well, and we can choose to forget the problems our users are facing. It does not solve them though.

In addition, packaging is system-specific. I recently had to learn some Debian packaging, because I wanted my Ubuntu and Debian users to be able to use my projects seamlessly. What about RPMs for RHEL, Fedora, Mandriva? ... and coronary packages? and MSIs? ... When do I find time to do development if I have to learn all this packaging?

It would be fantastic to have an abstraction on all these packaging systems, including, as you point out, their database.

I do agree that reusing the system packaging's database is great, and would be the best option for system-wide install. However one of the very neat features of setuptools and eggs is that you don't need administrator access to install the packages, and that is great in a shared environment, like a computation cluster. The system's database is thus unfortunately not a complete solution to the problem.

My 2 cents,

Gaël
Gael Varoquaux wrote:
90% (at least) of the world does not have such a database. I, and probably you, have such a very nice database. It works well, and we can choose to forget the problems our users are facing. It does not solve them though.
In addition, packaging is system-specific. I recently had to learn some Debian packaging, because I wanted my Ubuntu and Debian users to be able to use my projects seamlessly. What about RPMs for RHEL, Fedora, Mandriva? ... and coronary packages? and MSIs? ... When do I find time to do development if I have to learn all this packaging?
There is no way around it: you have to learn about them. It is a PITA, but there is a limit to what a tool can do, especially for things like installers. I agree it would be fantastic to have an abstraction on all those packaging systems, but I don't think it is possible without a huge amount of work, if at all. Deploying software is simply a very big problem that won't go away magically, even if eggs were perfect and nobody complained about them. I strongly believe it is one of the reasons why Windows has been so popular: instead of targeting many combinations, you target Windows, which does everything you ever need to do, and MS more or less guarantees that something which works today will still work in ten years.

The only reliable way to handle dependencies if you don't have huge resources is to bundle everything; IOW, not handling them. That's how most software works anyway on Mac OS X and Windows: MATLAB for example is a huge thing with hundreds of MB; most software does not depend on anything except the OS, which is a much better known quantity on Mac OS X and Windows than on Linux. If you want to update a small part of it, life is tough, you upgrade everything. Of course, part of the thing is that it brings more revenue to MathWorks, but I don't think it is the only reason. I also know some open source projects which do the same because they simply cannot track API changes (Ardour, for example: if you build Ardour from sources, you will get a private copy of the whole GTK stack).

cheers,

David
participants (9)
- Ben Finney
- Chris McDonough
- David Cournapeau
- Gael Varoquaux
- Lennart Regebro
- Lloyd Kvam
- Paul Moore
- Phillip J. Eby
- zooko