setuptools-0.4a2: Eggs, scripts, and __file__
From "[Distutils] A plan for scripts (in EasyInstall)" http://mail.python.org/pipermail/distutils-sig/2005-June/004594.html On Mon Jun 6 16:44:19 CEST 2005, Phillip J. Eby wrote:
The 'run_main' function would do several things: <snip>
* Clear everything but __name__ from the __main__ namespace <snip>
* exec the script in __main__, using something like::

    maindict['__file__'] = pseudo_filename
    code = compile(script_source, pseudo_filename, "exec")
    exec code in maindict, maindict
It seems that bullet one is happening but not bullet two? I have scripts that are attempting to use __file__ but failing with::

    NameError: name '__file__' is not defined

The script looks like it would work properly if it was given a pseudo filename, but this has me thinking about what the best way to detect development environments in scripts will look like in an eggified environment.

My wrapper scripts generally look something like the following to detect whether the script is being run from a development location or a deployed location::

    import sys
    from os.path import dirname, abspath, join, exists
    devel_dir = dirname(dirname(abspath(__file__)))
    if exists(join(devel_dir, 'support')):
        sys.path.insert(1, join(devel_dir, 'support'))
        sys.path.insert(1, devel_dir)
    from package.module import main
    main(sys.argv)

Assuming a directory layout of::

    [devel-dir]/bin            (scripts in here)
    [devel-dir]/support        (packages/modules)
    [devel-dir]/[package-name]

At first I thought I should switch from using path operations on __file__ to using `pkg_resources.resource_isdir` and `resource_filename`, but that doesn't make any sense - if the script is running from within a deployed egg, I'm not using it from a development environment, and the resource_* functions don't make sense in __main__ context anyway.

So my current thinking is that the existing idiom should remain and that a pseudo filename shouldn't pose any problems in the scenarios I'm dealing with:

* Deployed egg: don't tamper with sys.path
* Deployed site-packages: don't tamper with sys.path
* Development environment: insert development paths

At any rate, I'm guessing that __file__ appears in a good percentage of scripts in the wild.
I've switched mine to use sys.argv[0] in place of __file__ but I'll see if I can get a patch together for getting a pseudo __file__ into the exec dict (but don't wait up - I'm not familiar with the code and figuring out how to get the pseudo filename will take me infinite orders of magnitude longer than you ;)

Ryan Tomayko
rtomayko@gmail.com
http://naeblis.cx/rtomayko/
At 12:16 AM 6/13/2005 -0400, Ryan Tomayko wrote:
From "[Distutils] A plan for scripts (in EasyInstall)" http://mail.python.org/pipermail/distutils-sig/2005-June/004594.html
On Mon Jun 6 16:44:19 CEST 2005, Phillip J. Eby wrote:
The 'run_main' function would do several things: <snip>
* Clear everything but __name__ from the __main__ namespace <snip>
* exec the script in __main__, using something like::

    maindict['__file__'] = pseudo_filename
    code = compile(script_source, pseudo_filename, "exec")
    exec code in maindict, maindict
It seems that bullet one is happening but not bullet two? I have scripts that are attempting to use __file__ but failing with::

    NameError: name '__file__' is not defined
I'll fix this in the next release.
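The fix under discussion amounts to seeding the script's namespace with a pseudo __file__ before compiling, as in the quoted plan. Here is a minimal sketch of that behavior in modern Python 3 `exec()` syntax - an illustration only, not setuptools' actual run_main() code:

```python
# Sketch of the behavior discussed: give an exec'd script a __file__
# so idioms like dirname(abspath(__file__)) keep working.
# Illustration only; not setuptools' actual run_main() implementation.
def run_script(script_source, pseudo_filename):
    maindict = {"__name__": "__main__"}        # bullet one: cleared namespace
    maindict["__file__"] = pseudo_filename     # bullet two: pseudo filename
    code = compile(script_source, pseudo_filename, "exec")
    exec(code, maindict)
    return maindict
```

A script exec'd this way sees __file__ as the pseudo filename, so path-based idioms keep working even when there is no real file on disk.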
The script looks like it would work properly if it was given a pseudo filename but this has me thinking about what the best way to detect development environments in scripts will look like in an eggified environment.
That's the wrong question to ask, IMO. Think about how to make the script work exactly the same in all environments, instead. :)
My wrapper scripts generally look something like the following to detect whether the script is being run from a development location or a deployed location::

    import sys
    from os.path import dirname, abspath, join, exists
    devel_dir = dirname(dirname(abspath(__file__)))
    if exists(join(devel_dir, 'support')):
        sys.path.insert(1, join(devel_dir, 'support'))
        sys.path.insert(1, devel_dir)
    from package.module import main
    main(sys.argv)

Assuming a directory layout of::

    [devel-dir]/bin            (scripts in here)
    [devel-dir]/support        (packages/modules)
    [devel-dir]/[package-name]
Have you read this?

http://peak.telecommunity.com/DevCenter/PythonEggs#developing-with-eggs

The complexity you're incurring here is unnecessary; between require() and .pth files you should never need to mess with sys.path manually.
At first I thought I should switch from using path operations on __file__ to using `pkg_resources.resource_isdir` and `resource_filename` but that doesn't make any sense - if the script is running from within a deployed egg, I'm not using it from a development environment
Not true; see the link above. When you do development from your distutils package root, your development code *is* in an egg. However, you still shouldn't be checking __file__ or fiddling with sys.path, and there is no need anyway. Here's one idea of what your tree could look like::

    devel_dir/
        some_dependency-1.2.egg
        otherpackage-3.9-whatever.egg
        package_you_are_working_on/
            __init__.py
            some_module.py
        somescript.py
        MyPackage.egg-info/   <- this makes devel_dir a "source egg" for MyPackage
        setup.py
        build/
        dist/
            MyPackage-1.1.egg <- built egg; not actually used for development

You'll notice that everything is just thrown into the same directory; if you use this setup for development, everything will "just work".

So, you create a setup.py for your package, and you run setup.py bdist_egg; this will dump an egg in dist/, and create MyPackage.egg-info, marking devel_dir as a "development egg".

Install any other packages you need to the current directory using 'easy_install -xd. package_you_need' (the -x excludes their scripts). Now you're ready to party. Make your scripts use 'require()' to ask for 'MyPackage'; when you run them (whether you are in the devel_dir or not), they will find MyPackage.egg-info, find the dependencies declared, and add all the needed .eggs to sys.path automatically.

This is just *one* layout that works. The dependency eggs don't have to be in devel_dir; they can be anywhere that will already be on sys.path, like site-packages or a directory named in a .pth file in site-packages.

Does this explain it better? One side benefit of egg-based installation is that you can dump as many libraries in site-packages as you want and not worry about version conflicts, so it's definitely how I plan to do most development.
The directory where a script is located, however, takes precedence over site-packages, which means that even if you have the package you're developing already installed in site-packages, your development egg will take precedence if the script you're running is in that directory.
and the resource_* functions don't make sense in __main__ context anyway. So my current thinking is that the existing idiom should remain and that a pseudo filename shouldn't pose any problems in the scenarios I'm dealing with:

* Deployed egg: don't tamper with sys.path
* Deployed site-packages: don't tamper with sys.path
* Development environment: insert development paths
You should be able to make that last one work the same way; i.e., without tampering with sys.path. If you can't, please explain your situation further, because I want pkg_resources to be able to prevent all future sys.path munging by anything but EasyInstall itself, and by extensible applications that have to manage plugin directories.
On Jun 13, 2005, at 1:15 AM, Phillip J. Eby wrote:
The script looks like it would work properly if it was given a pseudo filename but this has me thinking about what the best way to detect development environments in scripts will look like in an eggified environment.
That's the wrong question to ask, IMO. Think about how to make the script work exactly the same in all environments, instead. :)
I'd love to, except I can't assume setuptools and eggs-based dependencies in all environments at the moment. In particular, Linux distributions like Fedora probably won't be moving to egg-based packaging for some time. If I'm lucky I might see Python RPM maintainers phase in package.egg-info directories on top of the normal site-packages layout over the next few months.

What this adds up to--if I'm not missing something--is that I can't assume require() is going to work. I need to be able to fall back to assuming that all dependencies will be laid out for me by some other package management system (in this case RPM). I don't think the setuptools dependency will be hard to deal with, but egg versions of other dependencies are probably going to be a problem for a little while.

I can assume that require() will be there, but I'd have to try/except/pass on DependencyNotFound exceptions or something. What I'd prefer is to keep require() out of the code completely and use .egg-info/depends.txt instead. If I'm running out of an egg, I want setuptools to manage requiring everything before my script is even called. This should give me all of the benefits of eggs when I'm using them and fall back to the old-style manual dependency management otherwise. Does that make sense?
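The try/except fallback being described could be sketched like this. Note this is an illustration: the actual pkg_resources exception is spelled DistributionNotFound, and 'MyPackage' is a stand-in project name:

```python
# Sketch of the fallback pattern: use require() when egg metadata is
# available, otherwise trust the system package manager's layout.
# 'MyPackage' is a placeholder project name.
def setup_dependencies(project="MyPackage"):
    try:
        from pkg_resources import require, DistributionNotFound
    except ImportError:
        return "no-setuptools"   # setuptools itself is not installed
    try:
        require(project)
        return "egg"             # egg metadata found; deps activated
    except DistributionNotFound:
        return "legacy"          # fall back to RPM/apt-managed layout
```

The return value is just for illustration; a real script would go on to import its main package either way.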
Have you read this:
http://peak.telecommunity.com/DevCenter/PythonEggs#developing-with-eggs
The complexity you're incurring here is unnecessary; between require() and .pth files you should never need to mess with sys.path manually.
I've absolutely read it and agree completely with the concept. I can't assume that require() will work in all scenarios, however. But your conclusion is still valid I think. If I move to egg dependencies in development and assume that either setuptools or some other package management utility will setup sys.path correctly, I should be able to get rid of manual sys.path hackery.
At first I thought I should switch from using path operations on __file__ to using `pkg_resources.resource_isdir` and `resource_filename` but that doesn't make any sense - if the script is running from within a deployed egg, I'm not using it from a development environment
Not true; see the link above.
I should have been more clear. I was speaking to when my script is being run from EGG-INFO/scripts/somescript as opposed to [devel-package]/scripts/somescript or /usr/bin/somescript (deployed via RPM). Where the script file *is* provides the information needed to determine whether/how to set up sys.path. The resource_* functions provide no information about where the script actually lives, so the entire exercise of moving that code to use those functions was in vain. The point is moot at any rate, as I don't think I'll be needing sys.path munging anymore.
When you do development from your distutils package root, your development code *is* in an egg. However, you still shouldn't be checking __file__ or fiddling with sys.path, and there is no need anyway. Here's one idea of what your tree could look like:
<snip file layout>
So, you create a setup.py for your package, and you run setup.py bdist_egg; this will dump an egg in dist/, and create MyPackage.egg-info, marking devel_dir as a "development egg".
Install any other packages you need to the current directory using 'easy_install -xd. package_you_need' (the -x excludes their scripts). Now you're ready to party. Make your scripts use 'require()' to ask for 'MyPackage'; when you run them (whether you are in the devel_dir or not), they will find MyPackage.egg-info, find the dependencies declared, and add all the needed .eggs to sys.path automatically.
So this is where I need to figure something out, because I'd like to either not use require() in those scripts, or I will need to try/except/pass around DependencyNotFound exceptions in cases where eggs won't be available for dependencies. Or maybe...

When I require('MyPackage'), does setuptools look at MyPackage.egg-info/depends.txt and require everything else for me? I'm assuming it does and don't see why it wouldn't. If that's the case, I might be able to make my scripts as simple as::

    from pkg_resources import require, find_distributions
    if list(find_distributions('MyPackage')):
        require('MyPackage')
    import MyPackage
    MyPackage.main()

If find_distributions() yields any results then we can assume that we're running as an egg; if not, we assume that we're running old-school and that some other package manager has laid everything out nicely already.

The downside to this approach is that I would have to be sure NOT to distribute MyPackage.egg-info with RPMs and other packages, which kind of rules out any phased approach to bringing egg-based packaging to Fedora's stock RPMs. It might be better to just patch some flag into my script during the RPM build that would tell it whether to use require or not::

    use_require = 1
    if use_require:
        require('MyPackage')
    import MyPackage
    MyPackage.main()

The RPM spec would have to patch that use_require line to be zero, but that's a single call to sed. If that's all the finagling I have to do in the spec file, it would be a good day.

I don't know - none of these seem to be perfect solutions, but none of them would have taken me as much time to implement as writing this email either. Still, it seems worth pointing out that keeping the number of code-level require() calls to a minimum, and having some way of switching those few calls off and on based on environment, is something packages that need to be included in a non-egg-based distribution will need to think about.
Does this explain it better? One side benefit of egg-based installation is that you can dump as many libraries in site-packages as you want and not worry about version conflicts, so it's definitely how I plan to do most development. The directory where a script is located, however, takes precedence over site-packages, which means that even if you have the package you're developing already installed in site-packages, your development egg will take precedence if the script you're running is in that directory.
Right. I think that makes a lot of sense and will definitely be moving to eggs in development as you've described.
and the resource_* functions don't make sense in __main__ context anyway. So my current thinking is that the existing idiom should remain and that a pseudo filename shouldn't pose any problems in the scenarios I'm dealing with:

* Deployed egg: don't tamper with sys.path
* Deployed site-packages: don't tamper with sys.path
* Development environment: insert development paths
You should be able to make that last one work the same way; i.e., without tampering with sys.path. If you can't, please explain your situation further, because I want pkg_resources to be able to prevent all future sys.path munging by anything but EasyInstall itself, and by extensible applications that have to manage plugin directories.
No, I think that covers sys.path munging. I'm still a little shaky on how I should know whether to rely on require() or not, but we'll see what happens.

Ryan Tomayko
rtomayko@gmail.com
http://naeblis.cx/rtomayko/
At 03:12 AM 6/13/2005 -0400, Ryan Tomayko wrote:
On Jun 13, 2005, at 1:15 AM, Phillip J. Eby wrote:
The script looks like it would work properly if it was given a pseudo filename but this has me thinking about what the best way to detect development environments in scripts will look like in an eggified environment.
That's the wrong question to ask, IMO. Think about how to make the script work exactly the same in all environments, instead. :)
I'd love to, except I can't assume setuptools and eggs-based dependencies in all environments at the moment. In particular, Linux distributions like Fedora probably won't be moving to egg-based packaging for some time. If I'm lucky I might see Python RPM maintainers phase in package.egg-info directories on top of the normal site-packages layout over the next few months.

What this adds up to--if I'm not missing something--is that I can't assume require() is going to work. I need to be able to fall back to assuming that all dependencies will be laid out for me by some other package management system (in this case RPM).
I don't think the setuptools dependency will be hard to deal with but egg versions of other dependencies is probably going to be a problem for a little while.
Are you distributing an application, or a library? If you're distributing a library, you don't need require() in library code. If it's an application, you can handle your own dependencies by force-installing the eggs in the application script directory using EasyInstall, and then just require() your main package.
I can assume that require() will be there, but I'd have to try/except/pass on DependencyNotFound exceptions or something. What I'd prefer is to keep require() out of the code completely and use .egg-info/depends.txt instead.
That's ideal for library code; startup scripts should just require() their target package, and that's almost more for development than anything else, since scripts installed by EasyInstall do the necessary require() work in pkg_resources.run_main().
If I'm running out of an egg, I want setuptools to manage requiring everything before my script is even called.
Yep, it'll do that.
This should give me all of the benefits of eggs when I'm using them and fallback to the old-style manual dependency management otherwise. Does that make sense?
Um, yeah, except I don't think you really need to fall back, just because people have other stuff installed. The worst that's going to happen is that you're going to force reinstallation of dependencies they already have, just to get them into eggs. (Or make them create .egg-info directories to tell the system the stuff is already installed.)

Hm. What if you created .egg directories and symlinked the dependencies into them during your installation process? Or if there were some way to create the .egg-info directories automatically from packaging system databases, or from inspecting module contents? Setuptools has some code that can look for the setting of constants or presence of symbols in modules, without importing them. Perhaps I could extend this somehow so that a transitional package like yours could include additional info in the setup script, that checks for these dependencies and tags them somehow?

Or maybe this could be done by metadata -- you put a legacy.py file in your egg-info, and when processing your egg's dependencies, if pkg_resources can't find a package you need, it would call a function in legacy.py that would check for the dependency using setuptools' inspection facilities, and return a path and a guess at a version number.

How does that sound?
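The legacy.py hook being proposed might look roughly like this. Everything here is hypothetical - the hook name and the (path, version) return contract are invented for illustration, and unlike setuptools' static inspection this sketch actually imports the module to guess a version:

```python
# Hypothetical legacy.py hook: called (in this proposal) when
# pkg_resources cannot find an egg for a dependency.  It locates a
# plain non-egg install and guesses a version.  The function name and
# the (path, version) contract are invented for illustration.
import importlib
import importlib.util
import os

def find_legacy(module_name):
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.origin:
        return None                      # not installed at all
    path = os.path.dirname(spec.origin)
    mod = importlib.import_module(module_name)
    version = getattr(mod, "__version__", "0.0")  # best-effort guess
    return path, version
```

A real hook would want setuptools' import-free inspection instead of importlib.import_module, since importing arbitrary legacy packages at dependency-resolution time can have side effects.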
If I move to egg dependencies in development and assume that either setuptools or some other package management utility will setup sys.path correctly, I should be able to get rid of manual sys.path hackery.
I assume most packaging systems install to site-packages, so if you're doing applications, it's basically going to boil down to eggs in the script directory plus whatever's in site-packages.
When I require('MyPackage'), does setuptools look at MyPackage.egg- info/depends.txt and require everything else for me? I'm assuming it does and don't see why it wouldn't.
Yes, it does.
If that's the case, I might be able to make my scripts as simple as::
    from pkg_resources import require, find_distributions
    if list(find_distributions('MyPackage')):
Don't do this. find_distributions() yields distributions found in a directory or zipfile; it doesn't take a package name, it takes a sys.path entry.

I'm not sure exactly what you're trying to do here. If you just want to know if your script is running from a development location (and therefore needs to call require() to set up dependencies), couldn't you just check for 'MyPackage.egg-info' in the sys.path[0] (script) directory? e.g.::

    import sys, os
    if os.path.isdir(os.path.join(sys.path[0],"MyPackage.egg-info")):
        from pkg_resources import require
        require("MyPackage")  # ensures dependencies get processed

If this is what you want, perhaps we can create a standard recipe in pkg_resources, like maybe 'script_package("MyPackage")', that only does the require if you're a development egg and not being run from run_main().
The downside to this approach is that I would have to be sure to NOT distribute MyPackage.egg-info with RPMs and other packages, which kind of rules out any phased approach to bringing egg based packaging to Fedora's stock RPMs.
I don't know if .egg-info is a good idea for RPMs. .egg-info is primarily intended for *development*, not deployment, because you can't easily override a package installed with .egg-info in site-packages. In fact, the only way you can normally override it is to install an egg alongside the script.

My current idea for how RPMs and other packagers should install eggs is just to dump them in site-packages as egg files or directories, and let people use require() or else use EasyInstall to set the active package.

Hey, wait a second... if you can put install/uninstall scripts in packages, couldn't installing or uninstalling an RPM ask EasyInstall to fix up the easyinstall.pth file? This would let packagers distribute as eggs, but without breaking users' expectations that the package would be available to "just import". If somebody explicitly wants to support multiversion for a package, they can run 'EasyInstall -m PackageName' to reset it to multi-version after installing a new version.

EasyInstall doesn't have everything that's needed to do this yet (no "uninstall" mode), but perhaps it's a good option to add, and then packagers could standardize on this approach.
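The install/uninstall hook imagined here would essentially add or drop one line per project in the .pth file. A rough sketch of that bookkeeping, assuming a simple one-active-egg-per-project layout (EasyInstall's real .pth handling is more involved than this):

```python
# Rough sketch of what an RPM post-install/pre-uninstall hook could
# ask EasyInstall to do: keep exactly one "active" egg per project
# listed in the .pth file.  File layout is assumed for illustration.
import os

def set_active_egg(pth_file, project, egg_dirname):
    lines = []
    if os.path.exists(pth_file):
        with open(pth_file) as f:
            lines = [line.rstrip("\n") for line in f]
    # drop any previously active egg for this project
    lines = [line for line in lines
             if not line.startswith("./%s-" % project)]
    if egg_dirname:                     # None means "uninstall"
        lines.append("./" + egg_dirname)
    with open(pth_file, "w") as f:
        f.write("\n".join(lines) + "\n")
```

Called from a package manager's install script with the new egg directory name, or with None on uninstall, this would keep the "just import" behavior working as the eggs come and go.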
I don't know - none of these seem to be perfect solutions, but none of them would have taken me as much time to implement as writing this email either. Still, it seems worth pointing out that keeping the number of code level require() calls to a minimum and having some way of switching those few calls off and on based on environment is something packages that need to be included in a non-egg-based distribution will need to think about.
Yep.
On Jun 13, 2005, at 12:21 PM, Phillip J. Eby wrote:
Are you distributing an application, or a library? If you're distributing a library, you don't need require() in library code. If it's an application, you can handle your own dependencies by force-installing the eggs in the application script directory using EasyInstall, and then just require() your main package.
Libraries, applications, and libraries with helper scripts.
This should give me all of the benefits of eggs when I'm using them and fallback to the old-style manual dependency management otherwise. Does that make sense?
Um, yeah, except I don't think you really need to fall back, just because people have other stuff installed. The worst that's going to happen is that you're going to force reinstallation of dependencies they already have, just to get them into eggs. (Or make them create .egg-info directories to tell the system the stuff is already installed.)
That's the problem. I'm trying to figure out a general plan of attack that Linux/BSD package maintainers can adopt for Python packages that want to use eggs / setuptools. Here are some numbers on how many Python packages are included in a couple of different distributions.

On an OS X darwinports box::

    $ port list | grep -e '^py-' | wc -l
    237

On a Fedora 3 box (Core + Extras)::

    $ yum list all | grep -e '^py' -e 'python' | wc -l
    78

I don't have any Debian or Gentoo boxes handy, but I imagine they'd weigh in somewhere around the darwinports number.

None of these packages are currently provided as eggs or with .egg-info directories when they are installed to site-packages, and they have complex dependency relationships that are managed by the distribution's utility (port, yum, apt-get, emerge, etc.). This creates a problem for these packages because it means that they cannot assume dependencies will always be egg-managed. If they start adding require() calls to their scripts, they will break under these environments. require() is an all-or-nothing proposition for distributions, and that means there will need to be a planned "upgrade" period or something for all packages.

As a more specific example, I contribute to two packages that are distributed with Fedora Core: python-urlgrabber and yum. yum depends on python-urlgrabber and python-elementtree. Now, if I wanted to move yum to be egg based and use require(), I would also need to ensure that all of yum's dependencies are egg based. When yum (and its dependencies) are installed from RPM, they must all be in egg format (or at least provide .egg-info dirs). If not, the yum script will fail.

So a single package using require() can cause a snowball effect where many other packages would need to be upgraded to egg format as well. In time, this may be a good thing because it could accelerate adoption of eggs, but for the time being it makes it really hard to use require().
Hm. What if you created .egg directories and symlinked the dependencies into them during your installation process? Or if there were some way to create the .egg-info directories automatically from packaging system databases, or from inspecting module contents? Setuptools has some code that can look for the setting of constants or presence of symbols in modules, without importing them. Perhaps I could extend this somehow so that a transitional package like yours could include additional info in the setup script, that checks for these dependencies and tags them somehow?
Yes, yes. Along those lines.
Or maybe this could be done by metadata -- you put a legacy.py file in your egg-info, and when processing your egg's dependencies, if pkg_resources can't find a package you need, it would call a function in legacy.py that would check for the dependency using setuptools' inspection facilities, and return a path and a guess at a version number.
How does that sound?
That would solve my problem perfectly.
If that's the case, I might be able to make my scripts as simple as::
from pkg_resources import require, find_distributions if list(find_distributions('MyPackage')):
Don't do this. find_distributions() yields distributions found in a directory or zipfile; it doesn't take a package name, it takes a sys.path entry.
Ahhh..
I'm not sure exactly what you're trying to do here. If you just want to know if your script is running from a development location (and therefore needs to call require() to set up dependencies), couldn't you just check for 'MyPackage.egg-info' in the sys.path[0] (script) directory?
e.g.:
    import sys, os
    if os.path.isdir(os.path.join(sys.path[0],"MyPackage.egg-info")):
        from pkg_resources import require
        require("MyPackage")  # ensures dependencies get processed
If this is what you want, perhaps we can create a standard recipe in pkg_resources, like maybe 'script_package("MyPackage")', that only does the require if you're a development egg and not being run from run_main().
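The proposed recipe could be as small as the following sketch. To be clear, 'script_package' is only an idea floated in this thread, not a shipped pkg_resources API:

```python
# Sketch of the proposed 'script_package()' recipe: only call
# require() when the script sits next to a development egg-info
# directory.  This is the idea from the email, not a shipped API.
import os
import sys

def script_package(project):
    egg_info = os.path.join(sys.path[0], project + ".egg-info")
    if os.path.isdir(egg_info):
        from pkg_resources import require
        require(project)      # activates declared dependencies too
        return True           # development egg: require() was run
    return False              # installed normally; run_main() handled it
```

Scripts installed by EasyInstall would hit the False branch, since run_main() already did the require() work before the script ran.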
The downside to this approach is that I would have to be sure to NOT distribute MyPackage.egg-info with RPMs and other packages, which kind of rules out any phased approach to bringing egg based packaging to Fedora's stock RPMs.
I don't know if .egg-info is a good idea for RPMs. .egg-info is primarily intended for *development*, not deployment, because you can't easily override a package installed with .egg-info in site-packages. In fact, the only way you can normally override it is to install an egg alongside the script.
My current idea for how RPMs and other packagers should install eggs is just to dump them in site-packages as egg files or directories, and let people use require() or else use EasyInstall to set the active package.
Right. But that's going to require significant lobbying and effort to get all core python packages included in a distribution migrated over to eggs. This is the root of my dilemma. The only realistic approach seems to be supporting dual egg and non-egg deployment for a little while. The legacy.py proposal is one method and I threw out a couple earlier. But yea.. I think this is the root of the problem.
Hey, wait a second... if you can put install/uninstall scripts in packages, couldn't installing or uninstalling an RPM ask EasyInstall to fix up the easyinstall.pth file? This would let packagers distribute as eggs, but without breaking users' expectations that the package would be available to "just import". If somebody explicitly wants to support multiversion for a package, they can run 'EasyInstall -m PackageName' to reset it to multi-version after installing a new version.
I think that would be great but it assumes modification to a whole lot of existing packages. Ideally, I'd like to be able to start using eggs and require() in my packages without having to convince everyone else to do so just yet. Not that I won't be trying to convince people, I just don't want to rely on other packages being eggified in Fedora Core, darwinports, etc for upcoming releases.
EasyInstall doesn't have everything that's needed to do this yet (no "uninstall" mode), but perhaps it's a good option to add, and then packagers could standardize on this approach.
I'd be happy to advocate to / work with packagers once we get a basic set of best practices together. It seems like there are a lot of options here - we just need to iron out the details.

Thanks,

Ryan Tomayko
rtomayko@gmail.com
http://naeblis.cx/rtomayko/
At 06:00 PM 6/13/2005 -0400, Ryan Tomayko wrote:
On an OS X darwinports box::

    $ port list | grep -e '^py-' | wc -l
    237

On a Fedora 3 box (Core + Extras)::

    $ yum list all | grep -e '^py' -e 'python' | wc -l
    78
Impressive. :)
I don't have any debian or gentoo boxes handy but I imagine they'd weigh in somewhere around the darwinports number.
None of these packages are currently provided as eggs or with .egg-info directories when they are installed to site-packages, and they have complex dependency relationships that are managed by the distribution's utility (port, yum, apt-get, emerge, etc.). This creates a problem for these packages because it means that they cannot assume dependencies will always be egg-managed. If they start adding require() calls to their scripts, they will break under these environments. require() is an all-or-nothing proposition for distributions, and that means there will need to be a planned "upgrade" period or something for all packages.
As a more specific example, I contribute to two packages that are distributed with Fedora Core: python-urlgrabber and yum. yum depends on python-urlgrabber and python-elementtree. Now, if I wanted to move yum to be egg based and use require(), I would also need to ensure that all yum's dependencies are egg based. When yum (and its dependencies) are installed from RPM, they must all be in egg format (or at least provide .egg-info dirs). If not, the yum script will fail.
So a single package using require() can cause a snowball effect where many other packages would need to be upgraded to egg format as well. In time, this may be a good thing because it could accelerate adoption of eggs but for the time being it makes it really hard to use require().
I'm not seeing how this is any different than if you just started requiring a newer version of a package. I mean, if 'yum' needed a newer version of elementtree, it would force an upgrade. So why can't you just rely on a later "port number"? ISTM that most packaging systems have something like '-1' or 'p1' or 'nb1' (NetBSD) tagged on a revision to identify changes in the packaging or platform-specific patches applied. Couldn't you use that to make your 'yum' RPM depend on egg-packaged versions of its dependencies?

I understand you're saying it's a big problem, but the truth is that relatively few existing Python packages have a lot of dependencies; the dependency tree of the 237 darwinports is probably extremely flat. The problems today of depending on anything are such that few people do; this makes it relatively simple for the maintainer of a single port to just go ahead and upgrade the dependencies, too (organizational issues notwithstanding).

But I am obviously no expert in these matters, so I defer to you here. I'm just saying that distribution packages that depend on more than one or two other packages are rare in Python today, and the things that do get depended on tend to be frequently used, so when you do port a dependency, it significantly reduces the number of dependencies that *need* to be ported. Thus, although the problem appears huge in potential, I think the actual interconnectedness of the packages is probably quite small.
Or maybe this could be done by metadata -- you put a legacy.py file in your egg-info, and when processing your egg's dependencies, if pkg_resources can't find a package you need, it would call a function in legacy.py that would check for the dependency using setuptools' inspection facilities, and return a path and a guess at a version number.
How does that sound?
That would solve my problem perfectly.
I'll give this some thought for the 0.5/0.6 releases, then. Interestingly enough, this technique could possibly give someone the opportunity to do things like look for dynamic link libraries or headers, check operating system versions, etc.
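A hedged sketch of what such a legacy.py hook might look like. Nothing here is part of setuptools: the function name `find_legacy_dist` and its `(path, version_guess)` return convention are illustrative assumptions about how pkg_resources could call into the hook when a required distribution is not found::

```python
# Hypothetical legacy.py placed in an egg's .egg-info directory.
# pkg_resources (assumed) would call find_legacy_dist() when a required
# distribution cannot be located, letting the egg fall back to checking
# for packages installed without egg metadata.
import os
import sys


def find_legacy_dist(name):
    """Locate a non-egg installation of package `name` on sys.path.

    Returns (path, version_guess), or None if the package is not found.
    """
    for entry in sys.path:
        candidate = os.path.join(entry, name)
        # A plain package directory with no egg metadata alongside it.
        if os.path.isdir(candidate) and \
                os.path.isfile(os.path.join(candidate, "__init__.py")):
            # With no metadata to read, the version can only be a guess.
            return candidate, "0.0"
    return None
```

The same mechanism could host the dynamic-library and OS-version checks mentioned above, since the hook is ordinary Python run at dependency-resolution time.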
I'm not sure exactly what you're trying to do here. If you just want to know if your script is running from a development location (and therefore needs to call require() to set up dependencies), couldn't you just check for 'MyPackage.egg-info' in the sys.path[0] (script) directory?
e.g.:
import sys, os
if os.path.isdir(os.path.join(sys.path[0], "MyPackage.egg-info")):
    from pkg_resources import require
    require("MyPackage")  # ensures dependencies get processed
If this is what you want, perhaps we can create a standard recipe in pkg_resources, like maybe 'script_package("MyPackage")', that only does the require if you're a development egg and not being run from run_main().
You didn't answer this, by the way.
I'd be happy to advocate to / work with packagers once we get a basic set of best practices together. It seems like there are a lot of options here - we just need to iron out the details.
Yeah; I think that basically the best approach for packaging systems will be to run EasyInstall during install and uninstall to modify the easy-install.pth file. I also think that if in Python 2.5 we can change the bdist_* commands to create packages this way, then that should help, too.
On Jun 13, 2005, at 7:51 PM, Phillip J. Eby wrote:
At 06:00 PM 6/13/2005 -0400, Ryan Tomayko wrote:
So a single package using require() can cause a snowball effect where many other packages would need to be upgraded to egg format as well. In time, this may be a good thing because it could accelerate adoption of eggs but for the time being it makes it really hard to use require().
I'm not seeing how this is any different than if you just started requiring a newer version of a package. I mean, if 'yum' needed a newer version of elementtree, it would force an upgrade. So why can't you just rely on a later "port number"? ISTM that most packaging systems have something like '-1' or 'p1' or 'nb1' (NetBSD) tagged on a revision to identify changes in the packaging or platform-specific patches applied. Couldn't you use that to make your 'yum' RPM depend on egg-packaged versions of its dependencies?
Sure. It's just that down-level cascading upgrades like this are generally discouraged if they can be avoided. The legacy.py approach or a mechanism like it would be preferable until such time as a general policy for packaging with eggs can be devised by the distribution. It's not a show stopper, just a mild concern.
I understand you're saying it's a big problem, but the truth is that relatively few existing Python packages have a lot of dependencies; the dependency tree of the 237 darwinports is probably extremely flat. The problems today of depending on anything are such that few people do; this makes it relatively simple for the maintainer of a single port to just go ahead and upgrade the dependencies, too (organizational issues notwithstanding).
Perhaps you're right. The nice thing is that in the case of libraries (elementtree, for example) no upstream code changes are needed and the package can be eggified by the packager. This makes the whole thing a bit less of an issue.
But I am obviously no expert in these matters, so I defer to you here. I'm just saying that distribution packages that depend on more than one or two other packages are rare in Python today, and the things that do get depended on, tend to be frequently used, so when you do port a dependency, it significantly reduces the number of dependencies that *need* to be ported. Thus, I think that although the problem appears huge in potential, I think that the actual interconnectedness of the packages is probably quite small.
I guess we'll find out in the coming months as maintainers become more aware of the advantage of egg based development / deployment. Like I was saying earlier, this may be to our advantage as packages moving to eggs will nudge others in that direction as well.
Or maybe this could be done by metadata -- you put a legacy.py file in your egg-info, and when processing your egg's dependencies, if pkg_resources can't find a package you need, it would call a function in legacy.py that would check for the dependency using setuptools' inspection facilities, and return a path and a guess at a version number.
How does that sound?
That would solve my problem perfectly.
I'll give this some thought for the 0.5/0.6 releases, then.
Interestingly enough, this technique could possibly give someone the opportunity to do things like look for dynamic link libraries or headers, check operating system versions, etc.
Yeah, I was thinking the same thing. Having a "package preload" area could be useful in a variety of ways.
I'm not sure exactly what you're trying to do here. If you just want to know if your script is running from a development location (and therefore needs to call require() to set up dependencies), couldn't you just check for 'MyPackage.egg-info' in the sys.path[0] (script) directory?
e.g.:
import sys, os
if os.path.isdir(os.path.join(sys.path[0], "MyPackage.egg-info")):
    from pkg_resources import require
    require("MyPackage")  # ensures dependencies get processed
If this is what you want, perhaps we can create a standard recipe in pkg_resources, like maybe 'script_package("MyPackage")', that only does the require if you're a development egg and not being run from run_main().
You didn't answer this, by the way.
I'm hoping this won't be necessary with legacy.py or some variation thereof. I wasn't trying to detect a development environment so much as I was trying to detect whether I was egg-managed or some-external-package-manager managed. The idea was to limit require() calls to when my package was an egg or had egg metadata, otherwise assume some external package manager (or manual setup) is responsible for setting up sys.path.
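That detection can be sketched by wrapping the egg-info check from earlier in the thread in a small helper. The helper name `require_if_egg_managed` is hypothetical, not an actual pkg_resources API::

```python
import os
import sys


def require_if_egg_managed(package_name):
    """Call pkg_resources.require() only when egg metadata sits next to
    the script; otherwise assume an external package manager (or manual
    setup) is responsible for sys.path, and leave it alone.
    """
    script_dir = sys.path[0]  # directory containing the running script
    egg_info = os.path.join(script_dir, package_name + ".egg-info")
    if os.path.isdir(egg_info):
        from pkg_resources import require
        require(package_name)  # also processes declared dependencies
        return True
    return False
```

When the .egg-info directory is absent, the helper does nothing, which is exactly the "don't tamper with sys.path" behavior wanted for externally managed installs.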
I'd be happy to advocate to / work with packagers once we get a basic set of best practices together. It seems like there are a lot of options here - we just need to iron out the details.
Yeah; I think that basically the best approach for packaging systems will be to run EasyInstall during install and uninstall to modify the easy-install.pth file. I also think that if in Python 2.5 we can change the bdist_* commands to create packages this way, then that should help, too.
I think so too. Wide changes to packages are not unexpected when the Python version changes, so if this did make it into 2.5, I might even be able to convince fedora-devel packagers to move Python packages to eggs, or at least provide .egg-info directories and calls to EasyInstall, for Fedora Core 5. I'm not very close to the BSD ports or other packaging communities, but I wouldn't have a problem advocating the same to them if things went well with Fedora.

Ryan Tomayko
rtomayko@gmail.com
http://naeblis.cx/rtomayko/
participants (2)
- Phillip J. Eby
- Ryan Tomayko