[Distutils] setuptools-0.4a2: Eggs, scripts, and __file__

Ryan Tomayko rtomayko at gmail.com
Mon Jun 13 09:12:35 CEST 2005


On Jun 13, 2005, at 1:15 AM, Phillip J. Eby wrote:
>> The script looks like it would work properly if it was given a pseudo
>> filename but this has me thinking about what the best way to detect
>> development environments in scripts will look like in an eggified
>> environment.
>>
>
> That's the wrong question to ask, IMO.  Think about how to make the  
> script work exactly the same in all environments, instead.  :)

I'd love to, except I can't assume setuptools and egg-based
dependencies in all environments at the moment. In particular, Linux
distributions like Fedora probably won't be moving to egg-based
packaging for some time. If I'm lucky I might see Python RPM
maintainers phase in package.egg-info directories on top of the
normal site-packages layout over the next few months. What this adds
up to--if I'm not missing something--is that I can't assume require()
is going to work. I need to be able to fall back to assuming that
all dependencies will be laid out for me by some other package
management system (in this case RPM).

I don't think the setuptools dependency will be hard to deal with, but
egg versions of other dependencies are probably going to be a problem
for a little while. I can assume that require() will be there, but I'd
have to try/except/pass on DependencyNotFound exceptions or
something. What I'd prefer is to keep require() out of the code
completely and use .egg-info/depends.txt instead. If I'm running out
of an egg, I want setuptools to manage requiring everything before my
script is even called. This should give me all of the benefits of
eggs when I'm using them and fall back to the old-style manual
dependency management otherwise. Does that make sense?
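
To make the try/except/pass idea concrete, here is a minimal sketch. It
assumes the exception names used by the pkg_resources releases I know of
(DistributionNotFound and VersionConflict), which may differ from what
this setuptools version raises, and activate_eggs is a hypothetical
helper name, not part of setuptools:

```python
def activate_eggs(requirement):
    """Try to activate `requirement` via pkg_resources; return True on success.

    Returns False when pkg_resources is missing or cannot satisfy the
    requirement, so the caller can fall back to whatever is already on
    sys.path (for example, packages laid out by RPM).
    """
    try:
        import pkg_resources  # may be absent on a non-egg system
    except ImportError:
        return False
    try:
        pkg_resources.require(requirement)
    except (pkg_resources.DistributionNotFound,
            pkg_resources.VersionConflict):
        # Assumed exception names; see the note above.
        return False
    return True
```

A script would call activate_eggs('MyPackage') and then import MyPackage
either way, relying on the other package manager when it returns False.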

> Have you read this:
>
> http://peak.telecommunity.com/DevCenter/PythonEggs#developing-with-eggs
>
> The complexity you're incurring here is unnecessary; between
> require() and .pth files you should never need to mess with sys.path
> manually.

I've absolutely read it and agree completely with the concept. I
can't assume that require() will work in all scenarios, however. But
your conclusion is still valid, I think. If I move to egg dependencies
in development and assume that either setuptools or some other
package management utility will set up sys.path correctly, I should be
able to get rid of manual sys.path hackery.

>> At first I thought I should switch from using path operations on
>> __file__ to using `pkg_resources.resource_isdir` and
>> `resource_filename` but that doesn't make any sense - if the script
>> is running from within a deployed egg, I'm not using it from a
>> development environment
>
> Not true; see the link above.

I should have been more clear. I was referring to the case where my
script is being run from EGG-INFO/scripts/somescript as opposed to
[devel-package]/scripts/somescript or /usr/bin/somescript (deployed via
RPM). Where the script file *is* provides the information needed to
determine whether/how to set up sys.path. The resource_*** functions
provide no information about where the script actually lives and so
the entire exercise of moving that code to use those functions was in
vain. The point is moot at any rate, as I don't think I'll be needing
sys.path munging anymore.

> When you do development from your distutils package root, your  
> development code *is* in an egg.  However, you still shouldn't be  
> checking __file__ or fiddling with sys.path, and there is no need  
> anyway.  Here's one idea of what your tree could look like:
>
> <snip file layout>
>
> So, you create a setup.py for your package, and you run setup.py
> bdist_egg; this will dump an egg in dist/, and create
> MyPackage.egg-info, marking devel_dir as a "development egg".
>
> Install any other packages you need to the current directory using  
> 'easy_install -xd. package_you_need' (the -x excludes their  
> scripts).  Now you're ready to party.  Make your scripts use  
> 'require()' to ask for 'MyPackage'; when you run them (whether you  
> are in the devel_dir or not), they will find MyPackage.egg-info,  
> find the dependencies declared, and add all the needed .eggs to  
> sys.path automatically.

So this is where I need to figure something out, because I'd like
either to keep require() out of those scripts or, failing that, to
try/except/pass around DependencyNotFound exceptions in cases where
eggs won't be available for dependencies. Or maybe...

When I require('MyPackage'), does setuptools look at
MyPackage.egg-info/depends.txt and require everything else for me? I'm
assuming it does and don't see why it wouldn't. If that's the case, I
might be able to make my scripts as simple as::

     from pkg_resources import require, find_distributions
     if list(find_distributions('MyPackage')):
         require('MyPackage')
     import MyPackage
     MyPackage.main()

If find_distributions() yields any results then we can assume that
we're running as an egg; if not, we assume that we're running
old-school and that some other package manager has laid everything out
nicely already.
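
One caveat I'm not sure about: in the pkg_resources releases I know of,
find_distributions() takes a sys.path entry rather than a project name,
so the check may need to scan path entries and compare names. A hedged
sketch under that assumption (egg_metadata_present is a hypothetical
helper, and the behaviour in this setuptools version may differ):

```python
import sys


def egg_metadata_present(project_name, search_path=None):
    """Return True if egg metadata for project_name is found on the path.

    search_path defaults to sys.path. Returns False when pkg_resources
    itself is unavailable, which is the "old-school" case.
    """
    try:
        import pkg_resources
    except ImportError:
        return False
    # Distribution.key is the normalized, lower-cased project name.
    wanted = pkg_resources.safe_name(project_name).lower()
    entries = sys.path if search_path is None else search_path
    for entry in entries:
        for dist in pkg_resources.find_distributions(entry):
            if dist.key == wanted:
                return True
    return False
```

The script would then call require('MyPackage') only when
egg_metadata_present('MyPackage') is true.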

The downside to this approach is that I would have to be sure NOT to
distribute MyPackage.egg-info with RPMs and other packages, which
kind of rules out any phased approach to bringing egg-based packaging
to Fedora's stock RPMs. It might be better to just patch some flag
into my script during the RPM build that would tell it whether to use
require() or not::

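     use_require = 1  # the RPM build would patch this to 0
     if use_require:
         # imported here so the script still runs without setuptools
         from pkg_resources import require
         require('MyPackage')
     import MyPackage
     MyPackage.main()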

The RPM spec would have to patch that use_require line to be zero but  
that's a single call to sed. If that's all the finagling I have to do  
in the spec file it would be a good day.
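
For build systems where sed isn't handy, the same patch could be done in
a few lines of Python. This is only a sketch: disable_require is a
hypothetical helper, and it assumes the flag sits on its own line exactly
as "use_require = 1":

```python
import re


def disable_require(script_source):
    """Return script_source with the first 'use_require = 1' line set to 0."""
    return re.sub(
        r"^(use_require[ \t]*=[ \t]*)1[ \t]*$",
        r"\g<1>0",
        script_source,
        count=1,
        flags=re.MULTILINE,
    )
```

The build step would read the script, pass its text through
disable_require(), and write the result back before packaging.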

I don't know - none of these seem to be perfect solutions, but none
of them would have taken me as much time to implement as writing this
email either. Still, it seems worth pointing out that keeping the
number of code-level require() calls to a minimum, and having some way
of switching those few calls off and on based on environment, is
something packages that need to be included in a non-egg-based
distribution will need to think about.

> Does this explain it better?  One side benefit of egg-based  
> installation is that you can dump as many libraries in
> site-packages as you want and not worry about version conflicts, so it's
> definitely how I plan to do most development.  The directory where  
> a script is located, however, takes precedence over site-packages,  
> which means that even if you have the package you're developing  
> already installed in site-packages, your development egg will take  
> precedence if the script you're running is in that directory.

Right. I think that makes a lot of sense and will definitely be  
moving to eggs in development as you've described.

>>  and the resource_*** functions don't make
>> sense in __main__ context anyway. So my current thinking is that the
>> existing idiom should remain and that a pseudo filename shouldn't
>> pose any problems in the scenarios I'm dealing with:
>>
>>  * Deployed egg: don't tamper with sys.path
>>  * Deployed site-packages: don't tamper with sys.path
>>  * Development environment: insert development paths
>>
>
> You should be able to make that last one work the same way; i.e.,  
> without tampering with sys.path.  If you can't, please explain your  
> situation further, because I want pkg_resources to be able to  
> prevent all future sys.path munging by anything but EasyInstall  
> itself, and by extensible applications that have to manage plugin  
> directories.

No, I think that covers sys.path munging. I'm still a little shaky on  
how I should know whether to rely on require() or not but we'll see  
what happens.

Ryan Tomayko
                                  rtomayko at gmail.com
                                  http://naeblis.cx/rtomayko/


