[Distutils] A plan for scripts (in EasyInstall)

Mon Jun 6 16:44:19 CEST 2005

Here's my semi-final plan for doing scripts with EasyInstall; please let me 
know if you see any issues with it:

* Make bdist_egg stick the scripts under an EGG-INFO/scripts/ subdirectory 
of the egg, by abusing the install_scripts command.  (Note: this does *not* 
mean they will be copied to 'PackageName.egg-info', only that they will be 
included in the built egg file.  Currently, subdirectories of .egg-info are 
not included in the built egg, so this is completely separate.)

* Add 'metadata_isdir' and 'metadata_listdir' APIs to pkg_resources, to 
allow inspecting the contents of EGG-INFO/scripts.
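
To make the intent of those two calls concrete, here is a rough sketch of how they might behave for a zipped egg, built on nothing but zipfile.  The function names come from the proposal above, but the signatures (a ZipFile plus a name relative to EGG-INFO) and the build_demo_egg helper are assumptions for illustration, not a settled API:

```python
import io
import zipfile

EGG_INFO = "EGG-INFO/"


def metadata_isdir(egg, name):
    """True if EGG-INFO/<name> is a directory in the egg (hypothetical signature)."""
    prefix = EGG_INFO + name.strip("/") + "/"
    # Zip files need not store directory entries, so a "directory" is
    # anything that at least one archive member lives under.
    return any(entry.startswith(prefix) for entry in egg.namelist())


def metadata_listdir(egg, name):
    """Sorted names directly under EGG-INFO/<name> (hypothetical signature)."""
    prefix = EGG_INFO + name.strip("/") + "/"
    children = set()
    for entry in egg.namelist():
        if entry.startswith(prefix) and len(entry) > len(prefix):
            # Keep only the first path component below the prefix.
            children.add(entry[len(prefix):].split("/", 1)[0])
    return sorted(children)


def build_demo_egg():
    """Build a tiny in-memory egg, purely for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("EGG-INFO/PKG-INFO", "Name: Demo\n")
        zf.writestr("EGG-INFO/scripts/hello.py", "print('hello')\n")
        zf.writestr("EGG-INFO/scripts/admin.py", "print('admin')\n")
    buf.seek(0)
    return zipfile.ZipFile(buf)
```

An installer could then check metadata_isdir(egg, "scripts") and, if it is true, loop over metadata_listdir(egg, "scripts") to find the scripts to install.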

* Add these options to the easy_install command:

   --exclude-scripts, -x     Don't install scripts
   --scripts-to=DIR, -t DIR  Install scripts to DIR

The --scripts-to option would default to being the *same as the 
installation directory*, unless you're installing to site-packages, in 
which case scripts would go to Python's default location for installing them.
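
In code, that default might be computed roughly as follows.  This is only a sketch: default_script_dir is a hypothetical helper, and using sysconfig to find site-packages and the default script location is my assumption about how the lookup could be done, not how easy_install does it:

```python
import os
import sysconfig


def default_script_dir(install_dir, site_packages=None, python_scripts=None):
    """Pick the default for --scripts-to (hypothetical helper).

    Scripts go next to the eggs, unless the eggs are going into
    site-packages, in which case Python's usual script directory is used.
    """
    if site_packages is None:
        site_packages = sysconfig.get_path("purelib")
    if python_scripts is None:
        python_scripts = sysconfig.get_path("scripts")

    def canonical(path):
        # Normalize so that symlinks and case differences don't matter.
        return os.path.normcase(os.path.realpath(path))

    if canonical(install_dir) == canonical(site_packages):
        return python_scripts
    return install_dir
```

An explicit -t DIR on the command line would simply bypass this function.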

Why the installation directory?  Because if you run the scripts from there, 
the egg(s) will be in a directory on sys.path, meaning that 'require()' 
will then work.  In essence, I'm assuming that the normal use case for 
specifying an install-dir is to create an "application directory" filled 
with the eggs needed to run a specific application.  For example, on a 
Unix-like system you might be installing to your ~/bin.

The downside to this assumption is that since the scripts are in the same 
place, they might become importable when that's not intended.  So if you're 
installing to a personal ~/lib/python, you will probably want to use -x or 
-t to override.

Anyway, if you are installing scripts, easy_install will just do what the 
distutils does now to install them in their specified locations.  This 
basically means giving them executable permissions and munging the #! line 
if applicable.  On a multi-version install (-m or -i), it seems like we 
should also add a line like:

   from pkg_resources import require; require("thispackage==version"); del require

But, as I've pointed out before, it's a tricky modification, as it would 
need to be inserted just before the first *executable* line of code in the 
script, which is often not the first line.  Docstrings and __future__ 
statements both have to be skipped, or else the script could be broken by 
the modification.  Further, even a successful modification is going to 
change the script's line numbering, which could have tech support 
implications.  So, I'm somewhat reluctant to do this without a way to turn 
it off (other than by skipping scripts).  There also needs to be a way to 
verify that the script is in fact a Python script!  (Presumably by checking 
for a .py/.pyw extension or a #! line containing "python".)
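
To give a feel for how fiddly this is, here is a minimal sketch of the insertion logic using the ast module to find the first statement that is neither the docstring nor a __future__ import.  The helper names (looks_like_python, insertion_line, add_require) are made up for this example, and the sketch assumes one statement per line; a script that puts a __future__ import and other code on the same line would still be broken by it:

```python
import ast


def looks_like_python(filename, source):
    """Guess whether a script is Python: .py/.pyw extension or a python #! line."""
    if filename.endswith((".py", ".pyw")):
        return True
    first_line = source.split("\n", 1)[0]
    return first_line.startswith("#!") and "python" in first_line


def insertion_line(source):
    """Return the 1-based line number before which the require() line can go."""
    tree = ast.parse(source)
    body = tree.body
    index = 0
    if ast.get_docstring(tree, clean=False) is not None:
        index = 1  # skip the module docstring
    while (index < len(body)
           and isinstance(body[index], ast.ImportFrom)
           and body[index].module == "__future__"):
        index += 1  # skip __future__ imports
    if index == len(body):
        return len(source.splitlines()) + 1  # nothing executable: append
    node = body[index]
    # A decorated def/class starts at its first decorator, not at 'def'.
    decorator_lines = [d.lineno for d in getattr(node, "decorator_list", [])]
    return min([node.lineno] + decorator_lines)


def add_require(source, requirement):
    """Insert the require() line just before the first executable statement."""
    if source and not source.endswith("\n"):
        source += "\n"
    line = "from pkg_resources import require; require(%r); del require\n" % requirement
    lines = source.splitlines(True)
    pos = insertion_line(source) - 1
    return "".join(lines[:pos] + [line] + lines[pos:])
```

Note that every line after the insertion point shifts down by one, which is exactly the line-numbering problem described above.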

Also, adding such a 'require()' line might be more restrictive than 
necessary; the script might include its own require() already!

So, here's an alternative possibility.  Let's suppose that the script *is* 
a Python script.  What if we ran it from inside the egg?  We could write 
out the script as a stub loader, looking something like this:

     #!python   <-- copied from original script, if present
     import pkg_resources
     pkg_resources.run_main("scriptname", "EggName")

The 'run_main' function would do several things:

  * require() the appropriate package(s)

  * Clear everything but __name__ from the __main__ namespace

  * Load the script file into memory, and "poke" it into linecache.cache, 
so that tracebacks from the script execution will still show its source 
code, with correct line numbers

  * exec the script in __main__, using something like:

     maindict['__file__'] = pseudo_filename
     code = compile(script_source, pseudo_filename, "exec")
     exec code in maindict, maindict

Hm.  execfile() could also be used if the script file actually exists, in 
which case we could also skip the seeding of linecache.  Probably we can 
add a run_script() method to the IMetadataProvider interface, so that 
different egg formats can handle this appropriately.
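
Putting the non-require() steps together, a sketch of the core of this might look like the following.  It is written with the function-call form of exec so it runs on current Pythons, it takes the target namespace as an argument rather than reaching into the real __main__ (which makes it easy to try out), and it leaves out the require() step entirely.  The name run_script and its signature are placeholders here, not the eventual IMetadataProvider method:

```python
import linecache


def run_script(script_source, pseudo_filename, namespace):
    """Run script_source in namespace as if it were the file pseudo_filename."""
    # Clear everything but __name__ from the target namespace.
    name = namespace.get("__name__", "__main__")
    namespace.clear()
    namespace["__name__"] = name
    namespace["__file__"] = pseudo_filename

    # Seed linecache so tracebacks can show source lines even though no
    # such file exists on disk.  Entries are (size, mtime, lines, fullname);
    # an mtime of None keeps checkcache() from discarding the entry.
    lines = script_source.splitlines(True)
    linecache.cache[pseudo_filename] = (
        len(script_source), None, lines, pseudo_filename)

    code = compile(script_source, pseudo_filename, "exec")
    exec(code, namespace, namespace)
```

A real stub would pass vars(sys.modules["__main__"]) as the namespace, with the script source read out of the egg's metadata.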

Now that we've come this far, it becomes clear that these "scripts" are 
nothing more than bootstraps -- which means that in a future version I can 
imagine allowing more user-friendly installation options, like .exe files 
on Windows, "applications" on the Mac, and extension-stripping everywhere 
else.  However, it may be that the choice of script installation policy is 
largely a matter of vehement personal preference, so there should probably 
be a way to configure that.  It could also include a way to define custom 
installation policies in Python modules, and a way to select a particular 
policy at runtime, e.g.:

    easy_install --script-policy=mymodule.foo_policy ...
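
Resolving such an option value would be simple enough.  As a sketch (load_policy is a hypothetical name, and what a policy object actually looks like is deliberately left open):

```python
import importlib


def load_policy(dotted_name):
    """Resolve 'package.module.attr' to the object it names (hypothetical helper)."""
    module_name, _, attr = dotted_name.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.attr', got %r" % dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

So --script-policy=mymodule.foo_policy would import mymodule and fetch its foo_policy attribute, raising ImportError or AttributeError if either is missing.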

At this point, however, easy_install options will have gotten complex 
enough to warrant configuration files for standard settings.  We could 
probably hijack the existing distutils configuration scheme for that, 
though, treating 'easy_install' as if it were a distutils command.
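
For illustration only, here is what reading such defaults could look like.  The [easy_install] section name follows from treating it as a distutils command; the option names in SAMPLE are just the long options from above with dashes turned into underscores, and read_defaults is a made-up helper using configparser rather than the distutils machinery itself:

```python
import configparser

# Hypothetical contents of a distutils-style configuration file.
SAMPLE = """\
[easy_install]
scripts_to = ~/bin
exclude_scripts = 0
"""


def read_defaults(text, command="easy_install"):
    """Return the options configured for one command as a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(command):
        return {}
    return dict(parser.items(command))
```

Command-line options would then override whatever this returns.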

Whew.  I think that about covers it.  Thoughts, anyone?