Re: [Python-Dev] [Distutils] Capsule Summary of Some Packaging/Deployment Technology Concerns

March 18, 2008

We should probably move this off of Python-Dev, as we're getting into 
deep details now...

At 07:27 PM 3/18/2008 -0500, Dave Peterson wrote:
...
If you really wanted to do a full-tree intersection, it seems to me 
that the problem is detecting all the dependencies without having to 
spend significant time downloading/building in order to find them 
out.   This could be solved by simply extending the cheeseshop 
interface to export the set of requirements outside of the egg / 
tarball / etc.  We've done this for our own egg repository by 
extracting the appropriate meta-data files out of EGG-INFO and 
putting it into a separate file.  This info is also useful for users 
as it gives them an idea of how much *new* stuff is going to be 
installed (a la yum, apt-get, etc.)
...and now we're more directly competing with them, too.  The 
original idea Bob and I had was to do XML files ala Eclipse feature 
repositories, but then later I realized that for what we were doing, 
HTML was both adequate and already available.  However, I don't see a 
problem in principle with having "header" files available for this 
sort of thing.
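
A minimal sketch of what producing such a "header" file might look
like, assuming the egg is a zip archive carrying the usual
EGG-INFO/requires.txt.  The function names and the tab-separated
output layout are made up for illustration; they are not an existing
cheeseshop or setuptools interface.

```python
import zipfile


def read_egg_requirements(egg_path):
    """Return {extra_name: [requirement, ...]} from an egg's requires.txt.

    The empty-string key holds the unconditional requirements.
    """
    requirements = {"": []}
    with zipfile.ZipFile(egg_path) as egg:
        try:
            text = egg.read("EGG-INFO/requires.txt").decode("utf-8")
        except KeyError:
            # No requires.txt means the egg declares no dependencies.
            return requirements
    section = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            # "[extra]" headers introduce requirements for an optional extra.
            section = line[1:-1]
            requirements.setdefault(section, [])
        else:
            requirements[section].append(line)
    return requirements


def write_header_file(egg_path, header_path):
    """Write the requirements to a small sidecar file, one per line.

    Hypothetical layout: "<extra or '-'>", a tab, then the requirement.
    """
    requirements = read_egg_requirements(egg_path)
    with open(header_path, "w", encoding="utf-8") as out:
        for extra, reqs in sorted(requirements.items()):
            for req in reqs:
                out.write(f"{extra or '-'}\t{req}\n")
```

A repository could publish the sidecar file next to each egg, so a
client can resolve the full dependency tree before downloading any of
the eggs themselves.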
...
With our ETS projects, we've run into problems with the current 
heuristic.  Perhaps we just don't know how to make it work like we want?
We have a set of projects that we want to be individually 
installable (to the extent that we limit cross-project dependencies) 
but we also want to make it easy to install the complete set.  We 
use a meta-egg for the latter.  Its purpose is only to specify the 
exact versions of each project that have been explicitly tested to 
work together -- you could almost think of it as a source control system tag.
I would think that as long as that meta-egg specifies *all* the 
required versions (right down to recursive dependencies), then there 
shouldn't be any problem.  Maybe it's me who's not understanding something?

I would think that you could get the appropriate data by running the 
tl.eggdeps tool.
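
Whether a meta-egg really pins *everything* can be checked
mechanically.  The following is only a sketch: it assumes the
dependency graph has already been collected into a plain dict (from
tl.eggdeps output or otherwise; that step is not modelled here), and
the function and project names are invented for the example.

```python
def unpinned_dependencies(pins, depends_on):
    """Return the projects reachable from the pinned set that lack a pin.

    pins:        {project_name: exact_version} as listed by the meta-egg
    depends_on:  {project_name: [names of projects it requires]}
    """
    seen = set()
    stack = list(pins)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        # Follow recursive dependencies, pinned or not.
        stack.extend(depends_on.get(name, []))
    return sorted(seen - set(pins))


# Hypothetical data: "numpy" is required but not pinned by the meta-egg.
pins = {"ets.a": "2.7", "ets.b": "2.7"}
graph = {"ets.a": ["ets.b", "numpy"], "ets.b": [], "numpy": []}
missing = unpinned_dependencies(pins, graph)
```

Any name that shows up in the result is a place where the installer's
own version-selection heuristic still gets a say.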
...
A number of projects want to provide various types of files besides 
code in their distributable, and they'd like these to end up in 
standard locations for that type of file.  Think documentation, 
sample data, web templates, configuration settings, etc.   Each of 
these should be treated differently at installation time depending 
on platform.  On *nix, docs should go in /usr/share/doc whereas we 
might need to create a C:\Python2.5\docs on Windows.   With sample 
data and templates, you probably just want it accessible outside of 
the zipped egg so users can easily look at it, add to it, edit it, 
etc.  Configuration settings should be installed with some defaults 
into a standard configuration directory like /etc on *nix, etc.
Basically the issue is that it needs to be easier to include 
different sets of files into an egg for different actions to be 
taken during installation or packaging into an OS-specific distribution format.
Yes, it would be nice to define a metadata standard for including 
installable "datasets" either through copying or symlinking, 
optionally with entry points for running some code, too.  When you 
install an egg, these things could get added to a "post-install 
to-do" list, that you could then read to find out what steps to do, 
or invoke a tool on to actually do some of those steps.
...
But the docs for easy_install claim that the list of active eggs is 
maintained in easy-install.pth.  Also, if I create my own .pth file, 
and the user tries to update my version to a new one, will the 
easy_install tool modify my .pth file to remove the mention of the 
old version from my sys.path and put the new version in the same 
.pth file?  Or will it now be listed in both places?  Or will it 
only be in easy-install.pth?
My understanding of the context of the question was that it applied 
to *system* packaging tools, which would be exclusively maintaining 
the .pth entries for the packages they installed.  i.e., a scenario 
with *no* easy-install.pth.  Setuptools will still detect the 
presence of their eggs, regardless of the means by which they're 
added to sys.path.  But it would not *maintain* those .pth files.
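
To make the scenario concrete, here is a small sketch of a .pth file
owned by a packaging tool rather than by easy_install.  It uses only
the standard library's site.addsitedir(), which processes .pth files
in a directory the same way Python does for site-packages at startup.
The directory, egg, and file names are placeholders.

```python
import os
import site
import sys
import tempfile

# Stand-ins for a site directory and an egg installed by a system packager.
site_dir = tempfile.mkdtemp()
egg_dir = os.path.join(site_dir, "Foo-1.0-py2.5.egg")
os.mkdir(egg_dir)

# The packaging tool owns this .pth file; there is no easy-install.pth here.
with open(os.path.join(site_dir, "vendor-foo.pth"), "w") as pth:
    pth.write(egg_dir + "\n")

# Process the directory the way Python does for site-packages at startup.
site.addsitedir(site_dir)

# The egg directory should now be importable like any other path entry.
egg_is_on_path = egg_dir in sys.path
```

Upgrading Foo in this setup means the packaging tool rewrites
vendor-foo.pth itself; nothing else is expected to edit it.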
...
Yes, but as you've already pointed out, they've escaped into a 
larger ecosystem and this restriction is a severe limitation -- 
leading to significant frustration.  Especially as projects evolve 
and want to do something more complex than simply install pure 
Python code.  Here at Enthought, we use and ship a number of 
projects that have extensions and thus dynamic libraries that need 
to either be modified during installation to work from the user's 
installed location, or copied elsewhere on the system to avoid the 
need to modify (which we also can't do via an egg install) env 
variables, registries, etc.
By the way, there *is* experimental shared library building support 
in setuptools, and I recently heard from Andi Vajda that he was 
successful in using it in his JCC project to make available a C++ 
library for linkage from JCC-built projects.  (I'm also sitting on 
his patch that makes it work...)  I'm not sure that it actually fixes 
the larger problem, e.g., the case where the main project is installed 
by the system, and then you build or install an egg yourself.  But I 
think those problems are solvable.
...
We'd also love to be able to ship end-user enterprise-scale 
applications via eggs so that bug fixes and updates don't require 
downloading a monolithic 100MB+ installer.  But doing that requires 
the ability to update desktop icons, menus, etc. which we also 
can't do automatically via an egg.
Yep...  a good post-install mechanism would be handy for wx and 
pywin32 as well.
...
If you don't want the burden on setuptools to support, much less 
track, all these options, then perhaps it could just support 
automatic execution of a post-install script (and pre-uninstall 
script if uninstallation ever happens) that allows individual 
project developers to do what they need to do?  Let the burden of 
describing how those things happen and how to 
uninstall/relocate/update them fall to the provider of the projects 
that do them.
Yeah, that's what I really *don't* want.  I'd like to enable a more 
trustable mechanism than a blindly-executed script.  I'd rather see a 
standard that makes a developer document more, and have to at least 
*convince* the user that their post-install is worthwhile, even if 
the tool then makes it easy to run.

Better still, I'd rather have those post-install parts done in such a 
way that things like icons, menus, manifests, registry stuff, etc., 
have to get explicitly listed instead of being done programmatically.
...
Also, IIUC, stow only tries to "contain" the hard files.  It puts 
links in multiple standard locations (for man pages, executables, 
libraries, etc.)   If setuptools supported these options, I don't 
think there'd be any discussion here except for things like "how do 
I extend the set of things the tool supports so that my foo-type 
files get linked into the standard /os/path/to/foo for the X os?"
Yep.  Having that would be a worthwhile thing, I think.  Discussion 
leading to specs is most welcome.
...
I should have read ahead.  This sounds close to what I've been 
describing except that this leads me to picture a script that 
prompts for install locations and allows the user to customize the 
destinations rather than one that assumes everything goes in a 
standard place.  I'm all for this, and the continuation of the 
ability to install an egg into a user-environment vs. a system-environment.
+1.
...
The only thing missing here is the ability for the installer to 
automatically run that script so that installation isn't a 
disjointed, two-step manual process that a user is prone to forget 
to complete.
I don't see a problem with a prompting process, backed by a log file 
that records what post-install steps are pending, finished, or 
explicitly rejected by the user.
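
As a rough illustration of such a log, assuming a JSON file that maps
step names to one of three states (the file format, function names,
and step names are all invented here, not an existing setuptools
feature):

```python
import json
import os

VALID_STATES = ("pending", "finished", "rejected")


def load_log(path):
    """Return {step_name: state}; a missing log means nothing is recorded."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def record(path, step, state):
    """Set one step's state and rewrite the log."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown state: {state!r}")
    log = load_log(path)
    log[step] = state
    with open(path, "w", encoding="utf-8") as f:
        json.dump(log, f, indent=2, sort_keys=True)


def pending_steps(path):
    """List the steps the user still has to accept or reject."""
    return sorted(s for s, state in load_log(path).items() if state == "pending")
```

An installer would record() each declared step as "pending" at install
time; a prompting tool would then walk pending_steps() and mark each
one "finished" or "rejected" according to the user's answer.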

One possibility, by the way, is that we could overload "extras" for 
this purpose.  Entry points (such as those for scripts) can require 
extras; if extras could mean post-install components like docs or 
icons or what-have-you, then trying to run the script could result in 
an error message telling you you need to "easy_install 
foo_package[icons]" or whatever.
...
One of the features of Enthought's Enstaller extension to 
easy_install was that it looks for a post_install.py script in 
EGG-INFO and if one is found, runs it.  I would think that getting 
this into setuptools would be a significant step forward but I 
believe you previously rejected that idea.   We'll take a stab at 
creating a patch for you if you're more receptive to that idea 
now.  Just let me know.
No -- I'm not happy with a straight-up executable hook for 
post-install steps.  My evaluation of the state of PyPI is that I 
don't trust the community to write non-hazardous setup.py files, let 
alone post-install scripts.  There should be a high technical and 
social barrier to including post-install hooks with arbitrary code.

For example, if there was a required separation between installer 
tools and the things they install, such that any post-install 
operation had to be performed strictly by providing some 
human-readable data that will be passed to a separately-installed 
tool, and there was a high social stigma associated with writing your 
own post-install tool, then that might work.

So, for example, if the community creates an icons and menus 
installer tool for the various platforms, and then anybody can use it 
in their project by adding the right data, then the user doesn't have 
to fully trust arbitrary package authors, only the authors of the 
post-install tools.
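
A sketch of that separation, under the assumption that a package ships
only a list of declared steps and a separately installed tool decides
what it is willing to do with them.  Everything here is hypothetical:
the "kind" vocabulary, the handler names, and the manifest layout.  The
handlers only return a description instead of touching the system.

```python
def install_icon(entry):
    # Placeholder: a real tool would create the desktop shortcut here.
    return f"icon {entry['name']} -> {entry['target']}"


def install_menu(entry):
    # Placeholder: a real tool would add the menu item here.
    return f"menu {entry['name']} -> {entry['target']}"


# Only handlers shipped with the separately installed tool are reachable;
# package data can select one, but cannot supply code of its own.
TRUSTED_HANDLERS = {"icon": install_icon, "menu": install_menu}


def plan(manifest):
    """Split declared steps into ones the tool supports and ones it refuses."""
    supported, refused = [], []
    for entry in manifest:
        if entry.get("kind") in TRUSTED_HANDLERS:
            supported.append(entry)
        else:
            refused.append(entry)
    return supported, refused


def run(manifest):
    """Run supported steps; anything else is reported, never executed."""
    supported, refused = plan(manifest)
    results = [TRUSTED_HANDLERS[e["kind"]](e) for e in supported]
    return results, refused


# Hypothetical manifest as a package author might declare it.
manifest = [
    {"kind": "icon", "name": "Foo", "target": "foo.exe"},
    {"kind": "script", "name": "setup", "target": "post_install.py"},
]
results, refused = run(manifest)
```

The "script" entry ends up in the refused list, since the tool has no
handler for it; the user only has to trust the handlers, not whatever
the package author wrote.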

I'm not saying that model is perfect; in fact I can see some 
potential pitfalls.  But once an automatic post-install hole is 
opened it will be *very* hard to close, because it will always be 
*easier* to roll your own crappy post-installer instead of 
contributing to a set of robust cross-project/cross-platform 
tools.  So I'd rather keep this particular "itch" in play and try to 
build up the scratching pressure until some people get together and 
pay attention long enough to solve the problem in a less hacky way.  :)
...
...
On the other hand, I've been puzzling over how to handle legitimate
post-install features.  On Windows, both wx and pywin32 have a real
need to do some actual "install" operations.  Some is just copying
files, but pywin32 also has to do some registry stuff.  I don't know
how to allow just what's sensible, without opening up a huge can of
worms, though.
I think there are lots of situations that are legitimate (projects 
with extensions, projects that want to put icons on the desktop or 
in menus, projects that need to interact with a registry, projects 
that want to put configuration information somewhere other than in a 
zip file in a site-packages dir, etc.)   I think we should worry 
less about preventing developers from shooting themselves in the foot
It's the users' feet that I'm concerned with.  Some people are 
already paranoid about the fact that PyPI doesn't use SSL and code 
signing, or that easy_install uses the intarwebs at all.  I can just 
see the witch hunt when we start executing arbitrary code.  Unh 
unh.  No way am I letting that happen.  Nope.
...
and more about ensuring that they can hunt for food for their survival.
Right now, if you have a post-install script that's essential, you'll 
just have to convince your users to run it.  Which nicely keeps 
easy_install out of what should be a conversation between developer and user.

Enstaller is a different case - you are presumably installing an 
application, and the user is trusting your installer.  easy_install 
is something else altogether, and is used by other programs such as buildout.

Actually, I wonder if, instead of trying to enhance setuptools for 
post-install, we should be looking at buildout recipes and 
maybe having a way for setuptools dependencies to point to buildout 
specs.  IIRC, buildout specs can be remotely retrieved from a single URL, too.
...
We can always tighten things down after seeing the usecases that 
develop, right?
Actually, no, we can't, since backward compatibility would keep us 
from removing the hook, once people rely on it.

I really feel your (and others') pain on this issue, but it's one 
place where the users have to come first, and they need protection 
from the wilds of PyPI.  Distribution and installation issues are not 
first on most developers' minds, so the fact that someone writes a 
great library on PyPI doesn't mean they can write installers worth a 
crap.  Frankly, I wouldn't trust myself to write a correct 
post-installer on the first go -- perhaps *because* I have seen so 
many "simple" things go wrong.

Phillip J. Eby