[Distutils] [Python-Dev] Capsule Summary of Some Packaging/Deployment Technology Concerns
Dave Peterson
dpeterson at enthought.com
Wed Mar 19 01:27:01 CET 2008
Phillip J. Eby wrote:
> At 05:10 PM 3/17/2008 -0500, Jeff Rush wrote:
>
>
>> 1. Many felt the existing dependency resolver was not correct. They wanted a
>> full tree traversal resulting in an intersection of all restrictions,
>> instead of the first-acceptable-solution approach taken now, which can
>> result in top-level dependencies not being enforced upon lower levels.
>> The latter is faster, however. One solution would be to make the
>> resolver pluggable.
>>
>
> Patches welcome, on both counts. Personally, Bob and I originally
> wanted a full-tree intersection, too, but it turned out to be hairier
> to implement than it seems at first. My guess is that none of the
> people who want it have actually tried to implement it without a
> factorial or exponential O(). But that doesn't mean I'll be unhappy
> if somebody succeeds. :)
>
I think we'd make significant progress by just intersecting the
dependencies we know about as we progress through the dependency tree.
For example, if A requires B==2 and C==3, and if B requires C>=2,<=4,
then at the time we install A we'd pick C==3 and also at the time we
install B we'd pick C==3. As opposed to the current scheme that would
choose C==4 for the latter case. This would allow dependent projects
(think applications here) to better control the versions of the full set
of libraries they use. Things would still fail (like they do now) if
you ran across dependencies that had no intersection or if you
encountered a new requirement after the target project was already
installed.
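To make the proposal concrete, here is a minimal Python sketch of that incremental intersection. It assumes integer version numbers, a toy in-memory index (`INDEX`) and dependency table (`DEPS`), and hypothetical helpers `resolve()` and `pick_newest()`; none of these are setuptools APIs. `pick_newest()` is only a simplified model of the current behaviour of taking the newest release that satisfies the requirement at hand.

```python
import operator
from collections import deque

# Comparison operators allowed in a version constraint.
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}

# Toy package index (hypothetical): project -> available releases.
INDEX = {"A": [1], "B": [1, 2], "C": [1, 2, 3, 4]}

# Toy dependency metadata (hypothetical): (project, version) -> requirements,
# each requirement being (dependency, [(op, version), ...]).
DEPS = {
    ("A", 1): [("B", [("==", 2)]), ("C", [("==", 3)])],
    ("B", 2): [("C", [(">=", 2), ("<=", 4)])],
}


class ResolutionError(Exception):
    pass


def satisfies(version, constraints):
    """True if version meets every (op, bound) constraint."""
    return all(OPS[op](version, bound) for op, bound in constraints)


def pick_newest(project, spec, index):
    """Simplified model of the current behaviour: newest release matching
    only the requirement at hand, ignoring what the rest of the tree wants."""
    return max(v for v in index[project] if satisfies(v, spec))


def resolve(root, root_version, index, deps):
    """Breadth-first walk that intersects constraints as they are discovered."""
    constraints = {}                 # project -> all constraints seen so far
    chosen = {root: root_version}
    queue = deque([(root, root_version)])
    while queue:
        project, version = queue.popleft()
        reqs = deps.get((project, version), [])
        # Merge this project's requirements into what we already know
        # before choosing any versions.
        for dep, spec in reqs:
            constraints.setdefault(dep, []).extend(spec)
        for dep, _ in reqs:
            allowed = constraints[dep]
            if dep in chosen:
                # Already picked earlier: we only check, we never backtrack.
                if not satisfies(chosen[dep], allowed):
                    raise ResolutionError(
                        f"{dep}=={chosen[dep]} already chosen but {allowed} required")
                continue
            candidates = [v for v in index[dep] if satisfies(v, allowed)]
            if not candidates:
                raise ResolutionError(f"no release of {dep} satisfies {allowed}")
            chosen[dep] = max(candidates)
            queue.append((dep, chosen[dep]))
    return chosen
```

With the example above, `resolve("A", 1, INDEX, DEPS)` keeps C at 3 because B's wider C>=2,<=4 range is intersected with A's C==3, whereas `pick_newest()` on B's range alone would choose 4. As in the proposal, a requirement that arrives after a version has already been chosen is only checked, not backtracked, so a conflict still raises an error.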
If you really wanted to do a full-tree intersection, it seems to me that
the problem is detecting all the dependencies without having to spend
significant time downloading/building in order to find them out. This
could be solved by simply extending the cheeseshop interface to export
the set of requirements outside of the egg / tarball / etc. We've done
this for our own egg repository by extracting the appropriate meta-data
files out of EGG-INFO and putting them into a separate file. This info is
also useful for users as it gives them an idea of how much *new* stuff
is going to be installed (a la yum, apt-get, etc.)
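A rough sketch of that kind of export, assuming zipped eggs that carry their requirements in `EGG-INFO/requires.txt` (where setuptools writes `install_requires`). The sidecar file name and the `export_requirements()` helper are hypothetical, and extras sections such as `[docs]` are returned as raw lines rather than parsed.

```python
import zipfile
from pathlib import Path


def export_requirements(egg_path):
    """Copy EGG-INFO/requires.txt out of a zipped egg into a sidecar file.

    The sidecar naming scheme (<egg file name>.requires.txt) is a hypothetical
    convention. Returns the non-blank requirement lines; extras sections such
    as "[docs]" are left as raw lines, not parsed.
    """
    egg_path = Path(egg_path)
    with zipfile.ZipFile(egg_path) as egg:
        try:
            text = egg.read("EGG-INFO/requires.txt").decode("utf-8")
        except KeyError:
            text = ""  # no requires.txt: the egg declares no dependencies
    sidecar = egg_path.with_name(egg_path.name + ".requires.txt")
    sidecar.write_text(text, encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A repository could then serve these small sidecar files next to the eggs, so a client can compute the full set of new requirements before downloading anything large.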
> In other words, we attempt to achieve heuristically what's being
> proposed to do algorithmically. And my guess is that whatever cases
> the heuristic is failing at, would probably not be helped by an
> algorithmic approach either. But I would welcome some actual data, either way.
>
With our ETS projects, we've run into problems with the current
heuristic. Perhaps we just don't know how to make it work like we want?
We have a set of projects that we want to be individually installable
(to the extent that we limit cross-project dependencies) but we also
want to make it easy to install the complete set. We use a meta-egg for
the latter. Its purpose is only to specify the exact versions of each
project that have been explicitly tested to work together -- you could
almost think of it as a source control system tag. Whereas on the
individual projects, we explicitly want to ensure that people get the
latest possible release of each required API so the version requirements
are wider here. This setup causes problems whenever we release new
versions of projects because it seems easy_install ignores the meta-egg
exact versions when it gets down into a project and comes across a wider
cross-project dependency. We ended up having to give up on the ranges
in the cross-project dependencies and synchronize them to the same
values in the meta-egg dependencies. There are numerous side-effects
of this that we don't like but we haven't found a way around it.
> Again, though, patches are welcome. :) (Specifically, for the
> trunk; I don't see a resolver overhaul as being suitable for the 0.6
> stable branch.)
>
We're planning to pursue this (for the above mentioned strategy) as soon
as we work ourselves out of a bit of a backlog of other things to do.
>> 2. People want a solution for the handling of documentation. The distutils
>> module has had commented-out sections related to this for several years.
>>
>
> As with so many other things, this gets tossed around the
> distutils-sig every now and then. A couple of times I've thrown out
> some options for how this might be done, but then the conversation
> peters out around the time anybody would have to actually do some
> work on it. (Me included, since I don't have an itch that needs
> scratching in this area.)
>
> In particular, if somebody wants to come up with a metadata standard
> for including documentation in eggs, we've got a boatload of hooks by
> which it could be done. Nothing's stopping anybody from proposing a
> standard and building a tool, here. (e.g. using the setuptools
> command hook, .egg-info writer hook, etc.)
Enthought has started an effort (it's currently one of two things in our
ETSProjectTools project at
https://svn.enthought.com/svn/enthought/ETSProjectTools/trunk) and we're
experimenting with our solution before proposing it as a patch. We'd
love some more help if anyone wants to participate.
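For reference, a sketch of the .egg-info writer hook mentioned above. setuptools looks up callables registered under the `egg_info.writers` entry-point group and calls each one as `writer(cmd, basename, filename)`. The `doc_files` keyword, the `docs.txt` file name and the `doctool` package are hypothetical, and a new setup() keyword would also need registering (shown here via the `distutils.setup_keywords` group).

```python
# Hypothetical registration in the doc tool's own setup.py:
#
#   entry_points={
#       "egg_info.writers": ["docs.txt = doctool.writers:write_doc_files"],
#       "distutils.setup_keywords": ["doc_files = setuptools.dist:assert_string_list"],
#   }


def write_doc_files(cmd, basename, filename):
    """egg_info writer: list the project's documentation files in docs.txt.

    cmd is the running egg_info command; basename is the entry-point name
    ("docs.txt") and filename is its path inside the .egg-info directory.
    The doc_files setup() keyword is hypothetical.
    """
    docs = getattr(cmd.distribution, "doc_files", None) or []
    data = "".join(path + "\n" for path in docs)
    # Writes the file when data is non-empty; with an empty string it removes
    # a stale docs.txt left over from an earlier build.
    cmd.write_or_delete_file("documentation list", filename, data)
```

An installer or OS packager could then read `docs.txt` from the egg metadata to decide where documentation belongs, without setuptools itself knowing anything about documentation.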
>> 3. A more flexible internal handling of the different types of files is
>> needed. Currently the code, data, lib, etc. files are aggregated at build
>> time and people would like them to be kept separate until
>> install/packaging time.
>>
>
> I don't know what this means, exactly.
>
A number of projects want to provide various types of files besides code
in their distributable, and they'd like these to end up in standard
locations for that type of file. Think documentation, sample data, web
templates, configuration settings, etc. Each of these should be
treated differently at installation time depending on platform. On
*nix, docs should go in /usr/share/doc whereas we might need to create a
C:\Python2.5\docs on Windows. With sample data and templates, you
probably just want it accessible outside of the zipped egg so users can
easily look at it, add to it, edit it, etc. Configuration settings
should be installed with some defaults into a standard configuration
directory like /etc on *nix, etc.
Basically the issue is that it needs to be easier to include different
sets of files into an egg for different actions to be taken during
installation or packaging into an OS-specific distribution format.
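A minimal sketch of per-type, per-platform placement, assuming a hypothetical `LAYOUT` table and `plan_install()` helper. The directories mirror the examples above and are illustrative only; a real tool would need to let packagers and users override them.

```python
import ntpath
import posixpath
import sys

# Hypothetical layout table. The locations follow the examples in the
# message (/usr/share/doc and /etc on *nix, a docs folder under the Python
# prefix on Windows); they are illustrative, not a standard.
LAYOUT = {
    "posix": {
        "doc": "/usr/share/doc/{project}",
        "data": "/usr/share/{project}",
        "config": "/etc/{project}",
    },
    "windows": {
        "doc": "{prefix}\\docs\\{project}",
        "data": "{prefix}\\share\\{project}",
        "config": "{prefix}\\etc\\{project}",
    },
}


def _family(platform):
    return "windows" if platform.startswith("win") else "posix"


def destination_for(kind, project, platform=sys.platform, prefix=sys.prefix):
    """Directory where files of the given kind should be installed."""
    try:
        template = LAYOUT[_family(platform)][kind]
    except KeyError:
        raise ValueError(f"no install location defined for {kind!r} files") from None
    return template.format(project=project, prefix=prefix)


def plan_install(files, project, platform=sys.platform, prefix=sys.prefix):
    """Map each (kind, relative_path) pair to its destination path."""
    join = ntpath.join if _family(platform) == "windows" else posixpath.join
    return {
        rel_path: join(destination_for(kind, project, platform, prefix),
                       posixpath.basename(rel_path))
        for kind, rel_path in files
    }
```

Keeping the file kind attached to each file until install time is what makes this possible; once everything has been merged into one build tree, the kind is lost.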
>> The other is the use of a single .pth file to control the list of
>> activated packages. Those who produce distributions would prefer a magic
>> directory into which links to distributions could be dropped, similar to
>> the current best practices for Linux, with /etc/conf.d/, /etc/profile.d/,
>> /etc/xinetd.d/ and so forth.
>>
>
> site-packages is that directory, and has been since long before
> setuptools. Just drop uniquely-named .pth files there, and you're good to go.
>
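This works because Python's `site` module processes every `*.pth` file it finds in a site directory, not only easy-install.pth. A small demonstration using `site.addsitedir()` on temporary directories that stand in for site-packages and for an installed project; the `myproject.pth` name is made up.

```python
import os
import site
import sys
import tempfile


def demo_pth_dropin():
    """Show that site.addsitedir() processes every *.pth file in a directory."""
    site_dir = tempfile.mkdtemp()     # stands in for site-packages
    project_dir = tempfile.mkdtemp()  # stands in for an installed project
    # A uniquely named .pth file; easy-install.pth is not involved at all.
    with open(os.path.join(site_dir, "myproject.pth"), "w") as f:
        f.write("# lines starting with '#' are ignored\n")
        f.write(project_dir + "\n")  # existing directories are added to sys.path
    site.addsitedir(site_dir)
    return project_dir in sys.path
```

Whether easy_install then keeps such a hand-written file in sync on upgrade is a separate question, as the reply below points out.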
But the docs for easy_install claim that the list of active eggs is
maintained in easy-install.pth. Also, if I create my own .pth file, and
the user later upgrades my project to a new version, will the easy_install
tool modify my .pth file, removing the old version's entry and adding the
new version's entry to that same file? Or will the new version end up
listed in both places? Or only in easy-install.pth?
>> 7. Many wanted the ability to install files anywhere in the install tree
>> and not just under the Python package. Under distutils this was possible,
>> but it was removed in setuptools for security reasons.
>>
>
> It wasn't security, it was manageability. Egg-based installation
> means containment (analogous to GNU stow), and therefore portability
> and disposability of plugins. (Which again is what eggs were really
> developed for in the first place.)
>
Yes, but as you've already pointed out, they've escaped into a larger
ecosystem and this restriction is a severe limitation -- leading to
significant frustration. Especially as projects evolve and want to do
something more complex than simply install pure Python code. Here at
Enthought, we use and ship a number of projects that have extensions and
thus dynamic libraries that need either to be modified during
installation to work from the user's installed location, or to be copied
elsewhere on the system to avoid having to modify environment variables,
registries, etc. (which we also can't do via an egg install). We'd also love
to be able to ship end-user enterprise-scale applications via eggs so
that bug fixes and updates don't require downloading a monolithic 100MB+
installer. But doing that requires the ability to update desktop icons,
menus, etc. which we also can't do automatically via an egg.
If you don't want to burden setuptools with supporting, much less
tracking, all these options, then perhaps it could just support automatic
execution of a post-install script (and pre-uninstall script if
uninstallation ever happens) that allows individual project developers
to do what they need to do? Let the burden of describing how those
things happen and how to uninstall/relocate/update them fall to the
provider of the projects that do them.
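One way such provider-supplied hooks might look, as a sketch only: the `post_install()`/`pre_uninstall()` names, the `etc/myproject` location and the JSON manifest are all assumptions, not an existing setuptools or Enstaller convention. The idea is that the project reports what it created so the installer can undo it later.

```python
import json
import os


def post_install(prefix):
    """Hypothetical project hook: put default settings outside the egg.

    Returns every path it created so the installer can record them.
    """
    config_dir = os.path.join(prefix, "etc", "myproject")
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "defaults.cfg")
    with open(config_path, "w") as f:
        f.write("[myproject]\nverbose = false\n")
    return [config_path]


def pre_uninstall(created):
    """Hypothetical project hook: undo what post_install() did."""
    for path in created:
        if os.path.exists(path):
            os.remove(path)


def record_manifest(manifest_path, created):
    """Installer side: store the hook's report (JSON is an arbitrary choice)."""
    with open(manifest_path, "w") as f:
        json.dump(created, f)


def load_manifest(manifest_path):
    """Installer side: read the report back before uninstalling."""
    with open(manifest_path) as f:
        return json.load(f)
```

The installer only has to run the hooks and keep the manifest; what gets created, and how to remove it, stays the project's responsibility.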
Also, IIUC, stow only tries to "contain" the hard files. It puts links
in multiple standard locations (for man pages, executables, libraries,
etc.) If setuptools supported these options, I don't think there'd be
any discussion here except for things like "how do I extend the set of
things the tool supports so that my foo-type files get linked into the
standard /os/path/to/foo for the X os?"
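A simplified, stow-like sketch: files stay inside the project's own directory and only symlinks are placed under a shared root. The `link_into()`/`unlink_from()` helpers and the `bin`/`lib`/`share` subdirectory list are assumptions; GNU Stow itself can also fold whole directory trees into single links, which this version does not attempt. It needs a platform where `os.symlink()` is permitted.

```python
import os


def link_into(package_dir, target_root, subdirs=("bin", "lib", "share")):
    """Stow-style install: real files stay in package_dir, and a symlink
    for each one is created under target_root. Returns the links made."""
    links = []
    for sub in subdirs:
        src_dir = os.path.join(package_dir, sub)
        if not os.path.isdir(src_dir):
            continue
        dest_dir = os.path.join(target_root, sub)
        os.makedirs(dest_dir, exist_ok=True)
        for name in sorted(os.listdir(src_dir)):
            link = os.path.join(dest_dir, name)
            if os.path.lexists(link):
                # Refuse to overwrite something this package does not own.
                raise FileExistsError(link)
            os.symlink(os.path.join(src_dir, name), link)
            links.append(link)
    return links


def unlink_from(links):
    """Remove the links again; the contained package is left untouched."""
    for link in links:
        if os.path.islink(link):
            os.remove(link)
```

Because the real files never leave the package directory, removing a package is just `unlink_from()` followed by deleting that directory, which keeps the containment property while still exposing files in standard locations.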
>> Custom code can still
>> be written to do this explicitly but this is not popular.
>>
>
> No kidding. :) Current best practice is to include a script or
> module in the package that can install other files to a designated
> location. Personally, though, I tend to view applications and
> libraries that target specific install locations to be overreaching
> their bounds, and stepping into sysadmin territory. Give me the
> tools to install the data, don't just dump it somewhere on my system
> where *you* think it should go, in other words.
>
I should have read ahead. This sounds close to what I've been
describing except that this leads me to picture a script that prompts
for install locations and allows the user to customize the destinations
rather than one that assumes everything goes in a standard place. I'm
all for this, and the continuation of the ability to install an egg into
a user-environment vs. a system-environment.
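A sketch of such a data-installing script, assuming the data ships as a plain directory. Instead of prompting interactively it takes the destination as a command-line option with an overridable default. The `install_data()`/`main()` names and the `~/myproject-data` default are hypothetical.

```python
import argparse
import os
import shutil

# Hypothetical default; the user can override it on the command line.
DEFAULT_DEST = os.path.join(os.path.expanduser("~"), "myproject-data")


def install_data(source_dir, dest_dir):
    """Copy a package's bundled data files to dest_dir; return copied paths."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(source_dir)):
        src = os.path.join(source_dir, name)
        if os.path.isfile(src):
            copied.append(shutil.copy2(src, os.path.join(dest_dir, name)))
    return copied


def main(argv=None):
    parser = argparse.ArgumentParser(description="Install myproject's sample data.")
    parser.add_argument("--source", required=True,
                        help="directory holding the bundled data")
    parser.add_argument("--dest", default=DEFAULT_DEST,
                        help="where to put the data (default: %(default)s)")
    args = parser.parse_args(argv)
    for path in install_data(args.source, args.dest):
        print(path)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

Since `main()` accepts an argument list, an installer could also call it directly with defaults, which is the automatic-run case raised in the next paragraph.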
The only thing missing here is the ability for the installer to
automatically run that script so that installation isn't a disjointed,
two-step manual process that a user is prone to forget to complete.
One of the features of Enthought's Enstaller extension to easy_install
is that it looks for a post_install.py script in EGG-INFO and, if one is
found, runs it. I would think that getting this into setuptools would
be a significant step forward but I believe you previously rejected that
idea. We'll take a stab at creating a patch for you if you're more
receptive to that idea now. Just let me know.
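Roughly what that installer-side behaviour could look like, assuming the egg has already been unpacked to a directory. This is not Enstaller's actual code; the `run_post_install()` name and the `EGG_DIR`/`INSTALL_PREFIX` globals handed to the script are illustrative assumptions. `runpy.run_path()` executes the script as if it were run as `__main__`.

```python
import os
import runpy


def run_post_install(egg_dir, install_prefix):
    """Run EGG-INFO/post_install.py if the unpacked egg ships one.

    Returns True if a script was found and run, False otherwise. The globals
    passed to the script (EGG_DIR, INSTALL_PREFIX) are a hypothetical
    convention, not an established interface.
    """
    script = os.path.join(egg_dir, "EGG-INFO", "post_install.py")
    if not os.path.isfile(script):
        return False
    runpy.run_path(
        script,
        init_globals={"EGG_DIR": egg_dir, "INSTALL_PREFIX": install_prefix},
        run_name="__main__",
    )
    return True
```

A zipped egg would first have to be extracted (or the script read out of the archive) before this helper could find it.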
> On the other hand, I've been puzzling over how to handle legitimate
> post-install features. On Windows, both wx and pywin32 have a real
> need to do some actual "install" operations. Some is just copying
> files, but pywin32 also has to do some registry stuff. I don't know
> how to allow just what's sensible, without opening up a huge can of
> worms, though.
>
I think there are lots of situations that are legitimate (projects with
extensions, projects that want to put icons on the desktop or in menus,
projects that need to interact with a registry, projects that want to
put configuration information somewhere other than in a zip file in a
site-packages dir, etc.) I think we should worry less about preventing
developers from shooting themselves in the foot and more about ensuring
that they can hunt for food for their survival. We can always tighten
things down after seeing the use cases that develop, right?
-- Dave