[Distutils] [Python-Dev] Capsule Summary of Some Packaging/Deployment Technology Concerns

Thu Mar 20 21:25:44 CET 2008

Phillip J. Eby wrote:
> We should probably move this off of Python-Dev, as we're getting into 
> deep details now...

Done.  Only in distutils-sig now.

> At 07:27 PM 3/18/2008 -0500, Dave Peterson wrote:
>> If you really wanted to do a full-tree intersection, it seems to me 
>> that the problem is detecting all the dependencies without having to 
>> spend significant time downloading/building in order to find them 
>> out.   This could be solved by simply extending the cheeseshop 
>> interface to export the set of requirements outside of the egg / 
>> tarball / etc.  We've done this for our own egg repository by 
>> extracting the appropriate meta-data files out of EGG-INFO and 
>> putting them into a separate file.  This info is also useful for users 
>> as it gives them an idea of how much *new* stuff is going to be 
>> installed (a la yum, apt-get, etc.)
>
> ...and now we're more directly competing with them, too.  The original 
> idea Bob and I had was to do XML files ala Eclipse feature 
> repositories, but then later I realized that for what we were doing, 
> HTML was both adequate and already available.  However, I don't see a 
> problem in principle with having "header" files available for this 
> sort of thing.

It seems from later discussions that Martin v. Löwis agrees that this 
is a reasonable thing to do.  I'll see if I can find time to work on a 
patch to PyPI.  Not having looked at that code before at all, it might 
take me a while.
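Here is a sketch of the kind of extraction described above. It is an illustration, not our actual repository code. It assumes zipped (not directory) eggs and uses a hypothetical `<egg filename>.requires.txt` naming scheme for the side file; whatever PyPI ends up exposing may look quite different.

```python
import zipfile
from pathlib import Path


def export_requirements(egg_path, out_dir):
    """Copy EGG-INFO/requires.txt out of a zipped egg into a side file.

    The side file is named "<egg filename>.requires.txt" (a naming scheme
    made up for this sketch). Returns its path, or None when the egg
    declares no requirements.
    """
    egg_path = Path(egg_path)
    with zipfile.ZipFile(egg_path) as zf:
        try:
            data = zf.read("EGG-INFO/requires.txt")
        except KeyError:
            # No requires.txt means the egg has no declared dependencies.
            return None
    out_path = Path(out_dir) / (egg_path.name + ".requires.txt")
    out_path.write_bytes(data)
    return out_path
```

With files like this, a client could fetch only the small side files to compute the full set of *new* things an install would pull in, before downloading a single egg.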

>> With our ETS projects, we've run into problems with the current 
>> heuristic.  Perhaps we just don't know how to make it work like we want?
>>
>> We have a set of projects that we want to be individually installable 
>> (to the extent that we limit cross-project dependencies) but we also 
>> want to make it easy to install the complete set.  We use a meta-egg 
>> for the latter.  Its purpose is only to specify the exact versions 
>> of each project that have been explicitly tested to work together -- 
>> you could almost think of it as a source control system tag.
>
> I would think that as long as that meta-egg specifies *all* the 
> required versions (right down to recursive dependencies), then there 
> shouldn't be any problem.  Maybe it's me who's not understanding 
> something?

It actually does specify all the required versions, including those 
recursive dependencies, but we were still getting breakages when new 
versions were released. :-(   I think I explained what we were seeing in 
my original e-mail though it sounds like you're saying that shouldn't be 
possible, right?
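To make "specifies all the required versions" concrete, here is a small check a meta-egg build could run: walk the dependency graph and report anything reachable that lacks an exact pin. The graph, the project names, and the versions are hypothetical. This only verifies that the pins are complete; it does not by itself explain the breakages described above.

```python
def unpinned(deps, pins, root):
    """Return every project reachable from root that has no exact pin.

    deps maps a project name to the names it directly requires;
    pins maps a project name to the exact version the meta-egg pins.
    """
    missing = set()
    seen = set()
    stack = list(deps.get(root, []))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name not in pins:
            missing.add(name)
        # Walk transitive (recursive) requirements as well.
        stack.extend(deps.get(name, []))
    return missing


def pinned_requirements(pins):
    """Render the pins as install_requires strings, e.g. "ProjectA==1.0"."""
    return [f"{name}=={version}" for name, version in sorted(pins.items())]
```

For example, if MetaEgg requires ProjectA and ProjectB, and ProjectA in turn requires ProjectC, pinning only ProjectA and ProjectB leaves ProjectC free to float to a newly released version.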

> I would think that you could get the appropriate data by running the 
> tl.eggdeps tool.

Getting the version data isn't a problem at all for us, but thanks for 
the pointer to an interesting project.  (We have an internal project 
that analyzes the import statements in a project's code to ensure that 
the dependencies declared in its setup.py match the modules it actually 
imports, and this solves the problem for us.)
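The import-scanning idea can be sketched with the standard `ast` module. This is not our internal tool, only an illustration. It assumes the declared requirements have already been mapped to importable module names (distribution and module names often differ, e.g. wxPython installs the `wx` package), and the standard-library names are passed in explicitly.

```python
import ast


def top_level_imports(source):
    """Return the top-level module names imported by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import within the same project.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def undeclared_imports(source, declared, own_packages=(), stdlib=()):
    """Imported names not covered by setup.py, the project itself, or stdlib."""
    return top_level_imports(source) - set(declared) - set(own_packages) - set(stdlib)
```

Anything this returns is a dependency the code uses but setup.py does not declare.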

>> A number of projects want to provide various types of files besides 
>> code in their distributable, and they'd like these to end up in 
>> standard locations for that type of file.  Think documentation, 
>> sample data, web templates, configuration settings, etc.   Each of 
>> these should be treated differently at installation time depending on 
>> platform.  On *nix, docs should go in /usr/share/doc whereas we might 
>> need to create a C:\Python2.5\docs on Windows.   With sample data and 
>> templates, you probably just want it accessible outside of the zipped 
>> egg so users can easily look at it, add to it, edit it, etc.  
>> Configuration settings should be installed with some defaults into a 
>> standard configuration directory like /etc on *nix, etc.
>>
>> Basically the issue is that it needs to be easier to include 
>> different sets of files into an egg for different actions to be taken 
>> during installation or packaging into an OS-specific distribution 
>> format.
>
> Yes, it would be nice to define a metadata standard for including 
> installable "datasets" either through copying or symlinking, 
> optionally with entry points for running some code, too.  When you 
> install an egg, these things could get added to a "post-install to-do" 
> list, that you could then read to find out what steps to do, or invoke 
> a tool on to actually do some of those steps.

I agree.   Let's get that setuptools wiki started and start documenting 
some of these ideas as a roadmap so that anyone who wants to help out 
has an idea of what to work on, or factor into what they're currently 
working on.
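As an illustration of such a "dataset" declaration, here is a sketch that maps declared file kinds to per-platform destinations and produces a to-do list instead of copying anything. The kinds, the per-project subdirectories, the Windows locations under `sys.prefix` (including `etc` for configuration), and the tuple format are all assumptions made for this sketch.

```python
import ntpath
import posixpath
import sys


def destination(kind, project, platform=sys.platform, prefix=sys.prefix):
    """Directory a declared file kind should go to (hypothetical layout)."""
    if platform.startswith("win"):
        roots = {"doc": ntpath.join(prefix, "docs"), "config": ntpath.join(prefix, "etc")}
        return ntpath.join(roots[kind], project)
    roots = {"doc": "/usr/share/doc", "config": "/etc"}
    # One subdirectory per project under the standard location.
    return posixpath.join(roots[kind], project)


def todo_list(project, datasets, platform=sys.platform, prefix=sys.prefix):
    """Turn a declaration into (action, file, destination) steps; nothing is copied."""
    steps = []
    for kind, files in sorted(datasets.items()):
        dest_dir = destination(kind, project, platform, prefix)
        for name in files:
            steps.append(("copy", name, dest_dir))
    return steps
```

Because the output is plain data, the same list could be shown to the user, written to a log, or handed to an OS packaging tool.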

>> But the docs for easy_install claim that the list of active eggs is 
>> maintained in easy-install.pth.  Also, if I create my own .pth file, 
>> and the user tries to update my version to a new one, will the 
>> easy_install tool modify my .pth file to remove the mention of the 
>> old version from my sys.path and put the new version in the same .pth 
>> file?  Or will it now be listed in both places?  Or will it only be in 
>> easy-install.pth?
>
> My understanding of the context of the question was that it applied to 
> *system* packaging tools, which would be exclusively maintaining the 
> .pth entries for the packages they installed.  i.e., a scenario with 
> *no* easy-install.pth.  Setuptools will still detect the presence of 
> their eggs, regardless of the means by which they're added to 
> sys.path.  But it would not *maintain* those .pth files.

I may be confusing the issue then.  I was under the impression that 
system packaging tools would want to install things such that anyone 
used to setuptools would see the effects of that installation just as 
if it had been done via easy_install, i.e. if 
I wanted to temporarily remove it for testing something or other, I 
could de-activate it; or if I wanted to install a second optional 
version of it, I could use easy_install to do so without worrying about 
tracking down the right .pth file.
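On the .pth question, here is a sketch of how a tool could find every .pth file that puts a given egg on sys.path, without executing anything. It follows the usual `site` module conventions (blank lines and `#` comments are skipped, lines starting with `import` are executed by site.py, and everything else is a path relative to the .pth file's directory). Unlike site.py, it does not check that the paths exist. The file and egg names are hypothetical.

```python
import os


def pth_entries(pth_path):
    """Return the directory entries a .pth file adds, without executing it."""
    base = os.path.dirname(os.path.abspath(pth_path))
    entries = []
    with open(pth_path) as f:
        for line in f:
            line = line.rstrip()
            if not line or line.startswith("#"):
                continue
            if line.startswith(("import ", "import\t")):
                # site.py would execute this line; we only report paths.
                continue
            entries.append(os.path.normpath(os.path.join(base, line)))
    return entries


def pth_files_mentioning(site_dir, egg_name):
    """Names of the .pth files in site_dir that put egg_name on sys.path."""
    hits = []
    for name in sorted(os.listdir(site_dir)):
        if name.endswith(".pth"):
            entries = pth_entries(os.path.join(site_dir, name))
            if any(os.path.basename(e) == egg_name for e in entries):
                hits.append(name)
    return hits
```

If both easy-install.pth and a system tool's own .pth file list the same egg, this reports both, which is exactly the "listed in both places" case.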

>> Yes, but as you've already pointed out, they've escaped into a larger 
>> ecosystem and this restriction is a severe limitation -- leading to 
>> significant frustration.  Especially as projects evolve and want to 
>> do something more complex than simply install pure Python code.  Here 
>> at Enthought, we use and ship a number of projects that have 
>> extensions and thus dynamic libraries that need to either be modified 
>> during installation to work from the user's installed location, or 
>> copied elsewhere on the system to avoid the need to modify env 
>> variables, registries, etc. (which we also can't do via an egg install).
>
> By the way, there *is* experimental shared library building support in 
> setuptools, and I recently heard from Andi Vajda that he was 
> successful in using it in his JCC project to make available a C++ 
> library for linkage from JCC-built projects.  (I'm also sitting on his 
> patch that makes it work...)  I'm not sure that it actually fixes the 
> larger problem, e.g., the case where the main project is installed by 
> the system and then you build or install an egg yourself.  But I think 
> those problems are solvable.

I'm not sure your description matches what we're trying to do here, but 
I can figure that out better from looking at the code.   Is this in the 
0.6 versions or 0.7a?  And which modules should I start looking at?

>>    We'd also love to be able to ship end-user enterprise-scale 
>> applications via eggs so that bug fixes and updates don't require 
>> downloading a monolithic 100MB+ installer.  But doing that requires 
>> the ability to update desktop icons, menus, etc. which we also can't 
>> do automatically via an egg.
>
> Yep...  a good post-install mechanism would be handy for wx and 
> pywin32 as well.

Enthought has started a project to provide an API abstraction of doing 
the desktop icon / menu setup on multiple platforms (Windows, Gnome, 
KDE, and hopefully soon OSX) for both system and user installs.   We use 
it in our EPD product (Enthought Python Distribution.)  We could 
probably work on getting this into a more public form, given some hints 
as to whether it should be done as a plugin, patch, separate project, etc.

>> If you don't want the burden on setuptools to support, much less 
>> track, all these options, then perhaps it could just support 
>> automatic execution of a post-install script (and pre-uninstall 
>> script if uninstallation ever happens) that allows individual project 
>> developers to do what they need to do?  Let the burden of describing 
>> how those things happen and how to uninstall/relocate/update them 
>> fall to the provider of the projects that do them.
>
> Yeah, that's what I really *don't* want.  I'd like to enable a more 
> trustable mechanism than a blindly-executed script.  I'd rather see a 
> standard that makes a developer document more, and have to at least 
> *convince* the user that their post-install is worthwhile, even if the 
> tool then makes it easy to run.

I'm not sure what you mean by "convince".   If you simply mean that the 
post-install has to default to not doing anything unless the user 
responds in the affirmative to some prompt, then I guess I could live 
with that.   If you mean that other documentation has to convince them 
to run a command, then I think that leads to exactly the issue I was 
worried about: people complaining that something isn't working because 
they forgot to run the post-install.

My other concern here is how this chains through dependencies.  If I 
install the ETS meta-egg mentioned above, and that causes 4 other eggs 
to install that all have post-install requirements, I'd hate to have the 
user have to step through the same sort of prompts 4 times.   I guess 
this is what you were referring to above by a list of post-install 
tasks, but I just want to be sure.
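Here is one way the list of post-install tasks could avoid repeated prompts: gather the steps from every egg in the batch, ask once, and record each step's state in a log, along the lines of the pending/finished/rejected log Phillip describes further down. The JSON log format, the `ask` callback, and the all-or-nothing answer are assumptions of this sketch; per-step answers would work the same way.

```python
import json

PENDING, FINISHED, REJECTED = "pending", "finished", "rejected"


def record_decision(eggs, ask, log_path):
    """Ask once about every post-install step in a batch and log the answer.

    eggs maps an egg name to its list of step descriptions; ask receives a
    summary string and returns True (accept) or False (reject).
    """
    steps = [(egg, step) for egg in sorted(eggs) for step in eggs[egg]]
    log = {}
    if steps:
        summary = "\n".join(f"  {egg}: {step}" for egg, step in steps)
        state = PENDING if ask(summary) else REJECTED
        log = {f"{egg}: {step}": state for egg, step in steps}
    with open(log_path, "w") as f:
        json.dump(log, f, indent=2)
    return log


def mark_finished(log_path, key):
    """Record that a pending step has actually been carried out."""
    with open(log_path) as f:
        log = json.load(f)
    log[key] = FINISHED
    with open(log_path, "w") as f:
        json.dump(log, f, indent=2)
    return log
```

Installing the meta-egg plus four dependent eggs would then produce one prompt listing all of their steps, not four separate ones.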

> Better still, I'd rather have those post-install parts done in such a 
> way that things like icons, menus, manifests, registry stuff, etc., 
> have to get explicitly listed instead of being done programmatically.

I assume you mean by declaring lists or dictionaries within the setup.py 
that then get stored as meta-data within a file in EGG-INFO and then get 
acted on during install.  If so, then yes, I'm all for that idea too.
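Here is a sketch of that idea: validate the declared lists and dicts against a fixed vocabulary and store them as a data file in EGG-INFO, so that an installer reads data rather than executing code. The file name `post_install.json` and the `ALLOWED_KINDS` vocabulary are hypothetical; setuptools does not define either.

```python
import json
import os

# Hypothetical vocabulary of post-install kinds an installer would understand.
ALLOWED_KINDS = {"menu", "desktop_icon", "registry"}


def write_post_install(egg_info_dir, declarations):
    """Validate declarations and store them as data in EGG-INFO."""
    for kind, items in declarations.items():
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown post-install kind: {kind!r}")
        if not isinstance(items, list) or not all(isinstance(i, dict) for i in items):
            raise TypeError(f"{kind!r} must be a list of dicts")
    path = os.path.join(egg_info_dir, "post_install.json")
    with open(path, "w") as f:
        json.dump(declarations, f, indent=2, sort_keys=True)
    return path


def read_post_install(egg_info_dir):
    """Load the declarations back; an egg without the file declares nothing."""
    path = os.path.join(egg_info_dir, "post_install.json")
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```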

>> Also, IIUC, stow only tries to "contain" the hard files.  It puts 
>> links in multiple standard locations (for man pages, executables, 
>> libraries, etc.)   If setuptools supported these options, I don't 
>> think there'd be any discussion here except for things like "how do I 
>> extend the set of things the tool supports so that my foo-type files 
>> get linked into the standard /os/path/to/foo for the X os?"
>
> Yep.  Having that would be a worthwhile thing, I think.  Discussion 
> leading to specs is most welcome.

I thought I was starting that already. :-)  Or were you saying that it 
needed to happen somewhere else?

>> I should have read ahead.  This sounds close to what I've been 
>> describing except that this leads me to picture a script that prompts 
>> for install locations and allows the user to customize the 
>> destinations rather than one that assumes everything goes in a 
>> standard place.  I'm all for this, and the continuation of the 
>> ability to install an egg into a user-environment vs. a 
>> system-environment.
>
> +1.
>
>
>> The only thing missing here is the ability for the installer to 
>> automatically run that script so that installation isn't a 
>> disjointed, two-step manual process that a user is prone to forget to 
>> complete.
>
> I don't see a problem with a prompting process, backed by a log file 
> that records what post-install steps are pending, finished, or 
> explicitly rejected by the user.
>
> One possibility, by the way, is that we could overload "extras" for 
> this purpose.  Entry points (such as those for scripts) can require 
> extras; if extras could mean post-install components like docs or 
> icons or what-have-you, then trying to run the script could result in 
> an error message telling you you need to "easy_install 
> foo_package[icons]" or whatever.

While I can see many nice things about using extras for delivery of 
docs, icons, etc. (including reduced size for those who don't want or 
need them), I'm not thrilled with the idea of a user getting a message 
telling them to run "easy_install ..." to get those pieces installed.   
Couldn't we just have the post-install actually run the easy_install 
command once they accepted the installation of the icons, etc?
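Roughly what I have in mind: when an extra such as `[icons]` is missing, ask the user and, if they accept, run the easy_install command for them instead of only printing it. The function names and the `ask`/`run` callbacks are hypothetical. Only the `project[extra1,extra2]` requirement syntax comes from setuptools itself.

```python
import subprocess


def missing_extras(required, installed):
    """Extras an entry point needs that have not been installed yet."""
    return sorted(set(required) - set(installed))


def extras_command(project, extras):
    """Build the easy_install argv that would add the given extras."""
    return ["easy_install", f"{project}[{','.join(extras)}]"]


def ensure_extras(project, required, installed, ask, run=subprocess.run):
    """Offer to install missing extras; fall back to telling the user the command."""
    missing = missing_extras(required, installed)
    if not missing:
        return None
    cmd = extras_command(project, missing)
    if ask(f"{project} needs {', '.join(missing)}; install now?"):
        run(cmd, check=True)
        return cmd
    raise RuntimeError(f"declined; to install later run: {' '.join(cmd)}")
```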

>> One of the features of Enthought's Enstaller extension to 
>> easy_install was that it looks for a post_install.py script in 
>> EGG-INFO and if one is found, runs it.  I would think that getting 
>> this into setuptools would be a significant step forward but I 
>> believe you previously rejected that idea.   We'll take a stab at 
>> creating a patch for you if you're more receptive to that idea now.  
>> Just let me know.
>
> No -- I'm not happy with a straight-up executable hook for 
> post-install steps.  My evaluation of the state of PyPI is that I 
> don't trust the community to write non-hazardous setup.py files, let 
> alone post-install scripts.  There should be a high technical and 
> social barrier to including post-install hooks with arbitrary code.

Ouch.  That seems a pretty harsh indictment.   :-)

> For example, if there was a required separation between installer 
> tools and the things they install, such that any post-install 
> operation had to be performed strictly by providing some 
> human-readable data that will be passed to a separately-installed 
> tool, and there was a high social stigma associated with writing your 
> own post-install tool, then that might work.
>
> So, for example, if the community creates an icons and menus installer 
> tool for the various platforms, and then anybody can use it in their 
> project by adding the right data, then the user doesn't have to fully 
> trust arbitrary package authors, only the authors of the post-install 
> tools.
>
> I'm not saying that model is perfect; in fact I can see some potential 
> pitfalls.  But once an automatic post-install hole is opened it will 
> be *very* hard to close, because it will always be *easier* to roll 
> your own crappy post-installer instead of contributing to a set of 
> robust cross-project/cross-platform tools.  So I'd rather keep this 
> particular "itch" in play and try to build up the scratching pressure 
> until some people get together and pay attention long enough to solve 
> the problem in a less hacky way.  :)

I can see what you're saying, though I think it cuts off those who need 
to prove the use case before writing a tool to support it.   Perhaps we'd 
get more scratching pressure for standardizing (safely) some of these 
things if people were free to experiment. :-)  

Anyway, since Enthought is already scratching, I'm fine with the idea of 
building a standard way to do it that is driven by human-readable 
data.   We just need to set up the process to allow that to happen.   So 
far I haven't seen any response from you regarding the setup of an 
issue/patch tracker, a wiki, a process to increase the number of 
committers, etc. that gives me any confidence I'm not heading off down 
the wrong path somehow.  Perhaps I'm too cautious?
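Here is a sketch of the separation Phillip describes: packages ship only data, and each kind of data is handled by a separately installed tool that registers itself, so an unknown kind is refused rather than executed. The registry, the decorator, and the `menu` handler are hypothetical, and the handler only describes the action instead of touching the desktop.

```python
# Registry of separately installed, trusted post-install tools, keyed by kind.
HANDLERS = {}


def handler(kind):
    """Register a trusted tool for one kind of post-install data."""
    def register(func):
        HANDLERS[kind] = func
        return func
    return register


@handler("menu")
def add_menu_entry(item):
    # A real tool would talk to Windows/Gnome/KDE here; this one only describes it.
    return f"menu entry {item['name']!r} -> {item['command']!r}"


def apply(declarations):
    """Dispatch a package's declared data to the registered tools."""
    results = []
    for kind, items in sorted(declarations.items()):
        func = HANDLERS.get(kind)
        if func is None:
            raise LookupError(f"no trusted tool installed for {kind!r}")
        results.extend(func(item) for item in items)
    return results
```

The user then only has to trust the authors of the registered tools, not every package that supplies data for them.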

>>> On the other hand, I've been puzzling over how to handle legitimate
>>> post-install features.  On Windows, both wx and pywin32 have a real
>>> need to do some actual "install" operations.  Some is just copying
>>> files, but pywin32 also has to do some registry stuff.  I don't know
>>> how to allow just what's sensible, without opening up a huge can of
>>> worms, though.
>>>
>>
>> I think there are lots of situations that are legitimate (projects 
>> with extensions, projects that want to put icons on the desktop or in 
>> menus, projects that need to interact with a registry, projects that 
>> want to put configuration information somewhere other than in a zip 
>> file in a site-packages dir, etc.)   I think we should worry less 
>> about preventing developers from shooting themselves in the foot
>
> It's the users' feet that I'm concerned with.  Some people are already 
> paranoid about the fact that PyPI doesn't use SSL and code signing, or 
> that easy_install uses the intarwebs at all.  I can just see the witch 
> hunt when we start executing arbitrary code.  Unh unh.  No way am I 
> letting that happen.  Nope.

Though if we had https, code signing, et al, then they'd be trusting the 
signers of the source anyway and not just "arbitrary code".   That 
doesn't seem bad to me.  "If I trust their code to run on my system, why 
not trust the post-install code as well?"

>>  and more about ensuring that they can hunt for food for their survival.
>
> Right now, if you have a post-install script that's essential, you'll 
> just have to convince your users to run it.  Which nicely keeps 
> easy_install out of what should be a conversation between developer 
> and user.

And how do I do that?  So few users read the documentation to begin 
with, or our wiki, or anything else.  Is there some metadata we can 
put into our eggs / setup.py that is displayed when the user installs 
them, and that doesn't scroll by or get buried in an avalanche of 
cascading dependencies?

> Enstaller is a different case - you are presumably installing an 
> application, and the user is trusting your installer.  easy_install is 
> something else altogether, and is used by other programs such as 
> buildout.

I think there may be some misunderstanding here.  Enstaller is how we 
are distributing third-party libraries as binaries for a community of 
users, as well as our own code libraries, and only finally for 
applications.  Yes, you could view it as doing this primarily for larger 
applications but we have a number of people who use it just to get wx, 
VTK, etc. on platforms like Windows and OSX, as well as those who use it 
to get user-space installs of wx, VTK, etc. on Linux.

I'm failing to see how trusting Enstaller is different than trusting 
easy_install.   I wouldn't hold either responsible for what happened if 
I installed a package built by someone else that mis-used the 
features.   Just like I'm careful to not always blame MS if some other 
application, installed via an .msi, messes up my copy of Windows. :-)

> Actually, I wonder if instead of trying to enhance setuptools for 
> post-install, if maybe we should be looking at buildout recipes and 
> maybe having a way for setuptools dependencies to point to buildout 
> specs.  IIRC, buildout specs can be remotely retrieved from a single 
> URL, too.

I'll need to read up more on buildout to understand this, but my 
understanding was that buildout was not something a user ran to install 
an app, but rather something the developer ran to build and publish an 
app.  The end result of a 'production' buildout is a large tarball or 
rpm that includes everything, right?   If so, this goes 
directly against what Enthought was aiming for, which was to allow 
delivery of bug-fixes and minor updates in a large app by downloading 
only smaller units instead of a huge monolithic re-install of everything.  

Having typed that up though, I'm thinking we're probably abusing eggs 
for something that rightly ought to be delivered as an application 
directory scoped patch.

>>    We can always tighten things down after seeing the use cases that 
>> develop, right?
>
> Actually, no, we can't, since backward compatibility would keep us 
> from removing the hook, once people rely on it.
>
> I really feel your (and others') pain on this issue, but it's one 
> place where the users have to come first, and they need protection 
> from the wilds of PyPI.  Distribution and installation issues are not 
> first on most developers' minds, so the fact that someone writes a 
> great library on PyPI doesn't mean they can write installers worth a 
> crap.  Frankly, I wouldn't trust myself to write a correct 
> post-installer on the first go -- perhaps *because* I have seen so 
> many "simple" things go wrong.

Hell, people can't even write correct code on the first go, otherwise we 
wouldn't have bugs in every app, OS, and driver.  However, people do fix 
things over time and eventually get it right or else their project dies 
because no one wants to deal with the pain.   Why is Python installation 
any different? :-)

-- Dave