[Web-SIG] A Python Web Application Package and Format

Alice Bevan–McGregor alice at gothcandy.com
Tue Apr 12 05:03:16 CEST 2011


Eric,

Let me rephrase a few things.

On 2011-04-11 17:48:14 -0700, Eric Larson said:

> pre-install-hooks: [
>   "apt-get install libxml2",  # the person deploying the package 
> assumes apt-get is available

Assumptions are evil.  You could end up with multiple third-party 
applications each assuming different things.  Aptitude, apt-get, brew, 
emerge, ports, …

>   "run-some-shell-script.sh", # the shell script might do the following 
> on a list of URLs

There is no way to track what that script does, so out of the gate that's 
a no-no, and full system chroots (not what I mean when I say 
"sandbox") require far too much organization, duplication, and management.

The 'hooks' idea listed in my original document is for callbacks into 
the application.  That callback would be one of:

:: A Python script to execute.  (path notation)

:: A Python callable to execute.  (dot-colon notation)

:: A URL within the application to GET.  (url notation)
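To illustrate, here is a minimal sketch of how an application server might dispatch those three notations.  (All names here are hypothetical; nothing in this snippet is part of the spec — it only shows that a single hook string is enough to cover all three cases.)

```python
import runpy
import urllib.request
from importlib import import_module

def invoke_hook(hook, base_url="http://localhost:8080"):
    """Dispatch a hook string using one of the three notations:
    dot-colon ("myapp.hooks:post_install"), url ("/hooks/post-install"),
    or path ("scripts/setup.py").  base_url is hypothetical: it would be
    wherever the application server mounted the app."""
    if hook.startswith("/"):
        # url notation: GET a path within the running application
        return urllib.request.urlopen(base_url + hook).read()
    if ":" in hook:
        # dot-colon notation: import the module, call the named attribute
        module_name, _, func_name = hook.partition(":")
        return getattr(import_module(module_name), func_name)()
    # path notation: execute a Python script inside the sandbox
    return runpy.run_path(hook)
```

Note that the callback is always Python or HTTP within the application's own sandbox; no shell, no platform-specific commands.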

Arbitrary system-level commands are right out: Linux, UNIX, BSD, 
Windows, Solaris… good luck getting even simple commands to execute 
identically and predictably across platforms.  The goal isn't to 
rewrite buildout!

> Just b/c a command like apt-get is used it doesn't mean it is used as 
> root. The point is not that you can install things via the package, but 
> rather that you provide the system a way to install things as needed 
> that the system can control.

A methodology of testing for the presence and capability of specific 
services (resources) is far more useful than rewriting buildout.  "I 
need an SQL database of some kind."  "I need this C library within 
these version boundaries."  Etc.  Those are reasonable predicates for 
installation.  You can combine this application format with buildout, 
puppet, or brew-likes if you want to, though.
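As a rough sketch of what "detect, don't install" could look like — every function and predicate name below is invented for illustration, not part of any spec:

```python
import importlib.util
import shutil
import socket

def have_module(name):
    """Predicate: a Python module is importable in this environment."""
    return importlib.util.find_spec(name) is not None

def have_binary(name):
    """Predicate: an executable is on PATH.  Detection only; we never install."""
    return shutil.which(name) is not None

def have_tcp_service(host, port, timeout=1.0):
    """Predicate: something is listening, e.g. 'an SQL database of some kind'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# A hypothetical application declares predicates; the host merely checks them
# and refuses to install the app if any are unmet.
requirements = [
    ("sqlite3 module", lambda: have_module("sqlite3")),
    ("libxml2 bindings", lambda: have_module("lxml")),
]
unmet = [name for name, check in requirements if not check()]
```

The point is that the decision about /how/ to satisfy an unmet predicate (apt-get, emerge, brew, puppet, …) stays entirely with the sysadmin.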

Personally, I'd rather not re-invent the wheel of a Linux distribution, 
thanks.  I wouldn't even want an application server to touch 
system-wide configurations other than web server configurations for the 
applications hosted therein.

> If you start telling the system what is supported then as a spec you 
> have to support too many actions:
> 
>   pre-install-hooks: [
>     ('install', ['libxml2', 'libxslt']),
>     ('download', 'foo-library.tar.gz'),
>     ('extract', 'foo-library.tar.gz'),
>     ...
>     # the idea being
>     ($action, $args)
>   ]

I define no actions, only a callback.

> This is a pain in the neck as a protocol.

Unfortunately for your argument, this is a protocol you invented, not 
one that I defined.

> It is much simpler to have a list of "pre-install-hooks" and let the 
> hosting system that is installing the package deal with those. If your 
> system wants to run commands, you have the ability to do so. If you 
> want to list package names that you install, go for it. If you have a 
> tool that you want to use that the package can provide arguments, that 
> is fine too. From the standpoint of a spec / API / package format, you 
> don't really control the tool that acts on the package. 

Bing.  You finally understand what I defined.

> This is the same problem that setuptools has. There isn't a record of 
> what was installed.

That's a tool-level problem unrelated to application packaging.  For a 
good example of a Python application that /does/ manage packages, file 
tracking, etc., have a look at Gentoo's Portage system.

> It is safe to assume a deployed server has some software installed 
> (nginx, postgres, wget, vim, etc.) and those requirements should 
> usually be defined by some system administrator.

No application honestly cares what front-end web server it is running 
on unless it makes extensive use of very specific plugins (like Nginx's 
push notification service).  Again, most of this is outside the scope 
of an application container format.  Do your applications honestly need 
access to vim?

Also, assume nothing.

> When an application requires that you install some library, it is 
> helpful to that sysadmin because that person has some options when 
> something is meant to be deployed:
> 
>  1. If the library is incompatible and will break some other piece of 
> software, you can know and stop the deployment right there

That's what the "sandbox" is for.  I've been running Gentoo servers 
with 'slotting' mechanisms for more than 10 years now, and having 
multiple installed libraries that are incompatible with one another is 
not unusual, unheard of, or difficult.  (Three versions of PHP, three 
of Python, etc.)

>  2. If the application is going to be moved to another server, the 
> sysadmin can go ahead and add that app's requirements to their own 
> config (puppet class for example)

Puppet, buildout, etc. are, again, outside the scope.  And if the 
application already defines its requirements, what config file would 
you be updating, needlessly duplicating that data?

>  3. If two applications are running on the same machine, they may have 
> inconsistent library requirements

That's what the "sandbox" is for.

>  4. If an application does fail and you need to roll back to a previous 
> version, you can also roll back the system library that was installed 
> with the application

That's what the "sandbox" is for.

> Yes you can use different LD_PATHS for your sandboxed environment, but 
> that is going to be up to the system administrator. By simply listing 
> those dependencies you can let them keep their system according to 
> their requirements.

See my above note on detecting vs. installing.

> You never once said anything about virtual machines either. I feel that 
> it is a natural progression though when you define a package that has 
> an impact on the system requirements since if your application needs 
> some library to run and you are under the assumption you have a 
> "sandbox", then you might as well install things systemwide, which is a 
> perfectly valid model when you have a cloud infrastructure or 
> hypervisor.

You assume a natural progression where one does not exist.  System 
packaging and virtual machines aren't even remotely related to 
each other; this is all needless rhetoric.

These applications do /not/ have an impact on the underlying system 
because they are, by definition, in isolated sandboxes.

>  It just shouldn't be the assumption of the package format.

A sandbox isn't an assumption, it's a requirement.  Very different beasts.

> Likewise, I sincerely hope that we can define a format that could make 
> deployment easy for everyone involved. I'm convinced the deployment 
> pain is really just a matter of incorrect assumptions between sysadmin 
> and developers. This kind of format seems like an excellent place to 
> put application assumptions and state requirements so the sysadmin side 
> can easily handle them in a way that works within their constraints.

+1, but executing arbitrary commands (root or otherwise) is /not/ the 
way to do it.  Executing package managers directly is /not/ the way to 
do it.  Having a clear collection of predicates (app±version, 
lib±version, pkg±version, etc.) is The Right™ way to do it.
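To make the predicate idea concrete, here is one possible (purely illustrative, not normative) way a host could evaluate a lib±version predicate — the dict keys and helper names are my own invention:

```python
def parse_version(s):
    """Turn a dotted version like '2.7.18' into a tuple for comparison."""
    return tuple(int(part) for part in s.split("."))

def within_bounds(found, minimum=None, maximum=None):
    """Predicate: an installed version falls within declared boundaries."""
    v = parse_version(found)
    if minimum is not None and v < parse_version(minimum):
        return False
    if maximum is not None and v > parse_version(maximum):
        return False
    return True

# Hypothetical declaration: "I need this C library within these
# version boundaries."  The application server discovers the installed
# version however it likes, then simply checks the predicate:
predicate = {"lib": "libxml2", "min": "2.7.0", "max": "2.9.99"}
within_bounds("2.8.0", predicate["min"], predicate["max"])  # True
```

The application states /what/ it needs; satisfying the predicate remains the host's business.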

If you want a specific version of Apache to go with your application, 
or a brand new MySQL installation, use buildout.  An application 
server's role is to mediate between these services and the installed 
application, and let the sysadmin do his job.

	— Alice.

As an aside, who here doesn't run their production software on a 
homogeneous hosting environment?  Having unorganized servers of any 
kind will lead to Bad Stuff™ eventually.

Mine is Gentoo + Nginx + FastCGI PHP 4 & 5 + Python 2.6, 2.7, 3.1 + 
[MySQL + MongoDB, db servers only] + dcron + metalog + reiserfs + … all 
kept up-to-date and in sync across all servers… hell, I even have 
"application" configurations in Nginx which are generic and reusable, 
and shared between servers.
