[Web-SIG] A Python Web Application Package and Format

Eric Larson eric at ionrock.org
Tue Apr 12 17:45:27 CEST 2011


On Apr 11, 2011, at 10:03 PM, Alice Bevan–McGregor wrote:

> Eric,
> 
> Let me rephrase a few things.
> 
> On 2011-04-11 17:48:14 -0700, Eric Larson said:
> 
>> pre-install-hooks: [
>>   "apt-get install libxml2",  # the person deploying the package assumes apt-get is available
> 
> Assumptions are evil.  You could end up with multiple third-party applications each assuming different things.  Aptitude, apt-get, brew, emerge, ports, …

No, _undefined_ assumptions are evil. When I say you are assuming some program is available, that is a decision made between the developer and the person providing the system. My point is not to advocate a package format that runs commands. My point is to make sure two things happen:

  1. Allow the package a way to define system level dependencies
  2. Allow the package to have _internally_ agreed upon mechanisms for doing operations, meaning that the package format doesn't prescribe every action, it simply provides hooks.

You want hooks to be a Python script, callable, or URL. That is too restrictive in my opinion. In terms of the format it should be a list of strings, and the tool supporting the package should deal with them.
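To make that concrete, here is a rough sketch (purely illustrative; none of these names are part of any proposed format) of how one hosting tool might consume an opaque list of hook strings, while another host could remap or refuse the same strings:

    import shlex
    import subprocess

    # Hypothetical metadata carried by the package: just opaque strings.
    metadata = {
        "pre-install-hooks": ["apt-get install libxml2"],
    }

    def run_pre_install_hooks(meta, allowed=("apt-get",)):
        # One host's policy: split each hook and refuse anything it does
        # not recognise. Another host could map the same strings onto
        # puppet, portage, or a no-op; the format stays silent on that.
        for hook in meta.get("pre-install-hooks", []):
            argv = shlex.split(hook)
            if not argv or argv[0] not in allowed:
                raise RuntimeError("hook not permitted on this host: %r" % hook)
            subprocess.check_call(argv)

The format only defines the list; the policy above belongs entirely to the tool acting on the package.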

> 
>>   "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
> 
> There is zero way of tracking what that does, so out of the gate that's a no-no, and full system chroots (not what I'm talking about in terms of chroot) require far too much organization/duplication/management.
> 
> The 'hooks' idea listed in my original document is for callbacks into the application.  That callback would be one of:
> 
> :: A Python script to execute.  (path notation)
> 
> :: A Python callable to execute.  (dot-colon notation)
> 
> :: A URL within the application to GET.  (url notation)
> 
> Arbitrary system-level commands are right out: Linux, UNIX, BSD, Windows, Solaris… good luck getting even simple commands to execute identically and predictably across platforms.  The goal isn't to rewrite buildout!
> 

See above.

>> Just b/c a command like apt-get is used it doesn't mean it is used as root. The point is not that you can install things via the package, but rather that you provide the system a way to install things as needed that the system can control.
> 
> A methodology of testing for the presence and capability of specific services (resources) is far more useful than rewriting buildout.  "I need an SQL database of some kind."  "I need this C library within these version boundaries."  Etc.  Those are reasonable predicates for installation.  You can combine this application format with buildout, puppet, or brew-likes if you want to, though.
> 
> Personally, I'd rather not re-invent the wheel of a Linux distribution, thanks.  I wouldn't even want an application server to touch system-wide configurations other than web server configurations for the applications hosted therein.
> 
>> If you start telling the system what is supported then as a spec you have to support too many actions:
>>   pre-install-hooks: [
>>     ('install', ['libxml2', 'libxslt']),
>>     ('download', 'foo-library.tar.gz'),
>>     ('extract', 'foo-library.tar.gz'),
>>     ...
>>     # the idea being
>>     ($action, $args)
>>   ]
> 
> I define no actions, only a callback.
> 
>> This is a pain in the neck as a protocol.
> 
> Unfortunately for your argument this is a protocol you invented, not one that I defined.
> 
>> It is much simpler to have a list of "pre-install-hooks" and let the hosting system that is installing the package deal with those. If your system wants to run commands, you have the ability to do so. If you want to list package names that you install, go for it. If you have a tool that you want to use that the package can provide arguments, that is fine too. From the standpoint of a spec / API / package format, you don't really control the tool that acts on the package. 
> 
> Bing.  You finally understand what I defined.
> 
>> This is the same problem that setuptools has. There isn't a record of what was installed.
> 
> That's a tool-level problem unrelated to application packaging.  For a good example of a Python application that /does/ manage packages, file tracking, etc. have a look at Gentoo's Portage system.
> 
>> It is safe to assume a deployed server has some software installed (nginx, postgres, wget, vim, etc.) and those requirements should usually be defined by some system administrator.
> 
> No application honestly cares what front-end web server it is running on unless it makes extensive use of very specific plugins (like Nginx's push notification service).  Again, most of this is outside the scope of an application container format.  Do your applications honestly need access to vim?
> 
> Also, assume nothing.

You missed my point. I wasn't talking about the application needing vim. I was saying that by using a system such as Linux, you are building on a set of assumptions. Unix as a system provides some tools that it defines, which allows you to assume they will be present. Call it an expectation or a standard if you want, but the idea is the same. An assumption is the guest on the system expecting something to be there, and there is nothing wrong with that when it actually is. The package should simply make these assumptions explicit so the parent system can do what it needs to do.

But below you make some assumptions that I don't think are a good idea.

> 
>> When an application requires that you install some library, it is helpful to that sysadmin because that person has some options when something is meant to be deployed:
>>  1. If the library is incompatible and will break some other piece of software, you can know and stop the deployment right there
> 
> That's what the "sandbox" is for.  I've been running Gentoo servers with 'slotting' mechanisms for > 10 years, now, and having multiple installed libraries that are incompatible with one-another is not unusual, unheard of, or difficult.  (Three versions of PHP, three of Python, etc.)
> 
>>  2. If the application is going to be moved to another server, the sysadmin can go ahead and add that app's requirements to their own config (puppet class for example)
> 
> Puppet, buildout, etc. is, again, outside the scope.  And if the application already defines requirements, what config file are you updating and duplicating the data needlessly within?
> 
>>  3. If two applications are running on the same machine, they may have inconsistent library requirements
> 
> That's what the "sandbox" is for.
> 

Ok, I think this is an incorrect assumption. 

>>  4. If an application does fail and you need to roll back to a previous version, you can also roll back the system library that was installed with the application
> 
> That's what the "sandbox" is for.
> 

Again, I don't think you should put everything in a "sandbox" without defining what you mean. I'm sure you have a good idea of what the sandbox should be beyond using virtualenv. My point is that by only stating what the application needs, you let the system installing the package decide how it defines "sandboxes". That is a much better premise when defining this package format: you do not get to prescribe how the packages will run, but you do need to state system-level expectations.

Just to be clear, I'm not saying that a sandbox is a bad idea. Quite the contrary. My point is that if you define the package format under the assumption that there is a "sandbox", you also need to define what that "sandbox" really means. Is it a shared hosting environment where your sandbox is just a user directory, an entire virtual machine dedicated to the app, or something in between? The answers to these questions change what you can do as an application developer, especially in terms of deployment.

>> Yes you can use different LD_PATHS for your sandboxed environment, but that is going to be up to the system administrator. By simply listing those dependencies you can let them keep their system according to their requirements.
> 
> See my above note on detecting vs. installing.
> 
>> You never once said anything about virtual machines either. I feel that it is a natural progression though when you define a package that has an impact on the system requirements since if your application needs some library to run and you are under the assumption you have a "sandbox", then you might as well install things systemwide, which is a perfectly valid model when you have a cloud infrastructure or hypervisor.
> 
> You assume a natural progression where one does not exist.  System packaging and virtual machines aren't even remotely related to each-other; this is all needless rhetoric.
> 
> These applications do /not/ have an impact on the underlying system because they are, by definition, in isolated sandboxes.
> 
>> It just shouldn't be the assumption of the package format.
> 
> A sandbox isn't an assumption, it's a requirement.  Very different beasts.

It is a terrible requirement that doesn't have to exist. I understand using virtualenv because it is pretty easy and fits reasonably well. But a requirement that the host machine use a sandbox is wrong. It is not that having a sandboxed place to run the application is wrong, but rather that the package doesn't need to tell the sysadmin side of things how to set up servers.

This has been my point all along. Don't make the package format prescribe a specific deployment technique; instead, list what the application needs in order to run. The system running the application can then meet those needs however it can, according to how it is configured.
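As a purely illustrative sketch (these keys and helpers are invented for the example, not a proposed spec), the package would only state its needs and each host would satisfy them with whatever mechanism it already uses:

    # Hypothetical declarative requirements carried by the package.
    requires = {
        "python": ">=2.6",
        "c-libraries": ["libxml2", "libxslt"],
        "services": ["sql-database"],
    }

    def deploy(requirements, resolve_library, resolve_service):
        # resolve_library might wrap apt-get on one host, portage on
        # another, or a per-app prefix build on a third; resolve_service
        # might point at an existing database or provision a new one.
        for lib in requirements.get("c-libraries", []):
            resolve_library(lib)
        for svc in requirements.get("services", []):
            resolve_service(svc)

Either way the package never dictates the mechanism; it only describes what must be true before it can run.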

Eric

> 
>> Likewise, I sincerely hope that we can define a format that could make deployment easy for everyone involved. I'm convinced the deployment pain is really just a matter of incorrect assumptions between sysadmin and developers. This kind of format seems like an excellent place to put application assumptions and state requirements so the sysadmin side can easily handle them in a way that works within their constraints.
> 
> +1, but executing arbitrary commands (root or otherwise) is /not/ the way to do it.  Executing package managers directly is /not/ the way to do it.  Having a clear collection of predicates (app±version, lib±version, pkg±version, etc.) is The Right™ way to do it.
> 
> If you want a specific version of Apache to go with your application, or a brand new MySQL installation, use buildout.  An application server's role is to mediate between these services and the installed application, and let the sysadmin do his job.
> 
> 	— Alice.
> 
> As an aside, who here doesn't run their production software on a homogenous hosting environment?  Having unorganized servers of any kind will lead to Bad Stuff™ eventually.
> 
> Mine is Gentoo + Nginx + FastCGI PHP 4 & 5 + Python 2.6, 2.7, 3.1 + [MySQL + MongoDB, db servers only] + dcron + metalog + reiserfs + … all kept up-to-date and in sync across all servers… hell, I even have "application" configurations in Nginx which are generic and reusable, and shared between servers.
> 
> 
