[Web-SIG] A Python Web Application Package and Format

Tue Apr 12 02:48:14 CEST 2011

On Apr 11, 2011, at 6:47 PM, Alice Bevan–McGregor wrote:

>> pre-install-hooks: [
>>   "apt-get install libxml2",  # the person deploying the package assumes apt-get is available
>>   "run-some-shell-script.sh", # the shell script might do the following on a list of URLs
>>   "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz && rm dependency.tar.gz"
>> ]
>> Does that make some sense? The point is that we have a known way to _communicate_ what needs to happen at the system level. I agree that there isn't a foolproof way.
> 
> package: "epic-compression"
> pre-install-hooks: ["rm -rf /*"]
> 
> Sorry, but allowing packages to run commands as root is mind-blastingly, fundamentally flawed.  You mention an inability to roll back or upgrade?  The above would be worse in that department.
> 

Just because a command like apt-get is listed doesn't mean it is run as root. The point is not that the package gets to install things, but rather that it gives the system a way to install things as needed, under the system's control. If the spec instead starts dictating which actions are supported, it has to enumerate far too many of them:

  pre-install-hooks: [
    ('install', ['libxml2', 'libxslt']),
    ('download', 'foo-library.tar.gz'),
    ('extract', 'foo-library.tar.gz'),
    ...
    # the idea being
    ($action, $args)
  ]

This is a pain in the neck as a protocol. It is much simpler to have a list of "pre-install-hooks" and let the hosting system that installs the package deal with them. If your system wants to run them as commands, it can. If you would rather list the package names to install, go for it. If you have a tool you want to use and the package only supplies its arguments, that is fine too. As a spec / API / package format, you don't really control the tool that acts on the package.
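
A minimal sketch of "let the hosting system deal with those", in Python. Everything here is an assumption made for illustration: the function name plan_hooks, the ALLOWED_COMMANDS policy, and the rule that anything containing shell metacharacters goes to a human. Nothing is executed; the host only decides what it would be willing to run.

```python
import shlex

# Hypothetical host policy: the only executables this host will run itself.
ALLOWED_COMMANDS = {"wget", "tar"}

# Hooks containing any of these characters are never run automatically.
SHELL_METACHARACTERS = "&;|<>$`"


def plan_hooks(hooks, allowed=ALLOWED_COMMANDS):
    """Sort declared hooks into (to_run, needs_review) without executing them."""
    to_run, needs_review = [], []
    for hook in hooks:
        if any(ch in hook for ch in SHELL_METACHARACTERS):
            needs_review.append(hook)  # chained or redirected commands
            continue
        try:
            argv = shlex.split(hook)
        except ValueError:
            needs_review.append(hook)  # unbalanced quotes and the like
            continue
        if argv and argv[0] in allowed:
            to_run.append(argv)
        else:
            needs_review.append(hook)  # unknown tool: leave it to the sysadmin
    return to_run, needs_review


hooks = [
    "apt-get install libxml2",
    "run-some-shell-script.sh",
    "wget http://mydomain.com/canonical/repo/dependency.tar.gz",
    "wget http://mydomain.com/canonical/repo/dependency.tar.gz && tar zxf dependency.tar.gz",
]
to_run, needs_review = plan_hooks(hooks)
```

A different host could apply a completely different policy to the same list (map names to its own package manager, or run nothing at all), which is the reason for keeping the format itself ignorant of the tool.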

>> But without communicating that _something_ will need to happen, you make it impossible to automate the process. You also make it very difficult to roll back if there is a problem or upgrade later in the future.
> 
> Really, in what way?

This is the same problem that setuptools has. There isn't a record of what was installed. It is safe to assume a deployed server has some software installed (nginx, postgres, wget, vim, etc.), and those requirements should usually be defined by a system administrator. When an application declares that it needs some library installed, that helps the sysadmin, who then has some options at deployment time:

 1. If the library is incompatible and will break some other piece of software, you can know and stop the deployment right there
 2. If the application is going to be moved to another server, the sysadmin can go ahead and add that app's requirements to their own config (a puppet class, for example)
 3. If two applications are running on the same machine, the sysadmin can see whether they have inconsistent library requirements
 4. If an application does fail and you need to roll back to a previous version, you can also roll back the system library that was installed with the application
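
As an illustration of points 1 and 3, here is a small Python sketch of how a host could spot inconsistent requirements once every application declares them. The data layout (application name mapped to library and version), the function name find_conflicts, and the version numbers are all hypothetical; no such structure is defined anywhere in this thread.

```python
def find_conflicts(requirements):
    """Return {library: {version: [apps]}} for libraries requested in more than one version.

    `requirements` maps an application name to a {library: version} dict.
    """
    seen = {}
    for app, libs in requirements.items():
        for lib, version in libs.items():
            seen.setdefault(lib, {}).setdefault(version, []).append(app)
    # Keep only libraries for which at least two different versions were requested.
    return {lib: versions for lib, versions in seen.items() if len(versions) > 1}


# Hypothetical declarations collected from two packages on the same machine.
declared = {
    "blog": {"libxml2": "2.7.8", "libxslt": "1.1.26"},
    "shop": {"libxml2": "2.6.32"},
}
conflicts = find_conflicts(declared)
```

With these made-up inputs, conflicts names libxml2 and the applications behind each version, which is enough for a sysadmin to stop the deployment before anything is installed.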

> 
>> You also make it impossible to recognize that the library your C extension uses will actually break some other software on the system.
> 
> LD_PATH.

Yes, you can use a different LD_LIBRARY_PATH for your sandboxed environment, but that is going to be up to the system administrator. By simply listing those dependencies, you let them keep their system in line with their own requirements.
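
For reference, the variable the Linux dynamic linker consults is LD_LIBRARY_PATH. A rough Python sketch of what a per-application library path could look like on the host side follows; the helper name sandbox_env and the directory /srv/apps/myapp/lib are assumptions made up for this example.

```python
import os


def sandbox_env(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first on LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        # Search the application's own directory before anything already set.
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


# Hypothetical per-application library directory chosen by the sysadmin.
env = sandbox_env("/srv/apps/myapp/lib", base_env={"PATH": "/usr/bin"})
```

The resulting dict can be passed as the env argument of subprocess.run when the host starts the application, so the choice of directory stays with whoever administers the machine.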

> 
>> Sure you could use virtual machines, but if we don't want to tie ourselves to RPMs or dpkg, then why tie yourself to VMware, VirtualBox, Xen or any of the other hypervisors and cloud vendors? 
> 
> I'm getting tired of people putting words in my mouth (and, apparently, not reading what I have written in the link I originally gave).  Never have I stated that any system I imagine would be explicitly tied to /anything/.
> 

I did not put words in your mouth. You did mention that RPM and dpkg are both "terrible" along with other binary formats. I think it is safe to say that you would not want to implement a system that is tied to either of those formats. 

You never once said anything about virtual machines either. I do feel it is a natural progression, though, once you define a package that has an impact on system requirements: if your application needs some library to run and you assume you have a "sandbox", then you might as well install things systemwide. That is a perfectly valid model when you have a cloud infrastructure or hypervisor. It just shouldn't be the assumption of the package format.

I think Ian has already discussed and reflected similar ideas on the list, so hopefully my points regarding deployment dependencies are clearer now. Likewise, I sincerely hope that we can define a format that makes deployment easy for everyone involved. I'm convinced the deployment pain is really just a matter of mismatched assumptions between sysadmins and developers. This kind of format seems like an excellent place to state an application's assumptions and requirements so the sysadmin side can easily handle them in a way that works within their constraints.

Eric

> 	— Alice.