[Web-SIG] A Python Web Application Package and Format

Thu Apr 14 08:57:32 CEST 2011

On 2011-04-13 18:16:36 -0700, Ian Bicking said:

> While initially reluctant to use zip files, after further discussion 
> and thought they seem fine to me, so long as any tool that takes a zip 
> file can also take a directory.  The reverse might not be true -- for 
> instance, I'd like a way to install or update a library for (and 
> inside) an application, but I doubt I would make pip rewrite zip files 
> to do this ;)  But it could certainly work on directories.  Supporting 
> both isn't a big deal except that you can't do symlinks in a zip file.

I'm not talking about using zip files as per eggs, where the code is 
maintained within the zip file during execution.  It is merely a 
packaging format with the software itself extracted from the zip during 
installation / upgrade.  A transitory container format.  (Folders in 
the end.)

Symlinks are an OS-specific feature, so those are out as a core 
requirement.  ;)

> I don't think we're talking about something like a buildout recipe. 
>  Well, Eric kind of brought something like that up... but otherwise I 
> think the consensus is in that direction.

Ambiguous statements FTW, but I think I know what you meant.  ;)

> So specifically if you need something like lxml the application 
> specifies that somehow, but doesn't specify *how* that library is 
> acquired.  There is some disagreement on whether this is generally 
> true, or only true for libraries that are not portable.  

+1

I think something along the lines of autoconf (those lovely ./configure 
scripts you run when building GNU-style software from source) with 
published base 'checkers' (predicates as I referred to them previously) 
would be great.  A clear way for an application to declare a 
dependency, have the application server check those dependencies, then 
notify the administrator installing the package.
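
As a rough sketch of what such published checkers might look like (the
names has_module, has_executable and unmet are made up for illustration
and are not part of any existing tool), each predicate could be a small
callable carrying a human-readable description:

```python
import importlib.util
import shutil


def has_module(name):
    """Predicate: the named top-level Python module can be located."""
    def check():
        return importlib.util.find_spec(name) is not None
    check.description = "Python module %r" % name
    return check


def has_executable(name):
    """Predicate: the named program can be found on PATH."""
    def check():
        return shutil.which(name) is not None
    check.description = "executable %r on PATH" % name
    return check


def unmet(checks):
    """Return the descriptions of every declared dependency that fails."""
    return [c.description for c in checks if not c()]


# Hypothetical declaration an application might ship with.
declared = [has_module("json"), has_module("no_such_module_xyz")]
for missing in unmet(declared):
    # An application server would report this to the administrator.
    print("Missing dependency:", missing)
```

The application only declares what it needs; how a missing item gets
installed is left to whoever is deploying.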

I've seen several Python libraries that include the C library code that 
they expose; while not so terribly efficient (i.e. you can't install 
the C library once, then share it amongst venvs), it is effective for 
small packages.

Larger libraries (i.e. those installed globally or application-locally)
would require the intervention of a systems administrator.

> Something like a database takes this a bit further.  We haven't really 
> discussed it, but I think this is where it gets interesting.  Silver 
> Lining has one model for this.  The general rule in Silver Lining is 
> that you can't have anything with persistence without asking for it as 
> a service, including an area to write files (except temporary files?)

+1

Databases are slightly more difficult; an application could ask for:

:: (Very Generic) A PEP-249 database connection.

:: (Generic) A relational database connection string.

:: (Specific) A connection string for a specific database vendor.

:: (Odd) A NoSQL database connection string.

I've been making heavy use of MongoDB over the last year and a half, 
but AFAIK each NoSQL database engine does its own thing API-wise.  (Then 
there are ORMs on top of that, but passing a connection string like 
mysql://user:pass@host/db or mongo://host/db is pretty universal.)

It is my intention to write an application server that is capable of 
creating and securing databases on-the-fly.  This would require fairly 
high-level privileges in the database engine, but would result in far 
more "plug-and-play" configuration.  Obviously when deleting an 
application you will have the opportunity to delete the database and 
associated user.

> I assume everyone agrees that an application can't write to its own 
> files (but of course it could execfile something in another location).

+1; that _almost_ goes without saying.  :)  At the same time, an 
application server /must not/ require root access to do its work, thus 
no mandating of (real) chroots, on-the-fly user creation, etc.

There are ways around almost all security policies, but where possible 
setting the read-only flag (Windows) or removing write permission 
(chmod -w on POSIX systems) should be enough to prevent casual abuse.
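
A minimal sketch of that step, assuming the application has been
extracted into a single directory tree (make_read_only is a
hypothetical helper name).  On Windows, os.chmod can only toggle the
read-only flag, which is the behaviour wanted here anyway:

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def _clear_write(path):
    """Remove the write permission bits from a single path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def make_read_only(root):
    """Clear the write bits on every file and directory under root."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                # chmod would follow the link out of the tree; skip it.
                continue
            _clear_write(path)
    _clear_write(root)
```

This needs no root access, only ownership of the files, and it stops
casual writes rather than a determined attacker.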

> I suspect there's some disagreement about how the Python environment 
> gets setup, specifically sys.path and any other application-specific 
> customizations (e.g., I've set environ['DJANGO_SETTINGS_MODULE'] in 
> silvercustomize.py, and find it helpful).

Similar to Paste's "here" variable for INI files, the application would 
need some way to define environment variables that reference its base 
path.
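
One possible shape for this, purely as an illustration: the ${here}
placeholder syntax and the expand_environment name are assumptions
(Paste itself uses %(here)s interpolation in its INI files).

```python
from string import Template


def expand_environment(declared, here):
    """Substitute the application's base path into declared variables."""
    return {
        key: Template(value).substitute(here=here)
        for key, value in declared.items()
    }


# Hypothetical declaration shipped with the application.
declared = {"CACHE_DIR": "${here}/var/cache"}
print(expand_environment(declared, "/srv/apps/example"))
```

The server would then merge the result into os.environ before loading
the application.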

I've tossed out my idea of sharing dependencies, BTW, so a simple 
extraction of the zipped application into one package folder (linked in 
using a .pth file) with the dependencies installed into an app-packages 
folder in the path (like site-packages) would be ideal.  At least, for 
me.  ;)
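
Roughly, and assuming the layout described above (link_application
and the application.pth file name are made up for this sketch), the
linking step could look like this:

```python
import os
import site
import sys


def link_application(env_dir, app_dir):
    """Create app-packages under env_dir and link app_dir in via a .pth file."""
    packages = os.path.join(env_dir, "app-packages")
    os.makedirs(packages, exist_ok=True)
    # A .pth file lists one path per line; existing paths get added to sys.path.
    with open(os.path.join(packages, "application.pth"), "w") as fh:
        fh.write(os.path.abspath(app_dir) + "\n")
    # Adds app-packages itself to sys.path and processes its .pth files.
    site.addsitedir(packages)
    return packages
```

Dependencies would be installed into app-packages, while the
application's own code stays in its package folder and can be updated
in place.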

> Describing the scope of this, it seems kind of boring.  In, for 
> example, App Engine you do all your setup in your runner -- I find this 
> deeply annoying because it makes the runner the only entry point, and 
> thus makes testing, scripts, etc. hard.

I agree; that's a short-sighted approach to an application container 
format.  There should be some way to advertise a test suite and, for 
example, have the suite run before installation or during upgrade.  
(Rolling back the upgrade process thus far if there is a failure.)

My shiny end goal would be a form of continuous deployment: a git-based 
application which gets a post-commit notification, pulls the latest, 
runs the tests, rolls back on failure or fully deploys the update on 
success.

> We would start with just WSGI.  Other things could follow, but I don't 
> see any reason to worry about that now.  Maybe we should just punt on 
> aggregate applications now too.  I don't feel like there's anything we 
> would do that would prevent other kinds of runtime models (besides the 
> starting point, container-controlled WSGI), and the places to add 
> support for new things are obvious enough (e.g., something like Silver 
> Lining's platform setting).  I would define a server with accompanying 
> daemon processes as an "aggregate".

Since in my model the application server does not proxy requests to the 
instantiated applications (each running in its own process), I'm not 
sure I'm interpreting what you mean by an aggregate application 
properly.

If "my" application server managed Nginx or Apache configurations, 
dispatch to applications based on base path would be very easy to do 
while still keeping the applications isolated.

> An important distinction to make, I believe, is application concerns 
> and deployment concerns.  For instance, what you do with logging is a 
> deployment concern.  Generating logging messages is of course an 
> application concern.  In practice these are often conflated, especially 
> in the case of bespoke applications where the only person deploying the 
> application is the person (or team) developing the application.  It 
> shouldn't be annoying for these users, though.  Maybe it makes sense 
> for people to be able to include tool-specific default settings in an 
> application -- things that could be overridden, but especially for the 
> case when the application is not widely reused it could be useful.  (An 
> example where Silver Lining gets it all backwards is I created a 
> [production] section in app.ini when the very concept of "production" 
> is not meaningful in that context -- but these kind of named profiles 
> would make sense for actual application deployment tools.)

Having an application define default logging levels for different 
scopes would be very useful.  The application server could take those 
defaults, and allow an administrator to modify them or define 
additional scopes quite easily.

> There's actually a kind of layered way of thinking of this:
> 
> 1. The first, maybe most important part, is how you get a proper Python 
> environment.  That includes sys.path of course, with all the 
> accompanying libraries, but it also includes environment description.

Virtualenv-like, with the application itself linked in via a .pth file 
(a la setup.py develop, allowing inline upgrades via SCM) and 
dependencies extracted from the zip distributable into an app-packages 
folder a la site-packages.

I don't install global Python modules on any of my servers, so the 
--no-site-packages option is somewhat unnecessary for me, but having 
something similar would be useful, too.  Unfortunately, that one 
feature seems to require a lot of additional work.

> In Silver Lining there's two stages -- first, set some environmental 
> variables (both general ones like $SILVER_CANONICAL_HOST and 
> service-specific ones like $CONFIG_MYSQL_DBNAME), then get sys.path 
> proper, then import silvercustomize by which an environment can do any 
> more customization it wants (e.g., set $DJANGO_SETTINGS_MODULE)

Environment variables are typeless (raw strings) and thus less than 
optimal for sharing rich configurations.

Host names depend on how the application is mounted, and a single 
application may be mounted to multiple domains or paths, so utilizing 
the front end web server's rewriting capability is probably the best 
solution for that.

What about multiple database connections?  Environment variables are 
also not so good for repeated values.

A /few/ environment variables are a good idea, though:

:: TMPDIR — when don't you need temporary files?

:: APP_CONFIG_PATH — the path to a YAML file containing the real configuration.

The configuration file would even include a dict-based logging 
configuration routing all messages to the parent app server for final 
delivery, removing the need for per-app logging files, etc.
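
To make that concrete, here is a sketch of the logging portion.  The
dict stands in for what would be loaded from the file named by
APP_CONFIG_PATH; forwarding to the parent server over a local socket on
port 9020, and the logger name myapp.db, are assumptions for the
example rather than a settled design.

```python
import logging
import logging.config
import logging.handlers

# Stand-in for the parsed configuration file.
config = {
    "logging": {
        "version": 1,
        "disable_existing_loggers": False,
        "handlers": {
            "appserver": {
                # Forwards pickled log records to a listening parent process.
                "class": "logging.handlers.SocketHandler",
                "host": "localhost",
                "port": 9020,  # hypothetical port for the parent server
            },
        },
        # Application-supplied defaults an administrator could override.
        "root": {"level": "INFO", "handlers": ["appserver"]},
        "loggers": {"myapp.db": {"level": "WARNING"}},
    },
}

logging.config.dictConfig(config["logging"])
```

The application ships the default levels; the server can rewrite the
handlers section before applying it, so no per-app log files are
needed.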

> 2. Define some basic generic metadata.  "app_name" being the most obvious one.

The standard Python setup metadata is pretty good:

:: Application title.
:: Application (package) name.
:: Short description.
:: Long description / documentation.
:: Author information.
:: License.
:: Source information (URL, download URL).
:: Dependencies.
:: Entry point-style hooks.  (Post-install, pre/post upgrade, 
pre-removal, etc.)

Likely others.

> 3. Define how to get the WSGI app.  This is WSGI specific, but (1) is 
> *not* WSGI specific (it's only Python specific, and would apply well to 
> other platforms)

I could imagine there would be multiple "application types":

:: WSGI application.  Define a package dot-notation entry point to a 
WSGI application factory.

:: Networked daemon.  This would allow deployment of Twisted services, 
for example.  Define a package dot-notation entry point to the 'main' 
callable.

Again, there are likely others, but those are the big two.  In both of 
these cases the configuration (loaded automatically) could be passed as 
a dict to the callable.
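
A sketch of resolving such an entry point and handing it the
configuration.  The "module:callable" spelling is borrowed from
setuptools entry points and is an assumption here, as are the function
names:

```python
import importlib


def load_entry_point(spec):
    """Resolve 'package.module:attr.path' to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj


def start(spec, config):
    """Call the factory (or 'main' callable) with the loaded configuration."""
    return load_entry_point(spec)(config)


# Hypothetical usage; "myapp.wsgi:make_app" does not exist here.
# application = start("myapp.wsgi:make_app", {"debug": False})
```

The same loader serves both application types; only what the server
does with the returned object differs.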

> 4. Define some *web specific* metadata, like static files to serve.  
> This isn't necessarily WSGI or even Python specific (not that we should 
> bend backwards to be agnostic -- but in practice I think we'd have to 
> bend backwards to make it Python-specific).

Explicitly defining the paths to static files is not just a good idea, 
it's The Slaw™.

> 5. Define some lifecycle metadata, like update_fetch.  These are 
> generally commands to invoke.  IMHO these can be ad hoc, but exist in 
> the scope of (1) and a full "environment".  So it's not radically 
> different than anything else the app does, it's just we declare 
> specific times these actions happen.

Script name, dot-notation callable, or URL.  I see those as the 'big 
three' to support.  A dot-notation callable has the same benefit I 
described under #3: the loaded configuration can be passed to it as a 
dict.

The URL would be relative to wherever the application is mounted within 
a domain, of course.

> 6. Define services (or "resources" or whatever -- the name "resource" 
> doesn't make as much sense to me, but that's bike shedding).  These are 
> things the app can't provide for itself, but requires (or perhaps only 
> wants; e.g., an app might be able to use SQLite, but could also use 
> PostgreSQL).  While the list of services will increase over time, 
> without a basic list most apps can't run at all.  We also need a core 
> set as a kind of reference implementation of what a fully-specified 
> service *is*.

I touched on this up above; any DB-API compliant database connection, 
or one of the various connection strings.  (I'd implement this as a 
string-like object with accessor properties so you can pass it straight 
to SQLAlchemy, or dissect it to do something custom.)
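
For illustration only, such an object might be a str subclass along
these lines (ConnectionString and its property names are hypothetical):

```python
from urllib.parse import urlsplit


class ConnectionString(str):
    """A plain string that also exposes the parts of its URL."""

    @property
    def _parts(self):
        return urlsplit(str(self))

    @property
    def engine(self):
        return self._parts.scheme

    @property
    def user(self):
        return self._parts.username

    @property
    def password(self):
        return self._parts.password

    @property
    def host(self):
        return self._parts.hostname

    @property
    def database(self):
        return self._parts.path.lstrip("/")


dsn = ConnectionString("mysql://user:pass@host/db")
print(dsn.engine, dsn.host, dsn.database)
```

Because it is still a str, it can be handed unchanged to anything that
expects a connection URL.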

More below.

> 7. In Silver Lining I've distinguished active services (like a running 
> database) from passive resources (like an installed binary library).  I 
> don't see a reason to conflate these, as they are so very different.  
> Maybe this is part of why "resource" strikes me as an odd name for 
> something like a database.

You hit the terminology perfectly: active services (such as databases) 
are just that, services.  Installed binary libraries are resources.  :)

> So... there's kind of some thoughts about process. 

Good stuff.

	— Alice.