[Web-SIG] A Python Web Application Package and Format

Ian Bicking ianb at colorstudy.com
Sat Apr 16 06:20:05 CEST 2011


On Fri, Apr 15, 2011 at 2:05 PM, Alice Bevan–McGregor
<alice at gothcandy.com> wrote:

> I want to keep this distinct from anything long-running, which is a much
>> more complex deal.
>>
>
> The primary application is only potentially long-running.  (You could, in
> theory, deploy an app as CGI, but that way lies madness.)  However, the
> reference syntax mentioned (excepting URL) works well for identifying this.


Right -- just one long-running thing (but no promises how long).


>
>  I think given the three options, and for general simplicity, the script
>> can be successful or have an error (for Python code: exception or no; for
>> __main__: zero exit code or no; for a URL: 2xx code or no), and can return
>> some text (which may only be informational, not structured?)
>>
>
> For the simple cases (script / callable), it's pretty easy to trap STDOUT
> and STDERR, deliver INFO log messages to STDOUT, everything else to STDERR,
> then display that to the administrator in some form.  Same for HTTP, except
> that it can include full HTML formatting information.


For Silver Lining I set "Accept: text/plain", to at least suggest that plain
text was preferred, since typically HTML isn't easily displayed.  Of course
a tool could change that, probably usefully?  But that only applies to
HTTP.  Anyway, seems easy enough.
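To make the HTTP case concrete, a sketch (the helper names here are made up;
only the Accept header and the 2xx-means-success convention come from the
discussion above):

```python
# Sketch of invoking an HTTP hook: prefer plain text, treat 2xx as success.
import urllib.request

def hook_request(url):
    """Build a request that suggests plain text over HTML."""
    return urllib.request.Request(url, headers={"Accept": "text/plain"})

def hook_succeeded(status):
    """Per the convention above: any 2xx status means the hook ran cleanly."""
    return 200 <= status < 300
```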


>
>  An application configuration could refer to scripts under different names,
>> to be invoked at different stages.
>>
>
> A la the already mentioned post-install, pre-upgrade, post-upgrade,
> pre-removal, and cron-like.  Any others?


test-environment, test-alive, and test-functional are all possibilities.
test-alive could be used by, e.g., Nagios for monitoring (it might actually
have structured output?)
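A hypothetical test-alive script might look like this, following the
success-or-error convention above (exit 0 or not, informational output on
stdout; the names and structure here are made up):

```python
# Hypothetical "test-alive" hook: run named checks, report one line each,
# exit zero only if everything passed.
import sys

def check_alive(checks):
    """Run named check callables; return (ok, report lines)."""
    lines, ok = [], True
    for name, check in checks.items():
        try:
            check()
            lines.append("%s: ok" % name)
        except Exception as exc:
            ok = False
            lines.append("%s: FAIL (%s)" % (name, exc))
    return ok, lines

if __name__ == "__main__":
    ok, lines = check_alive({})  # real checks (db ping, etc.) would go here
    print("\n".join(lines))
    sys.exit(0 if ok else 1)
```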



>
>  There could be an optional self-test script, where the application could
>> do a last self-check -- import whatever it wanted, check db settings, etc.
>> Of course we'd want to know what it needed *before* the self-check to try to
>> provide it, but double-checking is of course good too.
>>
>
> Unit and functional tests are the most obvious.  In which case we'll need
> to be able to provide a localhost-only 'mounted' location for the
> application even though it hasn't been installed yet.


For local functional HTTP tests you might want that, but if you are doing
non-HTTP functional tests (e.g., just WSGI) or unit tests then the
environment should always be sufficient without actually serving anything
up.  You'd probably want a "test" set of local services (as opposed to a
"development" set of services).  I think this will all be another kind of
tooling around development.



>
>  One advantage to a separate script instead of just one script-on-install
>> is that you can more easily indicate *why* the installation failed.  For
>> instance, script-on-install might fail because it can't create the database
>> tables it needs, which is a different kind of error than a library not being
>> installed, or being fundamentally incompatible with the container it is in.
>> In some sense maybe that's because we aren't proposing a rich error system
>> -- but realistically a lot of these errors will be TypeError, ImportError,
>> etc., and trying to normalize those errors to some richer meaning is
>> unlikely to be done effectively (especially since error cases are hard to
>> test, since they are the things you weren't expecting).
>>
>
> Humans are potentially better at reading tracebacks than machines are, so
> my previous logging idea (script output stored and displayed to the
> administrator in a readable form) combined with a modicum of reasonable
> exception handling within the script should lead to fairly clear errors.
>

Deployers aren't very good at reading developer tracebacks, so it is kind of
nice if you at least have a sense of the stage.  One advantage to multiple
testing stages is that you might roll back before, e.g., having to deal with
database migrations.  But easy enough to skip for now.


> I'd like to see maybe an | operator, and a distinction between required and
>> optional services.  E.g.:
>>
>
>
> No need for some new operator, YAML already supports lists.
>
> services:
>        - [mysql, postgresql, dburl]
>
> Or:
>
> services:
>        required:
>                - files
>
>        optional:
>                - [mysql, postgresql]
>
>
>  And then there's a lot more you could do... which one do you prefer, for
>> instance.
>>
>
> The order of services within one of these lists would indicate preference,
> thus MySQL is preferred over PostgreSQL in the second example, above.


Sure
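To sketch how a container might resolve those ordered preference lists
(resolve_services is a made-up name; the shape of the entries follows the
YAML above, where an entry is either a service name or a most- to
least-preferred list):

```python
# Pick the first available service from each entry of the "services" config.
def resolve_services(wanted, available):
    """Map each requested service (or preference list) to a concrete one."""
    chosen = []
    for entry in wanted:
        options = entry if isinstance(entry, list) else [entry]
        for option in options:
            if option in available:
                chosen.append(option)
                break
        else:
            raise LookupError("no available service among %r" % (options,))
    return chosen
```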


>
>  Tricky things:
>> - You need something funny like multiple databases.  This is very
>> service-specific anyway, and there might sometimes need to be a way to
>> configure the service.  It's also a fairly obscure need.
>>
>
> I'm not convinced that connecting to a legacy database /and/ current
> database is that obscure.  It's also not as hard as Django makes it look
> (with a 1M SLoC change to add support)… WebCore added support in three
> lines.


Well, then you are getting into specific configurations fitting into legacy
environments, not containers and encapsulated applications.  There's nothing
that actually *stops* you from trying to connect to any database you want,
so ad hoc configuration can handle these requirements.  If you have a legacy
database you also aren't looking for the container to allocate you a
database.



>
>  - You need multiple applications to share data.  This is hard, not sure
>> how to handle it.  Maybe punt for now.
>>
>
> That's what higher-level APIs are for. ;)
>
>
>  You mean, the application provides its own HTTP server?  I certainly
>> wouldn't expect that...?
>>
>
> Nor would I; running an HTTP server would be daft.  Running mod_wsgi,
> FastCGI on-disk sockets, or other persistent connector makes far more sense,
> and is what I plan.
>
> Unless you have a very, very specific need (i.e. Tornado), running a Python
> HTTP server in production then HTTP proxying to it is inefficient and a
> terrible idea.  (Easy deployment model, terrible overhead/performance.)
>
>
>  Anyway, in terms of aggregate, I mean something like a "site" that is made
>> up of many "applications", and maybe those applications are interdependent
>> in some fashion.  That adds lots of complications, and though there's lots
>> of use cases for that I think it's easier to think in terms of apps as
>> building blocks for now.
>>
>
> That's not complicated at all; I do those types of aggregate sites fairly
> regularly.  E.g.
>
> / - CMS
> /location - Location & image database.
> /resource - Business database.
> /admin - Flex administration interface.
>
> That's done at the Nginx/Apache level, where it's most efficient to do so,
> not in Python.


Yes, and you could use your deployment tool to manage that.  E.g., with
Silver Lining you might do:

silver update http://mysite.com/ apps/cms
silver update http://mysite.com/location apps/location-image
silver update http://mysite.com/resource apps/business
silver update http://mysite.com/admin apps/flex-admin

And that's fine, but if you wanted to have them automatically know about
each other and perhaps interact, then you need something more.



>
>  Sure; these would be tool options, and if you set everything up you are
>> requiring the deployer to invoke the tools correctly to get everything in
>> place.  Which is a fine starting point before formalizing anything.
>>
>
> What?  Not even close—the person deploying an application is relying on the
> application server/service to configure the web server of choice; there is
> no need for deployer action after the initial "Nginx, include all .conf
> files from folder X" where folder X is managed by the app server.  (That's
> one line in /etc/nginx/nginx.conf.)


I'm not thinking about conf files.  I'm thinking about something like a login
app that you mount at /login that sets a signed cookie, and your main app needs
to know the proper place to redirect to, and both apps need a shared secret
for the signed cookie.  The two of them together, with a formalized
connection, is what I'm thinking of as an "aggregate" app.



>
>  Hm... I guess this is an ordering question.  You could import logging and
>> setup defaults, but that doesn't give the container a chance to overwrite
>> those defaults.  You could have the container setup logging, then make sure
>> the app sets defaults only when the container hasn't -- but I'm not sure if
>> it's easy to use the logging module that way.
>>
>
> The logging configuration, in dict form, is passed from the app server to
> the container.  The default logging levels are read by the app server from
> the container.  It's trivially easy, esp. when INI and YAML files can be
> programmatically created.


How does the app server pass it to the container?  Just point to the dict or
INI/YAML config in the app config?
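One plausible answer, sketched: the container takes the app's default logging
dict (from its config file), lays any overrides on top, and applies the
result once, early, via the stdlib's dictConfig (the function name and the
shallow merge are my assumptions, not anything agreed above):

```python
# Container-side sketch: app defaults plus container overrides -> dictConfig.
import logging.config

def configure_logging(app_defaults, container_overrides):
    """Apply the app's default logging dict, letting the container win.

    Note this is a shallow merge; a real container would likely merge
    nested keys (handlers, loggers) more carefully."""
    config = dict(app_defaults)
    config.update(container_overrides)  # container wins on top-level keys
    logging.config.dictConfig(config)
    return config
```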


>
>  Well, maybe that's not hard -- if you have something like
>> silvercustomize.py that is always imported, and imported fairly early on,
>> then have the container overwrite logging settings before it *does* anything
>> (e.g., sends a request) then you should be okay?
>>
>
> Indeed; container-setup.py or whatever.


That would be a different model from what I think you propose above -- the app
automatically sets up defaults always, and the container has a chance to
override those.


>
>  Rich configurations are problematic in their own ways.  While the
>> str-key/str-value of os.environ is somewhat limited, I wouldn't want
>> anything richer than JSON (list, dict, str, numbers, bools).
>>
>
> JSON is a subset of YAML.  I honestly believe YAML meets the requirements
> for richness, simplicity, flexibility, and portability that a configuration
> format really needs.
>
>
>  And then we have to figure out a place to drop the configuration.  Because
>> we are configuring the *process*, not a particular application or request
>> handler, a callable isn't great (unless we expect the callable to drop the
>> config somewhere and other things to pick it up?)
>>
>
> I've already mentioned an environment variable identifying the path to the
> on-disk configuration file—APP_CONFIG_PATH—which would then be read in and
> acted upon by the container-setup.py file which is initially imported before
> the rest of the application.  Also, the application factory idea of passing
> the already read-in configuration dictionary is quite handy, here.


I'm still unhappy with the indirection, and with serializing configuration
to a YAML file.  I think the container is always going to have to run its
own Python code to setup the environment, at which point having that Python
code write a YAML file and then set an environment variable to say where
it wrote it and then in the same process have the app load that YAML file...
I dunno.
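Spelled out, the roundtrip I'm objecting to looks roughly like this
(APP_CONFIG_PATH is the name proposed above; JSON stands in for YAML here
just to stay within the stdlib):

```python
# Container writes the config to disk and points an env var at it; the app
# (e.g. in its container-setup.py) reads the env var and loads it back --
# all within the same process.
import json
import os
import tempfile

def container_write_config(config):
    """Container side: serialize the config, point the env var at it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(config, f)
    os.environ["APP_CONFIG_PATH"] = path
    return path

def app_read_config():
    """App side: load whatever the container dropped."""
    with open(os.environ["APP_CONFIG_PATH"]) as f:
        return json.load(f)
```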



>
>  I found at least giving one valid hostname (and yes, should include a
>> path) was important for many applications.  E.g., a bunch of apps have
>> tendencies to put hostnames in the database.
>>
>
> Luckily, that's a bad habit we can discourage.  ;)


I would disagree with this on principle -- this format should support
applications as they are written now.  And in my experience most Django apps
need a hostname on process startup.

Anyway, it's not that big a deal -- with WSGI you only get a hostname on the
first request, but with a container I can't think of a situation where it
wouldn't know at least one reasonable and valid hostname at process startup
time.



>
>  I'm not psyched about pointing to a file, though I guess it could work --
>> it's another kind of peculiar
>> drop-the-config-somewhere-and-wait-for-someone-to-pick-it-up.  At least
>> dropping it directly in os.environ is easy to use directly (many things
>> allow os.environ interpolation already) and doesn't require any temporary
>> files.  Maybe there's a middle ground.
>>
>
> Picked up by the container-setup.py site-customize script.  What's the
> limit on the size of a variable in the environ?  (Also, that memory gets
> permanently allocated for the life of the application; not very efficient if
> we're just going to convert it to a rich internal structure.)


Realistically I always ended up setting os.environ['FOO'] = value, and then
something else read os.environ['FOO'].  An in-memory structure would be
pretty much equivalent.


>
>  :: Application (package) name.
>>
>> This doesn't seem meaningful to me -- there's no need for a one-to-one
>> mapping between these applications and a particular package.  Unless you
>> mean some attempt at a unique name that can be used for indexing?
>>
>
> You're mixing something up, here.  Each application is a single primary
> package with dependencies.  One container per application.


Well, here we're entering into the dependency disagreement, and maybe
something more.  To me an application is a way to setup the Python
environment, and a pointer to the WSGI application object.  Packages don't
enter into it at all.


>
>  It would also need a way to specify things like what port to run on
>>
>
> Automatically allocated by the app server.
>

Yes, that's what I was thinking.


>  public or private interface
>>
>
> Chosen by the deployer during deployment time configuration.


Generally it seems like a daemon might want either a public or a private
interface.  Celery needs a private interface; Tornado apps probably a
public one.


>
>  maybe indicate if something like what proxying is valid (if any)
>>
>
> If it's WSGI, it's irrelevant.  If it's a network service, it shouldn't be
> HTTP.
>

Again, Tornado or Twisted, which are typically used for things you don't
want to proxy (though e.g., Nginx proxying might be okay when Apache
proxying isn't).


>  maybe process management parameters
>>
>
> For WSGI apps, it's transparent.  Each app server would have its own
> preference (e.g. mine will prefer FastCGI on-disk sockets) and the
> application will be blissfully unaware of that.


Yes, but not for a daemon.  A WSGI app should be able to require threaded
(single-process) or multiprocess (no threads), though most would work with
either.


>
>  ways to inspect the process itself (since *maybe* you can't send internal
>> HTTP requests into it), etc.
>>
>
> Interesting idea, not sure how that would be implemented or used, though.
>

Monitoring.


>  PHP! ;)
>>
>
> PHP can be deployed as a WSGI application.  :P
>
>
>  I'm not personally that happy with how App Engine does it, as an example
>> -- it requires a regex-based dispatch.
>>
>
> Regex dispatch is terrible.  (I've actually encountered Python's 56KiB
> regular expression size limit on one project!)  Simply exporting folders as
> "top level" webroots would be sufficient, methinks.


Having / be a static file is also nice (good for Javascript/RPC-backend apps
too), but doesn't work well with webroots.

Silver Lining's writable-root service is fairly closely integrated with
static files.  The... name is weird now that I look at it.  But anyway, it's
a space where the app can write static files, and those static files get
preference.  I found it to be a nice feature to have available, but pretty
closely coupled with everything else.


>
>
>> Anything "string-like" or otherwise fancy requires more support libraries
>> for the application to actually be able to make use of the environment.
>> Maybe necessary, but it should be done with great reluctance IMHO.
>>
>
> I've had great success with string-likes in WebCore/Marrow and TurboMail
> for things like e-mail address lists, e-mail addresses, and URLs.


It's not that I don't think they could be useful or convenient *now*, but
how they develop over time -- this container format should aspire to be
stable and conservative fairly quickly, which means keeping things simple
and relying on applications to use support libraries if they want something
more convenient (not unlike WSGI).  Applications can then use and upgrade
these support libraries at their own convenience.

Services are in particular an issue, as each container will have to
reimplement much of the service code on its own.  Though maybe with good
abstract base classes it wouldn't be too hard.

  Ian

