[Web-SIG] Web Container Interface

Wed Jan 28 16:56:53 EST 2004

At 02:11 PM 1/28/04 -0600, Ian Bicking wrote:
>Portable Real Web Applications could also adapt their behavior depending 
>on how they were being run, for instance creating an abstraction that 
>cached data in module globals if available (and unlike sessions, that can 
>work in multi-process models), or wrote them to disk otherwise.

And those applications don't provide a way to configure that?

Honestly, my experience supporting applications that "adapt their behavior" 
without the user's input has been rather unpleasant.  I'm -1 on providing 
ways for developers to make their applications decide things based on what 
they *think* is going on, in the absence of narrowly and precisely defined 
options.

>In a long-running model, the framework also needs to know when the 
>environment is shutting down (so it can write data out to disk).  Maybe 
>atexit would be sufficient, I'm not sure (it's not what we use now).

And what do you do now when somebody does a "kill -9" on the process, or 
the machine reboots?  Python doesn't even guarantee that all objects in a 
process will be finalized during a *normal* exit, so how can any Python 
container guarantee a finalization notice?  I'd rather we didn't promise 
what isn't deliverable, or anything that starts blurring responsibilities 
between container and service.

>But why not provide that one little hook (a link to the gateway/container) 
>that would allow systems to develop greater portability?

If you define "portability" as "ability to run under new and 
never-before-seen containers", that hook would not only provide zero 
portability improvement, but it would also be an "attractive nuisance" 
encouraging people to write *non*-portable code that specifically looks 
for given containers.  Configuration and options should be explicit, not 
implicit.  "In the face of ambiguity, refuse the temptation to guess."

If the app or framework can choose its behaviors, let it make those options 
explicit, as part of its configuration.
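
To make that concrete, here is a rough sketch of the caching example done 
with an explicit option instead of container sniffing.  Everything in it 
(the MemoryCache and DiskCache classes, make_cache(), and the "cache" and 
"cache_dir" option names) is made up for illustration; none of it comes from 
Webware or from the PEP draft.

```python
# Hypothetical sketch: the cache strategy is chosen by explicit configuration,
# never inferred from whichever container happens to be running the app.
import os
import pickle


class MemoryCache:
    """Keeps values in a per-process dict (sensible only in one long-running process)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class DiskCache:
    """Pickles each value to its own file so several processes can share it.

    Keys are used directly as file names, so this sketch assumes simple keys.
    """

    def __init__(self, directory):
        self._dir = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self._dir, key + ".pickle")

    def get(self, key, default=None):
        try:
            with open(self._path(key), "rb") as f:
                return pickle.load(f)
        except FileNotFoundError:
            return default

    def set(self, key, value):
        with open(self._path(key), "wb") as f:
            pickle.dump(value, f)


def make_cache(config):
    """Build the cache named in config; refuse to guess if the option is missing."""
    kind = config.get("cache")
    if kind == "memory":
        return MemoryCache()
    if kind == "disk":
        return DiskCache(config["cache_dir"])
    raise ValueError("config must set 'cache' to 'memory' or 'disk', got %r" % (kind,))
```

The person deploying the app picks "memory" or "disk" because they know 
which process model they are running under; the app never has to ask the 
container.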

>>>And, to make it a little harder, we've often had requests to implement 
>>>memory-only sessions, to put unpickleable objects into the session. 
>>>Usually we just tell people to keep these values in module globals.
>>>But module globals are also unportable across environments.
>>But if your framework only supports a "long running, single-process" 
>>architecture, module globals would work just fine with any gateway that 
>>supports that.
>>Frankly, "multi-process only" and "short running" gateways are going to 
>>be in the minority anyway.  The only gateway I know of that's likely to 
>>*require* multiple processes is mod_python, and the only gateway that's 
>>likely to be "short running" is plain CGI.  So, it's not like requiring 
>>an "LR/SP" gateway is going to dramatically limit the choice of gateways 
>>for Webware.
>
>SkunkWeb also uses multiple processes, run in a separate space from Apache.

How does it communicate with Apache?  Does it *require* multiple processes, 
or *allow* them?

>>Obviously, the PEP needs to have examples of these process models added, 
>>and clarify the nature of the restrictions.  Who knows, maybe if we talk 
>>about this long enough maybe we'll be able to clarify the process models 
>>well enough to define a variable that services can expose to indicate 
>>their compatibility with various process models.
>
>I think that would be very useful.  Here's my list:
>
>Single process per request:
>* process ends with request (CGI, Webware OneShot)
>* process reused (mod_python, SkunkWeb)

I think you mean single request per process here.

>Multiple requests per process:
>* Asynchronous (implied to be single-threaded) (Twisted, Medusa, CherryPy, 
>BaseHTTPServer)
>* Threaded (Zope, Webware, CherryPy with different settings)

Actually, both Twisted and Zope's ZServer use an "async dispatcher in the 
main thread, requests can be processed in worker threads" model.

BaseHTTPServer isn't asynchronous, either, and with mixins can be threaded 
or forking.

Regarding most of the others you mention, I'm not knowledgeable enough to 
comment.

>Most asynch environments can be turned into threaded systems after 
>runCGI.  Webware, at least, is threaded at the point runCGI is called 
>(maybe to its detriment), but many systems are not (including Zope, I 
>think, and probably CherryPy).

I actually don't understand what you mean here.  But I'll try and tackle 
definitions for this in a subsequent PEP draft.

>I believe most other frameworks are built on mod_python, CGI, or FastCGI 
>so they are covered under these categories.  I think there might be a 
>separate threaded model for quixote, but I don't know if that portion has 
>its own name.  I'm not sure I understand FastCGI well enough to classify it.

A quick attempt to clarify what concepts we're dealing with...

* A "web server" is something that accepts HTTP connections

* A "gateway protocol" connects a "web server" to a "gateway"

* A "gateway protocol" may be in-process (e.g. if the server is written in 
Python or embeds Python) or use some kind of inter-process communication 
(pipes for CGI, sockets for FastCGI, etc.)

* If the gateway protocol is in-process, then the process model for the app 
is limited by the process model of the web server.

* If the gateway protocol is interprocess, then the process model for the 
app is determined by the process model of the gateway implementation.

* The basic process models for a server or gateway are:

   - preforking, serially reused processes  (e.g. mod_python, PEAK's 
multiprocess FastCGI runner, etc.)

   - "long running single process" (LRSP) (e.g. Twisted, ZServer, 
WSGIServer, any FastCGI runner under Apache if Apache is configured with 
maxClassProcesses=1, maybe AOLServer too?)

     + with threads

     + without threads

   - fork-on-demand, die-after-one-request (CGI)

Notice that the server's process model need not be the same as the 
gateway/container's process model, if the gateway protocol is 
interprocess.  Indeed, with Apache as the server, you can use any of the 
process models simply by selecting an appropriate gateway and gateway protocol.

Anyway, I think I've covered everything possible, except for maybe the idea 
of using multiple threads in multiple processes, which makes my head hurt.  :)

So, for short, I guess I'd call the process models "prefork", "LRSP-single" 
(LRSP without threads), "LRSP-multi" (LRSP with threads), and "fork-and-die" 
(FAD? SRMP?).  Those are just working terms for discussion; the PEP should 
of course use their full names/descriptions.
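
As a rough sketch of the "variable that services can expose" idea from 
earlier in the thread, the working terms could become an enumeration that a 
service lists its supported models from.  The attribute name process_models, 
the ModuleGlobalsService class, and require_compatible() are all 
hypothetical; nothing like them is in the PEP draft yet.

```python
# Hypothetical sketch of a service declaring which process models it supports.
import enum


class ProcessModel(enum.Enum):
    PREFORK = "prefork"            # preforking, serially reused processes
    LRSP_SINGLE = "LRSP-single"    # long-running single process, without threads
    LRSP_MULTI = "LRSP-multi"      # long-running single process, with threads
    FORK_AND_DIE = "fork-and-die"  # fork-on-demand, die-after-one-request


class ModuleGlobalsService:
    """Made-up service that keeps state in module globals, so it needs LRSP."""

    process_models = frozenset([ProcessModel.LRSP_SINGLE, ProcessModel.LRSP_MULTI])


def require_compatible(service, gateway_model):
    """Return the service if it declares support for gateway_model, else raise."""
    declared = getattr(service, "process_models", None)
    if declared is None:
        # No declaration at all: refuse to guess what the service can handle.
        raise ValueError("service does not declare its process models")
    if gateway_model not in declared:
        raise ValueError(
            "service does not support the %s process model" % gateway_model.value
        )
    return service
```

A gateway would call something like require_compatible() once at startup, 
so a mismatch shows up as a configuration error rather than as lost data 
later on.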

The most complicated one from a configuration point of view (IMO) is 
LRSP-multi.  I don't have much experience with developing in that 
environment, so it would be helpful if those who have could offer some 
thoughts.  The main options I'm aware of are:

* Gateway gets factory, instantiates service instance per worker thread, or 
on demand within configured parameters.  (Here, the gateway drives how many 
service instances there are.)

* Gateway gets a single service, which it calls from many threads.  The 
service handles everything on that side of the fence.  So here, the app 
side controls its threading.
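
The two options above might look roughly like this from the gateway's side. 
The class names and handle() are invented for the sketch, and runCGI() is 
simplified to take a request and return a response, which is an assumption 
rather than the signature in the PEP draft.

```python
# Hypothetical sketch of the two LRSP-multi options, seen from the gateway.
import threading


class FactoryGateway:
    """Option 1: the gateway gets a factory and makes one service per worker thread."""

    def __init__(self, service_factory):
        self._factory = service_factory
        self._local = threading.local()  # holds each thread's own service

    def handle(self, request):
        service = getattr(self._local, "service", None)
        if service is None:
            # First request on this thread: instantiate a service on demand.
            service = self._local.service = self._factory()
        return service.runCGI(request)


class SharedServiceGateway:
    """Option 2: the gateway gets one service and calls it from every thread."""

    def __init__(self, service):
        self._service = service

    def handle(self, request):
        # Any locking or pooling is the service's business, not the gateway's.
        return self._service.runCGI(request)
```

In the first class the gateway decides how many service instances exist; in 
the second it only ever knows about one object.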

I think that Twisted and ZServer may currently lean slightly towards the 
first model, but the second model seems more "portable" to me, in terms of 
being doable for multiple frameworks.  In theory, one could perhaps even 
run an "LRSP-single" app in an "LRSP-multi" gateway simply by having one's 
'runCGI()' acquire and release a global lock at entry and exit.  It also 
simplifies things from a container-configuration point of view, as there is 
only one service object to keep track of.
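
A minimal sketch of that global-lock idea, assuming a wrapper class (the 
name SerializedService is made up) around whatever object provides 
runCGI().  The arguments are passed straight through, since the draft's 
exact runCGI() signature doesn't matter here.

```python
# Hypothetical sketch: run a non-thread-safe ("LRSP-single") service inside a
# threaded ("LRSP-multi") gateway by serializing every call to runCGI().
import threading


class SerializedService:
    """Wraps a service so that only one thread at a time is inside runCGI()."""

    def __init__(self, service):
        self._service = service
        self._lock = threading.Lock()

    def runCGI(self, *args, **kwargs):
        # Acquire the lock at entry and release it at exit, even on error.
        with self._lock:
            return self._service.runCGI(*args, **kwargs)
```

The cost is that requests are handled one at a time, so this buys 
compatibility, not throughput.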