[Web-SIG] The rewritten WSGI pre-PEP

Wed Aug 11 08:42:22 CEST 2004

It looks great to me.  Of course, I got all my wishes.  A couple smaller 
things, and some possible clarifications:

Phillip J. Eby wrote:
> Specification Overview
> ======================
> 
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side.  The server side invokes a callable
> object that is provided by the application side.  The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object.  Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.
> 
> The application object is simply a callable object that accepts
> two arguments.  The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object.  Here are two example application
> objects; one is a function, and the other is a class::
> 
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')

The callables are a little confusing to me.  The application is a 
callable.  Start_response is a callable.  It returns a callable.  Of 
course, if it wasn't a callable, it would be an object with only one 
method, which is kind of boring.

A contrary example to this would be iterators, which have basically one 
method in their interface (next); yet they are not simply callables.

I'm not of strong opinion, but the callables definitely make it harder 
to understand.
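
To make the three callables concrete, here is a rough sketch of how a server 
might drive the simple_app quoted above.  call_application and the 
list-collecting write are names I made up for illustration; they are not 
part of the pre-PEP, and a real server would write to a socket instead.

```python
def simple_app(environ, start_response):
    """The application from the draft: callable number one."""
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    # start_response is callable number two; it returns callable number three.
    write = start_response(status, headers)
    write('Hello world!\n')


def call_application(app, environ):
    """Hypothetical server-side driver that collects output in memory."""
    response = {'status': None, 'headers': None, 'body': []}

    def start_response(status, headers):
        response['status'] = status
        response['headers'] = headers
        # The "write" callable: here it just appends to a list.
        return response['body'].append

    app(environ, start_response)
    return response


result = call_application(simple_app, {})
```

Seen this way, write is the only thing start_response hands back, which is 
why it ends up as a bare callable and not an object with a single method.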

> ``environ`` Variables
> ---------------------
> 
> The ``environ`` dictionary is required to contain CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_.  In addition, it must contain the following WSGI-defined
> variables:
> 
> ====================   =============================================
> Variable               Value
> ====================   =============================================
> ``wsgi.version``       The string ``"1.0"``

Would it make sense for this to be a tuple such as (1, 0), in the style 
of sys.version_info?
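
The reason I ask is that tuples compare numerically while strings compare 
character by character.  A small sketch (parse_version is a hypothetical 
helper, not something in the draft):

```python
def parse_version(text):
    """Turn a dotted version string such as "1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


# As strings, "1.10" sorts before "1.9" because '1' < '9'.
string_says_older = "1.10" < "1.9"

# As tuples, (1, 10) correctly sorts after (1, 9).
tuple_says_older = parse_version("1.10") < parse_version("1.9")
```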

> ``wsgi.input``         An input stream from which the HTTP request
>                        body can be read.
> 
> ``wsgi.errors``        An output stream to which error output can
>                        be written.  For most servers, this will be
>                        the server's error log.
> 
> ``wsgi.multithread``   This value should be true if the application
>                        object may be simultaneously invoked by
>                        another thread in the same process, and
>                        false otherwise.
> 
> ``wsgi.multiprocess``  This value should be true if an equivalent
>                        application object may be simultaneously
>                        invoked by another process, and false
>                        otherwise.
> ====================   =============================================

Another useful one I brought up last time would be some indication that 
the application was definitely not going to be reused, i.e., it's being 
invoked in a CGI context.  The performance issues there are completely 
different than in other environments.

Webware has a CGI interface, but it suffers from being really slow.  It 
could be faster, but everything is optimized toward the long-running 
case.  I think CGI could be made to perform better, and giving the 
application enough information to know when to apply those optimizations 
would leave that door open.

Another common use case would be sessions.  It's best to preserve 
sessions over server restarts, but you might keep sessions in memory and 
only write to disk when the server shuts down.  If it's a CGI request, 
you can skip all that and just write to disk immediately.
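
As a sketch of what I mean, assuming the server sets some flag in environ 
when the process will not be reused.  The key name 'wsgi.run_once' is only 
a placeholder I picked (it is not in the table above), and SessionStore is a 
hypothetical class, not Webware's actual session code.

```python
import json
import os

RUN_ONCE_KEY = 'wsgi.run_once'  # hypothetical key, not in the draft's table


class SessionStore:
    """Hypothetical session store that picks a strategy from environ."""

    def __init__(self, path):
        self.path = path
        self.pending = {}  # sessions held in memory in the long-running case

    def save(self, environ, session_id, data):
        if environ.get(RUN_ONCE_KEY):
            # CGI-style: the process is about to exit, so persist right away.
            self._write_to_disk({session_id: data})
        else:
            # Long-running: defer the disk write until shutdown.
            self.pending[session_id] = data

    def shutdown(self):
        """Flush in-memory sessions; meant to be called at server shutdown."""
        if self.pending:
            self._write_to_disk(self.pending)
            self.pending = {}

    def _write_to_disk(self, sessions):
        stored = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                stored = json.load(f)
        stored.update(sessions)
        with open(self.path, 'w') as f:
            json.dump(stored, f)
```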

> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)

I think before we discussed being explicit about a couple variables. 
Specifically that SCRIPT_NAME should refer to the application's root, 
and PATH_INFO to everything that comes after.  This is in contrast to a 
situation where SCRIPT_NAME points to the WSGI server, and PATH_INFO to 
the application (in a case where the server hosts multiple applications 
at different URLs).  Your CGI example avoids this issue because it only 
supports one application, but a naive extension of that example to 
support more applications might improperly set these variables.
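
Here is roughly how I would expect a multi-application server to set these, 
as a sketch only.  make_dispatcher and the mounts mapping are hypothetical; 
the point is just that the matched prefix moves from PATH_INFO onto the end 
of SCRIPT_NAME before the application is called.

```python
def make_dispatcher(mounts):
    """Build an application that routes by URL prefix.

    `mounts` maps a prefix such as '/blog' to an application object.
    """
    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        # Try the longest prefixes first so '/blog/admin' beats '/blog'.
        for prefix in sorted(mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + '/'):
                child = dict(environ)
                # SCRIPT_NAME now names the application's root ...
                child['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                # ... and PATH_INFO is everything that comes after it.
                child['PATH_INFO'] = path[len(prefix):]
                return mounts[prefix](child, start_response)
        write = start_response('404 Not Found',
                               [('Content-type', 'text/plain')])
        write('Not found\n')
    return dispatcher
```

With the server itself at /cgi-bin/wsgi.cgi and a request for /blog/2004/08, 
the blog application would see SCRIPT_NAME=/cgi-bin/wsgi.cgi/blog and 
PATH_INFO=/2004/08.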

Should there be any policy about path segments containing //, ./, or ../?

Hmm... what should the server do if it gets a Location header with no 
Status?  I think Apache does an internal redirect, sometimes.  Should 
there be any notion of an internal redirect?  The CGI spec seems to 
require internal redirects in this case.

The CGI spec says servers should change the current working directory to 
the resource being run.  I think this won't be that common for WSGI 
servers, though.

I wonder if this will be an issue with imports.  Specifically, relative 
imports.  Eh, I guess that's an application issue.

Will GATEWAY_INTERFACE be defined?  If so, what value?  "WSGI/1.0"?  I 
assume SERVER_SOFTWARE will be up to the WSGI server.  Should they be 
sure to rewrite this value if these servers are nested?  E.g., should 
your CGI example rewrite that value?  It seems like each piece adds 
another name to the end in the format "name/version_number", where the 
name has no spaces.  And it might optionally have more information in 
parentheses after the version, which may contain spaces.  Maybe this 
should be a suggestion.
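
If it were a suggestion, each layer could do something like the following.  
This is a sketch of the format as I described it, not anything the draft 
specifies; append_server_software and the product names are made up.

```python
def append_server_software(environ, name, version, comment=None):
    """Append a "name/version (comment)" token to SERVER_SOFTWARE."""
    if ' ' in name:
        raise ValueError('product name must not contain spaces')
    token = '%s/%s' % (name, version)
    if comment:
        # The optional parenthesized part is allowed to contain spaces.
        token += ' (%s)' % comment
    existing = environ.get('SERVER_SOFTWARE', '')
    environ['SERVER_SOFTWARE'] = (existing + ' ' + token).strip()
    return environ['SERVER_SOFTWARE']
```

So a CGI gateway running under a server that reports "Apache/1.3.31 (Unix)" 
would pass along "Apache/1.3.31 (Unix) ExampleGateway/0.1" to the application.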

Is there any non-parsed header form?  This would be difficult to support 
in some environments.  Easy in BaseHTTPServer, but hard with a CGI server.

This is from the CGI spec:

    Scripts MUST be prepared to handle URL-encoded values in
    metavariables. In addition, they MUST recognise both "+" and
    "%20" in URL-encoded quantities as representing the space
    character. (See section 3.1.)

That seems weird; I've never URL-decoded values besides QUERY_STRING.
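
For reference, the decoding the spec describes is what unquote_plus does 
(urllib.parse in Python 3).  decode_metavariable is just a hypothetical 
wrapper; whether a given server has already decoded a particular variable is 
a separate question that this does not answer.

```python
from urllib.parse import unquote_plus


def decode_metavariable(value):
    """Decode %XX escapes and treat "+" as a space, per the quoted rule."""
    return unquote_plus(value)


# Both spellings of a space decode to the same string.
from_plus = decode_metavariable('hello+world')
from_percent = decode_metavariable('hello%20world')
```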

The CGI spec doesn't seem to mention REQUEST_URI.  That's surprising. 
Here's the Apache CGI variables it doesn't mention:

SERVER_SIGNATURE (pretty boring)
SERVER_ADDR (seems very basic)
DOCUMENT_ROOT (doesn't seem appropriate)
SCRIPT_FILENAME (also often not appropriate)
SERVER_ADMIN (boring)
SCRIPT_URI
REQUEST_URI (I don't understand the distinction)
REMOTE_PORT (boring, though I guess if you wanted to add an ident check 
it would be useful)
UNIQUE_ID (not needed)

I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially 
useful.  SCRIPT_URI and REQUEST_URI might be good.
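
By "easy to add" I mean something on this order, assuming the server has 
the connection's address pairs on hand (for an IPv4 socket, 
getsockname() and getpeername() return such (host, port) tuples).  
add_connection_vars is a hypothetical helper.

```python
def add_connection_vars(environ, local_address, peer_address):
    """Fill in SERVER_ADDR and REMOTE_PORT from (host, port) tuples."""
    environ['SERVER_ADDR'] = local_address[0]
    # CGI variables are strings, so the port number is converted.
    environ['REMOTE_PORT'] = str(peer_address[1])
    return environ
```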

For middleware applications/servers, it might be suggested that they use 
mod_rewrite's extra variables 
(http://httpd.apache.org/docs/mod/mod_rewrite.html#EnvVar):

    This module keeps track of two additional (non-standard) CGI/SSI
    environment variables named SCRIPT_URL and SCRIPT_URI. These contain
    the logical Web-view to the current resource, while the standard
    CGI/SSI variables SCRIPT_NAME and SCRIPT_FILENAME contain the
    physical System-view.

    Notice: These variables hold the URI/URL as they were initially
    requested, i.e., before any rewriting. This is important because the
    rewriting process is primarily used to rewrite logical URLs to
    physical pathnames.

    Example:

    SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
    SCRIPT_FILENAME=/u/rse/.www/index.html
    SCRIPT_URL=/u/rse/
    SCRIPT_URI=http://en1.engelschall.com/u/rse/
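
For a WSGI middleware, the equivalent might look like the sketch below.  
remember_original_url and its scheme parameter are hypothetical, and 
deriving the host from HTTP_HOST or SERVER_NAME is my assumption, not 
something mod_rewrite or the draft prescribes.

```python
def remember_original_url(app, scheme='http'):
    """Hypothetical middleware: record the URL as requested, pre-rewrite."""
    def middleware(environ, start_response):
        script_url = (environ.get('SCRIPT_NAME', '')
                      + environ.get('PATH_INFO', ''))
        host = environ.get('HTTP_HOST') or environ.get('SERVER_NAME', '')
        # setdefault keeps values already recorded by an outer layer, so
        # the variables keep describing the initial request.
        environ.setdefault('SCRIPT_URL', script_url)
        environ.setdefault('SCRIPT_URI',
                           '%s://%s%s' % (scheme, host, script_url))
        return app(environ, start_response)
    return middleware
```

Anything further down the stack can then rewrite SCRIPT_NAME and PATH_INFO 
freely while SCRIPT_URL and SCRIPT_URI still describe what the client asked 
for.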