[Web-SIG] The rewritten WSGI pre-PEP
Ian Bicking
ianb at colorstudy.com
Wed Aug 11 08:42:22 CEST 2004
It looks great to me. Of course, I got all my wishes. A couple smaller
things, and some possible clarifications:
Phillip J. Eby wrote:
> Specification Overview
> ======================
>
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side. The server side invokes a callable
> object that is provided by the application side. The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object. Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.
>
> The application object is simply a callable object that accepts
> two arguments. The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object. Here are two example application
> objects; one is a function, and the other is a class::
>
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')
The callables are a little confusing to me. The application is a
callable; so is start_response, and it returns yet another callable. Of
course, if it weren't a callable, it would be an object with only one
method, which is kind of boring.
A counterexample would be iterators, which have basically one method in
their interface (next), yet they are not simply callables.
I don't have a strong opinion, but the callables definitely make the
interface harder to understand.
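To make the chain of callables concrete, here is a minimal sketch of a hypothetical test driver (`run_app` is an illustrative name, not part of the draft) playing the server side: it hands the application a `start_response` that records the status and headers and returns a `write` callable. It assumes, as in the quoted draft, that `start_response` returns the writer; treating a non-None return value as an iterable of body strings is also an assumption.

```python
import io


def simple_app(environ, start_response):
    """The quoted example application (callable #1)."""
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    write = start_response(status, headers)  # callable #2 returns callable #3
    write('Hello world!\n')                  # callable #3: the body writer


def run_app(app, environ):
    """Hypothetical driver standing in for the server side of the draft API."""
    body = io.StringIO()
    response = {}

    def start_response(status, headers):
        response['status'] = status
        response['headers'] = headers
        return body.write  # give the application a write() callable

    result = app(environ, start_response)
    if result is not None:
        # Assumption: an application may instead return an iterable of strings.
        for chunk in result:
            body.write(chunk)
    return response['status'], response['headers'], body.getvalue()


status, headers, text = run_app(simple_app, {})
```

Run against the quoted `simple_app`, this yields the status `'200 OK'` and the body `'Hello world!\n'`. An object-with-one-method design would replace each callable with, say, a `write` method on a response object; the data flow would be the same.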
> ``environ`` Variables
> ---------------------
>
> The ``environ`` dictionary is required to contain CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_. In addition, it must contain the following WSGI-defined
> variables:
>
> ==================== =============================================
> Variable             Value
> ==================== =============================================
> ``wsgi.version``     The string ``"1.0"``
Would it make sense for this to be a tuple such as (1, 0), in the style
of sys.version_info?
> ``wsgi.input``       An input stream from which the HTTP request
>                      body can be read.
>
> ``wsgi.errors``      An output stream to which error output can
>                      be written. For most servers, this will be
>                      the server's error log.
>
> ``wsgi.multithread`` This value should be true if the application
>                      object may be simultaneously invoked by
>                      another thread in the same process, and
>                      false otherwise.
>
> ``wsgi.multiprocess`` This value should be true if an equivalent
>                      application object may be simultaneously
>                      invoked by another process, and false
>                      otherwise.
> ==================== =============================================
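On the tuple suggestion for ``wsgi.version``: a small sketch of why a tuple is easier to check against. The helper name `supports` and the `(0, 0)` default for a missing key are hypothetical, and the sketch assumes the tuple form rather than the draft's string.

```python
def supports(environ, minimum):
    """Hypothetical check: is the server's wsgi.version at least `minimum`?

    Assumes wsgi.version is a tuple of ints such as (1, 0), not the
    draft's string "1.0". A missing key is treated as (0, 0).
    """
    return tuple(environ.get('wsgi.version', (0, 0))) >= tuple(minimum)
```

Tuples compare element by element, so `(1, 10) > (1, 9)` holds, while the strings `'1.10'` and `'1.9'` compare character by character and order the wrong way round (`'1.10' < '1.9'` is true). The same element-wise comparison is what makes checks like `sys.version_info >= (2, 3)` work.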
Another useful variable, which I brought up last time, would be some
indication that the application is definitely not going to be reused,
i.e., that it's being invoked in a CGI context. The performance concerns
there are completely different from those in other environments.
Webware has a CGI interface, but it suffers from being really slow. It
could be faster, but everything is optimized for the long-running case.
I think CGI could be made to perform better; passing along information
that tells the application when to apply those optimizations would leave
that door open.
Another common use case is sessions. It's best to preserve sessions
across server restarts, but you might keep sessions in memory and only
write them to disk when the server shuts down. If it's a CGI request,
you can skip all that and just write to disk immediately.
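The session case could look roughly like this. It is a sketch only: the environ key ``wsgi.run_once`` is an assumed name for the "not going to be reused" flag proposed above, and `SessionStore` is a hypothetical class storing sessions as JSON.

```python
import json
import os
import tempfile


class SessionStore:
    """Hypothetical session store that defers disk writes when it can."""

    def __init__(self, path):
        self.path = path
        self.sessions = {}
        if os.path.exists(path):
            with open(path) as f:
                self.sessions = json.load(f)

    def save(self, environ, session_id, data):
        self.sessions[session_id] = data
        # 'wsgi.run_once' is an assumed key: true when this process will
        # serve only the current request, as under CGI.
        if environ.get('wsgi.run_once', False):
            self.flush()  # no shutdown hook will run later, so write now

    def flush(self):
        """Write every session to disk; a long-running server would call this at shutdown."""
        with open(self.path, 'w') as f:
            json.dump(self.sessions, f)


# Usage sketch: a long-running process defers the write; a CGI-style one does not.
store = SessionStore(os.path.join(tempfile.mkdtemp(), 'sessions.json'))
store.save({'wsgi.run_once': False}, 'a', {'visits': 1})  # kept in memory only
store.save({'wsgi.run_once': True}, 'b', {'visits': 2})   # written immediately
```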
> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
> (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
I think we discussed earlier being explicit about a couple of variables:
SCRIPT_NAME should refer to the application's root, and PATH_INFO to
everything that comes after it. This is in contrast to a situation where
SCRIPT_NAME points to the WSGI server and PATH_INFO includes the path to
the application (in a case where the server hosts multiple applications
at different URLs). Your CGI example avoids this issue because it only
supports one application, but a naive extension of that example to
support more applications might set these variables improperly.
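A minimal sketch of doing this correctly when one server hosts several applications: the matched mount prefix is moved from PATH_INFO onto SCRIPT_NAME before the application is called. `make_dispatcher` is a hypothetical name, and the 404 response uses the draft's `start_response`-returns-`write` style.

```python
def make_dispatcher(apps):
    """Hypothetical server-side dispatcher for apps mounted at URL prefixes.

    `apps` maps a prefix such as '/blog' to an application object.
    """
    # Try longer prefixes first so '/blog/admin' wins over '/blog'.
    prefixes = sorted(apps, key=len, reverse=True)

    def dispatch(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix in prefixes:
            if path == prefix or path.startswith(prefix + '/'):
                environ = dict(environ)  # don't mutate the caller's dict
                # Shift the matched prefix from PATH_INFO onto SCRIPT_NAME,
                # so SCRIPT_NAME ends exactly at the application's root.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return apps[prefix](environ, start_response)
        write = start_response('404 Not Found', [('Content-type', 'text/plain')])
        write('Not found\n')

    return dispatch
```

For a request to `/server.cgi/blog/2004/08` with the blog mounted at `/blog`, the blog application sees SCRIPT_NAME `/server.cgi/blog` and PATH_INFO `/2004/08`, so URLs it generates from SCRIPT_NAME point back at itself rather than at the server.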
Should there be any policy about path segments containing //, ./, or ../?
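One possible policy, sketched as a hypothetical `normalize_path` helper: collapse empty and `.` segments, resolve `..` without ever climbing above the root, and keep a trailing slash. Python's `posixpath.normpath` is close, but it drops trailing slashes and leaves a leading `//` untouched, which is why this sketch does the work by hand. Whether a server should apply such a policy at all is exactly the open question.

```python
def normalize_path(path):
    """Hypothetical policy: collapse '//' and '.', resolve '..' safely.

    '..' never climbs above the root, and a trailing slash is kept because
    '/dir/' and '/dir' can mean different things to an application.
    """
    if not path:
        return path  # an empty PATH_INFO stays empty
    segments = []
    for seg in path.split('/'):
        if seg in ('', '.'):
            continue
        if seg == '..':
            if segments:
                segments.pop()
            continue
        segments.append(seg)
    result = '/' + '/'.join(segments)
    if path.endswith('/') and segments:
        result += '/'
    return result
```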
Hmm... what should the server do if it gets a Location header with no
Status? I think Apache does an internal redirect, sometimes. Should
there be any notion of an internal redirect? The CGI spec seems to
require internal redirects in this case.
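For illustration only, here is how a server or middleware might emulate CGI-style internal redirects. Since start_response always receives a status in this draft, the sketch keys only on a Location value that is a local path, and it assumes the application is mounted at the root so that path can be used directly as PATH_INFO. `with_internal_redirects` and `max_redirects` are hypothetical names; status and request-method handling are deliberately left out.

```python
def with_internal_redirects(app, max_redirects=5):
    """Hypothetical middleware emulating CGI-style local redirects.

    If the application answers with a Location header holding a local path
    (starting with a single '/'), the response is discarded and the app is
    called again for that path; otherwise the buffered response is sent on.
    """
    def wrapper(environ, start_response):
        for _ in range(max_redirects + 1):
            response = {}
            body = []

            def capture(status, headers):
                response['status'] = status
                response['headers'] = headers
                return body.append  # buffer writes until we know what to do

            result = app(environ, capture)
            if result is not None:
                body.extend(result)
            headers = dict((k.lower(), v) for k, v in response['headers'])
            location = headers.get('location', '')
            if not location.startswith('/') or location.startswith('//'):
                write = start_response(response['status'], response['headers'])
                for chunk in body:
                    write(chunk)
                return None
            # Local redirect: re-run the application for the new path.
            path, _, query = location.partition('?')
            environ = dict(environ, PATH_INFO=path, QUERY_STRING=query)
        raise RuntimeError('too many internal redirects')

    return wrapper
```

Buffering the whole response is the price of this approach: the wrapper cannot know whether to pass output through until the application has set its headers.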
The CGI spec says servers should change the current working directory to
the directory containing the resource being run. I don't think this will
be common for WSGI servers, though.
I wonder if this will be an issue with imports. Specifically, relative
imports. Eh, I guess that's an application issue.
Will GATEWAY_INTERFACE be defined? If so, with what value, "WSGI/1.0"? I
assume SERVER_SOFTWARE will be up to the WSGI server. Should nested
servers be sure to rewrite this value? E.g., should your CGI example
rewrite it? The convention seems to be that each piece appends its own
name in the format "name/version_number", where the name has no spaces,
optionally followed by more information in parentheses, which may
contain spaces. Maybe this should be a suggestion.
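If that convention were adopted as a suggestion, a nested gateway might extend SERVER_SOFTWARE along these lines. This is a sketch: `add_server_software` and the product names in the usage below are hypothetical.

```python
import re

# One product token, "name/version", with no spaces, slashes or parentheses.
PRODUCT_TOKEN = re.compile(r'[^\s/()]+/[^\s/()]+')


def add_server_software(environ, name, version, comment=None):
    """Hypothetical helper: append this gateway's token to SERVER_SOFTWARE.

    Each nested layer adds "name/version", optionally followed by a
    parenthesized comment, after whatever the outer server already set.
    """
    token = '%s/%s' % (name, version)
    if not PRODUCT_TOKEN.fullmatch(token):
        raise ValueError('name and version may not contain spaces, "/" or parentheses')
    if comment:
        token += ' (%s)' % comment
    existing = environ.get('SERVER_SOFTWARE', '')
    environ['SERVER_SOFTWARE'] = (existing + ' ' + token).strip()
    return environ
```

For example, a CGI gateway running under a server that reports `Apache/1.3.31` would turn that into `Apache/1.3.31 WSGI-CGI/0.1 (experimental)` after calling `add_server_software(environ, 'WSGI-CGI', '0.1', 'experimental')`.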
Is there any non-parsed-header (NPH) form, where the application sends
the complete raw HTTP response itself? This would be difficult to
support in some environments: easy with BaseHTTPServer, but hard with a
CGI server.
This is from the CGI spec:
    Scripts MUST be prepared to handled URL-encoded values in
    metavariables. In addition, they MUST recognise both "+" and
    "%20" in URL-encoded quantities as representing the space
    character. (See section 3.1.)
That seems weird; I've never URL-decoded values besides QUERY_STRING.
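For QUERY_STRING at least, Python's standard library already follows the "+"-or-"%20" rule. This snippet uses Python 3's `urllib.parse`; older Pythons had `unquote_plus` in the `urllib` module.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# unquote_plus() decodes both "+" and "%20" to a space, as the CGI draft
# asks; plain unquote() leaves a literal "+" alone.
plus_form = unquote_plus('John+Smith')      # 'John Smith'
percent_form = unquote_plus('New%20York')   # 'New York'
literal_plus = unquote('John+Smith')        # 'John+Smith'

# parse_qs() applies the same rule when splitting a query string.
fields = parse_qs('name=John+Smith&city=New%20York')
```

Applying `unquote_plus` to other metavariables such as PATH_INFO would be wrong in general, since a literal "+" is legitimate in a path, which is presumably why decoding anything besides QUERY_STRING feels odd.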
The CGI spec doesn't seem to mention REQUEST_URI. That's surprising.
Here are the Apache CGI variables it doesn't mention:
SERVER_SIGNATURE (pretty boring)
SERVER_ADDR (seems very basic)
DOCUMENT_ROOT (doesn't seem appropriate)
SCRIPT_FILENAME (also often not appropriate)
SERVER_ADMIN (boring)
SCRIPT_URI
REQUEST_URI (I don't understand the distinction)
REMOTE_PORT (boring, though I guess if you wanted to add an ident check
it would be useful)
UNIQUE_ID (not needed)
I think SERVER_ADDR and REMOTE_PORT are easy to add, and potentially
useful. SCRIPT_URI and REQUEST_URI might be good.
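Without REQUEST_URI, an application can only approximate it from the standard variables, as in this hypothetical `request_uri` helper. Because SCRIPT_NAME and PATH_INFO arrive URL-decoded, re-quoting them need not reproduce the exact bytes the client sent. A server-supplied REQUEST_URI, which in Apache typically holds the path and query as they appeared in the request line, would avoid that loss.

```python
from urllib.parse import quote


def request_uri(environ, include_query=True):
    """Approximate REQUEST_URI from standard CGI variables (hypothetical helper).

    SCRIPT_NAME and PATH_INFO are URL-decoded, so re-quoting them may not
    reproduce the client's original encoding.
    """
    uri = quote(environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', ''))
    if include_query and environ.get('QUERY_STRING'):
        uri += '?' + environ['QUERY_STRING']
    return uri
```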
For middleware applications/servers, it might be suggested that they use
mod_rewrite's extra variables
(http://httpd.apache.org/docs/mod/mod_rewrite.html#EnvVar):
    This module keeps track of two additional (non-standard) CGI/SSI
    environment variables named SCRIPT_URL and SCRIPT_URI. These contain
    the logical Web-view to the current resource, while the standard CGI/SSI
    variables SCRIPT_NAME and SCRIPT_FILENAME contain the physical System-view.
    Notice: These variables hold the URI/URL as they were initially
    requested, i.e., before any rewriting. This is important because the
    rewriting process is primarily used to rewrite logical URLs to physical
    pathnames.
    Example:
    SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
    SCRIPT_FILENAME=/u/rse/.www/index.html
    SCRIPT_URL=/u/rse/
    SCRIPT_URI=http://en1.engelschall.com/u/rse/
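Applied to WSGI middleware, the suggestion might look like the sketch below: a hypothetical rewriting middleware records the as-requested URL in SCRIPT_URL and SCRIPT_URI before changing PATH_INFO, and leaves them alone if an outer layer already set them. The `rules` mapping and the host fallback (HTTP_HOST, else SERVER_NAME with the port ignored) are simplifying assumptions.

```python
from urllib.parse import quote


def rewrite_middleware(app, rules):
    """Hypothetical path-rewriting middleware that records the logical URL.

    `rules` maps an incoming PATH_INFO to the PATH_INFO the wrapped app
    should see. Before rewriting, the as-requested URL is stored in
    SCRIPT_URL and SCRIPT_URI, following the mod_rewrite convention,
    unless an outer layer has already set them.
    """
    def wrapper(environ, start_response):
        environ = dict(environ)
        path = environ.get('PATH_INFO', '')
        logical = quote(environ.get('SCRIPT_NAME', '') + path)
        environ.setdefault('SCRIPT_URL', logical)
        if 'SCRIPT_URI' not in environ:
            https = environ.get('HTTPS', 'off').lower() in ('on', '1')
            # Simplification: HTTP_HOST if present, else SERVER_NAME (port ignored).
            host = environ.get('HTTP_HOST') or environ.get('SERVER_NAME', 'localhost')
            environ['SCRIPT_URI'] = ('https' if https else 'http') + '://' + host + logical
        if path in rules:
            environ['PATH_INFO'] = rules[path]
        return app(environ, start_response)

    return wrapper
```

Because of the "set only if absent" rule, stacking several rewriting layers still leaves the innermost application with the URL the client originally asked for.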