[Web-SIG] Latest WSGI Draft

Ian Bicking ianb at colorstudy.com
Sun Aug 22 06:29:03 CEST 2004


Phillip J. Eby wrote:
> Once again, please pardon me if I missed an update, and gently remind me 
> with a clue by four if need be.  :)  Or better yet, by supplying a patch 
> implementing your suggested changes.  :)
> 
> I was going to post a diff, but even a unified diff is about as long as 
> the previous version was, and the new draft is almost 50% longer than 
> the old one, as lots of new material has been added about streaming, URL 
> determination, required CGI variables, etc. etc.  There's even some 
> extra material in the Rationale and Goals about using WSGI middleware to 
> better modularize frameworks, allowing more mix-and-match between them.
> 
> I think this is just about ready to submit as an official PEP, get a 
> numbering, and post to c.l.py and Python-Dev, but of course I could be 
> wrong.  Your feedback is appreciated.

I think it's ready as well.  I have only a couple small comments, which 
are mostly about language.  There's going to be more discussion later 
anyway, so why not get started with the second round.

> PEP: XXX

For some reason this got caught as spam.  I blame it on these triple Xs.

> Title: Python Web Server Gateway Interface v1.0
> Version: $Revision: 1.1 $
> Last-Modified: $Date: 2004/08/20 19:11:27 $
> Author: Phillip J. Eby <pje at telecommunity.com>
> Discussions-To: Python Web-SIG <web-sig at python.org>
> Status: Draft
> Type: Informational
> Content-Type: text/x-rst
> Created: 07-Dec-2003
> Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004
> 
> 
> Abstract
> ========
> 
> This document specifies a proposed standard interface between web
> servers and Python web applications or frameworks, to promote
> web application portability across a variety of web servers.
> 
> 
> Rationale and Goals
> ===================
> 
> Python currently boasts a wide variety of web application
> frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO,
> and Twisted Web -- to name just a few [1]_.  This wide variety
> of choices can be a problem for new Python users, because
> generally speaking, their choice of web framework will limit
> their choice of usable web servers, and vice versa.
> 
> By contrast, although Java has just as many web application
> frameworks available, Java's "servlet" API makes it possible
> for applications written with any Java web application framework
> to run in any web server that supports the servlet API.
> 
> The availability and widespread use of such an API in web
> servers for Python -- whether those servers are written in
> Python (e.g. Medusa), embed Python (e.g. mod_python), or
> invoke Python via a gateway protocol (e.g. CGI, FastCGI,
> etc.) -- would separate choice of framework from choice
> of web server, freeing users to choose a pairing that suits
> them, while freeing framework and server developers to focus
> on their area of specialty.
> 
> This PEP, therefore, proposes a simple and universal interface
> between web servers and web applications or frameworks: the
> Python Web Server Gateway Interface (WSGI).
> 
> But the mere existence of a WSGI spec does nothing to address the
> existing state of servers and frameworks for Python web applications.
> Server and framework authors and maintainers must actually implement
> WSGI for there to be any effect.
> 
> However, since no existing servers or frameworks support WSGI, there
> is little immediate reward for an author who implements WSGI support.
> Thus, WSGI *must* be easy to implement, so that an author's initial
> investment in the interface can be reasonably low.
> 
> Thus, simplicity of implementation on *both* the server and framework
> sides of the interface is absolutely critical to the utility of the
> WSGI interface, and is therefore the principal criterion for any
> design decisions.
> 
> Note, however, that simplicity of implementation for a framework
> author is not the same thing as ease of use for a web application
> author.  WSGI presents an absolutely "no frills" interface to the
> framework author, because bells and whistles like response objects
> and cookie handling would just get in the way of existing frameworks'
> handling of these issues.  Again, the goal of WSGI is to facilitate
> easy interconnection of existing servers and applications or
> frameworks, not to create a new web framework.
> 
> Note also that this goal precludes WSGI from requiring anything that
> is not already available in deployed versions of Python.  Therefore,
> new standard library modules are not proposed or required by this
> specification, and nothing in WSGI requires a Python version greater
> than 1.5.2.  (It would be a good idea, however, for future versions
> of Python to include support for this interface in web servers
> provided by the standard library.)

Like you said, maybe 1.5.2 is optimistic.  The spec works for 1.5.2, but 
most servers and applications will have higher requirements, and the 
iteration is annoying to handle in those versions.

> In addition to ease of implementation for existing and future
> frameworks and servers, it should also be easy to create request
> preprocessors, response postprocessors, and other WSGI-based
> "middleware" components that look like an application to their
> containing server, while acting as a server for their contained
> applications.
> 
> If middleware can be both simple and robust, and WSGI is widely
> available in servers and frameworks, it allows for the possibility
> of an entirely new kind of Python web application framework: one
> consisting of loosely-coupled WSGI middleware components.  Indeed,
> existing framework authors may even choose to refactor their
> frameworks' existing services to be provided in this way, becoming
> more like libraries used with WSGI, and less like monolithic
> frameworks.  This would then allow application developers to choose
> "best-of-breed" components for specific functionality, rather than
> having to commit to all the pros and cons of a single framework.
> 
> Of course, as of this writing, that day is doubtless quite far off.
> In the meantime, it is a sufficient short-term goal for WSGI to
> enable the use of any framework with any server.

That's an awfully pessimistic paragraph ;)

> Finally, it should be mentioned that the current version of WSGI
> does not prescribe any particular mechanism for "deploying" an
> application for use with a web server or server gateway.  At the
> present time, this is necessarily implementation-defined by the
> server or gateway.  After a sufficient number of servers and
> frameworks have implemented WSGI to provide field experience with
> varying deployment requirements, it may make sense to create
> another PEP, describing a deployment standard for WSGI servers and
> application frameworks.
> 
> 
> 
> Specification Overview
> ======================
> 
> The WSGI interface has two sides: the "server" or "gateway" side,
> and the "application" side.  The server side invokes a callable
> object that is provided by the application side.  The specifics
> of how that object is provided are up to the server or gateway.
> It is assumed that some servers or gateways will require an
> application's deployer to write a short script to create an
> instance of the server or gateway, and supply it with the
> application object.  Other servers and gateways may use
> configuration files or other mechanisms to specify where the
> application object should be imported from.

Maybe "gateway" is just distracting.

> The application object is simply a callable object that accepts
> two arguments.  The term "object" should not be misconstrued as
> requiring an actual object instance: a function, method, class,
> or instance with a ``__call__`` method are all acceptable for
> use as an application object.  Here are two example application
> objects; one is a function, and the other is a class::
> 
>     def simple_app(environ, start_response):
>         """Simplest possible application object"""
>         status = '200 OK'
>         headers = [('Content-type','text/plain')]
>         write = start_response(status, headers)
>         write('Hello world!\n')
> 
> 
>     class AppClass:
>         """Much the same thing, but as a class"""
> 
>         def __init__(self, environ, start_response):
>             self.environ = environ
>             self.start = start_response
> 
>         def __iter__(self):
>             status = '200 OK'
>             headers = [('Content-type','text/plain')]
>             self.start(status, headers)
> 
>             yield "Hello world!\n"
>             for i in range(1,11):
>                 yield "Extra line %s\n" % i

This second example confuses me.  Though as I reread it I realize more 
clearly what it's doing; __init__ is the callable (in essence), but self 
is automatically returned.  I think an instance with a __call__ method 
would be easier to understand.  OTOH, there's more concurrency overhead.
I dunno.  Anyway, that one confused me.
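
For comparison, a minimal sketch of the instance-with-``__call__`` 
variant.  The class name and the greeting argument are made up for 
illustration; nothing here comes from the draft.  It is written to run 
on a current Python, so the strings are ordinary ``str`` objects rather 
than the byte strings the draft has in mind.

```python
class HelloApp:
    """Application object as an instance with a __call__ method."""

    def __init__(self, greeting="Hello world!\n"):
        # Configuration happens once, when the instance is created
        self.greeting = greeting

    def __call__(self, environ, start_response):
        # Called once per request.  Per-request state lives in local
        # variables, so nothing on self changes between requests.
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        start_response(status, headers)
        return [self.greeting]


# The object handed to the server is the instance, not the class
application = HelloApp()
```

Since one instance is shared across requests, this only stays safe under 
threads if ``__call__`` avoids writing to ``self``.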

> The server or gateway invokes the application once for each request
> it receives from a web browser.  To illustrate, here is a simple
> CGI gateway, implemented as a function taking an application object
> (all error handling omitted)::
> 
>     import os, sys
> 
>     def run_with_cgi(application):
> 
>         environ = {}
>         environ.update(os.environ)
>         environ['wsgi.input']        = sys.stdin
>         environ['wsgi.errors']       = sys.stderr
>         environ['wsgi.version']      = (1,0)
>         environ['wsgi.multithread']  = False
>         environ['wsgi.multiprocess'] = True
>         environ['wsgi.last_call']    = True
> 
>         def start_response(status,headers):
>             print "Status:", status
>             for key,val in headers:
>                 print "%s: %s" % (key,val)
>             print   # blank line terminates the CGI header block
>             return sys.stdout.write
> 
>         result = application(environ, start_response)
>         if result:
>             try:
>                 for data in result:
>                     sys.stdout.write(data)
>             finally:
>                 if hasattr(result,'close'):
>                     result.close()
> 
> In the next section, we will specify the precise semantics that
> these illustrations are examples of.
> 
> 
> Specification Details
> =====================
> 
> The application object must accept two positional arguments.  For
> the sake of illustration, we have named them ``environ``, and
> ``start_response``, but they are not required to have these names.
> A server or gateway *must* invoke the application object using
> positional (not keyword) arguments.
> 
> The first parameter is a dictionary object, containing CGI-style
> environment variables.  

I think the spec is easier to understand if you use names here, i.e., 
"environ is a dictionary object".  Or remind the reader of the 
invocation, i.e., note application(environ, start_response) is called.

> This object *must* be a builtin Python
> dictionary (*not* a subclass, ``UserDict`` or other dictionary
> emulation), and the application is allowed to modify the dictionary
> in any way it desires.  The dictionary must also include certain
> WSGI-required variables (described in a later section), and may
> also include server-specific extension variables, named according
> to a convention that will be described below.
> 
> The second parameter is a callable accepting two positional
> arguments: a status string of the form ``"999 Message here"``,
> and a list of ``(header_name,header_value)`` tuples describing the
> HTTP response header.  This callable must return another callable
> that takes one parameter: a string to write as part of the HTTP
> response body.

"This callable must return a writing function: a function that takes a 
single string as an argument, which is written as the HTTP response body."

I guess "function" is more specific than "callable", but it seems easier 
to understand.  Though honestly, I find the CGI example the easiest way 
to understand this, so maybe being more accurate here is fine.
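
A stripped-down sketch of that relationship, writing to an in-memory 
buffer instead of stdout so it can be tried without a server.  The 
``make_start_response`` helper is hypothetical, and the ``Status:`` line 
just mimics the CGI example; the code assumes a current Python.

```python
import io


def make_start_response(out):
    """Build a start_response callable that writes to `out` (hypothetical helper)."""
    def start_response(status, headers):
        out.write("Status: %s\n" % status)
        for name, value in headers:
            out.write("%s: %s\n" % (name, value))
        out.write("\n")      # blank line ends the header block
        return out.write     # this is the "writing function"
    return start_response


def simple_app(environ, start_response):
    # The application gets the writing function back and pushes the body
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    write('Hello world!\n')


out = io.StringIO()
simple_app({}, make_start_response(out))
```

After the call, ``out.getvalue()`` holds the header lines, a blank line, 
and then the body.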

> The application object may return either ``None`` (indicating that
> there is no additional output), or it may return a non-empty
> iterable yielding strings.  (For example, it could be a
> generator-iterator that yields strings, or it could be a
> sequence such as a list of strings.)  The server or gateway will
> treat the strings yielded by the iterable as if they had been
> passed to the ``write()`` method.
> 
> Also, if the application returns an iterable, and the iterable has a
> ``close()`` method, the server or gateway *must* call that method
> upon completion of the current request, whether the request was
> completed normally, or terminated early due to an error.  (This is to
> support resource release by the application.  This protocol is
> intended to support PEP 325, and also the simple case of an
> application returning an open text file.)
> 
> 
> ``environ`` Variables
> ---------------------
> 
> The ``environ`` dictionary is required to contain these CGI environment
> variables, as defined by the Common Gateway Interface specification
> [2]_.  The following variables *must* be present, but *may* be an empty
> string, if there is no more appropriate value for them:
> 
>  * ``REQUEST_METHOD``
> 
>  * ``SCRIPT_NAME`` (The initial portion of the request URL's "path" that
>    corresponds to the application object, so that the application knows
>    its virtual "location".)
> 
>  * ``PATH_INFO`` (The remainder of the request URL's "path", designating
>    the virtual "location" of the request's target within the application)
> 
>  * ``QUERY_STRING``
> 
>  * ``CONTENT_TYPE``
> 
>  * ``CONTENT_LENGTH``
> 
>  * ``SERVER_NAME`` and ``SERVER_PORT`` (which, when combined with
>    ``SCRIPT_NAME`` and ``PATH_INFO``, should complete the

You forgot to finish your sentence.  Also SERVER_NAME is a fallback if 
HTTP_HOST isn't present; generally SERVER_NAME indicates the canonical 
host name, not necessarily the actual host name.

>  * Variables corresponding to the client-supplied HTTP headers (i.e.,
>    variables whose names begin with ``"HTTP_"``).
> 
> In general, a server or gateway should attempt to provide as many
> other CGI variables as are applicable, including e.g. the nonstandard
> SSL variables such as ``HTTPS=on``, if an SSL connection is in effect.
> However, an application that uses any variables other than the ones
> listed above is necessarily non-portable to web servers that do not
> support the relevant extensions.
> 
> A WSGI-compliant server or gateway *should* document what variables
> it provides, along with their definitions as appropriate.  Applications
> *should* check for the presence of any nonstandard variables they
> require, and have a fallback plan in the event such a variable is
> absent.
> 
> Note: missing variables (such as ``REMOTE_USER`` when no
> authentication has occurred) should be left out of the ``environ``
> dictionary.  Also note that CGI-defined variables must be strings,
> if they are present at all.  It is a violation of this specification
> for a CGI variable's value to be of any type other than ``str``.
> 
> In addition to the CGI-defined variables, the ``environ`` dictionary
> must also contain the following WSGI-defined variables:
> 
> =====================  ==============================================
> Variable               Value
> =====================  ==============================================
> ``wsgi.version``       The tuple ``(1,0)``, representing WSGI
>                        version 1.0.
> 
> ``wsgi.input``         An input stream from which the HTTP request
>                        body can be read.
> 
> ``wsgi.errors``        An output stream to which error output can
>                        be written.  For most servers, this will be
>                        the server's error log.
> 
> ``wsgi.multithread``   This value should be true if the application
>                        object may be simultaneously invoked by
>                        another thread in the same process, and
>                        false otherwise.
> 
> ``wsgi.multiprocess``  This value should be true if an equivalent
>                        application object may be simultaneously
>                        invoked by another process, and false
>                        otherwise.
> 
> ``wsgi.last_call``     This value should be true if this is expected
>                        to be the last invocation of the application
>                        in this process.  This is provided to allow
>                        applications to optimize their setup for
>                        long-running vs. short-running scenarios.
>                        This flag should normally only be true for
>                        CGI applications, or while a server is doing
>                        some kind of "graceful shutdown".  Note that
>                        a server or gateway is still allowed to invoke
>                        the application again; this flag is only
>                        a "suggestion" to the application that it is
>                        unlikely to be reinvoked.

wsgi.last_call seems too complicated from this.  Really, it's for CGI and 
nothing else.  Maybe just wsgi.cgi?  wsgi.run_once?  I think the 
semantics shouldn't be any more general than that.  Then we can also 
guarantee that it won't be called again.

> =====================  ==============================================
> 
> Finally, the ``environ`` dictionary may also contain server-defined
> variables.  These variables should be named using only lower-case
> letters, numbers, dots, and underscores, and should be prefixed with
> a name that is unique to the defining server or gateway.  For
> example, ``mod_python`` might define variables with names like
> ``mod_python.some_variable``.  This naming convention allows
> "middleware" components to safely filter out extensions that they
> do not understand.  (E.g. by deleting all keys from ``environ`` that
> are all-lowercase and do not begin with ``"wsgi."``.)
> 
> 
> Input and Error Streams
> ~~~~~~~~~~~~~~~~~~~~~~~
> 
> The input and error streams provided by the server must support
> the following methods:
> 
> ===================  ==========  ========
> Method               Files       Notes
> ===================  ==========  ========
> ``read(size)``       ``input``
> ``readline()``       ``input``   1
> ``readlines(hint)``  ``input``   2
> ``__iter__()``       ``input``
> ``flush()``          ``errors``  3
> ``write(str)``       ``errors``
> ``writelines(seq)``  ``errors``
> ===================  ==========  ========
> 
> The semantics of each method are as documented in the Python Library
> Reference, except for these notes as listed in the table above:
> 
> 1. The optional "size" argument to ``readline()`` is not supported,
>    as it may be complex for server authors to implement, and is not
>    often used in practice.
> 
> 2. Note that the ``hint`` argument to ``readlines()`` is optional for
>    both caller and implementer.  The application is free not to
>    supply it, and the server or gateway is free to ignore it.
> 
> 3. Since the ``errors`` stream may not be rewound, a container is
>    free to forward write operations immediately, without buffering.
>    In this case, the ``flush()`` method may be a no-op.  Portable
>    applications, however, cannot assume that output is unbuffered
>    or that ``flush()`` is a no-op.  They must call ``flush()`` if
>    they need to ensure that output has in fact been written.  (For
>    example, to minimize intermingling of data from multiple processes
>    writing to the same error log.)
> 
> The methods listed in the table above *must* be supported by all
> servers conforming to this specification.  Applications conforming
> to this specification *must not* use any other methods or attributes
> of the ``input`` or ``errors`` objects.  In particular, applications
> *must not* attempt to close these streams, even if they possess
> ``close()`` methods.
> 
> 
> The ``start_response()`` Callable
> ---------------------------------
> 
> The second parameter passed to the application object is itself a
> two-argument callable, used to begin the HTTP response and return
> a ``write()`` callable.  

"The second parameter passed to the application object (start_response) 
is a callable, used like ``start_response(status, headers)``.

The status argument is a string like "404 Not Found" or "200 OK".  This 
string must be pure 7-bit ASCII, containing no control characters, and 
not terminated with a return or linefeed.

The headers argument is a sequence of ``(header_name, header_value)`` 
tuples.  Each ``header_name`` must be a valid..." (and continuing on with 
your text).

Though I'm not clear what "folding" means.  I'm guessing you mean:

Header: blah
     continuing Header content

Does the HTTP spec care about folding?  Seems like a distraction to 
mention it.
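
Whatever the wording ends up being, the constraints on the two arguments 
are simple enough to check mechanically.  A rough sketch, assuming a 
current Python; the function name is made up, and the header-name check 
is only an approximation of "valid HTTP header name".

```python
def check_start_response_args(status, headers):
    """Raise ValueError if status or headers break the rules quoted above."""
    # Status must be 7-bit ASCII with no control characters (so no CR/LF)
    for ch in status:
        if not 32 <= ord(ch) <= 126:
            raise ValueError("bad character in status: %r" % (status,))
    # Status has the form "999 Message here"
    code, sep, message = status.partition(' ')
    if not (len(code) == 3 and code.isdigit() and message):
        raise ValueError("malformed status: %r" % (status,))
    for name, value in headers:
        # No trailing colon or other punctuation on the header name
        # (a rough check: the name must end in a letter or digit)
        if not name or not name[-1].isalnum():
            raise ValueError("bad header name: %r" % (name,))
        # Values must be unfolded: no carriage returns or linefeeds
        if '\r' in value or '\n' in value:
            raise ValueError("folded header value: %r" % (value,))
```

A value written across two lines, like the ``Header: blah`` example 
above, would be rejected by the last check.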

> The first parameter the ``start_response()``
> callable takes is a "status" string, of the form ``"999 Message here"``,
> where ``999`` is replaced with the HTTP status code, and ``Message here``
> is replaced with the appropriate message text.  The string *must* be
> pure 7-bit ASCII, containing no control characters.  In particular,
> it must not be terminated with a carriage return or linefeed.
> 
> The second parameter accepted by the ``start_response()`` callable
> must be a sequence of ``(header_name,header_value)`` tuples.  Each
> ``header_name`` must be a valid HTTP header name, without a
> trailing colon or other punctuation.  Each ``header_value``
> *must not* include carriage returns or linefeeds: it should be a raw
> *unfolded* header value.  If the HTTP spec calls for folding of a
> particular header, the server shall be responsible for performing the
> folding.  (These requirements are to minimize the complexity of parsing
> required by servers, gateways, and intermediate response processors
> that need to inspect or modify response headers.)
> 
> In general, the server or gateway is responsible for ensuring that
> correct headers are sent to the client: if the application omits
> a needed header, the server or gateway *should* add it.  For example,
> the HTTP ``Date:`` and ``Server:`` headers would normally be supplied
> by the server or gateway.  If the application supplies a header that
> the server would ordinarily supply, or that contradicts the server's
> intended behavior (e.g. supplying a different ``Connection:`` header),
> the server or gateway *may* discard the conflicting header, provided
> that its action is recorded for the benefit of the application author.
> 
> 
> The ``write()`` Callable
> ------------------------
> 
> The return value of the ``start_response()`` callable is a one-argument
> ``write()`` callable, that accepts strings to write as part of the
> HTTP response body.
> 
> Note that the purpose of the ``write()`` callable is primarily to
> support existing application frameworks that support a streaming "push"
> API.  Therefore, strings passed to ``write()`` *must* be sent to the
> client *as soon as possible*; they must *not* be buffered unless the
> buffer will be emptied in parallel with the application's continuing
> execution (e.g. by a separate I/O thread).  If the server or gateway
> does not have a separate I/O thread available, it *must* finish
> writing the supplied string before it returns from each ``write()``
> invocation.
> 
> If the application returns an iterable, each string produced by the
> iterable must be treated as though it had been passed to ``write()``,
> with the data sent in an "as soon as possible" manner.  That is,
> the iterable should not be asked for a new string until the previous
> string has been sent to the client, or is buffered for such sending
> by a parallel thread.
> 
> Notice that these rules discourage the generation of content before a
> client is ready for it, in excess of the buffer sizes provided by the
> server and operating system.  For this reason, some applications may
> wish to buffer data internally before passing any of it to ``write()``
> or yielding it from an iterator, in order to avoid waiting for the
> client to catch up with their output.  This approach may yield better
> throughput for dynamically generated pages of moderate size, since the
> application is then freed for other tasks.
> 
> In addition to improved performance, buffering all of an application's
> output has an advantage for error handling: the buffered output can
> be thrown away and replaced by an error page, rather than dumping an
> error message in the middle of some partially-completed output.  For
> this and other reasons, many existing Python frameworks already
> accumulate their output for a single write, unless the application
> explicitly requests streaming, or the expected output is larger than
> practical for buffering (e.g. multi-megabyte PDFs).  So, these
> application frameworks are already a natural fit for the WSGI
> streaming model: for most requests they will only call ``write()``
> once anyway!
> 
> 
> Implementation/Application Notes
> ================================
> 
> 
> Unicode
> -------
> 
> HTTP does not directly support Unicode, and neither does this
> interface.  All encoding/decoding must be handled by the application;
> all strings and streams passed to or from the server must be standard
> Python byte strings, not Unicode objects.  The result of using a
> Unicode object where a string object is required, is undefined.
> 
> 
> Multiple Invocations
> --------------------
> 
> Application objects must be able to be invoked more than once, since
> virtually all servers/gateways will make such requests.
> 
> 
> Error Handling
> --------------
> 
> Servers *should* trap and log exceptions raised by
> applications, and *may* continue to execute, or attempt to shut down
> gracefully.  Applications *should* avoid allowing exceptions to
> escape their execution scope, since the result of uncaught exceptions
> is server-defined.
> 
> 
> Thread Support
> --------------
> 
> Thread support, or lack thereof, is also server-dependent.
> Servers that can run multiple requests in parallel, *should* also
> provide the option of running an application in a single-threaded
> fashion, so that applications or frameworks that are not thread-safe
> may still be used with that server.
> 
> 
> URL Reconstruction
> ------------------
> 
> If an application wishes to reconstruct a request's complete URL,
> it may do so using the following algorithm, contributed by Ian
> Bicking::
> 
>     if environ.get('HTTPS') == 'on':
>         url = 'https://'
>     else:
>         url = 'http://'
> 
>     if environ.get('HTTP_HOST'):
>         url += environ['HTTP_HOST']
>     else:
>         url += environ['SERVER_NAME']
> 
>     if environ.get('HTTPS') == 'on':
>         if environ['SERVER_PORT'] != '443':
>            url += ':' + environ['SERVER_PORT']
>     else:
>         if environ['SERVER_PORT'] != '80':
>            url += ':' + environ['SERVER_PORT']
> 
>     url += environ['SCRIPT_NAME']
>     url += environ['PATH_INFO']
>     if environ.get('QUERY_STRING'):
>         url += '?' + environ['QUERY_STRING']
> 
> Note that such a reconstructed URL may not be precisely the
> same URI as requested by the client.  Server rewrite rules, for
> example, may have modified the client's originally requested URL
> to place it in a canonical form.
> 
> 
> Application Configuration
> -------------------------
> 
> This specification does not define how a server selects or
> obtains an application to invoke.  These and other configuration
> options are highly server-specific matters.  It is expected that
> server/gateway authors will document how to configure the server to
> execute a particular application object, and with what options (such
> as threading options).
> 
> Framework authors, on the other hand, should document how to create
> an application object that wraps their framework's functionality.
> The user, who has chosen both the server and the application
> framework, must connect the two together.  However, since both the
> framework and the server now have a common interface, this should
> be merely a mechanical matter, rather than a significant engineering
> effort for each new server/framework pair.
> 
> 
> Middleware
> ----------
> 
> Note that a single object may play the role of a server with respect
> to some application(s), while also acting as an application with
> respect to some server(s).  Such "middleware" components can perform
> such functions as:
> 
>   * Routing a request to different application objects based on the
>     target URL, after rewriting the ``environ`` accordingly.
> 
>   * Allowing multiple applications or frameworks to run side-by-side
>     in the same process
> 
>   * Load balancing and remote processing, by forwarding requests and
>     responses over a network
> 
>   * Performing content postprocessing, such as applying XSL stylesheets
> 
> Given the existence of applications and servers conforming to this
> specification, the appearance of such reusable middleware becomes
> a possibility.
> 
> Middleware components that transform the request or response data
> should in general remove WSGI extension data from the ``environ``
> that the middleware does not understand, to prevent applications
> from inadvertently bypassing the middleware's mediation of the
> interaction by use of a server extension.  The simplest way to do
> this is to just delete keys from ``environ`` that are all lowercase
> and do not begin with ``"wsgi."``, before passing the ``environ``
> on to the application.

I don't understand this.  To me it seems more reasonable that middleware 
leave the extra arguments in place.

For instance, let's say I have a URL redirecting middleware.  There's a 
chance I need to look at the parsed form of QUERY_STRING, and I cache 
the result as a dictionary in, say, webkit.query_vars.  That's just as 
valid later.  Oh, well, unless someone rewrites QUERY_STRING.  So to be 
safe, I put the query string I parsed in webkit.query_string.

But maybe I have some other middleware that handles configuration.  It 
runs after the URL parser, for localized configuration.  It doesn't 
necessarily know about the query string, or about the other piece of 
middleware.  And it shouldn't know about it, because what would be the 
point of that?  They are decoupled.  But I don't want it throwing away 
that information.

In that case, it's just some lost time reparsing the URL, but I can 
imagine more important things, and a lot of pieces of middleware where 
the only point is that they add something to the environ dictionary. 
E.g., a session-handling middleware.  There's no point to these if 
other middleware is going to throw information away.

If there are reliability issues -- like middleware rewriting QUERY_STRING, 
but passing through a cached parse of the old QUERY_STRING that it 
didn't know about -- these can be handled pretty easily.  But if one 
middleware throws away keys it doesn't know about, it messes up the 
whole stack.
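
To make that concrete, a sketch of two decoupled pieces of middleware 
that each add a key and leave everything else alone, plus the cheap check 
that guards against a rewritten QUERY_STRING.  The ``webkit.*`` keys are 
the ones from the example above; ``example.config`` and all the function 
names are made up.  It assumes a current Python (``urllib.parse``).

```python
from urllib.parse import parse_qs


def query_cache_middleware(app):
    """Cache the parsed query string alongside the string it came from."""
    def wrapper(environ, start_response):
        qs = environ.get('QUERY_STRING', '')
        environ['webkit.query_string'] = qs
        environ['webkit.query_vars'] = parse_qs(qs)
        return app(environ, start_response)
    return wrapper


def config_middleware(app):
    """Add configuration; knows nothing about the query cache."""
    def wrapper(environ, start_response):
        environ['example.config'] = {'debug': False}
        # Unknown keys are passed through untouched
        return app(environ, start_response)
    return wrapper


def cached_query_vars(environ):
    """Reuse the cached parse only if QUERY_STRING hasn't been rewritten."""
    qs = environ.get('QUERY_STRING', '')
    if environ.get('webkit.query_string') == qs:
        return environ['webkit.query_vars']
    return parse_qs(qs)
```

Stacked as ``query_cache_middleware(config_middleware(app))``, the 
application sees both additions.  If ``config_middleware`` deleted keys 
it didn't recognize, the cache would be gone before anyone could use it.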

> HTTP 1.1 Expect/Continue
> ------------------------
> 
> Servers and gateways *must* provide transparent support for HTTP 1.1's
> "expect/continue" mechanism, if they implement HTTP 1.1.  This may be
> done in any of several ways:
> 
>  1. Reject all client requests containing an ``Expect: 100-continue``
>     header with a "417 Expectation failed" error.  Such requests will
>     not be forwarded to an application object.
> 
>  2. Respond to requests containing an ``Expect: 100-continue`` request
>     with an immediate "100 Continue" response, and proceed normally.
> 
>  3. Proceed with the request normally, but provide the application with
>     a ``wsgi.input`` stream that will send the "100 Continue" response
>     if/when the application first attempts to read from the input
>     stream.  The read request must then remain blocked until the client
>     responds.
> 
> Note that this behavior restriction does not apply for HTTP 1.0 requests,
> or for requests that are not directed to an application object.  For more
> information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3
> and 10.1.1.
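
Option 3 might look roughly like the wrapper below.  This is only a 
sketch under assumptions: ContinueOnReadInput and the send_continue 
callback are invented names, and a real server would have to write the 
"100 Continue" status line to the client socket inside that callback. 
The blocking until the client responds would simply be the underlying 
stream's read.

```python
class ContinueOnReadInput:
    """Wrap a request body stream; send "100 Continue" before the first read."""

    def __init__(self, body, send_continue):
        self._body = body                    # underlying file-like request body
        self._send_continue = send_continue  # hypothetical server callback
        self._sent = False

    def _ensure_continue(self):
        # Send the interim response at most once, on the first read attempt.
        if not self._sent:
            self._sent = True
            self._send_continue()

    def read(self, size=-1):
        self._ensure_continue()
        return self._body.read(size)

    def readline(self, size=-1):
        self._ensure_continue()
        return self._body.readline(size)
```

An application that never touches wsgi.input never triggers the 
interim response, which seems to be the attraction of this option.
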
> 
> 
> 
> 
> Questions and Answers
> =====================
> 
> 1. Why must ``environ`` be a dictionary?  What's wrong with using
>    a subclass?
> 
>    The rationale for requiring a dictionary is to maximize
>    portability between servers.  The alternative would be to define
>    some subset of a dictionary's methods as being the standard and
>    portable interface.  In practice, however, most servers will
>    probably find a dictionary adequate to their needs, and thus
>    framework authors will come to expect the full set of dictionary
>    features to be available, since they will be there more often
>    than not.  But, if some server chooses *not* to use a dictionary,
>    then there will be interoperability problems despite that
>    server's "conformance" to spec.  Therefore, making a dictionary
>    mandatory simplifies the specification and guarantees
>    interoperability.
> 
>    Note that this does not prevent server or framework developers
>    from offering specialized services as custom variables *inside*
>    the ``environ`` dictionary.  This is the recommended approach
>    for offering any such value-added services.
> 
> 2. Why can you call ``write()`` *and* yield strings/return an
>    iterator?  Shouldn't we pick just one way?
> 
>    If we supported only the iteration approach, then current
>    frameworks that assume the availability of "push" suffer.
>    But, if we only support pushing via ``write()``, then
>    server performance suffers for transmission of e.g. large
>    files (if a worker thread can't begin work on a new request
>    until all of the output has been sent).  Thus, this compromise
>    allows an application framework to support both approaches, as
>    appropriate, but with only a little more burden to the server
>    implementor than a push-only approach would require.
> 
> 3. What's the ``close()`` for?
> 
>    When writes are done during the execution of an application
>    object, the application can ensure that resources are released
>    using a try/finally block.  But, if the application returns an
>    iterator, any resources used will not be released until the
>    iterator is garbage collected.  The ``close()`` idiom allows
>    an application to release critical resources at the end of a
>    request, and it's forward-compatible with the support for
>    try/finally in generators that's proposed by PEP 325.
> 
> 4. Why is this interface so low-level?  I want feature X!  (e.g.
>    cookies, sessions, persistence, ...)
> 
>    This isn't Yet Another Python Web Framework.  It's just a way
>    for frameworks to talk to web servers, and vice versa.  If you
>    want these features, you need to pick a web framework that
>    provides the features you want.  And if that framework lets
>    you create a WSGI application, you should be able to run it
>    in most WSGI-supporting servers.  Also, some WSGI servers may
>    offer additional services via objects provided in their
>    ``environ`` dictionary; see the applicable server documentation
>    for details.  (Of course, applications that use such extensions
>    will not be portable to other WSGI-based servers.)
> 
> 5. Why use CGI variables instead of good old HTTP headers?  And
>    why mix them in with WSGI-defined variables?
> 
>    Many existing web frameworks are built heavily upon the CGI spec,
>    and existing web servers know how to generate CGI variables.  In
>    contrast, alternative ways of representing inbound HTTP information
>    are fragmented and lack market share.  Thus, using the CGI
>    "standard" seems like a good way to leverage existing
>    implementations.  As for mixing them with WSGI variables, separating
>    them would just require two dictionary arguments to be passed
>    around, while providing no real benefits.
> 
> 6. What about the status string?  Can't we just use the number,
>    passing in ``200`` instead of ``"200 OK"``?
> 
>    Doing this would complicate the server or gateway, by requiring
>    them to have a table of numeric statuses and corresponding
>    messages.  By contrast, it is easy for an application or framework
>    author to type the extra text to go with the specific response code
>    they are using, and existing frameworks often already have a table
>    containing the needed messages.  So, on balance it seems better to
>    make the application/framework responsible, rather than the server
>    or gateway.
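
Questions 2 and 3 are easier to follow with the two styles side by 
side.  The following is a toy sketch, not a conforming server: run() 
and FileBody are invented for illustration, headers are ignored, and 
the bodies are bytes because this is written for a current Python, 
where the draft speaks of strings.

```python
def push_app(environ, start_response):
    """"Push" style: output goes through the write() callable."""
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"pushed\n")
    return []


class FileBody:
    """"Pull" style body: an iterator with a close() hook for cleanup."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._chunks)

    def close(self):
        # Release critical resources (files, connections) here.
        self.closed = True


def pull_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return FileBody([b"pulled ", b"in chunks\n"])


def run(app, environ):
    """Toy server loop: collect the body from write() and from iteration."""
    out = []

    def start_response(status, headers):
        return out.append  # stands in for write()

    result = app(environ, start_response)
    try:
        for chunk in result:
            out.append(chunk)
    finally:
        # Call close() if present, without waiting for garbage collection.
        if hasattr(result, "close"):
            result.close()
    return b"".join(out)
```

The server side needs one loop and one hasattr() check to support 
both, which matches the "only a little more burden" claim in the 
answer to question 2.
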
> 
> 
> Acknowledgements
> ================
> 
> Thanks go to the many folks on the Web-SIG mailing list whose
> thoughtful feedback made this revised draft possible.  Especially:
> 
>  * Gregory "Grisha" Trubetskoy, author of ``mod_python``, who
>    beat up on the first draft as not offering any advantages
>    over "plain old CGI", thus encouraging me to look for a
>    better approach.
> 
>  * Ian Bicking, who helped nag me into properly specifying
>    the multithreading and multiprocess options, as well as
>    badgering me to provide a mechanism for servers to supply
>    custom extension data to an application.
> 
>  * Tony Lownds, who came up with the concept of a ``start_response``
>    function that took the status and headers, returning a ``write``
>    function.
> 
> 
> References
> ==========
> 
> .. [1] The Python Wiki "Web Programming" topic
>    (http://www.python.org/cgi-bin/moinmoin/WebProgramming)
> 
> .. [2] The Common Gateway Interface Specification, v 1.1, 3rd Draft
>    (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)
> 
> 
> Copyright
> =========
> 
> This document has been placed in the public domain.
> 
> 
> 
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    End:
> 

