[Web-SIG] Draft PEP: WSGI 1.1

Dirkjan Ochtman dirkjan at ochtman.nl
Thu Apr 15 14:54:21 CEST 2010


Mostly taking Graham's list of issues and incorporating it into PEP 333.

Latest revision: http://hg.xavamedia.nl/peps/file/tip/wsgi-1.1.txt

Let's have comments here (comments in the form of diffs are
particularly welcome, of course). Remember, the idea is not to change
or improve WSGI itself right now, but only to tighten up the spec,
improving interoperability and enabling Python 3 support.

Graham, I hope I did a good job with your suggestions. (Since so much
of this is yours, I've just listed you as the second author.) I tried
to clarify exactly what you meant by "native strings"; can you check
that out?
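
To make the "native strings" question concrete, here is a minimal
sketch of the ISO-8859-1 round trip that the draft describes for CGI
values on Python 3.x. The helper names 'to_native' and 'redecode' are
made up for illustration and do not appear in the draft; the sketch
only assumes that the server receives raw bytes from the client.

```python
def to_native(raw: bytes) -> str:
    """Turn raw request bytes into a Python 3.x native string."""
    # ISO-8859-1 maps each byte 0-255 to exactly one code point,
    # so this decode cannot fail and loses no information.
    return raw.decode("iso-8859-1")


def redecode(value: str, encoding: str = "utf-8") -> str:
    """Recover the original bytes, then decode them with another encoding."""
    return value.encode("iso-8859-1").decode(encoding)


raw_path = "/caf\u00e9".encode("utf-8")   # bytes as sent by the client
native = to_native(raw_path)              # what a CGI value would hold

assert native.encode("iso-8859-1") == raw_path   # original bytes preserved
assert redecode(native) == "/caf\u00e9"          # decoded again as UTF-8
```

The application, not the server, picks the second encoding (UTF-8 is
only an assumption here), which is why the draft insists that the
first step preserves the original bytes.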

Cheers,

Dirkjan

--- pep-0333.txt	2010-04-15 14:46:02.000000000 +0200
+++ wsgi-1.1.txt	2010-04-15 14:51:39.000000000 +0200
@@ -1,114 +1,124 @@
-PEP: 333
-Title: Python Web Server Gateway Interface v1.0
+PEP: 0000
+Title: Python Web Server Gateway Interface 1.1
 Version: $Revision$
 Last-Modified: $Date$
-Author: Phillip J. Eby <pje at telecommunity.com>
+Author: Dirkjan Ochtman <dirkjan at ochtman.nl>,
+        Graham Dumpleton <graham.dumpleton at gmail.com>
 Discussions-To: Python Web-SIG <web-sig at python.org>
 Status: Draft
 Type: Informational
 Content-Type: text/x-rst
-Created: 07-Dec-2003
-Post-History: 07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004
+Created: 15-Apr-2010
+Post-History: Not yet


 Abstract
 ========

-This document specifies a proposed standard interface between web
-servers and Python web applications or frameworks, to promote web
-application portability across a variety of web servers.
+This document specifies a revision of the proposed standard interface
+between web servers and Python web applications or frameworks, to
+promote web application portability across a variety of web servers.


 Rationale and Goals
 ===================

-Python currently boasts a wide variety of web application frameworks,
-such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web -- to
-name just a few [1]_.  This wide variety of choices can be a problem
-for new Python users, because generally speaking, their choice of web
-framework will limit their choice of usable web servers, and vice
-versa.
-
-By contrast, although Java has just as many web application frameworks
-available, Java's "servlet" API makes it possible for applications
-written with any Java web application framework to run in any web
-server that supports the servlet API.
-
-The availability and widespread use of such an API in web servers for
-Python -- whether those servers are written in Python (e.g. Medusa),
-embed Python (e.g. mod_python), or invoke Python via a gateway
-protocol (e.g. CGI, FastCGI, etc.) -- would separate choice of
-framework from choice of web server, freeing users to choose a pairing
-that suits them, while freeing framework and server developers to
-focus on their preferred area of specialization.
-
-This PEP, therefore, proposes a simple and universal interface between
-web servers and web applications or frameworks: the Python Web Server
-Gateway Interface (WSGI).
-
-But the mere existence of a WSGI spec does nothing to address the
-existing state of servers and frameworks for Python web applications.
-Server and framework authors and maintainers must actually implement
-WSGI for there to be any effect.
-
-However, since no existing servers or frameworks support WSGI, there
-is little immediate reward for an author who implements WSGI support.
-Thus, WSGI **must** be easy to implement, so that an author's initial
-investment in the interface can be reasonably low.
-
-Thus, simplicity of implementation on *both* the server and framework
-sides of the interface is absolutely critical to the utility of the
-WSGI interface, and is therefore the principal criterion for any
-design decisions.
-
-Note, however, that simplicity of implementation for a framework
-author is not the same thing as ease of use for a web application
-author.  WSGI presents an absolutely "no frills" interface to the
-framework author, because bells and whistles like response objects and
-cookie handling would just get in the way of existing frameworks'
-handling of these issues.  Again, the goal of WSGI is to facilitate
-easy interconnection of existing servers and applications or
-frameworks, not to create a new web framework.
-
-Note also that this goal precludes WSGI from requiring anything that
-is not already available in deployed versions of Python.  Therefore,
-new standard library modules are not proposed or required by this
-specification, and nothing in WSGI requires a Python version greater
-than 2.2.2.  (It would be a good idea, however, for future versions
-of Python to include support for this interface in web servers
-provided by the standard library.)
-
-In addition to ease of implementation for existing and future
-frameworks and servers, it should also be easy to create request
-preprocessors, response postprocessors, and other WSGI-based
-"middleware" components that look like an application to their
-containing server, while acting as a server for their contained
-applications.
-
-If middleware can be both simple and robust, and WSGI is widely
-available in servers and frameworks, it allows for the possibility
-of an entirely new kind of Python web application framework: one
-consisting of loosely-coupled WSGI middleware components.  Indeed,
-existing framework authors may even choose to refactor their
-frameworks' existing services to be provided in this way, becoming
-more like libraries used with WSGI, and less like monolithic
-frameworks.  This would then allow application developers to choose
-"best-of-breed" components for specific functionality, rather than
-having to commit to all the pros and cons of a single framework.
-
-Of course, as of this writing, that day is doubtless quite far off.
-In the meantime, it is a sufficient short-term goal for WSGI to
-enable the use of any framework with any server.
-
-Finally, it should be mentioned that the current version of WSGI
-does not prescribe any particular mechanism for "deploying" an
-application for use with a web server or server gateway.  At the
-present time, this is necessarily implementation-defined by the
-server or gateway.  After a sufficient number of servers and
-frameworks have implemented WSGI to provide field experience with
-varying deployment requirements, it may make sense to create
-another PEP, describing a deployment standard for WSGI servers and
-application frameworks.
+WSGI 1.0, specified in PEP 333, did a great job in making it easier
+for web applications and web servers to interface with each other.
+It has become very much the standard it was meant to be and an
+important part of the Python web development infrastructure.
+
+After several implementations were built by different developers,
+it inevitably turned out that the specification wasn't perfect. It
+left out some details that were implemented by all the web server
+interfaces because they were critical for many applications (or
+application frameworks). Additionally, the specification was written
+before Python 3.x was specified, resulting in a lack of clear
+specification on what to do with unicode strings.
+
+While there are some ideas around to improve WSGI further in less
+compatible ways, we feel that there is value to be had in first
+specifying a minor revision of the specification, which is largely
+compatible with existing implementations. Further simplification
+and experimentation are therefore deferred to a 2.0 version.
+
+
+Differences with WSGI 1.0
+=========================
+
+Descriptive changes
+-------------------
+
+The following changes were made to realign the spec with
+implementations 'in the wild'.
+
+1. The 'readline()' function of 'wsgi.input' must accept an optional
+   size hint. This is required because many applications use
+   cgi.FieldStorage, which relies on this functionality.
+
+2. The 'wsgi.input' functions for reading input must return an empty
+   string as end of input stream marker. This is required for support
+   of HTTP 1.1 request pipelining. A correctly implemented WSGI
+   middleware already has to cope with an empty string as end
+   sentinel anyway to detect premature end of input.
+
+3. Any WSGI application or middleware should not itself return, or
+   consume from a wrapped WSGI component, more data than specified by
+   the Content-Length response header if defined. Middleware that
+   does this is arguably broken and can generate incorrect data.
+   This is just a clarification of obligations.
+
+4. The WSGI adapter must not pass on to the server any data above
+   what the Content-Length response header defines, if supplied.
+   Doing this is technically a violation of HTTP. This is another
+   clarification of obligations.
+
+
+String handling changes
+-----------------------
+
+The following changes were made to make WSGI work on Python 3.x.
+
+1. The application is passed an instance of a Python dictionary
+   containing what is referred to as the WSGI environment. All keys
+   in this dictionary are native strings. For CGI variables, all names
+   are going to be ISO-8859-1 and so where native strings are
+   unicode strings, that encoding is used for the names of CGI
+   variables.
+
+2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
+   environment, the value of the variable should be a native string.
+
+3. For the CGI variables contained in the WSGI environment, the values
+   of the variables are native strings. Where native strings are
+   unicode strings, ISO-8859-1 encoding would be used such that the
+   original character data is preserved and as necessary the unicode
+   string can be converted back to bytes and thence decoded to unicode
+   again using a different encoding.
+
+4. The WSGI input stream 'wsgi.input' contained in the WSGI environment
+   and from which request content is read, should yield byte strings.
+
+5. The status line specified by the WSGI application should be a byte
+   string. Where native strings are unicode strings, the native string
+   type can also be returned in which case it would be encoded as
+   ISO-8859-1.
+
+6. The list of response headers specified by the WSGI application should
+   contain tuples consisting of two values, where each value is a byte
+   string. Where native strings are unicode strings, the native string
+   type can also be returned in which case it would be encoded as
+   ISO-8859-1.
+
+7. The iterable returned by the application and from which response
+   content is derived, should yield byte strings. Where native strings
+   are unicode strings, the native string type can also be returned in
+   which case it would be encoded as ISO-8859-1.
+
+8. The value passed to the 'write()' callback returned by
+   'start_response()' should be a byte string. Where native strings
+   are unicode strings, a native string type can also be supplied, in
+   which case it would be encoded as ISO-8859-1.


 Specification Overview
@@ -447,6 +457,13 @@
 Streaming`_ section below for more on how application output must be
 handled.)

+Further on, several places specify constraints upon string types used
+in the WSGI API. The term native string is used to mean the 'str' class
+in both Python 2.x and 3.x. The spec tries to ensure optimal
+compatibility and ease of use by allowing implementations running on
+Python 3.x to encode strings (which are Unicode strings with no
+specified encoding) as ISO-8859-1 where a 3.x string is passed in.
+
 The server or gateway should treat the yielded strings as binary byte
 sequences: in particular, it should ensure that line endings are
 not altered.  The application is responsible for ensuring that the
@@ -489,12 +506,22 @@
 ``environ`` Variables
 ---------------------

+All keys in this dictionary are native strings. For CGI variables,
+all names are going to be ISO-8859-1 and so where native strings are
+unicode strings, that encoding is used for the names of CGI variables.
+
 The ``environ`` dictionary is required to contain these CGI
 environment variables, as defined by the Common Gateway Interface
 specification [2]_.  The following variables **must** be present,
 unless their value would be an empty string, in which case they
 **may** be omitted, except as otherwise noted below.

+The values for CGI variables are native strings. Where native strings
+are unicode strings, ISO-8859-1 encoding would be used such that the
+original character data is preserved and as necessary the unicode
+string can be converted back to bytes and thence decoded to unicode
+again using a different encoding.
+
 ``REQUEST_METHOD``
   The HTTP request method, such as ``"GET"`` or ``"POST"``.  This
   cannot ever be an empty string, and so is always required.
@@ -575,13 +602,14 @@
 =====================  ===============================================
 Variable               Value
 =====================  ===============================================
-``wsgi.version``       The tuple ``(1,0)``, representing WSGI
+``wsgi.version``       The tuple ``(1, 0)``, representing WSGI
                        version 1.0.

 ``wsgi.url_scheme``    A string representing the "scheme" portion of
                        the URL at which the application is being
                        invoked.  Normally, this will have the value
-                       ``"http"`` or ``"https"``, as appropriate.
+                       ``"http"`` or ``"https"``, as appropriate. The
+                       value is a native string.

 ``wsgi.input``         An input stream (file-like object) from which
                        the HTTP request body can be read.  (The server
@@ -646,7 +674,7 @@
 Method               Stream      Notes
 ===================  ==========  ========
 ``read(size)``       ``input``   1
-``readline()``       ``input``   1,2
+``readline(hint)``   ``input``   1,2
 ``readlines(hint)``  ``input``   1,3
 ``__iter__()``       ``input``
 ``flush()``          ``errors``  4
@@ -661,11 +689,12 @@
    ``Content-Length``, and is allowed to simulate an end-of-file
    condition if the application attempts to read past that point.
    The application **should not** attempt to read more data than is
-   specified by the ``CONTENT_LENGTH`` variable.
+   specified by the ``CONTENT_LENGTH`` variable. All read functions
+   are required to return an empty string as the end of input stream
+   marker. They must yield byte strings.

-2. The optional "size" argument to ``readline()`` is not supported,
-   as it may be complex for server authors to implement, and is not
-   often used in practice.
+2. The size hint argument to ``readline()`` must be supported by
+   the implementer, but is optional for callers.

 3. Note that the ``hint`` argument to ``readlines()`` is optional for
    both caller and implementer.  The application is free not to
@@ -692,12 +721,15 @@
 ---------------------------------

 The second parameter passed to the application object is a callable
-of the form ``start_response(status,response_headers,exc_info=None)``.
+of the form ``start_response(status, response_headers, exc_info=None)``.
 (As with all WSGI callables, the arguments must be supplied
 positionally, not by keyword.)  The ``start_response`` callable is
 used to begin the HTTP response, and it must return a
 ``write(body_data)`` callable (see the `Buffering and Streaming`_
-section, below).
+section, below). Values passed to the ``write(body_data)`` callable
+should be byte strings. Where native strings are unicode strings, a
+native string type can also be supplied, in which case it would be
+encoded as ISO-8859-1.

 The ``status`` argument is an HTTP "status" string like ``"200 OK"``
 or ``"404 Not Found"``.  That is, it is a string consisting of a
@@ -705,14 +737,20 @@
 single space, with no surrounding whitespace or other characters.
 (See RFC 2616, Section 6.1.1 for more information.)  The string
 **must not** contain control characters, and must not be terminated
-with a carriage return, linefeed, or combination thereof.
+with a carriage return, linefeed, or combination thereof. This
+value should be a byte string. Where native strings are unicode
+strings, the native string type can also be returned, in which
+case it would be encoded as ISO-8859-1.

 The ``response_headers`` argument is a list of ``(header_name,
 header_value)`` tuples.  It must be a Python list; i.e.
-``type(response_headers) is ListType``, and the server **may** change
+``type(response_headers) is list``, and the server **may** change
 its contents in any way it desires.  Each ``header_name`` must be a
 valid HTTP header field-name (as defined by RFC 2616, Section 4.2),
-without a trailing colon or other punctuation.
+without a trailing colon or other punctuation. Both the header_name
+and the header_value should be byte strings. Where native strings
+are unicode strings, the native string type can also be returned,
+in which case it would be encoded as ISO-8859-1.

 Each ``header_value`` **must not** include *any* control characters,
 including carriage returns or linefeeds, either embedded or at the end.
@@ -809,6 +847,14 @@
 Handling the ``Content-Length`` Header
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+If an application or middleware layer chooses to return a
+Content-Length header, it should not return more data than specified
+by the header value. Any wrapping middleware layer should not
+consume more data than specified in the header value from the
+wrapped component (either middleware or application). Any WSGI
+adapter must similarly not pass on data above what the
+Content-Length response header value defines.
+
 If the application does not supply a ``Content-Length`` header, a
 server or gateway may choose one of several approaches to handling
 it.  The simplest of these is to close the client connection when
@@ -1569,55 +1615,13 @@
    developers.


-Proposed/Under Discussion
-=========================
-
-These items are currently being discussed on the Web-SIG and elsewhere,
-or are on the PEP author's "to-do" list:
-
-* Should ``wsgi.input`` be an iterator instead of a file?  This would
-  help for asynchronous applications and chunked-encoding input
-  streams.
-
-* Optional extensions are being discussed for pausing iteration of an
-  application's ouptut until input is available or until a callback
-  occurs.
-
-* Add a section about synchronous vs. asynchronous apps and servers,
-  the relevant threading models, and issues/design goals in these
-  areas.
-
-
 Acknowledgements
 ================

-Thanks go to the many folks on the Web-SIG mailing list whose
-thoughtful feedback made this revised draft possible.  Especially:
+Thanks go to the many folks on the Web-SIG mailing list who helped to
+clarify and improve this specification. In particular:

-* Gregory "Grisha" Trubetskoy, author of ``mod_python``, who beat up
-  on the first draft as not offering any advantages over "plain old
-  CGI", thus encouraging me to look for a better approach.
-
-* Ian Bicking, who helped nag me into properly specifying the
-  multithreading and multiprocess options, as well as badgering me to
-  provide a mechanism for servers to supply custom extension data to
-  an application.
-
-* Tony Lownds, who came up with the concept of a ``start_response``
-  function that took the status and headers, returning a ``write``
-  function.  His input also guided the design of the exception handling
-  facilities, especially in the area of allowing for middleware that
-  overrides application error messages.
-
-* Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython
-  (well before the spec was finalized) helped to shape the "supporting
-  older versions of Python" section, as well as the optional
-  ``wsgi.file_wrapper`` facility.
-
-* Mark Nottingham, who reviewed the spec extensively for issues with
-  HTTP RFC compliance, especially with regard to HTTP/1.1 features that
-  I didn't even know existed until he pointed them out.
-
+* Phillip J. Eby, for writing/editing the 1.0 specification.

 References
 ==========
@@ -1643,8 +1647,6 @@

 This document has been placed in the public domain.

-
-
 ..
    Local Variables:
    mode: indented-text
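
As an illustration of the 'wsgi.input' changes in the patch above
(size hint for 'readline()', empty byte string as end-of-input
marker), here is a rough sketch of a server-side wrapper. The class
name 'LimitedInput' and its constructor arguments are hypothetical;
a real server would need to handle more cases, such as a missing
CONTENT_LENGTH.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper that stops at CONTENT_LENGTH."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def read(self, size=-1):
        # Never read past the declared body, whatever the caller asks for.
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data  # b"" once the body is exhausted

    def readline(self, hint=-1):
        # The hint is optional for callers but must be accepted here.
        if hint is None or hint < 0 or hint > self._remaining:
            hint = self._remaining
        line = self._stream.readline(hint)
        self._remaining -= len(line)
        return line


# Two pipelined requests share one connection; only the first 8 bytes
# belong to the current request body.
body = LimitedInput(io.BytesIO(b"a=1\nb=2\nNEXT REQUEST"), content_length=8)
assert body.readline() == b"a=1\n"
assert body.readline(2) == b"b="
assert body.read() == b"2\n"
assert body.read() == b""        # end-of-input marker
assert body.readline() == b""    # same marker from readline()
```

Because the wrapper returns an empty byte string instead of reading
on, the bytes of the next pipelined request stay in the underlying
stream.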

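The Content-Length obligations added by the patch can be sketched in a
similar way. The generator below is an assumption about how a
middleware layer or adapter might cap the response body; the name
'limit_to_content_length' is invented for this example, and calling
'close()' on the wrapped iterable is left out for brevity.

```python
def limit_to_content_length(chunks, content_length):
    """Yield at most content_length bytes from an iterable of byte strings."""
    remaining = content_length
    if remaining <= 0:
        return
    for chunk in chunks:
        chunk = chunk[:remaining]      # drop anything beyond the limit
        remaining -= len(chunk)
        if chunk:
            yield chunk
        if remaining <= 0:
            # Stop before pulling another chunk from the wrapped component.
            return


capped = list(limit_to_content_length([b"hello", b" world", b"!!!"], 8))
assert capped == [b"hello", b" wo"]
```

Returning as soon as the limit is reached means that the wrapped
component is not asked for data that could never be sent.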
