[Python-Dev] Backup plan: WSGI 1 Addenda and wsgiref update for Py3
P.J. Eby
pje at telecommunity.com
Tue Sep 21 18:09:44 CEST 2010
While the Web-SIG is trying to hash out PEP 444, I thought it would
be a good idea to have a backup plan that would allow the Python 3
stdlib to move forward, without needing a major new spec to settle
out implementation questions.
After all, even if PEP 333 is ultimately replaced by PEP 444, it's
probably a good idea to have *some* sort of WSGI 1-ish thing
available on Python 3, with bytes/unicode and other matters settled.
In the past, I was waiting for some consensuses (consensi?) on
Web-SIG about different approaches to Python 3, looking for some sort
of definite, "yes, we all like this" response. However, I can see
now that this just means it's my fault we don't have a spec yet. :-(
So, unless any last-minute showstopper rebuttals show up this week,
I've decided to go ahead and officially bless nearly all of what Graham
Dumpleton (who's not only the mod_wsgi author, but has put huge
amounts of work into shepherding WSGI-on-Python3 proposals, WSGI
amendments, etc.) has proposed, with a few minor exceptions.
In other words: almost none of the following is my own original work;
it's like 90% Graham's. Any praise for this belongs to him; the only
thing that belongs to me is the blame for not doing this
sooner! (Sorry Graham. You asked me to do this ages ago, and you were right.)
Anyway, I'm posting this for comment to both Python-Dev and the
Web-SIG. If you are commenting on the technical details of the
amendments, please reply to the Web-SIG only. If you are commenting
on the development agenda for wsgiref or other Python 3 library
issues, please reply to Python-Dev only. That way, neither list will
see off-topic discussions. Thanks!
The Plan
========
I plan to update the proposal below per comments and feedback during
this week, then update PEP 333 itself over the weekend or early next
week, followed by a code review of Python 3's wsgiref, and
implementation of needed changes (such as recoding os.environ to
latin1-captured bytes in the CGI handler).
To complete the changes, it is possible that I may need assistance
from one or more developers who have more Python 3 experience. If
after reading the proposed changes to the spec, you would like to
volunteer to help with updating wsgiref to match, please let me know!
The Proposal
============
Overview
--------
1. The primary purpose of this update is to provide a uniform porting
pattern for moving Python 2 WSGI code to Python 3, meaning a pattern
of changes that can be mechanically applied to as little code as
practical, while still keeping the WSGI spec easy to programmatically
validate (e.g. via ``wsgiref.validate``).
The Python 3 specific changes are to use:
* ``bytes`` for I/O streams in both directions
* ``str`` for environ keys and values
* ``bytes`` for arguments to start_response() and write()
* text stream for wsgi.errors
In other words, "strings in, bytes out" for headers, bytes for bodies.
In general, only changes that don't break Python 2 WSGI
implementations are allowed. The changes should also not break
mod_wsgi on Python 3, but may make some Python 3 WSGI applications
non-compliant, even though they continue to function on mod_wsgi.
This is because mod_wsgi allows applications to output string headers
and bodies, but I am ruling that option out because it forces every
piece of middleware to have to be tested with arbitrary combinations
of strings and bytes in order to test compliance. If you want your
application to output strings rather than bytes, you can always use a
decorator to do that. (And a sample one could be provided in wsgiref.)
2. The secondary purpose of the update is to address some
long-standing open issues documented here:
http://www.wsgi.org/wsgi/Amendments_1.0
As with the Python 3 changes, only changes that don't retroactively
invalidate existing implementations are allowed.
3. There is no tertiary purpose. ;-) (By which I mean, all other
kinds of changes are out-of-scope for this update.)
4. The section below labeled "A Note On String Types" is proposed for
verbatim addition to the "Specification Overview" section in the PEP;
the other sections below describe changes to be made inline at the
appropriate part of the spec, and changes that were proposed but are
rejected for inclusion in this amendment.
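To make the decorator idea from point 1 concrete, here is a minimal
sketch for Python 3 of what such a wrapper could look like. It is only
an illustration, not something taken from wsgiref: the name
``bytes_out`` and its ``body_encoding`` parameter are hypothetical, and
the choice of latin-1 for the status and headers and UTF-8 for bodies
is an assumption.

```python
import functools


def bytes_out(app, body_encoding="utf-8"):
    """Hypothetical decorator: let ``app`` emit ``str``, hand bytes to the server.

    Assumptions (not from the spec): status and headers are encoded as
    latin-1, body chunks as ``body_encoding``.
    """

    def header_bytes(value):
        return value.encode("latin-1") if isinstance(value, str) else value

    def body_bytes(value):
        return value.encode(body_encoding) if isinstance(value, str) else value

    @functools.wraps(app)
    def wrapper(environ, start_response):
        def str_start_response(status, headers, exc_info=None):
            # Convert the status line and every header pair before
            # passing them on to the real start_response().
            write = start_response(
                header_bytes(status),
                [(header_bytes(k), header_bytes(v)) for k, v in headers],
                exc_info,
            )
            # Wrap the legacy write() callable as well.
            return lambda data: write(body_bytes(data))

        # wrapper is a generator, so the wrapped app only runs once the
        # server starts iterating over the response.
        result = app(environ, str_start_response)
        try:
            for chunk in result:
                yield body_bytes(chunk)
        finally:
            # Always forward close() to the wrapped iterable, if it has one.
            close = getattr(result, "close", None)
            if close is not None:
                close()

    return wrapper
```

An application written with string output would then be decorated with
``@bytes_out``, and servers and middleware would only ever see bytes.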
A Note On String Types
----------------------
In general, HTTP deals with bytes, which means that this
specification is mostly about handling bytes.
However, the content of those bytes often has some kind of textual
interpretation, and in Python, strings are the most convenient way to
handle text.
But in many Python versions and implementations, strings are Unicode,
rather than bytes. This requires a careful balance between a usable
API and correct translations between bytes and text in the context of
HTTP... especially to support porting code between Python
implementations with different ``str`` types.
WSGI therefore defines two kinds of "string":
* "Native" strings (which are always implemented using the type named ``str``)
* "Bytestrings" (which are implemented using the ``bytes`` type in
Python 3, and ``str`` elsewhere)
So, even though HTTP is in some sense "really just bytes", there are
many API conveniences to be had by using whatever Python's default
``str`` type is.
Do not be confused however: even if Python's ``str`` is actually
Unicode under the hood, the *content* of a native string is still
restricted to bytes! See the section on `Unicode Issues`_ later in
this document.
In short: where you see the word "string" in this document, it refers
to a "native" string, i.e., an object of type ``str``, whether it is
internally implemented as bytes or unicode. Where you see references
to "bytestring", this should be read as "an object of type ``bytes``
under Python 3, or type ``str`` under Python 2".
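As a rough illustration of "the content of a native string is still
restricted to bytes", here is a small Python 3 sketch. The helper names
are hypothetical, and the use of latin-1 as the byte-preserving
encoding is an assumption based on the "latin1-captured bytes" wording
in the plan above; the `Unicode Issues`_ section of the spec is the
authority on this.

```python
def bytes_to_native(data):
    """Carry raw bytes inside a Python 3 ``str``, one code point per byte."""
    return data.decode("latin-1")


def native_to_bytes(text):
    """Recover the original bytes; fails if a code point is above 255."""
    return text.encode("latin-1")


def is_byte_content(text):
    """True if every character of ``text`` fits in a single byte."""
    return all(ord(ch) < 256 for ch in text)


# A UTF-8 encoded path as it might arrive on the wire.
raw = b"/caf\xc3\xa9"
native = bytes_to_native(raw)

# The native string round-trips to exactly the same bytes ...
assert native_to_bytes(native) == raw
# ... and the application decides how to interpret them as text.
path = native_to_bytes(native).decode("utf-8")
```

Here ``native`` is a valid native string because each of its characters
stands for one byte, whereas a string containing, say, U+20AC would not
be.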
Clarifications (To be made in-line)
-----------------------------------
The following amendments are clarifications to parts of the existing
spec that proved over the years to be ambiguous or insufficiently
specified, as well as some attempts to correct practical errors.
(Note: many of these issues cannot be completely fixed in WSGI 1
without breaking existing implementations, and so the text below has
notations such as "(MUST in WSGI 2)" to indicate where any
replacement spec for WSGI 1 should strengthen them.)
* If an application returns a body iterator, a server (or middleware)
MAY stop iterating over it and discard the remainder of the output,
as long as it calls any close() method provided by the
iterator. Applications returning a generator or other custom
iterator SHOULD NOT assume that the entire iterator will be
consumed. (This change makes it explicit that caching middleware or
HEAD-processing servers can throw away the response body.)
* start_response() SHOULD (MUST in WSGI 2) check for errors in the
status or headers at the time it's called, so that an error can be
raised as close to the problem as possible.
* If start_response() raises an error when called normally (i.e.
without exc_info), it SHOULD be an error to call it a second time
without passing exc_info.
* The SERVER_PORT variable is of type str, just like any other CGI
environ variable. (According to the WSGI wiki, "some
implementations" expect it to be an integer, even though there is
nothing in the WSGI spec that allows a CGI variable to be anything but a str.)
* A server SHOULD (MUST in WSGI 2) support the size hint argument to
readline() on its wsgi.input stream.
* A server SHOULD (MUST in WSGI 2) return an empty bytestring from
read() on wsgi.input to indicate an end-of-file condition. (In WSGI
2, language should be clarified to allow the input stream length and
CONTENT_LENGTH to be out of sync, for reasons explained in Graham's blog post.)
* A server SHOULD (MUST in WSGI 2) allow read() to be called without
an argument, and return the entire remaining contents of the stream.
* If an application provides a Content-Length header, the server
SHOULD NOT (MUST NOT in WSGI 2) send more data to the client than was
specified in that header, whether via write(), yielded body
bytestrings, or via a wsgi.file_wrapper. (This rule applies to
middleware as well.)
* wsgi.errors is a text stream accepting "native strings".
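Two of the clarifications above (a server may stop iterating early as
long as it calls close(), and it should not send more than
Content-Length bytes) can be sketched together. This is only an
illustration under stated assumptions: ``limit_body`` is a hypothetical
helper name, and it assumes the caller has already parsed the
Content-Length header into an integer.

```python
def limit_body(body, content_length):
    """Yield at most ``content_length`` bytes from ``body``, then close it.

    ``body`` is the iterable returned by the application; any bytes past
    the declared length are discarded rather than sent to the client.
    """
    remaining = content_length
    try:
        if remaining > 0:
            for chunk in body:
                chunk = chunk[:remaining]  # trim a chunk that overshoots
                remaining -= len(chunk)
                if chunk:
                    yield chunk
                if remaining <= 0:
                    break  # stop early; the rest is never consumed
    finally:
        # Whether or not the iterable was exhausted, close() must be called.
        close = getattr(body, "close", None)
        if close is not None:
            close()
```

With ``content_length`` set to 0 the helper never touches the iterator
at all but still calls close(), which is roughly what a HEAD-processing
server would want.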
Rejected Amendments
-------------------
* Manlio Perillo's suggestion to allow header specification to be
delayed until the response iterator is producing non-empty
output. This would've been a possible win for async WSGI, but could
require substantial changes to existing servers.