[Web-SIG] PEP 0333 and PEP XXXX Updated

Sun Sep 20 13:37:05 CEST 2009

2009/9/20 Armin Ronacher <armin.ronacher at active-4.com>:
> Hi,
>
> I know I pretty much SPAM the list here now which is why I added all the
> changes of WSGI 1.0 and what could become WSGI 1.1 into a repo on
> bitbucket as two PEPS:
>
> http://bitbucket.org/ianb/wsgi-peps/src/
>
>
> pep-0333.txt
>
> This is basically just a new revision for PEP 333 changing the following
> things:
>
> - removing Jython and Python 2.2 compatibility.  Jython is close enough
>  to modern Python versions now that this does not make any difference.
>
> - fixing wsgi.input by adding a proper readline().  The current version
>  still requires the user to care about not reading past the content
>  length, but if all server implementors agree that could be changed so
>  that the stream provides an end of line marker.
>
> - mentioning that WSGI 1.0 is not supported by Python 3.
>
> - made WSGI 1.0 depend on bytes.
>
> - fixed example code
>
> - servers may no longer add a date or server header if that header is
>  already present.  (This MUST may become a SHOULD for the server
>  header as it's probably hard to control for things like mod_wsgi)
>
> - weakened the rules for buffering and streaming.  Everybody does it,
>  so it should be allowed.
>
> - added middleware warning for `wsgi.file_wrapper`
>
>
> pep-XXXX.txt
>
> This specifies WSGI 1.1 based on #3/#4 in Graham Dumpleton's blog post.
> The differences from his proposal:
>
> - the application iterator must be byte based.  I would really require
>  that, so that people explicitly encode their stuff as utf-8 instead
>  of yielding latin1.  If we want to allow unicode return values, I
>  strongly encourage using utf-8 for the return value because we already
>  require UTF-8 URLs.
>
> - clarified wsgi.uri_encoding: that algorithm should not just be the
>  default but the only one, to make it easier for applications to
>  reencode URIs.
>
> - Stick to `start_response` and `exc_info` but add deprecation warnings
>  for `exc_info` and `write()`.  This should make it easier to port
>  applications over.  Breaking too many APIs at the same time is
>  probably not the best idea.
>
>
> If we really want to get rid of `start_response` at the same time, I
> would suggest using ``(appiter, status, headers)`` instead of
> ``(status, headers, appiter)``.  The former is the current common
> signature of response objects which would make it possible to convert
> from a WSGI application response to a response object by doing something
> like this:
>
>   response = Response(*wsgi_app(request.environ))
>
> The XXXX PEP is currently missing any copyright information and headers
> and should only be considered as a draft.

Regardless of the details of the changes being made to the PEP and the
creation of any new ones, shouldn't we first agree on the overall
direction we are going to take, i.e., the grand plan at a high level?

What I am getting at here is that the likes of PJE have indicated a
preference for skipping WSGI 1.1 altogether and going straight to
WSGI 2.0. If there isn't going to be support all round for even coming
out with a WSGI 1.1, then I don't want to see time wasted trying to
come up with a new PEP covering only what needs to change.

I actually suggested going straight to WSGI 2.0 back at the start of
last year and got chastised for making the suggestion. The criticism
back then was because I was saying that, since people were going to
have to make changes for Python 3.0 anyway, why not enforce an API
change at the same time. This didn't go down too well with those who
wanted to promote 2to3 as the way of migrating to Python 3.0, even
though I was already pointing out that WSGI as it stood probably wasn't
going to work on Python 3.0. Our own ongoing discussions have borne out
that point and shown that some change will be required to make it
usable.

I do acknowledge, though, that I wanted to skip WSGI 1.X altogether
for Python 3.0, whereas PJE is trying to install his preferred
definition #3, one he has promoted from day one, as WSGI 1.0 for
Python 3.X, even though it doesn't comply with the WSGI PEP and by
rights shouldn't be called WSGI 1.0.

So, I am starting to get nervous that we could go to a great deal of
work to try and resolve the various issues for a specific definition,
only to find that people don't even agree that such a version is
warranted, leaving us deadlocked.

Looking at the bigger picture, there are three overall goals that I
can see that we would want to address.

1. Clarifications and corrections to the existing WSGI for Python 2.X
to allow readline() with a size hint, a mandatory end-of-stream
sentinel for wsgi.input, support for chunked request content, and
rules on the amount of data that should be returned by WSGI
applications and how much data wsgi.file_wrapper should send back from
a file when Content-Length is defined. These were points (11) to (16)
that I tacked onto my definition #4 in my blog post. They are, however,
applicable to any update to WSGI for any version of Python.

2. Come up with a version of WSGI for Python 3.X. The whole bytes
versus unicode discussion.

3. Drop the start_response() function and the ability to use the
write() function it returns. This is what people have been calling
WSGI 2.0.
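To make the wsgi.input clarifications in (1) a little more concrete,
here is a minimal Python sketch of a stream wrapper a server might hand
to an application. It assumes the server knows the request's
Content-Length and reads from an underlying binary stream; the class
name `LimitedInput` is hypothetical and not taken from any spec. It
caps every read at Content-Length, honours a size hint to readline(),
and returns b'' once the body is exhausted, i.e. the end-of-stream
sentinel behaviour being proposed. Chunked request content is not
handled here.

```python
import io


class LimitedInput:
    """Hypothetical wsgi.input wrapper: never reads past Content-Length
    and returns b'' once the request body is exhausted."""

    def __init__(self, stream, content_length):
        self._stream = stream
        self._remaining = content_length

    def _clamp(self, size):
        # Treat None/negative as "the rest of the body", and never allow
        # a request that would run past Content-Length.
        if size is None or size < 0 or size > self._remaining:
            return self._remaining
        return size

    def read(self, size=-1):
        data = self._stream.read(self._clamp(size))
        self._remaining -= len(data)
        return data

    def readline(self, size=-1):
        # The size hint is honoured, but Content-Length still wins.
        line = self._stream.readline(self._clamp(size))
        self._remaining -= len(line)
        return line

    def __iter__(self):
        # Yield lines until readline() returns the b'' sentinel.
        while True:
            line = self.readline()
            if not line:
                return
            yield line


if __name__ == "__main__":
    body = b"name=wsgi\n"
    # Bytes after the body (e.g. a pipelined request) must not be consumed.
    raw = io.BytesIO(body + b"NEXT REQUEST ON SAME CONNECTION")
    wsgi_input = LimitedInput(raw, len(body))
    print(wsgi_input.readline())  # b'name=wsgi\n'
    print(wsgi_input.read())      # b''
```

With a wrapper like this, an application can simply loop on readline()
until it sees b'' rather than tracking CONTENT_LENGTH itself.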

To go along with that, there are a couple of major questions that I
think need to be answered, and these will dictate to a degree what any
roadmap will be.

The first question is, should Python 2.X forever be bytes everywhere,
or, if we start introducing unicode into parts of the definition for
Python 3.X, should those versions of the WSGI specification map those
unicode parts back into the Python 2.X variant of an equivalent version
of the specification?

In my definitions I introduced 'native' strings along with 'bytes' and
'unicode' strings in an attempt to use one set of language which would
describe WSGI and be interpretable in the context of both Python 2.X
and Python 3.X.

For definition #4, this meant defining SCRIPT_NAME, PATH_INFO and
QUERY_STRING as 'unicode' strings. This meant that for Python 2.X they
would also be unicode strings. The other option was to define them as
'native' strings, which means the whole 'wsgi.uri_encoding' flag would
only be relevant to Python 3.X, as in Python 2.X the native string is
'bytes' and so the whole encoding issue would still be up to the WSGI
application, as it is now for bytes-everywhere WSGI in Python 2.X. In
effect, if they were 'native' strings and 'wsgi.uri_encoding' went
away, we would just have the existing WSGI 1.0. The only actual
difference was that I was adding the clarifications per (1) above on
top of definition #4.
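A rough Python 3 sketch of why a recorded 'wsgi.uri_encoding' makes it
easy for applications to reencode URIs. It assumes the server tries
UTF-8 first and falls back to ISO-8859-1, storing whichever it used
under 'wsgi.uri_encoding'; that algorithm and the helper names are
assumptions for illustration, not quoted from definition #4. Because
ISO-8859-1 maps every byte to exactly one code point, the fallback can
never fail, and the application can always get back the exact bytes
the client sent.

```python
def decode_path(raw_path):
    """Hypothetical server-side step: try UTF-8, fall back to ISO-8859-1.
    Returns (decoded_path, encoding_used)."""
    try:
        return raw_path.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps each byte to one code point, so this cannot fail.
        return raw_path.decode("latin-1"), "latin-1"


def build_environ(raw_path):
    """Hypothetical: populate the environ keys discussed above."""
    path, encoding = decode_path(raw_path)
    return {"PATH_INFO": path, "wsgi.uri_encoding": encoding}


def original_path_bytes(environ):
    """Application side: recover the exact bytes the client sent by
    re-encoding with the encoding the server recorded."""
    return environ["PATH_INFO"].encode(environ["wsgi.uri_encoding"])


if __name__ == "__main__":
    for raw in (b"/caf\xc3\xa9", b"/caf\xe9"):
        environ = build_environ(raw)
        # Prints "utf-8 True", then "latin-1 True".
        print(environ["wsgi.uri_encoding"], original_path_bytes(environ) == raw)
```

Under the 'native' string option in Python 2.X none of this would
apply, since PATH_INFO would simply stay as bytes.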

The second question is, do we want to try and come up with something
for Python 3.X, i.e., (2) above, while still preserving the current
start_response() callback, or do we instead want to jump directly to
WSGI (Python 3.X) 2.0, i.e., combine (2) and (3) above, and say that
there is no WSGI 1.X for Python 3.X at all?
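For comparison, a hedged sketch of the two calling conventions in
question. `legacy_app` is an ordinary WSGI 1.0 application;
`wsgi2_app` shows a hypothetical start_response()-free style returning
`(status, headers, body)`, where that ordering is an assumption since
it is still under discussion; `call_legacy_as_wsgi2` is a hypothetical
adapter suggesting that 1.0 applications could still be hosted under
such a convention. The adapter buffers the whole body and ignores
exc_info, so it is a sketch rather than a complete gateway.

```python
def legacy_app(environ, start_response):
    # WSGI 1.0 style: status and headers go through start_response().
    body = b"Hello, world!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def wsgi2_app(environ):
    # Hypothetical "WSGI 2.0" style: no start_response(); the response is
    # simply returned. The (status, headers, body) order is an assumption.
    body = b"Hello, world!\n"
    return "200 OK", [("Content-Type", "text/plain"),
                      ("Content-Length", str(len(body)))], [body]


def call_legacy_as_wsgi2(app, environ):
    """Hypothetical adapter: run a WSGI 1.0 app, return (status, headers, body)."""
    captured = {}
    body = []

    def start_response(status, headers, exc_info=None):
        # exc_info is accepted but ignored in this sketch.
        captured["status"] = status
        captured["headers"] = headers
        # The legacy write() callable appends to the same list, so output
        # order is preserved even when write() and yield are interleaved.
        return body.append

    result = app(environ, start_response)
    try:
        for chunk in result:
            body.append(chunk)
    finally:
        # WSGI requires close() to be called if the iterable provides it.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call_legacy_as_wsgi2(legacy_app, {}) == wsgi2_app({}))  # True
```

If the `(appiter, status, headers)` ordering suggested earlier in the
thread were adopted instead, only the order of the returned tuple would
change.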

For example, one option for a roadmap would keep bytes everywhere in
Python 2.X and jump direct to WSGI 2.0 in Python 3.X.

WSGI (Python 2.X) 1.1 - Clarify existing WSGI by adding (1) above.
WSGI (Python 2.X) 2.0 - Drop start_response() from WSGI (Python 2.X)
1.1. Keep bytes everywhere.
WSGI (Python 3.X) 2.0 - Adapt WSGI (Python 2.X) 2.0 to Python 3.X. Use
definition #4 (or more likely a variation on it).

One reason for keeping bytes everywhere in Python 2.X is that this is
how it is now, and if unicode were introduced it might just be ignored
by people anyway.

The second reason is that Ian is promoting PEP 0383 as a way of
resolving transcoding issues for Python 3.X. The library functions for
PEP 0383 are only in Python 3.1, which straight away says we may have
to abandon any concept of supporting Python 3.0, but it also means it
is not really practical to push back and start using unicode in Python
2.X either. This is because one of the things that makes writing WSGI
adapters easy is that no dependence on a third-party package is
required. By having to use PEP 0383, you are effectively bound to
Python 3.1+. It would just be a PITA if WSGI adapters had to provide
their own implementation of those library functions to support older
Python versions, or had to depend on a third-party package that is not
part of Python itself.
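For reference, the PEP 0383 mechanism in question is the
'surrogateescape' error handler that arrived in Python 3.1. A minimal
sketch with hypothetical helper names: bytes that are not valid UTF-8
are carried through as lone surrogates (byte 0xE9 becomes U+DCE9), and
encoding with the same handler restores the original bytes exactly.
Such a string cannot be encoded with the default strict handler, which
is one more thing applications would have to be aware of.

```python
def decode_for_environ(raw):
    """Hypothetical helper: decode request bytes as UTF-8, carrying any
    undecodable bytes through as lone surrogates (PEP 383)."""
    return raw.decode("utf-8", "surrogateescape")


def encode_from_environ(text):
    """Inverse of decode_for_environ(): recover the original bytes exactly."""
    return text.encode("utf-8", "surrogateescape")


if __name__ == "__main__":
    raw = b"/caf\xe9"  # a lone 0xE9 byte is not valid UTF-8
    text = decode_for_environ(raw)
    print(ascii(text))                       # '/caf\udce9'
    print(encode_from_environ(text) == raw)  # True
```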

The second option for a roadmap, if we want to start introducing
unicode to Python 2.X and effectively maintain one WSGI specification
that works for both Python 2.X and Python 3.X, would somewhat mirror
what I originally blogged about.

WSGI (Python 3.X) 1.0 - Use definition #3, even though it doesn't
agree with the WSGI 1.0 specification and so should not really be
labelled as such. Only really doing this because wsgiref and some
other implementations were already using it. Don't bother to add the
clarifications in (1) above, as we can't guarantee existing
implementations behave that way.
WSGI (Python 2.X/3.X) 1.1 - Use definition #4 (or more likely a
variation on it). Add clarifications in (1) above.
WSGI (Python 2.X/3.X) 2.0 - Drop start_response() from WSGI (Python
2.X/3.X) 1.1.

Those are two options, and there would be others as well. For example,
in the second option, replace WSGI (Python 3.X) 1.0 with the bytes-only
version of WSGI, i.e., use definition #1.

So, perhaps we can step back for a minute and ask those couple of
major questions. To state them again, they were:

1. Do we keep bytes everywhere forever in Python 2.X, or try to
introduce unicode there, at least to mirror whatever changes might be
made to make WSGI workable in Python 3.X?

2. Do we skip WSGI 1.X completely for Python 3.X and go straight to
WSGI 2.0 for Python 3.X?

I would like to see all the major players, i.e., Robert, Armin, PJE and
Ian, plus, if possible, major developers on packages like Pylons, TG,
Django, Zope/Repoze etc., at least comment on these two questions.

Settling on the overall plan before we go any further would be a good
start and avoid having to change course later.

Graham