[Web-SIG] WSGI for Python 3

Graham Dumpleton graham.dumpleton at gmail.com
Mon Aug 30 06:37:14 CEST 2010


On 30 August 2010 13:07, P.J. Eby <pje at telecommunity.com> wrote:
> At 11:16 AM 8/30/2010 +1000, Graham Dumpleton wrote:
>>
>> Although I almost begged that if we are going to discuss bytes,
>> compared to text/unicode, that agreement at least first be made about
>> the definition of the bytes leaning option, that request has pretty
>> well fallen on deaf ears.
>
> Did you not see my reply?  I (thought I) answered your question, and I
> actually also suggested that a variation of your unicode proposal might
> work, too.  See:
>
> http://mail.python.org/pipermail/web-sig/2010-August/004545.html

I was purely asking about bytes, what that means to people who want to
push that, and set aside the unicode one for the moment.

There have been others as well in the past who have pushed bytes, but
they haven't said anything about what it means and I really wanted
more input given that in the past the discussions had over the unicode
leaning proposals between us core people have been in part derailed by
these people who sit mostly on the sidelines and start shouting 'I
want bytes instead'. So, I want to give those critics their chance to
confirm what they mean by bytes, else we will keep having them pop up
time and time again when we are trying to discuss other stuff. So it
is the lack of response beyond the usual suspects that I am grumpy
about.

Even in what you mention about bytes you are a bit fuzzy. Having the
value of wsgi.url_scheme be bytes is reasonable and I have no issue with
that given that other URL components will be bytes as well, but when you
yourself mention keys, you are a bit unsure because of the 'b' plague.
So, there is still no clarity on that point, and if people are going to
keep raising bytes, I would like a better definition of what they are
talking about.

The only other person who has said anything about bytes is Armin but
all that he really said was 'all bytes only'. This isn't much clearer
than when people have in the past said 'bytes everywhere', but in some
cases didn't actually mean keys. This is why I asked that people cut
and paste the definition I gave and change it to exactly what they
meant, so that nobody has to second guess. FWIW, from a separate
discussion I understand that Armin does mean bytes for keys.

So, I was really after that clarity so we can say without confusion that
our starting point from now is that we have two overall proposals and
that they be A and B as defined, with possibly even a C and D if need
be, not even using the labels bytes and unicode. We can then discuss
each in isolation as to whether as defined they would work or not.
From that one or more might die, or might mutate further and actually
become closer to the other option but where all are still valid
options. Either way, people up till now have had this bytes vs unicode
divide stuck in their heads, when strictly speaking it isn't
necessarily pure bytes vs pure unicode, but merely a number of
different proposals with certain bits in one case using unicode
instead of bytes.

Given that we have dedicated most time to the unicode leaning
solution, I would like to go and look properly at the bytes leaning
solutions now. That way we have the definitions and also have done the
analysis and when people come along later and say 'bytes everywhere',
we have something proper to refer back to about it.

Anyway, rather than keep arguing the point, and so we can move forward,
let us perhaps start now with the following definitions and new names to
identify them. We can even go a bit stupid and give each its own code
name so they are in part more memorable. Any next option based on your
suggestions about changing the WHEAT option can be called MAIZE. And
if you think I am going stark raving mad and should be put in a
white jacket and locked up, you could well be right. I am not a happy
camper right now, but that is because of many things besides this WSGI
stuff. :-)

And yes I know about the page that has been just recently put up at:

  http://www.wsgi.org/wsgi/Python_3

From memory, when I first read it I wasn't sure that it was
completely accurate, but at least it doesn't now mention mod_python
instead of mod_wsgi, which was mighty confusing. We can perhaps merge
the following into that page, i.e., expand the table, and talk more
about the abstract definitions rather than linking it to specific
implementations at this point. We can perhaps then start capturing the
pros and cons against each option in the page rather than losing them
in the email chain.

OPTION : BARLEY

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are byte strings.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a byte string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.
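
To make BARLEY concrete, here is a rough sketch of what an application
might look like under it, assuming Python 3. The names barley_app and
call_app, and the sample environment values, are made up for
illustration; call_app is only a hand-rolled stand-in for a server and
not any real implementation.

```python
def barley_app(environ, start_response):
    # BARLEY: keys and CGI values are all byte strings, so every
    # lookup and every literal needs the b'' prefix.
    scheme = environ[b'wsgi.url_scheme']
    host = environ[b'SERVER_NAME']
    path = environ.get(b'PATH_INFO', b'/')
    body = b'You asked for ' + scheme + b'://' + host + path + b'\n'
    # Status line and header names/values are byte strings too.
    start_response(b'200 OK', [
        (b'Content-Type', b'text/plain'),
        (b'Content-Length', str(len(body)).encode('ascii')),
    ])
    return [body]


def call_app(app, environ):
    """Hypothetical driver standing in for a WSGI server."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


# Sample environment (made-up values); a real one has many more keys.
barley_environ = {
    b'wsgi.url_scheme': b'http',
    b'SERVER_NAME': b'example.com',
    b'PATH_INFO': b'/caf\xc3\xa9',
}
status, headers, body = call_app(barley_app, barley_environ)
```

Note that environ['PATH_INFO'] with a native string key would raise
KeyError here, which is the practical side of the 'b' plague.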

OPTION : RYE

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a byte string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are byte strings.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application must be a byte string.

6. The list of response headers specified by the WSGI application must
contain tuples consisting of two values, where each value is a byte
string.

7. The iterable returned by the application and from which response
content is derived, must yield byte strings.
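
A rough sketch of the same sort of application under RYE, assuming
Python 3, where only the keys change to native strings. The names
rye_app and call_app and the sample values are made up for
illustration, and decoding PATH_INFO as UTF-8 is an assumption the
application makes, not something RYE itself defines.

```python
def rye_app(environ, start_response):
    # RYE: keys are native strings, CGI values are still byte strings.
    raw_path = environ.get('PATH_INFO', b'/')
    # The application picks the encoding; UTF-8 is an assumption here.
    path = raw_path.decode('utf-8', 'replace')
    body = ('Path: ' + path + '\n').encode('utf-8')
    # Status line and headers must still be byte strings.
    start_response(b'200 OK',
                   [(b'Content-Type', b'text/plain; charset=utf-8')])
    return [body]


def call_app(app, environ):
    """Hypothetical driver standing in for a WSGI server."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = headers

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


# Sample environment (made-up values): native keys, byte string values.
rye_environ = {
    'wsgi.url_scheme': b'http',
    'PATH_INFO': b'/caf\xc3\xa9',
}
rye_status, rye_headers, rye_body = call_app(rye_app, rye_environ)
```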

OPTION : WHEAT

1. The application is passed an instance of a Python dictionary
containing what is referred to as the WSGI environment. All keys in
this dictionary are native strings. For CGI variables, all names are
going to be ISO-8859-1 and so where native strings are unicode
strings, that encoding is used for the names of CGI variables.

2. For the WSGI variable 'wsgi.url_scheme' contained in the WSGI
environment, the value of the variable should be a native string.

3. For the CGI variables contained in the WSGI environment, the values
of the variables are native strings. Where native strings are unicode
strings, ISO-8859-1 encoding would be used such that the original
character data is preserved and as necessary the unicode string can be
converted back to bytes and thence decoded to unicode again using a
different encoding.

4. The WSGI input stream 'wsgi.input' contained in the WSGI
environment and from which request content is read, should yield byte
strings.

5. The status line specified by the WSGI application should be a byte
string. Where native strings are unicode strings, the native string
type can also be returned in which case it would be encoded as
ISO-8859-1.

6. The list of response headers specified by the WSGI application
should contain tuples consisting of two values, where each value is a
byte string. Where native strings are unicode strings, the native
string type can also be returned in which case it would be encoded as
ISO-8859-1.

7. The iterable returned by the application and from which response
content is derived, should yield byte strings. Where native strings
are unicode strings, the native string type can also be returned in
which case it would be encoded as ISO-8859-1.
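
The ISO-8859-1 round trip that WHEAT relies on can be sketched as
follows, assuming Python 3. The helpers recode and to_wire are
hypothetical names invented here; to_wire is only a guess at what a
server might do with values handed back by the application, and UTF-8
as the real encoding of the path is an assumption.

```python
def recode(value, encoding='utf-8'):
    """Undo the server's ISO-8859-1 decoding, then decode properly.

    Hypothetical helper. ISO-8859-1 maps each byte 0-255 to the code
    point of the same number, so the encode step recovers the original
    bytes exactly.
    """
    return value.encode('iso-8859-1').decode(encoding)


def to_wire(value):
    """Hypothetical server-side step for status, headers and content:
    byte strings pass through, native strings become ISO-8859-1."""
    if isinstance(value, bytes):
        return value
    return value.encode('iso-8859-1')


# Bytes as received from the client (made-up sample, UTF-8 encoded).
raw = b'/caf\xc3\xa9'

# What a WHEAT server would place in the environment.
wheat_environ = {
    'wsgi.url_scheme': 'http',
    'PATH_INFO': raw.decode('iso-8859-1'),
}

# The application recovers the text it actually wants.
path = recode(wheat_environ['PATH_INFO'])
```

The value in the environment is mojibake until recoded, but no
information is lost, which is the point of choosing ISO-8859-1.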

Graham

