[Python-Dev] PEP 3333: wsgi_string() function
pje at telecommunity.com
Fri Jan 7 06:12:16 CET 2011
At 04:00 PM 1/6/2011 -0800, Raymond Hettinger wrote:
>Can you please take a look at
>to see if it accurately recaps the resolution of the WSGI text/bytes issues.
>I would appreciate any feedback, as it is likely that the whatsnew
>document will be most people's first chance to hear the outcome
>of the multi-year discussion.
Hi Raymond -- nice work there. A few minor suggestions:
1. Native strings are used as the keys and values of the environ
dictionary, not just as headers for start_response.
2. The read_environ() method is strictly for use with CGI-to-WSGI
gateways, or for bridging other CGI-like protocols (e.g. FastCGI) to
WSGI. It is ONLY for server implementers, in other words, and the
typical app developer is doing something terribly wrong if they are
even bothering to read its documentation. ;-)
3. The primary relevance of the "native string" type to an app
developer is that when porting code from Python 2 to 3, they must
still decode environment variable values, even though they are
"already" Unicode. If their code was previously dealing only in
Python 2 'str' objects, then nothing really changes. If they were
previously decoding from environ str's to unicode, then they must
replace their prior .decode('whatever') with
.encode('latin1').decode('whatever'). That's basically it for
porting from Python 2.
IOW, this design choice allows most HTTP header manipulating code
(whether input or output) to be ported to Python 3 with a very
mechanical change pattern. Most such code is working with ASCII
anyway, since normally both input and output headers are, and there
are few headers that an application would be likely to convert to
actual unicode anyway.
On output via start_response(), if an application is currently
encoding an output header -- why they would be, I have no idea, but
if they are -- they need to add a re-encode to latin1. (i.e.,
IOW, a short 2-to-3 porting guide for WSGI:
* If you just used strings for headers before, that part of your code
doesn't change. (And if it was broken before, it's still broken in
exactly the same way. No new breakage is introduced. ;-) )
* If you encoded any output headers or decoded any input headers, you
must take into account the extra latin1 step. This is expected to be
rare, since it's usually only SCRIPT_NAME and PATH_INFO that anybody
would ever care about on input, and almost never anything on output.
* Values yielded by an application or sent via a write() call MUST be
byte strings; the environ strings, and the status and headers passed
to start_response(), MUST be native strings. No mixing and matching.
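The three rules above can be illustrated with a small application. This is only a sketch under stated assumptions: the names app, to_native_header and fake_start_response, the X-Example-Path header, and the choice of UTF-8 are all made up for the example; only wsgiref.util.setup_testing_defaults comes from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def to_native_header(text, encoding="utf-8"):
    # Hypothetical helper: encode real text to bytes, then re-decode as
    # latin-1 so the result is a native string the server can send as-is.
    return text.encode(encoding).decode("latin1")


def app(environ, start_response):
    # Input: undo the server's latin-1 decoding, then decode as UTF-8.
    name = environ["PATH_INFO"].encode("latin1").decode("utf-8")

    # Body: must be bytes.
    body = ("Hello " + name).encode("utf-8")

    # Status and headers: must be native strings (str on Python 3).
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Example-Path", to_native_header(name)),  # illustrative header
    ]
    start_response("200 OK", headers)
    return [body]


# Drive the app without a real server.
environ = {}
setup_testing_defaults(environ)
environ["PATH_INFO"] = "/caf\u00e9".encode("utf-8").decode("latin1")

captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for a server's start_response; just records its arguments.
    captured["status"] = status
    captured["headers"] = headers


chunks = app(environ, fake_start_response)
```

An application whose headers are plain ASCII never needs to_native_header; it is shown only for the rare case where the Python 2 code was already encoding an output header.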