<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [Web-SIG] Python 3.0 and WSGI 1.0.</TITLE>
</HEAD>
<BODY>
<P><FONT SIZE=2>Graham Dumpleton wrote:<BR>
> Robert, do you have any comments on restricting response<BR>
> content to bytes and not allowing a fallback to latin-1 conversion?<BR>
><BR>
> I heard that the CherryPy WSGI server only allows bytes. What<BR>
> is your rationale for that at the moment?<BR>
<BR>
<BR>
In Python 2.x, one could easily mix unicode strings and byte strings in<BR>
the same interface, because they mostly supported the same operations.<BR>
Not so in Python 3.x: byte strings are missing everything from<BR>
capitalize() to zfill() [1]. I feel that choosing one type or the other<BR>
is required in order to avoid mountains of if-statements in middleware<BR>
(and lots of 'pass' statements if bytes are found).<BR>
<BR>
I decided that that single type should be byte strings because I want<BR>
WSGI middleware and applications to be able to choose what encoding<BR>
their output is. Passing unicode to the server would require some<BR>
out-of-band method of telling the server which encoding to use per<BR>
response, which seemed unacceptable.<BR>
<BR>
The downside, already alluded to, is that middleware cannot then call<BR>
e.g. response.capitalize() or any of a number of other methods without<BR>
first decoding the response. And it cannot do that reliably unless<BR>
(again) the encoding which was used to produce bytes is communicated<BR>
down the stack out of band.<BR>
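<BR>
A minimal sketch of the bytes-only arrangement and its downside follows.<BR>
It is an illustration only, not CherryPy's actual code: app, require_bytes<BR>
and fake_start_response are hypothetical names, and the sample text is made up.<BR>
The application picks its encoding, a server-side guard rejects anything but<BR>
bytes, and a consumer that guesses the wrong encoding gets garbage.<BR>
</FONT></P>
<PRE>
```python
def app(environ, start_response):
    # The application chooses its own encoding and emits bytes.
    body = "caf\u00e9".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def require_bytes(app_iter):
    # Hypothetical server-side guard: refuse anything that is not bytes,
    # so the server never has to pick an encoding itself.
    for chunk in app_iter:
        if not isinstance(chunk, bytes):
            raise TypeError(
                "response chunks must be bytes, got %s" % type(chunk).__name__
            )
        yield chunk


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the callable a real server would supply.
    return None


chunks = list(require_bytes(app({}, fake_start_response)))

# Middleware that does not know the encoding can only guess. Decoding
# UTF-8 bytes as latin-1 raises no error; it just yields the wrong text.
wrong_guess = chunks[0].decode("latin-1")
right_guess = chunks[0].decode("utf-8")
```
</PRE>
<P><FONT SIZE=2>
Because the latin-1 decode succeeds silently, the middleware has no way to<BR>
notice the mistake on its own.<BR>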
<BR>
The python3 branch of CherryPy is by no means complete. I'd be happy to<BR>
explore emitting unicode if we could decide on a method whereby apps<BR>
could inform the server which encoding they want. Middleware which<BR>
transcoded the response would need a means of overriding that. But of<BR>
course, that opens a whole new can of worms if something goes wrong,<BR>
because application authors want control over the error response; if the<BR>
server is encoding the response, and an error occurs, there would have<BR>
to be a way to pass control back up the stack to... what? Whichever<BR>
component last set the encoding? That road starts to get complicated<BR>
very quickly.<BR>
<BR>
If some middleware needs to treat the response as unicode, I'd rather<BR>
emit bytes and somehow return the encoding as part of the response.<BR>
Perhaps WSGI 2's mythical "return (status, headers, body-iterable,<BR>
encoding)". Middleware could then decode/transcode as desired. I can't<BR>
think of a downside to that, other than some lost cycles spent<BR>
de/encoding, but perhaps there are some I don't yet foresee.<BR>
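<BR>
The four-item return value could be sketched as below. This assumes a<BR>
hypothetical calling convention with no start_response; no such WSGI 2<BR>
interface has been standardized, and app and uppercasing_middleware are<BR>
illustrative names. The middleware decodes with the declared encoding, works<BR>
on text, and re-encodes before passing the response on.<BR>
</FONT></P>
<PRE>
```python
def app(environ):
    # Hypothetical WSGI-2-style callable: the fourth item names the
    # encoding that was used to produce the body bytes.
    body = "caf\u00e9 au lait".encode("utf-8")
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    return "200 OK", headers, [body], "utf-8"


def uppercasing_middleware(wrapped_app):
    # Illustrative middleware that needs text rather than bytes.
    def wrapper(environ):
        status, headers, body, encoding = wrapped_app(environ)
        # Decode with the encoding the application declared.
        text = b"".join(body).decode(encoding)
        # bytes.upper() only changes ASCII letters, so the accented
        # character is handled correctly only after decoding.
        new_body = text.upper().encode(encoding)
        # No Content-Length header is set in this sketch; real middleware
        # would have to correct it if the byte length changed.
        return status, headers, [new_body], encoding
    return wrapper


status, headers, body, encoding = uppercasing_middleware(app)({})
result = b"".join(body).decode(encoding)
```
</PRE>
<P><FONT SIZE=2>
Middleware that transcodes would return a different fourth item and would<BR>
also need to rewrite the charset in the Content-Type header.<BR>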
<BR>
<BR>
Robert Brewer<BR>
fumanchu@aminus.org<BR>
<BR>
[1] See <A HREF="http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods">http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods</A></FONT>
</P>
</BODY>
</HTML>