<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">

<TITLE>Re: [Web-SIG] Python 3.0 and WSGI 1.0.</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->


<P><FONT SIZE=2>Graham Dumpleton wrote:<BR>

&gt; Robert, do you have any comments on the restricting of response<BR>

&gt; content to bytes and not allow fallback to conversion per latin-1?<BR>

&gt;<BR>

&gt; I heard that in CherryPy WSGI server you are only allowing bytes. What<BR>

&gt; is your rationale for that at the moment?<BR>

<BR>

<BR>

In Python 2.x, one could easily mix unicode strings and byte strings in<BR>

the same interface, because they mostly supported the same operations.<BR>

Not so in Python 3.x--byte strings are missing everything from<BR>

capitalize() to zfill() [1]. I feel that choosing one type or the other<BR>

is required in order to avoid mountains of if-statements in middleware<BR>

(and lots of 'pass' statements if bytes are found).<BR>
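<BR>
A minimal sketch of that branching, assuming a hypothetical add_prefix()<BR>
helper in a middleware that must accept both types. The helper names, the<BR>
prefix, and the latin-1 guess are illustrative only:<BR>
<PRE>
```python
def naive_add_prefix(chunk, prefix="[mw] "):
    # Works for str chunks only; str + bytes raises TypeError in Python 3.
    return prefix + chunk


def add_prefix(chunk, prefix="[mw] "):
    # Hypothetical helper: handles both types by branching on the chunk type.
    if isinstance(chunk, bytes):
        # Assumption: latin-1 is only a guess, since nothing in the
        # interface says which encoding the bytes are in.
        return prefix.encode("latin-1") + chunk
    return prefix + chunk
```
</PRE>
Every helper of this kind needs its own type check, and the bytes branch<BR>
still has to guess an encoding.<BR>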

<BR>

I decided that that single type should be byte strings because I want<BR>

WSGI middleware and applications to be able to choose what encoding<BR>

their output is. Passing unicode to the server would require some<BR>

out-of-band method of telling the server which encoding to use per<BR>

response, which seemed unacceptable.<BR>
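<BR>
As a rough sketch of the bytes-only approach (the app name and body text<BR>
are made up for illustration), the application encodes its own output and<BR>
hands the server nothing but bytes:<BR>
<PRE>
```python
def app(environ, start_response):
    # The application, not the server, decides on the encoding.
    encoding = "utf-8"
    body = "caf\u00e9".encode(encoding)
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=" + encoding),
        ("Content-Length", str(len(body))),
    ])
    # Only bytes leave the application.
    return [body]
```
</PRE>
The server never needs to know which encoding was used; it just writes<BR>
the bytes it is given.<BR>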

<BR>

The downside, already alluded to, is that middleware cannot then call<BR>

e.g. response.capitalize() or any of a number of other methods without<BR>

first decoding the response. And it cannot do that reliably unless<BR>

(again) the encoding which was used to produce bytes is communicated<BR>

down the stack out of band.<BR>
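<BR>
To illustrate why the encoding has to be known, here is a small sketch<BR>
with a hypothetical upper_response() helper. The same bytes give different<BR>
results depending on which encoding the middleware assumes:<BR>
<PRE>
```python
def upper_response(body, encoding):
    # Hypothetical middleware step: decode, transform as text, re-encode.
    return body.decode(encoding).upper().encode(encoding)


body = "caf\u00e9".encode("utf-8")

# Correct assumption: the accented letter is upper-cased too.
right = upper_response(body, "utf-8")

# Wrong assumption: the two UTF-8 bytes are treated as two latin-1
# characters, so the accented letter is left untouched.
wrong = upper_response(body, "latin-1")
```
</PRE>
With the wrong guess nothing raises an error; the output is just quietly<BR>
incorrect.<BR>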

<BR>

The python3 branch of CherryPy is by no means complete. I'd be happy to<BR>

explore emitting unicode if we could decide on a method whereby apps<BR>

could inform the server which encoding they want. Middleware which<BR>

transcoded the response would need a means of overriding that. But of<BR>

course, that opens a whole new can of worms if something goes wrong,<BR>

because application authors want control over the error response; if the<BR>

server is encoding the response, and an error occurs, there would have<BR>

to be a way to pass control back up the stack to... what? Whichever<BR>

component last set the encoding? That road starts to get complicated<BR>

very quickly.<BR>

<BR>

If some middleware needs to treat the response as unicode, I'd rather<BR>

emit bytes and somehow return the encoding as part of the response.<BR>

Perhaps WSGI 2's mythical &quot;return (status, headers, body-iterable,<BR>

encoding)&quot;. Middleware could then decode/transcode as desired. I can't<BR>

think of a downside to that, other than some lost cycles spent<BR>

de/encoding, but perhaps there are some I don't yet foresee.<BR>
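<BR>
A sketch of how that might look, assuming the four-tuple return shape<BR>
above. No such interface is specified anywhere; the app(environ) call<BR>
signature and the transcode() wrapper are hypothetical:<BR>
<PRE>
```python
import codecs


def app(environ):
    # Hypothetical WSGI-2-style app: bytes plus the encoding used.
    body = ["caf\u00e9 ".encode("utf-8"), b"au lait"]
    return ("200 OK", [("Content-Type", "text/plain")], body, "utf-8")


def transcode(app, target):
    # Hypothetical middleware: re-encodes the body and reports the new
    # encoding to whatever sits below it in the stack.
    def wrapped(environ):
        status, headers, body, encoding = app(environ)

        def recoded():
            # An incremental decoder copes with characters split across chunks.
            decoder = codecs.getincrementaldecoder(encoding)()
            for chunk in body:
                text = decoder.decode(chunk)
                if text:
                    yield text.encode(target)
            tail = decoder.decode(b"", final=True)
            if tail:
                yield tail.encode(target)

        return status, headers, recoded(), target

    return wrapped
```
</PRE>
Calling transcode(app, "latin-1")({}) would hand back latin-1 bytes<BR>
together with the name of the new encoding, so the next component down<BR>
can decode reliably.<BR>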

<BR>

<BR>

Robert Brewer<BR>

fumanchu@aminus.org<BR>

<BR>

[1] See <A HREF="http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods">http://docs.python.org/dev/py3k/library/stdtypes.html#string-methods</A></FONT>

</P>


</BODY>

</HTML>