<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>Re: [Web-SIG] Python 3.0 and WSGI 1.0.</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>P.J. Eby wrote:<BR>
> At 08:07 AM 5/8/2009 -0700, Robert Brewer wrote:<BR>
>> I decided that that single type should be byte strings because I want<BR>
>> WSGI middleware and applications to be able to choose what encoding<BR>
>> their output is. Passing unicode to the server would require some<BR>
>> out-of-band method of telling the server which encoding to use per<BR>
>> response, which seemed unacceptable.<BR>
><BR>
> I find the above baffling, since PEP 333 explicitly states that<BR>
> when using unicode types, they're not actually supposed to *be*<BR>
> unicode -- they're just bytes decoded with latin-1.<BR>
<BR>
It also explicitly states that "HTTP does not directly support Unicode,<BR>
and neither does this interface. All encoding/decoding must be handled<BR>
by the application; all strings passed to or from the server must be<BR>
standard Python BYTE STRINGS (emphasis mine), not Unicode objects. The<BR>
result of using a Unicode object where a string object is required, is<BR>
undefined."<BR>
<BR>
PEP 333 is difficult to interpret because it uses the name "str"<BR>
synonymously with the concept "byte string", which Python 3000 defies. I<BR>
believe the intent was to differentiate unicode from bytes, not elevate<BR>
whatever type happens to be called "str" on your Python du jour. It was<BR>
and is a mistake to standardize on type names ("str") across platforms<BR>
and not on type behavior ("byte string").<BR>
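<BR>
(For concreteness, a small illustration of that behavior split -- the assertions below are mine, not from the PEP:)<BR>
<BR>
import sys<BR>
<BR>
if sys.version_info[0] >= 3:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;# Python 3: "str" is unicode text; the byte string type is "bytes".<BR>
&nbsp;&nbsp;&nbsp;&nbsp;assert isinstance('x', str) and not isinstance('x', bytes)<BR>
else:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;# Python 2 (2.6): "str" IS the byte string type; "bytes" is just an alias for it.<BR>
&nbsp;&nbsp;&nbsp;&nbsp;assert isinstance('x', str) and isinstance('x', bytes)<BR>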
<BR>
If Python3 WSGI apps emit unicode strings (py3k type 'str'), you're<BR>
effectively saying the server will always call<BR>
"chunk.encode('latin-1')". That negates any benefit of using unicode as<BR>
the type for the response. That's not "supporting unicode"; that's using<BR>
unicode exactly as if it were an opaque byte string. That seems silly<BR>
to me when there is a perfectly useful byte string type.<BR>
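<BR>
To spell that round trip out (a Python 3 sketch; the literal is arbitrary): any byte string survives a latin-1 decode/encode cycle unchanged, which is exactly what makes a unicode string an opaque carrier for it:<BR>
<BR>
data = 'caf\u00e9'.encode('utf-8') # the app's "real" encoding is utf-8<BR>
smuggled = data.decode('latin-1') # the same bytes dressed up as a str<BR>
assert smuggled.encode('latin-1') == data # the server recovers identical bytes<BR>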
<BR>
> So, the server doesn't need to know "what encoding to use" -- it's<BR>
> latin-1, plain and simple. (And it's an error for an application to<BR>
> produce a unicode string that can't be encoded as latin-1.)<BR>
><BR>
> To be even more specific: an application that produces strings can<BR>
> "choose what encoding to use" by encoding in it, then decoding those<BR>
> bytes via latin-1. (This is more or less what Jython and IronPython<BR>
> users are doing already, I believe.)<BR>
<BR>
That may make sense for Jython and IronPython if they truly do not have<BR>
a usable byte string type. But it doesn't make as much sense for Python3,<BR>
which does have one. My way:<BR>
<BR>
App:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;bchunk = uchunk.encode('utf-8')<BR>
&nbsp;&nbsp;&nbsp;&nbsp;yield bchunk<BR>
Server:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;write(bchunk)<BR>
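<BR>
Fleshed out as a runnable Python 3 sketch (the helper names app_body and serve are mine, purely for illustration):<BR>
<BR>
# App side: pick an encoding and hand the server finished bytes.<BR>
def app_body():<BR>
&nbsp;&nbsp;&nbsp;&nbsp;uchunk = 'caf\u00e9'<BR>
&nbsp;&nbsp;&nbsp;&nbsp;yield uchunk.encode('utf-8') # the app owns the charset decision<BR>
<BR>
# Server side: pass the app's bytes through untouched.<BR>
def serve(body, write):<BR>
&nbsp;&nbsp;&nbsp;&nbsp;for bchunk in body:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;write(bchunk)<BR>
<BR>
out = bytearray()<BR>
serve(app_body(), out.extend)<BR>
assert bytes(out) == 'caf\u00e9'.encode('utf-8')<BR>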
<BR>
Your way:<BR>
<BR>
App:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;bchunk = uchunk.encode('utf-8')<BR>
&nbsp;&nbsp;&nbsp;&nbsp;uchunk = bchunk.decode('latin-1')<BR>
&nbsp;&nbsp;&nbsp;&nbsp;yield uchunk<BR>
Server:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;bchunk = uchunk.encode('latin-1')<BR>
&nbsp;&nbsp;&nbsp;&nbsp;write(bchunk)<BR>
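<BR>
The same thing under the latin-1 convention (again a sketch with made-up helper names; it produces identical bytes, just with two extra passes over every chunk):<BR>
<BR>
# App side: encode to the real charset, then disguise the bytes as a str.<BR>
def app_body():<BR>
&nbsp;&nbsp;&nbsp;&nbsp;uchunk = 'caf\u00e9'<BR>
&nbsp;&nbsp;&nbsp;&nbsp;bchunk = uchunk.encode('utf-8')<BR>
&nbsp;&nbsp;&nbsp;&nbsp;yield bchunk.decode('latin-1') # bytes smuggled through a str<BR>
<BR>
# Server side: undo the disguise before writing.<BR>
def serve(body, write):<BR>
&nbsp;&nbsp;&nbsp;&nbsp;for uchunk in body:<BR>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;write(uchunk.encode('latin-1'))<BR>
<BR>
out = bytearray()<BR>
serve(app_body(), out.extend)<BR>
assert bytes(out) == 'caf\u00e9'.encode('utf-8') # same bytes, more steps<BR>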
<BR>
I don't see any benefit to that.<BR>
<BR>
<BR>
Robert Brewer<BR>
fumanchu@aminus.org</FONT>
</P>
</BODY>
</HTML>