[Web-SIG] Unicode in Python 3

Armin Ronacher armin.ronacher at active-4.com
Sat Sep 19 19:00:03 CEST 2009


Hi,

René Dudfield wrote:
> What is proposed:
Where was that proposed?

>     1. Default utf-8 to be used.
That's a possibility, yes, but it has to be carefully considered.

>     2. A buffer to be used for raw data.
What is raw data?  If you mean we keep the unencoded data around, I
would strongly argue against that, because it makes middlewares even
harder to write.

>     3. New keys which are callables to request the encoding you want.
Did I miss something?  Why are we requesting encodings now?

>     4. Encoding keys are specified.
>     4.a URI encoding key 'wsgi.uri_encoding'
>     4.b Form data encoding key 'wsgi.form_encoding'
>     4.c Page encoding key 'wsgi.page_encoding'
>     4.d Header encoding key 'wsgi.header_encoding'
I don't know where you are getting that from.  The only WSGI key would
be `wsgi.uri_encoding` and that is only set by the server and only used
for legacy non UTF-8 URLs.

>     5. For next version of wsgi (1.1 or 2.0), using an adapter for
> backwards compat for wsgi 1.0 apps on wsgi2 server.
No decision about WSGI versioning was made so far.  If WSGI in Python 3
is based on unicode, then the version is raised to 1.1.  2.0 has not
been discussed yet as far as I'm concerned.

>     2.c Avoiding bytes type and syntax for compatibility with <=
> python 2.5.4 (buffer, and unicode)
If WSGI for Python 3 is based on Unicode it will use '' for textual
context and b'' for bytes.  If it's based on bytes it will obviously use
the byte literals.

>     3. Transcoding to only happen if needed.
I can't see how that would work if it's based on unicode.  If it's based
on bytes, that's already what happens in WSGI 1.

>     4. URI encoding can be explicitly stated in a URI key
This value is only *set* by the server on decode.  The actual application
or middleware should ignore it except for QUERY_STRING and REQUEST_URI
decoding.  Everything else makes things a lot more complicated without
improving anything.
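
A rough sketch of what I mean, assuming the unicode-based variant with
utf-8 and a latin1 fallback.  The helper names `decode_path` and
`parse_query` are made up for illustration and are not part of any
spec; only `wsgi.uri_encoding` is the key under discussion.

```python
from urllib.parse import parse_qs


def decode_path(raw_path: bytes, environ: dict) -> dict:
    """Hypothetical server-side helper: decode the path and record
    which encoding was used in 'wsgi.uri_encoding'."""
    try:
        path = raw_path.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        # latin1 maps every byte to a code point, so this cannot fail.
        path = raw_path.decode("latin1")
        encoding = "latin1"
    environ["PATH_INFO"] = path
    environ["wsgi.uri_encoding"] = encoding
    return environ


def parse_query(environ: dict) -> dict:
    """Hypothetical application-side helper: the only place where the
    application reads 'wsgi.uri_encoding' is query string decoding."""
    return parse_qs(environ.get("QUERY_STRING", ""),
                    encoding=environ["wsgi.uri_encoding"])
```

The application never sets the key and never passes it along as a
request for some other encoding; it only reads what the server recorded.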

>     5. Backwards compat for wsgi 1.0 apps on wsgi 2 server.  Also wsgi
> 2.0 apps on wsgi 1.0 server with an adapter.
Again, WSGI 2.0 is something that has to be discussed separately,
otherwise we totally lose track.

> Issues with proposal?  Things this proposal did not consider?
Yes, there are issues:

-  it has no real world advantage over either WSGI based on unicode
   that is utf-8 with latin1 fallback or a WSGI based on bytes.
-  it's backwards incompatible in every way, even with CGI.
-  it is slow because every dict access would also cause a function
   call.  Furthermore middlewares would most likely start causing
   circular dependencies when they replace the callable with a new
   callable and they do not alias the value as a local in the frame
   that created it.
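
To illustrate the last point, here is a minimal sketch of one way a
middleware can go wrong.  The key name `wsgi.get_path` is hypothetical
and only stands in for the kind of callable key the proposal describes.

```python
def make_environ() -> dict:
    # Hypothetical callable-based key: every access is a function call.
    return {
        "wsgi.get_path": lambda encoding: b"/Caf\xc3\xa9".decode(encoding),
    }


def broken_middleware(environ: dict) -> None:
    # The wrapper looks the callable up in environ at call time.  By then
    # the key already points at the wrapper itself, so it recurses forever.
    def wrapper(encoding):
        return environ["wsgi.get_path"](encoding).lower()

    environ["wsgi.get_path"] = wrapper


def safe_middleware(environ: dict) -> None:
    # Alias the original callable as a local before replacing the key.
    original = environ["wsgi.get_path"]

    def wrapper(encoding):
        return original(encoding).lower()

    environ["wsgi.get_path"] = wrapper
```

With `broken_middleware` the first call to the key ends in a
RecursionError; with `safe_middleware` it returns the lowercased path.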


Regards,
Armin

