[Web-SIG] Unicode in Python 3
armin.ronacher at active-4.com
Sat Sep 19 19:00:03 CEST 2009
René Dudfield wrote:
> What is proposed:
Where was that proposed?
> 1. Default utf-8 to be used.
That's a possibility, yes, but it has to be carefully considered.
> 2. A buffer to be used for raw data.
What is raw data? If you mean we keep the unencoded data around, I
would strongly argue against that. Otherwise it makes middlewares even
harder to write.
> 3. New keys which are callables to request the encoding you want.
Did I miss something? Why are we requesting encodings now?
> 4. Encoding keys are specified.
> 4.a URI encoding key 'wsgi.uri_encoding'
> 4.b Form data encoding key 'wsgi.form_encoding'
> 4.c Page encoding key 'wsgi.page_encoding'
> 4.d Header encoding key 'wsgi.header_encoding'
I don't know where you are getting that from. The only WSGI key would
be `wsgi.uri_encoding` and that is only set by the server and only used
for legacy non UTF-8 URLs.
> 5. For next version of wsgi (1.1 or 2.0), using an adapter for
> backwards compat for wsgi 1.0 apps on wsgi2 server.
No decision about WSGI versioning has been made so far. If WSGI in
Python 3 is based on unicode, then the version is raised to 1.1; 2.0 has
not been discussed yet as far as I'm concerned.
> 2.c Avoiding bytes type and syntax for compatibility with <=
> python 2.5.4 (buffer, and unicode)
If WSGI for Python 3 is based on Unicode it will use '' for textual
context and b'' for bytes. If it's based on bytes it will obviously use
the byte literals.
> 3. Transcoding to only happen if needed.
I can't see how that would work if it's based on unicode. If it's based
on bytes, that's already what happens in WSGI 1.
> 4. URI encoding can be explicitly stated in a URI key
This value is only *set* by the server on decode. The actual application
or middleware should ignore it except for QUERY_STRING and REQUEST_URI
decoding. Anything else makes things a lot more complicated without
improving anything.
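A minimal sketch of that flow, under stated assumptions: the helper names
`decode_path` and `parse_query` are hypothetical, the latin1 fallback is
the "utf-8 with latin1 fallback" variant mentioned further down, and
nothing here is taken from an actual WSGI specification. The server
records which codec it used in `wsgi.uri_encoding`; the application only
reads that value when it decodes the query string.

```python
from urllib.parse import parse_qs, unquote_to_bytes


def decode_path(raw_path, environ):
    """Server side (hypothetical helper): decode the path, record the codec."""
    raw = unquote_to_bytes(raw_path)
    try:
        text = raw.decode("utf-8")
        encoding = "utf-8"
    except UnicodeDecodeError:
        # Legacy URL: latin1 maps every byte to a code point, so it cannot fail.
        text = raw.decode("latin1")
        encoding = "latin1"
    environ["PATH_INFO"] = text
    environ["wsgi.uri_encoding"] = encoding  # set by the server only
    return environ


def parse_query(environ):
    """Application side (hypothetical helper): reuse the server's codec."""
    return parse_qs(environ["QUERY_STRING"],
                    encoding=environ["wsgi.uri_encoding"])


# A legacy latin1 URL: the server falls back, the app follows its choice.
environ = decode_path("/caf%E9", {"QUERY_STRING": "name=caf%E9"})
query = parse_query(environ)
```

The application never writes `wsgi.uri_encoding`; it just follows the
server's choice for the one value it has to decode itself.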
> 5. Backwards compat for wsgi 1.0 apps on wsgi 2 server. Also wsgi
> 2.0 apps on wsgi 1.0 server with an adapter.
Again, WSGI 2.0 is something that has to be discussed separately,
otherwise we totally lose track.
> Issues with proposal? Things this proposal did not consider?
Yes you did:
- it has no real world advantage over either WSGI based on unicode
that is utf-8 with latin1 fallback or a WSGI based on bytes.
- it's backwards incompatible in every way, even to CGI.
- it is slow because every dict access would also cause a function
call. Furthermore middlewares would most likely start causing
circular dependencies when they replace the callable with a new
callable and they do not alias the value as a local in the frame
that created it.
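
The circular-dependency point in the last item can be sketched as
follows. This assumes the quoted proposal's callable keys, using
`wsgi.form_encoding` as the example; the middleware functions and the
upper-casing are purely illustrative and hypothetical.

```python
def server_form_encoding():
    """Stand-in for the callable a server would put into the environ."""
    return "utf-8"


def broken_middleware(environ):
    def wrapper():
        # Looks the key up at call time. By then the key points at
        # wrapper itself, so this recurses until the interpreter gives up.
        return environ["wsgi.form_encoding"]().upper()
    environ["wsgi.form_encoding"] = wrapper


def fixed_middleware(environ):
    # Alias the old callable as a local in the frame that creates the
    # replacement, so the wrapper never looks itself up.
    original = environ["wsgi.form_encoding"]

    def wrapper():
        return original().upper()
    environ["wsgi.form_encoding"] = wrapper


environ = {"wsgi.form_encoding": server_form_encoding}
fixed_middleware(environ)
result = environ["wsgi.form_encoding"]()

broken_environ = {"wsgi.form_encoding": server_form_encoding}
broken_middleware(broken_environ)
try:
    broken_environ["wsgi.form_encoding"]()
    recursed = False
except RecursionError:
    recursed = True
```

Even the fixed variant adds one extra function call per wrapping
middleware on every access, which is the cost the same item describes.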