[Web-SIG] Request for Comments on upcoming WSGI Changes

Graham Dumpleton graham.dumpleton at gmail.com
Tue Sep 22 01:16:02 CEST 2009


2009/9/22 Henry Precheur <henry at precheur.org>:
> On Mon, Sep 21, 2009 at 03:26:35PM -0700, Robert Brewer wrote:
>> It looks simpler until you have a site that is not primarily utf-8. In
>> that case, you multiply your (1 line * number of middlewares in the WSGI
>> stack * each request). With wsgi.uri_encoding you get either (1 line * 1
>> middleware designed to transcode * each request), or even 0 if your
>> whole site uses just one charset.
>
> I am not sure I understand your point.
>
> The 0 lines hold true if the whole site is using latin-1 or utf-8 and
> you write your applications/middlewares only for this site. But if it's
> using any other encoding you still have to transcode.
>
> def middleware(environ, start_response):
>    value = environ['some_key'].\
>        encode('utf8', 'surrogateescape').\
>        decode(SITE_ENCODING)
>    ...
>
> With wsgi.uri_encoding you would still have to do the following:
>
> def middleware(environ, start_response):
>    value = environ['some_key'].\
>        encode(environ['some_key.encoding']).\
>        decode(SITE_ENCODING)
>    ...
>
> Of course you can directly use `environ['some_key']` if you know you'll
> get the 'right' encoding all the time. But when the encoding changes,
> you'll have to fix all your middlewares.
>
>
> Am I missing something?
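
The two quoted transcoding styles can be compared in a small runnable sketch. It assumes a hypothetical site charset of cp1252 and, for the second style, that the server records the charset it used under the proposed 'wsgi.uri_encoding' key (a key under discussion in this thread, not part of PEP 333). The helper names are illustrative only.

```python
# Sketch comparing the two transcoding styles quoted above.
# Assumptions (hypothetical, not from any WSGI spec): SITE_ENCODING is
# cp1252, and in the second style the server records the charset it used
# under the proposed 'wsgi.uri_encoding' key.

SITE_ENCODING = 'cp1252'  # hypothetical site charset: neither UTF-8 nor Latin-1


def recover_via_surrogateescape(value):
    """Undo a UTF-8 + surrogateescape decode, then decode as the site charset."""
    return value.encode('utf-8', 'surrogateescape').decode(SITE_ENCODING)


def recover_via_declared_encoding(environ, key):
    """Undo a decode done with the server-declared charset, then re-decode."""
    return environ[key].encode(environ['wsgi.uri_encoding']).decode(SITE_ENCODING)


# Bytes as they arrive from the client: '/café/€' encoded as cp1252.
raw_path = '/caf\u00e9/\u20ac'.encode(SITE_ENCODING)

# Style 1: server decodes as UTF-8, smuggling undecodable bytes as surrogates.
environ_a = {'PATH_INFO': raw_path.decode('utf-8', 'surrogateescape')}

# Style 2: server decodes as ISO-8859-1 and says so in the environ.
environ_b = {'PATH_INFO': raw_path.decode('iso-8859-1'),
             'wsgi.uri_encoding': 'iso-8859-1'}

recovered_a = recover_via_surrogateescape(environ_a['PATH_INFO'])
recovered_b = recover_via_declared_encoding(environ_b, 'PATH_INFO')
assert recovered_a == recovered_b == '/caf\u00e9/\u20ac'
```

Either way a site whose charset differs from the server's choice pays one encode/decode step per value; the styles differ only in where the intermediate charset comes from (fixed by convention versus declared in the environ).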

For one, we aren't talking about arbitrary keys needing this treatment.

We are only talking about SCRIPT_NAME and PATH_INFO.

Everything else from CGI will be passed as ISO-8859-1, and it is up to
WSGI components/applications to explicitly worry about those values if
they need to deal with them in special ways, e.g., REQUEST_URI,
QUERY_STRING, HTTP_COOKIE, HTTP_REFERER.
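
A minimal sketch of that rule, assuming a server that holds the raw CGI values as bytes. The names build_environ and original_bytes, and the 'utf-8' default for wsgi.uri_encoding, are hypothetical; error handling for undecodable paths is deliberately left out.

```python
# Hypothetical sketch of the environ-building rule described above; the
# function names and the 'utf-8' default are illustrative assumptions.

URI_KEYS = ('SCRIPT_NAME', 'PATH_INFO')


def build_environ(raw_cgi, uri_encoding='utf-8'):
    """Turn raw CGI byte values into native strings.

    Only SCRIPT_NAME and PATH_INFO use uri_encoding; every other value is
    decoded as ISO-8859-1. Handling of undecodable paths is left out.
    """
    environ = {}
    for key, value in raw_cgi.items():
        charset = uri_encoding if key in URI_KEYS else 'iso-8859-1'
        environ[key] = value.decode(charset)
    environ['wsgi.uri_encoding'] = uri_encoding
    return environ


def original_bytes(environ, key):
    """Recover the exact bytes of a value the server decoded as ISO-8859-1.

    ISO-8859-1 maps each of the 256 byte values to one code point, so the
    decode/encode round trip is lossless.
    """
    return environ.get(key, '').encode('iso-8859-1')


raw_cgi = {
    'PATH_INFO': '/caf\u00e9'.encode('utf-8'),
    'QUERY_STRING': 'q=\u00e9'.encode('utf-8'),
    'HTTP_REFERER': b'http://example.com/',
}
environ = build_environ(raw_cgi)

# PATH_INFO is already usable text; QUERY_STRING needs the application's
# own decode step, here assumed to be UTF-8.
path = environ['PATH_INFO']                                       # '/café'
query = original_bytes(environ, 'QUERY_STRING').decode('utf-8')   # 'q=é'
```

Because ISO-8859-1 is a one-to-one byte mapping, an application that needs QUERY_STRING or HTTP_COOKIE in some other charset can always get back to the original bytes and decode them itself.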

Thus, your repeated use of 'some_key' is a bit confusing when just
trying to scan the emails quickly.

Graham