[Web-SIG] Request for Comments on upcoming WSGI Changes
Robert Brewer
fumanchu at aminus.org
Tue Sep 22 00:26:35 CEST 2009
Henry Precheur wrote:
> On Mon, Sep 21, 2009 at 09:14:13PM +0200, Armin Ronacher wrote:
> > So the same standard should have different behavior on different
> > Python versions? That would make framework code a lot more
> > complicated.
>
> I don't understand why it would be 'a lot more' complicated.
>
> (The following code snippets are Python 3 only, and assume we're using
> 'native strings' everywhere)
>
> In the gateway, environ would be populated this way:
>
> environ['some_key'] = some_value.decode('utf8', 'surrogateescape')
>
> Compare that to the utf-8-then-latin-1 alternative:
>
> try:
>     environ['some_key'] = some_value.decode('utf-8')
>     environ['some_key.encoding'] = 'utf-8'
> except UnicodeError:
>     environ['some_key'] = some_value.decode('latin-1')
>     environ['some_key.encoding'] = 'latin-1'
>
>
> What you would have in the application to get the original value:
>
> environ['some_key'].encode('utf8', 'surrogateescape')
>
> With utf8-then-latin1:
>
> environ['some_key'].encode(environ['some_key.encoding'])
>
>
> The 'surrogateescape' way is clearly simpler.
It looks simpler until you have a site that is not primarily utf-8. In
that case, the cost multiplies to (1 line * number of middlewares in the
WSGI stack * each request), because every component that reads the value
has to undo the decoding itself. With wsgi.uri_encoding you get either
(1 line * 1 middleware designed to transcode * each request), or even 0
if your whole site uses just one charset.
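
A rough sketch of the difference, for illustration only. The helper names
(recover, gateway_environ, transcode) are hypothetical, PATH_INFO is just
a stand-in for 'some_key', and the exact semantics of wsgi.uri_encoding
are assumed here (the gateway records the charset it decoded with), not
taken from any finished spec:

```python
# Bytes as a latin-1 site would send them: b'caf\xe9'
raw = 'café'.encode('latin-1')

# surrogateescape approach: the gateway always decodes as UTF-8, so the
# invalid byte 0xE9 survives as the lone surrogate U+DCE9.
escaped = raw.decode('utf-8', 'surrogateescape')

def recover(value, charset='latin-1'):
    """What every component that wants real text must repeat."""
    return value.encode('utf-8', 'surrogateescape').decode(charset)

# wsgi.uri_encoding approach (assumed semantics): the gateway decodes
# with a configured charset and records which one it used.
def gateway_environ(raw_path, charset):
    return {
        'PATH_INFO': raw_path.decode(charset),
        'wsgi.uri_encoding': charset,
    }

def transcode(environ, charset):
    """One middleware, run once, for sites where the gateway guessed wrong."""
    old = environ['wsgi.uri_encoding']
    if old != charset:
        environ['PATH_INFO'] = environ['PATH_INFO'].encode(old).decode(charset)
        environ['wsgi.uri_encoding'] = charset
    return environ
```

If the gateway is configured with the site's charset, gateway_environ()
already yields usable text and transcode() is never needed; otherwise it
runs in one place instead of recover() being called in each middleware.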
Robert Brewer
fumanchu at aminus.org