[Web-SIG] Request for Comments on upcoming WSGI Changes

Robert Brewer fumanchu at aminus.org
Tue Sep 22 00:26:35 CEST 2009


Henry Precheur wrote:
> On Mon, Sep 21, 2009 at 09:14:13PM +0200, Armin Ronacher wrote:
> > So the same standard should have different behavior on different
> > Python versions?  That would make framework code a lot more
> > complicated.
> 
> I don't understand why it would be 'a lot more' complicated.
> 
> (The following code snippets are Python 3 only, and assume we're using
> 'native strings' everywhere)
> 
> In the gateway, environ would be populated this way:
> 
>   environ['some_key'] = some_value.decode('utf8', 'surrogateescape')
> 
> Compare that to the utf-8-then-latin-1 alternative:
> 
>   try:
>       environ['some_key'] = some_value.decode('utf-8')
>       environ['some_key.encoding'] = 'utf-8'
>   except UnicodeError:
>       environ['some_key'] = some_value.decode('latin-1')
>       environ['some_key.encoding'] = 'latin-1'
> 
> 
> What you would have in the application to get the original value:
> 
>   environ['some_key'].encode('utf8', 'surrogateescape')
> 
> With utf8-then-latin1:
> 
>   environ['some_key'].encode(environ['some_key.encoding'])
> 
> 
> The 'surrogateescape' way is clearly simpler.

It looks simpler until you have a site that is not primarily utf-8. In
that case, the cost is (1 line * number of middlewares in the WSGI
stack * each request), because every component that reads the value has
to re-decode it. With wsgi.uri_encoding you pay either (1 line * 1
middleware designed to transcode * each request), or nothing at all if
your whole site uses just one charset.
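
To make the "1 middleware designed to transcode" case concrete, here is
a rough Python 3 sketch. It is not taken from any spec or server: the
class name TranscodeMiddleware, the list of keys it touches, and the
latin-1 default when 'wsgi.uri_encoding' is missing are all assumptions
made for illustration. It only assumes the gateway stores the charset it
decoded with under 'wsgi.uri_encoding'.

```python
import codecs


class TranscodeMiddleware:
    """Hypothetical middleware: re-decode URI-derived values once per request."""

    # Assumed set of environ keys that carry URI-derived text.
    URI_KEYS = ('SCRIPT_NAME', 'PATH_INFO', 'QUERY_STRING')

    def __init__(self, app, site_encoding):
        self.app = app
        self.site_encoding = site_encoding

    def __call__(self, environ, start_response):
        # Assumption: the gateway recorded the charset it used to decode;
        # fall back to latin-1 if the key is absent.
        current = environ.get('wsgi.uri_encoding', 'latin-1')
        # Compare canonical codec names so 'utf8' and 'utf-8' match.
        same = codecs.lookup(current).name == codecs.lookup(self.site_encoding).name
        if not same:
            for key in self.URI_KEYS:
                if key in environ:
                    # Recover the original bytes, then decode them
                    # with the charset the site actually uses.
                    raw = environ[key].encode(current)
                    environ[key] = raw.decode(self.site_encoding)
            environ['wsgi.uri_encoding'] = self.site_encoding
        return self.app(environ, start_response)
```

Everything downstream of this one wrapper reads environ['PATH_INFO'] as
is, with no per-component encode/decode line. A site that is entirely
utf-8 (or entirely latin-1) would not install it at all. Note that the
decode step can still raise UnicodeDecodeError for charsets that do not
accept every byte sequence; the sketch leaves that unhandled.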


Robert Brewer
fumanchu at aminus.org


