[Web-SIG] WSGI 2
James Y Knight
foom at fuhm.net
Wed Aug 5 03:48:30 CEST 2009
On Aug 4, 2009, at 8:53 PM, Graham Dumpleton wrote:
> 2. How would use of bytes work for a CGI-WSGI bridge given that
> os.environ is not bytes? Where does one get what encoding was used for
> os.environ values so it can be converted back to bytes?
On Unix it's simple enough:
On py2.x: environ is bytes already.
On py3.0: you're screwed, because some env vars were discarded already.
On py3.1+: value.encode(sys.getfilesystemencoding(), 'surrogateescape')
should do it, where value is the str taken from os.environ.
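
A minimal sketch of the py3.1+ case, assuming the environment values
were decoded with the filesystem encoding and the 'surrogateescape'
error handler. The function name and the optional encoding parameter
are mine, for illustration only:

```python
import sys


def environ_value_to_bytes(value, encoding=None):
    """Turn a str taken from os.environ back into the original bytes.

    Assumes the value was decoded with `encoding` (default: the
    filesystem encoding) using the 'surrogateescape' error handler.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Bytes that could not be decoded were mapped to lone surrogates
    # (U+DC80..U+DCFF); 'surrogateescape' maps them back on encode.
    return value.encode(encoding, "surrogateescape")


# Round trip with bytes that are not valid UTF-8.
raw = b"/caf\xe9\xff"
text = raw.decode("utf-8", "surrogateescape")
assert environ_value_to_bytes(text, "utf-8") == raw
```

The round trip only works if the same encoding is used in both
directions, which is why the sketch lets the caller override it.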
On Windows, I guess the OS environment is unicode, so I don't know
precisely what to do to reversibly obtain the bytes sent from the
end user's browser. From the source code, it looks to me as if Apache
encodes the bytes from the client (utf-8 or otherwise!) as the Unicode
values 0x00 to 0xFF in the Windows environment, that is, as if
decoding the client input as latin-1. But it does that for the
following keys only:
Other values are decoded from utf-8 (or, if inherited from an
enclosing environment, passed through unchanged: they are encoded
into utf-8 for internal use and then decoded back from utf-8 when put
back in the Windows environment.)
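
If that reading of the Apache source is right, a bridge on Windows
could reverse the two cases roughly as sketched below. This is an
untested assumption, not something I've checked against Apache. The
function name and the latin1_keys parameter are hypothetical, and the
key names in the example are placeholders, not the real list:

```python
def windows_environ_value_to_bytes(key, value, latin1_keys):
    """Recover bytes from a unicode Windows environment value.

    `latin1_keys` is a hypothetical set holding the names of the keys
    whose values were stored as code points 0x00 to 0xFF.
    """
    if key in latin1_keys:
        # Each code point 0x00..0xFF maps back to exactly one byte.
        return value.encode("latin-1")
    # Everything else is assumed to have been decoded from utf-8.
    return value.encode("utf-8")


# Placeholder key names, for illustration only.
latin1_keys = {"KEY_A"}
client_bytes = "\u00e9".encode("utf-8")    # b'\xc3\xa9' from the client
stored = client_bytes.decode("latin-1")    # what would sit in the environment
assert windows_environ_value_to_bytes("KEY_A", stored, latin1_keys) == client_bytes
assert windows_environ_value_to_bytes("KEY_B", "\u00e9", latin1_keys) == client_bytes
```

The latin-1 branch is lossless for any byte sequence, so it works
whether or not the client actually sent utf-8.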
I'll note that while it's important to get this transformation correct
for a CGI->WSGI bridge to work properly on Windows, and it is thus
definitely a useful discussion to have here, it doesn't actually need
to be part of the WSGI spec.