[Web-SIG] CGI WSGI and Unicode

Manlio Perillo manlio_perillo at libero.it
Mon Dec 7 11:51:31 CET 2009

Graham Dumpleton wrote:

Note: I'm sending the entire message to the mailing list.

>> Hi.
>> I'm playing with Python 3.x, current revision.
>> I have noticed that the data in os.environ are now Unicode strings.
>> In a CGI application, HTTP headers are Unicode strings, and are decoded
>> using the system default encoding.
>> In a future WSGI application, HTTP headers are Unicode strings, and are
>> decoded using the latin-1 encoding.
>> In both cases, 'surrogateescape' is used.
> No, 'surrogateescape' is not necessary when using latin-1, or at least
> for variables which use latin-1.

The problem is that not all browsers use latin-1.
HTTP Digest authentication is one example.
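
To illustrate the point being discussed, here is a minimal sketch (the
sample header values are made up for illustration) of why latin-1 never
needs 'surrogateescape', while a strict codec such as UTF-8 does when the
bytes do not match the expected encoding:

```python
# Hypothetical header bytes: a browser sending UTF-8 instead of latin-1.
raw = "caf\u00e9".encode("utf-8")          # b'caf\xc3\xa9'

# latin-1 maps every byte 0x00-0xFF to a code point, so decoding cannot
# fail and no error handler is needed. The text may look wrong, but the
# original bytes are fully recoverable.
as_latin1 = raw.decode("latin-1")
assert as_latin1.encode("latin-1") == raw

# An application that knows the real encoding can re-decode the bytes.
recovered = as_latin1.encode("latin-1").decode("utf-8")

# With a strict codec, undecodable bytes only round-trip when
# 'surrogateescape' is used for both decoding and encoding.
bad = b"caf\xe9"                           # latin-1 bytes, invalid as UTF-8
escaped = bad.decode("utf-8", "surrogateescape")
assert escaped.encode("utf-8", "surrogateescape") == bad
```

In both cases the original bytes survive; the difference is only in which
step is needed to get them back.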

> Use of 'surrogateescape' is only relevant in the context of some web
> servers and only relevant for specific variables, some of which aren't
> even part of the set of variables which are required by WSGI.
> For example, in Apache/mod_wsgi, 'surrogateescape' is used on

What about HTTP_COOKIE?

>> Can this cause trouble and incompatibility problems?
>> I'm interested in special header handling, like cookies, that contain
>> opaque data.
> The issues with the CGI/WSGI bridge in Python 3.X have been discussed
> previously on the list.

It seems I missed it.

> It is acknowledged that there are problems to
> be solved there, at least to the extent that a CGI/WSGI bridge
> implementation has to correct the encoding, and also that this may
> only be solvable in Python 3.1 onwards due to not having access to
> what encoding was used for environment variables in Python 3.0. Not
> many people care about CGI these days and so no one has bothered to
> come up with a working CGI/WSGI bridge for Python 3.X.

CGI is very important; some kinds of web applications have problems
when run in a long-running process.

As an example, I prefer to run Trac and Mercurial instances as CGI.
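
The encoding correction that a CGI/WSGI bridge has to perform, as
mentioned in the quoted text above, might look roughly like the sketch
below. The helper name environ_to_wsgi is hypothetical, and the sketch
assumes that os.environ values were decoded with the given encoding plus
'surrogateescape' (the behaviour on Unix in Python 3.1 and later):

```python
import sys


def environ_to_wsgi(value, environ_encoding=None):
    """Hypothetical helper: re-decode a CGI variable as latin-1.

    Assumes `value` came from os.environ, i.e. it was decoded from bytes
    using `environ_encoding` with the 'surrogateescape' error handler.
    """
    if environ_encoding is None:
        # Assumption: the environment was decoded with this encoding.
        environ_encoding = sys.getfilesystemencoding()
    # Step 1: recover the original bytes sent by the web server.
    raw = value.encode(environ_encoding, "surrogateescape")
    # Step 2: decode as latin-1, which cannot fail for any byte value.
    return raw.decode("latin-1")


# Usage with an opaque, made-up cookie value containing non-UTF-8 bytes.
original = b"sid=\xff\xfe"
from_environ = original.decode("utf-8", "surrogateescape")
wsgi_value = environ_to_wsgi(from_environ, "utf-8")
assert wsgi_value.encode("latin-1") == original
```

An application can then get the opaque bytes back with
value.encode('latin-1'), whatever encoding the client actually used.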

Regards,
Manlio

