[Web-SIG] Move to bless Graham's WSGI 1.1 as official spec
manlio_perillo at libero.it
Thu Dec 3 21:15:06 CET 2009
And Clover wrote:
> Manlio Perillo wrote:
>> However what about URI (that is, for PATH_INFO and the like)?
>> For URI (if I remember correctly) the suggested encoding is UTF-8, so
>> URLS should be decoded using
>> url.decode('utf-8', 'surrogateescape')
>> Is this correct?
> The currently-discussed proposal is ISO-8859-1, allowing the real bytes
> to be trivially extracted. This is consistent with the other headers and
> would be my preferred approach.
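For reference, the two decodings under discussion can be compared with a rough sketch like the one below. It assumes Python 3.1 or later (where the `surrogateescape` error handler exists); the path value and variable names are made up for illustration.

```python
# Hypothetical raw request path, as bytes received from the client.
raw = "/caf\u00e9".encode("utf-8")          # b'/caf\xc3\xa9'

# ISO-8859-1 proposal: every byte maps to one code point, so the
# original bytes can always be recovered by re-encoding.
as_latin1 = raw.decode("iso-8859-1")
recovered = as_latin1.encode("iso-8859-1")  # identical to raw
real_text = recovered.decode("utf-8")       # the text the client meant

# UTF-8 + surrogateescape: valid UTF-8 decodes to real text, and
# invalid bytes survive as lone surrogates that round-trip on encode.
bad = b"/caf\xe9"                           # not valid UTF-8
as_utf8 = bad.decode("utf-8", "surrogateescape")
bad_again = as_utf8.encode("utf-8", "surrogateescape")
```

Both approaches preserve the original bytes; they differ in what the intermediate `str` looks like to code that does not re-encode it.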
There is something that I don't understand.
Some HTTP headers, like Accept-Language, contain data described as
    token = 1*<any CHAR except CTLs or separators>
So a token, IMHO, is an opaque string, and it SHOULD NOT be decoded.
In Python 3.x it SHOULD be a byte string.
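To make the point concrete, here is a minimal sketch of checking the `token` rule directly on a byte string, with no decoding step. The helper name `is_token` is hypothetical; the separator list is the one given in RFC 2616, section 2.2.

```python
# Separators from RFC 2616 section 2.2, including SP and HT.
SEPARATORS = frozenset(b'()<>@,;:\\"/[]?={} \t')

# CHAR is US-ASCII 0-127; CTLs are 0-31 and 127. Starting at 33 also
# skips SP (32), which is a separator anyway.
TOKEN_BYTES = frozenset(range(33, 127)) - SEPARATORS


def is_token(value: bytes) -> bool:
    """Return True if value matches token = 1*<CHAR except CTLs/separators>."""
    return len(value) > 0 and all(b in TOKEN_BYTES for b in value)
```

For example, `is_token(b"en-US")` holds, while `is_token(b"en;q=0.8")` does not, because `;` and `=` are separators.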
Text content is described as `TEXT`, where:
    The TEXT rule is only used for descriptive field contents and values
    that are not intended to be interpreted by the message parser. Words
    of *TEXT MAY contain characters from character sets other than
    ISO-8859-1 only when encoded according to the rules of RFC 2047.
    TEXT = <any OCTET except CTLs,
           but including LWS>
The only type of data where TEXT can be used is `quoted-string`.
A `quoted-string` only appears in well-specified portions of a header.
So, IMHO, it is *not* correct for WSGI middleware to return all HTTP
headers as Unicode strings.
This is up to the application/framework, which must parse each header,
split it into components, and handle each one as appropriate (as a byte
string, a Unicode string, or an instance of some other data type).
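As a rough sketch of what I mean (not a complete or validating parser), an application could split an Accept-Language value while it is still a byte string, keep the language ranges as opaque bytes, and convert only the `q` parameter to another data type. The function name `parse_accept_language` is made up for this example.

```python
def parse_accept_language(value: bytes):
    """Split an Accept-Language value into (language_range, q) pairs.

    Language ranges stay as bytes; only the q-value is converted.
    Malformed q-values raise ValueError in this sketch.
    """
    result = []
    for item in value.split(b","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(b";")]
        lang = parts[0]
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition(b"=")
            if name.strip().lower() == b"q":
                q = float(val.strip().decode("ascii"))
        result.append((lang, q))
    return result
```

For instance, `parse_accept_language(b"da, en-gb;q=0.8, en;q=0.7")` yields the three language ranges as byte strings paired with their quality values as floats.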