[Web-SIG] Python 3.0 and WSGI 1.0.

Graham Dumpleton graham.dumpleton at gmail.com
Thu Apr 2 06:01:17 CEST 2009


2009/4/2 Bill Janssen <janssen at parc.com>:
> Alan Kennedy <alan at xhaus.com> wrote:
>
>> Hi Bill,
>>
>> [Bill]
>> > I think the controlling reference here is RFC 3875.
>>
>> I think the controlling references are RFC 2616, RFC 2396 and RFC 3987.
>
> I see what you're saying, but it's darn near impossible, as a practical
> matter, to get any guidance on encoding matters from those.
>
> The question is where those names come from, and they come from CGI, and
> that is (practically speaking) defined these days by RFC 3875, as much as
> anything.
>
>> I think the question is "are people using IRIs in the wild"? If so,
>> then we must decide how do we best deal with the problems of
>> recognising iso-8859-1+rfc2047 versus utf-8, or whatever
>> server-configured encoding the user has chosen.
>
> See http://bugs.python.org/issue3300, where we went around and around
> that question.  The answer seems to be, yes.
>
> There are lots of useful fragments in that discussion, for instance:
>
> ``For the authority (server name) portion of a URI, RFC 3986 is
> pretty clear that UTF-8 must be used for non-ASCII values (assuming, for
> a moment, that IDNA addresses are not Punycode encoded already). For
> the path portion of URIs, a large-ish proportion of them are, indeed,
> UTF-8 encoded because that has been the de facto standard in Web browsers
> for a number of years now. For the query and fragment parts, however,
> the encoding is determined by context and often depends on the encoding
> of some page that contains the form from which the data is taken. Thus,
> a large number of URIs contain non-UTF-8 percent-encoded octets.''

Reading that (very long) bug discussion reminds me of another sticky
issue that was brought up before: the Referer (request) and Location
(response) headers. Since their values are URLs, you also have to deal
with the encoding of a URL carried inside a header.
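
To make the ambiguity concrete, here is a rough Python 3 sketch of the
guessing game involved. It is not anything WSGI specifies, and the helper
name decode_percent_encoded and the ISO-8859-1 fallback are assumptions
made purely for illustration: percent-decode to bytes, try UTF-8, and
fall back to a configured encoding if that fails.

```python
from urllib.parse import unquote_to_bytes


def decode_percent_encoded(value, fallback="iso-8859-1"):
    """Percent-decode a URL component and guess its character encoding.

    Hypothetical helper for illustration only: tries UTF-8 first and,
    if the octets are not valid UTF-8, decodes with `fallback` instead.
    Returns a (text, encoding_used) tuple.
    """
    raw = unquote_to_bytes(value)  # percent-escapes -> raw octets
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail,
        # but it may still be the wrong guess.
        return raw.decode(fallback), fallback


# The same path, percent-encoded from two different source encodings.
print(decode_percent_encoded("/caf%C3%A9"))  # octets are valid UTF-8
print(decode_percent_encoded("/caf%E9"))     # octets are not valid UTF-8
```

Both calls yield the same text, but only because the second one happens
to match the fallback. Octets from some other legacy encoding would be
decoded without error into the wrong characters, which is exactly why
there is no safe default here.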

Is there going to be any simple answer to all of this? :-(

Graham

